Monte Carlo Greeks in the lognormal Libor market model


Delft University of Technology
Faculty of Electrical Engineering, Mathematics and Computer Science
Delft Institute of Applied Mathematics

Monte Carlo Greeks in the lognormal Libor market model

A thesis submitted to the Delft Institute of Applied Mathematics in partial fulfillment of the requirements for the degree MASTER OF SCIENCE in APPLIED MATHEMATICS by Jan van der Linden

Delft, the Netherlands, November 2011

Copyright © 2011 by Jan van der Linden. All rights reserved.

MSc THESIS APPLIED MATHEMATICS Monte Carlo Greeks in the lognormal Libor market model Jan van der Linden Delft University of Technology Prof. C.W. Oosterlee Dr. J.A.M. van der Weide Dr. N. Borovykh Responsible professor Dr. R. Fokkink Drs. S. van Weeren November 2011 Delft, the Netherlands

Abstract

Greeks are sensitivities of option prices with respect to certain parameters. The calculation of Greeks is needed for hedging strategies and to manage or measure risk. As the underlying models get more complicated, the calculation of these Greeks can become far more difficult than the pricing of options. In this thesis we consider the Greeks of both European- and Bermudan-style Libor rate contracts. To model forward Libor rates we use the lognormal forward Libor market model. Because of the dimensionality of the model, only Monte Carlo methods are capable of estimating these Greeks. Therefore various Monte Carlo methods to estimate the Greeks are considered and adjusted to our model settings. The methods are tested and compared and, where possible, improved. We improve the likelihood ratio method for the Greeks of Bermudan-style options by the use of a predictor-corrector scheme. Another successful method for Greek calculations is the pathwise sensitivity method.

Acknowledgements

I would like to thank a number of people who helped me during the development of this thesis. First of all, my gratitude goes to my supervisor at TU Delft, Prof. Kees Oosterlee, for his excellent guidance during the last years of my study and for the useful comments on this thesis. Furthermore I want to thank the whole DR&V team of Rabobank International, and in particular Natalia Borovykh, who was my supervisor at Rabobank International, for all helpful discussions and comments. Finally I also want to thank my family, and especially my girlfriend Marlieke van Dijk, for the encouragement and support that enabled me to finish the study.

Contents

1 Introduction

2 Libor market model, products and Greeks
Introduction; Libor rates; Libor rate contracts; European options; Bermudan-type contracts; The lognormal forward Libor market model; Main assumptions; Lognormal forward Libor rates under T_m forward measure; Dynamics under terminal measure; Discretization schemes: Euler and predictor-corrector; Pricing using Monte Carlo methods; European products; Bermudan products; Greeks; Conditions for interchanging derivative and expectation; Analytic pricing formulas; Caplet; Digital; Statistics; Variance; Bias; MSE

3 Estimating first order sensitivities
Introduction; The bump method; European options; Bermudan options; Application of the control variate; Likelihood ratio method; European options; Delta; Bermudan options; Application of the control variate; Probability density function; Delta predictor-corrector modification; Computing the vega of Bermudan options; Vega predictor-corrector modification; Analytic differentiations; Pathwise sensitivity method; The method; Bermudan options and the control variate; Pathwise derivatives for the Euler and PC discretization scheme; Vegas; Adjoint implementation; Vector-matrix multiplications; Adjoint or forward deltas?; Adjoint or forward vegas?; Vibrato Monte Carlo method; Delta; Vega; Pathwise kernel method; Delta; Pathwise digitals

4 Estimating gamma
Pathwise part; VMC and LR gamma

5 Results European options
European test setting; PS Greeks of a caplet; Result caplet price; Results digital option; Application of the VMC method; Results digital delta; Results digital vega; Results gamma; Variance properties; Improvement by predictor-corrector scheme; Bump method; PS method; LR method; Conclusion

6 Results cancellable options
Introduction; Bumping with fixed decision times; Test setting; Causes of errors; Cancellable swap; Delta; Estimates; Vega; Changing the number of Libors; Cancellable range swap; Delta; Standard deviations; Estimating the delta; Standard deviations in other settings; Vega; Other settings; Vega; Conclusion

7 Conclusions

8 Further research

A Implementation

Preliminaries

List of short-hand notations

LR method: likelihood ratio method.
PS method: pathwise sensitivity method.
VMC method: vibrato Monte Carlo method.
PC scheme: predictor-corrector scheme.
bump* method: special version of the bump method, explained in section 6.2.
CV: control variate for the cancellable options.
MSE: mean squared error.
sd: standard deviation of an estimator.
se: standard error of an estimate.

List of symbols

Here is a list of frequently used symbols. All these symbols are also explained in the text.

$c_j(T_i)$: constant which comes from the least squares method used in the L-S method. This factor corresponds to tenor point $T_i$ and the $j$-th basis function.
$\delta_i = T_i - T_{i-1}$: time between the $i$-th and $(i-1)$-th expiry dates.
$D(t_k)$: matrix whose elements $D_{ij}(t_k)$ are the pathwise derivatives of the $i$-th forward Libor rate at time point $t_{k+1}$ with respect to the $j$-th forward Libor rate at time point $t_k$.
$\Delta(t_k)$: matrix whose elements $\Delta_{ij}(t_k)$ are the pathwise derivatives of the $i$-th forward Libor rate at time point $t_{k+1}$ with respect to the $j$-th forward Libor rate at time point 0.
$E^m(\cdot)$: expectation under the terminal measure.
$E^k(\cdot)$: expectation under the $T_k$ forward measure.
$f(x, \hat{L}(t_{n-1}))$: distribution of the logarithm of the Euler estimate for the forward Libor rate at time point $t_n$, given the forward Libor rates at time point $t_{n-1}$.
$g_{[optionname]}$: payoff function of the [optionname] option. For example, $g_{cap}(L)$ is the payoff of a caplet option given the Libor rates $L$.
$\tilde{g}_{[optionname]}$: payoff function of the [optionname] option discounted until $T_m$.
$g^k_{[optionname]}$: payoff function of a cancellable option corresponding to the payoff at $T_k$.
$h_k$: size of the $k$-th time step in the time discretization.
$K$: strike price of an option.
$L(t)$, $L^k(t)$: vector of forward Libor rates $[L_1(t), \ldots, L_m(t)]$. The expiry dates $T_0, T_1, \ldots, T_m$ should be defined in the context and the $i$-th element of $L(t)$ is defined by $L_i(t) = l(t, T_{i-1}, T_i)$. If the notation contains a superscript $k$, then $L^k(t)$ is simulated under the $k$-th forward measure. Otherwise the terminal measure is used.
$\partial L(t_{k+1}) / \partial L(t_k)$: matrix of pathwise derivatives of the forward Libor rates at time point $t_{k+1}$ with respect to the forward Libor rates at time point $t_k$.
$\partial L(t_{k+1}) / \partial \theta$: pathwise derivative of the forward Libor rates at time point $t_{k+1}$ with respect to a parameter $\theta$.
$l(t, T_a, T_b)$: forward Libor rate. Stochastic process until time $T_a$. It is the rate which can be fixed at time $t$ which gives one currency at time $T_b$ given an investment from time $T_a$ to $T_b$ in the bank account. The superscript $k$ refers to the $k$-th forward measure.
$m$: number of Libor rates which are involved (size of the vector $L(t)$).
$M$: sample size used in the Monte Carlo estimation.
$\mu_k(t, X)$: drift of the stochastic process $L_k(t)$.
$N$: notional.
$\mathcal{N}(x)$: standard Gaussian cumulative distribution function.
$P_T(t)$: price at time $t$ of a zero-coupon bond with expiry $T$.
$P^{(k)}$: the measure corresponding to the numeraire $P_{T_k}(t)$ (in words: the $T_k$ forward measure).
$\sigma_k(t)$: (horizontal) $m$-dimensional vector function of volatilities corresponding to the $k$-th underlying value.
$t_k$: time points for the discretization of the stochastic processes.
$T_0, T_1, \ldots, T_m$: tenor dates which are considered.
$T$: maturity date.
$\theta$: parameter of a stochastic process.
$\Theta(t_k)$: matrix of partial derivatives of an option with respect to a parameter $\theta$.
$\varphi_i(x)$: $i$-th basis function used in the L-S algorithm.
$V_{[optionname]}(t, L(t))$: price of the [optionname] option at time point $t$.
$W^k(t)$: a (vertical) $m$-dimensional Brownian motion under the $P^{(k)}$ measure. This Brownian motion is used for the dynamics of the $k$-th forward rate.
$\Delta W^k_i(t_k)$: a standard normal random variable (mean 0 and variance 1).

Chapter 1 Introduction

The calculation of Greeks (derivatives of option prices with respect to a parameter) in the lognormal Libor market model is the main objective of this thesis. Before we explain several methods to estimate the Greeks, we give an introductory chapter covering Libor rates, dynamics, European options, cancellable options, pricing and the Longstaff-Schwartz algorithm, and some analytic solutions for prices and Greeks. The model is a variant of the lognormal forward Libor market model given in [1]. The basics of the Longstaff-Schwartz algorithm for callable options are explained in [11] by Piterbarg. We have adjusted this algorithm for our cancellable options. This includes the choice of explanatory variables.

We have devoted one chapter to the estimation of Greeks by several methods, for both European-style and Bermudan-style contracts. Most methods were well described in the literature. For the European options these methods are the bump method, the likelihood ratio method, the pathwise sensitivity method (forward and adjoint version), the vibrato Monte Carlo method and the kernel method. The work consisted of adjusting the methods to the setting which we use. We have calculated the pathwise derivatives for our model settings that are needed for the pathwise sensitivity method, and we have derived the probability density functions corresponding to our discretized model which are needed for the likelihood ratio method. For some methods the calculation of all Greeks has been explained in the literature, but for the likelihood ratio and the vibrato Monte Carlo method we had to derive the methods for the gamma (second order sensitivity with respect to an initial value), following the same idea as was used for the first order sensitivities. We have also found two test options, with their analytic prices and Greeks, which can be used to test the methods in the European setting.

For the vibrato Monte Carlo method we introduce a new technique which uses a random sample size to estimate a conditional expectation, and for the pathwise sensitivity method we compare the adjoint and the forward version in the case of multiple payoffs. In the Bermudan setting, only the pathwise sensitivity method and the bump method are taken directly from the literature. We have extended the European version of the likelihood ratio method to a Bermudan version to calculate the delta as well as the vega. However, this method was also partly described in [3]. Because of the high variance, the likelihood ratio method was only generally applicable if the one-step Euler discretization was used. We tried to reduce the variance by a control variate and tried to reduce the bias by the predictor-corrector scheme. The successful combination of the predictor-corrector scheme (to reduce the bias) and our control variate (to reduce the variance) is new.

In chapter 5 we test the methods on European options with the help of the test options for which the Greeks are calculated analytically. In chapter 6 we give some results for the cancellable options. Here we will see whether the methods work and which method is best for a particular setting. The implementation in C#, on which the result sections are based, was written from scratch. The program consists of Libor rate, European option and cancellable option objects, simulation methods, the Longstaff-Schwartz algorithm, pricing and of course the estimation of the Greeks. The appendix gives a short overview of this implementation.

Chapter 2 Libor market model, products and Greeks

2.1 Introduction

Before we can start with numerical methods to estimate Greeks in the Libor market model, we first describe the necessary prior knowledge. This includes theory about the underlying model which we consider, such as a description of (forward) Libor rates, model assumptions and discretization schemes to simulate the underlying. This material is important not only as an introduction to Libor rates and to the simulation of the underlying, but also for understanding the pathwise sensitivity method, since this method depends heavily on the discretization scheme. This chapter is also used to introduce the payoff functions which will be used to test the numerical methods. We will consider European options and cancellable options here, with continuous as well as discontinuous payoff functions. We conclude the chapter with the pricing of these contracts using Monte Carlo methods and an introduction to Greek calculations.

2.2 Libor rates

Before we introduce the Libor market model, we have to define the Libor rates. A Libor rate is a simply compounded interest rate for a loan starting at $T_\alpha$ and ending at $T_\beta$. This rate is not known before $T_\alpha$. However, one could lock in a rate for a loan starting at $T_\alpha$ and ending at $T_\beta$ at an earlier time $t < T_\alpha$. In this case the rate is called a forward Libor rate. In this thesis a forward Libor rate is denoted by $l(t, T_\alpha, T_\beta)$, where $t$ is the current time, $T_\alpha$ is the starting time and $T_\beta$ is the end time of the time interval on which the interest rate will be applied. A forward Libor rate changes until it is settled at $T_\alpha$.

Definition 1. Forward Libor rate $l(t, T_\alpha, T_\beta)$. The value of a forward Libor rate can be expressed in terms of zero-coupon bond prices. Denote by $P_T(t)$ the time-$t$ price of a zero-coupon bond maturing at $T$. Then, for $t < T_\alpha$:

$$ l(t, T_\alpha, T_\beta) = \frac{1}{T_\beta - T_\alpha} \left( \frac{P_{T_\alpha}(t)}{P_{T_\beta}(t)} - 1 \right). \tag{2.1} $$

If $t = T_\alpha$ the rate is settled and, since $P_{T_\alpha}(T_\alpha) = 1$, we have:

$$ l(T_\alpha, T_\alpha, T_\beta) = \frac{1}{T_\beta - T_\alpha} \left( \frac{1}{P_{T_\beta}(T_\alpha)} - 1 \right). \tag{2.2} $$

In this thesis we will consider sets of Libor rates corresponding to the disjoint time intervals $(T_0, T_1], (T_1, T_2], \ldots, (T_{m-1}, T_m]$. These time intervals are referred to as tenor intervals, and the time points $T_0, T_1, \ldots, T_m$ are called the tenor points.
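Equation (2.1) can be illustrated with a short sketch. The function names and the bond prices below are hypothetical and chosen only for illustration; the code assumes that zero-coupon bond prices are available at the tenor points.

```python
def forward_libor(p_start: float, p_end: float, t_start: float, t_end: float) -> float:
    """Simply compounded forward rate over (t_start, t_end], as in equation (2.1).

    p_start and p_end are the zero-coupon bond prices P_{T_alpha}(t) and P_{T_beta}(t).
    """
    if t_end <= t_start:
        raise ValueError("t_end must be later than t_start")
    return (p_start / p_end - 1.0) / (t_end - t_start)


def forward_libor_vector(tenor_points, bond_prices):
    """Forward rates for the consecutive tenor intervals (T_{i-1}, T_i]."""
    return [
        forward_libor(bond_prices[i - 1], bond_prices[i], tenor_points[i - 1], tenor_points[i])
        for i in range(1, len(tenor_points))
    ]


# Hypothetical tenor points T_0, T_1, T_2 and bond prices observed today.
tenor_points = [1.0, 1.5, 2.0]
bond_prices = [0.96, 0.94, 0.92]
rates = forward_libor_vector(tenor_points, bond_prices)
```

Each rate satisfies $1 + (T_i - T_{i-1})\, l(t, T_{i-1}, T_i) = P_{T_{i-1}}(t) / P_{T_i}(t)$, which is the form of (2.1) used when discount factors are rebuilt from simulated rates.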

We will often refer to the vector of forward Libor rates $L(t)$, which is given in Definition 2.

Definition 2. Vector of forward Libor rates $L(t)$. $L(t)$ is defined by:

$$ L(t) = [\, l(t, T_0, T_1),\; l(t, T_1, T_2),\; \ldots,\; l(t, T_{m-1}, T_m) \,]. \tag{2.3} $$

We also give the definition of the lifetimes of Libor rates:

Definition 3. The lifetime of a Libor rate is defined as the length of the corresponding tenor interval:

$$ \delta_i = T_i - T_{i-1}. \tag{2.4} $$

2.3 Libor rate contracts

In this thesis we will consider European as well as Bermudan products. The difference between these products is that for the Bermudan type one has to make (optimal) exercise decisions, whereas for the European type one does not.

2.3.1 European options

A European option is a contract which can be purchased at time $t$ and which pays an amount of money according to a payoff function depending on the underlying at maturity time $T$. European options can be split into two types: options with continuous payoffs and options with discontinuous payoffs. For the continuous type we consider caplets and swaptions as specific examples, and for the discontinuous type we consider digital options.

Definition 4. Payoff $g_{[name]}$, European version. In general we denote the payoff of an option named [name] by $g_{[name]}$. For the European case this function only depends on the vector of forward Libor rates at the expiry date and therefore we denote it by $g_{[name]}(L(T))$, where $T$ is the expiry date of the option.

Caplets

One of the simplest European options which can be defined on Libor rates is the so-called caplet. A caplet on $l(T_0, T_0, T_1)$ with payment date $T_1$ has the following payoff:

Definition 5. Caplet payoff function $g_{caplet}$.

$$ g_{caplet}(L(T_0)) = N \delta_1 \max(L_1(T_0) - K, 0), \tag{2.5} $$

where $N$ is the notional and $K$ is the strike of the option, which is a constant rate.

The contract can be formulated as follows: a caplet gives the holder the right, but not the obligation, to receive at time $T_1$ an amount of money which is equivalent to an extra rate of $L_1(T_0) - K = l(T_0, T_0, T_1) - K$ on $N$ units of currency over the time between $T_0$ and $T_1$. In this way, a company with a floating-rate debt can protect itself against the risk of increasing Libor rates.

Swaptions

Another basic option which can be defined on Libor rates is an option on a forward Libor rate swap. This is an option to enter a Libor rate swap at a future time $T_0$. The payoff which is paid by exercising this option at $T_0$ is:

Definition 6. Swaption payoff function $g_{swaption}$.

$$ g_{swaption}(L(T_0)) = \max\left( \sum_{i=1}^{m} P_{T_i}(T_0)\, N \delta_i \,(L_i(T_0) - K),\; 0 \right), \tag{2.6} $$

with $K$ the strike of the option. The advantage of choosing this option is that it is typically cheaper than holding several caplets covering the same period.

Digitals

By digital options we mean options which pay either nothing or one fixed amount of money. An example of such an option is based on the following payoff, paid at $T_1$:

Definition 7. Digital payoff function $g_{dig}$.

$$ g_{dig}(L(T_0)) = N \delta_1 \, \mathbf{1}_{L_1(T_0) > K}. \tag{2.7} $$

We will refer to this option as the digital option, which explains the subscript in equation (2.7). The holder gets either $N \delta_1$ units of currency or nothing.

2.3.2 Bermudan-type contracts

By a Bermudan-type contract we mean a contract where the owner can make exercise decisions at any exercise date. More specifically, we will consider cancellable contracts. We introduce two examples, which we call the cancellable swap and the cancellable range swap (digital version). By a swap contract we mean a contract corresponding to a cash flow of $N (T_i - T_{i-1}) (L_i(T_{i-1}) - K_i)$ paid at time $T_i$ for every $i \in \{1, 2, \ldots, m\}$. Note that the $i$-th payment depends on the Libor rate which is settled at time $T_{i-1}$ and is paid at time $T_i$ (not at $T_{i-1}$).

Definition 8. Bermudan payoff functions $g^{i+1}_{[name]}$ and $\tilde{g}^{i+1}_{[name]}$. The total payoff of a cancellable option of the type we consider can be written as a sum of payments at the tenor dates. If the option is not cancelled before $T_i$, then the holder gets a payoff at time $T_{i+1}$. We denote this payoff by $g^{i+1}_{[name]}$. Under the $T_m$ measure (the use of this measure will be explained later) we will often need $\frac{P_{T_{i+1}}(T_i)}{P_{T_m}(T_i)}\, g^{i+1}_{[name]}$. Therefore we define:

$$ \tilde{g}^{i+1}_{[name]}(L(T_i)) = \frac{P_{T_{i+1}}(T_i)}{P_{T_m}(T_i)}\, g^{i+1}_{[name]}(L(T_i)). $$

Hence, if the option is bought at time 0, the holder gets $g^1_{[name]}(L(T_0))$ for sure. If the option is not cancelled at time $T_0$, the holder also gets the payment $g^2_{[name]}(L(T_1))$.

Definition 9. Cancellable payoff $g_{[name]}$ (total payoff). The payoff of a cancellable option, given the exercise time $\tau$, is the sum of the payoffs until the exercise date:

$$ g_{[name]}(L(T_0), \ldots, L(T_{\tau-1})) = \sum_{i=1}^{\tau} \frac{P_{T_i}(T_{i-1})}{P_{T_m}(T_{i-1})}\, g^i_{[name]}(L(T_{i-1})) = \sum_{i=1}^{\tau} \tilde{g}^i_{[name]}(L(T_{i-1})), $$

where $\tau$ is the integer such that $T_{\tau-1}$ is equal to the exercise time. We will call it the total payoff as a reminder that this payoff consists of multiple payoffs at different tenor points.

Cancellable swap

We define the cancellable swap as a swap contract which can be cancelled at any tenor time. If the contract is cancelled at time $T_i$, then only the payoffs up to and including time $T_{i+1}$ are considered. So, if one buys a contract at some time before $T_0$ and cancels the contract at $T_0$, then one gets only the payment at $T_1$ (note: there is no payment at $T_0$).

Definition 10. The total payoff of the cancellable swap reads:

$$ g_{CSwap}(L(T_0), \ldots, L(T_{\tau-1})) = \sum_{i=1}^{\tau} \tilde{g}^i_{CSwap}(L_i(T_{i-1})) = \sum_{i=1}^{\tau} \frac{P_{T_i}(T_{i-1})}{P_{T_m}(T_{i-1})}\, N \delta_i \,(L_i(T_{i-1}) - K). \tag{2.8} $$

Cancellable range swap

A more complicated example is a contract which we call the cancellable range swap. We define this contract to be almost the same as the cancellable swap. The only difference is that the payment at time $T_i$ is only made if the Libor rate $l(T_{i-1}, T_{i-1}, T_i)$ lies in a certain interval $[K^{Low}_i, K^{up}_i]$.

Definition 11. The total payoff of the cancellable range swap.

$$ g_{CRSwap}(L(T_0), \ldots, L(T_{\tau-1})) = \sum_{i=1}^{\tau} \frac{P_{T_i}(T_{i-1})}{P_{T_m}(T_{i-1})}\, N \delta_i \,(L_i(T_{i-1}) - K)\, \mathbf{1}_{K^{Low}_i < L_i(T_{i-1}) < K^{up}_i}. \tag{2.9} $$

The decision points, tenor times and payoff dates of these cancellable payoffs are illustrated by Figure 2.1.

Figure 2.1: Illustration of a cancellable contract. The payoff $g^1(L_1(T_0))$ is always paid if the contract is bought at $t = 0$. The payoff $g^2(L_2(T_1))$ is paid if decision A is the decision to hold the contract. The payoff $g^3(L_3(T_2))$ is paid if both decision A and decision B are decisions to hold the contract.
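The total payoffs (2.8) and (2.9) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the thesis implementation: the function names are hypothetical, the range bounds are taken constant over all tenor intervals, and the discount ratio is computed from the identity $P_{T_i}(t) / P_{T_m}(t) = \prod_{k=i+1}^{m} (1 + \delta_k L_k(t))$, which follows from repeated use of equation (2.1).

```python
def discount_ratio(i, libors, deltas):
    """P_{T_i}(T_{i-1}) / P_{T_m}(T_{i-1}) from the Libor vector observed at T_{i-1}.

    libors[k - 1] holds L_k(T_{i-1}) and deltas[k - 1] holds delta_k, for k = 1..m.
    """
    ratio = 1.0
    for k in range(i + 1, len(libors) + 1):
        ratio *= 1.0 + deltas[k - 1] * libors[k - 1]
    return ratio


def cancellable_total_payoff(paths, deltas, strike, notional, tau, lower=None, upper=None):
    """Total payoff discounted to T_m: (2.8) without bounds, (2.9) with bounds.

    paths[i - 1] is the Libor vector observed at T_{i-1}; tau is the number of
    payments received before the contract stops. Constant bounds are an assumption.
    """
    total = 0.0
    for i in range(1, tau + 1):
        libors = paths[i - 1]
        rate = libors[i - 1]  # L_i(T_{i-1}), the rate settled at T_{i-1}
        if lower is not None and not (lower < rate < upper):
            continue  # range swap: no payment outside the interval
        total += discount_ratio(i, libors, deltas) * notional * deltas[i - 1] * (rate - strike)
    return total


# Hypothetical two-period example: Libor vectors observed at T_0 and T_1.
paths = [[0.04, 0.05], [0.04, 0.06]]
deltas = [0.5, 0.5]
swap_value = cancellable_total_payoff(paths, deltas, strike=0.03, notional=100.0, tau=2)
range_value = cancellable_total_payoff(paths, deltas, 0.03, 100.0, 2, lower=0.045, upper=0.07)
```

In the example the first payment of the range swap is dropped because $L_1(T_0) = 0.04$ lies outside the assumed interval, so only the second payment contributes.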

2.4 The lognormal forward Libor market model

2.4.1 Main assumptions

The lognormal forward Libor market model in this chapter is defined as in [1] (chapter 6.3). Here the $k$-th forward Libor rate is assumed to be lognormal under the $T_k$ forward measure $P^{(k)}$. The $k$-th Libor rate has the following dynamics under the $P^{(k)}$ measure:

$$ dL^k_k(t) = L^k_k(t)\, \sigma_k(t)\, dW^k(t) \quad \text{for } t \le T_0, \tag{2.10} $$

with $W^k(t) \in \mathbb{R}^{m \times 1}$ a vector of Brownian motions, $\sigma_k(t)$ an $m$-dimensional row vector of volatilities and $L^k_k(t)$ the scalar rate. The superscript corresponds to the measure which is used. In a later tenor interval (if $t \in (T_{i-1}, T_i]$) the Libor rate $L_i(t)$ and the Brownian motion $W_i(t)$ are frozen. The dynamics of the forward Libor rates for $t \in (T_{i-1}, T_i]$ are defined by:

$$ dL^k_k(t) = L^k_k(t) \sum_{j=i+1}^{m} \sigma_{kj}(t)\, dW^k_j(t), \quad \text{for } k > i \text{ and } t \in (T_{i-1}, T_i]. \tag{2.11} $$

Here $\sigma_{kj}(t)$ is the $j$-th element of the vector $\sigma_k(t)$. Note that the sum starts at $i + 1$ since $\sigma_{kj}(t)$ is zero for $j \le i$ if $t \in (T_{i-1}, T_i]$. The matrix with elements $\sigma_{kj}(t)$ in the $k$-th row and $j$-th column will be denoted by $\sigma(t)$.

Definition 12. Volatility function $\sigma(t)$. The matrix $\sigma(t)$ is a piecewise constant matrix in time:

$$ \sigma(t) = \begin{cases} \sigma(T_0) & \text{if } t \in (0, T_0], \\ \sigma(T_1) & \text{if } t \in (T_0, T_1], \\ \quad\vdots & \\ \sigma(T_{m-1}) & \text{if } t \in (T_{m-2}, T_{m-1}]. \end{cases} \tag{2.12} $$

The $i$-th row corresponds to the volatility parameters of the forward Libor rate $L_i(t)$. This row is denoted by $\sigma_i(t)$ and the $j$-th element of this row by $\sigma_{ij}(t)$. Because at time $T_i$ the Brownian motion $W_{i+1}$ is frozen, the volatility parameters $\sigma_{k(i+1)}(t)$ have to be zero for $t > T_i$ for all $k \in \{1, \ldots, m\}$. For example, if $m = 4$ we have:

$$ \sigma(T_0) = \begin{pmatrix} x & x & x & x \\ x & x & x & x \\ x & x & x & x \\ x & x & x & x \end{pmatrix}, \quad \sigma(T_1) = \begin{pmatrix} x & x & x & 0 \\ x & x & x & 0 \\ x & x & x & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}, \quad \sigma(T_2) = \begin{pmatrix} x & x & 0 & 0 \\ x & x & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}, \quad \sigma(T_3) = \begin{pmatrix} x & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}. $$

Here $x$ indicates that the entry is nonzero.

In contrast to [1] chapter 6.3, we will assume that the components of $W^k(t)$ are independent. Note that the Libor rates are still correlated, since the same Brownian motions appear in the dynamics of each Libor rate. Furthermore we will assume that $\sigma_k(t)$ is piecewise constant in time in such a way that it is constant on each tenor interval. Since we have to simulate paths of all Libor rates under just one measure, we have to find a way to transform $W^k(t)$ to a different measure. Girsanov's theorem is used to achieve this. How this is done is shown in the next subsection.

2.4.2 Lognormal forward Libor rates under T_m forward measure

In this section we will see how to transform random variables such that the forward Libor rates can be modelled under the terminal measure. To do this we express the dynamics of the $(i-1)$-th forward Libor rate in terms of the Brownian motion that drives the $i$-th forward Libor rate. This is a similar derivation as is done in [2]. Denote the Radon-Nikodým derivative $dP^{(i-1)} / dP^{(i)}$ by $Z_i(t)$. From equation 2.5 in [1], $Z_i(t)$ is given by:

$$ Z_i(t) = \frac{dP^{(i-1)}}{dP^{(i)}} = \frac{P_{T_{i-1}}(t)}{P_{T_i}(t)} \, \frac{P_{T_i}(0)}{P_{T_{i-1}}(0)}. \tag{2.13} $$

Note that from equation (2.1) we have:

$$ \frac{P_{T_{i-1}}(t)}{P_{T_i}(t)} = 1 + l(t, T_{i-1}, T_i)(T_i - T_{i-1}) = 1 + L_i(t)\delta_i. \tag{2.14} $$

Therefore

$$ Z_i(t) = (1 + L_i(t)\delta_i)\, \frac{P_{T_i}(0)}{P_{T_{i-1}}(0)}. \tag{2.15} $$

Using Itô's lemma and equation (2.10) gives:

$$ dZ_i(t) = \delta_i L_i(t)\sigma_i(t)\, \frac{P_{T_i}(0)}{P_{T_{i-1}}(0)}\, dW^i(t) = Z_i(t)\, \frac{\delta_i L_i(t)\sigma_i(t)}{1 + L_i(t)\delta_i}\, dW^i(t). \tag{2.16} $$

Now define the $m$-dimensional function $\Theta_i(t)$ by:

$$ \Theta_i(t) = -\frac{\delta_i L_i(t)\sigma_i(t)}{1 + L_i(t)\delta_i}. \tag{2.17} $$

One can check that:

$$ Z_i(t) = \exp\left\{ -\int_0^t \Theta_i(u)\, dW^i(u) - \frac{1}{2} \int_0^t \|\Theta_i(u)\|^2\, du \right\}, $$

and therefore we have by the Girsanov theorem (multidimensional version), as stated in [13] Theorem 5.4.1:

$$ dW^{i-1}(t) = dW^i(t) + \Theta_i(t)\, dt = dW^i(t) - \frac{\delta_i L_i(t)\sigma_i(t)}{1 + L_i(t)\delta_i}\, dt. \tag{2.18} $$

Hence the dynamics of $\{L_{i-1}(t), t \ge 0\}$ under the $P^{(i)}$ measure (denoted by $dL^i_{i-1}(t)$) will be:

$$ dL^i_{i-1}(t) = -\sigma_{i-1}(t) L_{i-1}(t)\, \frac{\delta_i L_i(t)\sigma_i(t)}{1 + L_i(t)\delta_i}\, dt + \sigma_{i-1}(t) L_{i-1}(t)\, dW^i(t), \tag{2.19} $$

where a product of two volatility vectors such as $\sigma_{i-1}(t)\sigma_i(t)$ denotes their inner product. Repeating this change of measure, the dynamics under the $T_m$ forward measure (with $m \ge i$) become:

$$ dL^m_{i-1}(t) = -\sigma_{i-1}(t) L_{i-1}(t) \sum_{j=i}^{m} \frac{\delta_j L_j(t)\sigma_j(t)}{1 + L_j(t)\delta_j}\, dt + \sigma_{i-1}(t) L_{i-1}(t)\, dW^m(t). \tag{2.20} $$
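The step "one can check" in the derivation above can be written out as follows. This is a sketch under the sign convention $\Theta_i(t) = -\delta_i L_i(t)\sigma_i(t) / (1 + L_i(t)\delta_i)$, with products of vectors read as inner products; the auxiliary process $X(t)$ is introduced here only for the check.

```latex
\begin{align*}
X(t) &:= -\int_0^t \Theta_i(u)\, dW^i(u) - \frac{1}{2}\int_0^t \|\Theta_i(u)\|^2\, du,
  \qquad X(0) = 0, \\
dX(t) &= -\Theta_i(t)\, dW^i(t) - \frac{1}{2}\|\Theta_i(t)\|^2\, dt,
  \qquad d\langle X \rangle(t) = \|\Theta_i(t)\|^2\, dt, \\
d\, e^{X(t)} &= e^{X(t)} \left( dX(t) + \frac{1}{2}\, d\langle X \rangle(t) \right)
  = -e^{X(t)}\, \Theta_i(t)\, dW^i(t)
  = e^{X(t)}\, \frac{\delta_i L_i(t)\sigma_i(t)}{1 + L_i(t)\delta_i}\, dW^i(t).
\end{align*}
```

So $e^{X(t)}$ solves the same linear stochastic differential equation as $Z_i(t)$ in (2.16). Since $Z_i(0) = (1 + L_i(0)\delta_i)\, P_{T_i}(0) / P_{T_{i-1}}(0) = 1$ by (2.14), the initial values also agree, and hence $Z_i(t) = e^{X(t)}$.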

2.4.3 Dynamics under terminal measure

Suppose $T_{j-1} < t < T_j$. Then the Libor rates corresponding to the tenor intervals $(T_0, T_1], (T_1, T_2], \ldots, (T_{j-1}, T_j]$ are settled. The dynamics under the terminal measure of the forward Libor rates corresponding to the other tenor intervals are given by:

$$ dL^m_k(t) = -L^m_k(t)\sigma_k(t) \sum_{i=k+1}^{m} \frac{\sigma_i(t)\delta_i L^m_i(t)}{1 + \delta_i L^m_i(t)}\, dt + \sigma_k(t) L^m_k(t)\, dW^m(t) \quad \text{for } j < k \le m. \tag{2.21} $$

So, this gives a system of stochastic differential equations:

$$ \begin{aligned} dL^m_1(t) &= L^m_1(t) \left( -\sigma_1(t) \sum_{j=2}^{m} \frac{\sigma_j(t)\delta_j L^m_j(t)}{1 + \delta_j L^m_j(t)}\, dt + \sigma_1(t)\, dW^m(t) \right), \\ &\;\;\vdots \\ dL^m_{m-1}(t) &= L^m_{m-1}(t) \left( -\sigma_{m-1}(t)\, \frac{\sigma_m(t)\delta_m L^m_m(t)}{1 + \delta_m L^m_m(t)}\, dt + \sigma_{m-1}(t)\, dW^m(t) \right), \\ dL^m_m(t) &= L^m_m(t)\, \sigma_m(t)\, dW^m(t). \end{aligned} $$

The dynamics of the logarithm of the forward Libor rates are given by:

$$ d\log(L^m_k(t)) = \left[ -\frac{1}{2}\|\sigma_k(t)\|^2 - \sigma_k(t) \sum_{i=k+1}^{m} \frac{\delta_i L^m_i(t)\sigma_i(t)}{1 + \delta_i L^m_i(t)} \right] dt + \sigma_k(t)\, dW^m(t). \tag{2.22} $$

This will be useful when we consider the log-Euler discretization in section 2.4.4.

2.4.4 Discretization schemes: Euler and predictor-corrector

To be able to simulate the forward Libor rates in time, we need a discretization scheme. Suppose we wish to simulate from 0 to $T$. We first define grid points $t_0 = 0 < t_1 < t_2 < \ldots < t_N = T$. We choose this grid so that all tenor points within the interval $[0, T]$ coincide with a grid point. One of the simplest schemes to simulate the forward Libor rates is the log-Euler scheme:

$$ L^E_i(t_{k+1}) = L^E_i(t_k) \exp\left( \left[ -\frac{1}{2}\|\sigma_i(t_k)\|^2 - \sigma_i(t_k) \sum_{n=i+1}^{m} \frac{\delta_n L^E_n(t_k)\sigma_n(t_k)}{1 + \delta_n L^E_n(t_k)} \right] h_k + \sigma_i(t_k)\, \Delta W(t_k) \sqrt{h_k} \right). \tag{2.23} $$

Here $h_k = t_{k+1} - t_k$ and the $\Delta W(t_k)$ are independent vectors of standard normal random variables. A second scheme which is used in the thesis is the predictor-corrector scheme (PC scheme). This scheme uses the Euler approximation to get an improved approximation of the contribution of the drift term. Define the drift function $\mu_i$ as:

$$ \mu_i(x, \sigma) = -\sum_{j=1}^{m} \sigma_{ij} \sum_{n=i+1}^{m} \frac{\delta_n x_n \sigma_{nj}}{1 + \delta_n x_n} \tag{2.24} $$

for every vector $x \in \mathbb{R}^m$ and matrix $\sigma \in \mathbb{R}^{m \times m}$. Then equation (2.23) can be written as:

$$ L^E_i(t_{k+1}) = L^E_i(t_k) \exp\left( \left[ -\frac{1}{2}\|\sigma_i(t_k)\|^2 + \mu_i(L^E(t_k), \sigma(t_k)) \right] h_k + \sigma_i(t_k)\, \Delta W(t_k) \sqrt{h_k} \right). \tag{2.25} $$

Now we consider the PC scheme. Suppose that we have already simulated until $t_k$ and that the forward Libor rates are given by $L^{PC}_i(t_k)$. The predictor-corrector scheme consists of two steps for each time step. First we calculate a predictor of the forward Libor rates at $t_{k+1}$. This predictor is obtained by applying the Euler scheme. Given $L^{PC}_i(t_k)$ at $t_k$, the predictor becomes:

$$ L^E_i(t_{k+1}) = L^{PC}_i(t_k) \exp\left( \left[ -\frac{1}{2}\|\sigma_i(t_k)\|^2 + \mu_i(L^{PC}(t_k), \sigma(t_k)) \right] h_k + \sigma_i(t_k)\, \Delta W(t_k) \sqrt{h_k} \right). \tag{2.26} $$

This predictor is used to correct the drift term. The final estimator for the Libor rate $L_i(t_{k+1})$ is denoted by $L^{PC}_i(t_{k+1})$ and obtained by:

$$ L^{PC}_i(t_{k+1}) = L^{PC}_i(t_k) \exp\left( \left[ -\frac{1}{2}\|\sigma_i(t_k)\|^2 + \frac{1}{2}\left( \mu_i(L^{PC}(t_k), \sigma(t_k)) + \mu_i(L^E(t_{k+1}), \sigma(t_k)) \right) \right] h_k + \sigma_i(t_k)\, \Delta W(t_k) \sqrt{h_k} \right). \tag{2.27} $$

2.5 Pricing using Monte Carlo methods

2.5.1 European products

The price of a European option can be written as an expectation under the terminal measure if $T$ equals one of the tenor points. The numeraire corresponding to this measure is $P_{T_m}(t)$. It is important to note that, at every tenor point, this numeraire can be written as a function of forward Libor rates:

Definition 13. The numeraire of the terminal measure is given by:

$$ P_{T_m}(T_i) = \frac{1}{\prod_{k=i+1}^{m} (1 + \delta_k L_k(T_i))}. \tag{2.28} $$

Derivation of price under T_m measure

Suppose that we have a European option which pays $g(L(T_0))$ at $T_1$. The option pays at time $T_1$, but the payoff is already known at $T_0$. We denote the price of the option at $T_0$ by $V(T_0, L(T_0))$. Under the $T_m$ forward measure we have:

$$ V(0, L(0)) = P_{T_m}(0)\, E^m\left[ \frac{V(T_0, L(T_0))}{P_{T_m}(T_0)} \,\Big|\, \mathcal{F}(0) \right]. \tag{2.29} $$

The pricing formula for $V(T_0, L(T_0))$ under the $T_1$ measure is known and is given by:

$$ V(T_0, L(T_0)) = P_{T_1}(T_0)\, E^1\left[ \frac{g(L(T_0))}{P_{T_1}(T_1)} \,\Big|\, \mathcal{F}(T_0) \right] = P_{T_1}(T_0)\, g(L(T_0)). \tag{2.30} $$

The right-hand side of equation (2.30) follows from the fact that $P_{T_1}(T_1) = 1$ and that $g(L(T_0))$ is known at $T_0$. Substituting (2.30) into equation (2.29) gives:

$$ V(0, L(0)) = P_{T_m}(0)\, E^m\left[ \frac{P_{T_1}(T_0)}{P_{T_m}(T_0)}\, g(L(T_0)) \,\Big|\, \mathcal{F}(0) \right]. \tag{2.31} $$

Now we have the pricing formula under the $T_m$ measure for an option whose payoff is fixed at $T_0 < T_m$. The quotient inside the expectation on the right-hand side of equation (2.31) is in fact a function of the forward Libor rates $L_2(T_0), \ldots, L_m(T_0)$:

$$ \frac{P_{T_1}(T_0)}{P_{T_m}(T_0)} = \prod_{i=2}^{m} \frac{P_{T_{i-1}}(T_0)}{P_{T_i}(T_0)} = \prod_{i=2}^{m} \left( 1 + (T_i - T_{i-1}) L_i(T_0) \right). $$

However, to keep the notation short we keep the notation on the left-hand side. The expectation in equation (2.31) can be estimated by a mean over a sample of size $M$. Suppose $\hat{L}(T_i)$ is the vector of (forward) Libor rates simulated under the terminal measure (with the help of the Euler or PC scheme). The option price is then estimated by averaging the discounted payoff over the $M$ simulated paths.
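The schemes (2.25) to (2.27) and the pricing formula (2.31) can be combined in a short sketch for the caplet of Definition 5. This is an illustrative sketch, not the C# implementation of the thesis. The function names are hypothetical, the volatility matrix is assumed constant on $[0, T_0]$, the drift carries the minus sign so that it enters the exponent as $-\frac{1}{2}\|\sigma_i\|^2 + \mu_i$, and `p_t0` stands for the assumed input $P_{T_0}(0)$.

```python
import math
import random


def drift(i, libors, deltas, sigma):
    """Drift function mu_i(x, sigma) of (2.24) for the 0-based rate index i."""
    m = len(libors)
    total = 0.0
    for n in range(i + 1, m):
        weight = deltas[n] * libors[n] / (1.0 + deltas[n] * libors[n])
        total += weight * sum(sigma[i][j] * sigma[n][j] for j in range(m))
    return -total


def exponent(i, mu, sigma, h, z):
    """Exponent of (2.25)-(2.27): (-|sigma_i|^2 / 2 + mu) h + sigma_i z sqrt(h)."""
    norm_sq = sum(s * s for s in sigma[i])
    noise = sum(s * zj for s, zj in zip(sigma[i], z))
    return (-0.5 * norm_sq + mu) * h + noise * math.sqrt(h)


def euler_step(libors, deltas, sigma, h, z):
    """One log-Euler step (2.25) for all rates, driven by the normal vector z."""
    return [
        x * math.exp(exponent(i, drift(i, libors, deltas, sigma), sigma, h, z))
        for i, x in enumerate(libors)
    ]


def pc_step(libors, deltas, sigma, h, z):
    """One predictor-corrector step: Euler predictor (2.26), averaged drift (2.27)."""
    predictor = euler_step(libors, deltas, sigma, h, z)
    result = []
    for i, x in enumerate(libors):
        mu = 0.5 * (drift(i, libors, deltas, sigma) + drift(i, predictor, deltas, sigma))
        result.append(x * math.exp(exponent(i, mu, sigma, h, z)))
    return result


def mc_caplet_price(l0, deltas, sigma, t0, strike, notional, p_t0,
                    n_steps=4, n_paths=10000, step=pc_step, seed=1):
    """Monte Carlo estimate of (2.31) for the caplet payoff (2.5)."""
    rng = random.Random(seed)
    m = len(l0)
    h = t0 / n_steps
    # P_{T_m}(0) = P_{T_0}(0) / prod_k (1 + delta_k L_k(0)), using (2.14).
    p_tm_0 = p_t0
    for d, x in zip(deltas, l0):
        p_tm_0 /= 1.0 + d * x
    total = 0.0
    for _ in range(n_paths):
        libors = list(l0)
        for _ in range(n_steps):
            z = [rng.gauss(0.0, 1.0) for _ in range(m)]
            libors = step(libors, deltas, sigma, h, z)
        # P_{T_1}(T_0) / P_{T_m}(T_0) = prod_{k=2}^m (1 + delta_k L_k(T_0)).
        ratio = 1.0
        for d, x in zip(deltas[1:], libors[1:]):
            ratio *= 1.0 + d * x
        total += ratio * notional * deltas[0] * max(libors[0] - strike, 0.0)
    return p_tm_0 * total / n_paths
```

Passing `step=euler_step` switches to the log-Euler scheme. The last rate $L_m$ has zero drift under the terminal measure, so both schemes coincide for it, and with zero volatility the estimate reduces to the discounted intrinsic value $P_{T_1}(0)\, N \delta_1 \max(L_1(0) - K, 0)$.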

21 CHAPTER 2. LIBOR MARKET MODEL, PRODUCTS AND GREEKS 17 ( ) ˆPTi (T i 1 ) ˆV (0, L(0)) = P Tm (0)Mean M ˆP Tm (T i 1 ) g(ˆl(t i 1 )), (2.32) with Mean M (.) the average over M realizations. We use a short-hand notation to indicate a payoff discounted until T m : Bermudan products g(l(t i 1 )) = P T i (T i 1 ) P Tm (T i 1 ) g(l(t i 1)). (2.33) The pricing of Bermudan options is somewhat more involved. Consider for example the cancellable option of section In this case one has first to determine optimal exercise regions at every exercise time so that the optimal exercise time is given by τ: τ = inf(t i : L(T i ) in the exercise region at T i ), (2.34) with the time point τ so that the expected payoff is as high as possible. Given the forward Libor rates at time T i we should know whether it is optimal to exercise at this time point or not. This exercise region can be calculated by the Longstaff-Schwartz algorithm. The pricing formula under the terminal measure becomes: [ τ ] V (0, L(0)) = P Tm (0)E m g i (l(t i 1, T i 1, T i )) F(0). (2.35) i=1 Longstaff-Schwartz algorithm The Longstaff-Schwartz algorithm is described in general in [10]. Here we will describe the algorithm for the payoffs in section The algorithm is a backward algorithm. First we simulate M paths of forward Libor rates. Then suppose that we are at the last decision point, T m 2. There we have two possibilities: 1. We cancel the contract and we will not get the payment at T m. 2. We don t cancel the contract and we will get the payment at T m. The payoff at T m 1 depends on the decision at T m 3 and is independent of the decision at T m 2. Therefore we want to compare the payoff at T m obtained by decision one and by decision two, we introduce two values: Definition 14. Exercise value at T m 2. The exercise value at time T m 2 is the payoff at time T m, discounted to T m 2 given that the contract is exercised at time T m 2. For cancellable contracts, this exercise value is zero. Definition 15. 
Hold value at $T_{m-2}$. The hold value $H(L(T_{m-2}), T_{m-2})$ at time $T_{m-2}$ is the payoff at time $T_m$, discounted to $T_{m-2}$, given that the contract is not exercised at time $T_{m-2}$. Since $T_{m-2}$ is the last decision time, this value can be priced like a European contract at time $T_{m-2}$ with payoff function $g_m(L(T_{m-1}))$. Recall that all Libors are settled at $T_{m-1}$, but that this last payoff is paid at time $T_m$.

The exercise value (decision one) is 0; the hold value (decision two) can be positive or negative. If the hold value is positive, then we should not cancel the contract. The idea is to estimate the function $H(L(T_{m-2}), T_{m-2})$ as a linear combination of basis functions, which are functions of some explanatory variables that can be calculated from the forward Libor rates $L(T_{m-2})$. For the other decision points we have to make a similar decision. Consider the possible decisions at time $T_{m-3}$:

1. We cancel the contract and we will not get the payments at $T_{m-1}$ and $T_m$.
2. We do not cancel the contract; we will get the payment at $T_{m-1}$ and keep the possibility to cancel the contract at $T_{m-2}$.

For this decision we want to estimate the hold value $H(L(T_{m-3}), T_{m-3})$. Therefore we want to be able to estimate the hold value $H(L(T_i), T_i)$ for every $0 \le i \le m-2$.

Definition 16. Hold value $H(L(T_i), T_i)$ and the definition of $\gamma(T_i)$.

$$H(L(T_i), T_i) = P_{T_m}(T_i)\, E^m\!\left[\tilde g_{i+2}(L(T_{i+1})) + \frac{1}{P_{T_m}(T_{i+1})} \max\bigl(H(L(T_{i+1}), T_{i+1}), 0\bigr) \,\middle|\, \mathcal{F}(T_i)\right].$$

We denote the estimated hold value by $\hat H(L(T_i), T_i)$. Later in this section we need the random variable inside this expectation. Therefore we define:

$$\gamma(T_i) = P_{T_m}(T_i) \left(\tilde g_{i+2}(L(T_{i+1})) + \frac{1}{P_{T_m}(T_{i+1})} \max\bigl(H(L(T_{i+1}), T_{i+1}), 0\bigr)\right).$$

Before we explain how these functions can be approximated, we define the explanatory variables and basis functions.

Definition 17. Explanatory variables and basis functions. Suppose we are at time step $T_i$. We define three explanatory variables.

The first payoff, discounted until $T_m$, which is received by holding the option but would not be received if the contract was cancelled at $T_i$:

$$B_1(L(T_i)) = \frac{P_{T_{i+2}}(T_i)}{P_{T_m}(T_i)}\, g_{i+2}(L(T_i)). \qquad (2.36)$$

The sum of payoffs, discounted until $T_m$, that might be received by holding the contract (this depends on future decisions):

$$B_2(L(T_i)) = \sum_{j=i+3}^{m-1} \frac{P_{T_j}(T_i)}{P_{T_m}(T_i)}\, g_j(L(T_i)). \qquad (2.37)$$

The discount factor from $T_m$ to $T_{i+2}$:

$$B_3(L(T_i)) = \frac{P_{T_m}(T_i)}{P_{T_{i+2}}(T_i)}. \qquad (2.38)$$

Because $B_3(L(T_{m-2})) = 1$, this type of explanatory variable is not used for the last decision point. Note that $B_2(L(T_i)) = 0$ for $i = m-2$ and $i = m-3$; hence the explanatory variable of this type is not used for the last two decision points. The basis functions we consider are constant, linear and quadratic without mixed terms:

$$\phi_1 = 1, \quad \phi_2(B_1(L(T_i))) = B_1(L(T_i)), \quad \phi_3(B_1(L(T_i))) = \bigl(B_1(L(T_i))\bigr)^2.$$
At time points $T_i < T_{m-2}$, where $B_3$ differs from 1:

$$\phi_4(B_3(L(T_i))) = B_3(L(T_i)), \quad \phi_5(B_3(L(T_i))) = \bigl(B_3(L(T_i))\bigr)^2.$$

At time points $T_i < T_{m-3}$, where $B_2$ is nonzero:

$$\phi_6(B_2(L(T_i))) = B_2(L(T_i)), \quad \phi_7(B_2(L(T_i))) = \bigl(B_2(L(T_i))\bigr)^2.$$
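The selection of basis functions per decision point can be sketched in a few lines of Python. This is only an illustration under assumptions: the explanatory variables $B_1, B_2, B_3$ are assumed to be precomputed per path, and the function name `design_matrix` and its arguments are hypothetical. The sketch follows the rule that $B_3$ is dropped only at the last decision point and $B_2$ at the last two, which yields 3, 5 or 7 columns.

```python
import numpy as np


def design_matrix(b1, b2, b3, i, m):
    """Return the regression matrix for decision point T_i (hypothetical helper).

    b1, b2, b3: arrays of length M holding the explanatory variables per path.
    i, m: index of the decision point and index of the terminal date T_m.
    """
    b1, b2, b3 = (np.asarray(b, dtype=float) for b in (b1, b2, b3))
    # Constant, linear and quadratic term in B_1 are always used.
    columns = [np.ones_like(b1), b1, b1**2]
    if i < m - 2:
        # B_3 equals 1 at T_{m-2}, so it only carries information earlier.
        columns += [b3, b3**2]
    if i < m - 3:
        # B_2 vanishes at T_{m-2} and T_{m-3}.
        columns += [b2, b2**2]
    return np.column_stack(columns)
```

Each row of the returned matrix corresponds to one simulated path, so it can be passed directly to a least-squares solver.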

Because the number of explanatory variables changes in time, we define an integer-valued function $\alpha(i)$ which indicates the number of basis functions we have to use:

$$\alpha(i) = \begin{cases} 3 & \text{if } i = m-2, \\ 5 & \text{if } i = m-3, \\ 7 & \text{if } i < m-3. \end{cases}$$

The linear combination which is used to estimate the hold value is given by:

$$H(L, T_i) \approx \sum_{j=1}^{\alpha(i)} c_j(T_i)\, \phi_j\bigl(B(L(T_i), T_i)\bigr). \qquad (2.39)$$

The parameters $c_j(T_i)$, for $1 \le j \le \alpha(i)$ and $0 \le i \le m-2$, have to be calculated with the help of Algorithm 1.

1. Simulate $M$ paths of Libor rates. Denote the vector of forward Libor rates from the $j$-th path at time point $T_i$ by $L(T_i)^{(j)}$. The zero coupon bonds and $\gamma(T_i)$ corresponding to the $j$-th path are denoted by $P_{T_k}(T_i)^{(j)}$ and $\gamma^{(j)}(T_i)$.
for $i = m-2$ down to $0$ do
    for $j = 1$ to $M$ do
        (See Definition 16 for the definition of $\gamma(T_i)$.)
        if $i = m-2$ then
            2. $\gamma^{(j)}(T_i) = P_{T_m}(T_i)^{(j)}\, \tilde g_{i+2}(L(T_{i+1})^{(j)})$
        else
            3. $\gamma^{(j)}(T_i) = P_{T_m}(T_i)^{(j)} \left(\tilde g_{i+2}(L(T_{i+1})^{(j)}) + \frac{1}{P_{T_m}(T_{i+1})^{(j)}} \max\bigl(\hat H(L(T_{i+1})^{(j)}, T_{i+1}), 0\bigr)\right)$
        end
    end
    We want $\hat H(L(T_i), T_i)$ to approximate $E^m[\gamma(T_i) \mid \mathcal{F}(T_i)]$ as closely as possible by choosing the coefficients $c_k(T_i)$ for $1 \le k \le \alpha(i)$. This can be done with linear regression:
    4. Construct a matrix $A(T_i)$ with elements $a_{jk}(T_i) = \phi_k(L(T_i)^{(j)})$.
    5. Calculate the vector of coefficients using regression: $c(T_i) = \bigl(A(T_i)^\top A(T_i)\bigr)^{-1} A(T_i)^\top \gamma(T_i)$.
    6. Evaluate the hold value: $\hat H(L(T_i)^{(j)}, T_i) = \sum_{k=1}^{\alpha(i)} c_k(T_i)\, \phi_k(L(T_i)^{(j)})$.
end
7. The coefficients $c(T_{m-2}), c(T_{m-3}), \ldots, c(T_0)$ are estimated and can be used to approximate the hold values.
Algorithm 1: Longstaff-Schwartz algorithm.

To estimate the Greeks, we choose our paths independent of the paths that were used for the Longstaff-Schwartz pre-simulation. In this way we eliminate the foresight bias described by C.P. Fries in [4].
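The regression step of Algorithm 1 can be sketched as follows. This is a minimal illustration, not the implementation used for the results in this thesis: the function names are hypothetical, the data are synthetic, and `numpy.linalg.lstsq` is used instead of forming the inverse of $A^\top A$ explicitly, since it solves the same least-squares problem in a numerically more stable way.

```python
import numpy as np


def fit_hold_value(A, gamma):
    """Solve min_c ||A c - gamma||^2, i.e. the regression for c(T_i)."""
    c, *_ = np.linalg.lstsq(A, gamma, rcond=None)
    return c


def estimated_hold_value(A, c):
    """Evaluate the fitted linear combination of basis functions per path."""
    return A @ c


# Toy check with a known quadratic relation (illustrative data only).
rng = np.random.default_rng(0)
x = rng.standard_normal(1000)                    # stand-in for B_1 per path
A = np.column_stack([np.ones_like(x), x, x**2])  # phi_1, phi_2, phi_3
gamma = 1.0 + 2.0 * x + 3.0 * x**2               # stand-in for gamma(T_i)
c = fit_hold_value(A, gamma)

# Paths on which the contract would be held (estimated hold value > 0).
continue_mask = estimated_hold_value(A, c) > 0.0
```

In the toy data the regression recovers the coefficients of the quadratic, and the hold value is positive on every path because the quadratic has no real roots.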
If we have bought a cancellable contract at time 0, then the optimal exercise date is estimated by Algorithm 2.

1. Estimate all coefficients $c(T_i)$ backwards as described in Algorithm 1. Once this is done we can evaluate the hold values $H(L(T_i), T_i)$ for every path and at any tenor point.
2. Set $i = 0$.
3. Simulate the forward Libors until $T_0$.
while $H(L(T_i), T_i) > 0$ do
    4. $i = i + 1$.
    5. Simulate the forward Libors until $T_i$.
end
6. The contract is cancelled at $T_i$; now the exercise time is known and the total payoff can be calculated from the path.
Algorithm 2: Optimal exercise decision.

Control variates for cancellable options

Because of the random exercise time and multiple payoff dates of the cancellable contracts, we expect a larger variance of price and Greek estimators than for the European options. For this reason, we use a control variate to reduce the variance of:

$$\sum_{i=1}^{\tau} \tilde g_i(L(T_{i-1})). \qquad (2.40)$$

The idea is to first estimate the expectation $\bar\tau$ of the exercise time $\tau$. We denote this estimate of the expected exercise time by $\hat\tau$. The control variate which we want to use is

$$\sum_{i=1}^{\hat\tau} \tilde g_i(L(T_{i-1})). \qquad (2.41)$$

Note that, for the payoffs in section 2.3.2, the following expectation can be calculated analytically:

$$P_{T_m}(0)\, E^m\!\left[\sum_{i=1}^{\hat\tau} \tilde g_i(L(T_{i-1})) \,\middle|\, L(0)\right] = P_{T_m}(0) \sum_{i=1}^{\hat\tau} E^m\!\left[\tilde g_i(L(T_{i-1})) \,\middle|\, L(0)\right]. \qquad (2.42)$$

Every term of the sum in (2.42) is given by a price of a European option. With this control variate, the estimator for the option price of the cancellable contract is given by:

$$V(0, L(0)) = P_{T_m}(0)\, E^m\!\left[\sum_{i=1}^{\tau} \tilde g_i(L(T_{i-1})) - \sum_{i=1}^{\hat\tau} \tilde g_i(L(T_{i-1})) \,\middle|\, L(0)\right] + \sum_{i=1}^{\hat\tau} V^i_{eur}(0, L(0)), \qquad (2.43)$$

where $V^i_{eur}(0, L(0))$ is the price of the European option with maturity $T_{i-1}$. The payments $\tilde g_1, \ldots, \tilde g_{\min(\tau,\hat\tau)}$ cancel out and the estimator can be written as:

$$V(0, L(0)) = P_{T_m}(0)\, E^m\!\left[(1_{\tau>\hat\tau} - 1_{\tau<\hat\tau}) \sum_{i=\min(\tau,\hat\tau)+1}^{\max(\tau,\hat\tau)} \tilde g_i(L(T_{i-1})) \,\middle|\, L(0)\right] + \sum_{i=1}^{\hat\tau} V^i_{eur}(0, L(0)). \qquad (2.44)$$

2.6 Greeks

The main goal of this thesis is the accurate computation of the so-called Greeks. We will consider the delta, vega and gamma.
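The delta, vega and gamma of an option with option price $V(0, L(0), \sigma)$ are given by, respectively:

$$\frac{\partial}{\partial L_j(0)} V(0, L(0)), \qquad \frac{\partial}{\partial \sigma_{ij}} V(0, L(0)), \qquad \frac{\partial^2}{\partial L_i(0)\, \partial L_j(0)} V(0, L(0)). \qquad (2.45)$$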

We would like to calculate the Greeks for all $i, j \in \{1, 2, \ldots, m\}$. We use the following notations. The vector of deltas:

$$\frac{\partial}{\partial L(0)} V(0, L(0)) = \left[\frac{\partial}{\partial L_1(0)} V(0, L(0)), \ldots, \frac{\partial}{\partial L_m(0)} V(0, L(0))\right],$$

the matrix of gammas:

$$\frac{\partial^2}{\partial L(0)^2} V(0, L(0)) = \begin{pmatrix} \frac{\partial^2}{\partial L_1(0)^2} V(0, L(0)) & \cdots & \frac{\partial^2}{\partial L_1(0)\, \partial L_m(0)} V(0, L(0)) \\ \vdots & \ddots & \vdots \\ \frac{\partial^2}{\partial L_m(0)\, \partial L_1(0)} V(0, L(0)) & \cdots & \frac{\partial^2}{\partial L_m(0)^2} V(0, L(0)) \end{pmatrix},$$

and the matrix of vegas (recall that $\sigma$ is a matrix):

$$\frac{\partial}{\partial \sigma} V(0, L(0)) = \begin{pmatrix} \frac{\partial}{\partial \sigma_{11}} V(0, L(0)) & \cdots & \frac{\partial}{\partial \sigma_{1m}} V(0, L(0)) \\ \vdots & \ddots & \vdots \\ \frac{\partial}{\partial \sigma_{m1}} V(0, L(0)) & \cdots & \frac{\partial}{\partial \sigma_{mm}} V(0, L(0)) \end{pmatrix}.$$

Suppose we want to differentiate with respect to some parameter $\theta$, which can be, for example, the vector of initial forward Libor rates $L(0)$ or a vector of volatility parameters $\sigma$. Since the option price is an expectation of some random variable $Y$, where $Y$ can be written as a function of the Libor rates $L$ at settle time, the quantity of interest is given by:

$$\frac{\partial}{\partial \theta} V(0, L(0)) = \frac{\partial}{\partial \theta} E^m[Y \mid \mathcal{F}_0]. \qquad (2.46)$$

There are three ways to estimate this derivative. The most obvious way is to approximate the derivative of the expectation by the derivative of a sample mean:

$$\frac{\partial}{\partial \theta} V(0, L(0)) \approx \frac{\partial}{\partial \theta} \mathrm{Mean}(Y \mid \mathcal{F}_0). \qquad (2.47)$$

This approximation can be done by the finite difference method, which is also known as the bump method. Another way is to first interchange, if allowed, the differentiation and expectation operators and then use the multidimensional chain rule together with the fact that $Y$ is a function of $L$. The estimator then becomes:

$$\mathrm{Mean}\!\left(\frac{\partial Y}{\partial L} \frac{\partial L}{\partial \theta} \,\middle|\, \mathcal{F}_0\right). \qquad (2.48)$$

This leads to the pathwise estimators. However, interchanging expectation and differentiation is not always allowed; the conditions are described in the next subsection. Discontinuous payoffs, for example, may give problems. Two methods to overcome this problem are the vibrato Monte Carlo method and the kernel method. The last way to estimate the Greeks is to first write the expectation as an integral and then interchange the differentiation and the integration.
Now the justification of the interchange depends on the probability density function inside the integral. Since this is usually a smooth function, the interchange is typically allowed. Once the differentiation inside the integral is done, the integral can again be written as an expectation, so that we are able to approximate the Greek by a Monte Carlo method. A drawback of this method is that we need the probability density function. In the lognormal Libor market model we can use the multivariate normal distribution of the logarithm of the forward Libor rate estimates obtained with one time step. The bias which is introduced in this way can be reduced by the use of the predictor-corrector scheme. A more detailed description of all these methods is given in Chapter 3. Results of the methods applied to the payoffs described in section 2.3.1, with the model dynamics of section 2.4.3, are presented in the last part of the thesis.

2.6.1 Conditions for interchanging derivative and expectation

In this section we give the conditions which justify the interchange of the derivative and expectation as in equation 3.46. In total there are four conditions, two for the dynamics of the underlying asset and two for the payoff function. We only give these conditions and the steps which are needed to understand the proof for the cancellable options; the original derivation can be found in [7]. The conditions are constructed under a basic setting which can be applied to some relatively simple products. We consider a discounted payoff function $\tilde g$ which is a function of a finite number of underlying assets $X$ at a finite number of dates. We assume that the evolution of the underlying assets depends on a parameter $\theta \in \Theta$, where $\Theta$ is an interval of the real line. Thus:

$$\tilde g(X(\theta)) = \tilde g(X_1(\theta), \ldots, X_m(\theta)).$$

Now we give conditions from [7] such that $\frac{\partial}{\partial\theta} \tilde g(X(\theta))$ exists for all $\theta \in \Theta$ with probability one.

Conditions for the dynamics of the underlying asset

1. At each $\theta \in \Theta$, $\frac{\partial X_i(\theta)}{\partial \theta}$ exists with probability 1.
2. There exist random variables $k_i$ with $E[k_i] < \infty$ such that:
$$|X_i(\theta_2) - X_i(\theta_1)| \le k_i\, |\theta_2 - \theta_1| \quad \forall i \in \{1, \ldots, m\} \text{ and } \forall \theta_1, \theta_2 \in \Theta.$$

Conditions for the payoff function

3. For all $\theta \in \Theta$, the discounted payoff $\tilde g$ is differentiable with respect to $X(\theta)$ with probability 1.
4. There exists a constant $c_g$ such that:
$$|\tilde g(x) - \tilde g(y)| < c_g\, \|x - y\| \quad \forall x, y \in \mathbb{R}^m.$$

By conditions 1 and 3, $\frac{\partial}{\partial\theta} \tilde g(X(\theta))$ exists with probability 1. By combining conditions 2 and 4, Glasserman shows that:

$$|\tilde g(X(\theta_2)) - \tilde g(X(\theta_1))| \le k_Y\, |\theta_2 - \theta_1| \quad \text{with} \quad k_Y = c_g \sum_{i=1}^{m} k_i. \qquad (2.50)$$

Choosing $\theta_1 = \theta$ and $\theta_2 = \theta + h$ and dividing by $|h|$ gives:

$$\left|\frac{\tilde g(X(\theta + h)) - \tilde g(X(\theta))}{h}\right| \le k_Y. \qquad (2.51)$$

Then we have, by the dominated convergence theorem:

$$\frac{\partial}{\partial\theta} E^m[\tilde g(X(\theta))] = \lim_{h \to 0} E^m\!\left[\frac{\tilde g(X(\theta + h)) - \tilde g(X(\theta))}{h}\right] = E^m\!\left[\lim_{h \to 0} \frac{\tilde g(X(\theta + h)) - \tilde g(X(\theta))}{h}\right] = E^m\!\left[\frac{\partial}{\partial\theta} \tilde g(X(\theta))\right].$$

Conditions in practice

As stated by Glasserman in [7], the main obstacle is the fourth condition.
The other conditions are typically satisfied for practical problems. Therefore we do not check these conditions for the lognormal Libor market model.
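The role of the fourth condition can be illustrated with a small experiment. The sketch below is an illustration under simplifying assumptions that are not part of the model above: a single forward rate evolved in one lognormal step with total volatility `v`, a unit accrual factor and no discounting. All function names are hypothetical. For the Lipschitz-continuous caplet payoff the pathwise derivative averages to the analytic delta, whereas for the discontinuous digital payoff the pathwise derivative is zero on almost every path, so the estimator returns zero although the true delta is strictly positive.

```python
import math
import numpy as np


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def pathwise_deltas(l0, strike, v, n_paths, seed=0):
    """Pathwise delta estimates for a caplet and a digital payoff."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_paths)
    growth = np.exp(-0.5 * v**2 + v * z)   # dL(T_0)/dL(0) along each path
    lt = l0 * growth
    # d/dL of (L - K)^+ is the indicator 1_{L > K}.
    caplet = np.mean((lt > strike) * growth)
    # d/dL of 1_{L > K} is zero almost surely.
    digital = np.mean(np.zeros_like(z) * growth)
    return caplet, digital


def exact_deltas(l0, strike, v):
    """Analytic deltas of the undiscounted caplet and digital payoffs."""
    d_minus = (math.log(l0 / strike) - 0.5 * v**2) / v
    d_plus = d_minus + v
    density = math.exp(-0.5 * d_minus**2) / math.sqrt(2.0 * math.pi)
    return norm_cdf(d_plus), density / (l0 * v)
```

Comparing `pathwise_deltas(0.05, 0.05, 0.2, 200_000)` with `exact_deltas(0.05, 0.05, 0.2)` shows agreement for the caplet only.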

Cancellable options

One of the payoff types we consider in this thesis is the cancellable option, where:

$$\tilde g^{CL}(L(T_0), L(T_1), \ldots, L(T_{m-1})) = \sum_{i=1}^{\tau} \tilde g_i(L(T_{i-1})). \qquad (2.52)$$

So far it is not clear whether we can interchange differentiation and expectation for this type of payoff. At first glance, the decision boundary appears to cause problems. We cannot easily say that the fourth condition is satisfied, since a small difference in an input parameter can give two completely different exercise points (and hence introduce a relatively large difference in the total payoff). In this subsection we prove that the interchange between differentiation and expectation is allowed if the conditions from the beginning of this section hold for every $\tilde g_i$ and if the option value is Lipschitz continuous in $\theta$. The proof is based on [11], page 1034, where Piterbarg proves the result for callable instead of cancellable contracts. The main idea is that the indicator functions, which come from the exercise decision, can be written as max functions if we consider the option value as an expectation of the sum of an option value and a payoff at the next tenor date. Denote the option price of the cancellable option at time $T_i$ by $V(T_i, L(T_i))$. The first payoff depending on forward Libor rates after $T_i$ is $\tilde g_{i+2}(L(T_{i+1}))$, paid at $T_{i+2}$. We have:

$$V(T_i, L(T_i)) = P_{T_m}(T_i)\, E^m\!\left[\sum_{k=i+2}^{\tau} \tilde g_k(L(T_{k-1})) \,\middle|\, \mathcal{F}(T_i)\right]. \qquad (2.53)$$

The option price at time $T_i$ is equal to the sum of two terms. The first term comes from the hold value of the option and should only be added if the option is not cancelled; therefore an indicator function is used. The second term comes from the payoff which the holder of the contract gets at $T_{i+2}$:

$$V(T_i, L(T_i)) = P_{T_m}(T_i)\, E^m\!\left[\frac{1}{P_{T_m}(T_{i+1})} V(T_{i+1}, L(T_{i+1}))\, 1_{V(T_{i+1}, L(T_{i+1})) > 0} + \tilde g_{i+2}(L(T_{i+1})) \,\middle|\, \mathcal{F}(T_i)\right].$$
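Taking the derivative gives:

$$\frac{\partial}{\partial\theta} V(T_i, L(T_i)) = \frac{\partial}{\partial\theta} \left(P_{T_m}(T_i)\, E^m\!\left[\frac{1}{P_{T_m}(T_{i+1})} V(T_{i+1}, L(T_{i+1}))\, 1_{V(T_{i+1}, L(T_{i+1})) > 0} + \tilde g_{i+2}(L(T_{i+1})) \,\middle|\, \mathcal{F}(T_i)\right]\right).$$

Note that we can write $V(T_{i+1}, L(T_{i+1}))\, 1_{V(T_{i+1}, L(T_{i+1})) > 0}$ as $\max(V(T_{i+1}, L(T_{i+1})), 0)$. Because of the linearity of differentiation:

$$\frac{\partial}{\partial\theta} V(T_i, L(T_i)) = \frac{\partial}{\partial\theta} \left(P_{T_m}(T_i)\, E^m\!\left[\frac{1}{P_{T_m}(T_{i+1})} \max(V(T_{i+1}, L(T_{i+1})), 0) \,\middle|\, \mathcal{F}(T_i)\right]\right) + \frac{\partial}{\partial\theta} \left(P_{T_m}(T_i)\, E^m\!\left[\tilde g_{i+2}(L(T_{i+1})) \,\middle|\, \mathcal{F}(T_i)\right]\right).$$

For the second term we can typically interchange expectation and differentiation if the payoff functions are continuous and almost surely differentiable. For the first term, note that $\max(x, y)$ is also continuous in $x$ and that there is only one point where it is not differentiable. Now use that the option price $V(T_{i+1}, L(T_{i+1}))$ is Lipschitz continuous in $\theta$ to justify the interchange of differentiation and expectation in the first term, to get:

$$\frac{\partial}{\partial\theta} V(T_i, L(T_i)) = E^m\!\left[1_{V(T_{i+1}, L(T_{i+1})) > 0}\, \frac{\partial}{\partial\theta} \left(\frac{P_{T_m}(T_i)}{P_{T_m}(T_{i+1})} V(T_{i+1}, L(T_{i+1}))\right) \,\middle|\, \mathcal{F}(T_i)\right] + E^m\!\left[\frac{\partial}{\partial\theta} \bigl(P_{T_m}(T_i)\, \tilde g_{i+2}(L(T_{i+1}))\bigr) \,\middle|\, \mathcal{F}(T_i)\right]. \qquad (2.54)$$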

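We can substitute

$$\frac{\partial}{\partial\theta} \left(\frac{P_{T_m}(T_i)}{P_{T_m}(T_{i+1})} V(T_{i+1}, L(T_{i+1}))\right) = E^m\!\left[1_{V(T_{i+2}, L(T_{i+2})) > 0}\, \frac{\partial}{\partial\theta} \left(\frac{P_{T_m}(T_i)}{P_{T_m}(T_{i+2})} V(T_{i+2}, L(T_{i+2}))\right) \,\middle|\, \mathcal{F}(T_{i+1})\right] + E^m\!\left[\frac{\partial}{\partial\theta} \bigl(P_{T_m}(T_i)\, \tilde g_{i+3}(L(T_{i+2}))\bigr) \,\middle|\, \mathcal{F}(T_{i+1})\right]$$

into equation (2.54) and use the tower property to get:

$$\frac{\partial}{\partial\theta} V(T_i, L(T_i)) = E^m\!\left[\prod_{j=i+2}^{i+3} 1_{V(T_{j-1}, L(T_{j-1})) > 0}\, \frac{\partial}{\partial\theta} \left(\frac{P_{T_m}(T_i)}{P_{T_m}(T_{i+2})} V(T_{i+2}, L(T_{i+2}))\right) + \sum_{k=i+1}^{i+2} \prod_{j=i+2}^{k} 1_{V(T_{j-1}, L(T_{j-1})) > 0}\, \frac{\partial}{\partial\theta} \bigl(P_{T_m}(T_i)\, \tilde g_{k+1}(L(T_k))\bigr) \,\middle|\, \mathcal{F}(T_i)\right].$$

Starting this recursion with $i = -1$ (using $T_{-1} = 0$) and using $V(T_{m-1}, L(T_{m-1})) = 0$ at the end gives:

$$\frac{\partial}{\partial\theta} V(0, L(0)) = E^m\!\left[\sum_{k=1}^{m} \prod_{j=1}^{k-1} 1_{V(T_{j-1}, L(T_{j-1})) > 0}\, \frac{\partial}{\partial\theta} \bigl(P_{T_m}(0)\, \tilde g_k(L(T_{k-1}))\bigr) \,\middle|\, \mathcal{F}(0)\right] = E^m\!\left[\sum_{k=1}^{\tau} \frac{\partial}{\partial\theta} \bigl(P_{T_m}(0)\, \tilde g_k(L(T_{k-1}))\bigr) \,\middle|\, \mathcal{F}(0)\right]. \qquad (2.55)$$

2.7 Analytic pricing formulas

The price and Greeks of the options considered earlier can be calculated analytically. We show this for both the caplet and the digital option. For both options, we first need the dynamics of $L_1(t)$ under the $T_1$ measure:

$$dL_1(t) = L_1(t) \bigl(\sigma_{11}(t)\, dW_1(t) + \sigma_{12}(t)\, dW_2(t) + \ldots + \sigma_{1m}(t)\, dW_m(t)\bigr) \quad \text{for } 0 < t < T_0,$$

where $\sigma(t)$ is constant over $[0, T_0)$. These dynamics are equal in distribution to the dynamics given by:

$$dL_1(t) = L_1(t)\, \frac{v}{\sqrt{T_0}}\, d\tilde W(t), \quad \text{with } v = \sqrt{T_0 \sum_{i=1}^{m} \sigma_{1i}(0)^2}$$

and $\tilde W(t)$ a standard scalar Brownian motion. The solution of this stochastic differential equation is given by:

$$L_1(T_0) = L_1(0) \exp\!\left(-\tfrac{1}{2} v^2 + \frac{v}{\sqrt{T_0}} \tilde W(T_0)\right).$$

To obtain the option value we need to calculate:

$$P_{T_1}(0)\, E^{T_1}\!\left[g(L_1(T_0)) \mid \mathcal{F}(0)\right] = P_{T_1}(0) \int_{-\infty}^{\infty} g\bigl(L_1(0) \exp(-\tfrac{1}{2} v^2 + v y)\bigr) f(y)\, dy, \qquad (2.56)$$

where $f(y)$ is the standard normal density.

Caplet

In the lognormal Libor market model the price of a caplet is derived analytically in [1] (proposition 6.4.1):

$$V_{cap}(0, L(0)) = P_{T_1}(0)\, \delta_1 \bigl(L_1(0) N(d_+) - K N(d_-)\bigr), \qquad (2.57)$$

where $N$ is the standard Gaussian cumulative distribution function. The values $d_\pm$ and $v_1$ are given by:

$$d_\pm = \frac{\log\!\left(\frac{L_1(0)}{K}\right) \pm \frac{1}{2} v_1^2}{v_1}, \qquad v_1 = \sqrt{T_0}\, \|\sigma_1(t)\|.$$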
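Formula (2.57) is straightforward to evaluate numerically. The following Python sketch is only an illustration; the function and argument names are hypothetical, and the total volatility $v_1$ is assumed to be supplied directly by the caller rather than computed from the volatility matrix.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function N(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def caplet_price(p_t1, delta1, l0, strike, v1):
    """Caplet price of equation (2.57).

    p_t1:   zero coupon bond price P_{T_1}(0)
    delta1: accrual factor delta_1
    l0:     initial forward Libor rate L_1(0)
    strike: strike K
    v1:     total volatility v_1 over [0, T_0]
    """
    d_plus = (math.log(l0 / strike) + 0.5 * v1 * v1) / v1
    d_minus = d_plus - v1
    return p_t1 * delta1 * (l0 * norm_cdf(d_plus) - strike * norm_cdf(d_minus))
```

For vanishing volatility the price tends to the discounted intrinsic value $P_{T_1}(0)\,\delta_1 (L_1(0) - K)^+$, which gives a quick sanity check of an implementation.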

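Derivation of caplet price

First substitute the caplet payoff in equation (2.56):

$$P_{T_1}(0)\, E^{T_1}\!\left[\delta_1 (L_1(T_0) - K)^+ \mid \mathcal{F}(0)\right] = P_{T_1}(0) \int_{-\infty}^{\infty} \delta_1 \bigl(L_1(0) \exp(-\tfrac{1}{2} v^2 + v y) - K\bigr)^+ f(y)\, dy.$$

We have:

$$L_1(0) \exp(-\tfrac{1}{2} v^2 + v y) > K \iff y > -\frac{\log\!\left(\frac{L_1(0)}{K}\right) - \frac{1}{2} v^2}{v} = -d_-,$$

which gives:

$$P_{T_1}(0)\, E^{T_1}\!\left[\delta_1 (L_1(T_0) - K)^+ \mid \mathcal{F}(0)\right] = P_{T_1}(0)\, \delta_1 \left(\int_{-d_-}^{\infty} L_1(0) \exp(-\tfrac{1}{2} v^2 + v y) f(y)\, dy - \int_{-d_-}^{\infty} K f(y)\, dy\right). \qquad (2.58)$$

For the second integral in equation (2.58) we have:

$$\int_{-d_-}^{\infty} K f(y)\, dy = K \bigl(1 - N(-d_-)\bigr) = K N(d_-). \qquad (2.59)$$

The first integral in equation (2.58) requires some more work. We rewrite the integrand to get:

$$L_1(0) \exp(-\tfrac{1}{2} v^2 + v y) f(y) = \frac{L_1(0)}{\sqrt{2\pi}}\, e^{-\frac{y^2 - 2 y v + v^2}{2}} = \frac{L_1(0)}{\sqrt{2\pi}}\, e^{-\frac{(y - v)^2}{2}}.$$

Then substitute $\tilde y = y - v$ and use $-d_- - v = -d_+$:

$$\int_{-d_-}^{\infty} L_1(0) \exp(-\tfrac{1}{2} v^2 + v y) f(y)\, dy = L_1(0) \int_{-d_-}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{(y - v)^2}{2}}\, dy = L_1(0) \int_{-d_+}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{\tilde y^2}{2}}\, d\tilde y = L_1(0) \bigl(1 - N(-d_+)\bigr) = L_1(0) N(d_+). \qquad (2.60)$$

Combining equations (2.58), (2.59) and (2.60) gives equation (2.57).

Delta

In this subsection we calculate the sensitivity with respect to the bond $P_{T_1}(0)$. We first substitute the equation $L(0, T_0, T_1) = \frac{1}{\delta_1} \left(\frac{P_{T_0}(0)}{P_{T_1}(0)} - 1\right)$ into equation (2.57):

$$V_{cap}(0, L(0)) = P_{T_1}(0)\, \delta_1 \left(\frac{1}{\delta_1} \left(\frac{P_{T_0}(0)}{P_{T_1}(0)} - 1\right) N(d_+) - K N(d_-)\right).$$

The product and chain rule for differentiation give:

$$\frac{\partial V_{cap}}{\partial P_{T_1}(0)}(0, L(0)) = \delta_1 \left(\frac{1}{\delta_1} \left(\frac{P_{T_0}(0)}{P_{T_1}(0)} - 1\right) N(d_+) - K N(d_-)\right) + P_{T_1}(0) \left[\left(\frac{P_{T_0}(0)}{P_{T_1}(0)} - 1\right) N'(d_+) \frac{\partial d_+}{\partial P_{T_1}(0)} - \frac{P_{T_0}(0)}{(P_{T_1}(0))^2} N(d_+)\right] - P_{T_1}(0)\, \delta_1 K N'(d_-) \frac{\partial d_-}{\partial P_{T_1}(0)},$$

with

$$\frac{\partial d_-}{\partial P_{T_1}(0)} = \frac{\partial d_+}{\partial P_{T_1}(0)} = -\frac{1}{v_1}\, \frac{P_{T_0}(0)}{(P_{T_1}(0))^2}\, \frac{1}{\frac{P_{T_0}(0)}{P_{T_1}(0)} - 1}.$$

Other derivatives are given by: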

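$$\frac{\partial V_{cap}}{\partial P_{T_0}(0)}(0, L(0)) = P_{T_1}(0) \left[\left(\frac{P_{T_0}(0)}{P_{T_1}(0)} - 1\right) N'(d_+) \frac{\partial d_+}{\partial P_{T_0}(0)} + \frac{1}{P_{T_1}(0)} N(d_+)\right] - P_{T_1}(0)\, \delta_1 K N'(d_-) \frac{\partial d_-}{\partial P_{T_0}(0)},$$

with

$$\frac{\partial d_-}{\partial P_{T_0}(0)} = \frac{\partial d_+}{\partial P_{T_0}(0)} = \frac{1}{v_1\, P_{T_1}(0) \left(\frac{P_{T_0}(0)}{P_{T_1}(0)} - 1\right)}. \qquad (2.61)$$

Furthermore we see from (2.57) that all derivatives of the form $\frac{\partial V_{cap}}{\partial P_{T_k}(0)}(0, L(0))$ with $k \ne 1$ and $k \ne 0$ are zero.

We also wish to test the caplet vega, which is the derivative with respect to a volatility parameter. For example, we can calculate the derivative with respect to $\sigma_{1j}$ for an arbitrary integer $j$:

$$\frac{\partial V_{cap}}{\partial \sigma_{1j}}(0, L(0)) = P_{T_1}(0)\, \delta_1 \left[L_1(0) N'(d_+) \frac{\partial d_+}{\partial v_1} - K N'(d_-) \frac{\partial d_-}{\partial v_1}\right] \frac{\partial v_1}{\partial \sigma_{1j}},$$

with

$$\frac{\partial d_\pm}{\partial v_1} = -\frac{\log(L_1(0)/K)}{v_1^2} \pm \frac{1}{2} \quad \text{and} \quad \frac{\partial v_1}{\partial \sigma_{1j}} = \frac{\sigma_{1j} T_0}{v_1}.$$

Digital

The digital option can also be priced analytically. The derivation of the pricing formula is similar to the derivation for the caplet. To obtain the option value we need to calculate:

$$V_{dig}(0, L(0)) = P_{T_1}(0)\, E^{T_1}\!\left[\delta_1 1_{L_1(T_0) > K}\right] = P_{T_1}(0) \int_{-\infty}^{\infty} \delta_1\, 1_{L_1(0) \exp(-\frac{1}{2} v^2 + v y) > K}(y)\, f(y)\, dy,$$

where $f$ is the standard normal density. Since

$$L_1(0) \exp(-\tfrac{1}{2} v^2 + v y) > K \iff y > -\frac{\log\!\left(\frac{L_1(0)}{K}\right) - \frac{1}{2} v^2}{v} = -d_-,$$

the price is given by:

$$P_{T_1}(0)\, \delta_1 \bigl(1 - N(-d_-)\bigr) = P_{T_1}(0)\, \delta_1 N(d_-),$$

where $N(x)$ is the cumulative standard normal distribution function. With the help of this formula, the Greeks can be calculated analytically in the same way as for the caplet:

$$\frac{\partial V_{dig}}{\partial P_{T_1}(0)}(0, L(0)) = \delta_1 N(d_-) + P_{T_1}(0)\, \delta_1 N'(d_-) \frac{\partial d_-}{\partial P_{T_1}(0)}, \qquad (2.62)$$

$$\frac{\partial V_{dig}}{\partial P_{T_0}(0)}(0, L(0)) = P_{T_1}(0)\, \delta_1 N'(d_-) \frac{\partial d_-}{\partial P_{T_0}(0)}, \qquad (2.63)$$

$$\frac{\partial V_{dig}}{\partial \sigma_{1j}}(0, L(0)) = P_{T_1}(0)\, \delta_1 N'(d_-) \frac{\partial d_-}{\partial \sigma_{1j}}. \qquad (2.64)$$

The derivatives of $d_-$ are given in the caplet section above.

2.8 Statistics

The quality of our estimators is determined by three quantities: the CPU time, the bias and the variance. Sometimes we will also use the mean squared error (MSE) to see whether a method has converged or not. We describe the variance and bias in this section. To this end we consider a random variable $Y$. Our estimator of $Y$ is denoted by $\hat Y$.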
The mean of our estimator is denoted by $\bar Y$ and we want to estimate $E[Y]$.

Variance

The variance is a quantity that reflects the variability of a random variable. Monte Carlo estimates are random variables and hence have some positive variance. This quantity is defined by:

$$\mathrm{Var}(\hat Y) = E[(\hat Y - \bar Y)^2].$$

The variance of the Monte Carlo estimator gets smaller when a larger sample is chosen. However, a larger sample costs more time. Therefore the variance of the Monte Carlo estimate and the CPU time are closely linked.

Bias

The bias is another type of error:

$$\mathrm{Bias}(\hat Y, Y) = E[Y] - \bar Y.$$

The bias cannot be reduced by taking a larger sample size. It comes from the discretization grid (as used in the log-Euler or PC scheme) or from a parameter choice (bump parameter, bandwidth in the kernel method). Hence all methods which we consider have a bias coming from the discretization grid (the "discretization bias"); the bump method and the kernel method have some extra bias. The discretization bias can be reduced by taking smaller time steps.

MSE

For the examination of some results we use the mean squared error of the estimator $\hat Y$, because the mean squared error reflects both the variance and the square of the bias. The square root of the mean squared error is an upper bound for the absolute bias as well as for the standard error. If $y = E[Y]$ is the analytic solution, then the MSE is:

$$\mathrm{MSE}(\hat Y, y) = E[(y - \hat Y)^2] = \mathrm{Var}(\hat Y) + \mathrm{Bias}(\hat Y, Y)^2.$$

Another way to examine the results is to plot the analytic solution as a function of one of its parameters, say an initial condition, and then plot some estimates into the same figure. In this way we obtain a visualization of the performance of the methods.
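The decomposition of the MSE into variance and squared bias can be checked numerically. The sketch below is an illustration on synthetic data: the function name `error_decomposition` is hypothetical, and the "estimator" is a sample mean of standard normal draws with an artificial shift of 0.05 added to mimic a bias.

```python
import numpy as np


def error_decomposition(estimates, exact):
    """Empirical variance, bias and MSE of repeated estimates of `exact`."""
    estimates = np.asarray(estimates, dtype=float)
    mean_estimate = estimates.mean()
    variance = np.mean((estimates - mean_estimate) ** 2)
    bias = exact - mean_estimate
    mse = np.mean((exact - estimates) ** 2)
    return variance, bias, mse


rng = np.random.default_rng(1)
# 500 repetitions of a deliberately biased estimator of E[Z] = 0.
estimates = rng.standard_normal((500, 100)).mean(axis=1) + 0.05
variance, bias, mse = error_decomposition(estimates, exact=0.0)
```

With these empirical definitions the identity `mse == variance + bias**2` holds up to floating point rounding, independent of the sample.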

Chapter 3

Estimating first order sensitivities

3.1 Introduction

This chapter forms the main part of the thesis. We consider various methods to estimate the first order Greeks for European as well as cancellable options. These methods are the bump method, the likelihood ratio method (LR method), the pathwise sensitivity method (PS method), the Vibrato Monte Carlo method (VMC method) and the kernel method. The vegas in this thesis are the sensitivities with respect to model parameters. The vegas obtained from the pathwise sensitivity method and the LR method can be mapped to market volatility parameters. How this can be done is described by Piterbarg on page 1103 of [12].

3.2 The bump method

European options

The simplest method to estimate sensitivities is the bump method, which is also known as the forward difference method. Consider a discounted payoff $\tilde g(L(T_i))$ which depends on the forward Libor rates at time $T_i$. Let $\theta$ be a parameter on which the dynamics of $L(t)$ depend. If the finite difference method is applied, $\frac{\partial V}{\partial\theta}$ is approximated by:

$$\frac{\partial \hat V}{\partial\theta}(0, L(0)) = P_{T_m}(0)\, \frac{\mathrm{Mean}(\tilde g(\hat L(T_i, \theta + h)) \mid \mathcal{F}_0) - \mathrm{Mean}(\tilde g(\hat L(T_i, \theta)) \mid \mathcal{F}_0)}{h} + \frac{\partial P_{T_m}(0)}{\partial\theta}\, \mathrm{Mean}(\tilde g(\hat L(T_i, \theta)) \mid \mathcal{F}_0) \qquad (3.1)$$

for a small scalar $h$, where $\mathrm{Mean}(\cdot)$ is the sample average. In contrast to the rest of the thesis, we make the dependence of the simulated forward Libor rates on the model parameter $\theta$ explicit. Note that if we apply this method to the delta, we have to evolve the forward Libor rates $m + 1$ times for each sample. Now we say more about the role of $h$. Given that $E(\tilde g)$ is twice differentiable in $\theta$, the bias is given in [7], page 378, by:

$$\mathrm{Bias}\!\left(\frac{\partial \hat V}{\partial\theta}, \frac{\partial V}{\partial\theta}\right) = \frac{1}{2} \frac{\partial^2 V}{\partial\theta^2}\, h + O(h^2). \qquad (3.2)$$

The variance of the first term on the right-hand side of equation (3.1) is given by:

$$\frac{P_{T_m}(0)^2}{h^2}\, \mathrm{Var}\bigl(\mathrm{Mean}(\tilde g(L(T_i, \theta + h))) - \mathrm{Mean}(\tilde g(L(T_i, \theta)))\bigr). \qquad (3.3)$$

Hence by choosing $h$ smaller, the bias will also be smaller. But what happens with the variance of the difference in the numerator of equation (3.1)?
Glasserman presented three cases in [7]:

1. The means are computed using independent underlying random variables for the paths of $L$. If $\mathrm{Var}(\tilde g(L(T_i, \theta)))$ is continuous in $\theta$, then the variance of the numerator of equation (3.1) is of order $O(1)$. It follows that the variance of the bump estimator is of order $O(h^{-2})$.
2. Same as the previous case, but now with the same underlying random variables for the paths of $L$. The variance of the numerator of equation (3.1) is of order $O(h)$. It follows that the variance of the bump estimator is of order $O(h^{-1})$.
3. The means are computed using the same underlying random variables for the paths of $L$, where $\tilde g(L(T_i, \theta))$ is continuous in $\theta$ for all paths. In this case the variance of the numerator is of order $O(h^2)$. It follows that the variance of the bump estimator is of order $O(1)$.

So in the first and second case we have to be careful when choosing $h$: for too small values the variance in equation (3.3) may explode, and for too large values the bias will be bigger. In the third case $h$ can be taken as small as machine precision allows.

Bermudan options

The method could be applied to the Bermudan case without any changes to the method itself, but it becomes inefficient if the optimal exercise regions have to be re-evaluated for every bumped parameter. Note that in the lognormal Libor market model the exercise region for a decision at time $t$ does not, given $L(t)$, depend on $L(0)$, and hence we do not have this problem for the delta and gamma. For the vega the case is different. However, Piterbarg noted in [11] that the first order sensitivity to any input that affects the exercise boundary is zero and therefore, if some input is bumped, the effect due to the exercise boundary is of second order. Due to this, the same exercise boundary can be used in the limit as $h \to 0$.
Therefore we expect that the contribution of bumping the exercise boundary can be kept small by taking the bump size small enough.

Application of the control variate

At the end of section 2.5 we introduced a control variate in the form of a European option for the pricing of the cancellable contracts. The application of this control variate to the bump estimator is straightforward, and it can be applied if the sensitivity of the European option which is used as control variate is known analytically. In equation (2.44) the following pricing formula was given:

$$V(0, L(0)) = P_{T_m}(0)\, E^m\!\left[(1_{\tau>\hat\tau} - 1_{\tau<\hat\tau}) \sum_{i=\min(\tau,\hat\tau)+1}^{\max(\tau,\hat\tau)} \tilde g_i(L(T_{i-1})) \,\middle|\, L(0)\right] + \sum_{i=1}^{\hat\tau} V^i_{eur}(0, L(0)). \qquad (3.4)$$

The first term on the right-hand side of equation (3.4) can be seen as an option price. We denote this price by $V_{CV}(L(0), \hat\tau)$. After differentiation equation (3.4) becomes:

$$\frac{\partial V}{\partial\theta}(0, L(0)) = \frac{\partial V_{CV}}{\partial\theta}(L(0), \hat\tau) + \sum_{i=1}^{\hat\tau} \frac{\partial V^i_{eur}}{\partial\theta}(0, L(0)). \qquad (3.5)$$

Now the second term on the right-hand side (the sum of European option sensitivities) can be calculated analytically and the first term can be estimated with the bump method. The implementation of the bump method for this sensitivity differs little from the original implementation: we only change the range of the sum and, if $\hat\tau > \tau$, the sign of the individual payoffs.
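The effect of using the same random numbers for the bumped and the unbumped simulation can be sketched for a single caplet. The code below is an illustration under simplifying assumptions that go beyond the text: one forward rate, a single lognormal step with total volatility `v`, a unit accrual factor and no discounting, so that the exact delta equals $N(d_+)$. All function names are hypothetical.

```python
import math
import numpy as np


def caplet_payoff(l0, strike, v, z):
    """Undiscounted caplet payoff after one lognormal step driven by z."""
    lt = l0 * np.exp(-0.5 * v**2 + v * z)
    return np.maximum(lt - strike, 0.0)


def bump_delta(l0, strike, v, h, n_paths, common=True, seed=0):
    """Forward difference estimate of the delta with bump size h."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_paths)
    # Reuse the same draws for the bumped run, or draw fresh ones.
    z_bumped = z if common else rng.standard_normal(n_paths)
    bumped = caplet_payoff(l0 + h, strike, v, z_bumped).mean()
    base = caplet_payoff(l0, strike, v, z).mean()
    return (bumped - base) / h


def exact_caplet_delta(l0, strike, v):
    """Analytic delta N(d_+) of the undiscounted caplet."""
    d_plus = (math.log(l0 / strike) + 0.5 * v**2) / v
    return 0.5 * (1.0 + math.erf(d_plus / math.sqrt(2.0)))
```

Calling `bump_delta(0.05, 0.05, 0.2, 1e-4, 200_000)` with common random numbers gives an estimate close to `exact_caplet_delta(0.05, 0.05, 0.2)`, while `common=False` with the same small `h` gives a much noisier estimate, in line with the first and third case above.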

3.3 Likelihood ratio method

Another way to estimate the Greeks is the likelihood ratio method (LR method). This method is also described in [7], page 401. The idea is that, if we use the Euler discretization scheme, we have for every step the probability density function of $L(t_{k+1})$ given $L(t_k)$. The expectation can be written as a multidimensional integral using these probability density functions. The conditions to interchange the derivative and integral operators are not as restrictive as those for interchanging the differentiation and expectation operators. Before we do this interchange, the payoff function only depends on a dummy variable over which we perform the integration. After interchanging the derivative and integral operators we only have to differentiate the probability density function. In this way we find another way to estimate Greeks of options with discontinuous payoff functions.

European options

Suppose that the objective is to estimate the derivative with respect to a certain parameter $\theta$:

$$\frac{\partial V}{\partial\theta}(0, L(0)) = \frac{\partial}{\partial\theta} E^m[\tilde g(\hat L(T_i))]. \qquad (3.6)$$

Assume that we know the probability density function $f(x, L(0), \sigma)$ of $\log(L(T_i))$; then we can write the expectation as an integral:

$$\frac{\partial V}{\partial\theta}(0, L(0)) = \frac{\partial}{\partial\theta} E^m[\tilde g(\hat L(T_i))] = \frac{\partial}{\partial\theta} \int_{\mathbb{R}^m} \tilde g(e^x) f(x, L(0), \sigma)\, dx. \qquad (3.7)$$

The conditions to interchange the integral and the differentiation in equation (3.7) are less restrictive than the conditions of the pathwise sensitivity method. Because we will only use the smooth multivariate normal distribution, we assume that this interchange is valid. The expression in (3.7) becomes:

$$\frac{\partial V}{\partial\theta}(0, L(0)) = \frac{\partial}{\partial\theta} E^m[\tilde g(\hat L(T_i))] = \int_{\mathbb{R}^m} \tilde g(e^x)\, \frac{\partial f}{\partial\theta}(x, L(0))\, dx. \qquad (3.8)$$

By the chain rule it holds that:

$$\frac{\partial \log(f)}{\partial\theta}(x, L(0)) = \frac{1}{f(x, L(0))}\, \frac{\partial f}{\partial\theta}(x, L(0)),$$

and therefore:

$$\frac{\partial f}{\partial\theta}(x, L(0)) = \frac{\partial \log(f)}{\partial\theta}(x, L(0))\, f(x, L(0)). \qquad (3.9)$$

By substituting (3.9) into (3.8) we get:

$$\frac{\partial V}{\partial\theta}(0, L(0)) = \frac{\partial}{\partial\theta} E^m[\tilde g(\hat L(T_i))] = \int_{\mathbb{R}^m} \tilde g(e^x)\, \frac{\partial}{\partial\theta}\{\log(f(x, L(0)))\}\, f(x, L(0))\, dx, \qquad (3.10)$$

which can be written as

$$\frac{\partial V}{\partial\theta}(0, L(0)) = \frac{\partial}{\partial\theta} E^m[\tilde g(\hat L(T_i))] = E^m\!\left[\tilde g(e^x)\, \frac{\partial}{\partial\theta}\{\log(f(x, L(0)))\} \,\middle|\, L(0)\right]. \qquad (3.11)$$

Delta Bermudan options

As usual, the Bermudan case is more difficult. The LR method can still be applied with the method described in [7]. For the options considered in this thesis it holds that the buyer of the contract always gets the first payment at $T_1$ (the end of the first tenor interval), which depends on the forward Libor rates at $T_0$. First recall the quantity of interest:

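$$\frac{\partial V}{\partial L(0)}(0, L(0)) = \frac{\partial}{\partial L(0)} E^m\!\left[\sum_{k=1}^{\tau} \tilde g_k(L(T_{k-1}))\right] = \frac{\partial}{\partial L(0)} E^m\!\left[\tilde g_1(L(T_0))\right] + \frac{\partial}{\partial L(0)} E^m\!\left[\sum_{k=2}^{\tau} \tilde g_k(L(T_{k-1}))\right].$$

Since the first term on the right-hand side can be calculated with the European version of the LR method, we only consider the calculation of

$$\frac{\partial}{\partial L(0)} E^m\!\left[\sum_{k=2}^{\tau} \tilde g_k(L(T_{k-1}))\right]. \qquad (3.12)$$

Using the tower law gives:

$$\frac{\partial}{\partial L(0)} E^m\!\left[\sum_{k=2}^{\tau} \tilde g_k(L(T_{k-1}))\right] = \frac{\partial}{\partial L(0)} E^m\!\left[E^m\!\left[\sum_{k=2}^{\tau} \tilde g_k(L(T_{k-1})) \,\middle|\, L(T_0)\right]\right]. \qquad (3.13)$$

The next step is to write the outer expectation as an integral, using the probability density function $f(x, L(0))$ of $L(T_0)$ given $L(0)$:

$$\frac{\partial}{\partial L(0)} E^m\!\left[\sum_{k=2}^{\tau} \tilde g_k(L(T_{k-1}))\right] = \frac{\partial}{\partial L(0)} \int_{\mathbb{R}^m} E^m\!\left[\sum_{k=2}^{\tau} \tilde g_k(L(T_{k-1})) \,\middle|\, L(T_0) = x\right] f(x, L(0))\, dx. \qquad (3.14)$$

Now we can interchange the integration and differentiation with the same reasoning as for the European version of the LR method. Note that the conditional expectation given $L(T_0) = x$ is independent of $L(0)$ since $L(t)$ is a Markov process; hence (3.14) becomes:

$$\frac{\partial}{\partial L(0)} E^m\!\left[\sum_{k=2}^{\tau} \tilde g_k(L(T_{k-1}))\right] = \int_{\mathbb{R}^m} E^m\!\left[\sum_{k=2}^{\tau} \tilde g_k(L(T_{k-1})) \,\middle|\, L(T_0) = x\right] \frac{\partial f}{\partial L(0)}(x, L(0))\, dx. \qquad (3.15)$$

In the same way as for the European LR method we get:

$$\frac{\partial}{\partial L(0)} E^m\!\left[\sum_{k=2}^{\tau} \tilde g_k(L(T_{k-1}))\right] = E^m\!\left[E^m\!\left[\sum_{k=2}^{\tau} \tilde g_k(L(T_{k-1})) \,\middle|\, L(T_0)\right] \frac{\partial \log(f)}{\partial L(0)}(L(T_0), L(0))\right], \qquad (3.16)$$

and by the tower law we end up with:

$$\frac{\partial}{\partial L(0)} E^m\!\left[\sum_{k=2}^{\tau} \tilde g_k(L(T_{k-1}))\right] = E^m\!\left[\sum_{k=2}^{\tau} \tilde g_k(L(T_{k-1}))\, \frac{\partial \log(f)}{\partial L(0)}(L(T_0), L(0))\right]. \qquad (3.17)$$

Now if we define the sum of all payoffs until $\tau$ by $\tilde g(L(T_0), L(T_1), \ldots, L(T_{m-1}))$, then the Greek of the Bermudan-type options can be estimated with:

$$\frac{\partial V}{\partial L(0)}(0, L(0)) = E^m\!\left[\tilde g(L(T_0), L(T_1), \ldots, L(T_{m-1}))\, \frac{\partial \log(f)}{\partial L(0)}(L(T_0), L(0))\right]. \qquad (3.18)$$

Calculating this expectation is very similar to the calculation of the option price. The only difference is the multiplication by $\frac{\partial \log(f)}{\partial L(0)}$ for every payoff simulation.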
Every payoff evaluation has to be multiplied, regardless of the exercise date, by $\frac{\partial \log(f)}{\partial L(0)}(L(T_0), L(0))$, which is calculated using the same random numbers as were used to evaluate $\tilde g(L(T_0), L(T_1), \ldots, L(T_{m-1}))$. In Figure 3.1 we have illustrated the LR method and in Algorithm 3 we give the computation algorithm of the LR estimator.

1 Estimate the exercise regions with Longstaff-Schwartz.
for $j = 1$ to $j = M$ do
2 Generate a matrix of random variables $W^{(j)}$. The element $W^{(j)}_{pl}$ on the $p$-th row and $l$-th column corresponds to the random variable for the $l$-th time step and $p$-th Libor rate.
3 Simulate the path of forward Libor rates using the random variables $W^{(j)}$ until exercise time. We denote this path by $L^{(j)}$.
4 Calculate the payoff (under the $T_m$ measure) $g(L^{(j)})$.
5 Calculate, with $W^{(j)}_{:1}$ the first column of the matrix $W^{(j)}$:
\[
\frac{\partial \log(f)^{(j)}}{\partial \mu(L(0))} = \frac{1}{\sqrt{T_0}} \big((\sigma(T_0))^{\top}\big)^{-1} W^{(j)}_{:1}.
\]
6 Multiply the random variables corresponding to the first time step by $-1$ and calculate, using these random variables, a second payoff which is denoted by $g(L^{(-j)})$. Set
\[
\Delta^{(j)} = \frac{\partial \log(f)^{(j)}}{\partial \mu(L(0))} \big(g(L^{(j)}) - g(L^{(-j)})\big).
\]
7 end
8 Return
\[
\frac{\partial \mu(L(0))}{\partial L(0)} \, \frac{1}{M} \sum_{j=1}^{M} \Delta^{(j)}.
\]

Algorithm 3: Algorithm to compute the Bermudan delta with the LR method using the multivariate normal distribution.

Figure 3.1: Figure corresponding to the LR method to calculate the Bermudan delta. The logarithm of the Libor rate $L(T_i)$ is sampled from the distribution $f^{(i)}$. For the delta we only need to differentiate $\log(f^{(0)})$. This derivative has to be multiplied by a sum of payoffs, the number of terms of which depends on when the contract is cancelled (in this figure we have $\tau = 4$). This sum is also needed in the usual Monte Carlo pricing of the contract.
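The score-times-payoff structure of the LR estimator can be sketched in a deliberately reduced setting. The example below is an illustration only, not the thesis' implementation: it assumes a single driftless lognormal rate, one time step, no discounting and a digital payoff, and all function names are hypothetical. In this setting the score with respect to the mean of the log-rate is $Z/(\sigma\sqrt{T})$ and the derivative of the mean with respect to $L(0)$ is $1/L(0)$, so the estimator can be compared with a closed-form value.

```python
import math
import random


def lr_delta_digital(l0, strike, sigma, maturity, n_paths, seed=0):
    """Likelihood ratio delta of the digital payoff 1{L(T) > K}.

    Toy assumptions: one driftless lognormal rate, one time step,
    no discounting. This is not the multi-rate Libor market model.
    """
    rng = random.Random(seed)
    sqrt_t = math.sqrt(maturity)
    total = 0.0
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)
        l_t = l0 * math.exp(-0.5 * sigma ** 2 * maturity + sigma * sqrt_t * z)
        payoff = 1.0 if l_t > strike else 0.0
        # Score w.r.t. the mean of log L(T) is z / (sigma sqrt(T));
        # the chain rule factor d(mean)/dL(0) is 1 / L(0).
        score = z / (sigma * sqrt_t) / l0
        total += payoff * score
    return total / n_paths


def exact_delta_digital(l0, strike, sigma, maturity):
    """Closed-form derivative of P(L(T) > K) with respect to L(0)."""
    vol = sigma * math.sqrt(maturity)
    d2 = (math.log(l0 / strike) - 0.5 * vol ** 2) / vol
    density = math.exp(-0.5 * d2 * d2) / math.sqrt(2.0 * math.pi)
    return density / (l0 * vol)


if __name__ == "__main__":
    print(lr_delta_digital(0.05, 0.05, 0.2, 1.0, 100_000))
    print(exact_delta_digital(0.05, 0.05, 0.2, 1.0))
```

The payoff itself is never differentiated, which is why the discontinuous digital payoff poses no problem here.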

Application of the control variate

We explain the application of the control variate for the LR estimator in a similar way as for the bump method. In equation (2.44) the following pricing formula was given:

\[
V(0, L(0)) = P_{T_m}(0)\, E^m\Big[\big(1_{\tau > \tilde\tau} - 1_{\tau < \tilde\tau}\big) \sum_{i=\min(\tau,\tilde\tau)+1}^{\max(\tau,\tilde\tau)} g_i(L(T_{i-1})) \,\Big|\, L(0)\Big] + \sum_{i=1}^{\tilde\tau} V^i_{eur}(0, L(0)). \tag{3.19}
\]

In the LR method we estimate the mean of the total payoff times a derivative of the logarithm of a probability density function. So if we want to estimate the first term of the right-hand side, we only have to define the new total payoff (see definition (9)). The corresponding (undiscounted) total payoff for this option is given by:

\[
g^{CV}\big(L(T_{\min(\tau,\tilde\tau)+1}), \ldots, L(T_{\max(\tau,\tilde\tau)})\big) = \big(1_{\tau > \tilde\tau} - 1_{\tau < \tilde\tau}\big) \sum_{i=\min(\tau,\tilde\tau)+1}^{\max(\tau,\tilde\tau)} g_i(L(T_{i-1})). \tag{3.20}
\]

Apart from this and the analytical calculation of the sum of European option prices (the second term of the right-hand side of equation (3.19)), the calculation of the LR estimator with control variate is identical to that of the original LR estimator.

Probability density function

We now return to the availability of the probability density function. For the lognormal Libor market model the probability density function of $L(T_i)$ is not known. However, since, for reasons of computation time, we only want to use one discretization step for each tenor interval, we can use $\frac{\partial \log(\hat f)}{\partial \theta}(\hat L(T_i), L(0))$. Here $\hat f(x, L(0))$ is the probability density function of $\log(\hat L(T_i))$ on the discretization grid. The probability density function of $\log(\hat L(T_i))$ under the log-Euler method is the multivariate normal density. The PC scheme can be used to reduce the bias of the simulation of $L(T_i)$, but we do not know the probability density function of the forward Libor rate evolved with the PC scheme.
Despite this we have found a way to apply the LR method using the PC scheme.

Delta predictor-corrector modification

Recall the discretization function for the logarithm of the forward Libor rates at time $T$ corresponding to the PC scheme:

\[
\log(L_i(T)^{PC}) = \log(L_i(0)) + \Big[-\tfrac{1}{2}\sigma_i(T)^2 + \tfrac{1}{2}\big(\mu_i(L(0)) + \mu_i(L^E(T))\big)\Big] T + \sigma_i(T) W(0) \sqrt{T}. \tag{3.21}
\]

Here $L^E(T)$ is given, as was stated in equation (2.25), by:

\[
L^E_i(T) = L_i(0) \exp\Big(\Big[-\tfrac{1}{2}\sigma_i(T)^2 + \mu_i(L(0))\Big] T + \sigma_i(T) W(0) \sqrt{T}\Big), \tag{3.22}
\]

with $\sigma_i(T)$ the volatility parameters for $(0, T]$. $\log(L_i(T)^{PC})$ can be expressed as a function $q_i$ which depends on $\log(L^E(T))$ and $L(0)$:

Definition 18. PC function $q$. Let $x, y \in \mathbb{R}^m$, then the function $q$ is defined by:

\[
q_i(x, y) = x_i + \frac{T}{2}\big(\mu_i(\exp(x)) - \mu_i(y)\big). \tag{3.23}
\]

Now, if one discretization step is used, the PC estimate at $t_{k+1}$ can be written as a function of the Euler estimate at $t_{k+1}$ and the PC estimate at $t_k$:

\[
\log(L_i(t_{k+1})^{PC}) = q_i\big(\log(L^E(t_{k+1})), L^{PC}(t_k)\big). \tag{3.24}
\]

For the delta we only use that the PC estimate is obtained by making one time step from $0$ to $T$, so that $\log(L_i(T)^{PC})$ can be written as $q_i(\log(L^E(T)), L(0))$. The value of interest is given by:

\[
\frac{\partial}{\partial L(0)} E^m\big[g(L(T)^{PC})\big] = \frac{\partial}{\partial L(0)} \int_{\mathbb{R}^m} g(\exp(q(\hat x, L(0))))\, f(\hat x, L(0))\, d\hat x,
\]

where $f(\hat x, L(0))$ is the probability density function of $\log(L^E(T))$. The problem is that $g(\exp(q(\hat x, L(0))))$ depends on $L(0)$ and that, if $g$ is discontinuous, we cannot differentiate with respect to $L(0)$. The solution is to perform the following change of variables: $\tilde x_i = q_i(\hat x, L(0))$. We then need the inverse (in the first argument) $q_i^{-1}$ of $q_i$, i.e. a function $q_i^{-1}$ such that $q_i^{-1}(\tilde x, L(0)) = \hat x_i$. Note that $q_i^{-1}$ is a function of $\tilde x$ and $L(0)$. To see how this function can be found, recall that $\mu_i$ is given by:

\[
\mu_i(\exp(\hat x)) = -\sigma_i \sum_{n=i+1}^{m} \frac{\delta_n \exp(\hat x_n)\, \sigma_n}{1 + \delta_n \exp(\hat x_n)}. \tag{3.25}
\]

Note that $\mu_i(\exp(\hat x))$ does not depend on $\hat x_j$ for $j \le i$ and that $\mu_m(\exp(\hat x)) = 0$. Therefore we can calculate $\hat x_i$ by a backward recursion:

\[
\hat x_m = q_m^{-1}(\tilde x, L(0)) = \tilde x_m,
\]
\[
\hat x_k = q_k^{-1}(\tilde x, L(0)) = \tilde x_k - \frac{T}{2} \sum_{n=k+1}^{m} \sigma_k \sigma_n \delta_n \left(\frac{L_n(0)}{1 + \delta_n L_n(0)} - \frac{\exp(q_n^{-1}(\tilde x, L(0)))}{1 + \delta_n \exp(q_n^{-1}(\tilde x, L(0)))}\right) \tag{3.26}
\]
\[
\phantom{\hat x_k} = q_{k+1}^{-1}(\tilde x, L(0)) + \tilde x_k - \frac{T}{2} \sigma_k \sigma_{k+1} \delta_{k+1} \left(\frac{L_{k+1}(0)}{1 + \delta_{k+1} L_{k+1}(0)} - \frac{\exp(q_{k+1}^{-1}(\tilde x, L(0)))}{1 + \delta_{k+1} \exp(q_{k+1}^{-1}(\tilde x, L(0)))}\right).
\]

After the change of variables we get:

\[
\frac{\partial}{\partial L(0)} E^m\big[g(L(T)^{PC})\big] = \frac{\partial}{\partial L(0)} \int_{\mathbb{R}^m} g(\exp(\tilde x))\, f(q^{-1}(\tilde x, L(0)), L(0)) \left|\frac{\partial q^{-1}}{\partial \tilde x}\right| d\tilde x. \tag{3.27}
\]

So we also need to calculate the determinant $\left|\frac{\partial q^{-1}}{\partial \tilde x}\right|$, which turns out to be easy because of the following:

\[
\frac{\partial q_i^{-1}}{\partial \tilde x_k}(\tilde x, L(0)) = 0 \text{ if } k < i \quad \text{and} \quad \frac{\partial q_i^{-1}}{\partial \tilde x_i}(\tilde x, L(0)) = 1. \tag{3.28}
\]

Hence the matrix $\frac{\partial q^{-1}}{\partial \tilde x}$ is a triangular matrix with ones on the diagonal. This means that the determinant is one, hence (3.27) becomes:

\[
\frac{\partial}{\partial L(0)} E^m\big[g(L(T)^{PC})\big] = \frac{\partial}{\partial L(0)} \int_{\mathbb{R}^m} g(\exp(\tilde x))\, f(q^{-1}(\tilde x, L(0)), L(0))\, d\tilde x. \tag{3.29}
\]
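The backward recursion for the inverse of $q$ can be checked numerically. The following sketch is an illustration under simplifying assumptions that are not taken from the thesis: scalar volatilities per rate (a single factor), and hypothetical function names `drift`, `q` and `q_inverse`. It maps log-Euler values to PC values and recovers them again, starting from the last rate, whose drift is zero.

```python
import math


def drift(rates, sigma, delta):
    """Terminal-measure drift: mu_i = -sigma_i * sum_{n>i} delta_n L_n sigma_n / (1 + delta_n L_n)."""
    m = len(rates)
    return [
        -sigma[i] * sum(
            delta[n] * rates[n] * sigma[n] / (1.0 + delta[n] * rates[n])
            for n in range(i + 1, m)
        )
        for i in range(m)
    ]


def q(x_hat, l0, sigma, delta, horizon):
    """Map log-Euler values x_hat to PC values x_hat + T/2 (mu(exp(x_hat)) - mu(L(0)))."""
    mu_end = drift([math.exp(v) for v in x_hat], sigma, delta)
    mu_start = drift(l0, sigma, delta)
    return [x + 0.5 * horizon * (a - b) for x, a, b in zip(x_hat, mu_end, mu_start)]


def q_inverse(x_tilde, l0, sigma, delta, horizon):
    """Recover x_hat from x_tilde by backward recursion, last rate first."""
    m = len(x_tilde)
    mu_start = drift(l0, sigma, delta)
    x_hat = [0.0] * m
    for k in range(m - 1, -1, -1):
        # mu_k(exp(x_hat)) only involves x_hat[n] for n > k, already computed.
        mu_end_k = -sigma[k] * sum(
            delta[n] * math.exp(x_hat[n]) * sigma[n] / (1.0 + delta[n] * math.exp(x_hat[n]))
            for n in range(k + 1, m)
        )
        x_hat[k] = x_tilde[k] - 0.5 * horizon * (mu_end_k - mu_start[k])
    return x_hat


if __name__ == "__main__":
    l0 = [0.03, 0.04, 0.05]
    sigma = [0.20, 0.25, 0.30]
    delta = [0.5, 0.5, 0.5]
    x_hat = [math.log(0.031), math.log(0.042), math.log(0.047)]
    x_tilde = q(x_hat, l0, sigma, delta, 0.5)
    print(q_inverse(x_tilde, l0, sigma, delta, 0.5), x_hat)
```

Because each component of the inverse only needs components with a higher index, no nonlinear solve is required, which is the same triangular structure that makes the Jacobian determinant equal to one.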

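We now apply the LR method to get:

\[
\frac{\partial}{\partial L(0)} E^m\big[g(L(T)^{PC})\big] = \int_{\mathbb{R}^m} g(\exp(\tilde x))\, \frac{\partial}{\partial L(0)}\Big(\log(f)\big(q^{-1}(\tilde x, L(0)), L(0)\big)\Big)\, f(q^{-1}(\tilde x, L(0)), L(0))\, d\tilde x. \tag{3.30}
\]

Before we can perform the change of variables in the opposite direction, we have to derive the derivative

\[
\frac{\partial}{\partial L(0)}\Big(\log(f)\big(q^{-1}(\tilde x, L(0)), L(0)\big)\Big).
\]

Therefore we introduce some notation. The probability density function $f$ is presented as a function of two vector variables (we treat $\sigma(T_0)$ as a matrix of constant parameters). We want to be able to differentiate with respect to all elements of the two input vectors, using the notations:

$\frac{\partial \log(f)}{\partial x_i}(x, y)$: the derivative with respect to the $i$-th element of the first input vector.

$\frac{\partial \log(f)}{\partial L_i(0)}(x, y)$: the derivative with respect to the $i$-th element of the second input vector.

The derivative of $\log(f)\big(q^{-1}(\tilde x, L(0)), L(0)\big)$, where we take into account that the first argument depends on $L(0)$, is given by:

\[
\frac{\partial}{\partial L_i(0)}\Big(\log(f)\big(q^{-1}(\tilde x, L(0)), L(0)\big)\Big) = \sum_{k=1}^{m} \frac{\partial q_k^{-1}}{\partial L_i(0)}(\tilde x, L(0))\, \frac{\partial \log(f)}{\partial x_k}\big(q^{-1}(\tilde x, L(0)), L(0)\big) + \frac{\partial \log(f)}{\partial L_i(0)}\big(q^{-1}(\tilde x, L(0)), L(0)\big). \tag{3.31}
\]

If we wish to compute the derivatives of equation (3.31) for all $i$ at once, the following matrix-vector multiplication is more efficient than a term-by-term calculation:

\[
\frac{\partial}{\partial L(0)}\Big(\log(f)\big(q^{-1}(\tilde x, L(0)), L(0)\big)\Big) = \frac{\partial q^{-1}}{\partial L(0)}(\tilde x, L(0))\, \frac{\partial \log(f)}{\partial x}\big(q^{-1}(\tilde x, L(0)), L(0)\big) + \frac{\partial \log(f)}{\partial L(0)}\big(q^{-1}(\tilde x, L(0)), L(0)\big),
\]

with the gradient vectors

\[
\frac{\partial \log(f)}{\partial x}\big(q^{-1}(\tilde x, L(0)), L(0)\big) = \Big[\frac{\partial \log(f)}{\partial x_1}\big(q^{-1}(\tilde x, L(0)), L(0)\big), \ldots, \frac{\partial \log(f)}{\partial x_m}\big(q^{-1}(\tilde x, L(0)), L(0)\big)\Big],
\]
\[
\frac{\partial \log(f)}{\partial L(0)}\big(q^{-1}(\tilde x, L(0)), L(0)\big) = \Big[\frac{\partial \log(f)}{\partial L_1(0)}\big(q^{-1}(\tilde x, L(0)), L(0)\big), \ldots, \frac{\partial \log(f)}{\partial L_m(0)}\big(q^{-1}(\tilde x, L(0)), L(0)\big)\Big],
\]

and with the matrix:

\[
\frac{\partial q^{-1}}{\partial L(0)}(\tilde x, L(0)) =
\begin{pmatrix}
\frac{\partial q_1^{-1}}{\partial L_1(0)}(\tilde x, L(0)) & \cdots & \frac{\partial q_1^{-1}}{\partial L_m(0)}(\tilde x, L(0)) \\
\vdots & \ddots & \vdots \\
\frac{\partial q_m^{-1}}{\partial L_1(0)}(\tilde x, L(0)) & \cdots & \frac{\partial q_m^{-1}}{\partial L_m(0)}(\tilde x, L(0))
\end{pmatrix}.
\]

The elements of this matrix have to be calculated in a backward algorithm. Differentiating equation (3.26) we get: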

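\[
\frac{\partial q_m^{-1}}{\partial L_j(0)}(\tilde x, L(0)) = 0, \quad \ldots, \quad \frac{\partial q_j^{-1}}{\partial L_j(0)}(\tilde x, L(0)) = 0,
\]
\[
\frac{\partial q_{j-1}^{-1}}{\partial L_j(0)}(\tilde x, L(0)) = -\frac{T}{2}\, \frac{\sigma_{j-1} \sigma_j \delta_j}{(1 + \delta_j L_j(0))^2},
\]
\[
\frac{\partial q_k^{-1}}{\partial L_j(0)}(\tilde x, L(0)) = \frac{\partial q_{k+1}^{-1}}{\partial L_j(0)}(\tilde x, L(0)) + \frac{T}{2}\, \sigma_k \sigma_{k+1} \delta_{k+1}\, \frac{\exp(q_{k+1}^{-1}(\tilde x, L(0)))\, \frac{\partial q_{k+1}^{-1}}{\partial L_j(0)}(\tilde x, L(0))}{\big(1 + \delta_{k+1} \exp(q_{k+1}^{-1}(\tilde x, L(0)))\big)^2}, \quad \text{if } j > k + 1.
\]

Now that we know how to calculate the necessary derivatives, we return to equation (3.30). After applying the chain rule:

\[
\frac{\partial}{\partial L(0)} E^m\big[g(L(T)^{PC})\big] = \int_{\mathbb{R}^m} g(\exp(\tilde x)) \Big(\frac{\partial q^{-1}}{\partial L(0)}(\tilde x, L(0))\, \frac{\partial \log(f)}{\partial x}\big(q^{-1}(\tilde x, L(0)), L(0)\big) + \frac{\partial \log(f)}{\partial L(0)}\big(q^{-1}(\tilde x, L(0)), L(0)\big)\Big) f\big(q^{-1}(\tilde x, L(0)), L(0)\big)\, d\tilde x.
\]

Performing the change of variables in the opposite direction, $\hat x_i = q_i^{-1}(\tilde x, L(0))$, and using that the matrix $\frac{\partial q}{\partial \hat x}$ is also a triangular matrix with only ones on the diagonal, equation (3.30) becomes:

\[
\frac{\partial}{\partial L(0)} E^m\big[g(L(T)^{PC})\big] = \int_{\mathbb{R}^m} g(\exp(q(\hat x, L(0)))) \Big(\frac{\partial q^{-1}}{\partial L(0)}\big(q(\hat x, L(0)), L(0)\big)\, \frac{\partial \log(f)}{\partial x}(\hat x, L(0)) + \frac{\partial \log(f)}{\partial L(0)}(\hat x, L(0))\Big) f(\hat x, L(0))\, d\hat x
\]
\[
= E^m\Big[g(L^{PC}(T)) \Big(\frac{\partial q^{-1}}{\partial L(0)}\big(\log(L^{PC}(T)), L(0)\big)\, \frac{\partial \log(f)}{\partial x}\big(\log(L^E(T)), L(0)\big) + \frac{\partial \log(f)}{\partial L(0)}\big(\log(L^E(T)), L(0)\big)\Big)\Big]. \tag{3.32}
\]

In comparison with the log-Euler version of the LR method, this expectation is much more expensive to compute. For the log-Euler version of the LR method we only need $L^E(T)$ and $\frac{\partial \log(f)}{\partial L(0)}\big(\log(L^E(T)), L(0)\big)$, but in the case of the PC method we need $L^E(T)$, $L^{PC}(T)$, $\frac{\partial q^{-1}}{\partial L(0)}\big(\log(L^{PC}(T)), L(0)\big)$, $\frac{\partial \log(f)}{\partial x}\big(\log(L^E(T)), L(0)\big)$ and $\frac{\partial \log(f)}{\partial L(0)}\big(\log(L^E(T)), L(0)\big)$. The matrix $\frac{\partial q^{-1}}{\partial L(0)}$ is expensive to calculate, but relatively cheap in comparison with the path simulation. A larger disadvantage is that this matrix makes the LR method more difficult to implement for the PC scheme.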

Computing the vega of Bermudan options

Exercise region

Note that the exercise regions depend on the volatility parameters $\sigma(T_0), \ldots, \sigma(T_{m-1})$. However, Piterbarg claims in [11] (page 24) that we can consider this volatility input for the exercise region as constant if we want to calculate the derivatives of the option price with respect to these parameters. In our case this means that we can consider the hold values at time $T_i$ as functions of $L(T_i)$ and $T_i$ only; the volatility parameters can be treated as constant for the calculation of the hold values. This is essential for the use of the LR method for the vegas. We would like to have just one indicator function which equals one if the option is not exercised at any of the time points $T_0, T_1, \ldots, T_k$. Define the function $H_k$ as:

\[
H_k(L(T_0), L(T_1), \ldots, L(T_k)) = \min\big(H(L(T_0), T_0), H(L(T_1), T_1), \ldots, H(L(T_k), T_k)\big). \tag{3.33}
\]

Then we have:

\[
1_{H_k(L(T_0), L(T_1), \ldots, L(T_k)) > 0} = 1_{H(L(T_0), T_0) > 0}\, 1_{H(L(T_1), T_1) > 0} \cdots 1_{H(L(T_k), T_k) > 0}.
\]

The method

Suppose we want to estimate the derivative of a cancellable option with respect to $\sigma_{kl}(T_j)$ by using the LR method. To show how this can be done we need to write the option price as a nested multidimensional integral. Therefore we use the probability density function $f^{(k-1)}(x, \hat L^E(T_{k-1}), \sigma(T_k))$, which is the probability density function of $\log(\hat L^E(T_k))$ given $\hat L^E(T_{k-1})$ corresponding to the Euler discretization method. The estimated option price $\hat V(0, L(0))$ is given by:

\[
\hat V(0, L(0)) = P_{T_m}(0)\, E^m\Big[\sum_{i=1}^{\tau} g_i(\hat L^E(T_{i-1})) \,\Big|\, \mathcal{F}(0)\Big] = P_{T_m}(0) \sum_{k=1}^{m} E^m\Big[1_{\tau \ge k}\, g_k(\hat L^E(T_{k-1})) \,\Big|\, \mathcal{F}(0)\Big]. \tag{3.34}
\]

To save notation, and since the derivatives of all terms can be calculated in the same way, we will only consider one term of this sum. Therefore we define:

\[
\hat V_k(0, L(0)) = E^m\Big[1_{\tau \ge k}\, g_k(\hat L^E(T_{k-1})) \,\Big|\, \mathcal{F}(0)\Big], \quad \text{so that} \quad \hat V(0, L(0)) = P_{T_m}(0) \sum_{k=1}^{m} \hat V_k(0, L(0)).
\]

To take the decision point into account, we use the function $H_{k-2}$ as defined in equation (3.33). We use the tower property and rewrite the expectation as an integral:

\[
\hat V_k(0, L(0)) = E^m\Big[\ldots E^m\Big[\ldots E^m\Big[1_{H_{k-2}(\hat L^E(T_0), \ldots, \hat L^E(T_{k-2})) > 0}\; g_k(\hat L^E(T_{k-1})) \,\Big|\, \mathcal{F}(T_{k-1})\Big] \ldots \Big|\, \mathcal{F}(T_0)\Big] \Big|\, \mathcal{F}(0)\Big]
\]
\[
= \int_{\mathbb{R}^m} \int_{\mathbb{R}^{m-1}} \cdots \int_{\mathbb{R}^{m-k+1}} 1_{H_{k-2}(x^{(1)}, x^{(2)}, \ldots, x^{(k-1)}, x^{(k)}) > 0}\; g_k(\exp(x^{(k)}))\, f^{(k-1)}(x^{(k)}, \exp(x^{(k-1)}), \sigma(T_{k-1}))\, dx^{(k)} \cdots f^{(1)}(x^{(2)}, \exp(x^{(1)}), \sigma(T_1))\, dx^{(2)}\, f^{(0)}(x^{(1)}, L(0), \sigma(T_0))\, dx^{(1)}
\]
\[
= \int_{\mathbb{R}^m} \int_{\mathbb{R}^{m-1}} \cdots \int_{\mathbb{R}^{m-k+1}} 1_{H_{k-2}(x^{(1)}, x^{(2)}, \ldots, x^{(k-1)}, x^{(k)}) > 0}\; g_k(\exp(x^{(k)})) \Big(\prod_{i=1}^{k} f^{(i-1)}(x^{(i)}, \exp(x^{(i-1)}), \sigma(T_{i-1}))\Big)\, dx^{(k)} \cdots dx^{(2)}\, dx^{(1)}. \tag{3.35}
\]

Suppose we want to estimate the derivative with respect to $\sigma_{kl}(T_j)$. Because only $f^{(j)}(x^{(j+1)}, \exp(x^{(j)}), \sigma(T_j))$ depends on $\sigma_{kl}(T_j)$, we only have to differentiate this term. Recall that we assumed that the volatility input for the calculation of the hold values can be treated as constant. We use the LR method. If $k > j$, then:

\[
\frac{\partial \hat V_k(0, L(0))}{\partial \sigma_{kl}(T_j)} = \int_{\mathbb{R}^m} \cdots \int_{\mathbb{R}^{m-k+1}} 1_{H_{k-2}(x^{(1)}, x^{(2)}, \ldots, x^{(k-1)}, x^{(k)}) > 0}\; g_k(\exp(x^{(k)}))\, \frac{\partial f^{(j)}}{\partial \sigma_{kl}}(x^{(j+1)}, x^{(j)}, \sigma(T_j)) \prod_{i=1,\, i \ne j+1}^{k} f^{(i-1)}(x^{(i)}, x^{(i-1)}, \sigma(T_{i-1}))\, dx^{(k)} \cdots dx^{(1)}
\]
\[
= \int_{\mathbb{R}^m} \cdots \int_{\mathbb{R}^{m-k+1}} 1_{H_{k-2}(x^{(1)}, x^{(2)}, \ldots, x^{(k-1)}, x^{(k)}) > 0}\; g_k(\exp(x^{(k)}))\, \frac{\partial \log(f^{(j)})}{\partial \sigma_{kl}}(x^{(j+1)}, x^{(j)}, \sigma(T_j)) \Big(\prod_{i=1}^{k} f^{(i-1)}(x^{(i)}, x^{(i-1)}, \sigma(T_{i-1}))\Big)\, dx^{(k)} \cdots dx^{(1)}
\]
\[
= E^m\Big[1_{\tau > k}\, g_k(L(T_{k-1}))\, \frac{\partial \log(f^{(j)})}{\partial \sigma_{kl}}(L(T_j), L(T_{j-1}), \sigma(T_j)) \,\Big|\, \mathcal{F}(0)\Big], \tag{3.36}
\]

and for $k \le j$ we have:

\[
\frac{\partial \hat V_k(0, L(0))}{\partial \sigma_{kl}(T_j)} = 0.
\]

Taking all terms together and multiplying by $P_{T_m}(0)$ gives the vega:

\[
\frac{\partial \hat V}{\partial \sigma_{kl}(T_j)}(0, L(0)) = P_{T_m}(0)\, E^m\Big[\sum_{i=j+1}^{\tau} g_i(\hat L^E(T_{i-1}))\, \frac{\partial \log(f^{(j)})}{\partial \sigma_{kl}}\big(\log(L(T_j)), L(T_{j-1}), \sigma(T_j)\big)\Big]. \tag{3.37}
\]

Vega predictor-corrector modification

Suppose we have $L^{PC}(T_i)$ and that we would like to simulate $L^{PC}(T_{i+1})$. The first part of the PC scheme is a predictor step with the log-Euler discretization. Define $L^E(T_{i+1})$ as:

\[
L^E_k(T_{i+1}) = L^{PC}_k(T_i) \exp\Big(\Big[-\tfrac{1}{2}\sigma_k(T_{i+1})^2 + \mu_k(L^{PC}(T_i))\Big] \delta_{i+1} + \sqrt{\delta_{i+1}}\, \sigma_k W(T_i)\Big).
\]

In section 3.3.5, we showed that:

\[
\log(L^{PC}_k(T_{i+1})) = \log(L^E_k(T_{i+1})) + \frac{\delta_{i+1}}{2}\big(\mu_k(L^E(T_{i+1})) - \mu_k(L^{PC}(T_i))\big).
\]

Therefore we can write the vector of PC estimates as a function of the logarithm of the log-Euler estimate $L^E(T_{i+1})$, the estimated rates $L^{PC}(T_i)$ at $T_i$ and the volatilities $\sigma(T_{i+1})$. The PC estimate depends on the volatilities since the drift terms depend on them.
We denote the function that gives the PC estimates by $q^{(i)}$:

\[
q^{(i)}\big(\log(L^E(T_{i+1})), L^{PC}(T_i), \sigma(T_{i+1})\big) = \log(L^{PC}(T_{i+1})),
\]

with $k$-th element

\[
q^{(i)}_k = \log(L^E_k(T_{i+1})) + \frac{\delta_{i+1}}{2}\big(\mu_k(L^E(T_{i+1})) - \mu_k(L^{PC}(T_i))\big).
\]

We rewrite the option price as an integral by using the probability density function of the

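log-Euler scheme, the function $q$ and the tower property:

\[
\hat V_k(0, L(0)) = E^m\big[1_{\tau > k}\, g_k(L^{PC}(T_{k-1})) \,\big|\, L(0)\big]
\]
\[
= \int_{\mathbb{R}^m} E^m\Big[1_{\tau > k}\, g_k(L^{PC}(T_{k-1})) \,\Big|\, L^{PC}(T_0) = \exp\big(q^{(0)}(x^{(1)}, L(0), \sigma(T_0))\big)\Big]\, f^{(0)}(x^{(1)}, L(0), \sigma(T_0))\, dx^{(1)}
\]
\[
= \int_{\mathbb{R}^m} \int_{\mathbb{R}^{m-1}} E^m\Big[1_{\tau > k}\, g_k(L^{PC}(T_{k-1})) \,\Big|\, L^{PC}(T_1) = \exp\big(q^{(1)}(x^{(2)}, \exp(q^{(0)}(x^{(1)}, L(0), \sigma(T_0))), \sigma(T_1))\big)\Big]\, f^{(1)}\big(x^{(2)}, \exp(q^{(0)}(x^{(1)}, L(0), \sigma(T_0))), \sigma(T_1)\big)\, f^{(0)}(x^{(1)}, L(0), \sigma(T_0))\, dx^{(2)}\, dx^{(1)}
\]
\[
= \int_{\mathbb{R}^m} \int_{\mathbb{R}^{m-1}} \cdots \int_{\mathbb{R}^{m-k+1}} 1_{H_{k-2}\big(q^{(0)}(x^{(1)}, L(0), \sigma(T_0)), \ldots, q^{(k-1)}(x^{(k)}, q^{(k-2)}(x^{(k-1)}, \exp(q^{(k-3)}(\ldots)), \sigma(T_{k-2})), \sigma(T_{k-1}))\big) > 0}
\]
\[
\quad g_k\Big(\exp\big(q^{(k-1)}(x^{(k)}, q^{(k-2)}(x^{(k-1)}, \exp(q^{(k-3)}(\ldots)), \sigma(T_{k-2})), \sigma(T_{k-1}))\big)\Big) \Big(\prod_{i=2}^{k} f^{(i-1)}\big(x^{(i)}, \exp(q^{(i-2)}(x^{(i-1)}, \exp(q^{(i-3)}(\ldots)), \sigma(T_{i-2}))), \sigma(T_{i-1})\big)\Big)\, f^{(0)}(x^{(1)}, L(0), \sigma(T_0))\, dx^{(k)} \cdots dx^{(2)}\, dx^{(1)}. \tag{3.38}
\]

We perform the following change of variables:

\[
\tilde x^{(1)} = q^{(0)}(x^{(1)}, L(0), \sigma(T_0)),
\]
\[
\tilde x^{(2)} = q^{(1)}\big(x^{(2)}, \exp(q^{(0)}(x^{(1)}, L(0), \sigma(T_0))), \sigma(T_1)\big),
\]
\[
\vdots \tag{3.39}
\]
\[
\tilde x^{(k)} = q^{(k-1)}\big(x^{(k)}, \exp(q^{(k-2)}(x^{(k-1)}, q(\ldots), \sigma(T_{k-2}))), \sigma(T_{k-1})\big).
\]

Therefore we also need the inverse of $q^{(j-1)}$ in its first argument. We denote this inverse by $q^{(j-1),-1}$. The inverse function value has to be calculated for every element separately (the $k$-th element of the vector function $q^{(j-1),-1}$ or $q^{(j-1)}$ is denoted by $q^{(j-1),-1}_k$ or $q^{(j-1)}_k$), starting from the last element:

\[
x^{(j)}_m = q^{(j-1),-1}_m\big(\tilde x^{(j)}, \exp(\tilde x^{(j-1)}), \sigma(T_{j-1})\big) = \tilde x^{(j)}_m,
\]
\[
x^{(j)}_k = q^{(j-1),-1}_k\big(\tilde x^{(j)}, \exp(\tilde x^{(j-1)}), \sigma(T_{j-1})\big) \tag{3.40}
\]
\[
= \tilde x^{(j)}_k - \frac{\delta_{j-1}}{2} \sum_{n=k+1}^{m} \sigma_k \sigma_n \delta_n \left(\frac{\exp(\tilde x^{(j-1)}_n)}{1 + \delta_n \exp(\tilde x^{(j-1)}_n)} - \frac{\exp\big(q^{(j-1),-1}_n(\tilde x^{(j)}, \exp(\tilde x^{(j-1)}), \sigma(T_{j-1}))\big)}{1 + \delta_n \exp\big(q^{(j-1),-1}_n(\tilde x^{(j)}, \exp(\tilde x^{(j-1)}), \sigma(T_{j-1}))\big)}\right)
\]
\[
= q^{(j-1),-1}_{k+1}\big(\tilde x^{(j)}, \exp(\tilde x^{(j-1)}), \sigma(T_{j-1})\big) + \tilde x^{(j)}_k - \frac{\delta_{j-1}}{2} \sigma_k \sigma_{k+1} \delta_{k+1} \left(\frac{\exp(\tilde x^{(j-1)}_{k+1})}{1 + \delta_{k+1} \exp(\tilde x^{(j-1)}_{k+1})} - \frac{\exp\big(q^{(j-1),-1}_{k+1}(\tilde x^{(j)}, \exp(\tilde x^{(j-1)}), \sigma(T_{j-1}))\big)}{1 + \delta_{k+1} \exp\big(q^{(j-1),-1}_{k+1}(\tilde x^{(j)}, \exp(\tilde x^{(j-1)}), \sigma(T_{j-1}))\big)}\right).
\]

Note that $\frac{\partial q^{(j-1),-1}}{\partial \tilde x}$, the derivative of $q^{(j-1),-1}$ with respect to its first input argument, is an upper triangular matrix with ones on the diagonal. Therefore:

\[
\left|\frac{\partial q^{(j-1),-1}}{\partial \tilde x}\right| = 1 \quad \Rightarrow \quad d\tilde x^{(\alpha)} = dx^{(\alpha)} \quad \forall \alpha.
\]

After the coordinate transformation, equation (3.38) becomes:

\[
\hat V_k(0, L(0)) = \int_{\mathbb{R}^m} \int_{\mathbb{R}^{m-1}} \cdots \int_{\mathbb{R}^{m-k+1}} 1_{H_{k-2}(\tilde x^{(1)}, \tilde x^{(2)}, \ldots, \tilde x^{(k)}) > 0}\; g_k(\exp(\tilde x^{(k)})) \Big(\prod_{i=2}^{k} f^{(i-1)}\big(q^{(i-1),-1}(\tilde x^{(i)}, \exp(\tilde x^{(i-1)}), \sigma(T_{i-1})), \exp(\tilde x^{(i-1)}), \sigma(T_{i-1})\big)\Big)\, f^{(0)}\big(q^{(0),-1}(\tilde x^{(1)}, L(0), \sigma(T_0)), L(0), \sigma(T_0)\big)\, d\tilde x^{(k)} \cdots d\tilde x^{(2)}\, d\tilde x^{(1)}.
\]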

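We would like to differentiate with respect to $\sigma_{pl}(T_j)$. Since only $f^{(j)}\big(q^{(j),-1}(\tilde x^{(j+1)}, \exp(\tilde x^{(j)}), \sigma(T_j)), \exp(\tilde x^{(j)}), \sigma(T_j)\big)$ depends on $\sigma_{pl}(T_j)$, we only have to differentiate this term. This derivative can be written as:

\[
\frac{\partial}{\partial \sigma_{pl}(T_j)} f^{(j)}\big(q^{(j),-1}(\tilde x^{(j+1)}, \exp(\tilde x^{(j)}), \sigma(T_j)), \exp(\tilde x^{(j)}), \sigma(T_j)\big) = \tag{3.41}
\]
\[
\Big(\sum_{i=1}^{m} \frac{\partial q^{(j),-1}_i}{\partial \sigma_{pl}(T_j)}\big(\tilde x^{(j+1)}, \exp(\tilde x^{(j)}), \sigma(T_j)\big)\, \frac{\partial \log(f^{(j)})}{\partial x_i}\big(q^{(j),-1}(\ldots), \exp(\tilde x^{(j)}), \sigma(T_j)\big) + \frac{\partial \log(f^{(j)})}{\partial \sigma_{p,l}}\big(q^{(j),-1}(\ldots), \exp(\tilde x^{(j)}), \sigma(T_j)\big)\Big)\, f^{(j)}\big(q^{(j),-1}(\tilde x^{(j+1)}, \exp(\tilde x^{(j)}), \sigma(T_j)), \exp(\tilde x^{(j)}), \sigma(T_j)\big).
\]

Here $\frac{\partial \log(f^{(j)})}{\partial x_i}$ and $\frac{\partial \log(f^{(j)})}{\partial \sigma_{p,l}}$ are the derivatives of $\log(f^{(j)})$ with respect to, respectively, the $i$-th element of the first input argument and the element on the $p$-th row and $l$-th column of the third input argument of $f^{(j)}$. Using equation (3.41), we can differentiate the transformed expression for $\hat V_k$. This derivative is zero if $j \ge k$ and otherwise we have:

\[
\frac{\partial}{\partial \sigma_{pl}(T_j)} \hat V_k(0, L(0)) = \int_{\mathbb{R}^m} \int_{\mathbb{R}^{m-1}} \cdots \int_{\mathbb{R}^{m-k+1}} 1_{H_{k-2}(\tilde x^{(1)}, \tilde x^{(2)}, \ldots, \tilde x^{(k)}) > 0}\; f^{(0)}\big(q^{(0),-1}(\tilde x^{(1)}, L(0), \sigma(T_0)), L(0), \sigma(T_0)\big)\, g_k(\exp(\tilde x^{(k)}))
\]
\[
\quad \Big(\sum_{i=1}^{m} \frac{\partial q^{(j),-1}_i}{\partial \sigma_{pl}(T_j)}\big(\tilde x^{(j+1)}, \exp(\tilde x^{(j)}), \sigma(T_j)\big)\, \frac{\partial \log(f^{(j)})}{\partial x_i} + \frac{\partial \log(f^{(j)})}{\partial \sigma_{pl}(T_j)}\Big) \Big(\prod_{i=2}^{k} f^{(i-1)}\big(q^{(i-1),-1}(\tilde x^{(i)}, \exp(\tilde x^{(i-1)}), \sigma(T_{i-1})), \exp(\tilde x^{(i-1)}), \sigma(T_{i-1})\big)\Big)\, d\tilde x^{(k)} \cdots d\tilde x^{(2)}\, d\tilde x^{(1)}.
\]

Performing the backward coordinate transformation:

\[
\frac{\partial}{\partial \sigma_{pl}(T_j)} \hat V_k(0, L(0)) = \int_{\mathbb{R}^m} \int_{\mathbb{R}^{m-1}} \cdots \int_{\mathbb{R}^{m-k+1}} 1_{H_{k-2}\big(q^{(0)}(x^{(1)}, L(0), \sigma(T_0)), \ldots, q^{(k-1)}(x^{(k)}, q^{(k-2)}(x^{(k-1)}, \exp(q^{(k-3)}(\ldots)), \sigma(T_{k-2})), \sigma(T_{k-1}))\big) > 0}
\]
\[
\quad g_k\Big(\exp\big(q^{(k-1)}(x^{(k)}, q^{(k-2)}(x^{(k-1)}, \exp(q^{(k-3)}(\ldots)), \sigma(T_{k-2})), \sigma(T_{k-1}))\big)\Big)
\]
\[
\quad \Big(\sum_{i=1}^{m} \frac{\partial q^{(j),-1}_i}{\partial \sigma_{pl}(T_j)}\big(q^{(j)}(x^{(j+1)}, \exp(q^{(j-1)}(\ldots)), \sigma(T_j)), \exp(q^{(j-1)}(x^{(j)}, \exp(q^{(j-2)}(\ldots)), \sigma(T_{j-1}))), \sigma(T_j)\big)\, \frac{\partial \log(f^{(j)})}{\partial x_i} + \frac{\partial \log(f^{(j)})}{\partial \sigma_{pl}(T_j)}\Big)
\]
\[
\quad \Big(\prod_{i=2}^{k} f^{(i-1)}\big(x^{(i)}, \exp(q^{(i-2)}(x^{(i-1)}, \exp(q^{(i-3)}(\ldots)), \sigma(T_{i-2}))), \sigma(T_{i-1})\big)\Big)\, f^{(0)}(x^{(1)}, L(0), \sigma(T_0))\, dx^{(k)} \cdots dx^{(2)}\, dx^{(1)}.
\]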
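Since we need a Monte Carlo estimator, we rewrite this again as an expectation:

\[
\frac{\partial}{\partial \sigma_{pl}(T_j)} \hat V_k(0, L(0)) = E^m\Big[1_{\tau > k}\, g_k(L^{PC}(T_{k-1})) \Big(\sum_{i=1}^{m} \frac{\partial q^{(j),-1}_i}{\partial \sigma_{pl}(T_j)}\big(\log(L^{PC}(T_j)), L^{PC}(T_{j-1}), \sigma(T_j)\big)\, \frac{\partial \log(f^{(j)})}{\partial x_i}\big(\log(L^E(T_j)), L^{PC}(T_{j-1}), \sigma(T_j)\big) + \frac{\partial \log(f^{(j)})}{\partial \sigma_{pl}(T_j)}\big(\log(L^E(T_j)), L^{PC}(T_{j-1}), \sigma(T_j)\big)\Big)\Big].
\]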

Recall that this was the derivative of one term of the sum in equation (3.34). Taking the derivatives of all these terms together:

\[
\frac{\partial \hat V(0, L(0))}{\partial \sigma_{pl}(T_j)} = P_{T_m}(0)\, E^m\Big[\sum_{k=j+1}^{\tau} g_k(L^{PC}(T_{k-1})) \Big(\sum_{i=1}^{m} \frac{\partial q^{(j),-1}_i}{\partial \sigma_{pl}(T_j)}\big(\log(L^{PC}(T_j)), L^{PC}(T_{j-1}), \sigma(T_j)\big)\, \frac{\partial \log(f^{(j)})}{\partial x_i}\big(\log(L^E(T_j)), L^{PC}(T_{j-1}), \sigma(T_j)\big) + \frac{\partial \log(f^{(j)})}{\partial \sigma_{pl}(T_j)}\big(\log(L^E(T_j)), L^{PC}(T_{j-1}), \sigma(T_j)\big)\Big)\Big],
\]

where the derivative of $q^{(j),-1}_m$ is given by:

\[
\frac{\partial q^{(j),-1}_m}{\partial \sigma_{pl}(T_j)}\big(\log(L^{PC}(T_j)), L^{PC}(T_{j-1}), \sigma(T_j)\big) = 0,
\]

and the derivative of $q^{(j),-1}_i$ for $i < m$ is:

\[
\frac{\partial q^{(j),-1}_i}{\partial \sigma_{pl}(T_j)}\big(\log(L^{PC}(T_j)), L^{PC}(T_{j-1}), \sigma(T_j)\big) = \frac{\partial q^{(j),-1}_{i+1}}{\partial \sigma_{pl}(T_j)}\big(\log(L^{PC}(T_j)), L^{PC}(T_{j-1}), \sigma(T_j)\big)
\]
\[
\quad - \frac{\delta_{j-1}}{2}\, \delta_{i+1} \Big[\big(1_{p=i}\, \sigma_{i+1,l} + 1_{p=i+1}\, \sigma_{i,l}\big) \Big(\frac{L^{PC}_{i+1}(T_{j-1})}{1 + \delta_{i+1} L^{PC}_{i+1}(T_{j-1})} - \frac{L^E_{i+1}(T_j)}{1 + \delta_{i+1} L^E_{i+1}(T_j)}\Big) - \frac{\partial q^{(j),-1}_{i+1}}{\partial \sigma_{pl}(T_j)}\big(\log(L^{PC}(T_j)), L^{PC}(T_{j-1}), \sigma(T_j)\big)\, \frac{\sigma_{i,l}\, \sigma_{i+1,l}\, L^E_{i+1}(T_j)}{\big(1 + \delta_{i+1} L^E_{i+1}(T_j)\big)^2}\Big].
\]

Analytic differentiations

Some analytical derivatives are needed to apply the LR method. Consider the case in which we model the logarithm of the Libor rates as multivariate normally distributed random variables with mean vector $u(L(T_{j-1}))$ and covariance matrix $\Sigma = \sigma(T_j)\sigma(T_j)^{\top}$. Since the covariance matrix does not depend on the state of the forward Libor rates, the joint probability density is given by:

\[
\log\big(f^{(j)}(X, L(T_{j-1}), \sigma(T_j))\big) = -\tfrac{1}{2} \log|\Sigma| - \tfrac{1}{2} \big(X - u(L(T_{j-1}))\big)^{\top} \Sigma^{-1} \big(X - u(L(T_{j-1}))\big) - \tfrac{1}{2} m \log(2\pi).
\]

We need the derivatives $\frac{\partial \log(f^{(j)})}{\partial u}$ (in the case of the delta, vega and gamma), $\frac{\partial \log(f^{(j)})}{\partial \Sigma}$ (in the case of the vega) and $\frac{\partial^2 \log(f^{(j)})}{\partial u_i \partial u_j}$ for $i, j \in \{0, 1, \ldots, m\}$ (in the case of the gamma). The first two are given in [6]:

\[
\frac{\partial \log(f^{(j)})}{\partial u}(X, L(T_{j-1}), \sigma(T_j)) = \Sigma^{-1}(X - u), \tag{3.42}
\]
\[
\frac{\partial \log(f^{(j)})}{\partial \Sigma}(X, L(T_{j-1}), \sigma(T_j)) = -\tfrac{1}{2} \Sigma^{-1} + \tfrac{1}{2} \Sigma^{-1} \big(X - u(L(T_{j-1}))\big)\big(X - u(L(T_{j-1}))\big)^{\top} \Sigma^{-1}. \tag{3.43}
\]

To estimate the gamma, second order derivatives with respect to the mean vector are also needed.
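Since all mixed partial derivatives are continuous, the order of differentiation does not matter. We have:

\[
\frac{\partial^2 \log(f^{(j)})}{\partial u_i \partial u_j} = \frac{\partial^2 \log(f^{(j)})}{\partial u_j \partial u_i} = \frac{\partial}{\partial u_j}\Big(\frac{\partial \log(f^{(j)})}{\partial u_i}\Big) = \frac{\partial}{\partial u_j}\Big(\sum_{k=1}^{m} \Sigma^{-1}_{ik}(X_k - u_k)\Big),
\]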

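and hence:

\[
\Big(\frac{\partial^2 \log(f^{(j)})}{\partial u^2}\Big)_{ij} = -\Sigma^{-1}_{ij} \quad \Rightarrow \quad \frac{\partial^2 \log(f^{(j)})}{\partial u^2} = -\Sigma^{-1}. \tag{3.44}
\]

The second order derivative of $\log(f)$ with respect to the vector $L(0)$ can be calculated by using the fact that only the mean vector depends on $L(0)$. The chain rule gives:

\[
\frac{\partial^2 \log(f^{(j)})}{\partial L_i(0) \partial L_j(0)} = \sum_{k=1}^{j} \sum_{l=1}^{i} \frac{\partial^2 \log(f^{(j)})}{\partial u_l \partial u_k}\, \frac{\partial u_l}{\partial L_i(t_{n-1})}\, \frac{\partial u_k}{\partial L_j(t_{n-1})} + \sum_{k=1}^{\min(i,j)} \frac{\partial \log(f^{(j)})}{\partial u_k}\, \frac{\partial^2 u_k}{\partial L_i(t_{n-1}) \partial L_j(t_{n-1})}.
\]

The second order derivatives with respect to the initial bonds can be obtained by using the chain rule and the fact that the initial Libor rates can be written as a function of the initial bonds.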
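The closed-form score with respect to the mean vector can be verified against a finite-difference derivative of the log-density. The sketch below is an illustration with assumed inputs: it is restricted to two dimensions so that the covariance matrix can be inverted by hand, and the function names are hypothetical.

```python
import math


def log_density(x, u, cov):
    """Log-density of a bivariate normal with mean u and 2x2 covariance cov."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    r = [x[0] - u[0], x[1] - u[1]]
    quad = sum(r[i] * inv[i][j] * r[j] for i in range(2) for j in range(2))
    # -1/2 log|cov| - 1/2 r' inv r - (m/2) log(2 pi) with m = 2
    return -0.5 * math.log(det) - 0.5 * quad - math.log(2.0 * math.pi)


def score_mean(x, u, cov):
    """Closed-form gradient w.r.t. the mean: inverse(cov) times (x - u)."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    r = [x[0] - u[0], x[1] - u[1]]
    return [(d * r[0] - b * r[1]) / det, (-c * r[0] + a * r[1]) / det]


def score_mean_fd(x, u, cov, bump=1e-6):
    """Central finite-difference gradient of the log-density w.r.t. the mean."""
    grad = []
    for i in range(2):
        up = list(u)
        down = list(u)
        up[i] += bump
        down[i] -= bump
        grad.append((log_density(x, up, cov) - log_density(x, down, cov)) / (2.0 * bump))
    return grad


if __name__ == "__main__":
    cov = [[0.04, 0.01], [0.01, 0.09]]
    print(score_mean([0.1, -0.2], [0.0, 0.05], cov))
    print(score_mean_fd([0.1, -0.2], [0.0, 0.05], cov))
```

The same finite-difference check can be applied to the derivative with respect to the covariance matrix before it is used in a vega estimator.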

Pathwise sensitivity method

The LR method was based on an interchange of the integral and the derivative operator; the pathwise sensitivity (PS) method is based on an interchange of the expectation and the derivative operator. As a consequence we have to differentiate random variables. To do this, we introduce the so-called pathwise derivatives and consider the random variables as functions of the Brownian motion and the input parameters. The main disadvantage of this method arises when the function $g$ is discontinuous: in that case we cannot differentiate within the expectation.

The method

In the PS method the aim is to calculate the sensitivity of a discounted payoff $g(L(T))$ with respect to a number of parameters $\theta_1, \ldots, \theta_m$, given a stochastic model of the form:

\[
dL(t) = \alpha(L(t), t, \theta)\,dt + \vartheta(L(t), \sigma, t)\,dW(t), \quad 0 \le t \le T. \tag{3.45}
\]

Here $L(t) \in \mathbb{R}^n$, $W(t) \in \mathbb{R}^d$, $\theta \in \mathbb{R}^m$ with $n, d, m \in \mathbb{N}_{\ge 1}$. For now we do not distinguish between deltas and vegas, therefore $\theta$ can be the vector of initial values $L(0)$ as well as a vector of model parameter values $\sigma$. Under certain conditions (see section 2.6.1) on the payoff function and the stochastic process $L(t)$, we have:

\[
\frac{\partial}{\partial \theta_j} E^m[g(L(T))] = E^m\Big[\frac{\partial}{\partial \theta_j} g(L(T))\Big]. \tag{3.46}
\]

Before we continue, we introduce the pathwise derivative:

Definition 19. Pathwise derivative. Consider the stochastic process $L(t)$ which is driven by a Brownian motion $W(t)$. Suppose that $L(t)$ is simulated until an arbitrary time $t_k$. If we consider $W(t)$, for $0 \le t \le t_k$, as constant input, then we can think of $L(t_k)$ as a function of $L(0)$, $t_k$ and $\theta$. The derivative of this function with respect to any of its parameters is called a pathwise derivative and is denoted as:

\[
\frac{\partial L(t_k)}{\partial L(0)}, \quad \frac{\partial L(t_k)}{\partial t_k} \quad \text{and} \quad \frac{\partial L(t_k)}{\partial \theta}.
\]

Then we have a matrix of pathwise derivatives:

\[
\frac{\partial L(t_k)}{\partial L(0)} =
\begin{pmatrix}
\frac{\partial L_1(t_k)}{\partial L_1(0)} & \cdots & \frac{\partial L_1(t_k)}{\partial L_m(0)} \\
\vdots & \ddots & \vdots \\
\frac{\partial L_m(t_k)}{\partial L_1(0)} & \cdots & \frac{\partial L_m(t_k)}{\partial L_m(0)}
\end{pmatrix}.
\]

With the notion that $L(T)$ can be seen as a function, we can apply the chain rule for differentiation to equation (3.46), which gives:

\[
\frac{\partial}{\partial \theta_j} E^m[g(L(T))] = E^m\Big[\sum_{i=1}^{m} \frac{\partial g}{\partial L_i}(L(T))\, \frac{\partial L_i(T)}{\partial \theta_j}\Big]. \tag{3.47}
\]

Now we wish to use a Monte Carlo method to make an estimate. So we have to generate approximating values for:

\[
\sum_{i=1}^{m} \frac{\partial g}{\partial L_i}(L(T))\, \frac{\partial L_i(T)}{\partial \theta_j}. \tag{3.48}
\]

Since the solution of the stochastic differential equation in (3.45) is usually not known, we have to simulate the forward Libor rates with schemes like the Euler or PC scheme. First we

discretize the time interval into a grid consisting of a finite number of time points: $0 = t_0, t_1, \ldots, t_N = T$. An estimate for $L(t_{k+1})$ will be denoted by $\hat L(t_{k+1})$. Treating the Brownian motion as constant, this estimate can be seen as a function of the vector of previous forward Libor rate values $\hat L(t_n)$, the volatility parameters $\sigma(t_n)$ and the time point $t_n$:

\[
\hat L_i(t_{n+1}) = F^{(i)}(t_n, \hat L(t_n), \sigma(t_n)). \tag{3.49}
\]

The derivative of the $i$-th element of the vector $L(T)$ with respect to the $j$-th parameter value, denoted by $\frac{\partial L_i(T)}{\partial \theta_j}$, has to be estimated numerically for all $i, j \in \{1, \ldots, m\}$. This is done by differentiating the scheme function given in (3.49). We are then able to estimate the value of equation (3.48):

\[
\sum_{i=1}^{m} \frac{\partial g}{\partial L_i}(L(T))\, \frac{\partial L_i(T)}{\partial \theta_j} \approx \sum_{i=1}^{m} \frac{\partial g}{\partial L_i}(\hat L(T))\, \frac{\partial \hat L_i(T)}{\partial \theta_j}. \tag{3.50}
\]

Assume that the derivative $\frac{\partial g}{\partial L}$ is known, then we only have to consider the computation of $\frac{\partial \hat L_i(T)}{\partial \theta_j}$. To perform this computation, we use the scheme function as given in equation (3.49). We denote the derivative of $F$ with respect to its second input argument by $\frac{\partial F}{\partial L}$ and the derivative with respect to its third input argument by $\frac{\partial F}{\partial \sigma}$. Now we distinguish between deltas and vegas, so instead of $\theta$ we use $L(0)$ in the delta case and $\sigma_j$ in the vega case. If $t_n > 0$, $\hat L(t_n)$ is in fact a function of $L(0)$ and $\sigma$, so we have to use the multidimensional chain rule:

Delta case:
\[
\frac{\partial \hat L_i(t_{n+1})}{\partial L_j(0)} = \frac{\partial F^{(i)}}{\partial L_j(0)}(t_n, \hat L(t_n), \sigma) = \sum_{k=1}^{m} \frac{\partial F^{(i)}}{\partial L_k}(t_n, \hat L(t_n), \sigma)\, \frac{\partial \hat L_k(t_n)}{\partial L_j(0)}. \tag{3.51}
\]

Vega case:
\[
\frac{\partial \hat L_i(t_{n+1})}{\partial \sigma_j} = \frac{\partial F^{(i)}}{\partial \sigma_j}(t_n, \hat L(t_n), \sigma) = \sum_{k=1}^{m} \frac{\partial F^{(i)}}{\partial L_k}(t_n, \hat L(t_n), \sigma)\, \frac{\partial \hat L_k(t_n)}{\partial \sigma_j} + \frac{\partial F^{(i)}}{\partial \sigma_j}(t_n, \hat L(t_n), \sigma). \tag{3.52}
\]

Define the following matrices:

\[
\Theta(t_n) = \frac{\partial \hat L(t_n)}{\partial \sigma} \quad \text{and} \quad \Delta(t_n) = \frac{\partial \hat L(t_n)}{\partial L(0)}. \tag{3.53}
\]

The elements of these matrices are defined by $\Theta_{ij}(t_n) = \frac{\partial \hat L_i(t_n)}{\partial \sigma_j}$ and $\Delta_{ij}(t_n) = \frac{\partial \hat L_i(t_n)}{\partial L_j(0)}$.
Then equations (3.51) and (3.52) lead to the following matrix-matrix recursions:

\[
\Delta(t_{n+1}) = D(t_n)\Delta(t_n) \quad \text{and} \quad \Theta(t_{n+1}) = D(t_n)\Theta(t_n) + B(t_n), \tag{3.54}
\]

where $D$ and $B$ are matrices given by:

\[
D_{ik}(t_n) = \frac{\partial F^{(i)}}{\partial L_k}(t_n, \hat L(t_n), \sigma) \quad \text{and} \quad B_{ik}(t_n) = \frac{\partial F^{(i)}}{\partial \sigma_k}(t_n, \hat L(t_n), \sigma). \tag{3.55}
\]

For the vega case, if $\sigma$ is a vector of model parameters, we have $\Theta(0) = \frac{\partial \hat L(0)}{\partial \sigma} = 0$, since the initial values are constant and hence do not depend on the model parameters. For deltas, $\Delta(0)$ is given by the identity matrix $I$. Finally we wish to calculate the vector-matrix product $\frac{\partial g}{\partial L}(\hat L(t_N))\Theta(t_N)$ or $\frac{\partial g}{\partial L}(\hat L(t_N))\Delta(t_N)$ by one of the recursions in equation (3.54). The total expression for the calculation of the delta is given by:

\[
\frac{\partial g}{\partial L}(\hat L(t_N))\Delta(t_N) = \frac{\partial g}{\partial L}(\hat L(t_N))\big(D(t_{N-1})D(t_{N-2}) \cdots D(0)\big). \tag{3.56}
\]

Figure 3.2 shows how this delta can be calculated in a forward algorithm.
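In one dimension the delta recursion reduces to a running product of scalar factors, which makes the pathwise estimator easy to test. The sketch below is an illustration, not the thesis' multi-rate implementation: it assumes a single driftless lognormal rate simulated with a log-Euler scheme, an undiscounted payoff $\max(L(T) - K, 0)$, and hypothetical function names. In this setting $D(t_n) = \hat L(t_{n+1})/\hat L(t_n)$, and the result can be compared with the closed-form value $\Phi(d_1)$.

```python
import math
import random


def pathwise_delta_call(l0, strike, sigma, maturity, n_steps, n_paths, seed=0):
    """Pathwise delta of max(L(T) - K, 0) for one driftless lognormal rate.

    Toy assumptions: single rate, log-Euler steps, no discounting.
    """
    rng = random.Random(seed)
    h = maturity / n_steps
    total = 0.0
    for _ in range(n_paths):
        rate = l0
        delta = 1.0  # Delta(0) is the identity
        for _ in range(n_steps):
            z = rng.gauss(0.0, 1.0)
            nxt = rate * math.exp(-0.5 * sigma ** 2 * h + sigma * math.sqrt(h) * z)
            delta = (nxt / rate) * delta  # Delta(t_{n+1}) = D(t_n) Delta(t_n)
            rate = nxt
        payoff_derivative = 1.0 if rate > strike else 0.0
        total += payoff_derivative * delta
    return total / n_paths


def exact_delta_call(l0, strike, sigma, maturity):
    """Closed-form delta Phi(d1) of the undiscounted lognormal call payoff."""
    vol = sigma * math.sqrt(maturity)
    d1 = (math.log(l0 / strike) + 0.5 * vol ** 2) / vol
    return 0.5 * (1.0 + math.erf(d1 / math.sqrt(2.0)))


if __name__ == "__main__":
    print(pathwise_delta_call(0.05, 0.05, 0.2, 1.0, 4, 50_000))
    print(exact_delta_call(0.05, 0.05, 0.2, 1.0))
```

The payoff derivative is an indicator here, which is well defined because the call payoff is continuous. For a digital payoff the same code would return zero, which illustrates the limitation of the method for discontinuous payoffs.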

Figure 3.2: Data stream corresponding to the forward calculation of $\frac{\partial g}{\partial L}\Delta(t_N)$ from equation (3.56). The product $D(t_k)D(t_{k-1}) \cdots D(0)$ is denoted by $\hat C(t_k)$ and $\frac{\partial g}{\partial L}(\hat L(t_N))$ is denoted by $\dot g$. After the Libor rates are simulated from $t_k$ to $t_{k+1}$, the random variables $D(t_k)$ and $L(t_{k+1})$ are known (for example, for $t = 0$ this is indicated with an arrow from $L(0)$ to $L(t_1)$ and $D(t_0)$). From $D(t_k)$ and $\hat C(t_{k-1})$ we can calculate $\hat C(t_k)$. From $L(t_{k+1})$ the scheme is evolved further. In the end we know the underlying value at maturity, and the payoff derivative $\dot g$ can be calculated. Multiplying $\dot g$ by $\hat C(t_{N-1})$ gives the deltas.

The total expression for the calculation of the vega is given by:

\[
\frac{\partial g}{\partial L}(\hat L(t_N))\Theta(t_N) = \frac{\partial g}{\partial L}(\hat L(t_N))\big(B(t_{N-1}) + D(t_{N-1})B(t_{N-2}) + D(t_{N-1})D(t_{N-2})B(t_{N-3}) + \ldots\big)
= \frac{\partial g}{\partial L}(\hat L(t_N)) \sum_{i=1}^{N} \Big(\prod_{j=1}^{i-1} D(t_{N-j})\Big) B(t_{N-i}). \tag{3.57}
\]

Again we give a data stream corresponding to the forward method in Figure 3.3.
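The equivalence between the forward recursions and the closed-form products can be checked on small matrices. The following sketch is illustrative only: the matrices are arbitrary test inputs rather than model output, and the helper names are hypothetical.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def matadd(a, b):
    """Elementwise sum of two matrices of equal shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def forward_sensitivities(d_list, b_list):
    """Run Delta <- D Delta and Theta <- D Theta + B over all time steps.

    d_list[n] and b_list[n] play the roles of D(t_n) and B(t_n).
    """
    n = len(d_list[0])
    delta = identity(n)                                   # Delta(0) = I
    theta = [[0.0] * len(b_list[0][0]) for _ in range(n)]  # Theta(0) = 0
    for d, b in zip(d_list, b_list):
        delta = matmul(d, delta)
        theta = matadd(matmul(d, theta), b)
    return delta, theta


if __name__ == "__main__":
    d = [[[1.0, 2.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 3.0]], [[1.0, 1.0], [0.0, 1.0]]]
    b = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]]]
    print(forward_sensitivities(d, b))
```

With three steps the recursion should return $D(t_2)D(t_1)D(t_0)$ for the delta matrix and $B(t_2) + D(t_2)B(t_1) + D(t_2)D(t_1)B(t_0)$ for the vega matrix.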

Figure 3.3: Data stream corresponding to the forward calculation of the right-hand side of equation (3.57). The product $D(t_k)D(t_{k-1}) \cdots D(0)$ is denoted by $\hat C(t_k)$, $\frac{\partial g}{\partial L}(\hat L(t_N))$ is denoted by $\dot g$ and $A = \sum_{i=1}^{N} \big(\prod_{j=1}^{i-1} D(t_{N-j})\big) B(t_{N-i})$. In comparison with Figure 3.2, we also have to calculate $B(t_k)$. We need all $B(t_k)$ and $\hat C(t_k)$ to calculate the matrix $A$. Multiplying $\dot g$ by $A$ gives the sensitivities.

Bermudan options and the control variate

The PS method can also be used for Bermudan-type options. Under some conditions, we may still interchange the expectation and differentiation. This means that the Greeks of the cancellable options can be calculated in a similar way as the Greeks of the European options with the PS method. The difference is that we need the exercise regions and that we have to deal with a sum of different payoffs at different tenor times. The price of a Bermudan option was given in equation (2.35) by:

\[
V(0, L(0)) = P_{T_m}(0)\, E^m\Big[\sum_{k=1}^{\tau} g_k(L(T_{k-1}))\Big]. \tag{3.58}
\]

We consider the problem of estimating:

\[
\frac{\partial}{\partial \theta} E^m\Big[\sum_{k=1}^{\tau} g_k(L(T_{k-1}))\Big]. \tag{3.59}
\]

If these conditions hold, then it is allowed to interchange the differentiation and expectation and we can apply the PS method:

\[
E^m\Big[\frac{\partial}{\partial \theta} \sum_{k=1}^{\tau} g_k(L(T_{k-1}))\Big] = \sum_{k=1}^{m} E^m\Big[1_{k \le \tau}\, \frac{\partial}{\partial \theta} g_k(L(T_{k-1}))\Big] = \sum_{k=1}^{m} E^m\Big[1_{k \le \tau}\, \frac{\partial g_k}{\partial L(T_{k-1})}\, \frac{\partial L(T_{k-1})}{\partial \theta}\Big]. \tag{3.60}
\]

The random variable $\frac{\partial L(T_{k-1})}{\partial \theta}$ can be written out in exactly the same way as for the European options. The implementation algorithm is somewhat more complicated

since we have multiple payoff dates at different tenor dates. We only give the algorithm for the adjoint version of the pathwise sensitivity method. This algorithm follows just after the section about the adjoint method, but first we discuss the matrices of pathwise derivatives $D(t_i)$ and $B(t_i)$ in more detail.

Application of the control variate

If the control variate is used, we need

\[
\sum_{k=1}^{m} E^m\Big[\big(1_{\tau > \tilde\tau} - 1_{\tau < \tilde\tau}\big)\, 1_{\min(\tau,\tilde\tau) < k \le \max(\tau,\tilde\tau)}\, \frac{\partial g_k}{\partial L(T_{k-1})}\, \frac{\partial L(T_{k-1})}{\partial \theta}\Big] + \sum_{i=1}^{\tilde\tau} \frac{\partial V^i_{eur}}{\partial \theta}(0, L(0)) \tag{3.61}
\]

instead of the right-hand side of equation (3.60). That is, the sensitivities of the sum of European options are added to the pathwise estimator (as is done in equation (3.5)).

Pathwise derivatives for the Euler and PC discretization scheme

The calculation of the matrices $D(t_i)$ and $B(t_i)$ depends on the discretization scheme. In this subsection we consider the PS method with the Euler scheme and with the predictor-corrector scheme for the lognormal Libor market model as given in section 2.4. We use the superscript $E$ for the estimates using the Euler discretization and the superscript $pc$ for the predictor-corrector estimates.

Euler scheme

First we consider the matrices $D(t_k)$ under the lognormal forward Libor market model, where we use the Euler discretization of the logarithm under the $T_m$ forward measure. The Euler scheme, with step size $h_k$ and standard normal random variables $W(t_k)$, was given in equation (2.25):

\[
L^E_i(t_{k+1}) = L^E_i(t_k) \exp\Big(\Big[-\tfrac{1}{2}\sigma_i(t_k)^2 + \mu_i(L^E(t_k), \sigma(t_k))\Big] h_k + \sigma_i(t_k) W(t_k) \sqrt{h_k}\Big). \tag{3.62}
\]

Now we consider the matrix of derivatives. We need the derivative of the vector function $\mu$ with respect to its first input argument (the forward Libor rates) and denote this function by $\frac{\partial \mu}{\partial L}$. The element on the $i$-th row and $j$-th column is given by:

\[
\frac{\partial \mu_i}{\partial L_j}(L^E(t_k), \sigma(t_k)) = -\frac{\sigma_i(t_k)\sigma_j(t_k)\delta_j}{(1 + \delta_j L^E_j(t_k))^2} \text{ if } j > i \quad \text{and} \quad \frac{\partial \mu_i}{\partial L_j}(L^E(t_k), \sigma(t_k)) = 0 \text{ if } j \le i.
\]

Differentiating (3.62) with respect to $L_i(t_k)$ gives:

\[
D^E_{ii}(t_k) = \exp\Big(\Big[-\tfrac{1}{2}\sigma_i(t_k)^2 + \mu_i(L^E(t_k), \sigma(t_k))\Big] h_k + \sigma_i(t_k) W(t_k) \sqrt{h_k}\Big). \tag{3.63}
\]

Note, by comparing equations (2.25) and (3.63), that these values can be calculated efficiently by reusing the values of $L_i(t_{k+1})$:

\[
D^E_{ii}(t_k) = \frac{L^E_i(t_{k+1})}{L^E_i(t_k)}.
\]

Using the chain rule for differentiation, the derivatives $D_{ij}(t_k)$ with $j > i$ are given by:

\[
D^E_{ij}(t_k) = \frac{\partial \mu_i}{\partial L_j}(L^E(t_k), \sigma(t_k))\, h_k\, L^E_i(t_{k+1}).
\]

The derivatives $D^E_{ij}(t_k)$ with $j < i$ are zero. Hence, the matrix is an upper triangular matrix.
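The structure of the Euler Jacobian can be checked against finite differences of the scheme function. The sketch below is an illustration under assumptions not taken from the thesis: scalar volatilities per rate, one common Brownian increment `w` for all rates, the terminal-measure drift with a negative sign, and hypothetical function names.

```python
import math


def drift(rates, sigma, delta):
    """Terminal-measure drift: mu_i = -sigma_i * sum_{n>i} delta_n L_n sigma_n / (1 + delta_n L_n)."""
    m = len(rates)
    return [
        -sigma[i] * sum(
            delta[n] * rates[n] * sigma[n] / (1.0 + delta[n] * rates[n])
            for n in range(i + 1, m)
        )
        for i in range(m)
    ]


def euler_step(rates, sigma, delta, h, w):
    """One log-Euler step for all rates, driven by one standard normal draw w."""
    mu = drift(rates, sigma, delta)
    return [
        rates[i] * math.exp((-0.5 * sigma[i] ** 2 + mu[i]) * h + sigma[i] * w * math.sqrt(h))
        for i in range(len(rates))
    ]


def euler_jacobian(rates, sigma, delta, h, w):
    """Matrix D with D[i][j] = d L_i(t_{k+1}) / d L_j(t_k); upper triangular."""
    m = len(rates)
    nxt = euler_step(rates, sigma, delta, h, w)
    d = [[0.0] * m for _ in range(m)]
    for i in range(m):
        d[i][i] = nxt[i] / rates[i]  # reuse the simulated rate
        for j in range(i + 1, m):
            dmu = -sigma[i] * sigma[j] * delta[j] / (1.0 + delta[j] * rates[j]) ** 2
            d[i][j] = dmu * h * nxt[i]
    return d


def fd_jacobian(rates, sigma, delta, h, w, bump=1e-7):
    """Central finite-difference approximation of the same matrix."""
    m = len(rates)
    d = [[0.0] * m for _ in range(m)]
    for j in range(m):
        up = list(rates)
        down = list(rates)
        up[j] += bump
        down[j] -= bump
        f_up = euler_step(up, sigma, delta, h, w)
        f_down = euler_step(down, sigma, delta, h, w)
        for i in range(m):
            d[i][j] = (f_up[i] - f_down[i]) / (2.0 * bump)
    return d
```

A call such as `euler_jacobian([0.03, 0.04, 0.05], [0.2, 0.25, 0.3], [0.5, 0.5, 0.5], 0.5, 0.3)` returns an upper triangular matrix whose diagonal equals the ratio of new to old rates.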

52 Predictor corrector scheme First recall that in the predictor-corrector scheme, which is given in equation (2.27), the forward Libor rates at a next time point are estimated by: L pc i (t k+1) = L pc i ( (t k) exp [ 1 2 σ i ( µi (L pc (t k ), σ(t k )) + µ i (L E (t k+1 ), σ(t k )) )] h k + σ i (t k ) W (t k ) ) h k. 2 (3.64) Where L E (t k+1 ) is simulated from L pc (t k ) by using one Euler step. We wish to calculate the derivative of the scheme function with respect to L pc i (t k): D pc ii (t k+1) = Lpc i (t k+1) L pc i (t k) + h k 2 ( m n=1 ) n (t k+1 ) µ i L pc i (t (L E (t k+1 ), σ(t k )) k) L n L pc i (t k+1), (3.65) L E and the derivative with respect to L pc i (t k) (with j i): ( D pc ij (t k+1) = h m ) k L E n (t k+1 ) µ i 2 L pc n=1 j (t (L E (t k+1 ), σ(t k )) + µ i (L pc (t k ), σ(t k )) L pc i k) L n L (t k+1). n (3.66) The matrix D is an upper triangular matrix Vegas Euler scheme To compute the vega we need the derivatives of the approximated forward Libor rates with respect to volatility parameters: B(t k ) = F σ :,j (t k, L(t k, σ), σ) = F L (t k, L(t k ), σ(t k )) F σ :,j (t k, L(t k 1, σ), σ) + F σ :,j (t k, L(t k ), σ). F The matrices L(t k ) (t k, L(t k ), σ(t k )) were given in the previous subsection. In this subsection we estimate the sensitivities with respect to a column of parameters of the matrix σ(t k ). The j-th column of this matrix is denoted as σ :,j (t k ). The elements of the upper triangular matrix F σ :,j (t k, L(t k ), σ(t k )) are given by: Diagonal elements: with F i σ i,j = ( σ i,j h k µ i σ i,j (L E (t k ), σ(t k ))h k + h k W j (t k ) µ i σ i,j (L E (t k ), σ(t k )) = m n=i+1 Elements in the l-th column and i-th row with l > i: δ n L E n (t k )σ nj 1 + δ n L E n (t k ). ) L E i (t k+1 ), with Otherwise: F i σ l,j = µ i σ l,j (L E (t k ), σ(t k ))h k L E i (t k+1 ), µ i (L E δ l L E l (t k ), σ(t k )) = σ (t k) i,j σ l,j 1 + δ l L E l (t k). F i σ l,j = 0 if l < i. 48

53 CHAPTER 3. ESTIMATING FIRST ORDER SENSITIVITIES 49 Predictor corrector scheme For the predictor-corrector scheme the calculation of the derivatives is somewhat more work. We also have to differentiate µ(l E (t k+1 ), σ(t k )). We have to take into account that L E (t k+1 ) itself is a function of σ(t k ) and hence we have to use the multidimensional chain rule. We only give the results: Diagonal elements: F i σ i,j = ( σ i,j h k 1 ( µi 2 σ i,j (t k ) (Lpc (t k ), σ(t k )) + µ i (L E (t k ), σ(t k )) σ i,j m µ i + (L E (t k ), σ(t k )) LE j (t k+1 ) h k + h k W j (t k ) L i (t k+1 ). L j j=1 σ i,j Elements in the l-th column and i-th row with l > i: F i = 1 ( µi (L(t k ), σ(t k )) + µ i (L E (t k ), σ(t k ))+ σ l,j 2 σ l,j σ l,j m µ i (L E (t k ), σ(t k )) LE j (t k+1 ) h k L i (t k+1 ). L j j=1 σ l,j Otherwise: F i σ l,j = 0 if l < i.

3.5 Adjoint implementation

If we use the pathwise sensitivity method, we have to perform many matrix-matrix multiplications to calculate the pathwise derivative $\partial L(T)/\partial L(0)$. At the end we multiply this matrix by the vector $\frac{\partial g}{\partial L}(L(T))$. If we change the order of the computations, the matrix-matrix multiplications can be replaced by vector-matrix multiplications.

Vector-matrix multiplications

The adjoint method decreases the computational cost of calculating vector-matrix products in which the matrix is itself a product of other matrices. Suppose we wish to calculate, for a row vector $x$,

$$q = xA \quad \text{with } A = BC. \tag{3.67}$$

The aim is thus to calculate $xBC$. Define the vector $y$ by $y = xB$ and note that

$$q = xBC = yC.$$

Hence we can also obtain $q$ by computing

$$q = yC \quad \text{with } y = xB. \tag{3.68}$$

In fact we compute $(xB)C$ instead of $x(BC)$. This gives a computational benefit, since in equation (3.68) only two vector-matrix products have to be calculated, which is cheaper than the vector-matrix product and the matrix-matrix product of the original problem.

The method can be used repeatedly for a product of one vector with multiple matrices. Suppose the objective is to calculate the vector-matrix product

$$q = x(A_1 A_2 A_3). \tag{3.69}$$

Then the problem can be formulated as:

$$\text{Calculate } q = xB_1 \text{ with } B_1 = A_1 B_2 \text{ and } B_2 = A_2 A_3. \tag{3.70}$$

This is equivalent to:

$$\text{Calculate } q = yB_2 \text{ with } y = xA_1 \text{ and } B_2 = A_2 A_3. \tag{3.71}$$

And this can be rewritten as:

$$\text{Calculate } q = y_2 A_3 \text{ with } y_2 = y_1 A_2 \text{ and } y_1 = xA_1. \tag{3.72}$$

In other words:

$$\text{Calculate } [(xA_1)A_2]A_3 \text{ instead of } x(A_1 A_2 A_3). \tag{3.73}$$

Hence we end up with three vector-matrix products instead of one vector-matrix product and two matrix-matrix products as in the original problem. In the same way, the cost of calculating $q = xA_1 A_2 \cdots A_n$ can be reduced to the cost of $n$ vector-matrix products.
Applying this idea to a product of the form

$$y(0) = \frac{\partial g}{\partial L}(L(T)) \prod_{j=1}^{n} D(t_{n-j})$$

gives the backward recursion:

$$y(t_k) = y(t_{k+1})\, D(t_k), \quad 0 \le k < n, \qquad y(t_n) = \frac{\partial g}{\partial L}(L(T)).$$
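The backward recursion can be illustrated with a short NumPy sketch. It assumes the payoff gradient is stored as a one-dimensional array `grad` and the pathwise derivative matrices as a list `Ds` with `Ds[k]` $= D(t_k)$. Both function names are hypothetical.

```python
import numpy as np

def forward_delta(grad, Ds):
    # Forward order: build dL(T)/dL(0) = D(t_{n-1}) ... D(t_0) with matrix-matrix products
    J = np.eye(len(grad))
    for D in Ds:
        J = D @ J
    return grad @ J

def adjoint_delta(grad, Ds):
    # Adjoint order: y(t_k) = y(t_{k+1}) D(t_k), using only vector-matrix products
    y = np.array(grad, dtype=float)
    for D in reversed(Ds):
        y = y @ D
    return y
```

Both functions return the same vector. The adjoint version never forms an $m \times m$ product, which is where the saving for a single payoff comes from.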

In this way, the computations of the (forward) pathwise sensitivity method for the delta can be improved by the adjoint method. The data stream corresponding to this method is given in Figure 3.4.

Figure 3.4: Adjoint counterpart of the data stream in Figure 3.2. Now $\hat{c}(t_k)$ is defined by the vector $\frac{\partial g}{\partial L}(\hat{L}(t_n)) D(t_k) D(t_{k-1})$. The difference between Figure 3.2 and this figure is that the arrows at the bottom of the figure point to the left instead of to the right: the Greek calculations are reversed in time. Note that this requires vector-matrix multiplications instead of the matrix-matrix multiplications needed in the forward method.

The adjoint implementation can also improve the pathwise sensitivity method for the computation of the vega.

Figure 3.5: Adjoint version corresponding to the forward version of Figure 3.3. The vectors $\hat{c}(t_k)$ are defined by $\dot{g} D(t_k) D(t_{k-1}) \cdots D(0)$. The difference is that some calculations are reversed so that matrix-matrix multiplications are replaced by vector-matrix multiplications.

Adjoint or forward deltas?

In this section we explain why the adjoint method is not always the best choice. This may be the case if multiple payoff functions are considered. In this case the matrix-matrix products

of the forward method can be reused for every payoff. To illustrate this idea we assume that the matrices and vectors have no special structure, so that the computational costs of a matrix-matrix and a vector-matrix product are

$$C_{mm} = m^3 C_m + (m-1)m^2 C_a, \qquad C_{vm} = m^2 C_m + (m-1)m\, C_a,$$

with:

$C_{mm}$: cost of calculating a matrix-matrix product.
$C_{vm}$: cost of calculating a vector-matrix product.
$C_m$: cost of one scalar multiplication.
$C_a$: cost of one scalar addition.

Note that $C_{mm} = m\, C_{vm}$. Consider the problem with $k$ payoffs and two time steps:

$$\text{Calculate } q_1 = x_1 A, \ldots, q_k = x_k A \text{ with } A = BC. \tag{3.74}$$

These vectors can be calculated with the forward method as well as with the adjoint method. The difference is given in Table 3.1.

| | Forward | Adjoint |
|---|---|---|
| Calculation | $A = D(t_1)D(t_0)$; then $x_1 A, \ldots, x_k A$ | $y_1 = x_1 D(t_1), \ldots, y_k = x_k D(t_1)$; then $q_1 = y_1 D(t_0), \ldots, q_k = y_k D(t_0)$ |
| Costs | $C_{mm} + k\, C_{vm} = (m + k) C_{vm}$ | $2k\, C_{vm}$ |

Table 3.1: Given the matrices of pathwise derivatives $D(t_1) = \partial L(t_2)/\partial L(t_1)$ and $D(t_0) = \partial L(t_1)/\partial L(t_0)$ and the vectors $x_j = \frac{\partial g_j}{\partial L}(L(t_2))$, the table compares the calculations of the forward and the adjoint method.

We see that the computational cost of the adjoint method is lower than the cost of the forward method if $k < m$, the costs are equal if $k = m$, and the forward method should be used if $k > m$. A similar comparison can be made for $n$ time steps; then the objective is to calculate:

$$q_i = x_i (A_1 A_2 \cdots A_n). \tag{3.75}$$

The calculations for both methods are given in Table 3.2. Again we see that the computational cost of the adjoint method is lower than the cost of the forward method if $k < m$ and $n > 1$, the costs are equal if $k = m$ or $n = 1$, and the forward method should be used if $k > m$ and $n > 1$. Hence, if one has to calculate sensitivities of a few payoffs with respect to many parameters, the adjoint method should be used.
But if one has to compute sensitivities of many payoff functions with respect to a few parameters, the forward method should be used.
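The cost comparison for the delta can be written down directly. The sketch below counts costs in units of $C_{vm}$ and uses $C_{mm} = m\, C_{vm}$, which holds for unstructured matrices only. The function names are hypothetical.

```python
def cost_forward(n, m, k):
    # (n - 1) matrix-matrix products plus k vector-matrix products, in units of C_vm
    return (n - 1) * m + k

def cost_adjoint(n, m, k):
    # n vector-matrix products for each of the k payoffs, in units of C_vm
    return n * k

def cheaper_method(n, m, k):
    # Returns which method needs fewer operations for n steps, m rates and k payoffs
    f, a = cost_forward(n, m, k), cost_adjoint(n, m, k)
    if a < f:
        return "adjoint"
    if a > f:
        return "forward"
    return "equal"
```

For example, `cheaper_method(10, 40, 3)` selects the adjoint method and `cheaper_method(10, 40, 80)` selects the forward method, in line with the rule $k < m$ versus $k > m$.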

| | Forward | Adjoint |
|---|---|---|
| Calculation | $A = D(t_{n-1}) D(t_{n-2}) \cdots D(t_0)$; then $x_1 A, \ldots, x_k A$ | $x_1^{(2)} = x_1 D(t_{n-1}), \ldots, x_k^{(2)} = x_k D(t_{n-1})$; $x_1^{(3)} = x_1^{(2)} D(t_{n-2}), \ldots, x_k^{(3)} = x_k^{(2)} D(t_{n-2})$; $\ldots$; $q_1 = x_1^{(n)} D(t_0), \ldots, q_k = x_k^{(n)} D(t_0)$ |
| Costs | $(n-1) C_{mm} + k\, C_{vm} = ((n-1)m + k) C_{vm}$ | $k\, n\, C_{vm}$ |

Table 3.2: Given the matrices of pathwise derivatives $D(t_i) = \partial L(t_{i+1})/\partial L(t_i)$ and the vector $x = \frac{\partial g}{\partial L}(L(t_n))$, the table compares the calculations of the forward and the adjoint method.

Adjoint or forward vegas?

Now suppose we wish to approximate the vegas. We then have to calculate a sum of products:

$$\frac{\partial g}{\partial L}(\hat{L}(T)) \sum_{i=1}^{n} \left( \prod_{j=1}^{i-1} D(t_{n-j}) \right) B(t_{n-i}),$$

with the matrices $B$ and $D$ from equation (3.55). Both the adjoint and the forward method need the matrices $D(t_i)$ and $B(t_i)$. We therefore compare the computational costs of calculating $m$ sensitivities of $k$ payoffs using $n$ time steps when the matrices $D(t_i)$ and $B(t_i)$ are already given for $i = 0, \ldots, n-1$. We thus neglect the cost of building these matrices, since it is the same for both methods.

| | Forward | Adjoint (for $j = 1, \ldots, k$) |
|---|---|---|
| Calculation | $\hat{C}_2 = D(t_{n-1}) D(t_{n-2})$; $\hat{C}_3 = \hat{C}_2 D(t_{n-3})$; $\ldots$; $\hat{C}_{n-1} = \hat{C}_{n-2} D(t_1)$; $A = \sum_{i=1}^{n-1} \hat{C}_i B(t_{n-i-1}) + B(t_{n-1})$ with $\hat{C}_1 = D(t_{n-1})$; then $\frac{\partial g_1}{\partial L}(L(T)) A, \ldots, \frac{\partial g_k}{\partial L}(L(T)) A$ | $\hat{c}^j_1 = \frac{\partial g_j}{\partial L}(L(T)) D(t_{n-1})$; $\hat{c}^j_2 = \hat{c}^j_1 D(t_{n-2})$; $\ldots$; $\hat{c}^j_{n-1} = \hat{c}^j_{n-2} D(t_1)$; then $\frac{\partial g_j}{\partial L}(L(T)) B(t_{n-1}) + \sum_{i=1}^{n-1} \hat{c}^j_i B(t_{n-i-1})$ |
| Costs | $(n-2) C_{mm} + (n-1) C_{vm} + (n-1) C_{ma} + k\, C_{vm}$ | $k \left( (n-1) C_{vm} + n\, C_{vm} + (n-1) C_{va} \right)$ |

Table 3.3: Given the matrices of pathwise derivatives $D(t_i) = \partial L(t_{i+1})/\partial L(t_i)$ and $B(t_i) = \partial L(t_{i+1})/\partial \theta$, the table compares the calculations of the forward and the adjoint method for the vega.

An overview of the computations that have to be made, so that we can compare the computational costs, is given in Table 3.3, with:

$C_{mm}$, $C_{vm}$, $C_m$, $C_a$: as defined before.
$C_{ma}$: cost of an $m \times m$ matrix addition.
$C_{va}$: cost of a $1 \times m$ vector addition.

The values of $C_{mm}$, $C_{vm}$, ... depend on the matrix structures; however, we can already give a qualitative result. If $m$ gets larger, the cost of the forward method grows relative to that of the adjoint method because of the matrix-matrix operations. On the other hand, if $k$ grows by one, the extra cost for the forward method is $C_{vm}$, while the extra cost for the adjoint method is $(2n-1) C_{vm} + (n-1) C_{va} \ge C_{vm}$ (with equality if $n = 1$). So for a large number of payoffs $k$ and a relatively small number of sensitivities $m$, the forward method should be used, and for large $m$ and relatively small $k$ the adjoint method should be used.
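The two orderings for the vega can be compared in a small sketch. It assumes lists `Ds` and `Bs` with `Ds[i]` $= D(t_i)$ and `Bs[i]` $= B(t_i)$, and a list `grads` of payoff gradients stored as one-dimensional arrays. The function names are hypothetical and the loops are written for readability, not for minimal operation count.

```python
import numpy as np

def vega_forward(grads, Ds, Bs):
    # Build A = sum_i (prod_{j<i} D(t_{n-j})) B(t_{n-i}) once, then reuse it for every payoff
    n, m = len(Ds), Ds[0].shape[0]
    C = np.eye(m)
    A = np.zeros(Bs[0].shape)
    for i in range(1, n + 1):
        A += C @ Bs[n - i]
        C = C @ Ds[n - i]
    return [g @ A for g in grads]

def vega_adjoint(grads, Ds, Bs):
    # Propagate each payoff gradient backwards, accumulating vector-matrix products only
    n = len(Ds)
    out = []
    for g in grads:
        c = np.array(g, dtype=float)
        acc = np.zeros(Bs[0].shape[1])
        for i in range(1, n + 1):
            acc += c @ Bs[n - i]
            c = c @ Ds[n - i]
        out.append(acc)
    return out
```

In `vega_forward` the matrix `A` is shared by all payoffs, whereas `vega_adjoint` repeats the whole backward sweep per payoff. This is the trade-off summarised in Table 3.3.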

A visual interpretation in terms of data streams is given in Figure 3.6.

Figure 3.6: Data stream of the forward method applied to two payoffs. The product $D(t_k) D(t_{k-1}) \cdots D(0)$ is denoted by $\hat{C}(t_k)$, $\frac{\partial g}{\partial L}(\hat{L}(t_n))$ is denoted by $\dot{g}$, and $A = \sum_{i=1}^{n} \left( \prod_{j=1}^{i-1} D(t_{n-j}) \right) B(t_{n-i})$. Dotted lines refer to streams that are needed for the second payoff. The figure is almost the same as Figure 3.3; the only difference is that $\dot{g}_2$ is added.

For the forward method we see that the matrix $A$ can be reused for a second payoff. The extra work for this payoff therefore consists only of the evaluation of $\frac{\partial g_2}{\partial L}(\hat{L}(t_n))$ and one vector-matrix product. For the adjoint method things are different. Since the payoff functions are involved in more computations, all these computations have to be repeated for the second payoff function, as illustrated in Figure 3.7.

Figure 3.7: Data stream of the adjoint method applied to two payoffs. The vectors $\hat{c}(t_k)$ are defined by $\dot{g} D(t_k) D(t_{k-1}) \cdots D(0)$. Dotted lines refer to streams that are needed for the second payoff. We see that all vector-matrix multiplications in the third row of Figure 3.5 have to be repeated with $\dot{g}_2$ instead of $\dot{g}_1$.

Adjoint implementation of Bermudan delta

In the case of Bermudan options, we take the discretization points equal to our tenor points. The implementation is as follows:

1 Estimate the exercise regions with Longstaff-Schwartz.
for j = 1 to j = M do
    2 Generate a path of forward Libor rates until it falls into the exercise region. Store the random variables of this path so that we can calculate the payoff and the pathwise derivatives.
    3 $X_j = 0$.
    for k = τ to k = 1 do
        4 Calculate $g_k(L(T_{k-1}))$ and set $X_j = X_j + \frac{\partial g_k}{\partial L}(L(T_{k-1}))$.
        5 Calculate $D(T_{k-2})$, the matrix of pathwise derivatives of $L(T_{k-1})$ with respect to $L(T_{k-2})$.
        6 Set $X_j = X_j D(T_{k-2})$.
    end
end
7 The expectation can be estimated: $\frac{\partial}{\partial \theta} E^m\left[ \sum_{k=1}^{\tau} g_k(L(T_{k-1})) \right] \approx \frac{1}{M} \sum_{j=1}^{M} X_j$.

Algorithm 4: Algorithm to compute the Bermudan delta with the adjoint version of the PS method.
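The inner loop of Algorithm 4 (steps 3 to 6) and the final average (step 7) can be sketched as follows. The sketch assumes that the payoff gradients and derivative matrices of one path are already available as lists `grads` and `Ds`, ordered by increasing $k$, with the first matrix mapping back to the initial rates. Path generation and the Longstaff-Schwartz exercise rule are not shown. The function names are hypothetical.

```python
import numpy as np

def adjoint_path_delta(grads, Ds):
    # grads[k-1]: gradient of g_k at L(T_{k-1}); Ds[k-1]: the matrix D(T_{k-2}) of step 5
    X = np.zeros(len(grads[0]))
    for g, D in zip(reversed(grads), reversed(Ds)):
        X = (X + g) @ D  # step 4 (add the gradient) followed by step 6 (multiply by D)
    return X

def bermudan_delta(paths):
    # paths: iterable of (grads, Ds) pairs, one per simulated path; step 7 is the sample mean
    return np.mean([adjoint_path_delta(g, D) for g, D in paths], axis=0)
```

Each path costs one vector-matrix product per exercise date up to $\tau$, regardless of how many payoff dates contribute.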

3.6 Vibrato Monte Carlo method

A disadvantage of the pathwise sensitivity method is that it cannot handle discontinuous payoffs. One way to use the pathwise sensitivity method for this type of payoff is to combine it with the LR method. The idea, which is presented in [6], is to use the tower law inside the expectation and to replace the payoff function by the expectation of this payoff conditioned on the Libor rates one step before $T$. Instead of the derivatives of $\tilde{g}$ with respect to the Libor rates at the end date, we estimate the derivatives of the conditional expectation with respect to $L(T - v_{\text{step}})$ with the LR method. We can then use the pathwise derivatives $\partial L(T - v_{\text{step}})/\partial L(0)$ for the first part.

3.6.1 Delta

Suppose we wish to approximate the sensitivities of $E^m[\tilde{g}(\hat{L}(t_n)) \mid L(0)]$ for a discontinuous payoff function $\tilde{g}$. We first note that if a discretization scheme like the log-Euler discretization is applied, then $\log(\hat{L}(t_n))$ given $\hat{L}(t_{n-1})$ is a multivariate normally distributed random variable. Recall that, for the log-Euler discretization in section 2.4.4, $\log(\hat{L}(t_n))$ was given by:

$$\log(\hat{L}(t_n)) = \log(\hat{L}(t_{n-1})) + \mu(\hat{L}(t_{n-1}))\, h_{n-1} + \sqrt{h_{n-1}}\, \sigma(t_{n-1}) W(t_{n-1}). \tag{3.76}$$

The multivariate normal distribution is determined by a mean vector $\mu$ and a covariance matrix $\Sigma$:

$$\mu = \log(\hat{L}(t_{n-1})) + \mu(\hat{L}(t_{n-1}))\, h_{n-1}, \qquad \Sigma = h_{n-1}\, \sigma(t_{n-1}) \sigma(t_{n-1})^\top,$$

where $\mu$ on the left-hand side denotes the mean vector and $\mu(\cdot)$ on the right-hand side the drift function. We denote the density of the multivariate normal distribution corresponding to this mean vector and covariance matrix by $f(\hat{L}_{t_n}, \hat{L}(t_{n-1}))$. Since the probability density function is known, the conditional expectation can be written as an integral. We first apply the tower property and then rewrite the inner conditional expectation:

$$\frac{\partial V}{\partial P_{T_j}(0)}(0, L(0)) = \frac{\partial}{\partial P_{T_j}(0)} E^m\left[ \tilde{g}(\hat{L}(t_n)) \mid L(0) \right] = \frac{\partial}{\partial P_{T_j}(0)} E^m\left[ E^m\left[ \tilde{g}(\hat{L}(t_n)) \mid \hat{L}(t_{n-1}) \right] \mid L(0) \right] = \frac{\partial}{\partial P_{T_j}(0)} E^m\left[ \int_{\mathbb{R}^m} \tilde{g}(e^x) f(x, \hat{L}(t_{n-1}))\, dx \mid L(0) \right]. \tag{3.77}$$

[Figure 3.8 shows a time axis $0, t_1, t_2, \ldots, t_{n-1}, t_n$: the part up to $t_{n-1}$ is labelled "Pathwise part" and the last step is labelled "LRM part".]

Figure 3.8: The VMC method uses pathwise derivatives until $t_{n-1}$. This coincides with the pathwise sensitivity method. For the last discretization step the LR method is used to estimate a derivative of the conditional expectation $E^m\left[ \tilde{g}(\hat{L}(t_n)) \mid \hat{L}(t_{n-1}) \right]$.

The PS method is applied until time point $t_{n-1}$, as shown in Figure 3.8, and the conditional expectation is used in the same way as the payoff function was used in the continuous case, as illustrated in Table 3.4.
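The likelihood-ratio step for the last time interval can be illustrated in one dimension, where a closed form is available for comparison. The sketch below is a simplification and not the multivariate estimator of [6]. It assumes a single rate, a digital payoff $1_{e^X > K}$ with $X \sim N(\text{mean}, s^2)$, and antithetic sampling. The function names are hypothetical.

```python
import numpy as np
from math import exp, log, pi, sqrt

def lr_digital_sensitivity(mean, s, K, M, seed=0):
    # Antithetic LR estimate of d/d(mean) E[1{exp(X) > K}] for X ~ N(mean, s^2)
    z = np.random.default_rng(seed).standard_normal(M)
    up = (np.exp(mean + s * z) > K).astype(float)
    down = (np.exp(mean - s * z) > K).astype(float)
    # Score of the normal density with respect to its mean: (x - mean) / s^2 = z / s
    return float(np.mean(0.5 * (up - down) * z / s))

def exact_digital_sensitivity(mean, s, K):
    # Closed form: E[1{X > log K}] = Phi(d) with d = (mean - log K) / s, so the derivative is phi(d) / s
    d = (mean - log(K)) / s
    return exp(-0.5 * d * d) / (s * sqrt(2.0 * pi))
```

No derivative of the discontinuous payoff is needed: only the score of the Gaussian transition density appears. The resulting vector would then be propagated back to time zero with the pathwise derivatives.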

63 CHAPTER 3. ESTIMATING FIRST ORDER SENSITIVITIES 59 Pathwise Vibrato last timepoint t n t n 1 calculate / estimate derivative at last timepoint Multiply with pathwise derivatives g L (L(t n)) ˆL(t n) P Tj (0) E[ g(l(t n)) L(t n 1 )] L(t n 1 ) ˆL(t n 1 ) P Tj (0) Table 3.4: Comparison between the PS method and VMC method. Since the conditional expectation E[ g(l(t n )) L(t n 1 )] does typically not have a discontinuity (because the discontinuity in the payoff is smoothened by diffusion), the conditions to interchange the differentiation and expectation will hold. After this interchange we use the LR method to estimate the derivative of the conditional expectation. From now, we use the following notation: J( g, f, y) = g(e x )f(x, y)dx, R m so that we need to compute: ] [ ] [ ] E [J( g, m f) L(0) = E m J( g, f, ˆL(t n 1 )) L(0) = E m f J( g,, ˆL(t n 1 )) L(0). P Tj (0) P Tj (0) P Tj (0) (3.78) The conditional expectation may be calculated analytically since the probability density function is known. However, if multiple Libors are involved this is very unpractical and therefore the LR method will be applied to compute (3.78): P Tj (0) ] E [J( g, m f, ˆL(t n 1 )) L(0) = E [J( g, m f ˆL(t n 1 ), ˆL(t n 1 )) ˆL(t ] n 1 ) P L(0). (3.79) Tj (0) Using the chain rule, the right-hand side can be rewritten as: ] E [J( g, m f, ˆL(t n 1 )) L(0) [ = E m g(e x ) R m ˆL(t n 1 ) {log(f(x, ˆL(t n 1 )))}f(x, ˆL(t n 1 ))dx ˆL(t ] n 1 ) P L(0) Tj (0) [ ] = E m E [ g(e m x ) ˆL(t n 1 ) {log(f(x, ˆL(t n 1 )))} ˆL(t n 1 ) ˆL(t ] n 1 ) P L(0). Tj (0) P Tj (0) The next step is to estimate the conditional expectation. From the chain rule we have: ( ) ˆL(t n 1 ) {log(f(x, ˆL(t log(f) n 1 )))} = (x, µ ˆL(t µ n 1 )) ˆL (ˆL(t n 1 )), (3.80) using Σ/L j (t n 1 ) = 0 for every j. We know how to calculate µ/ ˆL(t n 1 ); the partial derivatives log(f)/µ are given in [6] by: log(f) (x, µ ˆL(t ( ) n 1 )) = Σ 1 x µ(ˆl(t n 1 )). (3.81)

64 Note that if W (t n 1 ) is a multivariate normal distributed random variables, x can be simulated by: ˆx = µ(l(t n 1 )) + σ(t 0 ) W (t n 1 ), Substituting this in the right-hand side of equation (3.81), and using that Σ = h n 1 σ(t 0 )σ(t 0 ) gives: log(f) µ (x, ˆL(t ( n 1 )) Σ 1 µ(ˆl(t n 1 )) + ) h n 1 σ(t 0 ) W (t n 1 ) µ(ˆl(t n 1 )) = = hn 1 σ(t 0 ) σ(t 0 ) 1 σ(t 0 ) W (t n 1 ) h n 1 1 (σ(t 0 ) ) W (t n 1 ). hn 1 Hence we are able to simulate the random variable inside the conditional expectation in (3.80) so that this expectation can be approximated with a mean. Let z 0, z 1,... be realizations of W (t n 1 ): [ E m g(e x ) log(f) ] ˆL(t n 1 ) L(0) ( ) 1 M [ ] g(e µ+σ(t 0) z k ) g(e µ+σ(t 0) z k ) σ(t 0 ) µ z k M L ˆL(t n 1 ). k=1 (3.82) The computation in equation (3.82) is done by using antithetic sampling as is done in [6] where the fact that W (t n 1 ) has the same distribution as W (t n 1 ) is used. The method can be implemented in the adjoint implementation: given a path of forward Libor rates until t n 1, the conditional expectation is a vector and the pathwise derivatives are (using the multidimensional chain rule in matrix notation, see section 3.4) given by: ˆL(t n 1 ) P Tj (0) = L(0) n 1 ˆL(t n i ) P Tj (0) i=1 ˆL(t n i 1 ). Therefore, the values which we have to simulate are given by: n 1 L(0) ˆL(t n i ) P Tj (0) i=1 ˆL(t i 1 ) E[.. ˆL(t n i 1 )]. (3.83) In the adjoint implementation, the value in (3.83) is computed by the following recursion: ( y n = E[.. ˆL(t n 1 )] y i = y ˆL(t ) i+1 ) i+1 ˆL(t. i ) In this way we only have to compute vector-matrix products. To use this method there are two new decisions to make: Which accuracy do we need to estimate the conditional expectation? How large should the last time step be? 60

65 CHAPTER 3. ESTIMATING FIRST ORDER SENSITIVITIES Vega A digital vega can be approximated in a similar way as the delta is computed in subsection 3.6.1, therefore we will not repeat all steps, the small computational changes in comparison with the delta will be mentioned. First we rewrite the expectation, with help of the tower property, as an expectation of a conditional expectation given ˆL(t n 1 ) so that we can interchange the differentiation and the expectation: [ ] [ ( ] σ :,j (T 0 ) Em g(ˆl(t n )) L(0) = E m E m [ g(e x ) log(f(x, σ :,j (T 0 ) ˆL(t n 1 )))dx n 1)]) F(t L(0). (3.84) We apply the PS method until time t n 1 : [ ( ] E m E m [ g(e x ) log(f(x, σ :,j (T 0 ) ˆL(t n 1 ))) n 1)]) F(t L(0) = [ [ E m E m g(e x ) log(f) ] σ :,j (T 0 ) (x, ˆL(t n 1 )) n 1)] F(t L(0), (3.85) and rewrite the derivative inside the conditional expectation. Note that the probability function f depends on x and L(t n 1 ), but also on the volatilities. The derivative of f with respect to the volatilities can be split in multiple terms. Every discretization step, in which the Libor rate is simulated for the next time step, gives a contribution to the total derivative of f with respect to the volatilities. This follows from the chain rule for differentiation. Therefore, we have to add the following terms: log(f) σ :,j (T 0 ) ˆL(tn 1 ) + ˆL(t n 1 ) σ :,j (T 0 ) ˆL(t n 2 ) log(f) ˆL(t n 1 ) + ˆL(t n 2 ) σ :,j (T 0 ) ˆL(t n 3 ) ˆL(t n 1 ) log(f) ˆL(t n 2 ) ˆL(t n 1 ) +..., which leads to the final expression for the derivative inside the conditional expectation: log(f) σ :,j = log(f) σ :,j + ˆL(tn 1 ) n 1 i=1 ˆL(t i ) σ :,j ˆL(t i 1 ) ( n i 1 k=1 ˆL(t n k ) ˆL(t n k 1 ) We first have to estimate the following sum of conditional expectations: [ E m g(e x ) log(f) n 1 i=1 σ :,j ˆL(t i ) σ :,j ˆL(tn 1 ) ˆL(t i 1 ) ] F(t n 1) + ( n i 1 k=1 ) log(f) ˆL(t n 1 ). (3.86) ˆL(t ) [ ] n k ) ˆL(t E m g(e x ) log(f) n k 1 ) ˆL(t n 1 ) F(t n 1). 
Efficient estimators for the conditional expectation of g(e x ) log(f) Σ and g(e x ) log(f) L(t n 1 ) conditioned on F(t n 1 ) are given in [6]. From the estimator of the conditional expectation of g(e x ) log(f) Σ conditioned on F(t n 1 ) an estimator for the conditional expectation of g(e x ) log(f) σ :,j conditioned on F(t n 1 ) is obtained. Denote these estimators by: X cel,mv E m [ g(e x ) log(f) L(t n 1 ) F(t n 1)], X ceσ,mv E m [ g(e x ) log(f) σ :,j F(t n 1 )]. (3.87) The parameter M v represents the number of samples used to approximate the expectation. The final expectation which has to be approximated is given by: n 1 E m [X ceσ,mv ] + i=1 ˆL(t i ) σ :,j ˆL(t i 1 ) ( n i 1 k=1 ˆL(t n k ) ˆL(t n k 1 ) ) E m [ g(e x )X cel,mv ]. (3.88)

66 3.7 Pathwise kernel method Besides the VMC method there is another method which can extend the pathwise sensitivity method to a version which can be used for Greek calculations of options with discontinuous payoffs. We will refer to this method as the pathwise kernel method, presented in [9]. To apply this method, the option payoff function g should be written as a product of an indicator function of a function g s and a continuous function g c : g(l) = 1 gs(l)>0 g c (L) L R m. The functions g s and g c should be differentiable. If the conditions at the end of this section are satisfied then: θ E[ g(l(t )) F(0)] = (3.89) [ ] g c E 1 gs(l(t ))>0 θ (L(T )) F(0) y [ E g s g c (L(T ))1 gs(l(t ))>y θ (L(T )) F(0)]. y=0 The left term at the right hand-side can be estimated with the pathwise sensitivity method. In [9] the following relationship is used for the other term: [ ] [ y E g s g c (L(T ))1 gs(l(t ))>y θ (L(T )) F(0) y=0 = lim E g c (L(T )) g s ɛ 0 θ (L(T ))1 ɛ 2 h(l) ɛ 2 ( gs (L) = lim ɛ 0 E with K g the Gaussian kernel function Delta [ g c (L(T )) g s θ (L(T ))K g L(0) (L(T )) can be calcu- We know how to calculate L(T ) g L(0) and the derivatives lated as in the pathwise sensitivity method: g s (L(T )) = g θ s L(0) = g c L(T ) L(T ) L(0) ɛ c L(0) (L(T )) and gs g c L(0) = )], (3.90) g c L(T ) L(T ) L(0). (3.91) If we combine the equations (3.91), (3.90) and (3.89), and if we choose the bandwith ˆɛ small, we get: [( ( ) ) ] L(0) E[ g(l(t )) g c gs (L) gc L(T ) F(0)] E 1 L(T ) gs(l(t ))>y + g c (L(T ))K g. ˆɛ L(T ) L(0) (3.92) The calculations are almost identical to the pathwise sensitivity method. The difference is that we have to replace g L(T ) by: ( ( ) ) g c gs (L) gc 1 L(T ) gs(l(t ))>y + g c (L(T ))K g. ɛ L(T ) The following conditions need to be satisfied, in order for the kernel method to produce accurate results: 1. E [ g c (L(T )) 2] < and E [ g s (L(T )) 2] <. 2. 
For all θ, g c (L(T )) and g s (L(T )) are differentiable with respect to θ with probability There exist random variables K gc and K gs with finite second moments such that: g c (L(T, θ + θ)) g c (L(T, θ)) K gc θ g s (L(T, θ + θ)) g s (L(T, θ)) K gs θ. ] 62

67 CHAPTER 3. ESTIMATING FIRST ORDER SENSITIVITIES Pathwise digitals The last method which we consider is given in [12] at page 1039 and Here we interchange the derivative and expectation operators as is done with the PS method, now the discontuous function is rewritten in a specific form such that its derivative can be considered as a Dirac delta function. Consider the digital payoff: g dig (L(T 0 ) = T 0 1 L1 (T 0 )>K. (3.93) We wish to estimate the derivative with respect to the first initial Libor of the expectation of g(l T0 ): [ ] L 1 (0) Em g dig (L(T 0 )) L(0). (3.94) The PS method for digital options may be applied after some modifications. In [12] page 1039 and 1040, Piterbarg interchanges the expectation and differentiation operators and uses a Dirac delta function as derivative of the digital payoff functions: m i=1 The first term of this sum is given by: E m [ Li (T 0 ) L 1 (0) ] L i (T ) g dig(l(t 0 )). 0 L i (T 0 ) L 1 (0) L i (T ) g dig(l(t 0 )) 0 ( m ) L 1 (T 0 ) =... (1 + δ i x i ) i=2 L 1 (0) 1 x1 >Kf(x 1,.., x m )T 0 dx 1... dx m L1 =x 1 x 1 ( m ) L 1 (T 0 ) =... (1 + δ i x i (T 0 )) i=2 L 1 (0) δ(x 1 K)f(x 1,.., x m )T 0 dx 1... dx m L1 =x 1 ( m ) L 1 (T 0 ) =... (1 + δ i x i (T 0 )) L 1 (0).., x m )T 0 dx 2... dx m. (3.95) L1 =Kf(K, i=2 The conditional probability density function of L 2 (T 0 ),..., L m (T 0 ) given that L 1 (T 0 ) = K is given by the joint probability density function of all Libors divided by the probability density function of L 1 (T 0 ): f(l 2 (T 0 ),..., L m (T 0 ) L 1 (T 0 ) = K) = f(l 1(T 0 ), L 2 (T 0 ),..., L m (T 0 )). f 1 (K) Where f 1 is the probability density function of L 1 (T 0 ) and f(l 2 (T 0 ),..., L m (T 0 ) L 1 (T 0 ) = K) is the conditional probability density function. Using this in equation (3.95) gives: L i (T 0 ) L 1 (0) L i (T ) g dig(l(t 0 )) = 0 ( m )... (1 + δ i x i (T 0 )) i=2 L 1 (T 0 ) L 1 (0) [( m ) = E (1 + δ i x i (T 0 )) i=2 1 (K)f(x 2,.., x m L 1 (T 0 ) = K)T 0 dx 2... 
dx m L1 =Kf L 1 (T 0 ) L 1 (0) ] 1 (K)T 0 L 1 (T 0 ) = K. (3.96) L1 =Kf Hence the last step is to modify the Monte Carlo engine such that it simulates Libor rates L 2 (T 0 ),..., L m (T 0 ) given L 1 (T 0 ) = K.

Remark 1. For the simple payoff considered here, this was a satisfactory alternative to the VMC method. However, applying the idea of this section to a more complicated payoff can be considerably more difficult. Since the derivation has to be redone for every payoff, this method may be impractical compared with the VMC method, for which only the payoff has to be specified.

Chapter 4 Estimating gamma

Besides the first order Greeks of the previous chapter, we also describe three methods to calculate the European gamma, i.e. the second order derivative with respect to the initial values. These methods are the PS method, the LR method and the VMC method. Since the VMC method is a mix of the PS method and the LR method, we divide the chapter into two parts: the pathwise part and the LR part.

4.1 Pathwise Part

The Greek considered here is the gamma:

$$\frac{\partial^2 V}{\partial (L(0))^2}(0, L(0)) = \begin{pmatrix} \frac{\partial^2 V}{\partial L_1(0)^2}(0, L(0)) & \cdots & \frac{\partial^2 V}{\partial L_1(0) \partial L_m(0)}(0, L(0)) \\ \vdots & & \vdots \\ \frac{\partial^2 V}{\partial L_m(0) \partial L_1(0)}(0, L(0)) & \cdots & \frac{\partial^2 V}{\partial L_m(0)^2}(0, L(0)) \end{pmatrix}.$$

We start with the description of the PS method, part of which will also be used for the VMC method. To explain it for both methods at once, we define $\tilde{T}$ and the function $\eta(L(\tilde{T}))$ as:

$$\eta(L(\tilde{T})) = \begin{cases} \tilde{g}(L(\tilde{T})) & \text{if the PS method is used,} \\ E[X_{ce} \mid L(\tilde{T})] & \text{if the VMC method is used,} \end{cases}$$

and

$$\tilde{T} = \begin{cases} T & \text{if the PS method is used,} \\ T - v_{\text{step}} & \text{if the VMC method is used.} \end{cases}$$

The random variable $X_{ce}$ will be specified later. From now on we continue with $\eta$ and assume that the matrix $\frac{\partial^2 \eta}{\partial L^2}(L(\tilde{T}))$ of second order derivatives and the vector $\frac{\partial \eta}{\partial L}(L(\tilde{T}))$ are known. These derivatives are referred to as the payoff derivatives. For the PS method we assume that the derivatives can be interchanged with the expectation:

$$\frac{\partial^2}{\partial L_i(0) \partial L_j(0)} E\left[ \eta(L(\tilde{T})) \mid L(0) \right] = E\left[ \frac{\partial^2}{\partial L_i(0) \partial L_j(0)} \eta(L(\tilde{T})) \mid L(0) \right].$$

We apply the multi-dimensional chain rule for the derivative with respect to $L_j(0)$:

$$\ldots = E\left[ \frac{\partial}{\partial L_i(0)} \left( \sum_{k=1}^{m} \frac{\partial \eta}{\partial L_k}(L(\tilde{T})) \frac{\partial L_k(\tilde{T})}{\partial L_j(0)} \right) \mid L(0) \right] = E\left[ \sum_{k=1}^{m} \frac{\partial}{\partial L_i(0)} \left( \frac{\partial \eta}{\partial L_k}(L(\tilde{T})) \frac{\partial L_k(\tilde{T})}{\partial L_j(0)} \right) \mid L(0) \right].$$

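The product rule for differentiation gives:

$$\ldots = E\left[ \sum_{k=1}^{m} \left( \frac{\partial \eta}{\partial L_k}(L(\tilde{T})) \frac{\partial^2 L_k(\tilde{T})}{\partial L_i(0) \partial L_j(0)} \right) + \sum_{k=1}^{m} \frac{\partial}{\partial L_i(0)} \left( \frac{\partial \eta}{\partial L_k}(L(\tilde{T})) \right) \frac{\partial L_k(\tilde{T})}{\partial L_j(0)} \mid L(0) \right].$$

Using the multidimensional chain rule again:

$$\ldots = E\left[ \sum_{k=1}^{m} \left( \frac{\partial \eta}{\partial L_k(\tilde{T})} \frac{\partial^2 L_k(\tilde{T})}{\partial L_i(0) \partial L_j(0)} \right) + \sum_{k=1}^{m} \sum_{l=1}^{m} \left( \frac{\partial^2 \eta}{\partial L_l(\tilde{T}) \partial L_k(\tilde{T})} \frac{\partial L_l(\tilde{T})}{\partial L_i(0)} \frac{\partial L_k(\tilde{T})}{\partial L_j(0)} \right) \mid L(0) \right]. \tag{4.1}$$

This expectation can be estimated as a mean. For the adjoint implementation, we first consider the double sum and note that it is equal to the element in row $i$ and column $j$ of the matrix

$$\left( \frac{\partial L(\tilde{T})}{\partial L(0)} \right)^\top \frac{\partial^2 \eta}{\partial L(\tilde{T}) \partial L(\tilde{T})} \frac{\partial L(\tilde{T})}{\partial L(0)}. \tag{4.2}$$

Hence, we calculate the second term of equation (4.1) for all gammas at once with the above matrix products. Since these are two matrix-matrix products, it does not make sense to apply the adjoint method (this method is only helpful if we have to calculate a matrix-vector product in the end). However, for the first sum in equation (4.1) an adjoint implementation can be advantageous. Here, the second order derivatives $\frac{\partial^2 L_k(\tilde{T})}{\partial L_i(0) \partial L_j(0)}$ can be calculated with the help of the chain and product rules for differentiation. Consider the calculation of $\frac{\partial^2 L_k(t_n)}{\partial L_i(0) \partial L_j(0)}$. The chain rule gives:

$$\frac{\partial}{\partial L_i(0)} \left( \frac{\partial L_k(t_n)}{\partial L_j(0)} \right) = \frac{\partial}{\partial L_i(0)} \left( \sum_{\alpha=1}^{m} \left. \frac{\partial L_k(t_n)}{\partial L_\alpha(t_{n-1})} \right|_{L(t_{n-1})} \frac{\partial L_\alpha(t_{n-1})}{\partial L_j(0)} \right).$$

Applying the product rule, this becomes

$$\sum_{\alpha=1}^{m} \left. \frac{\partial L_k(t_n)}{\partial L_\alpha(t_{n-1})} \right|_{L(t_{n-1})} \frac{\partial^2 L_\alpha(t_{n-1})}{\partial L_j(0) \partial L_i(0)} + \sum_{\alpha=1}^{m} \frac{\partial}{\partial L_i(0)} \left( \left. \frac{\partial L_k(t_n)}{\partial L_\alpha(t_{n-1})} \right|_{L(t_{n-1})} \right) \frac{\partial L_\alpha(t_{n-1})}{\partial L_j(0)},$$

and using the chain rule for the second time gives:

$$\sum_{\alpha=1}^{m} \left. \frac{\partial L_k(t_n)}{\partial L_\alpha(t_{n-1})} \right|_{L(t_{n-1})} \frac{\partial^2 L_\alpha(t_{n-1})}{\partial L_j(0) \partial L_i(0)} + \sum_{\alpha=1}^{m} \sum_{\beta=1}^{m} \left. \frac{\partial^2 L_k(t_n)}{\partial L_\beta(t_{n-1}) \partial L_\alpha(t_{n-1})} \right|_{L(t_{n-1})} \frac{\partial L_\beta(t_{n-1})}{\partial L_i(0)} \frac{\partial L_\alpha(t_{n-1})}{\partial L_j(0)}.$$

These terms can be rewritten in matrix-vector notation.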
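The structure of equation (4.1), with the matrix form (4.2) for the double sum, can be verified on a small example. The sketch below assumes that the payoff derivatives and the first and second order pathwise derivatives are given as arrays. The function name `composite_hessian` and the array layout are hypothetical.

```python
import numpy as np

def composite_hessian(grad_eta, hess_eta, J, H):
    # grad_eta[k] = d eta / dL_k(T~), hess_eta[l, k] = d2 eta / dL_l(T~) dL_k(T~)
    # J[k, i] = dL_k(T~) / dL_i(0), H[k, i, j] = d2 L_k(T~) / dL_i(0) dL_j(0)
    first = np.einsum("k,kij->ij", grad_eta, H)  # first sum in (4.1)
    second = J.T @ hess_eta @ J                  # double sum in (4.1), i.e. the matrix (4.2)
    return first + second
```

As a check, take the map $(x_0, x_1) \mapsto (x_0 x_1,\, x_1^2)$ for the rates and $\eta(y) = y_0 y_1 + y_0^2$, so that the composite is $x_0 x_1^3 + x_0^2 x_1^2$. At $x = (1, 2)$ its Hessian has entries $8$, $20$, $20$, $14$, which is what the function returns for the corresponding inputs.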
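For this, we define the matrices $Q_k(t_z)$ and the vectors $v_{ij}(t_z)$ for $z = 0, \ldots, n-1$ and $i, j, k = 1, \ldots, m$ as

$$Q_k(t_z) = \left. \frac{\partial^2 L_k(t_{z+1})}{\partial L(t_z) \partial L(t_z)} \right|_{L(t_z)} \quad \text{and} \quad v_{ij}(t_z) = \frac{\partial^2 L(t_z)}{\partial L_j(0) \partial L_i(0)}.$$

Recall that the matrices $D(t_z)$ and $\Delta(t_z)$ were defined as:

$$D(t_z) = \left. \frac{\partial L(t_{z+1})}{\partial L(t_z)} \right|_{L(t_z)} \quad \text{and} \quad \Delta(t_z) = \frac{\partial L(t_z)}{\partial L(0)}.$$

Hence the double sum can be written as a dot product of a vector with the result of a matrix-vector product yielding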

71 CHAPTER 4. ESTIMATING GAMMA 67 v ij (t n ) = D k,: (t n 1 )v ij (t n 1 ) + ( :i (t n 1 )) Q k (t n 1 ) :j (t n 1 ). (4.3) To apply the adjoint method we rewrite this as: V j (t n ) = D(t n 1 )V j (t n 1 ) + R j (t n 1 ), (4.4) where V i (t z ) is a matrix (because L(t z ) and L(0) are vectors) which is defined by V j (t z ) = 2 L(t z ) L(0)L i (0). R j is defined so that the element at row k and column i is given by: (R j ) ki (t z ) = (t z ) :i Q k (t z ) :j (t z ). Suppose the grid is chosen so that T = t n, then: V j (t n ) = D(t n 1 )V j (t n 1 ) + R j (t n 1 ) = D(t n 1 )D(t n 2 )V j (t n 2 ) + D(t n 1 )R j (t n 2 ) + R j (t n 1 ) = n β=1 ( β 1 α=1 D(t n α ) However, we need for each column j the value of: ( η ) L( T ) n β=1 ( β 1 α=1 ) R j (t n β ). D(t n α ) ) R j (t n β ). Element number i of this vector corresponds with the sensitivity with respect to L i (0) and L j (0). ( Here, the adjoint implementation can be used since η/l( T )) is a vector which has to be multiplied with products of matrices. Furthermore, note that the products of D have to be calculated only once and can be used for all columns. It may not be clear how the gamma has to be calculated. Therefore we give an algorithm to compute the gamma: 1 Set Γ = 0 which is a matrix of zeros. for i = 0 to i = M do 2 Simulate the forward Libor rates under the terminal measure until the expiry date of the option. Save the random variables that are used. 3 Estimate (in case of VMC version) or calculate (in case of PS method) V = set α = n (number of time steps). for α = n to α = 1 do 4 Calculate G :,j = V R j (t α ) for all j and update α with α = α 1. 5 V = V D(t n α ). 6 G :,j = G :,j + V R j (t n α ), for all j. end 7 Γ = Γ + 1 mg, erase all variables except isample and Γ. end Algorithm 5: The VMC algorithm for the gamma. ( ) η and L( T ) At the end, the matrix Γ is an estimator for 2 L(0) 2 E[ η(l( T ))]. 
With the help of the derivatives $\partial L_j(0) / \partial P_{T_i}(0)$ for $i \in \{0, 1, \ldots, m\}$ and $j \in \{1, \ldots, m\}$, one can get the derivatives of the option price with respect to the initial bond prices.
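The conversion from Libor sensitivities to bond sensitivities can be sketched as below, assuming the standard relation $L_j(0) = (P_{T_{j-1}}(0)/P_{T_j}(0) - 1)/\delta_j$. The function names and the bond prices are hypothetical and chosen only for illustration. The sketch covers the dependence through the initial Libor rates only; an explicit bond factor in the price, such as a leading $P_{T_m}(0)$, has to be differentiated separately.

```python
import numpy as np

def libor_from_bonds(P, delta):
    """Initial forward Libor rates L_1(0), ..., L_m(0) from bonds P_{T_0}(0), ..., P_{T_m}(0)."""
    P = np.asarray(P, dtype=float)
    return (P[:-1] / P[1:] - 1.0) / delta

def libor_bond_jacobian(P, delta):
    """Jacobian J with J[j-1, i] = dL_j(0) / dP_{T_i}(0); only i = j-1 and i = j are nonzero."""
    P = np.asarray(P, dtype=float)
    m = len(P) - 1
    J = np.zeros((m, m + 1))
    for j in range(1, m + 1):
        J[j - 1, j - 1] = 1.0 / (delta[j - 1] * P[j])
        J[j - 1, j] = -P[j - 1] / (delta[j - 1] * P[j] ** 2)
    return J

# Hypothetical bond prices (not the values of the test setting) and delta_i = 1.
P0 = np.array([0.97, 0.94, 0.91, 0.88, 0.85])
delta = np.ones(4)
L0 = libor_from_bonds(P0, delta)
J = libor_bond_jacobian(P0, delta)

# Chain rule: a vector of Libor deltas dV/dL(0) maps to bond deltas via J^T.
libor_deltas = np.array([1.0, 0.0, 0.0, 0.0])   # placeholder sensitivities
bond_deltas = J.T @ libor_deltas
```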

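4.2 VMC and LR gamma

If the option payoff is not twice differentiable, the PS method cannot be used directly. The VMC method for gammas was not discussed in [6], but it can be derived in a similar manner: we use the same techniques, only now with second order derivatives. To simplify notation, assume that we are interested in the second order derivative with respect to $L_i(0)$ and $L_j(0)$. The LR method is given by the case $\tilde T = 0$.

$$\frac{\partial^2}{\partial L_i(0)\,\partial L_j(0)} E\left[ g(L(T)) \mid L(0) \right] = E\left[ \frac{\partial^2}{\partial L_i(0)\,\partial L_j(0)} E\left[ g(L(T)) \mid L(\tilde T) \right] \,\Big|\, L(0) \right] = E\left[ \frac{\partial^2}{\partial L_i(0)\,\partial L_j(0)} \int_{\mathbb{R}^m} g(x) f(x, L(\tilde T))\,dx \,\Big|\, L(0) \right].$$

Note that the derivative $\frac{\partial^2}{\partial L_i(0)\,\partial L_j(0)}$ can be written as:

$$\frac{\partial^2}{\partial L_i(0)\,\partial L_j(0)} = \sum_{k=1}^{m} \frac{\partial^2 L_k(\tilde T)}{\partial L_i(0)\,\partial L_j(0)} \frac{\partial}{\partial L_k(\tilde T)} + \sum_{k=1}^{m} \sum_{\alpha=1}^{m} \frac{\partial L_\alpha(\tilde T)}{\partial L_i(0)} \frac{\partial L_k(\tilde T)}{\partial L_j(0)} \frac{\partial^2}{\partial L_\alpha(\tilde T)\,\partial L_k(\tilde T)}.$$

This is the same as is done to get equation (4.1), and hence the calculations of $\frac{\partial L_k(\tilde T)}{\partial L_j(0)}$, $\frac{\partial L_\alpha(\tilde T)}{\partial L_i(0)}$ and $\frac{\partial^2 L_k(\tilde T)}{\partial L_i(0)\,\partial L_j(0)}$ belong to the pathwise part in section 4.1. Note that the random variable $X_{ce}$ from section 4.1 equals $E[g(L(T)) \mid L(0)]$. For the LR part it remains to estimate:

$$\frac{\partial}{\partial L_k(\tilde T)} E\left[ g(L(T)) \mid L(\tilde T) \right] \quad\text{and}\quad \frac{\partial^2}{\partial L_\alpha(\tilde T)\,\partial L_k(\tilde T)} E\left[ g(L(T)) \mid L(\tilde T) \right]. \qquad (4.5)$$

The estimation of the first order derivative in equation (4.5) is already handled by the vibrato delta. For the second order derivative in equation (4.5) we can apply the same technique as is used for the vibrato delta. First we assume that the density function of $L(T)$ given $L(\tilde T)$ is known and given by $f(L(T), L(\tilde T))$, so that the expectation can be written as an integral:

$$\frac{\partial^2}{\partial L_\alpha(\tilde T)\,\partial L_k(\tilde T)} \int_{\mathbb{R}^m} g(x) f(x, L(\tilde T))\,dx = \int_{\mathbb{R}^m} g(x) \frac{\partial}{\partial L_\alpha(\tilde T)} \left[ \frac{\partial \log f(x, L(\tilde T))}{\partial L_k(\tilde T)} f(x, L(\tilde T)) \right] dx.$$

Using the product rule in combination with the equality

$$\frac{\partial f}{\partial L_\alpha(\tilde T)} = \frac{\partial \log f(x, L(\tilde T))}{\partial L_\alpha(\tilde T)} f(x, L(\tilde T))$$

gives:

$$= \int_{\mathbb{R}^m} g(x) \left[ \frac{\partial^2 \log f(x, L(\tilde T))}{\partial L_k(\tilde T)\,\partial L_\alpha(\tilde T)} f(x, L(\tilde T)) + \frac{\partial \log f(x, L(\tilde T))}{\partial L_k(\tilde T)} \frac{\partial \log f(x, L(\tilde T))}{\partial L_\alpha(\tilde T)} f(x, L(\tilde T)) \right] dx,$$

which can be rewritten as a sum of expectations:

$$E\left[ g(L(T)) \frac{\partial^2 \log f(L(T), L(\tilde T))}{\partial L_k(\tilde T)\,\partial L_\alpha(\tilde T)} \,\Big|\, L(\tilde T) \right] + E\left[ g(L(T)) \frac{\partial \log f(L(T), L(\tilde T))}{\partial L_k(\tilde T)} \frac{\partial \log f(L(T), L(\tilde T))}{\partial L_\alpha(\tilde T)} \,\Big|\, L(\tilde T) \right].$$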
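As a sanity check of the two-term decomposition above, consider a one-dimensional toy case that is not the Libor market model: $X \sim N(\mu, \sigma^2)$ with $\mu$ in the role of the conditioning state. Then $\partial \log f / \partial \mu = (x - \mu)/\sigma^2$ and $\partial^2 \log f / \partial \mu^2 = -1/\sigma^2$, so the second derivative of $E[g(X)]$ is the expectation of $g(X)$ times the sum of the second log-derivative and the squared score. The function name, parameter values and the digital payoff below are assumptions for illustration.

```python
import numpy as np
from math import exp, pi, sqrt

def lr_second_derivative(payoff, mu, sigma, n_samples, seed=0):
    """Likelihood ratio estimate of d^2/dmu^2 E[payoff(X)] for X ~ N(mu, sigma^2)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(mu, sigma, n_samples)
    score = (x - mu) / sigma**2          # d log f / d mu
    d2_log_f = -1.0 / sigma**2           # d^2 log f / d mu^2
    weights = d2_log_f + score**2        # the two terms of the decomposition
    return float(np.mean(payoff(x) * weights))

# Digital payoff 1_{x > K}; the payoff itself is never differentiated.
K, mu, sigma = 0.5, 0.0, 1.0
estimate = lr_second_derivative(lambda x: (x > K).astype(float), mu, sigma, 400_000)

# Closed form for this toy case: E[1_{X>K}] = Phi(d) with d = (mu - K) / sigma,
# so the second derivative with respect to mu is -d * phi(d) / sigma^2.
d = (mu - K) / sigma
exact = -d * exp(-0.5 * d * d) / sqrt(2.0 * pi) / sigma**2
```

The estimator only needs the density of the last step, which is why the same weights work for discontinuous payoffs.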

Chapter 5

Results European options

Because we already have analytic solutions for the caplet and digital Greeks, we can test our Monte Carlo methods on these Greek estimations. This test gives insight into the biases and variances of the estimators. In this chapter, we use the Euler discretization to simulate the forward Libor rates. In section 5.5 we will compare the results obtained with the Euler discretization with results obtained with the PC discretization. Furthermore, we can check which methods will be desirable for the Bermudan-style options.

5.1 European test setting

To test the methods for European options, the following data is used for, respectively, the strike, the initial zero coupon bonds (from which the initial forward Libor rates can be calculated) and the volatility matrix: $K = $ , $(P_{T_0}(0), P_{T_1}(0), P_{T_2}(0), P_{T_3}(0), P_{T_4}(0)) = $ , $\sigma = $ . Furthermore we use $\delta_i = (T_i - T_{i-1}) = 1$ for all $i$.

5.2 PS Greeks of a caplet

To test the PS method, we consider the price of a caplet and a digital at time $t = 0$ corresponding to the Libor rate $L(t_0, T_0, T_1)$ with $t < T_0 < T_1$. Recall that the payoffs of these options were defined by equations (2.5) and (2.7):

$$g_{\text{cap}} = N (L_1(T_0) - K)^+ \quad\text{and}\quad g_{\text{dig}}(L(T_0), K) = N\, 1_{\{L_1(T_0) > K\}} (T_1 - T_0).$$

Here $N$ is a notional amount of money and without loss of generality we set it to one. The option pays at $T_1$, but the payoff is already known at $T_0$ and thus we only have to simulate until the first tenor date. We know the exact price and exact sensitivities of the caplet under the $T_1$ measure (see section 2.7.1). To test the PS method, we also consider the forward Libor rates at $T_0$ corresponding to all tenor intervals until $T_m$. Now we have to apply the PS method under the $T_m$ forward measure. Our goal is to test the methods on a payoff which depends on several Libors.

The caplet price can be written as an expectation under the $T_m$ measure (see section 2.5.1):

$$V_{\text{cap}}(0, L(0)) = P_{T_m}(0)\, E^{T_m}\!\left( (L_1(T_0) - K)^+ \prod_{i=2}^{m} (1 + \delta_i L_i(T_0)) \right). \qquad (5.1)$$

The sensitivities with respect to the zero coupon bonds are given by:

$$\frac{\partial V_{\text{cap}}}{\partial P_{T_j}(0)}(0, L(0)) = \frac{\partial}{\partial P_{T_j}(0)} \left( P_{T_m}(0)\, E^{T_m}\!\left( (L_1(T_0) - K)^+ \prod_{i=2}^{m} (1 + \delta_i L_i(T_0)) \right) \right). \qquad (5.2)$$

Actually it does not really matter whether we estimate the sensitivities with respect to the initial Libor rates or the bond prices, since the initial Libor rates can be written as a function of bond prices. The rate $L_k(0)$ can be written as a function of the two bonds $P_{T_{k-1}}(0)$ and $P_{T_k}(0)$. The forward Libor rate $L_1(T_0)$, simulated under the $T_m$ measure, depends on $P_{T_j}(0)$ for all $0 \le j \le m$, since $L_1(T_0)$ is affected by all other forward Libor rates $L_k(0)$ through the drift term of $L_1(t)$, see (2.22). So, under the $T_m$ forward measure we have to simulate values which depend on $m$ Libor rates. Now we can test the PS method for the caplet using $m$ forward Libor rates.

Result caplet price

Figure 5.1: Logarithm of the square root of the mean square error (for MSE see (2.8.3)) of the Monte Carlo price estimator plotted against the logarithm of the sample size $M$ for two discretization grids; $n$ refers to the number of time steps for each time unit. The standard settings of section 5.1 are used.

The implementation is tested for the caplet price under the $T_4$ measure in Figure 5.1. For the coarse grid, using one time step for each year (solid line), some bias comes in which is caused by the discretization. The figure shows this because the square root of the MSE does not decrease linearly in the log-log plot. However, this error disappears when choosing a finer grid (dotted line), which is exactly what we expect. Since the square root of the mean square error decreases linearly as long as the grid is fine enough, and since the standard deviation and bias of our best estimated option price are both bounded by the root mean square error (because the MSE is an upper bound for both the variance and the squared bias), we may assume that the method is implemented correctly.

Result caplet delta

The real objective is to estimate the Greeks. This is done with the PS method for this same caplet option. First we estimate the deltas for a certain setting using both the PS method and the bump method. The errors are given in Table 5.1. In Table 5.1 we see that both the PS method and the bump method give the desired estimates of the deltas. It may be remarkable that the difference between the bump and pathwise deltas is small in comparison with the difference between the estimated deltas and the true ones. However, this is not only due to the bias, it is also because the same random numbers are used

(which makes the comparison fairer). This does not mean that the bump method is as good as the PS method, since the PS method is about four times faster for this estimation.

[Table 5.1; columns: error of caplet delta, CPU time; rows: Bump($10^{-10}$), PS]

Table 5.1: Errors of the bump and pathwise approximations of the deltas using $T_0 = 1$, a sample size of $10^6$ and 5 time steps. A bump size of $10^{-10}$ is used. The rest of the setting can be found in section 5.1. For both methods the same seed is used. The CPU time is measured in seconds. The analytic solutions are: $\frac{\partial V_{\text{cap}}}{\partial P_{T_0}(0)} = $ , $\frac{\partial V_{\text{cap}}}{\partial P_{T_1}(0)} = $ and $\frac{\partial V_{\text{cap}}}{\partial P_{T_j}(0)} = 0$ for $j > 1$.

Figure 5.2: Validation of the PS method for the caplet. The three lines marked with circles (upper three) correspond to $\frac{\partial V_{\text{cap}}}{\partial P_{T_0}(0)}(0, L(0))$ and the lines marked with triangles (lower three) correspond to $\frac{\partial V_{\text{cap}}}{\partial P_{T_3}(0)}(0, L(0))$. A solid, dashed or dotted line refers to the use of respectively 1, 5 or 10 time steps for each year.

In Figure 5.2, the MSEs of two sensitivities approximated with the PS method are plotted against the sample size. For $\frac{\partial V_{\text{cap}}}{\partial P_{T_0}(0)}(0, L(0))$, we see that the mean squared error is reduced with the sample size. This implies that the error is dominated by the variance of the estimator. It is therefore not surprising to see the same magnitude of error for a larger number of time steps (see dashed and dotted lines). For $\frac{\partial V_{\text{cap}}}{\partial P_{T_3}(0)}(0, L(0))$ we see a mean squared error smaller than the mean squared error of $\frac{\partial V_{\text{cap}}}{\partial P_{T_0}(0)}(0, L(0))$. However, the mean squared error for the line corresponding to 1 time step does not decrease after $10^5$ samples. This implies that the small number of steps introduces a bias, and we indeed see a smaller error if a larger number of time steps is chosen. The results for $\frac{\partial V_{\text{cap}}}{\partial P_{T_1}(0)}(0, L(0))$ were similar to those for $\frac{\partial V_{\text{cap}}}{\partial P_{T_0}(0)}(0, L(0))$, and the results for $\frac{\partial V_{\text{cap}}}{\partial P_{T_2}(0)}(0, L(0))$ and $\frac{\partial V_{\text{cap}}}{\partial P_{T_4}(0)}(0, L(0))$ were similar to those for $\frac{\partial V_{\text{cap}}}{\partial P_{T_3}(0)}(0, L(0))$.

Result caplet vega

Besides the deltas, the vegas are also considered. The results were similar to the results of the delta:

[Table 5.2; columns: error of caplet vega $\nu_{11}$, $\nu_{21}$, $\nu_{31}$, $\nu_{41}$, CPU time; rows: Bump, PS]

Table 5.2: Errors of approximations of the vegas done with the PS method and the bump method using $T_0 = 1$, bump size $10^{-10}$, a sample size of $10^6$ and 5 time steps. For both methods the same seed is used. The CPU time is measured in seconds. The only analytic nonzero value is $\frac{\partial V_{\text{cap}}}{\partial \sigma_{11}} = $ .

Overall, these results for the delta and vega confirm that the PS method is accurate for this type of payoff.

5.3 Results digital option

Another option which can be priced analytically is the digital option which depends on one Libor rate. We consider a digital option with the following payoff:

$$g_{\text{dig}}(L(T_0)) = 1_{\{L_1(T_0) > K\}} (T_1 - T_0). \qquad (5.3)$$

Exactly the same settings are used as in the caplet example, see section 5.1. The digital option can be priced under the $T_m$ forward measure in the same way as the caplet option. Hence the option price can be written as:

$$V_{\text{dig}}(0, L(0)) = P_{T_m}(0)\, E^{T_m}\!\left[ 1_{\{L_1(T_0) > K\}} (T_1 - T_0) \prod_{i=2}^{m} (1 + \delta_i L_i(T_0)) \right]. \qquad (5.4)$$

The PS method from section 3.4 does not work, because the payoff does not satisfy the conditions (section 2.6.1) to interchange expectation and differentiation, which is caused by the discontinuity. However, the finite difference or bump method also gives difficulties in the digital case. This comes from the fact that the bump size can no longer be chosen arbitrarily small, since the variance of the estimator explodes for a very small bump size (see equation 3.3). There will be some optimal value for this bump size. To visualize this, we plotted the square root of the mean square error of the estimator as a function of the bump size in Figure 5.3.

Figure 5.3: Interpolated results for different bump sizes (for MSE see (2.8.3)).

In the figure we see that the optimal bump size lies around $10^{-3}$ for this option in the standard setting. However, the optimal value will depend on the model, the tenor times and the option payoffs. The analytic solution for the delta was used to make Figure 5.3, but in practice this analytic solution will not be available. In those cases we don't know the bias of the estimator and hence

it is much harder to determine the optimal bump size. In practice, the optimal bump size then has to be guessed. Therefore we will consider the VMC method (introduced in [6]), which can deal with digital payoffs. This method is a version of the PS method which is combined with the LR method.

Application of the VMC method

Before we can use the VMC method we have to know how to choose the size of the last time step and the sample size used to estimate the conditional expectation inside the outer expectation.

Estimating the conditional expectation

For now we will only consider the conditional expectation used in the VMC method. In [6] an efficient estimator, which we will denote by $X_{ce}$, is found with the help of variance reduction techniques so that

$$E^m\!\left[ g(x)\, \frac{\partial}{\partial \hat L(t_{n-1})} \left\{ \log f(x, \hat L(t_{n-1})) \right\} \,\Big|\, \hat L(t_{n-1}) \right] = E^m\!\left[ X_{ce} \,\big|\, \hat L(t_{n-1}) \right]. \qquad (5.5)$$

In the implementation we first have to simulate a path of forward Libor rates until $L(t_{n-1})$. A conditional expectation has to be estimated for every simulated path. The conditional expectation is estimated by the mean of $X_{ce}$:

$$E^m\!\left[ X_{ce} \,\big|\, \hat L(t_{n-1}) \right] \approx \frac{1}{m_{ce}} \sum_{i=1}^{m_{ce}} x_{ce,i}, \qquad (5.6)$$

where, for $i \in \{1, \ldots, m_{ce}\}$, $x_{ce,i}$ is a realization of the random variable $X_{ce}$. In [6] it is stated that for differentiable payoffs only one realization is required, but if the evaluation of the payoff function is cheap, it may be more efficient to use a larger sample size $m_{ce}$. In this thesis we have chosen another implementation, which comes from the notion that the variance of the random variable inside the conditional expectation depends strongly on the state $L(t_{n-1})$. First this dependency is shown by scattering the estimated variance against $L_1(t_{n-1})$. The random variable $X_{ce}$ also depends on the other Libors, but since the discontinuity is in the first forward Libor rate, we expect to see a high variance in the neighborhood of the strike.

In Figure 5.4 we see that the variance of $X_{ce}$ strongly depends on the state $L(T_0 - v_{step})$. So, if one estimates the conditional expectation using a constant sample size, independent of $\hat L(t_{n-1})$, the standard error of the estimated conditional expectation will be large if $\hat L_1(t_{n-1})$ falls in the neighborhood of the strike, or the number of samples will be unnecessarily large if $\hat L_1(t_{n-1})$ does not fall in the neighborhood of the strike. The idea is to choose $m_{ce}$ so that the standard error of the estimated conditional expectation is smaller than a certain constant tolerance level. Therefore, we first choose a tolerance level $tol$, then we estimate the variance of the random variable $X_{ce}$ after simulating a path until $T_0 - v_{step}$. We denote the estimated variance by $varx$. The variance of the expression in equation (5.6) is given by $varx / m_{ce}$. The last step is to choose $m_{ce}$ such that

$$\sqrt{\frac{varx}{m_{ce}}} < tol.$$

Vibrato parameter

Another degree of freedom is the choice of $v_{step}$, the size of the last time step which is used in the VMC method. What happens if this parameter changes and what is its optimal value? In general, the error which we make by estimating a Greek comes from the variance and the bias. Let us first consider the variance with the help of some figures. On the left side of Figure 5.5 one can see that by using a larger $v_{step}$ the peak of the blue figure is more smeared over the different states $L(t_{n-1})$. This is not surprising and caused by diffusion.
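The tolerance-based choice of the inner sample size can be sketched as follows. This is a minimal sketch under stated assumptions: the sampler `sample_x_ce`, the pilot size of 20 draws and the minimum of 10 draws are illustrative parameters, and the pilot draws are discarded rather than reused. The variance is estimated from a small pilot sample, after which the smallest sample size is chosen for which the estimated standard error of the mean lies below the tolerance.

```python
import math
import numpy as np

def adaptive_conditional_mean(sample_x_ce, tol, pilot_size=20, min_size=10, rng=None):
    """Estimate a conditional expectation with a state-dependent sample size.

    sample_x_ce : callable (size, rng) -> array of realizations of X_ce for one
                  fixed simulated state (hypothetical interface)
    tol         : tolerance level for the standard error of the mean
    Returns (mean estimate, sample size used, pilot variance estimate).
    """
    rng = np.random.default_rng() if rng is None else rng
    pilot = sample_x_ce(pilot_size, rng)
    var_x = float(np.var(pilot, ddof=1))
    # sqrt(var_x / m_ce) < tol  <=>  m_ce > var_x / tol^2
    m_ce = max(min_size, math.floor(var_x / tol**2) + 1)
    draws = sample_x_ce(m_ce, rng)
    return float(np.mean(draws)), m_ce, var_x

# Two stand-in states: a high-variance one (near the strike) and a low-variance one.
noisy_state = lambda size, rng: rng.normal(0.0, 3.0, size)
quiet_state = lambda size, rng: rng.normal(0.0, 0.01, size)
rng = np.random.default_rng(1)
mean_noisy, m_noisy, var_noisy = adaptive_conditional_mean(noisy_state, tol=0.5, rng=rng)
mean_quiet, m_quiet, var_quiet = adaptive_conditional_mean(quiet_state, tol=0.5, rng=rng)
```

States with a large pilot variance receive more inner samples, while states far from the discontinuity fall back to the minimum sample size.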

Figure 5.4: Standard error of the random variable $X_{ce}$ given a certain state $L(T_0 - v_{step})$. The number of forward Libors used in this simulation is $m = 4$ and the size of the last time step is equal to $v_{step} = 0.2$. The paths of forward Libor rates are simulated until $T_0 - v_{step}$ under the $T_4$ measure. We estimated the variances of $X_{ce}$ given $L(T_0 - v_{step})$. We plotted the square root of the maximum of these variances against $L_1(T_0 - v_{step})$.

Figure 5.5: Left: same figure as Figure 5.4, but now along with the case $v_{step} = 0.5$ (in red). Right: plot of the standard error of a delta against $v_{step}$ for two tolerance levels.

The influence of this effect on the standard error of our final Greek estimation can be seen in the right figure. Here we plotted the standard error for different vibrato parameters. From this last figure, on the right side of Figure 5.5, we see that the standard error explodes when choosing $v_{step}$ very small. The smallest $v_{step}$ in the figure was equal to 0.01; for smaller $v_{step}$ the standard error may be much bigger. Hence, for the standard error in this model setting it is optimal to choose $v_{step}$ as large as possible. However, as mentioned before, the total error also includes a bias. Since this bias becomes larger when a larger $v_{step}$ is chosen, it is possible that the largest possible $v_{step}$ is not the optimal choice. For the delta of the digital option this bias seems to be very small in comparison with the variance. For the sample sizes used it was optimal to choose the vibrato parameter as large as possible. So the vibrato parameter was taken equal to $T_0$ and in fact we are only using the LR method with the multivariate normal distribution function (see section 3.6).

Results digital delta

The first step to get an overview of the digital delta estimation is given in Table 5.3, where some estimated deltas are given. As expected, the errors of the approximations are larger than the errors of the caplet deltas. For the bump method we see that the deltas with respect to $P_{T_0}(0)$ and $P_{T_1}(0)$ are relatively poor, while the other deltas estimated with the bump method are quite satisfactory. For the LR method (see Remark 2) and the VMC method it is the other way around. Because of the large error of the deltas with respect to $P_{T_0}(0)$, $P_{T_1}(0)$ and $P_{T_2}(0)$, a more accurate vibrato approximation is generated with a smaller step size and tolerance level and a larger sample size (this improved version is the second VMC entry in Table 5.3). The accuracy improves, but at the cost of computing time. Results of the pathwise method, which is explained in section 3.8, are also added. This method has some disadvantages, but the result for this option is satisfactory. Furthermore it has the advantage that, besides the sample size, no additional parameters have to be chosen.

[Table 5.3; columns: error of digital delta, CPU time, samples; rows: Bump, VMC, VMC, LR (CPU time (19), $10^6$ samples), PS Kernel (CPU time (11), $10^6$ samples)]

Table 5.3: Errors of the approximations of the deltas estimated with various methods using $T_0 = 1$ and 5 time steps (except for the LR method, where 1 time step is used; therefore its CPU time is smaller and placed between brackets). The CPU time is measured in seconds. For the first and second VMC entries, respectively a tolerance level of 10 and 5 was used with $v_{step} = 0.5$ and $v_{step} = 0.3$. The rest of the settings can be viewed in section 5.1. The analytical nonzero values are: $\frac{\partial V_{\text{dig}}}{\partial P_{T_0}(0)} = $ and $\frac{\partial V_{\text{dig}}}{\partial P_{T_1}(0)} = $ .

The rest of this subsection is added to compare the methods for different parameters and to visualize the behaviour of the delta. First we plot the sensitivity of the digital with respect to $P_{T_0}(0)$ against the value $P_{T_0}(0)$ itself. This sensitivity is estimated with the LR method, the bump method and the VMC method. Five time steps are used for the discretization grid in the last two methods. We see that the LR method gives the best results in the least computing time.

Figure 5.6: Left: LR method. Right: Bump method with bump parameters $10^{-3}$ and . All estimators were computed with a sample size of .

As we can see in Figure 5.6, the LR estimator performs very well in comparison with the bump method. The bump method is applied for two different bump sizes: the estimated optimal bump size $10^{-3}$ (see Figure 5.3) and the bump size . This last extra bump parameter is chosen

so that we can see the effect of a bump size that is too small. The LR method needs less time for more accurate results (even if we also take one time step in the bump method). Similar results are plotted for the VMC method in Figure 5.7. The computing time was, due to the variable sample size for the estimator of the conditional expectation, between 12 and 20 seconds for each approximation. We see that the VMC method performs better than the bump method. The difference in performance with the LR method is relatively small; there are only visible differences at the fourth point from the left in Figure 5.7. The Kernel method also performs well.

Figure 5.7: Left: Kernel method with sample size . Right: VMC method with $v_{step} = 0.2$ and a sample size of . A tolerance level of 10 was used for the conditional expectation. This tolerance level partly determines the sample size used for the last time step.

To get a closer view of the method, we would have to zoom in. Instead of doing this, we plot the same figure for different values of $\sigma_{11}$. Now the delta lies inside the interval [7, 10] and at the same time we see the behaviour of the delta with a changing volatility parameter.

Figure 5.8: Left: LR method. Right: Bump method with bump parameters $10^{-3}$ and . All estimators were computed with a sample size of .

With this view in Figure 5.8 the difference between the methods is more visible than in the previous figures. For the bump method with bump size $10^{-3}$ it looks like there is some bias (probably due to a bump size that is too large), and for the bump method with bump size $10^{-4}$ we see a very large variance in comparison with the other methods. A small remark is that the optimal bump size may depend on the volatility parameter and therefore the chosen values are not optimal.
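The trade-off between bias and variance in the bump size can be illustrated on a toy problem. The sketch below is not the Libor market model of this chapter: it assumes a single driftless lognormal rate, a digital payoff $1_{\{L(T) > K\}}$, a central difference with common random numbers, and hypothetical parameter values. It returns the delta estimate together with its standard error, so that two bump sizes can be compared.

```python
import numpy as np

def bump_delta_digital(L0, K, sigma, T, h, n_samples, seed=0):
    """Central-difference delta of E[1_{L(T) > K}] with respect to L0.

    L(T) = L0 * exp(-0.5 * sigma^2 * T + sigma * sqrt(T) * Z) is a toy model;
    the same normal draws Z are used for the bumped-up and bumped-down paths.
    Returns (estimate, standard error).
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_samples)
    growth = np.exp(-0.5 * sigma**2 * T + sigma * np.sqrt(T) * z)
    up = ((L0 + h) * growth > K).astype(float)
    down = ((L0 - h) * growth > K).astype(float)
    per_path = (up - down) / (2.0 * h)    # nonzero only if the bump crosses the strike
    std_err = per_path.std(ddof=1) / np.sqrt(n_samples)
    return float(per_path.mean()), float(std_err)

# Hypothetical setting: at-the-money digital on a 5% rate with 20% volatility.
est_large, se_large = bump_delta_digital(0.05, 0.05, 0.2, 1.0, h=1e-3, n_samples=200_000)
est_small, se_small = bump_delta_digital(0.05, 0.05, 0.2, 1.0, h=1e-5, n_samples=200_000)
```

With the smaller bump only a few paths cross the strike, and each of them contributes a value of order $1/(2h)$, so the standard error grows as the bump shrinks; a larger bump lowers the variance at the price of a finite-difference bias.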

In Figure 5.9 we see the result for the VMC method. The method seems better than the bump method and the LR method. However, the VMC method is much slower than LR. Recall that the VMC method uses two sample sizes (one of which is random), and therefore the comparison is not completely fair. A disadvantage of the LR method is that it gets worse if the bias of the discretization scheme gets larger (this cannot be solved by choosing a finer discretization grid).

Figure 5.9: VMC method with $v_{step} = 0.2$ and a sample size of . A tolerance level of 10 was used for the conditional expectation.

Remark 2. For the LR method, the probability density function of the discretized approximation is used instead of the true probability density function (which is not known for the system of Libor rates under the forward measure). Hence the probability density function which is used is that of the lognormal distribution. The method coincides with the VMC method in section 3.6 if $v_{step}$ is taken equal to the expiry date of the option.

Remark 3. The bias of the LR method may be small because of the simplicity of the lognormal Libor market model as used in this chapter. Application of a stochastic volatility model or a time-dependent volatility function could negatively influence the performance of this method. In this case the VMC method may become a better option.

Remark 4. With a small vibrato step, the approximation of the conditional expectation (and its variance) can take a significant amount of computing time. It can be interesting to look for a Monte Carlo method which samples more in important regions, but does not need to approximate the variance as in this section, since this takes time. Note that the VMC method in this form does not need derivatives of the payoff function of the option.

5.3.3 Results digital vega

We start this result section with the estimates shown in Table 5.4. Recall that we use the Euler discretization scheme in this section. As expected, the bump method has a worse accuracy relative to its computing time. The VMC method as well as the LR method gives better vegas, and especially the LR method has a small computing time.

[Table 5.4; columns: error of digital vega $\nu_{11}$, $\nu_{21}$, $\nu_{31}$, $\nu_{41}$, CPU time; rows: Bump, VMC, LR]

Table 5.4: Errors of the vega approximations of the digital option for three different methods. For the bump and VMC methods a discretization scheme with 5 time steps is used. For the VMC method, the parameter $v_{step}$ was chosen equal to 0.5 and the tolerance level was set to 5. The expiry of the option was equal to $T_0 = 1$. The only nonzero analytical vega is $\frac{\partial V_{\text{dig}}}{\partial \sigma_{11}} = $ .

In the table we cannot really see which part of the error originates from the bias and which part originates from the variance. We compare the LR and VMC methods with the help of some figures and get insight into the influence of the vibrato parameter. To do this we first consider plots of the digital vega for different values of the initial bond price $P_{T_0}(0)$. The hypothesis is that the LR method will become worse if the discretization error gets larger.

Figure 5.10: Two plots of an LR, VMC and analytic vega ($\frac{\partial V_{\text{dig}}}{\partial \sigma_{11}}$). The numerical methods were applied under the $T_4$ forward measure. We used a sample size of $10^6$ for the likelihood ratio vega. For the VMC vega a sample size of  was used with a tolerance level of 10, the minimal sample size for the conditional expectation was equal to $M_v = 10$ and the step size in the pathwise part was equal to 0.2 (so that the two methods need approximately an equal amount of time). In the left plot the expiry $T_0$ is equal to 1; in the right plot the same vega is plotted for an expiry of $T_0 = 5$.

In contrast with the delta, we see in the left plot of Figure 5.10 that the VMC method performs at least as well as the LR method. However, to get a more convincing result in favour of the VMC method, we increased the expiry of the digital from one to five so that the LR method, which only uses one time step, will have a larger bias. After doing this the weakness of the LR method became visible in the plot on the right side of Figure 5.10. The LR estimators certainly have a positive bias, while the vibrato estimators perform quite well in comparison with the LR vegas. Hence it is not optimal to choose the vibrato parameter as large as possible. What can we say about the magnitude of the vibrato parameter? Due to the small sample size used to estimate the MSE, Figure 5.11 is noisy, but it is clear enough to draw some conclusions. For a small $v_{step}$ we expect that the variance of the vega estimator, and

therefore the mean square error, becomes very large.

Figure 5.11: Left: Mean square error of the vega estimator for two different tolerance levels plotted against the vibrato parameter used. The sample size for the conditional expectation was at least 10, and otherwise determined by an estimated variance and the tolerance level. The sample size of the pathwise part was equal to $10^4$ and a sample size of 40 was used to estimate the MSE. Right: Computing times which were needed for the estimated mean square errors corresponding to the left plot.

First consider the green line corresponding to a tolerance level of 10. We see that the MSE increases fast when $v_{step}$ gets smaller, but does not explode. This is because of the tolerance level for the standard error of the conditional expectation: it will not get larger than a certain level. However, the computation time now explodes as a result of controlling this variance, which is an undesired result. To check this hypothesis we also plotted the result for a higher tolerance level. As expected the MSE for a small $v_{step}$ is larger. Because of this error or large computing time it is not optimal to choose a small $v_{step}$ with a small tolerance level. Since the variance of the conditional expectation reduces when we choose a larger vibrato step, the MSE first decreases if $v_{step}$ is chosen somewhat larger. This decrease lasts until the bias of the LR part becomes significant in comparison with the variance of the estimator. The variance is lower for the estimator with the small tolerance level, and therefore the contribution of the bias is visible earlier in the green line. From this point ($v_{step} \approx 0.3$) the MSE starts increasing because of the bias of the LR part, hence it increases while $v_{step}$ gets larger. The same pattern can be seen for the blue line. This line starts increasing at a later stage because the variance dominates the MSE more than it does with a lower tolerance level.

Besides the vibrato step we can also choose the tolerance level. However, from this figure it seems that a low tolerance level does not have a benefit, since the computing time explodes. The improvement in terms of MSE when using a smaller tolerance level may also be reached with a larger sample size for the pathwise part. Without a tolerance level, the variance of the conditional expectation does not have to be estimated, which gives additional time savings. Then, for the settings used in this subsection, $v_{step}$ has to be chosen around 1.2.

5.3.4 Results gamma

We only consider the LR method and the VMC method for the digital gamma:

[Table 5.5; columns: error of gamma $\Gamma_{00}$, $\Gamma_{01}$, $\Gamma_{02}$, $\Gamma_{03}$, $\Gamma_{04}$, CPU time, samples, tol; rows: VMC, VMC, LR (CPU time (26))]

Table 5.5: Absolute errors of the approximations of the gammas with the VMC method (5 time steps, $v_{step} = 0.5$) and the LR method using 1 time step (therefore the CPU time is low). The CPU time is measured in seconds. The nonzero analytic gammas are $\frac{\partial^2 V_{\text{dig}}}{\partial P_{T_0}(0)^2} = $ and $\frac{\partial^2 V_{\text{dig}}}{\partial P_{T_0}(0)\,\partial P_{T_1}(0)} = $ .

From Table 5.5 we see that the LR method has the best performance: in 26 seconds the LR method gives results similar to those of the VMC method in 8524 seconds. This difference in CPU time is partly due to the number of time steps used in the VMC method. For larger expiry dates, the LR method will have a larger bias. This bias becomes visible by choosing a larger $T_0$ and plotting the results in a figure. To do this, we estimate the gammas for different initial bond prices; the results are shown in Figure 5.12.

Figure 5.12: Results of the gamma $\frac{\partial^2 V_{\text{dig}}}{\partial P_{T_0}(0)^2}$, the second derivative of the price of a digital option with respect to $P_{T_0}(0)$, plotted against the initial value of $P_{T_0}(0)$ itself. The methods were applied under the $T_4$ forward measure. Left: LR method. Right: VMC method with $v_{step} = 2$. For the pathwise part a time step of 1.5 was used. The tolerance level $tol$ bounds the standard error of the conditional expectations in the vibrato part using a variance estimation obtained from 20 samples. Two tolerance levels were used: a tolerance level of 50 (triangles) and a tolerance level of 10 (squares).

In Figure 5.12 we see that the LR method as well as the VMC method works for the digital gamma (here $v_{step}$ was not chosen optimally but pragmatically). These results are very similar to those for the other Greeks. The LR method has a small bias (the points are all translated to the left with respect to the analytic solution) and the VMC method yields a higher variance. First a high tolerance level was used, which causes the variance of the vibrato results marked with the triangles. To confirm that the errors were caused by the variance, a lower tolerance level was used. The result (the squares in the right plot of Figure 5.12) is a satisfactory approximation without a visible bias. However, the execution time of this last approximation was very high.

Remark 5. The computing time for the vibrato gammas is much larger than for the vibrato deltas. Variance reduction techniques for estimating the conditional expectation in the last time step become more interesting in this case.

Remark 6. For all Greeks it is not clear how to choose v_step and the tolerance level if the analytic solution is not known. Therefore it might be more interesting to reduce the bias of the LR method by using a probability density function which follows from a more advanced discretization scheme.

5.4 Variance properties

Until now we focused on the Greeks of options which expire at $T_0$ with respect to $P_{T_0}(0)$ and $P_{T_1}(0)$. Although we know that the analytical deltas of the caplet and digital option considered earlier with respect to $P_{T_2}(0), P_{T_3}(0), \ldots, P_{T_m}(0)$ are zero, we are also interested in the outcome of the numerical methods for these deltas. These deltas are created artificially by performing the Monte Carlo pricing under the $T_m$ measure, but they will be needed once we consider options depending on more than one Libor. This section gives rise to an unexpected (and undesired) result: the estimates of these deltas behave differently in terms of standard errors.

Table 5.6: Standard deviation of caplet delta estimates using a sample size of $10^6$ and $T = 1$. Methods: bump, LR, PS, VMC (with CPU time).

In Table 5.6 we see that the standard deviation of the estimators is equal for the bump and PS method in this setting. The LR method is the fastest, but has a larger standard error. In particular the derivatives with respect to $P_{T_2}(0)$, $P_{T_3}(0)$ and $P_{T_4}(0)$ have a large variance in comparison with the PS and bump method. The VMC method also has a larger variance. In the case of the digital option, in Table 5.7, we have the same problem for the LR method. However, the digital option payoff has a discontinuity in $L_1$. This affects the derivative approximation with respect to $P_{T_0}(0)$ and $P_{T_1}(0)$ in particular. For the derivatives which are denoted by 0 and 1 in Table 5.7, the standard deviations are larger for the bump estimates than for the LR or VMC estimates. Because the standard deviation differs per delta, it is advantageous, in the case of the digital option, to estimate the derivatives with respect to $P_{T_0}(0)$ and $P_{T_1}(0)$ with the LR method and the other deltas with the bump method, using different sample sizes. The VMC method can also be used instead of LR, but this method is harder to implement, slower and depends on some parameter choices. In the case of the caplet, the PS method is the best choice.

Table 5.7: Standard errors of digital delta estimates using a sample size of $10^6$ and $T = 5$. Methods: bump($10^{-3}$), bump($10^{-4}$), LR, VMC (with CPU time).
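The growth of the bump standard deviation for a discontinuous payoff can be illustrated with a minimal sketch. It is not the thesis setup: it assumes a single driftless lognormal rate simulated in one step, an undiscounted digital payoff, and hypothetical parameter values. The same normal draws are reused for the base and bumped paths, so a per-path difference is non-zero only when the bump pushes the rate across the strike.

```python
import numpy as np

def bump_digital_delta_samples(l0, strike, sigma, t, h, z):
    """Per-path forward-difference delta samples for the payoff 1{L(T) > K}.

    Illustrative assumption: one driftless lognormal step, with the same
    normal draws z used for the base path and the bumped path.
    """
    growth = np.exp(-0.5 * sigma**2 * t + sigma * np.sqrt(t) * z)
    base = (l0 * growth > strike).astype(float)
    bumped = ((l0 + h) * growth > strike).astype(float)
    return (bumped - base) / h

rng = np.random.default_rng(0)
z = rng.standard_normal(400_000)

# Hypothetical parameters: L(0) = K = 0.05, volatility 0.2, maturity 1.
sd_by_bump = {
    h: bump_digital_delta_samples(0.05, 0.05, 0.2, 1.0, h, z).std(ddof=1)
    for h in (1e-4, 1e-5)
}
for h, sd in sd_by_bump.items():
    print(f"bump size {h:g}: standard deviation {sd:.1f}")
```

In this toy the probability of crossing the strike is proportional to h while each crossing contributes 1/h, so the standard deviation grows roughly like 1/sqrt(h) as the bump size shrinks.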

5.5 Improvement by predictor-corrector scheme

Until now, all results were based on the Euler discretization scheme. In earlier sections we showed how the Euler based LR and PS methods can be improved with the help of the predictor-corrector (PC) scheme. In this section we compare delta estimators obtained with the Euler scheme with estimators for the same Greeks obtained with the PC scheme.

Bump method

Figure 5.13: The improvement of the PC scheme for the bump method with bump size [...]. The circles are generated using the PC scheme, while for the triangles Euler was used. For all estimators a sample size of $10^5$ was used.

In Figure 5.13 we see that the bump method performs better if the PC discretization is applied instead of the Euler discretization. This is also the case for the PS method, as can be seen in Figure 5.14. For the pathwise sensitivity method we have used a larger sample size to reduce the standard error of the estimator, which makes the improvement of the PC discretization scheme more visible.

PS method

Figure 5.14: The improvement of the PC scheme for the pathwise sensitivity method. The circles are generated using the PC scheme, while for the triangles Euler was used. For all estimators a sample size of $10^6$ was used.
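To make the difference between the two schemes concrete, the following sketch shows one log-Euler step and one predictor-corrector step for a vector of lognormal forward rates. It assumes the standard terminal-measure drift of the lognormal Libor market model and a drift that is averaged between the start and the predicted end of the step; the function names, the indexing convention and the numbers are illustrative and need not match the thesis.

```python
import numpy as np

def terminal_measure_drift(libors, vols, deltas):
    """Drift of each forward rate under the terminal measure (assumed form).

    libors: (m,) rates, vols: (m, d) volatility rows, deltas: (m,) accruals.
    mu_i = -sum_{j > i} delta_j L_j / (1 + delta_j L_j) * <vol_i, vol_j>
    """
    weights = deltas * libors / (1.0 + deltas * libors)
    cov = vols @ vols.T                 # instantaneous covariances
    upper = np.triu(cov, k=1)           # keep only the terms with j > i
    return -upper @ weights

def log_euler_step(libors, vols, deltas, dt, z, drift=None):
    """One log-Euler step; the drift is frozen at the supplied value."""
    if drift is None:
        drift = terminal_measure_drift(libors, vols, deltas)
    variance = np.sum(vols**2, axis=1)
    return libors * np.exp((drift - 0.5 * variance) * dt
                           + np.sqrt(dt) * (vols @ z))

def predictor_corrector_step(libors, vols, deltas, dt, z):
    """Predict with the initial drift, then redo the step with the average
    of the initial and predicted drifts, reusing the same normal draws."""
    drift_start = terminal_measure_drift(libors, vols, deltas)
    predicted = log_euler_step(libors, vols, deltas, dt, z, drift_start)
    drift_end = terminal_measure_drift(predicted, vols, deltas)
    return log_euler_step(libors, vols, deltas, dt, z,
                          0.5 * (drift_start + drift_end))

# Hypothetical inputs: three rates, lower triangular volatility matrix.
rng = np.random.default_rng(1)
libors = np.array([0.030, 0.035, 0.040])
vols = np.array([[0.20, 0.00, 0.00],
                 [0.15, 0.10, 0.00],
                 [0.10, 0.10, 0.10]])
deltas = np.ones(3)
z = rng.standard_normal(3)

euler = log_euler_step(libors, vols, deltas, 1.0, z)
pc = predictor_corrector_step(libors, vols, deltas, 1.0, z)
```

Under this convention the last rate is driftless, so both schemes give the same value for it; the schemes differ only for rates whose drift depends on the simulated state.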

LR method

With the LR method we are forced to use only one step. Therefore, a discretization method with a low bias is even more important. Since we derived our own way to apply the LR method in combination with the PC discretization scheme, we would like to test this idea. From section 5.4 we know that the variance of the LR estimators for the caplet is high in comparison with the variances of the bump and PS estimators. Therefore we use a sample size of [...]. In Figure 5.15 we see that the LR method in combination with the PC discretization reduces the error in comparison with the Euler version of the LR method.

Figure 5.15: Plot of the analytic and LR caplet delta with Euler and the predictor-corrector discretization. The numerical methods were applied under the $T_4$ forward measure. We used a sample size of [...].
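As a reminder of what a one-step LR estimator looks like, here is a minimal sketch for the simplest possible case. It assumes a single driftless lognormal rate, an undiscounted digital payoff and hypothetical parameters; it is not the multi-rate Euler or PC based estimator of the thesis. The payoff is multiplied by the score of the lognormal transition density with respect to the initial rate, and the result is compared with the closed-form derivative.

```python
import math
import numpy as np

def lr_digital_delta(l0, strike, sigma, t, z):
    """Likelihood ratio estimate of d/dl0 E[1{L(T) > K}] for one step.

    L(T) = l0 * exp(-0.5*sigma^2*t + sigma*sqrt(t)*Z); the score of its
    density with respect to l0 is Z / (l0 * sigma * sqrt(t)).
    Returns the estimate and its standard error.
    """
    l_t = l0 * np.exp(-0.5 * sigma**2 * t + sigma * math.sqrt(t) * z)
    score = z / (l0 * sigma * math.sqrt(t))
    samples = (l_t > strike) * score
    return samples.mean(), samples.std(ddof=1) / math.sqrt(len(z))

def analytic_digital_delta(l0, strike, sigma, t):
    """Derivative of N(d2) with respect to l0 for the same toy model."""
    d2 = (math.log(l0 / strike) - 0.5 * sigma**2 * t) / (sigma * math.sqrt(t))
    pdf = math.exp(-0.5 * d2**2) / math.sqrt(2.0 * math.pi)
    return pdf / (l0 * sigma * math.sqrt(t))

rng = np.random.default_rng(0)
z = rng.standard_normal(400_000)
estimate, std_err = lr_digital_delta(0.05, 0.05, 0.2, 1.0, z)
exact = analytic_digital_delta(0.05, 0.05, 0.2, 1.0)
print(f"LR estimate {estimate:.3f} (s.e. {std_err:.3f}), analytic {exact:.3f}")
```

The payoff itself is never differentiated, which is why the approach works for discontinuous payoffs; the price is the extra variance introduced by the score term.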

5.6 Conclusion

The PS method is the best method if it can be applied. It is fast, has relatively low variance (compared to vibrato or LR) and its bias can be controlled by choosing the number of time steps. If the PS method cannot be used (deltas and vegas of digital options, gammas of non-differentiable options), it can be extended to the kernel or VMC method. The results of the kernel method (which was only tested for the delta) are promising, but we have not considered the method for Bermudan options. The VMC method was slow, can give high variances and is difficult to implement. Besides that, we do not see how to apply this method to Bermudan type contracts.

If the PS method may be applied, the bump method gives results similar to the PS method, provided the bump size is taken small enough. The advantage of the PS method is that it is much faster than the bump method. The bump method can also be used for digital options. However, this can give variance problems if the bump size is not chosen appropriately.

The LR method also has some disadvantages. For continuous payoffs it has a high variance in comparison with the other methods. Therefore we only want to use it for payoffs with a discontinuity. Then we still have a problem with the bias: because the ordinary LR method uses only one Euler step, the bias can be large. To overcome this problem we derived in a previous chapter an LR method which is based on the PC scheme instead of the Euler scheme (section 3.3.5). Furthermore, we showed in an earlier section how to apply the LR method to Bermudan options. Because of the high variances we will use European options (for which the payoff is known) as control variates for the Bermudan options of the next chapter.

The PC scheme can also be applied within the PS and bump method. In this chapter we have seen the improvement. Since the bias is so small when the PC scheme is used, we will use only one time step for each tenor interval in the next chapter. Choosing more time steps for each tenor interval would be too expensive if we have to simulate over several tenor intervals.

Chapter 6 Results cancellable options

6.1 Introduction

In this chapter we discuss some results of the Greek calculations for cancellable options. First we give the standard settings which are used. These settings differ from the European settings because we simulate further in time. Besides that, we use more realistic settings than before. After that, we consider two contracts: the cancellable swap and the cancellable range swap. The sections about these contracts have a similar structure:

- Comparing the standard deviation of several estimators in a standard setting. The standard deviation can be estimated relatively easily in comparison with the bias. We reduce the standard error of the estimator by using the control variate introduced earlier. Once the standard errors are known, we know how large the sample size should be to compare the estimators in the next subsection.
- Estimating the Greeks in standard settings using different methods.
- Comparing the standard errors for some other settings, because the standard error of the estimator strongly depends on the settings.

During this chapter we wish to answer three questions:

- Do all the methods give the same result?
- When should we use a particular method?
- May we choose (in the case of the bump method) the same decision time for the bumped path (explained in section 6.2) as was obtained from the original path?

The reason why the first two questions are interesting is clear. The third question is of practical interest: practitioners who use the bump method could apply the answer directly, and those who already use this variant of the bump method may wish to know when it is allowed. More about this in the next section.

6.2 Bumping with fixed decision times

Practitioners sometimes fix the exercise time determined by the original path and reuse it in the bumped path, that is, the path obtained by using the same random variables but with a parameter incremented by a bump size h. This reduces the variance of the estimator, but is it allowed? From now on we denote this variation of the bump method by the bump* method. Our hypothesis is that it is certainly allowed for all payoffs where the pathwise sensitivity method can be applied. The argument is that, if the pathwise sensitivity method is allowed, we have:

\[
\frac{\partial V(0, L(0))}{\partial \theta}
= \frac{\partial}{\partial \theta} E^m\Big[ \sum_{i=1}^{\tau} g_i(\hat L(T_{i-1})) \,\Big|\, L(0) \Big]
= E^m\Big[ \sum_{i=1}^{\tau} \frac{\partial}{\partial \theta} g_i(\hat L(T_{i-1})) \,\Big|\, L(0) \Big] \qquad (6.1)
\]
\[
= E^m\Big[ \sum_{i=1}^{\tau} \frac{\partial g_i}{\partial L}(\hat L(T_{i-1})) \, \frac{\partial \hat L(T_{i-1})}{\partial \theta} \,\Big|\, L(0) \Big]. \qquad (6.2)
\]

In the pathwise sensitivity method we calculate $\frac{\partial}{\partial \theta} g_i(\hat L(T_{i-1}))$ analytically by using the multidimensional chain rule and the derivatives $\frac{\partial g_i}{\partial L}(\hat L(T_{i-1}))$ and $\frac{\partial \hat L(T_{i-1})}{\partial \theta}$. However, the derivative $\frac{\partial}{\partial \theta} g_i(\hat L(T_{i-1}))$ can also be estimated by finite differences. If we do this, we have exactly the bump* method, where we use the same decision time for the bumped paths. Sometimes the differentiation and expectation operators cannot be interchanged, for example if g is discontinuous. There is no theorem that states if or when it is allowed to choose the same exercise regions for the bumped paths as for the original paths. Sometimes we will also give the bump* estimator to see if it works for the particular settings.

6.3 Test setting

In the European setting we only simulated the forward Libor rates until $T_0$, but for Bermudan options we have to simulate until time point $T_{m-1}$. The matrix $\sigma(t)$ is piecewise constant over each tenor interval, hence we have to give an input for this matrix at each $t = 0, \ldots, T_{m-2}$. Each time a tenor date has passed, the number of driving Brownian motions drops by one, so zero rows and columns appear in the matrices of volatility parameters. We resize all necessary vectors and matrices in the computation, but we do not resize the matrices in the thesis for notational reasons.

For this chapter we choose our volatility parameters more carefully. The correlation between the forward Libor rates in the European setting is low, which is unrealistic. For this reason we choose new, more realistic volatility parameters. These parameters are chosen such that the variance of every underlying forward Libor rate at time point $T_i$, given the rate at $T_{i-1}$, is between $0.2\delta_i$ and $0.3\delta_i$, where $\delta_i$ is the length of the i-th tenor interval. The correlation between two forward Libor rates corresponding to neighboring tenor intervals is around 0.88, and the correlation between $L_k$ and $L_{k-1}$ is higher than the correlation between $L_k$ and $L_i$ for all $i < k-1$, for all k. Furthermore, we note that a lower triangular matrix is sufficient to establish all correlations between the Libor rates. First we choose the matrix $\sigma(t)$ for the interval $[0, T_0]$: $\sigma(t_0) = [\ldots]$,

Since we do not use market data we use a standard procedure to construct $\sigma(t_{k+1})$ from $\sigma(t_0)$. For $i \in \{2, \ldots, m\}$, we multiply row i with a value so that the variance of the stochastic process for the i-th forward Libor rate stays around the same value:
\[
\sigma_{ij}(T_1) = \frac{\sum_{k=1}^{m} \sigma_{i,k}(0)}{\sum_{k=2}^{m} \sigma_{i,k}(0)} \, \sigma_{ij}(0), \qquad i, j \in \{2, \ldots, m\}.
\]
Similarly, if we move from the tenor interval $(T_{l-1}, T_l]$ to $(T_l, T_{l+1}]$, we set $\sigma_{ij}(T_l)$ by using the equation:
\[
\sigma_{ij}(T_l) = \frac{\sum_{k=1}^{m} \sigma_{i,k}(0)}{\sum_{k=l+2}^{m} \sigma_{i,k}(0)} \, \sigma_{ij}(0), \qquad i, j \in \{l+1, \ldots, m\}.
\]
The initial bond prices have also changed to:
\[
[\, P_{T_0}(0) \ \ldots \ P_{T_m}(0) \,] = [\ldots]. \qquad (6.3)
\]
The initial vector of forward Libor rates L(0) is determined by these bond prices. In our standard setting we choose a vector of strikes $K = 0.9\,L(0)$. For the options we consider, the derivatives of the initial option price with respect to L(0) or $P_{T_0}(0)$ are equal to the same delta of the European option with the same payoff at $T_1$ and can be calculated analytically. Therefore we do not consider these deltas.

6.4 Causes of errors

Before we start with estimating the Greeks, we sum up the possible errors:

- Standard deviation of our estimator.
- Bias caused by the discretization grid.
- Bias caused by the bump parameter.
- Bias caused by non-optimal exercise regions.

For every Monte Carlo method we estimate an expectation of a certain random variable $X^{[\text{method}]}$. Before we estimate a Greek we estimate the standard deviation of $X^{[\text{method}]}$ for all methods of interest. We call this the standard deviation of the estimator and denote it by $\mathrm{sd}(X^{[\text{method}]})$. If we estimate the Greek by using a sample size of M, the estimated standard error of the estimate is given by:
\[
\frac{\mathrm{sd}(X^{[\text{method}]})}{\sqrt{M}}.
\]
Hence this error can be reduced by choosing a larger sample size or by using a variance reduction technique.

After we have estimated a Greek we look at the difference between the methods and compare it with the standard deviation of the difference of the estimates:
\[
\mathrm{sd}(X^{\text{methodA}} - X^{\text{methodB}}) = \sqrt{\mathrm{Var}(X^{\text{methodA}} - X^{\text{methodB}})} = \sqrt{\mathrm{Var}(X^{\text{methodA}}) + \mathrm{Var}(X^{\text{methodB}})} = \sqrt{\big(\mathrm{sd}(X^{\text{methodA}})\big)^2 + \big(\mathrm{sd}(X^{\text{methodB}})\big)^2}.
\]
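The comparison described above can be sketched as a small helper. The function names and the numbers are hypothetical. The formula adds the two variances without a covariance term, which corresponds to treating the two estimators as uncorrelated; the sketch makes the same assumption and uses the same sample size M for both methods.

```python
import math

def standard_error(sd, sample_size):
    """Standard error of a Monte Carlo mean: sd(X) / sqrt(M)."""
    return sd / math.sqrt(sample_size)

def difference_in_standard_errors(est_a, sd_a, est_b, sd_b, sample_size):
    """|est_a - est_b| expressed in standard errors of the difference.

    Assumes uncorrelated estimators and a common sample size, so that
    sd(X_a - X_b) = sqrt(sd_a^2 + sd_b^2).
    """
    sd_diff = math.sqrt(sd_a**2 + sd_b**2)
    return abs(est_a - est_b) / standard_error(sd_diff, sample_size)

# Hypothetical numbers: two estimates of the same Greek from 10^6 paths.
ratio = difference_in_standard_errors(1.00, 3.0, 1.02, 4.0, 10**6)
print(f"difference is {ratio:.1f} standard errors")
```

A ratio of a few standard errors or more suggests that the difference is unlikely to be explained by variance alone and may point to a bias in one of the methods.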

The bias caused by the discretization grid is reduced by the use of the predictor-corrector discretization scheme (PC scheme) or by choosing a smaller discretization step. In section 5.5 we have seen that this error is very small for the European options we considered. All estimators share this type of error. The third type of error only applies to the bump method: if the bump parameter is taken too large, we expect a larger bias. Since the exercise regions are estimated, this estimation will also contribute an extra bias to the Greek estimation. In all methods we replace the optimal exercise region by an approximated optimal exercise region. This can result in unexpected differences between the estimators, even if the same exercise regions are used.

6.5 Cancellable swap

Recall that the total payoff of the cancellable swap was given by:
\[
g^{CSwap}(L(T_0), \ldots, L(T_{\tau-1})) = \sum_{i=1}^{\tau} g_i^{CSwap}(L_i(T_{i-1})) = \sum_{i=1}^{\tau} \frac{P_{T_i}(T_{i-1})}{P_{T_m}(T_{i-1})} \, N \delta_i \, (L_i(T_{i-1}) - K).
\]
We estimated the price by using a sample size of $10^6$ for the Longstaff-Schwartz simulation as well as for calculating the Monte Carlo price. The price was: $V^{cswap}(0, L(0)) = [\ldots]$. The standard error of this estimate (by using constant exercise regions) was estimated by: [...].

Delta

Standard errors of different methods

First we compare the standard error of the PS, LR and bump estimators in the standard setting. This is done for three reasons. The first is that we have to know whether it is likely that the difference between the estimators is caused by the standard deviations of the estimators. The second reason is that we can see the impact of the control variate. The third reason is that we can compare the various methods in terms of their variance.

Table 6.1: Estimated standard deviation of cancellable swap delta estimators using a sample size of $10^6$ and $T_m = 5$. For the Longstaff-Schwartz algorithm and the calculation of the expected exercise time, a sample size of $10^5$ was used. Methods: bump($10^{-2}$) PC, bump($10^{-3}$) PC, LR PC, PS PC.

Without the value of the delta itself we cannot say whether the standard errors in Table 6.1 are high or not. However, we can say that the standard errors for the LR estimator are high in comparison with the bump or PS estimators. Even if we take the speed of the LR method into account, the bump as well as the pathwise method perform much better in terms of variance. In Table 6.2, the same values are generated for the same methods, only now with use of the control variate. The control variate has the most impact for the PS and the LR method. The standard errors of the LR estimators are reduced by more than a factor of two.
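The effect of a control variate can be illustrated with a generic sketch. It does not reproduce the European-option control variate of the thesis: the toy below assumes a single driftless lognormal variable, uses the variable itself (whose mean is known) as control for a call-type payoff, and estimates the coefficient from the same samples, which introduces a small bias that vanishes as the sample grows. All names and numbers are illustrative.

```python
import numpy as np

def control_variate_samples(x, y, y_mean):
    """Return x - beta * (y - y_mean), with beta = Cov(x, y) / Var(y).

    x: samples of the quantity of interest,
    y: samples of a correlated control with known expectation y_mean.
    """
    cov = np.cov(x, y, ddof=1)
    beta = cov[0, 1] / cov[1, 1]
    return x - beta * (y - y_mean)

rng = np.random.default_rng(2)
z = rng.standard_normal(100_000)

# Toy model: y is lognormal with E[y] = 1; x is a call-type payoff on y.
y = np.exp(-0.5 * 0.2**2 + 0.2 * z)
x = np.maximum(y - 0.9, 0.0)

adjusted = control_variate_samples(x, y, 1.0)
print(f"sd without control variate: {x.std(ddof=1):.4f}")
print(f"sd with control variate:    {adjusted.std(ddof=1):.4f}")
```

The adjusted samples have the same expectation as the original ones (up to the small bias from estimating beta), while their standard deviation falls by a factor that depends on the correlation between payoff and control.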

Table 6.2: Estimated standard deviation of cancellable swap delta estimators using a sample size of $10^5$ and $T_m = 5$. CV means that the control variate is used. The expected exercise time in this setting was estimated by $\tau = 3$. For the Longstaff-Schwartz algorithm and the calculation of the expected exercise time, a sample size of $10^5$ was used. Methods: bump($10^{-2}$) CV PC, bump($10^{-3}$) CV PC, LR CV PC, PS CV PC.

CPU times

One of the disadvantages of the bump method is that it is slow. In Table 6.3 we see the computing time to estimate the deltas. Computing the deltas with the bump method takes more or less the same time for all bump parameters. The method is almost three times slower than the LR method and almost four times slower than the PS method. Later we will see that the bump method is even slower if more Libors are involved.

Table 6.3: CPU time (seconds) of each method (bump, LR, PS) using $10^5$ paths. For all methods the control variate and the PC scheme are used.

Estimates

To test the quality of our estimators we have estimated the deltas for the standard setting in Table 6.4.

Table 6.4: Deltas of cancellable swap using $10^7$ paths. For the Longstaff-Schwartz algorithm and the calculation of the expected exercise time, a sample size of $10^6$ was used. For all methods, the same seed was used. Methods: bump($10^{-2}$) CV PC, bump($10^{-3}$) CV PC, bump($10^{-4}$) CV PC, LR CV PC, PS CV PC.

The results in Table 6.4 show that almost all estimators give more or less the same value. Only the bump(0.01) estimator differs, but this is not a surprise given the large bump size. A question is whether the difference between the LR and PS estimator is explained by the standard error. In Table 6.5 we see that the absolute differences between the LR and PS estimators are high in comparison with the standard errors of this difference.
Because of this difference we estimate the LR delta with more paths in Table 6.6, but since the difference between the pathwise and LR estimator does not decrease it is likely that it comes from a bias.

Table 6.5: Absolute differences of the LR and PS estimators for the cancellable swap delta (from Table 6.4) compared to the standard error of the difference of the estimators.

Table 6.6: Difference between LR and PS. Now the deltas are estimated with [...] paths for the LR method instead of $10^7$. The PS estimates are the estimates of Table 6.4.

The difference between the bump($10^{-3}$) and the pathwise estimator is also too high to claim that it is caused by variance. Therefore we plotted the bump and PS estimators in Figure 6.1 for different values of the initial forward Libor rates. We denote the vector of initial Libor rates under the standard setting by $L_{std}$ and estimate the delta with the initial forward Libor rates $L(0) = \alpha L_{std}$, where $\alpha$ ranges between 0.6 and 1.2. The bump($10^{-3}$) estimator is calculated for two seeds, which give the same result, so it is plotted only once. The red, dotted line of the bump($10^{-3}$) estimator appears to have a positive bias compared with the PS estimator. This indicates that the error is indeed caused by a bias. If we choose a smaller bump size (red dots) we see, for most points, that the bias is smaller. But recall that this choice introduces a larger variance. Besides the standard bump method, we consider the bump* method, where we use the same exercise dates for the bumped paths. These estimators are given by the black dots in Figure 6.1. We see that this estimator almost equals the pathwise estimator. This agrees with our hypothesis in section 6.2.

Although the LR estimator is not the method of choice for a payoff where the pathwise sensitivity method can be applied, we would like to test the method for the cancellable swap delta. We have seen that the standard error for this method is relatively large, and we suspect a bias in the LR method or the PS method (or both). In Figure 6.2 the PS estimator is plotted together with the LR estimator of the cancellable swap delta for two seeds. The difference between the LR estimators shows the variance of the method. The PS and bump* methods give the best results, and since the PS method is faster than the bump* method, it is the method of choice.
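The relation between the bump, bump* and pathwise estimators can be made concrete with a toy model. The sketch below is not the Libor market model of the thesis: it assumes a single driftless lognormal rate observed at two dates, a payoff that sums (L_i - K) up to the decision time, and a hypothetical fixed exercise rule instead of a Longstaff-Schwartz estimate. The bumped paths reuse the same random numbers; bump recomputes the decision time on the bumped path, while bump* keeps the decision time of the original path.

```python
import numpy as np

def simulate(theta, sigma, z):
    """Two driftless lognormal steps started at theta; returns (n, 2) rates."""
    steps = np.exp(-0.5 * sigma**2 + sigma * z)
    return theta * np.cumprod(steps, axis=1)

def exercise_time(rates, barrier):
    """Hypothetical rule: cancel after the first payment if L_1 < barrier."""
    return np.where(rates[:, 0] < barrier, 1, 2)

def payoff(rates, strike, tau):
    """Sum of (L_i - K) over the payment dates i = 1, ..., tau."""
    flows = rates - strike
    return flows[:, 0] + np.where(tau == 2, flows[:, 1], 0.0)

# Hypothetical parameters.
theta, sigma, strike, barrier, h = 0.05, 0.2, 0.045, 0.048, 1e-4
rng = np.random.default_rng(3)
z = rng.standard_normal((200_000, 2))

base = simulate(theta, sigma, z)
bumped = simulate(theta + h, sigma, z)      # same random numbers
tau = exercise_time(base, barrier)
tau_bumped = exercise_time(bumped, barrier)

# bump: decision time recomputed on the bumped path.
bump = (payoff(bumped, strike, tau_bumped) - payoff(base, strike, tau)) / h
# bump*: decision time of the original path reused.
bump_star = (payoff(bumped, strike, tau) - payoff(base, strike, tau)) / h
# pathwise: chain rule with dL_i/dtheta = L_i / theta in this toy model.
pathwise = (base[:, 0] + np.where(tau == 2, base[:, 1], 0.0)) / theta

for name, s in (("bump", bump), ("bump*", bump_star), ("pathwise", pathwise)):
    print(f"{name:9s} mean {s.mean():.4f}  sd {s.std(ddof=1):.4f}")
```

Because the payoff is linear in the rates here, the bump* samples coincide with the pathwise samples path by path. The standard bump samples differ on the paths where the bump changes the decision time, which produces occasional large values and a higher standard deviation.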

Figure 6.1: Estimates of cancellable swap delta calculated with the bump and PS method and sample size [...]. For the bump method a bump parameter of $10^{-3}$ and $10^{-4}$ is used. The bump* method is a variation of the original bump method where we use the same exercise dates for the bumped paths as for the original paths.

Figure 6.2: Estimates of cancellable swap delta calculated with the LR and PS method and sample size [...]. For the LR method we have used two different seeds. The variation in the LR plots is caused by the standard error: for the sensitivities with respect to $P_{T_2}(0)$, $P_{T_3}(0)$ and $P_{T_4}(0)$ the standard error (with help of Table 6.2) is around $10/\sqrt{10^6} = 0.01$. The deltas with respect to $P_{T_1}(0)$ and $P_{T_5}(0)$ are larger and have a smaller standard deviation, so the relative difference with the PS estimator is smaller.

Vega

The vega of the cancellable swap can be estimated with the same methods as were used for the cancellable swap deltas. We start with the same estimation of the standard deviations of the estimators. Results are presented in Table 6.7 (without the use of the control variate) and Table 6.8 (with the use of the control variate).

Standard deviation of different estimators

Table 6.7: Estimated standard deviation of cancellable swap vega estimators using a sample size of $10^6$ and $T_m = 5$. For the Longstaff-Schwartz algorithm and the calculation of the expected exercise time, a sample size of $10^5$ was used. Columns: $\nu_{21}$, $\nu_{31}$, $\nu_{41}$, $\nu_{51}$. Methods: bump($10^{-2}$) PC, bump($10^{-3}$) PC, LR PC, PS PC.

Table 6.8: Estimated standard deviation of cancellable swap vega estimators using the control variate, a sample size of $10^6$ and $T_m = 5$. For the Longstaff-Schwartz algorithm and the calculation of the expected exercise time, a sample size of $10^5$ was used. Columns: $\nu_{21}$, $\nu_{31}$, $\nu_{41}$, $\nu_{51}$. Methods: bump($10^{-2}$) PC CV, bump($10^{-3}$) PC CV, LR PC CV, PS PC CV.

Again we see that the standard deviation of the PS estimator is the lowest of all. The LR method gives a higher standard deviation than the bump method with bump size 0.01, but a lower one than the bump method with bump size $10^{-3}$. Comparing Tables 6.7 and 6.8, we see that the control variate especially reduces the standard deviation of the LR estimators and of some of the PS estimators.

CPU times

The PS vega needs more pathwise derivatives than the PS delta. However, the bump method is still the slowest method. The LR and PS methods are, respectively, two and three times faster than the bump method, as can be seen in Table 6.9.

Table 6.9: CPU time (seconds) of each method (bump, LR, PS) using $10^5$ paths. For all methods the control variate and the PC scheme are used.

Estimations

The results of the estimation are also similar to the results of the cancellable swap delta. The different estimates are close, but the differences, given in Table 6.11, are high in comparison with the standard errors of the estimates. Therefore we estimated the cancellable swap LR and PS vega for a different seed in Table 6.12. In this table we see a positive bias in comparison with the PS estimator.

Table 6.10: Vegas of cancellable swap using $10^7$ paths. For the Longstaff-Schwartz algorithm and the calculation of the expected exercise time, a sample size of $10^5$ was used. For all methods, the same seed was used. Columns: $\nu_{21}$, $\nu_{31}$, $\nu_{41}$, $\nu_{51}$. Methods: bump($10^{-2}$) CV PC, bump($10^{-3}$) CV PC, LR CV PC, PS CV PC.

Table 6.11: Difference between the LR and PS vega estimators (absolute difference) compared to the standard deviation of this difference. Columns: $\nu_{21}$, $\nu_{31}$, $\nu_{41}$, $\nu_{51}$.

Table 6.12: Vegas of cancellable swap using $10^7$ paths for a different seed than was used in Table 6.10. For the Longstaff-Schwartz algorithm and the calculation of the expected exercise time, a sample size of $10^5$ was used. For all methods, the same seed was used. Columns: $\nu_{21}$, $\nu_{31}$, $\nu_{41}$, $\nu_{51}$. Methods: LR CV PC, PS CV PC.

We also give a plot to visualize the cancellable swap vega as a function of the initial forward Libor rates in Figure 6.3. We see that all estimators give more or less the same vega. If we choose the decision times for the bumped paths equal to those of the original paths (the bump* estimator), then we get, as expected, the same results as the PS method. The LR estimate sometimes differs from the PS estimate. Since both methods employ the same discretization grid and do not have an extra bias from a parameter, we presume that this difference is caused by the estimated exercise regions.

Figure 6.3: Estimated cancellable swap vegas calculated with the bump($10^{-2}$), PS and LR method; a sample size of $10^6$ was used. For the bump method a bump parameter of $10^{-2}$ was used. The bump* method is a variation of the original bump method where we use the same exercise dates for the bumped paths as for the original paths.

Changing the number of Libors

The CPU time needed to estimate the delta or vega depends strongly on the number of Libors which are considered. In practice, the number of Libors will be larger than the 5 Libors which we use in the standard setting. Therefore we plotted the estimated standard deviation and CPU time of the estimators for different numbers of Libors. In Figure 6.5 we see that the computing time for the bump method gets very large in comparison with the LR and PS method. From Figure 6.4 we conclude that the LR estimator should not be used if a large number of Libors is involved: its standard deviation grows fastest. The PS method is the fastest method as well as the method with the best variance properties.

Figure 6.4: Maximum of the estimated standard deviations of the delta (left) and vega (right) estimators as function of the number of Libors. If the bump parameter is small enough, the bump* and PS estimator give approximately the same value and hence have the same standard deviation. For every number of Libors we plotted respectively the estimated standard deviation of the PS and bump* method (dark green), the original bump method (blue), the LR method (purple) and the bump method with a small bump size (red).

Figure 6.5: Computing time for the bump (red), LR (purple) and PS (green) delta (left) and vega (right).

6.6 Cancellable range swap

Recall that the total payoff of the cancellable range swap was given by:
\[
g^{crswap}(L(T_0), \ldots, L(T_{\tau-1})) = \sum_{i=1}^{\tau} \frac{P_{T_i}(T_{i-1})}{P_{T_m}(T_{i-1})} \, N \delta_i \, (L_i(T_{i-1}) - K) \, \mathbf{1}_{\{K_i^{Low} < L_i(T_{i-1}) < K_i^{up}\}}.
\]
The bounds $K^{Low}$ and $K^{up}$ are chosen equal to $0.8\,L(0)$ and $1.2\,L(0)$, respectively. We estimated the price by using a sample size of $10^6$ for the Longstaff-Schwartz simulation as well as for calculating the Monte Carlo price. The price was: $V^{crswap}(0, L(0)) = [\ldots]$. The standard error of this estimate (by using constant exercise regions) was estimated by: [...].

Delta

Standard deviations

Before we start with estimating the deltas of the cancellable range swap, we would like to know the order of magnitude of the standard deviations of the estimators. We do this for the same reason as for the cancellable swap. Especially for Bermudan type contracts with discontinuous payoffs, a large standard deviation is the main difficulty in estimating Greeks with the bump method. In Table 6.13, estimates of the standard deviations are given for the delta estimation of the cancellable range swap, where the estimation is done for the bump and LR method without control variates. In Table 6.14, estimates of the same standard deviations are given, but now for the methods combined with the control variate. The pathwise sensitivity method cannot be used for this type of contract.

Table 6.13: Estimated standard deviation of cancellable range swap delta estimators using a sample size of $10^6$ and $T_m = 5$. For the Longstaff-Schwartz algorithm and the calculation of the expected exercise time, a sample size of $10^5$ was used. Methods: bump($10^{-3}$) PC, bump($10^{-4}$) PC, LR PC.

Table 6.14: Estimated standard deviation of cancellable range swap delta estimators using a sample size of $10^6$ and $T_m = 5$. Now the control variate is used for all estimates. For the Longstaff-Schwartz algorithm and the calculation of the expected exercise time, a sample size of $10^5$ was used. Methods: bump($10^{-3}$) CV PC, bump($10^{-4}$) CV PC, LR CV PC.

The first conclusion we can draw, at least for this setting, is that the control variate has a large impact on the standard error: its magnitude is reduced by a factor of 10 to 20. In the next section we estimate the delta using a large sample size. With the help of these standard errors we can say whether the differences between the estimates can be explained by variance or not. Furthermore we see that the standard deviation of the LR method is comparable to the standard deviation of the bump method with a bump size of $10^{-3}$. If the bump parameter is chosen equal to $10^{-4}$, for example because the delta needs to be estimated more accurately, then the LR estimator has a lower standard deviation than the bump method.

CPU times

The LR estimator of the cancellable range swap delta is almost three times as fast as the bump estimator. The CPU times are given in Table 6.15. If more Libors are used, the computing times change in the same way as for the cancellable swap, as was shown in Figure 6.5.

Estimating the delta

To check our methods, we estimate the deltas in the standard setting and print them in Table 6.16.

Table 6.15: CPU time (seconds) of each method (bump, LR) using $10^5$ paths. For all methods the control variate and the PC scheme are used.

Table 6.16: Deltas of cancellable range swap using $10^7$ paths. For the Longstaff-Schwartz algorithm and the calculation of the expected exercise time, a sample size of $10^5$ was used. For all methods, the same seed was used. The * indicates that we have used the same decision times for the bumped paths as for the original paths. Methods: bump($10^{-2}$) CV PC, bump($10^{-3}$) CV PC, bump*($10^{-3}$) CV PC, LR CV PC.

Table 6.17: Difference between the bump(0.001) and LR estimator compared to the standard error of that difference.

In the first row of Table 6.16 we see that there is a relatively large bias in the delta estimates of the bump method with bump size $10^{-2}$. But also for the bump($10^{-3}$) method some differences are large in comparison with the standard error, as can be seen in Table 6.17. We do not know whether it is allowed to use the same decision times for the bumped paths. If we neglect this concern, the bump* estimator has a smaller standard deviation than the original bump method when we compare with the LR estimates. In Figure 6.6 we see that all methods give the same values for the sensitivities with respect to $P_{T_1}(0)$ and $P_{T_2}(0)$; even the bump* method coincides with the original bump method and the LR method. In the bottom three panels we see differences which are caused by the variance of the estimators. Since we do not have an analytic value, we cannot say which method performs best. The bump method occasionally gives very large spikes, which are caused by bumped paths that have another exercise date than the original path, or by bumped paths which end in the exercise region while the original path does not (or vice versa). This is a main disadvantage of the bump method.

Figure 6.6: Various deltas for the cancellable range swap plotted against $L(0)/L_{std}$, where L(0) is the vector of initial forward Libor rates and $L_{std}$ the vector of initial forward Libor rates under the standard setting. For each estimate a sample size of $10^6$ is used. The red, dashed lines correspond to the bump*($10^{-3}$) method, the solid line to the LR method and the blue dotted lines to the bump($10^{-3}$) method. The * indicates that we have used the same decision times for the bumped paths as for the original paths.


104 6.6.4 Standard deviations in other settings One of the main questions of this chapter was when to use which method. For the cancellable range swap we tested the bump, bump* and LR method. Since we cannot estimate the bias of the methods we compare the standard deviations of each method for different settings. We do this by choosing different number of forward Libor rates on which the contract depends, the length of the tenor times and by choosing other strikes. The estimators are random vectors, each element has his own standard deviation. We compare the maximum of the standard deviations of these estimates. Figure 6.7: Maximum of the vector of standard errors of the cancellable bump deltas and LR deltas as function of the number of forward Libor rates which are used. For each number of Libors respectively from left to right, the LR-, bump(0.001)-, bump*(0.001)-, bump(0.0001)- and bump*(0.0001) method are used. In Figure 6.7 we see that a small bump size leads in all cases to a relatively high standard error. Furthermore the bump* versions gives in all these settings a lower standard error than the original bump method. The standard error of the LR method is comparable to the standard error of the bump(0.001) method, but it seems that the bump method becomes more interesting if more Libors are involved. This effect is not as clear as it was in Figure 6.7 in the case of the cancellable swap delta. Figure 6.8: Maximum of the vector of standard errors of the cancellable bump* deltas and LR deltas as function of the strike divided by the initial Libor rates (in standard setting we have K/L(0) = 0.9). The estimated standard deviations of the bump*(0.001) are of the same magnitude as for the 100

LR method for all $K/L(0) \in (0.6, 1.4)$, as can be seen in Figure 6.8. As expected, a smaller bump parameter gives higher standard errors.

More interesting results are given in Figure 6.9. Here rough estimates of the standard deviations are plotted against the length of the tenor times. Surprisingly, we see a peak around a tenor length of 0.5. Here the standard error of the LR method is the worst of the three methods. We see that for small tenor lengths, both bump methods perform better than the LR method (if we only look at the standard error). The LR method performs comparatively poorly for tenor lengths smaller than 0.8.

Figure 6.9: Maximum of the vector of standard errors of the cancellable bump* deltas and LR deltas as a function of the tenor length. In the standard setting we have $\delta_i = 1$ for $i = 0, 1, \ldots, m$.

Vega

Standard deviations of different methods

We do the same analysis for the vega of the cancellable range swap, so we start with an indication of the standard errors of each method. By comparing Tables 6.18 and 6.19 we see that also for this Greek the control variate reduces the standard deviation of the LR method. For the bump($10^{-3}$) method, the only exception is the sensitivity with respect to $\sigma_{51}(T_0)$. Furthermore we see that the LR method has the lowest standard deviation.

Standard deviation cancellable range swap vega
Method    $\nu_{21}$    $\nu_{31}$    $\nu_{41}$    $\nu_{51}$
bump($10^{-2}$), PC
bump($10^{-3}$), PC
LR, PC
Table 6.18: Standard deviations of cancellable range swap vega estimates using a sample size of $10^6$ and $T_m = 5$. For the Longstaff-Schwartz algorithm and the calculation of the expected exercise time, a sample size of $10^5$ was used.

CPU times

The LR method is about two times faster than the bump method. The computing times are comparable to the computing times of the cancellable swap vega and are given in Table 6.20. If more Libors are used, the computing times change in the same way as for the cancellable swap, as was shown in Figure 6.5.

Standard deviation cancellable range swap vega
Method    $\nu_{21}$    $\nu_{31}$    $\nu_{41}$    $\nu_{51}$
bump($10^{-2}$), PC, CV
bump($10^{-3}$), PC, CV
LR, PC, CV
bump*($10^{-3}$), PC, CV
Table 6.19: Standard errors of cancellable range swap vega estimates using a sample size of $10^6$ and $T_m = 5$. For the Longstaff-Schwartz algorithm and the calculation of the expected exercise time, a sample size of $10^5$ was used. The difference with Table 6.18 is that now the control variate is used.

Method    bump    LR
CPU time (seconds)
Table 6.20: CPU time of each method using $10^5$ paths. For all methods the control variate and the PC scheme are used.

Estimating the vega

Also for this Greek we start with a comparison of the estimates in the standard setting. These estimates are given in Table 6.21.

Cancellable range swap vega
Method    $\nu_{21}$    $\nu_{31}$    $\nu_{41}$    $\nu_{51}$
bump($10^{-2}$), CV, PC
bump($10^{-3}$), CV, PC
LR, CV, PC
Table 6.21: Vegas of the cancellable range swap using $10^7$ paths. For the Longstaff-Schwartz algorithm and the calculation of the expected exercise time, a sample size of $10^5$ was used. For all methods, the same seed was used.

If we compare the difference between the LR estimator and the bump($10^{-3}$) estimator (Table 6.22) or the bump($10^{-2}$) estimator (Table 6.23), we see that only for the sensitivity with respect to $\sigma_{51}(T_0)$ it is unlikely that the difference between LR and bump($10^{-3}$) is caused by variance.

Difference between bump and LR
Method    $\nu_{21}$    $\nu_{31}$    $\nu_{41}$    $\nu_{51}$
difference
se
Table 6.22: Absolute difference between the LR estimator and the bump(0.001) estimator in the first row, compared to the standard deviation of this difference.

To get a better overview of the estimators, we plot the vegas as functions of the initial forward Libor values in Figure 6.10. First we use the bump* estimator instead of the original bump method.

Difference between bump and LR
Method    $\nu_{21}$    $\nu_{31}$    $\nu_{41}$    $\nu_{51}$
difference
se
Table 6.23: Absolute difference between the LR estimator and the bump(0.01) estimator in the first row, compared to the standard deviation of this difference.

Figure 6.10: Bump* and LR estimates of the cancellable range swap vega using a sample size of [...]. For all bump estimators a bump size of $10^{-3}$ was used.

In Figure 6.10 we see that there is some positive bias in the bump* estimator if we assume that the LR vegas are a better estimate. Of course we cannot just assume that the LR estimate gives a better vega. Therefore we plot the original bump method in Figure 6.11 alongside the plots from Figure 6.10. We see that this estimate is much closer to the LR estimator, which gives us confidence that the bump* estimator contains a bias. In Figure 6.12 this is clearer. Here we used a larger bump size so that the standard deviation decreases. The bias appears to be small. Therefore we presume that the choice of the same decision times for the bumped paths is not allowed. However, we remark that this is only a weak indication. Also in the cancellable swap case we saw a difference between the PS and LR estimators. It is possible that here, too, the LR estimator has such a bias, or that the difference is caused by the exercise regions.

Figure 6.11: Bump, bump* and LR estimates of the cancellable range swap vega using a sample size of [...]. For all bump estimators a bump size of $10^{-3}$ was used.

Figure 6.12: Bump, bump* and LR estimates of the cancellable range swap vega using a sample size of [...]. For the bump* and bump estimators a bump size of respectively $10^{-3}$ and $10^{-2}$ was used.

Other settings Vega

To test when which method should be used for the cancellable range vega, we compare the estimated standard deviation for different settings. The result is the same as for the delta: the bump* method has a smaller standard deviation than the bump method, and a smaller bump parameter always leads to a higher standard deviation. This time the standard deviation of the LR method is significantly lower than that of the bump($10^{-2}$) method as well as the bump($10^{-3}$) method. This follows from Figure 6.13 for a different number of initial Libor rates (left) or for different strikes (right).

Figure 6.13: Maximum of the vector of standard errors of the cancellable bump*, bump and LR vegas as a function of the number of Libors (left) and the strike divided by the initial Libor rates (right). In the standard setting we have 5 Libors and $K = 0.9 L(0)$. Left: with control variate. Right: without control variate.

In Figure 6.14, the estimated standard deviation is plotted against the length of the tenor times. We see that the estimated standard deviation is large for tenor lengths around 0.4. Here the LR method is disadvantageous. The same plot is given on the right side of Figure 6.14, but now it is plotted along with the estimated standard deviations if the control variate is not used (dashed line). Note the change of the vertical axis of the right plot in comparison with the left plot. The three solid lines in the right figure are the same plots as the three lines in the left figure.

Figure 6.14: Maximum of the vector of standard errors of the cancellable bump and LR vegas as a function of the tenor length.

6.7 Conclusion

It is encouraging that all estimators give roughly the same deltas for the cancellable swap and the cancellable range swap. This confirms that our version of the LR method also works well. Overall there were some differences which have to be caused by a small bias. This is because of the estimation of the exercise regions, which has a different effect for each method. Furthermore we have seen that the control variate has a positive impact on the standard deviations of the estimators.

For the options where the pathwise sensitivity method can be applied, the same exercise times can be used for the bump method. Then the bump method gives approximately the same results as the pathwise method, as long as the bump parameter is chosen small enough. The speed of the PS method makes this method superior to the bump method, especially if the number of forward Libor rates is large. The standard deviation of the LR method is the largest for these options. Hence the pathwise sensitivity method should be applied. A disadvantage of the pathwise sensitivity method is that some conditions have to be satisfied, while in practice not all conditions will be checked.

If the pathwise sensitivity method cannot be applied, the use of the same exercise times for the bump method may not be allowed. For the cancellable range delta, the bump method was approximately as good as the LR method. But for the LR method we do not have to care about a bump parameter. Furthermore the LR method becomes faster relative to bumping if more Libors are involved. In the case of the cancellable range swap, the LR method was even better than the bump method. However, we are not sufficiently convinced to claim that the LR method is always satisfactory for discontinuous payoffs. At least we have seen that it is a good alternative to the bump method.
Because both methods have to deal with large variances for options with discontinuous payoffs, we advise to always estimate the standard error of the estimator along with the estimator itself. Even if this is done, one has to be alert. For example, if the bump method is used with a small bump parameter for a Bermudan-style option, the probability that a bumped path has another exercise date is small, but if it happens it could have a large impact on the estimated standard error. Therefore it is possible that the estimated standard error is actually far too small.
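The caveat about an underestimated standard error can be illustrated with a small numerical sketch. The numbers below are purely hypothetical (they are not taken from the thesis results): `quiet` mimics per-path bump samples in which no bumped path changes its exercise date, and `with_jump` is the same sample in which a single path does change and contributes one large value.

```python
import numpy as np


def standard_error(samples):
    """Estimated standard error of the sample mean (sample std / sqrt(n))."""
    samples = np.asarray(samples, dtype=float)
    return samples.std(ddof=1) / np.sqrt(samples.size)


n_paths = 10_000

# Hypothetical per-path bump samples without any change of exercise date:
# values close to 0.5 with a spread of 0.01.
quiet = 0.5 + 0.01 * np.where(np.arange(n_paths) % 2 == 0, 1.0, -1.0)

# The same sample, but one bumped path switches exercise date and
# contributes a single large value (assumed magnitude, for illustration).
with_jump = quiet.copy()
with_jump[0] = 500.0

se_quiet = standard_error(quiet)
se_jump = standard_error(with_jump)
print(f"standard error without jump: {se_quiet:.2e}")
print(f"standard error with one jump: {se_jump:.2e}")
```

If a simulation happens to contain no such path, the reported standard error looks like the first number even though the estimator's true variability is closer to the second.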

Chapter 7 Conclusions

We have considered several methods to estimate the Greeks in the lognormal Libor market model, and we summarize all of these methods one more time. At the end we will give the method of choice for all types of contracts we considered.

Bump method

The bump method is the simplest method and is widely used by practitioners. To estimate the Greeks we need to compute two option prices: the original price and the price if the parameter for which we want the sensitivity is incremented by the bump size. The estimator is given by the difference divided by the bump size. Hence if we are able to price the option, the bump estimator can easily be obtained. The disadvantage is that it is slow and we need to choose a bump parameter. For continuous (European and Bermudan) options this bump parameter is no problem and has to be chosen small, but we have seen that for discontinuous options this is different. Despite all the disadvantages we have seen that, if the bump parameter is not too large and the sample size is taken large enough, the bump estimator is able to estimate the Greeks. However, the bump estimators sometimes resulted in unexpected errors (such as the spikes in Figure 6.6). The objective was to find an alternative to this method.

PS method

The pathwise sensitivity method was based on the interchange between the derivative and the expectation operator. We had to calculate the so-called pathwise derivatives. This method typically can be applied if the payoff is continuous. In the result sections for the European options as well as the cancellable options, we have seen that the method outperforms all other methods if its use is allowed. This successful result is partly due to the adjoint implementation, which seriously speeds up the calculations. However, this method could not be applied to discontinuous payoffs.

LR method

For the LR method we had to rewrite the expectation from which we need the derivative as an integral.
In this way we were able to move the differentiation to the probability density function. We have extended the method to the Bermudan setting. Thanks to our control variate and the application of the predictor-corrector scheme, this was a successful method to estimate the Greeks of Bermudan options with discontinuous payoff functions. However, problems can arise with high variances, especially if the length of the tenor intervals is small.
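The bump and LR ideas summarized here can be compared on a toy problem. The sketch below is not the Libor market model setting of this thesis: it assumes a single lognormal underlying simulated in one exact step, zero interest rates and a digital payoff, so that the delta is known in closed form. All function and parameter names are hypothetical.

```python
import math
import numpy as np


def digital_delta_estimates(s0=1.0, strike=1.0, sigma=0.2, maturity=1.0,
                            n_paths=200_000, bump=1e-3, seed=0):
    """Bump and LR delta of the digital payoff 1{S_T > strike}, zero rates."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_paths)  # the same normals are reused everywhere
    vol = sigma * math.sqrt(maturity)

    def terminal(spot):
        # Exact one-step lognormal sample without drift.
        return spot * np.exp(-0.5 * vol**2 + vol * z)

    def payoff(s_t):
        return (s_t > strike).astype(float)

    # Bump method: forward difference of the payoff divided by the bump size.
    bump_samples = (payoff(terminal(s0 + bump)) - payoff(terminal(s0))) / bump

    # LR method: payoff times the score d/ds0 log p(S_T; s0) = z / (s0 * vol).
    lr_samples = payoff(terminal(s0)) * z / (s0 * vol)

    def mean_and_se(x):
        return x.mean(), x.std(ddof=1) / math.sqrt(x.size)

    # Closed-form delta of the digital: phi(d2) / (s0 * vol).
    d2 = (math.log(s0 / strike) - 0.5 * vol**2) / vol
    exact = math.exp(-0.5 * d2**2) / math.sqrt(2.0 * math.pi) / (s0 * vol)

    return {"bump": mean_and_se(bump_samples),
            "lr": mean_and_se(lr_samples),
            "exact": exact}


result = digital_delta_estimates()
for name in ("bump", "lr"):
    estimate, se = result[name]
    print(f"{name:4s} delta = {estimate:.4f} (se {se:.4f})")
print(f"exact delta = {result['exact']:.4f}")
```

In this toy setting the bump samples are zero on almost every path and equal to one over the bump size on the few paths that cross the strike, which is why the bump standard error grows as the bump size shrinks. The LR estimator needs no bump parameter because the payoff itself is never differentiated.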

Vibrato Monte Carlo method

The vibrato Monte Carlo method was a combination of the pathwise sensitivity method and the LR method. The pathwise sensitivity method was used for the first part, and the likelihood ratio method was used to estimate sensitivities of an expectation conditioned on L(T v step ). One of the results for this method was that if we choose the last time step small, it can give very large variances. This can be solved by choosing the sample size for the conditional expectation proportional to the variance of the random variable inside this conditional expectation, but this was very time consuming. The advantage of the vibrato Monte Carlo method in comparison with the LR method was that the LR method had a larger bias coming from the Euler step. Because of our successful application of the predictor-corrector discretization scheme within the LR method, this advantage of the VMC method is less important for the lognormal Libor market model. For these reasons, and because this method is the most difficult to implement, it is not the method of choice for the European option. Furthermore we did not see how to apply this method to Bermudan-type options.

Pathwise kernel method

The pathwise kernel method is only considered briefly for the European delta. It was a version of the pathwise sensitivity method where a theorem was used to rewrite the derivative. One reason why we did not go further into this method was that we had to choose a bandwidth, which can result in the same problems as for the bump method: a bandwidth that is too small gives a high variance and a bandwidth that is too large gives a bias. A second reason is that the theorem on which the method was based has to be extended to Bermudan options. If this is possible, and if the bandwidth can be chosen efficiently, we think this method is very interesting.

PC scheme and control variate

The control variate was applied to all methods.
Especially for the LR method this was useful, but also for the other Greek estimators it appears to be advantageous. The predictor-corrector scheme, which reduces the bias of the estimators, was already applied to the bump and pathwise methods in the literature. We did this also for the LR method.

Method of choice

The method which should be used depends on whether the payoff function is continuous or not, hence we split this section into two subsections.

Continuous payoffs

For European continuous options as well as for Bermudan continuous options, we advise to use the pathwise sensitivity method. Based on our results, this method has the lowest variances and is the fastest. However, one should realize that the conditions to use this method are in fact rather restrictive.

Discontinuous payoffs

For discontinuous payoffs we advise to use both the bump and LR methods. The standard deviation should be calculated for both estimators. In this way we can choose the estimator with the lowest standard deviation. Recall that the bump method gives an extra bias due to the bump size. If both standard deviations are too small in comparison with the difference between the estimators, the LR estimator should be the method of choice because this method does not have such an extra bias.
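The selection rule for discontinuous payoffs can be written down as a short sketch. The following assumes that per-path samples of both estimators are available and were generated with the same seed, so that the standard error of their difference can be computed from paired samples. The function names and the threshold of two standard errors are assumptions made for illustration, not values prescribed in this thesis.

```python
import numpy as np


def mean_and_se(samples):
    """Sample mean and its estimated standard error."""
    samples = np.asarray(samples, dtype=float)
    return samples.mean(), samples.std(ddof=1) / np.sqrt(samples.size)


def choose_estimator(bump_samples, lr_samples, z_threshold=2.0):
    """Pick between bump and LR following the rule for discontinuous payoffs.

    z_threshold (hypothetical) decides when the difference between the two
    estimates is considered too large to be explained by variance alone.
    """
    bump_samples = np.asarray(bump_samples, dtype=float)
    lr_samples = np.asarray(lr_samples, dtype=float)

    bump_est, bump_se = mean_and_se(bump_samples)
    lr_est, lr_se = mean_and_se(lr_samples)

    # Paired difference: valid because both estimators share the same paths.
    diff, diff_se = mean_and_se(bump_samples - lr_samples)

    # A difference that variance cannot explain points to the bump-size
    # bias, so the LR estimator is preferred.
    if abs(diff) > z_threshold * diff_se:
        return "LR", lr_est, lr_se

    # Otherwise take the estimator with the lowest standard error.
    if bump_se < lr_se:
        return "bump", bump_est, bump_se
    return "LR", lr_est, lr_se
```

For example, `choose_estimator([1.0, 1.2, 0.8, 1.0], [0.9, 1.5, 0.5, 1.1])` selects the bump estimator, because the two means agree and the bump samples have the smaller spread.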

Chapter 8 Further research

In this thesis we considered the lognormal Libor market model for two specific types of payoffs. We are also interested in whether the same Greek estimators can be applied to other models. For the bump and pathwise sensitivity methods this is no problem. For the bump method we only need to simulate the Libor rates with the new model, and for the pathwise sensitivity method we have to recalculate the pathwise derivatives. The only problem is that choosing more discretization steps will be very expensive in a multidimensional stochastic differential equation.

In the case of the LR method things are different. For a stochastic term in the volatility part of the stochastic differential equation, the idea which the predictor-corrector scheme applies to the drift can also be applied to the volatility term. Suppose we have the stochastic differential equation

\[ dX(t) = \mu(X(t), t)\,dt + \sigma(X(t), t)\,dW(t), \qquad t \in [0, T]. \]

We first perform the predictor step:

\[ X^{P}(t_{k+1}) = X(t_k) + \mu(X(t_k), t_k)\,(t_{k+1} - t_k) + \sigma(X(t_k), t_k)\,\Delta W(t_k), \]

and then calculate the predictor-corrector estimate:

\[ X^{PC}(t_{k+1}) = X(t_k) + \tfrac{1}{2}\bigl(\mu(X(t_k), t_k) + \mu(X^{P}(t_{k+1}), t_{k+1})\bigr)(t_{k+1} - t_k) + \tfrac{1}{2}\bigl(\sigma(X(t_k), t_k) + \sigma(X^{P}(t_{k+1}), t_{k+1})\bigr)\Delta W(t_k). \]

However, if this is still not enough and more than one discretization step should be used for each tenor interval, the LR method gives a problem for the vega calculation. The delta and gamma are still possible as long as the underlying model is a Markov process, but then the standard deviation will increase if the time step is chosen smaller.

Therefore it might be interesting to investigate whether the kernel method can be applied to Bermudan options or not. This method is comparable to the pathwise sensitivity method, but can handle discontinuous payoffs. It would be a real success if the kernel method also had better variance properties than the LR and bump methods.
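The predictor and corrector steps of this chapter can be sketched for a scalar equation as follows. This is a minimal sketch that only mirrors the two update formulas; the function name and its arguments are hypothetical, and it makes no claim about the convergence properties of averaging the volatility term in this way.

```python
def predictor_corrector_step(x, t, dt, dw, mu, sigma):
    """One step from t to t + dt for dX = mu(X, t) dt + sigma(X, t) dW.

    x  : current value X(t_k)
    dt : step size t_{k+1} - t_k
    dw : Brownian increment over the step
    mu, sigma : callables taking (x, t)
    """
    # Predictor: plain Euler step with coefficients frozen at (x, t).
    x_pred = x + mu(x, t) * dt + sigma(x, t) * dw

    # Corrector: average drift and volatility between the start point
    # and the predicted end point, then redo the step.
    drift = 0.5 * (mu(x, t) + mu(x_pred, t + dt))
    vol = 0.5 * (sigma(x, t) + sigma(x_pred, t + dt))
    return x + drift * dt + vol * dw


# Usage: linear drift without noise, one step of size 0.1 from x = 1.
x_next = predictor_corrector_step(
    1.0, 0.0, 0.1, 0.0, mu=lambda x, t: x, sigma=lambda x, t: 0.0)
```

With the noise switched off, the step reduces to the deterministic trapezoidal (Heun) update, which gives a quick sanity check of an implementation.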

Appendix A Implementation

In this chapter we briefly explain the implementation which is used to generate the results in this thesis. It is a description of Figure A.1. The implementation consists of several classes:

Libor class: a Libor object contains the initial bond and Libor values, tenor points and volatilities. The class has methods for changing these input values (which is needed for bumping or making plots).

Grid: the grid object contains information about the step sizes and number of steps for each tenor interval.

SimLibors: with this abstract class a vector of forward Libor rates can be simulated in time. The class has two derived classes: one for the Euler scheme and one for the PC scheme. Besides the simulation, these derived classes also contain methods which return pathwise derivatives or derivatives of the mean vector of the multivariate normal probability density function.

Euroption: this is an abstract class with several derived classes. The derived classes consist of methods which specify the payoff functions, payoff derivatives and analytical solutions of prices and Greeks. The abstract class contains methods which estimate the price or some Greeks (delta, vega or gamma) of the European option with a Monte Carlo method. The Greek estimation can be done with different methods (bumping, PS method, LR method, vibrato method and kernel method).

CNoption: class which corresponds to the cancellable contracts which are considered in this thesis. The main difference in comparison with the European option class is that it contains a method to determine the optimal exercise regions. The derived classes need to contain a method which specifies the explanatory variables which are used. The abstract class contains a method which determines the exercise regions by setting a certain number of parameters. Once these parameters are set, another method can be used which returns an approximation of the hold value of the option.
Also the application of the control variate needed some methods. Finally the Greeks can be estimated by the LR, PS, bump or bump* method. For this estimation we can use the Euler or PC scheme and we can choose whether we want to use the control variate or not.

Figure A.1: Simplified class diagram of the implementation in C#. The figure is only illustrative and should support the description in this section. Hence only the important methods are printed.
