Lund University, Faculty of Engineering — Statistics in Finance
Centre for Mathematical Sciences, Mathematical Statistics — HT 2011

Parameter estimation in SDEs

This computer exercise concerns estimation of parameters in stochastic differential equations. It contains two main assignments with several sub-assignments, many of which are labelled Extra. You should solve all the regular problems (you need to solve at least two extra problems if you want to get 2 bonus credits from the computer exercise). It is preferred that you use Matlab, but you are allowed to use the programming language or package of your choice. If you choose not to use Matlab, please note that you are required to document your code extra carefully.

1 Preparations for the exercise

Read chapters 8.1, 11 and 14 in Madsen et al. [2006] and this instruction. Then prepare the computer exercise by writing down the Matlab functions needed for the exercise. Before the computer exercise some of the questions below will be posed. All of the posed questions must be answered correctly in order to pass the computer exercise.

2 Catalogue of questions

You should be able to answer the following questions before the computer exercise.

1. Suggest suitable moment restrictions for estimating the model

   y_i = \beta_0 + \beta_1 x_i + \varepsilon_i,   (1)

   using GMM, where the \varepsilon_i are i.i.d. normal.
2. Discretize and find suitable moment restrictions for the CKLS model.
3. Write down the approximate likelihood function for the CIR model when discretized using the Euler scheme.
4. (If you do exercise 3.4.2) Calculate first, second and third order moments for the Cox-Ingersoll-Ross process using Dynkin's formula (see equation (4)).
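As a warm-up for preparatory question 1: the moment restrictions E[\varepsilon_i] = 0 and E[\varepsilon_i x_i] = 0 are just-identified, so the sample moment equations can be solved in closed form (and coincide with ordinary least squares). The sketch below is in Python purely for illustration — the exercise prefers Matlab but allows any language — and the data values are made up, not from regdata.mat.

```python
def mm_linear(x, y):
    """Solve the sample moment equations
         (1/N) sum(y_i - b0 - b1*x_i)       = 0
         (1/N) sum((y_i - b0 - b1*x_i)*x_i) = 0
       for (b0, b1); this is the method-of-moments estimator of model (1)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1

# Noise-free toy data generated from b0 = 1, b1 = 2 (illustrative only):
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0 + 2.0 * xi for xi in x]
b0, b1 = mm_linear(x, y)
print(b0, b1)  # -> 1.0 2.0
```

With noisy data the same equations still apply; only the estimates pick up sampling error.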
3 Computer Exercises

3.1 Generalized Method of Moments

Estimation with the Generalized Method of Moments (GMM) is an alternative that should be used mainly when maximum likelihood estimation isn't possible, or when we want to avoid explicit assumptions about the distribution of the innovations. The method involves a certain amount of arbitrariness when specifying suitable moment restrictions, which may sometimes lead to lower efficiency than Maximum Likelihood.¹ On the other hand, for simpler models and with proper moment restrictions, one can show that both methods give identical estimators.

If the number of moment restrictions equals the number of parameters, some effort can be spared by solving a system of equations rather than solving an optimization problem. The method is then referred to as the Method of Moments.

At first glance, the method might seem to be a purely theoretical construction, since the true parameter values are needed to calculate the weight matrix, which in turn is needed to calculate the estimates. It turns out that this is not a problem, since it can be shown that the Generalized Method of Moments is consistent (but not efficient) for every positive definite weight matrix. The pseudo-algorithm for the estimation becomes:

1. Estimate the parameters with an arbitrary positive definite weight matrix W, for instance the identity matrix, to get an initial estimate of the parameters:

   \hat{\theta} = \arg\min_\theta J_N(\theta) = \arg\min_\theta \left( \frac{1}{N} \sum_t f(x_t, \theta) \right)^T W \left( \frac{1}{N} \sum_t f(x_t, \theta) \right)

2. Use these parameters to calculate the optimal weight matrix, chosen as the inverse of the covariance matrix of the moment restrictions.
3. Estimate new parameters with the new weight matrix, estimate a new weight matrix from these parameters, and so on until convergence.

3.1.1 Tests

Complex models can be compared with LR tests if we use Maximum Likelihood techniques. There are similar tests for the GMM method.
The test statistic is given by

   N \left[ J_N(\hat{\theta}_{\text{Restricted}}) - J_N(\hat{\theta}_{\text{Unrestricted}}) \right]

and under the null hypothesis it has an asymptotic \chi^2(k) distribution, where k is the difference in the number of parameters.

¹ This is not always a bad thing; one can for instance choose economically sound restrictions.
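The three-step scheme of Section 3.1 can be sketched as follows. This is an illustrative Python toy, not part of the handout: a one-parameter model x_i ~ N(\mu, 1) with two (overidentified) moment restrictions f(x, \mu) = [x - \mu, x^2 - \mu^2 - 1], a crude grid search standing in for a quasi-Newton optimiser such as fminunc, and the optimal weight matrix taken as the inverse sample covariance of the moments.

```python
import random

def gbar(m1, m2, mu):
    # Sample averages of the moment restrictions, written in terms of the
    # precomputed data moments m1 = mean(x), m2 = mean(x^2).
    return [m1 - mu, m2 - mu * mu - 1.0]

def J(m1, m2, mu, W):
    # GMM objective: quadratic form gbar' W gbar for a 2x2 weight matrix W.
    g = gbar(m1, m2, mu)
    return (g[0] * (W[0][0] * g[0] + W[0][1] * g[1])
            + g[1] * (W[1][0] * g[0] + W[1][1] * g[1]))

def argmin_J(m1, m2, W):
    # Crude grid search over mu; a quasi-Newton routine would be used in practice.
    grid = [i / 1000.0 for i in range(-5000, 5001)]
    return min(grid, key=lambda mu: J(m1, m2, mu, W))

def optimal_W(xs, mu):
    # Inverse of the 2x2 sample covariance matrix of the moment restrictions.
    n = len(xs)
    S = [[0.0, 0.0], [0.0, 0.0]]
    for x in xs:
        f = [x - mu, x * x - mu * mu - 1.0]
        for i in range(2):
            for j in range(2):
                S[i][j] += f[i] * f[j] / n
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    return [[S[1][1] / det, -S[0][1] / det],
            [-S[1][0] / det, S[0][0] / det]]

random.seed(1)
xs = [random.gauss(2.0, 1.0) for _ in range(2000)]  # true mean mu = 2
m1 = sum(xs) / len(xs)
m2 = sum(x * x for x in xs) / len(xs)

W = [[1.0, 0.0], [0.0, 1.0]]      # step 1: arbitrary pos. def. weight matrix
mu = argmin_J(m1, m2, W)
for _ in range(5):                 # steps 2-3: update W and re-estimate
    W = optimal_W(xs, mu)
    mu = argmin_J(m1, m2, W)
print(round(mu, 2))                # close to the true mean 2.0
```

The test of Section 3.1.1 would then compare N·J_N from a restricted and an unrestricted fit of this kind against a \chi^2(k) quantile.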
Note: You cannot use MLmax to derive confidence intervals for your estimates when using GMM. You should instead use fminunc and compute the Hessian and the Jacobian of the moment restrictions separately.

3.2 Regression

As an introduction to GMM we will consider a linear regression model. The data set is the same as in computer exercise 1. Load the data set by writing:

>> load regdata.mat

Assignment: Use your results from preparatory question 1 to estimate the parameters. Calculate a 95% confidence interval and compare your results with the results from exercise 1.

Assignment: Test the linear model against the quadratic model:

   y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \varepsilon_i.

Conclusions?

3.3 Parameter estimation in stochastic differential equations

You will meet two different processes in this part of the computer exercise: the Cox-Ingersoll-Ross (CIR) process and the CKLS process. The CIR process is often used to model interest rates and is defined as

   dr_t = \kappa(\theta - r_t)\,dt + \sigma\sqrt{r_t}\,dW_t.   (2)

This can be generalized to the CKLS model, given by

   dr_t = \kappa(\theta - r_t)\,dt + \sigma r_t^{\gamma}\,dW_t.   (3)

These models can also be used to model the volatility in stochastic variance models, since they always remain positive. Load the data by writing:

>> load cirdata.mat
>> load cklsdata.mat

By discretizing the stochastic differential equation one can use, for instance, GMM for parameter estimation. The discretization is usually done with an Euler scheme, but of course higher order methods work even better. The advantage of using the discretized process is that we can estimate complex models without much trouble.
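As a sketch of the discretization step, the following Python snippet (illustrative only; parameter values are hypothetical, and the exercise itself works on the supplied data sets) implements the Euler scheme for the CKLS dynamics (3) together with the Gaussian quasi-likelihood it induces for the CIR special case \gamma = 1/2. Truncating negative excursions at a small positive value is a common practical fix for the Euler scheme, since the exact process stays positive.

```python
import math
import random

def simulate_ckls(kappa, theta, sigma, gamma, r0, dt, n, seed=0):
    """Euler scheme for dr = kappa*(theta - r) dt + sigma * r^gamma dW."""
    rng = random.Random(seed)
    r = [r0]
    for _ in range(n):
        drift = kappa * (theta - r[-1]) * dt
        diffusion = sigma * r[-1] ** gamma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        r.append(max(r[-1] + drift + diffusion, 1e-8))  # truncate at > 0
    return r

def euler_nll_cir(params, r, dt):
    """Negative Gaussian quasi log-likelihood of the Euler-discretized CIR model:
       r[k+1] | r[k] ~ N(r[k] + kappa*(theta - r[k])*dt, sigma^2 * r[k] * dt)."""
    kappa, theta, sigma = params
    nll = 0.0
    for k in range(len(r) - 1):
        m = r[k] + kappa * (theta - r[k]) * dt
        v = sigma ** 2 * r[k] * dt
        nll += 0.5 * math.log(2.0 * math.pi * v) + (r[k + 1] - m) ** 2 / (2.0 * v)
    return nll

# CIR is the special case gamma = 0.5 (hypothetical parameter values):
dt = 1.0 / 250
path = simulate_ckls(kappa=2.0, theta=0.05, sigma=0.1, gamma=0.5,
                     r0=0.05, dt=dt, n=4000, seed=3)

# The quasi-likelihood should prefer the true sigma to a mis-specified one:
print(euler_nll_cir((2.0, 0.05, 0.1), path, dt)
      < euler_nll_cir((2.0, 0.05, 0.2), path, dt))  # -> True
```

Minimising euler_nll_cir with a numerical optimiser such as fminunc gives the quasi-maximum-likelihood estimates; the inverse Hessian at the optimum gives an approximate covariance matrix.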
Assignment: Estimate the parameters and their covariance matrices in the models above by using GMM.

Assignment: Estimate the parameters and their covariance matrices in the models above by using approximate Quasi Maximum Likelihood (by discretizing the model and using the likelihood generated by the approximate model).

3.4 Improved approximate estimators

All problems in this section are extra assignments.

3.4.1 Transformation of data for the CIR model and the Shoji and Ozaki approximated likelihood (Extra)

Using the transformation y_t = 2\sqrt{r_t} we obtain a state-independent diffusion term. Itô's formula gives

   dy_t = \left[ \frac{\kappa(\theta - r_t)}{\sqrt{r_t}} - \frac{1}{2}\,\frac{1}{2 r_t^{3/2}}\,\sigma^2 r_t \right] dt + \frac{1}{\sqrt{r_t}}\,\sigma\sqrt{r_t}\,dW_t

        = \left[ \frac{2\kappa}{y_t}\left(\theta - \frac{y_t^2}{4}\right) - \frac{4}{y_t^3}\,\frac{\sigma^2 y_t^2}{8} \right] dt + \sigma\,dW_t

        = \left[ \frac{2\kappa\theta}{y_t} - \frac{\kappa y_t}{2} - \frac{2}{y_t}\,\frac{\sigma^2}{4} \right] dt + \sigma\,dW_t

        = \left[ \frac{2}{y_t}\left(\kappa\theta - \frac{\sigma^2}{4}\right) - \frac{\kappa}{2}\,y_t \right] dt + \sigma\,dW_t

        = \mu(y_t)\,dt + \sigma\,dW_t
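The algebra above is easy to check numerically. The Python sketch below (illustrative parameter values only) compares the final drift expression \mu(y) with the drift computed directly from the first line of the derivation, using r = y^2/4.

```python
import math

def mu_transformed(y, kappa, theta, sigma):
    # Final line of the derivation: mu(y) = (2/y)(kappa*theta - sigma^2/4) - (kappa/2) y
    return (2.0 / y) * (kappa * theta - sigma ** 2 / 4.0) - 0.5 * kappa * y

def mu_direct(y, kappa, theta, sigma):
    # First line of the derivation, written in terms of r = y^2/4:
    # kappa*(theta - r)/sqrt(r) - (1/2) * (1/(2*r^(3/2))) * sigma^2 * r
    r = y * y / 4.0
    return kappa * (theta - r) / math.sqrt(r) - sigma ** 2 * r / (4.0 * r ** 1.5)

kappa, theta, sigma = 2.0, 0.05, 0.1   # hypothetical values
for y in [0.2, 0.45, 0.9]:
    assert abs(mu_transformed(y, kappa, theta, sigma)
               - mu_direct(y, kappa, theta, sigma)) < 1e-12
print("drift expressions agree")
```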
We can now use the Shoji and Ozaki [1998] method to approximate the likelihood with a Gaussian likelihood:

   \sum_{k=2}^{N} \log\left( p_{y_{t_k} \mid y_{t_{k-1}}}(y_{t_k}) \right) \approx \sum_{k=2}^{N} \left( -\frac{(y_{t_k} - m_k)^2}{2 v_k} - \frac{1}{2}\log(2\pi v_k) \right)

where

   m_k = y_{t_{k-1}} + \frac{a_k}{b_k} K_k + \frac{\sigma^2 c_k}{2 b_k^2}\left(K_k - b_k \Delta_k\right)

   v_k = \frac{\sigma^2\left(\exp(2 b_k \Delta_k) - 1\right)}{2 b_k}

   \Delta_k = t_k - t_{k-1}

   K_k = \exp(b_k \Delta_k) - 1

   a_k = \mu(y_{t_{k-1}}) = \frac{2}{y_{t_{k-1}}}\left(\kappa\theta - \frac{\sigma^2}{4}\right) - \frac{\kappa}{2}\,y_{t_{k-1}}

   b_k = \mu'(y_{t_{k-1}}) = -\frac{2}{y_{t_{k-1}}^2}\left(\kappa\theta - \frac{\sigma^2}{4}\right) - \frac{\kappa}{2}

   c_k = \mu''(y_{t_{k-1}}) = \frac{4}{y_{t_{k-1}}^3}\left(\kappa\theta - \frac{\sigma^2}{4}\right)

Assignment: Estimate the parameters and their covariance matrices in the CIR model by using the Shoji and Ozaki approximated likelihood.

3.4.2 Simulated Maximum Likelihood (Extra)

Sometimes it is possible to use Maximum Likelihood on a simulated likelihood in stochastic differential equations. Assume that we have observations y_n, n = 1, ..., N, from some model, and that the sampling interval is \Delta. We now want to find the likelihood

   L(\theta) = \prod_{i=1}^{N} p(y_i \mid y_{i-1}, \theta).

If this is not available in closed form, one might approximate it using simulation. First we discretize the dynamics of the stochastic differential equation using a scheme that gives a simple transition probability, preferably Euler. Then, divide the interval \Delta into M subintervals of length \delta = \Delta/M. The idea is then to simulate K trajectories on a grid of size \delta up to subinterval M - 1, starting in y_n. Do this for every time n, resulting in K samples at every n. Under the Euler discretization, the transition density from subinterval M - 1 to M is Gaussian, with mean \mu_{n,k} and standard deviation \sigma_{n,k}, both given by the
model in question. Then we approximate the likelihood by the following:

   L(\theta) = \prod_{n=1}^{N} p(y_{n+1} \mid y_n, \theta) \approx \prod_{n=1}^{N} \left( \frac{1}{K} \sum_{k=1}^{K} \varphi\left(y_{n+1}; \mu_{n,k}^{\theta}, \sigma_{n,k}^{\theta}\right) \right)

where \varphi(y; \mu, \sigma) is the density of the normal distribution. The log-likelihood is given by

   l(\theta) = \log(L(\theta)) = \sum_{n=1}^{N} \log\left( p(y_{n+1} \mid y_n, \theta) \right) \approx \sum_{n=1}^{N} \log\left( \frac{1}{K} \sum_{k=1}^{K} \varphi\left(y_{n+1}; \mu_{n,k}^{\theta}, \sigma_{n,k}^{\theta}\right) \right)

We then use numerical optimization techniques to maximize the log-likelihood. In this exercise we keep the number of intermediate steps fairly small, say M = 2 or 3. If M is too big, the variance of the sample will become large and make the estimation difficult. The remedy for this is to use an importance sampler.

Assignment: Estimate the parameters and their covariance matrices in the models above by using Simulated Maximum Likelihood.

Assignment: Do the same thing for the transformed data using the transformed CIR model.

Hint: It is usually a good idea to use the same sequence of random numbers each time the likelihood is evaluated. This is referred to as Common Random Numbers and is a way to reduce the Monte Carlo error in the minimization. Practically, this is done by drawing a large number of random numbers, in this case N(M-1)K, and using these as input to the likelihood.

3.4.3 Exact Moments (GMM/EF) (Extra)

Sometimes the bias that discretization brings can be avoided by calculating the moments exactly. The moments can be calculated using the so-called Dynkin's formula, which is nothing more than an application of the Itô formula. Dynkin's formula is given by:

   E\left[f(X_t) \mid X_0 = x_0\right] = f(x_0) + E\left[ \int_0^t A f(X_s)\,ds \,\middle|\, X_0 = x_0 \right]   (4)

where the operator A (from the standard Itô formula) is defined as

   A f = \sum_{i=1}^{n} \mu_i \frac{\partial f}{\partial x_i} + \frac{1}{2} \sum_{i,j=1}^{n} \left(\sigma\sigma^T\right)_{i,j} \frac{\partial^2 f}{\partial x_i \partial x_j},
and where the process {X_t} is the solution to dX(t) = \mu(t, X(t))\,dt + \sigma(t, X(t))\,dW(t). For our purposes {X_t} is a one-dimensional process, which makes Dynkin's formula a lot easier (n = 1).

Hint: To show how Dynkin's formula can be used, we will study an example with f(x) = x^2. Suppose that we want to calculate the moments of the CIR process:

   A x^2 = \kappa(\theta - x) \frac{\partial x^2}{\partial x} + \frac{1}{2} \left(\sigma\sqrt{x}\right)^2 \frac{\partial^2 x^2}{\partial x^2}
         = \kappa(\theta - x)\,2x + \frac{1}{2}\sigma^2 x \cdot 2
         = -2\kappa x^2 + (2\kappa\theta + \sigma^2)\,x

By plugging this into the formula we get:

   E^X\left[X_t^2\right] = x_0^2 + E^X\left[ \int_0^t \left( -2\kappa X_s^2 + (2\kappa\theta + \sigma^2) X_s \right) ds \right]

Finally we assume that the expectation and the time integral can be interchanged, and then take the derivative with respect to t on both sides:

   \frac{\partial E^X[X_t^2]}{\partial t} = -2\kappa\,E^X[X_t^2] + (2\kappa\theta + \sigma^2)\,E^X[X_t].

Extra: Estimate the parameters and their covariance matrix for the Cox-Ingersoll-Ross process by using exact moments, that is, using f(x) = \{x, x^2, x^3\}.

3.5 Exact likelihood for the CIR model (Extra)

The CIR process has a known transition density. If we let \Delta_k = t_k - t_{k-1}, c = 2\kappa/(\sigma^2(1 - \exp(-\kappa\Delta_k))) and Y = 2cX, then Y_{t_k} \mid Y_{t_{k-1}} is distributed as a noncentral chi-squared with 4\kappa\theta/\sigma^2 degrees of freedom and noncentrality parameter Y_{t_{k-1}}\exp(-\kappa\Delta_k). Using this we can write the density as

   p_{X_{t_k} \mid X_{t_{k-1}}}(x_{t_k} \mid x_{t_{k-1}}) = c\,e^{-u-v}\left(\frac{v}{u}\right)^{q/2} I_q\left(2\sqrt{uv}\right)   (5)

where u = c\,x_{t_{k-1}}\exp(-\kappa\Delta_k), v = c\,x_{t_k}, q = 2\kappa\theta/\sigma^2 - 1, and I_q(z) is the modified Bessel function of the first kind of order q (besseli(q,z) in Matlab).

Extra: Estimate the parameters and their covariance matrix for the Cox-Ingersoll-Ross model by using the exact likelihood. Compare the result with the previous estimates, where you used GMM, Simulated ML, Shoji and Ozaki, and exact moments.
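The exact density (5) is straightforward to evaluate. The Python sketch below is illustrative only (in the exercise you would use besseli in Matlab, and the parameter values here are hypothetical). Since the terms of the Bessel power series over- and underflow for large arguments, the sketch sums the series in log space. As a sanity check, the transition density should integrate to one over x_{t_k}.

```python
import math

def log_besseli(q, z, terms=120):
    """log I_q(z) via the power series of the modified Bessel function,
       I_q(z) = sum_m (z/2)^(2m+q) / (m! * Gamma(m+q+1)),
    summed in log space for numerical stability (stands in for Matlab's besseli)."""
    logs = [(2 * m + q) * math.log(z / 2.0)
            - math.lgamma(m + 1.0) - math.lgamma(m + q + 1.0)
            for m in range(terms)]
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))

def cir_exact_density(x1, x0, dt, kappa, theta, sigma):
    """Exact CIR transition density p(x1 | x0) over a step of length dt."""
    c = 2.0 * kappa / (sigma ** 2 * (1.0 - math.exp(-kappa * dt)))
    u = c * x0 * math.exp(-kappa * dt)
    v = c * x1
    q = 2.0 * kappa * theta / sigma ** 2 - 1.0
    log_p = (math.log(c) - u - v + (q / 2.0) * (math.log(v) - math.log(u))
             + log_besseli(q, 2.0 * math.sqrt(u * v)))
    return math.exp(log_p)

# Sanity check with hypothetical parameters: integrate the density over a
# grid covering essentially all of the conditional probability mass.
kappa, theta, sigma, x0, dt = 2.0, 0.05, 0.1, 0.05, 0.25
h = 0.0002
mass = sum(cir_exact_density(0.01 + h * (i + 0.5), x0, dt, kappa, theta, sigma) * h
           for i in range(450))  # x1 in (0.01, 0.10)
print(abs(mass - 1.0) < 1e-3)  # -> True
```

Summing the exact log-densities over the observed path and maximising numerically gives the exact maximum likelihood estimates.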
4 Feedback

Comments and ideas relating to the computer exercise are always welcome. Send them to Erik Lindström, erikl@maths.lth.se, or call 046 222 45 78.

5 MATLAB routines

fminsearch  Simplex-based optimisation routine.

fminunc  Numerical minimization of a multidimensional function. The routine is based on quasi-Newton (BFGS) and it can return the minimising parameter value, the minimal function value and the Hessian. The cryptic name comes from "function minimization, unconstrained".

MLmax  Customized quasi-Newton based optimization algorithm for maximum likelihood estimation. Maximises the likelihood function by using the score function's quadratic variation to estimate the Fisher information matrix. Needs the log-likelihood returned as a vector.

>> [xout,logl,covm] = MLmax(@lnl,x0,indata)

References

Madsen, H., Nielsen, J. N., Lindström, E., Baadsgaard, M. and Holst, J. (2006). Statistics in Finance, IMM, DTU, Lyngby and Matematisk Statistik, LTH, Lund.

Nolsøe, K., Madsen, H., Nielsen, J. N. and Baadsgaard, M. Lecture Notes in Estimation Functions, IMM, DTU, Lyngby.

Shoji, I. and Ozaki, T. (1998). Estimation for nonlinear stochastic differential equations by a local linearization method, Stochastic Analysis and Applications, 16, 733-752.