Stochastic Volatility (SV) Models
Jun Yu

1 Motivations

Some stylised facts about financial asset return distributions:
1. The distribution is leptokurtic.
2. Volatility clustering.
3. Volatility responds to return news asymmetrically.
4. Returns are cross dependent.
5. Volatilities are cross dependent.
6. Often a lower-dimensional factor structure explains most of the correlation.
7. Time-varying correlations.
[Figure: sample ACF of the series sp9303daily; horizontal axis: Lag (0 to 30), vertical axis: ACF (0.0 to 1.0)]

ACF for typical financial asset returns

If an ARMA model has to be used, this ACF suggests an ARMA(0,0). The ACF of the squared returns suggests that ARMA-type models are NOT suitable for financial returns.
[Figure: sample ACF of the series sp9303daily^2 (squared returns); horizontal axis: Lag (0 to 30), vertical axis: ACF (0.0 to 1.0)]
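The two ACF plots can be reproduced for any return series with a few lines of code. The following is a minimal sketch; the function name `sample_acf` and the numbers in `returns` are illustrative assumptions, not the sp9303daily data.

```python
def sample_acf(x, max_lag):
    """Return the sample autocorrelations r_0, ..., r_max_lag of x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)  # lag-0 sum of squared deviations
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(max_lag + 1)
    ]


# Made-up returns, for illustration only (not the sp9303daily series).
returns = [0.5, -1.2, 0.3, 2.1, -1.9, 0.1, -0.2, 0.4]
acf_returns = sample_acf(returns, 3)                    # ACF of returns
acf_squared = sample_acf([r * r for r in returns], 3)   # ACF of squared returns
```

Applying the same function to the returns and to their squares gives the two series shown in the figures.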
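2 Univariate SV Models

Model 1: The basic SV model (Taylor, 1982)

Suppose $y_t$ is the return for a financial asset. The basic SV model is
\[
y_t = \sigma_t \varepsilon_t = \exp(0.5 h_t)\,\varepsilon_t, \quad \varepsilon_t \sim N(0,1),
\]
\[
h_t = \lambda + \alpha h_{t-1} + v_t, \quad v_t \sim N(0,\sigma_v^2),
\]
where $\{\varepsilon_t\}$ and $\{v_t\}$ are uncorrelated. Rewriting, with $h_t$ now measured as a deviation from its mean and $\sigma = \exp(0.5\lambda/(1-\alpha))$:
\[
y_t = \sigma \exp(0.5 h_t)\,\varepsilon_t, \quad \varepsilon_t \sim N(0,1),
\]
\[
h_t = \alpha h_{t-1} + v_t, \quad v_t \sim N(0,\sigma_v^2).
\]

Volatility evolves according to an AR(1) process.

Conditional on the information set $I_{t-1}$, $h_t$ is no longer deterministic, since $v_t$ is not yet realized. This assumption could be more realistic than that of ARCH-type models, where the conditional variance is known given $I_{t-1}$.

In the SV model $\alpha$ captures volatility clustering.

The kurtosis of $y_t$ is $3\exp(\sigma_v^2/(1-\alpha^2))$, which is larger than 3 as long as $\sigma_v \neq 0$. So the unconditional distribution of $y_t$ has fatter tails than the normal distribution. The tail thickness is due to the presence of the second noise term $v_t$.

$E(y_t y_s) = 0$ for $t \neq s$.

$E(y_t^2 y_{t-i}^2) = \exp\big(2\mu_h + \sigma_h^2(1+\alpha^i)\big)$, where
\[
\mu_h = \frac{\lambda}{1-\alpha}, \qquad \sigma_h^2 = \frac{\sigma_v^2}{1-\alpha^2}.
\]

We can linearize the model:
\[
x_t = \ln(y_t^2) = h_t + \ln(\varepsilon_t^2),
\]
\[
h_t = \lambda + \alpha h_{t-1} + v_t.
\]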
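The kurtosis formula can be checked by simulation. The sketch below is illustrative only: the function names, the parameter values ($\lambda = 0$, $\alpha = 0.9$, $\sigma_v = 0.3$) and the choice to start $h_t$ from its stationary distribution are assumptions made for the example.

```python
import math
import random


def simulate_basic_sv(n, lam, alpha, sigma_v, seed=0):
    """Simulate n returns and log-volatilities from the basic SV model."""
    rng = random.Random(seed)
    mu_h = lam / (1.0 - alpha)
    sd_h = sigma_v / math.sqrt(1.0 - alpha ** 2)
    h = rng.gauss(mu_h, sd_h)  # assumed start: stationary distribution of h
    y, hs = [], []
    for _ in range(n):
        h = lam + alpha * h + rng.gauss(0.0, sigma_v)       # AR(1) log-volatility
        y.append(math.exp(0.5 * h) * rng.gauss(0.0, 1.0))   # return equation
        hs.append(h)
    return y, hs


def sample_kurtosis(x):
    """Fourth central moment divided by the squared second central moment."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    return m4 / (m2 * m2)


def theoretical_kurtosis(alpha, sigma_v):
    """3 * exp(sigma_v^2 / (1 - alpha^2)), as stated in the text."""
    return 3.0 * math.exp(sigma_v ** 2 / (1.0 - alpha ** 2))


y, h = simulate_basic_sv(100_000, lam=0.0, alpha=0.9, sigma_v=0.3, seed=1)
print(sample_kurtosis(y), theoretical_kurtosis(0.9, 0.3))
```

The sample kurtosis is a noisy estimate for heavy-tailed data, so it should be expected to land near the theoretical value rather than on it.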
Model 2: The t-SV model:
\[
y_t = \sigma_t \varepsilon_t = \exp(0.5 h_t)\,\varepsilon_t, \quad \varepsilon_t \sim t(k),
\]
\[
h_t = \lambda + \alpha h_{t-1} + v_t, \quad v_t \sim N(0,\sigma_v^2).
\]

Model 3: Asymmetric SV Model (Yu, 2005):
\[
y_t = \sigma_t \varepsilon_t = \exp(0.5 h_t)\,\varepsilon_t, \quad \varepsilon_t \sim N(0,1),
\]
\[
h_t = \lambda + \alpha h_{t-1} + v_t, \quad v_t \sim N(0,\sigma_v^2), \tag{2.1}
\]
where $\mathrm{corr}(\varepsilon_t, v_{t+1}) = \rho$.

Yu (2005) showed that $E(h_{t+1} \mid y_t)$ is a linear function of $y_t$ whose slope coefficient is $\rho$ times a positive function of the other parameters. This implies that if $\rho < 0$, and everything else is held constant, a fall in the stock price/return leads to an increase in $E(\ln \sigma_{t+1}^2 \mid y_t)$. So this model allows volatility to respond asymmetrically to good and bad news.
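The asymmetric response can be seen in a small simulation. This is a sketch under stated assumptions: the correlated shock is built as $v_{t+1} = \sigma_v(\rho\varepsilon_t + \sqrt{1-\rho^2}\,w_{t+1})$ with $w_{t+1}$ an independent standard normal, which gives $\mathrm{corr}(\varepsilon_t, v_{t+1}) = \rho$. The function name and parameter values are illustrative, not taken from Yu (2005).

```python
import math
import random


def simulate_asymmetric_sv(n, lam, alpha, sigma_v, rho, seed=0):
    """Return (y, h_next), where h_next[t] is the log-volatility following y[t]."""
    rng = random.Random(seed)
    h = lam / (1.0 - alpha)  # assumed start: unconditional mean of h
    y, h_next = [], []
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        w = rng.gauss(0.0, 1.0)
        y.append(math.exp(0.5 * h) * eps)
        # v_{t+1} is correlated with the current return shock eps_t
        v = sigma_v * (rho * eps + math.sqrt(1.0 - rho * rho) * w)
        h = lam + alpha * h + v
        h_next.append(h)
    return y, h_next


y, h_next = simulate_asymmetric_sv(20_000, lam=0.0, alpha=0.95,
                                   sigma_v=0.3, rho=-0.5, seed=3)
after_down = [h for r, h in zip(y, h_next) if r < 0]
after_up = [h for r, h in zip(y, h_next) if r > 0]
# Average next-period log-volatility after negative minus after positive returns
gap = sum(after_down) / len(after_down) - sum(after_up) / len(after_up)
print(gap)
```

With $\rho < 0$ the gap is positive: log-volatility is on average higher after a negative return than after a positive one.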
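3 Multivariate SV Models

For the purpose of illustration, we focus on bivariate SV models. Let the observed log-returns at time $t$ be denoted by $y_t = (y_{1t}, y_{2t})'$ for $t = 1,\dots,T$. Let $\varepsilon_t = (\varepsilon_{1t}, \varepsilon_{2t})'$, $\eta_t = (\eta_{1t}, \eta_{2t})'$, $\mu = (\mu_1, \mu_2)'$, $h_t = (h_{1,t}, h_{2,t})'$, $\Omega_t = \mathrm{diag}(\exp(h_t/2))$, and
\[
\Phi = \begin{pmatrix} \phi_{11} & \phi_{12} \\ \phi_{21} & \phi_{22} \end{pmatrix}, \quad
\Sigma_\varepsilon = \begin{pmatrix} 1 & \rho_\varepsilon \\ \rho_\varepsilon & 1 \end{pmatrix}, \quad
\Sigma_\eta = \begin{pmatrix} \sigma_{\eta_1}^2 & \rho_\eta \sigma_{\eta_1}\sigma_{\eta_2} \\ \rho_\eta \sigma_{\eta_1}\sigma_{\eta_2} & \sigma_{\eta_2}^2 \end{pmatrix}.
\]

Model 1 (Basic-MSV or MSV):
\[
y_t = \Omega_t \varepsilon_t, \quad \varepsilon_t \sim N(0, I),
\]
\[
h_{t+1} = \mu + \mathrm{diag}(\phi_{11}, \phi_{22})(h_t - \mu) + \eta_t, \quad \eta_t \sim N\big(0, \mathrm{diag}(\sigma_{\eta_1}^2, \sigma_{\eta_2}^2)\big),
\]
with $h_0 = \mu$. This model is equivalent to stacking two basic univariate SV models together. Clearly, this specification does not allow for correlation across the returns or across the volatilities. However, it does allow for leptokurtic return distributions and volatility clustering.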
Model 2 (Constant Correlation-MSV or CC-MSV):
\[
y_t = \Omega_t \varepsilon_t, \quad \varepsilon_t \sim N(0, \Sigma_\varepsilon),
\]
\[
h_{t+1} = \mu + \mathrm{diag}(\phi_{11}, \phi_{22})(h_t - \mu) + \eta_t, \quad \eta_t \sim N\big(0, \mathrm{diag}(\sigma_{\eta_1}^2, \sigma_{\eta_2}^2)\big),
\]
with $h_0 = \mu$. In this model, the return shocks are allowed to be correlated. As a result, the returns are cross dependent.

Model 3 (MSV with Granger Causality or GC-MSV):
\[
y_t = \Omega_t \varepsilon_t, \quad \varepsilon_t \sim N(0, \Sigma_\varepsilon),
\]
\[
h_{t+1} = \mu + \Phi(h_t - \mu) + \eta_t, \quad \eta_t \sim N\big(0, \mathrm{diag}(\sigma_{\eta_1}^2, \sigma_{\eta_2}^2)\big),
\]
with $h_0 = \mu$ and $\phi_{12} = 0$. Since $\phi_{21}$ can be different from zero, the volatility of the second asset is allowed to be Granger caused by the volatility of the first asset. Consequently, both the returns and the volatilities are cross dependent. However, the cross dependence of the volatilities is realized via Granger causality and volatility clustering jointly. Furthermore, when both $\phi_{12}$ and $\phi_{21}$ are nonzero, bilateral Granger causality in volatility between the two assets is allowed.
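To illustrate how correlated return shocks make the returns cross dependent in the CC-MSV model, here is a minimal simulation sketch. The function names, the parameter values and the Cholesky-style construction of the correlated shocks are assumptions made for the example.

```python
import math
import random


def simulate_cc_msv(n, mu, phi, sigma_eta, rho_eps, seed=0):
    """Simulate a bivariate CC-MSV path; mu, phi, sigma_eta are pairs."""
    rng = random.Random(seed)
    h = list(mu)  # h_0 = mu
    y1, y2 = [], []
    for _ in range(n):
        # Independent volatility shocks: each h_i follows its own AR(1)
        h = [mu[i] + phi[i] * (h[i] - mu[i]) + sigma_eta[i] * rng.gauss(0.0, 1.0)
             for i in range(2)]
        # Correlated return shocks with unit variances and correlation rho_eps
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        e1 = z1
        e2 = rho_eps * z1 + math.sqrt(1.0 - rho_eps ** 2) * z2
        y1.append(math.exp(0.5 * h[0]) * e1)
        y2.append(math.exp(0.5 * h[1]) * e2)
    return y1, y2


def sample_corr(a, b):
    """Sample correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


y1, y2 = simulate_cc_msv(20_000, (0.0, 0.0), (0.9, 0.9), (0.3, 0.3), 0.6, seed=4)
print(sample_corr(y1, y2))
```

Setting `rho_eps` to zero recovers the Basic-MSV model, in which the sample correlation of the two return series should be close to zero.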
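Model 4 (Generalized CC-MSV or GCC-MSV):
\[
y_t = \Omega_t \varepsilon_t, \quad \varepsilon_t \sim N(0, \Sigma_\varepsilon),
\]
\[
h_{t+1} = \mu + \mathrm{diag}(\phi_{11}, \phi_{22})(h_t - \mu) + \eta_t, \quad \eta_t \sim N(0, \Sigma_\eta),
\]
with $h_0 = \mu$. In this model both returns and volatilities are cross dependent. Both GC-MSV and GCC-MSV can generate cross dependence in volatilities.

Model 5 (Dynamic Correlation-MSV or DC-MSV):
\[
y_t = \Omega_t \varepsilon_t, \quad \varepsilon_t \mid \Omega_t \sim N(0, \Sigma_{\varepsilon,t}), \quad
\Sigma_{\varepsilon,t} = \begin{pmatrix} 1 & \rho_t \\ \rho_t & 1 \end{pmatrix},
\]
\[
h_{t+1} = \mu + \mathrm{diag}(\phi_{11}, \phi_{22})(h_t - \mu) + \eta_t, \quad \eta_t \sim N\big(0, \mathrm{diag}(\sigma_{\eta_1}^2, \sigma_{\eta_2}^2)\big),
\]
\[
q_{t+1} = \psi_0 + \psi(q_t - \psi_0) + \sigma_\rho v_t, \quad v_t \sim N(0,1),
\]
\[
\rho_t = \frac{\exp(q_t) - 1}{\exp(q_t) + 1},
\]
with $h_0 = \mu$ and $q_0 = \psi_0$. In this model, not only the volatilities but also the correlation coefficients are time varying. Of course, $\rho_t$ has to be bounded by $-1$ and $1$ for $\Sigma_{\varepsilon,t}$ to be a well-defined correlation matrix. This constraint is achieved by using the Fisher transformation.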
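The mapping from $q_t$ to $\rho_t$ is algebraically equal to $\tanh(q_t/2)$, so an unrestricted AR(1) process for $q_t$ always yields a correlation strictly inside $(-1, 1)$. A minimal sketch follows; the function names and the parameter values are illustrative assumptions.

```python
import math
import random


def rho_from_q(q):
    """Map an unrestricted q to a correlation: (e^q - 1) / (e^q + 1)."""
    return (math.exp(q) - 1.0) / (math.exp(q) + 1.0)


def simulate_rho_path(n, psi0, psi, sigma_rho, seed=0):
    """Simulate q_t as an AR(1) around psi0 and return the implied rho_t path."""
    rng = random.Random(seed)
    q = psi0  # q_0 = psi_0
    path = []
    for _ in range(n):
        q = psi0 + psi * (q - psi0) + sigma_rho * rng.gauss(0.0, 1.0)
        path.append(rho_from_q(q))
    return path


rho_path = simulate_rho_path(1000, psi0=0.7, psi=0.98, sigma_rho=0.1, seed=2)
print(min(rho_path), max(rho_path))
```

For very large $|q_t|$ the value of `math.exp(q)` can overflow or round the ratio to exactly $\pm 1$; `math.tanh(q / 2)` is a numerically safer way to evaluate the same function.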
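Model 6 (Heavy-tailed MSV or t-MSV):
\[
y_t = \Omega_t \varepsilon_t, \quad \varepsilon_t \sim t(0, \Sigma_\varepsilon, \nu),
\]
\[
h_{t+1} = \mu + \mathrm{diag}(\phi_{11}, \phi_{22})(h_t - \mu) + \eta_t, \quad \eta_t \sim N\big(0, \mathrm{diag}(\sigma_{\eta_1}^2, \sigma_{\eta_2}^2)\big),
\]
with $h_0 = \mu$. In this model, a heavy-tailed multivariate Student t distribution is used for the return shock, and hence extra excess kurtosis is allowed.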
Model 7 (Additive Factor-MSV or AFactor-MSV):
\[
y_t = D f_t + \varepsilon_t, \quad \varepsilon_t \sim N\big(0, \mathrm{diag}(\sigma_1^2, \sigma_2^2)\big),
\]
\[
f_t = \exp(h_t/2)\,u_t, \quad u_t \sim N(0,1),
\]
\[
h_{t+1} = \mu + \phi(h_t - \mu) + \sigma_\eta \eta_t, \quad \eta_t \sim N(0,1),
\]
with $h_0 = 0$. The first component in the return equation has a smaller number of factors, which capture the information relevant to the pricing of all assets, while the second one is idiosyncratic noise, which captures the asset-specific information. Like the univariate SV model, the AFactor-MSV model allows for excess kurtosis and volatility clustering. Clearly, it also allows for cross dependence in both returns and volatilities. Note that in this model $h_t$ represents the log-volatility of the common factor $f_t$.

The conditional correlation coefficient between $y_{1t}$ and $y_{2t}$, with factor loadings $D = (1, d)'$, is given by
\[
\frac{d\exp(h_t)}{\sqrt{(\exp(h_t) + \sigma_1^2)(d^2\exp(h_t) + \sigma_2^2)}}
= \frac{d}{\sqrt{(1 + \sigma_1^2\exp(-h_t))(d^2 + \sigma_2^2\exp(-h_t))}}.
\]
Unless $\sigma_1^2 = \sigma_2^2 = 0$, the correlation coefficients are time varying, but the dynamics of the correlations depend on the dynamics of $h_t$. Moreover, for $d > 0$ the correlation is an increasing function of $h_t$, implying that the higher the volatility of the common factor, the higher the correlation in returns.
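The two equivalent forms of the conditional correlation, and its monotonicity in $h_t$, can be checked numerically. In this sketch the function names and the parameter values ($d = 0.8$, $\sigma_1^2 = \sigma_2^2 = 0.5$) are illustrative assumptions.

```python
import math


def factor_corr(h, d, s1_sq, s2_sq):
    """Conditional correlation of y_1t and y_2t, first form."""
    num = d * math.exp(h)
    den = math.sqrt((math.exp(h) + s1_sq) * (d * d * math.exp(h) + s2_sq))
    return num / den


def factor_corr_alt(h, d, s1_sq, s2_sq):
    """Same quantity after dividing numerator and denominator by exp(h)."""
    return d / math.sqrt((1.0 + s1_sq * math.exp(-h))
                         * (d * d + s2_sq * math.exp(-h)))


# Correlation at low, medium and high factor log-volatility
corrs = [factor_corr(h, 0.8, 0.5, 0.5) for h in (-2.0, 0.0, 2.0)]
print(corrs)
```

When both idiosyncratic variances are zero the correlation equals 1 for any $h_t$ (with $d > 0$), which is the degenerate case excluded in the text.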
4 Estimation of SV Models

4.1 Likelihood function

The ML method is more difficult to use for SV models than for ARCH-type models, since the likelihood function is numerically more difficult to evaluate for SV models.

Let $\theta = (\alpha, \lambda, \sigma_v)$ be the parameters of interest in the basic SV model. We wish to estimate $\theta$ from $y = \{y_1, y_2, \dots, y_T\}$. Denote the vector of log-volatilities by $h = \{h_1, h_2, \dots, h_T\}$.

The likelihood function of the parameter vector $\theta$ can be written as
\[
L(\theta \mid y) = \mathrm{pdf}(y; \theta) = \int \mathrm{pdf}(y, h; \theta)\,dh. \tag{4.2}
\]

The integral cannot be solved analytically. It is $T$-dimensional, and its dimension cannot be reduced.
4.2 QMLE with Kalman Filter

The basic SV model can be represented by a linearized version without losing any information. In this subsection $r_t$ denotes the return and $y_t$ the log squared return:
\[
y_t = \ln(r_t^2) = h_t + \ln(\varepsilon_t^2) = -1.27 + h_t + \mu_t,
\]
\[
h_t = \lambda + \alpha h_{t-1} + v_t, \tag{4.3}
\]
with $E(\mu_t) = 0$ and $\mathrm{Var}(\mu_t) = \pi^2/2$.

If we approximate the distribution of $\mu_t$ by a normal distribution with mean 0 and variance $\pi^2/2$, the linearized SV model is approximated by a linear Gaussian state-space model. We follow the standard notation here:
\[
y_t = A' x_t + H' \xi_t + w_t,
\]
\[
\xi_{t+1} = F \xi_t + v_{t+1}, \tag{4.4}
\]
with
\[
A' = -1.27 + \frac{\lambda}{1-\alpha}, \quad x_t = 1, \quad H' = 1, \quad \xi_t = h_t - \frac{\lambda}{1-\alpha}, \quad F = \alpha, \quad Q = \sigma_v^2, \quad R = \pi^2/2.
\]
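The constants $-1.27$ and $\pi^2/2$ are the mean and variance of $\ln(\varepsilon_t^2)$ when $\varepsilon_t \sim N(0,1)$. A short derivation, using the standard log-moments of a gamma variable ($\psi$ is the digamma function and $\gamma$ is Euler's constant):

```latex
\begin{aligned}
\varepsilon_t^2 &\sim \chi^2_1 = \mathrm{Gamma}\big(\text{shape } \tfrac12,\ \text{scale } 2\big), \\
E[\ln \varepsilon_t^2] &= \psi(\tfrac12) + \ln 2 = (-\gamma - 2\ln 2) + \ln 2 = -\gamma - \ln 2 \approx -1.2704, \\
\mathrm{Var}[\ln \varepsilon_t^2] &= \psi'(\tfrac12) = \frac{\pi^2}{2} \approx 4.9348 .
\end{aligned}
```

The distribution of $\ln(\varepsilon_t^2)$ is strongly left-skewed, which is why treating $\mu_t$ as normal is only an approximation.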
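Based on the linear Gaussian state-space approximation, the Kalman filter can be applied as follows.

Initialisation:
\[
\hat\xi_{1|0} = 0, \qquad \Sigma_{1|0} = \frac{\sigma_v^2}{1-\alpha^2}. \tag{4.5}
\]

Sequential updating:
\[
\begin{aligned}
\hat\xi_{t|t} &= \hat\xi_{t|t-1} + \Sigma_{t|t-1}\big(\Sigma_{t|t-1} + \pi^2/2\big)^{-1}\Big(y_t + 1.27 - \frac{\lambda}{1-\alpha} - \hat\xi_{t|t-1}\Big), \\
\Sigma_{t|t} &= \Sigma_{t|t-1} - \Sigma_{t|t-1}\big(\Sigma_{t|t-1} + \pi^2/2\big)^{-1}\Sigma_{t|t-1}.
\end{aligned} \tag{4.6}
\]

In-sample sequential prediction:
\[
\begin{aligned}
\hat\xi_{t+1|t} &= \alpha\hat\xi_{t|t-1} + \alpha\Big(1 + \frac{\pi^2}{2\Sigma_{t|t-1}}\Big)^{-1}\Big(y_t + 1.27 - \frac{\lambda}{1-\alpha} - \hat\xi_{t|t-1}\Big), \\
\Sigma_{t+1|t} &= \alpha^2\Sigma_{t|t} + \sigma_v^2,
\end{aligned} \tag{4.7}
\]
\[
\begin{aligned}
\hat y_{t+1|t} &= -1.27 + \frac{\lambda}{1-\alpha} + \hat\xi_{t+1|t}, \\
E\big[(y_{t+1} - \hat y_{t+1|t})(y_{t+1} - \hat y_{t+1|t})'\big] &= \Sigma_{t+1|t} + \pi^2/2.
\end{aligned} \tag{4.8}
\]

Out-of-sample forecasting:
\[
\begin{aligned}
\hat\xi_{T+h|T} &= \alpha^h \hat E(\xi_T \mid I_T) = \alpha^h \hat\xi_{T|T}, \\
\hat y_{T+h|T} &= -1.27 + \frac{\lambda}{1-\alpha} + \alpha^h \hat\xi_{T|T}.
\end{aligned} \tag{4.9}
\]

Smoothing:
\[
\begin{aligned}
\hat\xi_{t|T} &= \hat\xi_{t|t} + J_t\big[\hat\xi_{t+1|T} - \hat\xi_{t+1|t}\big], \\
\Sigma_{t|T} &= \Sigma_{t|t} + J_t\big(\Sigma_{t+1|T} - \Sigma_{t+1|t}\big)J_t', \\
J_t &= \Sigma_{t|t}\,\alpha\,\Sigma_{t+1|t}^{-1},
\end{aligned} \tag{4.10}
\]
with $t = T-1, T-2, \dots, 1$.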
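The quasi-likelihood is computed, up to an additive constant, by
\[
\ln L(\alpha, \lambda, \sigma_v^2) = -\frac12 \sum_{t=1}^{T} \log\big(\Sigma_{t|t-1} + \pi^2/2\big)
- \frac12 \sum_{t=1}^{T} \frac{\big(y_t + 1.27 - \frac{\lambda}{1-\alpha} - \hat\xi_{t|t-1}\big)^2}{\Sigma_{t|t-1} + \pi^2/2}.
\]

The $h$-day-ahead forecast is computed by (4.9) with the QML estimates plugged in.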
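The filter recursions (4.5) to (4.7) and the quasi-likelihood reduce to a short scalar loop. The following is a minimal sketch, not a production implementation: the function name `qml_loglik`, the made-up returns and the parameter values are assumptions, and the additive constant of the Gaussian log-density is omitted as in the formula above.

```python
import math

OFFSET = -1.27           # approximate E[ln eps_t^2]
R = math.pi ** 2 / 2.0   # Var[ln eps_t^2]


def qml_loglik(x, lam, alpha, sigma_v):
    """Quasi log-likelihood of x_t = ln(r_t^2) and the filtered log-volatilities."""
    mu_h = lam / (1.0 - alpha)
    xi_pred = 0.0                                  # xi_{1|0}, eq. (4.5)
    p_pred = sigma_v ** 2 / (1.0 - alpha ** 2)     # Sigma_{1|0}, eq. (4.5)
    loglik = 0.0
    filtered = []
    for obs in x:
        s = p_pred + R                        # prediction-error variance
        err = obs - OFFSET - mu_h - xi_pred   # one-step prediction error
        loglik += -0.5 * math.log(s) - 0.5 * err * err / s
        gain = p_pred / s
        xi_filt = xi_pred + gain * err        # eq. (4.6)
        p_filt = p_pred - gain * p_pred
        filtered.append(xi_filt + mu_h)       # filtered h_t = xi_{t|t} + mean
        xi_pred = alpha * xi_filt             # eq. (4.7)
        p_pred = alpha ** 2 * p_filt + sigma_v ** 2
    return loglik, filtered


# Made-up nonzero returns, for illustration only.
returns = [0.8, -1.5, 0.4, 2.2, -0.3, 0.9, -1.1, 0.2]
x = [math.log(r * r) for r in returns]
ll, h_filt = qml_loglik(x, lam=-0.1, alpha=0.95, sigma_v=0.2)
print(ll)
```

QML estimates would be obtained by maximising `qml_loglik` over $(\lambda, \alpha, \sigma_v)$ with a numerical optimiser, subject to $|\alpha| < 1$ and $\sigma_v > 0$. Exact zero returns need special handling because $\ln(0)$ is undefined.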
4.3 Simulated Maximum Likelihood

QML estimators are inefficient.

To do full ML estimation, we need to calculate the likelihood function, i.e., to evaluate the multi-dimensional integral numerically. One way to do this is to use a Monte Carlo method. Here we introduce a class of Monte Carlo methods based on the importance sampling technique.

We rewrite the multi-dimensional integral as
\[
L(\theta \mid y) = \int \mathrm{pdf}(y, h; \theta)\,dh
= \int \frac{\mathrm{pdf}(y, h; \theta)}{q(h)}\,q(h)\,dh
= \int \frac{\mathrm{pdf}(y, h; \theta)}{q(h)}\,dQ(h), \tag{4.11}
\]
where $q(h)$ is an importance density and $Q(h)$ is the corresponding importance distribution function.

The idea of the simulated ML method is to draw a sample $h^{(1)}, \dots, h^{(S)}$ from $q$ so that we can approximate $L(\theta \mid y)$ by
\[
\frac{1}{S}\sum_{s=1}^{S} \frac{\mathrm{pdf}(y, h^{(s)}; \theta)}{q(h^{(s)})}.
\]
The key to the simulated ML method is to match $\mathrm{pdf}(y, h; \theta)$ and $q(h)$ as closely as possible while ensuring that it is easy to simulate from $q$.

To do that, we propose to base the importance sampler on the Laplace approximation to $\mathrm{pdf}(y, h; \theta)$. In particular, we choose the mean of $q$ to be the mode of $\ln \mathrm{pdf}(y, h; \theta)$ with respect to $h$, and the variance of $q$ to be the negative of the inverse of the second derivative of $\ln \mathrm{pdf}(y, h; \theta)$ with respect to $h$, evaluated at the mode. That is,
\[
h^{(s)} \sim N(h^*, \Omega^{-1}),
\]
where
\[
h^* = \arg\max_h \ln \mathrm{pdf}(y, h; \theta) \tag{4.12}
\]
and
\[
\Omega = -\frac{\partial^2 \ln \mathrm{pdf}(y, h^*; \theta)}{\partial h\,\partial h'}. \tag{4.13}
\]
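To make (4.11) to (4.13) concrete, the sketch below applies the Laplace-based importance sampler to a deliberately tiny case with a single observation ($T = 1$), where $y \mid h \sim N(0, e^h)$ and $h \sim N(\mu_h, \sigma_h^2)$. The one-dimensional setting, the function names, the Newton iteration for the mode and the parameter values are all assumptions chosen so that the result can be compared with a grid-based integral; the full SV likelihood is $T$-dimensional and needs a multivariate version of the same steps.

```python
import math
import random


def log_joint(h, y, mu_h, var_h):
    """ln pdf(y, h) for y | h ~ N(0, e^h) and h ~ N(mu_h, var_h)."""
    log_obs = -0.5 * (math.log(2.0 * math.pi) + h + y * y * math.exp(-h))
    log_state = -0.5 * (math.log(2.0 * math.pi * var_h) + (h - mu_h) ** 2 / var_h)
    return log_obs + log_state


def laplace_mode(y, mu_h, var_h, iters=50):
    """Newton iterations for the mode h* (4.12) and Omega at the mode (4.13)."""
    h = mu_h
    for _ in range(iters):
        grad = -0.5 + 0.5 * y * y * math.exp(-h) - (h - mu_h) / var_h
        omega = 0.5 * y * y * math.exp(-h) + 1.0 / var_h  # minus second derivative
        h += grad / omega
    omega = 0.5 * y * y * math.exp(-h) + 1.0 / var_h
    return h, omega


def sml_likelihood(y, mu_h, var_h, n_draws=20_000, seed=0):
    """Importance-sampling estimate of the likelihood with q = N(h*, 1/Omega)."""
    rng = random.Random(seed)
    h_star, omega = laplace_mode(y, mu_h, var_h)
    sd = 1.0 / math.sqrt(omega)
    total = 0.0
    for _ in range(n_draws):
        h = rng.gauss(h_star, sd)
        log_q = -0.5 * (math.log(2.0 * math.pi / omega) + omega * (h - h_star) ** 2)
        total += math.exp(log_joint(h, y, mu_h, var_h) - log_q)
    return total / n_draws


def grid_likelihood(y, mu_h, var_h, lo=-12.0, hi=12.0, steps=4800):
    """Trapezoid-rule benchmark, feasible only because T = 1 here."""
    width = (hi - lo) / steps
    vals = [math.exp(log_joint(lo + i * width, y, mu_h, var_h))
            for i in range(steps + 1)]
    return width * (sum(vals) - 0.5 * (vals[0] + vals[-1]))


est = sml_likelihood(0.5, 0.0, 1.0)
ref = grid_likelihood(0.5, 0.0, 1.0)
print(est, ref)
```

Because $q$ is centred at the mode and matched to the local curvature, the importance weights stay close to constant near the bulk of the mass, which keeps the Monte Carlo variance low.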
4.4 MCMC

Bayesian inference is based on the posterior distribution of the unobservables given the data. In the sequel, we will denote the probability density function of a random variable $\theta$ by $p(\theta)$. By successive conditioning, the joint prior density is
\[
p(\sigma^2, \alpha, \sigma_v^2, h_0, h_1, \dots, h_T) = p(\sigma^2, \alpha, \sigma_v^2)\,p(h_0 \mid \sigma_v^2)\prod_{t=1}^{T} p(h_t \mid h_{t-1}, \alpha, \sigma_v^2).
\]

We assume $h_0 \sim N(0, \sigma_v^2/(1-\alpha^2))$ and prior independence of the parameters $\sigma^2$, $\alpha$, and $\sigma_v^2$, and use the standard noninformative priors, i.e. $p(\sigma^2) \propto 1/\sigma^2$, $p(\alpha) \propto 1$, and $p(\sigma_v^2) \propto 1/\sigma_v^2$. The density $p(h_t \mid h_{t-1}, \alpha, \sigma_v^2)$ is defined through the variance equation. The likelihood $p(y_1, \dots, y_T \mid \sigma^2, \alpha, \sigma_v^2, h_0, \dots, h_T)$ is defined through the return equation and the conditional independence assumption:
\[
p(y_1, \dots, y_T \mid \sigma^2, \alpha, \sigma_v^2, h_0, \dots, h_T) = \prod_{t=1}^{T} p(y_t \mid h_t, \sigma^2). \tag{4.14}
\]

Then, by Bayes' theorem, the joint posterior distribution of the unobservables given the data is proportional to the prior times the likelihood, i.e.
\[
p(\sigma^2, \alpha, \sigma_v^2, h_0, \dots, h_T \mid y_1, \dots, y_T) \propto p(\sigma^2)\,p(\alpha)\,p(\sigma_v^2)\,p(h_0 \mid \sigma_v^2)\prod_{t=1}^{T} p(h_t \mid h_{t-1}, \alpha, \sigma_v^2)\prod_{t=1}^{T} p(y_t \mid h_t, \sigma^2). \tag{4.15}
\]
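Any MCMC sampler for this model needs to evaluate the right-hand side of (4.15), usually on the log scale. The sketch below is one way to write that kernel, assuming the return equation $y_t = \sigma\exp(0.5h_t)\varepsilon_t$ and the variance equation $h_t = \alpha h_{t-1} + v_t$ from Section 2. The function names are illustrative, and the restriction $|\alpha| < 1$ is imposed here because the assumed distribution of $h_0$ requires it.

```python
import math


def normal_logpdf(x, mean, var):
    """Log-density of N(mean, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def log_posterior_kernel(sigma2, alpha, sigma_v2, h, y):
    """Log of the right-hand side of (4.15).

    h = [h_0, h_1, ..., h_T] and y = [y_1, ..., y_T].
    """
    if sigma2 <= 0.0 or sigma_v2 <= 0.0 or abs(alpha) >= 1.0:
        return -math.inf
    # Priors: p(sigma^2) ∝ 1/sigma^2, p(alpha) ∝ 1, p(sigma_v^2) ∝ 1/sigma_v^2
    total = -math.log(sigma2) - math.log(sigma_v2)
    # Initial state: h_0 ~ N(0, sigma_v^2 / (1 - alpha^2))
    total += normal_logpdf(h[0], 0.0, sigma_v2 / (1.0 - alpha ** 2))
    for t in range(1, len(h)):
        # Variance equation: h_t | h_{t-1} ~ N(alpha * h_{t-1}, sigma_v^2)
        total += normal_logpdf(h[t], alpha * h[t - 1], sigma_v2)
        # Return equation: y_t | h_t ~ N(0, sigma^2 * exp(h_t))
        total += normal_logpdf(y[t - 1], 0.0, sigma2 * math.exp(h[t]))
    return total


value = log_posterior_kernel(1.0, 0.0, 1.0, [0.0, 0.0], [0.0])
print(value)
```

A Metropolis-Hastings or Gibbs scheme would update the parameters and each $h_t$ in turn, using differences of this kernel to form acceptance ratios.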