Bayesian Estimation of the Markov-Switching GARCH(1,1) Model with Student-t Innovations

Department of Quantitative Economics, Switzerland
david.ardia@unifr.ch
R/Rmetrics User and Developer Workshop, Meielisalp, July 2007

MOTIVATION
Why use MS-GARCH models?
- Single-regime GARCH models often imply very high volatility persistence, which may be an artifact of structural breaks;
- Markov-switching ARCH [Hamilton and Susmel, 1994];
- Markov-switching GARCH [Gray, 1996; Klaassen, 2002; Haas et al., 2004];
- Very flexible models;
- Better volatility forecasts.

MOTIVATION
Why use the Bayesian approach?
- Maximum likelihood (EM) estimation is difficult because of local maxima;
- MCMC methods can explore the full posterior;
- Model discrimination is possible (with respect to the number of states);
- Probabilistic statements about the parameters are available.

MOTIVATION
Why use R?
- Quick and easy coding;
- C or Fortran implementation to speed up calculations;
- Use of the coda (or boa) package to check the MCMC output;
- Nice plots (legends, symbols, etc.).

OUR CONTRIBUTION
MCMC scheme for the MS-GARCH(1, 1) model of Haas et al. [2004] with Student-t innovations:
- Model parameters are updated in blocks;
- The state variables are updated in a multi-move manner;
- The degrees of freedom parameter is generated via an efficient rejection technique.
Application to a real data set.
In-sample and out-of-sample performance analysis.

OUTLINE
1 MS-GARCH(1, 1) model
2 Bayesian estimation
3 Application
4 Conclusion

MS-GARCH(1, 1) MODEL
Conditional variance process
Haas et al. [2004] hypothesize K separate GARCH(1, 1) processes for the conditional variance:
$$h_t^k \doteq \alpha_0^k + \alpha_1^k y_{t-1}^2 + \beta^k h_{t-1}^k \qquad \text{for } k = 1, \ldots, K.$$
This formulation has practical and conceptual advantages:
- It allows the states to be generated in a multi-move manner;
- The variance dynamics can be interpreted in each regime;
- Theoretical results on the single-regime GARCH(1, 1) model are available.
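
A minimal Python sketch of the K parallel variance recursions may help fix ideas. The function name `regime_variances`, the argument layout, and the start-up values `h0` are illustrative assumptions, not part of the original talk. The point it shows is that each regime's variance path depends only on past observations and on that regime's own parameters, never on the state path, which is what makes a multi-move update of the states feasible.

```python
def regime_variances(y, alpha0, alpha1, beta, h0):
    """Run K separate GARCH(1,1) recursions in parallel.

    h[t][k] = alpha0[k] + alpha1[k] * y[t-1]**2 + beta[k] * h[t-1][k]

    y      : list of observations y_1..y_T (0-based here)
    alpha0 : list of K intercepts
    alpha1 : list of K ARCH coefficients
    beta   : list of K GARCH coefficients
    h0     : list of K start-up variances (an assumption of this sketch)
    """
    K = len(alpha0)
    h = [list(h0)]
    for t in range(1, len(y)):
        prev = h[-1]
        h.append([
            alpha0[k] + alpha1[k] * y[t - 1] ** 2 + beta[k] * prev[k]
            for k in range(K)
        ])
    return h


# Usage: two regimes, three observations.
paths = regime_variances(
    y=[0.5, -1.0, 2.0],
    alpha0=[0.1, 0.4],
    alpha1=[0.05, 0.2],
    beta=[0.9, 0.7],
    h0=[1.0, 2.0],
)
```

Because no regime's recursion refers to the latent state, all K paths can be computed once per parameter draw and then reused when sampling the states.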

MS-GARCH(1, 1) MODEL
Model specification
The MS-GARCH(1, 1) model with Student-t innovations may be written as follows:
$$y_t = \varepsilon_t \left(\varrho\, h_t^{s_t}\right)^{1/2}, \qquad \varepsilon_t \overset{iid}{\sim} \mathcal{S}(0, 1, \nu), \qquad \varrho \doteq \frac{\nu - 2}{\nu}, \qquad \text{for } t = 1, \ldots, T,$$
where the latent process $\{s_t\}$ with state space $\{1, \ldots, K\}$ is assumed to be a stationary, irreducible Markov process with transition matrix $P$.

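MS-GARCH(1, 1) MODEL
Model specification (cont.)
Equivalent specification (via data augmentation), which makes the Bayesian estimation more convenient:
$$y_t = \varepsilon_t \left(\omega_t\, \varrho\, h_t^{s_t}\right)^{1/2}, \qquad \varepsilon_t \overset{iid}{\sim} \mathcal{N}(0, 1), \qquad \omega_t \overset{iid}{\sim} \mathcal{IG}\!\left(\frac{\nu}{2}, \frac{\nu}{2}\right), \qquad \text{for } t = 1, \ldots, T.$$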
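
As a short check on the augmented specification, the following worked derivation shows why the scaling factor makes the regime variance equal to the conditional variance of the observation, and what the Gibbs step for the mixing variables looks like. It assumes the shape/scale parameterization of the inverted gamma density and $\nu > 2$; the conditional posterior is derived here from the stated model rather than quoted from the talk.

```latex
\begin{aligned}
\omega_t \sim \mathcal{IG}\!\left(\tfrac{\nu}{2}, \tfrac{\nu}{2}\right)
  &\;\Longrightarrow\;
  \mathbb{E}[\omega_t] = \frac{\nu/2}{\nu/2 - 1} = \frac{\nu}{\nu - 2}, \\[4pt]
\operatorname{Var}\!\left(y_t \mid s_t = k,\, h_t^k\right)
  &= \varrho\, h_t^k\, \mathbb{E}[\omega_t]\, \mathbb{E}[\varepsilon_t^2]
   = \frac{\nu - 2}{\nu} \cdot \frac{\nu}{\nu - 2} \cdot h_t^k
   = h_t^k, \\[4pt]
p\!\left(\omega_t \mid y_t, s_t, \nu, \ldots\right)
  &\propto \omega_t^{-\frac{\nu + 1}{2} - 1}
     \exp\!\left\{ -\frac{1}{2\,\omega_t}\left( \nu + \frac{y_t^2}{\varrho\, h_t^{s_t}} \right) \right\}
  \;\Longrightarrow\;
  \omega_t \mid \cdot \sim \mathcal{IG}\!\left( \frac{\nu + 1}{2},\; \frac{1}{2}\left( \nu + \frac{y_t^2}{\varrho\, h_t^{s_t}} \right) \right).
\end{aligned}
```

Conditional on the mixing variables the model is Gaussian, which is what allows standard filtering recursions for the states.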

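BAYESIAN ESTIMATION
Simulating from the joint posterior
Our MCMC sampler can be decomposed as follows:
- $(s_1, \ldots, s_T)$ using FFBS (forward filtering, backward sampling);
- $P$ using Gibbs;
- $(\alpha_0^1, \alpha_0^2, \ldots, \alpha_0^K, \alpha_1^1, \alpha_1^2, \ldots, \alpha_1^K)$ using Metropolis-Hastings (M-H);
- $(\beta^1, \ldots, \beta^K)$ using M-H;
- $(\omega_1, \ldots, \omega_T)$ using Gibbs;
- $\nu$ using efficient rejection.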
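
The first step of the sampler can be sketched in plain Python as follows. This is a generic forward filtering, backward sampling routine under stated assumptions, not the author's implementation: the function name `ffbs`, the initial distribution `pi0`, and the argument layout are hypothetical, and the observation density is the Gaussian one implied by the data-augmented specification (conditional on the mixing variables).

```python
import math
import random


def ffbs(y, h, P, pi0, omega, rho, rng):
    """Draw (s_1, ..., s_T) jointly given parameters and mixing variables.

    y[t]     : observation
    h[t][k]  : regime-k conditional variance at time t
    P[i][j]  : Pr(s_t = j | s_{t-1} = i)
    pi0      : distribution of the first state (assumed given)
    omega[t] : mixing variable, rho : scaling factor (nu - 2) / nu
    rng      : random.Random instance
    """
    T, K = len(y), len(P)
    filt = []            # filtered probabilities Pr(s_t = k | y_1..y_t)
    pred = list(pi0)     # one-step-ahead state probabilities
    for t in range(T):
        lik = []
        for k in range(K):
            var = omega[t] * rho * h[t][k]
            lik.append(math.exp(-0.5 * y[t] ** 2 / var)
                       / math.sqrt(2.0 * math.pi * var))
        joint = [pred[k] * lik[k] for k in range(K)]
        total = sum(joint)
        f = [j / total for j in joint]
        filt.append(f)
        pred = [sum(f[i] * P[i][j] for i in range(K)) for j in range(K)]

    # Backward pass: sample s_T, then s_t given s_{t+1}.
    states = [0] * T
    states[T - 1] = rng.choices(range(K), weights=filt[T - 1])[0]
    for t in range(T - 2, -1, -1):
        w = [filt[t][i] * P[i][states[t + 1]] for i in range(K)]
        states[t] = rng.choices(range(K), weights=w)[0]
    return states, filt


# Usage: a calm regime (variance 1) and a turbulent one (variance 4).
rng = random.Random(0)
y = [0.1, -0.2, 3.0, -2.5, 0.05]
h = [[1.0, 4.0] for _ in y]
P = [[0.9, 0.1], [0.2, 0.8]]
states, filt = ffbs(y, h, P, [0.5, 0.5], [1.0] * len(y), 0.75, rng)
```

For long series a production version would work with log-likelihoods to avoid underflow; that refinement is omitted here to keep the sketch short.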

BAYESIAN ESTIMATION
Label switching
- The likelihood function and the joint prior are invariant to relabeling the states;
- The joint posterior distribution is therefore also invariant;
- This produces multimodality (K! modes);
- An identification constraint is needed.
We use the permutation sampler of Frühwirth-Schnatter [2001].
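
To illustrate what an identification constraint does to a posterior draw, here is a small Python sketch that relabels the states so that the intercepts are increasing. The specific constraint (ordering on the intercepts) and the function name `relabel_draw` are assumptions made for illustration; the talk does not state which constraint was used, and this is a post-processing step only, not the full permutation sampler.

```python
def relabel_draw(alpha0, alpha1, beta, P):
    """Permute state labels so that alpha0 is in increasing order.

    The same permutation is applied to every regime-specific parameter
    and to both the rows and the columns of the transition matrix P.
    """
    order = sorted(range(len(alpha0)), key=lambda k: alpha0[k])
    new_alpha0 = [alpha0[k] for k in order]
    new_alpha1 = [alpha1[k] for k in order]
    new_beta = [beta[k] for k in order]
    new_P = [[P[i][j] for j in order] for i in order]
    return new_alpha0, new_alpha1, new_beta, new_P


# Usage: a draw in which the high-variance regime happens to be labeled first.
a0, a1, b, P_new = relabel_draw(
    alpha0=[0.5, 0.1],
    alpha1=[0.2, 0.05],
    beta=[0.6, 0.9],
    P=[[0.8, 0.2], [0.1, 0.9]],
)
```

Applying such a relabeling to every draw maps the K! symmetric modes onto a single one, so that posterior summaries for "state 1" and "state 2" are meaningful.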

APPLICATION
Data set
- Demeaned daily log-returns of the Swiss Market Index (SMI);
- Total of 3 800 observations;
- The first 2 500 log-returns are used for the estimation;
- The remaining 1 300 log-returns are used in a forecasting performance analysis.

APPLICATION
Estimation
- Single-regime and two-state Markov-switching models;
- Asymmetric GJR(1, 1) specification of Glosten et al. [1993]:
$$h_t^k \doteq \alpha_0^k + \left( \alpha_1^k\, \mathbb{I}_{\{y_{t-1} \geq 0\}} + \alpha_2^k\, \mathbb{I}_{\{y_{t-1} < 0\}} \right) y_{t-1}^2 + \beta^k h_{t-1}^k;$$
- Joint posterior sample of size 10 000.
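
A one-step Python sketch of the asymmetric update for a single regime is given below. The function name `gjr_update` and its argument names are illustrative assumptions; the treatment of a zero return as "non-negative" is also an assumption, since the inequality sign on the first indicator did not survive in the slide text.

```python
def gjr_update(h_prev, y_prev, a0, a_pos, a_neg, b):
    """One step of the GJR(1,1) variance recursion for one regime.

    a_pos multiplies y_prev**2 after a non-negative return,
    a_neg multiplies it after a negative return.
    """
    arch = a_pos if y_prev >= 0 else a_neg
    return a0 + arch * y_prev ** 2 + b * h_prev


# Usage: with a_neg > a_pos, a negative shock raises the variance more
# than a positive shock of the same size (the leverage effect).
h_after_loss = gjr_update(1.0, -2.0, a0=0.1, a_pos=0.05, a_neg=0.2, b=0.7)
h_after_gain = gjr_update(1.0, 2.0, a0=0.1, a_pos=0.05, a_neg=0.2, b=0.7)
```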

APPLICATION
Posterior results for the single-regime model
- High persistence of the conditional variance process;
- Presence of the leverage effect: $P(\alpha_2 > \alpha_1 \mid y) = 0.999$;
- Conditional leptokurtosis;
- The unconditional variance exists: posterior mean 1.179 [1.173, 1.189], against an empirical variance of 1.136.

APPLICATION
Posterior results for the Markov-switching model
- Presence of the leverage effect in both states;
- Conditional leptokurtosis, but the posterior mean and median of the degrees of freedom parameter are slightly larger than for the single-regime model;
- Infrequent mixing between states;
- Posterior mean of the unconditional variance is 0.56 [0.557, 0.563] in state 1 and 2.00 [1.992, 2.012] in state 2;
- Posterior mean of the overall unconditional variance is 1.134 [1.128, 1.139], against an empirical variance of 1.136.

APPLICATION
Misspecification tests
- Probability integral transforms [see Diebold et al., 1998];
- Tests for autocorrelation in the transforms and in their squares;
- Joint test for zero mean, unit variance, zero skewness, and the absence of excess kurtosis;
- No evidence of misspecification at the 5% significance level for either model.

APPLICATION
Deviance information criterion
- An alternative to AIC and BIC, as well as to likelihood-ratio (LR) tests, which are not consistent in a Markov-switching context;
- The DIC consists of two terms: a component that measures the goodness-of-fit and a penalty term for increasing model complexity (effective number of parameters);
- The model with the smallest DIC is preferred.

Model     DIC                 \bar{D}             p_D
GJR       6770.4              6765.6              4.76
          [6769.9, 6770.8]    [6765.3, 6765.8]    [4.49, 4.93]
MS-GJR    6713.3              6704.4              8.84
          [6712.6, 6713.8]    [6793.9, 6794.9]    [8.49, 9.04]

[ ]: 95% confidence interval obtained by bootstrap.
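
The two terms of the criterion can be sketched in Python as follows, using the standard definition in which the penalty is the posterior mean deviance minus the deviance evaluated at the posterior mean of the parameters. The function name `dic` and its inputs are illustrative assumptions; how the deviance itself is evaluated for the Markov-switching model is left outside this sketch.

```python
def dic(deviance_draws, deviance_at_posterior_mean):
    """Return (DIC, D_bar, p_D) from MCMC output.

    deviance_draws             : deviance evaluated at each posterior draw
    deviance_at_posterior_mean : deviance evaluated at the posterior mean
    """
    d_bar = sum(deviance_draws) / len(deviance_draws)   # goodness of fit
    p_d = d_bar - deviance_at_posterior_mean            # effective number of parameters
    return d_bar + p_d, d_bar, p_d


# Usage with toy numbers (not taken from the application).
value, d_bar, p_d = dic([10.0, 12.0, 14.0], 11.0)
```

The tabulated figures follow this decomposition up to rounding: 6765.6 + 4.76 ≈ 6770.4 for GJR and 6704.4 + 8.84 ≈ 6713.3 for MS-GJR.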

APPLICATION
Model likelihood
- Estimate the model likelihood for the two models;
- Bridge sampling of Meng and Wong [1996].

Model     ln p(y)
GJR       -3408.04 (2.644)
MS-GJR    -3389.66 (3.191)

ln p(y): bridge sampling estimate; ( ): numerical standard error (×100).
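
The two estimates can be compared through the log Bayes factor, which is simply their difference. The arithmetic below uses only the values reported on the slide; the notation $\mathrm{BF}$ is introduced here for illustration.

```latex
\ln \mathrm{BF}_{\text{MS-GJR},\,\text{GJR}}
  = \ln p(y \mid \text{MS-GJR}) - \ln p(y \mid \text{GJR})
  = -3389.66 - (-3408.04)
  = 18.38
```

A positive value favors the Markov-switching specification, in line with the ranking given by the DIC.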

APPLICATION
Forecasting performance analysis
- We forecast the one-day-ahead Value at Risk (VaR) in a backtest;
- The VaR is the quantile of the predictive return distribution that corresponds to a given probability of an extreme loss;
- The predictive VaR is computed by simulation;
- We test the joint hypothesis of independence and correct unconditional coverage of the VaR [Christoffersen, 1998].
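
The backtest can be sketched in Python as below: an empirical quantile of simulated one-day-ahead returns gives the VaR, and the sequence of violations is fed to likelihood-ratio statistics for unconditional coverage, independence, and their sum. The function names, the quantile convention, and the use of a first-order Markov alternative for the independence part are assumptions of this sketch rather than details taken from the talk.

```python
import math


def var_from_draws(draws, alpha):
    """Empirical alpha-quantile of simulated one-day-ahead returns."""
    s = sorted(draws)
    idx = max(0, math.ceil(alpha * len(s)) - 1)
    return s[idx]


def _xlogy(x, y):
    """x * log(y) with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def coverage_tests(hits, alpha):
    """Return (LR_uc, LR_ind, LR_cc) for a 0/1 sequence of VaR violations.

    hits[t] = 1 if the realized return fell below the VaR forecast.
    Under the null, LR_cc is asymptotically chi-square with 2 degrees
    of freedom (5% critical value approximately 5.991).
    """
    n = len(hits)
    n1 = sum(hits)
    n0 = n - n1
    pi_hat = n1 / n
    lr_uc = -2.0 * (_xlogy(n0, 1 - alpha) + _xlogy(n1, alpha)
                    - _xlogy(n0, 1 - pi_hat) - _xlogy(n1, pi_hat))

    # Transition counts n_ij: state i at t-1 followed by state j at t.
    c = [[0, 0], [0, 0]]
    for a, b in zip(hits[:-1], hits[1:]):
        c[a][b] += 1
    pi01 = c[0][1] / (c[0][0] + c[0][1]) if c[0][0] + c[0][1] else 0.0
    pi11 = c[1][1] / (c[1][0] + c[1][1]) if c[1][0] + c[1][1] else 0.0
    pi = (c[0][1] + c[1][1]) / (n - 1)
    ll_null = _xlogy(c[0][0] + c[1][0], 1 - pi) + _xlogy(c[0][1] + c[1][1], pi)
    ll_alt = (_xlogy(c[0][0], 1 - pi01) + _xlogy(c[0][1], pi01)
              + _xlogy(c[1][0], 1 - pi11) + _xlogy(c[1][1], pi11))
    lr_ind = -2.0 * (ll_null - ll_alt)
    return lr_uc, lr_ind, lr_uc + lr_ind


# Usage: 100 days with 5 isolated violations of a 5% VaR.
hits = [1 if t in (10, 30, 50, 70, 90) else 0 for t in range(100)]
lr_uc, lr_ind, lr_cc = coverage_tests(hits, 0.05)
```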

APPLICATION
Forecasting performance analysis (cont.)
- We consider the GJR and MS-GJR models;
- We also consider a rolling GJR model: 750 log-returns are used to estimate the model, and the next 50 log-returns are used as a forecasting window;
- The methodology fulfills the recommendations of the Bank for International Settlements (BIS) on the use of internal models;
- The models are tested over the 1 300 out-of-sample observations.
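
The rolling scheme can be made concrete with a short Python sketch that lists the estimation and forecasting index ranges. It assumes that the model is re-estimated every 50 days on the most recent 750 observations and that forecasting starts at observation 2 500; the function name `rolling_windows` and the half-open index convention are illustrative choices.

```python
def rolling_windows(n_obs, first_forecast, est_len, step):
    """List (est_start, est_end, fc_end) triples with half-open ranges.

    Estimation uses observations [est_start, est_end);
    forecasts are produced for observations [est_end, fc_end).
    """
    windows = []
    start = first_forecast
    while start < n_obs:
        end = min(start + step, n_obs)
        windows.append((start - est_len, start, end))
        start = end
    return windows


# Usage with the sample sizes of the application.
windows = rolling_windows(n_obs=3800, first_forecast=2500, est_len=750, step=50)
```

Under these assumptions the 1 300 out-of-sample observations are covered by 26 re-estimations, whereas the static GJR and MS-GJR models are estimated once on the first 2 500 observations.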

APPLICATION
Forecasting performance analysis (cont.)
- MS-GJR and rolling GJR outperform the static GJR model;
- MS-GJR and rolling GJR perform equally well;
- However, the MS-GJR model has two advantages: it can anticipate structural breaks in the conditional variance process through the filtering probabilities, and it needs to be estimated only once;
- Rolling GJR is merely an ad hoc approach.

CONCLUSION
- MS-GARCH is more flexible than GARCH;
- Bayesian estimation has many advantages;
- We provide a new block updating scheme for the Bayesian estimation of the MS-GARCH model of Haas et al. [2004] with Student-t innovations;
- Better in-sample and out-of-sample performance than single-regime GARCH.

REFERENCES
Christoffersen PF (1998). Evaluating Interval Forecasts. International Economic Review, 39(4), 841-862. Symposium on Forecasting and Empirical Methods in Macroeconomics and Finance.
Diebold FX, Gunther TA, Tay AS (1998). Evaluating Density Forecasts with Applications to Financial Risk Management. International Economic Review, 39(4), 863-883.
Dueker MJ (1997). Markov Switching in GARCH Processes and Mean-Reverting Stock-Market Volatility. Journal of Business and Economic Statistics, 15(1), 26-34.
Frühwirth-Schnatter S (2001). Markov Chain Monte Carlo Estimation of Classical and Dynamic Switching and Mixture Models. Journal of the American Statistical Association, 96(453), 194-209.
Glosten LR, Jagannathan R, Runkle DE (1993). On the Relation between the Expected Value and the Volatility of the Nominal Excess Return on Stocks. Journal of Finance, 48(5), 1779-1801.
Gray SF (1996). Modeling the Conditional Distribution of Interest Rates as a Regime-Switching Process. Journal of Financial Economics, 42(1), 27-62.
Haas M, Mittnik S, Paolella MS (2004). A New Approach to Markov-Switching GARCH Models. Journal of Financial Econometrics, 2(4), 493-530.
Hamilton JD, Susmel R (1994). Autoregressive Conditional Heteroskedasticity and Changes in Regime. Journal of Econometrics, 64(1-2), 307-333.
Klaassen F (2002). Improving GARCH Volatility Forecasts with Regime-Switching GARCH. Empirical Economics, 27(2), 363-394.
Meng XL, Wong WH (1996). Simulating Ratios of Normalizing Constants via a Simple Identity: a Theoretical Exploration. Statistica Sinica, 6, 831-860.
