Machine Learning for Multi-step Ahead Forecasting of Volatility Proxies

Jacopo De Stefani, Ir. - jdestefa@ulb.ac.be
Prof. Gianluca Bontempi - gbonte@ulb.ac.be
Olivier Caelen, PhD - olivier.caelen@worldline.com
Dalila Hattab, PhD - dalila.hattab@equensworldline.com
MIDAS 2017 - ECML-PKDD, Hotel Aleksandar Palace, Skopje, FYROM
Monday 18th September, 2017

Problem overview
[Figure: price chart of the first CAC40 series from 2012-01-02 to 2013-11-04 (last price 47.255), with a volume panel (100,000s; last 345,721) and a Moving Average Convergence Divergence MACD(12,26,9) panel (MACD 1.335, signal 1.258).]

What is volatility?
Definition: volatility is a statistical measure of the dispersion of returns for a given security or market index.
[Figure: returns $r_t$ over 100 days ($t$ [days]), contrasting a high-volatility and a low-volatility series.]

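A closer look at the data - Volatility proxies
[Figure: price $P_t$ over two calendar days (Day 0 and Day 1), marking for each day the opening $P^{(o)}$, high $P^{(h)}$, low $P^{(l)}$ and closing $P^{(c)}$ prices and the pre-opening period; each day is split into fractions $f$ and $1-f$. The volatility proxy $\sigma^P_t$ is computed from $P^{(o)}_t$, $P^{(h)}_t$, $P^{(l)}_t$ and $P^{(c)}_t$.]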

Models for volatility
- Past volatility
  - Average-based: HA, MA, ES, EWMA, STES
  - Simple regression: SR-AR, SR-TAR, SR-ARMA
  - Random walk
- ARCH
  - Symmetric: ARCH(q)
  - Asymmetric: EGARCH, GJR-GARCH, QGARCH, ST-GARCH, RS-GARCH
  - Extended: Component-GARCH, RGARCH
- Machine learning
  - Univariate: NN, k-NN, SVR
  - Multivariate
Each branch of the taxonomy is labelled as either Established Research or Future Research.
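As an illustration of one average-based model from the taxonomy, the following minimal sketch computes an exponentially weighted moving average (EWMA) of squared returns with NumPy. The recursion and the default decay λ = 0.94 follow the common RiskMetrics convention and are assumptions for illustration; they are not taken from the models benchmarked in this work.

```python
import numpy as np


def ewma_variance(returns, lam=0.94, init_var=None):
    """EWMA variance: s2[t] = lam * s2[t-1] + (1 - lam) * returns[t-1]**2.

    s2[t] is the variance forecast for day t built from returns up to day t-1.
    lam = 0.94 is the RiskMetrics daily decay, used here only as a default.
    """
    r = np.asarray(returns, dtype=float)
    s2 = np.empty(len(r) + 1)
    # Start from the sample variance of the returns unless told otherwise.
    s2[0] = np.var(r) if init_var is None else init_var
    for t in range(1, len(r) + 1):
        s2[t] = lam * s2[t - 1] + (1.0 - lam) * r[t - 1] ** 2
    return s2  # s2[-1] forecasts the day after the last observed return


returns = np.array([0.01, -0.02, 0.015, -0.005])
vol_forecast = np.sqrt(ewma_variance(returns)[-1])
```

A larger λ gives more weight to the past and yields a smoother volatility estimate.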

Multistep ahead TS forecasting - Taieb [2014]
Definition: given a univariate time series $\{y_1, \dots, y_T\}$ comprising $T$ observations, forecast the next $H$ observations $\{y_{T+1}, \dots, y_{T+H}\}$, where $H$ is the forecast horizon.
Hypotheses: autoregressive model $y_t = m(y_{t-1}, \dots, y_{t-d}) + \varepsilon_t$ with lag order (embedding dimension) $d$, where $\varepsilon$ is an i.i.d. stochastic noise term with $\mu_\varepsilon = 0$ and $\sigma^2_\varepsilon = \sigma^2$.
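The autoregressive hypothesis turns forecasting into a regression problem once the series is embedded into lagged vectors. A minimal NumPy sketch of this embedding follows; the function name `embed` is hypothetical, and the least-squares fit in the usage lines is only the simplest stand-in for $m$, not one of the models used in this work.

```python
import numpy as np


def embed(y, d):
    """Lagged design matrix for y_t = m(y_{t-1}, ..., y_{t-d}) + eps_t.

    Row k of X is [y_{t-1}, y_{t-2}, ..., y_{t-d}] (most recent lag first) and
    target[k] = y_t, for t = d, ..., T-1 (0-based indices).
    """
    y = np.asarray(y, dtype=float)
    T = len(y)
    if T <= d:
        raise ValueError("the series must be longer than the embedding dimension d")
    # Column `lag` holds y_{t-lag} for every target index t = d, ..., T-1.
    X = np.column_stack([y[d - lag:T - lag] for lag in range(1, d + 1)])
    return X, y[d:]


# Usage: fit a linear m (plus intercept) by least squares.
series = np.sin(np.arange(50) / 3.0)
X, target = embed(series, d=3)
coef, *_ = np.linalg.lstsq(np.column_stack([X, np.ones(len(X))]), target, rcond=None)
```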

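Multistep ahead forecasting for volatility
- State of the art, NAR (1 input, 1 output): $[\sigma^P_{t-d}, \dots, \sigma^P_{t-1}] \rightarrow m(\sigma^P) \rightarrow [\hat\sigma^P_t, \dots, \hat\sigma^P_{t+h}]$
- Proposed model, NARX (2 inputs, 1 output): $[\sigma^P_{t-d}, \dots, \sigma^P_{t-1}], [\sigma^X_{t-d}, \dots, \sigma^X_{t-1}] \rightarrow m(\sigma^P, \sigma^X) \rightarrow [\hat\sigma^P_t, \dots, \hat\sigma^P_{t+h}]$
- Future work ($M + 1$ inputs, $M + 1$ outputs): $[\sigma^P_{t-d}, \dots, \sigma^P_{t-1}], \dots, [\sigma^{X_M}_{t-d}, \dots, \sigma^{X_M}_{t-1}] \rightarrow m(\sigma^P, \dots, \sigma^{X_M}) \rightarrow [\hat\sigma^P_t, \dots, \hat\sigma^P_{t+h}], \dots, [\hat\sigma^{X_M}_t, \dots, \hat\sigma^{X_M}_{t+h}]$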

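Multistep ahead forecasting for volatility - Direct method
A separate model $f_h$ is trained for each horizon $h$; the forecast at step $h$ is made with the $h$-th model.
Dataset examples ($d = 3$, $h = 3$):

Direct NAR
x | y
$\sigma^P_3, \sigma^P_2, \sigma^P_1$ | $\sigma^P_5$
$\sigma^P_4, \sigma^P_3, \sigma^P_2$ | $\sigma^P_6$
... | ...
$\sigma^P_{T-5}, \sigma^P_{T-6}, \sigma^P_{T-7}$ | $\sigma^P_{T-2}$

Direct NARX
x | y
$\sigma^P_3, \sigma^P_2, \sigma^P_1, \sigma^X_3, \sigma^X_2, \sigma^X_1$ | $\sigma^P_5$
$\sigma^P_4, \sigma^P_3, \sigma^P_2, \sigma^X_4, \sigma^X_3, \sigma^X_2$ | $\sigma^P_6$
... | ...
$\sigma^P_{T-5}, \sigma^P_{T-6}, \sigma^P_{T-7}, \sigma^X_{T-5}, \sigma^X_{T-6}, \sigma^X_{T-7}$ | $\sigma^P_{T-2}$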
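The direct strategy can be sketched as follows: for each horizon $h$, build a dataset of lagged inputs (the target proxy alone for NAR, target plus exogenous proxy for NARX) paired with the value $h$ steps ahead, and fit one model per horizon. This is a minimal NumPy sketch under stated assumptions: the indexing convention (inputs end at time $t$, target is $p_{t+h}$), the function names and the plain k-nearest-neighbour regressor with Euclidean distance are illustrative choices, not the exact configuration used in the experiments.

```python
import numpy as np


def direct_dataset(p, h, d, x=None):
    """Training pairs (inputs, targets) for the direct model f_h.

    The input at time t is [p_t, ..., p_{t-d+1}] (followed by
    [x_t, ..., x_{t-d+1}] for NARX) and the target is p_{t+h}.
    """
    p = np.asarray(p, dtype=float)
    xs = None if x is None else np.asarray(x, dtype=float)
    rows, targets = [], []
    for t in range(d - 1, len(p) - h):
        feats = p[t - d + 1:t + 1][::-1]  # most recent lag first
        if xs is not None:
            feats = np.concatenate([feats, xs[t - d + 1:t + 1][::-1]])
        rows.append(feats)
        targets.append(p[t + h])
    return np.array(rows), np.array(targets)


def knn_predict(X_train, y_train, query, k):
    """k-NN regression: average target of the k rows closest to `query`."""
    dist = np.linalg.norm(X_train - query, axis=1)
    return y_train[np.argsort(dist)[:k]].mean()


def direct_forecast(p, H, d, x=None, k=5):
    """Direct strategy: a separate k-NN model for each horizon h = 1, ..., H."""
    p = np.asarray(p, dtype=float)
    query = p[-d:][::-1]
    if x is not None:
        query = np.concatenate([query, np.asarray(x, dtype=float)[-d:][::-1]])
    forecasts = []
    for h in range(1, H + 1):
        X_h, y_h = direct_dataset(p, h, d, x)
        forecasts.append(knn_predict(X_h, y_h, query, k=min(k, len(y_h))))
    return np.array(forecasts)
```

Each horizon gets its own training set, so errors do not propagate from one step to the next as they would with a recursive strategy; the cost is training $H$ models.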

Experimental setup
Model: $[\sigma^P_{t-d}, \dots, \sigma^P_{t-1}], [\sigma^X_{t-d}, \dots, \sigma^X_{t-1}] \rightarrow m(\sigma^P, \sigma^X) \rightarrow [\hat\sigma^P_t, \dots, \hat\sigma^P_{t+h}]$ (2 input time series, 1 output time series)
Data: volatility proxies $\sigma^X$, $\sigma^P$ computed from the CAC40
- Price based: $\sigma^i$ family - Garman and Klass [1980]
- Return based: GARCH(1,1) model - Hansen and Lunde [2005]; sample standard deviation
Models:
- Feedforward Neural Networks (NAR, NARX)
- k-Nearest Neighbours (NAR, NARX)
- Support Vector Regression (NAR, NARX)
- Naive (without $\sigma^X$)
- GARCH(1,1) (without $\sigma^X$)
- Average (without $\sigma^X$)
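The slides do not define the Naive and Average benchmarks. A common reading, assumed here, is that Naive repeats the last observed proxy value and Average repeats a historical mean. The sketch below encodes that assumption; the `window` parameter is hypothetical.

```python
import numpy as np


def naive_forecast(p, H):
    """Naive benchmark (assumed definition): repeat the last observed proxy value."""
    return np.full(H, float(p[-1]))


def average_forecast(p, H, window=None):
    """Average benchmark (assumed definition): mean of the last `window` values
    (the whole history when window is None), repeated over the horizon."""
    p = np.asarray(p, dtype=float)
    hist = p if window is None else p[-window:]
    return np.full(H, hist.mean())
```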

Correlation meta-analysis (cf. Field [2001])
[Figure: aggregated correlation matrix (colour scale from -1 to 1) between Volume, $\sigma^1$, $\sigma^6$, $\sigma^4$, $\sigma^5$, $\sigma^2$, $\sigma^3$, $r_t$, $\sigma^0$, $\sigma^{SD}_{250}$, $\sigma^{SD}_{100}$, $\sigma^{SD}_{50}$ and $\sigma^G$, ordered by hierarchical clustering.]
- 40 time series (CAC40)
- Time range: 05-01-2009 to 22-10-2014
- 1489 OHLC samples per time series
- Hierarchical clustering using Ward's method (Ward Jr [1963])
- All correlations are statistically significant

NARX forecaster - Results: ANN
[Figures: results for the ANN forecaster, two slides.]

NARX forecaster - Results: k-NN
[Figures: results for the k-NN forecaster, two slides.]

NARX forecaster - Results: SVR
[Figures: results for the SVR forecaster, two slides.]

Conclusions
- The correlation clustering groups together proxies belonging to the same family, i.e. $\sigma^i_t$ and $\sigma^{SD,n}_t$.
- All ML methods outperform the reference method, in both the single-input and the multiple-input configuration.
- Adding an external regressor brings a statistically significant improvement only for $h > 8$ (paired t-test, significance threshold 0.05).
- No model clearly outperforms all the others on every horizon, but SVR generally performs better than ANN and k-NN.

Thank you for your attention!
Any questions/comments? jacopo.de.stefani@ulb.ac.be
Find the paper at:

Bibliography
Tim Bollerslev. Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31(3):307-327, 1986.
Andy P. Field. Meta-analysis of correlation coefficients: a Monte Carlo comparison of fixed- and random-effects methods. Psychological Methods, 6(2):161, 2001.
Mark B. Garman and Michael J. Klass. On the estimation of security price volatilities from historical data. Journal of Business, pages 67-78, 1980.
Peter R. Hansen and Asger Lunde. A forecast comparison of volatility models: does anything beat a GARCH(1,1)? Journal of Applied Econometrics, 20(7):873-889, 2005.
Rob J. Hyndman and Anne B. Koehler. Another look at measures of forecast accuracy. International Journal of Forecasting, 22(4):679-688, 2006.
Souhaib Ben Taieb. Machine learning strategies for multi-step-ahead time series forecasting. PhD thesis, 2014.
Joe H. Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236-244, 1963.

Appendix

System overview
- Data preprocessing: raw OHLC data → missing values imputation (user choice) → imputed OHLC data
- Proxy generation: $\sigma^i_t$, $\sigma^{SD}_t$, $\sigma^G_t$ → correlation analysis
- Model identification: model choice in {ANN, KNN} and evaluation choice in {RO, RW} (user choices) → $(m, \theta)$ → forecaster

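Correlation analysis - Methodology
- For each time series $j = 1, \dots, N$, compute the correlation matrix of its proxies: $\mathrm{corr}([\sigma^{i(j)}, \sigma^{SD(j)}, \sigma^{G(j)}]) = \mathrm{corr}(\sigma^{(j)})$.
- A meta-analysis toolkit combines $\mathrm{corr}(\sigma^{(1)}), \dots, \mathrm{corr}(\sigma^{(N)})$ into the aggregated matrix $\mathrm{corr}(\sigma^{AGG})$.
- Data: $N = 40$ time series (CAC40); time range 05-01-2009 to 22-10-2014; 1489 OHLC samples per time series.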
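The slides do not say which aggregation the meta-analysis toolkit applies. One standard option discussed by Field [2001] is a fixed-effects average of Fisher z-transformed correlations, weighted by $n - 3$. The minimal NumPy sketch below implements that option element-wise on the $N$ correlation matrices; the choice of method and the function name are assumptions, and the usage lines run on synthetic random data, not on the CAC40 series.

```python
import numpy as np


def fixed_effects_correlation(corr_mats, n_obs):
    """Pool N correlation matrices with a fixed-effects Fisher-z average.

    corr_mats: array of shape (N, k, k), one matrix per time series.
    n_obs: observations per series (scalar or length N); weights are n - 3.
    """
    R = np.asarray(corr_mats, dtype=float)
    w = np.broadcast_to(np.asarray(n_obs, dtype=float) - 3.0, (R.shape[0],))
    # Fisher z-transform; clip so the unit diagonal does not map to infinity.
    Z = np.arctanh(np.clip(R, -1 + 1e-12, 1 - 1e-12))
    z_bar = np.tensordot(w, Z, axes=1) / w.sum()  # weighted mean, shape (k, k)
    R_agg = np.tanh(z_bar)  # back to the correlation scale
    np.fill_diagonal(R_agg, 1.0)
    return R_agg


# Usage on synthetic data: 40 series, 3 proxies each, 1489 samples per series.
rng = np.random.default_rng(0)
mats = [np.corrcoef(rng.normal(size=(1489, 3)), rowvar=False) for _ in range(40)]
pooled = fixed_effects_correlation(mats, n_obs=1489)
```

A random-effects model would additionally estimate the between-series variance of $z$; Field [2001] compares the two approaches.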

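NARX forecaster - Methodology
[Diagram: an original data-generating process (DGP), subject to disturbances, produces the past proxy values $\sigma^J_p$, the past exogenous values $\sigma^X_p$ and the future proxy values $\sigma^J_f$. The model $m(\theta, \sigma^J_p, \sigma^X_p)$ returns the forecast $\hat\sigma^J_f$. Model identification comprises structural identification (choice of $m$ in {ANN, KNN}) and parametric identification (choice of $\theta$), with the evaluation scheme chosen in {RO, RW}.]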

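Volatility proxies (1) - Garman and Klass [1980]
Closing prices:
$$\hat\sigma_0(t) = \left[\ln\frac{P^{(c)}_{t+1}}{P^{(c)}_t}\right]^2 = r_t^2 \qquad (1)$$
Opening/closing prices:
$$\hat\sigma_1(t) = \underbrace{\frac{1}{2f}\left[\ln\frac{P^{(o)}_{t+1}}{P^{(c)}_t}\right]^2}_{\text{nightly volatility}} + \underbrace{\frac{1}{2(1-f)}\left[\ln\frac{P^{(c)}_t}{P^{(o)}_t}\right]^2}_{\text{intraday volatility}} \qquad (2)$$
OHLC prices:
$$\hat\sigma_2(t) = \frac{1}{4\ln 2}\left[\ln\frac{P^{(h)}_t}{P^{(l)}_t}\right]^2 \qquad (3)$$
$$\hat\sigma_3(t) = \underbrace{\frac{a}{f}\left[\ln\frac{P^{(o)}_{t+1}}{P^{(c)}_t}\right]^2}_{\text{nightly volatility}} + \underbrace{\frac{1-a}{1-f}\,\hat\sigma_2(t)}_{\text{intraday volatility}} \qquad (4)$$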
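Equations (1) to (4) map directly onto array operations on daily OHLC prices. The following minimal NumPy sketch assumes aligned arrays of opening, high, low and closing prices, with $f$ taken as the fraction of the day during which the market is closed (as in Garman and Klass). The slides give no numeric values for $f$ and $a$, so the caller must supply them. The function name is hypothetical.

```python
import numpy as np


def gk_proxies_0_to_3(o, h, l, c, f, a):
    """Garman-Klass proxies sigma_0..sigma_3 from daily OHLC price arrays.

    f: fraction of the day during which the market is closed.
    a: weight of the nightly term in sigma_3 (to be chosen by the user).
    sigma_0, sigma_1 and sigma_3 use prices of day t+1, so they have T-1
    values; sigma_2 only uses day t and has T values.
    """
    o, h, l, c = (np.asarray(v, dtype=float) for v in (o, h, l, c))
    close_to_close = np.log(c[1:] / c[:-1]) ** 2  # [ln(P^c_{t+1} / P^c_t)]^2
    nightly = np.log(o[1:] / c[:-1]) ** 2  # [ln(P^o_{t+1} / P^c_t)]^2
    intraday = np.log(c[:-1] / o[:-1]) ** 2  # [ln(P^c_t / P^o_t)]^2
    s0 = close_to_close
    s1 = nightly / (2 * f) + intraday / (2 * (1 - f))
    s2 = np.log(h / l) ** 2 / (4 * np.log(2))  # high-low range estimator
    s3 = a / f * nightly + (1 - a) / (1 - f) * s2[:-1]
    return s0, s1, s2, s3
```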

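Volatility proxies (2) - Garman and Klass [1980]
OHLC prices:
$$u = \ln\frac{P^{(h)}_t}{P^{(o)}_t}, \qquad d = \ln\frac{P^{(l)}_t}{P^{(o)}_t}, \qquad c = \ln\frac{P^{(c)}_t}{P^{(o)}_t} \qquad (5)$$
$$\hat\sigma_4(t) = 0.511(u-d)^2 - 0.019\left[c(u+d) - 2ud\right] - 0.383c^2 \qquad (6)$$
$$\hat\sigma_5(t) = 0.511(u-d)^2 - (2\ln 2 - 1)c^2 \qquad (7)$$
$$\hat\sigma_6(t) = \underbrace{\frac{a}{f}\left[\ln\frac{P^{(o)}_{t+1}}{P^{(c)}_t}\right]^2}_{\text{nightly volatility}} + \underbrace{\frac{1-a}{1-f}\,\hat\sigma_4(t)}_{\text{intraday volatility}} \qquad (8)$$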
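Equations (5) to (8) can be sketched the same way. This minimal NumPy version assumes aligned OHLC arrays and user-supplied $f$ and $a$. It renames the normalised close $c$ of equation (5) to `cn` so that it does not shadow the closing prices. The function name is hypothetical.

```python
import numpy as np


def gk_proxies_4_to_6(o, h, l, c, f, a):
    """Garman-Klass proxies sigma_4..sigma_6 from daily OHLC price arrays.

    u, d and cn are the high, low and close normalised by the open (eq. 5).
    sigma_6 uses the next day's open, so it has one value fewer than sigma_4.
    """
    o, h, l, c = (np.asarray(v, dtype=float) for v in (o, h, l, c))
    u = np.log(h / o)
    d = np.log(l / o)
    cn = np.log(c / o)  # "c" in eq. (5)
    s4 = 0.511 * (u - d) ** 2 - 0.019 * (cn * (u + d) - 2 * u * d) - 0.383 * cn ** 2
    s5 = 0.511 * (u - d) ** 2 - (2 * np.log(2) - 1) * cn ** 2
    nightly = np.log(o[1:] / c[:-1]) ** 2  # [ln(P^o_{t+1} / P^c_t)]^2
    s6 = a / f * nightly + (1 - a) / (1 - f) * s4[:-1]
    return s4, s5, s6
```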

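Volatility proxies (3)
GARCH(1,1) model - Hansen and Lunde [2005]
$$(\sigma^G_t)^2 = \omega + \sum_{j=1}^{p} \beta_j (\sigma^G_{t-j})^2 + \sum_{i=1}^{q} \alpha_i \varepsilon^2_{t-i}$$
where $\varepsilon_t = \sigma^G_t z_t$ with $z_t \sim \mathcal{N}(0, 1)$ i.i.d., and the coefficients $\omega, \alpha_i, \beta_j$ are fitted according to Bollerslev [1986]; the GARCH(1,1) proxy uses $p = q = 1$.
Sample standard deviation
$$\sigma^{SD,n}_t = \sqrt{\frac{1}{n-1}\sum_{i=0}^{n-1}\left(r_{t-i} - \bar r_n\right)^2}$$
where $r_t = \ln\frac{P^{(c)}_t}{P^{(c)}_{t-1}}$ and $\bar r_n = \frac{1}{n}\sum_{j=t-n+1}^{t} r_j$.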
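Both return-based proxies can be sketched in a few lines of NumPy: log returns, the rolling sample standard deviation over the $n$ most recent returns, and the GARCH(1,1) variance recursion. Under stated assumptions, the GARCH part only filters volatility for parameters the caller already has. It assumes zero-mean returns ($\varepsilon_t = r_t$) and leaves out the maximum-likelihood estimation of Bollerslev [1986]. All function names are hypothetical.

```python
import numpy as np


def log_returns(close):
    """r_t = ln(P^c_t / P^c_{t-1})."""
    close = np.asarray(close, dtype=float)
    return np.log(close[1:] / close[:-1])


def rolling_sd(r, n):
    """sigma^{SD,n}_t: sample std (ddof=1) of r_{t-n+1}, ..., r_t, for t = n-1, ..., len(r)-1."""
    r = np.asarray(r, dtype=float)
    return np.array([r[t - n + 1:t + 1].std(ddof=1) for t in range(n - 1, len(r))])


def garch11_volatility(r, omega, alpha, beta, s2_init=None):
    """Conditional volatility sigma^G_t of a GARCH(1,1) with GIVEN parameters.

    Assumes zero-mean returns, i.e. eps_t = r_t, so that
    s2[t] = omega + alpha * r[t-1]**2 + beta * s2[t-1].
    Parameter estimation is not shown.
    """
    eps = np.asarray(r, dtype=float)
    s2 = np.empty(len(eps))
    s2[0] = np.var(eps) if s2_init is None else s2_init
    for t in range(1, len(eps)):
        s2[t] = omega + alpha * eps[t - 1] ** 2 + beta * s2[t - 1]
    return np.sqrt(s2)
```

With $\alpha + \beta < 1$ the recursion reverts towards the unconditional variance $\omega / (1 - \alpha - \beta)$.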

Hyndman and Koehler [2006] - Error measures
- Scale-dependent: MSE, RMSE, MAE, MdAE
- Scale-independent: MAPE, MdAPE, RMSPE, RMdSPE, sMAPE, sMdAPE
- Relative errors: MRAE, MdRAE, GMRAE, MASE
- Relative measures: RelX, Percent Better

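Hyndman and Koehler [2006] - Scale-dependent measures
With forecast error $e_t = y_t - \hat y_t$:
$$\mathrm{MSE} = \frac{1}{n}\sum_{t=1}^{n}(y_t - \hat y_t)^2 \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}(y_t - \hat y_t)^2}$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}|y_t - \hat y_t| \qquad \mathrm{MdAE} = \operatorname{Md}_{t \in \{1,\dots,n\}}\left(|y_t - \hat y_t|\right)$$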
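The scale-dependent measures translate directly into NumPy reductions over the error vector. A minimal sketch, with a hypothetical function name:

```python
import numpy as np


def scale_dependent_errors(y, yhat):
    """MSE, RMSE, MAE and MdAE of the forecast errors e_t = y_t - yhat_t."""
    e = np.asarray(y, dtype=float) - np.asarray(yhat, dtype=float)
    return {
        "MSE": np.mean(e ** 2),
        "RMSE": np.sqrt(np.mean(e ** 2)),
        "MAE": np.mean(np.abs(e)),
        "MdAE": np.median(np.abs(e)),
    }
```

These values are in the units of the proxy (or its square), so they can only compare methods on the same series.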

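Hyndman and Koehler [2006] - Scale-independent measures
$$\mathrm{MAPE} = \frac{1}{n}\sum_{t=1}^{n}\left|\frac{100\,(y_t - \hat y_t)}{y_t}\right| \qquad \mathrm{MdAPE} = \operatorname{Md}_{t \in \{1,\dots,n\}}\left(\left|\frac{100\,(y_t - \hat y_t)}{y_t}\right|\right)$$
$$\mathrm{RMSPE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(\frac{100\,(y_t - \hat y_t)}{y_t}\right)^2} \qquad \mathrm{RMdSPE} = \sqrt{\operatorname{Md}_{t \in \{1,\dots,n\}}\left(\left(\frac{100\,(y_t - \hat y_t)}{y_t}\right)^2\right)}$$
$$\mathrm{sMAPE} = \frac{1}{n}\sum_{t=1}^{n}\frac{200\,|y_t - \hat y_t|}{y_t + \hat y_t} \qquad \mathrm{sMdAPE} = \operatorname{Md}_{t \in \{1,\dots,n\}}\left(\frac{200\,|y_t - \hat y_t|}{y_t + \hat y_t}\right)$$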
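A minimal NumPy sketch of the percentage-based measures follows (hypothetical function name). It assumes strictly positive actual values, which holds for volatility proxies; the percentage errors are undefined when $y_t = 0$.

```python
import numpy as np


def scale_independent_errors(y, yhat):
    """Percentage-based accuracy measures (requires y_t != 0)."""
    y = np.asarray(y, dtype=float)
    yhat = np.asarray(yhat, dtype=float)
    p = 100.0 * (y - yhat) / y  # percentage error
    s = 200.0 * np.abs(y - yhat) / (y + yhat)  # symmetric percentage error
    return {
        "MAPE": np.mean(np.abs(p)),
        "MdAPE": np.median(np.abs(p)),
        "RMSPE": np.sqrt(np.mean(p ** 2)),
        "RMdSPE": np.sqrt(np.median(p ** 2)),
        "sMAPE": np.mean(s),
        "sMdAPE": np.median(s),
    }
```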

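Hyndman and Koehler [2006] - Relative errors
With relative error $r_t = e_t / e^*_t$, where $e^*_t$ is the forecast error of the benchmark method:
$$\mathrm{MRAE} = \frac{1}{n}\sum_{t=1}^{n}|r_t| \qquad \mathrm{MdRAE} = \operatorname{Md}_{t \in \{1,\dots,n\}}(|r_t|) \qquad \mathrm{GMRAE} = \left(\prod_{t=1}^{n}|r_t|\right)^{1/n}$$
$$\mathrm{MASE} = \frac{1}{T}\sum_{t=1}^{T}\left|\frac{e_t}{\frac{1}{T-1}\sum_{i=2}^{T}|y_i - y_{i-1}|}\right|$$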
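A minimal NumPy sketch of the relative errors and of MASE follows (hypothetical function names). GMRAE is computed as the exponential of the mean log, which equals the geometric mean but fails if any $r_t = 0$. For MASE, the scaling series defaults to the evaluated series itself, as in the formula above; Hyndman and Koehler recommend computing the scaling term on the in-sample data, which `y_scale` allows.

```python
import numpy as np


def relative_errors(y, yhat, yhat_bench):
    """MRAE, MdRAE and GMRAE of r_t = e_t / e*_t against a benchmark forecast."""
    y = np.asarray(y, dtype=float)
    e = y - np.asarray(yhat, dtype=float)
    e_bench = y - np.asarray(yhat_bench, dtype=float)
    r = np.abs(e / e_bench)
    return {
        "MRAE": np.mean(r),
        "MdRAE": np.median(r),
        "GMRAE": np.exp(np.mean(np.log(r))),  # geometric mean of |r_t|
    }


def mase(y, yhat, y_scale=None):
    """Mean absolute error scaled by the mean absolute one-step naive difference."""
    y = np.asarray(y, dtype=float)
    yhat = np.asarray(yhat, dtype=float)
    scale_series = y if y_scale is None else np.asarray(y_scale, dtype=float)
    scale = np.mean(np.abs(np.diff(scale_series)))  # (1/(T-1)) sum |y_i - y_{i-1}|
    return np.mean(np.abs((y - yhat) / scale))
```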

Hyndman and Koehler [2006] - Relative measures
$$\mathrm{Rel}X = \frac{X}{X_b} \qquad \mathrm{PB}(X) = 100 \cdot \frac{1}{n}\sum_{\text{forecasts}} I(X < X_b)$$
where $X$ is the error measure of the analysed method, $X_b$ is the error measure of the benchmark, and $I(\cdot)$ is the indicator function.
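Both relative measures compare an error value of the analysed method with the same error value of the benchmark. In the minimal NumPy sketch below (hypothetical function names), Percent Better takes one error value per forecast (for example the absolute error of each forecast) and counts strict improvements only, so ties do not count as better.

```python
import numpy as np


def rel_x(x, x_bench):
    """RelX = X / X_b for an error measure X (e.g. MAE) and its benchmark value."""
    return float(x) / float(x_bench)


def percent_better(x, x_bench):
    """PB(X): percentage of forecasts where the method's error is below the benchmark's."""
    x = np.asarray(x, dtype=float)
    x_bench = np.asarray(x_bench, dtype=float)
    return 100.0 * np.mean(x < x_bench)
```

A RelX below 1, or a PB above 50, indicates that the method beats the benchmark on that measure.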