An Empirical Study of the Behaviour of the Sample Kurtosis in Samples from Symmetric Stable Distributions

J. Martin van Zyl

Department of Actuarial Science and Mathematical Statistics, University of the Free State, Bloemfontein, South Africa

e-mail: wwjvz@ufs.ac.za

Kurtosis is seen as a measure of the discrepancy between the observed data and a Gaussian distribution, and it is defined when the 4th moment is finite. In this work an empirical study is conducted to investigate the behaviour of the sample estimate of kurtosis with respect to sample size and tail index, when it is applied to heavy-tailed data for which the 4th moment does not exist. The study focuses on samples from the symmetric stable distributions. The main findings are as follows:

- The expected value of the excess kurtosis divided by the sample size is finite for any value of the tail index.
- The sample estimate of excess kurtosis increases as a linear function of the sample size.
- The sample excess kurtosis is approximately equal to $n(1-\alpha/2)$.

Keywords: kurtosis, stable distribution, tail index

Mathematics Subject Classification: 62F12; 62P05

1. Introduction

For heavy-tailed distributions, the theoretical kurtosis is defined and finite when the 4th moment is finite. In terms of the tail index $\alpha$, this means $\alpha > 4$. In practice, data are observed from an unknown distribution, and kurtosis is used to measure how leptokurtic the sample is. In financial data it is often observed that $\alpha < 2$, and the estimated kurtosis is used to get an indication of how leptokurtic the data are. Estimates of kurtosis in asset
returns range from 4 to 50 (Engle and Patton, 2001). Heavy-tailed distributions with $\alpha < 4$ are fitted to log-returns; see for example Xu, Wu and Xiao (2011). In this work an empirical study is conducted to check the behaviour and usefulness of the sample kurtosis for symmetric stable distributions with $\alpha \le 2$. The focus is specifically on $\alpha \ge 1$, the range mostly found when these distributions are applied to real data. The main result is that the expected value of the sample kurtosis increases as a linear function of the sample size. For symmetric samples from stable distributions, the sample excess kurtosis for a sample of size $n$ and tail index $\alpha$ is approximately $n(1-\alpha/2)$.

More than one method has been suggested to estimate kurtosis. This work uses the Pearson kurtosis, as discussed by Fiori and Zenga (2009), which is the version used in finance and risk analysis. Kurtosis is defined as

$$\beta_2(X) = \frac{E(X-\mu)^4}{\left(E(X-\mu)^2\right)^2} = \frac{\mu_4}{\sigma^4}, \qquad (1)$$

with $\mu_4$ the fourth central moment and $\sigma^2$ the variance of $X$. $\beta_2(X)$ is location-scale invariant, and all data are simulated with location parameter $\mu = 0$ and scale parameter $\sigma = 1$. For a regular distribution $\mu_4 > \sigma^4$, except when the distribution is concentrated at only two points (Kendall, Stuart and Ord, 1987, p. 107). The excess kurtosis is
$$\gamma_2 = \beta_2 - 3, \qquad (2)$$

which is also equal to $\gamma_2 = \kappa_4/\kappa_2^2$ when expressed in terms of cumulants. For the normal distribution the excess kurtosis is zero. The sample kurtosis is denoted by $b_2$ and the sample excess kurtosis by $g_2 = b_2 - 3$.

Algebraic inequalities that do not depend on distributional properties were derived for the sample kurtosis. It was shown that for a sample $x_1, \dots, x_n$ of size $n$ the sample kurtosis is less than the sample size (Johnson and Lowe, 1979; Cox, 2010):

$$b_2 = \frac{\frac{1}{n}\sum_{j=1}^{n}(x_j-\bar{x})^4}{\left(\frac{1}{n}\sum_{j=1}^{n}(x_j-\bar{x})^2\right)^2} = n\,\frac{\sum_{j=1}^{n}(x_j-\bar{x})^4}{\left(\sum_{j=1}^{n}(x_j-\bar{x})^2\right)^2} = n\,c(x_1,\dots,x_n) \le n. \qquad (3)$$

This inequality shows that $c(x_1,\dots,x_n) \le 1$. The expected value $E(c(x_1,\dots,x_n)) = E(b_2/n) \le 1$ is therefore finite for all distributions. It follows that divergence of the sample kurtosis is caused by the increase in the sample size. The sample kurtosis should thus be studied as a ratio, and not by considering the numerator and denominator separately as is done in the theoretical definition. Simulation studies were used to check whether $c(x_1,\dots,x_n)$ can be approximated as a function of
$\alpha$. It can also be seen, and was confirmed by simulation, that the variance of the sample kurtosis is of the form $n^2\,\mathrm{var}(c(x_1,\dots,x_n))$.

This work focuses on symmetric stable data. Properties and applications of stable distributions can be found, for example, in Čížek, Härdle and Weron, eds. (2011). The characteristic function of the family of stable distributions is denoted by $\varphi(t)$, where

$$\log\varphi(t) = -\sigma^{\alpha}|t|^{\alpha}\left\{1 - i\beta\,\mathrm{sign}(t)\tan(\pi\alpha/2)\right\} + i\mu t, \qquad \alpha \ne 1,$$

$$\log\varphi(t) = -\sigma|t|\left\{1 + i\beta\,\mathrm{sign}(t)(2/\pi)\log|t|\right\} + i\mu t, \qquad \alpha = 1.$$

The parameters are:

- the tail index $\alpha \in (0, 2]$;
- a scale parameter $\sigma > 0$;
- a skewness coefficient $\beta \in [-1, 1]$;
- a location parameter $\mu$.

This work considers the symmetric case $\beta = 0$.

For Figure 1, $m = 500$ random samples were simulated. The values of $\alpha$ were chosen randomly on the interval $[1, 2]$, and the $m = 500$ sample sizes were chosen randomly between $n = 200$ and $n = 1500$. The estimated excess kurtosis of each sample was then plotted. The focus of this study is applications in finance, and these sample sizes cover 1 to 5 years of daily data. To get an idea of the relationship involved, a multiple regression was performed. It showed that the relationship is approximately $g_2 \approx n - n\alpha/2$. There is little variation in the regression coefficients when the simulation is repeated, and the simulation studies below investigate this relationship further. Assuming that $g_2(n,\alpha) = n(1-\alpha/2)$, and noting that
$$\frac{\partial g_2(n,\alpha)}{\partial n} = 1 - \alpha/2 \quad \text{and} \quad \frac{\partial g_2(n,\alpha)}{\partial \alpha} = -n/2,$$

it can be seen that the sample kurtosis is very sensitive to changes in $\alpha$ and is a slowly increasing function of the sample size.

[Figure 1: excess kurtosis plotted against $\alpha$.]

Figure 1. A scatterplot of 500 sample estimates of the excess kurtosis for random $\alpha \in [1, 2]$ and sample sizes between 200 and 1500. Samples from a symmetric stable distribution.

The behaviour of the sample skewness was also checked using the simulated samples. The expected value of the sample skewness was found to be zero for symmetric data,
but the variance is an increasing linear function of the sample size and increases for smaller $\alpha$. Skewness is not the focus of this work. Note, however, that once this large variance is taken into account, a skewness estimate in a large sample might not be a significant indication of skewness.

Van Zwet (1964) and Groeneveld and Meeden (1984) derived a measure that orders symmetric distributions according to their heavy-tailedness. They proved that if a distribution is more heavy-tailed than another according to this measure, its kurtosis is also larger.

2. Simulation study

Suppose a sample of size $n$ is available, with $n = k_1 + \dots + k_r$. The sample kurtosis is calculated at increasing sample sizes $k_1, k_1 + k_2, k_1 + k_2 + k_3, \dots, n$, and for different values of $\alpha$. Figure 2 shows the rate of increase in the expected value of the estimated excess kurtosis against the number of observations used to calculate it. The slope is $b = 0.4917$ for $\alpha = 1$, $b = 0.2447$ for $\alpha = 1.5$, and approximately zero for $\alpha = 2$. The average at each sample size was calculated using $m = 5000$ samples. This relationship should be regarded as an approximation. A similar plot for data from a Student t-distribution with $\nu = 3, 4$ degrees of freedom shows that the relationship for the t-distribution is not linear.
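The cumulative procedure of this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's code:

- It assumes NumPy is available.
- It draws symmetric stable samples with the Chambers-Mallows-Stuck construction. The text does not say which generator was used.
- It uses $m = 500$ rather than 5000 replications so that it runs quickly.
- The helper names `symmetric_stable`, `excess_kurtosis` and `average_kurtosis_curve` are hypothetical.

The kurtosis is computed directly from the ratio $c(x_1,\dots,x_n)$ of equation (3), and the slope is the ordinary least-squares slope of the averaged curve against the sample size.

```python
import numpy as np


def symmetric_stable(alpha, size, rng):
    """Symmetric alpha-stable draws (beta = 0, sigma = 1, mu = 0).

    Chambers-Mallows-Stuck construction: the characteristic function is
    exp(-|t|**alpha), so alpha = 1 gives the standard Cauchy distribution
    and alpha = 2 gives a normal distribution with variance 2.
    """
    v = rng.uniform(-np.pi / 2, np.pi / 2, size)   # uniform angle
    w = rng.exponential(1.0, size)                 # unit exponential
    return (np.sin(alpha * v) / np.cos(v) ** (1.0 / alpha)
            * (np.cos((1.0 - alpha) * v) / w) ** ((1.0 - alpha) / alpha))


def excess_kurtosis(x):
    """Pearson sample excess kurtosis g2 = b2 - 3, where b2 = n * c(x)."""
    d = x - x.mean()
    c = np.sum(d ** 4) / np.sum(d ** 2) ** 2       # c(x_1, ..., x_n) <= 1
    return len(x) * c - 3.0


def average_kurtosis_curve(alpha, sizes, m, rng):
    """Mean of g2 over m samples, each evaluated on its first k points."""
    n_max = max(sizes)
    total = np.zeros(len(sizes))
    for _ in range(m):
        x = symmetric_stable(alpha, n_max, rng)
        total += [excess_kurtosis(x[:k]) for k in sizes]
    return total / m


rng = np.random.default_rng(2018)
sizes = np.arange(50, 501, 50)                     # k1 = 50, k1 + k2 = 100, ...
slopes = {}
for alpha in (1.0, 1.5, 2.0):
    curve = average_kurtosis_curve(alpha, sizes, m=500, rng=rng)
    slopes[alpha] = np.polyfit(sizes, curve, 1)[0]   # least-squares slope in n
    print(f"alpha = {alpha}: slope {slopes[alpha]:.4f}, "
          f"1 - alpha/2 = {1 - alpha / 2:.4f}")
```

With fewer replications the printed slopes vary with the seed. They should nonetheless fall near the values quoted above for $\alpha = 1$, 1.5 and 2.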
[Figure 2: mean estimated excess kurtosis plotted against sample size.]

Figure 2. Average sample excess kurtosis for simulated samples from a symmetric stable distribution. The averages were calculated using 5000 samples, at sample sizes $n = 50, 100, \dots, 500$. The solid line is for $\alpha = 1$, the dashed line for $\alpha = 1.5$ and the dash-dot line for $\alpha = 2$.

Figure 3 below shows the cumulatively calculated excess kurtosis for simulated data from a t-distribution with $\nu = 3, 4, 5$ degrees of freedom. For small degrees of freedom the relationship between kurtosis and sample size is not linear. The linear increase of kurtosis with sample size for samples from a stable distribution can thus be a useful property. It may not be unique to stable distributions. If it is observed in a practical problem, however, a stable distribution is a possible candidate to fit.
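A comparable sketch for the Student t case averages $g_2/k$ rather than $g_2$. Under a linear trend, $g_2/k$ would stay roughly constant in $k$; for stable data it stays near $1 - \alpha/2$. For the t-distributions it falls as $k$ grows, which is another way of seeing the non-linearity in Figure 3. This sketch assumes NumPy, and the helper `mean_kurtosis_per_point_t` is hypothetical. The sample sizes and the number of replications are illustrative choices, not those of the study.

```python
import numpy as np


def excess_kurtosis(x):
    """Pearson sample excess kurtosis g2 = b2 - 3."""
    d = x - x.mean()
    return len(x) * np.sum(d ** 4) / np.sum(d ** 2) ** 2 - 3.0


def mean_kurtosis_per_point_t(df, sizes, m, rng):
    """Average of g2 / k over m Student-t samples, evaluated on the first k
    observations of each sample for every k in sizes."""
    n_max = max(sizes)
    total = np.zeros(len(sizes))
    for _ in range(m):
        x = rng.standard_t(df, n_max)
        total += [excess_kurtosis(x[:k]) / k for k in sizes]
    return total / m


rng = np.random.default_rng(7)
sizes = [50, 100, 200, 400, 800, 1600]
ratios = {df: mean_kurtosis_per_point_t(df, sizes, m=300, rng=rng)
          for df in (3, 4, 5)}
for df, r in ratios.items():
    print(f"nu = {df}:", np.round(r, 4))   # g2 / k shrinks as k grows
```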
[Figure 3: mean estimated excess kurtosis plotted against sample size.]

Figure 3. Average sample excess kurtosis for simulated samples from a t-distribution. The averages were calculated using 5000 samples, at sample sizes $n = 50, 100, \dots, 500$. The solid line is for $\nu = 3$, the dashed line for $\nu = 4$ and the dash-dot line for $\nu = 5$ degrees of freedom.

For a given value of $\alpha$, denote by $b$ the slope of the increase in kurtosis with respect to sample size. If the excess kurtosis is assumed to be zero for $\alpha = 2$, regression through the origin can be used to relate the slope $b$ to changes in the tail index. As the sample size increases, the estimated regression coefficient of $b$ on $2 - \alpha$ comes closer to exactly $1/2$. This leads to the approximate relationship $b \approx 1 - \alpha/2$. Applying this and taking into account the linear relationship between sample size and sample kurtosis gives $g_2 \approx n(1-\alpha/2)$.
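Regression through the origin has the closed form $\hat\theta = \sum x_i y_i / \sum x_i^2$. The sketch below applies it with regressor $x = 2 - \alpha$, so that a coefficient of $1/2$ corresponds to $b = 1 - \alpha/2$. It uses the slopes $b = 0.4917$ ($\alpha = 1$) and $b = 0.2447$ ($\alpha = 1.5$) quoted in Section 2, and treats the $\alpha = 2$ slope as exactly zero. NumPy is assumed, and `slope_through_origin` is a hypothetical helper. The study itself used many more values of $\alpha$, so this only illustrates the calculation.

```python
import numpy as np


def slope_through_origin(x, y):
    """Least-squares coefficient theta in y = theta * x (no intercept)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.dot(x, y) / np.dot(x, x))


alphas = np.array([1.0, 1.5, 2.0])
b = np.array([0.4917, 0.2447, 0.0])     # slope at alpha = 2 assumed to be zero
theta = slope_through_origin(2.0 - alphas, b)
print(round(theta, 4))                  # prints 0.4912, close to 1/2
```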
Figure 4 below is based on the average of 5000 slopes calculated at each $\alpha = 1, 1.1, \dots, 2$ with a fixed sample size $n = 50$.

[Figure 4: estimated slope plotted against $2 - \alpha$.]

Figure 4. The relationship between $\alpha$ and the estimated slope of the increase of kurtosis with respect to sample size. Each point is the average of 5000 estimated values, with $n = 50$.

The expected value of the sample excess kurtosis increases as a linear function of the number of observations used to calculate it. The approximate expected excess kurtosis in large samples is thus

$$E(g_2) \approx n(1-\alpha/2). \qquad (5)$$
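Equation (5) and its sensitivity to $\alpha$ can be evaluated directly. The two functions below are a hypothetical illustration. In particular, `implied_tail_index` simply solves (5) for $\alpha$; it is not an estimator proposed in the text, and it inherits all the approximation error of (5). The example uses $n = 1250$, roughly five years of daily data. Lowering $\alpha$ from 1.75 to 1.5 doubles the expected excess kurtosis, which reflects the derivative $\partial g_2/\partial\alpha = -n/2$.

```python
def expected_excess_kurtosis(n, alpha):
    """Approximate E(g2) = n (1 - alpha/2) for a symmetric stable sample, eq. (5)."""
    return n * (1.0 - alpha / 2.0)


def implied_tail_index(g2, n):
    """Hypothetical inversion of eq. (5): alpha = 2 (1 - g2 / n)."""
    return 2.0 * (1.0 - g2 / n)


n = 1250                                   # about five years of daily returns
print(expected_excess_kurtosis(n, 1.75))   # 156.25
print(expected_excess_kurtosis(n, 1.5))    # 312.5
print(implied_tail_index(156.25, n))       # 1.75
```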
The simulations above used a fixed sample size. To confirm the results, random sample sizes are used next, and the consistency of the ordering of kurtosis with the tail index is also considered.

As an example, 5000 samples with random sample sizes between $n = 200$ and $n = 1500$ were generated from a stable distribution with index $\alpha = 1.75$. The estimated excess kurtosis of each sample was divided by its sample size. The mean of these ratios was 0.1236, compared to $1 - \alpha/2 = 0.1250$. Similarly, 5000 samples with random sample sizes were generated with $\alpha = 1.25$; the mean ratio was 0.3668, compared to $1 - \alpha/2 = 0.3750$. This confirms the approximate relationship $E(g_2) \approx n(1-\alpha/2)$.

The sample kurtosis of the two samples was then compared in pairs. In approximately 82% of the pairs, the larger excess kurtosis correctly belonged to the more heavy-tailed sample. When the sample kurtosis was divided by the sample size, this percentage increased by about 2%. A few more such examples were simulated, comparing normal samples ($\alpha = 2$) with samples with $\alpha < 2$. In these comparisons the percentage of correct ordering of the tail index by kurtosis was very high, often as high as 100%. It can be concluded that for symmetric stable distributions, kurtosis is an effective measure to compare the tail-heaviness of samples from two distributions with different parameters.

The variation of the sample estimate is a function of $\alpha$, as seen in Figure 5. The variance for $n$ points is proportional to $n^2$ for all values of $\alpha$ except $\alpha = 2$.
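The random-sample-size check and the pairwise comparison can be sketched as below. The sketch assumes:

- NumPy;
- the Chambers-Mallows-Stuck generator for the symmetric stable samples;
- sample sizes drawn uniformly on $[200, 1500]$;
- hypothetical helper names.

A pair counts as correctly ordered when the sample with the smaller $\alpha$ has the larger statistic. The function reports this rate for both $g_2$ and $g_2/n$. The seed and $m = 1000$ are arbitrary, so the printed rates need not reproduce the percentages quoted above.

```python
import numpy as np


def symmetric_stable(alpha, size, rng):
    """Symmetric alpha-stable draws with unit scale (Chambers-Mallows-Stuck)."""
    v = rng.uniform(-np.pi / 2, np.pi / 2, size)
    w = rng.exponential(1.0, size)
    return (np.sin(alpha * v) / np.cos(v) ** (1.0 / alpha)
            * (np.cos((1.0 - alpha) * v) / w) ** ((1.0 - alpha) / alpha))


def excess_kurtosis(x):
    """Pearson sample excess kurtosis g2 = b2 - 3."""
    d = x - x.mean()
    return len(x) * np.sum(d ** 4) / np.sum(d ** 2) ** 2 - 3.0


def compare_pairs(alpha_heavy, alpha_light, m, rng, n_min=200, n_max=1500):
    """Simulate m pairs with independent random sizes in [n_min, n_max].

    Returns the fraction of pairs in which the heavier-tailed sample has the
    larger g2, the same fraction for g2 / n, and the mean of g2 / n over the
    heavier-tailed samples.
    """
    raw_hits = scaled_hits = 0
    heavy_ratios = []
    for _ in range(m):
        n1, n2 = rng.integers(n_min, n_max, size=2, endpoint=True)
        g_heavy = excess_kurtosis(symmetric_stable(alpha_heavy, n1, rng))
        g_light = excess_kurtosis(symmetric_stable(alpha_light, n2, rng))
        raw_hits += int(g_heavy > g_light)
        scaled_hits += int(g_heavy / n1 > g_light / n2)
        heavy_ratios.append(g_heavy / n1)
    return raw_hits / m, scaled_hits / m, float(np.mean(heavy_ratios))


rng = np.random.default_rng(11)
raw, scaled, mean_ratio = compare_pairs(1.25, 1.75, m=1000, rng=rng)
print(f"correct ordering: g2 {raw:.3f}, g2/n {scaled:.3f}; "
      f"mean g2/n for alpha = 1.25: {mean_ratio:.4f}")
```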
[Figure 5: variance/$(n \cdot n)$ plotted against $\alpha$.]

Figure 5. Estimated variance of the sample kurtosis divided by $n^2$, for samples of size $n = 500$, as a function of $\alpha$. Based on 5000 simulated samples for each $\alpha$.

3. An application to log-returns

The change in kurtosis with respect to the number of observations used to calculate it was investigated for log-returns of the New York Stock Exchange (NYSE) index. Daily closing values over 5 years, from May 2013 to May 2018, were used. Log-returns are approximately symmetrically distributed with sample mean zero, and the stable distribution is considered as a possible distribution for them. Log-returns are also approximately independently distributed. The index is shown in Figure 6. It shows an initial period, a major correction, and the period after the correction.
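Computing a curve like the one in Figure 7 needs only two ingredients: the log-returns $r_t = \log(P_t/P_{t-1})$, and $g_2$ evaluated on the first $k$ returns for every $k$. The NYSE series is not included here. The sketch below (NumPy assumed, hypothetical helper names) therefore uses a synthetic price path with normal log-returns purely to exercise the code. Its curve should hover near zero, whereas heavy-tailed returns would produce a rising curve.

```python
import numpy as np


def log_returns(prices):
    """r_t = log(P_t / P_{t-1}) for a series of daily closing prices."""
    return np.diff(np.log(np.asarray(prices, dtype=float)))


def excess_kurtosis(x):
    """Pearson sample excess kurtosis g2 = b2 - 3."""
    d = x - x.mean()
    return len(x) * np.sum(d ** 4) / np.sum(d ** 2) ** 2 - 3.0


def kurtosis_path(returns, start=20):
    """g2 of the first k log-returns for k = start, start + 1, ..., len(returns)."""
    ks = np.arange(start, len(returns) + 1)
    return ks, np.array([excess_kurtosis(returns[:k]) for k in ks])


# Synthetic stand-in for the closing prices (NOT the NYSE data used in the text).
rng = np.random.default_rng(0)
prices = 10000.0 * np.exp(np.cumsum(0.008 * rng.standard_normal(1258)))
r = log_returns(prices)          # 1257 log-returns
ks, path = kurtosis_path(r)
print(len(r), round(path[-1], 3))
```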
[Figure 6: NY Stock Exchange Index, daily closing values.]

Figure 6. Index of the NY Stock Exchange, 5 years of daily data.

Figure 7 plots the sample excess kurtosis of the log-returns for increasing sample sizes. If a change in slope is taken as an indication of a change in the distribution, the distribution of the log-returns seems to change and then to stay the same for a period.
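One simple way to quantify a change in slope is to fit a least-squares line to the kurtosis curve on each segment between chosen breakpoints. This is in the spirit of the two windows (observations 100 to 400 and 800 to 1100) that the text examines next. The breakpoints are picked by eye, so this is a descriptive aid, not a formal change-point test. NumPy is assumed, and the function name and the synthetic path below are hypothetical.

```python
import numpy as np


def segment_slopes(ks, path, breakpoints):
    """Least-squares slope of the kurtosis path on each segment
    [breakpoints[i], breakpoints[i + 1]) of the observation count k."""
    ks = np.asarray(ks)
    path = np.asarray(path, dtype=float)
    slopes = []
    for lo, hi in zip(breakpoints[:-1], breakpoints[1:]):
        mask = (ks >= lo) & (ks < hi)
        slopes.append(np.polyfit(ks[mask], path[mask], 1)[0])
    return np.array(slopes)


# Synthetic path: slope 0.001 up to k = 600, then 0.004 (hypothetical values).
ks = np.arange(20, 1258)
path = np.where(ks < 600, 0.001 * ks, 0.6 + 0.004 * (ks - 600))
print(segment_slopes(ks, path, [20, 600, 1258]))
```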
[Figure 7: estimated excess kurtosis of the log-returns plotted against the number of observations.]

Figure 7. Excess kurtosis of the log-returns as a function of the number of points used to calculate the sample kurtosis.

The kurtosis is very sensitive to changes in the index, even though it is calculated from log-returns. The difference in the behaviour of the sample kurtosis before and after the correction is very clear. To test whether the change in sample kurtosis reflects a change in the tail index, the program Stableregkw was applied to two series of observations, 100 to 400 and 800 to 1100. The program was developed for Matlab by Borak, Misiorek and Weron for the book of Čížek, Härdle and Weron, eds. (2011), and is based on the Kogon-Williams estimation method (Kogon and Williams, 1998). The estimated tail index is $\hat{\alpha} = 1.8554$ for the first period and $\hat{\alpha} = 1.7165$ for the second period. When all 1257 log-returns are used, the estimated parameters are $\hat{\alpha} = 1.753$, $\hat{\beta} = 0.1184$ and $\hat{\sigma} = 0.0044$.
This is consistent with the change in kurtosis, and shows that the second period can be more volatile and heavy-tailed. This period covers the time during and after an election in the USA.

4. Conclusions

There is a relationship between kurtosis and the tail index for samples from the stable distributions. For a sample of size $n$, the sample kurtosis can be considered as $n$ times the ratio of two polynomials that are both of degree 4. The expected value of this ratio is finite, even if the expected value of the numerator or denominator does not exist. This property makes kurtosis useful for heavy-tailed data, provided the proportionality to $n$ is taken into account. Thus, for $0 < \alpha \le 2$,

$$\lim_{n\to\infty} E\left[\frac{\sum_{j=1}^{n}(x_j-\bar{x})^4}{\left(\sum_{j=1}^{n}(x_j-\bar{x})^2\right)^2}\right] \approx 1-\alpha/2.$$

This property can be used to compare the tail-heaviness of two samples from a stable distribution through their kurtosis. The linear increase of kurtosis as more points are used to calculate it can be used to include or exclude a stable distribution as a candidate to fit to, for example, log-returns. For GARCH models the 4th moment should be finite when they are fitted to log-returns. If the estimated kurtosis is plotted as a function of an increasing number of observations, a steady increase in sample kurtosis might indicate that the 4th moment is not finite.
With a variance estimated by bootstrap methods, the relationship $g_2/n \approx 1 - \alpha/2$ can be used in large samples to test hypotheses concerning $\alpha$, especially to test whether $\alpha < 2$.

References

Cox, N.J. (2010). Speaking Stata: The Limits of sample skewness and kurtosis. The Stata Journal, 10(3), 482-495.

Čížek, P., Härdle, W.K., Weron, R. (eds.) (2011). Statistical Tools for Finance and Insurance. Springer, Heidelberg.

Engle, R.F., Patton, A.J. (2001). What good is a volatility model? Quantitative Finance, 1, 237-245.

Fiori, A.M., Zenga, M. (2009). Karl Pearson and the Origin of Kurtosis. International Statistical Review, 77, 40-50.

Groeneveld, R.A., Meeden, G. (1984). Measuring Skewness and Kurtosis. Journal of the Royal Statistical Society, Series D, 33(4), 391-399.

Johnson, M.E., Lowe, V.W. (1979). Bounds on the Sample Skewness and Kurtosis. Technometrics, 21, 377-378.

Kendall, M., Stuart, A., Ord, J.K. (1987). Kendall's Advanced Theory of Statistics, Volume I. Charles Griffin and Company, London.

Kogon, S.M., Williams, D.B. (1998). Characteristic Function Based Estimation of Stable Distribution Parameters. In: R.J. Adler, R.E. Feldman, M. Taqqu (eds.), A Practical Guide to Heavy Tails: Statistical Techniques and Applications. Birkhäuser, Boston, 311-335.
Van Zwet, W.R. (1964). Convex Transformations of Random Variables. Math. Centrum, Amsterdam.

Xu, W., Wu, C., Dong, W. (2011). Modeling Chinese stock returns with stable distributions. Mathematical and Computer Modelling, 54, 610-617.