International Journal of Wireless Communication and Information Systems (IJWCIS) Vol 1 No 1 April, 2011

An Improved Version of Kurtosis Measure and Their Application in ICA

Md. Shamim Reza 1, Mohammed Nasser 2 and Md. Shahjaman 3
1 Pabna Science & Technology University, Department of Mathematics, Rajapur, Pabna-6600, Bangladesh.
2 Department of Statistics, Rajshahi University.
3 Department of Statistics, Begum Rokeya University.
mshamim_stat@yahoo.com, mnasser.ru@gmail.com, shahjaman_brur@yahoo.com

Abstract: Kurtosis plays an important role in defining the shape characteristics of a probability distribution, and also in extracting and sorting independent components. Recent research on various versions of the classical kurtosis measure shows that all of them substantially underestimate the kurtosis parameter and exhibit high variability when the underlying population distribution is highly skewed or heavy tailed. This is undesirable for independent component analysis (ICA). In this paper we propose a bootstrap bias-corrected kurtosis estimator and compare its performance with that of the two empirical bias-corrected kurtosis measures found best in recent work. Using both simulated and real data, we investigate the bias, standard error and MSE of each estimator under a variety of situations, and also use various plots to judge their performance. We observe that the proposed bootstrap bias-corrected kurtosis estimator performs better than the class of classical estimators in non-normal univariate situations. We then apply our measure to sort the independent components of both data sets and examine the capacity of PCA, ICA and ICA on PCA for finding groups. In both data sets ICA on PCA, a new visualization technique, shows the maximum discriminating power, whereas PCA shows the least. We recommend using the proposed measure for both extracting and sorting independent components.
Keywords: Kurtosis, Monte Carlo Simulation, Bootstrapping, PCA, ICA.

1. Introduction

It is typically noted in introductory statistics courses that distributions can be characterized in terms of central tendency, variability, and shape. With respect to shape, virtually every textbook defines and illustrates skewness. The other aspect of shape, kurtosis, is either not discussed or, worse yet, is often described or illustrated incorrectly (DeCarlo, 1997; Joanes and Gill, 1998). Kurtosis is also useful for ordering independent components (ICs) (Scholz et al., 2004; Scholz and Selbig, 2007). In principal component analysis, the PCs are ordered by their corresponding eigenvalues, but in independent component analysis the components have no natural order. For practical reasons it is necessary to define a criterion for sorting these components according to our interest, and one measure that matches this interest very well is kurtosis. In recent work, Lihua An and Ahmed (2008) proposed two unbiased sample measures of kurtosis and compared them with three sample measures of kurtosis adopted by various software packages (MINITAB, SAS, etc.) for data from normal and non-normal populations. Their second proposed estimator is the best performer in normal situations, but in non-normal situations all estimators show unwanted large fluctuations. For this reason they put forward two new empirical bias-corrected kurtosis estimators. To correct the bias, however, their empirical formulas are provided only for the student-t and chi-squared distributions, and empirical estimates are subject to extra variation, which results in inflated MSE. In this article we propose a bootstrap bias-corrected kurtosis estimator. It is worth mentioning that ICA is meaningful only in non-normal situations: from purely Gaussian (normally) distributed data, no unique independent components can be extracted (Hyvarinen and Oja, 2000). In Section 2 we define the classical kurtosis estimators considered in our study.
In Section 3 we propose a bootstrap bias-corrected kurtosis estimator. In Section 4 we compare the two empirical bias-corrected kurtosis estimators with our proposed bootstrap bias-corrected estimator and find the overall best performer. We then apply our estimator to sort independent components for data clustering. The final section concludes.

2. Kurtosis

Pearson (1905) introduced kurtosis as a measure of how flat the top of a symmetric distribution is compared to a normal distribution of the same variance. Kurtosis is formally defined as the standardized fourth population moment about the mean,

β₂ = E[(X − μ)⁴] / (E[(X − μ)²])² = μ₄ / σ⁴,

where E is the expectation operator, μ is the mean, μ₄ is the fourth moment about the mean, and σ is the standard deviation. The normal distribution has a kurtosis of 3, and the excess kurtosis γ₂ = β₂ − 3 is often used so that the reference normal distribution has kurtosis zero. A sample counterpart to β₂ is obtained by replacing the population moments with sample moments, which gives

b₂ = [(1/n) Σᵢ (Xᵢ − X̄)⁴] / [(1/n) Σᵢ (Xᵢ − X̄)²]²   (1)
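As a concrete illustration of the sample kurtosis b₂ in (1), the following is a minimal NumPy sketch (the function name `sample_kurtosis` is ours) that computes the ratio of the fourth to the squared second central moment:

```python
import numpy as np

def sample_kurtosis(x):
    """Moment estimator b2 = m4 / m2^2 (Pearson kurtosis; normal -> 3)."""
    x = np.asarray(x, dtype=float)
    m2 = np.mean((x - x.mean()) ** 2)   # second central sample moment
    m4 = np.mean((x - x.mean()) ** 4)   # fourth central sample moment
    return m4 / m2 ** 2

rng = np.random.default_rng(0)
print(sample_kurtosis(rng.normal(size=100_000)))  # close to 3 for normal data
```

Subtracting 3 from this quantity gives the sample excess kurtosis used throughout the rest of the paper.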
where b₂ is the sample kurtosis, X̄ is the sample mean, and n is the number of observations.

2.1 Some Classical Measures of the Kurtosis Estimator

Let X₁, X₂, …, Xₙ be a random sample of size n. A commonly used consistent estimator of the excess kurtosis γ₂ is

g₂ = [(1/n) Σᵢ (xᵢ − x̄)⁴] / [(1/n) Σᵢ (xᵢ − x̄)²]² − 3.   (2)

This estimator is not unbiased. Cramer (1946) gave the following bias result for normal distributions:

Bias(g₂) = −6/(n + 1).   (3)

Another frequently used estimator of γ₂, adopted by SAS, is

U = (n − 1)/[(n − 2)(n − 3)] · [(n + 1) g₂ + 6].   (4)

It has been proved that U is unbiased for normal distributions; we refer to Fisher (1929), Joanes and Gill (1998) and others.

The kurtosis measure adopted by MINITAB is

M = (g₂ + 3) ((n − 1)/n)² − 3.   (5)

Joanes and Gill (1998) showed that for normal distributions

Bias(M) = −3(4n² − 3n + 1) / [n²(n + 1)].   (6)

Recently, Lihua An and Ahmed (2008) proposed two new kurtosis estimators; correcting the biases given in (3) and (6) yields

N₁ = g₂ + 6/(n + 1)   (7)

and

N₂ = M + 3(4n² − 3n + 1) / [n²(n + 1)].   (8)

Consequently, for normal data, N₁ and N₂ are both unbiased estimators of γ₂.

All five estimators are biased for non-normal populations, and the bias is inflated in a range of the parameter space; for a detailed description we refer to Lihua An and Ahmed (2008). It therefore seems appealing to construct a bias-corrected estimator. For the non-normal student-t and chi-square situations, Lihua An and Ahmed (2008) suggest employing a bias-reduction technique based on N₂, the best performing estimator, and propose a new bias-corrected estimator obtained by fitting a quadratic function of N₂ with empirically determined coefficients. A simulation experiment conducted to inspect the bias and MSE of the estimators shows that this effectively reduces the bias to a negligible level; however, the quadratic form introduces an extremely large variance, resulting in inflated MSE.
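One consistent reading of the (partially garbled) estimator formulas above is the following NumPy sketch, written in excess-kurtosis convention (normal → 0). The function name and the dictionary packaging are ours:

```python
import numpy as np

def kurtosis_estimators(x):
    """Excess-kurtosis estimators discussed in the text (normal -> 0).
    N1 and N2 are the An-Ahmed (2008) normal-bias corrections of g2 and M."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    m2 = np.mean((x - x.mean()) ** 2)
    m4 = np.mean((x - x.mean()) ** 4)
    g2 = m4 / m2**2 - 3                                       # eq. (2)
    U = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)    # eq. (4), SAS
    M = (g2 + 3) * ((n - 1) / n) ** 2 - 3                     # eq. (5), MINITAB
    N1 = g2 + 6 / (n + 1)                                     # eq. (7)
    N2 = M + 3 * (4 * n**2 - 3 * n + 1) / (n**2 * (n + 1))    # eq. (8)
    return {"g2": g2, "U": U, "M": M, "N1": N1, "N2": N2}
```

Averaging each estimator over repeated samples from a known distribution (as done in the simulations below) gives Monte Carlo estimates of their bias and MSE.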
They also propose, for small degrees of freedom only, a simpler correction based on a linear regression model without an independent variable (an intercept-only fit) applied to N₂. The variance of this fitted estimator is greater than that of the original biased estimator, but it is not inflated too much.

2.2 Limitations of the Empirical Bias-Corrected Estimators

a) The main problem of the empirical bias correction is that the formulas are provided only for the student-t and chi-squared distributions; no other distributions are considered.
b) The performance of the bias-corrected estimator depends on a table as well as on a specified sample size.
c) The empirical bias-corrected estimators effectively reduce the bias, but they introduce an extremely large variance.

3. Proposed Bootstrap Bias-Corrected Estimator

All the estimators substantially underestimate the kurtosis parameter when the underlying population distribution is highly skewed or heavy tailed. To correct the bias, empirical formulas are provided only for the student-t and chi-squared distributions, and empirical estimates are subject to extra variation, which results in inflated MSE. Re-sampling methods such as the bootstrap and the jackknife may instead be considered to reduce the bias while keeping a relatively low variance. We therefore use a popular re-sampling method, bootstrapping, to overcome the limitations of the empirical bias correction. For correcting the bias using the bootstrap we use the second estimator of Lihua An and Ahmed, N₂, because N₂ performs well for normal as well as non-normal populations in many situations. Finally, our proposed bootstrap bias-corrected estimator is

Ñ = N₂ − Bias_boot,

where, for a statistic t(x),

Bias_boot = E_F̂ₙ[t(x*)] − t(x),
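The proposed bootstrap correction can be sketched as follows. This is the standard bootstrap bias correction, θ̃ = θ̂ − (mean of bootstrap replicates − θ̂) = 2θ̂ − mean of bootstrap replicates, applied to N₂; the function names and the choice B of bootstrap replicates are ours:

```python
import numpy as np

def excess_kurtosis_N2(x):
    """MINITAB-based estimator M with the normal-bias correction (N2 above)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    m2 = np.mean((x - x.mean()) ** 2)
    m4 = np.mean((x - x.mean()) ** 4)
    g2 = m4 / m2**2 - 3
    M = (g2 + 3) * ((n - 1) / n) ** 2 - 3
    return M + 3 * (4 * n**2 - 3 * n + 1) / (n**2 * (n + 1))

def bootstrap_bias_corrected(x, B=5000, rng=None):
    """Bootstrap bias correction: theta_corrected = 2*theta_hat - mean(theta*)."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    theta_hat = excess_kurtosis_N2(x)
    # Resample with replacement B times and recompute the statistic each time.
    boot = np.array([excess_kurtosis_N2(rng.choice(x, size=len(x), replace=True))
                     for _ in range(B)])
    bias_boot = boot.mean() - theta_hat     # estimate of E[t(x*)] - t(x)
    return theta_hat - bias_boot            # = 2*theta_hat - boot.mean()
```

Unlike the empirical correction, nothing in this sketch depends on the form of the underlying distribution, which is the key practical advantage claimed for the proposal.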
and the bootstrap estimate replaces the expectation under F̂ₙ by the average over B bootstrap replicates,

Bias_boot ≈ (1/B) Σ_{b=1}^{B} t(x*_b) − t(x).

3.1 Our Bootstrapping Method and Used Estimator

In our study we use B = 5000 bootstrap samples for calculating the bias and MSE for sample sizes n = 20, 30 and 50, each replicated 1000 times. The respective samples are taken from the student-t and chi-squared distributions with two values of the degrees of freedom, the larger being 5. The bootstrapped results are then aggregated, and the bootstrap bias-corrected MSE is compared with the empirical bias-corrected MSE. The excess kurtoses of the student-t and chi-squared distributions with n degrees of freedom are

Kur(tₙ) = 6/(n − 4) for n > 4,   Kur(χ²ₙ) = 12/n.

The bootstrap aggregated bias and MSE are calculated as

Bias = (1/5000) Σ_{i=1}^{5000} (Ñᵢ − Kur),   MSE = Var(Ñ) + Bias².

Figure 1. Bootstrap algorithm for calculating the bias-corrected estimator.

The proposed bootstrap bias-corrected estimator is more advantageous than the empirical estimators because it works for any distribution and any sample size, whereas the empirical bias-corrected estimators work only for the student-t and chi-squared distributions at specified sample sizes.

4. Results

To simulate skewed and heavy-tailed data, 5000 samples of sizes 20, 30 and 50 are randomly drawn from the χ² and student-t distributions. We then compare our proposed bootstrap bias-corrected estimator with the two empirical bias-corrected estimators, computing the bias and mean square error of the bias-corrected estimators for different non-normal populations. The results are presented in the following tables and plots.

Table 1. log(MSE) comparison for the chi-square distribution

Sample size   d.f.   Bootstrap   Empirical-1   Empirical-2
20                   .61         .7            .7
20            5      .19         .6            .
30                   .5          .15           .5
30            5      .1          .09           .1
50                   .           .91           .5
50            5      .0          .8            .10
Figure 2. MSE comparison for the chi-square distribution.

Table 1 and Fig. 2 show that the proposed bootstrap bias-corrected measure gives the minimum MSE values for the χ² (skewed) distribution at sizes 20, 30 and 50 for both degrees of freedom. Based on the MSE criterion, our proposed estimator shows a larger discrepancy from the first empirical correction but a relatively small difference from the second empirical bias-corrected estimator. Table 2 shows that the proposed bootstrap bias-corrected measure also gives lower MSE values than the first empirical bias-corrected estimator for the student-t (heavy-tailed) distribution at sizes 20, 30 and 50. We also find that the second empirical estimator performs better than our
estimator for the smaller degrees of freedom, but the results favor our estimator when the d.f. increases to 5.

Table 2. log(MSE) comparison for the t-distribution

Sample size   d.f.   Bootstrap   Empirical-1   Empirical-2
20                   .0          9.8           .9
20            5      .18         8.6           .9
30                   .           10.1          .9
30            5      .1          8.16          .
50                   .1          10.88         .80
50            5      .11         8.10          .18

Since no unique independent components can be extracted from purely Gaussian distributed data (Hyvarinen and Oja, 2000), ICA should only be applied to data sets in which we can find components with non-Gaussian distributions. Examples of super-Gaussian distributions (highly positive kurtosis) are speech signals, because these are predominantly close to zero. For molecular data, however, sub-Gaussian distributions (negative kurtosis) are more interesting: negative kurtosis can indicate a cluster structure, or at least a uniformly distributed factor. Thus the components with the most negative kurtosis can give us the most relevant information.

Experiment 1 (Simulation Study)

We first generate four known distributions (normal, chi-square, t and uniform) of size 100 with different means, mix these four distributions, and determine which visualization technique best identifies the distribution pattern from the mixture.

5. Application in ICA

Independent component analysis (ICA) is a statistical method used to discover hidden factors (sources or features) from a set of measurements or observed data such that the sources are maximally independent. ICA algorithms are able to separate the sources according to the distribution of the data. ICA (Hyvarinen et al., 2001) and projection pursuit (PP) (Jones and Sibson, 1987) are closely related techniques that look for interesting directions (projections) in the data. ICA assumes the model x = AS, where x is a vector of observed random variables, A is a d × d mixing matrix, and S is a vector of independent latent variables. The task is then to find A in order to recover S.
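The reason kurtosis works as an ICA criterion can be shown numerically: mixing independent sources pushes each observed signal toward Gaussianity, so every mixture's excess kurtosis lies between the sources' extremes. The following sketch is illustrative only; the sources, the mixing matrix and the function name are our choices:

```python
import numpy as np

def excess_kurtosis(x):
    """Sample excess kurtosis (Gaussian -> 0)."""
    m2 = np.mean((x - x.mean()) ** 2)
    return np.mean((x - x.mean()) ** 4) / m2**2 - 3

rng = np.random.default_rng(0)
n = 100_000
# Two independent sources: sub-Gaussian (uniform, excess kurtosis -1.2)
# and super-Gaussian (Laplace, excess kurtosis +3).
S = np.vstack([rng.uniform(-1, 1, n), rng.laplace(0, 1, n)])
A = np.array([[1.0, 1.0],            # arbitrary (hypothetical) mixing matrix
              [1.0, -1.0]])
X = A @ S                            # observed mixtures, x = A S

# Each mixture's excess kurtosis falls strictly between the sources'
# extremes -- mixtures are "more Gaussian" than the sources, which is
# why ICA can recover sources by maximizing |kurtosis|.
for s in S:
    print("source  kurtosis:", round(excess_kurtosis(s), 2))
for x in X:
    print("mixture kurtosis:", round(excess_kurtosis(x), 2))
```

Sorting the extracted components by kurtosis, as done below, simply ranks them by this same non-Gaussianity measure.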
A key assumption is usually that the components of S have different kurtoses Kⱼ, so that the different independent components can be separated. In practice, ICA usually measures the interestingness of a linear combination aᵀx by the size of its absolute kurtosis or some related measure. Since the excess kurtosis of a Gaussian random variable is zero, this criterion measures, to some extent, non-Gaussianity.

Figure 3. Original pattern of the simulated data: (a) normal, (b) chi-square, (c) t, (d) uniform distribution.

5.1 Role of Kurtosis in ICA

In principal component analysis, the PCs are ordered by eigenvalue: the largest eigenvalue gives the first PC, the second largest the second PC, and so on. In independent component analysis the components have no such order, so for practical reasons a criterion is needed for sorting the components according to our interest, and one measure that matches this interest very well is kurtosis. Kurtosis is a classical measure of non-Gaussianity, and is computationally and theoretically relatively simple. It indicates whether the data are peaked or flat relative to a Gaussian (normal) distribution: a Gaussian distribution has an excess kurtosis of zero, positive kurtosis indicates a peaked (super-Gaussian) distribution, and negative kurtosis indicates a flat (sub-Gaussian) distribution. We now mix the four distributions (the original sources) and apply PCA, ICA and ICA on PCA to the mixed data to investigate which technique gives the better identification. From purely Gaussian distributed data, no unique independent components can be extracted (Hyvarinen and Oja, 2000).
Experiment 2 (Breast Cancer Data)

The breast cancer data contain 10 variables and 107 observations. Applying PCA, we find that the first five PCs explain 8 percent of the variability of the data set. We then apply PCA and ICA to the original data, apply ICA to the five PCs, and use our estimator to sort the ICs.

Table 3. IC ordering using kurtosis

Figure 4. Mixed sources of the four distributions.

From Table 3, the largest negative kurtosis value, −1.1, identifies the first IC, the second largest the second IC, and so on, since negative kurtosis can indicate a cluster structure or at least a uniformly distributed factor; the components with the most negative kurtosis thus give us the most relevant information.

Figure 5. Performance of the different visualization techniques for the chi-square distribution.

Fig. 5 shows that PCA and ICA could not detect the required distribution properly, whereas ICA on PCA, a new development of the visualization technique in our experiment, gives the maximum identification of the chi-square distribution, which is the required result of the experiment.

Figure 7. On the left, applying PCA to the full data gives a worse result than ICA. By using PCA for preprocessing before applying ICA, however, a more strongly discriminating component can be extracted, as shown on the right.

Figure 6. Performance of the different visualization techniques for the t and uniform distributions.

Fig. 6 shows the identification performance for the t and uniform distributions: in both cases PCA fails to identify them properly, ICA performs better than PCA, but ICA on PCA discriminates the two distributions best.

6.
Conclusion

In this paper we describe five sample measures of kurtosis and compare the performance of three bias-corrected kurtosis estimators (Empirical-1, Empirical-2 and the proposed bootstrap estimator). Their performance is investigated through simulation and bootstrapping, considering the χ² and student-t distributions with three different sample sizes (20, 30 and 50). The estimators are compared with regard to bias and MSE; the bootstrap bias-corrected
estimator performs better than the two empirical bias-corrected estimators, especially for non-normal populations with small degrees of freedom. We recommend it as a measure of kurtosis for non-normal populations, for small as well as large degrees of freedom. We then apply our measure to sort independent components in the simulated and breast cancer data, and examine the capacity of PCA, ICA and ICA on PCA for finding groups. In both data sets ICA on PCA, a new visualization technique, shows the maximum discriminating power, whereas PCA shows the least.

References

[1] Cramer, H., Mathematical Methods of Statistics, Princeton University Press, Princeton, p. 86, 1946.
[2] DeCarlo, L.T., On the meaning and use of kurtosis. Psychological Methods 2(3), 292-307, 1997.
[3] Fisher, R.A., Moments and product moments of sampling distributions. Proc. London Math. Soc., Ser. 2, 30, 199-238, 1929.
[4] Hyvarinen, A. and Oja, E., Independent component analysis: algorithms and applications. Neural Networks 13(4-5), 411-430, 2000.
[5] Hyvarinen, A., Karhunen, J. and Oja, E., Independent Component Analysis, John Wiley and Sons, New York, 2001.
[6] Jones, M. and Sibson, R., What is projection pursuit? J. of the Royal Statistical Society, Ser. A, 150, 1-36, 1987.
[7] Joanes, D.N. and Gill, C.A., Comparing measures of sample skewness and kurtosis. The Statistician 47, 183-189, 1998.
[8] Lihua An and S. Ejaz Ahmed, Improving the performance of kurtosis estimator. Computational Statistics and Data Analysis 52, 2669-2681, 2008.
[9] Matthias Scholz, Yves Gibon, Mark Stitt and Joachim Selbig, Independent component analysis of starch deficient pgm mutants. Proceedings of the German Conference on Bioinformatics, Gesellschaft fur Informatik, Bonn, pp. 95-104, 2004.
[10] Scholz, M., Gatzek, S., Sterling, A., Fiehn, O. and Selbig, J., Metabolite fingerprinting: detecting biological features by independent component analysis.
Bioinformatics 20(15), 2447-2454, 2004.
[11] Shamim, M. and Nasser, M., An improved version of kurtosis estimator and their application in ICA. International Conference on Computer and Information Technology, program book, page 7, 2010.
[12] Scholz, M. and Selbig, J., Visualization and analysis of molecular data. Methods Mol Biol 358, 87-104, 2007.