USE OF PROC IML TO CALCULATE L-MOMENTS FOR THE UNIVARIATE DISTRIBUTIONAL SHAPE PARAMETERS SKEWNESS AND KURTOSIS
Michael A. Walega
Covance, Princeton, New Jersey

Introduction

Exploratory data analysis statistics, such as those generated by the SAS procedure PROC UNIVARIATE (1990), are useful tools for characterizing the underlying distribution of data prior to more rigorous statistical analyses. The distributional shape of data is usually assessed by careful examination of the values of the third and fourth central moments, skewness and kurtosis. However, when the sample size is small or the underlying distribution is non-normal, the information obtained from the sample skewness and kurtosis can be misleading. One alternative to the central moment shape statistics is the use of linear combinations of order statistics (L-moments) to examine the distributional shape characteristics of data. L-moments have several theoretical advantages over the central moment shape statistics: characterization of a wider range of distributions, robustness to outliers and more accurate estimates in small samples. This paper focuses on the development of a macro program that uses SAS/IML (1989) to generate the central moment and L-moment distributional shape parameters. In addition, the results of simulations, conducted with various sample sizes and distributions, are presented.

Background

Largely through the influence of John Tukey's work (1977), statisticians have increasingly emphasized the exploratory analysis of data prior to more formal statistical analyses (t-tests, ANOVA, etc.). Tukey has suggested that, to fully understand the nature of a variable and its measurement, characteristics other than the central tendency (mean) and variability (standard deviation) need to be examined. Many classical statistical tests rely on the assumption that the underlying distribution of the data (or residuals) is Gaussian.
Bickel (1988) and Van Der Laan and Verdooren (1987) discuss the concept of robustness and how it pertains to the assumption of normality. As discussed by Glass et al. (1972), incorrect conclusions may be reached when the normality assumption is not valid, especially when one-tail tests are employed or when the sample size or significance level is very small. Hopkins and Weeks (1990) also discuss the effects of highly nonnormal data on hypothesis testing of variances. Thus, the examination of the skewness (departure from symmetry) and kurtosis (deviation from a normal curve) is an important component of exploratory data analyses.

Various methods to estimate skewness and kurtosis have been proposed (MacGillivray and Balanda, 1988). For many years, the conventional coefficients of skewness and kurtosis, ϒ and κ (Hosking, 1990), have been used to describe the shape characteristics of distributions. However, as pointed out by Hosking (1990) and Royston (1992), these coefficients are not without limitations. Both are sensitive to minute changes in the tails of a distribution, susceptible to moderate outliers and biased in small to moderately sized samples from skew distributions. Also, the information conveyed by the third and fourth central moments with regard to the shape of a distribution can be difficult to assess. Thus, it is appropriate to determine whether other, more robust measures of skewness and kurtosis can be used to assess the shape of a distribution.

L-moments

One family of more robust measures consists of linear combinations of order statistics, or L-moments. In theory, L-moments are less prone to the effects of sampling variability than conventional moments. Hosking (1990) provides an excellent overview of the theory behind the derivation and application of L-moments as summary statistics for univariate probability distributions.
Royston (1992) compares the properties of the conventional shape parameters to their L-moment counterparts for two lognormal distributions. Rather than discuss the detailed theory behind L-moments, the reader is referred to the two aforementioned papers. Instead, a brief overview of the development of the equations necessary to apply L-moments is given below. As with the paper by Royston, the notation of Hosking (1990) will be employed.

For a sample $X_1, \ldots, X_n$ of size $n$ drawn from the distribution of a random variable $X$ with mean $\mu$ and variance $\sigma^2$, let $X_{1:n} \le \cdots \le X_{n:n}$ be the order statistics. The L-moments of $X$ are defined by

$$\lambda_r = r^{-1} \sum_{k=0}^{r-1} (-1)^k \binom{r-1}{k} E X_{r-k:r}, \qquad r = 1, 2, \ldots$$

where $\lambda_r$ is the $r$th L-moment of the distribution and $E X_{i:r}$ is the expected value of the $i$th smallest observation in a sample of size $r$.

The first four central moments of a random variable $X$ can be written as

$$\mu = E(X), \quad \sigma^2 = E(X - \mu)^2, \quad ϒ = E(X - \mu)^3 / \sigma^3 \quad \text{and} \quad \kappa = E(X - \mu)^4 / \sigma^4.$$

In a similar fashion, the first four L-moments of a random variable $X$ can be written as

$$\lambda_1 = E(X),$$
$$\lambda_2 = \tfrac{1}{2} E(X_{2:2} - X_{1:2}),$$
$$\lambda_3 = \tfrac{1}{3} E(X_{3:3} - 2X_{2:3} + X_{1:3}) \quad \text{and}$$
$$\lambda_4 = \tfrac{1}{4} E(X_{4:4} - 3X_{3:4} + 3X_{2:4} - X_{1:4}).$$

It can be seen that $\lambda_1$ is equivalent to the usual measure of central tendency, $\mu$. $\lambda_2$ is similar to $\sigma^2$ in that both measure the difference between two randomly selected values of $X$; however, by its nature $\sigma^2$ assigns more weight to extreme sample values than does $\lambda_2$. $\lambda_3$ is a scale-dependent measure of skewness for a sample of size 3, and $\lambda_4$ is proportional to a weighted difference between the outer extremes and the central portion of samples of size 4 (Royston, 1992).

Scale-free versions of the L-moments for skewness, $\tau_3$, and kurtosis, $\tau_4$, can be written as $\tau_3 = \lambda_3 / \lambda_2$ and $\tau_4 = \lambda_4 / \lambda_2$. An alternative measure of skewness, $\tau'_3$, is defined as $(1 + \tau_3) / (1 - \tau_3)$. This measure is the ratio of the expected length of the upper tail to that of the lower tail in samples of size 3, and as such may be easier to interpret than $\tau_3$. $\lambda_2$, $\tau_3$, $\tau'_3$ and $\tau_4$ are subject to the constraints

$$\lambda_2 > 0, \quad -1 < \tau_3 < 1, \quad 0 < \tau'_3 < \infty \quad \text{and} \quad \tfrac{1}{4}(5\tau_3^2 - 1) \le \tau_4 < 1.$$
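As a hedged illustration of these definitions (a worked example that is not part of the original paper), consider an exponential variable with zero location and unit scale. For this distribution the expected order statistics have the standard closed form $E X_{i:r} = \sum_{j=r-i+1}^{r} 1/j$, so the first four L-moments can be evaluated directly:

$$
\begin{aligned}
\lambda_1 &= E X_{1:1} = 1, \\
\lambda_2 &= \tfrac{1}{2}\left(\tfrac{3}{2} - \tfrac{1}{2}\right) = \tfrac{1}{2}, \\
\lambda_3 &= \tfrac{1}{3}\left(\tfrac{11}{6} - 2 \cdot \tfrac{5}{6} + \tfrac{1}{3}\right) = \tfrac{1}{6}, \\
\lambda_4 &= \tfrac{1}{4}\left(\tfrac{25}{12} - 3 \cdot \tfrac{13}{12} + 3 \cdot \tfrac{7}{12} - \tfrac{1}{4}\right) = \tfrac{1}{12}.
\end{aligned}
$$

The scale-free measures are therefore $\lambda_3 / \lambda_2 = 1/3$ for L-skewness and $\lambda_4 / \lambda_2 = 1/6$ for L-kurtosis, and the alternative skewness measure is $(1 + 1/3)/(1 - 1/3) = 2$: the upper tail is expected to be twice as long as the lower tail in samples of size 3. These values satisfy the constraints, since $\tfrac{1}{4}(5/9 - 1) = -1/9 \le 1/6 < 1$. Adding a location shift changes only $\lambda_1$ and leaves the shape measures unchanged.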
If a random sample of size $n$ is drawn from the distribution of the random variable $X$ and $x_{1:n} \le \cdots \le x_{n:n}$ are the ordered sample values, then estimates of the L-moments $\lambda_1$, $\lambda_2$, $\lambda_3$ and $\lambda_4$, namely $l_1$, $l_2$, $l_3$ and $l_4$, can be calculated as follows. First, define $w_2$, $w_3$ and $w_4$ as

$$w_2 = \frac{1}{n(n-1)} \sum_{i=2}^{n} (i-1)\, x_{i:n},$$
$$w_3 = \frac{1}{n(n-1)(n-2)} \sum_{i=3}^{n} (i-1)(i-2)\, x_{i:n} \quad \text{and}$$
$$w_4 = \frac{1}{n(n-1)(n-2)(n-3)} \sum_{i=4}^{n} (i-1)(i-2)(i-3)\, x_{i:n}.$$

Then the L-moments and the corresponding shape statistics can be estimated as

$$l_1 = \sum_{i=1}^{n} x_i / n,$$
$$l_2 = 2w_2 - l_1,$$
$$l_3 = 6w_3 - 6w_2 + l_1,$$
$$l_4 = 20w_4 - 30w_3 + 12w_2 - l_1,$$
$$t_3 = l_3 / l_2 \quad \text{and} \quad t_4 = l_4 / l_2.$$

$t_3$ and $t_4$ are the sample L-skewness and L-kurtosis, respectively. The sample estimate of the alternative measure of skewness, $t'_3$, is defined as $(1 + t_3) / (1 - t_3)$.

The Program

The macro program L_MOMENTS was originally written using SAS v6.08 under the VMS operating environment. It has been modified to run under V6.09 and V6.12 on HP-UNIX. With slight modification (detailed below), the program should run on any operating system. The user is required to provide the name of the SAS data set to be used in the analyses (macro variable INDS) and the name(s) of the variables to be analyzed, separated by spaces (macro variable VARS). There is no limit on the number of variables that can be analyzed. Options available to the user include:

- Specify the location of the SAS data library (macro variable LIB). Default is the current user location.
- BY group processing (macro variable BYVAR). There is no limit on the number of BY variables, which are delimited by a space. Default is no BY group processing.
- Generate stem-and-leaf, box and normal probability plots (macro variable PLOTS). Default is no plots.
- Generate a hardcopy of the usual PROC UNIVARIATE output, with the central moment and L-moment shape statistics appended (macro variable PRINT). Default is to have the output provided.
- Create an output data set (temporary or permanent) that contains the central moment and L-moment shape statistics (macro variable OUT). Default is that no output data set is created.

A brief description of the flow of the program follows. A driver macro is used to initialize variables, search for the analysis data set, call a macro that writes the options selected by the user to the LMOMENTS.LOG file, and call a macro that performs the calculations. If no analysis data set is found, the program reports this error to the .log file and terminates. Otherwise, the calculation macro begins.
If BY variable processing is requested, the data are sorted before submission to PROC UNIVARIATE for analysis. PROC PRINTTO is used to capture the usual output and send it to the file UNI.DAT. An output data set from PROC UNIVARIATE is used to store the number of non-missing observations for each analysis.

If the user chooses to generate a hardcopy of the results, a DATA step is used to process the UNI.DAT file. The functions PUT and SUBSTR are used in conjunction with the $HEX16. format to search for page breaks and set a flag that will be used to fire a PUT _PAGE_ in a DATA _NULL_ step at the end of the program. Next, a flag is set if BY variable box plots are created. For each page of output that does not contain BY variable box plots, a counter is incremented. The counter is used to facilitate direct read access of the shape statistics data set created by PROC IML for use in the DATA _NULL_ step that generates the hardcopy output. Next, the part of the output line that displays the values for skewness and kurtosis is removed. Finally, a flag is set that indicates the last line of the tabular portion of the PROC UNIVARIATE output.

For each analysis variable, the raw data are sorted, then merged and transposed. The output data set from PROC UNIVARIATE that contains the number of non-missing observations is also transposed. PROC IML is then used to calculate the central moment and L-moment measures of skewness and kurtosis. The sample skewness and kurtosis are calculated using the same method as PROC UNIVARIATE. Then, provided there are at least four non-missing observations, the values of $w_1$, $w_2$, $w_3$ and $w_4$ are calculated for each combination of BY variable and analysis variable and appended to an interim matrix. If this condition is not met, the L-moment parameters cannot be calculated and a flag is set.
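The L-moment part of the calculation step described above can be sketched in a few lines. The following is a minimal Python illustration, not the author's PROC IML code: the function name, the dictionary keys and the treatment of a sample whose values are all equal are assumptions made for this example, and the usual skewness and kurtosis are omitted. It applies the estimator formulas for w2, w3 and w4 given earlier and returns `None` in place of the flag that the macro sets when fewer than four non-missing observations are available.

```python
import math


def l_moment_shape_stats(values):
    """Sample L-moments and L-moment shape statistics for one variable.

    Hypothetical helper: returns None when fewer than four non-missing
    values are available, or when all values are equal (an added guard,
    because the ratios are then undefined).
    """
    x = sorted(v for v in values if v is not None)  # drop missing, then order
    n = len(x)
    if n < 4 or x[0] == x[-1]:
        return None

    # w2, w3, w4: weighted sums of the ordered values, with x[i - 1] = x_{i:n}.
    w2 = sum((i - 1) * x[i - 1] for i in range(2, n + 1)) / (n * (n - 1))
    w3 = sum((i - 1) * (i - 2) * x[i - 1]
             for i in range(3, n + 1)) / (n * (n - 1) * (n - 2))
    w4 = sum((i - 1) * (i - 2) * (i - 3) * x[i - 1]
             for i in range(4, n + 1)) / (n * (n - 1) * (n - 2) * (n - 3))

    l1 = sum(x) / n
    l2 = 2 * w2 - l1
    l3 = 6 * w3 - 6 * w2 + l1
    l4 = 20 * w4 - 30 * w3 + 12 * w2 - l1

    t3 = l3 / l2  # sample L-skewness
    t4 = l4 / l2  # sample L-kurtosis
    # Alternative skewness measure; a sample t3 of exactly 1 is possible
    # (for example 0, 0, 0, 1), in which case the ratio is infinite.
    t3_alt = (1 + t3) / (1 - t3) if t3 < 1 else math.inf
    return {"l1": l1, "l2": l2, "l3": l3, "l4": l4,
            "t3": t3, "t4": t4, "t3_alt": t3_alt}
```

For an evenly spaced sample such as 1, 2, 3, 4, 5, the function returns an L-skewness and L-kurtosis of zero (up to rounding) and an alternative skewness measure of one, as expected for a symmetric, uniformly spread set of values.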
Finally, the L-moment parameters are calculated, concatenated with the BY variables (if present), the central moment parameters and the conditional flag described above, and placed into a SAS data set. The names of the analysis variables are placed in a separate SAS data set.

Once the calculations have been completed, user-defined options direct the results to hardcopy output and/or a temporary or permanent SAS data set. If hardcopy output is requested, a DATA _NULL_ step writes the modified PROC UNIVARIATE output and, using the direct access counter previously described, places the shape statistics immediately below the last line of the PROC UNIVARIATE tabular output. If the sample size flag generated by PROC IML has been fired, a ** is printed for the L-moment output, with an appropriate footnote. If the user has requested that a temporary (OUT=T) or permanent (OUT=P) data set be created, the two resultant data sets from PROC IML are merged and the data set is created as appropriate.

Simulations

Simulations were conducted to explore the applicability of the L-moment shape statistics to varying sample sizes and distributional shapes. For each of the following distributions, 5000 data sets were generated for sample sizes 5, 10, 20, 40, 60, 125 and 250:

- Logistic: y = a + k*log(x/(1-x)), where a = 0 and k = 1;
- Gumbel: y = a - b*log(-log(x)), where a = 1 and b = 1;
- Normal(0, 1);
- Exponential: y = a - b*log(1-x), where a = 1 and b = 1;
- Lognormal: y = exp(a*x), where a = 0.5;
- Lognormal: y = exp(a*x), where a = 1.

In the lognormal equations, x is a random normal(0, 1) variate. The logistic, Gumbel and exponential equations are inverse distribution function transformations, which are defined only for 0 < x < 1 and therefore require x to be a random uniform(0, 1) variate.

The table below lists the theoretical values of the shape statistics for the above distributions. Values for the central moments are taken from Hastings and Peacock (1975); values for the L-moments are taken from Hosking (1990), except for $\tau_4$ for the lognormal distributions, which is taken from Royston (1992).

Distribution      κ      τ3      τ4
Logistic
Gumbel
Normal
Exponential
Lognormal(0.5)
Lognormal(1.0)

[The numeric entries of this table did not survive transcription.]

The method of Royston (1992) was used to quantify the results of the simulations. For the logistic and normal distributions, the mean absolute values for τ3 and κ were determined. Otherwise, the simulation mean of the 5000 values of each shape parameter was standardized by dividing it by the nominal (theoretical) value. The results are presented in Appendix 1, which compares the usual shape statistics to the L-moment statistics for the simulated sample sizes. The results are presented as the nominal values for τ3 and κ for the logistic and normal distributions, or as percent of nominal value for the other distributions.
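The percent-of-nominal summary described above can be sketched for one of the simulated distributions. The following Python illustration is a sketch under stated assumptions rather than the original simulation code: the function names, the seed and the default number of replicates are hypothetical, x is assumed to be a uniform(0, 1) variate (which the inverse distribution function form requires), and the nominal values 1/3 and 1/6 are the theoretical L-skewness and L-kurtosis of the exponential distribution.

```python
import math
import random


def l_skew_kurt(sample):
    """Return the sample L-skewness t3 and L-kurtosis t4 (needs n >= 4)."""
    x = sorted(sample)
    n = len(x)
    w2 = w3 = w4 = 0.0
    for i, xi in enumerate(x, start=1):
        # Terms with i < 2, i < 3 and i < 4 contribute zero automatically.
        w2 += (i - 1) * xi
        w3 += (i - 1) * (i - 2) * xi
        w4 += (i - 1) * (i - 2) * (i - 3) * xi
    w2 /= n * (n - 1)
    w3 /= n * (n - 1) * (n - 2)
    w4 /= n * (n - 1) * (n - 2) * (n - 3)
    l1 = sum(x) / n
    l2 = 2 * w2 - l1
    l3 = 6 * w3 - 6 * w2 + l1
    l4 = 20 * w4 - 30 * w3 + 12 * w2 - l1
    return l3 / l2, l4 / l2


def exponential_percent_of_nominal(n, reps=5000, seed=12345):
    """Mean t3 and t4 over `reps` exponential samples, as % of nominal."""
    rng = random.Random(seed)       # fixed seed so that runs are repeatable
    tau3, tau4 = 1.0 / 3.0, 1.0 / 6.0
    sum_t3 = sum_t4 = 0.0
    for _ in range(reps):
        # y = a - b*log(1 - x) with a = 1, b = 1 and x uniform on [0, 1).
        sample = [1.0 - math.log(1.0 - rng.random()) for _ in range(n)]
        t3, t4 = l_skew_kurt(sample)
        sum_t3 += t3
        sum_t4 += t4
    return 100.0 * sum_t3 / reps / tau3, 100.0 * sum_t4 / reps / tau4


if __name__ == "__main__":
    for size in (5, 10, 20, 40):
        pct_t3, pct_t4 = exponential_percent_of_nominal(size, reps=1000)
        print(size, round(pct_t3), round(pct_t4))
```

Only the L-moment statistics are simulated here. A comparison with the usual skewness and kurtosis would add the PROC UNIVARIATE estimators to the same loop and standardize them by their own theoretical values.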
Upon review of the results it appears that, for the simulations conducted and independent of sample size or distributional shape, the L-moment shape statistics are in general less biased than the central moment shape statistics. As such, the L-moment shape statistics may be more useful indicators of the type of departure of a sample from normality (Royston, 1992).

Example

The usefulness of L-moment shape statistics becomes apparent when they are applied to the analysis of pharmacokinetic parameters. It has been suggested that many pharmacokinetic parameters follow a lognormal distribution. To examine this, data from Metzler and Huang (1983) are used to calculate the central moment and L-moment shape statistics for the untransformed and log-transformed area under the plasma concentration-time curve (AUC) data. Figures 1 and 2 present an example of the output produced using the macro call

    %LMOMENTS(INDS=TEST,PLOTS=Y,VARS= AUC LOGAUC);

The shape statistics for the untransformed data suggest that the underlying distribution is positively skewed, with some evidence of kurtosis. Log-transformation of the data results in a closer approximation to normality. Note the disparity between the central moment, κ, and L-moment, $\tau_4$, measures of kurtosis. This can probably be attributed to the poor small-sample performance of κ compared to $\tau_4$, and to the bias of κ in nonnormal distributions.

Discussion

The L-moment shape indices $t_3$, $t'_3$ and $t_4$ have several advantages over the usual shape statistics ϒ and κ. Accurate characterization of several nonnormal distributions, reasonably low bias in small samples, ease of interpretation and robustness to outliers make the L-moment shape statistics useful measures of the shape of a distribution. As shown in the example, the L-moment shape statistics could be useful indicators of when transformation of data is required.
A macro program was developed to include the calculation of the L-moment shape statistics with the central moment shape statistics in a hardcopy of PROC UNIVARIATE output, an output data set, or both.
SAS and SAS/IML are registered trademarks of the SAS Institute, Inc., Cary, NC. HP-UNIX is a registered trademark of the Hewlett-Packard Corporation, Boise, Idaho.

References

Bickel, P. Robust estimation. In S. Kotz and N. Johnson (eds.), Encyclopedia of Statistical Sciences, Volume 8. John Wiley and Sons, New York, NY (1988).

Glass, G.V., Peckham, P.D. and J.R. Sanders. Consequences of failure to meet assumptions underlying the fixed effects analysis of variance and covariance. Rev. Educ. Res. 42 (1972).

Hastings, N.A.J. and J.B. Peacock. Statistical Distributions. John Wiley and Sons, New York, NY (1975).

Hopkins, K.D. and D.L. Weeks. Tests for normality and measures of skewness and kurtosis: Their place in research reporting. Educ. Psychol. Meas. 50 (1990).

Hosking, J.R.M. L-moments: Analysis and estimation of distributions using linear combinations of order statistics. J. Royal Stat. Soc. B 52 (1990).

MacGillivray, H.L. and K.P. Balanda. The relationships between skewness and kurtosis. Austral. J. Stat. 30 (1988).

Metzler, C.M. and D.C. Huang. Statistical methods for bioavailability and bioequivalence. Clin. Res. Pract. Drug Reg. Affairs 1 (1983).

Royston, P. Which measures of skewness and kurtosis are best? Stat. Med. 11 (1992).

SAS Institute, Inc. SAS Language: Reference, Version 6, First Edition. Cary, NC: SAS Institute, Inc.

SAS Institute, Inc. SAS Procedures Guide, Version 6, Third Edition. Cary, NC: SAS Institute, Inc. (1990).

SAS Institute, Inc. SAS/IML Software: Usage and Reference, Version 6, First Edition. Cary, NC: SAS Institute, Inc. (1989).

Tukey, J.W. Exploratory Data Analysis. Addison-Wesley, Reading, MA (1977).

Van Der Laan, P. and L.R. Verdooren. Classical analysis of variance methods and nonparametric counterparts. Biom. J. 6 (1987).

The author can be reached at:
Covance
210 Carnegie Center
Princeton, NJ
Phone: (609)
Michael.Walega@covance.com
APPENDIX 1 - Simulation Results

Each row lists the surviving values in their original order. Entries lost in transcription, including the column headings that identify the individual shape statistics, are shown as "...".

Sample Size   Logistic, then Gumbel
5             ...    95%    38%    79%     4%   101%
10            ...   102%    55%    87%    12%   105%
20            ...   100%    72%   100%    20%   107%
40            ...   101%    80%   100%    28%   108%
60            ...   102%    85%   101%    34%   108%
125           ...   102%   100%   103%    40%   108%
250           ...   102%   104%   103%    42%   108%

Sample Size   Normal, then Exponential
5             ...    99%    27%    58%     4%    99%
10            ...   102%    43%    63%    12%   104%
20            ...   101%    57%    65%    21%   108%
40            ...   100%    64%    67%    32%   112%
60            ...   100%    69%    71%    38%   114%
125           ...   100%    74%    73%    46%   115%
250           ...   100%    82%    74%    51%   115%

Sample Size   Lognormal (0.5)                 Lognormal (1.0)
5             33%    78%     3%    82%        17%    75%     1%    67%
10            44%    82%     7%    87%        21%    81%     4%    77%
20            61%    89%    18%    91%        30%    87%     8%    83%
40            68%    86%    28%    96%        37%    92%    10%    90%
60            75%    98%    33%    98%        42%    95%    13%    94%
125           ...    99%    40%    99%        51%    97%    18%    96%
250           ...   100%    44%   100%        61%    98%    20%    98%
FIGURE 1. PROC UNIVARIATE output for Variable=AUC (N = 20, Mean = 7.08, Mode = 2.33; five highest observations 9.28, 10.73, 10.8, 14.02 and 16.26), consisting of the Moments, Quantiles (Def=5) and Extremes tables, the appended Skewness and Kurtosis statistics by the usual method and by L-moments (T3, T3', T4), and the stem-and-leaf, box and normal probability plots. [The remaining numeric values and the plots did not survive transcription.]

NOTE: T3' = (1+T3)/(1-T3). The L-Moment statistics are subject to the following constraints: -1 < T3 < 1, 0 < T3' < Infinity, 1/4 * (5 * (T3**2) - 1) <= T4 <= 1. ** indicates that L-Moment statistics could not be computed. REF: P. Royston, Stat. Med. 11 (1992).
FIGURE 2. PROC UNIVARIATE output for Variable=LOGAUC (N = 20), consisting of the Moments, Quantiles (Def=5) and Extremes tables, the appended Skewness and Kurtosis statistics by the usual method and by L-moments (T3, T3', T4), and the stem-and-leaf, box and normal probability plots. [The remaining numeric values and the plots did not survive transcription.]

NOTE: T3' = (1+T3)/(1-T3). The L-Moment statistics are subject to the following constraints: -1 < T3 < 1, 0 < T3' < Infinity, 1/4 * (5 * (T3**2) - 1) <= T4 <= 1. ** indicates that L-Moment statistics could not be computed. REF: P. Royston, Stat. Med. 11 (1992).
` Subject CS1 Actuarial Statistics 1 Core Principles Syllabus for the 2019 exams 1 June 2018 Copyright in this Core Reading is the property of the Institute and Faculty of Actuaries who are the sole distributors.
More informationRandom Variables and Probability Distributions
Chapter 3 Random Variables and Probability Distributions Chapter Three Random Variables and Probability Distributions 3. Introduction An event is defined as the possible outcome of an experiment. In engineering
More informationGENERATION OF STANDARD NORMAL RANDOM NUMBERS. Naveen Kumar Boiroju and M. Krishna Reddy
GENERATION OF STANDARD NORMAL RANDOM NUMBERS Naveen Kumar Boiroju and M. Krishna Reddy Department of Statistics, Osmania University, Hyderabad- 500 007, INDIA Email: nanibyrozu@gmail.com, reddymk54@gmail.com
More informationKARACHI UNIVERSITY BUSINESS SCHOOL UNIVERSITY OF KARACHI BS (BBA) VI
88 P a g e B S ( B B A ) S y l l a b u s KARACHI UNIVERSITY BUSINESS SCHOOL UNIVERSITY OF KARACHI BS (BBA) VI Course Title : STATISTICS Course Number : BA(BS) 532 Credit Hours : 03 Course 1. Statistical
More information1. Distinguish three missing data mechanisms:
1 DATA SCREENING I. Preliminary inspection of the raw data make sure that there are no obvious coding errors (e.g., all values for the observed variables are in the admissible range) and that all variables
More informationDescriptive Analysis
Descriptive Analysis HERTANTO WAHYU SUBAGIO Univariate Analysis Univariate analysis involves the examination across cases of one variable at a time. There are three major characteristics of a single variable
More informationLecture Data Science
Web Science & Technologies University of Koblenz Landau, Germany Lecture Data Science Statistics Foundations JProf. Dr. Claudia Wagner Learning Goals How to describe sample data? What is mode/median/mean?
More informationStatistics & Flood Frequency Chapter 3. Dr. Philip B. Bedient
Statistics & Flood Frequency Chapter 3 Dr. Philip B. Bedient Predicting FLOODS Flood Frequency Analysis n Statistical Methods to evaluate probability exceeding a particular outcome - P (X >20,000 cfs)
More informationSuperiority by a Margin Tests for the Ratio of Two Proportions
Chapter 06 Superiority by a Margin Tests for the Ratio of Two Proportions Introduction This module computes power and sample size for hypothesis tests for superiority of the ratio of two independent proportions.
More informationModelling catastrophic risk in international equity markets: An extreme value approach. JOHN COTTER University College Dublin
Modelling catastrophic risk in international equity markets: An extreme value approach JOHN COTTER University College Dublin Abstract: This letter uses the Block Maxima Extreme Value approach to quantify
More informationFinancial Time Series and Their Characteristics
Financial Time Series and Their Characteristics Egon Zakrajšek Division of Monetary Affairs Federal Reserve Board Summer School in Financial Mathematics Faculty of Mathematics & Physics University of Ljubljana
More information2 Exploring Univariate Data
2 Exploring Univariate Data A good picture is worth more than a thousand words! Having the data collected we examine them to get a feel for they main messages and any surprising features, before attempting
More informationFinancial Risk Management
Financial Risk Management Professor: Thierry Roncalli Evry University Assistant: Enareta Kurtbegu Evry University Tutorial exercices #4 1 Correlation and copulas 1. The bivariate Gaussian copula is given
More informationDescriptive Statistics Bios 662
Descriptive Statistics Bios 662 Michael G. Hudgens, Ph.D. mhudgens@bios.unc.edu http://www.bios.unc.edu/ mhudgens 2008-08-19 08:51 BIOS 662 1 Descriptive Statistics Descriptive Statistics Types of variables
More informationFive Things You Should Know About Quantile Regression
Five Things You Should Know About Quantile Regression Robert N. Rodriguez and Yonggang Yao SAS Institute #analyticsx Copyright 2016, SAS Institute Inc. All rights reserved. Quantile regression brings the
More informationStandardized Data Percentiles, Quartiles and Box Plots Grouped Data Skewness and Kurtosis
Descriptive Statistics (Part 2) 4 Chapter Percentiles, Quartiles and Box Plots Grouped Data Skewness and Kurtosis McGraw-Hill/Irwin Copyright 2009 by The McGraw-Hill Companies, Inc. Chebyshev s Theorem
More informationTi 83/84. Descriptive Statistics for a List of Numbers
Ti 83/84 Descriptive Statistics for a List of Numbers Quiz scores in a (fictitious) class were 10.5, 13.5, 8, 12, 11.3, 9, 9.5, 5, 15, 2.5, 10.5, 7, 11.5, 10, and 10.5. It s hard to get much of a sense
More informationDescriptive Statistics
Petra Petrovics Descriptive Statistics 2 nd seminar DESCRIPTIVE STATISTICS Definition: Descriptive statistics is concerned only with collecting and describing data Methods: - statistical tables and graphs
More informationdiscussion Papers Some Flexible Parametric Models for Partially Adaptive Estimators of Econometric Models
discussion Papers Discussion Paper 2007-13 March 26, 2007 Some Flexible Parametric Models for Partially Adaptive Estimators of Econometric Models Christian B. Hansen Graduate School of Business at the
More informationTime Invariant and Time Varying Inefficiency: Airlines Panel Data
Time Invariant and Time Varying Inefficiency: Airlines Panel Data These data are from the pre-deregulation days of the U.S. domestic airline industry. The data are an extension of Caves, Christensen, and
More informationData screening, transformations: MRC05
Dale Berger Data screening, transformations: MRC05 This is a demonstration of data screening and transformations for a regression analysis. Our interest is in predicting current salary from education level
More informationPARAMETRIC AND NON-PARAMETRIC BOOTSTRAP: A SIMULATION STUDY FOR A LINEAR REGRESSION WITH RESIDUALS FROM A MIXTURE OF LAPLACE DISTRIBUTIONS
PARAMETRIC AND NON-PARAMETRIC BOOTSTRAP: A SIMULATION STUDY FOR A LINEAR REGRESSION WITH RESIDUALS FROM A MIXTURE OF LAPLACE DISTRIBUTIONS Melfi Alrasheedi School of Business, King Faisal University, Saudi
More informationMath 227 Elementary Statistics. Bluman 5 th edition
Math 227 Elementary Statistics Bluman 5 th edition CHAPTER 6 The Normal Distribution 2 Objectives Identify distributions as symmetrical or skewed. Identify the properties of the normal distribution. Find
More informationHow To: Perform a Process Capability Analysis Using STATGRAPHICS Centurion
How To: Perform a Process Capability Analysis Using STATGRAPHICS Centurion by Dr. Neil W. Polhemus July 17, 2005 Introduction For individuals concerned with the quality of the goods and services that they
More informationThe Two-Sample Independent Sample t Test
Department of Psychology and Human Development Vanderbilt University 1 Introduction 2 3 The General Formula The Equal-n Formula 4 5 6 Independence Normality Homogeneity of Variances 7 Non-Normality Unequal
More informationMarket Risk Analysis Volume I
Market Risk Analysis Volume I Quantitative Methods in Finance Carol Alexander John Wiley & Sons, Ltd List of Figures List of Tables List of Examples Foreword Preface to Volume I xiii xvi xvii xix xxiii
More informationAs time goes by... On the performance of significance tests in reaction time experiments. Wolfgang Wiedermann & Bartosz Gula
On the performance of significance tests in reaction time experiments Wolfgang Bartosz wolfgang.wiedermann@uni-klu.ac.at bartosz.gula@uni-klu.ac.at Department of Psychology University of Klagenfurt, Austria
More informationFrequency Distribution and Summary Statistics
Frequency Distribution and Summary Statistics Dongmei Li Department of Public Health Sciences Office of Public Health Studies University of Hawai i at Mānoa Outline 1. Stemplot 2. Frequency table 3. Summary
More information**BEGINNING OF EXAMINATION** A random sample of five observations from a population is:
**BEGINNING OF EXAMINATION** 1. You are given: (i) A random sample of five observations from a population is: 0.2 0.7 0.9 1.1 1.3 (ii) You use the Kolmogorov-Smirnov test for testing the null hypothesis,
More informationA New Multivariate Kurtosis and Its Asymptotic Distribution
A ew Multivariate Kurtosis and Its Asymptotic Distribution Chiaki Miyagawa 1 and Takashi Seo 1 Department of Mathematical Information Science, Graduate School of Science, Tokyo University of Science, Tokyo,
More informationMuch of what appears here comes from ideas presented in the book:
Chapter 11 Robust statistical methods Much of what appears here comes from ideas presented in the book: Huber, Peter J. (1981), Robust statistics, John Wiley & Sons (New York; Chichester). There are many
More informationIntroduction to Computational Finance and Financial Econometrics Descriptive Statistics
You can t see this text! Introduction to Computational Finance and Financial Econometrics Descriptive Statistics Eric Zivot Summer 2015 Eric Zivot (Copyright 2015) Descriptive Statistics 1 / 28 Outline
More informationModel Construction & Forecast Based Portfolio Allocation:
QBUS6830 Financial Time Series and Forecasting Model Construction & Forecast Based Portfolio Allocation: Is Quantitative Method Worth It? Members: Bowei Li (303083) Wenjian Xu (308077237) Xiaoyun Lu (3295347)
More informationSimulation of Moment, Cumulant, Kurtosis and the Characteristics Function of Dagum Distribution
264 Simulation of Moment, Cumulant, Kurtosis and the Characteristics Function of Dagum Distribution Dian Kurniasari 1*,Yucky Anggun Anggrainy 1, Warsono 1, Warsito 2 and Mustofa Usman 1 1 Department of
More informationTHE USE OF THE LOGNORMAL DISTRIBUTION IN ANALYZING INCOMES
International Days of tatistics and Economics Prague eptember -3 011 THE UE OF THE LOGNORMAL DITRIBUTION IN ANALYZING INCOME Jakub Nedvěd Abstract Object of this paper is to examine the possibility of
More informationChapter 8. Sampling and Estimation. 8.1 Random samples
Chapter 8 Sampling and Estimation We discuss in this chapter two topics that are critical to most statistical analyses. The first is random sampling, which is a method for obtaining observations from a
More informationEdgeworth Binomial Trees
Mark Rubinstein Paul Stephens Professor of Applied Investment Analysis University of California, Berkeley a version published in the Journal of Derivatives (Spring 1998) Abstract This paper develops a
More information1 Volatility Definition and Estimation
1 Volatility Definition and Estimation 1.1 WHAT IS VOLATILITY? It is useful to start with an explanation of what volatility is, at least for the purpose of clarifying the scope of this book. Volatility
More informationOn the Distribution and Its Properties of the Sum of a Normal and a Doubly Truncated Normal
The Korean Communications in Statistics Vol. 13 No. 2, 2006, pp. 255-266 On the Distribution and Its Properties of the Sum of a Normal and a Doubly Truncated Normal Hea-Jung Kim 1) Abstract This paper
More informationEquivalence Tests for the Ratio of Two Means in a Higher- Order Cross-Over Design
Chapter 545 Equivalence Tests for the Ratio of Two Means in a Higher- Order Cross-Over Design Introduction This procedure calculates power and sample size of statistical tests of equivalence of two means
More informationSummary of Statistical Analysis Tools EDAD 5630
Summary of Statistical Analysis Tools EDAD 5630 Test Name Program Used Purpose Steps Main Uses/Applications in Schools Principal Component Analysis SPSS Measure Underlying Constructs Reliability SPSS Measure
More informationDescriptive Statistics (Devore Chapter One)
Descriptive Statistics (Devore Chapter One) 1016-345-01 Probability and Statistics for Engineers Winter 2010-2011 Contents 0 Perspective 1 1 Pictorial and Tabular Descriptions of Data 2 1.1 Stem-and-Leaf
More informationOn the Distribution of Kurtosis Test for Multivariate Normality
On the Distribution of Kurtosis Test for Multivariate Normality Takashi Seo and Mayumi Ariga Department of Mathematical Information Science Tokyo University of Science 1-3, Kagurazaka, Shinjuku-ku, Tokyo,
More informationNew SAS Procedures for Analysis of Sample Survey Data
New SAS Procedures for Analysis of Sample Survey Data Anthony An and Donna Watts, SAS Institute Inc, Cary, NC Abstract Researchers use sample surveys to obtain information on a wide variety of issues Many
More informationApplications of Good s Generalized Diversity Index. A. J. Baczkowski Department of Statistics, University of Leeds Leeds LS2 9JT, UK
Applications of Good s Generalized Diversity Index A. J. Baczkowski Department of Statistics, University of Leeds Leeds LS2 9JT, UK Internal Report STAT 98/11 September 1998 Applications of Good s Generalized
More informationA Skewed Truncated Cauchy Logistic. Distribution and its Moments
International Mathematical Forum, Vol. 11, 2016, no. 20, 975-988 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/imf.2016.6791 A Skewed Truncated Cauchy Logistic Distribution and its Moments Zahra
More informationEmpirical Analysis of the US Swap Curve Gough, O., Juneja, J.A., Nowman, K.B. and Van Dellen, S.
WestminsterResearch http://www.westminster.ac.uk/westminsterresearch Empirical Analysis of the US Swap Curve Gough, O., Juneja, J.A., Nowman, K.B. and Van Dellen, S. This is a copy of the final version
More informationMEASURING PORTFOLIO RISKS USING CONDITIONAL COPULA-AR-GARCH MODEL
MEASURING PORTFOLIO RISKS USING CONDITIONAL COPULA-AR-GARCH MODEL Isariya Suttakulpiboon MSc in Risk Management and Insurance Georgia State University, 30303 Atlanta, Georgia Email: suttakul.i@gmail.com,
More information2018 AAPM: Normal and non normal distributions: Why understanding distributions are important when designing experiments and analyzing data
Statistical Failings that Keep Us All in the Dark Normal and non normal distributions: Why understanding distributions are important when designing experiments and Conflict of Interest Disclosure I have
More informationContinuous Distributions
Quantitative Methods 2013 Continuous Distributions 1 The most important probability distribution in statistics is the normal distribution. Carl Friedrich Gauss (1777 1855) Normal curve A normal distribution
More informationstarting on 5/1/1953 up until 2/1/2017.
An Actuary s Guide to Financial Applications: Examples with EViews By William Bourgeois An actuary is a business professional who uses statistics to determine and analyze risks for companies. In this guide,
More information