
Here is the output from the SAS program in the document Skewness, Kurtosis, and the Normal Curve.

    *g1g2.sas;
    data EDA; infile 'C:\Users\Vati\Documents\StatData\EDA.dat'; input Y;
    proc means mean skewness kurtosis N; var Y; run;

    Analysis Variable : Y

          Mean     Skewness     Kurtosis     N
    72.5104167    0.5255689    0.0323668    96

    PROC STANDARD data=eda mean=0 std=1 out=z_scores; run;
    proc means mean skewness kurtosis N; var Y; run;

    Analysis Variable : Y

            Mean     Skewness     Kurtosis     N
    -4.09395E-16    0.5255689    0.0323668    96

Notice that after standardizing the scores to mean 0, standard deviation 1, the values of skewness and kurtosis remain the same as with the original scores. A linear transformation will not change the shape of a distribution.

    data z34; set z_scores; Z3=Y**3; Z4=Y**4;
    proc means data=z34 noprint; var Z3 Z4;
      output out=sumz34 N=N sum=sumz3 sumz4; run;
    data skew; set sumz34;
      G1=N/(n-1)/(n-2)*sumZ3;
      G2=N*(n+1)/(n-1)/(n-2)/(n-3)*sumZ4-3*(n-1)*(n-1)/(n-2)/(n-3);
    proc print; run;

    Obs    _TYPE_    _FREQ_     N      sumz3      sumz4         G1          G2
      1         0        96    96    48.8889    279.103    0.52557    0.032367

Here I have used the standard formulas for computing $g_1$ and $g_2$. Notice that the values I obtain match those produced by SAS with the MEANS procedure.

    *Kurtosis-Uniform.sas;
    TITLE 'One Sample of 500,000 Scores From Uniform(0,1) Distribution'; run;
    DATA uniform; DROP N;
      DO N=1 TO 500000; X=UNIFORM(0); OUTPUT; END;
    PROC MEANS mean std skewness kurtosis; VAR X; run;

    One Sample of 500,000 Scores From Uniform(0,1) Distribution

    Analysis Variable : X

         Mean      Std Dev     Skewness      Kurtosis
    0.4996022    0.2884675    0.0013739    -1.1993764

Here a random number generator is used to create a single sample of half a million scores drawn from a uniform distribution. The expected value of kurtosis for such a sample is -1.2, which is what was obtained.

    *Kurtosis-T.sas;
    TITLE 'T ON 9 DF, T COMPUTED ON EACH OF 500,000 SAMPLES';
    TITLE2 'EACH WITH 10 SCORES FROM A STANDARD NORMAL POPULATION'; run;
    DATA T9; DROP N; DO SAMPLE=1 TO 500000; DO N=1 TO 10; X=NORMAL(0);
    [...]
    DATA T10; DROP N; DO SAMPLE=1 TO 500000; DO N=1 TO 11; X=NORMAL(0);
    [...]
    TITLE 'T ON 10 DF, SAMPLING DISTRIBUTION OF 500,000 TS'; run;
    DATA T16; DROP N; DO SAMPLE=1 TO 500000; DO N=1 TO 17; X=NORMAL(0);
    [...]
    TITLE 'T ON 16 DF, SAMPLING DISTRIBUTION OF 500,000 TS'; run;
    DATA T28; DROP N; DO SAMPLE=1 TO 500000; DO N=1 TO 29; X=NORMAL(0);
    [...]
    TITLE 'T ON 28 DF, SAMPLING DISTRIBUTION OF 500,000 TS'; run;

Here random number generators are used to construct the sampling distribution of Student's t, with half a million samples in each distribution. Notice that as degrees of freedom (N - 1) increase, the variance and kurtosis of t decrease, as t approaches the normal distribution.
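The theoretical values behind these two simulations can be written out as a check. This is a sketch based on standard distributional results rather than anything in the SAS programs themselves: for a Uniform(0,1) variable the variance is 1/12 and the fourth central moment is 1/80, and for Student's t on $\nu$ degrees of freedom the variance is $\nu/(\nu-2)$ and the excess kurtosis is $6/(\nu-4)$, provided $\nu > 4$. The symbols $\nu$, $\mu_4$, $\sigma_t$, and $\gamma_2$ are notation introduced here, not taken from the handout, and the standard deviations in the table are rounded.

```latex
% Uniform(0,1): excess kurtosis from the fourth central moment and the variance
\[
\gamma_2 = \frac{\mu_4}{\sigma^4} - 3
         = \frac{1/80}{(1/12)^2} - 3
         = 1.8 - 3 = -1.2
\]

% Student's t on \nu degrees of freedom (requires \nu > 4)
\[
\sigma_t = \sqrt{\frac{\nu}{\nu - 2}}, \qquad
\gamma_2 = \frac{6}{\nu - 4}
\]

% Values for the four simulated cases
\[
\begin{array}{rcc}
\nu & \sigma_t & \gamma_2 \\ \hline
 9  & 1.134 & 1.20 \\
10  & 1.118 & 1.00 \\
16  & 1.069 & 0.50 \\
28  & 1.038 & 0.25
\end{array}
\]
```

Both quantities shrink toward the normal-distribution values (standard deviation 1, excess kurtosis 0) as $\nu$ grows, so the simulated standard deviations and kurtosis values would be expected to fall close to the entries in this table.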

    T ON 9 DF, T COMPUTED ON EACH OF 500,000 SAMPLES
    EACH WITH 10 SCORES FROM A STANDARD NORMAL POPULATION

           Mean      Std Dev         N     Kurtosis
    0.000398775    1.1356293    500000    1.1736952

    T ON 10 DF, SAMPLING DISTRIBUTION OF 500,000 TS

           Mean      Std Dev         N     Kurtosis
    0.000792300    1.1183849    500000    0.9858713

    T ON 16 DF, SAMPLING DISTRIBUTION OF 500,000 TS

           Mean      Std Dev         N     Kurtosis
    0.000678705    1.0685739    500000    0.5126401

    T ON 28 DF, SAMPLING DISTRIBUTION OF 500,000 TS

           Mean      Std Dev         N     Kurtosis
     -0.0024393    1.0394780    500000    0.2509950

    *Kurtosis_Beta2.sas;
    *Illustrates the computation of population kurtosis;
    *Using data from the handout Skewness, Kurtosis, and the Normal Curve;
    options pageno=min nodate formdlim='-'; title;
    data A; do s=1 to 20; X=5; output; X=15; output; end;
    *SS=1000, SS/N = 25, M = 10;
    data ZA; set A; Z=(X-10)/5; Z4A=Z**4;
    proc means mean; var Z4A; run;

I have not copied here the rest of the program. For each of the data sets, the program transforms the scores into z scores, raises each z score to the 4th power, and then finds the mean of those z scores raised to the 4th power. This mean is, by definition, for a population, the value of $\beta_2$. Subtract 3 from each value of $\beta_2$ and you will obtain the kurtosis values reported in the handout.
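Written as a formula, the computation just described looks like the following. The symbols $\mu$, $\sigma$, and $N$ (population mean, population standard deviation, and number of scores) are notation introduced here for illustration. The worked line uses data set A from the program, where 20 scores equal 5 and 20 scores equal 15.

```latex
% Population kurtosis as the mean of the fourth powers of the z scores
\[
\beta_2 = \frac{1}{N}\sum_{i=1}^{N} z_i^4, \qquad
z_i = \frac{X_i - \mu}{\sigma}, \qquad
\text{kurtosis} = \beta_2 - 3
\]

% Data set A: \mu = 10 and \sigma = 5, so every z_i = \pm 1 and every z_i^4 = 1
\[
\beta_2 = \frac{1}{40}\sum_{i=1}^{40} 1 = 1, \qquad
\beta_2 - 3 = -2
\]
```

Because every score in data set A sits exactly one standard deviation from the mean, $\beta_2$ takes the value 1, which is why this two-point distribution has kurtosis -2.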

Table 1. Kurtosis for 7 Simple Distributions Also Differing in Variance

    X           freq A   freq B   freq C   freq D   freq E   freq F   freq G
    05              20       20       20       10       05       03       01
    10              00       10       20       20       20       20       20
    15              20       20       20       10       05       03       01
    Kurtosis      -2.0    -1.75     -1.5     -1.0      0.0     1.33      8.0
    Variance        25       20     16.6     12.5      8.3     5.77     2.27
                Platykurtic                                   Leptokurtic

    The MEANS Procedure

    Analysis Variable          Mean    Mean - 3 (Kurtosis Excess)
    Z4A                   1.0000000    -2
    Z4B                   1.2500000    -1.75
    Z4C                   1.5000000    -1.5
    Z4D                   2.0000000    -1
    Z4E                   3.0000000     0
    Z4F                   4.3333333     1.33
    Z4G                  11.0000000     8

    *Kurtosis-Normal.sas;
    TITLE 'Sampling Distributions of Skewness and Kurtosis for 100,000 Samples of 1000 Scores';
    title2 'Each From a Normal(0,1) Distribution'; run;
    DATA normal; DROP N;
      DO SAMPLE=1 TO 100000; DO N=1 TO 1000; X=NORMAL(0); OUTPUT; END; END;
    PROC MEANS NOPRINT; OUTPUT OUT=SK_KUR SKEWNESS=SKEWNESS KURTOSIS=KURTOSIS;
      VAR X; BY SAMPLE;
    PROC MEANS MEAN STD N; VAR skewness kurtosis; run;

    Variable           Mean      Std Dev         N
    SKEWNESS    0.000146252    0.0772864    100000
    KURTOSIS    0.000135927    0.1549157    100000

The expected value for the mean is zero for both $g_1$ and $g_2$, and that is what was obtained. The expected value for the standard deviation of $g_1$ is $\sqrt{6/n} = \sqrt{6/1000} \approx .077$, as obtained. The expected value for the standard deviation of $g_2$ is $\sqrt{24/n} = \sqrt{24/1000} \approx .155$, as obtained.
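For reference, the $g_1$ and $g_2$ whose sampling distributions are summarized here are the statistics computed in the skew data step of g1g2.sas. The block below is a transcription of that SAS code into notation, with $z_i$ denoting the standardized scores produced by PROC STANDARD. The second display plugs in the printed sums for the EDA data ($n = 96$), and its intermediate values are rounded.

```latex
% Formulas as coded in the G1 and G2 assignment statements
\[
g_1 = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} z_i^3, \qquad
g_2 = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum_{i=1}^{n} z_i^4
      - \frac{3(n-1)^2}{(n-2)(n-3)}
\]

% EDA data: n = 96, \sum z^3 = 48.8889, \sum z^4 = 279.103
\[
g_1 = \frac{96}{95 \cdot 94}\,(48.8889) \approx 0.5256, \qquad
g_2 = \frac{96 \cdot 97}{95 \cdot 94 \cdot 93}\,(279.103)
      - \frac{3 \cdot 95^2}{94 \cdot 93}
    \approx 3.1295 - 3.0971 \approx 0.0324
\]
```

These agree, to rounding, with the G1 and G2 values printed by PROC PRINT and with the skewness and kurtosis reported by PROC MEANS for the same data.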