Here are the SAS programs, with their output, for the document Skewness, Kurtosis, and the Normal Curve.

*g1g2.sas;
data EDA;
   infile 'C:\Users\Vati\Documents\StatData\EDA.dat';
   input Y;
proc means mean skewness kurtosis N;
   var Y;
run;

Analysis Variable : Y
      Mean      Skewness      Kurtosis     N
72.5104167     0.5255689     0.0323668    96

PROC STANDARD data=eda mean=0 std=1 out=z_scores; run;
proc means mean skewness kurtosis N;
   var Y;
run;

Analysis Variable : Y
        Mean      Skewness      Kurtosis     N
-4.09395E-16     0.5255689     0.0323668    96

Notice that after the scores are standardized to mean 0 and standard deviation 1, the skewness and kurtosis are the same as for the original scores. A linear transformation does not change the shape of a distribution.

data z34; set z_scores;
   Z3=Y**3; Z4=Y**4;
proc means data=z34 noprint;
   var Z3 Z4;
   output out=sumz34 N=N sum=sumz3 sumz4;
run;
data skew; set sumz34;
   G1=N/(n-1)/(n-2)*sumZ3;
   G2=N*(n+1)/(n-1)/(n-2)/(n-3)*sumZ4-3*(n-1)*(n-1)/(n-2)/(n-3);
proc print;
run;

Obs  _TYPE_  _FREQ_   N    sumz3     sumz4      G1        G2
  1     0      96    96   48.8889   279.103   0.52557   0.032367

Here I have used the standard formulas for computing g1 and g2:

g1 = n Σz³ / [(n-1)(n-2)]
g2 = n(n+1) Σz⁴ / [(n-1)(n-2)(n-3)] - 3(n-1)² / [(n-2)(n-3)]

Notice that the values I obtain match those produced by PROC MEANS.

*Kurtosis-Uniform.sas;
TITLE 'One Sample of 500,000 Scores From Uniform(0,1) Distribution'; run;
DATA uniform; DROP N;
   DO N=1 TO 500000;
      X=UNIFORM(0); OUTPUT;
   END;
PROC MEANS mean std skewness kurtosis;
   VAR X;
run;
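The G1 and G2 formulas used in g1g2.sas can also be checked outside SAS. The following is a minimal Python sketch, not part of the original programs. The function names `standardize` and `g1_g2` are hypothetical. The sketch assumes that PROC STANDARD used the n - 1 divisor for the standard deviation, which the matching output suggests. It also illustrates that a linear transformation leaves G1 and G2 unchanged.

```python
import math


def standardize(y):
    """Z scores using the sample (n - 1) standard deviation (assumed to match PROC STANDARD)."""
    n = len(y)
    mean = sum(y) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in y) / (n - 1))
    return [(v - mean) / sd for v in y]


def g1_g2(y):
    """Sample skewness G1 and kurtosis G2, using the same formulas as g1g2.sas."""
    n = len(y)
    z = standardize(y)
    sum_z3 = sum(v ** 3 for v in z)
    sum_z4 = sum(v ** 4 for v in z)
    g1 = n / ((n - 1) * (n - 2)) * sum_z3
    g2 = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum_z4
          - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return g1, g2


scores = [1, 2, 3, 4, 5]                    # hypothetical toy data, not EDA.dat
print(g1_g2(scores))                        # G1 near 0, G2 near -1.2
print(g1_g2([10 * v + 7 for v in scores]))  # linear transformation: same G1 and G2
```

Applied to the scores in EDA.dat (not reproduced here), the same function should give G1 ≈ 0.5256 and G2 ≈ 0.0324, because it uses the same formulas.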
One Sample of 500,000 Scores From Uniform(0,1) Distribution
Analysis Variable : X
     Mean      Std Dev     Skewness      Kurtosis
0.4996022    0.2884675    0.0013739    -1.1993764

Here a random number generator is used to create a single sample of half a million scores drawn from a uniform distribution. The kurtosis of a uniform population is -1.2, and the sample value of -1.199 is essentially that.

*Kurtosis-T.sas;
TITLE 'T ON 9 DF, T COMPUTED ON EACH OF 500,000 SAMPLES';
TITLE2 'EACH WITH 10 SCORES FROM A STANDARD NORMAL POPULATION'; run;
DATA T9; DROP N; DO SAMPLE=1 TO 500000; DO N=1 TO 10; X=NORMAL(0);

DATA T10; DROP N; DO SAMPLE=1 TO 500000; DO N=1 TO 11; X=NORMAL(0);
TITLE 'T ON 10 DF, SAMPLING DISTRIBUTION OF 500,000 TS'; run;

DATA T16; DROP N; DO SAMPLE=1 TO 500000; DO N=1 TO 17; X=NORMAL(0);
TITLE 'T ON 16 DF, SAMPLING DISTRIBUTION OF 500,000 TS'; run;

DATA T28; DROP N; DO SAMPLE=1 TO 500000; DO N=1 TO 29; X=NORMAL(0);
TITLE 'T ON 28 DF, SAMPLING DISTRIBUTION OF 500,000 TS'; run;

Here random number generators are used to construct the sampling distribution of Student's t, with half a million samples in each distribution. Notice that as the degrees of freedom (N - 1) increase, the variance and kurtosis of t decrease, because t approaches the normal distribution.
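The simulated values can be compared with known population values. This Python sketch is an addition, not one of the original SAS programs. It uses standard results: Uniform(0,1) has variance 1/12 and fourth central moment 1/80, so its excess kurtosis is (1/80)/(1/12)² - 3 = -1.2. Student's t on df degrees of freedom has variance df/(df - 2) and excess kurtosis 6/(df - 4), the latter finite only for df > 4. The function names are hypothetical.

```python
import math
from fractions import Fraction


def uniform_excess_kurtosis():
    """Excess kurtosis of Uniform(0,1), computed exactly as mu4 / sigma^4 - 3."""
    variance = Fraction(1, 12)  # Var(U) for U ~ Uniform(0,1)
    mu4 = Fraction(1, 80)       # E[(U - 1/2)^4]
    return mu4 / variance ** 2 - 3


def t_sd_and_kurtosis(df):
    """Population SD and excess kurtosis of Student's t on df degrees of freedom."""
    if df <= 4:
        raise ValueError("the kurtosis of t is finite only for df > 4")
    sd = math.sqrt(df / (df - 2))
    kurtosis = 6 / (df - 4)
    return sd, kurtosis


print(uniform_excess_kurtosis())  # -6/5, that is, -1.2
for df in (9, 10, 16, 28):
    sd, k = t_sd_and_kurtosis(df)
    print(f"df={df}: SD={sd:.4f}, kurtosis={k:.4f}")
```

For df = 9, 10, 16, and 28, the population kurtosis is 1.2, 1.0, 0.5, and 0.25. The simulated values in the t output (1.174, 0.986, 0.513, 0.251) fall close to these.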
T ON 9 DF, T COMPUTED ON EACH OF 500,000 SAMPLES
EACH WITH 10 SCORES FROM A STANDARD NORMAL POPULATION
       Mean      Std Dev        N     Kurtosis
0.000398775    1.1356293   500000    1.1736952

T ON 10 DF, SAMPLING DISTRIBUTION OF 500,000 TS
       Mean      Std Dev        N     Kurtosis
0.000792300    1.1183849   500000    0.9858713

T ON 16 DF, SAMPLING DISTRIBUTION OF 500,000 TS
       Mean      Std Dev        N     Kurtosis
0.000678705    1.0685739   500000    0.5126401

T ON 28 DF, SAMPLING DISTRIBUTION OF 500,000 TS
       Mean      Std Dev        N     Kurtosis
 -0.0024393    1.0394780   500000    0.2509950

*Kurtosis_Beta2.sas;
*Illustrates the computation of population kurtosis;
*Using data from the handout Skewness, Kurtosis, and the Normal Curve;
options pageno=min nodate formdlim='-';
title;
data A;
   do s=1 to 20;
      X=5; output;
      X=15; output;
   end;
*SS=1000, SS/N = 25, M = 10;
data ZA; set A;
   Z=(X-10)/5; Z4A=Z**4;
proc means mean;
   var Z4A;
run;

I have not copied here the rest of the program. For each of the data sets, the program transforms the scores into z scores, raises each z score to the 4th power, and then finds the mean of those fourth powers. For a population, this mean is, by definition, the value of β2. Subtract 3 from each value of β2 and you will obtain the kurtosis values reported in the handout.
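The β2 computation described above can be sketched in Python for all seven distributions of Table 1. This is a cross-check, not the author's SAS program. The helper `beta2_and_variance` is hypothetical. It treats each frequency distribution as a population, dividing by N, and computes β2 as the mean of z⁴. Subtracting 3 from β2 gives the kurtosis.

```python
def beta2_and_variance(freqs, values=(5, 10, 15)):
    """Population variance and beta2 (mean of z**4) for a frequency distribution."""
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, values)) / n
    # Population variance: divide the sum of squares by N, not N - 1.
    variance = sum(f * (x - mean) ** 2 for f, x in zip(freqs, values)) / n
    sd = variance ** 0.5
    beta2 = sum(f * ((x - mean) / sd) ** 4 for f, x in zip(freqs, values)) / n
    return variance, beta2


# Frequencies of X = 5, 10, 15 for distributions A through G (Table 1).
table1 = {
    "A": (20, 0, 20), "B": (20, 10, 20), "C": (20, 20, 20), "D": (10, 20, 10),
    "E": (5, 20, 5), "F": (3, 20, 3), "G": (1, 20, 1),
}

for name, freqs in table1.items():
    variance, beta2 = beta2_and_variance(freqs)
    print(f"{name}: variance={variance:.2f}, beta2={beta2:.4f}, kurtosis={beta2 - 3:.2f}")
```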
Table 1. Kurtosis for 7 Simple Distributions Also Differing in Variance

X          freq A  freq B  freq C  freq D  freq E  freq F  freq G
05           20      20      20      10      05      03      01
10           00      10      20      20      20      20      20
15           20      20      20      10      05      03      01
Kurtosis   -2.0   -1.75    -1.5    -1.0     0.0    1.33     8.0
Variance     25      20    16.6    12.5     8.3    5.77    2.27
           Platykurtic                                Leptokurtic

The MEANS Procedure
Analysis Variable : Z4A    1.0000000 - 3 = -2 (Kurtosis Excess)
Analysis Variable : Z4B    1.2500000 - 3 = -1.75
Analysis Variable : Z4C    1.5000000 - 3 = -1.5
Analysis Variable : Z4D    2.0000000 - 3 = -1
Analysis Variable : Z4E    3.0000000 - 3 = 0
Analysis Variable : Z4F    4.3333333 - 3 = 1.33
Analysis Variable : Z4G   11.0000000 - 3 = 8

*Kurtosis-Normal.sas;
TITLE 'Sampling Distributions of Skewness and Kurtosis for 100,000 Samples of 1000 Scores';
title2 'Each From a Normal(0,1) Distribution'; run;
DATA normal; DROP N;
   DO SAMPLE=1 TO 100000;
      DO N=1 TO 1000;
         X=NORMAL(0); OUTPUT;
      END;
   END;
PROC MEANS NOPRINT;
   OUTPUT OUT=SK_KUR SKEWNESS=SKEWNESS KURTOSIS=KURTOSIS;
   VAR X; BY SAMPLE;
PROC MEANS MEAN STD N;
   VAR skewness kurtosis;
run;

Variable           Mean      Std Dev         N
SKEWNESS    0.000146252    0.0772864    100000
KURTOSIS    0.000135927    0.1549157    100000

The expected value of both g1 and g2 is zero, and the means of the 100,000 sample values are essentially zero. The standard deviation of g1 is expected to be approximately √(6/n) = √(6/1000) ≈ .077, as obtained. The standard deviation of g2 is expected to be approximately √(24/n) = √(24/1000) ≈ .155, as obtained.
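A scaled-down version of the Kurtosis-Normal.sas simulation can be sketched in Python to check the √(6/n) and √(24/n) approximations. This is an illustration only. It runs 500 samples of 1000 scores instead of 100,000, so its estimates are noisier than the SAS results. The seed and the helper names `g1_g2`, `mean_sd`, and `simulate` are hypothetical. `g1_g2` reuses the formulas from g1g2.sas.

```python
import math
import random


def g1_g2(x):
    """SAS-style sample skewness G1 and kurtosis G2 (same formulas as g1g2.sas)."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    z = [(v - mean) / sd for v in x]
    g1 = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    g2 = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
          - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return g1, g2


def mean_sd(values):
    """Mean and sample standard deviation of a list of numbers."""
    m = sum(values) / len(values)
    return m, math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def simulate(reps=500, n=1000, seed=1):
    """G1 and G2 for `reps` samples of n Normal(0,1) scores (far fewer reps than the SAS run)."""
    rng = random.Random(seed)
    g1s, g2s = [], []
    for _ in range(reps):
        g1, g2 = g1_g2([rng.gauss(0.0, 1.0) for _ in range(n)])
        g1s.append(g1)
        g2s.append(g2)
    return g1s, g2s


N_PER_SAMPLE = 1000
g1s, g2s = simulate(n=N_PER_SAMPLE)
print("G1 mean and SD:", mean_sd(g1s), "sqrt(6/n):", math.sqrt(6 / N_PER_SAMPLE))
print("G2 mean and SD:", mean_sd(g2s), "sqrt(24/n):", math.sqrt(24 / N_PER_SAMPLE))
```

With only 500 replications, expect the simulated standard deviations to land within a few percent of .077 and .155, not as close as the 100,000-sample SAS run.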