The Normal Distribution & Descriptive Statistics. Kin 304W Week 2: Jan 15, 2012


Questionnaire Results
I received 71 completed questionnaires. Thank you!
Are you nervous about scientific writing? You're not alone: 40 (56%) reported some hesitation about scientific writing. Reasons given: don't have a lot of experience, nervous, scared, intimidated.
Many of you were also optimistic and excited to learn more about writing. Try to think about writing as an opportunity to educate your readers about what you have done and learned.

Questionnaire Results
Are you nervous or unexcited about statistical analysis? You're also not alone: 39 (55%) reported some hesitation about statistical analysis. Reasons given: don't have a lot of experience, boring, don't like it, intimidated.
Our focus will be learning what type of statistical analysis to use with different types of data.

Writer's Corner
Grammar Girl: Quick and Dirty Tips for Better Writing
http://grammar.quickanddirtytips.com/

Writer's Corner
What is wrong with the following sentence? "This data is useless because it lacks specifics."

Outline
Normal Distribution
Testing Normality with Skewness & Kurtosis
Measures of Central Tendency
Measures of Variability
Z-Scores
Arbitrary Scores & Scales
Percentiles

Frequency Distribution
[Figure: histogram of hypothetical grades from a second-year chemistry class (n = 144)]

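Normal Frequency Distribution
[Figure: normal curve with the mean, median, and mode coinciding at the centre; x-axis in standard deviations from -4 to +4, y-axis frequency.]
Labelled areas: 68.26% within ±1 SD (34.13% on each side of the mean); 13.59% between 1 and 2 SD on each side; 2.15% between 2 and 3 SD on each side; 95.44% within ±2 SD.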

Skewness & Kurtosis
Deviations in shape from the Normal distribution.
Skewness is a measure of symmetry, or more accurately, lack of symmetry. A distribution, or data set, is symmetric if it looks the same to the left and right of the center point.
Kurtosis is a measure of peakedness. A distribution with high kurtosis has a distinct peak near the mean, declines rather rapidly, and has heavy tails. A distribution with low kurtosis has a flat top near the mean rather than a sharp peak. A uniform distribution would be the extreme case.

Skewness - Measure of Symmetry
[Figure: three curves labelled Negatively skewed, Normal, and Positively skewed]
Many variables in BPK are positively skewed. Can you think of examples?

Kurtosis - Measure of Peakedness
[Figure: curves of differing peakedness compared with the Normal curve]

Coefficient of Skewness
$$\text{skewness} = \frac{\sum_{i=1}^{N} (X_i - \bar{X})^3}{(N - 1)\, s^3}$$
where $\bar{X}$ = mean, $X_i$ = X value from individual i, N = sample size, and s = standard deviation.
A perfectly Normal distribution has skewness = 0. As a rule of thumb, if -1 ≤ skewness ≤ +1, the data are treated as approximately Normally distributed.

Coefficient of Kurtosis
$$\text{kurtosis} = \frac{\sum_{i=1}^{N} (X_i - \bar{X})^4}{(N - 1)\, s^4}$$
where $\bar{X}$ = mean, $X_i$ = X value from individual i, N = sample size, and s = standard deviation.
A perfectly Normal distribution has kurtosis = 3 based on the above equation. However, SPSS and other statistical software packages subtract 3 from kurtosis values. Therefore, a kurtosis value of 0 from SPSS indicates a perfectly Normal distribution.
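The two coefficient formulas above can be sketched in a few lines of Python. This is only an illustration of the slide equations, not a reproduction of what SPSS computes: the function names and the two small data sets are made up for the example, and real packages may apply additional small-sample corrections.

```python
import math

def sample_sd(values):
    """Standard deviation with the N - 1 denominator."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))

def skewness(values):
    """Sum of cubed deviations divided by (N - 1) * s^3."""
    n = len(values)
    mean = sum(values) / n
    s = sample_sd(values)
    return sum((x - mean) ** 3 for x in values) / ((n - 1) * s ** 3)

def kurtosis(values, excess=False):
    """Sum of fourth-power deviations divided by (N - 1) * s^4.

    With excess=True, 3 is subtracted so that a Normal distribution
    scores 0 (the convention the slide attributes to SPSS).
    """
    n = len(values)
    mean = sum(values) / n
    s = sample_sd(values)
    k = sum((x - mean) ** 4 for x in values) / ((n - 1) * s ** 4)
    return k - 3 if excess else k

# Hypothetical data: one symmetric set, one with a long right tail.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 3, 10]

print(skewness(symmetric))      # symmetric data: skewness of zero
print(skewness(right_tailed))   # positive value: positively skewed
print(kurtosis(right_tailed), kurtosis(right_tailed, excess=True))
```

A positive skewness value corresponds to a longer tail on the right, which is the pattern described for many BPK variables.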

Is Height in Women Normally Distributed?
[Figure: histogram of height (cm) in women; x-axis 146 to 186 cm, y-axis frequency]
N = 5782, Mean = 161.0 cm, SD = 6.2 cm, Skewness = 0.092, Kurtosis = 0.090

Is Weight in Women Normally Distributed?
[Figure: histogram of weight (kg) in women; x-axis 45 to 125 kg, y-axis frequency]
N = 5704, Mean = 61.9 kg, SD = 11.1 kg, Skewness = 1.30, Kurtosis = 2.64

Is Sum of 5 Skinfolds in Women Normally Distributed?
[Figure: histogram of sum of 5 skinfolds (mm) in women; x-axis 20 to 220 mm, y-axis frequency]
N = 5362, Mean = 75.8 mm, SD = 29.0 mm, Skewness = 1.04, Kurtosis = 1.30

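Normal Frequency Distribution and Cumulative Frequency Distribution (CFD)
[Figure: the normal frequency curve, with the same labelled areas as before, shown above its cumulative frequency distribution; cumulative frequency (%) rises from 0 to 100 across -3 to +3 standard deviations and passes through 50 at the mean.]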

Normal Probability Plots
Correlation between observed and expected cumulative probability is a measure of the deviation from normality.
[Figure: Normal P-P Plot of HT and Normal P-P Plot of WT; x-axis Observed Cum Prob, y-axis Expected Cum Prob, both from 0.00 to 1.00]

Descriptive Statistics
Measures of Central Tendency: Mean, Median, Mode
Measures of Variability (Precision): Variance, Standard Deviation, Interquartile Range
Standardized scores (comparisons to the Normal distribution)
Percentiles

Measures of Central Tendency
Mean: the arithmetic average. It is the centre of gravity of a distribution; the weight of the values above the mean exactly balances the weight of the values below it.
Median (50th percentile): the value that divides the distribution into the lower and upper 50% of the values.
Mode: the value that occurs most frequently in the distribution.
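A short sketch of the three measures using Python's standard `statistics` module. The house prices are invented for illustration (in thousands of dollars); they are chosen so that one extreme value pulls the mean well above the median.

```python
import statistics

# Hypothetical house prices in $1000s; the last value is an extreme outlier.
house_prices = [400, 420, 450, 450, 480, 2500]

mean_price = statistics.mean(house_prices)      # arithmetic average
median_price = statistics.median(house_prices)  # 50th percentile
mode_price = statistics.mode(house_prices)      # most frequent value

print(mean_price, median_price, mode_price)
```

In skewed data like this, the median and mode stay near the bulk of the values while the mean is dragged toward the tail.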

Measures of Central Tendency
When do you use mean, median, or mode? Consider: height; skinfolds; house prices in Vancouver; vertical jump; 100 meter run time.
How many repeat measurements do you take on individuals to determine their true (criterion) score? The answer is discipline specific and research design specific, and differs for objective vs. subjective tests.

Measures of Variability
Variance:
$$\mathrm{Var} = \frac{\sum (X - \bar{X})^2}{N - 1}$$
Standard deviation: $\mathrm{SD} = \sqrt{\mathrm{Var}}$
The range spans approximately the mean ± 3 SDs.
For Normal distributions, report the mean and SD.
For non-normal distributions, report the median (50th percentile) and interquartile range (IQR, 25th and 75th percentiles).
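As a rough illustration, the variance, SD, and IQR can be computed with Python's `statistics` module. The scores are made up, and note that `statistics.quantiles` uses its default "exclusive" method here, which is only one of several quartile conventions; other software may give slightly different quartiles.

```python
import statistics

# Hypothetical test scores.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

variance = statistics.variance(scores)  # sample variance, N - 1 denominator
sd = statistics.stdev(scores)           # square root of the sample variance

# Quartiles: 25th, 50th, and 75th percentiles (default "exclusive" method).
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1

print(variance, sd)
print(q1, q2, q3, iqr)
```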

Central Limit Theorem
If a sufficiently large number of random samples of the same size were drawn from an infinitely large population, and the mean was computed for each sample, the distribution formed by these averages would be approximately Normal.
[Figure: distribution of a single sample compared with the distribution of multiple sample means]
The standard deviation of the sample means is called the standard error of the mean (SEM).

Standard Error of the Mean (SEM)
$$\mathrm{SEM} = \frac{\mathrm{SD}}{\sqrt{n}}$$
The SEM describes how confident you can be that the mean of the sample is close to the mean of the population.
How does the SEM change as the size of your sample increases?
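One way to explore the question above is to evaluate the SEM formula for several sample sizes. The sketch below reuses the SD of 6.2 cm reported for women's height; the helper name `sem` and the list of sample sizes are choices made for this example.

```python
import math

def sem(sd, n):
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)

sd_height = 6.2  # cm, from the women's height data

for n in (10, 100, 1000, 5782):
    print(n, round(sem(sd_height, n), 3))
```

Because n sits under a square root, quadrupling the sample size only halves the SEM.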

Standardizing Data
Transform data into standard scores (e.g., Z-scores). This eliminates units of measurement.
[Figure: histogram of Height (cm) beside histogram of Z-Score of Height]
Height (cm): Mean = 161.0; SD = 6.2; N = 5782
Z-score of height: Mean = 0.0; SD = 1.0; N = 5782

Standardizing Data
Standardizing does not change the shape of the distribution of the data.
[Figure: histogram of Weight (kg) beside histogram of Z-Score of Weight]
Weight (kg): Mean = 61.9; SD = 11.14; N = 5704
Z-score of weight: Mean = 0.00; SD = 1.00; N = 5704

Z-Scores
$$Z = \frac{X - \bar{X}}{s}$$
Example: Score = 24, Mean of Norm = 30, SD of Norm = 4. Z-score = ?
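The example above can be worked through directly from the Z-score formula. This is a minimal sketch; the function name `z_score` is just a label chosen here.

```python
def z_score(score, mean, sd):
    """Number of standard deviations a score lies above (+) or below (-) the mean."""
    return (score - mean) / sd

# Values from the slide: score = 24, norm mean = 30, norm SD = 4.
z = z_score(24, 30, 4)
print(z)  # (24 - 30) / 4 = -1.5
```

The score of 24 sits one and a half standard deviations below the mean of the norm.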

Internal or External Norm
Internal Norm: A sample of subjects is measured. Z-scores are calculated based upon the mean and SD of the sample. Thus, Z-scores using an internal norm tell you how good each individual is compared to the group they come from. Mean = 0, SD = 1.
External Norm: A sample of subjects is measured. Z-scores are calculated based upon the mean and SD of an external normative sample (national, sport-specific, etc.). Thus, Z-scores using an external norm tell you how good each individual is compared to the external group. Mean = ?, SD = ? (depends upon the external norm).
E.g., you compare aerobic capacity to an external norm and get a lot of negative z-scores. What does that mean?

Z-scores allow measurements from tests with different units to be combined. But beware: higher Z-scores are not necessarily better performances.

Variable                   z-scores for profile A   z-scores for profile B
Sum of 5 Skinfolds (mm)     1.5                     -1.5*
Grip Strength (kg)          0.9                      0.9
Vertical Jump (cm)         -0.8                     -0.8
Shuttle Run (sec)           1.2                     -1.2*
Overall Rating              0.7                     -0.65

*Z-scores are reversed because lower skinfold and shuttle run scores are regarded as better performances.
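The two overall ratings can be reproduced with a small sketch. It assumes the overall rating is the unweighted mean of the four z-scores, which is an inference from the numbers on the slide rather than something the slide states; the function and variable names are illustrative.

```python
# Tests where a LOWER raw score is the better performance.
LOWER_IS_BETTER = {"Sum of 5 Skinfolds (mm)", "Shuttle Run (sec)"}

# Raw z-scores from the slide (profile A uses them as they are).
raw_z = {
    "Sum of 5 Skinfolds (mm)": 1.5,
    "Grip Strength (kg)": 0.9,
    "Vertical Jump (cm)": -0.8,
    "Shuttle Run (sec)": 1.2,
}

def overall_rating(z_scores, reverse=()):
    """Mean z-score, flipping the sign of any test listed in `reverse`."""
    adjusted = [-z if name in reverse else z for name, z in z_scores.items()]
    return sum(adjusted) / len(adjusted)

profile_a = overall_rating(raw_z)                           # about 0.7
profile_b = overall_rating(raw_z, reverse=LOWER_IS_BETTER)  # about -0.65

print(round(profile_a, 2), round(profile_b, 2))
```

Reversing the two "lower is better" tests turns an apparently above-average profile into a below-average one, which is the point of the warning.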

[Figure: bar charts of Test Profile A (z-score axis -1 to 2) and Test Profile B (z-score axis -2 to 2), each showing Sum of 5 Skinfolds (mm), Grip Strength (kg), Vertical Jump (cm), Shuttle Run (sec), and Overall Rating]

Arbitrary Scores & Scales
Z-scores (with internal norm): Mean = 0, SD = 1
T-scores: Mean = 50, SD = 10
Hull scores: Mean = 50, SD = 14

Arbitrary scores are based upon z-scores
Z-score = +1.25: T-score = 50 + (+1.25 × 10) = 62.5; Hull score = 50 + (+1.25 × 14) = 67.5
Z-score = -1.25: T-score = 50 + (-1.25 × 10) = 37.5; Hull score = 50 + (-1.25 × 14) = 32.5
Note: You derive T-scores and Hull scores from Z-scores (based on internal norm).
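The conversions above follow one pattern: new score = new mean + (z × new SD). A minimal sketch, with function names chosen for this example:

```python
def t_score(z):
    """Rescale a z-score to a scale with mean 50 and SD 10."""
    return 50 + z * 10

def hull_score(z):
    """Rescale a z-score to a scale with mean 50 and SD 14."""
    return 50 + z * 14

for z in (1.25, -1.25):
    print(z, t_score(z), hull_score(z))
```

Both scales keep the ordering of individuals unchanged; they only change the units in which the z-score is expressed.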

Clinical Example: T-scores and Osteoporosis
To diagnose osteoporosis, clinicians measure a patient's bone mineral density (BMD).
They express the patient's BMD in terms of standard deviations above or below the mean BMD for a young normal person of the same sex and ethnicity.

BMD T-scores and Osteoporosis
$$T\text{-score} = \frac{\mathrm{BMD}_{\text{patient}} - \mathrm{BMD}_{\text{young normal}}}{\mathrm{SD}_{\text{young normal}}}$$
Although physicians call this standardized score a T-score, it is really just a Z-score where the reference mean and standard deviation come from an external population (i.e., young normal adults of a given sex and ethnicity).

Classification using BMD T-scores
Osteoporosis T-scores are used to classify a patient's BMD into one of three categories:
T-scores ≥ -1.0 indicate normal bone density.
T-scores between -1.0 and -2.5 indicate low bone mass ("osteopenia").
T-scores ≤ -2.5 indicate osteoporosis.
Decisions to treat patients with osteoporosis medication are based, in part, on T-scores.
http://www.nof.org/sites/default/files/pdfs/nof_clinicianguide2009_v7.pdf
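The BMD T-score and its three categories can be sketched as follows. The cut-offs used are the conventional ones (at or above -1.0 normal, at or below -2.5 osteoporosis, in between low bone mass). The reference mean, reference SD, and patient value are hypothetical numbers chosen only to make the example run; they are not clinical reference data.

```python
def bmd_t_score(bmd_patient, bmd_young_mean, bmd_young_sd):
    """Patient BMD expressed in SDs from the young-normal reference mean."""
    return (bmd_patient - bmd_young_mean) / bmd_young_sd

def classify_bmd(t):
    """Map a BMD T-score to one of the three categories."""
    if t >= -1.0:
        return "normal"
    if t > -2.5:
        return "low bone mass (osteopenia)"
    return "osteoporosis"

# Hypothetical values in g/cm^2, for illustration only.
t = bmd_t_score(bmd_patient=0.82, bmd_young_mean=1.0, bmd_young_sd=0.1)
print(round(t, 2), classify_bmd(t))
```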

Percentiles
Percentile: the percentage of the population that lies at or below that score.
[Figure: normal curve with the mean, median, and mode at the centre and the same labelled areas as the Normal Frequency Distribution figure; x-axis from -4 to +4 standard deviations]

Area under the Standard Normal Curve
What percentage of the population is above or below a given z-score or between two given z-scores?
[Figure: standard normal curve, x-axis from -4 to +4]
Percentage between 0 and -1.5: 43.32%
Percentage above -1.5: 50% + 43.32% = 93.32%
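These areas can be checked with `statistics.NormalDist` from the Python standard library, whose `cdf` method gives the proportion of the distribution at or below a value. The variable names are chosen for this sketch.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

below = std_normal.cdf(-1.5)                        # proportion below z = -1.5
between = std_normal.cdf(0) - std_normal.cdf(-1.5)  # between -1.5 and 0
above = 1 - below                                   # proportion above z = -1.5

print(round(between * 100, 2))  # percentage between -1.5 and 0
print(round(above * 100, 2))    # percentage above -1.5
```

The area between any two z-scores is always the difference of their two cumulative proportions.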

Predicting Percentiles from Mean and SD assuming a Normal Distribution

Percentile   Z-score for percentile   Predicted percentile value (Mean = 170, SD = 10)
5            -1.645                   153.55
25           -0.675                   163.25
50            0                       170
75           +0.675                   176.75
95           +1.645                   186.45

Predicted percentile value = Mean + (Z-score × SD)
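The table can be regenerated by looking up each percentile's z-score with `NormalDist().inv_cdf` and applying Mean + (Z-score × SD). This is a sketch under the same Normality assumption; the function name is illustrative, and the results differ from the table in the second or third decimal place because the table uses rounded z-scores.

```python
from statistics import NormalDist

def predicted_percentile_value(percentile, mean, sd):
    """Value at a given percentile of a Normal distribution."""
    z = NormalDist().inv_cdf(percentile / 100)  # z-score for the percentile
    return mean + z * sd

for p in (5, 25, 50, 75, 95):
    print(p, round(predicted_percentile_value(p, mean=170, sd=10), 2))
```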

MS EXCEL Basics
Entering data
Opening data files
Formatting
Adjusting column widths and row heights
Saving data (for use with other applications)
Entering formulas
Functions (e.g., average)
Copying formulas
Split windows
Relative and absolute referencing
Copying and moving data
Sorting data
Charts
Statistical tests (data analysis tool pack)
Solver
http://www.utexas.edu/its/training/handouts/utopia_excelgs/