Empirical Rule (P148)

Interpreting the Standard Deviation
Numerical Descriptive Measures for Quantitative Data III
Dr. Tom Ilvento, FREC 408

We can use the standard deviation to express the proportion of cases that might fall within one, two, or three standard deviations of the mean. Two theorems help us do this:
- Chebyshev's Rule (Tchebysheff's Theorem in the book, p. 148)
- The Empirical Rule (p. 148)

Chebyshev's Rule (Tchebysheff's Theorem, p. 148)
Chebyshev's Rule is based on a mathematical theorem, so it holds for any data, whatever the shape of the distribution:
- At least 3/4 of the measurements will fall within ±2 standard deviations of the mean.
- At least 8/9 of the measurements will fall within ±3 standard deviations of the mean.

Empirical Rule (p. 148)
The Empirical Rule is based on a symmetrical, mound-shaped distribution in which the mean, median, and mode are similar. The EPA mpg data fit this description.
[Figure: A symmetrical curve]
- Approximately 68% of the measurements will fall within ±1 standard deviation of the mean.
- Approximately 95% of the measurements will fall within ±2 standard deviations of the mean.
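To see how the two rules differ, here is a minimal Python sketch that computes Chebyshev's guaranteed minimum, 1 - 1/k², next to the normal-curve proportion within ±k standard deviations. It assumes the normal curve as the idealized symmetrical, mound-shaped distribution behind the Empirical Rule (the 68%, 95%, and 99.7% figures are rounded normal-curve values); the function names are illustrative only.

```python
import math

def chebyshev_at_least(k: float) -> float:
    """Chebyshev: minimum fraction of ANY data set within k standard deviations of the mean."""
    # For k <= 1 the bound 1 - 1/k**2 is zero or negative, i.e. it guarantees nothing.
    return max(0.0, 1.0 - 1.0 / k**2)

def normal_within(k: float) -> float:
    """Fraction of a normal distribution within k standard deviations: erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"+/-{k} SD: Chebyshev at least {chebyshev_at_least(k):.4f}, "
          f"normal model {normal_within(k):.4f}")
```

Chebyshev's bounds (0, 3/4, 8/9) are weaker because they must hold for every possible shape of distribution, not just a symmetrical mound.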

Empirical Rule (cont.)
- Approximately 99.7% of the measurements will fall within ±3 standard deviations of the mean.
This means that, for a symmetrical, mound-shaped distribution, it is very rare for a value to be more than 3 standard deviations from the mean.

Empirical Rule and the EPA mpg Data
For the EPA mpg data (mean = 36.99, s = 2.42), we would expect about 68% of the cases to fall within 36.99 ± 2.42, that is, between 34.57 and 39.41.
[Figure: A symmetrical curve]

  Interval        Mean ± k·s        Range
  ±1 Std Dev      36.99 ± 2.42      34.57 to 39.41
  ±2 Std Dev      36.99 ± 4.84      32.15 to 41.83
  ±3 Std Dev      36.99 ± 7.26      29.73 to 44.25

[Figure: One standard deviation below the mean, for a distribution with mean = 60 and standard deviation = 10]

Auto Batteries Example
- Grade A battery average life is 60 months.
- The guarantee is for 36 months.
- Standard deviation s = 10 months.
- The frequency distribution is mound-shaped and symmetrical.

Battery Example
What percentage of the Grade A batteries will last more than 50 months?
1. Find how many standard deviations 50 months is from the mean.
2. Draw it out.
3. Figure out the probability from the Empirical Rule.
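Both the EPA mpg interval table and the "draw it out" step for the batteries come down to computing mean ± k·s. A small Python sketch, with a hypothetical helper `intervals`:

```python
def intervals(mean: float, sd: float, ks=(1, 2, 3)):
    """Return (k, low, high) for the intervals mean +/- k*sd."""
    return [(k, mean - k * sd, mean + k * sd) for k in ks]

# EPA mpg data: mean = 36.99, s = 2.42
for k, low, high in intervals(36.99, 2.42):
    print(f"{k} Std Dev: {low:.2f} to {high:.2f}")

# Grade A batteries: mean = 60 months, s = 10 months ("draw it out")
for k, low, high in intervals(60, 10):
    print(f"{k} Std Dev: {low} to {high} months")
```

For the batteries this marks 50 to 70, 40 to 80, and 30 to 90 months, so 50 months sits exactly one standard deviation below the mean.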

Battery Example
50 months is one standard deviation below (to the left of) the mean, since (50 - 60)/10 = -1.
- Because ±1 standard deviation covers about 68% of the cases, the region between the mean and one standard deviation on either side covers about 34%. So batteries lasting between 50 and 60 months represent about 34% of the cases.
- Everything to the right of the mean (60 months or more) represents 50% of the cases.
- Answer: 34% + 50% = 84%.
[Figure: Battery example, more than 50 months (mean = 60, s = 10): the part one standard deviation to the left of the mean, and the part greater than 60 months]

Battery Example
Approximately what percentage of the batteries will last less than 40 months?
1. Find how many standard deviations 40 months is from the mean.
2. Draw it out.
3. Figure out the probability.

40 months is 2 standard deviations below the mean, since (40 - 60)/10 = -2.
- ±2 standard deviations covers about 95% of the cases, leaving 5% outside.
- By symmetry, "less than 40 months" is half of that remaining 5%, so it represents about 2.5% of the cases.
[Figure: Battery example, less than 40 months (mean = 60, s = 10)]
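The two battery answers can be reproduced in Python. This sketch uses the rounded Empirical Rule areas exactly as the slides do, and for comparison an exact normal-model probability; the normal model is an added assumption (the notes only say mound-shaped and symmetrical), and the helpers only handle cutoffs that are a whole number of standard deviations below the mean.

```python
import math

# Empirical Rule: share of cases within +/- k standard deviations (rounded values)
WITHIN = {1: 0.68, 2: 0.95, 3: 0.997}

def share_above(x, mean, sd):
    """Empirical-Rule share of cases above x, for x exactly k SDs below the mean."""
    k = round((mean - x) / sd)
    return WITHIN[k] / 2 + 0.50   # the band from x up to the mean, plus the upper half

def share_below(x, mean, sd):
    """Empirical-Rule share of cases below x, for x exactly k SDs below the mean."""
    k = round((mean - x) / sd)
    return (1 - WITHIN[k]) / 2    # half of what lies outside +/- k SDs

def normal_cdf(x, mean, sd):
    """Normal-model probability P(X <= x), via the error function."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

mean, s = 60, 10
print("more than 50 months:", share_above(50, mean, s), "vs normal", 1 - normal_cdf(50, mean, s))
print("less than 40 months:", share_below(40, mean, s), "vs normal", normal_cdf(40, mean, s))
```

The Empirical Rule answers (84% and 2.5%) are close to the normal-model values (about 84.1% and 2.3%), which is the point of using the rule as a quick approximation.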

Battery Example
Suppose your battery lasted 37 months. What could you infer about the manufacturer's claim?
- 37 months is more than 2 standard deviations below the mean: (37 - 60)/10 = -2.3.
- If the claims were true, less than 2.5% of the batteries would fail within 37 months.
- It's possible you just got a bad one. Do you feel lucky? Or unlucky?

Z-Scores
A z-score transforms a value to reflect its relative standing. We subtract the mean and divide by the standard deviation:
$z_i = \frac{x_i - \bar{x}}{s}$
The result is the distance between a given measurement x and the mean, expressed in standard deviations.
- A positive z-score means that the measurement is larger than the mean.
- A negative z-score means that it is smaller than the mean.

Demonstration of a z-score: EPA MPG Data
Mean = 37 (rounded off), s = 2.4. One value is 34.0.
z = (34.0 - 37.0)/2.4 = -1.25
This value of 34 is 1.25 standard deviations below the mean.
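The z-score formula translates directly into a one-line function. A minimal sketch (the name `z_score` is illustrative), applied to the EPA value and to the 37-month battery:

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Distance between x and the mean, expressed in standard deviations."""
    return (x - mean) / sd

print(z_score(34.0, 37.0, 2.4))  # EPA mpg value of 34.0
print(z_score(37, 60, 10))       # battery that lasted 37 months
```

The sign comes out automatically: values below the mean give negative z-scores, values above it give positive ones.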

You Try It
Create a z-score for each of the following values (mean = 37, s = 2.4): 30, 42, 38.

Z-Scores
We can convert an entire variable to z-scores: create a new variable by taking each value, subtracting the mean, and dividing by the standard deviation. This is called a data transformation. The new variable has:
- Mean = 0
- Standard deviation = 1

Empirical Rule and Z-Scores
- Approximately 68% of the measurements will have a z-score between -1 and 1.
- Approximately 95% of the measurements will have a z-score between -2 and 2.
- Almost all the measurements (99.7%) will have a z-score between -3 and 3.

Data Example
A female bank employee believes her salary is low as a result of sex discrimination. Her salary is $27,000. She collects information on the salaries of her male counterparts: their mean salary is $34,000, with a standard deviation of $2,000. Does this information support her claim?

How to Begin to Examine This Issue
What is her salary in relation to the mean male salary? Create a z-score for her salary to see how far below the mean it is, in standard deviations (amounts in dollars):
$z = \frac{27{,}000 - 34{,}000}{2{,}000} = -3.5$
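A Python sketch of the whole-variable transformation and of the salary z-score. The input list is hypothetical, used only to show that the standardized variable ends up with mean 0 and standard deviation 1. The tail probability assumes a normal model for the male salaries, which the example does not state; it simply shows how much rarer z = -3.5 is than z = -1.

```python
import math
import statistics

def standardize(values):
    """Convert a whole variable to z-scores using its own sample mean and sample SD."""
    m = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return [(x - m) / s for x in values]

def normal_lower_tail(z):
    """P(Z <= z) under a standard normal model (erfc keeps small tails accurate)."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

# Hypothetical values, used only to show the transformed variable has mean 0 and SD 1
z = standardize([2, 4, 4, 4, 5, 5, 7, 9])
print(statistics.mean(z), statistics.stdev(z))

# Salary example: her salary against the male mean and SD
z_salary = (27_000 - 34_000) / 2_000
print(z_salary, normal_lower_tail(z_salary))
print(-1.0, normal_lower_tail(-1.0))
```

Under that normal assumption, a z-score of -3.5 or lower occurs roughly 2 times in 10,000, while -1 or lower occurs about 16 times in 100.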

Rare-Event Approach
Her salary is 3.5 standard deviations below the mean salary of her male counterparts. If her salary belonged to the same distribution as the males' salaries in her bank, a z-score of -3.5 would be very rare.
Perhaps her salary does not come from the same distribution, and we might conclude that there is something different about her salary. One conclusion could be discrimination, but it could also be related to performance, time on the job, or some other factors.
What if the woman's salary was only 1 standard deviation below that of her male counterparts?

The Rare-Event Approach
1. We hypothesize a frequency distribution to describe a population of measurements.
2. We draw a sample from the population.
3. We compare the sample statistic to the hypothesized frequency distribution.
4. We see how likely or unlikely it is that the sample came from the hypothesized distribution.

Box Plots
The book covers quartiles and box plots on page 158 and page 162. I want you to look this material over, but I won't require you to draw a box plot.
- Box plots are a way to show the distribution of a variable relative to the median.
- Box plots highlight extreme values in the data.

Box Plots and the Five-Number Summary
The five-number summary is: Lowest, Q1, Median, Q3, Highest.
This gives us the extremes, the middle, the range (Highest - Lowest), and the Inter-Quartile Range (IQR = Q3 - Q1).
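A five-number summary is easy to compute with Python's `statistics` module. This sketch applies it to the Student Speeding data from the SAS output in these notes, with the 38 speeds read off the stem-and-leaf display (stem × 10 + leaf). Quartile conventions differ between textbooks and software packages; Python's "inclusive" method is used here, and for these data it gives the same Q1 and Q3 as the SAS output (85 and 110).

```python
import statistics

def five_number_summary(values):
    """Lowest, Q1, Median, Q3, Highest, using the 'inclusive' quartile method."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, med, q3, max(values)

# Student Speeding data, read off the SAS stem-and-leaf display (stem x 10 + leaf)
speeds = ([75] + [80] * 5 + [82] + [85] * 5 + [90] * 3 + [95] * 7 + [100, 104]
          + [105] * 2 + [110] * 7 + [115] * 2 + [120, 140, 145])

low, q1, med, q3, high = five_number_summary(speeds)
print("five-number summary:", low, q1, med, q3, high)
print("range:", high - low, " IQR:", q3 - q1)
```

With other data sets, a different quartile method can shift Q1 and Q3 slightly, so small disagreements between software packages are expected.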

Standard Box Plot
[Figure: Standard box plot, with X low, Q1, M, Q3, and X high marked along the axis]
The box runs from Q1 to Q3 and is drawn to the scale of the data, with a line at the median (M) inside it. Whiskers extend from the box to the two extreme values, X low and X high.

Modified Box Plot
A more advanced box plot uses the Inter-Quartile Range to construct inner and outer fences, to better identify mild and extreme outliers:
- Inner fences lie 1.5 × IQR beyond the quartiles: Q1 - 1.5 × IQR and Q3 + 1.5 × IQR.
- Outer fences lie 3 × IQR beyond the quartiles: Q1 - 3 × IQR and Q3 + 3 × IQR.
Values between the inner and outer fences are mild outliers; values beyond the outer fences are extreme outliers.

SAS will produce a stem-and-leaf display (or histogram) and a box plot.
[SAS output: histogram (# = counts) and box plot for Poultry Grower Satisfaction, values from about -1.9 to 2.3; * may represent up to 3 counts]

SAS Univariate Example
The output groups measures based on the mean (Moments), measures based on the median and position (Quantiles), and Extreme Values.

The SAS System
Univariate Procedure
Variable = Poultry Grower Satisfaction

Moments
  N              1151        Sum Wgts      1151
  Mean           0           Sum           0
  Std Dev        0.941405    Variance      0.886244
  Skewness       0.41011     Kurtosis      -0.53125
  USS            1019.18     CSS           1019.18
  CV             .           Std Mean      0.027748
  T:Mean=0       0           Pr>|T|        1.0000
  Num ^= 0       1151        Num > 0       524
  M(Sign)        -51.5       Pr>=|M|       0.0026
  Sgn Rank       -15772      Pr>=|S|       0.1621
  W:Normal       0.954257    Pr<W          0.0001

Quantiles (Def=5)
  100% Max   2.248864      99%    2.248864
   75% Q3    0.643514      95%    1.712166
   50% Med  -0.09904       90%    1.35261
   25% Q1   -0.72838       10%   -1.09005
    0% Min  -1.81783        5%   -1.43535
                            1%   -1.71886
  Range      4.066695
  Q3-Q1      1.37189
  Mode      -1.00449

Extremes
  Lowest     Obs       Highest    Obs
  -1.81783   (833)     2.248864   (936)
  -1.81783   (814)     2.248864   (1005)
  -1.81783   (790)     2.248864   (1124)
  -1.81783   (501)     2.248864   (1127)
  -1.81783   (431)     2.248864   (1202)

The UNIVARIATE Procedure
Variable: SPEED

Moments
  N                  38            Sum Weights         38
  Mean               98.3157895    Sum Observations    3736
  Std Deviation      16.1765684    Variance            261.681366
  Skewness           0.97103858    Kurtosis            1.09588273
  Uncorrected SS     376990        Corrected SS        9682.21053
  Coeff Variation    16.4536831    Std Error Mean      2.62418592

Basic Statistical Measures
  Location                 Variability
  Mean     98.31579        Std Deviation           16.17657
  Median   95.00000        Variance               261.68137
  Mode     95.00000        Range                   70.00000
                           Interquartile Range     25.00000
NOTE: The mode displayed is the smallest of 2 modes with a count of 7.

Tests for Location: Mu0=0
  Test           -Statistic-      -----p Value------
  Student's t    t  37.46525      Pr > |t|    <.0001
  Sign           M  19            Pr >= |M|   <.0001
  Signed Rank    S  370.5         Pr >= |S|   <.0001

Quantiles (Definition 5)
  Quantile      Estimate
  100% Max      145
  99%           145
  95%           140
  90%           115
  75% Q3        110
  50% Median     95
  25% Q1         85
  10%            80
  5%             80
  1%             75
  0% Min         75

Extreme Observations
  ----Lowest----     ----Highest---
  Value    Obs       Value    Obs
  75       27        115      24
  80       32        115      25
  80       28        120      19

Student Speeding Data
  Stem Leaf        #   Boxplot
    14 5           1
    14 0           1
    13
    13
    12
    12 0           1
    11 55          2
    11 0000000     7   +-----+
    10 55          2
    10 04          2
     9 5555555     7   *--+--*
     9 000         3
     8 55555       5   +-----+
     8 000002      6
     7 5           1
       ----+----+----+----+
  Multiply Stem.Leaf by 10**+1
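As a check on the Student Speeding output above, this sketch rebuilds the 38 speeds from the stem-and-leaf display (stem × 10 + leaf), recomputes the main Moments, and applies the modified-box-plot fences. The skewness and kurtosis lines use the bias-adjusted sample formulas, assumed here to be SAS's defaults; the quartiles use Python's "inclusive" method, which for these data agrees with the SAS Definition 5 quartiles.

```python
import math
import statistics

# Student Speeding data, read off the SAS stem-and-leaf display (stem x 10 + leaf)
speeds = ([75] + [80] * 5 + [82] + [85] * 5 + [90] * 3 + [95] * 7 + [100, 104]
          + [105] * 2 + [110] * 7 + [115] * 2 + [120, 140, 145])

n = len(speeds)
mean = statistics.mean(speeds)
sd = statistics.stdev(speeds)                       # n - 1 denominator
uss = sum(x * x for x in speeds)                    # Uncorrected SS
css = sum((x - mean) ** 2 for x in speeds)          # Corrected SS
cv = 100 * sd / mean                                # Coeff Variation (percent)
sem = sd / math.sqrt(n)                             # Std Error Mean
t = mean / sem                                      # Student's t for Mu0 = 0

# Bias-adjusted sample skewness and kurtosis (assumed to match SAS's default output)
zs = [(x - mean) / sd for x in speeds]
skew = n / ((n - 1) * (n - 2)) * sum(z ** 3 for z in zs)
kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(z ** 4 for z in zs)
        - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

modes = statistics.multimode(speeds)                # every value tied for the highest count

# Modified box plot: inner fences at 1.5 x IQR, outer fences at 3 x IQR beyond the quartiles
q1, _, q3 = statistics.quantiles(speeds, n=4, method="inclusive")
iqr = q3 - q1
inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
outer = (q1 - 3 * iqr, q3 + 3 * iqr)
mild = [x for x in speeds
        if (not inner[0] <= x <= inner[1]) and outer[0] <= x <= outer[1]]
extreme = [x for x in speeds if not outer[0] <= x <= outer[1]]

print(f"mean={mean:.7f} sd={sd:.7f} var={sd ** 2:.6f} uss={uss} css={css:.5f}")
print(f"cv={cv:.7f} sem={sem:.8f} t={t:.5f} skew={skew:.8f} kurt={kurt:.8f}")
print("modes:", modes, "inner fences:", inner, "outer fences:", outer)
print("mild outliers:", mild, "extreme outliers:", extreme)
```

For these data the inner fences are 47.5 and 147.5, so the fastest speed (145) falls just inside the upper inner fence and no value is flagged as an outlier.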

Alternative Stem and Leaf for Speed
Stem-and-Leaf for Speed (stem unit: 10)
   7 | 5
   8 | 0 0 0 0 0 2 5 5 5 5 5
   9 | 0 0 0 5 5 5 5 5 5 5
  10 | 0 4 5 5
  11 | 0 0 0 0 0 0 0 5 5
  12 | 0
  13 |
  14 | 0 5

Exam I Example: Length of Hospital Stay Data
  0     |
  0+    | 2 2 3 3 3
  0++   | 4 4 4 4 4 4 4 5 5 5 5 5
  0+++  | 6 6 6 6 6 6 6 6 6 6 7 7 7 7 7 7 7 7
  0++++ | 8 8 8 8 9 9 9 9 9 9
  1     | 0 1 1
  1+    | 2
  1++   | 5
The stems break the tens digit into parts designated by +, where 0++ stands for no tens and values of 4 and 5.

Sum of x = 327, sum of x² = 2477, Q2 = 6, n = 50.
Calculate:
1. Mean
2. Standard Deviation
3. Median
4. Mode
5. Z-score for a value of 15

Exam I Example: Answers
1. Mean = 327/50 = 6.54
2. Std Dev: $s = \sqrt{\frac{\sum x^2 - (\sum x)^2/n}{n-1}} = \sqrt{\frac{2477 - 327^2/50}{49}} = 2.63$
3. Median = Q2 = 6
4. Mode = 6
5. Z-score for a value of 15 = (15 - 6.54)/2.63 = 3.22

Alternative Stem and Leaf
Stem-and-Leaf Display for Length (stem unit: 1)
   2 | 0 0
   3 | 0 0 0
   4 | 0 0 0 0 0 0 0
   5 | 0 0 0 0 0
   6 | 0 0 0 0 0 0 0 0 0 0
   7 | 0 0 0 0 0 0 0 0
   8 | 0 0 0 0
   9 | 0 0 0 0 0 0
  10 | 0
  11 | 0 0
  12 | 0
  13 |
  14 |
  15 | 0
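The Exam I answers can be checked in Python. This sketch reads the 50 lengths of stay off the stem-and-leaf display and uses the computational ("shortcut") formula for the standard deviation, the same one used in answer 2.

```python
import math
import statistics

# Length-of-stay values read off the stem-and-leaf display (n = 50)
stays = ([2] * 2 + [3] * 3 + [4] * 7 + [5] * 5 + [6] * 10 + [7] * 8
         + [8] * 4 + [9] * 6 + [10] + [11] * 2 + [12] + [15])

n = len(stays)
sum_x = sum(stays)
sum_x2 = sum(x * x for x in stays)

mean = sum_x / n
# Computational formula: s = sqrt((sum of x^2 - (sum of x)^2 / n) / (n - 1))
sd = math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))
median = statistics.median(stays)
mode = statistics.mode(stays)
z_15 = (15 - mean) / sd

print(n, sum_x, sum_x2)
print(f"mean={mean:.2f} sd={sd:.2f} median={median} mode={mode} z(15)={z_15:.2f}")
```

The z-score here uses the unrounded standard deviation (about 2.628); rounding it to 2.63 first, as in the answer, still gives 3.22.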