Applications of Data Dispersions


Key Definitions

Standard Deviation: A measure of how far the data values lie from the mean on average.
Z-Score: The distance between a given value and the mean, expressed in number of standard deviations.
Percentile: The percentage of the data that is less than or equal to a given data value.
Quartiles: Values that divide the data into four equal parts.
Interquartile Range: The range of the middle 50% of the data set.
Fences: The cutoff points used when finding outliers.
Outliers: Data values that are not close to or similar to the other data values. These are also known as extreme values.

Empirical Rule

Empirical Rule Usage: If our data has a bell-shaped distribution, we can use the empirical rule to find the percentage of the data that lies within a certain number of standard deviations of the mean. We will use a bell-shaped curve (described below) to look at this more closely.

[Figure: bell-shaped curve with vertical lines at µ - 3σ, µ - 2σ, µ - σ, µ, µ + σ, µ + 2σ, and µ + 3σ. From left to right, the regions contain 0.15%, 2.35%, 13.5%, 34%, 34%, 13.5%, 2.35%, and 0.15% of the data, where each 0.15% is a tail beyond 3σ. In total, 68% of the data lies within 1 standard deviation of the mean, 95% within 2, and 99.7% within 3.]

Explaining the Empirical Rule's Bell-Shaped Curve: The highest point of the graph is where the mean lies. The lines to the right and left of the mean mark data values that lie a certain number of standard deviations away from the mean. The percentage shown in each region is the percentage of the data that lies within that region.
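The region percentages on the bell-shaped curve can be tabulated and added up directly. The following is a minimal Python sketch of that idea; the names `BANDS` and `empirical_percent` are illustrative assumptions, not part of the handout, and the function only accepts whole numbers of standard deviations because those are the only lines drawn on the curve.

```python
# Region percentages from the empirical rule curve, keyed by the
# (lower, upper) number of standard deviations that bound each region.
BANDS = {
    (-3, -2): 2.35,
    (-2, -1): 13.5,
    (-1, 0): 34.0,
    (0, 1): 34.0,
    (1, 2): 13.5,
    (2, 3): 2.35,
}


def empirical_percent(k_below, k_above):
    """Percent of data between mu - k_below*sigma and mu + k_above*sigma.

    Both arguments must be 0, 1, 2, or 3 (hypothetical helper; the
    empirical rule only gives percentages at whole standard deviations).
    """
    for k in (k_below, k_above):
        if k not in (0, 1, 2, 3):
            raise ValueError("k must be 0, 1, 2, or 3")
    # Add every region that lies entirely inside the requested interval.
    return sum(pct for (lo, hi), pct in BANDS.items()
               if lo >= -k_below and hi <= k_above)


print(empirical_percent(1, 1))  # within 1 standard deviation: 68.0
print(empirical_percent(1, 2))  # from mu - sigma to mu + 2*sigma: 81.5
```

Passing the same k on both sides reproduces the familiar 68%, 95%, and 99.7% totals, up to floating-point rounding.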

How to Interpret Percentage Using the Empirical Rule Graph: There are two ways you may be asked to find a percentage using the empirical rule graph.

Version 1: You are told that the data lies within k standard deviations. The k represents the number of standard deviations away from the mean, not the actual value of the standard deviation. We mark the lines that have k in front of the σ, and then add up the percentages in between.

Version 2: You are given a data value, the mean, and the standard deviation, and asked to find the percentage. Here you must find k first, and then apply the steps from Version 1. To find k, we use the following equations (x is the data value given):

$$\mu + k_1\sigma = x \quad\text{and}\quad \mu - k_2\sigma = x$$

You plug in the information already known and solve for k. When you have found k, you mark the lines that have k in front of the σ, and add up the percentages in between.

Example of Interpreting Percentage from Empirical Rule Graph (Version 2): You are looking for the percentage of kids who have taken swimming lessons at a facility that are between the ages of 7 and 10. The mean of the data is 8 and the standard deviation is 1. We plug into the equations to find each k:

$$8 + k_1(1) = 10 \quad\text{and}\quad 8 - k_2(1) = 7$$

$$k_1 = 2 \quad\text{and}\quad k_2 = 1$$

Since k_1 is 2 and k_2 is 1, we mark the lines µ - σ and µ + 2σ on the graph and add together the percentages between them:

Percentage = 34% + 34% + 13.5% = 81.5%

How to Interpret Values Using the Empirical Rule Graph: When a problem gives you the mean, the standard deviation, and a percentage, you are being asked to find the data values. To do this, you look to see which of the known percentages it is (68%, 95%, or 99.7%). When you find the proper percentage on the graph, follow the lines to the proper formulas and plug in your mean and standard deviation to compute the data values.

Example of Interpreting Values Using the Empirical Rule Graph: You are looking to find the ages of the kids who have taken swimming lessons at a facility. The mean age is 8 and the standard deviation is 2. Between what ages are 95% of the kids? We first find how many standard deviations away from the mean the 95% mark is on the graph. When we mark the lines, we see the proper formulas. Now, we plug into the formulas to find the values:

$$\mu - 2\sigma \quad\text{and}\quad \mu + 2\sigma$$

$$8 - 2(2) = 8 - 4 = 4 \quad\text{and}\quad 8 + 2(2) = 8 + 4 = 12$$

95% of the kids in the swim lessons are between the ages of 4 and 12.

Chebyshev's Inequality

How to Find Percentages from k Standard Deviations: Chebyshev's Inequality gives the minimum percentage of the data that lies within k standard deviations of the mean. When k > 1, you square k, divide 1 by the squared k, subtract that number from 1, and then multiply the result by 100%. The following is the formula:

$$\left(1 - \frac{1}{k^2}\right) \cdot 100\%$$

Example of Finding the Percentages from k Standard Deviations: You are looking for the percentage of kids who have taken swimming lessons at a facility. The mean of the data is 8 and the standard deviation is 2. What is the minimum percentage of the ages of kids within 2.5 standard deviations?

$$\left(1 - \frac{1}{2.5^2}\right) \cdot 100\% = \left(1 - \frac{1}{6.25}\right) \cdot 100\% = (1 - 0.16) \cdot 100\% = (0.84) \cdot 100\% = 84\%$$

The minimum percentage of the ages of kids within 2.5 standard deviations is 84%.

How to Find Percentages from Values: If you are given the mean, the standard deviation, and two data values, you can find the minimum percentage using Chebyshev's Inequality. You first must find k. To do so, you plug your information into the following formulas:

$$\mu - k_1\sigma = x_1 \quad\text{and}\quad \mu + k_2\sigma = x_2 \quad\text{where } x_1 < x_2$$

If k_1 and k_2 are the same, then you plug that k into Chebyshev's Inequality to find the percentage, following the steps from How to Find Percentages from k Standard Deviations above.

Example of Finding Percentages from Values: You are looking for the percentage of kids who have taken swimming lessons at a facility. The mean of the data is 8 and the standard deviation is 2. What is the minimum percentage of the ages of kids who are between the ages of 5 and 11?

To find the percentage, we first must find k. We plug our data values into our formulas:

$$\mu - k_1\sigma = x_1 \quad\text{and}\quad \mu + k_2\sigma = x_2$$

$$8 - 2k_1 = 5 \quad\text{and}\quad 8 + 2k_2 = 11$$

$$k_1 = 1.5 \quad\text{and}\quad k_2 = 1.5$$

Since k_1 and k_2 are the same, we can now use Chebyshev's Inequality to find the percentage.

$$\left(1 - \frac{1}{1.5^2}\right) \cdot 100\% = \left(1 - \frac{1}{2.25}\right) \cdot 100\% \approx (1 - 0.44) \cdot 100\% = (0.56) \cdot 100\% = 56\%$$

At least 56% of the kids who are in swimming class are between the ages of 5 and 11.

How to Find Values from k Standard Deviations: To find data values from k standard deviations, you plug into the following formulas:

$$\mu - k\sigma \quad\text{and}\quad \mu + k\sigma$$

Example of Finding Values from k Standard Deviations: You are looking for the ages of kids who have taken swimming lessons at a facility. The mean of the data is 8 and the standard deviation is 2. What are the ages of the kids within 2.5 standard deviations? You plug the information into the formulas to find the values.

$$8 - 2.5(2) = 8 - 5 = 3 \quad\text{and}\quad 8 + 2.5(2) = 8 + 5 = 13$$

The kids within 2.5 standard deviations of the mean are between the ages of 3 and 13.

Z-Scores

How to Find Z-Scores: To find the z-score for a data value, we subtract the mean from the value and then divide the difference by the standard deviation. The calculation is the same for sample and population data sets; only the symbols differ. The following are both formulas:

Sample Z-Score:

$$z = \frac{x - \bar{x}}{s}$$

Population Z-Score:

$$z = \frac{x - \mu}{\sigma}$$

Example of Finding Z-Scores: You are looking at the ages of kids who have taken swimming lessons at a facility. The mean of the data is 8 and the standard deviation is 2. How many standard deviations away from the mean (what is the z-score) is a child who goes at the age of 12? Since we have our data value, mean, and standard deviation, we can plug into the formula above:

$$z = \frac{x - \mu}{\sigma} = \frac{12 - 8}{2} = \frac{4}{2} = 2$$

A child who is 12 is 2 standard deviations above the mean of the children's ages in the swimming lessons.
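The Chebyshev and z-score calculations above are short enough to check in code. The following is a minimal Python sketch using the swimming-lesson numbers (mean 8, standard deviation 2); the function names are illustrative assumptions, not part of the handout.

```python
def chebyshev_min_percent(k):
    """Minimum percent of data within k standard deviations (needs k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's Inequality needs k > 1")
    return (1 - 1 / k**2) * 100


def k_from_value(x, mean, sd):
    """Number of standard deviations between x and the mean (never negative)."""
    return abs(x - mean) / sd


def interval(mean, sd, k):
    """Values k standard deviations below and above the mean."""
    return mean - k * sd, mean + k * sd


def z_score(x, mean, sd):
    """Signed distance of x from the mean, in standard deviations."""
    return (x - mean) / sd


mean, sd = 8, 2

# Minimum percentage within 2.5 standard deviations.
print(round(chebyshev_min_percent(2.5), 1))

# Minimum percentage between ages 5 and 11: only valid when both k match.
k1 = k_from_value(5, mean, sd)
k2 = k_from_value(11, mean, sd)
if k1 == k2:
    print(round(chebyshev_min_percent(k1), 1))

print(interval(mean, sd, 2.5))  # ages 2.5 standard deviations from the mean
print(z_score(12, mean, sd))    # z-score of a 12-year-old
```

For k = 1.5 the unrounded bound is about 55.6%, which the worked example reports as 56% after rounding 1/2.25 to 0.44.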
Percentiles

kth Percentile: When we talk about the kth percentile, we are saying that k% of the data has a value equal to or lower than your data value. It is represented by $P_k$.

Example of How to Interpret the kth Percentile: On a math exam, a student's score of 91% puts them at the 88th percentile. This means that 88% of the people who took the math exam scored 91% or lower. This also means that 12% of the people scored higher than 91% on the math exam.

Quartiles and Interquartile Range

How to Find the 1st Quartile, 2nd Quartile, and 3rd Quartile: To find the different quartiles, you first must find the 2nd quartile, also known as the median. We find the 2nd quartile the same way we have found the median in the past. From there, we divide the data set into halves. The data to the left of the median is used to find the 1st quartile, and the data to the right of the median is used to find the 3rd quartile. The 1st quartile is the median of the first half of the data. So, we take the data to the left of the 2nd quartile and mark one value off from the left and one from the right until we are left with the data in the middle. The 3rd quartile is the median of the data to the right of the 2nd quartile. To find it, we again mark one value off from the left and one from the right until we are left with the data in the middle.

How to Find the Interquartile Range: Finding the interquartile range is easy once we have found the 1st quartile and the 3rd quartile. We use the following formula:

$$IQR = Q_3 - Q_1$$

Example of Finding the 1st Quartile, 2nd Quartile, 3rd Quartile, and the Interquartile Range: Our data set is the following:

72, 73, 75, 75, 78, 83, 85, 90, 94, 98

To find the 2nd quartile (median) of the data, we mark one value off on the left and one on the right repeatedly until we are left with the values in the middle. Since the number of values is even, two values remain, so we add them together and divide by two to find the 2nd quartile:

$$Q_2 = \frac{78 + 83}{2} = 80.5$$

Now that we have found the 2nd quartile, we can find the 1st quartile. To do that, we take the data to the left of the median. Because the sample size is even, the median falls between 78 and 83, so the data value 78 is included in the left half. We mark one value off the left and the right until we have a value left in the middle.

72, 73, 75, 75, 78

So, we have now found that $Q_1 = 75$. Now, we have to find the 3rd quartile. To do this, we take the data to the right of the median. Because the sample size is even, the data value 83 is included in the right half. We mark one value off the left and the right until we have a value left in the middle.

83, 85, 90, 94, 98

So, we have found that $Q_3 = 90$. Now, the last step is to find the interquartile range. To do this, we subtract the $Q_1$ we just found from the $Q_3$.

$$IQR = Q_3 - Q_1 = 90 - 75 = 15$$

* For an easier way to find the quartiles, you can use our Excel spreadsheet for an example and instructions on how to find the quartiles in Excel.
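The median-of-halves procedure above can be sketched in a few lines of Python. The function name `quartiles` is an illustrative assumption, not part of the handout. For an odd number of values this sketch leaves the median out of both halves, which is one reading of "the data to the left side of the median"; other quartile conventions exist, so software may report slightly different values.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method.

    Hypothetical helper: with an odd number of values, the middle value
    is excluded from both halves (an assumption; conventions differ).
    """
    values = sorted(data)
    if len(values) < 2:
        raise ValueError("need at least two data values")
    half = len(values) // 2
    lower = values[:half]    # data to the left of the median
    upper = values[-half:]   # data to the right of the median
    return median(lower), median(values), median(upper)


scores = [72, 73, 75, 75, 78, 83, 85, 90, 94, 98]
q1, q2, q3 = quartiles(scores)
print(q1, q2, q3)  # 75 80.5 90
print(q3 - q1)     # interquartile range: 15
```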

Fences and Outliers

How to Determine an Outlier: An outlier in the data can be obvious, or it can be harder to spot than expected. We use formulas to find the cutoff points, known as fences, beyond which a data value is considered an outlier. To find the fences, and therefore the outliers, you must first have found the 1st quartile, the 3rd quartile, and the interquartile range. The following formulas give the two fences:

$$\text{Lower Fence} = Q_1 - 1.5(IQR)$$

$$\text{Upper Fence} = Q_3 + 1.5(IQR)$$

Once we have found our lower fence and upper fence, we look to see if any of our data is below the lower fence or above the upper fence. Any data value below the lower fence or above the upper fence is an outlier. If no data value is below the lower fence or above the upper fence, then there are no outliers in your data set. A data set can have multiple outliers.

Example of Determining an Outlier: Using the data set from the example in the Quartiles and Interquartile Range section, we already know our 1st quartile, 3rd quartile, and interquartile range:

72, 73, 75, 75, 78, 83, 85, 90, 94, 98

$$Q_1 = 75, \quad Q_3 = 90, \quad IQR = 15$$

With this information, we plug into the formulas for our fences:

$$\text{Lower Fence} = 75 - 1.5(15) = 75 - 22.5 = 52.5$$

$$\text{Upper Fence} = 90 + 1.5(15) = 90 + 22.5 = 112.5$$

Now, we look at our data set to see if any of the data is lower than 52.5 or higher than 112.5. Since no data value is lower than 52.5 or higher than 112.5, there are no outliers in our data set.
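The fence check above can be sketched in Python as follows. The names `fences` and `find_outliers` are illustrative assumptions, not part of the handout, and the quartiles are passed in directly rather than recomputed.

```python
def fences(q1, q3):
    """Return (lower fence, upper fence) from the 1st and 3rd quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(data, q1, q3):
    """Return every data value below the lower fence or above the upper fence."""
    lower, upper = fences(q1, q3)
    return [x for x in data if x < lower or x > upper]


scores = [72, 73, 75, 75, 78, 83, 85, 90, 94, 98]
print(fences(75, 90))                 # (52.5, 112.5)
print(find_outliers(scores, 75, 90))  # [] means no outliers

# Hypothetical extra value, checked against the same fences for illustration.
print(find_outliers(scores + [120], 75, 90))  # [120]
```

In practice, adding a value to the data set could change the quartiles, so the fences would normally be recomputed before checking for outliers.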
Symbol Guide

| Term | Symbol | Use |
|---|---|---|
| Population Mean | $\mu$ | Identify the population mean |
| Population Standard Deviation | $\sigma$ | Identify the population standard deviation |
| Sample Mean | $\bar{x}$ | Identify the sample mean |
| Sample Standard Deviation | $s$ | Identify the sample standard deviation |
| Amount of Standard Deviations | $k$ | Identify the amount of standard deviations used to reach a value |
| Z-Score | $z$ | Identify the z-score |
| kth Percentile | $P_k$ | Identify the kth percentile |
| 1st Quartile | $Q_1$ | Identify the first quartile |
| 2nd Quartile/Median | $Q_2$ or $M$ | Identify the second quartile/median |
| 3rd Quartile | $Q_3$ | Identify the third quartile |
| Interquartile Range | $IQR$ | Identify the interquartile range |