Today's plan: Section 4.1.4: Dispersion: Five-Number Summary and Standard Deviation.


Once we know the central location of a data set, we want to know how close the data values are to the center. We'll see two ways to measure the dispersion of a data set:
1. the five-number summary (goes with the median)
2. the standard deviation (goes with the mean)

Five-Number Summary

Five-number summary:
1. Min
2. Lower Quartile (Q1)
3. Median (Med)
4. Upper Quartile (Q3)
5. Max

Definition: The Min is the smallest value in the whole data set. The Max is the largest value in the whole data set. The Lower Quartile is the median of the lower half of the data. The Upper Quartile is the median of the upper half. (When the number of values is odd, the median itself belongs to neither half.)
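The definitions translate directly into code. This is a minimal Python sketch, assuming the "median of each half" convention used in these notes (for odd n the overall median is left out of both halves, as in the Clearview example). Calculators and statistics packages often use other quartile rules, so their Q1 and Q3 can differ slightly. The function name `five_number_summary` is illustrative.

```python
from statistics import median


def five_number_summary(data):
    """Return (Min, Q1, Med, Q3, Max) using the median-of-halves rule.

    When n is odd, the overall median is left out of both halves,
    so each half has n // 2 values.
    """
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two data values")
    half = n // 2
    lower = xs[:half]          # smallest n // 2 values
    upper = xs[n - half:]      # largest n // 2 values
    return (xs[0], median(lower), median(xs), median(upper), xs[-1])


houses = [75, 96, 107, 110, 110, 118, 130, 135, 150, 520]  # appraisals in $K
print(five_number_summary(houses))  # (75, 107, 114.0, 135, 520)
```

Applied to the house appraisals from the next example, it reproduces the summary found by hand.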

Example: The appraisals of the 10 houses are: [$75K, $96K, $107K, $110K, $110K, $118K, $130K, $135K, $150K, $520K]. Find the five-number summary.

Solution: We already found the median, Med = $114K, the lower half, [$75K, $96K, $107K, $110K, $110K], and the upper half, [$118K, $130K, $135K, $150K, $520K]. Since each half has size 5, the median of each half is in the 3rd location.

Thus the lower quartile is Q1 = $107K, the upper quartile is Q3 = $135K, the lowest value is Min = $75K, and the highest value is Max = $520K. So the five-number summary is: [Min = $75K, Q1 = $107K, Med = $114K, Q3 = $135K, Max = $520K].

The five-number summary can be visualized with a boxplot, or box-and-whiskers diagram.

[Boxplot of the house appraisals ($K): Min = 75, Q1 = 107, Med = 114, Q3 = 135, Max = 520, on an axis from 50 to 525 with a break between 150 and 500.]

The box goes from the lower quartile to the upper quartile, with a mark at the median. Two whiskers extend from the box to the Min and the Max.

Remarks: the left whisker spans roughly the bottom 25% of the data, the box spans the middle 50%, the right whisker spans the top 25%, and each half of the box spans 25%.

Example: The ages of the police officers in the Clearview Police Department are:
Age:        22 25 26 27 28 29 30 32 35 39
Freq.:        3  4  3  5  4  6  5  4  5  2
Find the five-number summary and draw the boxplot.

Age:        22 25 26 27 28 29 30 32 35 39
Freq.:        3  4  3  5  4  6  5  4  5  2
Cum. freq.:   3  7 10 15 19 25 30 34 39 41
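The cumulative-frequency row is what lets us find the value at a given location without writing out all 41 ages: the value at location k is the first one whose cumulative frequency reaches k. A minimal Python sketch of that lookup (the function name `value_at` is illustrative, not from the notes):

```python
from bisect import bisect_left
from itertools import accumulate


def value_at(values, freqs, location):
    """Return the value at a 1-based location of the sorted data.

    values must be sorted; the value at location k is the first one
    whose cumulative frequency is at least k.
    """
    cum = list(accumulate(freqs))
    if not 1 <= location <= cum[-1]:
        raise ValueError("location is outside 1..n")
    return values[bisect_left(cum, location)]


ages = [22, 25, 26, 27, 28, 29, 30, 32, 35, 39]
freqs = [3, 4, 3, 5, 4, 6, 5, 4, 5, 2]

print(list(accumulate(freqs)))    # [3, 7, 10, 15, 19, 25, 30, 34, 39, 41]
print(value_at(ages, freqs, 21))  # 29
print(value_at(ages, freqs, 10), value_at(ages, freqs, 11))  # 26 27
```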

The size is n = 41, so the median is in location $\frac{41 + 1}{2} = 21$. The cumulative frequency first reaches 21 at age 29, so Med = 29.
The lower half has size 20, so the lower quartile is the average of the values at locations 10 and 11, which are 26 and 27: $Q_1 = \frac{26 + 27}{2} = 26.5$.

The upper half also has size 20, so the upper quartile is the average of the values at locations 10 and 11 of the upper half. Since the median is at location 21, these are locations 31 and 32 of the whole data set: $Q_3 = \frac{32 + 32}{2} = 32$.

Five-number summary: [Min = 22, Q1 = 26.5, Med = 29, Q3 = 32, Max = 39]
[Boxplot of the officers' ages: Min = 22, Q1 = 26.5, Med = 29, Q3 = 32, Max = 39, on an axis from 20 to 45.]
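For a quick check of the hand computation, one can write out every age as many times as its frequency and apply the median-of-halves rule to the full list of 41 values. A minimal Python sketch (the helper names `expand` and `five_number_summary` are illustrative); expanding the table is fine at this size but would be wasteful for very large frequencies:

```python
from statistics import median


def expand(values, freqs):
    """List each value as many times as its frequency."""
    return [v for v, f in zip(values, freqs) for _ in range(f)]


def five_number_summary(data):
    """(Min, Q1, Med, Q3, Max); for odd n the median is left out of both halves."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    return (xs[0], median(xs[:half]), median(xs), median(xs[n - half:]), xs[-1])


ages = [22, 25, 26, 27, 28, 29, 30, 32, 35, 39]
freqs = [3, 4, 3, 5, 4, 6, 5, 4, 5, 2]

print(five_number_summary(expand(ages, freqs)))  # (22, 26.5, 29, 32.0, 39)
```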

Remark: Outliers can be drawn separately from the rest of the data set.

Example: The appraisals of the 10 houses are: [$75K, $96K, $107K, $110K, $110K, $118K, $130K, $135K, $150K, $520K]. Find the five-number summary, with outliers separated.

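[Boxplot of the house appraisals ($K) with the outlier separated: marks at 75, 107, 114, 135 and 150, with 520 drawn as a separate point beyond a break in the axis between 150 and 500.]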
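The notes do not say how an outlier is identified. One common convention, assumed here rather than taken from the slides, flags any value more than 1.5 × IQR below Q1 or above Q3, where IQR = Q3 - Q1. For the house data, IQR = 135 - 107 = 28, so the fences are 107 - 42 = 65 and 135 + 42 = 177: only $520K falls outside, and the upper whisker then stops at $150K, which matches the values 150 and 520 marked on the plot for this example. A minimal Python sketch (function names are illustrative):

```python
from statistics import median


def quartiles(xs):
    """Q1 and Q3 by the median-of-halves rule (xs must be sorted)."""
    half = len(xs) // 2
    return median(xs[:half]), median(xs[len(xs) - half:])


def boxplot_with_outliers(data, k=1.5):
    """Split data into whisker ends and outliers using Q1 - k*IQR and Q3 + k*IQR fences.

    ASSUMPTION: the 1.5 * IQR fence rule is a common convention,
    not a rule stated in these notes.
    """
    xs = sorted(data)
    q1, q3 = quartiles(xs)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in xs if low_fence <= x <= high_fence]
    outliers = [x for x in xs if x < low_fence or x > high_fence]
    # Whiskers stop at the most extreme values that are not outliers.
    return {"whisker_low": inside[0], "q1": q1, "med": median(xs),
            "q3": q3, "whisker_high": inside[-1], "outliers": outliers}


houses = [75, 96, 107, 110, 110, 118, 130, 135, 150, 520]  # $K
print(boxplot_with_outliers(houses))
```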

Boxplots and five-number summaries are useful when comparing two data sets.

Example: Waiting times (in minutes) at two car washes:
Acme Car Wash: [Min = 1, Q1 = 5, Med = 8, Q3 = 9, Max = 12]
Kleen Car Wash: [Min = 3, Q1 = 4, Med = 5, Q3 = 8, Max = 20]
Draw the boxplots together, and compare them.

Solution: Here are the boxplots:
[Boxplots of the Acme and Kleen waiting times, drawn one above the other on a common axis from 0 to 20 minutes.]
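A rough plain-text version of the two boxplots can be produced with a short Python sketch. It is not the tool used for the slides: it maps each value of a five-number summary to a character column on a shared axis (0 to 20 minutes, one column per half minute) and follows the drawing rule from the notes, a box from Q1 to Q3 with a mark at the median and whiskers out to Min and Max. The function name `text_boxplot` and the characters used are arbitrary choices.

```python
def text_boxplot(summary, lo, hi, width=41):
    """Draw a five-number summary (Min, Q1, Med, Q3, Max) as one line of text.

    The axis runs from lo to hi across `width` character cells; all five
    values are assumed to lie in [lo, hi]. Whiskers are '-', the box is
    '=' with '[' and ']' at Q1 and Q3, and '|' marks Min, Med and Max.
    """
    def col(v):
        # Map a data value to its nearest character column.
        return round((v - lo) / (hi - lo) * (width - 1))

    mn, q1, med, q3, mx = (col(v) for v in summary)
    cells = [" "] * width
    for c in range(mn, mx + 1):
        cells[c] = "-"              # whisker line from Min to Max
    for c in range(q1, q3 + 1):
        cells[c] = "="              # box from Q1 to Q3
    cells[mn] = cells[mx] = "|"
    cells[q1], cells[q3] = "[", "]"
    cells[med] = "|"                # median mark drawn last so it stays visible
    return "".join(cells).rstrip()


acme = (1, 5, 8, 9, 12)
kleen = (3, 4, 5, 8, 20)
print("Acme ", text_boxplot(acme, 0, 20))
print("Kleen", text_boxplot(kleen, 0, 20))
```

Because both plots share one axis, the long right whisker at Kleen and the box shifted to the right at Acme are visible at a glance.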

Solution: The Min and Max tell us that everyone at Kleen has to wait at least 3 minutes, and some people have a very long wait (up to 20 minutes), while at Acme some people have a tiny wait and everyone gets started within 12 minutes. Acme seems better.

But the median tells us that half of Acme's customers wait 8 minutes or more for service, while at Kleen half of the customers start within 5 minutes. Now Kleen seems better.

Which is better? There's no simple answer. If you don't mind waiting a little, Acme is better, since there are no long waits. If you're willing to risk a long wait in the hope of a short one, Kleen is better.

Standard Deviation

When we use the mean to measure the center, we use the standard deviation to measure dispersion. Think of the standard deviation as measuring how far from the average the data points typically are.

(Wrong way:)
1. Take the deviation of each data point from the average.
2. Average those deviations.
The deviation of a point $x_i$ from the average $\bar{x}$ is just $x_i - \bar{x}$.

(Wrong way:) Example: Daily sales of Home Town Pharmacy over one week, in dollars:
Sun 2,548; Mon 1,225; Tue 1,732; Wed 1,871; Thu 975; Fri 2,218; Sat 1,339.
Find the average of the deviations $x_i - \bar{x}$. We have already found the average: $\bar{x} = 1701.14$.

(Wrong way:) Here are the deviations $x_i - \bar{x}$:
Day         x_i (sales)   x_i - x̄ (deviation)
Sunday         2,548.00        846.86
Monday         1,225.00       -476.14
Tuesday        1,732.00         30.86
Wednesday      1,871.00        169.86
Thursday         975.00       -726.14
Friday         2,218.00        516.86
Saturday       1,339.00       -362.14
Total         11,908.00          0.02
Average        1,701.14          0.00
The total is 0.02 rather than exactly 0 only because $\bar{x}$ was rounded to 1701.14.

(Wrong way:) Deviations are like distances, but with a sign: a positive deviation means $x_i$ is to the right of $\bar{x}$, and a negative deviation means $x_i$ is to the left of $\bar{x}$.

(Wrong way:) The average of those deviations is
$\frac{846.86 - 476.14 + 30.86 + 169.86 - 726.14 + 516.86 - 362.14}{7} = \frac{0.02}{7} \approx 0.00$
This is going to happen with any data set! The deviations from the mean always add up to zero, because $\sum_{i=1}^{n} (x_i - \bar{x}) = \sum_{i=1}^{n} x_i - n\bar{x} = 0$. So the average deviation from the mean is a useless measure of dispersion.
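The cancellation is easy to confirm numerically. This Python sketch redoes the pharmacy computation with exact fractions (an illustrative choice, to keep rounding out of the picture), so the sum of the deviations comes out exactly 0 instead of the 0.02 left over from rounding the mean to 1701.14.

```python
from fractions import Fraction

sales = [2548, 1225, 1732, 1871, 975, 2218, 1339]  # Sunday .. Saturday, in dollars

# Exact mean 11908/7 (about 1701.14), kept as a fraction so no rounding occurs.
mean = Fraction(sum(sales), len(sales))
deviations = [x - mean for x in sales]

# The positive and negative deviations cancel exactly.
total_deviation = sum(deviations)
avg_deviation = total_deviation / len(sales)
print(total_deviation, avg_deviation)  # 0 0
```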

(Right way:) However, if we square all the deviations, they all become nonnegative. We can then average those squared deviations; that average is called the variance.

Definition: The variance $\operatorname{var}(x)$ of a data set $x$ is the average of the squared deviations from the mean $\bar{x}$:
$\operatorname{var}(x) = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$

To compensate for the squaring, we take the square root of the variance, which puts the result back in the same units as the data.
Definition: The standard deviation is $\sigma(x) = \sqrt{\operatorname{var}(x)}$.
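Both definitions can be transcribed almost word for word. A minimal Python sketch that divides by n, as in the definition of the variance; the function names and the small data set are illustrative, and the result is cross-checked against the standard library's population functions `statistics.pvariance` and `statistics.pstdev`:

```python
import math
import statistics


def variance(xs):
    """Average of the squared deviations from the mean (divides by n)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n


def std_dev(xs):
    """Square root of the variance, so the result is in the data's own units."""
    return math.sqrt(variance(xs))


data = [2, 4, 4, 4, 5, 5, 7, 9]       # small illustrative data set with mean 5
print(variance(data), std_dev(data))  # 4.0 2.0

# Cross-check against the standard library's population (divide-by-n) versions.
assert math.isclose(variance(data), statistics.pvariance(data))
assert math.isclose(std_dev(data), statistics.pstdev(data))
```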

Example: Find the variance and standard deviation for the Home Town Pharmacy daily sales data set.

Day         x (sales)    x - x̄        (x - x̄)^2
Sunday       2,548.00     846.86       717,171.8596
Monday       1,225.00    -476.14       226,709.2996
Tuesday      1,732.00      30.86           952.3396
Wednesday    1,871.00     169.86        28,852.4196
Thursday       975.00    -726.14       527,279.2996
Friday       2,218.00     516.86       267,144.2596
Saturday     1,339.00    -362.14       131,145.3796
Total       11,908.00       0.02     1,899,254.8572
Average      1,701.14       0.00       271,322.1224571

So the variance is $\operatorname{var}(x) \approx 271322.12$ (in squared dollars), and the standard deviation is $\sigma(x) = \sqrt{271322.12} \approx 520.89$ dollars.
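Python's standard library can serve as a check on this arithmetic. A minimal sketch using `statistics.pvariance` and `statistics.pstdev`, which divide by n like the definition in these notes:

```python
import statistics

sales = [2548, 1225, 1732, 1871, 975, 2218, 1339]  # daily sales in dollars

var = statistics.pvariance(sales)   # divides by n = 7, as in the notes
sd = statistics.pstdev(sales)
print(f"variance = {var:.2f}, standard deviation = {sd:.2f}")
# variance = 271322.12, standard deviation = 520.89
```

Note that `statistics.variance` and `statistics.stdev` divide by n - 1 instead (the sample versions), which would give a standard deviation of about 562.6 here; to match the definition above, use the `p` versions.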

What if we start with a frequency table or a histogram?

Example: Find the standard deviation for the Math 109 quiz scores:
score:       4  5  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 25
freq.:       1  1  2  2  3  5  9 12 11 13  9  8  7  5  3  2  1  1
cum. freq.:  1  2  4  6  9 14 23 35 46 59 68 76 83 88 91 93 94 95

Solution: We computed the average $\mu = 14.64$. For convenience, turn the frequency table into a vertical table.

 x    f    x·f     x - µ    (x - µ)^2    (x - µ)^2·f
 4    1      4    -10.64     113.2096       113.2096
 5    1      5     -9.64      92.9296        92.9296
 8    2     16     -6.64      44.0896        88.1792
 9    2     18     -5.64      31.8096        63.6192
10    3     30     -4.64      21.5296        64.5888
11    5     55     -3.64      13.2496        66.2480
12    9    108     -2.64       6.9696        62.7264
13   12    156     -1.64       2.6896        32.2752
14   11    154     -0.64       0.4096         4.5056
15   13    195      0.36       0.1296         1.6848
16    9    144      1.36       1.8496        16.6464
17    8    136      2.36       5.5696        44.5568
18    7    126      3.36      11.2896        79.0272
19    5     95      4.36      19.0096        95.0480
20    3     60      5.36      28.7296        86.1888
21    2     42      6.36      40.4496        80.8992
22    1     22      7.36      54.1696        54.1696
25    1     25     10.36     107.3296       107.3296
Tot. 95   1391                              1153.8320
Ave.       14.64                              12.1456

So the variance is 12.1456 and the standard deviation is $\sigma = \sqrt{12.1456} \approx 3.49$.

To find the standard deviation $\sigma$:
1. Compute the deviations $x_i - \mu$.
2. Square the deviations: $(x_i - \mu)^2$.
3. Average the squared deviations to get the variance: $\operatorname{var} = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2$. (With a frequency table, multiply each squared deviation by its frequency before adding, and take n to be the total frequency.)
4. Take the square root of the variance: $\sigma = \sqrt{\operatorname{var}}$.

Question: What does the standard deviation mean in practice?

In the previous example, the average is $\mu = 14.64$ and the standard deviation is $\sigma = 3.49$.

How many data points are within one standard deviation of the average? We have $\mu - \sigma = 11.15$ and $\mu + \sigma = 18.13$. The scores between these two values are 12 through 18, a total of 9 + 12 + 11 + 13 + 9 + 8 + 7 = 69 data points (out of 95), i.e., about 73%, a little more than two thirds.
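The whole frequency-table computation, including the count within one standard deviation, can be checked with a short Python sketch that follows the four steps, weighting each squared deviation by its frequency. It uses the exact mean 1391/95 instead of the rounded 14.64, which changes nothing at the precision shown; variable names are illustrative.

```python
import math

scores = [4, 5, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 25]
freqs = [1, 1, 2, 2, 3, 5, 9, 12, 11, 13, 9, 8, 7, 5, 3, 2, 1, 1]

n = sum(freqs)                                       # 95 quizzes
mu = sum(x * f for x, f in zip(scores, freqs)) / n   # 1391 / 95, about 14.64

# Steps 1-3: each squared deviation counts f times, then divide by n.
ss = sum(f * (x - mu) ** 2 for x, f in zip(scores, freqs))
var = ss / n
sigma = math.sqrt(var)                               # step 4

# How many scores lie within one standard deviation of the mean?
within = sum(f for x, f in zip(scores, freqs) if mu - sigma <= x <= mu + sigma)

print(f"mu = {mu:.2f}, var = {var:.4f}, sigma = {sigma:.2f}")
# mu = 14.64, var = 12.1456, sigma = 3.49
print(f"{within} of {n} scores ({within / n:.0%}) are within one sigma")
# 69 of 95 scores (73%) are within one sigma
```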

For nice (roughly symmetric, bell-shaped) data sets, about 2/3 of the data lie within one standard deviation of the average. If $\sigma$ is small, the data points are crowded close to $\mu$; if $\sigma$ is large, the data points are scattered far from $\mu$.