Section 3-2: Measures of Center


Chapter 3, Section 3-2: Measures of Center

Notation

Suppose we are making a series of observations, n of them, to be exact. Then we write $x_1, x_2, x_3, \ldots, x_n$ as the values we observe. Thus n is the total number of data points, and $x_4$ (say) is the value of the fourth data point.

Example

Suppose we ask five people how many hours of television they watch in a week, and get the following data:

Observation    1    2    3    4    5
Data value     25   27   23   38   27

That is, $x_1 = 25$, $x_2 = 27$, $x_3 = 23$, $x_4 = 38$, $x_5 = 27$.

Measures of Center and Spread

What is the center of these data, and what is the spread about that value?

CENTER: Mean, Median, Mode, Midrange
SPREAD: Range, Inter-quartile range (IQR), Variance/Standard deviation

Mean

Sample mean $\bar{x}$ ("x-bar"): the average of a set of observations in a sample. If the n observations are $x_1, x_2, \ldots, x_n$, then

$$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

Population mean $\mu$ ("mu"): the average of a set of observations in the population. If the N observations are $x_1, x_2, \ldots, x_N$, then

$$\mu = \frac{x_1 + x_2 + x_3 + \cdots + x_N}{N} = \frac{1}{N} \sum_{i=1}^{N} x_i$$

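Example: Sample Mean

$x_1 = 25$, $x_2 = 27$, $x_3 = 23$, $x_4 = 38$, $x_5 = 27$

The mean is the balance point of the distribution. The mean is

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + x_3 + x_4 + x_5}{5} = \frac{25 + 27 + 23 + 38 + 27}{5} = 28$$

Median

The median M is the midpoint of the distribution (like the median strip in a road). It is the number such that half of the observations fall above it and half fall below it.

Example 1: Median

$x_1 = 25$, $x_2 = 27$, $x_3 = 23$, $x_4 = 38$, $x_5 = 27$

Step 1: order the data: 23 25 27 27 38
Step 2: with an odd number of observations, the median is the middle value.
Median = 27

Example 2: Median

If the data are $x_1 = 25$, $x_2 = 27$, $x_3 = 23$, $x_4 = 38$

Step 1: order the data: 23 25 27 38
Step 2: with an even number of observations, the median is the average of the two middle values.
Median = (25 + 27)/2 = 26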
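The two examples above can be checked with a short Python sketch. It assumes the five TV-watching values are 25, 27, 23, 38, 27 (the reading consistent with the ordered lists and medians in the examples); the function names are illustrative only.

```python
def mean(values):
    """Sample mean: sum of the observations divided by their count."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the ordered data (average of the two middle values if n is even)."""
    ordered = sorted(values)          # Step 1: order the data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd n: single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: average the two middle values


# Assumed reading of the example data (hours of television per week).
tv_hours = [25, 27, 23, 38, 27]

print(mean(tv_hours))        # 28.0
print(median(tv_hours))      # 27
print(median(tv_hours[:4]))  # 26.0 (first four observations only)
```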

Example

Data set A: 64 65 66 68 70 71 73
median is 68, mean is 68.1

Data set B: 64 65 66 68 70 71 730 (730 is an outlier)
median is still 68, mean is 162

The mean is very sensitive to outliers, while the median is resistant to outliers.

Comparing the mean and the median

Mean: describes the center as an average value, where the actual values of the data points play an important role. Sensitive to outliers.
Median: locates the middle value as the center, and the order of the data is the key to finding it. Not sensitive to outliers.

Skewness

Symmetric distributions with no outliers: Mean ≈ Median
Left-skewed distributions: Mean < Median
Right-skewed distributions: Median < Mean
(Figure: a right-skewed distribution with Median = 68 and the mean above it.)
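As a rough illustration of how one outlier moves the mean but not the median, the sketch below recomputes both data sets with Python's standard `statistics` module. It assumes the second value of each set is 65, which is the reading consistent with the stated means.

```python
from statistics import mean, median

# Assumed reading of the two data sets; B differs from A only in the last value.
set_a = [64, 65, 66, 68, 70, 71, 73]
set_b = [64, 65, 66, 68, 70, 71, 730]   # 730 is the outlier

print(median(set_a), round(mean(set_a), 1))  # 68 68.1
print(median(set_b), mean(set_b))            # 68 162
```

The median is unchanged because only the order of the values matters for it, while the mean uses the actual size of every value.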

Which measure of center to use?

We will therefore use the mean as a measure of center for symmetric distributions with no outliers. Otherwise, the median will be a more appropriate measure of the center of our data.

Another measure of center: Mode

The mode is the most frequent value in the data set.

Data: 2, 4, 4, 4, 5, 5, 6, 7, 8, 10, 15        Mode = 4
Data: 2, 4, 4, 4, 5, 5, 5, 6, 7, 8, 10, 15     Mode = 4, 5
Data: 2, 4, 5, 6, 7, 8, 10, 15                 No mode

And another measure of center: Midrange

The midrange is the value midway between the maximum and the minimum values in the data set.

$$\text{midrange} = \frac{\max + \min}{2}$$

Summary: Measures of Center

- The two main numerical measures for the center of a distribution are the mean $\bar{x}$ and the median M.
- The mean is the average value, while the median is the middle value.
- The mean is very sensitive to outliers, while the median is resistant to outliers.
- The mean is an appropriate measure of center only for symmetric distributions with no outliers. In all other cases, the median should be used to describe the center of the distribution.
- The mode is the most frequent value.
- The midrange is the average of the max and the min values.
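The mode and midrange definitions above can be sketched as follows. The data lists assume the reading 2, 4, 4, 4, 5, 5, ... 15 of the mode examples, and treating "every value occurs once" as "no mode" is the convention used in those examples; the function names are hypothetical.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s); an empty list means no mode."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:                      # every value occurs exactly once: no mode
        return []
    return sorted(v for v, c in counts.items() if c == top)


def midrange(values):
    """Value midway between the maximum and the minimum."""
    return (max(values) + min(values)) / 2


print(modes([2, 4, 4, 4, 5, 5, 6, 7, 8, 10, 15]))      # [4]
print(modes([2, 4, 4, 4, 5, 5, 5, 6, 7, 8, 10, 15]))   # [4, 5]
print(modes([2, 4, 5, 6, 7, 8, 10, 15]))               # []
print(midrange([2, 4, 4, 4, 5, 5, 6, 7, 8, 10, 15]))   # 8.5
```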

Measures of Spread

Spread: how far from the center the data tend to range. If all the data points are identical, there is no spread at all. Numerically, the spread is zero.
(Example figure: identical data points, so Spread = 0.)

Range and Inter-quartile range (IQR)

Range = max. value - min. value
The IQR gives the range covered by the MIDDLE 50% of the data.

Example:
Data: 2, 4, 4, 4, 5, 5, 6, 7, 8, 10, 15
Range = max. - min. = 15 - 2 = 13

How to find the IQR?

Step 1: Arrange the data in increasing order.
Step 2: Find the median.
Step 3: Find the median of the lower 50% of the data. This is called the first quartile of the distribution and is denoted by Q1.

Step 4: Repeat this for the top 50% of the data: find the median of the top 50% of the data. This is called the third quartile of the distribution and is denoted by Q3.

The middle 50% of the data falls between Q1 and Q3, and therefore:

IQR = Q3 - Q1

IQR Example

Weights of 10 students: [illegible], 118, 120, 136, 138, 149, 157, 157, 161, 180

M = (138 + 149)/2 = 143.5
Q1 = 120, Q3 = 157
IQR = 157 - 120 = 37

Note

The IQR should be used as a measure of spread of a distribution only when the median is used as a measure of center.

Using the IQR to detect outliers

The 1.5(IQR) Criterion for Outliers: An observation is considered a suspected outlier if it is below Q1 - 1.5(IQR) or above Q3 + 1.5(IQR).
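The four IQR steps can be sketched in Python as below. This is one possible convention, not the only one: when the number of observations is odd, the sketch assumes the median itself is left out of both halves. The demonstration data is the 11-value list 2, 4, 4, 4, 5, 5, 6, 7, 8, 10, 15 (an assumed reading of the range example), and the function names are illustrative.

```python
def median_sorted(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, M, Q3) using the median-of-halves steps."""
    ordered = sorted(values)            # Step 1: arrange in increasing order
    n = len(ordered)
    half = n // 2
    m = median_sorted(ordered)          # Step 2: find the median
    lower = ordered[:half]              # lower 50% (median excluded when n is odd)
    upper = ordered[n - half:]          # top 50%
    q1 = median_sorted(lower)           # Step 3: first quartile
    q3 = median_sorted(upper)           # Step 4: third quartile
    return q1, m, q3


data = [2, 4, 4, 4, 5, 5, 6, 7, 8, 10, 15]
q1, m, q3 = quartiles(data)
print(q1, m, q3)      # 4 5 8
print(q3 - q1)        # IQR = 4
```

Other textbooks and software packages interpolate quartiles differently, so results from library routines may differ slightly from this hand method.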

The 1.5(IQR) Criterion for Outliers

Example 1

Weights of 10 students: [illegible], 118, 120, 136, 138, 149, 157, 157, 161, [illegible]

M = (138 + 149)/2 = 143.5
IQR = 157 - 120 = 37
Q3 + 1.5(IQR) = 157 + 1.5(37) = 212.5
Anything above 212.5? YES. The largest weight IS an outlier.

Example 2

Data: [illegible], 8, 9, 12, 14, 19, 22, 23, 23, [illegible], 50

M = 19
IQR = 23 - 9 = 14
1.5(IQR) = 1.5(14) = 21
Anything below Q1 - 1.5(IQR) = 9 - 21 = -12? YES!
Anything above Q3 + 1.5(IQR) = 23 + 21 = 44? YES!
Outliers!

Five-number summary

To get a quick summary of both center and spread, we consider these five numbers:
Minimum value, Q1, Median, Q3, Maximum value

Boxplot

John Tukey invented another kind of display to show off the five-number summary. It's called a boxplot.

Example 1

Weights of 10 students: [illegible], 118, 120, 136, 138, 149, 157, 157, 161, 180
(Figure: boxplot running from Min. to Max. with M = 143.5, on an axis from 100 to 180.)
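The 1.5(IQR) criterion reduces to two cutoff values (often called fences). The sketch below computes them from Q1 and Q3, assuming the quartiles 120 and 157 for the weights example and 9 and 23 for the second example; the function names and the sample list passed to `suspected_outliers` are illustrative.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower, upper) cutoffs: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def suspected_outliers(values, q1, q3):
    """Values strictly below the lower cutoff or strictly above the upper cutoff."""
    low, high = outlier_fences(q1, q3)
    return [x for x in values if x < low or x > high]


print(outlier_fences(120, 157))   # (64.5, 212.5)
print(outlier_fences(9, 23))      # (-12.0, 44.0)

# Illustrative check: 180 is inside the cutoffs for the weights, 50 is above 44.
print(suspected_outliers([118, 136, 161, 180], 120, 157))   # []
print(suspected_outliers([8, 14, 23, 50], 9, 23))           # [50]
```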

Example 2 (Outlier)

Weights of 10 students: [illegible], 118, 120, 136, 138, 149, 157, 157, 161, [illegible]
(Figure: boxplot with M = 143.5 on an axis from 100 to 180; the outlier is marked with *.)

Comparing distributions

Boxplots are best used for side-by-side comparison of more than one distribution.

Variance and Standard Deviation

The standard deviation gives the average (or typical) distance between a data point and the mean, $\bar{x}$.

Variance:

$$s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}$$

Standard deviation:

$$s = \sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}}$$

Facts about the standard deviation (s)

- s measures the spread about the mean and should be used only when the mean is chosen as the measure of center, that is, when the distribution of the data is roughly symmetric with no outliers.
- s is always greater than or equal to 0.
- s = 0 only when there is no spread, i.e., the data values are identical.
- s gets larger as the spread increases.
- s has the same units of measurement as the original observations.
- Like the mean, s is not resistant. It is very sensitive to outliers.

Sample s.d. (s):

$$s = \sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}}$$

Population s.d. ($\sigma$):

$$\sigma = \sqrt{\frac{(x_1 - \mu)^2 + (x_2 - \mu)^2 + \cdots + (x_N - \mu)^2}{N}}$$
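A minimal Python sketch of the formulas above: the sample versions divide by n - 1 and the population version divides by N. The data are the five TV-watching values, assumed here to be 25, 27, 23, 38, 27; the function names are illustrative.

```python
import math


def sample_variance(values):
    """s^2: squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def sample_sd(values):
    """s: square root of the sample variance."""
    return math.sqrt(sample_variance(values))


def population_sd(values):
    """sigma: squared deviations from the population mean, divided by N."""
    n_pop = len(values)
    mu = sum(values) / n_pop
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n_pop)


tv_hours = [25, 27, 23, 38, 27]          # assumed reading of the example data
print(sample_variance(tv_hours))         # 34.0
print(round(sample_sd(tv_hours), 2))     # 5.83
print(population_sd([5, 5, 5]))          # 0.0 (identical values: no spread)
```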

The 68-95-99.7 rule (Empirical Rule)

The Empirical Rule for Data with a Bell-Shaped Distribution
(Figure: bell-shaped curve with about 68%, 95%, and 99.7% of the data within 1, 2, and 3 standard deviations of the mean.)

Measures of Relative Standing

Standard Normal Distribution

The standard Normal distribution is a Normal distribution with mean 0 and standard deviation 1. Notation: N(0, 1).

How to standardize? We need to change x to a z-score:

$$z = \frac{x - \mu}{\sigma}$$

A z-score measures the number of standard deviations that a data value x is from the mean $\mu$.
- When x is larger than the mean, z is positive.
- When x is smaller than the mean, z is negative.
- When x is equal to the mean, z is zero.

Ordinary/Unusual Values

Ordinary values: -2 ≤ z-score ≤ 2
Unusual values: z-score < -2 or z-score > 2
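The standardization formula and the ordinary/unusual cutoffs can be sketched as follows. The mean of 100 and standard deviation of 15 are hypothetical values chosen only for illustration, as are the function names.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma


def is_unusual(x, mu, sigma):
    """Unusual means the z-score is below -2 or above 2."""
    return abs(z_score(x, mu, sigma)) > 2


# Hypothetical population: mean 100, standard deviation 15.
print(z_score(130, 100, 15))      # 2.0  (exactly two standard deviations above)
print(is_unusual(130, 100, 15))   # False (z = 2 still counts as ordinary)
print(is_unusual(135, 100, 15))   # True
print(z_score(85, 100, 15))       # -1.0 (below the mean, so z is negative)
```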

Percentiles

Percentiles are measures of location. There are 99 percentiles, denoted $P_1, P_2, \ldots, P_{99}$, which divide a set of data into 100 groups with about 1% of the values in each group.

$$\text{Percentile of value } x = \frac{\text{number of values less than } x}{\text{total number of values}} \cdot 100$$

Summary

Measures of the center of distributions:
- Mean
- Median
- Mode

Measures of spread of distributions:
- Range
- IQR (using the IQR to detect outliers: the 1.5(IQR) rule; boxplots)
- Variance/Standard deviation
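The percentile formula given in this section (the number of values less than x, divided by the total number of values, times 100) can be sketched as below. The function name and the ten-value data set are hypothetical.

```python
def percentile_of(x, values):
    """Percentile of x: share of values strictly less than x, times 100."""
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)


scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]   # hypothetical data
print(percentile_of(7, scores))    # 60.0 (six of the ten values are below 7)
print(percentile_of(1, scores))    # 0.0  (no value is below the minimum)
```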