appstats5.notebook September 07, 2016 Chapter 5


Chapter 5: Describing Distributions Numerically

Chapter 5 Objective: Students will be able to use statistics appropriate to the shape of the data distribution to compare the center (median, mean) and the spread (interquartile range, standard deviation) of two or more different data sets.

In Chapter 4 we learned to describe distributions in words by describing their shape, center, and spread. Now we focus on describing distributions numerically. We still need to discuss shape, center, and spread, and how we measure the center and spread of a distribution depends on its shape.

Finding the Center: The Median (pg 74)
When we think of a typical value, we usually look for the center of the distribution. If the distribution is unimodal and symmetric, the typical value is easy to find: it is just the middle (center). For skewed distributions, use the median to determine the center of the distribution and the interquartile range to describe its spread.

Finding the Center: The Median (cont.) (pg 74)
The median is the value with exactly half the data values below it and half above it. The median: is the middle data value (once the data values have been ordered), which divides the histogram into two equal areas; has the same units as the data; is resistant to outliers (extreme data values).
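The definition above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes: the function name `median` and the TV-hours values are made up for the example.

```python
def median(values):
    """Return the middle value of the data once it has been ordered."""
    ordered = sorted(values)          # the data must be ordered first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: a single middle value
        return ordered[mid]
    # even count: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

hours = [3, 10, 1, 7, 5]              # hypothetical TV hours per week
print(median(hours))                  # 5
print(median(hours + [60]))           # 6.0
```

Adding one extreme value (60) moves the median only from 5 to 6, which is what "resistant to outliers" means in practice.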

As a measure of center, the midrange (the average of the minimum and maximum values) is very sensitive to skewed distributions and outliers. The median is a more reasonable choice for center than the midrange. (pg 74)

Spread: Home on the Range (pg 75)
Always report a measure of spread along with a measure of center when describing a distribution numerically. The range of the data is the difference between the maximum and minimum values: Range = max - min
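A short sketch of why the range and the midrange are sensitive to a single extreme value. The helper names and the data are hypothetical, chosen only for illustration.

```python
def data_range(values):
    """Range = max - min (a single number, not an interval)."""
    return max(values) - min(values)

def midrange(values):
    """Average of the minimum and maximum values."""
    return (min(values) + max(values)) / 2

hours = [1, 3, 5, 7, 10]                     # hypothetical data
with_outlier = hours + [60]                  # same data plus one extreme value

print(data_range(hours), midrange(hours))                  # 9 5.5
print(data_range(with_outlier), midrange(with_outlier))    # 59 30.5
```

Both statistics depend only on the two most extreme values, so one unusual observation changes them drastically.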

The range: is the difference between the maximum value and the minimum value; is a number, NOT an interval; is sensitive to outliers. A disadvantage of the range is that a single extreme value can make it very large and, thus, not representative of the data overall.

Spread: The Interquartile Range (pg 74-75)
The interquartile range (IQR) lets us ignore extreme data values and concentrate on the middle of the data. To find the IQR, we first need to know what quartiles are. The lower and upper quartiles are the 25th and 75th percentiles of the data, so the IQR contains the middle 50% of the values of the distribution.

Quartiles divide the data into four equal sections. (pg 75) The lower quartile is the median of the half of the data below the median. The upper quartile is the median of the half of the data above the median. The difference between the quartiles is the IQR, so IQR = upper quartile - lower quartile.

Class exercise: look at the class survey and find the interquartile range.

The interquartile range (IQR): contains the middle 50% of the data; is the difference between the upper (Q3) and lower (Q1) quartiles, IQR = Q3 - Q1; is a number, NOT an interval; is resistant to outliers.
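The median-of-each-half rule can be sketched as follows. The notes do not say what to do with the median itself when the number of values is odd, so the `include_median` switch is an assumption added here to show both common conventions; the function names and data are hypothetical.

```python
from statistics import median

def quartiles(values, include_median=False):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    For an odd number of values, include_median decides whether the
    overall median is counted in both halves (an assumption; textbooks
    and calculators differ on this point).
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    if n % 2 == 1 and include_median:
        lower, upper = ordered[:half + 1], ordered[half:]
    else:
        lower, upper = ordered[:half], ordered[n - half:]
    return median(lower), median(upper)

def iqr(values, include_median=False):
    """IQR = Q3 - Q1, a single number."""
    q1, q3 = quartiles(values, include_median)
    return q3 - q1

data = [1, 3, 5, 7, 10, 12, 14]                  # hypothetical data, n = 7
print(quartiles(data))                           # (3, 12)
print(quartiles(data, include_median=True))      # (4.0, 11.0)
print(iqr(data))                                 # 9
```

The two conventions give slightly different quartiles for the same data, which is why different calculators and packages may not agree exactly.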

The five-number summary of a distribution reports the minimum, lower quartile, median, upper quartile, and maximum. (pg 77)

Class activity (record on the board tomorrow): How many hours, on average, do you spend watching TV per week? Collect data from the entire class and record the values in order from smallest to largest. Calculate the five-number summary.
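Once the class data are collected, the five-number summary could be computed along these lines. The values below are invented stand-ins for the class survey, and the quartiles use the median-of-each-half rule with the median left out of both halves when n is odd (an assumption).

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    ordered = sorted(values)                 # smallest to largest
    n = len(ordered)
    half = n // 2
    q1 = median(ordered[:half])              # median of the lower half
    q3 = median(ordered[n - half:])          # median of the upper half
    return ordered[0], q1, median(ordered), q3, ordered[-1]

hours = [12, 2, 8, 5, 20, 0, 7, 10, 4, 15]   # hypothetical class responses
print(five_number_summary(hours))            # (0, 4, 7.5, 12, 20)
```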

A boxplot is a graphical display of the five-number summary. (pg 77) Boxplots are particularly useful when comparing groups.

Constructing Boxplots
Draw a single vertical axis spanning the range of the data. Draw short horizontal lines at the lower and upper quartiles and at the median. Then connect them with vertical lines to form a box.

Test for outliers: erect fences around the main part of the data. (pg 77) The upper fence is 1.5 IQRs above the upper quartile (Q3 + 1.5 × IQR). The lower fence is 1.5 IQRs below the lower quartile (Q1 - 1.5 × IQR). Note: the fences only help with constructing the boxplot and should not appear in the final display.

Constructing Boxplots (cont.) (pg 78)
Use the fences to grow whiskers: draw lines from the ends of the box up and down to the most extreme data values found within the fences. If a data value falls outside one of the fences, we do not connect it with a whisker. Add the outliers by displaying any data values beyond the fences with special symbols. We often use a different symbol for far outliers, which are farther than 3 IQRs from the quartiles.

Perform the outlier test on the class distribution (number of hours spent watching TV per week).
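The fence and whisker rules can be sketched as below. The function name `boxplot_parts`, the data, and the quartile values passed in (Q1 = 4, Q3 = 12, from the median-of-each-half rule) are all assumptions for illustration.

```python
def boxplot_parts(values, q1, q3):
    """Apply the 1.5 IQR fence rule and return the pieces needed for a boxplot."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = sorted(v for v in values if v < lower_fence or v > upper_fence)
    # far outliers lie more than 3 IQRs beyond the quartiles
    far = [v for v in outliers if v < q1 - 3 * iqr or v > q3 + 3 * iqr]
    return {
        "fences": (lower_fence, upper_fence),
        "whiskers": (min(inside), max(inside)),   # most extreme values within the fences
        "outliers": outliers,
        "far_outliers": far,
    }

hours = [0, 2, 4, 5, 7, 8, 10, 12, 30, 40]        # hypothetical data
parts = boxplot_parts(hours, q1=4, q3=12)
print(parts["fences"])         # (-8.0, 24.0)
print(parts["whiskers"])       # (0, 12)
print(parts["outliers"])       # [30, 40]
print(parts["far_outliers"])   # [40]
```

The whiskers stop at 0 and 12, the last values inside the fences; 30 and 40 would be drawn as individual points, with 40 marked as a far outlier.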

Average Number of Hours per Day Spent Watching TV

Compare the histogram and boxplot for rock concert deaths: How does each display represent the distribution? (pg 78) Pair share: describe the shape, center, spread, and outliers.

Comparing Groups With Boxplots
The following set of boxplots compares the effectiveness of various coffee containers (see Step-by-Step, pg 78). Pair share: What does this graphical display tell you? Explain.

Summarizing Symmetric Distributions (pg 80)
Medians do a good job of identifying the center of skewed distributions. For symmetric data, the mean is a good measure of center. We find the mean by adding up all of the data values and dividing by n, the number of data values we have.

The mean: is the arithmetic average of the data values; is the balancing point of a histogram; has the same units as the data; is sensitive to outliers; is given by the formula $\bar{y} = \frac{\sum y}{n}$.

Summarizing Symmetric Distributions (pg 81)
The distribution of pulse rates for 52 adults is generally symmetric, with a mean of 72.7 beats per minute (bpm) and a median of 73 bpm.

Mean or Median?
Regardless of the shape of the distribution, the mean is the point at which a histogram of the data would balance. (pg 82) In symmetric distributions, the mean and median are approximately the same in value, so either measure of center may be used. (pg 83) For skewed data, though, it's better to report the median than the mean as a measure of center. For symmetric distributions, use the mean to determine the center of the distribution and the standard deviation to describe the spread of the distribution.
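A small sketch comparing the two measures of center on a roughly symmetric data set and a skewed one. Both data sets are invented for illustration, and `mean` is a hypothetical helper.

```python
from statistics import median

def mean(values):
    """Add up all the data values and divide by n."""
    return sum(values) / len(values)

symmetric = [68, 70, 72, 73, 74, 76, 78]   # hypothetical, roughly symmetric
skewed = [1, 2, 2, 3, 3, 4, 20]            # hypothetical, skewed to the right

print(mean(symmetric), median(symmetric))  # 73.0 73
print(mean(skewed), median(skewed))        # 5.0 3

# Balance point: the deviations from the mean always cancel out.
print(sum(y - mean(skewed) for y in skewed))   # 0.0
```

In the skewed set the single large value pulls the mean up to 5, above six of the seven data values, while the median stays at 3.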

What About Spread? The Standard Deviation (pg 83)
A more powerful measure of spread than the IQR is the standard deviation, which takes into account how far each data value is from the mean. A deviation is the distance of a data value from the mean. Since adding all the deviations together would total zero (positives cancel out negatives), we square each deviation and find an average of sorts for the squared deviations.

The variance, notated by $s^2$, is found by summing the squared deviations and (almost) averaging them, where $\bar{y}$ is the mean and n is the number of data values:

$s^2 = \frac{\sum (y - \bar{y})^2}{n - 1}$

The variance will play a role later in our study, but it is problematic as a measure of spread: because we squared the deviations, it is measured in squared units, not the units of the data. Therefore, the final step is to take the square root of the variance, which gives us the standard deviation.

The standard deviation, s: measures the typical distance of each data value from the mean; is measured in the same units as the original data; is sensitive to outliers, since its calculation involves the mean; is given by the formula

$s = \sqrt{\frac{\sum (y - \bar{y})^2}{n - 1}}$
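The steps above (find the deviations, square them, almost-average them, take the square root) can be written out directly. The function names and the data are hypothetical; the n - 1 divisor follows the "(almost) averaging" described in the notes.

```python
from math import sqrt

def sample_variance(values):
    """Sum the squared deviations from the mean and divide by n - 1."""
    n = len(values)
    y_bar = sum(values) / n
    squared_deviations = [(y - y_bar) ** 2 for y in values]
    return sum(squared_deviations) / (n - 1)

def sample_sd(values):
    """The standard deviation is the square root of the variance."""
    return sqrt(sample_variance(values))

data = [1, 3, 5, 7, 9]                   # hypothetical data, mean = 5
print(sample_variance(data))             # 10.0
print(round(sample_sd(data), 3))         # 3.162
```

Here the squared deviations are 16, 4, 0, 4, and 16, which sum to 40; dividing by n - 1 = 4 gives a variance of 10 (in squared units) and a standard deviation of about 3.16 (in the original units).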

Average Number of Hours per Day Spent Watching TV: find the mean and standard deviation.

Thinking About Variation (pg 84)
Since Statistics is about variation, spread is an important fundamental concept of Statistics. Measures of spread help us talk about what we don't know.

Add to your notes: When the data values are tightly clustered around the center of the distribution, the IQR and standard deviation will be small (less variation). When the data values are scattered far from the center, the IQR and standard deviation will be large (more variation).

Shape, Center, and Spread (see Step-by-Step, pg 84)
When telling about a quantitative variable, always report the shape of its distribution, along with a center and a spread. If the shape is skewed, report the median and IQR. If the shape is symmetric, report the mean and standard deviation and possibly the median and IQR as well.

What About Outliers? (add to your notes; see TI Tips, pg 86)
If there are any clear outliers and you are reporting the mean and standard deviation, report them with the outliers present and with the outliers removed. The differences may be quite revealing. Note: The median and IQR are not likely to be affected by the outliers.
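A sketch of the "report it both ways" advice, using Python's standard `statistics` module. The data are hypothetical, and treating 60 as the clear outlier is an assumption made for the example.

```python
from statistics import mean, median, stdev

def summarize(values):
    """Return (mean, standard deviation, median), each rounded to 2 places."""
    return (round(mean(values), 2),
            round(stdev(values), 2),
            round(median(values), 2))

hours = [1, 3, 5, 7, 9, 60]                  # hypothetical data; 60 is a clear outlier
trimmed = [v for v in hours if v != 60]      # same data with the outlier removed

print("with outlier:   ", summarize(hours))
print("without outlier:", summarize(trimmed))
```

With the outlier present the mean is about 14.2 and the standard deviation about 22.6; without it they drop to 5 and about 3.2. The median only moves from 6 to 5.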

What Can Go Wrong? (pg 86)
Don't forget to do a reality check: don't let technology do your thinking for you. Don't forget to sort the values before finding the median or percentiles. Don't compute numerical summaries of a categorical variable. Watch out for multiple modes: multiple modes might indicate multiple groups in your data.

What Can Go Wrong? (cont.) (pg 86)
Be aware of slightly different methods: different statistics packages and calculators may give you different answers for the same data. Beware of outliers. Make a picture (make a picture, make a picture).

What Can Go Wrong? (cont.) (pg 86)
Be careful when comparing groups that have very different spreads. Consider these side-by-side boxplots of cotinine levels.

*Re-expressing to Equalize the Spread of Groups (pg 87)
Here are the side-by-side boxplots of the log(cotinine) values.

What have we learned? (pg 88)
We can now summarize distributions of quantitative variables numerically. The five-number summary displays the min, Q1, median, Q3, and max. Measures of center include the mean and median. Measures of spread include the range, IQR, and standard deviation. We know which measures to use for symmetric distributions and skewed distributions.

What have we learned? (cont.) (pg 88)
We can also display distributions with boxplots. While histograms better show the shape of the distribution, boxplots reveal the center, middle 50%, and any outliers in the distribution. Boxplots are useful for comparing groups.