Describing Data: One Quantitative Variable

Transcription:

STAT 250, Dr. Kari Lock Morgan
Describing Data: One Quantitative Variable
SECTIONS 2.2, 2.3
One quantitative variable (2.2, 2.3)

The Big Picture
[Diagram labels: Population, Sampling, Sample, Statistical Inference, Descriptive Statistics]

Descriptive Statistics
In order to make sense of data, we need ways to summarize and visualize it.
Summarizing and visualizing variables and relationships between two variables is often known as descriptive statistics (also known as exploratory data analysis).
The type of summary statistics and visualization methods depends on the type of variable(s) being analyzed (categorical or quantitative).
Today: One quantitative variable

Question of the Day
How obese are Americans?

Obesity Trends* Among U.S. Adults
BRFSS, 1990, 2000, 2010
(*BMI ≥ 30, or about 30 lbs. overweight for a 5'4" person)
[Maps: 1990, 2000, 2010. Legend: No Data, <10%, 10%-14%, 15%-19%, 20%-24%, 25%-29%, ≥30%]
Source: Behavioral Risk Factor Surveillance System, CDC.

Obesity in America
Obesity is a HUGE problem in America.
We'll explore the topic of obesity in America with two different types of data, both collected by the CDC:
  Proportion of adults who are obese in each state
  BMI for a random sample of Americans

Obesity by State: 2013
Behavioral Risk Factor Surveillance System
http://www.cdc.gov/obesity/data/table-adults.html

Dotplot
In a dotplot, each case is represented by a dot and dots are stacked.
Easy way to see each case.
Minitab: Graph -> Dotplot -> One Y -> Simple

Histogram
The height of each bar corresponds to the number of cases within that range of the variable.
Example: 5 states have an obesity rate between 33.25 and 33.75.
Minitab: Graph -> Histogram -> Simple

National Health and Nutrition Examination Survey
[Histogram annotation: Long right tail]

Shape
[Figures: Symmetric, Right-Skewed, Left-Skewed]

BMI of Americans
The distribution of BMI for American adults is
  a) Symmetric
  b) Left-skewed
  c) Right-skewed

Notation
The sample size, the number of cases in the sample, is denoted by $n$.
We often let $x$ or $y$ stand for any variable, and $x_1, x_2, \ldots, x_n$ represent the $n$ values of the variable $x$.
Example: $x_1 = 32.4$, $x_2 = 28.4$, $x_3 = 26.8$, ...

Mean
The mean or average of the data values is
$$\text{mean} = \frac{\text{sum of all data values}}{\text{number of data values}} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x}{n}$$
Sample mean: $\bar{x}$
Population mean: $\mu$ ("mu")
Minitab: Stat -> Basic Statistics -> Display Descriptive Statistics

Mean
The average obesity rate across the 50 states is $\mu = 28.606$.
The average BMI for Americans in this sample is $\bar{x} = 24.887$.

Median
The median, $m$, is the middle value when the data are ordered. If there is an even number of values, the median is the average of the two middle values.
The median splits the data in half.
Minitab: Stat -> Basic Statistics -> Display Descriptive Statistics
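The mean and median definitions above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the data list is hypothetical apart from its first three values, which are the $x_1$, $x_2$, $x_3$ shown on the Notation slide.

```python
from statistics import mean, median


def sample_mean(values):
    """Sum of all data values divided by the number of data values."""
    return sum(values) / len(values)


def sample_median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical BMI values: only the first three appear on the slide.
bmi = [32.4, 28.4, 26.8, 22.1, 24.9, 30.3]

print("mean:", sample_mean(bmi))
print("median:", sample_median(bmi))

# The hand-written versions agree with the standard library.
assert abs(sample_mean(bmi) - mean(bmi)) < 1e-9
assert abs(sample_median(bmi) - median(bmi)) < 1e-9
```

Writing the two functions out by hand mirrors the slide definitions; in practice `statistics.mean` and `statistics.median` do the same job.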

Measures of Center
For symmetric distributions, the mean and the median will be about the same.
For skewed distributions, the mean will be pulled further in the direction of skewness.

Measures of Center
[Histogram: $m = 24.163$, $\bar{x} = 24.887$. The mean is pulled in the direction of skewness.]

Skewness and Center
A distribution is left-skewed. Which measure of center would you expect to be higher?
  a) Mean
  b) Median

Outlier
An outlier is an observed value that is notably distinct from the other values in a dataset.

Resistance
A statistic is resistant if it is relatively unaffected by extreme values.
The median is resistant while the mean is not.

                     Mean      Median
  With Outlier       105.22    101.0
  Without Outlier    102.56    100.5
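To see resistance in action, the short sketch below compares how far the mean and the median move when one extreme value is dropped. The data are hypothetical and are not the values behind the slide's table.

```python
from statistics import mean, median

# Hypothetical data set; the last value is an extreme observation.
with_outlier = [98, 99, 100, 101, 102, 103, 150]
without_outlier = with_outlier[:-1]

# How much does each statistic change when the outlier is removed?
mean_shift = mean(with_outlier) - mean(without_outlier)
median_shift = median(with_outlier) - median(without_outlier)

print(f"mean:   {mean(with_outlier):.2f} -> {mean(without_outlier):.2f}")
print(f"median: {median(with_outlier):.2f} -> {median(without_outlier):.2f}")
print(f"shift in mean = {mean_shift:.2f}, shift in median = {median_shift:.2f}")
```

The mean moves by several units while the median barely changes, which is the same pattern as in the slide's With Outlier / Without Outlier comparison.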

Outliers
When using statistics that are not resistant to outliers, stop and think about whether the outlier is a mistake.
If not, you have to decide whether the outlier is part of your population of interest or not.
Usually, for outliers that are not a mistake, it's best to run the analysis twice, once with the outlier(s) and once without, to see how much the outlier(s) are affecting the results.

Standard Deviation
The standard deviation for a quantitative variable measures the spread of the data.
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
Sample standard deviation: $s$
Population standard deviation: $\sigma$ ("sigma")
Minitab: Stat -> Basic Statistics -> Display Descriptive Statistics

Standard Deviation
The larger the standard deviation, the more variability there is in the data and the more spread out the data are.
The standard deviation gives a rough estimate of the typical distance of a data value from the mean.

Standard Deviation
[Figure: two histograms, one with s = 1 and one with s = 4, each on an axis from -15 to 15. Both of these distributions are bell-shaped.]

Two Ways of Measuring Obesity
  States as cases
  Individual people as cases
Differences?

95% Rule
If a distribution of data is approximately symmetric and bell-shaped, about 95% of the data should fall within two standard deviations of the mean.
For a population, about 95% of the data will be between $\mu - 2\sigma$ and $\mu + 2\sigma$.
For a sample, about 95% of the data will be between $\bar{x} - 2s$ and $\bar{x} + 2s$.
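The sample standard deviation formula can be written out step by step as a rough sketch. The function name and the data are hypothetical; the result is compared with `statistics.stdev`, which also divides by $n - 1$.

```python
import math
from statistics import stdev


def sample_sd(values):
    """Square root of the summed squared deviations from the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    squared_deviations = sum((x - x_bar) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))


# Hypothetical data, chosen only so the arithmetic is easy to follow.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print("s =", round(sample_sd(data), 3))

# The step-by-step version matches the standard library's sample standard deviation.
assert abs(sample_sd(data) - stdev(data)) < 1e-12
```

For population data the divisor would be $n$ instead of $n - 1$ (`statistics.pstdev`).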

The 95% Rule
Give an interval that will likely contain 95% of obesity rates of states.
$$(\bar{x} - 2s,\ \bar{x} + 2s) = (28.606 - 2 \cdot 3.377,\ 28.606 + 2 \cdot 3.377) = (21.852,\ 35.360)$$
States inside the interval: 47/50 = 94%

The 95% Rule
Could we use the same method to get an interval that will contain 95% of BMIs of American adults?
  a) Yes
  b) No

The 95% Rule
[Figure: two bell-shaped histograms, one with s = 1 (axis from -3 to 3) and one with s = 4 (axis from -15 to 15)]

The 95% Rule
[StatKey figure]
The standard deviation for hours of sleep per night is closest to
  a) ½
  b) 1
  c) 2
  d) 4
  e) I have no idea

z-score
The z-score for a data value, $x$, is
$$z = \frac{x - \bar{x}}{s}$$
for sample data, and
$$z = \frac{x - \mu}{\sigma}$$
for population data.
The z-score measures the number of standard deviations away from the mean.
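The interval for state obesity rates can be reproduced with a small helper. The mean 28.606 and standard deviation 3.377 come from the slide; the function names and the short list of rates used to demonstrate the counting step are hypothetical, since the 50 state values are not listed in these notes.

```python
def two_sd_interval(center, spread):
    """Interval reaching two standard deviations either side of the mean."""
    return (center - 2 * spread, center + 2 * spread)


def fraction_within(values, low, high):
    """Share of the values that fall inside [low, high]."""
    inside = sum(1 for v in values if low <= v <= high)
    return inside / len(values)


# Mean and standard deviation of the state obesity rates, as given on the slide.
low, high = two_sd_interval(28.606, 3.377)
print(f"interval: ({low:.3f}, {high:.3f})")

# Hypothetical rates, used only to show how the coverage check works.
example_rates = [20, 25, 30, 36]
print("fraction inside:", fraction_within(example_rates, low, high))
```

Applied to the real 50 state values, the same counting step is what gives the slide's 47/50 = 94%.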

z-score
A z-score puts values on a common scale.
A z-score is the number of standard deviations a value falls from the mean.
Challenge: For symmetric, bell-shaped distributions, 95% of all z-scores fall between what two values?

z-score
Which is better, an ACT score of 28 or a combined SAT score of 2100?
  ACT: $\mu = 21$, $\sigma = 5$
  SAT: $\mu = 1500$, $\sigma = 325$
Assume ACT and SAT scores have approximately bell-shaped distributions.
  a) ACT score of 28
  b) SAT score of 2100
  c) I don't know

Other Measures of Location
Maximum = largest data value
Minimum = smallest data value
Quartiles:
  $Q_1$ = median of the values below $m$.
  $Q_3$ = median of the values above $m$.

Five Number Summary
Five Number Summary: Min, $Q_1$, $m$, $Q_3$, Max
Each of the four intervals between consecutive values contains 25% of the data.
Minitab: Stat -> Basic Statistics -> Display Descriptive Statistics

Five Number Summary
  > summary(study_hours)
     Min. 1st Qu.  Median 3rd Qu.    Max.
     2.00   10.00   15.00   20.00   69.00
The distribution of number of hours spent studying each week is
  a) Symmetric
  b) Right-skewed
  c) Left-skewed
  d) Impossible to tell

Percentile
The Pth percentile is the value which is greater than P% of the data.
We already used z-scores to determine whether an SAT score of 2100 or an ACT score of 28 is better.
We could also have used percentiles:
  ACT score of 28: 91st percentile
  SAT score of 2100: 97th percentile
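The ACT versus SAT comparison can be worked through numerically, taking 21 and 1500 as the means and 5 and 325 as the standard deviations from the slide. The percentile step is an assumption beyond the slide: it treats both score distributions as exactly normal, whereas the slide only says they are approximately bell-shaped.

```python
from statistics import NormalDist


def z_score(x, mean, sd):
    """Number of standard deviations that x falls from the mean."""
    return (x - mean) / sd


z_act = z_score(28, 21, 5)
z_sat = z_score(2100, 1500, 325)

# Assumption: an exactly normal model, used to turn z-scores into percentiles.
standard_normal = NormalDist()
pct_act = 100 * standard_normal.cdf(z_act)
pct_sat = 100 * standard_normal.cdf(z_sat)

print(f"ACT 28:   z = {z_act:.2f}, about {pct_act:.1f}% of scores below")
print(f"SAT 2100: z = {z_sat:.2f}, about {pct_sat:.1f}% of scores below")
```

The SAT score has the larger z-score, so it sits further above its mean on the common scale, and the percentages come out close to the percentiles quoted on the Percentile slide.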

Five Number Summary
Five Number Summary: Min, $Q_1$, $m$, $Q_3$, Max
These are the 0th, 25th, 50th, 75th, and 100th percentiles, with 25% of the data between each consecutive pair.

Measures of Spread
Range = Max − Min
Interquartile Range (IQR) = $Q_3 - Q_1$
Is the range resistant to outliers?
  a) Yes
  b) No
Is the IQR resistant to outliers?
  a) Yes
  b) No

Comparing Statistics
Measures of Center:
  Mean (not resistant)
  Median (resistant)
Measures of Spread:
  Standard deviation (not resistant)
  IQR (resistant)
  Range (not resistant)
Most often, we use the mean and the standard deviation, because they are calculated based on all the data values, so use all the available information.

Boxplot
Minitab: Graph -> Boxplot -> One Y -> Simple
[Figure labels: Median, $Q_1$, $Q_3$, Middle 50% of data, Outlier]
Lines ("whiskers") extend from each quartile to the most extreme value that is not an outlier.
*For boxplots, outliers are defined as any point more than 1.5 IQRs beyond the quartiles (although you don't have to know that).

Boxplot
This boxplot shows a distribution that is
  a) Symmetric
  b) Left-skewed
  c) Right-skewed
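A minimal sketch of the five number summary, range, IQR, and the 1.5 IQR boxplot rule follows. It uses the slides' quartile definition (median of the values below and above $m$); statistical software often uses slightly different quartile conventions, so results may not match such tools exactly. The function names are invented for this example, and the data are hypothetical, chosen so that the summary reproduces the five numbers shown for study hours.

```python
from statistics import median


def five_number_summary(values):
    """Return (Min, Q1, m, Q3, Max), with quartiles as medians of the lower and upper halves."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # For odd n the middle value is left out of both halves.
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return (data[0], median(lower), median(data), median(upper), data[-1])


def boxplot_outliers(values):
    """Values more than 1.5 IQRs below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]


# Hypothetical weekly study hours (not the actual class data).
hours = [2, 8, 10, 12, 15, 15, 18, 20, 25, 69]

minimum, q1, m, q3, maximum = five_number_summary(hours)
print("five number summary:", (minimum, q1, m, q3, maximum))
print("range:", maximum - minimum)
print("IQR:", q3 - q1)
print("boxplot outliers:", boxplot_outliers(hours))
```

Removing the largest value changes the range a great deal but leaves the IQR nearly unchanged, which is one way to check the two resistance questions above.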

Summary: One Quantitative Variable
Summary Statistics
  Center: mean, median
  Spread: standard deviation, range, IQR
  5 number summary
  Percentiles
Visualization
  Dotplot
  Histogram
  Boxplot
Other concepts
  Shape: symmetric, skewed, bell-shaped
  Outliers, resistance
  z-scores

To Do
Read Sections 2.2 and 2.3
Do Homework 2.2, 2.3 (due Friday, 9/18)