NOTES: Chapter 4 Describing Data

Intro to Statistics, COLYER, Spring 2017
Student Name: ________


Section 4.1 ~ What is Average?

Objective: In this section you will understand the difference between the three most common measures of central tendency: mean, median, and mode. You will learn how each is affected by outliers and when it is appropriate to use a weighted mean.

Essential Questions:
1. In your own words, explain what the mean, median, and mode each represent.
2. How is the mean calculated?
3. How do you find the median of a data set with an odd number of values? With an even number of values?
4. How do you find the mode of a data set? Will there always be a mode?
5. Which measure of central tendency is affected by outliers?
6. What is the major advantage of using the median as opposed to the mean in some circumstances?
7. What is the rounding rule?
8. When is a weighted mean used? How is it calculated?

Mean:
The mean is most commonly referred to as the ________ and is found by the following formula:

________

Ex. ~ The following values represent the weights of five wrestlers:
188 162 190 150 176
Find the mean weight of these five wrestlers.

Median:
The median is the ________ number in a data set. To find the median, you must first arrange the values in ascending (or descending) order.

A data set that has an odd number of values will have exactly ________ value in the middle.
Ex. ~ Find the median of the data set: 4 3 6 3 6

A data set that has an even number of values will have ________ values in the middle, which means that the median will be in the middle of those two values (or the ________ of those two values).
Ex. ~ Find the median of the data set: 4 3 6 3 6 5
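One way to check your hand calculations for the mean and median is a short Python sketch like the one below. It assumes the usual definitions (mean = sum of all values divided by the number of values; median = middle value of the sorted data, or the value halfway between the two middle values) and uses only Python's standard `statistics` module. The variable names are illustrative.

```python
from statistics import median

# Wrestler weights from the example above.
weights = [188, 162, 190, 150, 176]

# Mean: sum of all values divided by the number of values.
mean_weight = sum(weights) / len(weights)

# median() sorts the data internally before picking the middle.
odd_set = [4, 3, 6, 3, 6]        # sorted: 3 3 4 6 6
even_set = [4, 3, 6, 3, 6, 5]    # sorted: 3 3 4 5 6 6

print(mean_weight)        # 173.2
print(median(odd_set))    # 4
print(median(even_set))   # 4.5
```

Notice that for the even-sized set the median (4.5) is not one of the data values.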

Mode:
The mode is the ________ value (or group of values) in a data set. A data set may have one mode, more than one mode, or no mode.
Ex. ~ Find the mode in the data set: 4 3 6 10 6 3 6
Ex. ~ Find the mode in the data set: 4 3 6 3 10 6
Ex. ~ Find the mode in the data set: 4 3 6 2 10

The mode is most commonly used for qualitative data at the ________ level, since neither the mean nor the median can be found for qualitative data.
Ex. ~ The brand of shoes each student is wearing in this class
Can't determine the mean because ________
Can't determine the median because ________
Can determine the mode because ________

Rounding Rule: ________

Example 1: Eight grocery stores sell the PR energy bar for the following prices:
$1.09 $1.29 $1.29 $1.35 $1.39 $1.49 $1.59 $1.79
Find the mean, median, and mode for these prices.
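The three mode cases (one mode, more than one mode, no mode) can be sketched in Python. The helper `modes` below is a hypothetical function written for these notes, not a library call. It assumes that a data set in which every distinct value occurs equally often is reported as having no mode, which matches the convention used in this section.

```python
from collections import Counter
from statistics import mean, median

def modes(values):
    """Return the most frequent value(s); [] if all distinct values tie."""
    counts = Counter(values)
    highest = max(counts.values())
    winners = sorted(v for v, c in counts.items() if c == highest)
    # Assumption: when every distinct value ties, report "no mode".
    if len(counts) > 1 and len(winners) == len(counts):
        return []
    return winners

print(modes([4, 3, 6, 10, 6, 3, 6]))   # [6]
print(modes([4, 3, 6, 3, 10, 6]))      # [3, 6]
print(modes([4, 3, 6, 2, 10]))         # []

# Energy bar prices from Example 1, in dollars.
prices = [1.09, 1.29, 1.29, 1.35, 1.39, 1.49, 1.59, 1.79]
print(round(mean(prices), 2), round(median(prices), 2), modes(prices))
# 1.41 1.37 [1.29]
```

The results are rounded to two decimal places here only because the prices are in dollars and cents; apply the rounding rule from your notes when you write your own answers.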

Outlier: ________

An outlier does not affect the median or mode because ________
An outlier does affect the mean because ________

Ex. ~ The following values represent the contract offers received by five graduating college seniors in the NBA (zero means that they didn't receive an offer):
$0 $0 $0 $0 $3,500,000

Comparison of mean, median, and mode

Measure | Definition | Takes every value into account? | Affected by outliers? | Advantages
MEAN | (sum of all values) / (total number of values) | Yes | Yes | Commonly understood as the average; works well with many statistical methods
MEDIAN | middle value | No (except for the ordering of every number) | No (since an outlier won't fall in the middle of a data set) | When there are outliers, the median may be more representative of an average than the mean
MODE | most frequent value | No | No (since an outlier won't occur most frequently) | Most appropriate for qualitative data at the nominal level

Example 2: A track coach wants to determine an appropriate heart rate for her athletes during their workouts. She chooses five of her best runners and asks them to wear heart monitors during a workout. In the middle of the workout, she reads the following heart rates for the five athletes: 130, 135, 140, 145, and 325. Which is a better measure of the average in this case, the mean or the median? Why?
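To see how a single outlier pulls the mean but leaves the median alone, here is a small Python sketch using the two data sets above. It uses only the standard `statistics` module; the variable names are illustrative.

```python
from statistics import mean, median

# NBA contract offers: four zeros and one very large value.
offers = [0, 0, 0, 0, 3_500_000]
offers_mean = mean(offers)       # 700,000: no player actually got this
offers_median = median(offers)   # 0: what the typical player got

# Heart rates from Example 2; 325 is the outlier.
heart_rates = [130, 135, 140, 145, 325]
with_outlier = mean(heart_rates)           # 175
without_outlier = mean(heart_rates[:-1])   # 137.5 once 325 is dropped
middle = median(heart_rates)               # 140 either way it is close

print(offers_mean, offers_median)
print(with_outlier, without_outlier, middle)
```

Dropping the outlier moves the mean from 175 to 137.5, while the median (140) already sits close to the four ordinary readings.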

Weighted mean: ________

Formula for weighted mean:

________

Ex. ~ Suppose your course grade is based on four tests and one final exam. Each test counts as 15% of your final grade and the final exam counts as 40% of your final grade. Your test scores are 75, 80, 84, and 88 and your final exam score is a 96. What is your final grade?

Example 3: Each quarter grade is calculated with an 80% weight on tests and quizzes and a 20% weight on class work, homework, and class participation. Furthermore, your course grade is calculated with a 40% weight on quarter 1, a 40% weight on quarter 2, and a 20% weight on the final exam. Suppose that Joe's quiz/test average in quarter 1 was an 86% and his homework/class work/class participation grade was a 95%. In quarter 2 his quiz/test average was a 92% and his homework/class work/class participation grade was a 90%, and he scored an 80% on his final exam. What was Joe's final grade?

Example 4: Each quarter grade is calculated with an 80% weight on tests and quizzes and a 20% weight on class work, homework, and class participation. Furthermore, your course grade is calculated with a 40% weight on quarter 3, a 40% weight on quarter 4, and a 20% weight on the final exam. If Amy got an 88% for quarter 3 and a 93% for quarter 4, what would she need to get on the final exam in order to receive at least a 90% for her final grade?
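The grade problems above can be checked with a short Python sketch. It assumes the standard definition of a weighted mean (the sum of each value times its weight, divided by the sum of the weights). The function name `weighted_mean` and the variable names are made up for illustration, and weights are entered as whole-number percents.

```python
def weighted_mean(values, weights):
    """Sum of (value * weight) divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Ex.: four tests at 15% each, final exam at 40%.
course_grade = weighted_mean([75, 80, 84, 88, 96], [15, 15, 15, 15, 40])

# Example 3: each quarter is 80% tests/quizzes and 20% class work.
joe_q1 = weighted_mean([86, 95], [80, 20])
joe_q2 = weighted_mean([92, 90], [80, 20])
joe_final = weighted_mean([joe_q1, joe_q2, 80], [40, 40, 20])

# Example 4: solve 40*88 + 40*93 + 20*x = 90*100 for the exam score x.
amy_needed = (90 * 100 - 40 * 88 - 40 * 93) / 20

print(course_grade)          # 87.45
print(round(joe_final, 2))   # 87.76
print(amy_needed)            # 88.0
```

Example 4 runs the formula backwards: the two quarter grades already contribute 72.4 points, so the final exam has to supply the remaining 17.6 points out of its 20% share.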

Section 4.2 ~ Shapes of Distributions

Objective: In this section, you will learn how to describe the general shape of a distribution in terms of its number of modes, skewness, and variation.

Overview
In chapter 3 you learned how to graph distributions using a variety of methods, including bar graphs, histograms, and line charts, as well as many other types of graphs and charts. Although these visuals give a very detailed picture of the data set as a whole, sometimes it's more useful to get a general idea of how the data are distributed by simply describing the shape that the distribution forms. This general shape is represented by a smooth curve that is ultimately an outline of the detailed visual display (bar graph, histogram, line chart, etc.).

Here are 3 examples of how data can be generalized by using smooth curves:

Number of Modes and the Shape of a Distribution
Recall that the mode or modes of a data set are the values that occur most frequently. One simple way to describe the shape of a distribution is: ________

When all the data values in a data set have the same frequency, e.g. {3, 6, 4, 9, 5}, there is no mode and therefore there are no peaks in the distribution. This type of distribution is known as: ________
  o Here is an example of a uniform distribution:

When there is one mode in a data set, there is ________ peak and the distribution is therefore called a ________ or ________ distribution.
  o Here is an example of a single-peaked or unimodal distribution:

Any peak in a distribution is considered a mode, even though the peaks may not be the same height and may therefore represent different frequencies.

When there are two modes in a data set, there are ________ peaks and the distribution is therefore called a ________ distribution.
  o Here is an example of a bimodal distribution:

Notice that the peaks are different heights; this means that the data values had different frequencies, but nonetheless they are both considered modes.

When there are three modes in a data set, there are ________ peaks and the distribution is therefore called a ________ distribution.
  o Here is an example of a trimodal distribution:

Example 1 ~ How many modes would you expect for each of the following distributions? Why? Make a rough sketch for each distribution, with clearly labeled axes.
a. Heights of 1,000 randomly selected adult women.
b. Hours spent watching football on TV in January for 1,000 randomly selected adult Americans.

c. Weekly sales throughout the year at a retail clothing store for children.
d. The number of people with particular last digits (0 through 9) in their Social Security numbers.

Symmetry or Skewness
In addition to describing a distribution by the number of its modes, you can describe it based on its symmetry, or lack thereof.

A distribution is symmetric if: ________
  o Here are 3 examples of symmetric distributions:

This symmetric distribution that is bell shaped with a single peak is known as a ________. It is so important that chapter 5 will be devoted to it.

If a distribution is not symmetric, then it is ________.

A distribution is considered to be left-skewed (or negatively skewed) when: ________
  o Here is an example of a distribution that is left-skewed:

Since the values are more spread out on the left side, it's left-skewed.
The mean and median of a distribution that is left-skewed will be less than the mode.

A distribution is considered to be right-skewed (or positively skewed) when: ________
  o Here is an example of a distribution that is right-skewed:

Since the values are more spread out on the right side, it's right-skewed.
The mean and median of a distribution that is right-skewed will be greater than the mode (as illustrated in the diagram on the previous page).

In a distribution that is symmetric, the mean, median, and mode will be equal:

Example 2 ~ For each of the following situations, state whether you expect the distribution to be symmetric, left-skewed, or right-skewed. Explain.
a. Heights of a sample of 100 women.
b. Family income in the United States.
c. Speeds of cars on a road where a visible patrol car is using radar to detect speeders.
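The relationship between skewness and the three averages can be illustrated with a small Python sketch. The data sets below are made up purely for illustration (they do not come from the textbook). The right-skewed set has a long tail of large values, and the left-skewed set is its mirror image.

```python
from statistics import mean, median, mode

# Hypothetical data: most values are small, a few are large (tail on the right).
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]

# Mirror image of the same data (tail on the left).
left_skewed = [15 - x for x in right_skewed]

for name, data in [("right-skewed", right_skewed), ("left-skewed", left_skewed)]:
    print(name, "mode:", mode(data), "median:", median(data), "mean:", mean(data))
```

For the right-skewed set the order is mode (2) < median (3) < mean (4.5); for the left-skewed set the order is reversed, with mean (10.5) < median (12) < mode (13). The mean is pulled furthest toward the tail in both cases.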

Variation
The last way to describe a distribution in general is by its variation.

The variation of a distribution is: ________

There are 3 types of variation: ________, ________, and ________.

Here are examples of each type of variation:

Example 3 ~ How would you expect the variation to differ between times in the Olympic marathon and times in the New York City Marathon? Explain. Sketch a graph for each situation. Tell what type of variation each situation has.


Section 4.3 ~ Measures of Variation

Objective: In this section you will be able to understand and interpret the following common measures of variation: range, five-number summary, and standard deviation.

Essential Questions:
1. What does a five-number summary consist of? Briefly explain how each element is found.
2. What type of visual display can be used to represent a five-number summary? How is it constructed?
3. What is a percentile?

Recall that the variation of a data set describes how widely data are spread out about the center of the data set (low, moderate, or high).

Range: ________

Ex. ~ The following data represent the wait times (in minutes) for 11 customers at two different banks:
Big Bank: 4.1 5.2 5.6 6.2 6.7 7.2 7.7 7.7 8.5 9.3 11.0
Best Bank: 6.6 6.7 6.7 6.9 7.1 7.2 7.3 7.4 7.7 7.8 7.8
The range for Big Bank is: ________
The range for Best Bank is: ________

Example 1: Consider the following data sets, which represent the quiz scores for nine students. Which set has the greater range? Would you also say that this set has the greater variation?
Quiz 1: 1 10 10 10 10 10 10 10 10
Quiz 2: 2 3 4 5 6 7 8 9 10
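A quick Python sketch for checking the ranges above, assuming the usual definition (range = highest value minus lowest value). The helper name `value_range` is made up for these notes.

```python
def value_range(values):
    """Range = highest value minus lowest value."""
    return max(values) - min(values)

big_bank = [4.1, 5.2, 5.6, 6.2, 6.7, 7.2, 7.7, 7.7, 8.5, 9.3, 11.0]
best_bank = [6.6, 6.7, 6.7, 6.9, 7.1, 7.2, 7.3, 7.4, 7.7, 7.8, 7.8]
quiz_1 = [1, 10, 10, 10, 10, 10, 10, 10, 10]
quiz_2 = [2, 3, 4, 5, 6, 7, 8, 9, 10]

# Rounded to one decimal place to hide floating-point noise.
print(round(value_range(big_bank), 1))    # 6.9
print(round(value_range(best_bank), 1))   # 1.2
print(value_range(quiz_1), value_range(quiz_2))   # 9 8
```

Quiz 1 has the larger range only because of a single low score; eight of its nine scores are identical, which is why the range by itself can be a misleading measure of variation.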

Quartiles: ________

Lower quartile (Q1): ________
Odd data set: 4.1 5.2 5.6 6.2 6.7 7.2 7.7 7.7 8.5 9.3 11.0
Even data set: 6.6 6.7 6.7 6.9 7.1 7.2 7.3 7.4 7.7 7.8

Middle quartile (Q2): ________
Odd data set: 4.1 5.2 5.6 6.2 6.7 7.2 7.7 7.7 8.5 9.3 11.0
Even data set: 6.6 6.7 6.7 6.9 7.1 7.2 7.3 7.4 7.7 7.8

Upper quartile (Q3): ________
Odd data set: 4.1 5.2 5.6 6.2 6.7 7.2 7.7 7.7 8.5 9.3 11.0
Even data set: 6.6 6.7 6.7 6.9 7.1 7.2 7.3 7.4 7.7 7.8

Five-number summary: ________

Ex. ~ Write the five-number summaries for the waiting times for Big Bank and Best Bank.

Boxplot: ________
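A five-number summary can be sketched in Python as shown below. This is only one convention, and it is an assumption here: the lower and upper quartiles are taken as the medians of the lower and upper halves of the sorted data, leaving the middle value out of both halves when the data set has an odd number of values. Other textbooks and software use slightly different quartile rules. The function name `five_number_summary` is illustrative, and the function expects at least two data values.

```python
from statistics import median

def five_number_summary(values):
    """Return (low, Q1, median, Q3, high) using the median-of-halves rule."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]     # values below the median position
    upper = data[-half:]    # values above the median position
    return (data[0], median(lower), median(data), median(upper), data[-1])

# Waiting times (in minutes) for the two banks in this section.
big_bank = [4.1, 5.2, 5.6, 6.2, 6.7, 7.2, 7.7, 7.7, 8.5, 9.3, 11.0]
best_bank = [6.6, 6.7, 6.7, 6.9, 7.1, 7.2, 7.3, 7.4, 7.7, 7.8, 7.8]

print(five_number_summary(big_bank))    # (4.1, 5.6, 7.2, 8.5, 11.0)
print(five_number_summary(best_bank))   # (6.6, 6.7, 7.2, 7.7, 7.8)
```

Both banks have the same median wait (7.2 minutes), but the distance from Q1 to Q3 is much wider for Big Bank, which is the kind of difference a boxplot makes visible.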

Steps to drawing a boxplot:
1. ________
2. ________
3. ________
4. ________

Example 2: A bakery collected the following data about the number of loaves of fresh bread sold on each of 10 business days. Write a five-number summary and then make a boxplot to represent this data. State any skewness.
43 39 17 38 50 42 34 8 39 43

Percentile: ________

nth percentile: ________

Ex. ~ The 35th percentile of a data set is the value that separates the bottom 35% of data values from the top 65%.
If exam results stated that your exam score is in the 35th percentile, that means that you scored ________ than 35% of the people that took the exam and ________ than 65% of the people that took the exam.

If a data value lies between two percentiles, it is often said to lie in the lower of the two percentiles.
Ex. ~ If you score higher than 84.7% of all people taking a college entrance examination, it is said that you scored in the 84th percentile.

Percentile Formula:

________

Ex. ~ What percentile is the lower value, Q1, Q2, Q3, and the upper value for Big Bank?

Example 3: Refer to table 4.4 on p.168 to answer the following questions:
a. What is the percentile for the data value of 592.79 ng/ml for smokers?
b. What is the percentile for the data value of 61.33 ng/ml for nonsmokers?
c. What values mark the 36th percentile in the data set?
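The percentile of a data value can be sketched in Python as follows. The formula used here is an assumption, since it is left blank above: percentile = (number of values less than the data value) / (total number of values) x 100, rounded down to the lower percentile as described at the top of this page. The function name `percentile_of` is made up for illustration, and the quartile values 5.6, 7.2, and 8.5 assume the median-of-halves rule for quartiles.

```python
def percentile_of(value, data):
    """Whole-number percentile: percent of values strictly below `value`, rounded down."""
    below = sum(1 for x in data if x < value)
    return (100 * below) // len(data)

big_bank = [4.1, 5.2, 5.6, 6.2, 6.7, 7.2, 7.7, 7.7, 8.5, 9.3, 11.0]

# Low value, Q1, Q2, Q3, and high value for Big Bank.
for value in (4.1, 5.6, 7.2, 8.5, 11.0):
    print(value, percentile_of(value, big_bank))
# 4.1 0
# 5.6 18
# 7.2 45
# 8.5 72
# 11.0 90
```

With only 11 data values the quartiles do not land exactly on the 25th, 50th, and 75th percentiles; the match gets closer as the data set gets larger.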

Standard deviation: ________

Steps to calculate the standard deviation:
1. ________
2. ________
3. ________
4. ________
5. ________
6. ________

Standard Deviation Formula:

$$\text{standard deviation} = \sqrt{\frac{\text{sum of (deviations from the mean)}^2}{\text{total number of data values} - 1}}$$

OR

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

Example 4: Find the standard deviation of the following data set.
1 5 6 8 11

Range rule of thumb: ________

Ex. ~ Estimate the standard deviation of the data set used in the last example.
1 5 6 8 11

If you know the standard deviation of a data set, you can use the range rule of thumb to estimate the high and low values of a data set:

________
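The standard deviation formula can be followed step by step in a short Python sketch. It assumes the formula with "total number of data values - 1" in the denominator, and it assumes the usual statement of the range rule of thumb (standard deviation is roughly range / 4). The function name `sample_std` is illustrative.

```python
from math import sqrt

def sample_std(values):
    """Standard deviation with (number of values - 1) in the denominator."""
    n = len(values)
    mean = sum(values) / n                       # find the mean
    deviations = [x - mean for x in values]      # deviation of each value from the mean
    squared = [d ** 2 for d in deviations]       # square each deviation
    return sqrt(sum(squared) / (n - 1))          # add, divide by n - 1, take the square root

data = [1, 5, 6, 8, 11]
s = sample_std(data)

# Range rule of thumb (assumed form): standard deviation is about range / 4.
estimate = (max(data) - min(data)) / 4

print(round(s, 2))   # 3.7
print(estimate)      # 2.5
```

The rule of thumb gives 2.5 against an actual value of about 3.70, a reminder that it is only a rough estimate, especially for a data set this small.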

Example 5: Use the range rule of thumb to estimate the standard deviations for the waiting times at Big Bank and Best Bank. Compare the estimates to the actual values in example 4.

Example 6: Studies of the gas mileage of a BMW under varying driving conditions show that it gets a mean of 22 miles per gallon with a standard deviation of 3 miles per gallon. Estimate the minimum and maximum typical gas mileage amounts that you can expect under ordinary driving conditions.

Notes on Notation:
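Both directions of the range rule of thumb can be sketched in Python. The forms used here are assumptions based on the usual statement of the rule: standard deviation is roughly range / 4, and the typical low and high values are roughly the mean minus or plus 2 standard deviations. The function names are made up for illustration.

```python
def estimate_std(values):
    """Range rule of thumb (assumed form): standard deviation is about range / 4."""
    return (max(values) - min(values)) / 4

def typical_low_high(mean, std):
    """Assumed form: low is about mean - 2*std, high is about mean + 2*std."""
    return mean - 2 * std, mean + 2 * std

big_bank = [4.1, 5.2, 5.6, 6.2, 6.7, 7.2, 7.7, 7.7, 8.5, 9.3, 11.0]
best_bank = [6.6, 6.7, 6.7, 6.9, 7.1, 7.2, 7.3, 7.4, 7.7, 7.8, 7.8]

# Example 5: estimated standard deviations, rounded to one decimal place.
print(round(estimate_std(big_bank), 1))    # 1.7
print(round(estimate_std(best_bank), 1))   # 0.3

# Example 6: mean of 22 mpg with a standard deviation of 3 mpg.
print(typical_low_high(22, 3))             # (16, 28)
```

Under these assumptions the BMW's typical gas mileage falls between about 16 and 28 miles per gallon.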