MA 115 Lecture 05 - Measures of Spread
Wednesday, September 6, 2017

Objectives: Introduce variance, standard deviation, range.

1. Measures of Spread

In Lecture 04, we looked at several measures of central tendency. In an attempt to describe the numbers in a set, we used the mean, the median, the mode, or the midrange as a single number that represents the entire data set. For example, if we know that the mean weight of a certain breed of dog is 17.5 pounds, then we know that we're talking about relatively small dogs. The mean doesn't tell us whether a dog of this breed weighing 5 pounds is unusual or not, however. I saw on the news that the median price of a house in the United States just went over $200,000. It's certainly clear that houses cost a lot now, but the median does not tell us how many houses go for under $100,000, for example.

Given a data set, knowing the mean, the median, the mode, or the midrange gives us a good start on understanding how big the numbers in the set are. If we would like a little bit more information, knowing how spread out the numbers are would go a long way. Today, we'll look at some measures of spread.

2. The range

I'm mostly going to focus on the mean, but I'd like to look at one of the other measures of central tendency first as an alternative example. We looked at the midrange last time; it is the number that is halfway between the smallest and largest values in the data set. Suppose we want to build a garage, but we have no idea whether we could afford it or not. To check this out, we ask for a number of quotes, and we get

\[ \$16{,}000 \quad \$22{,}000 \quad \$17{,}000 \quad \$19{,}000 \quad \$21{,}000 \quad \$20{,}000 \tag{1} \]

It's easy to compute the midrange: we just find the average of the smallest and largest numbers.

\[ \text{midrange} = \frac{\$16{,}000 + \$22{,}000}{2} = \$19{,}000. \tag{2} \]

The range is the difference between the smallest and largest numbers.

\[ \text{range} = \$22{,}000 - \$16{,}000 = \$6{,}000. \tag{3} \]
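As a rough check on the arithmetic in (2) and (3), here is a minimal Python sketch. It is not part of the original lecture; the variable names are illustrative, and the list holds the six garage quotes from (1) in dollars.

```python
# Garage quotes from (1), in dollars.
quotes = [16_000, 22_000, 17_000, 19_000, 21_000, 20_000]

smallest = min(quotes)
largest = max(quotes)

# Midrange: the average of the smallest and largest values.
midrange = (smallest + largest) / 2

# Range: the difference between the largest and smallest values.
data_range = largest - smallest

print(f"midrange = ${midrange:,.0f}")  # midrange = $19,000
print(f"range    = ${data_range:,}")   # range    = $6,000
```

Both numbers depend only on the two extreme quotes, so changing any of the four middle quotes would leave them unchanged.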
If someone were to ask us how much garages cost, we could say, "Well, according to my research, the midrange is about $19,000 and the range is about $6,000. If you would like this kind of garage, you could plan on spending fairly close to $19,000."

3. Deviations from the mean

The midrange and range are easy to compute, but they can easily be misleading, since each depends only on the smallest and largest values. Of the measures of central tendency, the mean lends itself well to mathematical analysis, and there is a lot we can do with it. Most of the rest of the class will be devoted to extending the information given by the mean. Right now, we're interested in measures of spread. Since the mean is, in some sense, in the middle, we're going to look at how far all the other numbers are from the mean, or in fancy statistical language, at the deviations from the mean. If \(x\) represents a number in a sample, its deviation from the mean is

\[ \text{deviation from the mean} = x - \bar{x}. \tag{4} \]

In a population, we have a different symbol for the mean, so the deviation from the mean in a population is

\[ \text{deviation from the mean} = x - \mu. \tag{5} \]

If we know the mean for a sample and all the deviations from the mean, we can actually figure out all the numbers in the data set. This can be a useful way of looking at the data set, but it's not really much simpler. We'd like a single number that describes how big the deviations are. One idea is to take the average of all the deviations from the mean. That is, take the mean of the deviations from the mean.

4. The variance

Whenever you take the average of the deviations from the mean, you will always get zero, because the positive and negative deviations cancel exactly. As a result, the mean deviation from the mean is not a useful measure of spread. We could, if we wanted to, just make all the deviations from the mean positive by using absolute values. We could then take the average of these numbers. That's a great idea, but I've never seen it used. I'm not exactly sure why, but I know that absolute values can be awkward mathematically, and I think the way we're going to solve this problem contains more information.
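To see concretely why the average deviation from the mean is always zero, here is a short Python sketch. The five-number sample is hypothetical, chosen only so that the mean is a whole number; it is not data from the lecture.

```python
# Hypothetical sample, chosen only for illustration.
sample = [4, 7, 9, 10, 15]

n = len(sample)
mean = sum(sample) / n  # 45 / 5 = 9.0

# Deviations from the mean: x - mean for each x.
deviations = [x - mean for x in sample]

# The positive and negative deviations cancel, so their average is zero.
mean_deviation = sum(deviations) / n

# The absolute-value alternative: average the sizes of the deviations.
mean_abs_deviation = sum(abs(d) for d in deviations) / n

print(deviations)          # [-5.0, -2.0, 0.0, 1.0, 6.0]
print(mean_deviation)      # 0.0
print(mean_abs_deviation)  # 2.8
```

With a sample whose mean is not a whole number, floating-point rounding can leave a tiny nonzero value in place of an exact 0.0, but the cancellation itself is exact in ordinary arithmetic.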
We need to get rid of the negative signs in the deviations from the mean somehow, and we're going to do that by squaring them. That may sound odd, but it ends up working quite well. To combine the deviations from the mean into a single number, we're going to do the following. We'll do it for a sample first, then for an entire population.

Compute the deviations from the mean

\[ x - \bar{x}. \tag{6} \]

Square the deviations to make them positive (or zero)

\[ (x - \bar{x})^2. \tag{7} \]

Then we're going to find the mean of the squared deviations, which means add them up and divide by how many there are

\[ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}. \tag{8} \]

Two things should look odd. First, the \(s^2\). This is the symbol for the sample variance. Second, we're dividing by \(n - 1\) instead of \(n\). Here's my explanation: The mean, on average, is one of the numbers in the data set, so one of the deviations from the mean is zero, on average, so we're really computing the average deviation for the other \(n - 1\) numbers. That may or may not make sense, but equation (8) is the standard formula for a sample variance.

Let's compute the variance for the sample in problem 6. I strongly suggest working in a table, as I am going to demonstrate. We'll do this a lot. When we see a \(\Sigma\) in a formula, that will mean that we're going to add up a column in the table. OK. Our table starts off as follows.

\[
\begin{array}{r|c|c}
x & x - \bar{x} & (x - \bar{x})^2 \\ \hline
8 & & \\
8 & & \\
12 & & \\
11 & & \\
11 & &
\end{array}
\tag{9}
\]
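First we compute the mean. This entails computing \(\sum x\), so we'll sum over the first column. After that, we divide by \(n\) to get \(\bar{x}\).

\[
\begin{array}{r|c|c}
x & x - \bar{x} & (x - \bar{x})^2 \\ \hline
8 & & \\
8 & & \\
12 & & \\
11 & & \\
11 & & \\ \hline
50 & &
\end{array}
\qquad \bar{x} = \frac{50}{5} = 10
\tag{10}
\]

Once we know the mean, we subtract it from all the \(x\)'s as formula (8) tells us.

\[
\begin{array}{r|r|c}
x & x - \bar{x} & (x - \bar{x})^2 \\ \hline
8 & -2 & \\
8 & -2 & \\
12 & 2 & \\
11 & 1 & \\
11 & 1 & \\ \hline
50 & &
\end{array}
\qquad \bar{x} = \frac{50}{5} = 10
\tag{11}
\]

Next, we square all the deviations from the mean to make them positive.

\[
\begin{array}{r|r|r}
x & x - \bar{x} & (x - \bar{x})^2 \\ \hline
8 & -2 & 4 \\
8 & -2 & 4 \\
12 & 2 & 4 \\
11 & 1 & 1 \\
11 & 1 & 1 \\ \hline
50 & &
\end{array}
\qquad \bar{x} = \frac{50}{5} = 10
\tag{12}
\]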
Finally, we sum over the squared deviations from the mean and divide by \(n - 1\), which is \(5 - 1 = 4\) in this case.

\[
\begin{array}{r|r|r}
x & x - \bar{x} & (x - \bar{x})^2 \\ \hline
8 & -2 & 4 \\
8 & -2 & 4 \\
12 & 2 & 4 \\
11 & 1 & 1 \\
11 & 1 & 1 \\ \hline
50 & & 14
\end{array}
\qquad \bar{x} = \frac{50}{5} = 10, \qquad s^2 = \frac{14}{4} = 3.5
\tag{13}
\]

The variance is \(s^2 = 3.5\).

5. The standard deviation

The variance is a measure of spread. If you get a larger number for \(s^2\), then this says that the data set is more spread out. Beyond that, however, it's hard to tell exactly what the number means. We'll end up using a different number a little more often, but even this other measure of spread needs a lot of mathematical analysis to understand it well.

Since we've squared the deviations to compute the variance, the variance is not quite in line with the sizes of the individual deviations (its units are the square of the data's units). To compensate, we'll mostly work with something called the standard deviation. The standard deviation, \(s\), is simply the square root of the variance.

\[ s = \sqrt{s^2}. \tag{14} \]

This formula looks a bit odd, but we compute the variance first, and then take the square root to get the standard deviation. For the set \(\{8, 8, 12, 11, 11\}\), we got a variance of \(s^2 = 3.5\). The standard deviation is the square root of this, so

\[ s = \sqrt{s^2} = \sqrt{3.5} = 1.870828693\ldots \approx 1.87. \tag{15} \]

6. Quiz 05, Part I of I

Find the standard deviation for the sample \(\{3, 5, 6, 6\}\).
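The table method from Sections 4 and 5 can be sketched in Python as follows. This is an illustrative sketch rather than part of the original lecture: the function name `sample_variance` is made up here, and the data is the worked example \(\{8, 8, 12, 11, 11\}\). The cross-check uses `statistics.variance` and `statistics.stdev` from Python's standard library, which also divide by \(n - 1\).

```python
import math
import statistics


def sample_variance(data):
    """Sample variance: sum of squared deviations divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    # One entry per row of the (x - mean)^2 column in the table.
    squared_deviations = [(x - mean) ** 2 for x in data]
    return sum(squared_deviations) / (n - 1)


# The worked example from Sections 4 and 5.
sample = [8, 8, 12, 11, 11]

s_squared = sample_variance(sample)  # 14 / 4
s = math.sqrt(s_squared)             # standard deviation

print(s_squared)    # 3.5
print(round(s, 2))  # 1.87

# Cross-check against the standard library's n - 1 versions.
assert math.isclose(s_squared, statistics.variance(sample))
assert math.isclose(s, statistics.stdev(sample))
```

Replacing the `n - 1` in the last line of the function with `n` would give the population version of the formula; the standard library's counterparts for that case are `statistics.pvariance` and `statistics.pstdev`.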
7. Population variance and standard deviation

In our discussion about the variance and standard deviation, we've only talked in terms of a sample. The variance and standard deviation for a population are computed in pretty much the same way. These are parameters, of course, and we will use the lowercase Greek letter sigma, \(\sigma\), instead of the \(s\). The population variance is

\[ \sigma^2 = \frac{\sum (x - \mu)^2}{N}. \tag{16} \]

The population mean \(\mu\) is used here, but the variance \(\sigma^2\) is still the average of the squared deviations from the mean. We don't have \(N - 1\) in the denominator, just \(N\). In practice, a population size \(N\) is going to be a really big number, so subtracting 1 doesn't matter much. The population standard deviation, \(\sigma\), is again just the square root of the variance.

\[ \sigma = \sqrt{\sigma^2}. \tag{17} \]

8. Homework 05

For problems 1-3, suppose the numbers came out a little differently in our garage survey, and we got

\[ \$12{,}000 \quad \$25{,}000 \quad \$24{,}000 \quad \$25{,}000 \quad \$24{,}000 \quad \$25{,}000 \tag{18} \]

1. What is the midrange?
2. What is the range?
3. Do the midrange and range describe the data set in problem 1 very well? (That is, are most of the numbers about the same as the midrange, and are most of the numbers as spread out as the range indicates?)

For problems 4 and 5, the questions are general ones about the deviation from the mean.

4. If the deviation from the mean for a number \(x\) is positive, is \(x\) larger than the mean or smaller?
5. If the deviation from the mean for \(x\) is negative, is \(x\) larger than the mean or smaller?
For problems 6-14, work with the sample \(\{2, 8, 6, 5, 7, 4, 3\}\). You should put your numbers into a table that looks like

\[
\begin{array}{r|c|c}
x & x - \bar{x} & (x - \bar{x})^2 \\ \hline
2 & & \\
8 & & \\
6 & & \\
5 & & \\
7 & & \\
4 & & \\
3 & &
\end{array}
\tag{19}
\]

6. Count the number of elements in this set. So \(n = \,?\)
7. Find \(\bar{x}\).
8. What is the deviation from the mean for \(x = 2\)?
9. What is the deviation from the mean for \(x = 8\)?
10. What is the deviation from the mean squared (i.e., what is \((x - \bar{x})^2\)) for \(x = 2\)?
11. What is the deviation from the mean squared for \(x = 8\)?
12. What did you get for \(\sum (x - \bar{x})^2\)?
13. What is \(s^2\)? Round your answer correctly to two decimal places.
14. What is the standard deviation? Round correctly to two decimal places.

For problems 15-17, work with the sample \(\{2, 5, 4, 5\}\).

15. Find \(\bar{x}\).
16. Find \(s^2\).
17. Find \(s\). Round your answer correctly to two decimal places.

18. In general, what is \(\sigma^2\) a symbol for? The population ____.
19. Is \(\sigma\) a statistic or a parameter?
20. Suppose a population variance is 3.05. What is \(\sigma\)? Round your answer correctly to two decimal places.

Answers follow.
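Quiz Answers

\[
\begin{array}{r|r|r}
x & x - \bar{x} & (x - \bar{x})^2 \\ \hline
3 & -2 & 4 \\
5 & 0 & 0 \\
6 & 1 & 1 \\
6 & 1 & 1 \\ \hline
20 & & 6
\end{array}
\qquad \bar{x} = \frac{20}{4} = 5, \qquad s^2 = \frac{6}{3} = 2, \qquad s = 1.41
\tag{20}
\]

That is, \(\bar{x} = 5\), \(s^2 = 2\), and \(s = 1.41\).

HW Answers

1) ($12,000 + $25,000)/2 = $18,500.
2) $25,000 - $12,000 = $13,000.
3) The $12,000 distorts the range and midrange. The prices are mostly $24,000 or $25,000, and they don't vary very much.
4) Positive deviations from the mean go with \(x\)'s that are larger than \(\bar{x}\).
5) Negative deviations go with \(x\)'s that are smaller than \(\bar{x}\).
6) \(n = 7\).
7) \(\bar{x} = 5\).
8) \(-3\) (corrected /3/14 at 1:30).
9) \(3\).
10) \(9\).
11) \(9\).
12) \(\sum (x - \bar{x})^2 = 28\).
13) \(s^2 = 4.67\). Note: You divide by 6!
14) \(s = 2.16\).
15) \(\bar{x} = 4\).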
16) \(s^2 = 2\). (Divide by 3.)
17) \(s = 1.41\).
18) \(\sigma^2\) is the population variance.
19) \(\sigma\), the population standard deviation, is a parameter.
20) \(\sigma = \sqrt{3.05} = 1.75\).