Standard Deviation

Table of Contents

1 Motivation
2 Standard Deviation
3 Computing Standard Deviation
4 Calculator Instructions
5 Homework Problems
    5.1 Instructions
    5.2 Problems
6 Document License (CC BY-ND 4.0)
    6.1 License Links

1 Motivation

The mean (expected value) is a measure of central tendency: it measures the position of the center of the data. The mean summarizes data by telling us its center. The mean gives us one piece of information about the data, but is it enough information?

Consider the following two frequency distributions. In each distribution, x denotes an outcome of an experiment.

Distribution A

    x    Frequency
    1    10
    9    10

\[ \text{Mean} = \frac{1 \cdot 10 + 9 \cdot 10}{10 + 10} = 5 \]
Distribution B

    x    Frequency
    4    10
    6    10

\[ \text{Mean} = \frac{4 \cdot 10 + 6 \cdot 10}{10 + 10} = 5 \]

Both distributions have the same mean (5), and both distributions have the same number of outcomes (10 + 10 = 20). However, in distribution A, the distance from each outcome to the mean is 4 ($5 - 1 = 4$ and $9 - 5 = 4$), while in distribution B, the distance from each outcome to the mean is 1 ($5 - 4 = 1$ and $6 - 5 = 1$). The mean alone is therefore not an adequate summary of the data.

Distributions A and B hint at another quantity that we can use to summarize data: the average distance between an outcome and the mean. This average distance is 4 for distribution A, and it is 1 for distribution B. The outcomes of distribution A are spread out farther from the mean than the outcomes of distribution B. In other words, the outcomes of distribution A are more dispersed around the mean than the outcomes of distribution B. The average distance between an outcome and the mean is called a measure of dispersion.

There are different ways of computing the average distance between an outcome and the mean. For distributions A and B, we computed the mean distance between an outcome and the mean. For technical reasons, however, the mean distance is not the average that is usually used as a measure of dispersion. The average that is usually computed is the standard deviation.

2 Standard Deviation

The standard deviation measures the (weighted) average distance between a value of a random variable and the mean. It tells us how much the values of a random variable deviate from the mean on average, that is, how much the values are spread out (dispersed) around the mean. The standard deviation is the standard measure of dispersion.

Three definitions of standard deviation are given below. These three definitions are equivalent to each other. Which definition is used to compute the standard deviation depends on the form in which the data is given.
The symbol µ (pronounced "mew") denotes the mean and the expected value. The symbol σ (pronounced "sigma") denotes the standard deviation.
Definition 1: The standard deviation of numbers $x_1, x_2, \ldots, x_n$ which have mean $\mu$ is

\[ \sigma = \sqrt{\frac{(x_1 - \mu)^2 + (x_2 - \mu)^2 + \cdots + (x_n - \mu)^2}{n}} \]

Definition 2: Suppose X is a discrete random variable with possible values $x_1, x_2, \ldots, x_n$ and suppose X has the following frequency distribution (x denotes a value of X):

    x        Frequency
    x_1      f_1
    x_2      f_2
    ...      ...
    x_n      f_n

Let $m = f_1 + f_2 + \cdots + f_n$ and $P(X = x_i) = f_i / m$ (the relative frequency of $x_i$) for $1 \le i \le n$. If $E(X) = \mu$, then the standard deviation of X is

\[ \sigma = \sqrt{\frac{f_1 (x_1 - \mu)^2 + f_2 (x_2 - \mu)^2 + \cdots + f_n (x_n - \mu)^2}{f_1 + f_2 + \cdots + f_n}} \]

Definition 3: Suppose X is a discrete random variable with possible values $x_1, x_2, \ldots, x_n$ and suppose X has the following probability distribution (x denotes a value of X):

    x        P(X = x)
    x_1      p_1
    x_2      p_2
    ...      ...
    x_n      p_n

If $E(X) = \mu$, then the standard deviation of X is

\[ \sigma = \sqrt{p_1 (x_1 - \mu)^2 + p_2 (x_2 - \mu)^2 + \cdots + p_n (x_n - \mu)^2} \]

Remember that if σ is small, then the values of a random variable are close to the mean on average. If σ is large, then the values of a random variable are far away from the mean on average.
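As a rough illustration of the three definitions, the sketch below writes each one as a small Python function and applies Definition 2 to distributions A and B from the Motivation section. The function and variable names are hypothetical and chosen only for this example; they are not part of the original notes.

```python
import math

def sd_from_values(values):
    """Definition 1: standard deviation of a plain list of numbers."""
    n = len(values)
    mu = sum(values) / n
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)

def sd_from_frequencies(freq):
    """Definition 2: freq maps each value x_i to its frequency f_i."""
    m = sum(freq.values())
    mu = sum(f * x for x, f in freq.items()) / m
    return math.sqrt(sum(f * (x - mu) ** 2 for x, f in freq.items()) / m)

def sd_from_probabilities(prob):
    """Definition 3: prob maps each value x_i to its probability p_i."""
    mu = sum(p * x for x, p in prob.items())
    return math.sqrt(sum(p * (x - mu) ** 2 for x, p in prob.items()))

# Distributions A and B from the Motivation section (value -> frequency).
dist_a = {1: 10, 9: 10}
dist_b = {4: 10, 6: 10}

sigma_a = sd_from_frequencies(dist_a)
sigma_b = sd_from_frequencies(dist_b)
print(sigma_a, sigma_b)  # 4.0 1.0
```

Both distributions have mean 5, but σ is 4 for distribution A and 1 for distribution B, matching the distances found in the Motivation section. Writing distribution A as a list of twenty numbers (Definition 1) or as probabilities 0.5 and 0.5 (Definition 3) gives the same σ, which is one way to see that the definitions agree.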
3 Computing Standard Deviation

You should have noticed the complexity of the standard deviation formulas in the three definitions. This complexity makes it difficult to compute the standard deviation by hand. To avoid this difficulty, we will compute the standard deviation using the statistical functions built into our calculators.

Example: Suppose X is a discrete random variable which has the following frequency distribution (x denotes a value of X):

    x    Frequency
    0    4
    1    7
    2    19
    3    16
    4    8
    5    2

Compute µ and σ.

Notes: When you are given the frequency distribution of a random variable, you may assume that P(X = x) is the relative frequency of x. In addition, the instructions given below are summarized in the Calculator Instructions section of this document.

The first step is to put the calculator into the proper mode. Since the data in this problem uses only a single variable x, we need to put the calculator into 1-variable statistics mode. This is achieved by pressing the following keys on the calculator:

    2nd DATA ENTER

The next step is to put the data into the calculator. Press DATA to access the data entry function of the calculator. Since the first x in the frequency distribution table is 0, press 0 to get X1 = 0 on the screen of the calculator. Now press [arrow key] and then 4 to get FRQ = 4 on the screen of the calculator. You have now told the calculator that data value 0 has frequency 4.

Enter the remaining data values and their frequencies in the same way. Watch the screen carefully as you press the keys.

    Data value 1 with frequency 7:     1  7
    Data value 2 with frequency 19:    2  1 9
    Data value 3 with frequency 16:    3  1 6
    Data value 4 with frequency 8:     4  8
    Data value 5 with frequency 2:     5  2

The data values and their frequencies have now been entered into the calculator. The final step is to get the calculator to do the tedious work of computing the expected value (µ) and the standard deviation (σ). Press STATVAR to tell the calculator to do the computations. All we need to do now is retrieve the expected value and the standard deviation from the calculator.

Press [arrow key] to see the expected value. The calculator uses the symbol x̄ for the expected value, but we use the symbol µ. Your screen should show that µ = 2.410714286.

Press [arrow key] to see the standard deviation. The calculator uses the symbol σx for the standard deviation, but we use the symbol σ. Your screen should show that σ = 1.191889043.

If your calculator does not show the correct µ and σ, then it is likely that you did not enter the data values and frequencies correctly. Press DATA and then use the arrow keys to check each data value and its frequency. Change any numbers that are incorrect and press STATVAR to recompute µ and σ.

If you cannot obtain the correct µ and σ even after attempting to correct the data values and their frequencies, then you can clear out all of the data by pressing the following sequence of keys. Watch the screen carefully as you do this.

    2nd DATA ENTER

Now press DATA and enter the data values and their frequencies by following the instructions given above. Finally, press STATVAR to recompute µ and σ.
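The calculator results for this example can be cross-checked away from the calculator. The following is a minimal Python sketch that applies Definition 2 directly to the frequency table; the variable names are illustrative assumptions, and the code is an independent check rather than a description of what the calculator does internally.

```python
import math

# Frequency distribution from the example (value -> frequency).
freq = {0: 4, 1: 7, 2: 19, 3: 16, 4: 8, 5: 2}

m = sum(freq.values())                        # total number of outcomes
mu = sum(f * x for x, f in freq.items()) / m  # expected value
sigma = math.sqrt(sum(f * (x - mu) ** 2 for x, f in freq.items()) / m)

print(round(mu, 9), round(sigma, 9))
```

The printed values should agree with the calculator's µ and σ to the number of digits the calculator displays.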
Example: Compute the mean and the standard deviation of the following scores from a math exam: 72, 54, 95, 74, 50, 74, 64, 66, 87, 88. Round each statistic to the nearest 1/10th.

The first step is to clear out the data values and frequencies from the previous example. Press the following sequence of keys to clear out the old data values and their frequencies. Watch the screen carefully as you do this.

    2nd DATA ENTER

Now press DATA and enter the data values and their frequencies. In this case, each frequency will be 1. Once the data values and frequencies are entered correctly, press STATVAR to compute µ and σ. Retrieve µ and σ by following the instructions given in the previous example. You should get µ = 72.4 and σ = 13.87227451 if you followed the steps correctly. Since each statistic must be rounded to the nearest 1/10th, the final results are µ = 72.4 and σ = 13.9.

Example: A group of people were surveyed and asked: "Which of the numbers 1, 2, 3, and 4 is your favorite?" Let X be a random variable that denotes the favorite number of a person responding to the survey. The probability distribution of X is given below. Compute µ and σ.

    x    P(X = x)
    1    0.24
    2    0.15
    3    0.41
    4    0.20

Since we cannot put probabilities into the calculator, we need to convert the probability distribution to a frequency distribution. To do this, we will assume that 100 people responded to the survey. Since P(X = 1) = 24%, 24% of the 100 people (24 people) said that 1 was their favorite. The other frequencies are determined in a similar way, and we have the following frequency distribution:

    x    Frequency
    1    24
    2    15
    3    41
    4    20

Now follow the procedure given in the first example. You should get µ = 2.57 and σ = 1.060707311.
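The two examples above can be checked with a short Python sketch. It also illustrates why assuming 100 respondents is harmless: the frequencies are the probabilities scaled by a constant, and that constant cancels in Definition 2. The helper name `mean_and_sd` is hypothetical and used only for this illustration.

```python
import math
from collections import Counter

def mean_and_sd(weights):
    """weights maps each value to a frequency or to a probability."""
    total = sum(weights.values())
    mu = sum(w * x for x, w in weights.items()) / total
    var = sum(w * (x - mu) ** 2 for x, w in weights.items()) / total
    return mu, math.sqrt(var)

# Exam scores: Counter turns the list into value -> frequency (74 occurs twice).
scores = [72, 54, 95, 74, 50, 74, 64, 66, 87, 88]
mu_scores, sd_scores = mean_and_sd(Counter(scores))

# Favorite-number survey: probabilities, and the same data as frequencies
# under the assumption of 100 respondents.
probabilities = {1: 0.24, 2: 0.15, 3: 0.41, 4: 0.20}
frequencies = {1: 24, 2: 15, 3: 41, 4: 20}
mu_p, sd_p = mean_and_sd(probabilities)
mu_f, sd_f = mean_and_sd(frequencies)

print(round(mu_scores, 1), round(sd_scores, 1))
print(round(mu_p, 2), round(sd_p, 9))
print(round(mu_f, 2), round(sd_f, 9))
```

The last two printed lines should match each other, since the probability and frequency forms describe the same distribution.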
You should exit 1-variable statistics mode once you are finished computing statistics such as the mean and the standard deviation. This will enable your calculator to operate normally for non-statistical computations. Press the following key sequence to exit 1-variable statistics mode:

    2nd STATVAR ENTER

4 Calculator Instructions

Enter 1-Variable Statistics Mode

    2nd DATA ENTER

Data Input

    x    Frequency
    0    4
    1    7
    2    19
    3    16
    4    8
    5    2

To put the above data into the calculator, press DATA and then enter the following:

    X1 = 0    FRQ = 4
    X2 = 1    FRQ = 7
    X3 = 2    FRQ = 19
    X4 = 3    FRQ = 16
    X5 = 4    FRQ = 8
    X6 = 5    FRQ = 2

Compute Statistics

    STATVAR
    µ = 2.410714286
    σ = 1.191889043

Clear Out Data

    2nd DATA ENTER

Exit 1-Variable Statistics Mode

    2nd STATVAR ENTER
5 Homework Problems

5.1 Instructions

Work through the homework problems, referring to your notes and the lesson notes when necessary. Use the homework problem solutions only when you get completely stuck. Redo the homework problems before a quiz without referring to any other materials. It is best to do this more than once.

5.2 Problems

1. The exam scores for a course and the two classes taking the course are given below.

    Course
    57 63 29 50 44 44
    85 93 43 80 94 94
    99 90 80 74 78 68
    71 59 53 37 86 74
    60 60 84 87 89 93

    Class X
    57 63 29
    85 93 43
    99 90 80
    71 59 53
    60 60 84

    Class Y
    50 44 44
    80 94 94
    74 78 68
    37 86 74
    87 89 93

A. Compute µ and σ for the course. Round each statistic to the nearest 1/10th.

B. If a score greater than $\mu - \sigma$ is passing, how many students in the course passed the exam?

C. If a score greater than $\mu + \sigma$ is an A, how many students in the course received an A on the exam?

D. Compute µ and σ for class X. Round each statistic to the nearest 1/10th.

E. Compute µ and σ for class Y. Round each statistic to the nearest 1/10th.

2. A. The daily high temperatures (in degrees Fahrenheit) for thirty consecutive winter days in city Z are given below. Compute the mean and standard deviation of this data. Round each statistic to the nearest 1/10th.

    31.1 25.2 31.0 25.9 23.5 31.1 32.3 38.6 31.6 44.5
    28.5 29.2 24.5 30.9 27.2 28.2 28.0 24.5 25.0 27.0
    34.5 36.6 28.7 31.9 22.7 27.5 25.1 29.3 30.3 33.1

B. We say that data value x is within one standard deviation of the mean if $\mu - \sigma \le x \le \mu + \sigma$. For example, if the mean is 100 and the standard deviation is 10, then data value x is within one standard deviation of the mean if $100 - 10 \le x \le 100 + 10$ (that is, $90 \le x \le 110$). Similarly, if µ is the mean, σ the standard deviation, n a positive integer, and x a data value, then x is within n standard deviations of the mean if $\mu - n\sigma \le x \le \mu + n\sigma$. (We say that σ is one standard deviation and that nσ is n standard deviations.) How many of the daily high temperatures are within one standard deviation of the mean?

C. What percentage of the data values are within one standard deviation of the mean?

D. A sample of the above data is listed below. Compute the mean and sample standard deviation of the sample. Round each statistic to the nearest 1/10th.

    31.1 25.2 31.0 25.9 23.5 31.1 32.3 38.6
    34.5 36.6 28.7 31.9 22.7 27.5 25.1 29.3

E. How many of the sample's data values are within two standard deviations of its mean?

F. What percentage of the sample's data values are within two standard deviations of its mean?

3. A group of students were asked how many different computer games they play. Let X be the random variable that denotes the number of different computer games played by a student. The frequency distribution of X is given below (x denotes a value of X).

    x    Frequency
    0    19
    1    13
    2    7
    3    2
    4    6
    5    3
    6    1

A. Compute the mean and standard deviation of X. Round each statistic to the nearest 1/10th.

B. A student is considered to be obsessed with computer games if the number of different computer games the student plays is more than one standard deviation above the mean. How many of the students are obsessed with computer games?

C. Determine the percentage of students obsessed with computer games.

4. Q made many phone calls during the month of March, but none were longer than six minutes in length. Let X be the random variable that denotes the length of a phone call made by Q. The frequency distribution of X is given below (x denotes a value of X).

    x    Frequency
    1    21
    2    18
    3    0
    4    39
    5    62
    6    10
A. Compute the mean and standard deviation of X. Round each statistic to the nearest 1/10th.

B. How many phone calls did Q make during the five days which had lengths that were more than one standard deviation from the mean?

C. During most normal five-day periods, around 68% of the phone calls Q makes have lengths within one standard deviation of the mean. Would this five-day period be considered normal? Explain your answer.

5. Let X be a random variable which denotes the number of computers owned by the members of a Linux users group. The probability distribution of X is shown below.

    x    P(X = x)
    1    0.22
    2    0.37
    3    0.19
    4    0.08
    5    0.10
    6    0.02
    7    0.02

A. Compute the mean and standard deviation of X. Round each statistic to the nearest 1/10th.

B. What is the probability that the number of computers a member of the group owns is within one standard deviation of the mean?

6. There are 30 students enrolled in a section of a course. Let X be a random variable which denotes the number of students who attend an entire class meeting of the section. The probability distribution of X is shown below.

    x     P(X = x)        x     P(X = x)
    20    0.06            26    0.18
    21    0.07            27    0.15
    22    0.05            28    0.08
    23    0.09            29    0.04
    24    0.10            30    0.05
    25    0.13

A. Compute the mean and standard deviation of X. Round each statistic to the nearest 1/10th.

B. What is the probability that the number of students attending an entire class is more than one standard deviation below the mean?
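Several of the problems above ask whether a data value is within n standard deviations of the mean, as defined in Problem 2B. The sketch below shows one way that test might be written in Python. It uses the mean of 100 and standard deviation of 10 from the illustration in Problem 2B; the function name and the list of data values are hypothetical and were made up only to demonstrate the test.

```python
def within_n_sd(x, mu, sigma, n=1):
    """Return True when mu - n*sigma <= x <= mu + n*sigma."""
    return mu - n * sigma <= x <= mu + n * sigma

# Hypothetical data values, chosen only to illustrate the test.
values = [85, 90, 100, 110, 111]

# Keep the values within one standard deviation of the mean (90 <= x <= 110).
inside = [x for x in values if within_n_sd(x, mu=100, sigma=10)]
print(inside)  # [90, 100, 110]
```

Note that the endpoints 90 and 110 count as being within one standard deviation, because the definition uses ≤ rather than <.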
6 Document License (CC BY-ND 4.0)

Copyright 2016-2017 Scott P. Randby

This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International (CC BY-ND 4.0) or later version license.

6.1 License Links

License summary: https://creativecommons.org/licenses/by-nd/4.0/

License legal code: https://creativecommons.org/licenses/by-nd/4.0/legalcode