Variance, Standard Deviation Counting Techniques Section 1.3 & 2.1 Cathy Poliak, Ph.D. cathy@math.uh.edu Department of Mathematics University of Houston 1 / 52
Outline
1. Quartiles
2. The 1.5 IQR Rule
3. Understanding Standard Deviation
4. Calculating the Standard Deviation
5. Coefficient of Variation
6. Counting Techniques
7. Permutations
8. Combinations
2 / 52
Types of Measurements for the Spread
- Range
- Percentiles
- Quartiles
- Interquartile range (IQR)
- Variance
- Standard deviation
- Coefficient of variation
3 / 52
The Quartiles
- The first quartile, \( Q_1 \), is the 25th percentile.
- The second quartile, \( Q_2 \), is the median (the 50th percentile).
- The third quartile, \( Q_3 \), is the 75th percentile.
4 / 52
Interquartile Range
The interquartile range, IQR, is the difference between \( Q_3 \) and \( Q_1 \):
\[ \mathrm{IQR} = Q_3 - Q_1 \]
5 / 52
Example
Twelve babies spoke for the first time at the following ages (in months): 8, 9, 10, 11, 12, 13, 15, 15, 18, 20, 20, 26.
Find \( Q_1 \), \( Q_2 \), \( Q_3 \), the range, and the IQR.
6 / 52
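One way to check the answers for this example in R is sketched below. It uses fivenum(), the function these slides use for the five-number summary; the variable names are placeholders chosen for illustration. Note that R's quantile() and IQR() use a different interpolation rule by default and can return slightly different quartiles, so the IQR here is computed from the fivenum() values.

```r
# Ages (in months) at which the twelve babies first spoke
ages <- c(8, 9, 10, 11, 12, 13, 15, 15, 18, 20, 20, 26)

# fivenum() returns: minimum, Q1, median, Q3, maximum
five <- fivenum(ages)
q1 <- five[2]
q2 <- five[3]
q3 <- five[4]

# Range and interquartile range
data_range <- max(ages) - min(ages)
iqr <- q3 - q1

print(c(Q1 = q1, Q2 = q2, Q3 = q3, range = data_range, IQR = iqr))
```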
Find the Five Number Summary of the Course Scores
> stem(grades$score, scale = 0.5)

  The decimal point is 1 digit(s) to the right of the |

   0 | 827
   2 | 2
   4 | 0391
   6 | 78825
   8 | 0134445701238
  10 | 1114
7 / 52
Detecting Outliers: 1.5 IQR Rule
- An outlier is an observation that is "distant" from the rest of the data.
- Outliers can occur by chance or through measurement error.
- Any point that falls outside the interval from \( Q_1 - 1.5(\mathrm{IQR}) \) to \( Q_3 + 1.5(\mathrm{IQR}) \) is considered an outlier.
8 / 52
Outliers for Basketball Shoe Prices?
Recall: \( Q_1 = 130 \), \( Q_3 = 215 \), so \( \mathrm{IQR} = 215 - 130 = 85 \).
\[ Q_1 - 1.5(\mathrm{IQR}) = 130 - 1.5(85) = 2.5 \]
\[ Q_3 + 1.5(\mathrm{IQR}) = 215 + 1.5(85) = 342.5 \]
Any price that is below $2.50 or above $342.50 is considered an outlier.
9 / 52
Outliers?
The following is information from 91 pairs of basketball shoes:
> fivenum(shoes$price)
[1]  40  75  90 120 250
The four highest prices in the dataset are ..., 170, 225, 250, 250. Are any of the prices considered outliers?
10 / 52
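The 1.5 IQR rule can be applied to this summary with a few lines of R. The sketch below works only from the numbers printed on the slide (the full shoes data set is not reproduced here), and the variable names are placeholders.

```r
# Five-number summary reported on the slide: min, Q1, median, Q3, max
five <- c(40, 75, 90, 120, 250)
q1 <- five[2]
q3 <- five[4]
iqr <- q3 - q1

# 1.5 IQR fences
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# The four highest prices listed on the slide
top_prices <- c(170, 225, 250, 250)
flagged <- top_prices[top_prices > upper_fence]

# On the low side it is enough to check the minimum
low_outlier <- five[1] < lower_fence

print(c(lower = lower_fence, upper = upper_fence))
print(flagged)
```

Any listed price above the upper fence ends up in flagged; low_outlier reports whether the minimum falls below the lower fence.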
A Graph of the Five Number Summary: Boxplot
- A central box spans the quartiles.
- A line inside the box marks the median.
- Lines extend from the box out to the smallest and largest observations that are not outliers.
- Asterisks represent any values that are considered to be outliers.
- Boxplots are most useful for side-by-side comparison of several distributions.
R code: boxplot(dataset_name$variable_name)
11 / 52
Boxplot of Prices
[Figure: horizontal boxplot of shoe prices, axis from 50 to 250]
R code: boxplot(shoes$price, horizontal = TRUE)
12 / 52
Boxplot of Course Scores
[Figure: boxplot of course scores, axis from 20 to 100]
13 / 52
Boxplot of Course Scores by Session
[Figure: horizontal boxplots of course scores for sessions Fal15, Sp16, and Sum16, axis from 20 to 100]
R code: boxplot(grades$score ~ grades$session, horizontal = TRUE)
14 / 52
Question about the Graphs Given the first type of plot indicated in each pair, which of the second plots could not always be generated from it? a) dot plot, histogram b) stem and leaf, dot plot c) histogram, stem and leaf d) dot plot, box plot 15 / 52
Measuring Spread: The Standard Deviation
- Measures spread by looking at how far the observations are from their mean.
- Most common numerical description of the spread of a distribution.
- A larger standard deviation implies that the values have a wider spread from the mean.
- Denoted \( s \) when used with a sample. This is the one we calculate from a list of values.
- Denoted \( \sigma \) when used with a population. This is the "idealized" standard deviation.
- The standard deviation has the same units of measurement as the original observations.
16 / 52
Definition of the Standard Deviation
The standard deviation is, roughly, the typical distance of the observations from the mean.
Using this list of values from a sample: 3, 3, 9, 15, 15
The mean is 9. Four of the five values lie exactly 6 from the mean, and the sample standard deviation works out to 6.
17 / 52
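The value 6 can be confirmed in R. The sketch below uses the sample formula (dividing by n - 1) that these slides introduce later in this section and compares it with R's built-in sd(); the variable names are illustrative.

```r
x <- c(3, 3, 9, 15, 15)

x_bar <- mean(x)            # sample mean
deviations <- x - x_bar     # distance of each value from the mean (signed)

# Sample standard deviation: square, sum, divide by n - 1, take the root
s <- sqrt(sum(deviations^2) / (length(x) - 1))

print(c(mean = x_bar, s = s, sd_builtin = sd(x)))
```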
Values of the Standard Deviation
The standard deviation is always greater than or equal to zero. It is equal to zero only when all of the observations have the same value.
Using the definition of the standard deviation, determine \( s \) for the following lists of values.
- 2, 2, 2, 2: standard deviation = 0
- 125, 125, 125, 125, 125: standard deviation = 0
18 / 52
Adding or Subtracting a Value to the Observations
Adding the same value to, or subtracting it from, every original observation does not change the standard deviation of the list.
Using this list of values: 3, 3, 9, 15, 15: mean = 9, standard deviation = 6.
If we add 4 to all the values: 7, 7, 13, 19, 19: mean = 13, standard deviation = 6.
19 / 52
Multiplying or Dividing a Value to the Observations
Multiplying or dividing every original observation by the same positive value changes the standard deviation by that same factor.
Using this list of values: 3, 3, 9, 15, 15: mean = 9, standard deviation = 6.
If we double all the values: 6, 6, 18, 30, 30: mean = 18, standard deviation = 12.
20 / 52
Population Variance and Standard Deviation
If \( N \) is the number of values in a population with mean \( \mu \), and \( x_i \) represents each individual value in the population, then the population variance is found by:
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \]
and the population standard deviation is the square root, \( \sigma = \sqrt{\sigma^2} \).
21 / 52
Sample Variance and Standard Deviation
Most of the time we are working with a sample instead of a population. So the sample variance is found by:
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \]
and the sample standard deviation is the square root, \( s = \sqrt{s^2} \).
Here \( n \) is the number of observations in the sample, \( x_i \) is the value of the \( i \)th observation, and \( \bar{x} \) is the sample mean.
22 / 52
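The difference between the two formulas is only the divisor. As a small sketch, the same five values are treated below first as a whole population (an assumption made purely for illustration) and then as a sample. R's var() and sd() always use the sample version with n - 1.

```r
x <- c(3, 3, 9, 15, 15)
n <- length(x)
sum_sq <- sum((x - mean(x))^2)   # sum of squared deviations from the mean

# Population version: divide by N
pop_var <- sum_sq / n
pop_sd <- sqrt(pop_var)

# Sample version: divide by n - 1 (matches var() and sd())
samp_var <- sum_sq / (n - 1)
samp_sd <- sqrt(samp_var)

print(c(pop_var = pop_var, samp_var = samp_var, var_builtin = var(x)))
```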
Calculating the Standard Deviation By Hand
When calculating by hand we will calculate \( s \).
1. Find the mean of the observations, \( \bar{x} \).
2. Calculate the difference between each observation and the mean, \( x_i - \bar{x} \). These differences are called the deviations of the observations.
3. Square the deviation for each observation, \( (x_i - \bar{x})^2 \).
4. Add up the squared deviations, \( \sum_{i=1}^{n} (x_i - \bar{x})^2 \).
5. Divide the sum of the squared deviations by one less than the number of observations, \( n - 1 \). This is the variance:
\[ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]
23 / 52
Step 6: Standard Deviation
6. Find the square root of the variance. This is the standard deviation:
\[ s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2} \]
24 / 52
Example: Section A
Determine the sample standard deviation of the test scores for Section A.
Section A scores \( (X_i) \): 65, 66, 67, 68, 71, 73, 74, 77, 77, 77
25 / 52
Step 1: Calculate the Mean
The sample mean is \( \bar{x} = 71.5 \).
26 / 52
Use Table To Calculate Standard Deviation
Score \( X_i \)    Deviation \( X_i - \bar{X} \)    Squared deviation \( (X_i - \bar{X})^2 \)
65
66
67
68
71
73
74
77
77
77
sum
27 / 52
Step 2: Calculate Deviations For All Values
Score \( X_i \)    Deviation \( X_i - \bar{X} \)    Squared deviation \( (X_i - \bar{X})^2 \)
65                 65 - 71.5 = -6.5
66                 66 - 71.5 = -5.5
67                 67 - 71.5 = -4.5
68                 68 - 71.5 = -3.5
71                 71 - 71.5 = -0.5
73                 73 - 71.5 = 1.5
74                 74 - 71.5 = 2.5
77                 77 - 71.5 = 5.5
77                 77 - 71.5 = 5.5
77                 77 - 71.5 = 5.5
sum
28 / 52
Step 3: Calculate Squared Deviations
Score \( X_i \)    Deviation \( X_i - \bar{X} \)    Squared deviation \( (X_i - \bar{X})^2 \)
65                 65 - 71.5 = -6.5                 (-6.5)^2 = 42.25
66                 66 - 71.5 = -5.5                 (-5.5)^2 = 30.25
67                 67 - 71.5 = -4.5                 (-4.5)^2 = 20.25
68                 68 - 71.5 = -3.5                 (-3.5)^2 = 12.25
71                 71 - 71.5 = -0.5                 (-0.5)^2 = 0.25
73                 73 - 71.5 = 1.5                  1.5^2 = 2.25
74                 74 - 71.5 = 2.5                  2.5^2 = 6.25
77                 77 - 71.5 = 5.5                  5.5^2 = 30.25
77                 77 - 71.5 = 5.5                  5.5^2 = 30.25
77                 77 - 71.5 = 5.5                  5.5^2 = 30.25
sum
29 / 52
Step 4: Calculate the Sum of the Squared Deviations
Score \( X_i \)    Deviation \( X_i - \bar{X} \)    Squared deviation \( (X_i - \bar{X})^2 \)
65                 65 - 71.5 = -6.5                 (-6.5)^2 = 42.25
66                 66 - 71.5 = -5.5                 (-5.5)^2 = 30.25
67                 67 - 71.5 = -4.5                 (-4.5)^2 = 20.25
68                 68 - 71.5 = -3.5                 (-3.5)^2 = 12.25
71                 71 - 71.5 = -0.5                 (-0.5)^2 = 0.25
73                 73 - 71.5 = 1.5                  1.5^2 = 2.25
74                 74 - 71.5 = 2.5                  2.5^2 = 6.25
77                 77 - 71.5 = 5.5                  5.5^2 = 30.25
77                 77 - 71.5 = 5.5                  5.5^2 = 30.25
77                 77 - 71.5 = 5.5                  5.5^2 = 30.25
sum                                                 \( \sum_{i=1}^{n} (X_i - \bar{X})^2 = 204.5 \)
30 / 52
Step 5: Calculate the Variance
\[ \text{variance} = s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{204.5}{9} = 22.7222 \]
31 / 52
Step 6: Take the Square Root of the Variance
\[ \text{standard deviation} = s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2} = \sqrt{22.7222} = 4.77 \]
32 / 52
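The six by-hand steps for Section A can be mirrored line by line in R. This is a sketch for checking the arithmetic, with variable names chosen for illustration; in practice sd() does all of this in one call.

```r
scores <- c(65, 66, 67, 68, 71, 73, 74, 77, 77, 77)
n <- length(scores)

x_bar <- mean(scores)           # Step 1: the mean
deviations <- scores - x_bar    # Step 2: deviations from the mean
squared <- deviations^2         # Step 3: squared deviations
ss <- sum(squared)              # Step 4: sum of squared deviations
s2 <- ss / (n - 1)              # Step 5: sample variance
s <- sqrt(s2)                   # Step 6: sample standard deviation

# The table from the slides, followed by the summary values
print(data.frame(score = scores, deviation = deviations, squared = squared))
print(c(mean = x_bar, sum_sq = ss, variance = s2, sd = s))
```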
Sample Standard Deviation of Section A Test Scores
The sample standard deviation is \( s = 4.77 \).
This means that, for the sample of 10 students from Section A, the test scores are spread, on average, about 4.77 points from the mean of 71.50 points.
33 / 52
Example
A statistics teacher wants to decide whether or not to curve an exam. From her class of 300 students, she chose a sample of 10 students, and their grades were: 72, 88, 85, 81, 60, 54, 70, 72, 63, 43
- Determine the sample mean.
- What is the variance?
- What is the standard deviation?
34 / 52
Add 10
Suppose the statistics instructor decides to curve the grades by adding 10 points to each score. What are the new mean, variance, and standard deviation?
35 / 52
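A short R sketch for this example is given below. It computes the three statistics for the 10 sampled grades and then again after adding 10 points to every grade, so the effect of the curve can be read off directly. The variable names are placeholders.

```r
grades <- c(72, 88, 85, 81, 60, 54, 70, 72, 63, 43)

# Original sample statistics
m <- mean(grades)
v <- var(grades)    # sample variance (divides by n - 1)
s <- sd(grades)     # sample standard deviation

# Curve: add 10 points to every grade
curved <- grades + 10

print(c(mean = m, variance = v, sd = s))
print(c(mean = mean(curved), variance = var(curved), sd = sd(curved)))
```

The mean shifts by 10 while the variance and standard deviation are unchanged, which is the shifting property stated earlier in this section.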
Multiply by 2
For the following dataset the mean is \( \bar{x} = 4.5 \), the variance is \( s^2 = 3.5 \), and the standard deviation is \( s = 1.870829 \).
3, 6, 2, 7, 4, 5
Now, multiply each value by 2. What is the new variance and the new standard deviation?
36 / 52
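The effect of doubling can be checked in R. The sketch below compares the variance and standard deviation before and after multiplying by 2; the ratio variables are illustrative names, not part of the course code.

```r
x <- c(3, 6, 2, 7, 4, 5)
doubled <- 2 * x

# How much each measure of spread grows when every value is doubled
var_ratio <- var(doubled) / var(x)
sd_ratio <- sd(doubled) / sd(x)

print(c(var_original = var(x), var_doubled = var(doubled), ratio = var_ratio))
print(c(sd_original = sd(x), sd_doubled = sd(doubled), ratio = sd_ratio))
```

The standard deviation scales by the factor itself, while the variance, being in squared units, scales by the square of the factor.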
Calculating Standard Deviation
For larger data sets, use a calculator or computer software.
Each calculator is different; if you cannot determine how to compute the standard deviation on your calculator, ask your instructor.
For this course we will be using R as the software.
The function for the sample standard deviation in R is sd(dataset_name$variable_name).
37 / 52
Coefficient of Variation
The coefficient of variation (cv) is used to compare the variation between two groups. It is the ratio of the standard deviation to the mean:
\[ \mathrm{cv} = \frac{\text{sd}}{\text{mean}} \]
A smaller ratio indicates less variation in the data relative to the mean.
38 / 52
CV of Test Scores
                             Section A               Section B
Sample size                  10                      10
Sample mean                  71.5                    71.5
Sample standard deviation    4.77                    18.22
CV                           4.77 / 71.5 = 0.0667    18.22 / 71.5 = 0.2548
39 / 52
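R has no built-in coefficient of variation function in its base packages, so a one-line helper is a common approach. In the sketch below, cv() is a hypothetical helper written for illustration. It is applied to the Section A scores listed earlier, while Section B is computed from its summary statistics because its raw scores are not given in these slides.

```r
# Hypothetical helper: coefficient of variation of a numeric vector
cv <- function(x) sd(x) / mean(x)

# Section A: raw scores are available
section_a <- c(65, 66, 67, 68, 71, 73, 74, 77, 77, 77)
cv_a <- cv(section_a)

# Section B: only the summary statistics are available
cv_b <- 18.22 / 71.5

print(round(c(section_a = cv_a, section_b = cv_b), 4))
```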
CV Example
The following statistics were collected on two different groups of stock prices:
                             Portfolio A    Portfolio B
Sample size                  10             15
Sample mean                  $52.65         $49.80
Sample standard deviation    $6.50          $2.95
What can be said about the variability of each portfolio?
40 / 52
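Since only summary statistics are given for the two portfolios, the coefficients of variation can be computed directly from the table. A minimal sketch, with illustrative variable names:

```r
# Summary statistics from the table
mean_a <- 52.65; sd_a <- 6.50
mean_b <- 49.80; sd_b <- 2.95

# Coefficient of variation = standard deviation / mean
cv_a <- sd_a / mean_a
cv_b <- sd_b / mean_b

# The portfolio with the larger ratio varies more relative to its mean
more_variable <- if (cv_a > cv_b) "Portfolio A" else "Portfolio B"

print(round(c(A = cv_a, B = cv_b), 4))
print(more_variable)
```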
Beginning Example
In the city of Milford, applications for zoning changes go through a two-step process:
1. A review by the planning commission.
2. A final decision by the city council.
At step 1 the planning commission reviews the zoning change request and makes a positive or negative recommendation concerning the change. At step 2 the city council reviews the planning commission's recommendation and then votes to approve or to disapprove the zoning change.
How many possible decisions can be made for a zoning change in Milford?
41 / 52
Counting Rules
If an experiment can be described as a sequence of \( k \) steps with \( n_1 \) possible outcomes on the first step, \( n_2 \) possible outcomes on the second step, and so on, then the total number of experimental outcomes is given by \( (n_1)(n_2) \cdots (n_k) \).
A tree diagram can be used as a graphical representation in visualizing a multiple-step experiment.
42 / 52
Tree diagram
Step 1: Planning Commission    Step 2: City Council    Sample Points
positive                       approve                 (positive, approve)
positive                       disapprove              (positive, disapprove)
negative                       approve                 (negative, approve)
negative                       disapprove              (negative, disapprove)
43 / 52
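The branches of the tree can also be listed in R. The sketch below uses expand.grid() to enumerate every combination of the two steps and compares the count with the counting rule; the column names are chosen for illustration.

```r
# Enumerate every (commission, council) pair for the Milford example
outcomes <- expand.grid(
  commission = c("positive", "negative"),
  council = c("approve", "disapprove"),
  stringsAsFactors = FALSE
)

n_outcomes <- nrow(outcomes)   # number of sample points listed
counting_rule <- 2 * 2         # (n_1)(n_2) with two outcomes at each step

print(outcomes)
print(c(enumerated = n_outcomes, by_rule = counting_rule))
```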
Examples
- How many ways can you create a pizza choosing a meat and two veggies if you have 3 choices of meats and 4 choices for veggies?
- In how many ways can 6 people be seated in a row?
- How many possible outcomes can we have when rolling a pair of 6-sided dice?
44 / 52
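These three counts can be worked out in R as a check. The pizza answer depends on how the question is read: the sketch below assumes the two veggies must be different and that their order does not matter, which is an assumption and not something the slide states. The dice count treats the two dice as distinguishable.

```r
# Pizza: 3 meats, then 2 different veggies out of 4 (order ignored; assumption)
pizzas <- 3 * choose(4, 2)

# Seating 6 people in a row: 6 choices, then 5, then 4, ...
seatings <- factorial(6)

# Rolling a pair of dice: 6 outcomes for each die
dice_outcomes <- 6 * 6

print(c(pizzas = pizzas, seatings = seatings, dice = dice_outcomes))
```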
Permutations
A permutation count gives the number of outcomes when \( r \) objects are selected from a set of \( n \) objects and the order of selection is important.
The number of permutations is given by
\[ P^n_r = \frac{n!}{(n-r)!} \]
where \( n! = n(n-1)(n-2) \cdots (2)(1) \).
R code for \( n! \): factorial(n)
45 / 52
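Base R has factorial() but no dedicated permutation-count function, so the formula is usually written out. In the sketch below, n_perm() is a hypothetical helper defined for illustration, and the values of n and r are made-up examples.

```r
# Hypothetical helper: number of ordered selections of r objects from n
n_perm <- function(n, r) factorial(n) / factorial(n - r)

# Example values chosen for illustration
two_of_five <- n_perm(5, 2)   # 5 choices for the first spot, 4 for the second
all_six <- n_perm(6, 6)       # arranging all 6 objects equals 6!

print(c(two_of_five = two_of_five, all_six = all_six))
```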
Allowing Repeated Values
When we allow repeated values, the number of orderings of \( n \) objects taken \( r \) at a time, with repetition, is \( n^r \).
Example: In how many ways can you write 4 letters on a tag using the letters C, O, U, G, A, R with repetition allowed?
46 / 52
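For the tag example, the formula can be compared against a brute-force listing. The sketch below builds every 4-letter tag with expand.grid() and counts the rows; the variable names are illustrative.

```r
tag_letters <- c("C", "O", "U", "G", "A", "R")
r <- 4

# Formula: n^r orderings when repetition is allowed
by_formula <- length(tag_letters)^r

# Brute force: one column per position on the tag, one row per possible tag
all_tags <- expand.grid(rep(list(tag_letters), r), stringsAsFactors = FALSE)
by_enumeration <- nrow(all_tags)

print(c(formula = by_formula, enumerated = by_enumeration))
```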
Several Objects At Once
The number of permutations, \( P \), of \( n \) objects taken \( n \) at a time with \( r \) objects alike, \( s \) of another kind alike, and \( t \) of another kind alike is
\[ P = \frac{n!}{r!\,s!\,t!} \]
Example: How many different words (they do not have to be real words) can be formed from the letters in the word MISSISSIPPI?
47 / 52
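For MISSISSIPPI, the letter counts can be tallied in R and plugged into the formula. This is a sketch with illustrative variable names; letters that appear only once contribute a factor of 1! = 1, so including them does not change the result.

```r
word <- "MISSISSIPPI"
chars <- strsplit(word, "")[[1]]

# How many times each distinct letter appears
letter_counts <- as.vector(table(chars))

# n! divided by the factorial of each group of alike letters
n <- length(chars)
arrangements <- factorial(n) / prod(factorial(letter_counts))

print(table(chars))
print(arrangements)
```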
Circular Permutations
The number of circular permutations of \( n \) objects is \( (n-1)! \).
Example: In how many ways can 12 people be seated around a circular table?
48 / 52
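One way to see the formula is to fix one person's seat and arrange everyone else relative to that person. The sketch below applies this to the 12-person table and, as a sanity check on a made-up smaller case, to a table of 4.

```r
# 12 people around a circular table: fix one seat, arrange the other 11
n <- 12
seatings <- factorial(n - 1)

# Smaller illustrative case: 4 people around a table
small_case <- factorial(4 - 1)

print(c(twelve_people = seatings, four_people = small_case))
```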
Combinations
Counts the number of experimental outcomes when the experiment involves selecting \( r \) objects from a (usually larger) set of \( n \) objects.
The number of combinations of \( n \) objects taken \( r \) at a time, unordered, is
\[ C^n_r = \binom{n}{r} = \frac{n!}{r!(n-r)!} \]
R code: choose(n, r)
49 / 52
Difference Between Combinations and Permutations 50 / 52
Examples In how many ways can a committee of 5 be chosen from a group of 12 people? In a manufacturing company they have to choose 5 out of 50 boxes to be sent to a store. How many ways can they choose the 5 boxes? 51 / 52
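Both questions are unordered selections, so choose() applies directly. The sketch below also recomputes the committee count from the factorial formula as a cross-check; the variable names are illustrative.

```r
# Committee of 5 from 12 people
committees <- choose(12, 5)

# 5 boxes out of 50
box_choices <- choose(50, 5)

# Cross-check the first answer with n! / (r! (n - r)!)
by_formula <- factorial(12) / (factorial(5) * factorial(12 - 5))

print(c(committees = committees, boxes = box_choices, formula_check = by_formula))
```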
Examples
1. A researcher selects 3 fish from a tank of 12 and puts each of the 3 fish into a different container. How many ways can this be done?
2. Among 10 electrical components, 2 are known not to function. If 5 components are randomly selected, in how many ways can exactly one of the selected components be non-functioning?
52 / 52
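A possible R check for these two questions is sketched below. The first treats the containers as distinguishable, so the order of the 3 fish matters (an assumption about how the question is meant). The second reads "only one" as exactly one defective component among the 5 selected.

```r
# 1. Ordered selection of 3 fish from 12: 12 * 11 * 10
fish_ways <- factorial(12) / factorial(12 - 3)

# 2. Exactly one defective: pick 1 of the 2 defective components
#    and 4 of the 8 working components
component_ways <- choose(2, 1) * choose(8, 4)

print(c(fish = fish_ways, components = component_ways))
```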