Handout 4: Numerical Descriptive Measures, Part 2

Calculating Mean for Grouped Data

Mean for population data:
\[ \mu = \frac{\sum mf}{N} \]

Mean for sample data:
\[ \bar{x} = \frac{\sum mf}{n} \]

where m is the midpoint and f is the frequency of a class.

Example 1
The following table gives the frequency distribution of the daily commuting times (in minutes) from home to work for all 25 employees of a company. Calculate the mean of the daily commuting times.

\[ \mu = \frac{\sum mf}{N} = \frac{535}{25} = 21.40 \text{ minutes} \]

Thus, the employees of this company spend an average of 21.40 minutes a day commuting from home to work.

Variance and Standard Deviation for Grouped Data

Short-Cut Formulas for the Variance and Standard Deviation for Grouped Data

\[ \sigma^2 = \frac{\sum m^2 f - \dfrac{\left(\sum mf\right)^2}{N}}{N} \qquad \text{and} \qquad s^2 = \frac{\sum m^2 f - \dfrac{\left(\sum mf\right)^2}{n}}{n - 1} \]

where σ² is the population variance, s² is the sample variance, and m is the midpoint of a class. The standard deviation is obtained by taking the positive square root of the variance:

\[ \sigma = \sqrt{\sigma^2} \qquad \text{and} \qquad s = \sqrt{s^2} \]
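The short-cut formulas can be checked with a few lines of Python. This is only a sketch: the function name grouped_summary is made up for illustration, and the class midpoints and frequencies below are assumed values (the handout's table is not reproduced here), chosen so that Σmf = 535, the total used in Example 1.

```python
from math import sqrt

def grouped_summary(midpoints, freqs, population=True):
    """Mean, variance and standard deviation from class midpoints and frequencies.

    Uses the short-cut formula: (sum(m^2 f) - (sum(m f))^2 / n) / divisor,
    where the divisor is n for a population and n - 1 for a sample.
    """
    n = sum(freqs)                                            # total number of observations
    sum_mf = sum(m * f for m, f in zip(midpoints, freqs))     # sum of m*f
    sum_m2f = sum(m * m * f for m, f in zip(midpoints, freqs))  # sum of m^2*f
    mean = sum_mf / n
    divisor = n if population else n - 1
    variance = (sum_m2f - sum_mf ** 2 / n) / divisor
    return mean, variance, sqrt(variance)

# ASSUMED illustrative classes (not taken from the handout's table):
# midpoints of the classes 0-10, 10-20, ..., 40-50 and their frequencies.
midpoints = [5, 15, 25, 35, 45]
freqs = [4, 9, 6, 4, 2]

mean, variance, sd = grouped_summary(midpoints, freqs, population=True)
print(f"mean = {mean:.2f}, variance = {variance:.2f}, sd = {sd:.2f}")
```

Passing population=False switches the divisor to n − 1, which gives the sample variance s² instead of σ².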
Example 1 (continued)
Calculate the variance and standard deviation of the daily commuting times.

\[ \sigma^2 = \frac{\sum m^2 f - \dfrac{\left(\sum mf\right)^2}{N}}{N} = \frac{14{,}825 - \dfrac{(535)^2}{25}}{25} = \frac{3376}{25} = 135.04 \]

\[ \sigma = \sqrt{\sigma^2} = \sqrt{135.04} = 11.62 \text{ minutes} \]

Measures of Position

Quartiles and Interquartile Range

Quartiles are three summary measures that divide a ranked data set into four equal parts. The second quartile is the same as the median of a data set. The first quartile is the value of the middle term among the observations that are less than the median, and the third quartile is the value of the middle term among the observations that are greater than the median.

Example 2
The following table gives the 2008 profits (rounded to billions of dollars) of 12 companies selected from all over the world.
a) Find the values of the three quartiles. Where does the 2008 profit of Merck & Co fall in relation to these quartiles?
b) Find the interquartile range.
c) Find the value of the 42nd percentile. Give a brief interpretation of the 42nd percentile.
Solution
The data arranged in increasing order are as follows:

7  8  9  10  11  12  13  13  14  17  17  45

a) The median is the average of the sixth and seventh terms, so Q2 = (12 + 13)/2 = 12.5. The middle of the six values below the median gives Q1 = (9 + 10)/2 = 9.5, and the middle of the six values above it gives Q3 = (14 + 17)/2 = 15.5 (all in billions of dollars). By looking at the position of $8 billion, which is the 2008 profit of Merck & Co, we can state that this value lies in the bottom 25% of the profits for 2008.
b) IQR = Interquartile range = Q3 − Q1 = 15.5 − 9.5 = $6 billion
c) The position of the 42nd percentile is
\[ \frac{kn}{100} = \frac{(42)(12)}{100} = 5.04\text{th term} \]
The value of the 5.04th term can be approximated by the value of the fifth term in the ranked data. Therefore,
P_k = 42nd percentile = 11 = $11 billion
Thus, approximately 42% of these 12 companies had 2008 profits less than or equal to $11 billion.

Percentile rank
\[ \text{Percentile rank of } x_i = \frac{\text{Number of values less than } x_i}{\text{Total number of values in the data set}} \times 100 \]

For the ranked data 7  8  9  10  11  12  13  13  14  17  17  45, eight values are less than 14, so
\[ \text{Percentile rank of } 14 = \frac{8}{12} \times 100 = 66.67\% \]

Box-Plot
A box-plot is a graphical display of data which gives information about the range of values, the median and quartiles, and the minimum and maximum values. To construct one:
- Sort the data in increasing order, from smallest to largest.
- Compute Q1, Q3 and the median Q2.
- Compute the IQR (interquartile range), Q3 − Q1.
- Draw a horizontal (or vertical) line representing the scale of measurement.
- Form a box, near the line, with its ends at Q1 and Q3, and draw a line inside the box at the location of the median.

Upper and lower fences are used to find outliers (unusual observations), that is, observations that lie outside these fences:
- Lower fence: Q1 − 1.5 × IQR
- Upper fence: Q3 + 1.5 × IQR

Whiskers go to the lowest observation which is not an outlier and to the highest observation which is also not an outlier.
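The quartile, percentile and fence rules above can be sketched in Python using the twelve ranked profits from the example. The function names are hypothetical, the quartiles follow the handout's "middle term of each half" definition (statistical packages may use other conventions), and rounding the percentile position to the nearest term is an assumption about how the approximation is made.

```python
from statistics import median

def quartiles(values):
    """Q1, Q2, Q3 as the medians of the lower half, the whole set and the upper half."""
    ranked = sorted(values)
    n = len(ranked)
    half = n // 2
    lower = ranked[:half]            # terms positioned below the median
    upper = ranked[half + n % 2:]    # terms positioned above the median
    return median(lower), median(ranked), median(upper)

def percentile_value(values, k):
    """Approximate the k-th percentile by the term nearest to position k*n/100."""
    ranked = sorted(values)
    position = k * len(ranked) / 100
    index = min(max(round(position), 1), len(ranked))  # assumption: nearest term
    return ranked[index - 1]

def percentile_rank(values, x):
    """Percentage of values in the data set that are less than x."""
    return 100 * sum(1 for v in values if v < x) / len(values)

def box_plot_summary(values):
    """Quartiles, IQR, fences, whisker ends and outliers for a box-plot."""
    q1, q2, q3 = quartiles(values)
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {"q1": q1, "median": q2, "q3": q3, "iqr": iqr,
            "lower_fence": lower_fence, "upper_fence": upper_fence,
            "whiskers": (min(inside), max(inside)), "outliers": outliers}

# 2008 profits in billions of dollars, as ranked in the example
profits = [7, 8, 9, 10, 11, 12, 13, 13, 14, 17, 17, 45]

summary = box_plot_summary(profits)
print(summary)
print(percentile_value(profits, 42), round(percentile_rank(profits, 14), 2))
```

For these data the fences are 0.5 and 24.5, so the profit of $45 billion is flagged as an outlier and the whiskers run from 7 to 17.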
Interpreting Box Plots
- Median line left of center and long right whisker: skewed right
- Median line in center of box and whiskers of equal length: symmetric distribution
- Median line right of center and long left whisker: skewed left

Example 3
Amount of sodium in 8 brands of cheese:

260  290  300  320  330  340  340  520

Construct a box-and-whisker plot for these data.

Example 4
A data set was collected to compare the birth weights of babies whose mothers smoked during pregnancy with the birth weights of babies whose mothers did not. The following summary values (in pounds) were calculated:

Smoker moms: min = 2.8, Q1 = 3.8, Q2 = 6.1, Q3 = 6.7, max = 8.2
Nonsmoker moms: min = 3, Q1 = 4.5, Q2 = 6.8, Q3 = 8, max = 8.5

[Axis for the two box plots: 2 to 9 pounds]

Example 5
The test scores on a 100-point test were recorded for 9 students:

61, 50, 100, 85, 15, 65, 75, 70, 90

a) Compute the sample mean, median and mode.
b) Compute the upper and lower quartiles and the IQR.
c) Construct a stem-and-leaf plot, a box-plot and a histogram for these data.
d) Write a brief description of this data set (symmetry, outliers).
e) Based on the graphs and calculations, identify which statistic is the best indicator of a typical score.
f) Suppose 15 were changed to 10 in the above data. Would the mean increase, decrease, or stay the same? Answer the same question for the median and the standard deviation.
Minitab output

Descriptive Statistics: score

Variable  N  N*   Mean  SE Mean  StDev  Minimum     Q1  Median     Q3  Maximum
score     9   0  67.89     8.37  25.10    15.00  55.50   70.00  87.50   100.00

Variable    IQR
score     32.00

Stem-and-leaf of score  N = 9
Leaf Unit = 1.0

 1    1  5
 1    2
 1    3
 1    4
 2    5  0
 4    6  15
(2)   7  05
 3    8  5
 2    9  0
 1   10  0

[Figure: Histogram of score; horizontal axis: score (20 to 100), vertical axis: Frequency (0.0 to 2.0)]

[Figure: Boxplot of score; axis: score (10 to 100)]
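The numbers in the Minitab output can be reproduced with Python's standard library. This is a sketch rather than Minitab's own procedure: the function name summarize is hypothetical, and it assumes the quartiles sit at positions (n + 1)/4 and 3(n + 1)/4 in the ranked data, which is what statistics.quantiles computes with method="exclusive" and which matches the Q1 and Q3 shown above. The last lines repeat the summary with 15 changed to 10, as in part f.

```python
from math import sqrt
from statistics import mean, quantiles, stdev

scores = [61, 50, 100, 85, 15, 65, 75, 70, 90]

def summarize(data):
    """Sample summary statistics in the same layout as the Minitab table."""
    # "exclusive" places the quartiles at positions (n+1)/4, (n+1)/2, 3(n+1)/4
    q1, q2, q3 = quantiles(data, n=4, method="exclusive")
    s = stdev(data)  # sample standard deviation (divisor n - 1)
    return {
        "mean": mean(data),
        "se_mean": s / sqrt(len(data)),  # standard error of the mean
        "stdev": s,
        "min": min(data),
        "q1": q1,
        "median": q2,
        "q3": q3,
        "max": max(data),
        "iqr": q3 - q1,
    }

summary = summarize(scores)
for name, value in summary.items():
    print(f"{name:8s} {value:7.2f}")

# Part f: replace the score 15 by 10 and compare the three statistics
changed = summarize([10 if x == 15 else x for x in scores])
for name in ("mean", "median", "stdev"):
    print(f"{name:8s} {summary[name]:7.2f} -> {changed[name]:7.2f}")
```

Because only the smallest score moves further from the rest, the comparison shows the mean dropping, the median staying at 70, and the standard deviation growing.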