SUMMARY STATISTICS EXAMPLES AND ACTIVITIES

Session 6: SUMMARY STATISTICS EXAMPLES AND ACTIVITIES

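Example 1.1
Expand the following:
1. $\sum_{i=2}^{n} X_i$
2. $\sum_{i=4}^{6} X_i$
3. $\sum_{i=3}^{5} (X_i - 2)$
4. $\sum_{i=2}^{4} (X_i - 4)$

Solution
1. $\sum_{i=2}^{n} X_i = X_2 + X_3 + \dots + X_n$
2. $\sum_{i=4}^{6} X_i = X_4 + X_5 + X_6$
3. $\sum_{i=3}^{5} (X_i - 2) = (X_3 - 2) + (X_4 - 2) + (X_5 - 2)$
4. $\sum_{i=2}^{4} (X_i - 4) = (X_2 - 4) + (X_3 - 4) + (X_4 - 4)$

Example 1.2
Write the following in symbols:
5. $X_1 + X_2 + \dots + X_n$
6. $X_4 + X_5 + X_6$
7. $(X_3 - 5) + (X_4 - 5) + (X_5 - 5)$

Solution
5. $X_1 + X_2 + \dots + X_n = \sum_{i=1}^{n} X_i$
6. $X_4 + X_5 + X_6 = \sum_{i=4}^{6} X_i$
7. $(X_3 - 5) + (X_4 - 5) + (X_5 - 5) = \sum_{i=3}^{5} (X_i - 5)$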
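Sigma notation maps directly onto a loop over a range of indices. The short Python sketch below illustrates this for sums of the kind practised above. The observations X_1 to X_6 are invented for illustration only, and `sigma` is a hypothetical helper name, not something defined in this unit.

```python
# Hypothetical observations X_1 ... X_6 (illustrative values only).
X = {1: 3, 2: 7, 3: 1, 4: 4, 5: 6, 6: 2}


def sigma(term, lower, upper):
    """Add up term(i) for i = lower, ..., upper (both limits included)."""
    return sum(term(i) for i in range(lower, upper + 1))


# Sum of X_i for i = 4 to 6, i.e. X_4 + X_5 + X_6
total_4_to_6 = sigma(lambda i: X[i], 4, 6)

# Sum of (X_i - 5) for i = 3 to 5, i.e. (X_3 - 5) + (X_4 - 5) + (X_5 - 5)
shifted_3_to_5 = sigma(lambda i: X[i] - 5, 3, 5)

print(total_4_to_6)    # 4 + 6 + 2 = 12
print(shifted_3_to_5)  # (1 - 5) + (4 - 5) + (6 - 5) = -4
```

Note that the upper limit is included in the sum, which is why the helper uses `upper + 1` in `range`.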

Activity 1.1
Expand the following:
1. $\sum_{i=1}^{n} X_i$
2. $\sum_{i=4}^{6} X_i$
3. $\sum_{i=5}^{8} (X_i - 3)$

Write the following in sigma notation:
4. $X_2 + X_3 + \dots + X_n$
5. $X_5 + X_6 + X_7 + X_8$
6. $(X_5 - 3) + (X_6 - 3) + (X_7 - 3)$

Example 1.3 Raw Data
Find the mean of the values 4, 5, 2, 3, 10.

\[ \bar{X} = \frac{\sum X}{n} = \frac{4 + 5 + 2 + 3 + 10}{5} = \frac{24}{5} = 4.8 \]

Example 1.4 Grouped Frequency Data
Find the arithmetic mean of the following data: 2, 2, 2, 3, 3, 4, 4, 4, 5, 5

Solution
We may obtain the mean using formula (5.1):

\[ \bar{X} = \frac{\sum fx}{\sum f} = \frac{3(2) + 2(3) + 3(4) + 2(5)}{3 + 2 + 3 + 2} = \frac{34}{10} = 3.4 \]

Example 1.5
The table below is a grouped frequency distribution of the ages of 34 children. Use it to find the mean age of the children.

Age       f     x     fx
1-3       5     2     10
4-6       6     5     30
7-9      10     8     80
10-12     4    11     44
13-15     6    14     84
16-18     3    17     51
Total    34          299

\[ \bar{X} = \frac{\sum fx}{\sum f} = \frac{299}{34} = 8.8 \]

This is called the long method. When the class intervals contain large values, formula (5.1) makes the computation tedious. To avoid this, I will introduce two short methods: (i) the Assumed Mean method and (ii) the Coding method.

Activity 1.2
Use the following frequency distribution to find the arithmetic mean:

Marks     Frequency
10-19      2
20-29      3
30-39      8
40-49      5
50-59      2

Example 1.6
Using the data in Example 1.5 and an assumed mean A = 11, find the mean age of the children.

Solution

Age       f     x    d = x - A    fd
1-3       5     2       -9       -45
4-6       6     5       -6       -36
7-9      10     8       -3       -30
10-12     4    11        0         0
13-15     6    14       +3        18
16-18     3    17       +6        18
Total    34                      -75

Using (5.2), we have:

\[ \bar{X} = A + \frac{\sum fd}{\sum f} = 11 + \frac{-75}{34} = 11 - 2.2 = 8.8 \]
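As a quick check, the long method of Example 1.5 and the assumed mean method of Example 1.6 can be run side by side. The following is only a sketch in Python. The variable names (`midpoints`, `freqs`, and so on) are my own choices, while the midpoints, frequencies and A = 11 are taken from the worked examples.

```python
# Class midpoints (x) and frequencies (f) from the age table of 34 children.
midpoints = [2, 5, 8, 11, 14, 17]
freqs = [5, 6, 10, 4, 6, 3]

n = sum(freqs)  # total frequency

# Long method: mean = sum(f * x) / sum(f)
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
mean_long = sum_fx / n

# Assumed mean method: mean = A + sum(f * d) / sum(f), where d = x - A
A = 11
sum_fd = sum(f * (x - A) for f, x in zip(freqs, midpoints))
mean_assumed = A + sum_fd / n

print(n, sum_fx, sum_fd)       # 34 299 -75
print(round(mean_long, 1))     # 8.8
print(round(mean_assumed, 1))  # 8.8
```

Both routes give the same mean. The assumed mean method only shifts the arithmetic so that the products fd are smaller than the products fx.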

Example 1.7
Use the coding formula to obtain the mean for the following data:

Exact Class Boundaries    Midpoint (x)    Frequency (f)
0-10                       5              12
10-20                     15              15
20-30                     25              28
30-40                     35              25
40-50                     45              20
Total                                    100

Solution
Here c = 10 and we take A = 25. Therefore:

\[ u = \frac{x - 25}{10} \]

So we have:

Midpoint (x)    Frequency (f)    u = (x - 25)/10    fu
 5              12               -2                 -24
15              15               -1                 -15
25              28                0                   0
35              25                1                  25
45              20                2                  40
Total          100                                   26

\[ \bar{u} = \frac{\sum fu}{\sum f} = \frac{26}{100} = 0.26 \]

To find the mean, we use formula (5.3):

\[ \bar{X} = A + c\bar{u} = 25 + 10(0.26) = 27.6 \]
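The coding method of Example 1.7 can be checked with a few lines of Python. This is a minimal sketch using the midpoints, frequencies, A = 25 and c = 10 from the example. The variable names are illustrative assumptions.

```python
# Midpoints and frequencies from Example 1.7.
midpoints = [5, 15, 25, 35, 45]
freqs = [12, 15, 28, 25, 20]

A = 25  # assumed mean (midpoint of the middle class)
c = 10  # common class width

# Coded values u = (x - A) / c
u = [(x - A) / c for x in midpoints]

sum_f = sum(freqs)
sum_fu = sum(f * ui for f, ui in zip(freqs, u))
u_bar = sum_fu / sum_f

# Decode: mean = A + c * u_bar
mean = A + c * u_bar

print(sum_f, sum_fu)   # 100 26.0
print(round(mean, 1))  # 27.6
```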

Activity 1.3
(a) Using an assumed mean of 34.5, compute the arithmetic mean for the data in Activity 1.2.
(b) Use the coding method to obtain the mean for the data in Activity 1.2.

Activity 1.4
Determine the median for the following ungrouped data:
(a) 10, 5, 13, 12, 15, 17, 21, 7, 9
(b) 10, 8, 7, 15, 21, 22, 23, 14, 8, 20

Example 1.8
Find the median for the table of age distribution of 34 children below:

Age       Frequency (f)    Less-than cumulative frequency
1-3        5                5
4-6        6               11
7-9       10               21
10-12      4               25
13-15      6               31
16-18      3               34
          n = 34

Step 1
Find n/2, the number of cases to count: 34/2 = 17. This means we need to count 17 cases out of 34 to locate the class in which the median lies.

Step 2

The lowest class (1-3) has 5 cases; therefore the median does not lie in that class. The next class (4-6) has 6 cases, and when these are added to the 5 in the previous class we have 11 cases cumulatively. This means the median does not lie in the (4-6) class either. The next class (7-9) has 10 cases. If we add these 10 cases to the 11 we already have, we get 21, which is more than the 17 cases we are looking for. We therefore cannot take all 10 cases in this class, so the median lies in the (7-9) class, which becomes the median class.

Step 3
Having determined the median class, we can now compute the median using the expression in Step 4, where:

$L_1$ = lower class boundary of the class in which the median falls = 6.5
$n$ = total frequency = 34
$\sum f_1$ = sum of frequencies in classes below the median class = 11
$f_{med}$ = frequency of the median class = 10
$c$ = class width for the table = 3

Step 4
Substituting in the expression, we have:

\[ \text{Median} = L_1 + \left( \frac{n/2 - \sum f_1}{f_{med}} \right) c = 6.5 + \left( \frac{34/2 - 11}{10} \right) 3 = 6.5 + \frac{6}{10} \times 3 = 6.5 + 1.8 = 8.3 \]
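Steps 1 to 4 of Example 1.8 can be sketched as a small Python function. The function name `grouped_median` is hypothetical. The lower class boundaries other than 6.5 are my assumption, obtained by subtracting 0.5 from each lower class limit in the same way the example obtains 6.5 for the (7-9) class.

```python
# Lower class boundaries (assumed: lower class limit minus 0.5) and
# frequencies for the age distribution of 34 children.
lower_boundaries = [0.5, 3.5, 6.5, 9.5, 12.5, 15.5]
freqs = [5, 6, 10, 4, 6, 3]
class_width = 3


def grouped_median(lower_boundaries, freqs, width):
    """Median = L1 + ((n/2 - cumulative f below) / f_med) * width."""
    half = sum(freqs) / 2  # Step 1: n/2
    below = 0              # cumulative frequency of the classes below
    for L1, f in zip(lower_boundaries, freqs):
        if below + f >= half:  # Step 2: this is the median class
            return L1 + (half - below) / f * width  # Steps 3 and 4
        below += f
    raise ValueError("frequencies must be non-empty and positive")


median_age = grouped_median(lower_boundaries, freqs, class_width)
print(round(median_age, 1))  # 8.3
```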

Activity 1.5
List three examples of data you can compute the median from.
(a) -------------------------------------------
(b) -------------------------------------------
(c) -------------------------------------------

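Example 1.9
From the table of age distribution of 34 children, the following values are computed:

$L_1 = 6.5$, $\Delta_1 = 4$, $\Delta_2 = 6$, $c = 3$

Here the modal class is (7-9), $\Delta_1 = 10 - 6 = 4$ is the difference between the modal frequency and the frequency of the class below it, and $\Delta_2 = 10 - 4 = 6$ is the difference between the modal frequency and the frequency of the class above it.

Age Distribution of 34 Children

Age       f
1-3       5
4-6       6
7-9      10
10-12     4
13-15     6
16-18     3
Total    34

\[ \text{Mode} = L_1 + \left( \frac{\Delta_1}{\Delta_1 + \Delta_2} \right) c = 6.5 + \left( \frac{4}{4 + 6} \right) 3 = 6.5 + 1.2 = 7.7 \]

Example 1.10
Given that for a particular distribution the mean = 125.7 and the median = 124.9, find the mode.

\[ \text{Mode} = 3\,\text{Median} - 2\,\text{Mean} = 3(124.9) - 2(125.7) = 123.3 \]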
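The two mode calculations in Examples 1.9 and 1.10 can be reproduced with the sketch below. The function names `grouped_mode` and `empirical_mode` are my own labels for the two formulas, and all numbers come from the examples.

```python
def grouped_mode(L1, f_modal, f_below, f_above, width):
    """Mode = L1 + (d1 / (d1 + d2)) * width for grouped data."""
    d1 = f_modal - f_below  # excess over the class below the modal class
    d2 = f_modal - f_above  # excess over the class above the modal class
    return L1 + d1 / (d1 + d2) * width


def empirical_mode(mean, median):
    """Empirical relationship: Mode = 3 * Median - 2 * Mean."""
    return 3 * median - 2 * mean


# Example 1.9: modal class 7-9 (f = 10), neighbours f = 6 and f = 4.
mode_ages = grouped_mode(L1=6.5, f_modal=10, f_below=6, f_above=4, width=3)

# Example 1.10: mean = 125.7, median = 124.9.
mode_estimate = empirical_mode(mean=125.7, median=124.9)

print(round(mode_ages, 1))      # 7.7
print(round(mode_estimate, 1))  # 123.3
```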

Example 1.11
Find $Q_1$ using the sample data in an ordered array: 11 12 13 16 16 17 18 21 22

Solution
First, note that n = 9. $Q_1$ is the (9 + 1)/4 = 2.5th ranked value of the ordered data, so use the value halfway between the 2nd and 3rd ranked values. Hence $Q_1 = 12.5$.

Activity 1.5
Using the Example, compute $Q_2$ and $Q_3$.

Example 2.1
Compute the MD for the following data: 72, 81, 86, 69, 57

Solution

X        $|X - \bar{X}|$
72         1
81         8
86        13
69         4
57        16
Total 365 42

\[ \bar{X} = \frac{\sum X}{n} = \frac{365}{5} = 73 \]

\[ MD = \frac{\sum |X - \bar{X}|}{n} = \frac{42}{5} = 8.4 \]

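This means that, on average, the values differ from the mean by 8.4 in absolute terms.

Example 2.2
Using the raw data constituting the population of values 72, 81, 86, 69, 57, the population standard deviation is computed as follows:

X        $X - \mu$    $(X - \mu)^2$
72         -1            1
81          8           64
86         13          169
69         -4           16
57        -16          256
Total                  506

where $\mu = 365/5 = 73$ is the population mean and $N = 5$.

\[ \sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} = \sqrt{\frac{506}{5}} = \sqrt{101.2} = 10.06 \]

Example 2.3
The population standard deviation σ for the data set 72, 81, 86, 69, 57 is computed as follows:

X        $X^2$
72       5184
81       6561
86       7396
69       4761
57       3249
Total 365  27,151

\[ \sigma = \sqrt{\frac{\sum X^2}{N} - \left( \frac{\sum X}{N} \right)^2} = \sqrt{\frac{27{,}151}{5} - \left( \frac{365}{5} \right)^2} = 10.059 \]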
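The mean deviation of Example 2.1 and the two routes to the population standard deviation in Examples 2.2 and 2.3 can be verified together. The sketch below uses only Python's standard library. `statistics.pstdev` is the library's population standard deviation and is included purely as a cross-check. The variable names are illustrative.

```python
import math
import statistics

data = [72, 81, 86, 69, 57]
N = len(data)
mean = sum(data) / N

# Example 2.1: mean deviation = sum(|X - mean|) / N
mean_deviation = sum(abs(x - mean) for x in data) / N

# Example 2.2: definitional formula, sqrt(sum((X - mean)^2) / N)
sigma_definition = math.sqrt(sum((x - mean) ** 2 for x in data) / N)

# Example 2.3: computational formula, sqrt(sum(X^2)/N - (sum(X)/N)^2)
sigma_shortcut = math.sqrt(sum(x * x for x in data) / N - (sum(data) / N) ** 2)

# Cross-check with the standard library's population standard deviation
sigma_library = statistics.pstdev(data)

print(mean)                        # 73.0
print(round(mean_deviation, 1))    # 8.4
print(round(sigma_definition, 2))  # 10.06
print(round(sigma_shortcut, 2))    # 10.06
print(round(sigma_library, 2))     # 10.06
```

The two formulas are algebraically equivalent, so any difference between them is rounding only.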

Example 2.4

Marks     Frequency (f)    Class Midpoint (X)    $X^2$     fX       $fX^2$
60-62      5               61                    3,721      305     18,605
63-65     18               64                    4,096    1,152     73,728
66-68     42               67                    4,489    2,814    188,538
69-71     27               70                    4,900    1,890    132,300
72-74      8               73                    5,329      584     42,632
Total    100                                              6,745    455,803

\[ \sigma = \sqrt{\frac{\sum fX^2}{\sum f} - \left( \frac{\sum fX}{\sum f} \right)^2} = \sqrt{\frac{455{,}803}{100} - \left( \frac{6{,}745}{100} \right)^2} = \sqrt{8.5275} = 2.92 \]

Example 2.5
Using the data from the previous table (Example 2.4) with an assumed mean A = 67, we develop the following working sheet:

Marks     Frequency (f)    Class Midpoint (X)    d = X - A    fd     $fd^2$
60-62      5               61                    -6           -30    180
63-65     18               64                    -3           -54    162
66-68     42               67                     0             0      0
69-71     27               70                     3            81    243
72-74      8               73                     6            48    288
Total    100                                                   45    873

From the table above, we have the following sums: $\sum f = 100$, $\sum fd = 45$, $\sum fd^2 = 873$.

\[ s = \sqrt{\frac{\sum fd^2 - \frac{(\sum fd)^2}{\sum f}}{\sum f - 1}} = \sqrt{\frac{873 - \frac{45^2}{100}}{100 - 1}} = \sqrt{8.6136} = 2.93 \]
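The working sheets of Examples 2.4 and 2.5 can be rebuilt in a few lines. This is a sketch under the assumption that Example 2.4 uses the population formula (divisor Σf) and Example 2.5 the sample formula (divisor Σf - 1), which is what the printed results 2.92 and 2.93 imply. The variable names are my own.

```python
import math

# Marks distribution: class midpoints and frequencies.
midpoints = [61, 64, 67, 70, 73]
freqs = [5, 18, 42, 27, 8]
n = sum(freqs)

# Example 2.4: sqrt(sum(fX^2)/n - (sum(fX)/n)^2)
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))
sigma = math.sqrt(sum_fx2 / n - (sum_fx / n) ** 2)

# Example 2.5: deviations d = X - A from the assumed mean A = 67,
# then sqrt((sum(fd^2) - (sum(fd))^2 / n) / (n - 1))
A = 67
sum_fd = sum(f * (x - A) for f, x in zip(freqs, midpoints))
sum_fd2 = sum(f * (x - A) ** 2 for f, x in zip(freqs, midpoints))
s = math.sqrt((sum_fd2 - sum_fd ** 2 / n) / (n - 1))

print(sum_fx, sum_fx2)  # 6745 455803
print(round(sigma, 2))  # 2.92
print(sum_fd, sum_fd2)  # 45 873
print(round(s, 2))      # 2.93
```

The small gap between 2.92 and 2.93 comes from the divisor (100 versus 99), not from the choice of assumed mean.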

Example 2.6
This method is recommended for grouped data when the class interval sizes are equal. It is called the coding method.

Marks     Frequency (f)    Class Midpoint (X)    u = (X - 67)/3    fu     $fu^2$
60-62      5               61                    -2                -10    20
63-65     18               64                    -1                -18    18
66-68     42               67                     0                  0     0
69-71     27               70                     1                 27    27
72-74      8               73                     2                 16    32
Total    100                                                        15    97

From the table above, we have the following sums: $c = 3$, $\sum f = 100$, $\sum fu = 15$ and $\sum fu^2 = 97$.

\[ s = c \sqrt{\frac{\sum fu^2 - \frac{(\sum fu)^2}{\sum f}}{\sum f - 1}} = 3 \sqrt{\frac{97 - \frac{15^2}{100}}{100 - 1}} = 3 \sqrt{0.9571} = 2.93 \]

Activity 2.1
1. Find the standard deviation of the following sets of data:
(a) 3, 6, 2, 1, 7, 5
(b) 3.2, 4.6, 2.8, 5.2, 4.4

2. The following is the age distribution of 40 students from the University of Ghana:

Age distribution of 40 Students

Age       f
18-22      8
23-27      7
28-32     10
33-37      5
38-42      7
43-47      3
Total     40

(a) Find the range and the mean deviation for the table.
(b) Compute the standard deviation of the age distribution of the students.
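The coding method of Example 2.6 can be checked in the same way before you attempt the activity. The sketch below uses the marks data, A = 67 and c = 3 from that example. The variable names are illustrative assumptions.

```python
import math

# Marks distribution from Example 2.6.
midpoints = [61, 64, 67, 70, 73]
freqs = [5, 18, 42, 27, 8]

A = 67  # assumed mean
c = 3   # common class width

# Coded values u = (X - A) / c
u = [(x - A) / c for x in midpoints]

n = sum(freqs)
sum_fu = sum(f * ui for f, ui in zip(freqs, u))
sum_fu2 = sum(f * ui ** 2 for f, ui in zip(freqs, u))

# s = c * sqrt((sum(fu^2) - (sum(fu))^2 / n) / (n - 1))
s = c * math.sqrt((sum_fu2 - sum_fu ** 2 / n) / (n - 1))

print(sum_fu, sum_fu2)  # 15.0 97.0
print(round(s, 2))      # 2.93
```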

Example 3.1
Suppose that two stocks, A and B, have the following summary statistics:

Stock A: Average price last year = GHC 50, Standard deviation = GHC 5
Stock B: Average price last year = GHC 100, Standard deviation = GHC 5

Which of these two stocks has the less variable price?

Solution

\[ CV_A = \frac{s}{\bar{X}} \times 100\% = \frac{\text{GHC } 5}{\text{GHC } 50} \times 100\% = 10\% \]

\[ CV_B = \frac{s}{\bar{X}} \times 100\% = \frac{\text{GHC } 5}{\text{GHC } 100} \times 100\% = 5\% \]

From the two CVs computed, you can observe that the two stocks have the same standard deviation, but stock B is less variable relative to its price.

Activity 3.1
The points of two football teams in ten league seasons are given below. Which of the two teams is more consistent?

Team A: 32 28 47 63 71 39 10 60 96 14
Team B: 19 31 48 53 67 90 10 62 40 80

[Hint: Compute the CV of each team and compare the results]

Example 3.2
Suppose the mean math SAT score is 490, with a standard deviation of 100. Compute the z-score for a test score of 620.

Solution

\[ Z = \frac{X - \bar{X}}{s} = \frac{620 - 490}{100} = \frac{130}{100} = 1.3 \]

A score of 620 is 1.3 standard deviations above the mean and would not be considered an outlier.
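The coefficient of variation and the z-score are both one-line formulas, so they are easy to wrap as small functions. The sketch below reproduces Examples 3.1 and 3.2. The function names are hypothetical and chosen only for readability.

```python
def coefficient_of_variation(std_dev, mean):
    """CV = (s / mean) * 100, expressed as a percentage."""
    return std_dev / mean * 100


def z_score(x, mean, std_dev):
    """Z = (X - mean) / s: distance from the mean in standard deviations."""
    return (x - mean) / std_dev


# Example 3.1: same standard deviation, different average prices.
cv_a = coefficient_of_variation(std_dev=5, mean=50)
cv_b = coefficient_of_variation(std_dev=5, mean=100)

# Example 3.2: SAT score of 620 with mean 490 and standard deviation 100.
z = z_score(620, mean=490, std_dev=100)

print(round(cv_a, 1), round(cv_b, 1))  # 10.0 5.0
print(round(z, 1))                     # 1.3
```

The same two functions can be reused for Activities 3.1 and 3.2 once you have computed the means and standard deviations.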

Activity 3.2
The average performance in mathematics in school A is 80, with a standard deviation of 10. In school B, the average is 70 and the standard deviation is 15. Adei, who is in school A, scored 75, while his friend Obonto, in school B, scored 65. Compare their relative performance in mathematics.

Example 4.1
If for a given data set it is found that $\bar{x} = 10$, Mode = 8 and $s = 4$, then we have

\[ PS_k = \frac{\bar{x} - \text{Mode}}{s} = \frac{10 - 8}{4} = 0.5 \]

(i) Bowley's Coefficient of Skewness

\[ BS_k = \frac{Q_1 - 2Q_2 + Q_3}{Q_3 - Q_1} \]

Example 4.2
If for a given data set it is found that $Q_1 = 20$, $Q_2 = 50$, $Q_3 = 80$, then we have

\[ BS_k = \frac{Q_1 - 2Q_2 + Q_3}{Q_3 - Q_1} = \frac{20 - 2(50) + 80}{80 - 20} = 0 \]

Note that a skewness measure of zero indicates that the underlying distribution is symmetric.

Activity 4.1
The following is the age distribution of 40 students from the University of Ghana:

Age distribution of 40 Students

Age       f
18-22      8
23-27      7
28-32     10
33-37      5
38-42      7
43-47      3
Total     40

(a) Compute a measure of skewness and comment on your result.
(b) Compute a measure of kurtosis and comment on your result.

Activity 5.1
The following are age distributions for a group of pupils: 14, 8, 9, 5, 17, 12, 20, 7, 6, 2, 4, 7, 8, 9, 8, 17.
(a) Compute the quartile values for the distribution.
(b) Obtain a box-and-whisker plot for the distribution and use it to describe the shape of the distribution.
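The quartile rule of Example 1.11 and the two skewness coefficients of Examples 4.1 and 4.2 can be checked with the sketch below. It relies on `statistics.quantiles` from Python's standard library, whose default `method='exclusive'` places the quartiles at the ranked positions i(n + 1)/4, which matches the (n + 1)/4 rule used in Example 1.11. The function names `pearson_skewness` and `bowley_skewness` are my own.

```python
import statistics

# Example 1.11: ordered sample of n = 9 values.
sample = [11, 12, 13, 16, 16, 17, 18, 21, 22]

# quantiles(..., n=4) returns the three quartile cut points [Q1, Q2, Q3].
q1 = statistics.quantiles(sample, n=4, method="exclusive")[0]


def pearson_skewness(mean, mode, std_dev):
    """Pearson's coefficient: (mean - mode) / s."""
    return (mean - mode) / std_dev


def bowley_skewness(q1, q2, q3):
    """Bowley's coefficient: (Q1 - 2*Q2 + Q3) / (Q3 - Q1)."""
    return (q1 - 2 * q2 + q3) / (q3 - q1)


ps_k = pearson_skewness(mean=10, mode=8, std_dev=4)  # Example 4.1
bs_k = bowley_skewness(q1=20, q2=50, q3=80)          # Example 4.2

print(q1)    # 12.5
print(ps_k)  # 0.5
print(bs_k)  # 0.0
```

Other software may use a different quartile convention, so small differences from hand calculations are possible on other data sets.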

Section 6: Exploratory Data Analysis using Excel

Introduction
Welcome to Section 6. Here we illustrate how to compute the summary measures discussed in Unit 5, i.e. the mean, median, mode, standard deviation, variance, quartiles, coefficient of skewness, coefficient of kurtosis and the box-and-whisker plot.

Objectives
At the end of this section, you will be able to:
Use Excel to obtain summary measures that describe the centre, spread and shape of a distribution.

General Descriptive Statistics Using Microsoft Excel
We illustrate how to obtain general descriptive statistics of a distribution using Microsoft Excel.

Example
The following are the marks obtained by 50 students in an examination:

83 64 84 76 84 54 75 59 70 61
63 80 84 73 68 52 65 90 52 77
95 36 78 61 59 84 95 47 87 60
54 79 47 64 36 45 51 24 78 26
33 60 18 68 35 58 48 53 55 45

Solution
1. Enter the data into one column of cells as shown.

2. Select the Data tab.
3. Select Data Analysis.
4. Select Descriptive Statistics and click OK.

5. Select the cell range that contains the data and enter it in the Input Range box.
6. Check Labels in first row.
7. Select New Worksheet Ply to store the output in a new worksheet.
8. Select Summary statistics.
9. Click OK.

[Figure: Microsoft Excel descriptive statistics output]
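If you want to cross-check some of the Excel figures without a spreadsheet, several of the same measures are available in Python's standard `statistics` module. The sketch below is not a reproduction of the Excel report. It computes a handful of measures for the 50 marks, and the dictionary keys are labels of my own choosing. `statistics.stdev` and `statistics.variance` use the sample (n - 1) divisor.

```python
import statistics

# Marks obtained by the 50 students in the example.
marks = [
    83, 64, 84, 76, 84, 54, 75, 59, 70, 61,
    63, 80, 84, 73, 68, 52, 65, 90, 52, 77,
    95, 36, 78, 61, 59, 84, 95, 47, 87, 60,
    54, 79, 47, 64, 36, 45, 51, 24, 78, 26,
    33, 60, 18, 68, 35, 58, 48, 53, 55, 45,
]

summary = {
    "count": len(marks),
    "sum": sum(marks),
    "mean": statistics.mean(marks),
    "median": statistics.median(marks),
    "mode": statistics.mode(marks),
    "sample standard deviation": statistics.stdev(marks),
    "sample variance": statistics.variance(marks),
    "minimum": min(marks),
    "maximum": max(marks),
    "range": max(marks) - min(marks),
}

# Print one measure per line, much like a summary table.
for label, value in summary.items():
    print(f"{label:28s}{value:>10.2f}")
```

Skewness and kurtosis are not part of the `statistics` module, so they are left out of this sketch.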

Activity 5.1
Collect data on the ages of a sample of 50 students in your class. Enter the data in Excel and perform an exploratory data analysis to obtain summary statistical measures. Write a brief report of your findings in not more than one page.