OCR Statistics 1 Working with data. Section 2: Measures of location

Notes and Examples
13/11/13 MEI

These notes have sub-sections on:
- The median
- Estimating the median from grouped data
- The mean
- Estimating the mean from grouped data
- Coding data
- The mode
- Comparison of measures of location

The median

When data is arranged in order, the median is the item of data in the middle. However, when there is an even number of data items, the middle lies between two values, and we use the mean of these two values for the median.

For example, this dataset has 9 items:

1  1  3  4  6  7  7  9  10

There are 4 data items below the 5th and 4 items above, so the middle item is the 5th, which is 6.

If another item of data is added to give 10 items, the middle items are the 5th and 6th:

1  1  3  4  6  7  7  9  10  12

so the median is the mean of the 5th and 6th items, i.e. $\frac{6 + 7}{2} = 6.5$.

Example 1
Find the median of the data displayed in this stem and leaf diagram.

16 | 5 5 6 7 8
17 | 0 0 1 3 3 7 8 9
18 | 2 2 2 5 5 8
19 | 0

n = 20
17 | 3 represents 1.73

Counting from the lowest item (1.65), the 10th is 1.73 and the 11th is 1.77. The median is therefore $\frac{1.73 + 1.77}{2} = 1.75$.
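The rule above (put the data in order, then take the middle item, or the mean of the two middle items) can be sketched in Python. The function name `median` is an illustrative choice rather than anything from these notes; Python's standard `statistics.median` does the same job.

```python
def median(values):
    """Return the median of a list of numbers."""
    ordered = sorted(values)  # the median needs the data in order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    middle = n // 2
    if n % 2 == 1:
        # Odd number of items: there is a single middle item.
        return ordered[middle]
    # Even number of items: take the mean of the two middle items.
    return (ordered[middle - 1] + ordered[middle]) / 2


nine_items = [1, 1, 3, 4, 6, 7, 7, 9, 10]
ten_items = nine_items + [12]

print(median(nine_items))  # 6
print(median(ten_items))   # 6.5
```

Sorting inside the function means the data does not have to be supplied in order.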

When you want to find the median of a data set presented in a frequency table, one useful point is that the data is already ordered.

    x      f
    1      3
    2      5
    3      2
    4      3
    5      4
    6      3
  Total   20

For this data set, there are 20 data items, so the median is the mean of the 10th and 11th items. For this small set of data, it is easy to see that the 10th data item is 3 and the 11th is 4. The median is therefore 3.5.

However, for a larger set of data it may be more difficult to identify the middle item or items. One way to make this a little easier is to use a cumulative frequency table.

    x      f    Cum. freq.
    1      3        3
    2      5        8
    3      2       10
    4      3       13
    5      4       17
    6      3       20

The third column gives the cumulative frequency. This is the total of the frequencies so far. You can find each cumulative frequency by adding each frequency to the previous cumulative frequency. E.g., for x = 4, the cumulative frequency is 10 + 3 = 13. The final value of the cumulative frequency (in this case 20) tells you the total of the frequencies.

The cumulative frequencies show that the 10th item is 3 and the 11th item is 4. So the median is 3.5.

Estimating the median from grouped data

Cumulative frequency curves are useful for estimating the median of a large data set, as shown in Example 2.
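The cumulative frequency method used for the frequency table of x values above can be sketched in Python as follows. The function name `median_from_frequency_table` and its helper `item_at` are hypothetical names chosen for this illustration, and the sketch assumes the x values are listed in ascending order.

```python
from itertools import accumulate


def median_from_frequency_table(values, frequencies):
    """Median of discrete data given as values (ascending) and their frequencies."""
    cumulative = list(accumulate(frequencies))  # running totals of f
    n = cumulative[-1]                          # total number of data items

    def item_at(position):
        # The item at a 1-based position is the first value whose
        # cumulative frequency reaches that position.
        for value, cum_freq in zip(values, cumulative):
            if cum_freq >= position:
                return value

    if n % 2 == 1:
        return item_at((n + 1) // 2)
    return (item_at(n // 2) + item_at(n // 2 + 1)) / 2


x = [1, 2, 3, 4, 5, 6]
f = [3, 5, 2, 3, 4, 3]
cumulative = list(accumulate(f))

print(cumulative)                         # [3, 8, 10, 13, 17, 20]
print(median_from_frequency_table(x, f))  # 3.5
```

The table is never expanded into 20 separate items, which is the point of using cumulative frequencies for larger data sets.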

Example 2
Estimate the median of the following dataset, which gives the mass of 100 eggs:

  Mass, m (g)     Frequency
  40 ≤ m < 45         4
  45 ≤ m < 50        15
  50 ≤ m < 55        15
  55 ≤ m < 60        22
  60 ≤ m < 65        17
  65 ≤ m < 70        16
  70 ≤ m < 75        11
  75 ≤ m < 80         0

First form the cumulative frequencies:

  Mass, m (g)     Frequency     Mass      Cumulative frequency
                                m < 40            0
  40 ≤ m < 45         4         m < 45            4
  45 ≤ m < 50        15         m < 50           19
  50 ≤ m < 55        15         m < 55           34
  55 ≤ m < 60        22         m < 60           56
  60 ≤ m < 65        17         m < 65           73
  65 ≤ m < 70        16         m < 70           89
  70 ≤ m < 75        11         m < 75          100

The cumulative frequency curve is drawn below:

[Figure: cumulative frequency curve. Vertical axis: cumulative frequency (c.f.), 0 to 100. Horizontal axis: mass (g). A red line marks the median.]

Median ≈ 58 g

50 of the eggs lie below the median, shown by the red line.
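As a rough numerical check on Example 2, the median can also be estimated by linear interpolation within the class that contains the middle of the data. This is a swapped-in technique: it replaces the smooth hand-drawn curve with straight lines between the plotted points, so it gives about 58.6 g rather than the 58 g read from the graph. The function name `estimate_grouped_median` is hypothetical.

```python
def estimate_grouped_median(boundaries, frequencies):
    """Estimate the median of grouped data by linear interpolation.

    boundaries  -- class boundaries in ascending order (one more than the classes)
    frequencies -- frequency of each class
    """
    total = sum(frequencies)
    target = total / 2   # cumulative frequency at the median
    cumulative = 0       # cumulative frequency below the current class
    for lower, upper, freq in zip(boundaries, boundaries[1:], frequencies):
        if freq > 0 and cumulative + freq >= target:
            # Assume the items are spread evenly across this class.
            fraction = (target - cumulative) / freq
            return lower + fraction * (upper - lower)
        cumulative += freq
    raise ValueError("frequencies must contain at least one positive count")


boundaries = [40, 45, 50, 55, 60, 65, 70, 75, 80]
frequencies = [4, 15, 15, 22, 17, 16, 11, 0]

egg_median = estimate_grouped_median(boundaries, frequencies)
print(round(egg_median, 1))  # 58.6
```

Either value is only an estimate, because grouping the data discards the individual masses.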

The mean

When people talk about the average, it is usually the mean they mean! This is the sum of the data divided by the number of items of data. We can express this using mathematical notation as follows:

$\bar{x}$ denotes the mean value of $x$.

For the data set $x_1, x_2, x_3, x_4, \ldots, x_n$,

\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]

$\Sigma$ is the Greek letter sigma and stands for "the sum of". The whole expression is saying: the mean ($\bar{x}$) is equal to the sum of all the data items ($x_i$ for $i = 1$ to $n$) divided by the number of data items ($n$).

Example 3 shows a very simple calculation set out using this formal notation.

Example 3
Find the mean of the data set {6, 7, 8, 8, 9}.

$x_1 = 6$, $x_2 = 7$, $x_3 = 8$, $x_4 = 8$, $x_5 = 9$, $n = 5$

\[ \bar{x} = \frac{\sum_{i=1}^{5} x_i}{5} = \frac{x_1 + x_2 + x_3 + x_4 + x_5}{5} = \frac{6 + 7 + 8 + 8 + 9}{5} = 7.6 \]

When calculating the mean from a frequency table, you need to be careful to use the correct totals.

    x      f
    1      3
    2      5
    3      2
    4      3
    5      4
    6      3
  Total   20

The mean of the data shown in the frequency table above can be written as

\[ \bar{x} = \frac{1+1+1+2+2+2+2+2+3+3+4+4+4+5+5+5+5+6+6+6}{20} = \frac{69}{20} = 3.45 \]

An alternative way of writing this is

\[ \bar{x} = \frac{3 \times 1 + 5 \times 2 + 2 \times 3 + 3 \times 4 + 4 \times 5 + 3 \times 6}{3 + 5 + 2 + 3 + 4 + 3} = \frac{69}{20} = 3.45 \]

In the numerator, each value of x is multiplied by its frequency, and then the results are added together. This can be expressed more formally as

\[ \bar{x} = \frac{\sum_{i=1}^{6} f_i x_i}{\sum_{i=1}^{6} f_i} \]

In the denominator, the frequencies are added to find the total number of data items.

It is helpful to add another column to the frequency table, for the product fx.

    x      f      fx
    1      3       3
    2      5      10
    3      2       6
    4      3      12
    5      4      20
    6      3      18
  Total  Σf = 20  Σfx = 69

Then you can simply add up the two columns and use the totals to calculate the mean.

\[ \bar{x} = \frac{\sum fx}{\sum f} = \frac{69}{20} = 3.45 \]

In general, when the data is given using frequencies, the formula for the mean is:

\[ \bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} \]

(Here the sums run over the n rows of the frequency table.)

Estimating the mean from grouped data

When the data is grouped into classes, you can still estimate the mean by using the midpoint of the classes (the mid-interval value). This means that you assume that all the values in each class interval are equally spaced about the mid-point. You can show most of the calculations in a table, as shown in the following example.

Example 4
Estimate the mean weight for the following data:

  Weight, w (kg)    Frequency
  50 ≤ w < 60           3
  60 ≤ w < 70           5
  70 ≤ w < 80           7
  80 ≤ w < 90           3
  90 ≤ w < 100          2
  Total                20

The mid-interval value is the mean of the upper and lower bound of the weight.

  Weight, w (kg)    Mid-interval value, x    Frequency, f      fx
  50 ≤ w < 60               55                    3           165
  60 ≤ w < 70               65                    5           325
  70 ≤ w < 80               75                    7           525
  80 ≤ w < 90               85                    3           255
  90 ≤ w < 100              95                    2           190
                                               Σf = 20     Σfx = 1460

\[ \bar{x} = \frac{\sum fx}{\sum f} = \frac{1460}{20} = 73 \]

The mean weight is estimated to be 73 kg.

To find mid-interval values, you need to think carefully about the upper and lower bounds of each interval. In the example above, it is clear what these bounds are. However, if the intervals had been expressed as 50–59, 60–69 and so on, then it is clear that the original weights had been rounded to the nearest kilogram, and the intervals were actually 49.5 ≤ w < 59.5, 59.5 ≤ w < 69.5, etc. So in that case the mid-interval values would be 54.5, 64.5 and so on.

Coding data

It is sometimes possible to simplify the calculation of the mean by coding the data.

You can transform the data using a linear coding:

\[ y = a + bx \]

You can undo this coding:

\[ x = \frac{y - a}{b} \]

Since each data item has been transformed using this coding, the mean of the data undergoes the same transformation. So the mean of the coded data, $\bar{y}$, is related to the mean of the original data, $\bar{x}$, by the equation

\[ \bar{y} = a + b\bar{x}. \]

For example, the data set {30, 50, 20, 70, 40, 20, 30, 60} could be simplified by dividing all the data by 10. This means using the coding $y = \frac{x}{10}$, which gives the new data set {3, 5, 2, 7, 4, 2, 3, 6}. You can find the mean $\bar{y}$ of this new data set. Then, since $x = 10y$, you can find the mean of the original data using the equation $\bar{x} = 10\bar{y}$.

Alternatively, the numbers could be made smaller by subtracting 20 before dividing by 10. This is the coding $y = \frac{x - 20}{10}$, which gives the new data set {1, 3, 0, 5, 2, 0, 1, 4}. You can find the mean, $\bar{y}$, of this new data set. Then, since $x = 10y + 20$, you can find the mean of the original data using the equation $\bar{x} = 10\bar{y} + 20$.

Coding is especially useful when dealing with grouped data, since in these cases you are dealing with mid-interval values which follow a fixed pattern. For example, if you were dealing with heights grouped as 100–109, 110–119 etc., you would be working with mid-interval values of 104.5, 114.5, 124.5 etc. By using the coding $y = \frac{x - 104.5}{10}$, you would be working with y values of 0, 1, 2, etc.

You might feel that since you can use a calculator, simplifying the numbers is of little value. However, the calculations involved can be quite long-winded, and it is easy to make a mistake in entering the numbers. If the numbers are simpler then you are less likely to make a mistake. In addition, you may be required in an examination question to show that you understand this method.

Example 5
Use linear coding to calculate the mean of the following data:

  Weight, w (grams)    Frequency, f
   0 ≤ w < 10               4
  10 ≤ w < 20               6
  20 ≤ w < 30               9
  30 ≤ w < 40               7
  40 ≤ w < 50               4

The mid-interval values (denoted by x) are 5, 15, 25, etc. A convenient coding is

\[ y = \frac{x - 5}{10} \]

The corresponding y values become 0, 1, 2, ...

    x      y      f      fy
    5      0      4       0
   15      1      6       6
   25      2      9      18
   35      3      7      21
   45      4      4      16
               Σf = 30  Σfy = 61

\[ \bar{y} = \frac{61}{30} = 2.03333 \]

\[ y = \frac{x - 5}{10} \;\Rightarrow\; x = 10y + 5 \]

\[ \bar{x} = 10\bar{y} + 5 = 10 \times \frac{61}{30} + 5 = 25.33 \]

The mode

The mode is the most common or frequent item of data; in other words the item with the highest frequency. So for the data set {6, 7, 8, 8, 9} the mode is 8 as this appears twice.

There may be more than one mode, if more than one item has the highest frequency.

Identifying the mode is easy when data are given in a frequency table.

    x      f
    1      3
    2      5
    3      2
    4      3
    5      4
    6      3
  Total   20

The highest frequency is for x = 2. So the mode is 2.

Comparison of measures of location

The mean includes all the data in the average, and takes account of the numerical value of all the data. So exceptionally large or small items of data can have a large effect on the mean: it is susceptible to outliers.
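Looking back at Example 5, the frequency-table formula for the mean and the coding method can be checked against each other with a short Python sketch. The function name `mean_from_frequencies` and the variable names are illustrative only; the data and the coding y = (x - 5)/10 are those of Example 5.

```python
def mean_from_frequencies(values, frequencies):
    """Mean of data given as values with frequencies: sum of fx over sum of f."""
    total_fx = sum(f * x for x, f in zip(values, frequencies))
    total_f = sum(frequencies)
    return total_fx / total_f


midpoints = [5, 15, 25, 35, 45]  # mid-interval values x from Example 5
freqs = [4, 6, 9, 7, 4]

# Code the data with y = (x - 5) / 10, giving y = 0, 1, 2, 3, 4.
coded = [(x - 5) / 10 for x in midpoints]

y_bar = mean_from_frequencies(coded, freqs)  # 61 / 30
x_bar_from_coding = 10 * y_bar + 5           # undo the coding on the mean
x_bar_direct = mean_from_frequencies(midpoints, freqs)

print(round(x_bar_from_coding, 2))  # 25.33
print(round(x_bar_direct, 2))       # 25.33
```

Both routes give the same estimate, which is the point of the coding method: the coded values are easier to work with by hand, and the mean is recovered by undoing the coding.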

The median is less sensitive to high and low values (outliers), as it is simply the middle value in order of size. If the numerical value of each of the items of data is relevant to the average, then the mean is a better measure; if not, then use the median.

The mode picks out the commonest data item. This is only significant if there are relatively high frequencies involved. It takes no account at all of the numerical values of the data.

Suppose you are negotiating a salary increase for employees at a small firm. The salaries are currently as follows:

£6000, £12000, £14000, £14000, £15000, £15000, £15000, £15000, £16000, £16000, £18000, £18000, £18000, £20000, £100000

- The £6000 is a part-time worker who works only two days a week
- The £100000 is the managing director
- The mean salary is £20800
- The median salary is £15000
- The modal salary is also £15000

Which is the most appropriate measure?

If you were the managing director, you might quote the mean of £20800, but the managing director is the only current employee who earns more than this amount.

If you were the union representative, you would quote the median or the mode (£15000), as these give the lowest averages. This is certainly more typical of the majority of workers.

There is no right answer to the appropriate average to take; it depends on the purpose to which it is put. However, it is clear that:

- The mean takes account of the numerical value of all the data, and is higher due to the effect of the £100000 salary, which is an outlier.
- The median and mode are not affected by the outliers (£100000 and £6000).
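The three averages quoted for the salary data can be checked with Python's standard `statistics` module. This is only a sketch: the variable names are illustrative, and the final step (recomputing the mean without the two outliers) is an extra check added here to show how much the mean moves, not a calculation from the notes.

```python
from statistics import mean, median, multimode

salaries = [6000, 12000, 14000, 14000, 15000, 15000, 15000, 15000,
            16000, 16000, 18000, 18000, 18000, 20000, 100000]

mean_salary = mean(salaries)
print(mean_salary)          # 20800
print(median(salaries))     # 15000
print(multimode(salaries))  # [15000]

# Who earns more than the mean? Only the managing director.
above_mean = [s for s in salaries if s > mean_salary]
print(above_mean)           # [100000]

# Extra check: leave out the two outliers and recompute the mean.
without_outliers = [s for s in salaries if s not in (6000, 100000)]
print(round(mean(without_outliers)))  # 15846
print(median(without_outliers))       # 15000
```

Removing the outliers pulls the mean down to roughly the level of the median, while the median itself does not change.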

Example 6
Shanice receives the following marks for her end-of-term exams:

  Subject                Mark (%)
  Maths                     30
  English                   80
  Physics                   45
  Chemistry                 47
  French                    47
  History                   50
  Biology                   46
  Religious Education       55

Calculate the mean, median and mode. Comment on which is the most appropriate measure of average for this data.

\[ \text{mean} = \frac{30 + 80 + 45 + 47 + 47 + 50 + 46 + 55}{8} = 50 \]

In numerical order, the results are:

30, 45, 46, 47, 47, 50, 55, 80

The median is therefore 47.

The mode is 47, as there are two of these and only one each of the other marks.

The mode is not suitable; there is no significance in getting two scores of 47. The median or the mean could be used. The mean is higher since it takes more account of the high English result. The median is perhaps the most representative, as she got 4 scores in the range 45 to 47, but Shanice would no doubt use the mean to make more of her good English result!