Chapter 3 Descriptive Statistics: Numerical Measures Part B

Slides prepared by John S. Loucks, St. Edward's University

Topics covered in Part B:
- Measures of Distribution Shape, Relative Location, and Detecting Outliers
- Exploratory Data Analysis
- Measures of Association Between Two Variables
- The Weighted Mean and Working with Grouped Data

Measures of Distribution Shape, Relative Location, and Detecting Outliers
- Distribution shape
- z-scores
- Chebyshev's Theorem
- Empirical Rule
- Detecting outliers

Distribution Shape: Skewness
An important measure of the shape of a distribution is called skewness. The formula for computing skewness for a data set is somewhat complex, but skewness can be easily computed using statistical software.

Symmetric (not skewed)
- Skewness is zero.
- Mean and median are equal.
[Figure: relative frequency histogram of a symmetric distribution; skewness = 0]

Moderately Skewed Left
- Skewness is negative.
- Mean will usually be less than the median.
[Figure: relative frequency histogram of a moderately left-skewed distribution; skewness = -0.31]

Moderately Skewed Right
- Skewness is positive.
- Mean will usually be more than the median.
[Figure: relative frequency histogram of a moderately right-skewed distribution; skewness = 0.31]

Highly Skewed Right
- Skewness is positive (often above 1.0).
- Mean will usually be more than the median.
[Figure: relative frequency histogram of a highly right-skewed distribution; skewness = 1.25]

Example: Apartment Rents
Seventy efficiency apartments were randomly sampled in a small college town. The monthly rent prices for these apartments are listed below in ascending order.

Monthly rents ($), n = 70:
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

[Figure: relative frequency histogram of the apartment rents; skewness = 0.92]

z-Scores
The z-score is often called the standardized value. It denotes the number of standard deviations a data value x_i is from the mean:

$$ z_i = \frac{x_i - \bar{x}}{s} $$
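The slides leave the skewness formula to statistical software. The sketch below assumes the adjusted Fisher-Pearson coefficient, skewness = n / ((n - 1)(n - 2)) × Σ z_i³, which is built from the z-scores just defined. That choice of formula is an assumption here, but it reproduces the 0.92 reported for the rents. The helper names `z_scores` and `sample_skewness` are hypothetical; only Python's standard `statistics` module is used.

```python
import statistics

# Monthly rents ($) for the 70 sampled efficiency apartments (data from the slides).
RENTS = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def z_scores(data):
    """Standardized values z_i = (x_i - mean) / s, with s the sample standard deviation."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # uses n - 1 in the denominator
    return [(x - mean) / s for x in data]


def sample_skewness(data):
    """Assumed formula: adjusted Fisher-Pearson skewness n / ((n-1)(n-2)) * sum(z_i^3)."""
    n = len(data)
    return n / ((n - 1) * (n - 2)) * sum(z ** 3 for z in z_scores(data))


print(f"mean = {statistics.mean(RENTS):.2f}, s = {statistics.stdev(RENTS):.2f}")  # mean = 490.80, s = 54.74
print(f"z(425) = {z_scores(RENTS)[0]:.2f}")                                     # z(425) = -1.20
print(f"skewness = {sample_skewness(RENTS):.2f}")                               # skewness = 0.92
```

Because each z_i is cubed, values far above the mean (such as the $600 and $615 rents) dominate the sum, which is why this right-skewed data set gets a positive value.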

An observation's z-score is a measure of the relative location of the observation in a data set.
- A data value less than the sample mean will have a z-score less than zero.
- A data value greater than the sample mean will have a z-score greater than zero.
- A data value equal to the sample mean will have a z-score of zero.

z-score of the smallest value (425), with x̄ = 490.80 and s = 54.74:

$$ z = \frac{x_i - \bar{x}}{s} = \frac{425 - 490.80}{54.74} = -1.20 $$

Standardized values for apartment rents (in the same order as the rents):
-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93
-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75
-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47
-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20
-0.20 -0.11 -0.01 -0.01 -0.01  0.17  0.17  0.17  0.17  0.35
 0.35  0.44  0.62  0.62  0.62  0.81  1.06  1.08  1.45  1.45
 1.54  1.54  1.63  1.81  1.99  1.99  1.99  1.99  2.27  2.27

Chebyshev's Theorem
At least $(1 - 1/z^2)$ of the items in any data set will be within z standard deviations of the mean, where z is any value greater than 1.

- At least 75% of the data values must be within z = 2 standard deviations of the mean.
- At least 89% of the data values must be within z = 3 standard deviations of the mean.
- At least 94% of the data values must be within z = 4 standard deviations of the mean.

For example, let z = 1.5 with x̄ = 490.80 and s = 54.74. At least $1 - 1/(1.5)^2 = 1 - 0.44 = 0.56$, or 56%, of the rent values must be between

$$ \bar{x} - z s = 490.80 - 1.5(54.74) \approx 409 $$

and

$$ \bar{x} + z s = 490.80 + 1.5(54.74) \approx 573 $$

(Actually, 86% of the rent values are between 409 and 573.)

Empirical Rule
For data having a bell-shaped distribution:
- 68.26% of the values of a normal random variable are within ±1 standard deviation of its mean.
- 95.44% of the values of a normal random variable are within ±2 standard deviations of its mean.
- 99.72% of the values of a normal random variable are within ±3 standard deviations of its mean.
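A short sketch comparing Chebyshev's guaranteed minimum with the proportion a normal distribution actually places within z standard deviations, and checking the 86% figure for the rents. It relies on Python's `math.erf` and the identity P(|Z| < z) = erf(z/√2) for a standard normal Z; the function names are hypothetical. Computed this way the normal proportions are 68.27%, 95.45%, and 99.73%; the slide's 68.26%, 95.44%, and 99.72% equal twice the four-decimal table areas 0.3413, 0.4772, and 0.4986.

```python
import math

# Apartment rent data and summary statistics from the slides.
RENTS = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]
MEAN, S = 490.80, 54.74


def chebyshev_bound(z):
    """Minimum fraction within z standard deviations guaranteed by Chebyshev (z > 1)."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z ** 2


def normal_within(z):
    """Fraction of a normal distribution within z standard deviations of its mean."""
    return math.erf(z / math.sqrt(2))


def fraction_within(data, mean, s, z):
    """Observed fraction of values in the interval [mean - z*s, mean + z*s]."""
    lower, upper = mean - z * s, mean + z * s
    return sum(lower <= x <= upper for x in data) / len(data)


for z in (1.5, 2, 3, 4):
    print(f"z = {z}: Chebyshev >= {chebyshev_bound(z):.1%}, normal = {normal_within(z):.2%}")
print(f"Rents within 1.5 s of the mean: {fraction_within(RENTS, MEAN, S, 1.5):.0%}")  # 86%
```

Chebyshev's bound holds for any data set, so it is far more conservative than the normal proportions; the rents (86% within 1.5 s) land between the two.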

[Figure: Empirical Rule on a normal curve; 68.26% of values between µ - 1σ and µ + 1σ, 95.44% between µ - 2σ and µ + 2σ, 99.72% between µ - 3σ and µ + 3σ]

Detecting Outliers
An outlier is an unusually small or unusually large value in a data set. A data value with a z-score less than -3 or greater than +3 might be considered an outlier. It might be:
- an incorrectly recorded data value,
- a data value that was incorrectly included in the data set, or
- a correctly recorded data value that belongs in the data set.

For the apartment rents, the most extreme z-scores are -1.20 and 2.27. Using |z| > 3 as the criterion for an outlier, there are no outliers in this data set.
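The |z| > 3 rule can be sketched in a few lines; the function name `z_score_outliers` is hypothetical, and the threshold of 3 follows the slide.

```python
import statistics

# Apartment rent data from the slides.
RENTS = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def z_score_outliers(data, threshold=3.0):
    """Return the values whose z-score magnitude exceeds the threshold."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    return [x for x in data if abs((x - mean) / s) > threshold]


mean, s = statistics.mean(RENTS), statistics.stdev(RENTS)
z = [(x - mean) / s for x in RENTS]
print(f"Most extreme z-scores: {min(z):.2f} and {max(z):.2f}")  # -1.20 and 2.27
print("Outliers:", z_score_outliers(RENTS))                      # Outliers: []
```

A flagged value is only a candidate: the slide's three possibilities (recording error, wrongly included value, legitimate value) still have to be checked by hand.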

Exploratory Data Analysis
- Five-number summary
- Box plot

Five-Number Summary
1. Smallest value
2. First quartile
3. Median
4. Third quartile
5. Largest value

For the apartment rents:
Smallest value = 425
First quartile = 445
Median = 475
Third quartile = 525
Largest value = 615
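The slides do not state which quartile convention they use. This sketch assumes the location rule i = (p/100)n: if i is not an integer, round up and take that position; if it is, average the values in positions i and i + 1. That rule reproduces the quartiles 445, 475, and 525 above, but other software conventions can give slightly different quartiles for the same data. The function names are hypothetical.

```python
# Apartment rent data from the slides (already in ascending order).
RENTS = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def percentile(sorted_data, p):
    """p-th percentile (integer p, 0 < p < 100) by the location rule i = (p/100) * n."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    # Integer arithmetic avoids floating-point error in (p/100) * n.
    i, remainder = divmod(p * len(sorted_data), 100)
    if remainder:
        # i is not an integer: round up to position i + 1 (0-based index i).
        return sorted_data[i]
    # i is an integer: average positions i and i + 1 (0-based indices i - 1 and i).
    return (sorted_data[i - 1] + sorted_data[i]) / 2


def five_number_summary(data):
    """Return (smallest, Q1, median, Q3, largest)."""
    s = sorted(data)
    return s[0], percentile(s, 25), percentile(s, 50), percentile(s, 75), s[-1]


print(five_number_summary(RENTS))  # (425, 445, 475.0, 525, 615)
```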

Box Plot
A box is drawn with its ends located at the first and third quartiles. A vertical line is drawn in the box at the location of the median (second quartile).
[Figure: box plot of the apartment rents on a horizontal axis from 375 to 625, marking Q1 = 445, Q2 = 475, and Q3 = 525]

Limits are located (not drawn) using the interquartile range, IQR = Q3 - Q1 = 525 - 445 = 80. Data outside these limits are considered outliers, and the location of each outlier is shown with the symbol *.

The lower limit is located 1.5(IQR) below Q1:
Lower limit: Q1 - 1.5(IQR) = 445 - 1.5(80) = 325
The upper limit is located 1.5(IQR) above Q3:
Upper limit: Q3 + 1.5(IQR) = 525 + 1.5(80) = 645
There are no outliers (values less than 325 or greater than 645) in the apartment rent data.
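The limits can be computed directly from Q1 = 445 and Q3 = 525, so IQR = 525 - 445 = 80. In this minimal sketch the multiplier k = 1.5 follows the slide; the function name `box_plot_limits` is hypothetical.

```python
# Apartment rent data from the slides.
RENTS = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]
Q1, Q3 = 445, 525  # quartiles from the five-number summary


def box_plot_limits(q1, q3, k=1.5):
    """Return (lower, upper) limits located k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


lower, upper = box_plot_limits(Q1, Q3)
outliers = [x for x in RENTS if x < lower or x > upper]
print(f"IQR = {Q3 - Q1}, limits = ({lower}, {upper})")  # IQR = 80, limits = (325.0, 645.0)
print("Outliers:", outliers)                              # Outliers: []
```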

Whiskers (dashed lines) are drawn from the ends of the box to the smallest and largest data values inside the limits. For the apartment rents, the smallest value inside the limits is 425 and the largest is 615.
[Figure: completed box plot of the apartment rents with whiskers extending to 425 and 615]

Measures of Association Between Two Variables
- Covariance
- Correlation coefficient

Covariance
The covariance is a measure of the linear association between two variables.
- Positive values indicate a positive relationship.
- Negative values indicate a negative relationship.

The covariance is computed as follows:

$$ s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} \quad \text{for samples} $$

$$ \sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N} \quad \text{for populations} $$

Correlation Coefficient
The coefficient can take on values between -1 and +1.
- Values near -1 indicate a strong negative linear relationship.
- Values near +1 indicate a strong positive linear relationship.

The correlation coefficient is computed as follows:

$$ r_{xy} = \frac{s_{xy}}{s_x s_y} \quad \text{for samples} $$

$$ \rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} \quad \text{for populations} $$

Correlation is a measure of linear association and not necessarily causation. Just because two variables are highly correlated, it does not mean that one variable is the cause of the other.

Covariance and Correlation Coefficient
A golfer is interested in investigating the relationship, if any, between driving distance and 18-hole score.

Average Driving Distance (yds.)   Average 18-Hole Score
277.6                             69
259.5                             71
269.1                             70
267.0                             70
255.6                             71
272.9                             69

Computations (x = average driving distance, y = average 18-hole score):

x        y     x - x̄     y - ȳ    (x - x̄)(y - ȳ)
277.6    69    10.65     -1.00    -10.65
259.5    71    -7.45      1.00     -7.45
269.1    70     2.15      0.00      0.00
267.0    70     0.05      0.00      0.00
255.6    71   -11.35      1.00    -11.35
272.9    69     5.95     -1.00     -5.95

Average: x̄ = 266.95, ȳ = 70.00
Std. dev.: s_x = 8.2192, s_y = 0.8944
Total of (x - x̄)(y - ȳ) = -35.40
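A sketch of the sample covariance and correlation for the golf data, written out explicitly so each step matches the table above; the helper names are hypothetical. (Python 3.10 and later also provide `statistics.covariance` and `statistics.correlation`.)

```python
import math

# Golf example from the slides.
distance = [277.6, 259.5, 269.1, 267.0, 255.6, 272.9]  # average driving distance (yds.)
score = [69, 71, 70, 70, 71, 69]                        # average 18-hole score


def sample_covariance(x, y):
    """s_xy = sum((x_i - x_bar)(y_i - y_bar)) / (n - 1)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)


def sample_std(x):
    """Sample standard deviation: the square root of the covariance of x with itself."""
    return math.sqrt(sample_covariance(x, x))


def correlation(x, y):
    """r_xy = s_xy / (s_x * s_y)."""
    return sample_covariance(x, y) / (sample_std(x) * sample_std(y))


print(f"s_xy = {sample_covariance(distance, score):.2f}")  # s_xy = -7.08
print(f"r_xy = {correlation(distance, score):.4f}")        # r_xy = -0.9631
```

The strong negative r says longer drives go with lower scores in these six rounds; as the slides stress, it does not by itself show that distance causes the lower scores.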

Sample covariance:

$$ s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{-35.40}{6 - 1} = -7.08 $$

Sample correlation coefficient:

$$ r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{-7.08}{(8.2192)(0.8944)} = -0.9631 $$

The Weighted Mean and Working with Grouped Data
- Weighted mean
- Mean for grouped data
- Variance for grouped data
- Standard deviation for grouped data

Weighted Mean
When the mean is computed by giving each data value a weight that reflects its importance, it is referred to as a weighted mean. In the computation of a grade point average (GPA), the weights are the number of credit hours earned for each grade. When data values vary in importance, the analyst must choose the weight that best reflects the importance of each value.

$$ \bar{x} = \frac{\sum w_i x_i}{\sum w_i} $$

where:
x_i = value of observation i
w_i = weight for observation i

Grouped Data
The weighted mean computation can be used to obtain approximations of the mean, variance, and standard deviation for grouped data. To compute the weighted mean, we treat the midpoint of each class as though it were the mean of all items in the class. We compute a weighted mean of the class midpoints using the class frequencies as weights. Similarly, in computing the variance and standard deviation, the class frequencies are used as weights.

Mean for Grouped Data

$$ \bar{x} = \frac{\sum f_i M_i}{n} \quad \text{(sample data)} $$

$$ \mu = \frac{\sum f_i M_i}{N} \quad \text{(population data)} $$

where:
f_i = frequency of class i
M_i = midpoint of class i
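A minimal sketch of the weighted-mean formula. As a check that uses only the rent data, weighting each distinct rent by the number of apartments charging it reproduces the ordinary sample mean of $490.80. The function name `weighted_mean` is hypothetical.

```python
from collections import Counter

# Apartment rent data from the slides.
RENTS = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def weighted_mean(values, weights):
    """x_bar = sum(w_i * x_i) / sum(w_i)."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


counts = Counter(RENTS)          # distinct rent -> number of apartments
values = list(counts.keys())
weights = list(counts.values())
print(f"{weighted_mean(values, weights):.2f}")  # 490.80
```

The same function gives a GPA when the values are grade points and the weights are credit hours.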

Sample Mean for Grouped Data
Given below is the previous sample of monthly rents for 70 efficiency apartments, presented here as grouped data in the form of a frequency distribution.

Rent ($)    Frequency f_i   Midpoint M_i   f_i M_i
420-439      8              429.5          3,436.0
440-459     17              449.5          7,641.5
460-479     12              469.5          5,634.0
480-499      8              489.5          3,916.0
500-519      7              509.5          3,566.5
520-539      4              529.5          2,118.0
540-559      2              549.5          1,099.0
560-579      4              569.5          2,278.0
580-599      2              589.5          1,179.0
600-619      6              609.5          3,657.0
Total       70                            34,525.0

$$ \bar{x} = \frac{\sum f_i M_i}{n} = \frac{34{,}525}{70} = 493.21 $$

This approximation differs by $2.41 from the actual sample mean of $490.80.

Variance for Grouped Data

$$ s^2 = \frac{\sum f_i (M_i - \bar{x})^2}{n - 1} \quad \text{(sample data)} $$

$$ \sigma^2 = \frac{\sum f_i (M_i - \mu)^2}{N} \quad \text{(population data)} $$
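The grouped-mean computation above can be sketched as follows, with each class stored as a hypothetical `(lower, upper, frequency)` tuple and its midpoint taken as (lower + upper) / 2.

```python
# Frequency distribution of the 70 rents from the slides: (lower, upper, frequency).
CLASSES = [
    (420, 439, 8), (440, 459, 17), (460, 479, 12), (480, 499, 8), (500, 519, 7),
    (520, 539, 4), (540, 559, 2), (560, 579, 4), (580, 599, 2), (600, 619, 6),
]
ACTUAL_MEAN = 490.80  # mean of the ungrouped data, from the slides


def grouped_mean(classes):
    """Approximate mean: sum(f_i * M_i) / n, with M_i the class midpoint."""
    n = sum(f for _, _, f in classes)
    return sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n


approx = grouped_mean(CLASSES)
print(f"grouped mean = {approx:.2f}")                 # grouped mean = 493.21
print(f"difference = {approx - ACTUAL_MEAN:.2f}")     # difference = 2.41
```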

Sample Variance for Grouped Data (x̄ = 493.21)

Rent ($)    f_i   M_i      M_i - x̄    (M_i - x̄)^2   f_i (M_i - x̄)^2
420-439      8    429.5    -63.71      4,058.96      32,471.71
440-459     17    449.5    -43.71      1,910.56      32,479.59
460-479     12    469.5    -23.71        562.16       6,745.97
480-499      8    489.5     -3.71         13.76         110.11
500-519      7    509.5     16.29        265.36       1,857.55
520-539      4    529.5     36.29      1,316.96       5,267.86
540-559      2    549.5     56.29      3,168.56       6,337.13
560-579      4    569.5     76.29      5,820.16      23,280.66
580-599      2    589.5     96.29      9,271.76      18,543.53
600-619      6    609.5    116.29     13,523.36      81,140.18
Total       70                                      208,234.29

Sample variance:

$$ s^2 = \frac{\sum f_i (M_i - \bar{x})^2}{n - 1} = \frac{208{,}234.29}{70 - 1} = 3{,}017.89 $$

Sample standard deviation:

$$ s = \sqrt{3{,}017.89} = 54.94 $$

This approximation differs by only $0.20 from the actual standard deviation of $54.74.

End of Chapter 3, Part B
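A sketch of the grouped sample variance, using the same hypothetical `(lower, upper, frequency)` layout. It uses the unrounded grouped mean (493.214...) rather than 493.21, so intermediate values differ from the table in the last decimal places, but s² and s agree with the slide.

```python
import math

# Frequency distribution of the 70 rents from the slides: (lower, upper, frequency).
CLASSES = [
    (420, 439, 8), (440, 459, 17), (460, 479, 12), (480, 499, 8), (500, 519, 7),
    (520, 539, 4), (540, 559, 2), (560, 579, 4), (580, 599, 2), (600, 619, 6),
]
ACTUAL_S = 54.74  # standard deviation of the ungrouped data, from the slides


def grouped_variance(classes):
    """Sample variance for grouped data: sum(f_i * (M_i - x_bar)^2) / (n - 1)."""
    mids = [((lo + hi) / 2, f) for lo, hi, f in classes]
    n = sum(f for _, f in mids)
    mean = sum(f * m for m, f in mids) / n
    return sum(f * (m - mean) ** 2 for m, f in mids) / (n - 1)


s2 = grouped_variance(CLASSES)
s = math.sqrt(s2)
print(f"s^2 = {s2:.2f}, s = {s:.2f}")          # s^2 = 3017.89, s = 54.94
print(f"difference = {s - ACTUAL_S:.2f}")      # difference = 0.20
```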