Assume the average annual rainfall in Portland is 36 inches with a standard deviation of 9 inches. Also assume that the average wind speed in Chicago is 10 mph with a standard deviation of 2 mph. Suppose that one year Portland's annual rainfall was only 24 inches and Chicago's average wind speed was 13 mph. Which of these events was more extraordinary?

Notice that the two events we want to compare are quite different. (To start, one is measured in inches and the other is measured in mph!) To put them on a level where we can compare them, we'll use the standard deviation as our ruler. If we determine how far each event is from its mean, measured in standard deviations, we'll have a common scale for comparison.

The standard deviation can be used as a ruler to measure how far a value is from the mean. When we want to compare values (from the same data set or different data sets), we can use the z-score. The z-score is also called a standardized value. It measures the distance of a value from the mean in standard deviations and is given by:

z-score: $z = \frac{y - \bar{y}}{s}$

where $y$ is the value, $\bar{y}$ is the mean, and $s$ is the standard deviation.
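The z-score formula can be applied directly to the opening scenario. The following is a minimal sketch in Python; the function name `z_score` and the variable names are illustrative choices, not part of the original notes.

```python
def z_score(value, mean, sd):
    """Distance of value from mean, measured in standard deviations."""
    return (value - mean) / sd

# Values from the opening scenario
portland_z = z_score(24, mean=36, sd=9)   # annual rainfall in inches
chicago_z = z_score(13, mean=10, sd=2)    # wind speed in mph

print(round(portland_z, 2), round(chicago_z, 2))

# The value with the larger absolute z-score lies farther from its mean.
more_unusual = "Portland" if abs(portland_z) > abs(chicago_z) else "Chicago"
print(more_unusual)
```

Comparing absolute values matters here because one event falls below its mean and the other above it; the sign only records the direction.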
Example 1. Continue to assume that the average annual rainfall in Portland is 36 inches per year with a standard deviation of 9 inches and that the average annual wind speed in Chicago is 10 mph with a standard deviation of 2 mph.

(a) Calculate the z-scores for a year when the annual rainfall in Portland was 40 inches and the average wind speed in Chicago was 14 mph. Which was more unusual?
(b) If Portland has a z-score of -2 one year, will the amount of rain that year be above or below the mean? Calculate the amount of rain that year.

Group Work 1. Continue to assume that the average annual rainfall in Portland is 36 inches per year with a standard deviation of 9 inches and that the average annual wind speed in Chicago is 10 mph with a standard deviation of 2 mph.

(a) Calculate the z-scores for a year when the annual rainfall in Portland was 42 inches and the average wind speed in Chicago was 9 mph. Which was more unusual?
(b) If Chicago has a z-score of 2 one year, will the wind speed that year be above or below the mean? Calculate the wind speed that year.

Instructor: A.E. Cary
Example 2. You'll notice that the calculations for both standard deviation and z-scores involve first calculating a value's difference from its mean. In this example, we'll explore the effect this has on the histogram and box plot. Answer the following using the histograms and box plots shown in Figure 1 and Figure 2.

(a) Does the shape of the distribution change?
(b) Does the range change?
(c) Do Q1 and Q3 change?
(d) Does the IQR change?
(e) Does the standard deviation change?
(f) Does the median change?

[Figure 1. Hen Weights for Various Breeds. Vertical axis: Frequency; horizontal axis: Weight (in lbs); annotation at 6.5.]
[Figure 2. Hen Weight Above 6.5 Pounds for Various Breeds. Vertical axis: Frequency; horizontal axis: Weight Above 6.5 Pounds; annotation at 0.]
Example 3. You'll also notice that after the z-score adjusts for the mean, we divide by the standard deviation. In this example, we'll explore the effect that scaling has on the histogram and box plot. Answer the following using the histograms and box plots shown in Figure 3 and Figure 4.

(a) Does the shape of the distribution change?
(b) Does the range change?
(c) Do Q1 and Q3 change?
(d) Does the IQR change?
(e) Does the standard deviation change?
(f) HOW does each value change?

[Figure 3. Hen Weights for Various Breeds. Vertical axis: Frequency; horizontal axis: Weight (in lbs).]
[Figure 4. Hen Weights for Various Breeds. Vertical axis: Frequency; horizontal axis: Weight (in kg).]

Table 1. Hen Weights

         n    mean     s       min     Q1      median  Q3      max
in lbs   12   6.4583   1.852   4       4.5     6.5     8       9.5
in kg    12   2.9356   0.8418  1.8182  2.0455  2.9545  3.6364  4.3182
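The two steps explored in Examples 2 and 3, subtracting the mean and then dividing by the standard deviation, can be tried on a small data set. The sketch below uses Python's standard `statistics` module. The list `weights_lbs` is made up for illustration and is not the data behind the figures or Table 1.

```python
from statistics import mean, stdev

# Hypothetical hen weights in pounds (illustrative only, not the worksheet's data)
weights_lbs = [4.0, 4.5, 5.0, 6.0, 6.5, 7.0, 8.0, 9.5]

y_bar = mean(weights_lbs)
s = stdev(weights_lbs)  # sample standard deviation

shifted = [y - y_bar for y in weights_lbs]   # step 1: subtract the mean
z_scores = [d / s for d in shifted]          # step 2: divide by the standard deviation

# Compare center and spread after each step
print("original:", round(y_bar, 4), round(s, 4))
print("shifted: ", round(mean(shifted), 4), round(stdev(shifted), 4))
print("z-scores:", round(mean(z_scores), 4), round(stdev(z_scores), 4))
```

Printing the mean and standard deviation after each step makes it easy to see which step affects the center and which affects the spread.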
Standardizing into z-scores:
- does not change the shape of the distribution of a variable
- changes the center by making the mean 0
- changes the spread by making the standard deviation 1

The Normal Distribution Model is one that models the continuous distribution of populations. It's identified as a bell-shaped curve. Normal models are only appropriate for distributions that are symmetric and unimodal.

The standard Normal model has mean 0 and standard deviation 1. We denote this with N(0, 1). In general, a Normal model with mean µ and standard deviation σ is denoted with N(µ, σ). For a Normal model, the z-score is given by:

$z = \frac{y - \mu}{\sigma}$

Example 4. For the Normal model, it happens that 68% of the values fall within 1 standard deviation of the mean, 95% fall within 2 standard deviations of the mean, and 99.7% fall within 3 standard deviations of the mean. Label the bell curve below to show these key features.
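The percentages quoted in Example 4 can be checked numerically. The sketch below uses `NormalDist` from Python's standard library as a stand-in for the N(0, 1) model; the helper name `share_within` is an illustrative choice.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)  # the N(0, 1) model

def share_within(k):
    """Proportion of a Normal model within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(100 * share_within(k), 1))
```

The computed values are close to, but not exactly, 68%, 95%, and 99.7%; the rule uses rounded figures that are easy to remember.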
Example 5. Let's return to our Portland rainfall example. We'll need to assume that this distribution is Normal (a somewhat suspect assumption). Recall that the average annual rainfall is 36 inches and the standard deviation is 9 inches.

(a) What percent of the time is the annual rainfall between 27 inches and 45 inches?
(b) What percent of the time is the annual rainfall 36 inches or more?
(c) What percent of the time is the annual rainfall 54 inches or more?
(d) What percent of the time is the annual rainfall 27 inches or less?
(e) Portland should receive less than or equal to what amount of rain only 0.15% of the time?
(f) Portland should receive more than or equal to what amount of rain 97.5% of the time?
(g) What percent of the time is the annual rainfall 63 inches or less?
(h) What percent of the time is the annual rainfall between 45 inches and 54 inches?
Recall that z-scores have mean 0 and standard deviation 1. The standard Normal distribution appears as such:

[Figure: standard Normal curve over z from -3 to 3, with the central 68%, 95%, and 99.7% regions marked.]

What happens if a value is 1.5 standard deviations below the mean (or 2.75 sd below, or 3.8 sd above, etc.)? Can we determine the associated percentage? The answer happens to be yes. We'll need software to compute the percentage for that particular z-score.

To determine the percentage of values below a given z-score, we use the Excel command NORM.S.DIST(z,1). To determine the z-score associated with a given percentage, we use the Excel command NORM.S.INV(%), with the percentage entered as a decimal.

Group Work 2. For each problem, draw a picture to support your answer.

(a) Tell Excel to compute: NORM.S.DIST(0,1)=
(b) Tell Excel to compute: NORM.S.DIST(-2,1)=
(c) Tell Excel to compute: NORM.S.DIST(1,1)=
(d) Tell Excel to compute: NORM.S.DIST(-1,1)=
(e) What percentage of z-scores are between z = 0 and z = 1.3?
(f) What percentage of z-scores are between z = 0.4 and z = 1.3?
(g) What percentage of z-scores are above z = 1.8?
(h) What percentage of z-scores are below z = 2.2?
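For readers working without Excel, the commands NORM.S.DIST(z,1) and NORM.S.INV(%) have rough counterparts in Python's standard library. The sketch below is one possible translation, not the method used in these notes. The wrapper names `norm_s_dist` and `norm_s_inv` are hypothetical, and the z-scores are the ones mentioned in the discussion of values 1.5 sd below, 2.75 sd below, and 3.8 sd above the mean.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def norm_s_dist(z):
    """Proportion of values below z; plays the role of NORM.S.DIST(z,1)."""
    return standard_normal.cdf(z)

def norm_s_inv(p):
    """z-score with proportion p below it; plays the role of NORM.S.INV(p)."""
    return standard_normal.inv_cdf(p)

below = norm_s_dist(-1.5)                          # area below z = -1.5
between = norm_s_dist(-1.5) - norm_s_dist(-2.75)   # area between two z-scores
above = 1 - norm_s_dist(3.8)                       # area above z = 3.8

print(round(below, 4), round(between, 4), above)
print(round(norm_s_inv(below), 4))  # inverting the area recovers the z-score
```

Both commands work with the area to the left of a z-score, so an area above a value is found by subtracting from 1, and an area between two values by subtracting one left-hand area from the other.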
Group Work 3. For each problem, draw a picture to support your answer.

(a) Tell Excel to compute: NORM.S.INV(0.5)=
(b) Tell Excel to compute: NORM.S.INV(0.12)=
(c) Tell Excel to compute: NORM.S.INV(0.8)=
(d) Tell Excel to compute: NORM.S.INV(0)=
(e) Tell Excel to compute: NORM.S.INV(1)=
(f) What z-score is associated with the 90th percentile?
(g) What z-score is associated with the 1st percentile?
(h) What z-score is associated with the 75th percentile?
Example 6. Assume that sections of MTH 111 at PCC have a mean pass rate of 0.62 with a standard deviation of 0.05. Also assume that the pass rates are Normally distributed. We can describe this with an N(0.62, 0.05) model.

(a) What percentage of sections have pass rates of 0.71 or higher?
(b) What percentage of sections have pass rates of 0.46 or lower?
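A model such as N(0.62, 0.05) can be handled in two equivalent ways: convert the value to a z-score and use the standard Normal model, or work with the N(µ, σ) model directly. The sketch below compares the two routes in Python. The cutoff of 0.65 is a hypothetical pass rate chosen only for illustration and is not one of the values asked about in Example 6.

```python
from statistics import NormalDist

mu, sigma = 0.62, 0.05   # the N(0.62, 0.05) model from Example 6
cutoff = 0.65            # hypothetical pass rate, for illustration only

# Route 1: convert to a z-score, then use the standard Normal model
z = (cutoff - mu) / sigma
below_via_z = NormalDist().cdf(z)

# Route 2: ask the N(mu, sigma) model directly
below_direct = NormalDist(mu, sigma).cdf(cutoff)

at_or_above = 1 - below_direct  # proportion of sections at or above the cutoff

print(round(z, 2), round(below_via_z, 4), round(below_direct, 4), round(at_or_above, 4))
```

Route 1 mirrors the Excel workflow with NORM.S.DIST, which expects a z-score; Route 2 is a shortcut that performs the same standardization internally.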
Example 7. Tillamook 2-pound (i.e. 32-ounce) blocks of cheese are made on equipment that is not quite perfect. Assume that the weights produced by the machine have a standard deviation of 0.4 ounces.

(a) To keep their customers happy, the company sets the machine to produce 33-ounce blocks of cheese. We can describe this Normal distribution model with N(33, 0.4). What percentage of cheese blocks produced contain less than 32 ounces of cheese?
(b) The company wants to save a little money. They can do so by setting the mean slightly lower (so that each block has slightly less cheese). They want no more than 3% of the blocks of cheese to be underweight, though. Where should they reset the mean?
(c) Now let's assume that Tillamook does have some control over the standard deviation of their machine. They want to set the mean to be 32.5 ounces and have 99% of their cheese blocks produced weigh at least 32 ounces. What's the maximum standard deviation the machine can have?
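Questions like parts (b) and (c) run the z-score formula in reverse: find the z-score for the allowed percentage, then solve z = (y - µ)/σ for the unknown mean or standard deviation. The sketch below shows that rearrangement in Python. The function names and the 16-ounce scenario are hypothetical and deliberately differ from the cheese example.

```python
from statistics import NormalDist

def mean_for_lower_tail(cutoff, sigma, tail):
    """Mean that leaves proportion `tail` of values below `cutoff`."""
    z = NormalDist().inv_cdf(tail)   # negative when tail < 0.5
    return cutoff - z * sigma        # solve z = (cutoff - mean) / sigma for mean

def sigma_for_lower_tail(cutoff, mean, tail):
    """Standard deviation that leaves proportion `tail` of values below `cutoff`."""
    z = NormalDist().inv_cdf(tail)
    return (cutoff - mean) / z       # solve z = (cutoff - mean) / sigma for sigma

# Hypothetical filling machine: 16-ounce packages, at most 5% underweight
target_mean = mean_for_lower_tail(cutoff=16, sigma=0.3, tail=0.05)
max_sigma = sigma_for_lower_tail(cutoff=16, mean=16.4, tail=0.05)

print(round(target_mean, 3), round(max_sigma, 3))
```

A quick way to check either answer is to plug it back into the model and confirm that the proportion below the cutoff matches the allowed percentage.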