Statistics 511 Supplemental Materials

Gaussian (or Normal) Random Variable

In this section we introduce the Gaussian random variable, which is more commonly referred to as the Normal random variable. It is a random variable whose probability density function is a bell-shaped curve, as pictured below.

[Figure: bell-shaped probability density curve of a Normal random variable]

Despite its name, the Normal distribution (or a Normal random variable) has nothing truly normal about it; that is to say, there is nothing abnormal about other random variables. The Normal distribution does, however, arise more frequently than other distributions, and there are two settings in which it occurs quite often. The first is biological. The Normal distribution seems to arise when numerous quantities are added together, and this often happens in biology when large amounts of genetic material combine to determine a particular trait, e.g. heights or lengths. The second setting is psychological. As with heights and lengths, this is thought to be the result of many genetic factors combining. For example, IQ measurements are often modeled as having a Normal distribution. More specific examples of Normal RVs include: lengths of newborn male piglets, heights of female peacocks, lengths of 2 inch nails, and scores on the Stanford-Binet psychological test.

There are infinitely many Normal distributions. Every Normal distribution has the following characteristics. Its range is the entire number line. We can completely identify any Normal distribution by specifying its mean and its standard deviation (or equivalently its variance). Every Normal distribution is symmetric about its mean; consequently the mean and the median are the same number. The mean tells us where the center of the distribution is, and the standard deviation tells us how dispersed or spread out the distribution is.

The Normal distribution is used so commonly that we have special notation for it.

Notation: X ~ N(5, 4) is read "X is a RV with a Normal distribution with mean 5 and variance 4." In general, Y ~ N(µ_Y, σ_Y^2) is read "Y is a Normal random variable with mean µ_Y and variance σ_Y^2."

As with other continuous RVs, the Normal distribution uses area to determine probability.
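As a small illustration of the notation, the following Python sketch builds X ~ N(5, 4) with the standard library's statistics.NormalDist. One assumption to keep in mind: NormalDist is parameterized by the standard deviation, not the variance, so the variance 4 from the notation is converted with a square root. The variable names are only illustrative.

```python
from math import sqrt
from statistics import NormalDist

# X ~ N(5, 4): in the notes' notation the second number is the VARIANCE.
# NormalDist expects the standard deviation, so we pass sqrt(4) = 2.
X = NormalDist(mu=5, sigma=sqrt(4))

# Symmetry about the mean: the median (50th percentile) equals the mean.
median_X = X.inv_cdf(0.5)

# Symmetry also means equal areas at equal distances from the mean.
area_below = X.cdf(5 - 1.5)        # P(X < 3.5)
area_above = 1 - X.cdf(5 + 1.5)    # P(X > 6.5)

print(X.mean, X.stdev, X.variance)
print(median_X, area_below, area_above)
```

The two printed areas should agree, which is the symmetry property stated above.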
However, the Normal has a special feature that separates it from other distributions: to find any particular probability, all we need is the z-score corresponding to the boundary of the area of interest. The z-score formula is

    Z = (X - µ)/σ

That is, if we want to know P(X<7) for a Normal RV X, what we need to know is the z-score for X = 7. Recall that the z-score for 7 would be z = (X - µ)/σ = (7 - µ)/σ, which depends on the values of the mean and the standard deviation. One result of this is that the probability of being more than 2 standard deviations above the mean is the same whether the mean is 75 or 75,000 and whether the standard deviation is 2 or 200. As a consequence, the z-score plays an indispensable role in calculating probabilities from a Normal distribution. Recall that the z-score of a value x is the number of standard deviations the value x is above or below the mean.
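The claim that only the z-score matters can be checked numerically. The sketch below is a minimal illustration using Python's statistics.NormalDist; the helper name prob_above is hypothetical, and the two mean and standard deviation pairs are the ones mentioned above.

```python
from statistics import NormalDist

def prob_above(x, mu, sigma):
    """P(X > x) for a Normal RV X with mean mu and standard deviation sigma."""
    return 1 - NormalDist(mu, sigma).cdf(x)

# Probability of being more than 2 standard deviations above the mean,
# for two very different Normal distributions.
p_small = prob_above(75 + 2 * 2, mu=75, sigma=2)
p_large = prob_above(75_000 + 2 * 200, mu=75_000, sigma=200)

# The same probability from the Standard Normal (mean 0, sd 1) at z = 2.
p_std = 1 - NormalDist().cdf(2)

print(p_small, p_large, p_std)
```

All three values should agree up to floating-point rounding, because each boundary has the same z-score of 2.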

Because of the role that the z-score plays, we define a random variable Z to have a Normal distribution with mean 0 and standard deviation 1. Z is often referred to as a Standard Normal random variable. The reason for this specification is that, by calculating the z-score, any Normal random variable can be transformed into an equivalent Standard Normal RV with mean 0 and standard deviation 1. The consequence is that we use the z-score (and hence the Standard Normal distribution) to find probabilities involving ANY Normal distribution. Thus if X is a Normal random variable with mean 85 and standard deviation 5, then P(X>90) = P(Z > (90 - 85)/5) = P(Z>1.0). This is because we can transform the variable X into the variable Z, and by calculating the z-score for X = 90 we obtain the same probability, P(X>90) = P(Z>1.0). This is true for any calculation that we do with Normal random variables: we transform X to Z and use Z to find our probabilities.

Calculating Normal Probabilities

There are three steps to calculating a Normal probability.
1. Find the z-score for the value of interest.
2. Determine the appropriate formula for calculating the probability.
3. Use that z-score to find the probability using the Standard Normal Table of probabilities.

Example
If X is a Normal RV with mean 5 and standard deviation 2, find the z-score for X = 4.
The z-score for X = 4 is z = (X - µ)/σ = (4 - 5)/2 = -0.5. Consequently, the value X = 4 is one-half of a standard deviation below the mean, since z = -0.5. So P(X>4) = P(Z>-0.5).

Example
If X is a Normal RV with mean 5 and standard deviation 2, find the z-score for X = 8.4.
The z-score for X = 8.4 is z = (X - µ)/σ = (8.4 - 5)/2 = 1.7. Consequently, the value X = 8.4 is 1.7 standard deviations above the mean, since z = 1.7. So P(X<8.4) = P(Z<1.7).

Example
If H ~ N(142, 3.5^2), find the z-score for H = 150.
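Step 1 of the procedure is pure arithmetic, so it is easy to sketch in Python. The function name z_score is an illustrative choice, and the z-scores are rounded to two decimal places on the assumption that they will be used with a table indexed that way. The last two lines check the standardization P(X>90) = P(Z>1.0) for the mean 85, standard deviation 5 case discussed above.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# z-scores for the three examples, rounded to two decimals for table use.
z_4 = round(z_score(4, 5, 2), 2)           # X = 4,   mean 5,   sd 2
z_8_4 = round(z_score(8.4, 5, 2), 2)       # X = 8.4, mean 5,   sd 2
z_150 = round(z_score(150, 142, 3.5), 2)   # H = 150, mean 142, sd 3.5

# Standardization: the same probability computed on the X scale and the Z scale.
p_x = 1 - NormalDist(mu=85, sigma=5).cdf(90)
p_z = 1 - NormalDist().cdf(z_score(90, 85, 5))

print(z_4, z_8_4, z_150)
print(p_x, p_z)
```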

The z-score for H = 150 is z = (150 - µ)/σ = (150 - 142)/3.5 = 2.29. Consequently, the value H = 150 is 2.29 standard deviations above the mean, since z = 2.29. So P(H>150) = P(Z>2.29).

Having found the z-score, we need to determine the appropriate method for calculating the probability of interest. The reason that we do this is the structure of the Cumulative Standard Normal Probability Table, which we will use for calculation. This table has probabilities for values that are greater than specific z-scores.

Assume that we are interested in a random variable X with mean 70 and standard deviation 10. P(X<80) = P(Z < (80 - 70)/10) = P(Z<1.0). This is an example of a less-than probability with a positive z-score. If instead we wanted P(X>80) = P(Z>1.0), this is an example of a greater-than probability with a positive z-score. If we wanted to know P(X>60) = P(Z > (60 - 70)/10) = P(Z>-1.0), this is an example of a greater-than probability with a negative z-score. Finally, if we need to calculate P(X<60) = P(Z<-1.0), this is an example of a less-than probability with a negative z-score.

The Cumulative Standard Normal Probability Table contains probabilities of the form P(Z>z). Consequently, we need rules to work other probabilities into this format. This is similar to the rules that were used with the binomial and Poisson tables to get probabilities other than the form those tables list directly.

What we want              Calculation we need to perform   Example
P(Z>z), with z positive   P(Z>z)                           P(Z>1.42)
P(Z<z), with z positive   P(Z<z) = 1-P(Z>z)                P(Z<1.42) = 1-P(Z>1.42)
P(Z>z), with z negative   P(Z>z)                           P(Z>-1.42)
P(Z<z), with z negative   P(Z<z) = P(Z>-z) *               P(Z<-1.42) = P(Z>1.42)

*Recall that the negative of a negative is a positive.

These rules stem from two basic facts. First, the symmetry of the Normal distribution means that P(Z>z) = P(Z<-z). Since z and -z are the same distance from the mean of zero, symmetry says these probabilities must be the same. The other fact that is used is the complement rule, which says that P(Z>z) = 1-P(Z<z). Combining these facts we get the above table of rules.
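The table of rules can be sketched as a short Python function. This is only an illustration: upper_tail stands in for a lookup in a table of P(Z>z) values (here computed with statistics.NormalDist rather than read from the printed table), and lower_tail_via_rules is a hypothetical name for the rewriting step.

```python
from statistics import NormalDist

Z = NormalDist()  # Standard Normal: mean 0, standard deviation 1

def upper_tail(z):
    """P(Z > z), the form of probability the table provides."""
    return 1 - Z.cdf(z)

def lower_tail_via_rules(z):
    """P(Z < z), rewritten in terms of upper-tail probabilities only."""
    if z >= 0:
        return 1 - upper_tail(z)   # complement rule
    return upper_tail(-z)          # symmetry: P(Z < z) = P(Z > -z)

# Compare the rule-based answers with the lower tail computed directly.
for z in (1.42, -1.42):
    print(z, lower_tail_via_rules(z), Z.cdf(z))
```

Both rows should print matching pairs, which confirms that the complement and symmetry rules recover P(Z<z) from upper-tail values alone.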

Finally, the last step is to use Table A.1.

Suppose we want to find P(Z>1.92). Find the value 1.92 under the column labeled Z. The corresponding entry (under the column Prob > Z) in the table is 0.0274. So P(Z>1.92) = 0.0274.

Suppose we want to find P(Z>-0.68). Find the value -0.68 under the column labeled Z. The corresponding entry (under the column Prob > Z) in the table is 0.7517. So P(Z>-0.68) = 0.7517.

Suppose we want to find P(Z<1.48). The first step is to rewrite the problem in a form that allows the use of the normal tables: P(Z<1.48) = 1 - P(Z>1.48). Find the value 1.48 under the column labeled Z. The entry in the table is 0.0694. So P(Z<1.48) = 1 - 0.0694 = 0.9306. Note that we could have rewritten the problem as P(Z<1.48) = P(Z>-1.48) using symmetry. Then find the value -1.48 under the column labeled Z and read the corresponding probability. So P(Z<1.48) = P(Z>-1.48) = 0.9306.

Suppose we want to find P(Z<0.85). The first step is to rewrite the problem in a form that allows the use of the normal tables: P(Z<0.85) = 1 - P(Z>0.85). Find the value 0.85 under the column labeled Z. The entry in the table is 0.1977. So P(Z<0.85) = 1 - 0.1977 = 0.8023. Note that we could have rewritten the problem as P(Z<0.85) = P(Z>-0.85) using symmetry. Then find the value -0.85 under the column labeled Z and read the corresponding probability. So P(Z<0.85) = P(Z>-0.85) = 0.8023.

Suppose we want to find P(Z<2.11). The first step is to rewrite the problem in a form that allows the use of the normal tables: P(Z<2.11) = 1 - P(Z>2.11). Find the value 2.11 under the column labeled Z. The entry in the table is 0.0174. So P(Z<2.11) = 1 - 0.0174 = 0.9826. Note that we could have rewritten the problem as P(Z<2.11) = P(Z>-2.11) using symmetry. Then find the value -2.11 under the column labeled Z and read the corresponding probability. So P(Z<2.11) = P(Z>-2.11) = 0.9826.

The following examples combine all these steps.
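For readers without Table A.1 at hand, the lookups above can be approximated in Python. The sketch below assumes the table lists P(Z > z) to four decimal places for z given to two decimal places, as the examples suggest; table_entry is a hypothetical stand-in for reading the printed table, not a reproduction of it.

```python
from statistics import NormalDist

Z = NormalDist()  # Standard Normal

def table_entry(z):
    """Mimic a 'Prob > Z' table: z to two places, probability to four."""
    return round(1 - Z.cdf(round(z, 2)), 4)

# Direct lookups of P(Z > z).
for z in (1.92, -0.68, 1.48, 0.85, 2.11):
    print(f"P(Z > {z:5.2f}) = {table_entry(z):.4f}")

# Less-than probabilities via the complement rule.
p_less_1_48 = round(1 - table_entry(1.48), 4)
p_less_0_85 = round(1 - table_entry(0.85), 4)
p_less_2_11 = round(1 - table_entry(2.11), 4)
print(p_less_1_48, p_less_0_85, p_less_2_11)
```

Using symmetry instead, table_entry(-1.48) gives the same value as the complement calculation for P(Z<1.48).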
Example
Suppose that X is a Normal random variable with mean 100 and standard deviation 7.5.

Find P(X < 110).
P(X<110) = P(Z < (110 - 100)/7.5) = P(Z<1.33) = 1 - P(Z>1.33) (by complementary events) = 1 - 0.0918 = 0.9082.

Find P(X > 120).
P(X>120) = P(Z > (120 - 100)/7.5) = P(Z>2.67) = 0.0038. We can look up P(Z>2.67) directly in the table.

Find P(X > 93).
P(X>93) = P(Z > (93 - 100)/7.5) = P(Z>-0.93) = 0.8238. We can look up P(Z>-0.93) directly in the table.

Find P(X < 84).
P(X<84) = P(Z < (84 - 100)/7.5) = P(Z<-2.13) = P(Z>2.13) = 0.0166 (by symmetry of the Normal distribution)
or
P(X<84) = P(Z<-2.13) = 1 - P(Z>-2.13) = 1 - 0.9834 = 0.0166.

TIP: Since Table A.1 uses only two decimal places for z-scores, round all z-scores to two decimal places when using this table.

TIP: It is common to refer to a random variable by the name of the random variable or by the distribution. They are interchangeable. Since any RV is defined by its distribution, this usage is appropriate, though it often confuses people the first time they see or hear this.

TIP: It is often helpful when doing calculations with Normal probabilities to draw a picture to get an idea about the quality of your final answer. If the answer conflicts with the picture, then you may need to reconsider your calculations. The first step is to draw a bell-shaped curve. Draw a vertical line down the center and label it with the value of the mean. Over 99% of the Normal distribution is within 3 standard deviations of the mean. So go to the right edge of your curve and label it with the value of the mean plus three times the standard deviation. Go to the left edge and label it with the value of the mean minus three times the standard deviation. Then shade the area for the probability that you are interested in.
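The four probabilities for the Normal random variable with mean 100 and standard deviation 7.5 can be reproduced with a short Python sketch. It follows the table-based procedure under two assumptions: z-scores are rounded to two decimals (per the first TIP) and table probabilities are rounded to four. The helper names z2 and upper are illustrative only. The last line checks the "over 99% within 3 standard deviations" statement from the third TIP.

```python
from statistics import NormalDist

Z = NormalDist()          # Standard Normal
MU, SIGMA = 100, 7.5      # mean and standard deviation of X

def z2(x):
    """z-score of x, rounded to two decimals as the table requires."""
    return round((x - MU) / SIGMA, 2)

def upper(z):
    """Table-style value of P(Z > z), rounded to four decimals."""
    return round(1 - Z.cdf(z), 4)

p_lt_110 = round(1 - upper(z2(110)), 4)   # complement rule
p_gt_120 = upper(z2(120))                 # direct lookup
p_gt_93 = upper(z2(93))                   # direct lookup, negative z
p_lt_84 = upper(-z2(84))                  # symmetry: P(Z < z) = P(Z > -z)

# Area within 3 standard deviations of the mean (used for the sketch's edges).
within_3sd = Z.cdf(3) - Z.cdf(-3)

print(p_lt_110, p_gt_120, p_gt_93, p_lt_84, within_3sd)
```

Computing with the unrounded z-score (for example NormalDist(100, 7.5).cdf(110)) gives slightly different digits, which is expected: the table method trades a little accuracy for convenience.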

Suppose X is a Normal random variable with mean 120 and standard deviation 7. Find P(X>125).

[Figure: bell-shaped curve with the horizontal axis labeled 99, 120, and 141]

We use 120 for the center since it is the mean. The values 141 and 99 are 120 + 3*7 and 120 - 3*7, which are 3 standard deviations above and below the mean, respectively. P(X>125) = P(Z > (125 - 120)/7) = P(Z>0.71) = 0.2389. Given the accuracy of the picture, it seems reasonable that the probability should be around 24%. We would have been nervous had the answer we calculated been more than 50% or less than 2%. Drawing a picture is a nice check for gross errors in calculation.

Percentiles of the Normal distribution: Using the Z-table in reverse

As an example, the 80th percentile is the point in the distribution where 80% of the data or 80% of the probability (area) is below that point (and consequently 20% of the probability is above that point). We often want to calculate percentiles for a specific distribution or set of data. For example, if I want to build a cage that 98% of frogs will be comfortable in, I need to know the 98th percentile of frog sizes. A college admissions officer might only want to accept students who are in the top 20% of all scores on some standardized test. In that case the admissions officer would need to know the 80th percentile of scores on that test, and would accept only those students whose test scores were above the 80th percentile.

To find percentiles for the Normal distribution, we reverse the process from the previous section. In the previous section we had a value and we were looking for a probability or a percentage. For example, in the previous section we wanted a probability such as P(X>182) = c, and we found the probability c. In this section, we'll be given a probability like 0.7500 and have to find a value k such that P(X>k) = 0.7500. Here, we have the percentage and we want to find the value that would give us that percentage. Consequently, we'll reverse the steps we took in the previous section.

Finding the j*100th percentile, k, of a Normal random variable X.
1. Because Table A.1 lists probabilities of the form P(Z>z), find the upper-tail probability 1 - j (or the value closest to it) in the body of Table A.1.
2. Find the z-score for that entry, call it z_k.
3. Using the formula for the z-score, z_k = (k - µ)/σ, solve for k.

Example
Suppose that we want to find the 75th percentile of a Normal distribution with mean 430 and standard deviation 22. Let X be a Normal RV with mean 430 and standard deviation 22. Then we want to find a value k such that P(X<k) = 0.7500 (or equivalently P(X>k) = 0.2500). Likewise there exists a z-score for k, call it z_k, such that P(Z>z_k) = 0.2500. We can find z_k by going into the body of Table A.1 and finding the probability closest to 0.2500. That probability is 0.2514, which corresponds to a z-score of 0.67. This is the z-score for k, but we need to find the actual value of k. Recall z_k = (k - µ)/σ. So 0.67 = (k - 430)/22. Solving for k gives us k = 430 + 0.67*(22) = 444.74. So the 75th percentile of a Normal distribution with mean 430 and standard deviation 22 is approximately 444.74.

Example
For a Normal random variable X ~ N(45, 36), find the 92nd percentile.
1. The 92nd percentile has 0.9200 of the area below k and 0.0800 of the area above k. In Table A.1, the closest value to 0.0800 is 0.0793.
2. z_k = 1.41
3. 1.41 = (k - 45)/6, so k = 45 + 1.41*6 = 53.46.
So the 92nd percentile of a N(45, 36) distribution is 53.46.

Example
For a Normal RV Y ~ N(76, 9), find the 97th percentile.
1. The 97th percentile has 0.9700 of the area below k and 0.0300 of the area above k. In Table A.1, the closest value to 0.0300 is 0.0301.
2. z_k = 1.88
3. 1.88 = (k - 76)/3, so k = 76 + 1.88*3 = 81.64.
So the 97th percentile of a N(76, 9) distribution is 81.64.
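The reverse lookup can also be sketched in Python. The code below imitates the table procedure under a stated assumption: z-scores lie on a 0.01 grid (as in a table indexed to two decimals), and we pick the grid value whose upper-tail probability is closest to the target. The function names and the grid range of -3.99 to 3.99 are illustrative choices, not taken from Table A.1. Note that the second argument of N(45, 36) and N(76, 9) is the variance, so the standard deviations passed in are 6 and 3.

```python
from statistics import NormalDist

Z = NormalDist()  # Standard Normal

def table_z_for_upper_tail(p):
    """z on a 0.01 grid whose P(Z > z) is closest to p (mimics a table search)."""
    grid = [i / 100 for i in range(-399, 400)]
    return min(grid, key=lambda z: abs((1 - Z.cdf(z)) - p))

def percentile_via_table(j, mu, sigma):
    """j*100th percentile of a Normal RV with mean mu and standard deviation sigma."""
    z_k = table_z_for_upper_tail(1 - j)   # area above k is 1 - j
    return mu + z_k * sigma               # solve z_k = (k - mu)/sigma for k

k_75 = percentile_via_table(0.75, 430, 22)
k_92 = percentile_via_table(0.92, 45, 6)
k_97 = percentile_via_table(0.97, 76, 3)
print(round(k_75, 2), round(k_92, 2), round(k_97, 2))

# For comparison: the percentile without rounding z to two decimals.
print(NormalDist(430, 22).inv_cdf(0.75))
```

The comparison line differs slightly from the table-based 444.74 because inv_cdf does not round the z-score; that gap is the price of using a two-decimal table.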