Statistics 511 Supplemental Materials


Gaussian (or Normal) Random Variable

In this section we introduce the Gaussian random variable, which is more commonly referred to as the Normal random variable. This is a random variable that has a bell-shaped curve as its probability density function.

[Figure: bell-shaped Normal probability density curve]

The Normal distribution, or a Normal random variable, has nothing truly normal about it. That is to say, there is nothing abnormal about other random variables. The Normal distribution does, however, arise more frequently than other distributions. There are two settings in which it occurs quite frequently. The first of these is biological. The Normal distribution seems to arise when numerous quantities are added together. This often happens in biology when large amounts of genetic material combine in a particular trait, e.g. heights or lengths. The other setting where the Normal is often observed is the psychological setting. As with heights and lengths, this is thought to be the result of many genetic factors combining. For example, IQ measurements are often modeled as having a Normal distribution. More specific examples of Normal RVs include: lengths of newborn male piglets, heights of female peacocks, lengths of 2-inch nails, and scores on the Stanford-Binet psychological test.

There are an infinite number of Normal distributions. Every Normal distribution has the following characteristics. Its range is the entire number line. We can completely identify any Normal distribution by specifying its mean and its standard deviation (or equivalently its variance). Every Normal distribution is symmetric about its mean. Consequently the mean and the median are the same number. The mean tells us where the center of the distribution is, and the standard deviation tells us how dispersed or spread out the distribution is.

The Normal distribution is used so commonly that we have special notation for it.

Notation: X ~ N(5, 4) is read "X is a RV with a Normal distribution with mean 5 and variance 4." In general, the notation Y ~ N(µ_y, σ_y^2) is read "Y is a Normal random variable with mean µ_y and variance σ_y^2."

As with other continuous RVs, the Normal distribution uses area to determine probability.
However, the Normal has a special feature that separates it from other distributions. This feature is that the only thing needed to find a particular probability is the z-score corresponding to the boundary of the area of interest. The z-score formula is

Z = (X - µ)/σ

That is, if we want to know P(X < 7) for a Normal RV X, what we need to know is the z-score for X = 7. Recall that the z-score for 7 would be z = (x - µ)/σ = (7 - µ)/σ, which depends on the values of the mean and the standard deviation. One result of this is that the probability of being more than 2 standard deviations above the mean is the same whether the mean is 75 or 75,000 and whether the standard deviation is 2 or 200. As a consequence the z-score plays an indispensable role in calculating probabilities from a Normal distribution. Recall that the z-score of a value x is the number of standard deviations the value x is above or below the mean.
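As a quick numerical check of this claim, the following Python sketch computes the probability of landing more than 2 standard deviations above the mean for two very different Normal distributions. It relies on NormalDist from Python's standard statistics module, which is not part of these notes, and NormalDist takes the standard deviation rather than the variance. The helper name z_score is only illustrative.

```python
from statistics import NormalDist


def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


# Two very different Normal distributions, matching the illustration in the text.
small = NormalDist(mu=75, sigma=2)
large = NormalDist(mu=75_000, sigma=200)

# Upper-tail probability of being more than 2 standard deviations above the mean.
p_small = 1 - small.cdf(75 + 2 * 2)
p_large = 1 - large.cdf(75_000 + 2 * 200)

# The same probability read from the Standard Normal at z = 2.
p_standard = 1 - NormalDist().cdf(2.0)

print(z_score(79, 75, 2), z_score(75_400, 75_000, 200))
print(round(p_small, 4), round(p_large, 4), round(p_standard, 4))
```

All three probabilities agree, because each boundary has the same z-score of 2.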

Because of the role that the z-score plays, we specify a random variable Z to have a Normal distribution with mean 0 and standard deviation 1. Z is often referred to as a Standard Normal random variable. The reason for this specification is that by calculating the z-score, all Normal random variables can be transformed into an equivalent Standard Normal RV with mean 0 and standard deviation 1. The overall goal and consequence of this is that we need to use the z-score (and hence the Standard Normal distribution) to find probabilities involving ANY Normal distribution.

Thus if X is a Normal random variable with mean 85 and standard deviation 5, then P(X > 90) = P(Z > (90 - 85)/5) = P(Z > 1.0). This is because we can transform the variable X into the variable Z, and by calculating the z-score for X = 90 we have the same probability, P(X > 90) = P(Z > 1.0). This is true for any calculation that we do with Normal random variables. We transform X to Z and use Z to find our probabilities.

Calculating Normal Probabilities

There are three steps to calculating a Normal probability.
1. Find the z-score for the value of interest.
2. Determine the appropriate formula for calculating the probability.
3. Use that z-score to find the probability using the Standard Normal Table of probabilities.

Example
If X is a Normal RV with mean 5 and standard deviation 2, find the z-score for X = 4.
The z-score for X = 4 is z = (x - µ)/σ = (4 - 5)/2 = -0.5. Consequently, the value X = 4 is one-half of a standard deviation below the mean, since z = -0.5. So P(X > 4) = P(Z > -0.5).

Example
If X is a Normal RV with mean 5 and standard deviation 2, find the z-score for X = 8.4.
The z-score for X = 8.4 is z = (x - µ)/σ = (8.4 - 5)/2 = 1.7. Consequently, the value X = 8.4 is 1.7 standard deviations above the mean, since z = 1.7. So P(X < 8.4) = P(Z < 1.7).

Example
If H ~ N(142, 3.5^2), find the z-score for H = 150.
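The z-score arithmetic for X = 4 and X = 8.4 above, and the claim that P(X > 90) = P(Z > 1.0) when the mean is 85 and the standard deviation is 5, can be checked with a few lines of Python. This is a minimal sketch rather than part of the original notes: NormalDist comes from Python's standard statistics module, and the helper z_score (which rounds to two decimals, as the table requires) is a name chosen here for illustration.

```python
from statistics import NormalDist


def z_score(x, mean, sd):
    """z-score of x, rounded to two decimals as the text does for table use."""
    return round((x - mean) / sd, 2)


# Examples from the text: X has mean 5 and standard deviation 2.
z_for_4 = z_score(4, mean=5, sd=2)      # half a standard deviation below the mean
z_for_8_4 = z_score(8.4, mean=5, sd=2)  # 1.7 standard deviations above the mean

# P(X > 90) for mean 85 and standard deviation 5, computed two ways:
# directly from X, and from the Standard Normal using the z-score.
direct = 1 - NormalDist(mu=85, sigma=5).cdf(90)
via_z = 1 - NormalDist().cdf(z_score(90, mean=85, sd=5))

print(z_for_4, z_for_8_4)
print(round(direct, 4), round(via_z, 4))
```

Both routes give the same probability, which is why the rest of the section works only with Z.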

The z-score for H = 150 is z = (150 - µ)/σ = (150 - 142)/3.5 = 2.29. Consequently, the value H = 150 is 2.29 standard deviations above the mean, since z = 2.29. So P(H > 150) = P(Z > 2.29).

Having found the z-score, we need to determine the appropriate method for calculating the probability of interest. The reason that we do this is the structure of the Cumulative Standard Normal Probability Table, which we will use for calculation. This table has probabilities for values that are greater than specific z-scores.

Assume that we are interested in a random variable X with mean 70 and standard deviation 10. P(X < 80) = P(Z < (80 - 70)/10) = P(Z < 1.0). This is an example of a less-than probability with a positive z-score. Instead, if we wanted P(X > 80) = P(Z > 1.0), then this is an example of a greater-than probability with a positive z-score. If we wanted to know P(X > 60) = P(Z > (60 - 70)/10) = P(Z > -1.0), this is an example of a greater-than probability with a negative z-score. Finally, if we need to calculate P(X < 60) = P(Z < -1.0), this is an example of a less-than probability with a negative z-score.

The Cumulative Standard Normal Probability Table contains probabilities of the form P(Z > z). Consequently, we need rules to work other probabilities into this format. This is similar to the rules that were used for the binomial and Poisson tables to get probabilities other than the ones those tables list directly.

What we want                 Calculation we need to perform    Example
P(Z > z) with z positive     P(Z > z)                          P(Z > 1.42)
P(Z < z) with z positive     P(Z < z) = 1 - P(Z > z)           P(Z < 1.42) = 1 - P(Z > 1.42)
P(Z > z) with z negative     P(Z > z)                          P(Z > -1.42)
P(Z < z) with z negative     P(Z < z) = P(Z > -z) *            P(Z < -1.42) = P(Z > 1.42)

* Recall that the negative of a negative is a positive.

These rules stem from two basic facts. First, the symmetry of the Normal distribution means that P(Z > z) = P(Z < -z). Since z and -z are the same distance from the mean of zero, symmetry says these probabilities must be the same. The other fact that is used is the complement rule, which says that P(Z > z) = 1 - P(Z < z). Combining these facts we get the above table of rules.
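The two facts behind these rules, the complement rule and symmetry, can be sketched in Python. The function upper_tail below stands in for a table of P(Z > z) values; it is computed from NormalDist in Python's standard statistics module and rounded to four places, which is an assumption about how such a table is laid out rather than a copy of the actual table. All function names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # Standard Normal: mean 0, standard deviation 1


def upper_tail(z):
    """P(Z > z), the form the table lists, rounded to four decimal places."""
    return round(1 - Z.cdf(z), 4)


def lower_tail_by_complement(z):
    """P(Z < z) = 1 - P(Z > z)."""
    return round(1 - upper_tail(z), 4)


def lower_tail_by_symmetry(z):
    """P(Z < z) = P(Z > -z)."""
    return upper_tail(-z)


print(upper_tail(1.42))                 # P(Z > 1.42)
print(lower_tail_by_complement(1.42))   # P(Z < 1.42) via the complement rule
print(lower_tail_by_symmetry(1.42))     # P(Z < 1.42) via symmetry
print(lower_tail_by_symmetry(-1.42))    # P(Z < -1.42) = P(Z > 1.42)
```

The complement and symmetry routes agree for every z, so either can be used depending on which z-scores the table at hand contains.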

Finally, the last step we need is using Table A.1.

Suppose we want to find P(Z > 1.92). Find the value 1.92 under the column labeled Z. The corresponding entry (under the column Prob > Z) in the table is 0.0274. So P(Z > 1.92) = 0.0274.

Suppose we want to find P(Z > -0.68). Find the value -0.68 under the column labeled Z. The corresponding entry (under the column Prob > Z) in the table is 0.7517. So P(Z > -0.68) = 0.7517.

Suppose we want to find P(Z < 1.48). The first step is to rewrite the problem in a form that allows the use of the normal tables: P(Z < 1.48) = 1 - P(Z > 1.48). Find the value 1.48 under the column labeled Z. The entry in the table is 0.0694. So P(Z < 1.48) = 1 - 0.0694 = 0.9306. Note that we could have rewritten the problem as P(Z < 1.48) = P(Z > -1.48) using symmetry. Then find the value -1.48 under the column labeled Z and read the corresponding probability. So P(Z < 1.48) = P(Z > -1.48) = 0.9306.

Suppose we want to find P(Z < 0.85). The first step is to rewrite the problem in a form that allows the use of the normal tables: P(Z < 0.85) = 1 - P(Z > 0.85). Find the value 0.85 under the column labeled Z. The entry in the table is 0.1977. So P(Z < 0.85) = 1 - 0.1977 = 0.8023. Note that we could have rewritten the problem as P(Z < 0.85) = P(Z > -0.85) using symmetry. Then find the value -0.85 under the column labeled Z and read the corresponding probability. So P(Z < 0.85) = P(Z > -0.85) = 0.8023.

Suppose we want to find P(Z < 2.11). The first step is to rewrite the problem in a form that allows the use of the normal tables: P(Z < 2.11) = 1 - P(Z > 2.11). Find the value 2.11 under the column labeled Z. The entry in the table is 0.0174. So P(Z < 2.11) = 1 - 0.0174 = 0.9826. Note that we could have rewritten the problem as P(Z < 2.11) = P(Z > -2.11) using symmetry. Then find the value -2.11 under the column labeled Z and read the corresponding probability. So P(Z < 2.11) = P(Z > -2.11) = 0.9826.

The following examples combine all these steps.
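The table entries quoted in these lookups can be cross-checked without the printed table. The sketch below recomputes each P(Z > z) value with NormalDist from Python's standard statistics module and rounds to four places; treating that as equivalent to reading Table A.1 is an assumption, and upper_tail is an illustrative name.

```python
from statistics import NormalDist

Z = NormalDist()  # Standard Normal


def upper_tail(z):
    """P(Z > z) rounded to four decimals, mimicking a table entry."""
    return round(1 - Z.cdf(z), 4)


# z-scores used in the lookups above.
for z in (1.92, -0.68, 1.48, 0.85, 2.11):
    above = upper_tail(z)
    below = round(1 - above, 4)  # complement rule
    print(f"P(Z > {z:5.2f}) = {above:.4f}    P(Z < {z:5.2f}) = {below:.4f}")
```

If a hand calculation disagrees with this output, the usual culprit is reading the wrong row or dropping the sign of z.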
Example
Suppose that X is a Normal random variable with mean 100 and standard deviation 7.5.

Find P(X < 110).
P(X < 110) = P(Z < (110 - 100)/7.5) = P(Z < 1.33) = 1 - P(Z > 1.33) (by complementary events) = 1 - 0.0918 = 0.9082.

Find P(X > 120).
P(X > 120) = P(Z > (120 - 100)/7.5) = P(Z > 2.67) = 0.0038. We can look up P(Z > 2.67) directly in the table.

Find P(X > 93).
P(X > 93) = P(Z > (93 - 100)/7.5) = P(Z > -0.93) = 0.8238. We can look up P(Z > -0.93) directly in the table.

Find P(X < 84).
P(X < 84) = P(Z < (84 - 100)/7.5) = P(Z < -2.13) = P(Z > 2.13) = 0.0166 (by symmetry of the Normal distribution)
or
P(X < 84) = P(Z < -2.13) = 1 - P(Z > -2.13) = 1 - 0.9834 = 0.0166.

TIP: Since Table A.1 uses only two decimal places for z-scores, round all z-scores to two decimal places when using this table.

TIP: It is common to refer to a random variable by the name of the random variable or by the distribution. They are interchangeable. Since any RV is defined by its distribution, this usage is appropriate, though it often confuses people the first time they see or hear it.

TIP: It is often helpful when doing calculations with Normal probabilities to draw a picture to get an idea about the quality of your final answer. If the answer conflicts with the picture, then you may need to reconsider your calculations. The first step is to draw a bell-shaped curve. Draw a vertical line down the center and label it with the value of the mean. Over 99% of the Normal distribution is within 3 standard deviations of the mean. So go to the right edge of your curve and label it with the value of the mean plus three times the standard deviation. Go to the left edge and label it with the value of the mean minus three times the standard deviation. Then shade the area for the probability that you are interested in.
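The four probabilities in the example above (mean 100, standard deviation 7.5) can be reproduced by following the same three steps in code. This is a sketch under stated assumptions: the z-score is rounded to two decimals as the first TIP recommends, the table entry P(Z > z) is emulated with NormalDist from Python's standard statistics module rounded to four places, and the function name upper_tail_from_table is made up for illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # Standard Normal


def upper_tail_from_table(x, mean, sd):
    """P(X > x) the way the text does it: round z to 2 places, then use P(Z > z)."""
    z = round((x - mean) / sd, 2)
    return round(1 - Z.cdf(z), 4)


MEAN, SD = 100, 7.5

p_below_110 = round(1 - upper_tail_from_table(110, MEAN, SD), 4)  # complement rule
p_above_120 = upper_tail_from_table(120, MEAN, SD)                # direct lookup
p_above_93 = upper_tail_from_table(93, MEAN, SD)                  # direct lookup
p_below_84 = round(1 - upper_tail_from_table(84, MEAN, SD), 4)    # complement rule

print(p_below_110, p_above_120, p_above_93, p_below_84)

# Without rounding z, the answer shifts slightly in the later decimal places.
print(NormalDist(mu=MEAN, sigma=SD).cdf(110))
```

The last line shows why table-based answers can differ a little from software output: the table forces z to two decimals, while the software uses the unrounded z-score.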

Example
Suppose X is a Normal random variable with mean 120 and standard deviation 7. Find P(X > 125).

[Figure: Normal curve with the horizontal axis labeled 99, 120 and 141]

We use 120 for the center since it is the mean. The values 141 and 99 are 120 + 3*7 and 120 - 3*7, which are 3 standard deviations above and below the mean, respectively.

P(X > 125) = P(Z > (125 - 120)/7) = P(Z > 0.71) = 0.2389.

Given the accuracy of the picture, it seems reasonable that the probability should be around 24%. We would have been nervous had the answer we calculated been more than 50% or less than 2%. Drawing a picture is a nice check for gross errors in calculation.

Percentiles of the Normal distribution: Using the Z-table in reverse

As an example, the 80th percentile is the point in the distribution where 80% of the data, or 80% of the probability (area), is below that point (and consequently 20% of the probability is above that point). We often want to calculate percentiles for a specific distribution or set of data. For example, if I want to build a cage that 98% of frogs will be comfortable in, I need to know the 98th percentile of frog sizes. A college admissions officer might only want to accept students who are in the top 20% of all scores on some standardized test. In that case the admissions officer would need to know the 80th percentile of scores on that test, and would accept only those students whose test scores were above the 80th percentile.

To find percentiles for the Normal distribution, we reverse the process from the previous section. In the previous section we had a value and we were looking for a probability or a percentage. For example, in the previous section a problem had the form P(X > 182) = c, and we found the probability c. In this section, we'll be given a probability like 0.7500 and have to find a value k such that P(X > k) = 0.7500. Here, we have the percentage and we want to find the value that would give us that percentage. Consequently, we'll reverse the steps we took in the previous section.

Finding the j*100th percentile, k, of a Normal random variable X.
1. In the body of Table A.1, find the probability 1 - j, the area above k (or the value closest to 1 - j). The table lists P(Z > z), so we look up the area above the percentile rather than the area below it.
2. Find the z-score that corresponds to that probability; call it z_k.
3. Using the formula for the z-score, z_k = (k - µ)/σ, solve for k.

Example
Suppose that we want to find the 75th percentile of a Normal distribution with mean 430 and standard deviation 22.
Let X be a Normal RV with mean 430 and standard deviation 22. Then we want to find a value k such that P(X < k) = 0.7500 (or equivalently P(X > k) = 0.2500). Likewise there exists a z-score for k, call it z_k, such that P(Z > z_k) = 0.2500. Now we can find z_k by going into the body of Table A.1 and finding the probability 0.2500. Inside the body of the table we find the closest probability to 0.2500. That probability is 0.2514, which corresponds to a z-score of 0.67. This is the z-score for k, but we need to find the actual value of k. Recall z_k = (k - µ)/σ. So 0.67 = (k - 430)/22. Solving for k gives us k = 430 + 0.67*(22) = 444.74. So the 75th percentile of a Normal distribution with mean 430 and standard deviation 22 is approximately 444.74.

Example
For a Normal random variable X ~ N(45, 36), find the 92nd percentile.
1. The 92nd percentile has 0.9200 of the area below k and 0.0800 of the area above k. In Table A.1, the closest value to 0.0800 is 0.0793.
2. z_k = 1.41
3. 1.41 = (k - 45)/6, so k = 45 + 1.41*6 = 53.46.
So the 92nd percentile of a N(45, 36) distribution is 53.46.

Example
For a Normal RV Y ~ N(76, 9), find the 97th percentile.
1. The 97th percentile has 0.9700 of the area below k and 0.0300 of the area above k. In Table A.1, the closest value to 0.0300 is 0.0301.
2. z_k = 1.88
3. 1.88 = (k - 76)/3, so k = 76 + 1.88*3 = 81.64.
So the 97th percentile of a N(76, 9) distribution is 81.64.
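The reverse lookup can also be sketched in Python. NormalDist.inv_cdf from the standard statistics module returns the value with a given area below it, which plays the role of reading the table backwards. To mirror the hand calculation, the sketch rounds the z-score to two decimals; that rounding is an assumption standing in for "the closest entry in the body of Table A.1", and the two can occasionally disagree. The function names are illustrative.

```python
from statistics import NormalDist


def percentile_table_style(j, mean, sd):
    """j*100th percentile using a z-score rounded to two decimals.

    Assumption: rounding the exact z-score stands in for picking the
    closest probability in the body of the table.
    """
    z_k = round(NormalDist().inv_cdf(j), 2)  # z-score with area j below it
    return round(mean + z_k * sd, 2)         # solve z_k = (k - mean)/sd for k


def percentile_exact(j, mean, sd):
    """Same percentile without rounding the z-score."""
    return NormalDist(mu=mean, sigma=sd).inv_cdf(j)


# The three percentile examples; note N(45, 36) and N(76, 9) give variances,
# so the standard deviations passed here are 6 and 3.
print(percentile_table_style(0.75, 430, 22))
print(percentile_table_style(0.92, 45, 6))
print(percentile_table_style(0.97, 76, 3))

# The unrounded answer differs slightly from the table-based one.
print(round(percentile_exact(0.75, 430, 22), 2))
```

A common slip this guards against is passing the variance from the N(µ, σ^2) notation where the standard deviation is required.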