Binomial Distribution. Normal Approximation to the Binomial

Homework
Read Sec 6-6. Discussion Question pg 337. Do Ex 6-6 -4

Objectives
Objective: Use the normal approximation to calculate binomial probabilities.

Binomial
When there can be only two outcomes for an event, or when we can reduce the outcomes to two possibilities, we have a binomial experiment, also called Bernoulli trials. Essentially the two outcomes are success or failure, for example:
- H or T
- M or F
- Acceptable or Unacceptable
- ≥ or < a cutoff value

Binomial
A binomial experiment has 4 conditions:
1. There is a finite number of repeated trials.
2. The result of each trial is only success or failure.
3. The trials must be independent.
4. The probability of success must remain constant for every trial.

Notation
To keep notation clear, we use the following conventions.
P(S) = probability of success = p
P(F) = probability of failure = q
P(S) = 1 - P(F), or p = 1 - q
P(F) = 1 - P(S), or q = 1 - p
n = number of attempts (trials)
X = number of successes in n trials
P(X = x) is the probability of exactly x successes out of n trials.

Binomial
We found P(X = x), the probability of exactly x successes in n trials, with:
$$P(X = x) = \frac{n!}{(n - x)!\,x!}\, p^{x} q^{n - x}$$
Using the calculator to find the probability of exactly x successes:
P(X = x) = binompdf(n, p, x) (binomial probability density function)
For the probability of x or fewer successes (at most x successes):
P(X ≤ x) = binomcdf(n, p, x) (binomial cumulative distribution function)

Binomial Distribution
For example, suppose we roll a die four times. Success is rolling a 6. In the 4 rolls we could roll four 6s, three 6s, two 6s, one 6, or no 6s.
What is the probability of rolling exactly one 6?
P(X = 1) = binompdf(4, 1/6, 1) = .3858
The probability of exactly two 6s:
P(X = 2) = binompdf(4, 1/6, 2) = .1157
Listing every possible number of successes with its probability creates a probability distribution known as the binomial distribution.
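A minimal Python sketch of what the two calculator commands compute, assuming only the standard library. The function names mirror the calculator commands for readability; they are illustrative and do not call any calculator or statistics package.

```python
from math import comb

def binompdf(n, p, x):
    """P(X = x): probability of exactly x successes in n trials."""
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)

def binomcdf(n, p, x):
    """P(X <= x): probability of at most x successes in n trials."""
    return sum(binompdf(n, p, k) for k in range(x + 1))

# Four rolls of a die, success = rolling a 6
print(round(binompdf(4, 1/6, 1), 4))  # exactly one 6
print(round(binompdf(4, 1/6, 2), 4))  # exactly two 6s
print(round(binomcdf(4, 1/6, 1), 4))  # at most one 6
```

The first two printed values should agree with the calculator results on this slide.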

Display
If we list all possible outcomes in a table and graph them in a bar chart, you will notice that this binomial distribution is NOT symmetric.
4 Rolls of a Die
x      P(x)
0      0.4823
1      0.3858
2      0.1157
3      0.0154
4      0.0008
[Bar chart: P(X) versus number of 6s rolled, x = 0 to 4]

Statistics
4 Rolls of a Die
This distribution has a mean and standard deviation.
$\mu = np = 4(1/6) = 2/3$
$\sigma = \sqrt{npq} = \sqrt{4 \cdot \tfrac{1}{6} \cdot \tfrac{5}{6}} \approx 0.7454$
[Bar chart: P(X) versus number of 6s rolled, x = 0 to 4]
That makes perfect sense: the expected number of 6s in 4 attempts would be 2/3, with a standard deviation of about 3/4.
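The shortcut formulas for the mean and standard deviation can be checked against the general definitions for a discrete distribution (mean as the sum of x times P(x), variance as the probability-weighted squared deviation). The sketch below does this for the 4-roll example; the variable names are illustrative.

```python
from math import comb, sqrt

n, p = 4, 1/6
q = 1 - p

# Binomial probabilities P(0), P(1), ..., P(n)
pmf = [comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]

# Mean and standard deviation straight from the definitions
mean = sum(x * px for x, px in enumerate(pmf))
variance = sum((x - mean) ** 2 * px for x, px in enumerate(pmf))
sd = sqrt(variance)

print(f"from the table: mean={mean:.4f}, sd={sd:.4f}")
print(f"from formulas : mean={n * p:.4f}, sd={sqrt(n * p * q):.4f}")
```

Both lines should show the same mean and standard deviation.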

Probability Distribution
Now suppose we rolled that die 100 times. The table and bar chart would have 101 entries (x = 0 to 100), making them cumbersome to display. We can still calculate the mean and standard deviation of the binomial distribution.
$\mu = np = 100(1/6) = 50/3 \approx 16.6667$
$\sigma = \sqrt{npq} = \sqrt{100 \cdot \tfrac{1}{6} \cdot \tfrac{5}{6}} \approx 3.7268$
We do have another option for displaying the distribution if we meet a couple of conditions.

Histograms
Let us look at a few histograms of the binomial probability distribution for the number of 6s rolled as the number of attempts grows.
Attempts n    $\mu = np$    $\sigma = \sqrt{npq}$
10            1.6667        1.1785
15            2.5           1.4434
20            3.3333        1.6667
50            8.3333        2.6352
100           16.6667       3.7268
[Histograms: number of 6s for 10, 15, 20, 50, and 100 attempts]
Notice a trend? As the number of attempts, n, increases, the distribution becomes more nearly unimodal and symmetric.
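One way to put a number on that trend is the skewness of a binomial distribution, (q - p)/√(npq). Skewness is not covered on these slides and is brought in here only as an illustration: a value of 0 means perfect symmetry, and the value shrinks toward 0 as n grows. The helper name `summary` is hypothetical.

```python
from math import sqrt

p = 1/6
q = 1 - p

def summary(n):
    """Return (mean, standard deviation, skewness) for n rolls, success = a 6."""
    mu = n * p
    sigma = sqrt(n * p * q)
    skew = (q - p) / sigma  # standard binomial skewness formula
    return mu, sigma, skew

for n in (10, 15, 20, 50, 100):
    mu, sigma, skew = summary(n)
    print(f"n={n:3d}  mu={mu:8.4f}  sigma={sigma:.4f}  skewness={skew:.4f}")
```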

Normal Approximation
Hopefully you noticed that the binomial distribution becomes unimodal and nearly symmetric when the number of attempts is sufficiently large. If we draw the normal curve over the distribution for 100 attempts, you can see that it closely approximates the binomial.
As p approaches 0.5 and/or as n gets large, the binomial distribution approaches the normal.
When np ≥ 10 and nq ≥ 10 (your book suggests 5), the normal distribution can be used as an approximation of the binomial distribution.

Normal Approximation
The normal approximation must be used with caution.
Our example of 4 rolls, counting 6s: p = 1/6, q = 5/6
np = 4 × 1/6 = 0.67, nq = 4 × 5/6 = 3.33
We fail on both counts and cannot use the normal approximation to the binomial.
To use the normal approximation under the book's rule (np and nq at least 5) we would need at least 30 trials:
np = 30 × 1/6 = 5, nq = 30 × 5/6 = 25
An even better approximation comes with 60 trials, which meets the stricter rule of 10:
np = 60 × 1/6 = 10, nq = 60 × 5/6 = 50
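The condition check can be sketched in a few lines of Python. The function names `normal_ok` and `min_trials` are hypothetical, and exact fractions are used so that a borderline case such as np = 10 is not lost to floating-point round-off.

```python
from fractions import Fraction
from math import ceil

def normal_ok(n, p, threshold=10):
    """True when both np and nq reach the threshold."""
    q = 1 - p
    return n * p >= threshold and n * q >= threshold

def min_trials(p, threshold=10):
    """Smallest n for which both np and nq reach the threshold."""
    return ceil(threshold / min(p, 1 - p))

p = Fraction(1, 6)  # probability of rolling a 6
print(normal_ok(4, p))             # 4 rolls
print(min_trials(p, threshold=5))  # the book's rule of 5
print(min_trials(p))               # the stricter rule of 10
```

The smaller of p and q decides the minimum n, which is why the die example is limited by np rather than nq.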

Applet for Binomial
Applet: http://www.stat.berkeley.edu/~stark/java/html

Continuity
There is another problem with the normal approximation to the binomial. The binomial counts successes, so it takes only discrete whole-number values, while the normal curve is continuous. Thus we must use a correction for continuity: use the interval that is represented by the data value.
For our example of rolling one 6, use the values 0.5 to 1.5: P(1) = P(0.5 < x < 1.5)
For rolling three 6s, use the values 2.5 to 3.5: P(3) = P(2.5 < x < 3.5)

Interval
If we want P(x > 1), we would find P(x > 1.5).
P(x < 4): use P(x < 3.5) = normalcdf(-10^99, 3.5, µ, σ)
P(x ≤ 4): use P(x < 4.5) = normalcdf(-10^99, 4.5, µ, σ)
P(x > 4): use P(x > 4.5) = normalcdf(4.5, 10^99, µ, σ)
P(x ≥ 4): use P(x > 3.5) = normalcdf(3.5, 10^99, µ, σ)
Think in terms of the interval representing the value of interest. > and < do not include the number, so they do not include its interval. ≤ and ≥ do include the number, and thus include its interval.
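The rule "include the number, include its interval" can be written as a small lookup. This is a sketch with a hypothetical function name, `corrected_interval`, that returns the bounds to hand to the normal curve for a statement about a whole-number count b.

```python
from math import inf

def corrected_interval(relation, b):
    """Return (low, high) bounds on the normal curve for the event 'x relation b'."""
    intervals = {
        "=":  (b - 0.5, b + 0.5),  # the interval that represents b itself
        "<":  (-inf, b - 0.5),     # excludes b, so stop below its interval
        "<=": (-inf, b + 0.5),     # includes b, so include its interval
        ">":  (b + 0.5, inf),      # excludes b, so start above its interval
        ">=": (b - 0.5, inf),      # includes b, so include its interval
    }
    return intervals[relation]

for relation in ("<", "<=", ">", ">="):
    print(f"P(x {relation} 4) uses", corrected_interval(relation, 4))
```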

Continuity Correction
For the distribution of the number of 6s in 50 attempts we can use the calculator to find a probability two ways. On your calculator find the following values using both the binomial and the normal approximation.
$\mu = np = 50(1/6) \approx 8.3333$
$\sigma = \sqrt{npq} = \sqrt{50 \cdot \tfrac{1}{6} \cdot \tfrac{5}{6}} \approx 2.6352$
P(x < 7)
Binomial: P(x < 7) = binomcdf(50, 1/6, 6) = .2506
Normal: P(x < 6.5) = normalcdf(-10^99, 6.5, 8.3333, 2.6352) = .2433
P(x > 12)
Binomial: P(x > 12) = 1 - binomcdf(50, 1/6, 12) = .0627
Normal: P(x > 12.5) = normalcdf(12.5, 10^99, 8.3333, 2.6352) = .0569
Keep in mind, it is called the Normal Approximation for a reason.

Continuity Correction
For the same distribution of the number of 6s in 50 attempts, find the following values using both the binomial and the normal approximation.
$\mu = np = 50(1/6) \approx 8.3333$
$\sigma = \sqrt{npq} = \sqrt{50 \cdot \tfrac{1}{6} \cdot \tfrac{5}{6}} \approx 2.6352$
P(x ≤ 7)
Binomial: P(x ≤ 7) = binomcdf(50, 1/6, 7) = .3911
Normal: P(x < 7.5) = normalcdf(-10^99, 7.5, 8.3333, 2.6352) = .3759
P(x ≥ 12)
Binomial: P(x ≥ 12) = 1 - binomcdf(50, 1/6, 11) = .1173
Normal: P(x > 11.5) = normalcdf(11.5, 10^99, 8.3333, 2.6352) = .1147
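The four comparisons for 50 attempts can be reproduced without a calculator. The sketch below assumes Python's standard library only: `statistics.NormalDist` stands in for normalcdf, and a hand-written `binomcdf` helper stands in for the calculator command of the same name.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 50, 1/6
q = 1 - p
normal = NormalDist(mu=n * p, sigma=sqrt(n * p * q))

def binomcdf(n, p, x):
    """P(X <= x) for a binomial with n trials and success probability p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

# label: (exact binomial, normal approximation with continuity correction)
comparisons = {
    "P(x < 7)":   (binomcdf(n, p, 6), normal.cdf(6.5)),
    "P(x <= 7)":  (binomcdf(n, p, 7), normal.cdf(7.5)),
    "P(x > 12)":  (1 - binomcdf(n, p, 12), 1 - normal.cdf(12.5)),
    "P(x >= 12)": (1 - binomcdf(n, p, 11), 1 - normal.cdf(11.5)),
}

for label, (exact, approx) in comparisons.items():
    print(f"{label:11s} binomial {exact:.4f}   normal {approx:.4f}")
```

The gaps between the two columns are the price of approximating a discrete, slightly skewed distribution with a symmetric continuous one.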

Z scores
We can now use z-scores to find probabilities of success or failure.
1st - Make sure you can use the normal approximation: np ≥ 10, nq ≥ 10
2nd - Find $\mu = np$ and $\sigma = \sqrt{npq}$
3rd - Write the probability using the appropriate continuity-corrected interval
4th - Find the z-scores for the value(s) of the interval
Finally - Find the probability

Using Z scores
For the distribution of the number of 6s in 50 rolls we can use z-scores with the normal approximation to find the probabilities. When calculating z-scores, be certain to use the values determined by the continuity correction.
$\mu = np = 50(1/6) \approx 8.3333$
$\sigma = \sqrt{npq} \approx 2.6352$
P(x < 7) = binomcdf(50, 1/6, 6) = .2506
$z = \frac{6.5 - 8.3333}{2.6352} \approx -0.6957$
P(x < 6.5) = normalcdf(-9, -0.6957, 0, 1) = .2433
P(x > 12) = 1 - binomcdf(50, 1/6, 12) = .0627
$z = \frac{12.5 - 8.3333}{2.6352} \approx 1.5812$
P(x > 12.5) = normalcdf(1.5812, 99, 0, 1) = .0569
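The z-score route can be sketched the same way: standardize the continuity-corrected value, then read the area from the standard normal curve. This assumes the standard library's `statistics.NormalDist` with its default mean 0 and standard deviation 1; the helper name `z_score` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, p = 50, 1/6
mu = n * p
sigma = sqrt(n * p * (1 - p))
standard = NormalDist()  # standard normal: mean 0, standard deviation 1

def z_score(x):
    """Standardize a continuity-corrected value."""
    return (x - mu) / sigma

z_low = z_score(6.5)    # P(x < 7) uses 6.5
z_high = z_score(12.5)  # P(x > 12) uses 12.5

p_below = standard.cdf(z_low)       # area to the left of z_low
p_above = 1 - standard.cdf(z_high)  # area to the right of z_high

print(f"z = {z_low:.4f}, P(x < 6.5)  = {p_below:.4f}")
print(f"z = {z_high:.4f}, P(x > 12.5) = {p_above:.4f}")
```

Working from unrounded values of the mean and standard deviation can move the last digit of a z-score relative to a hand calculation with rounded inputs.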

Example
A doctor's office claims that it keeps appointments on time 65% of the time. (Uh huh.) If 120 patients are seen in one week, what is the probability that at least 65 patients will be seen on time?
n = 120, p = .65, q = .35
np = 120(.65) = 78, nq = 120(.35) = 42, so the normal approximation is ok.
$\mu = np = 120(.65) = 78$
$\sigma = \sqrt{npq} = \sqrt{120(.65)(.35)} \approx 5.22494$
[Normal curve with axis labels]
z:  -3      -2      -1      0     1       2       3
x:  62.34   67.56   72.78   78    83.22   88.44   93.66

Don't Forget the Picture
A doctor's office claims that it keeps appointments on time 65% of the time. (Uh huh.) If 120 patients are seen in one week, what is the probability that at least 65 patients will be seen on time?
$\mu = np = 120(.65) = 78$
$\sigma = \sqrt{npq} = \sqrt{120(.65)(.35)} \approx 5.22494$
Binomial: P(x ≥ 65) = 1 - binomcdf(120, .65, 64) = .9945
Normal: P(x ≥ 65) = P(x > 64.5) = normalcdf(64.5, 999, 78, 5.22494) = .9951
$z = \frac{64.5 - 78}{5.22} \approx -2.59$
P(z > -2.59) = normalcdf(-2.59, 99, 0, 1) = .9952
[Normal curve shaded to the right of x = 64.5 (z = -2.59), area about .995]
z:  -3      -2      -1      0     1       2       3
x:  62.34   67.56   72.78   78    83.22   88.44   93.66

Binomial Probability
P(x ≥ 65) = P(x > 64.5) = P(z > -2.59) = normalcdf(-2.59, 99, 0, 1) = .9952
Remember, binomcdf(n, p, x) finds the probability of at most x successes, each with probability p, in n trials. Using the calculator, P(x ≤ 64), the probability of at most 64 on time (the complement of 65 or more on time), is .0055. Subtracting from 1 gives .9945, very close to the normal approximation of .9952.
P(x ≥ 65) = 1 - binomcdf(120, .65, 64) = 1 - .0055 = .9945
We can also find the same probability by counting failures: at least 65 on time out of 120 means at most 55 late, and P(x ≤ 55) late is .9945.
P(x ≤ 55) = binomcdf(120, .35, 55) = .9945
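The successes-versus-failures equivalence is easy to confirm numerically. A minimal sketch, assuming the standard library only; `binomcdf` is a hand-written helper named after the calculator command, and the variable names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomcdf(n, p, x):
    """P(X <= x) for a binomial with n trials and success probability p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

n, p = 120, 0.65
q = 1 - p

via_successes = 1 - binomcdf(n, p, 64)  # at least 65 on time
via_failures = binomcdf(n, q, 55)       # at most 55 late
approx = 1 - NormalDist(n * p, sqrt(n * p * q)).cdf(64.5)

print(f"counting successes:   {via_successes:.4f}")
print(f"counting failures:    {via_failures:.4f}")
print(f"normal approximation: {approx:.4f}")
```

The first two values should agree to within round-off, since they describe the same event.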

Example
In 2012, 80% of households in Great Britain had an internet connection. How many homes should we survey if we want to use the normal approximation to find probabilities?
We need np ≥ 10 and nq ≥ 10. With p = .8 and q = .2, the condition on nq is the stricter one:
.2n ≥ 10
n ≥ 50
We need a sample of at least 50 homes.

Example
In 2012, 80% of households in Great Britain had an internet connection. If we were to survey 150 homes, what is the probability that at least 130 homes had an internet connection?
Binomial: P(x ≥ 130) = 1 - P(x ≤ 129) = 1 - binomcdf(150, .8, 129) = 1 - .9776 = .0224
$\mu = np = 150(.8) = 120$
$\sigma = \sqrt{npq} = \sqrt{150(.8)(.2)} \approx 4.8990$
$z = \frac{129.5 - 120}{4.899} \approx 1.9392$
Normal: P(x > 129.5) = P(z > 1.9392) = normalcdf(1.9392, 10^99, 0, 1) = .0262
[Normal curve shaded to the right of z = 1.9392, area .0262]
z:  -3      -2      -1      0     1       2       3
x:  105.3   110.2   115.1   120   124.9   129.8   134.7
By the normal approximation, the probability of at least 130 homes with internet is about .0262.

Example
In 2012, 80% of households in Great Britain had an internet connection. If we were to survey 150 homes, what is the probability that no more than 130 homes had an internet connection?
$\mu = np = 150(.8) = 120$
$\sigma = \sqrt{npq} = \sqrt{150(.8)(.2)} \approx 4.8990$
No more than 130 is x ≤ 130, so use x < 130.5 (the interval).
Normal: normalcdf(-10^99, 130.5, 120, 4.89898) = .9840
$z = \frac{130.5 - 120}{4.899} \approx 2.1433$
normalcdf(-9, 2.1433, 0, 1) = .9840
Binomial: P(x ≤ 130) = binomcdf(150, .8, 130) = .9872
[Normal curve shaded to the left of z = 2.1433, area .9840]
z:  -3      -2      -1      0     1       2       3
x:  105.3   110.2   115.1   120   124.9   129.8   134.7
The probability of no more than 130 homes with internet is about .98.
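Both survey questions for 150 homes (at least 130, and no more than 130) can be checked with the sketch below. It assumes the standard library only; `binomcdf` is a hand-written helper and the result names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomcdf(n, p, x):
    """P(X <= x) for a binomial with n trials and success probability p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

n, p = 150, 0.8
q = 1 - p
normal = NormalDist(mu=n * p, sigma=sqrt(n * p * q))

# At least 130 homes: x >= 130, corrected to x > 129.5
at_least_exact = 1 - binomcdf(n, p, 129)
at_least_approx = 1 - normal.cdf(129.5)

# No more than 130 homes: x <= 130, corrected to x < 130.5
no_more_exact = binomcdf(n, p, 130)
no_more_approx = normal.cdf(130.5)

print(f"at least 130: binomial {at_least_exact:.4f}, normal {at_least_approx:.4f}")
print(f"no more than 130: binomial {no_more_exact:.4f}, normal {no_more_approx:.4f}")
```

Note that the two events overlap at exactly 130 homes, so their probabilities add to slightly more than 1.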

Normal Approximation
This section is about using the normal model to approximate a binomial model. We can calculate probabilities using the binomial function or the normal function.
Binomial: a table of x = 0, 1, 2, ..., n with the probability P(x) of each.
Normal: requires np ≥ 10 and nq ≥ 10; use $\mu = np$, $\sigma = \sqrt{npq}$, and $z = \frac{cc - \mu}{\sigma}$, where cc is the continuity-corrected value.
Binomial                                  Normal
P(x = b) = binompdf(n, p, b)              P(b - .5 < x < b + .5) = normalcdf(b - .5, b + .5, np, √(npq))
P(x < b) = binomcdf(n, p, b - 1)          P(x < b - .5) = normalcdf(-10^99, b - .5, np, √(npq))
P(x ≤ b) = binomcdf(n, p, b)              P(x < b + .5) = normalcdf(-10^99, b + .5, np, √(npq))
P(x > b) = 1 - binomcdf(n, p, b)          P(x > b + .5) = normalcdf(b + .5, 10^99, np, √(npq))
P(x ≥ b) = 1 - binomcdf(n, p, b - 1)      P(x > b - .5) = normalcdf(b - .5, 10^99, np, √(npq))
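The five pairings on this slide can be gathered into one function. This is a sketch under stated assumptions, not a standard routine: `both_ways` is a hypothetical name, `binomcdf` is a hand-written helper, and `statistics.NormalDist` from the standard library plays the role of normalcdf.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomcdf(n, p, x):
    """P(X <= x) for a binomial; returns 0 when x is below 0."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

def both_ways(n, p, relation, b):
    """Return (binomial, normal approximation) for the event 'x relation b'."""
    normal = NormalDist(n * p, sqrt(n * p * (1 - p)))
    if relation == "=":
        exact = binomcdf(n, p, b) - binomcdf(n, p, b - 1)
        approx = normal.cdf(b + 0.5) - normal.cdf(b - 0.5)
    elif relation == "<":
        exact, approx = binomcdf(n, p, b - 1), normal.cdf(b - 0.5)
    elif relation == "<=":
        exact, approx = binomcdf(n, p, b), normal.cdf(b + 0.5)
    elif relation == ">":
        exact, approx = 1 - binomcdf(n, p, b), 1 - normal.cdf(b + 0.5)
    elif relation == ">=":
        exact, approx = 1 - binomcdf(n, p, b - 1), 1 - normal.cdf(b - 0.5)
    else:
        raise ValueError(f"unknown relation: {relation}")
    return exact, approx

# Number of 6s in 50 rolls of a die
for relation, b in (("<", 7), ("<=", 7), (">", 12), (">=", 12)):
    exact, approx = both_ways(50, 1/6, relation, b)
    print(f"P(x {relation} {b}): binomial {exact:.4f}, normal {approx:.4f}")
```

The function does not check np ≥ 10 and nq ≥ 10; that check is left to the caller.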

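Calculating Probabilities
Population: $N(\mu, \sigma)$, with $z = \frac{x - \mu}{\sigma}$
Sampling distribution of the mean: $N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$, with $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
Sampling distribution of the proportion: $N\left(p, \sqrt{\frac{pq}{n}}\right)$, with $z = \frac{\hat{p} - p}{\sqrt{pq/n}}$
Approximating the binomial (np ≥ 10, nq ≥ 10): $N(np, \sqrt{npq})$, with $z = \frac{x \pm .5 - np}{\sqrt{npq}}$
For the first three models, probabilities take the forms P(x ≤ b), P(x < b), P(x > b), and P(a < x < b), written with $\bar{x}$ or $\hat{p}$ in place of x for the sampling distributions.
For the binomial approximation, a value x is represented by the interval (x - .5, x + .5):
P(x = b) becomes P(b - .5 < x < b + .5)
P(x ≤ b) becomes P(x < b + .5)
P(x < b) becomes P(x < b - .5)
P(x ≥ b) becomes P(x > b - .5)
P(x > b) becomes P(x > b + .5)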