We have seen in the previous investigation that binomial distributions can have different shapes. The distributions can range from approximately normal to skewed left or skewed right. Remember that when we describe a distribution, two of the characteristics we are interested in are center and spread, measured for example by the mean and the standard deviation. Since a binomial random variable is a discrete random variable, we can calculate its mean and standard deviation using the known formulas for discrete random variables.

1. Recall the table for the binomial distribution from the previous investigation for the number of girls in a family of four.

Number of Girls (x)    Theoretical Probability P(x)
0                      0.0677
1                      0.2600
2                      0.3747
3                      0.2400
4                      0.0576

Using the formulas for the mean and variance of a discrete probability distribution (and your graphing calculator), find the mean and standard deviation for this binomial distribution.

$\mu_x =$ ________    $\sigma_x =$ ________

The formulas used to find the mean and standard deviation for a discrete random variable work fine but can get tedious, especially if n is large. For binomial distributions the formulas for the mean and variance turn out to be quite simple. We will derive these formulas in the next activity.

2. We will start with a simple binomial setting where the random variable x has two outcomes, 0 or 1. We will consider 1 a success and 0 a failure. Let $p$ be the probability of success and $q$ the probability of failure. (Note that $q = 1 - p$.) The probability distribution for this variable is given in the table below.

x       0          1
P(x)    $1 - p$    $p$

a. Calculate the mean, variance, and standard deviation for this distribution.

$\mu_x =$ ________    $\sigma_x^2 =$ ________    $\sigma_x =$ ________
b. Now define a new random variable y, where y is the count of the number of successes in n independent observations. That is, $y = x_1 + x_2 + x_3 + \dots + x_n$, where each $x_i$ is a binomial random variable with the same distribution given in the previous table. Using the formulas for the means and variances of linear combinations of independent random variables, find the mean, variance, and standard deviation for y.

$\mu_y =$ ________    $\sigma_y^2 =$ ________    $\sigma_y =$ ________
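The rules for sums of independent random variables can be checked numerically before they are derived by hand. The following is a minimal sketch, not part of the original activity: it writes p for the probability of success, builds the exact distribution of y by adding one 0/1 trial at a time, and compares the mean and variance of y with n times the mean and variance of a single trial. The values n = 4 and p = 0.49 and all function names are illustrative assumptions.

```python
# Exact check of the rules for means and variances of sums of independent
# random variables, using y = x1 + x2 + ... + xn with each xi equal to 0 or 1.
# The names and the values n = 4, p = 0.49 are illustrative assumptions.

def bernoulli(p):
    """Distribution of one trial as a list: index 0 -> P(0), index 1 -> P(1)."""
    return [1 - p, p]

def convolve(dist_a, dist_b):
    """Distribution of the sum of two independent counts."""
    out = [0.0] * (len(dist_a) + len(dist_b) - 1)
    for i, prob_a in enumerate(dist_a):
        for j, prob_b in enumerate(dist_b):
            out[i + j] += prob_a * prob_b
    return out

def mean_and_variance(dist):
    """Mean and variance computed straight from the definitions."""
    mu = sum(k * prob for k, prob in enumerate(dist))
    var = sum((k - mu) ** 2 * prob for k, prob in enumerate(dist))
    return mu, var

def sum_of_trials(n, p):
    """Distribution of y, the number of successes in n independent trials."""
    dist = [1.0]  # before any trial, the count is 0 with probability 1
    for _ in range(n):
        dist = convolve(dist, bernoulli(p))
    return dist

n, p = 4, 0.49
mu_x, var_x = mean_and_variance(bernoulli(p))
mu_y, var_y = mean_and_variance(sum_of_trials(n, p))

print("one trial:", mu_x, var_x)
print("n trials :", mu_y, var_y)
print("n * mu_x :", n * mu_x, " n * var_x:", n * var_x)
```

Changing n and p and rerunning the sketch is a quick way to see that the agreement does not depend on the particular values chosen.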
Mean and Standard Deviation of a Binomial Random Variable
If a count X has the binomial distribution with n trials and probability of success $p$, then the mean and the standard deviation of X are

$\mu_x = np$
$\sigma_x = \sqrt{np(1-p)}$

3. Use the formulas for the mean and standard deviation of a binomial random variable to calculate the mean and standard deviation for the distribution of the count of the number of girls in a family of four children, where the probability of a girl is 0.49. Show work.

4. Recall the TV purchasing problem from the previous investigation. Eighty percent of all TVs sold by a large retailer are plasma high definition (HDTV) and twenty percent are HD light-emitting diode LCD TVs. The type of TV purchased by each of the next 12 customers will be noted. Let x be the random variable representing the number of plasma HDTVs purchased by these 12 customers. Find the mean and the standard deviation for this distribution. Show work.
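As a way to check hand calculations, the sketch below compares the mean and standard deviation computed from the rounded probability table in activity 1 with the values given by the binomial formulas, and then applies the formulas to the TV setting in activity 4. It writes p for the probability of success; the function names are illustrative and not part of the worksheet.

```python
from math import sqrt

def table_mean_sd(values, probs):
    """Mean and standard deviation of a discrete distribution given as a table."""
    mu = sum(x * prob for x, prob in zip(values, probs))
    var = sum((x - mu) ** 2 * prob for x, prob in zip(values, probs))
    return mu, sqrt(var)

def binomial_mean_sd(n, p):
    """Mean n*p and standard deviation sqrt(n*p*(1-p)) of a binomial count."""
    return n * p, sqrt(n * p * (1 - p))

# Table from activity 1 (probabilities rounded to four decimal places).
girls_values = [0, 1, 2, 3, 4]
girls_probs = [0.0677, 0.2600, 0.3747, 0.2400, 0.0576]

table_mu, table_sd = table_mean_sd(girls_values, girls_probs)
formula_mu, formula_sd = binomial_mean_sd(4, 0.49)
tv_mu, tv_sd = binomial_mean_sd(12, 0.8)

print(f"girls, from table  : mean = {table_mu:.4f}, sd = {table_sd:.4f}")
print(f"girls, from formula: mean = {formula_mu:.4f}, sd = {formula_sd:.4f}")
print(f"TVs,   from formula: mean = {tv_mu:.4f}, sd = {tv_sd:.4f}")
```

The table-based and formula-based results differ only slightly, because the table probabilities are rounded.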
Binomial Distributions in Statistical Sampling

5. In an attempt to increase sales, a breakfast cereal company decides to offer a NASCAR promotion. Each box of cereal will contain a collectible card featuring one of these NASCAR drivers: Jeff Gordon, Dale Earnhardt, Jr., Tony Stewart, Danica Patrick, or Jimmie Johnson. The company says that each of the 5 cards is equally likely to appear in any box of cereal. Suppose that the company printed 20,000 of each of the 5 cards, so that there are 100,000 cereal boxes on the market that contain a card. If a NASCAR fan buys 6 boxes of cereal at random, what is the probability of not getting the Danica Patrick card? Let's explore how we might use a binomial distribution to answer this question.

a. What is considered a success in this situation? Define a random variable X to represent the number of successes.

b. What is the probability of success? Is this a binomial setting? Why or why not?

Since we are sampling without replacement (we didn't put the cereal box back on the shelf), the trials (buying cereal boxes) are not independent, so the distribution is not technically binomial. But it is close. Consider that the probability of success for the first trial (not getting the Danica Patrick card) is $\frac{80{,}000}{100{,}000} = 0.8$. Since we are not replacing the box, the probability of success on the second trial (given a success on the first) changes to $\frac{79{,}999}{99{,}999} \approx 0.799998$, still very close to 0.8. This is due to the large population that we are drawing from. So over all 6 trials the probability of success does change, but not significantly. Thus we can treat this as close to a binomial setting and use the binomial rules to calculate the probability of 6 successes in 6 trials.

c. Assuming that X is a binomial random variable, find $P(X = 6)$.
d. Calculate P(no Danica Patrick card) using the general multiplication rule (since the events are not independent) and compare this to the result you got in part c.

Let's summarize when we can use a binomial distribution to calculate probabilities.

Sampling Without Replacement Condition
When taking an SRS of size n from a population of size N, we can use a binomial distribution to model the count of successes (X) in the sample as long as $n \le \frac{1}{10}N$, i.e., the population is at least ten times as large as the sample.

6. Hiring Discrimination: It Just Won't Fly! (Sampling without replacement)
An airline has just finished training 25 first officers (15 male and 10 female) to become captains. Unfortunately, only eight captain positions are available. Airline managers decide to use a lottery to determine which pilots will fill the available positions. Of the 8 captains chosen, 5 are female and 3 are male.

a. Explain why the probability that 5 female pilots are chosen in a fair lottery is NOT

$P(X = 5) = \binom{8}{5}(0.40)^5(0.6)^3 = 0.124$

b. What is the correct probability that there are 5 female pilots selected?
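The effect of the sampling-without-replacement condition can be seen by computing both settings two ways. The sketch below is one possible check, not the worksheet's prescribed method: it evaluates the binomial probability and the exact without-replacement probability (written here as a ratio of combination counts, which agrees with the general multiplication rule) for the cereal boxes and for the pilot lottery. The function and parameter names are illustrative assumptions.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) when the n trials are treated as independent."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def without_replacement_pmf(k, pop_size, pop_successes, n):
    """P(k successes) when n items are drawn without replacement from a
    population of pop_size items, pop_successes of which are successes."""
    ways = comb(pop_successes, k) * comb(pop_size - pop_successes, n - k)
    return ways / comb(pop_size, n)

def ten_percent_condition(n, pop_size):
    """True when the sample is at most one tenth of the population."""
    return n <= pop_size / 10

# Cereal boxes: 6 boxes from 100,000, of which 80,000 lack the Danica Patrick card.
cereal_binomial = binomial_pmf(6, 6, 0.8)
cereal_exact = without_replacement_pmf(6, 100_000, 80_000, 6)

# Pilot lottery: 8 captains drawn from 25 first officers, 10 of whom are female.
pilots_binomial = binomial_pmf(5, 8, 0.4)
pilots_exact = without_replacement_pmf(5, 25, 10, 8)

print("cereal:", ten_percent_condition(6, 100_000), cereal_binomial, cereal_exact)
print("pilots:", ten_percent_condition(8, 25), pilots_binomial, pilots_exact)
```

For the cereal boxes the two probabilities agree to about four decimal places, while for the pilots they differ noticeably, which is consistent with the condition holding in the first setting and failing in the second.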
7. Are attitudes toward shopping changing? Suppose that 60% of all adult U.S. residents shop for gifts on the internet. Determine the probability that 1520 or more of a random sample of 2500 adult U.S. residents shop for gifts online.

a. Justify that this situation represents a binomial setting.

b. Define a random variable for this problem and indicate the value of $p$.

c. In terms of your random variable, express the probability that 1520 or more of the sample shopped online using appropriate statistical notation.

d. Find the probability in part c.

e. Find $P(X = 1520)$ using the formula $P(X = 1520) = \binom{2500}{1520}(0.6)^{1520}(0.4)^{980}$. Did you encounter any difficulties? Find this probability using the built-in functions for binomial distributions on your calculator.

f. Find $P(X \ge 1520)$ using binomcdf if you haven't already.
g. By using binomcdf you found that $P(X \ge 1520) \approx 0.2131$. We have seen that binomial distributions can take on various shapes. Sometimes they are approximately normal. Suppose that the histogram below represents the results of the sample survey. Notice that this binomial distribution is approximately normal. Find the mean and standard deviation for this binomial distribution and then approximate $P(X \ge 1520)$ using a normal distribution, assuming it has the same mean and standard deviation as the binomial distribution. Compare this result with the value from part f.
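The difficulty in part e and the comparison in part g can both be explored in software. The sketch below is an illustration under stated assumptions, not the calculator procedure the activity asks for: it shows that the pieces of the formula fall outside the range of ordinary floating-point numbers, computes the binomial probabilities exactly with integer arithmetic (writing 0.6 as 6/10), and then evaluates the normal approximation. The continuity-corrected version is an addition that the worksheet does not mention. All names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

n = 2500
p_num, p_den = 6, 10          # p = 0.6 kept as a fraction for exact arithmetic
q_num = p_den - p_num

# Part e: the individual factors do not fit in ordinary floating point.
underflowed = 0.6 ** 1520     # too small to represent, evaluates to 0.0
try:
    float(comb(n, 1520))
    coefficient_fits = True
except OverflowError:         # the binomial coefficient is too large
    coefficient_fits = False

def exact_term(k):
    """P(X = k) using integers until the final division."""
    return comb(n, k) * p_num**k * q_num ** (n - k) / p_den**n

def exact_upper_tail(k_min):
    """P(X >= k_min) using integers until the final division."""
    numerator = sum(comb(n, k) * p_num**k * q_num ** (n - k)
                    for k in range(k_min, n + 1))
    return numerator / p_den**n

single = exact_term(1520)
exact = exact_upper_tail(1520)

# Part g: normal distribution with the same mean and standard deviation.
p = p_num / p_den
mu = n * p
sigma = sqrt(n * p * (1 - p))
normal = NormalDist(mu, sigma)
approx_plain = 1 - normal.cdf(1520)
approx_corrected = 1 - normal.cdf(1519.5)   # continuity correction (an addition)

print("factors:", underflowed, coefficient_fits)
print("P(X = 1520)  exact:", single)
print("P(X >= 1520) exact:", exact)
print("normal approximation:", approx_plain, "with correction:", approx_corrected)
```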
8. From the previous exercise we see that if the binomial distribution is approximately normal, we can use normal distribution calculations to approximate binomial distribution calculations. But when is a binomial distribution normal enough to do this?

Recall the TV problem, where eighty percent of all TVs sold by a large retailer are plasma high definition (HDTV) and twenty percent are HD light-emitting diode LCD TVs. The type of TV purchased by each of the next 12 customers was noted, and x was the random variable representing the number of plasma HDTVs purchased by these 12 customers. Recall that for this problem situation $p = 0.8$. The probability distribution table and histogram for x are shown below. Clearly this distribution is skewed left, so a normal distribution approximation should not be used. But since $n = 12$ is a small number of observations, we have no difficulty calculating probabilities using the formulas for a binomial distribution or by using binomial calculations on your calculator.

[Figure: histogram of x, titled "HDTV Purchase"]

x     P(x)
0     4.096e-09
1     1.96608e-07
2     4.32538e-06
3     5.76717e-05
4     0.000519045
5     0.00332189
6     0.0155021
7     0.0531502
8     0.132876
9     0.236223
10    0.283468
11    0.206158
12    0.0687195

As you have seen, when a binomial distribution is approximately normal it is reasonable to use a normal distribution to approximate desired probabilities. Since calling a distribution approximately normal is a judgment call on the part of the observer, we will need criteria to establish when it is appropriate to use a normal distribution to approximate probabilities for a binomial distribution.

9. Open the Fathom file Binomial vs Normal. You should see a screen similar to the one shown below.
The histogram is the binomial distribution $B(10, 0.932)$. This distribution is clearly skewed left, and thus a normal distribution is not an appropriate model. The curve drawn in the figure is a normal distribution, and as you can see this curve doesn't quite fit the histogram.

a. Draw a more appropriate curve for this histogram in the figure above.

b. Using the slider bars, change the value of $p$ and note when the distribution histogram appears most normal.

c. Now change $p$ back to 0.932 and move the slider bar for n to increase its value. What happens to the shape of the binomial distribution as n increases in value?

Based on the results of this activity, it is clear that the shape of a binomial distribution depends on $p$ and n. As a rule of thumb, we can use a normal distribution to approximate a binomial distribution when $np \ge 10$ and $n(1-p) \ge 10$. When these conditions are met we may then safely use $N\left(np, \sqrt{np(1-p)}\right)$ as the normal approximation for the binomial distribution.

10. a. Do the conditions for the TV purchasing problem from activity 8 allow you to use a normal distribution? Justify your response.

b. Do the conditions for the shopping problem in activity 7 allow you to use a normal distribution? Justify your response.
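The rule of thumb is easy to apply mechanically. The sketch below, offered only as an illustration, writes p for the probability of success and checks the two conditions for the three settings that appear in these activities. It also searches for the smallest n that satisfies both conditions when p = 0.932, which mirrors moving the slider for n. The function names and the default threshold argument are assumptions made for this example.

```python
def normal_approximation_ok(n, p, threshold=10):
    """Rule of thumb: expected successes and expected failures both >= threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold

def smallest_n(p, threshold=10):
    """Smallest number of trials for which the rule of thumb is satisfied."""
    n = 1
    while not normal_approximation_ok(n, p, threshold):
        n += 1
    return n

settings = {
    "TV purchases, B(12, 0.8)": (12, 0.8),
    "Online shopping, B(2500, 0.6)": (2500, 0.6),
    "Fathom example, B(10, 0.932)": (10, 0.932),
}

for label, (n, p) in settings.items():
    print(f"{label}: np = {n * p:.2f}, n(1-p) = {n * (1 - p):.2f}, "
          f"ok = {normal_approximation_ok(n, p)}")

print("smallest n for p = 0.932:", smallest_n(0.932))
```

When p is close to 0 or 1, a much larger n is needed before both counts reach the threshold, which matches what the slider exploration suggests.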
11. One way of checking the effect of undercoverage, nonresponse, and other sources of error in a sample survey is to compare the sample with known facts about the population. About 12% of American adults identify themselves as black. Suppose we take an SRS of 1500 American adults and let X be the number of black adults in the sample.

a. Justify that this is a binomial setting.

b. Check the conditions for using a normal approximation in this setting.

c. Using the normal approximation, calculate $P(165 \le X \le 195)$ and interpret it in the context of this problem situation.

d. Calculate $P(165 \le X \le 195)$ using binomcdf on your calculator and compare it to your answer in part c.
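Parts b through d can also be carried out in software. The sketch below is one possible check rather than the calculator method the activity asks for: it verifies the rule-of-thumb conditions, computes the normal approximation to the probability that X falls from 165 to 195 inclusive, and compares it with the exact binomial value obtained through integer arithmetic (writing 0.12 as 12/100). All names are illustrative assumptions.

```python
from math import comb, sqrt
from statistics import NormalDist

def exact_binomial_between(low, high, n, p_num, p_den):
    """P(low <= X <= high) for a binomial count with p = p_num / p_den,
    using integers until the final division."""
    q_num = p_den - p_num
    numerator = sum(comb(n, k) * p_num**k * q_num ** (n - k)
                    for k in range(low, high + 1))
    return numerator / p_den**n

n, p = 1500, 0.12
mu = n * p
sigma = sqrt(n * p * (1 - p))

# Part b: rule-of-thumb conditions for the normal approximation.
conditions_met = n * p >= 10 and n * (1 - p) >= 10

# Part c: normal distribution with the same mean and standard deviation.
normal = NormalDist(mu, sigma)
normal_estimate = normal.cdf(195) - normal.cdf(165)

# Part d: exact binomial probability.
exact_value = exact_binomial_between(165, 195, n, 12, 100)

print("mean:", mu, "sd:", sigma, "conditions met:", conditions_met)
print("normal approximation:", normal_estimate)
print("exact binomial      :", exact_value)
```

The two results are close but not identical. Part of the gap comes from treating a count that takes only whole-number values as if it were continuous.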