BINOMIAL EXPERIMENT SUPPLEMENT

A binomial experiment is any situation that involves n trials, with each trial having one of two possible outcomes (Success or Failure) and with the probability of a success equal to p on all n trials [i.e., p(s) = p and p(f) = q = 1 - p]. The researcher is interested in the number of successes over the n trials. Letting x represent the number of successes, the possible values range from 0 to n (i.e., x = 0, 1, 2, ..., n).

The prototypical binomial experiment is tossing an unbiased coin. The probability of a head on each trial is .5 and the probability of a tail is .5 because the two outcomes are equally likely. If the coin is tossed 5 times, one would want to calculate the probabilities associated with x = 0, 1, 2, 3, 4, or 5 heads.

A second example is rolling a single die where you are only interested in whether or not you obtain a 1 on each trial. The probability of a 1 on each trial is p = 1/6 because the die has 6 sides, only one side has a 1, and each side is equally likely. The probability of not-1 is q = 5/6. If the die is rolled 3 times (or equivalently, 3 dice are rolled at once), then one would want to calculate the probabilities associated with x = 0, 1, 2, or 3 ones.

Binomial Theorem

The probability of each value of x for a binomial experiment is given by the binomial theorem:

    p(x) = C(n,x) * p^x * q^(n-x)

Before we apply this formula, let us work out the solution logically. Consider an experiment in which participants indicate whether or not each of 4 adjectives describes them. The number of possible patterns of Successes (describes me) and Failures (does not describe me) is 2^4 = 2 x 2 x 2 x 2 = 16. These are enumerated below.

    x    # Combinations          Enumerated
    0    1 = 4!/[0!(4-0)!]       FFFF
    1    4 = 4!/[1!(4-1)!]       SFFF FSFF FFSF FFFS
    2    6 = 4!/[2!(4-2)!]       SSFF SFSF SFFS FSSF FSFS FFSS
    3    4 = 4!/[3!(4-3)!]       SSSF SSFS SFSS FSSS
    4    1 = 4!/[4!(4-4)!]       SSSS
         16 = 2 x 2 x 2 x 2 = 2^4

If each outcome were equally likely [i.e., p(s) = .5], then it would be possible to determine the probability for each value of x by dividing the number of ways that value could occur by the total number of sequences, 16 [e.g., p(x = 3) = 4/16 = .25]. But when p(s) deviates from .5, it is necessary to consider the probability of each of the sequences.

Suppose that p(s) = .8. The probability of a specific sequence (e.g., SSSF) can be determined by the multiplication rule: p(SSSF) = .8 x .8 x .8 x .2 = .1024. We could calculate the probability for the other sequences of 3 successes and 1 failure in a similar manner:

    p(SSFS) = .8 x .8 x .2 x .8 = .1024
    p(SFSS) = .8 x .2 x .8 x .8 = .1024
    p(FSSS) = .2 x .8 x .8 x .8 = .1024

By the addition rule, p(x = 3) = .1024 + .1024 + .1024 + .1024 = 4 x .1024 = .4096.

Examination of the preceding calculations shows that the probability is the same irrespective of which specific trials are successes. That is, each sequence has probability .8^3 x .2^1. All we really need to know is how many successes and how many failures there are, so that we can calculate the probability of each sequence with that many successes and that many failures. In general, the probability of each sequence of x successes and n-x failures is given by p^x * q^(n-x).

We also need to determine the number of ways to obtain that many successes. This number is given by the combination rule: C(n,x) = n!/[x!(n-x)!]. We need to add up the probability value this many times (or equivalently, multiply the probability value by the number of sequences). This gives us the binomial theorem for calculating the probability distribution of a binomial random variable. The probability distribution for n = 4 and p = .8 is shown below.

    x    p(x)      # Comb. x p(each sequence)
    0    .0016  =  1 x .0016  =  4!/[0!(4-0)!] x .8^0 x .2^(4-0)
    1    .0256  =  4 x .0064  =  4!/[1!(4-1)!] x .8^1 x .2^(4-1)
    2    .1536  =  6 x .0256  =  4!/[2!(4-2)!] x .8^2 x .2^(4-2)
    3    .4096  =  4 x .1024  =  4!/[3!(4-3)!] x .8^3 x .2^(4-3)
    4    .4096  =  1 x .4096  =  4!/[4!(4-4)!] x .8^4 x .2^(4-4)
         Σ = 1.0

If someone endorsed 0 of the items, this would be a very unexpected event if in fact the probability of endorsing an item was .8. For p = .8, 0 adjectives would be endorsed only 16 times out of 10,000 experiments.
This is an extremely rare event and would suggest that the value of p for this person was less than the value of .8 that was used to generate the probability distribution shown above.

Consider another example of a binomial experiment. Approximately 40% of university students who have taken statistics have a positive attitude towards statistics. We randomly select 5 students who took statistics from Professor Z and find that all 5 have positive attitudes. We can use the binomial theorem to calculate the probability distribution for this experiment using n = 5 and p = .40. For example,

    p(x = 2) = 5!/[2!(5-2)!] x .4^2 x .6^(5-2) = 10 x .03456 = .346

There are 10 ways to obtain 2 successes and 3 failures (e.g., SSFFF, SFSFF, ...) and the probability of each of those ways is .03456 [e.g., p(SSFFF) = .4 x .4 x .6 x .6 x .6 = .03456]. The complete distribution is shown below.

    x    p(x)
    0    .078
    1    .259
    2    .346
    3    .230
    4    .077
    5    .010
         Σ = 1.0

The probability of all 5 students having positive attitudes towards statistics is quite low (.010) if p is actually equal to .4. It seems more likely that p > .4 for students of Professor Z.

Table for Specified Binomial Distributions

Table B in Appendix D of Pagano presents the probabilities for the different values of x for values of n from 1 to 20 and certain specified p values (p = .05, .10, .15, ..., .50). For the example that we just did using the binomial theorem, the probabilities can be found for N = 5 and P = .4 in Table B. These probabilities are shown below, along with those we calculated using the binomial theorem (BT).

              p(x)
    x    BT      Table B
    0    .078    .0778
    1    .259    .2592
    2    .346    .3456
    3    .230    .2304
    4    .077    .0768
    5    .010    .0102
    Σ    1.0     1.0

Our values would have been exactly those presented in Table B if we had carried more decimal places in our calculations.
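The two distributions worked out above (n = 4 with p = .8, and n = 5 with p = .4) can also be checked by computer. The following is a minimal Python sketch, not part of the original handout. The function names binomial_pmf and pmf_by_enumeration are illustrative choices. The first function applies the formula directly; the second follows the logical route taken earlier, listing every sequence of S and F and adding up the sequence probabilities.

```python
from itertools import product
from math import comb


def binomial_pmf(n, p):
    """Return [p(0), p(1), ..., p(n)] for n trials with success probability p."""
    q = 1 - p
    return [comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]


def pmf_by_enumeration(n, p):
    """Same distribution, built by listing all 2**n sequences of S and F."""
    probs = [0.0] * (n + 1)
    for seq in product("SF", repeat=n):
        x = seq.count("S")                     # number of successes in this sequence
        probs[x] += p**x * (1 - p)**(n - x)    # multiplication rule, then addition rule
    return probs


if __name__ == "__main__":
    for n, p in [(4, 0.8), (5, 0.4)]:
        print(f"n = {n}, p = {p}")
        for x, px in enumerate(binomial_pmf(n, p)):
            print(f"  x = {x}   p(x) = {px:.4f}")
```

Printed to four decimals, the n = 5, p = .4 values should agree with the Table B column. Enumeration is practical only for small n because the number of sequences doubles with every added trial.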

One limitation of the Table approach is that it only includes binomial probabilities for the p values included in the Table (i.e., the 10 values from .05 to .50). We can also figure out the binomial probabilities for p values above .50 (i.e., p = .55, .60, ..., .95) by considering the number of failures (Q in Table B) rather than the number of successes (P in Table B). The procedure is explained in Pagano. Essentially, however, you need to reverse the sequence of values in the column headed by x above; that is, low to high (x = 0, 1, 2, 3, 4, 5) becomes high to low (x = 5, 4, 3, 2, 1, 0).

Mean and Standard Deviation of a Binomial Distribution

The mean and standard deviation of a binomial distribution can be calculated using the general formulas for the mean and standard deviation of a discrete probability distribution [i.e., μ = Σ x * p(x), σ = sqrt(Σ (x - μ)^2 * p(x))]. These calculations are illustrated below for our binomial probability distribution for n = 4 and p = .8.

    x    p(x)
    0    .0016
    1    .0256
    2    .1536
    3    .4096
    4    .4096

    μ = 0 x .0016 + 1 x .0256 + ... + 4 x .4096
      = 3.2
    σ = sqrt[(0 - 3.2)^2 x .0016 + (1 - 3.2)^2 x .0256 + ... + (4 - 3.2)^2 x .4096]
      = sqrt(.64)
      = .80

There is also a shorter way to calculate the mean and standard deviation for a binomial distribution, namely:

    μ = np = 4 x .8 = 3.2
    σ = sqrt(npq) = sqrt(4 x .8 x .2) = .80

Using these formulas, it is quite easy to determine the mean and standard deviation for a binomial distribution once n and p are specified.

Normal Approximation to the Binomial Distribution

The mean and standard deviation of the binomial can be used to approximate the binomial distribution by the normal distribution. This approximation will be better for large n and for values of p close to .5. The smaller the n and the further p is from .5, the worse the approximation will be. In order to use the normal approximation, we need to remember that the binomial distribution is a discrete distribution, whereas the normal distribution is a continuous distribution.
We will use the boundaries between the discrete values of the binomial when estimating with the normal approximation. For example, the boundary between 3 and 4 successes would be 3.5.

To illustrate the procedure, consider the binomial for n = 5 and p = .4. Earlier we used the binomial theorem to calculate the exact probability for x = 2: p(x = 2) = .346. Now we will calculate the same probability using the normal approximation. The discrete value of 2 falls between the boundaries of 1.5 and 2.5 for a continuous distribution. We will calculate the z scores corresponding to these values and determine the area (i.e., probability) between those values. First, we need μ and σ.

    μ = np = 5 x .4 = 2.0
    σ = sqrt(npq) = sqrt(5 x .4 x .6) = 1.095

Now for the calculation of our z scores:

    z(1.5) = (1.5 - 2)/1.095 = -.46
    z(2.5) = (2.5 - 2)/1.095 = +.46

The area between -.46 and +.46 in a normal distribution is .1772 + .1772 = .3544. This estimated value is actually quite close to the exact probability of .346.

Suppose that 100 students write a test with 40 questions that can each be either correct or incorrect. Let us estimate the number of students who will answer 30 or more questions correctly if the probability that each item is answered correctly is thought to be .6. Again we need μ and σ.

    μ = np = 40 x .6 = 24.0
    σ = sqrt(npq) = sqrt(40 x .6 x .4) = 3.098

The exact boundary between 29 and 30 is 29.5, so we calculate the z score for 29.5.

    z = (29.5 - 24.0)/3.098 = 1.78

From Appendix A, the area above a z score of 1.78 is .0375; this is the probability of a score of 30 or better. Therefore, we would expect only 3.75 students (approximately 4 students) out of 100 to get a score of 30 or better if p = .6. If many more than 4 students answered 30 or more questions correctly, then we might conclude that p was not equal to .6 for this class.

One must be cautious in using the normal approximation because the normal distribution might not be a good approximation for the set of data or experimental situation being examined. With a reasonable n and a value of p close to .5, however, the normal approximation appears to work quite well.
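The mean and standard deviation formulas and the normal approximation can be tried out together in a short Python sketch. This is an illustration only, not part of the original handout. The function names are hypothetical, and the normal curve area is computed from math.erf instead of being read from a z table, so z is not rounded to two decimals.

```python
from math import comb, erf, sqrt


def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def mean_sd_from_definition(n, p):
    """mu = sum of x*p(x); sigma = sqrt of the sum of (x - mu)^2 * p(x)."""
    pmf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]
    mu = sum(x * px for x, px in enumerate(pmf))
    var = sum((x - mu)**2 * px for x, px in enumerate(pmf))
    return mu, sqrt(var)


def normal_approx(n, p, lo, hi):
    """Approximate p(lo <= x <= hi) using the boundaries lo - .5 and hi + .5."""
    mu = n * p                      # shortcut formula for the mean
    sigma = sqrt(n * p * (1 - p))   # shortcut formula for the standard deviation
    z_lo = (lo - 0.5 - mu) / sigma
    z_hi = (hi + 0.5 - mu) / sigma
    return normal_cdf(z_hi) - normal_cdf(z_lo)


if __name__ == "__main__":
    print(mean_sd_from_definition(4, 0.8))   # should match np and sqrt(npq)
    print(normal_approx(5, 0.4, 2, 2))       # p(x = 2) for n = 5, p = .4
    print(normal_approx(40, 0.6, 30, 40))    # p(x >= 30) for n = 40, p = .6
```

Because z is not rounded, the two approximations come out at roughly .352 and .038, slightly different from the table-based values of .3544 and .0375 in the text.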

Sample Problems

1. Given a particular genetic disorder, there is a one out of four chance that a couple with certain genes will give birth to a child with the condition. A family has three children, all with the condition. What is the probability of this outcome?

    p = .25, n = 3, x = 3, p(x = 3) = .0156 (Table B)

2. Calculate the exact probability reported in 1.

    p(x = 3) = 3!/[3!(3-3)!] x .25^3 x .75^(3-3) = .0156

3. What is the probability in a large family of 24 children that from 4 to 6 (inclusive) children have the disorder?

    μ = 24 x .25 = 6.00, σ = sqrt(24 x .25 x .75) = 2.1213
    z4 = (3.5 - 6.0)/2.1213 = -1.18
    z6 = (6.5 - 6.0)/2.1213 = +.24
    p = .3810 + .0948 = .4758
    Exact probability = .4923 (calculated using SPSS)
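The exact probability quoted for the family of 24 children comes from adding the binomial probabilities for x = 4, 5, and 6. A minimal Python sketch of that sum is given here as one way to reproduce such a value without SPSS; the function name exact_range_prob is a hypothetical choice, not something from the handout.

```python
from math import comb


def exact_range_prob(n, p, lo, hi):
    """Exact p(lo <= x <= hi): add the binomial probabilities for x = lo, ..., hi."""
    q = 1 - p
    return sum(comb(n, x) * p**x * q**(n - x) for x in range(lo, hi + 1))


if __name__ == "__main__":
    print(f"p(x = 3), n = 3, p = .25:        {exact_range_prob(3, 0.25, 3, 3):.4f}")
    print(f"p(4 <= x <= 6), n = 24, p = .25: {exact_range_prob(24, 0.25, 4, 6):.4f}")
```

Setting the second result beside the normal approximation of .4758 shows how large the approximation error is when p is well away from .5.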

SUPPLEMENTARY BINOMIAL PROBLEMS

1. If the probability that a used-car salesperson makes a sale on any given day is .12,
   (a) What is the probability of making a sale on each of three consecutive days?
   (b) What is the probability that she makes a sale on exactly 2 of 5 randomly chosen days?
   (c) What is the probability that she sells more than 40 cars in 2001?

2. If the probability that a student at U of W is female is .65,
   (a) What is the probability that there are all females in a seminar class of 10 students?
   (b) Show how to calculate the probability of exactly 4 females.
   (c) What is the probability that fewer than 4 of the 10 students in the seminar class are female?
   (d) What is the probability that the graduating class of 250 students contains between 170 and 180 females?

3. Joan estimates that she will miss her school bus if she sleeps in, which happens 12% of the time, or if the bus is early, which happens 3% of the time.
   (a) Do Joan's estimates seem correct given that Joan missed her bus every day this week?
   (b) What is the maximum number of school days out of 60 that you would expect Joan to catch her bus with a probability of .90?

4. George attended none of his statistics classes, read none of the text, and had no prior knowledge of statistics.
   (a) What is the probability that George will get only 3 of 15 questions correct on a multiple-choice test with 5 alternatives per question?
   (b) Calculate the probability in (a).
   (c) What is the probability on a similar test with 50 questions that George gets fewer than 8 questions correct?

5. Out of 16 people who participated in a 6-week treatment for compulsive eating, 12 people improved from pre-test to post-test. What is the probability that 12 or more people improved:
   (a) If the treatment was ineffective and half of untreated people tend to improve over any 6-week period?
   (b) If the treatment was ineffective, but any placebo treatment tends to produce improvement in 63% of people (Note: use an approximation for b)?

6. Given a particular genetic disorder, there is a one out of four chance that a couple with certain genes will give birth to a child with the condition.
   (a) A family has three children, all with the condition. What is the probability of this outcome?
   (b) Calculate the exact probability reported in (a).
   (c) What is the probability in a large family of 24 children that from 4 to 6 (inclusive) children have the disorder?

SOLUTIONS FOR SUPPLEMENTARY BINOMIAL PROBLEMS

1. (a) 3!/[3!(3-3)!] x .12^3 x .88^(3-3) = 1 x .001728 = .001728
   (b) 5!/[2!(5-2)!] x .12^2 x .88^(5-2) = 10 x .009813 = .09813
   (c) μ = .12 x 365 = 43.8, σ = sqrt(365 x .12 x .88) = 6.2084. More than 40 means 41 or more, so the boundary is 40.5: z = (40.5 - 43.8)/6.2084 = -.53, p = .50 + .2019 = .7019

2. (a) from Table B for N = 10 and q = .35 (m = number of males), p(m = 0) = .0135
   (b) 10!/[4!(10-4)!] x .65^4 x .35^(10-4) = 210 x .000328 = .0689
   (c) from Table B for N = 10 and q = .35, p = .0000 + .0005 + .0043 + .0212 = .026
   (d) μ = .65 x 250 = 162.5, σ = sqrt(250 x .65 x .35) = 7.5416, z1 = (169.5 - 162.5)/7.5416 = .93, z2 = (180.5 - 162.5)/7.5416 = 2.39, p = .4916 - .3238 = .1678 (equivalently, .1762 - .0084 using the tail areas)

3. (a) p = .12 + .03 - .12 x .03 = .1464, p(5) = 5!/[5!(5-5)!] x .1464^5 x .8536^(5-5) = .000067, which is unlikely if the p's are .12 and .03
   (b) μ = .8536 x 60 = 51.216, σ = sqrt(60 x .8536 x .1464) = 2.7383, z(.90) = 1.28, X(.90) = 51.216 + 1.28 x 2.7383 = 54.721, p(x <= 54) = .90

4. (a) From Table B for N = 15 and p = 1/5 = .20, p(3) = .2501
   (b) 15!/[3!(15-3)!] x .20^3 x .80^(15-3) = 455 x .0005498 = .2501
   (c) μ = .20 x 50 = 10.0, σ = sqrt(50 x .20 x .80) = 2.8284, z = (7.5 - 10.0)/2.8284 = -.88, p = .1894

5. (a) From Table B for N = 16 and p = .50, p(x >= 12) = .0278 + .0085 + .0018 + .0002 + .0000 = .0383
   (b) μ = .63 x 16 = 10.08, σ = sqrt(16 x .63 x .37) = 1.9312, z = (11.5 - 10.08)/1.9312 = .7352 (rounded to .74), p = .2296

6. (a) p = .25, n = 3, x = 3, p(x = 3) = .0156 (Table B)
   (b) p(x = 3) = 3!/[3!(3-3)!] x .25^3 x .75^(3-3) = .0156
   (c) μ = 24 x .25 = 6.00, σ = sqrt(24 x .25 x .75) = 2.1213, z4 = (3.5 - 6)/2.1213 = -1.18, z6 = (6.5 - 6)/2.1213 = +.24, p = .3810 + .0948 = .4758. Exact probability = .4923
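Tail probabilities such as the one in problem 5, p(x >= 12), can be checked by summing the binomial probabilities from the cutoff up to n. The Python sketch below is an illustration, not part of the original solutions, and the function name upper_tail is hypothetical. It uses exact fractions so that no rounding enters the sum.

```python
from fractions import Fraction
from math import comb


def upper_tail(n, p, k):
    """Exact p(x >= k) for n trials; pass p as a Fraction to avoid rounding."""
    q = 1 - p
    return sum(comb(n, x) * p**x * q**(n - x) for x in range(k, n + 1))


if __name__ == "__main__":
    tail_a = upper_tail(16, Fraction(1, 2), 12)       # problem 5(a), p = .50
    tail_b = upper_tail(16, Fraction(63, 100), 12)    # problem 5(b), exact instead of approximate
    print(tail_a, float(tail_a))
    print(float(tail_b))
```

For 5(a) the sum is 2517/65536, about .0384. The value printed for 5(b) can be set beside the normal approximation of .2296 to judge how well the approximation works with n = 16.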