STAB22 section 5.2 and Chapter 5 exercises

Abner Parrish
6 years ago

250 seniors were questioned, so n = 250. p̂ is the fraction in your sample that were successes (said that they had taken a statistics course): 45% or 0.45. The number of successes in your sample must have been 45% of 250, which is 112.5, and that ought to be your value of X. However, X has to be a whole number, so X has to be 112 or 113. (If you check, you'll see that these round off to 45% if you take 2 decimals in your division, but 111 and 114 don't.)

5.34 According to the genetic theory, each child inherits genes from its parents independently of other children. So we have n = 4 trials (children), each with probability 0.25 of having type O blood, so the number of children who actually do end up having type O blood has a binomial distribution with n = 4 and p = 0.25. (You say that something has a particular distribution before you observe any data; once you have the data, you have a value like "2 children with type O blood", not a distribution.)

5.35 Each coin toss is independent, with the same probability 0.5 of getting a head each time. So the number of heads in 15 tosses has a binomial distribution with n = 15 and p = 0.5. When you actually do it, there's no knowing what number of heads you'll get, but the values near the middle (7.5) are more likely: you'd expect to get about half heads and half tails. So 8 heads is more likely than 14, though both are possible.

To solve (a), look at Table C, page T-6. Find the rows of the table with n = 5 (the fourth block down), and the column with p = 0.4. The numbers in that column give the probabilities of observing each possible value k in a binomial distribution with that n and that p. Thus for n = 5 and p = 0.4, the probabilities of observing 0, 1, 2, 3, 4 and 5 successes are 0.0778, 0.2592, 0.3456, 0.2304, 0.0768, and 0.0102 respectively. P(X = 0) you just read off the table as 0.0778, and for P(X ≥ 3) you add up the ones you want: P(X ≥ 3) = P(X = 3) + P(X = 4) + P(X = 5) = 0.2304 + 0.0768 + 0.0102 = 0.3174.

If you are going to use Table C for (b), you'll need to arrange things so that the success probability is 0.5 or less. What you do is to interchange successes and failures: subtract p from 1 to get 1 - 0.6 = 0.4, and subtract the numbers of successes from n, so that 5 successes in the question becomes 5 - 5 = 0 successes in your calculation, and X ≤ 2 becomes X ≥ 5 - 2 = 3. (Note that the ≤ becomes ≥.) Since the values for n, p, k are now the same as in (a), the answers will be the same too.
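
If you would rather compute the Table C column than look it up, here is a rough sketch in Python. This is my own illustration, not something the textbook or StatCrunch uses, and the function name binom_pmf is made up. It builds the n = 5, p = 0.4 column and then checks the success/failure swap used for part (b).

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for a binomial count with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 5, 0.4
column = [binom_pmf(k, n, p) for k in range(n + 1)]   # the Table C column
for k, prob in enumerate(column):
    print(k, round(prob, 4))

p_zero = column[0]                 # P(X = 0)
p_at_least_3 = sum(column[3:])     # P(X >= 3) = P(3) + P(4) + P(5)

# Part (b): p = 0.6 is off the table, so Table C makes you count failures.
# Computing directly with p = 0.6 should give the same two answers.
p_five_b = binom_pmf(5, 5, 0.6)                               # P(X = 5)
p_at_most_2_b = sum(binom_pmf(k, 5, 0.6) for k in range(3))   # P(X <= 2)
print(p_zero, p_five_b)
print(p_at_least_3, p_at_most_2_b)
```

The last two lines print matching pairs, which is the "whatever is not a success is a failure" argument in numerical form.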

The thought process in doing part (b) by Table C explains why the answers are the same: whatever is not a success is a failure, and you can count either. You can use software instead of Table C. In StatCrunch, select Stat, Calculators and Binomial; fill in n = 5 and p = 0.4 (for (a)), pick = from the pull-down menu next to Prob, and put 0 next to that. Click Compute. The answer, 0.07776, appears in the grey box, and is illustrated in red on the graph above. For the second part of (a), change = to =>, and change 0 to 3. Part (b) you can do directly; just put the numbers in the boxes without changing anything. StatCrunch can handle large values of n too, so it doesn't need to use the normal approximation to the binomial. This gives a way of seeing how good the normal approximation to the binomial is. (By the way, I can't see a good way of doing these by calculator unless you use the formula at the bottom of page 329, which we won't cover.)

5.38 Here n = 100 and p = 0.5 (fair coin). Use the formulas in the box at the top of page 322 to find that μ_p̂ = p = 0.5 and σ_p̂ = √((0.5)(0.5)/100) = 0.05: that is, the fraction of heads you'll get will almost certainly be close to 0.5 (50%), because the SD of p̂ is small. This is not the same as the mean and SD of the count of the number of heads, because the proportion of heads will be about 0.50 (regardless of the number of times you toss the coin), whereas the number of heads will be about half the number of tosses: if you toss the coin 100 times, you'd expect about 50 heads. If you want the mean and SD of the count, use the formulas in the middle of page 320: the count of the number of heads has mean np = (100)(0.5) = 50 and SD √((100)(0.5)(0.5)) = 5. This says that the number of heads should be relatively close to 50, which is what you'd expect.

The first thing we need when using a normal approximation is the mean and SD of the thing we're normal-approximating, here the proportion of heads in 100 tosses.
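
The two sets of formulas from 5.38 are easy to mix up, so here is a small sketch that puts them side by side. It is written in Python as an illustration only; the function names are mine, and the sample sizes 400 and 1600 are extra values I picked to show the trend, not part of the exercise.

```python
from math import sqrt

def count_mean_sd(n, p):
    """Mean and SD of the binomial count X."""
    return n * p, sqrt(n * p * (1 - p))

def proportion_mean_sd(n, p):
    """Mean and SD of the sample proportion p-hat = X / n."""
    return p, sqrt(p * (1 - p) / n)

# n = 100 is the exercise; 400 and 1600 are hypothetical larger samples.
for n in (100, 400, 1600):
    print(n, count_mean_sd(n, 0.5), proportion_mean_sd(n, 0.5))
```

Each time n is multiplied by 4, the SD of the count doubles and the SD of the proportion halves, while the mean of the proportion stays at p.
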
We can use the answers from 5.38 for this: mean 0.5, SD 0.05. Then turn the given values into z-scores and use Table A. 0.3 has z-score (0.3 - 0.5)/0.05 = -4, and 0.7 has z-score (0.7 - 0.5)/0.05 = 4. The chance of being between these is basically 1. For (b), follow the same steps: 0.35 has z-score (0.35 - 0.5)/0.05 = -3 and 0.65 has z-score (0.65 - 0.5)/0.05 = 3, so the chance of ending up between these is 0.9987 - 0.0013 = 0.9974. Or use the 68-95-99.7 rule to get about 99.7%, and you don't have to use the tables at all.

Since we know we're tossing the coin 100 times, the question could also have asked "find the probability that the number of heads is between 30 and 70, between 35 and 65", which we would have expected to give the same answers. To do it this way, we use the mean and SD of the count of heads, 50 and 5, as I found at the end of 5.38. Using these figures with the proper mean and SD gives the same z-scores as working with the proportions, and so the same answers.

If you work with the counts of the numbers of successes, you can get the answer from StatCrunch. Put in n = 100 and p = 0.5, and find the probability of being strictly less than 35 (< 35), which is about 0.0009, and the chance of being less than or equal to 65, which is about 0.9991. Then subtract them to get about 0.9982. (This is the probability of being between 35 and 65 inclusive, and is a little bigger than what we got from the normal approximation. Since the binomial is a discrete distribution, it matters whether you include or exclude values, but this is nearly a continuous distribution, so it doesn't matter much. Using a continuity correction to the normal approximation (page 329, which we also don't do) makes that a bit more accurate: for between 35 and 65 inclusive that gives about 0.9981, closer to the exact figure from the binomial.)

(a) A properly tossed fair coin has no memory, so individual tosses are independent: what happened in the past (three consecutive heads) has no influence over what's going to happen this time. "Tails are due" is a fallacy. (b) has the same reasoning: the probability is still exactly 0.5. (c) p̂ is the sample proportion: you perform the experiment and see how many successes you get, so that p̂ is a number that you know afterwards.
This is unlike p, one of the parameters of the binomial distribution, the probability of success, which (in this chapter) would be known before you toss any coins, roll any dice, etc.

(a) X is the number of successes, a number like 19, and not a proportion at all (which would be a number like 19/50 = 0.38). (b) is wrong two ways: the given quantity is an SD not a variance (because of the square root), and it is the SD of the proportion and not the count. (c) The accuracy of the normal approximation depends on p as well as n: if p is very close to 0 or 1, even a very large n might not be large enough. (If p is 0.5, even a small n like n = 20 would do. Try this n and p, and also n = 10000 with a p very close to 0, in the rule of thumb in the box on page 323.) Also, if you have StatCrunch handy, type these two pairs of values into the n and p boxes, and put something (doesn't matter what) into the X box. Take a look at the picture: for n = 20 and p = 0.5, it looks pretty normal, apart from being discrete, and for n = 10000 with the tiny p it looks both discrete and skewed right. In the latter case, any number of successes is possible, but the only ones with any appreciable probability are 0 to 4. (n = 10000, p = 0.2, on the other hand, looks pretty continuous and very normal.)

5.43 (a) If the poll is a simple random sample, this one will be OK, with n = 200 and p being some reasonable value for the probability of a randomly chosen student being usually irritable in the morning (close to 1 for both me and my daughter!). (b) is no good because the number of trials (tosses) is not fixed: every time you do this experiment, you'll need a different number of tosses. (c) is OK, again because it is a random sample, with n = 500 and p = 1/12. (d) fails because once you know that one of the 10 cards you dealt is black, you (slightly) decrease the chance that the next one will be. (To a small degree here, once you see a black card, a red card is "due".) In other words, whether or not each card is black depends on what you've seen before: the trials (cards) are not independent.

(a) There is no notion of "success" here. If a count were made of the number of students with mean systolic blood pressure greater (or less) than some target value, that count could be binomial. (b) looks OK (random sampling), with a fixed sample size (20) and a clear definition of success (defect) and failure (no defect). (c) also looks OK, for the same reason: a student will either report that they eat the required amount of fruits and vegetables (success), or report that they don't (failure).

The number of errors caught will have a binomial distribution with n = 10 and p = 0.7. The number of errors missed also has a binomial distribution, with n = 10 and p = 0.3, interchanging successes and failures.
(You wouldn't tell the student proofreader that there were 10 errors, because he or she might keep trying until finding all 10, but from your point of view there are 10 opportunities to catch an error, and each time the student may or may not succeed. Some errors might be easier to catch than others, which would make the probability of success at each trial unequal, but we're not worrying about that here.) In (b), we're counting the number of errors missed, so p = 1 - 0.7 = 0.3. Table C (starting at page T-6 in the back of the textbook) has binomial probabilities; the second table on page T-9, with n = 10 and values of p from 0.10 to 0.50, is the one you need. Look in the p = 0.30 column, and add up the probabilities from 4 on (to get "4 or more errors missed"): this is 0.2001 + 0.1029 + 0.0368 + 0.0090 + 0.0014 + 0.0001 = 0.3503. This is quite high, because 10 errors isn't very many, and it's quite likely to have this poor a performance by chance.

Notice that Table C doesn't have any probabilities for p > 0.5. This is because you can always rephrase a problem to use a p less than 0.5. Another way to ask the question in (b) is: "how likely is it that the proofreader will catch 6 or fewer errors of the 10?". The connection is: interchange successes and failures (6 errors caught is 10 - 6 = 4 errors missed), and replace p (here 0.7) with 1 - p (1 - 0.7 = 0.3). Either you'll have a p you can use directly, or you can get one by this recipe. In StatCrunch, use Stat, Calculators, Binomial, and fill in n = 10, p = 0.30, X = 4 and => (since we're talking about errors missed) to get 0.3504. This is more accurate than using Table C, where the last digit was off by one.

The number of listeners has a binomial distribution with n = 20 and p = 0.3, approximately. Consult Table C, with n = 20, and use the p = 0.3 column. Take the probabilities for k = 8 onwards, and add them up. This gives 0.1144 + 0.0654 + 0.0308 + 0.0120 + 0.0039 + 0.0010 + 0.0002 = 0.2277. Or use StatCrunch: n = 20, p = 0.3, X = 8 and =>, to get 0.2277. Isn't that so much easier than wading through Table C?

5.47 The mean is np. For the number of errors caught, this is (10)(0.7) = 7, and for the number of errors missed the mean is (10)(0.3) = 3. (These add up to 10, as they should.) The SD of the number of errors caught is √(np(1 - p)) = √((10)(0.7)(0.3)) = 1.45. (The SD of the number of errors missed is the same, because the formula has the same numbers multiplied together in a different order.) If p goes up to 0.9, the SD becomes √((10)(0.9)(0.1)) = 0.95, which is smaller. If p goes up to 0.99, the SD decreases further to 0.31. If the probability of success gets closer and closer to 1, the proofreader will make fewer and fewer mistakes, so the number of errors caught will get (almost certainly) closer and closer to 10.
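
To see the SD shrinking as p heads towards 1, here is a short sketch in Python (my choice of tool for illustration; sd_count is a made-up helper name). The value p = 0.999 is an extra one I added to continue the pattern and is not in the exercise.

```python
from math import sqrt

def sd_count(n, p):
    """SD of a binomial count: sqrt(n p (1 - p))."""
    return sqrt(n * p * (1 - p))

n = 10  # errors in the essay
for p in (0.7, 0.9, 0.99, 0.999):   # 0.999 is hypothetical, added for the trend
    caught_mean = n * p
    # The SD is the same for errors caught (p) and errors missed (1 - p).
    print(p, round(caught_mean, 2), round(sd_count(n, p), 2), round(sd_count(n, 1 - p), 2))
```
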
The spread will decrease to nothing, so the SD should (and does) approach zero.

The mean of the count is np = (20)(0.25) = 5. The mean of the proportion p̂ is np/n = p = 0.25 no matter what n is. When n = 200, the mean count of listeners is np = (200)(0.25) = 50, and the mean proportion of listeners is p = 0.25. When n = 2000, the mean count of listeners is np = (2000)(0.25) = 500, and the mean proportion of listeners is still p = 0.25. The mean count of listeners goes up as n goes up, but the mean proportion of listeners is constant at 0.25. (If you work out the standard deviations using the formulas on pages 320 and 322, you'll find that the ones for the counts go up, and the ones for the proportions go down. With a larger n, that is, a larger sample, the sample proportion becomes more predictably close to 0.25.)

5.51 For "0" in the question, read any particular digit, such as a 5. The number of 5's in a group of 5 digits has a binomial distribution with n = 5 and p = 0.1. So go into Table C (page T-7). The probability of at least one five is one minus the probability of no five; this is 1 - 0.5905 = 0.4095. Or you can take the probabilities of 1 through 5 and add them up, which gives the same answer apart from rounding in the table. The other way to do this is to pretend you're still in Chapter 4. Each digit has probability 1 - 0.1 = 0.9 of not being a 5, so the probability that all 5 digits fail to be a 5 is 0.9^5 = 0.59049, and therefore the chance that at least 1 is a 5 is 1 - 0.59049 = 0.40951. (The strategy here is that "at least one" is "not none", so you work out the probability of none first, which is usually easier, and then subtract it from 1.)

In lines 40 digits long, n = 40, so the mean number of fives is (40)(0.1) = 4. About one-tenth of all digits are fives, so in a line of 40 digits, about 4 of them will be fives. This doesn't mean that exactly 4 will be fives; sometimes you will get 4, but usually you'll get a few more or fewer. If you use Stat, Calculators, Binomial in StatCrunch, enter n = 40 and p = 0.1 and any old thing in the other boxes, then when you click Compute, you'll get a picture of that binomial distribution. Anything from 0 to 40 is possible, but only numbers of fives from 1 to about 8 are at all likely.
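
The "at least one is not none" strategy, and the claim about which counts are likely in a 40-digit line, can both be checked with a few lines of Python. This is a sketch for illustration only; the variable names are mine.

```python
from math import comb

p_five = 0.1

# Group of 5 digits: P(at least one 5) = 1 - P(no 5 at all).
n = 5
p_none = (1 - p_five) ** n
p_at_least_one = 1 - p_none

# The long way: add up P(X = 1) through P(X = 5).
long_way = sum(comb(n, k) * p_five**k * (1 - p_five)**(n - k) for k in range(1, n + 1))
print(p_at_least_one, long_way)

# Line of 40 digits: how much probability sits on 1 to 8 fives?
line = 40
pmf40 = [comb(line, k) * p_five**k * (1 - p_five)**(line - k) for k in range(line + 1)]
mean_fives = sum(k * prob for k, prob in enumerate(pmf40))
p_1_to_8 = sum(pmf40[1:9])
print(mean_fives, p_1_to_8)
```
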
Also note that the shape is a little bit skewed to the right; the rule of thumb says that a normal approximation wouldn't be good enough here (np = 4, less than 10).

n = 4 and p = 0.25 (one quarter). In StatCrunch, enter the values 0 to 4 into a blank column, and copy the five entries from Table C for n = 4, p = 0.25 (page T-7), starting with 0.3164, into another empty column. Give the columns meaningful names (I used "Type A" and "Probability"). Select Graphics, Bar Plot and With Summary. Select Type A for Categories and Probability for Counts (even though they are not really counts). Click Next twice. For Y axis label enter Probability. Then you can Create Graph, with the result shown in Figure 1.

Figure 1: Probability histogram for blood type data

The mean is np = (4)(0.25) = 1, which goes right under the 1 bar on the histogram; this bar happens to be the tallest.

The number of children here is a bit of a distraction. What's really happening is that each person surveyed is saying "I agree" or "I disagree" to the statement "The ideal number of children for a family to have is 2". The agrees and disagrees are where the binomial thing comes from, with n = 1007 and p = 0.52. n is large and p is near 0.5, so use the normal approximation. The mean (for the sample proportion) is μ_p̂ = 0.52 and the SD is σ_p̂ = √((0.52)(0.48)/1007) = 0.0157. A sample proportion of 0.49 gives a z of (0.49 - 0.52)/0.0157 = -1.91, and 0.55 gives z = 1.91. Table A gives the chance of being between these values as 0.9719 - 0.0281 = 0.9438.

To use StatCrunch, convert to counts: 1007(0.49) = 493 successes, and 1007(0.55) = 554 successes. In a binomial distribution with n = 1007 and p = 0.52, using Stat, Calculators, Binomial, the exact chance of a number of successes between these values, inclusive, comes out very close to the normal approximation. You'll have to figure the probabilities of < 493 and ≤ 554 to get exactly the right thing, but it won't make much difference if you do ≤ 493 for the first one. I suspect that the difference between the normal approximation and the binomial would be even smaller if you used a continuity correction in the normal approximation.

The chance of getting a sample proportion here between 0.49 and 0.55, that is, within 0.03 of the true value 0.52 of p, is high, about 95%. But it is not a certainty. Some of the possible samples you could draw will have a sample proportion less than 0.49 or bigger than 0.55 (that is, outside the poll's stated margin of error of 0.03). We will see in Section 6.1 that it is impossible to be certain, because anything could happen in a sample, so what we do is to offer something like "a margin of error that is correct 19 times out of 20", that is, it has probability 0.95 of being correct, over all the possible samples that we could take. If we wanted to have a higher chance of being correct, say 99%, we'd have to accept a larger margin of error.

The calculations here are like those in 5.54: use the normal approximation to the binomial. Here, n is large but p is not that close to 0.50, so we should check the rules of thumb. For p = 0.04, np = 1027(0.04) = 41.08 and n(1 - p) = 1027(0.96) = 985.92, both of which are safely greater than 10. The more extreme case is OK, so the case with p = 0.24 must be OK as well. (If you want to check, np = 1027(0.24) = 246.48 and n(1 - p) = 1027(0.76) = 780.52, so this is OK too.) Though the normal approximation is acceptable in both cases, we'd expect it to be better when p = 0.24 and worse when p = 0.04. (We'll check this later.)

When n = 1027 and p = 0.24, the mean and SD (for p̂) are 0.24 and √((0.24)(0.76)/1027) = 0.0133. The z-values for 0.22 and 0.26 are ±0.02/0.0133 = ±1.50, so the probability of being between is 0.9332 - 0.0668 = 0.8664. If p = 0.04, the mean and SD are 0.04 and √((0.04)(0.96)/1027) = 0.00611. The z-values for 0.02 and 0.06 are ±0.02/0.00611 = ±3.27, so the probability of being between is 0.9995 - 0.0005 = 0.9990. The probability of being within 0.02 of p appears to be getting larger as p gets smaller. As p gets closer to 0, the SD of p̂ is getting closer to 0 as well, which means that p̂ is almost certainly close to p, and the chance of being within anything will get closer to 1.

Let's check how good those normal approximations are.
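
If you don't have StatCrunch handy, the same check can be sketched in Python. This is my own illustration, not the method used for the numbers in these notes, and the function names are made up. The binomial probabilities are computed on the log scale (via lgamma) because with n = 1027 the binomial coefficients get enormous; the count ranges 226 to 267 and 21 to 62 are 1027 times the proportion limits, rounded.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist

def binom_pmf(k, n, p):
    """Binomial probability via logs, so a large n cannot overflow."""
    log_coef = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coef + k * log(p) + (n - k) * log(1 - p))

def exact_between(lo, hi, n, p):
    """P(lo <= X <= hi), inclusive, for a binomial count X."""
    return sum(binom_pmf(k, n, p) for k in range(lo, hi + 1))

def normal_within(margin, n, p):
    """Normal approximation to P(p-hat is within margin of p)."""
    z = margin / sqrt(p * (1 - p) / n)
    return NormalDist().cdf(z) - NormalDist().cdf(-z)

n = 1027
for p, lo, hi in ((0.24, 226, 267), (0.04, 21, 62)):
    approx = normal_within(0.02, n, p)
    exact = exact_between(lo, hi, n, p)
    print(p, round(approx, 4), round(exact, 4))
```

Because the normal figures here use unrounded z-values rather than Table A, they can differ from the hand calculation in the last decimal place.
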
In the first case, we want to be between 1027(0.22) = 226 and 1027(0.26) = 267 successes with n = 1027 and p = 0.24, and StatCrunch gives us the exact probability (in the same manner as 5.54). In the second case, we want to be between 1027(0.02) = 21 and 1027(0.06) = 62 successes, and StatCrunch again gives the exact probability. The second approximation is actually better than the first, but luckily so: both the upper- and lower-end probabilities in the second case are, relatively speaking, quite a bit off, but they happen to be off in the same direction, so when you subtract them, you come out close to the right answer.

The calculations here are the same idea again: use the normal approximation to the binomial and the mean and SD of p̂ to get z-values and probabilities for the various values of n. Since you have to do the calculations several times over, you can use a spreadsheet to do the repetitive calculations for you. Mine is shown in Figure 2. (Actually, I used another statistical package called R for this, instead of a spreadsheet. But a spreadsheet would come out looking the same.) If you can follow the calculations in 5.54, you'll be able to see what I'm doing here.

Figure 2: Spreadsheet for calculations of 5.56 (columns: n, p, sd, z1, z2, prob)

The probability of getting a sample proportion within 0.03 of the true p appears to be (and is) heading towards 1. That is, with a larger sample, the sample proportion is more likely to be close to the population proportion. If you were able to take an infinitely large sample, you would be certain to get p̂ = p. In real life, though, you'll have to accept that your sample proportion won't be exactly equal to the population proportion. You might be concerned about how good the normal approximation to the binomial is, but here p is very close to 0.5 and even the smallest n passes the rule of thumb easily, so there's no issue about that here.

The sample proportion is 140/200 = 0.7 or 70%. You can use a normal approximation to find the chance that 140 or more students in a sample of 200 would support the crackdown, if p = 0.67. Or you can pull the answer out of StatCrunch: the chance of 140 or more is about 0.20, using n = 200, p = 0.67. A normal approximation gives z = 0.90 and a probability of 1 - 0.8159 = 0.1841, which is not bad. The upshot of this calculation is that if the proportion of students favouring the crackdown is really 0.67, it is quite likely (the probability is about 0.20) that you will get as many as 140 = 70% in favour in your sample, just by chance.
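
Here is a rough Python sketch of that tail probability, exact and by normal approximation (no continuity correction). It is an illustration only: the function names are mine, and the sample sizes 400 and 800 are hypothetical, added to show what happens to the chance of seeing 70% in favour when the sample gets bigger and p stays at 0.67.

```python
from math import comb, sqrt
from statistics import NormalDist

def exact_upper_tail(x, n, p):
    """Exact P(X >= x) for a binomial count X."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x, n + 1))

def normal_upper_tail(x, n, p):
    """Normal approximation to P(X >= x), without continuity correction."""
    z = (x - n * p) / sqrt(n * p * (1 - p))
    return 1 - NormalDist().cdf(z)

p = 0.67
for n in (200, 400, 800):      # 400 and 800 are hypothetical larger samples
    x = round(0.70 * n)        # 70% of the sample in favour
    print(n, x, round(exact_upper_tail(x, n, p), 4), round(normal_upper_tail(x, n, p), 4))
```
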
So this, by itself, is not evidence that the proportion of students in favour at your college is higher than 0.67. (Your letter needs to make this point: because of random sampling, the result that was observed could easily have happened by chance.) If you really wanted evidence that your college was different, you would have to either (a) get a sample proportion quite a bit bigger, or (b) get the same result (70%) as here with a bigger sample. With a bigger sample, a sample proportion as high as 0.70 becomes progressively less likely if p = 0.67, and so with a bigger sample you would be more entitled to conclude that p is not equal to 0.67 after all. (This is the logic of a test of significance, which we'll see a lot more of in Section 6.2.)

(a) There are four shapes, of which the subject guesses one, so p = 1/4. (b) The number of shapes guessed correctly out of 20 has a binomial distribution with n = 20 and p = 1/4 = 0.25, so from Table C (page T-10), the probability of 10 or more correct guesses is 0.0099 + 0.0030 + 0.0008 + 0.0002 = 0.0139. (c) This is just the mean and SD of the binomial distribution here, i.e. the mean is np = 20(0.25) = 5, the variance is np(1 - p) = 20(0.25)(0.75) = 3.75, and the SD is √3.75 = 1.94 guesses. (d) Knowing that the deck has exactly 5 of each card might change things: for instance, if the subject hasn't seen a star in the first 10 cards, he/she knows that 5 of the last 15 cards are stars and may start guessing stars, with a higher chance of being correct (that is, the chance of guessing a card correctly isn't constant all the way through, and therefore a binomial distribution is no good). This is the same strategy as counting face cards if you are playing blackjack; counting cards is a winning strategy in a casino if you are discreet enough not to get thrown out! (In rather more statistical terms: independence doesn't apply any more, because knowing what has come up before will affect your future guesses: cards you haven't seen many of yet really are "due". Casinos get around this in blackjack by using as many as 6 decks of cards mixed together, and shuffling and starting over when there are quite a lot of cards left.)

5.62 The mean is np = 1200(0.75) = 900, the variance is 1200(0.75)(0.25) = 225, so the SD is √225 = 15. Here, n = 1200 is larger than Table C has, so we need to use the normal approximation. The idea is to find the mean and SD (as we just did) and pretend that the count has a normal distribution with this mean and SD.
(It actually has a binomial distribution, of course, but with a large n we can often get away with it. See the rule of thumb calculations below.) 800 is a value, so turn it into a z and then look it up in Table A, using the mean and SD you just found. This gives z = (800 - 900)/15 = -6.67. This is off the end of Table A, so the probability of less than is 0, and the probability of more than is 1. It is as good as certain that they will have at least 800 acceptances. 950 works the same way: z = (950 - 900)/15 = 3.33. From Table A, the probability of less than is 0.9996, so the probability of more than is 1 - 0.9996 = 0.0004. So the college will rarely get caught out: almost all of the time, they won't end up with more than 950 students following this strategy.

We're using the normal approximation to the binomial here because n is large (1200) and p is not too far from 0.5. The rule of thumb on page 324 says this will be OK if np ≥ 10 and n(1 - p) ≥ 10. Here np = 1200(0.75) = 900 and n(1 - p) = 1200(0.25) = 300, so we are completely safe. (You might be concerned that the binomial distribution deals with whole numbers, whereas the normal distribution deals with fractional numbers. Or you might be thinking that "at least 950" and "more than 950" are different: in the first you include 950, and in the second you don't. But the normal approximation above treats them the same way. You can get a more accurate answer using a "continuity correction": ask yourself "what decimal number would round off to the whole number I want?" In this case, "more than 950" means "bigger than 950.5", so use 950.5 instead of 950 to get z = (950.5 - 900)/15 = 3.37. With large n, this often doesn't make much difference; here the probability is a little smaller, but is the same to 4 decimals. If you are not concerned about this, don't worry; you don't need to know continuity corrections in this course.)

Change n to 1300, so the mean changes to np = 1300(0.75) = 975, the variance becomes np(1 - p) = 1300(0.75)(0.25) = 243.75 and the SD is √243.75 = 15.61. Now z = (950 - 975)/15.61 = -1.60, so the probability is 1 - 0.0548 = 0.9452. The college is now very likely to end up with too many students. Or use the continuity correction and start from 950.5, to get z = (950.5 - 975)/15.61 = -1.57 and a probability of 1 - 0.0582 = 0.9418. This time the continuity correction makes more of a difference (though still not much).

StatCrunch has no trouble figuring these binomial probabilities exactly. Select Stat, Calculators and Binomial. Enter 1200 for n and 0.75 for p. The chance of at least 800 is 1 to the accuracy shown.
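
For comparison, here is how the three versions of the "more than 950" calculation (plain normal, continuity-corrected normal, exact binomial) might be sketched in Python. This is an illustration under my own assumptions: the function names are made up, and the exact probabilities are summed on the log scale because the binomial coefficients for n = 1300 are too big for ordinary floating point.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist

def binom_pmf(k, n, p):
    """Binomial probability via logs, safe for large n."""
    log_coef = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coef + k * log(p) + (n - k) * log(1 - p))

def exact_more_than(x, n, p):
    """Exact P(X > x) for a binomial count X."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1, n + 1))

def normal_more_than(x, n, p, continuity=False):
    """Normal approximation to P(X > x); optionally start from x + 0.5."""
    cut = x + 0.5 if continuity else x
    z = (cut - n * p) / sqrt(n * p * (1 - p))
    return 1 - NormalDist().cdf(z)

for n in (1200, 1300):
    plain = normal_more_than(950, n, 0.75)
    corrected = normal_more_than(950, n, 0.75, continuity=True)
    exact = exact_more_than(950, n, 0.75)
    print(n, round(plain, 4), round(corrected, 4), round(exact, 4))
```
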
The chance of more than 950 comes out a smidgen less than the normal approximation gave us. For (d), just change n to 1300. (The continuity-corrected normal approximation is more accurate, though the uncorrected one is not bad.)

5.63 The success probability is 1/5 = 0.2, so the mean is 900(0.2) = 180, the variance is 900(0.2)(0.8) = 144, and the SD is √144 = 12. For the proportion, which you get by dividing the count by n, divide the mean and SD by n as well to get a mean of 180/900 = 0.2 and an SD of 12/900 = 0.0133. (Or you can use the formulas for p̂ in the box on page 324, to get the same answers.) For (c), use the mean and SD you got in (b) along with the normal approximation, so z = (0.24 - 0.20)/0.0133 = 3 and the probability is 1 - 0.9987 = 0.0013. You may not think that 24% is a very impressive performance, but see the discussion below.

The last part has you working backwards, from the table to a z to a proportion. The probability to be looked up backwards in the table is 1 - 0.01 = 0.99 (we want "this well or better"), which goes with z = 2.33 (the closest value). Turn this back into a value by multiplying by the SD and adding the mean, to get 0.20 + (2.33)(0.0133) = 0.231; that is, a subject must get about 23% or more successes, or 208 out of 900, to have evidence of ESP. (You might think that this is not much more than the 20% a subject could get by guessing, but with so many attempts (trials), it's very unlikely that someone could do this well by guessing alone.)

Sanity-checking: the chance of at least 24% successes is 0.0013, which is less than 0.01, the chance of at least 23% successes. In StatCrunch, without using the normal approximation, 24% successes is 216 out of 900, and you can get the exact chance of at least that many. If you play with the number of successes, 209 gives you a probability of just under 0.01, and 208 just over 0.01, which agrees with the answers above.

You can use the normal approximation to the binomial for this one; we are dealing with proportions, so be sure to use the right formulas for the mean and SD.
For Jodi, n = 100 and p = 0.88, so for the proportion she gets correct, the mean is p = 0.88, and the SD is √((0.88)(0.12)/100) = 0.0325. To get 85% or lower, her z is z = (0.85 - 0.88)/0.0325 = -0.92, giving a probability from the table of 0.1788. (b) is the same thing, but with n = 250, so the mean is 0.88, the SD is √((0.88)(0.12)/250) = 0.0206, and z = (0.85 - 0.88)/0.0206 = -1.46, so the probability is now 0.0721. With more questions, Jodi's proportion of correct answers should be closer to her p of 0.88, so the chance of her getting 85% or lower, which is less than she would expect, goes down compared with n = 100.

To cut the SD for the proportion in half, n has to be multiplied by 2² = 4 (because of the √n on the bottom of the formula). Try it by calculation or algebra if you don't believe me. You'd need to solve √((0.88)(0.12)/n) = 0.01625 for n. That means that 400 questions would be needed (but there is the little matter of how long a 400-question exam would take!). This holds true for any p, including the p = 0.75 of Laura in part (d). To do that one by calculation, figure out Laura's SD for 100 questions, divide that in half, and put it equal to √((0.75)(0.25)/n), solving for n.

For this question, the binomial distribution no longer applies because we no longer have a fixed number of trials: the number of rolls of the die is whatever it needs to be to get a 1. Thus for (a), (5/6)(1/6) = 5/36; (b) is (5/6)²(1/6) = 25/216; and for (c) the answers continue the same pattern, with each extra non-1 multiplying the probability by another 5/6. To get the first 1 on the k-th roll, you need k - 1 non-1's followed by a 1, and this has probability (5/6)^(k-1) (1/6). This distribution for the number of rolls to get the 1st 1 is called a geometric distribution; see exercise 5.69 for more.

Y could be 1, 2, 3, 4 and so on (it could be very large because you might wait a long time for the first success). Using the same ideas as in 5.68, Y = 1 if you get a success on the first trial, which happens with probability p; Y = 2 if you get a failure followed by a success, which happens with probability (1 - p)p; and Y = k in general if you get k - 1 failures followed by a success in that order (because if you get your success any earlier, Y is not equal to k any more), which has probability (1 - p)^(k-1) p. These probabilities form an infinite series, because Y could be as big as you like; all the infinite number of probabilities add up to 1, as they should (the rationale being that you can get the total as close to 1 as you like by taking enough probabilities and adding them up).

The number of randomly-sampled vehicles carrying only one person has a binomial distribution with n equal to the sample size and p = 0.755. In (a), n = 12, which is too small for a normal approximation, so the smart way is to use software.
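
"Software" here could be StatCrunch, but as an illustration here is the same calculation sketched in Python (my choice; the names are made up). It computes the exact chance of 7 or more single-occupant cars out of 12 with p = 0.755, and also the Table C style answer that swaps to the cars not carrying exactly one person and rounds 0.245 to 0.25.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for a binomial count with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 12, 0.755

# Exact: 7 or more of the 12 cars carry just one person.
p_7_or_more = sum(binom_pmf(k, n, p) for k in range(7, n + 1))

# Table C route: 5 or fewer cars are NOT single-occupant,
# using p = 0.25 as a stand-in for 1 - 0.755 = 0.245.
table_c_approx = sum(binom_pmf(k, n, 0.25) for k in range(0, 6))

print(round(p_7_or_more, 4), round(table_c_approx, 4))

# Rule of thumb for a normal approximation: both of these should be at least 10.
print(n * p, n * (1 - p))
```
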
I got a probability of about 0.95 for 7 or more out of 12 cars carrying one person. 0.755 (or even 1 - 0.755 = 0.245) is not in Table C, but 0.245 is close to 0.25, so you can get an approximate answer from there. (On an exam, if we're expecting you to use Table C, we'll give you a p that you can look up exactly.) The probability is (about) 0.25 that a car will not be carrying exactly one person, and we want the probability that 5 or fewer cars will not be carrying exactly one person. So find n = 12, p = 0.25 and add up the probabilities from 0 to 5 inclusive. This gives 0.9456, which is not far off the mark. Don't even think about a normal approximation here, since n(1 - p) ≈ 3, which is not nearly bigger than 10.

In (b), n = 80 and the rules of thumb are fine, so use the normal approximation. The mean number of single-occupant cars is 80(0.755) = 60.4, and the SD is √(80(0.755)(0.245)) = 3.85, so for 41, z = (41 - 60.4)/3.85 = -5.04, so the probability is effectively 1 that 41 or more of the 80 vehicles will have just one person aboard. (My software's exact answer is also 1 to several decimal places.) This is hardly surprising, since for a large sample, the proportion of single-occupant vehicles should be close to 0.755, and 41 out of 80 is nowhere near 0.755.

You may be observing a common structure in these questions by now: first a small n, inviting use of Table C, and then a larger n, requiring the normal approximation. In both cases, you can use StatCrunch to get exact answers, to check your work with Table C and to see how good the normal approximation was. The number of plants having red blossoms in the sample has a binomial distribution with n = 12 and p = 0.75. Table C reveals that the probability of exactly 9 plants having red blossoms is 0.2581 (since p is bigger than 0.5, the chance of a plant not having red blossoms is 1 - 0.75 = 0.25, and we want 12 - 9 = 3 plants not to have red blossoms). Your software should give you the same answer (more easily). Note that even though 9 is the mean, the chance of getting exactly 9 red blossoms is quite small. The chance of getting something close to the mean, however, is pretty large. (StatCrunch's picture of the binomial distribution with n = 12 and p = 0.75 shows this pretty clearly.)

When n = 120, the mean is np = 120(0.75) = 90. The variance (which you'll need for (c)) is np(1 - p) = 120(0.75)(0.25) = 22.5, so the SD is √22.5 = 4.74. Use the normal approximation with this mean and SD to find the probability of at least 80 red-blossomed plants: z = (80 - 90)/4.74 = -2.11, so the probability is 1 - 0.0174 = 0.9826. (The exact answer from software is very close, so the normal approximation is very good.)
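
Both halves of the red-blossom question can be checked with a short Python sketch. As before this is only an illustration with made-up names; the range 8 to 10 is my own choice of what counts as "close to the mean" of 9.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_pmf(k, n, p):
    """P(X = k) for a binomial count with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p = 0.75

# n = 12: exactly 9 red-blossomed plants, and something close to 9.
p_exactly_9 = binom_pmf(9, 12, p)
p_8_to_10 = sum(binom_pmf(k, 12, p) for k in (8, 9, 10))   # "close" is my assumption
print(round(p_exactly_9, 4), round(p_8_to_10, 4))

# n = 120: at least 80 red-blossomed plants, normal approximation vs exact.
n = 120
mean, sd = n * p, sqrt(n * p * (1 - p))
normal_at_least_80 = 1 - NormalDist(mean, sd).cdf(80)
exact_at_least_80 = sum(binom_pmf(k, n, p) for k in range(80, n + 1))
print(round(normal_at_least_80, 4), round(exact_at_least_80, 4))
```
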
5.78 The machine doesn't know how strong a particular cap is, and presumably was designed to fasten the cap successfully using a torque much less than the torque that would break the cap. (If the machine was testing the cap strength by twisting the caps until they broke, which is known as testing to destruction, it would be a different matter, but it isn't.) A cap will break if the torque, call it X, is bigger than the cap strength, call it Y. We know that X has a normal distribution with mean 7.0 and

SD 0.9, while Y has a normal distribution with mean 10.1 and SD 1.2, and we want P(X > Y). Inequalities with random variables on both sides are not fun to deal with, but we can subtract Y from both sides to get P(X − Y > 0) as the probability we want to find. X − Y is the difference of two normal random variables (taken to be independent), so it also has a normal distribution, with mean the difference of the two means, 7.0 − 10.1 = −3.1, and variance the sum of the two variances, i.e. 0.9² + 1.2² = 2.25, and therefore SD √2.25 = 1.5. Thus to find the chance that this difference is greater than zero, divide by the SD (not the variance): z = (0 − (−3.1))/1.5 = 2.07, and get the answer as 1 − 0.9808 = 0.0192 from Table A. This is a small probability, but since the capping machine will be fastening thousands of caps onto bottles, we would like it to be quite a bit smaller.

This is an experiment, so that if you were to find that the mean score in your samples was higher for early-speaking students than it was for speaking-delayed students, and the difference was statistically significant, you would have evidence that it is the early speaking that makes a difference. Of course, in real life you wouldn't know the population means (or SDs), and you would have to use the methods of Section 7.2, but in this exercise we're doing a what-if: if the populations are as given, how likely are we to find our samples coming out the wrong way? Let's call x̄ the sample mean score for the early-speaking group. x̄ has a normal distribution with mean 32 and SD 6/√25 = 1.2. Likewise, call ȳ the sample mean score for the delayed-speaking group. ȳ has a normal distribution with mean 29 and SD 5/√25 = 1. We want P(ȳ ≥ x̄) = P(ȳ − x̄ ≥ 0); this difference also has a normal distribution with mean 29 − 32 = −3 and variance 1.2² + 1² = 2.44. (This process is like 5.78.)
The chance that this difference is bigger than zero uses z = (0 − (−3))/√2.44 = 1.92; the chance of a value bigger than this is 1 − 0.9726 = 0.0274. It is unlikely that the sample mean score for delayed speaking will be at least as large as for early speaking, but it is possible that it will be.

The mean of a sum of two random variables is the sum of the two means, so (a) is a reasonable thing to do. The same thing is true of variances only if the two random variables are independent. But here, if a husband's income is high, the wife's income might be high as well (they might have met in a professional school, say), whereas a low-income husband would be more likely to have a low-income wife as well. That would mean that husbands' and wives' incomes are (positively) correlated, which would rule out independence and also rule out calculation of the variance of the sum in this way.
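The two difference-of-normals calculations above (the cap torque in 5.78 and the two sample means) follow the same recipe: subtract the means, add the variances, take the square root to get the SD, then standardize zero. A minimal sketch of that recipe, assuming independence as the text requires; the helper name `prob_difference_positive` is hypothetical, and `math.erf` stands in for Table A.

```python
import math


def normal_cdf(z):
    """P(Z <= z) for a standard normal Z, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def prob_difference_positive(mean_a, sd_a, mean_b, sd_b):
    """Return (z, P(A - B > 0)) for independent normal A and B."""
    mean_diff = mean_a - mean_b
    # Variances add for independent variables; the SD is the square root.
    sd_diff = math.sqrt(sd_a**2 + sd_b**2)
    z = (0 - mean_diff) / sd_diff
    return z, 1 - normal_cdf(z)


# 5.78: torque X ~ N(7.0, 0.9), cap strength Y ~ N(10.1, 1.2); want P(X - Y > 0).
z_caps, p_caps = prob_difference_positive(7.0, 0.9, 10.1, 1.2)

# Sample means of 25 scores each: ybar ~ N(29, 5/sqrt(25)), xbar ~ N(32, 6/sqrt(25)).
z_speech, p_speech = prob_difference_positive(
    29, 5 / math.sqrt(25), 32, 6 / math.sqrt(25)
)

print(z_caps, p_caps)
print(z_speech, p_speech)
```

The same helper would not apply to the husband-and-wife incomes, because the variance line relies on independence.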

5.83 The random variable Y is the sum of many (500) copies of random variables like X_i. So work out (the mean and variance of) what happens at one step, and then scale up to get the mean and variance of what happens at 500 steps. Finally, by the central limit theorem, Y will have an approximate normal distribution.

X_i has mean µ = 1(0.6) − 1(0.4) = 0.2 and variance (1 − 0.2)²(0.6) + (−1 − 0.2)²(0.4) = 0.96. Y is 500 independent copies of these X_i all added up, so its mean is µ_Y = 500(0.2) = 100 and its variance is 500(0.96) = 480. (Note carefully here: we are not taking a single X_i and multiplying it by 500, for which the right thing to multiply the variance by would be 500² = 250,000; we are taking 500 different X_i and adding them together.) Then, to find the chance that Y is 200 or bigger, we calculate z = (200 − 100)/√480 = 4.56, and the probability of being bigger than this is more or less zero.

This kind of random walk is often used to describe stock prices, which are very unpredictable over the short term, but will have a noticeable upward trend over the long term. With Internet trading, the stock market is even more jittery over the short term, as it is now easy to buy or sell in volume (in response to a piece of news), but over the long term the trend of the market is slowly upward, almost regardless of which stocks you hold.

If you still have the binomial distribution in your head, you can use that too: the particle can move either right or left, independently of previous moves, with the same probability of going each way each time. It will be 200 or more steps to the right if the number of successes (moves to the right) exceeds the number of failures (moves to the left) by 200 or more. A little experimentation shows that you need at least 350 successes, so that the number of failures is 150 or less.
Then use the normal approximation to the binomial: the mean number of successes is 500(0.6) = 300 and the SD is √(500(0.6)(0.4)) = 10.95, so z = (350 − 300)/10.95 = 4.56, so the chance of 350 or more successes is pretty nearly 0. (The exact binomial gives […].)
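Both routes through 5.83 (the central limit theorem on the sum of steps, and the binomial count of rightward moves) can be reproduced numerically. The following is a rough sketch in standard-library Python; the variable names are illustrative, and the exact tail is summed directly from the binomial formula rather than taken from any statistics package.

```python
import math

p_right, p_left = 0.6, 0.4
steps = 500
target = 200  # want P(Y >= 200)

# One step: +1 with probability 0.6, -1 with probability 0.4.
step_mean = 1 * p_right + (-1) * p_left
step_var = (1 - step_mean) ** 2 * p_right + (-1 - step_mean) ** 2 * p_left

# Sum of 500 independent steps: means add and variances add.
walk_mean = steps * step_mean
walk_var = steps * step_var


def normal_cdf(z):
    """P(Z <= z) for a standard normal Z, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


z_walk = (target - walk_mean) / math.sqrt(walk_var)
clt_tail = 1 - normal_cdf(z_walk)

# Binomial view: Y = successes - failures = 2 * successes - 500,
# so Y >= 200 needs successes >= (200 + 500) / 2.
min_successes = math.ceil((target + steps) / 2)
exact_tail = sum(
    math.comb(steps, k) * p_right**k * p_left ** (steps - k)
    for k in range(min_successes, steps + 1)
)

print(step_mean, step_var, walk_mean, walk_var)
print(z_walk, clt_tail)
print(min_successes, exact_tail)
```

The `min_successes` line replaces the "little experimentation" in the text with the algebra Y = 2(successes) − 500.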
