STA 6166 Fall 2007 Web-based Course

Notes 10: Probability Models

We first saw the normal model as a useful model for the distribution of some quantitative variables. We've also seen that if we make a random draw from a distribution that fits the normal model, then the random draw is a normal random variable. There are several other models for random phenomena that are so common that they have names. This chapter discusses three models for discrete random variables. They are all models for the results of independent Bernoulli trials.

A Bernoulli trial is a random phenomenon with only two possible outcomes (generically called "success" and "failure"). Independent Bernoulli trials are independent repetitions of the random phenomenon in which the probability of success (called p) stays the same over all the trials. This is the model used for coin tosses. It can be used as a model for the outcome of many betting games where the two outcomes are win and lose. It can also be used for many random phenomena with more than two possible outcomes, as long as we are only concerned with whether some particular event happens or not. For example, roll a die and see whether or not you get an even number. Randomly choose an adult and determine whether they are currently married or not, intend to vote for Bush or not, or have an annual income above $30,000 or not.

Random sampling from a population and observing a binary response variable is not precisely like Bernoulli trials. Because random sampling is drawing without replacement, the probability of a success changes from draw to draw, so the draws are not independent. If I draw 10 cards without replacement from a deck of cards and observe whether or not each card is an Ace, these are not independent Bernoulli trials. However, if the population is large (like all adults in the U.S.)
and the sample relatively small (less than 10% of the population), then the sample can be treated like independent Bernoulli trials without much loss. That is, the Bernoulli trials model will still be an acceptably good model (remember, models are never perfect representations of reality anyway).

The geometric model

Suppose we have independent Bernoulli trials with probability of success p. One random variable we might be interested in is X = the number of the trial on which we first observe a success. In these experiments, we keep performing trials until we get a success and then we stop.

Example: Say we have a biased coin, such that the probability of heads is 0.1 instead of 0.5. If X is the number of coin tosses to the first head (head is a "success"), what are the possible values of X? What is the probability that we get a head on the first throw? On the second throw? Not until the 10th throw?

Answer: The possible values of X are 1, 2, 3, and so on, with no upper limit. Very large values are very unlikely, but they are possible, theoretically. You all know the probability of getting heads on any throw is 0.5 for a fair coin, and likewise the probability of getting heads on any throw with this biased coin is 0.1. To calculate the probability of requiring 2 throws to get a head, we need to use our rules of probability and a little logical thinking. If X = 2, that means we failed to get a head on the first throw, but got a head on the second throw. Since the throws are independent, the probability that X = 2 is the product of the probabilities of the two outcomes.
In other words, the probability of tails on the first throw is 0.9 and the probability of heads on the second throw is 0.1, so the probability of X = 2 is 0.9*0.1 = 0.09. Continuing for this example:

x     P(X = x)
1     0.1
2     0.9*0.1
3     (0.9)^2*0.1
4     (0.9)^3*0.1
10    (0.9)^9*0.1
x     (0.9)^(x-1)*0.1

The probability that we need to toss the coin 10 times to get our first head is (0.9)^9*0.1 = 0.0387.

Checkpoint 1: Say we are collecting cards of famous sports figures in cereal boxes. A success might be getting a Tiger Woods card, and X is the number of boxes we buy until we get our first Tiger card. Since 20% of the boxes have Tiger cards, p = .2. We let q = 1 - p = .8 denote the probability of a failure. What are the possible values of X? For the Tiger model, what's P(X = 1)? P(X = 2)? P(X = 10)? What's a general expression for P(X = x) in terms of p and q?

The examples above follow a geometric model. The geometric model has one parameter: p. We denote the geometric model by Geom(p). So the number of boxes needed to get a Tiger card is modeled by a Geom(.2). The probabilities look like this (they keep going beyond x = 20, but they keep getting smaller and smaller):
[Figure: probability histogram of the Geom(.2) model for x = 1 to 20.]

The expected value of a random variable X with model Geom(p) is

E(X) = Σ x*P(X = x) = 1p + 2qp + 3q^2 p + 4q^3 p + ... = 1/p.

(See the text for a proof of this using geometric series.) The standard deviation is σ = √q / p. Hence, the expected number of boxes we need to buy to get a Tiger card is 1/.2 = 5 boxes. The standard deviation is √.8 / .2 = 4.47.

Checkpoint 2: What's the probability we will have to buy at least 10 boxes to get a Tiger card? Hint: "At least 10" means "greater than 9." Use what you know about complements from the rules of probability to simplify.
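The geometric calculations above can be checked with a short script. This is only a sketch in Python; the helper name geom_pmf is ours, not from the notes.

```python
# Geometric model Geom(p): P(X = x) = q^(x-1) * p, the probability
# that the first success arrives on trial x.

def geom_pmf(x, p):
    """Probability the first success occurs on trial x (geom_pmf is our own name)."""
    q = 1 - p
    return q ** (x - 1) * p

# Biased coin from the example: p = 0.1.
print(geom_pmf(2, 0.1))   # tails then heads: 0.9 * 0.1 = 0.09
print(geom_pmf(10, 0.1))  # nine tails then heads: 0.9^9 * 0.1, about 0.0387

# Tiger-card model: Geom(.2).
p = 0.2
print(1 / p)               # expected number of boxes: 1/p = 5
print((1 - p) ** 0.5 / p)  # SD: sqrt(q)/p, about 4.47
```

The same function reproduces every row of the table above by varying x.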
Binomial Model

Suppose again we have independent Bernoulli trials with constant probability of success p. However, suppose now that the number of trials n is fixed and that the random variable X is the number of successes in the n trials. Then X follows a Binomial model. The binomial model has two parameters: n and p. It's denoted by Binom(n,p).

Checkpoint 3: What are the possible values of X?

The binomial probability model is

P(X = x) = (n choose x) p^x q^(n-x),  x = 0, 1, 2, ..., n,

where (n choose x) = n! / (x!(n-x)!) and q = 1 - p. The symbol n! is pronounced "n factorial" and is equal to the product of all the numbers from 1 to n; for example, 5! = 5*4*3*2*1. The mean is µ = E(X) = np and the standard deviation is σ = SD(X) = √(npq).

Checkpoint 4: Suppose we buy 5 boxes of cereal. What's the expected number of Tigers? What's the standard deviation? What's the probability that we get exactly one Tiger? What's the probability that we get two or fewer Tigers?
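As a check on these formulas, here is a sketch in Python for the five-box cereal example (the helper name binom_pmf is ours, not from the notes):

```python
from math import comb

# Binomial model: P(X = x) = (n choose x) * p^x * q^(n-x).

def binom_pmf(x, n, p):
    """Probability of exactly x successes in n Bernoulli(p) trials."""
    q = 1 - p
    return comb(n, x) * p ** x * q ** (n - x)

n, p = 5, 0.2  # five cereal boxes, 20% chance of a Tiger card each
print(n * p)                    # mean np = 1.0
print((n * p * (1 - p)) ** 0.5) # SD sqrt(npq), about 0.894
print(binom_pmf(1, n, p))       # exactly one Tiger: 5 * .2 * .8^4 = 0.4096
print(sum(binom_pmf(x, n, p) for x in range(3)))  # two or fewer, about 0.942
```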
[Figure: probability histogram of the Binom(5,.2) model for x = 0 to 5.]

Deriving the mean and standard deviation of a binomial random variable

To derive the mean and standard deviation for the binomial model, start with a single Bernoulli trial with probability of success p. Let X be the number of successes on this single trial, so X is either 1 or 0; it's 1 with probability p and 0 with probability q = 1 - p. X is sometimes said to be a Bernoulli random variable.

Checkpoint 5: Use the properties of expected value and variance to show that E(X) = p and Var(X) = pq for a Bernoulli random variable. Remember, first list the possible outcomes and their probabilities. Then use the formulas for expected value and variance to calculate them, as you did for the roulette example in the last set of notes.

Now, suppose we have n independent Bernoulli trials. The number of successes on the n trials is Y = X1 + X2 + ... + Xn, where each of the X's is a Bernoulli random variable. So, using the properties of expected value and variance,

E(Y) =
Var(Y) =
SD(Y) =

Using the normal model to approximate the binomial model

If n in the binomial model is large, then the calculation of the binomial probabilities can get pretty difficult. What happens to n! as n gets large?
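Before turning to that question, the additivity argument above (Y as a sum of n Bernoulli random variables, with mean np and variance npq) can be checked by simulation. This is only a sketch; the seed and variable names are our own choices.

```python
import random

# Simulate Y = X1 + ... + Xn, each Xi a Bernoulli(p) trial,
# and compare the sample mean and variance with np and npq.

rng = random.Random(2007)  # arbitrary seed, for reproducibility
n, p = 5, 0.2

sims = [sum(1 for _ in range(n) if rng.random() < p)
        for _ in range(100_000)]

mean = sum(sims) / len(sims)
var = sum((y - mean) ** 2 for y in sims) / len(sims)
print(round(mean, 2), round(var, 2))  # near np = 1.0 and npq = 0.8
```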
Example: Suppose I take a 100-question, multiple-choice test for which there are 4 choices on every problem. What's the probability I will get more than 30 correct if my choices are random? Calculating all the binomial probabilities would be tedious and difficult for this example. However, it turns out that for large n, the binomial probabilities are well approximated by a normal model. The mean and standard deviation of the normal model are the mean and standard deviation of the binomial distribution you're trying to approximate.

[Figure: probability histogram of the Binom(100,.25) model for x = 0 to 50.]

For the Binom(100,.25) model, µ = 100(.25) = 25 and σ = √(100(.25)(.75)) = 4.33. Therefore, if the normal approximation is adequate, we can use z-scores with µ and σ to calculate binomial probabilities.

Example: Use the normal approximation to the binomial to find the probability that X < 10.

Answer: z = (10 - 25)/4.33 = -15/4.33 = -3.46. From the Z table, we see that the probability of a N(0,1) random variable being less than -3.46 is 0.0003.

Checkpoint 6: Use the normal approximation to calculate P(X > 30).

The normal approximation is generally considered adequate if np and nq are both greater than or equal to 10. Check this condition for the checkpoint above.
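The quality of the approximation can be seen by comparing the exact binomial tail with the normal one. This is a sketch in Python; the function normal_cdf, built from the standard error function, is our own helper, not from the notes.

```python
from math import comb, erf, sqrt

# Compare exact P(X > 30) for Binom(100, .25) with the normal approximation.

n, p = 100, 0.25
q = 1 - p
mu, sigma = n * p, sqrt(n * p * q)  # 25 and about 4.33

# Exact binomial tail: sum the pmf over x = 31, ..., 100.
exact = sum(comb(n, x) * p ** x * q ** (n - x) for x in range(31, n + 1))

def normal_cdf(z):
    """Standard normal CDF, via the error function (our own helper)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Normal approximation: P(X > 30) ≈ P(Z > (30 - mu)/sigma).
approx = 1 - normal_cdf((30 - mu) / sigma)
print(round(exact, 4), round(approx, 4))
```

The two numbers are close but not identical; a continuity correction (using 30.5 instead of 30) would bring them closer.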
The Poisson model

In the binomial model, what if n is large, but p is so small that the normal approximation to the binomial cannot be used? This would be a nightmare to calculate using the binomial model. Another model, the Poisson, can be used as an approximation to the binomial model in this case. The Poisson is commonly used for events that occur at a low rate over a large amount of time or space. Poisson is pronounced "pwah-sahn." People will make fun of you if you say "poison."

Checkpoint 7: There are 3 x 10^9 basepairs in the human genome, and the mutation rate per generation per basepair is 10^-9. In this example we see the 3 x 10^9 basepairs as trials where a mutation could occur or not. The mutation rate is the probability of a success, where a success is the event that a mutation occurred. This is a classic binomial model. What is the probability that a baby will have no mutations? One mutation? We could use the binomial model for these outcomes. What assumption do we make if we do? What are n and p? Try using the binomial model to compute these probabilities. What does your calculator have to say about this?

The Poisson model has one parameter, traditionally called λ, pronounced "lambda." The probability function is

P(X = x) = λ^x e^(-λ) / x!,  x = 0, 1, 2, ...

The expected value is E(X) = λ and the standard deviation is SD(X) = √λ. To use the Poisson approximation to the binomial, set λ = np. Note in the mutation example that 3 x 10^9 * 10^-9 = 3.

Example: What is the probability of 0 mutations in the problem from checkpoint 7?

Answer: P(X = 0) = λ^0 e^(-λ) / 0! = e^(-3) = 0.0498.
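The mutation example is easy to compute with a few lines of code, where the calculator's binomial buttons give up. This is a sketch; poisson_pmf is our own helper name.

```python
from math import exp, factorial

# Poisson model: P(X = x) = lambda^x * e^(-lambda) / x!.

def poisson_pmf(x, lam):
    """Poisson probability of exactly x events when lam events are expected."""
    return lam ** x * exp(-lam) / factorial(x)

# Mutation example: lambda = np = (3 x 10^9) * (10^-9) = 3.
lam = 3e9 * 1e-9
print(round(poisson_pmf(0, lam), 4))  # no mutations: e^-3, about 0.0498
print(round(poisson_pmf(1, lam), 4))  # one mutation: 3e^-3, about 0.1494
```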
Checkpoint 8: Calculate the probabilities of 1, 2, and 3 mutations using the Poisson model.

The Poisson model turns out to be the theoretically correct model for events that occur randomly over time or space when we count the number of events in equal-sized intervals of time or space.

Adjusting for Change in Units in the Poisson Model

If we expect a drought in Florida every 5 years, how many droughts do we expect in 15 years? The intuitive answer is 3, and your intuition is right! The Poisson model is used for counts per unit time or space. If we change those units, we simply change λ by the same conversion. So, if typos on a page follow a Poisson(3) model, typos on half a page follow a Poisson(1.5) model. The expected value of the first model is 3 and the variance is 3. For the second model, the expected value is 1.5 and the variance is 1.5.

Checkpoint 9: If hummingbirds arrive at a flower at a rate of λ per hour, how many visits are expected in n hours of observation, and what is the variance in this expectation? If significantly more variance is observed than expected, what might this tell you about hummingbird visits?
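The change-of-units rule is one line of arithmetic in code. A minimal sketch, using the typos-per-page example above (poisson_pmf is our own helper name):

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """Poisson probability of exactly x events when lam events are expected."""
    return lam ** x * exp(-lam) / factorial(x)

lam_per_page = 3.0            # typos per page: Poisson(3)
lam_half_page = lam_per_page * 0.5  # rescale lambda with the units

# Chance of a typo-free half page: e^-1.5, about 0.223.
print(round(poisson_pmf(0, lam_half_page), 4))
```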
Checkpoint 10: (From Romano) If bacteria are spread across a plate at an average density of 5000 per square inch, what is the chance of seeing no bacteria in the viewing field of a microscope if this viewing field is 10^-4 square inches? What, therefore, is the probability of seeing at least one cell?

Example: FLYING-BOMB HITS ON LONDON (576 cells with 537 hits)*

# of hits                  0      1      2      3     4     5
# of cells with # hits   229    211     93     35     7     1
Poisson fit            226.7  211.4   98.5   30.6   7.1   1.6

The actual fit of the Poisson for the data is surprisingly good. It is interesting to note that most people believed in a tendency of the points of impact to cluster. If this were true, there would be a higher frequency of areas with either no impacts or many impacts, and a deficiency in the intermediate classes. The table indicates randomness and homogeneity of the area. We have here an instructive illustration of the fact that to the untrained eye randomness appears as regularity or a tendency to cluster.

Example: From Time magazine, Sept. 4, 1989: "For players of Montana's Big Spin lottery game, lady luck went on vacation and never came back. In Big Spin, the state chooses three people each week from among thousands of losing lottery ticket holders and gives them a chance to win a $1 million jackpot by whirling the Big Spin wheel. But in 222 tries over the past 18 months, no one has hit the jackpot. Officials added a second $1 million slot to the 100-slot wheel, but to no avail. Without a big winner to hype, ticket sales have dropped from $23 million in fiscal 1988 to $11.2 million in 1989."

Checkpoint 11: Is the big wheel fixed? Why or why not?
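The "Poisson fit" row of the flying-bomb table can be reproduced with a few lines of code: fit λ as the average number of hits per cell, 537/576, then multiply each Poisson probability by 576 cells (the last column covers "5 or more" hits). This is a sketch; the helper name poisson_pmf is ours.

```python
from math import exp, factorial

# Flying-bomb data: 537 hits spread over 576 cells of London.
lam = 537 / 576  # fitted lambda, about 0.932 hits per cell

def poisson_pmf(x, lam):
    """Poisson probability of exactly x events when lam events are expected."""
    return lam ** x * exp(-lam) / factorial(x)

# Expected number of cells with 0, 1, 2, 3, 4 hits, and with 5 or more.
expected = [round(576 * poisson_pmf(x, lam), 1) for x in range(5)]
tail = round(576 * (1 - sum(poisson_pmf(x, lam) for x in range(5))), 1)
print(expected + [tail])
```

Running this reproduces the fitted counts in the table, which can then be compared cell by cell with the observed counts.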