Chapter 5 Discrete Probability Distributions

Goal: To become familiar with how to use Excel 2007/2010 for binomial distributions.

Instructions: Open Excel and click on the Stat button in the Quick Access Bar. Scroll down until you see BINOM.DIST. (In Excel 2007 the function is named BINOMDIST.) Select that tool. Here is what you should see:

[Screenshot: BINOM.DIST tool]

Try finding the probability of 5 or fewer successes when there are 24 trials and the probability of success on any one trial is 0.5. Fill out the tool as follows:

Number_s: 5
Trials: 24
Probability_s: 0.5
Cumulative: true

Midway down the tool screen on the right, you'll see the answer. It should read 0.003305376. Try it.
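As a cross-check outside Excel, the same cumulative probability can be sketched in a few lines of Python. Python is not part of this lab, and the helper names binom_pmf and binom_cdf are made up for illustration; the sketch simply adds up the binomial probabilities for 0 through 5 successes.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, n, p):
    """Probability of k or fewer successes (Cumulative: true in Excel)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

# Number_s = 5, Trials = 24, Probability_s = 0.5, Cumulative = true
p_at_most_5 = binom_cdf(5, 24, 0.5)
print(round(p_at_most_5, 9))  # 0.003305376
```

The printed value should agree with the answer shown in the Excel tool.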
Goal: To become familiar with how to use Excel 2007/2010 for Poisson distributions.

Instructions: Open Excel and click on the Stat button in the Quick Access Bar. Scroll down until you see POISSON.DIST. (In Excel 2007 the function is named POISSON.) Select that tool. Here is what you should see:

[Screenshot: POISSON.DIST tool]

Try finding the probability of exactly 7 arrivals during some minute when the average number of arrivals is 3.5 per minute. Fill out the tool as follows:

X: 7
Mean: 3.5
Cumulative: false

You should see a probability of 0.038549. Try it.
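For readers who want to verify the Poisson result without Excel, here is a minimal Python sketch. The function name poisson_pmf is hypothetical; it evaluates the standard Poisson formula, e^(-mean) * mean^k / k!, for exactly k occurrences.

```python
from math import exp, factorial

def poisson_pmf(k, mean):
    """Probability of exactly k occurrences (Cumulative: false in Excel)."""
    return exp(-mean) * mean**k / factorial(k)

# X = 7, Mean = 3.5, Cumulative = false
p_7_arrivals = poisson_pmf(7, 3.5)
print(round(p_7_arrivals, 6))  # 0.038549
```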
Chapter 5 Discrete Probability Distributions

Goal: To become familiar with discrete probability distributions, specifically the Binomial Distribution and the Poisson Distribution.

Reading: Triola, Chapter 5, Sections 5.2 and 5.3

A stochastic process is any process that generates values in a random fashion, each of which has a probability associated with it. For example, rolling a pair of dice is a stochastic process generating numbers, 2 through 12, in a random way, and the probability of throwing a 7, for example, is 1/6. Taking a poll is a stochastic process because you cannot predict how a person is going to answer, other than that it will be yes or no if those are the only two choices.

A random variable is a number that results from a stochastic process and hence has a probability associated with it. For example, if we're rolling dice, then the random variable is the sum showing on the dice, a number between 2 and 12. If we use X to represent the random variable, then X = 7 is a specific event. We typically use capital letters like A or B to name events. Therefore, if A is the event that we roll a seven, the following are equivalent: P(A) and P(X = 7).

In the case of polling, if we code a yes answer as the number 1 and a no answer as a 0, then X can take on one of two values, 0 or 1, and we can then ask such questions as what P(X = 1) is equal to.

A probability distribution is the set of all possible outcomes from some stochastic process, showing or describing each value of the random variable generated by the process along with its associated probability. For example, the following table shows all possible outcomes of rolling a pair of dice and the probability of each outcome, and hence is a probability distribution:

X     P(X)
2     1/36
3     2/36
4     3/36
5     4/36
6     5/36
7     6/36
8     5/36
9     4/36
10    3/36
11    2/36
12    1/36

Notice the similarity between a probability distribution and a frequency distribution. How can you turn any frequency distribution into a probability distribution?
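One way to explore the question above is to build the dice distribution from a frequency count. The following Python sketch assumes two fair six-sided dice and uses made-up variable names. It counts how often each sum occurs among the 36 equally likely rolls, then divides each frequency by the total to turn the frequency distribution into a probability distribution.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Frequency distribution: how many of the 36 rolls give each sum
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
total = sum(counts.values())

# Probability distribution: each frequency divided by the total count
distribution = {s: Fraction(c, total) for s, c in sorted(counts.items())}

for s, prob in distribution.items():
    print(s, prob)

print("sum of probabilities:", sum(distribution.values()))
```

Fractions are printed in lowest terms, so the entry for 7 appears as 1/6 rather than 6/36.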
Each value of X is a possible outcome of rolling the dice. Note that the sum of all the probabilities is 1.0, as it should be, since one of those rolls has to occur. This is an important property of a probability distribution: the probabilities must always sum to 1.0, otherwise it is not a probability distribution.

A discrete probability distribution is one that results from a stochastic process where the random variable is a discrete number. We will study continuous probability distributions in a later chapter. Rolling dice is a good example of a process that generates a discrete probability distribution.

Binomial Distribution

A commonly encountered discrete probability distribution is the binomial probability distribution. A process with the following properties will generate one:

1. The process has a fixed number of trials. For example, let's say we roll the dice 50 times.
2. The trials must be independent. Each roll is independent of the other rolls; no roll depends on a previous roll, regardless of what the spectators are telling you at the casino.
3. Each trial must have only one of two outcomes. Here, our dice example deviates, because any one of eleven numbers can come up. However, if we change things just slightly, we can make it fit. If we define a win or a success as the number 7 coming up, and everything else as a loss or failure, then we have only two possible outcomes, success or failure.
4. The probability of a win or a success remains the same throughout all the trials. In our case, given that we defined a success as rolling a seven, the probability of a success is 1/6 for each roll (the probability of a failure or loss would then be 5/6).

There are tables for finding the probabilities of different events given that we are working with a binomial distribution. However, we are going to use Excel. For example, let's say that we roll the dice 10 times. What is the probability that we will roll exactly 4 sevens?
You run the BINOM.DIST tool, found on the Statistics function menu, and fill it out as follows:

Number_s: 4
Trials: 10
Probability_s: 1/6 (about 0.1667)
Cumulative: false

Number_s is the number of successes you're testing for, in this case 4. Trials is the total number of times you roll the dice. Probability_s is the probability of rolling a seven on any one roll. Cumulative is set to false because we want exactly 4 sevens. As you can see in the middle of the window, the probability of rolling a 7 exactly four times out of ten is 0.0543 (rounded to four decimal places).
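The number the tool reports comes from the binomial formula, C(n, k) * p^k * (1 - p)^(n - k). As an illustration only, here is a short Python sketch of that formula applied to the dice example; the function name binom_pmf is hypothetical and not part of Excel.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k): exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p_seven = 1 / 6                       # probability of a 7 on one roll
p_exactly_4 = binom_pmf(4, 10, p_seven)
print(round(p_exactly_4, 4))          # 0.0543
```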
Now suppose that we wanted to know the probability of rolling a 7 no more than four times. This means that in addition to rolling a 7 four times, we also include the cases of rolling a 7 three times, two times, one time, or not at all. This is what we mean by "no more than", also known as "less than or equal to". The only change in the use of the BINOM.DIST tool is that we now enter true for Cumulative. We now see that the probability of rolling a 7 no more than four times has jumped to 0.9845.

Finally, how would we find the probability of rolling a 7 at least four times? This means that we would count rolling a 7 four times, five times, six, seven, eight, nine, or ten times. Note that it's "or" and not "and". Take a moment here and think about the differences and similarities between "no more than four", "at least four", "four or less", and "four or more".

The problem we encounter is that the tool is designed to give us only the case where we are asking for no more than a certain number. Hence, we have to find the probability of the complement of the event and subtract that result from 1.0. The complement of "at least four times" is "no more than three times". Think about that for a while. We use the tool to find the probability of rolling a 7 no more than three times (Number_s will be 3). The probability is 0.930. Therefore, the probability of rolling a 7 at least four times is

P(X ≥ 4) = 1 − P(X ≤ 3) = 1 − 0.930 = 0.070

Here's another example of how the binomial distribution is used. Let's say that Kim felt she was highly qualified for a job she applied for but didn't get, and that she suspected the company of gender discrimination. After a little research, she found that out of the last 20 new employees hired, only three were women. Furthermore, the applicant pool was very large and had an equal number of qualified men and women in it. If there was no hiring bias, you would expect each new hire to have a 50-50 chance of being a woman, or a probability of 0.5. However, Kim found that only 15% of the new employees were women.
Now, 15% is a lot smaller than 50%, so how likely is it that only 15% of those hired were women if we assume that there is no gender bias? To answer this question, we have to find the probability that three or fewer women were hired purely by chance. Read this last sentence again until you have a firm grasp of what we're doing here.
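To make the question concrete, here is a hedged Python sketch of the count behind it. It assumes, as the example does, that each of the 20 hires is independently a man or a woman with probability 0.5, so all 2^20 hiring sequences are equally likely; the variable names are made up for illustration.

```python
from math import comb

hires = 20
max_women = 3

# Number of equally likely hiring sequences with 0, 1, 2, or 3 women
favorable = sum(comb(hires, k) for k in range(max_women + 1))
possible = 2**hires

p_three_or_fewer = favorable / possible
print(favorable, "out of", possible)   # 1351 out of 1048576
print(round(p_three_or_fewer, 3))      # 0.001
```

Counting sequences works here only because the success probability is 0.5; for any other probability the full binomial formula is needed.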
We use the binomial distribution to find the above-mentioned probability. The number of successes in this case is 3; the number of trials is 20, the total number of new employees; and the probability that any one hire is a woman, if chance alone is at work, is 0.5. Finally, we want to know, given this scenario, the probability that three or fewer women would be hired purely by chance, so we fill out the BINOM.DIST tool as follows:

Number_s: 3
Trials: 20
Probability_s: 0.5
Cumulative: true

As you can see, the probability is 0.001.

By convention, statisticians treat any event that has a probability of less than 0.05 of occurring as a highly unlikely event. This is known as the Rare Event Rule: if, under a given set of assumptions, the probability of an event occurring purely by chance is less than 0.05, and the event actually does occur, we can conclude that the given set of assumptions was most likely incorrect.

In this example, the given assumption was that there was no bias, and hence every hire had a 50-50 chance of being a woman. However, under that assumption, the probability of no more than 3 women being hired is 0.001, quite a bit less than 0.05. Therefore, we can conclude that the original assumption was most likely incorrect, i.e., it is far more likely that a hiring bias based on gender did exist.

This is an important and powerful application of the binomial distribution. Please reread this last example until you understand it.

Poisson Distribution

This distribution is less common than the Binomial Distribution, but it has important applications. One such application is predicting how many visits to a website will occur at the same time. Whenever you have events occurring in a random fashion and filling some "bucket", you will have a Poisson Distribution. Here are some examples:
1. People queuing at a cash register. Let's say that in a certain supermarket, people arrive at the cash register at an average rate of one every two minutes during the busy time. If the average time to service a customer is two minutes, then on average there should never be anyone waiting in line. However, arrivals are a random event. If we consider the two-minute window for servicing a customer as our bucket, we can ask: what is the probability that we'll end up with people waiting one minute, two minutes, three minutes, etc.?
2. Let's say that a monkey is throwing darts at a board. The board has been evenly divided into 100 squares. The monkey is given 50 darts to throw, and let's further assume that where a dart lands on the board is completely random. In other words, the dart hits are uniformly distributed over the board. With this information we can calculate the probability that any one square will be hit with more than one dart. In this case, the buckets are the squares. The darts hitting the squares are the random events filling the buckets.

Here are the requirements for using the Poisson Distribution:

1. The occurrences must be random.
2. The occurrences must be independent of each other.
3. The occurrences must be uniformly distributed over the buckets.

Here's an example of how the Poisson Distribution would be used. When using a pellet fertilizer spread by a broadcast method, we would like an even distribution of fertilizer. Too little and growth will be stunted; too much and it will cause a burn. Let's say that on average, 100 pellets fall on a square meter. What is the probability that 80 pellets or fewer fall on some square meter?

The average number of hits per square meter is 100. To find the probability that at most 80 pellets fall on some square meter, we use the Excel tool POISSON.DIST, filled out as follows:

X: 80
Mean: 100
Cumulative: true

The chances would be 0.023, or 2.3%.
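The fertilizer result can be cross-checked with a short Python sketch of the cumulative Poisson probability. The function name poisson_cdf is hypothetical. The second calculation applies the same function to the monkey-and-darts example, under the assumption (not stated in the text) that 50 darts spread over 100 squares give a mean of 0.5 hits per square.

```python
from math import exp

def poisson_cdf(k, mean):
    """P(X <= k) for a Poisson distribution (Cumulative: true in Excel)."""
    term = exp(-mean)          # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= mean / i       # P(X = i) computed from P(X = i - 1)
        total += term
    return total

# Fertilizer: at most 80 pellets when the mean is 100 per square meter
p_at_most_80 = poisson_cdf(80, 100)
print(round(p_at_most_80, 3))

# Darts (assumed mean of 0.5 hits per square): more than one hit
# is the complement of "one hit or none"
p_more_than_one_dart = 1 - poisson_cdf(1, 0.5)
print(round(p_more_than_one_dart, 4))
```

The first value should be close to the 0.023 reported by Excel. Building each term from the previous one avoids computing very large powers and factorials directly.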