CD Appendix F
Hypergeometric Distribution

A hypergeometric experiment is an experiment in which a sample of n items is taken without replacement from a finite population of N items, each of which is classified as a success or a failure. (If the sampling is done with replacement, the experiment is binomial.) Let k be the number of successes and (N - k) the number of failures in the population.

Hypergeometric Random Variable
The hypergeometric random variable is the number of successes in a hypergeometric experiment. A hypergeometric random variable is a discrete random variable that can take on any one of the values 0, 1, 2, ..., n.

The hypergeometric probability distribution can be derived using the multiplication, addition, and complement rules or, more easily, by applying a probability tree. The following was derived in this manner.

Hypergeometric Probability Distribution
The random variable X is defined as the number of successes in a sample of size n taken without replacement from a population consisting of k successes and N - k failures. The probability distribution of X is

$$P(X = x) = P(x) = \frac{C_x^{k}\, C_{n-x}^{N-k}}{C_n^{N}}, \quad x = 0, 1, 2, \ldots, n$$

The denominator $C_n^{N}$ is the number of possible samples of n items that can be drawn from a population of N items. The numerator is the number of samples of n items that contain exactly x successes and (n - x) failures; the x successes can be chosen from the k successes in the population, and the (n - x) failures can be chosen from the (N - k) failures in the population. We'll demonstrate its use by reviewing Example 6.5 (in the book).
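The counting argument behind the formula can be sketched in a few lines of Python. This is a minimal illustration, not part of the book's software instructions: the function name `hypergeom_pmf`, its argument order (x, sample size, successes in population, population size), and the population of 20 items with 8 successes are all assumptions made for the example.

```python
from math import comb


def hypergeom_pmf(x, n, k, N):
    """P(X = x): x successes in a sample of n items drawn without
    replacement from N items, of which k are successes."""
    # Outcomes that cannot occur get probability 0
    # (more successes or failures than the population holds).
    if x < 0 or x > k or n - x < 0 or n - x > N - k:
        return 0.0
    # (ways to pick x successes) * (ways to pick n - x failures)
    # divided by (ways to pick any n items).
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)


# Hypothetical population: 20 items, 8 successes, sample of 5.
probs = [hypergeom_pmf(x, 5, 8, 20) for x in range(6)]
print(probs)
print(sum(probs))  # the probabilities over all x should total 1, up to rounding
```

Summing the probabilities over every possible x is a quick sanity check that the numerator counts have been set up correctly.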
Example 6.5
A graduate statistics course has seven male and three female students. The professor wants to select two students at random to help her conduct a research project. What is the probability that the two students chosen are female?

Solution
In the book we used the multiplication law (and we could have used a probability tree) to solve this problem. However, it can also be solved using the hypergeometric probability distribution. In this example we have a total of N = 10 students (items), of which k = 3 are female (successes) and (N - k) = 7 are male (failures). The experiment consists of selecting n = 2 items, and we want to compute the probability that X = 2. The general form of the hypergeometric distribution is

$$P(X = x) = P(x) = \frac{C_x^{k}\, C_{n-x}^{N-k}}{C_n^{N}}, \quad x = 0, 1, \ldots, n$$

We can solve the example in the following way.

$$P(X = 2) = P(2) = \frac{C_2^{3}\, C_{2-2}^{10-3}}{C_2^{10}} = \frac{\dfrac{3!}{(2!)(1!)} \cdot \dfrac{7!}{(0!)(7!)}}{\dfrac{10!}{(2!)(8!)}} = \frac{(3)(1)}{45} = .067$$

Using the Computer

Excel
INSTRUCTIONS
1. Activate any empty cell. Click fx and select the category Statistical and the function HYPERGEOMDIST.
2. Type the value of x in the Sample_s box, the value of n in the Number_sample box, the number of successes, k, in the Population_s box, and the population size, N, in the Number_pop box. The probability appears on the right side of the dialog box. Clicking OK will print the probability in the active cell.
Alternatively, to calculate the probability of an individual value of X, type the following into any active cell.

=HYPERGEOMDIST([x], [n], [k], [N])

Minitab
INSTRUCTIONS
1. Click Calc, Probability Distributions, and Hypergeometric.
2. Select either Probability or Cumulative probability.
3. Specify N, k, and n in the Population size (N): box, the Successes in population (M): box, and the Sample size (n): box, respectively.
4. If you wish to make a probability statement about one value of x, specify Input constant and type the value of x.
5. If you wish to make probability statements about several values of x from the same hypergeometric distribution, type the values of x into a column before clicking Calc. At step 4 specify Input column and type the name of the column.

Mean, Variance, and Standard Deviation of the Hypergeometric Distribution
Using the definitions of the expected value and variance and a little arithmetic, we produce the parameters of the distribution.

Mean, Variance, and Standard Deviation

$$\mu = E(X) = n\,\frac{k}{N}$$

$$\sigma^2 = \mathrm{Var}(X) = n\,\frac{k}{N}\left(1 - \frac{k}{N}\right)\frac{N-n}{N-1}$$

$$\sigma = \sqrt{n\,\frac{k}{N}\left(1 - \frac{k}{N}\right)\frac{N-n}{N-1}}$$
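The closed-form parameters can be checked against the definitions of expected value and variance. The sketch below does this for the numbers of Example 6.5 (10 students, 3 female, 2 selected); the function names `hypergeom_pmf` and `hypergeom_moments` are hypothetical helpers written for this illustration, and the code assumes a population of more than one item so that the divisor N - 1 is nonzero.

```python
from math import comb, sqrt


def hypergeom_pmf(x, n, k, N):
    """P(X = x) for a sample of n drawn without replacement
    from N items, k of which are successes."""
    if x < 0 or x > k or n - x < 0 or n - x > N - k:
        return 0.0
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)


def hypergeom_moments(n, k, N):
    """Closed-form mean, variance, and standard deviation (assumes N > 1)."""
    p = k / N  # proportion of successes in the population
    mean = n * p
    variance = n * p * (1 - p) * (N - n) / (N - 1)
    return mean, variance, sqrt(variance)


# Numbers from Example 6.5: N = 10 students, k = 3 female, n = 2 selected.
n, k, N = 2, 3, 10
mean, variance, sd = hypergeom_moments(n, k, N)

# The same quantities computed directly from the definitions.
mean_def = sum(x * hypergeom_pmf(x, n, k, N) for x in range(n + 1))
var_def = sum((x - mean_def) ** 2 * hypergeom_pmf(x, n, k, N)
              for x in range(n + 1))

print(mean, variance, sd)
print(mean_def, var_def)
```

Both routes should agree up to floating-point rounding, which is a convenient way to catch a misplaced term in the variance formula.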
Relationship Between the Binomial and Hypergeometric Distributions
Both the binomial and hypergeometric distributions are used to calculate probabilities in experiments where there are n trials, two outcomes per trial, and the random variable is defined as the total number of successes in the n trials. The critical difference is that in a hypergeometric experiment the trials are not independent and the probability of success is not constant from trial to trial. However, under certain circumstances the probabilities that each density function produces are similar.

To show the relationship between the two distributions, suppose that in an experiment N is large compared to n. For example, suppose that we draw (without replacement) two cards out of 100 decks of cards (5200 cards) and we seek the probability of drawing two clubs. We can use the hypergeometric distribution to obtain the exact probability, or we can use the binomial distribution to obtain a very good approximation. The exact probability is

$$P(X = 2) = \frac{1300}{5200} \cdot \frac{1299}{5199} = .06246$$

The binomial approximation is

$$P(X = 2) \approx \frac{1300}{5200} \cdot \frac{1300}{5200} = .0625$$

Example F.1
Use the hypergeometric and binomial distributions to determine the probability of drawing 3 clubs in 5 cards without replacement from 5200 cards (1300 clubs and 3900 nonclubs).

Solution
Using the hypergeometric distribution we have x = 3, n = 5, k = 1,300, and N = 5,200. Thus,

$$P(X = 3) = P(3) = \frac{C_3^{1300}\, C_{5-3}^{5200-1300}}{C_5^{5200}} = \frac{\dfrac{(1300)(1299)(1298)}{(3)(2)(1)} \cdot \dfrac{(3900)(3899)}{(2)(1)}}{\dfrac{(5200)(5199)(5198)(5197)(5196)}{(5)(4)(3)(2)(1)}} = .087834$$
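The hypergeometric figures above can be reproduced numerically. The following is a minimal sketch using only Python's standard library; `hypergeom_pmf` and `binom_pmf` are hypothetical helper names introduced for this illustration, not functions from the book's software.

```python
from math import comb


def hypergeom_pmf(x, n, k, N):
    """Exact probability of x successes when n items are drawn
    without replacement from N items, k of which are successes."""
    if x < 0 or x > k or n - x < 0 or n - x > N - k:
        return 0.0
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)


def binom_pmf(x, n, p):
    """Binomial probability of x successes in n independent trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


N, k = 5200, 1300  # 100 decks: 5200 cards, 1300 of them clubs

two_clubs_exact = hypergeom_pmf(2, 2, k, N)   # text reports .06246
two_clubs_binom = binom_pmf(2, 2, k / N)      # text reports .0625
three_of_five = hypergeom_pmf(3, 5, k, N)     # text reports .087834

print(two_clubs_exact, two_clubs_binom, three_of_five)
```

Because `math.comb` works with exact integers, the only rounding occurs in the final division, so the printed values can be compared directly with the hand calculations.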
We can use the binomial distribution to approximate this probability, where p = k/N = 1300/5200 = .25.

$$P(X = 3) = P(3) = C_3^{5}\, p^{3} (1 - p)^{5-3} = \frac{(5)(4)(3)(2)(1)}{(2)(1)(3)(2)(1)}\,(.25)^{3}(1 - .25)^{2} = .087891$$

As you can see, the two probabilities are almost identical. We won't show how the formula for the hypergeometric density function can be approximated by the binomial density function when N is large relative to n. However, you can easily compare parameters. Using the experiment described in Example F.1, we have the mean and variance of the hypergeometric distribution.

$$\mu = E(X) = n\,\frac{k}{N} = 5\left(\frac{1300}{5200}\right) = 5(.25) = 1.25$$

$$\sigma^2 = \mathrm{Var}(X) = n\,\frac{k}{N}\left(1 - \frac{k}{N}\right)\frac{N-n}{N-1} = 5\left(\frac{1300}{5200}\right)\left(1 - \frac{1300}{5200}\right)\frac{5200-5}{5200-1} = 5(.25)(1 - .25)\,\frac{5195}{5199} = (.9375)(.999231) = .936779$$

The mean and variance for the related binomial distribution are

$$\mu = E(X) = np = 5(.25) = 1.25$$

$$\sigma^2 = \mathrm{Var}(X) = np(1 - p) = 5(.25)(1 - .25) = .9375$$

Notice that the means are identical and that the only difference between the variances is that we multiply the variance of the binomial distribution by $(N - n)/(N - 1)$ to compute the variance of the hypergeometric distribution. The quantity

$$\frac{N-n}{N-1}$$

is called the finite population correction factor (FPCF). The name derives from an analysis of the difference between the experiments that produce the binomial and hypergeometric random variables. The essential difference is that we can draw an infinite number of items from a binomial population, whereas the hypergeometric population is finite (because we draw items without replacement). Notice, however, that when N is large relative to n the finite population correction factor is approximately 1. In Example F.1 the FPCF = .999231. In Chapter 12 we use the finite population correction factor to estimate the mean and a proportion when the population is small relative to the sample size.

EXERCISES
F.1 Find the probability that you deal a hand of 4 spades (and another card) by drawing 5 cards from a well-shuffled deck.
F.2 A shipment of 100 computer monitors contains 12 that are defective. Find the probability that 4 defective monitors are found in the first 20 monitors checked.
F.3 Refer to Exercise F.2. Calculate the mean and variance of the hypergeometric distribution.
F.4 An urn contains 5 red and 7 white marbles. An experiment is conducted wherein 3 marbles are selected without replacement. Find the following probabilities.
a. 2 red marbles and 1 white marble
b. 1 red marble and 2 white marbles
c. 3 white marbles
F.5 An urn contains 500 red marbles and 700 white ones.
a. Use the hypergeometric distribution to find the probability of selecting 4 white marbles out of 5 selected without replacement.
b. Approximate the probability in part a using the binomial distribution.
F.6 Fifty students are registered in an MBA statistics course, of which 10 are majoring in finance. If 4 students are selected at random to participate in a class demonstration, what is the probability that 2 of those students are finance majors?
F.7 A statistics professor puts four $5 bills and six $10 bills in an envelope. He tells his wife that she may draw 3 bills blindfolded from the envelope. What is the probability that the professor will be $30 poorer after the experiment?
F.8 A stock index consists of 30 stocks. On a given day 25 increased in value. A professor selects 4 stocks at random. What is the probability that 3 of them increased in value?
F.9 A stock index consists of 500 stocks. On a given day 400 increased in value. A professor selects 4 stocks at random.
a. Use the hypergeometric distribution to determine the probability that 3 of them increased in value.
b. Repeat part a using the binomial distribution.
F.10 A statistics textbook has 1200 exercises whose answers are provided in an appendix. Suppose that there are 100 incorrect answers. A professor puts 5 exercises on an assignment. What is the probability that two of the exercises have wrong answers in the appendix?