Probability and Statistics for Engineers
Chapter 4: Probability Distributions
Ruochen Liu, ruochenliu@xidian.edu.cn
Institute of Intelligent Information Processing, Xidian University
Outline
  Random variables
  The binomial distribution
  The hypergeometric distribution
  The mean and the variance of a probability distribution
  The Poisson approximation to the binomial distribution
  Poisson processes
  The geometric distribution
  The multinomial distribution
  Chebyshev's Theorem
  Simulation
Random variables
In most statistical problems we are concerned with one number or a few numbers that are associated with the outcomes of experiments. These numbers are values of random variables. In the study of random variables, we are usually interested in their probability distributions, namely, in the probabilities with which they take on the various values in their range. Section 4.1 introduces random variables and probability distributions. Several special probability distributions follow in the later sections.
Random variables
Let us refer to the lawn-mower-rating example on page 58, and regard E1 (easy to operate), P2 (inexpensive), and C3 (low average cost of repairs) as preferred ratings. Fig. 4.1 lists the 18 possible outcomes together with their probabilities and the number of preferred ratings in each:

Outcome     Probability   Number of preferred ratings
E1 P1 C1    0.07          1
E1 P1 C2    0.13          1
E1 P1 C3    0.06          2
E1 P2 C1    0.05          2
E1 P2 C2    0.07          2
E1 P2 C3    0.02          3
E2 P1 C1    0.07          0
E2 P1 C2    0.14          0
E2 P1 C3    0.07          1
E2 P2 C1    0.08          1
E2 P2 C2    0.11          1
E2 P2 C3    0.03          2
E3 P1 C1    0.02          0
E3 P1 C2    0.03          0
E3 P1 C3    0.01          1
E3 P2 C1    0.01          1
E3 P2 C2    0.02          1
E3 P2 C3    0.01          2

The numbers 0, 1, 2, and 3 are values of a random variable: the number of preferred ratings. Adding the probabilities of the outcomes that share the same value gives

x             0      1      2      3
Probability   0.26   0.50   0.22   0.02

Random variables are functions defined over the elements of a sample space.
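As a minimal sketch (the variable and function names are illustrative, not from the textbook), the distribution of the number of preferred ratings can be tallied directly from the 18 outcome probabilities of Fig. 4.1. It assumes the probabilities are listed in tree order: E1 to E3 first, then P1 to P2, then C1 to C3.

```python
from collections import defaultdict
from itertools import product

# Probabilities of the 18 outcomes of Fig. 4.1, assumed to be in tree order.
probabilities = [
    0.07, 0.13, 0.06, 0.05, 0.07, 0.02,  # E1 branch
    0.07, 0.14, 0.07, 0.08, 0.11, 0.03,  # E2 branch
    0.02, 0.03, 0.01, 0.01, 0.02, 0.01,  # E3 branch
]
outcomes = list(product(["E1", "E2", "E3"], ["P1", "P2"], ["C1", "C2", "C3"]))
preferred = {"E1", "P2", "C3"}


def number_preferred(outcome):
    """The random variable: number of preferred ratings in one outcome."""
    return sum(rating in preferred for rating in outcome)


# Add up the probabilities of all outcomes that map to the same value x.
distribution = defaultdict(float)
for outcome, prob in zip(outcomes, probabilities):
    distribution[number_preferred(outcome)] += prob

for x in sorted(distribution):
    print(x, round(distribution[x], 2))
```

The point of the sketch is that the random variable is an ordinary function on the sample space, and its probabilities are obtained by summing over the outcomes it maps to each value.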
Random variables
A random variable is any function that assigns a numerical value to each possible outcome. Random variables are denoted by capital letters X, Y, and so on, to distinguish them from their possible values, which are given in lower case x, y, and so on.
Classification of random variables:
  Discrete random variables, which can take on only a finite number or a countable infinity of values.
  Continuous random variables, which can take on any value on a continuous scale (an uncountably infinite set of values).
Probability distributions

x             0      1      2      3
Probability   0.26   0.50   0.22   0.02

The table displays another function, called the probability distribution of the random variable. The probability distribution is the function that assigns a probability to each possible value x of the random variable:
$$f(x) = P(X = x)$$
To denote the values of a probability distribution, we use such symbols as f(x), g(x), h(z).
A probability distribution can be expressed by means of
  an equation, for example $f(x) = 1/6$ for $x = 1, 2, 3, 4, 5, 6$ (rolling a balanced die);
  a table, which exhibits the correspondence between the values of the random variable and their probabilities.
Probability distributions
The probability distribution of a discrete random variable X is a list of the possible values of X together with their probabilities:
$$f(x) = P[X = x]$$
Of course, not every function defined for the values of a random variable can serve as a probability distribution. A probability distribution always satisfies the conditions
$$f(x) \ge 0 \quad \text{and} \quad \sum_{\text{all } x} f(x) = 1$$
Probability distributions
Example: Checking for non-negativity and a total probability of one
Check whether the following can serve as probability distributions:
(a) $f(x) = \dfrac{x - 2}{2}$ for $x = 1, 2, 3, 4$
(b) $h(x) = \dfrac{x^2}{25}$ for $x = 0, 1, 2, 3, 4$
Solution.
(a) This function cannot serve as a probability distribution because $f(1) = -1/2$ is negative.
(b) This function also cannot serve as a probability distribution because the sum of the five probabilities is $(0 + 1 + 4 + 9 + 16)/25 = 6/5$, which is larger than 1.
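The two conditions can also be checked mechanically. The following is a small sketch, not part of the textbook solution; the helper name `can_serve_as_distribution` is hypothetical, and h is evaluated at x = 0, 1, 2, 3, 4, the five values implied by the stated sum of 6/5. Exact fractions are used so the sum is not blurred by floating-point rounding.

```python
from fractions import Fraction


def can_serve_as_distribution(f, values):
    """Return (is_valid, all_nonnegative, total) for f over the given values."""
    probs = [f(x) for x in values]
    all_nonnegative = all(p >= 0 for p in probs)
    total = sum(probs)
    return all_nonnegative and total == 1, all_nonnegative, total


def f(x):
    return Fraction(x - 2, 2)   # candidate (a)


def h(x):
    return Fraction(x * x, 25)  # candidate (b)


result_a = can_serve_as_distribution(f, range(1, 5))  # x = 1, 2, 3, 4
result_b = can_serve_as_distribution(h, range(0, 5))  # x = 0, 1, 2, 3, 4

print("(a)", result_a)
print("(b)", result_b)
```

Note that candidate (a) sums to 1 and still fails, which shows that both conditions have to be checked.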
Probability distributions
It is often helpful to visualize probability distributions by means of graphs.

x             0      1      2      3
Probability   0.26   0.50   0.22   0.02

Probability histogram: the areas of the rectangles are equal to the corresponding probabilities, so their heights are proportional to the probabilities. The bases touch, so there are no gaps between the rectangles representing successive values of the random variable.
Probability bar chart: the heights of the rectangles are also proportional to the corresponding probabilities, but the rectangles are narrow and their width is of no significance.
[Fig. 4.2: probability histogram (left) and probability bar chart (right) of the distribution above; horizontal axis x = 0, 1, 2, 3, vertical axis f(x) from 0.1 to 0.5.]
Distribution function
As we shall see later, there are many problems in which we are interested not only in the probability f(x) that the value of a random variable is x, but also in the probability F(x) that the value of a random variable is less than or equal to x. F(x) is called the cumulative distribution function, or just the distribution function, of the random variable.

x             0      1      2      3
Probability   0.26   0.50   0.22   0.02
F(x)          0.26   0.76   0.98   1.00
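The F(x) row of the table is just a running sum of the probabilities. A brief sketch, with list names chosen here for illustration:

```python
from itertools import accumulate

values = [0, 1, 2, 3]
f = [0.26, 0.50, 0.22, 0.02]  # f(x) = P(X = x)

# F(x) = P(X <= x) is the running total of f up to and including x.
F = [round(total, 2) for total in accumulate(f)]

for x, fx, Fx in zip(values, f, F):
    print(x, fx, Fx)
```

The last entry of F is always 1 for a valid probability distribution, which gives a quick consistency check on a table.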
The binomial distribution
Many statistical problems deal with situations referred to as repeated trials. Examples of repeated trials:
  the probability that 1 of 5 rivets will rupture in a tensile test;
  the probability that 9 of 10 DVD players will run at least 1000 hours;
  the probability that 45 of 300 drivers stopped at a roadblock will be wearing seat belts.
In each case we are interested in the probability of getting x successes in n trials, or, in other words, x successes and n − x failures in n attempts.
Bernoulli trials
The assumptions of Bernoulli trials:
1. There are only two possible outcomes for each trial.
2. The probability of success is the same for each trial.
3. The outcomes from different trials are independent.
In addition, a fixed number n of Bernoulli trials is conducted.
Example: Binomial probability distribution
Over a period of a few months, an engineer found that her computer would often hang up while she was doing Internet searches. She postulates that the probability is 0.1 that any half-hour search session will require at least one reboot of the computer. Next week she will perform 3 half-hour searches, each on a different day.
(a) List all possible outcomes for the 3 searches in terms of success S (no hang-up) and failure F (at least one hang-up) during each session.
(b) Find the probability distribution of the number of successes, X, among the 3 searches.
Solution:
(a) SSS SSF SFS SFF FSS FSF FFS FFF
(SSS means all three searches are successful. SSF means the first and second searches are successful and the third is a failure.)
(b) Each search is a success with probability 1 − 0.1 = 0.9, so
$$f(x) = P(X = x) = \binom{3}{x} (0.9)^x (0.1)^{3-x}, \quad \text{for } x = 0, 1, 2, 3$$
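Parts (a) and (b) can be cross-checked by brute force. The sketch below (names are illustrative) enumerates the eight outcomes, multiplies the per-session probabilities under the independence assumption, and groups the outcomes by their number of successes.

```python
from itertools import product
from math import comb, isclose

p_success = 0.9  # no hang-up during a session, i.e. 1 - 0.1

# (a) All sequences of S/F over three sessions, in the order SSS, SSF, ..., FFF.
outcomes = ["".join(seq) for seq in product("SF", repeat=3)]


def outcome_probability(outcome):
    """Probability of one specific sequence, assuming independent sessions."""
    prob = 1.0
    for result in outcome:
        prob *= p_success if result == "S" else 1 - p_success
    return prob


# (b) Sum the probabilities of all sequences with the same number of successes.
f = {x: 0.0 for x in range(4)}
for outcome in outcomes:
    f[outcome.count("S")] += outcome_probability(outcome)

for x in range(4):
    formula = comb(3, x) * 0.9**x * 0.1**(3 - x)
    print(x, round(f[x], 3), isclose(f[x], formula))
```

The binomial coefficient in the formula simply counts how many of the eight sequences contain exactly x successes.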
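Binomial distribution
Let X be the random variable that equals the number of successes in n trials. If p and 1 − p are the probabilities of success and failure on any one trial, then the probability of getting x successes and n − x failures in some specific order is
$$p^x (1 - p)^{n-x}$$
There are $\binom{n}{x}$ such orders, which gives the binomial distribution
$$b(x; n, p) = \binom{n}{x} p^x (1 - p)^{n-x}, \quad x = 0, 1, 2, \ldots, n$$
The name comes from the binomial expansion of $[p + (1 - p)]^n$, whose successive terms are these probabilities; the quantities $\binom{n}{x}$ are the binomial coefficients.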
Binomial distribution
Example: Evaluating binomial probabilities
It has been claimed that in 60% of all solar-heat installations the utility bill is reduced by at least one third. Accordingly, what are the probabilities that the utility bill will be reduced by at least one third in (a) four of five installations; (b) at least four of five installations?
Solution.
(a) $$b(4; 5, 0.6) = \binom{5}{4} 0.6^4 (1 - 0.6)^{5-4} = 0.259$$
(b) $$b(5; 5, 0.6) = \binom{5}{5} 0.6^5 (1 - 0.6)^{5-5} = 0.078$$
and the answer is the sum of the two terms: 0.259 + 0.078 = 0.337.
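The arithmetic in this example can be reproduced with a few lines. This is a sketch only; the function name `binomial_pmf` is chosen here and is not notation from the textbook.

```python
from math import comb


def binomial_pmf(x, n, p):
    """b(x; n, p): probability of exactly x successes in n Bernoulli trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


exactly_four = binomial_pmf(4, 5, 0.6)
at_least_four = binomial_pmf(4, 5, 0.6) + binomial_pmf(5, 5, 0.6)

print(round(exactly_four, 3))
print(round(at_least_four, 3))
```

"At least four" is handled by adding the probabilities of the mutually exclusive events "exactly four" and "exactly five".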
Binomial distribution
Table 1 (at the end of the book) gives the values of the cumulative probabilities
$$B(x; n, p) = \sum_{k=0}^{x} b(k; n, p), \quad \text{for } x = 0, 1, 2, \ldots, n$$
Relationship: individual probabilities are recovered from
$$b(x; n, p) = B(x; n, p) - B(x - 1; n, p), \quad \text{with } B(-1) = 0$$
Example
If the probability is 0.05 that a certain wide-flange column will fail under a given axial load, what are the probabilities that among 16 such columns (a) at most two will fail; (b) at least four will fail?
Solution
(a) Table 1 shows that B(2; 16, 0.05) = 0.9571.
(b) "At least four" is the complement of "at most three", so the probability is 1 − B(3; 16, 0.05) = 1 − 0.9930 = 0.0070.
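When Table 1 is not at hand, the cumulative probabilities can be computed by summing individual terms. A hedged sketch follows (the function name `binomial_cdf` is an assumption made here). For part (b) it uses the complement of "at most three", that is, 1 − B(3; 16, 0.05).

```python
from math import comb


def binomial_cdf(x, n, p):
    """B(x; n, p) = sum of b(k; n, p) for k = 0..x; an empty sum gives B(-1) = 0."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


at_most_two = binomial_cdf(2, 16, 0.05)
at_least_four = 1 - binomial_cdf(3, 16, 0.05)  # complement of "at most three"

print(round(at_most_two, 4))
print(round(at_least_four, 4))
```

The same function also gives single probabilities through b(x; n, p) = B(x; n, p) − B(x − 1; n, p).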
Binomial distribution
Example
If the probability is 0.2 that any one person will dislike the taste of a new toothpaste, what is the probability that 5 of 18 randomly selected persons will dislike it?
Solution
$$b(5; 18, 0.2) = \binom{18}{5} 0.2^5 (1 - 0.2)^{18-5}$$
Using Table 1,
$$b(5; 18, 0.2) = B(5; 18, 0.2) - B(4; 18, 0.2) = 0.8671 - 0.7164 = 0.1507$$
Example: A binomial probability to guide decision making
A manufacturer of fax machines claims that only 10% of his machines require repairs within the warranty period of 12 months. If 5 of 20 of his machines required repairs within the first year, does this tend to support or refute the claim?
Solution: Let us find the probability that 5 of 20 of the fax machines will require repairs within a year when the probability that any one will require repairs within a year is 0.10:
$$b(5; 20, 0.10) = B(5; 20, 0.10) - B(4; 20, 0.10) = 0.9887 - 0.9568 = 0.0319$$
This probability is very small, yet the event that 5 of 20 machines required repairs actually occurred. An event that has been observed should not be this improbable if the claim were true, so the result tends to refute the fax machine manufacturer's claim.
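The probability used in this argument can be computed directly rather than from Table 1. The sketch below also reports the probability of five or more repairs, which is an addition made here for context and not part of the slide's solution; all names are illustrative.

```python
from math import comb

n, p, observed = 20, 0.10, 5  # 20 machines, claimed repair rate, observed repairs


def b(x):
    """b(x; 20, 0.10): probability that exactly x of the 20 machines need repairs."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


prob_exactly_observed = b(observed)
# Extra quantity, not in the original solution: P(X >= 5) = 1 - B(4; 20, 0.10).
prob_observed_or_more = 1 - sum(b(k) for k in range(observed))

print(round(prob_exactly_observed, 4))
print(round(prob_observed_or_more, 4))
```

Both numbers are small under the manufacturer's claim, which is what makes the observed result hard to reconcile with a 10% repair rate.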
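Shape of binomial distributions
Important information about the shape of binomial distributions can be shown in probability histograms.
Symmetrical (p = 0.5): in this case
$$b(x; n, 0.5) = \binom{n}{x} 0.5^n$$
and since $\binom{n}{n-x} = \binom{n}{x}$, it follows that
$$b(x; n, 0.5) = b(n - x; n, 0.5)$$
[Fig. 4.3: probability histogram of b(x; 5, 0.5) for x = 0, 1, ..., 5; the vertical axis is marked at 1/32, 5/32, and 10/32.]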
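The heights 1/32, 5/32, and 10/32 in Fig. 4.3 and the symmetry about the center can be confirmed exactly. This is a short sketch with names chosen for illustration:

```python
from fractions import Fraction
from math import comb

n = 5
half = Fraction(1, 2)

# b(x; 5, 0.5) = C(5, x) * 0.5**5, kept as exact fractions.
pmf = [comb(n, x) * half**n for x in range(n + 1)]

for x, prob in enumerate(pmf):
    print(x, prob)

# Symmetry: b(x; n, 0.5) equals b(n - x; n, 0.5) for every x.
is_symmetric = all(pmf[x] == pmf[n - x] for x in range(n + 1))
print(is_symmetric)
```

Fractions are printed in lowest terms, so 10/32 appears as 5/16.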
Shape of binomial distributions
Long-tailed or skewed (p ≠ 0.5)
A probability distribution that has a probability histogram like either of those in Figure 4.4 is said to be a long-tailed or skewed distribution. It is said to be positively skewed if the tail is on the right, and negatively skewed if the tail is on the left.
[Fig. 4.4: probability histograms of b(x; 5, 0.2) (p less than 0.5, positively skewed) and b(x; 5, 0.8) (p greater than 0.5, negatively skewed); horizontal axis x = 0, 1, ..., 5, vertical axis from 0.1 to 0.5.]
The hypergeometric distribution
Suppose that we are interested in the number of defectives in a sample of n units drawn without replacement from a lot containing N units, of which a are defective. Let the sample be drawn in such a way that at each successive drawing, whatever units are left in the lot have the same chance of being selected.
The probability that the first drawing will yield a defective unit is a/N, but for the second drawing it is (a − 1)/(N − 1) or a/(N − 1), depending on whether or not the first unit drawn was defective. Thus the trials are not independent, and the binomial distribution does not apply.
Note that the binomial distribution would apply if we sampled with replacement.
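Hypergeometric distribution
Consider a lot of N units, of which a are defective, and a sample of n units drawn without replacement.
  The x successes (defectives) can be chosen in $\binom{a}{x}$ ways.
  The n − x failures (nondefectives) can be chosen in $\binom{N-a}{n-x}$ ways.
  Hence x successes and n − x failures can be chosen in $\binom{a}{x}\binom{N-a}{n-x}$ ways.
  n objects can be chosen from a set of N objects in $\binom{N}{n}$ ways.
Considering all these possibilities as equally likely, it follows that for sampling without replacement the probability of getting x successes (defectives) in n trials is
$$h(x; n, a, N) = \frac{\binom{a}{x}\binom{N-a}{n-x}}{\binom{N}{n}}, \quad \text{for } x = 0, 1, \ldots, a$$
where x cannot exceed n and n − x cannot exceed N − a.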
Example
A shipment of 20 digital voice recorders contains 5 that are defective. If 10 of them are randomly chosen for inspection, what is the probability that 2 of the 10 will be defective?
Solution: substituting N = 20, a = 5, x = 2, and n = 10 into the formula for the hypergeometric distribution,
$$h(2; 10, 5, 20) = \frac{\binom{5}{2}\binom{20-5}{10-2}}{\binom{20}{10}} = 0.348$$
However, when n is small compared to N (less than N/10), the composition of the lot is not seriously affected by drawing the sample without replacement, and the binomial distribution with the parameters n and p = a/N will yield a good approximation.
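The value 0.348 follows from three binomial coefficients. A minimal sketch, with the function name `hypergeometric_pmf` chosen here for illustration:

```python
from math import comb


def hypergeometric_pmf(x, n, a, N):
    """h(x; n, a, N): x defectives in a sample of n drawn without replacement
    from a lot of N units that contains a defectives."""
    return comb(a, x) * comb(N - a, n - x) / comb(N, n)


# Shipment of N = 20 recorders, a = 5 defective, sample of n = 10, x = 2 defective.
h = hypergeometric_pmf(2, 10, 5, 20)
print(comb(5, 2), comb(15, 8), comb(20, 10))
print(round(h, 3))
```

Here the sample is half of the lot (n = 10, N = 20), so the rule of thumb n < N/10 is far from satisfied and the exact hypergeometric formula is the appropriate one.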
Example: A numerical comparison of the hypergeometric and binomial distributions
Repeat the preceding example for a lot of 100 digital voice recorders, of which 25 are defective, that is, find the probability that 2 of the 10 will be defective, by using
(a) the formula for the hypergeometric distribution;
(b) the formula for the binomial distribution as an approximation.
Solution
(a) Here x = 2, a = 25, n = 10, and N = 100:
$$h(2; 10, 25, 100) = \frac{\binom{25}{2}\binom{100-25}{10-2}}{\binom{100}{10}} = \frac{\binom{25}{2}\binom{75}{8}}{\binom{100}{10}} = 0.292$$
(b) Here p = 25/100 = 0.25, x = 2, and n = 10:
$$b(2; 10, 0.25) = \binom{10}{2} 0.25^2 (1 - 0.25)^{10-2} = 0.282$$
Observe that the difference between the two values is only 0.01.
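The two values in this comparison can be reproduced side by side. The following is a sketch under the example's own parameters; the function names are assumptions made here.

```python
from math import comb


def hypergeometric_pmf(x, n, a, N):
    """Exact probability when sampling without replacement."""
    return comb(a, x) * comb(N - a, n - x) / comb(N, n)


def binomial_pmf(x, n, p):
    """Approximation that treats the n draws as independent with p = a / N."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


exact = hypergeometric_pmf(2, 10, 25, 100)
approx = binomial_pmf(2, 10, 25 / 100)

print(round(exact, 3), round(approx, 3), round(exact - approx, 3))
```

Here n = 10 equals N/10, the boundary case of the rule of thumb, and the two results already agree to within about 0.01.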
Hypergeometric and binomial distributions
In general, it can be shown that h(x; n, a, N) approaches b(x; n, p) with p = a/N as N approaches infinity. The binomial distribution is used as an approximation to the hypergeometric distribution if n ≤ N/10, which is convenient because the binomial probabilities can then be looked up in Table 1.
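The limiting statement can be illustrated numerically by holding p = a/N fixed at 0.25 and letting the lot size grow. This is only a numerical illustration, not a proof, and the lot sizes 100, 1000, and 10000 are arbitrary choices made here.

```python
from math import comb


def hypergeometric_pmf(x, n, a, N):
    return comb(a, x) * comb(N - a, n - x) / comb(N, n)


def binomial_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p) ** (n - x)


x, n, p = 2, 10, 0.25
gaps = []
for N in (100, 1000, 10000):
    a = int(p * N)  # number of defectives, keeping a / N = 0.25
    gap = abs(hypergeometric_pmf(x, n, a, N) - binomial_pmf(x, n, p))
    gaps.append(gap)
    print(N, a, round(gap, 5))
```

The gap shrinks as N grows relative to the fixed sample size n = 10, which is the behavior the rule n ≤ N/10 relies on.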
Next lecture and homework
In the next lecture, Sections 4.4-4.5 of Chapter 4 in the textbook will be discussed. Please read them in advance.
Homework (page 112): 4.3, 4.5, 4.6, 4.12, 4.20, 4.25