Data Analysis and Statistical Methods
Statistics 651
http://www.stat.tamu.edu/~suhasini/teaching.html
Suhasini Subba Rao

The binomial: mean and variance

Recall that the number of successes out of n trials, denoted S_n, is a random variable taking values in {0, 1, ..., n} (e.g. S_4 is the number of successes out of 4 trials and has the outcomes {0, 1, 2, 3, 4}). S_n has all the properties of a random variable: we can associate a probability with each outcome (the binomial distribution) and it has a probability plot. Since it has a probability plot, it must have a center and a spread, and therefore it has a mean and a variance.

The mean of a binomial is nπ. The variance of a binomial is nπ(1 − π).

Example 1

Suppose we make 4 independent trials, each of which can take the value {0, 1}. The probability of a success is P(X = 1) = 0.4 and the probability of a failure is P(X = 0) = 0.6. We are interested in the number of successes out of 4, which is the random variable S_4. Using the arguments we gave earlier we can show that:

P(S_4 = 0) = (0.6)^4,  P(S_4 = 1) = 4 × (0.6)^3 × (0.4)
P(S_4 = 2) = 6 × (0.6)^2 × (0.4)^2,  P(S_4 = 3) = 4 × (0.6)^1 × (0.4)^3
P(S_4 = 4) = (0.4)^4

Hence we can plot the histogram, which has a center and a spread. The mean of S_4 is 4 × 0.4 = 1.6 and the variance is 4 × 0.4 × 0.6 = 0.96.

Example 2

Suppose we make 50 independent trials, each of which can take the value {0, 1}. The probability of a success is P(X = 1) = 0.5 and the probability of a failure is P(X = 0) = 0.5. We are interested in S_50 (the number of successes out of 50). The average number of successes is 50 × 0.5 = 25. Because S_50 is a random variable, it has a histogram (distribution) and thus a variance (a measure of spread). Its variance is 50 × 0.5 × 0.5 = 12.5; this measures how spread out the distribution is about the mean. Make a rough sketch of the histogram. We see that it is symmetric about 25.
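The probabilities and moments in Example 1 can be checked numerically. Below is a short Python sketch (Python is an assumption here; the course does not prescribe a language, and `binomial_pmf` is simply a helper name chosen for illustration). It builds the distribution of S_4 and computes the mean and variance directly from the probabilities:

```python
from math import comb

def binomial_pmf(n, k, p):
    """P(S_n = k): probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 4, 0.4
probs = [binomial_pmf(n, k, p) for k in range(n + 1)]
print(probs)  # matches (0.6)^4, 4(0.6)^3(0.4), 6(0.6)^2(0.4)^2, ...

# Mean and variance computed directly from the distribution
mean = sum(k * q for k, q in enumerate(probs))
var = sum((k - mean) ** 2 * q for k, q in enumerate(probs))
print(mean, var)  # approximately 1.6 and 0.96, i.e. n*pi and n*pi*(1 - pi)
```

The printed mean and variance agree with the formulas nπ = 1.6 and nπ(1 − π) = 0.96, computed here without using the formulas.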
Example 3

Suppose we make 50 independent trials, each of which can take the value {0, 1}. The probability of a success is P(X = 1) = 0.8 and the probability of a failure is P(X = 0) = 0.2. We are interested in S_50 (the number of successes out of 50). The average number of successes is 50 × 0.8 = 40. Its variance is 50 × 0.8 × 0.2 = 8. Observe that the variance is smaller than in the previous example (same number of people, just different probabilities): the distribution is more concentrated about its mean of 40. Make a rough sketch of the histogram. The distribution is not symmetric, but it is close to symmetric locally about 40.

Observations on the binomial distribution

We showed that if P(X = 1) = 0.8 and P(X = 0) = 0.2, then for n = 4 the mean is 4 × 0.8 (the variance is 4 × 0.2 × 0.8) and the histogram leans towards the right. This means we were more likely to observe large values of S_n (in terms of surveys, this means a lot of people say yes).

On the other hand, if P(X = 1) = 0.2 and P(X = 0) = 0.8, then for n = 4 the mean is 4 × 0.2 (the variance is again 4 × 0.2 × 0.8) and the histogram leans towards the left. This means we were more likely to observe small values of S_n (in terms of surveys, this means a lot of people say no).

If P(X = 1) = P(X = 0) = 1/2, then for n = 4 the mean is 4 × 0.5 = 2 and we are most likely to observe values in the middle of the interval [0, 4]. This time the histogram is symmetric (about 2).

Now suppose the number of people we sample increases (we go from n = 4 to n = 100). The above observations remain true, but what we observe is that around the peak of the histogram there is symmetry (regardless of whether there is symmetry overall). In other words, regardless of the overall skew, the histogram is close to symmetric about its peak and, as we shall demonstrate, is almost normal (as in the normal distribution).
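The three cases above (π = 0.2, 0.5, 0.8 with n = 4) can be visualised with a crude text histogram. The Python sketch below (an illustration, not part of the course material; `pmf` is a helper name chosen here) prints the probabilities for each π so you can see the histogram lean left, sit symmetric, or lean right:

```python
from math import comb

def pmf(n, p):
    """Binomial probabilities P(S_n = k) for k = 0, ..., n."""
    return [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

n = 4
for p in (0.2, 0.5, 0.8):
    print(f"pi = {p}: mean = {n * p:.1f}, variance = {n * p * (1 - p):.2f}")
    for k, q in enumerate(pmf(n, p)):
        # crude text histogram: one '#' per 2% of probability
        print(f"  P(S_{n} = {k}) = {q:.4f} {'#' * round(50 * q)}")
```

For π = 0.2 the mass piles up at small k, for π = 0.8 at large k, and for π = 0.5 the histogram is symmetric about k = 2, exactly as described above.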
Approximations of the binomial distribution

Suppose that the number of trials, n, is quite large and we want to evaluate the probability that 20 or fewer people out of 100 preferred apple juice to orange juice. This means calculating the probability

P(S_100 ≤ 20) = P(S_100 = 0) + P(S_100 = 1) + ... + P(S_100 = 20).

Calculating this is cumbersome! We would like to have a quick and dirty way of approximating this probability.

Look at the handout approximation binomial lecture7.pdf to see what happens if n (the number of trials) is large and the probabilities π and 1 − π are not too small. We see that if n is quite large and π and 1 − π are not too small, the distribution of S_n looks rather like a bell shape.
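To appreciate why the sum is cumbersome by hand, here is the exact 21-term computation in Python (an illustration only; the example does not state π, so π = 0.5 is an assumed value, and `binom_cdf` is a helper name chosen here):

```python
from math import comb

def binom_cdf(n, x, p):
    """P(S_n <= x): sum the binomial probabilities P(S_n = k) for k = 0, ..., x."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(x + 1))

# The 21-term sum P(S_100 <= 20); pi = 0.5 is assumed for illustration.
print(binom_cdf(100, 20, 0.5))
```

With π = 0.5 the answer is tiny (20 successes is 6 standard deviations below the mean of 50), which is precisely the kind of tail probability the normal approximation in the next section estimates quickly.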
But we cannot say that the distribution curves get closer and closer to a fixed curve, because as n grows the mean of S_n gets larger (recall that in this example the mean is n/2) and the variance also grows (recall the variance is n/4). So the distribution curve keeps moving to the right (because the mean is moving to the right), and because the variance is getting larger (notice that the range of S_n is [0, n]) the distribution is getting more and more stretched (look at the plots).

To stop the distributions from shifting and stretching, we transform the x-axis (normalise) but keep the probabilities almost the same as before (in fact they need to be multiplied by the standard deviation √(n/4); see the example of Y_4 in the handout). Subtracting the mean shifts the range from [0, n] (centered at n/2) to [−n/2, n/2] (centered at 0). Dividing by the standard deviation squashes the range from [−n/2, n/2] to [−√n, √n]. Indeed, you will see that most of the Y_n's lie in the interval [−3, 3]. This leads to the normalisation

Y_n = (S_n − n/2) / √(n/4),

which is discussed in approximation binomial lecture7.pdf.

In the general case, where the success and failure probabilities need not be equal and the probability of a success is π, we have the normalisation

Y_n = (S_n − nπ) / √(nπ(1 − π)).

We normalise (or standardise) the distribution by subtracting the mean from S_n (this centers it about zero) and squashing it (stopping it spreading out) by dividing by the standard deviation.

The distribution of Y_n = (S_n − nπ)/√(nπ(1 − π))

Suppose we plot the distribution of Y_n against the probabilities (like the plots in approximation binomial lecture7.pdf). What we see is: when n is large, the plots have a very distinctive bell shape.

(i) It is centered about zero and about 68% of the Y_n's lie in the interval [−1, 1].
(ii) It closely approximates the standard normal distribution (which we define below).

Aside: In the plots you need to multiply the probabilities by the standard deviation and plot this value against Y_n, but don't worry too much about this.
Aside: convergence of the distributions

What is convergence? Suppose I walk one mile on day one, 1/2 a mile on day two (in total I have walked 1.5 miles), 1/4 mile on day three (in total 1.75 miles), 1/8 mile on day four (in total 1.875 miles), 1/16 mile on day five (in total 1.9375 miles), etc. As the days pass, the total distance travelled does not change much, and it gets closer and closer to two. We say that the total distance travelled converges to two.

The same idea is true for the plots of Y_n = (S_n − nπ)/√(nπ(1 − π)) against the probabilities. As n gets large, the density plots do not change very much, and in the limit they converge to the normal distribution.

The normal distribution

We often find that the distributions of random variables that arise in nature have a distinctive shape. This distinctive bell-shaped curve is called a normal distribution. It arises all over the place:

The distribution of bullets when fired at a target.
The outcomes of social surveys.

The normal distribution is a family of densities which are different but have certain characteristics in common. The normal distribution (sometimes called the Gaussian) is the most commonly used distribution in statistics.

The normal distribution (cont.)

It is completely defined by two parameters, the mean and the variance:

The mean µ.
The variance σ².

Formally, the density function of the normal distribution is

f(x) = (1 / √(2πσ²)) exp(−(x − µ)² / (2σ²))

(you don't have to remember this!). This is a symmetric curve which is centered about µ and has spread σ. See handout: normal distribution introduction.pdf.

The standard normal - page 1090 of Longnecker and Ott

The normal tables give the probabilities P(Z < z) in the special case Z ~ N(0, 1) (the so-called standard normal): the mean is zero (µ = 0) and the variance is one (σ² = 1). Look at the normal tables. Suppose we want to use them to evaluate P(Z < b).
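Both ideas on this slide, the converging partial sums of the walk and the normal density formula, are easy to play with numerically. A short Python sketch (illustration only; `normal_density` is a helper name chosen here):

```python
from math import exp, pi, sqrt

# The walking example: partial sums 1 + 1/2 + 1/4 + ... get ever closer to 2.
total = 0.0
for day in range(1, 11):
    total += 0.5 ** (day - 1)
print(total)  # after 10 days: 1.998..., converging to 2

def normal_density(x, mu, sigma2):
    """The density f(x) = (1/sqrt(2*pi*sigma2)) * exp(-(x - mu)^2 / (2*sigma2))."""
    return (1 / sqrt(2 * pi * sigma2)) * exp(-(x - mu) ** 2 / (2 * sigma2))

# Symmetric about the mean: equal heights one unit either side of mu = 0
print(normal_density(-1.0, 0.0, 1.0), normal_density(1.0, 0.0, 1.0))
print(normal_density(0.0, 0.0, 1.0))  # peak height 1/sqrt(2*pi), about 0.3989
```

Evaluating the density at a grid of x values and sketching the heights reproduces the bell curve; changing µ slides it along the axis and changing σ² widens or narrows it.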
The two sides of the table together give b; the inside of the table yields the probability P(Z < b). Suppose we want to evaluate P(Z ≤ 1.23). Since 1.23 = 1.2 + 0.03, the first column gives the 1.2 value and the first row gives the 0.03 value. We find the 1.2 and 0.03 values and locate the value inside the table where this row and column intersect.
This intersection point is the probability; that is, P(Z ≤ 1.23) = 0.8907.

Examples - standard normal

[Figure: the standard normal density, with the area to the left of b shaded to represent P(Z < b).] The area under the graph is the probability, which corresponds to the value given in the table.

(a) Evaluate P(0.6 < Z ≤ 1.3).
(b) (i) P(Z ≤ 1.1), (ii) P(Z ≤ 0.6), (iii) P(Z ≤ 3.0), (iv) P(Z ≤ 2.12).
(c) How to interpret P(Z ≤ 1.1) and P(Z ≤ 3.0)?
(d) (i) P(Z > 1.1), (ii) P(Z > 0.6), (iii) P(Z > 3.0), (iv) P(Z > 2.12).
(e) (i) P(−1.1 < Z ≤ 0.6), (ii) P(−2.12 < Z ≤ 3.0), (iii) P(−2.12 < Z ≤ 0).

Look at the handout standard normal tables.pdf for the solutions.
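The table lookups can be checked numerically: the standard normal probability P(Z ≤ z) can be written in terms of the error function, which Python's standard library provides. A short sketch (an illustration, not a substitute for practising the tables; `Phi` is a helper name chosen here):

```python
from math import erf, sqrt

def Phi(z):
    """P(Z <= z) for the standard normal Z, via the error function erf."""
    return 0.5 * (1 + erf(z / sqrt(2)))

print(round(Phi(1.23), 4))            # 0.8907, the table entry located above
print(round(Phi(1.3) - Phi(0.6), 4))  # P(0.6 < Z <= 1.3), example (a)
print(round(1 - Phi(1.1), 4))         # P(Z > 1.1): complement of a table value
```

Note the two patterns used in the exercises: an interval probability is a difference of two table values, and a right-tail probability P(Z > b) is 1 minus a table value.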