Statistics 431
Spring 2007
P. Shaman

The Binomial Distribution

Preliminaries

A binomial experiment is defined by the following conditions: A sequence of n trials is conducted, with each trial having two possible outcomes (e.g., 0 or 1, failure or success, healthy or sick). The count n is specified a priori. The probability of success, denoted by p, is constant for the sequence of trials. The trials are mutually independent.

A binomial random variable X counts the number of successes among the n trials. The probability mass function of X can be derived from the above conditions. It is given by

P(X = x) = C(n, x) p^x (1 − p)^(n − x),  x = 0, 1, …, n,

where C(n, x) = n!/[x!(n − x)!] is the binomial coefficient. One can use the binomial theorem to verify that these probabilities add to 1. The mean of the distribution is np and the variance is np(1 − p).

A small table of cumulative binomial probabilities is given in Devore (Table A.1). In general, binomial probabilities are easily calculated with JMP. For example, suppose n = 60 and p = 0.8. Form a data table with 61 rows and two columns. Label the first column x. Via the Formula Editor (right click at the top of the column and choose Formula for access) select Row/Row and then subtract 1. Call the second column Prob and in the Formula Editor choose Probability/Binomial Probability (0.8, 60, x) (for the third argument just click on x in the list of Table Columns). To obtain cumulative probabilities form a third column. In the Formula Editor choose Probability/Binomial Distribution (0.8, 60, x).

The Normal Distribution

A random variable X has a normal distribution with mean µ and standard deviation σ if it has the probability density function

f(x; µ, σ) = (1/(σ√(2π))) exp(−(x − µ)²/(2σ²)),  −∞ < x < ∞.

The ranges of the parameters are −∞ < µ < ∞ and 0 < σ < ∞. As the parameters vary over these ranges, we obtain the family of normal distributions. Note that, in fact, the parameters of a normal distribution are its mean and standard deviation. It is common to
denote this distribution by N(µ, σ), or by N(µ, σ²). The density function is symmetric about µ and has points of inflection located at µ ± σ. Another name for the normal distribution is the Gaussian distribution. The normal density is popularly known as the bell-shaped curve.

The standard normal is the distribution from the family with mean 0 and standard deviation 1, that is, N(0, 1). If X is N(µ, σ), then Z = (X − µ)/σ is N(0, 1). This result allows us to use a table for the N(0, 1) distribution to calculate probabilities for any normal distribution. For example, if X is N(µ, σ),

P(a ≤ X ≤ b) = P((a − µ)/σ ≤ (X − µ)/σ ≤ (b − µ)/σ) = P((a − µ)/σ ≤ Z ≤ (b − µ)/σ).

The value on the horizontal axis to the right of which the area under the N(0, 1) density is α is denoted by z_α. See the picture on page 64 of Devore. Note that we may write P(Z ≥ z_α) = α. The value z_α is called the 100(1 − α)th percentile (or quantile) of the distribution. For example, the 90th percentile is 1.28, the 95th percentile is 1.645, and the 99th percentile is 2.33.

Almost all of the area under a normal density curve falls within three standard deviations of the mean. In particular, the area is 0.6826 within one standard deviation of the mean, 0.9544 within two standard deviations of the mean, and 0.9973 within three standard deviations of the mean.

Devore's table of the cumulative standard normal distribution is inside the front cover, and is also given as Table A.3. In the JMP Formula Editor values of the standard normal density are given by Probability/Normal Density. The default is the standard normal density. To obtain values for any density from the normal family, click twice on the caret in the keypad and insert values for the mean and standard deviation. Values of the cumulative standard normal distribution are given by Probability/Normal Distribution, and the caret may be used for any mean and standard deviation. The percentiles of the standard normal distribution are available from Probability/Normal Quantile, and the caret option can be used.
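For readers without JMP handy, the same quantities can be computed directly. Here is a minimal Python sketch (standard library only; math.comb requires Python 3.8+) that builds the binomial table for n = 60, p = 0.8 described above and checks the standard normal facts just stated:

```python
import math

def binom_pmf(x, n, p):
    # P(X = x) = C(n, x) p^x (1 - p)^(n - x)
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

# The table built in JMP: x, P(X = x), and the cumulative P(X <= x).
n, p = 60, 0.8
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]
cdf = [sum(pmf[:x + 1]) for x in range(n + 1)]

mean = sum(x * q for x, q in enumerate(pmf))               # np = 48
var = sum((x - mean) ** 2 * q for x, q in enumerate(pmf))  # np(1 - p) = 9.6

def phi(z):
    # Cumulative standard normal distribution via the error function.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def z_quantile(q, lo=-10.0, hi=10.0):
    # Percentile of N(0, 1), found by bisection on phi.
    for _ in range(100):
        mid = (lo + hi) / 2
        if phi(mid) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Area within 1, 2, 3 standard deviations of the mean: ~0.6826, 0.9544, 0.9973.
areas = [phi(k) - phi(-k) for k in (1, 2, 3)]
# 90th, 95th, 99th percentiles: ~1.28, 1.645, 2.33.
percentiles = [z_quantile(q) for q in (0.90, 0.95, 0.99)]
```

The bisection in z_quantile plays the role of JMP's Normal Quantile function; any root-finding method would do.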
The Central Limit Theorem

Let X_1, …, X_n be independent random variables, all from the N(µ, σ²) distribution. This is just a random sample from the indicated normal distribution. Define the sample mean and the sample sum, respectively, by
X̄ = (1/n)(X_1 + ⋯ + X_n),  T = X_1 + ⋯ + X_n.

The probability distribution of the sample mean is N(µ, σ²/n), and the distribution of the sample sum is N(nµ, nσ²). It is a remarkable result in probability theory that the above properties hold approximately for sufficiently large n, no matter what the distribution of the X_i's is (it can be a continuous distribution or a discrete distribution). This result is known as the central limit theorem. For a statement of the theorem, see page 9 of Devore (he gives the theorem for both the sample mean and the sample sum).

How large is sufficiently large? The answer depends on the shape of the distribution of the X_i's. If this distribution is close to the normal, then a relatively small value of n will allow for a very good approximation. For X_i distributions with shape very different from the normal, a rather large value of n will be needed.

An example of application of the central limit theorem is given by the normal approximation to the binomial distribution. For this example we consider the sample sum T, with the distribution of each X_i given by

P(X_i = 1) = p,  P(X_i = 0) = 1 − p.

That is, for this example T has a binomial distribution with parameters n and p. To approximate binomial probabilities, one uses the normal density with a 1/2 correction factor to account for the fact that the binomial distribution is discrete and the normal distribution is continuous. The approximation is quite good if np and n(1 − p) are both at least 10. For details see the last part of Section 4.3 of Devore.

Here's another example. Let X_1, …, X_n be a random sample from the Uniform(0, 1) distribution and consider the sample sum T = X_1 + ⋯ + X_n. (I'll discuss the exact distribution of T in class.) Let's see how well the normal distribution describes the distribution of T for several small values of n. We'll use JMP to simulate uniform data and then look at normal quantile plots (see the class notes on Probability Plots) of T.
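Before turning to the simulation, the normal approximation to the binomial can be checked numerically. A minimal Python sketch (the particular choice n = 50, p = 0.4 is mine, chosen so that np = 20 and n(1 − p) = 30 both exceed 10):

```python
import math

def binom_cdf(k, n, p):
    # Exact P(X <= k) for X ~ Binomial(n, p).
    return sum(math.comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k + 1))

def phi(z):
    # Cumulative standard normal distribution.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, p = 50, 0.4
mu = n * p                        # mean of T
sd = math.sqrt(n * p * (1 - p))   # standard deviation of T

k = 25
exact = binom_cdf(k, n, p)
# The 1/2 continuity correction: the discrete event {X <= 25} becomes
# the interval (-infinity, 25.5] for the approximating normal.
approx = phi((k + 0.5 - mu) / sd)
print(round(exact, 4), round(approx, 4))
```

With the correction the two values agree to roughly two decimal places; omitting the 1/2 makes the approximation noticeably worse.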
Note that the mean and variance of the Uniform(0, 1) distribution are 1/2 and 1/12, respectively. Let's generate 100 values of T for n = 3, 5, and 7; call these values T_3, T_5, and T_7, respectively. First we form seven columns of 100 values each from the uniform distribution, via Random/Random Uniform in the JMP Formula Editor. Then sum three, five, and seven of the columns. The simulated results are in Uniformsums.jmp. The histograms and normal quantile plots are shown on the next page. The plots show that T_3, T_5, and T_7 have very good agreement with the normal distribution for this sample size (100), even for n as small as 3 (T_3). For comparison, the histogram and normal quantile plot for the first uniform column is also given.
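The JMP simulation above can be mirrored in a few lines of Python. This sketch draws the same uniform sums and compares their sample mean and standard deviation with the values the central limit theorem predicts, N(n/2, n/12) (the seed is an arbitrary choice for reproducibility):

```python
import math
import random

random.seed(2007)

def simulate_T(n, N):
    # N simulated values of T = X_1 + ... + X_n, with X_i ~ Uniform(0, 1).
    return [sum(random.random() for _ in range(n)) for _ in range(N)]

N = 100  # number of simulated values of each sum, as in the JMP example
for n in (3, 5, 7):
    T = simulate_T(n, N)
    m = sum(T) / N
    s = math.sqrt(sum((t - m) ** 2 for t in T) / (N - 1))
    # CLT: T is approximately N(n/2, n/12), since E(X_i) = 1/2, Var(X_i) = 1/12.
    print(f"n={n}: mean {m:.3f} (theory {n/2}), sd {s:.3f} "
          f"(theory {math.sqrt(n/12):.3f})")
```

Normal quantile plots of the simulated T values (JMP's Analyze/Distribution, or any plotting library) then show the agreement in shape, not just in mean and spread.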
[Figure: histograms and normal quantile plots for U (the first uniform column), T_3, and T_5, each based on N = 100 simulated values.]
[Figure: histogram and normal quantile plot for T_7, based on N = 100 simulated values.]

The sample size for these examples, 100, is in fact somewhat small, and this is partly why the plots suggest the data are normal. Often a given data set will resemble a sample from a normal distribution in the middle of the range of the data, and the question whether normality prevails is resolved more definitively by examining the tails. For a sample size of only 100 with the random variables T_3, T_5, and T_7, we are unlikely to detect nonnormality in the tails, because we encounter very few tail observations. Let's increase the sample size to 10,000. Here is the plot for T_3:

[Figure: histogram and normal quantile plot for T_3, based on N = 10,000 simulated values.]

Normality does fail in the tails. I'll show the 10,000-point plots for T_5 and T_7 in class; for these the failure is less pronounced.

Further, we should note that it is because of the central limit theorem that the normal distribution is so widely applicable in the description of real data. Many phenomena arise via a summation or averaging process. This is commonly the case with measurement error. Anthropometric measurements, as another example, tend to be normally distributed.
Estimation and the Standard Error

Suppose we are given a random sample of n observations, x_1, …, x_n, drawn from some population. These data points are typically viewed as observed values of random variables X_1, …, X_n, assumed to be independent and with the same distribution. Further, assume that this distribution is characterized by an unknown parameter θ. For example, the observations may be the outcomes of the n trials of a binomial experiment, in which case we may take θ = p. Or the observations may be n independent draws from the N(µ, σ²) distribution, in which case θ = (µ, σ). In statistical inference we use the data values to draw conclusions about the values of the parameter, and ultimately to answer questions about the environment in which the data were collected. Two major topics in inference are estimation and hypothesis testing.

In Chapter 6 Devore discusses estimation. In Section 6.1 he defines point estimation and discusses properties of some estimators. For example, some estimators are unbiased, and for some purposes an unbiased estimator is desirable. For a given sample of data, one hopes that a point estimate of the parameter in question is close to the target (the parameter) it is trying to estimate.

If we draw repeated samples of data and from each sample form an estimate of the parameter, we can construct a histogram of the repeated estimates. This histogram estimates the probability distribution (actually, the density or the mass function) of the estimator itself. The spread of this distribution of point estimates is a measure of the precision of the estimator. It is common to use the standard deviation of the estimator to assess its precision. If the standard deviation is small, then it is highly probable that an estimate constructed from data will be close to its target value.
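The repeated-sampling idea can be made concrete with a small simulation. In this sketch the specifics (a binomial experiment with n = 100, p = 0.3, and 2000 repeated samples) are illustrative choices of mine, not from the notes; each repetition estimates p by the sample proportion of successes:

```python
import math
import random

random.seed(1)
n, p = 100, 0.3   # one binomial experiment: n trials, success probability p
reps = 2000       # number of repeated samples

# For each repeated sample, estimate p by the sample proportion.
estimates = []
for _ in range(reps):
    successes = sum(random.random() < p for _ in range(n))
    estimates.append(successes / n)

center = sum(estimates) / reps
spread = math.sqrt(sum((e - center) ** 2 for e in estimates) / (reps - 1))
# A histogram of `estimates` approximates the distribution of the estimator;
# its spread should be close to the theoretical value sqrt(p(1 - p)/n).
theory = math.sqrt(p * (1 - p) / n)
print(round(center, 3), round(spread, 4), round(theory, 4))
```

The simulated spread matches the theoretical standard deviation of the estimator, which is exactly the quantity the next section names the standard error.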
Furthermore, some estimators have an approximately normal distribution, and for such estimators which are also unbiased, the probability is approximately 0.95 that an estimate constructed from a sample of data will fall within two standard deviations (the standard deviation of the estimator, that is) of the intended target.

In statistical language the standard deviation of the estimator is called the standard error of the estimator. Usually we have to use the data to estimate this standard error. Often such an estimate of the standard error is obtained by a plug-in procedure. It is customary to use the term standard error to describe what is in reality an estimate of the standard error.

To a beginner this discussion is undoubtedly confusing, so let's consider a simple example. Suppose X_1, …, X_n is a random sample from the N(µ, σ²) distribution. An unbiased estimator of the population mean µ is the sample average X̄. The standard deviation of this estimator is σ/√n. In practice σ is not known and is estimated by s, the sample standard deviation. The estimate of the standard error of X̄ is then s/√n. This is a plug-in estimate. See pages 6-64 of Devore.

In some cases the estimator of a parameter is complicated, and no convenient mathematical expression is known for the standard error of the estimator. In such cases a bootstrap estimate of the standard error may be constructed. See page 64 of Devore for details.
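Both the plug-in and the bootstrap estimates of the standard error of X̄ can be sketched in a few lines. The data here are simulated normal observations (mean 10, standard deviation 2 are assumptions for illustration; in practice the data are simply observed), and for the sample mean the two estimates should nearly agree:

```python
import math
import random

random.seed(6)
# Hypothetical sample: n draws from a normal distribution.
n = 40
data = [random.gauss(10, 2) for _ in range(n)]

xbar = sum(data) / n
s = math.sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))
plug_in_se = s / math.sqrt(n)   # plug-in estimate of the standard error of X-bar

# Bootstrap: resample the data with replacement, recompute the estimator on
# each resample, and take the standard deviation of the bootstrap replicates.
B = 2000
boot_means = []
for _ in range(B):
    resample = [random.choice(data) for _ in range(n)]
    boot_means.append(sum(resample) / n)
center = sum(boot_means) / B
boot_se = math.sqrt(sum((b - center) ** 2 for b in boot_means) / (B - 1))
print(round(plug_in_se, 3), round(boot_se, 3))
```

For X̄ the bootstrap is unnecessary, since the plug-in formula s/√n is available; the point of the bootstrap loop is that it works unchanged for complicated estimators (replace the inner sample mean by any statistic) where no such formula is known.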