5.3 Statistics and Their Distributions

Cory Mason
5 years ago

Chapter 5 Joint Probability Distributions and Random Samples
Instructor: Lingsong Zhang

Consider selecting two different samples of size $n$ from the same population distribution. The $x_i$'s in the second sample will virtually always differ at least a bit from those in the first sample. For example, a first sample of $n = 3$ cars of a particular type might result in fuel efficiencies $x_1 = 30.7$, $x_2 = 29.4$, $x_3 = 31.1$, whereas a second sample may give $x_1 = 28.8$, $x_2 = 30.0$, and $x_3 = \ldots$ Before we obtain data, there is uncertainty about the value of each $x_i$.

Because of this uncertainty, before the data become available we view each observation as a random variable and denote the sample by $X_1, X_2, \ldots, X_n$ (uppercase letters for random variables). This variation in observed values in turn implies that the value of any function of the sample observations, such as the sample mean, sample standard deviation, or sample fourth spread, also varies from sample to sample. That is, prior to obtaining $x_1, \ldots, x_n$, there is uncertainty as to the value of $\bar{x}$, the value of $s$, and so on.

Definition 1. A statistic is any quantity whose value can be calculated from sample data. Prior to obtaining data, there is uncertainty as to what value of any particular statistic will result. Therefore, a statistic is a random variable and will be denoted by an uppercase letter; a lowercase letter is used to represent the calculated or observed value of the statistic.

The probability distribution of a statistic is sometimes referred to as its sampling distribution to emphasize that it describes how the statistic varies in value across all samples that might be selected.

Random Samples

Definition 2. The rv's $X_1, X_2, \ldots, X_n$ are said to form a (simple) random sample of size $n$ if
1. The $X_i$'s are independent rv's.
2. Every $X_i$ has the same probability distribution.

Conditions 1 and 2 can be paraphrased by saying that the $X_i$'s are independent and identically distributed (iid). If sampling is either with replacement or from an infinite (conceptual) population, Conditions 1 and 2 are satisfied exactly. These conditions will be approximately satisfied if sampling is without replacement, yet the sample size $n$ is much smaller than the population size $N$.

In practice, if $n/N \le .05$ (at most 5% of the population is sampled), we can proceed as if the $X_i$'s form a random sample. The virtue of this sampling method is that the probability distribution of any statistic can be more easily obtained than for any other sampling method.

There are two general methods for obtaining information about a statistic's sampling distribution. One method involves calculations based on probability rules, and the other involves carrying out a simulation experiment.

Deriving a Sampling Distribution

Probability rules can be used to obtain the distribution of a statistic provided that it is a fairly simple function of the $X_i$'s and either there are relatively few different $X$ values in the population or else the population distribution has a nice form. Our next example illustrates such a situation.

A certain brand of MP3 player comes in three configurations: a model with 2 GB of memory, costing $80, a 4 GB model priced at $100, and an 8 GB version with a price tag of $120. If 20% of all purchasers choose the 2 GB model, 30% choose the 4 GB model, and 50% choose the 8 GB model, then the probability distribution of the cost $X$ of a single randomly selected MP3 player purchase is given by

x       80    100   120
p(x)    .2    .3    .5        (5.2)

with $\mu = 106$, $\sigma^2 = 244$.

Suppose on a particular day only two MP3 players are sold. Let $X_1$ = the revenue from the first sale and $X_2$ = the revenue from the second. Suppose that $X_1$ and $X_2$ are independent, each with the probability distribution shown in (5.2) [so that $X_1$ and $X_2$ constitute a random sample from the distribution (5.2)]. Table 5.2 lists possible $(x_1, x_2)$ pairs, the probability of each [computed using (5.2) and the assumption of independence], and the resulting $\bar{x}$ and $s^2$ values.

Now to obtain the probability distribution of $\bar{X}$, the sample average revenue per sale, we must consider each possible value $\bar{x}$ and compute its probability. For example, $\bar{x} = 100$ occurs three times in the table with probabilities .10, .09, and .10, so

$$p_{\bar{X}}(100) = P(\bar{X} = 100) = .10 + .09 + .10 = .29$$

Similarly,

$$p_{S^2}(800) = P(S^2 = 800) = P(X_1 = 80, X_2 = 120 \text{ or } X_1 = 120, X_2 = 80) = .10 + .10 = .20$$

The complete sampling distributions of $\bar{X}$ and $S^2$ appear in (5.3) and (5.4).

$\bar{x}$                 80    90    100   110   120
$p_{\bar{X}}(\bar{x})$    .04   .12   .29   .30   .25        (5.3)

$s^2$               0     200   800
$p_{S^2}(s^2)$      .38   .42   .20                          (5.4)

Figure 5.7 pictures a probability histogram for both the original distribution (5.2) and the $\bar{X}$ distribution (5.3). The figure suggests first that the mean (expected value) of the $\bar{X}$ distribution is equal to the mean 106 of the original distribution, since both histograms appear to be centered at the same place. From (5.3),

$$\mu_{\bar{X}} = E(\bar{X}) = \sum \bar{x}\, p_{\bar{X}}(\bar{x}) = (80)(.04) + \cdots + (120)(.25) = 106 = \mu$$

Second, it appears that the $\bar{X}$ distribution has smaller spread (variability) than the original distribution, since probability mass has moved in toward the mean. Again from (5.3),

$$\sigma_{\bar{X}}^2 = V(\bar{X}) = \sum \bar{x}^2\, p_{\bar{X}}(\bar{x}) - \mu_{\bar{X}}^2 = (80^2)(.04) + \cdots + (120^2)(.25) - (106)^2 = 122 = \frac{244}{2} = \frac{\sigma^2}{2}$$

The variance of $\bar{X}$ is precisely half that of the original variance (because $n = 2$). Using (5.4), the mean value of $S^2$ is

$$\mu_{S^2} = E(S^2) = \sum s^2\, p_{S^2}(s^2) = (0)(.38) + (200)(.42) + (800)(.20) = 244 = \sigma^2$$

That is, the $\bar{X}$ sampling distribution is centered at the population mean $\mu$, and the $S^2$ sampling distribution is centered at the population variance $\sigma^2$.
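The enumeration behind (5.3) and (5.4) can be checked mechanically. The following is a minimal sketch, not part of the original notes: it lists every ordered sample from the pmf (5.2), multiplies probabilities under the independence assumption, and accumulates the pmfs of the sample mean and sample variance. The names `sampling_distributions` and `expected_value` are illustrative choices.

```python
from collections import defaultdict
from itertools import product
from statistics import mean, variance

# Population pmf (5.2): price of one purchase -> probability
pmf = {80: 0.2, 100: 0.3, 120: 0.5}

def sampling_distributions(pmf, n):
    """Return the exact pmfs of the sample mean and sample variance
    for a random sample of size n (n >= 2) from a finite pmf."""
    xbar_pmf = defaultdict(float)
    s2_pmf = defaultdict(float)
    for sample in product(pmf, repeat=n):      # every ordered sample
        prob = 1.0
        for x in sample:
            prob *= pmf[x]                     # independence: multiply
        xbar_pmf[mean(sample)] += prob
        s2_pmf[variance(sample)] += prob       # sample variance, divisor n - 1
    return dict(xbar_pmf), dict(s2_pmf)

def expected_value(dist):
    """Mean of a pmf given as {value: probability}."""
    return sum(v * p for v, p in dist.items())

xbar_pmf, s2_pmf = sampling_distributions(pmf, 2)
mu_xbar = expected_value(xbar_pmf)
var_xbar = sum(v**2 * p for v, p in xbar_pmf.items()) - mu_xbar**2
mean_s2 = expected_value(s2_pmf)

for v in sorted(xbar_pmf):
    print(f"xbar = {v}: {xbar_pmf[v]:.2f}")
print(f"E(xbar) = {mu_xbar:.1f}, V(xbar) = {var_xbar:.1f}, E(S^2) = {mean_s2:.1f}")
```

Calling `sampling_distributions(pmf, 4)` enumerates the 81 ordered samples of size four in the same way. The approach is only practical for small $n$ and few population values, since the number of samples grows as the number of values raised to the power $n$.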

If there had been four purchases on the day of interest, the sample average revenue $\bar{X}$ would be based on a random sample of four $X_i$'s, each having the distribution (5.2). More calculation eventually yields the pmf of $\bar{X}$ for $n = 4$ as

$\bar{x}$                 80      85      90      95      100
$p_{\bar{X}}(\bar{x})$    .0016   .0096   .0376   .0936   .1761

$\bar{x}$                 105     110     115     120
$p_{\bar{X}}(\bar{x})$    .2340   .2350   .1500   .0625

From this, $\mu_{\bar{X}} = 106 = \mu$ and $\sigma_{\bar{X}}^2 = 61 = \sigma^2/4$. Figure 5.8 is a probability histogram of this pmf.

This example should suggest first of all that the computation of $p_{\bar{X}}(\bar{x})$ and $p_{S^2}(s^2)$ can be tedious. If the original distribution (5.2) had allowed for more than three possible values, then even for $n = 2$ the computations would have been more involved. The example should also suggest, however, that there are some general relationships between $E(\bar{X})$, $V(\bar{X})$, $E(S^2)$, and the mean $\mu$ and variance $\sigma^2$ of the original distribution.

Simulation Experiments

The second method of obtaining information about a statistic's sampling distribution is to perform a simulation experiment. This method is usually used when a derivation via probability rules is too difficult or complicated to be carried out. Such an experiment is virtually always done with the aid of a computer.

The following characteristics of an experiment must be specified:
1. The statistic of interest ($\bar{X}$, $S$, a particular trimmed mean, etc.)
2. The population distribution (normal with $\mu = 100$ and $\sigma = 15$, uniform with lower limit $A = 5$ and upper limit $B = 10$, etc.)
3. The sample size $n$ (e.g., $n = 10$ or $n = 50$)
4. The number of replications $k$ (number of samples to be obtained)

Then use appropriate software to obtain $k$ different random samples, each of size $n$, from the designated population distribution. For each sample, calculate the value of the statistic and construct a histogram of the $k$ values. This histogram gives the approximate sampling distribution of the statistic. The larger the value of $k$, the better the approximation will tend to be (the actual sampling distribution emerges as $k \to \infty$). In practice, $k = 500$ or $1000$ is usually sufficient if the statistic is fairly simple.

The final aspect of the histograms to note is their spread relative to one another. The larger the value of $n$, the more concentrated is the sampling distribution about the mean value. This is why the histograms for $n = 20$ and $n = 30$ are based on narrower class intervals than those for the two smaller sample sizes. For the larger sample sizes, most of the $\bar{x}$ values are quite close to $\mu$. This is the effect of averaging. When $n$ is small, a single unusual $x$ value can result in an $\bar{x}$ value far from the center.

With a larger sample size, any unusual $x$ values, when averaged in with the other sample values, still tend to yield an $\bar{x}$ value close to $\mu$. Combining these insights yields a result that should appeal to your intuition: $\bar{X}$ based on a large $n$ tends to be closer to $\mu$ than does $\bar{X}$ based on a small $n$.
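The four ingredients of a simulation experiment map directly onto the arguments of a small function. The sketch below is an illustration rather than the software used in the course: it assumes the normal population with $\mu = 100$ and $\sigma = 15$ mentioned above, takes $\bar{X}$ as the statistic, and uses $k = 1000$ replications for each of four sample sizes. The names `simulate` and `normal_population` are hypothetical.

```python
import random
from statistics import fmean, stdev

def simulate(statistic, draw, n, k, seed=0):
    """Return k simulated values of `statistic`, each computed from a
    fresh random sample of size n obtained by calling draw(rng) n times."""
    rng = random.Random(seed)          # fixed seed so runs are repeatable
    return [statistic([draw(rng) for _ in range(n)]) for _ in range(k)]

def normal_population(rng):
    # Assumed population: normal with mu = 100 and sigma = 15
    return rng.gauss(100, 15)

# One list of k = 1000 sample means for each sample size n
results = {n: simulate(fmean, normal_population, n, k=1000)
           for n in (5, 10, 20, 30)}

for n, xbars in results.items():
    print(f"n = {n:2d}: mean of xbars = {fmean(xbars):6.2f}, "
          f"sd of xbars = {stdev(xbars):5.2f}")
```

Each list in `results` holds the $k$ values from which a histogram would be drawn. The printed standard deviations should shrink as $n$ grows, which is the concentration effect described above. Passing a different `statistic` (for example `statistics.median`) or a different `draw` function changes the experiment without changing the structure.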

5.4 The Distribution of the Sample Mean

The importance of the sample mean $\bar{X}$ springs from its use in drawing conclusions about the population mean. Some of the most frequently used inferential procedures are based on properties of the sampling distribution of $\bar{X}$. A preview of these properties appeared in the calculations and simulation experiments of the previous section, where we noted relationships between $E(\bar{X})$ and $\mu$ and also among $V(\bar{X})$, $\sigma^2$, and $n$.

Proposition 1. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with mean value $\mu$ and standard deviation $\sigma$. Then
1. $E(\bar{X}) = \mu_{\bar{X}} = \mu$
2. $V(\bar{X}) = \sigma_{\bar{X}}^2 = \sigma^2/n$ and $\sigma_{\bar{X}} = \sigma/\sqrt{n}$
In addition, with $T_0 = X_1 + \cdots + X_n$ (the sample total), $E(T_0) = n\mu$, $V(T_0) = n\sigma^2$, and $\sigma_{T_0} = \sqrt{n}\,\sigma$.

According to Result 1, the sampling (i.e., probability) distribution of $\bar{X}$ is centered precisely at the mean of the population from which the sample has been selected. Result 2 shows that the $\bar{X}$ distribution becomes more concentrated about $\mu$ as the sample size $n$ increases. In marked contrast, the distribution of $T_0$ becomes more spread out as $n$ increases. Averaging moves probability in toward the middle, whereas totaling spreads probability out over a wider and wider range of values.

The standard deviation $\sigma_{\bar{X}} = \sigma/\sqrt{n}$ is often called the standard error of the mean; it describes the magnitude of a typical or representative deviation of the sample mean from the population mean.

Example 24

In a notched tensile fatigue test on a titanium specimen, the expected number of cycles to first acoustic emission (used to indicate crack initiation) is $\mu = 28{,}000$, and the standard deviation of the number of cycles is $\sigma = 5000$. Let $X_1, X_2, \ldots, X_{25}$ be a random sample of size 25, where each $X_i$ is the number of cycles on a different randomly selected specimen.

Then the expected value of the sample mean number of cycles until first emission is $E(\bar{X}) = \mu = 28{,}000$, and the expected total number of cycles for the 25 specimens is $E(T_0) = n\mu = 25(28{,}000) = 700{,}000$.

The standard deviations of $\bar{X}$ (the standard error of the mean) and of $T_0$ are

$$\sigma_{\bar{X}} = \sigma/\sqrt{n} = \frac{5000}{\sqrt{25}} = 1000 \qquad \sigma_{T_0} = \sqrt{n}\,\sigma = \sqrt{25}\,(5000) = 25{,}000$$

If the sample size increases to $n = 100$, $E(\bar{X})$ is unchanged, but $\sigma_{\bar{X}} = 500$, half of its previous value (the sample size must be quadrupled to halve the standard deviation of $\bar{X}$).

The Case of a Normal Population Distribution

Proposition 2. Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal distribution with mean $\mu$ and standard deviation $\sigma$. Then for any $n$, $\bar{X}$ is normally distributed (with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$), as is $T_0$ (with mean $n\mu$ and standard deviation $\sqrt{n}\,\sigma$).

We know everything there is to know about the $\bar{X}$ and $T_0$ distributions when the population distribution is normal. In particular, probabilities such as $P(a \le \bar{X} \le b)$ and $P(c \le T_0 \le d)$ can be obtained simply by standardizing. Figure 5.14 illustrates the proposition.

Example 25

The time that it takes a randomly selected rat of a certain subspecies to find its way through a maze is a normally distributed rv with $\mu = 1.5$ min and $\sigma = .35$ min. Suppose five rats are selected. Let $X_1, \ldots, X_5$ denote their times in the maze. Assuming the $X_i$'s to be a random sample from this normal distribution, what is the probability that the total time $T_0 = X_1 + \cdots + X_5$ for the five is between 6 and 8 min?

By the proposition, $T_0$ has a normal distribution with $\mu_{T_0} = n\mu = 5(1.5) = 7.5$ and variance $\sigma_{T_0}^2 = n\sigma^2 = 5(.1225) = .6125$, so $\sigma_{T_0} = .783$. To standardize $T_0$, subtract $\mu_{T_0}$ and divide by $\sigma_{T_0}$:

$$P(6 \le T_0 \le 8) = P\left(\frac{6 - 7.5}{.783} \le Z \le \frac{8 - 7.5}{.783}\right) = P(-1.92 \le Z \le .64) = \Phi(.64) - \Phi(-1.92) = .7115$$

Determination of the probability that the sample average time $\bar{X}$ (a normally distributed variable) is at most 2.0 min requires $\mu_{\bar{X}} = \mu = 1.5$ and $\sigma_{\bar{X}} = \sigma/\sqrt{n} = .35/\sqrt{5} = .1565$. Then

$$P(\bar{X} \le 2.0) = P\left(Z \le \frac{2.0 - 1.5}{.1565}\right) = P(Z \le 3.19) = \Phi(3.19) = .9993$$

The Central Limit Theorem

When the $X_i$'s are normally distributed, so is $\bar{X}$ for every sample size $n$. Even when the population distribution is highly nonnormal, averaging produces a distribution more bell-shaped than the one being sampled. A reasonable conjecture is that if $n$ is large, a suitable normal curve will approximate the actual distribution of $\bar{X}$. The formal statement of this result is the most important theorem of probability.

Theorem 3. (The Central Limit Theorem (CLT)) Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with mean $\mu$ and variance $\sigma^2$. Then if $n$ is sufficiently large, $\bar{X}$ has approximately a normal distribution with $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}}^2 = \sigma^2/n$, and $T_0$ also has approximately a normal distribution with $\mu_{T_0} = n\mu$, $\sigma_{T_0}^2 = n\sigma^2$. The larger the value of $n$, the better the approximation.
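The theorem's claim that averaging pulls a nonnormal distribution toward a bell shape can be seen in a small simulation. This is a rough sketch under an assumption not made in the notes: the population is taken to be exponential with mean 1, chosen only because it is strongly right-skewed. The sample skewness of the simulated $\bar{x}$ values (zero for a symmetric distribution such as the normal) serves as a crude measure of how far the sampling distribution is from normal; `sample_skewness` and `simulate_means` are hypothetical helper names.

```python
import random
from statistics import fmean, pstdev

def sample_skewness(values):
    """Average cubed standardized deviation; near 0 for symmetric data."""
    m = fmean(values)
    s = pstdev(values, mu=m)
    return fmean(((v - m) / s) ** 3 for v in values)

def simulate_means(n, k, rng):
    """k sample means, each from n draws of the assumed population."""
    # Assumption: exponential population with mean 1 (right-skewed)
    return [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(k)]

rng = random.Random(2)
skews = {n: sample_skewness(simulate_means(n, 5000, rng)) for n in (1, 5, 30)}

for n, g in skews.items():
    print(f"n = {n:2d}: skewness of simulated xbar values = {g:.2f}")
```

The skewness should fall steadily as $n$ increases, in line with the statement that the approximation improves with larger $n$. A single summary number does not prove normality, so this is a consistency check on the theorem, not a demonstration of it.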

Figure 5.15 illustrates the Central Limit Theorem.

According to the CLT, when $n$ is large and we wish to calculate a probability such as $P(a \le \bar{X} \le b)$, we need only pretend that $\bar{X}$ is normal, standardize it, and use the normal table. The resulting answer will be approximately correct. The exact answer could be obtained only by first finding the distribution of $\bar{X}$, so the CLT provides a truly impressive shortcut.

Example 26

The amount of a particular impurity in a batch of a certain chemical product is a random variable with mean value 4.0 g and standard deviation 1.5 g. If 50 batches are independently prepared, what is the (approximate) probability that the sample average amount of impurity $\bar{X}$ is between 3.5 and 3.8 g? According to the rule of thumb to be stated shortly, $n = 50$ is large enough for the CLT to be applicable.

$\bar{X}$ then has approximately a normal distribution with mean value $\mu_{\bar{X}} = 4.0$ and $\sigma_{\bar{X}} = 1.5/\sqrt{50} = .2121$, so

$$P(3.5 \le \bar{X} \le 3.8) \approx P\left(\frac{3.5 - 4.0}{.2121} \le Z \le \frac{3.8 - 4.0}{.2121}\right) = \Phi(-.94) - \Phi(-2.36) = .1645$$
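The "pretend $\bar{X}$ is normal and standardize" recipe can also be carried out in software instead of with the normal table. The sketch below uses `NormalDist` from Python's standard `statistics` module; the helper name `prob_mean_between` is an illustrative choice, not something from the notes.

```python
from math import sqrt
from statistics import NormalDist

def prob_mean_between(a, b, mu, sigma, n):
    """P(a <= Xbar <= b), treating Xbar as normal with mean mu
    and standard deviation sigma / sqrt(n)."""
    xbar = NormalDist(mu, sigma / sqrt(n))
    return xbar.cdf(b) - xbar.cdf(a)

# Example 26: mu = 4.0 g, sigma = 1.5 g, n = 50 batches
p26 = prob_mean_between(3.5, 3.8, mu=4.0, sigma=1.5, n=50)
print(f"P(3.5 <= Xbar <= 3.8) is approximately {p26:.3f}")
```

The result differs slightly in the third decimal place from the table-based answer because the table calculation rounds each $z$ value to two decimals. When the population itself is normal, as in Example 25, the same calculation gives an exact rather than approximate probability.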

The CLT provides insight into why many random variables have probability distributions that are approximately normal. For example, the measurement error in a scientific experiment can be thought of as the sum of a number of underlying perturbations and errors of small magnitude.

A practical difficulty in applying the CLT is in knowing when $n$ is sufficiently large. The problem is that the accuracy of the approximation for a particular $n$ depends on the shape of the original underlying distribution being sampled. If the underlying distribution is close to a normal density curve, then the approximation will be good even for a small $n$, whereas if it is far from being normal, then a large $n$ will be required.

Rule of Thumb: If $n > 30$, the Central Limit Theorem can be used.

There are population distributions for which even an $n$ of 40 or 50 does not suffice, but such distributions are rarely encountered in practice. On the other hand, the rule of thumb is often conservative; for many population distributions, an $n$ much less than 30 would suffice. For example, in the case of a uniform population distribution, the CLT gives a good approximation for $n \ge 12$.

Other Applications of the Central Limit Theorem

The CLT can be used to justify the normal approximation to the binomial distribution discussed earlier. We know that a binomial variable $X$ is the number of successes in a binomial experiment consisting of $n$ independent success/failure trials with $p = P(S)$ for any particular trial. Define a new rv $X_1$ by

$$X_1 = \begin{cases} 1 & \text{if the 1st trial results in a success} \\ 0 & \text{if the 1st trial results in a failure} \end{cases}$$

and define $X_2, X_3, \ldots, X_n$ analogously for the other $n - 1$ trials. Each $X_i$ indicates whether or not there is a success on the corresponding trial.

Because the trials are independent and $P(S)$ is constant from trial to trial, the $X_i$'s are iid (a random sample from a Bernoulli distribution). The CLT then implies that if $n$ is sufficiently large, both the sum and the average of the $X_i$'s have approximately normal distributions.

When the $X_i$'s are summed, a 1 is added for every $S$ that occurs and a 0 for every $F$, so $X_1 + \cdots + X_n = X$. The sample mean of the $X_i$'s is $X/n$, the sample proportion of successes. That is, both $X$ and $X/n$ are approximately normal when $n$ is large.

The necessary sample size for this approximation depends on the value of $p$: When $p$ is close to .5, the distribution of each $X_i$ is reasonably symmetric (see Figure 5.19), whereas the distribution is quite skewed when $p$ is near 0 or 1. Using the approximation only if both $np \ge 10$ and $n(1 - p) \ge 10$ ensures that $n$ is large enough to overcome any skewness in the underlying Bernoulli distribution.

Figure 5.19 Two Bernoulli distributions: (a) $p = .4$ (reasonably symmetric); (b) $p = .1$ (very skewed)
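The effect of $p$ on the quality of the normal approximation can be checked numerically. The sketch below compares the exact binomial cdf with a normal cdf having mean $np$ and standard deviation $\sqrt{np(1-p)}$ for the two values of $p$ in the figure, using $n = 25$ as an arbitrary illustrative sample size. One assumption beyond this section: the normal cdf is evaluated at $x + 0.5$ (the usual continuity correction), which is not discussed here. The function names are hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_cdf(x, n, p):
    """Exact P(X <= x) for X ~ Bin(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(x + 1))

def normal_approx_cdf(x, n, p):
    """Normal approximation to P(X <= x) with mean np, sd sqrt(np(1-p)).
    Assumption: continuity correction x + 0.5 is applied."""
    return NormalDist(n * p, sqrt(n * p * (1 - p))).cdf(x + 0.5)

def max_abs_error(n, p):
    """Largest gap between the exact and approximate cdf over x = 0..n."""
    return max(abs(binom_cdf(x, n, p) - normal_approx_cdf(x, n, p))
               for x in range(n + 1))

n = 25
for p in (0.4, 0.1):
    ok = n * p >= 10 and n * (1 - p) >= 10
    print(f"p = {p}: np = {n * p:.1f}, n(1-p) = {n * (1 - p):.1f}, "
          f"rule satisfied: {ok}, max cdf error = {max_abs_error(n, p):.4f}")
```

With $p = .4$ both conditions hold and the worst-case error should be small; with $p = .1$ the condition $np \ge 10$ fails and the error should be noticeably larger, reflecting the skewness of the underlying Bernoulli distribution.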

6 Central Limit Theorem. (Chs 6.4, 6.5)

Motivating Example

In the next few weeks, we will be focusing on making statistical inferences about the true mean of a population by using sample datasets. Examples?