CHAPTER 7 INTRODUCTION TO SAMPLING DISTRIBUTIONS

Note: This section uses session window commands instead of menu choices.

CENTRAL LIMIT THEOREM (SECTION 7.2 OF UNDERSTANDABLE STATISTICS)

The Central Limit Theorem says that if $x$ is a random variable with any distribution having mean $\mu$ and standard deviation $\sigma$, then the distribution of sample means $\bar{x}$ based on random samples of size $n$ is such that, for sufficiently large $n$:

(a) The mean of the $\bar{x}$ distribution is approximately the same as the mean of the $x$ distribution.
(b) The standard deviation of the $\bar{x}$ distribution is approximately $\sigma/\sqrt{n}$.
(c) The $\bar{x}$ distribution is approximately a normal distribution.

Furthermore, as the sample size $n$ becomes larger and larger, the approximations mentioned in (a), (b), and (c) become better.

We can use MINITAB to demonstrate the Central Limit Theorem. The computer does not prove the theorem. A proof of the Central Limit Theorem requires advanced mathematics and is beyond the scope of an introductory course. However, we can use the computer to gain a better understanding of the theorem.

To demonstrate the Central Limit Theorem, we need a specific $x$ distribution. One of the simplest is the uniform probability distribution.

Technology Guide Understandable Statistics, 8th Edition

The normal distribution is the usual bell-shaped curve, but the uniform distribution is the rectangular or box-shaped graph. The two distributions are very different.

The uniform distribution on the interval from 0 to 9 has the property that all subintervals of the same length inside the interval have the same probability of occurrence, no matter where they are located. This means that the uniform distribution on the interval from 0 to 9 could be represented on the computer by selecting random numbers from 0 to 9. Since all numbers from 0 to 9 would be equally likely to be chosen, we say we are dealing with a uniform (equally likely) probability distribution. Note that when we say we are selecting random numbers from 0 to 9, we do not just mean whole numbers or integers; we mean real numbers in decimal form, such as 2.413912, and so forth.

Because the interval from 0 to 9 is 9 units long and because the total area under the probability graph must be 1 (why?), the height of the uniform probability graph must be 1/9. The mean of the uniform distribution on the interval from 0 to 9 is the balance point. Looking at the figure, it is fairly clear that the mean is 4.5. Using advanced methods of statistics, it can be shown that for the uniform probability distribution $x$ between 0 and 9,

    $\mu = 4.5$ and $\sigma = 3\sqrt{3}/2 \approx 2.598$

The figure shows us that the uniform $x$ distribution and the normal distribution are quite different. However, using the computer we will construct one hundred sample means $\bar{x}$ from the $x$ distribution using a sample size of $n = 40$. We can vary the sample size $n$ according to how many columns we use in the RANDOM command.

    RANDOM 100 C1-C40;
    UNIFORM 0 9.        (that is, from a = 0 to b = 9)

We will see that even though the uniform distribution is very different from the normal distribution, the histogram of the sample means is somewhat bell shaped.
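The values of $\mu$ and $\sigma$ quoted above can be checked with the standard formulas for a uniform distribution on an interval from $a$ to $b$. These formulas are not derived in this guide; they are used here as known results, with $a = 0$ and $b = 9$:

```latex
\begin{align*}
\mu      &= \frac{a + b}{2} = \frac{0 + 9}{2} = 4.5, \\
\sigma^2 &= \frac{(b - a)^2}{12} = \frac{81}{12} = \frac{27}{4}, \\
\sigma   &= \sqrt{\frac{27}{4}} = \frac{3\sqrt{3}}{2} \approx 2.598.
\end{align*}
```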
Looking at the output of the DESCRIBE command, we will also see that the mean of the $\bar{x}$ distribution is close to the predicted mean of 4.5 and that the standard deviation is close to $\sigma/\sqrt{n} = 2.598/\sqrt{40} \approx 0.411$.

Example

The following MINITAB program will draw 100 random samples of size 40 from the uniform distribution on the interval from 0 to 9. We put the data into 40 columns. Then we take the mean of each of the 100 rows (40 columns across) and store the result in C50. To do this, we use the RMEAN C1-C40 put into C50 command. Next we DESCRIBE C50 to look at the mean and standard deviation of the distribution of sample means. Finally, we look at a histogram of the sample means in C50.

    MTB > RANDOM 100 C1-C40;
    SUBC > UNIFORM 0 9.
    MTB > # Take the mean of the 40 data values in each row.
    MTB > # Put the means in C50.
    MTB > RMEAN C1-C40 C50
    MTB > DESCRIBE C50

The result follows.

Part III: MINITAB Guide

Note that the MEAN and STDEV are very close to the values predicted by the Central Limit Theorem.

    MTB > GSTD
    MTB > HISTOGRAM C50

The result follows.

The histogram for this sample does not appear very similar to a normal distribution. Let's try another sample. The following are the results.

This histogram looks more like a normal distribution. You will get slightly different results each time you draw 100 samples.

The number of samples used is determined by K, and the size of the samples is determined by the number of columns in the command.

    RANDOM K C1-Cn;
    UNIFORM a b.

Be sure that when you take the RMEAN of the rows, you use the same number of columns as you used in the RANDOM command.

    RMEAN C1-Cn C

Then use the DESCRIBE and HISTOGRAM commands on the column C where you put the means.

You can sample from a variety of distributions, some of which were listed under the RANDOM command in the Chapter 1 Command Summary.

LAB ACTIVITIES FOR CENTRAL LIMIT THEOREM

1. Repeat the experiment of the Example above. That is, draw 100 random samples of size 40 each from the uniform probability distribution between 0 and 9. Then take the means of each of these samples and put the results in C50. Use the commands

    RANDOM 100 C1-C40;
    UNIFORM 0 9.
    RMEAN C1-C40 C50

   Next use DESCRIBE on C50. How do the mean and standard deviation of the distribution of sample means compare to those predicted by the Central Limit Theorem? Use HISTOGRAM C50 to draw a histogram of the distribution of sample means. How does it compare to a normal curve?

2. Next take 100 random samples of size 20 from the uniform probability distribution between 0 and 9. To do this, use only 20 columns in the RANDOM and RMEAN commands. Again put the means in C50, use DESCRIBE and HISTOGRAM on C50, and comment on the results. How do these results compare to those in problem 1? How do the standard deviations compare?
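Readers who want to cross-check their MINITAB output with another tool could use something like the following Python sketch. It is not part of MINITAB and is only a rough analogue of the RANDOM / RMEAN / DESCRIBE sequence: the function names `simulate_sample_means` and `predicted_mean_and_sd` and the `seed` argument are hypothetical names chosen for illustration, and Python's random number generator differs from MINITAB's, so the numbers will not match a MINITAB session exactly.

```python
import math
import random
import statistics


def simulate_sample_means(k, n, a, b, seed=None):
    """Draw k random samples of size n from the uniform distribution on [a, b].

    Returns a list of the k sample means (the analogue of column C50).
    """
    rng = random.Random(seed)  # seed is optional; fix it to make a run repeatable
    return [
        statistics.fmean(rng.uniform(a, b) for _ in range(n))
        for _ in range(k)
    ]


def predicted_mean_and_sd(n, a, b):
    """Mean and standard deviation the Central Limit Theorem predicts
    for sample means of size n drawn from the uniform distribution on [a, b]."""
    mu = (a + b) / 2                 # mean of the uniform distribution
    sigma = (b - a) / math.sqrt(12)  # standard deviation of the uniform distribution
    return mu, sigma / math.sqrt(n)


if __name__ == "__main__":
    # Sample sizes 40 and 20, as in lab activities 1 and 2.
    for n in (40, 20):
        means = simulate_sample_means(k=100, n=n, a=0, b=9, seed=1)
        mu, sd = predicted_mean_and_sd(n, a=0, b=9)
        print(
            f"n = {n}: mean of sample means = {statistics.fmean(means):.3f} "
            f"(predicted {mu:.3f}), "
            f"stdev of sample means = {statistics.stdev(means):.3f} "
            f"(predicted {sd:.3f})"
        )
```

Changing `k`, `n`, `a`, and `b` plays the same role as changing K, the number of columns, and the UNIFORM endpoints in the MINITAB commands.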