Central Limit Theorem: Lots of Samples

Homework: Read Sec. 6-5. Discussion Question, pg. 329. Do Ex. 6-5, 8-15.

Objective: Use the Central Limit Theorem to solve problems involving sample means.
Sample Means

If we wanted to find the mean GPA for students at CHHS, we could randomly sample 50 students and compute their mean GPA. This would be an estimate of the mean GPA for the whole CHHS population. We can take several more samples of 50 students to try to improve our estimate.

What happens when we look at the means of these multiple samples? The means of many samples become data values in a distribution of statistics (summaries).
Population

From a large population of size N we can draw many different random samples of size n: $N^n$ ordered samples when sampling with replacement (as in the die example later in these notes), or $_N C_n$ when sampling without replacement. We can then look at the sampling distribution of the sample means.

The means of these samples will usually differ from $\mu$, even though each sample mean $\bar{X}$ is an estimate of $\mu$. The difference between $\bar{X}$ and $\mu$ is due to sampling error, also called sampling variability.
Sample of Sample Means

Of all the possible samples, let us take several: more than a few, but certainly not all of them. The means of these samples form a new distribution, a distribution of sample means, called a sampling distribution. Like any distribution, it has a mean and a standard deviation.
µ and σ

We would expect the mean of all possible sample means to equal the population mean:

$\mu_{\bar{X}} = \mu$

The standard deviation of the sample means would NOT equal the population standard deviation. Sample means tend to be much more alike (closer together) than the individual observations, and this reduced variation results in a smaller standard deviation:

$\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$, where n = size of the samples

Take it on faith; we are not going to develop the formula for $\sigma_{\bar{X}}$.
Example

Suppose we roll a die 6 times with results 1, 2, 3, 4, 5, 6. This is unlikely, but let us accept it as the result for the moment and treat these six values as our population. Using your calculator, find the mean and standard deviation of the population: $\mu = 3.5$ and $\sigma \approx 1.7078$. Record these values for future comparison.
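For anyone without a calculator at hand, the same two values can be checked with a minimal Python sketch. It assumes the six rolls are the entire population, so it uses the population standard deviation (`pstdev`) rather than the sample version; the variable names are only illustrative.

```python
from statistics import mean, pstdev

# The six die results, treated as the whole population
population = [1, 2, 3, 4, 5, 6]

mu = mean(population)       # population mean
sigma = pstdev(population)  # population standard deviation (divides by N)

print(mu, round(sigma, 4))
```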
Samples

Now we take all the samples of size 2 (with replacement) from our population.

Sample  Mean    Sample  Mean    Sample  Mean
1, 1    1       1, 2    1.5     1, 3    2
1, 4    2.5     1, 5    3       1, 6    3.5
2, 1    1.5     2, 2    2       2, 3    2.5
2, 4    3       2, 5    3.5     2, 6    4
3, 1    2       3, 2    2.5     3, 3    3
3, 4    3.5     3, 5    4       3, 6    4.5
4, 1    2.5     4, 2    3       4, 3    3.5
4, 4    4       4, 5    4.5     4, 6    5
5, 1    3       5, 2    3.5     5, 3    4
5, 4    4.5     5, 5    5       5, 6    5.5
6, 1    3.5     6, 2    4       6, 3    4.5
6, 4    5       6, 5    5.5     6, 6    6
Distribution of Sample Means

Condensing the table gives a frequency distribution of sample means.

Mean  f
1     1
1.5   2
2     3
2.5   4
3     5
3.5   6
4     5
4.5   4
5     3
5.5   2
6     1

This sampling distribution is a new distribution of the means from all of the samples of size 2. On your calculator, make a histogram of the distribution.
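The condensing step can also be sketched in Python. This is an illustration rather than part of the lesson: it assumes the samples are ordered pairs drawn with replacement, which is what gives 36 samples, and the names `faces`, `samples`, and `freq` are made up for the example.

```python
from collections import Counter
from itertools import product

faces = [1, 2, 3, 4, 5, 6]

# Every ordered sample of size 2 drawn with replacement (6 * 6 = 36 samples)
samples = list(product(faces, repeat=2))

# Tally how often each sample mean occurs
freq = Counter((a + b) / 2 for a, b in samples)

print("Mean  f")
for m in sorted(freq):
    print(f"{m:<5} {freq[m]}")
```

The printed frequencies should rise from 1 to a peak of 6 at a mean of 3.5 and fall back to 1, matching the table.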
Distribution of Sample Means

The distribution of sample means is unimodal and symmetric, and so roughly approximates a normal distribution.

[Figure: histogram of the sample means, horizontal axis from 1 to 6 in steps of 0.5, frequencies rising from 1 to a peak of 6 at 3.5 and falling back to 1.]
Mean and Standard Deviation

On your calculator, find the mean and standard deviation of the distribution of sample means (the frequency distribution with means 1 through 6 and frequencies 1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1).

$\mu_{\bar{X}} = 3.5$ and $\sigma_{\bar{X}} \approx 1.2076$

This is consistent with the values from our original population, $\mu = 3.5$ and $\sigma \approx 1.7078$:

$\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{1.7078}{\sqrt{2}} \approx 1.2076$
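The comparison can be verified numerically with a short Python sketch. It assumes, as in the example, that all 36 ordered samples of size 2 are used and that population (not sample) standard deviations are wanted; the variable names are illustrative.

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev

faces = [1, 2, 3, 4, 5, 6]
n = 2  # size of each sample

# Mean of every ordered sample of size n drawn with replacement
sample_means = [sum(s) / n for s in product(faces, repeat=n)]

mu_xbar = mean(sample_means)        # mean of the sample means
sigma_xbar = pstdev(sample_means)   # standard deviation of the sample means
sigma = pstdev(faces)               # population standard deviation

print(mu_xbar, round(sigma_xbar, 4), round(sigma / sqrt(n), 4))
```

The last two printed values should agree, which is the relationship $\sigma_{\bar{X}} = \sigma/\sqrt{n}$ at work.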
Central Limit Theorem

As the size n of samples drawn at random (with replacement) from a population with mean $\mu$ and standard deviation $\sigma$ increases, the shape of the distribution of sample means approaches a normal distribution with

$\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$, where n = size of the samples

Because the distribution of sample means is approximately normal, the central limit theorem allows us to ask questions about sample means just like the questions asked about raw data, including using z-scores.
Limitations

If the original data from which samples were taken are normally distributed, then the distribution of sample means will be normally distributed for any sample size.

If the original data are not normally distributed, then the samples must be large enough (n ≥ 30) for the distribution of sample means to be approximately normal.

This is a remarkable and incredibly useful fact. Even if the original population is not normally distributed, the sampling distribution will be sufficiently normal if the sample size is large enough. The more the original observations deviate from unimodal and symmetric, the larger the samples need to be; n ≥ 30 should suffice for all but the most extreme cases.
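A small simulation can make this claim concrete. The sketch below is an illustration under assumptions that are not from the lesson: the population is taken to be exponential with rate 1 (strongly right-skewed, with $\mu = 1$ and $\sigma = 1$), 2000 samples of size 30 are drawn, and the seed is fixed only so the run is repeatable.

```python
import random
from math import sqrt
from statistics import mean, pstdev

random.seed(0)  # fixed seed so repeated runs give the same numbers

n = 30              # size of each sample
num_samples = 2000  # how many samples to draw (an arbitrary choice)

# Assumed population: exponential with rate 1, which is strongly
# right-skewed and has mu = 1 and sigma = 1.
sample_means = [
    mean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mu_xbar = mean(sample_means)
sigma_xbar = pstdev(sample_means)
expected_sigma_xbar = 1 / sqrt(n)  # sigma / sqrt(n) with sigma = 1

# Share of sample means within one sigma/sqrt(n) of mu;
# a normal distribution would put about 68% there.
within_one = sum(
    abs(m - 1) <= expected_sigma_xbar for m in sample_means
) / num_samples

print(round(mu_xbar, 3), round(sigma_xbar, 3), round(expected_sigma_xbar, 3))
print(round(within_one, 3))
```

If the theorem holds up, the mean of the sample means should land near 1, their standard deviation near $1/\sqrt{30}$, and roughly 68% of them within one $\sigma_{\bar{X}}$ of $\mu$, even though the individual observations are far from normal.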
Z-scores

We can now use z-scores to draw inferences about sample means. For example: if 20 people are chosen at random from a normally distributed population with mean 100 and standard deviation 15, what is the probability that the mean of the sample will be greater than 110?

Note that this is a different question from: if one person is chosen at random from the same population, what is the probability that the individual will score greater than 110?

Which question has the greater probability?
Sample

We calculate z-scores as always, except that we use the new standard deviation, $\sigma_{\bar{X}} = \sigma/\sqrt{n}$.

$z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \dfrac{110 - 100}{15/\sqrt{20}} \approx 2.9814$

$P(z > 2.98)$ = Normalcdf(2.98, 9, 0, 1) = .0014

[Figure: normal curve for the sample means, horizontal axis marked at 90, 93.3, 96.7, 100, 103.4, 106.7, 110.1 (from $-3\sigma$ to $3\sigma$), with 110 and the tail area .0014 labeled.]

Note what the change in σ has done to the curve.

$P(\bar{X} > 110)$ = Normalcdf(110, 10^99, 100, 15/√20) = .0014

The probability that a sample of 20 people will have a mean score greater than 110 is .0014. Not very likely.
Individual

Now we calculate the probability of an individual having a score greater than 110.

$z = \dfrac{X - \mu}{\sigma} = \dfrac{110 - 100}{15} \approx .6667$

$P(z > .6667)$ = Normalcdf(.6667, 9, 0, 1) = .2525

[Figure: normal curve for individual scores, horizontal axis marked at 55, 70, 85, 100, 115, 130, 145 (from $-3\sigma$ to $3\sigma$), with 110 and the tail area .2525 labeled.]

$P(X > 110)$ = Normalcdf(110, 10^99, 100, 15) = .2525

The probability that an individual will have a score greater than 110 is .2525. Much more likely.

When calculating probabilities, first determine whether you are calculating for an individual observation or for a sample summary statistic.
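The two questions can be compared side by side in Python. This is a sketch using the standard library's `statistics.NormalDist` in place of the calculator's Normalcdf; it assumes the population is normal with mean 100 and standard deviation 15, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 100, 15, 20

# Distribution of individual scores
individual = NormalDist(mu, sigma)
# Distribution of means of samples of size n: same mean, sigma / sqrt(n)
sample_mean = NormalDist(mu, sigma / sqrt(n))

# Upper-tail probabilities, P(value > 110)
p_individual = 1 - individual.cdf(110)
p_sample_mean = 1 - sample_mean.cdf(110)

print(round(p_individual, 4), round(p_sample_mean, 4))
```

The only difference between the two calculations is the standard deviation, which is why the first step is always to decide whether the question is about an individual or a sample mean.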
Example

Assume the mean systolic blood pressure of an adult is 120 mm Hg with a standard deviation of 5.6 mm Hg. If blood pressure is normally distributed, find the probability that an individual selected at random would have a systolic pressure between 120 and 121.8 mm Hg.

$z = \dfrac{X - \mu}{\sigma} = \dfrac{121.8 - 120}{5.6} \approx .3214$

[Figure: normal curve for individual blood pressures, horizontal axis marked at 103.2, 108.8, 114.4, 120, 125.6, 131.2, 136.8 (from $-3\sigma$ to $3\sigma$), with 121.8 and the area .1261 labeled.]

$P(120 < X < 121.8)$ = Normalcdf(120, 121.8, 120, 5.6) = .1261

$P(0 < z < .3214)$ = Normalcdf(0, .3214, 0, 1) = .1261

The probability that an individual would have a systolic blood pressure between 120 and 121.8 mm Hg is .1261.
Example

Assume the mean systolic blood pressure of an adult is 120 mm Hg with a standard deviation of 5.6 mm Hg. If blood pressure is normally distributed, find the probability that a sample of 30 people would have a mean systolic pressure between 120 and 121.8 mm Hg.

$\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{5.6}{\sqrt{30}} \approx 1.0224$

$z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \dfrac{121.8 - 120}{5.6/\sqrt{30}} \approx 1.7605$

[Figure: normal curve for the sample means, horizontal axis marked at 117, 118, 119, 120, 121, 122, 123 (from $-3\sigma$ to $3\sigma$), with 121.8 labeled.]

$P(120 < \bar{X} < 121.8)$ = Normalcdf(120, 121.8, 120, 5.6/√30) = .4608

$P(0 < z < 1.7605)$ = Normalcdf(0, 1.7605, 0, 1) = .4608

The probability that a sample of 30 people would have a mean systolic pressure between 120 and 121.8 mm Hg is .4608.
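Both blood pressure examples follow the same pattern, which can be sketched as one small Python function. `prob_between` is a hypothetical helper written for this illustration (it plays the role of the calculator's four-argument Normalcdf), and it assumes a normal population as stated in the examples.

```python
from math import sqrt
from statistics import NormalDist

def prob_between(low, high, mean, sd):
    """P(low < value < high) for a normal distribution with the given mean and sd."""
    dist = NormalDist(mean, sd)
    return dist.cdf(high) - dist.cdf(low)

mu, sigma = 120, 5.6

# One individual: use the population standard deviation
p_individual = prob_between(120, 121.8, mu, sigma)

# Mean of a sample of 30: use sigma / sqrt(n)
n = 30
p_sample_mean = prob_between(120, 121.8, mu, sigma / sqrt(n))

print(round(p_individual, 4), round(p_sample_mean, 4))
```

The results should come out close to the .1261 and .4608 found above. The interval is the same in both cases, but the sample means are packed more tightly around 120, so far more of their distribution falls inside it.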