Distribution of the Sample Mean
MATH 130, Elements of Statistics I
J. Robert Buchanan
Department of Mathematics
Fall 2018
Experiment (1 of 3) Suppose we have the following population:
4 8 1 2 3 4 9 1 0 4 3 5 6 8 9 3 0 0 7 2
1 0 3 7 9 3 2 5 7 2 4 7 5 7 7 6 4 1 3 0
If we take random samples of size 10, we might obtain:
$S_1 = \{2, 8, 3, 1, 5, 7, 4, 1, 3, 9\}$
$S_2 = \{8, 7, 6, 7, 4, 4, 0, 2, 2, 0\}$
$S_3 = \{6, 0, 0, 7, 4, 0, 7, 0, 6, 1\}$
Experiment (2 of 3) The samples were:
$S_1 = \{2, 8, 3, 1, 5, 7, 4, 1, 3, 9\}$
$S_2 = \{8, 7, 6, 7, 4, 4, 0, 2, 2, 0\}$
$S_3 = \{6, 0, 0, 7, 4, 0, 7, 0, 6, 1\}$
and their corresponding means were:
$\bar{x}_1 = 4.3$, $\bar{x}_2 = 4.0$, $\bar{x}_3 = 3.1$
Observation: since the samples are chosen randomly, the mean calculated from a sample is a random variable. What is the distribution of this random variable?
Experiment (3 of 3) One way to determine the distribution of the sample mean for samples of size 10 from this population of size 40 would be to list all the possible samples. However, this is impractical, since the number of possible samples is
${}_{40}C_{10} = 847{,}660{,}528$.
Perhaps we can get an idea from a smaller population.
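The count of possible samples and the idea of drawing a few samples at random can be checked with a short Python sketch. The seed and the variable names are illustrative assumptions, not part of the slides, and the randomly drawn samples will differ from the three shown above.

```python
import math
import random

# The population of 40 values from the Experiment slides.
population = [4, 8, 1, 2, 3, 4, 9, 1, 0, 4, 3, 5, 6, 8, 9, 3, 0, 0, 7, 2,
              1, 0, 3, 7, 9, 3, 2, 5, 7, 2, 4, 7, 5, 7, 7, 6, 4, 1, 3, 0]

# Number of ways to choose 10 of the 40 positions (without replacement).
num_samples = math.comb(len(population), 10)
print(num_samples)  # 847660528

# Listing them all is impractical, so draw a few random samples instead.
rng = random.Random(0)  # arbitrary seed, assumed here only for reproducibility
for _ in range(3):
    sample = rng.sample(population, 10)
    print(sample, sum(sample) / len(sample))
```

Each run of the loop prints one sample and its mean, which makes the point of the slide concrete: the sample mean changes from sample to sample.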
Example (1 of 3) Suppose we have the population {1, 2, 3, 4, 5, 6}. We can list all the samples of size n = 2 (drawn without replacement) with their corresponding sample means.

Sample   $\bar{x}$     Sample   $\bar{x}$     Sample   $\bar{x}$
{1, 2}   1.5           {1, 3}   2.0           {1, 4}   2.5
{1, 5}   3.0           {1, 6}   3.5           {2, 3}   2.5
{2, 4}   3.0           {2, 5}   3.5           {2, 6}   4.0
{3, 4}   3.5           {3, 5}   4.0           {3, 6}   4.5
{4, 5}   4.5           {4, 6}   5.0           {5, 6}   5.5
Example (2 of 3) The frequency distribution of the sample means is below.

$\bar{x}$   Freq.     $\bar{x}$   Freq.     $\bar{x}$   Freq.
1.5         1         2.0         1         2.5         2
3.0         2         3.5         3         4.0         2
4.5         2         5.0         1         5.5         1

Question: what is the probability that a random sample of size n = 2 has a sample mean satisfying $2.5 \le \bar{x} \le 4.0$?
Example (3 of 3)

$\bar{x}$   Relative Frequency
1.5         0.0667
2.0         0.0667
2.5         0.1333
3.0         0.1333
3.5         0.2000
4.0         0.1333
4.5         0.1333
5.0         0.0667
5.5         0.0667

[Plot: relative frequency versus $\bar{x}$]
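The tables in this example can be reproduced by enumerating every sample of size 2. The following is a minimal sketch; it assumes the interval in the question includes both endpoints (2.5 and 4.0), and the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [1, 2, 3, 4, 5, 6]

# All unordered samples of size 2 and their sample means (exact fractions).
means = [Fraction(a + b, 2) for a, b in combinations(population, 2)]
freq = Counter(means)

# Frequency and relative frequency of each distinct sample mean.
for m in sorted(freq):
    print(float(m), freq[m], round(freq[m] / len(means), 4))

# Probability that the sample mean lies in [2.5, 4.0], endpoints included
# (the closed interval is an assumption about the question on the slide).
favorable = sum(1 for m in means if Fraction(5, 2) <= m <= 4)
prob = Fraction(favorable, len(means))
print(prob, float(prob))  # 3/5 0.6
```

Nine of the 15 equally likely samples have a mean in that range, so under this reading the probability is 9/15 = 0.6.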
Example Suppose a population of 100 people yields the following ages: 14 16 26 58 31 41 12 50 35 15 6 18 42 53 27 41 45 27 44 55 59 17 60 36 39 63 52 19 59 32 64 33 87 66 35 39 10 15 30 4 39 56 58 63 12 37 11 3 31 43 12 60 42 66 28 23 32 51 69 41 8 65 82 39 28 79 32 5 40 23 34 73 21 62 20 43 23 36 16 46 27 53 35 58 3 28 86 83 40 61 4 43 64 49 54 17 61 39 66 30 Question: what are the population mean and standard deviation?
Population Age Distribution Age distribution for the population: [Histogram of the population ages] µ = 39.3, σ = 20.9
100 Samples of size 30 If we create 100 random samples of size n = 30 and calculate the sample means we get: 36.4 37.5 36.4 38.2 37.5 36.5 36.6 35.7 35.8 36.3 35.5 36.6 37.2 37.3 37.4 35.7 36.6 35.7 35.4 36.4 36.8 35.7 37.3 36.7 37.5 36.3 36.1 36.4 37. 37.6 36.7 35.6 37.5 38.1 36.6 36.3 38.0 36.4 35.5 36.7 36.1 36.5 37.8 36.0 37.4 36. 36.8 36.1 36.5 35.9 36.9 36.2 37.1 36.5 37.3 35.4 36.3 38.2 38.1 36.7 37.4 35.5 36.8 37.5 37.0 37.1 35.9 37.4 36.6 37.5 36.3 36.6 35.5 37.3 36.6 35.6 35.7 34.8 34.9 35.4 34.6 35.7 36.3 36.4 36.5 34.8 35.7 34.8 34.5 35.5 35.9 34.8 36.4 35.8 36.6 35.4 35.2 35.5 36.1 36.7
Distribution of the Sample Means Distribution of $\bar{x}$: [Histogram of the 100 sample means] µ = 36.4, σ = 0.9
Increasing the Sample Size n As the sample size increases, the mean of the sample means approaches the population mean. [Plot: mean of the sample means versus sample size n]
Law of Large Numbers Theorem (Law of Large Numbers) As additional observations are added to a sample, the sample mean $\bar{x}$ approaches the population mean µ.
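A small simulation can illustrate the Law of Large Numbers. This sketch does not use the slides' data: it assumes a stand-in population (rolls of a fair die, with mean 3.5) and an arbitrary seed, and simply tracks the running sample mean as observations are added.

```python
import random

rng = random.Random(1)   # arbitrary seed, assumed only for reproducibility
population_mean = 3.5    # mean of a fair six-sided die (stand-in population)

total = 0
running_means = {}
checkpoints = {10, 100, 1_000, 10_000, 100_000}

# Add observations one at a time and record the sample mean at checkpoints.
for n in range(1, 100_001):
    total += rng.randint(1, 6)
    if n in checkpoints:
        running_means[n] = total / n

for n in sorted(running_means):
    print(n, round(running_means[n], 4), "vs population mean", population_mean)
```

The printed means should drift toward 3.5 as n grows, although the approach is not monotone: an individual checkpoint can be farther from 3.5 than the previous one.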
What Happens to the Standard Deviation? As the sample size increases, the standard deviation of the sample means decreases, but does not disappear. [Plot: standard deviation of the sample means versus sample size n]
Result Theorem Suppose that a simple random sample of size n is drawn from a large population with mean µ and standard deviation σ. The sampling distribution of $\bar{x}$ will have mean $\mu_{\bar{x}} = \mu$ and standard deviation $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$. The standard deviation of the sampling distribution of $\bar{x}$ is called the standard error of the mean and is denoted $\sigma_{\bar{x}}$.
Example For the population of 100 ages, µ = 39.3 and σ = 20.9.
1. What is the standard error of the mean for samples of size n = 20? $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{20.9}{\sqrt{20}} \approx 4.7$
2. What is the standard error of the mean for samples of size n = 25? $\sigma_{\bar{x}} = \dfrac{20.9}{\sqrt{25}} \approx 4.2$
3. What is the standard error of the mean for samples of size n = 30? $\sigma_{\bar{x}} = \dfrac{20.9}{\sqrt{30}} \approx 3.8$
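The three standard errors above follow from dividing σ by the square root of n, which can be sketched in a few lines of Python. The helper name `standard_error` is a hypothetical choice for this illustration.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean, sigma / sqrt(n), for samples of size n."""
    return sigma / math.sqrt(n)

sigma = 20.9  # population standard deviation of the 100 ages
for n in (20, 25, 30):
    print(n, round(standard_error(sigma, n), 1))  # 4.7, 4.2, 3.8

# Quadrupling the sample size halves the standard error.
ratio = standard_error(sigma, 120) / standard_error(sigma, 30)
print(round(ratio, 3))  # 0.5
```

Because n sits under a square root, cutting the standard error in half takes four times as many observations, not twice as many.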
Normal Distributions Theorem If a random variable X is normally distributed, the distribution of the sample mean $\bar{x}$ is normally distributed. [Plot: density curves labeled "Population" and "Sample" over X]
Central Limit Theorem Theorem (Central Limit Theorem) Regardless of the shape of the population distribution, the sampling distribution of $\bar{x}$ becomes approximately normal as the sample size increases. Remark: if the sample size is greater than 30, the distribution of $\bar{x}$ can generally be treated as normal.
Illustration The random variable X is not normally distributed, but the means of samples of size 30 randomly sampled from this distribution are nearly normally distributed. [Figure: distribution of X and of the sample means for n = 30]
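The behavior described here can be explored with a simulation. The distribution used on the slide is not recoverable from the text, so this sketch assumes a different skewed population, an exponential distribution with mean 1 and standard deviation 1, together with an arbitrary seed and an arbitrary number of samples.

```python
import math
import random
import statistics

rng = random.Random(42)        # arbitrary seed, assumed for reproducibility
n, num_samples = 30, 5_000     # sample size from the slide; sample count assumed

# Means of many samples of size n from a skewed (exponential) population.
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
theoretical_se = 1.0 / math.sqrt(n)   # sigma / sqrt(n) with sigma = 1

# For a normal distribution about 68% of values lie within one standard
# deviation of the mean; compare with the simulated sample means.
within_one_se = sum(
    abs(m - 1.0) <= theoretical_se for m in sample_means
) / num_samples

print(round(mean_of_means, 3), round(sd_of_means, 3), round(theoretical_se, 3))
print(round(within_one_se, 3))
```

If the approximation is reasonable, the mean of the sample means should be close to 1, their standard deviation close to 1/√30, and the fraction within one standard error close to 0.68.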
Application The average speed of winds in Honolulu, HI is µ = 11.3 mph. The standard deviation of the wind speeds is σ = 3.5 mph. Assuming that the wind speeds are normally distributed, 1. find the probability that a single wind speed reading will exceed 13.9 mph, 2. describe the sampling distribution of the means for samples of size n = 9, 3. find the probability that the mean of 9 wind speed readings will exceed 13.9 mph.
Solution
1. The probability that a single wind speed reading will exceed 13.9 mph: with $z = (13.9 - 11.3)/3.5 \approx 0.74$, $P(X > 13.9) = P(Z > 0.74) = 1 - 0.7704 = 0.2296$.
2. The sampling distribution of the means for samples of size 9 is normally distributed with mean $\mu_{\bar{x}} = 11.3$ mph and standard deviation $\sigma_{\bar{x}} = \dfrac{3.5}{\sqrt{9}} \approx 1.2$ mph.
3. The probability that the mean of 9 wind speed readings will exceed 13.9 mph: with $z = (13.9 - 11.3)/(3.5/\sqrt{9}) \approx 2.23$, $P(\bar{X} > 13.9) = P(Z > 2.23) = 1 - 0.9871 = 0.0129$.
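These two probabilities can be cross-checked with Python's standard library. This is a sketch rather than the method used on the slides: `statistics.NormalDist` evaluates the normal CDF without rounding z to two decimals, so the results differ slightly from table lookups.

```python
import math
from statistics import NormalDist

mu, sigma, n = 11.3, 3.5, 9   # wind speed mean, standard deviation, sample size

# Distribution of a single reading.
single = NormalDist(mu, sigma)
p_single = 1 - single.cdf(13.9)

# Distribution of the mean of n readings: same mean, standard error sigma/sqrt(n).
mean_dist = NormalDist(mu, sigma / math.sqrt(n))
p_mean = 1 - mean_dist.cdf(13.9)

print(round(p_single, 4))  # close to the table-based 0.2296
print(round(p_mean, 4))    # close to the table-based 0.0129
```

A single reading above 13.9 mph is fairly common, while an average of nine readings above 13.9 mph is rare, because the sample mean varies much less than individual readings.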
Application The average salary for a registered nurse is $45,900 and the standard deviation in salaries is $7790. Suppose a sample of salaries of 50 registered nurses is collected. What is the probability that the sample mean is between $43,000 and $47,000?
Solution The mean of samples of size 50 is $\mu_{\bar{x}} = 45{,}900$ and the standard error of the mean for samples of size 50 is $\sigma_{\bar{x}} = \dfrac{7790}{\sqrt{50}} \approx 1102$.
$P(43{,}000 < \bar{X} < 47{,}000) = P(-2.63 < Z < 1.00) = 0.8413 - 0.0043 = 0.8370$
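The same check can be made for the salary example. The sketch below assumes, as the solution does, that the sample mean for n = 50 is approximately normal; the variable names are illustrative, and the result differs slightly from the table-based value because z is not rounded.

```python
import math
from statistics import NormalDist

mu, sigma, n = 45_900, 7_790, 50   # salary mean, standard deviation, sample size

se = sigma / math.sqrt(n)          # standard error of the mean
mean_dist = NormalDist(mu, se)     # approximate distribution of the sample mean

# Probability that the sample mean falls between $43,000 and $47,000.
p_between = mean_dist.cdf(47_000) - mean_dist.cdf(43_000)

print(round(se))            # 1102
print(round(p_between, 4))  # close to the table-based 0.8370
```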