Sampling Distribution Models
Copyright 2009 Pearson Education, Inc.
Rather than showing real repeated samples, imagine what would happen if we were to actually draw many samples. The histogram we'd get if we could see all the proportions from all possible samples is called the sampling distribution of the proportions.
It turns out that the histogram is unimodal, symmetric, and centered at p. More specifically, it's an amazing and fortunate fact that a Normal model is just the right one for the histogram of sample proportions.
Model how sample proportions vary from sample to sample. A sampling distribution model for how a sample proportion varies from sample to sample allows us to quantify that variation and how likely it is that we'd observe a sample proportion in any particular interval.
When working with proportions:
Mean = $p$
Standard deviation = $\sqrt{pq/n}$, where $q = 1 - p$
So the distribution of the sample proportions is modeled with a probability model that is $N\left(p, \sqrt{pq/n}\right)$.
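As a rough illustration of this model, the following Python sketch builds $N(p, \sqrt{pq/n})$ with the standard library's `statistics.NormalDist`. The function name `proportion_sampling_model` and the example values p = 0.5 and n = 100 are assumptions chosen for illustration; they do not come from the slides.

```python
from math import sqrt
from statistics import NormalDist


def proportion_sampling_model(p: float, n: int) -> NormalDist:
    """Return the Normal model N(p, sqrt(pq/n)) for a sample proportion."""
    q = 1 - p
    return NormalDist(mu=p, sigma=sqrt(p * q / n))


# Hypothetical values: true proportion 0.5, sample size 100.
model = proportion_sampling_model(p=0.5, n=100)
print(f"mean = {model.mean}, standard deviation = {model.stdev}")
```

Quadrupling the sample size halves the standard deviation, because n sits under the square root.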
A picture of what we just discussed is as follows:
The Normal model says that about 95% of values are within two standard deviations of the mean. So about 95% of polls should give results near the mean, varying above and below it by no more than two standard deviations. This is what we mean by sampling error. It's not really an error at all, but just the variability you'd expect to see from one sample to another.
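The 95% figure can be checked directly from the Normal model. The sketch below is a minimal check using Python's `statistics.NormalDist`; the helper name `prob_within` is an assumption introduced here.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def prob_within(k: float) -> float:
    """Probability that a Normal value lies within k standard deviations of its mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


print(round(prob_within(2), 4))  # 0.9545
```

The exact value is slightly above 95%, which is why the rule is stated as an approximation.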
The Normal model becomes a better model for the distribution of sample proportions as the sample size gets bigger.
There are two assumptions in the case of the model for the distribution of sample proportions:
1. The Independence Assumption: The sampled values must be independent of each other.
2. The Sample Size Assumption: The sample size, n, must be large enough.
The corresponding conditions to check are:
1. Randomization Condition: The sample should be a simple random sample of the population.
2. 10% Condition: If sampling has not been made with replacement, then the sample size, n, must be no larger than 10% of the population.
3. Success/Failure Condition: The sample size has to be big enough so that both np and nq are at least 10.
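The two numeric conditions lend themselves to a quick check in code. The sketch below is one possible way to do it in Python; the function name `check_proportion_conditions` and the population size of 10,000 are hypothetical. The Randomization Condition depends on how the data were collected and cannot be verified from the numbers alone.

```python
from typing import Optional


def check_proportion_conditions(n: int, p: float,
                                population_size: Optional[int] = None) -> dict:
    """Check the 10% and Success/Failure conditions for a sample proportion."""
    q = 1 - p
    results = {"success_failure": n * p >= 10 and n * q >= 10}
    if population_size is not None:
        # Only relevant when sampling without replacement.
        results["ten_percent"] = n <= 0.10 * population_size
    return results


# Hypothetical example: n = 500, p = 0.12, population of 10,000.
print(check_proportion_conditions(500, 0.12, population_size=10_000))
```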
Sampling distribution models are important because they act as a bridge from the real world of data to the imaginary model of the statistic and enable us to say something about the population when all we have is data from the real world.
Proportions summarize categorical variables. The Normal sampling distribution model looks like it will be very useful. Can we do something similar with quantitative data? We can indeed. Even more remarkable, not only can we use all of the same concepts, but almost the same model.
A sample mean also has a sampling distribution. Let's start with a simulation of 10,000 tosses of a die. A histogram of the results is:
Looking at the average of two dice after a simulation of 10,000 tosses:
The average of three dice after a simulation of 10,000 tosses looks like:
The average of 5 dice after a simulation of 10,000 tosses looks like:
The average of 20 dice after a simulation of 10,000 tosses looks like:
As the sample size (number of dice) gets larger, each sample average is more likely to be closer to the population mean. The sampling distribution of a mean becomes more nearly Normal.
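The dice simulation can be reproduced with a few lines of Python. This is a minimal sketch, not the simulation used for the slides: the function name `simulate_dice_averages` and the fixed seed are assumptions, and the exact numbers depend on the random number generator.

```python
import random
from statistics import mean, stdev


def simulate_dice_averages(num_dice: int, num_trials: int = 10_000, seed: int = 0) -> list:
    """Average of `num_dice` fair dice, repeated `num_trials` times."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        sum(rng.randint(1, 6) for _ in range(num_dice)) / num_dice
        for _ in range(num_trials)
    ]


for num_dice in (1, 2, 3, 5, 20):
    averages = simulate_dice_averages(num_dice)
    print(f"{num_dice:>2} dice: mean = {mean(averages):.3f}, SD = {stdev(averages):.3f}")
```

The means should stay near 3.5 while the spread of the averages shrinks as more dice are averaged.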
The sampling distribution of any mean becomes more nearly Normal as the sample size grows. All we need is for the observations to be independent and collected with randomization. We don't even care about the shape of the population distribution! The Fundamental Theorem of Statistics is called the Central Limit Theorem (CLT).
The Central Limit Theorem (CLT): The mean of a random sample has a sampling distribution whose shape can be approximated by a Normal model. The larger the sample, the better the approximation will be.
The CLT requires essentially the same assumptions we saw for modeling proportions:
Independence Assumption: The sampled values must be independent of each other.
Sample Size Assumption: The sample size must be sufficiently large.
The Normal model for the sampling distribution of the mean has a mean equal to the population mean, $\mu(\bar{y}) = \mu$, and a standard deviation equal to $SD(\bar{y}) = \sigma/\sqrt{n}$, where $\sigma$ is the population standard deviation.
Both of the sampling distributions we've looked at are Normal.
For proportions: $SD(\hat{p}) = \sqrt{pq/n}$
For means: $SD(\bar{y}) = \sigma/\sqrt{n}$
When we don't know p or σ, we will use sample statistics to estimate these population parameters. Whenever we estimate the standard deviation of a sampling distribution, we call it a standard error.
For a sample proportion, the standard error is $SE(\hat{p}) = \sqrt{\hat{p}\hat{q}/n}$.
For the sample mean, the standard error is $SE(\bar{y}) = s/\sqrt{n}$.
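The two standard errors translate directly into code. The Python sketch below uses hypothetical function names (`se_proportion`, `se_mean`) and made-up sample values purely for illustration.

```python
from math import sqrt


def se_proportion(p_hat: float, n: int) -> float:
    """Standard error of a sample proportion: sqrt(p_hat * q_hat / n)."""
    q_hat = 1 - p_hat
    return sqrt(p_hat * q_hat / n)


def se_mean(s: float, n: int) -> float:
    """Standard error of a sample mean: s / sqrt(n), with s the sample SD."""
    return s / sqrt(n)


# Hypothetical samples: 50% observed in 100 cases; sample SD 10 from 25 cases.
print(se_proportion(0.5, 100))  # 0.05
print(se_mean(10, 25))          # 2.0
```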
Be careful! Now we have two distributions to deal with. The first is the real-world distribution of the sample, which we might display with a histogram. The second is the math-world sampling distribution of the statistic, which we model with a Normal model based on the Central Limit Theorem. Don't confuse the two!
There are two basic truths about sampling distributions:
1. Sampling distributions arise because samples vary. Each random sample will have different cases and so, a different value of the statistic.
2. Although we can always simulate a sampling distribution, the Central Limit Theorem saves us the trouble for means and proportions.
Don't confuse the sampling distribution with the distribution of the sample. When you take a sample, you look at the distribution of the values, usually with a histogram, and you may calculate summary statistics. The sampling distribution is an imaginary collection of the values that a statistic might have taken for all random samples: the one you got and the ones you didn't get.
Watch out for small samples from skewed populations. The more skewed the distribution, the larger the sample size we need for the CLT to work.
Based on past experience, a bank believes that 12% of people who receive loans will not make payments on time. The bank has recently approved 500 loans.
a) What are the mean and standard deviation of the proportion of clients in this group who may not make timely payments?
$\mu(\hat{p}) = p = 0.12$
$SD(\hat{p}) = \sqrt{pq/n} = \sqrt{(0.12)(0.88)/500} \approx 0.015$
b) What is the probability that over 14% of these clients will not make payments on time?
$P(\hat{p} > 0.14)$ = 1 - normdist(0.14, 0.12, 0.015, 1) = 0.091
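The same calculation can be sketched in Python, with `statistics.NormalDist` standing in for the spreadsheet-style normdist call. The variable names are assumptions. Using the unrounded standard deviation gives a slightly smaller probability than using the rounded value 0.015, so both are computed for comparison.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.12, 500                   # values from the bank example
sd = sqrt(p * (1 - p) / n)         # unrounded SD of the sample proportion

# Upper-tail probability P(p_hat > 0.14) under the Normal model.
prob_over_14 = 1 - NormalDist(mu=p, sigma=sd).cdf(0.14)

# Same probability with the SD rounded to 0.015, as in the hand calculation.
prob_rounded_sd = 1 - NormalDist(mu=p, sigma=0.015).cdf(0.14)

print(f"SD = {sd:.4f}")
print(f"P(p_hat > 0.14) = {prob_over_14:.3f} (unrounded SD)")
print(f"P(p_hat > 0.14) = {prob_rounded_sd:.3f} (SD rounded to 0.015)")
```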
Just before a referendum on a school budget, a local newspaper polls 435 voters to predict whether the budget will pass. Suppose the budget has the support of 54% of the voters. What is the probability that the newspaper's sample will lead it to predict defeat?
Mean and standard deviation of the proportion:
$\mu(\hat{p}) = p = 0.54$
$SD(\hat{p}) = \sqrt{pq/n} = \sqrt{(0.54)(0.46)/435} \approx 0.024$
The newspaper predicts defeat if fewer than half of the sampled voters support the budget:
$P(\hat{p} < 0.5)$ = normdist(0.5, 0.54, 0.024, 1) = 0.048
When a truckload of apples arrives at a packing plant, a random sample of 125 is selected and examined for bruises, discoloration, and other defects. The whole truckload is rejected if more than 10% of the sample is unsatisfactory. Suppose in fact that 12% of the apples on the truck do not meet the desired standard. What is the probability that the shipment will be accepted anyway?
Mean and standard deviation of the proportion:
$\mu(\hat{p}) = p = 0.12$
$SD(\hat{p}) = \sqrt{pq/n} = \sqrt{(0.12)(0.88)/125} \approx 0.029$
The shipment is accepted if no more than 10% of the sample is unsatisfactory:
$P(\hat{p} \le 0.10)$ = normdist(0.1, 0.12, 0.029, 1) = 0.245
A new restaurant with 119 seats is being planned. Studies show that 63% of the customers demand a smoke-free area. How many seats should be in the non-smoking area in order to be very sure ($\mu + 3\sigma$) of having enough seating there?
Mean and standard deviation of the proportion:
$\mu(\hat{p}) = p = 0.63$
$SD(\hat{p}) = \sqrt{pq/n} = \sqrt{(0.63)(0.37)/119} \approx 0.044$
$\mu + 3\sigma = 0.63 + 3(0.044) \approx 0.763$
$0.763 \times 119 \approx 90.8$, so about 91 seats (rounding up to a whole seat).
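A short Python sketch of the seating calculation follows. The variable names are assumptions, and rounding up with `ceil` is a choice made here so that the seat count is never below the $\mu + 3\sigma$ target.

```python
from math import ceil, sqrt

total_seats = 119
p = 0.63                                   # share of customers wanting smoke-free seating

sd = sqrt(p * (1 - p) / total_seats)       # SD of the sample proportion
upper_proportion = p + 3 * sd              # "very sure" bound: mu + 3 sigma
seats_needed = ceil(upper_proportion * total_seats)  # round up to a whole seat

print(f"SD = {sd:.3f}, mu + 3 sigma = {upper_proportion:.3f}, seats = {seats_needed}")
```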
Assume that the duration of human pregnancies can be described by a Normal model with mean 268 days and standard deviation 16 days.
a) What percentage of pregnancies should last between 255 and 270 days?
$P(255 < x < 270)$ = normdist(270, 268, 16, 1) - normdist(255, 268, 16, 1) = 0.341 = 34.1%
b) At least how many days should the longest 30% of all pregnancies last?
$P(x > ?) = 0.3$, so the cutoff is the 70th percentile: norminv(0.7, 268, 16) = 276.4 days
c) Suppose a certain obstetrician is currently providing prenatal care to 40 pregnant women. According to the CLT, what are the mean and standard deviation of the model for the mean duration of these pregnancies?
Mean = 268
$SD(\bar{y}) = \sigma/\sqrt{n} = 16/\sqrt{40} \approx 2.53$
d) What is the probability that the mean duration of these patients' pregnancies will be less than 274 days?
$P(\bar{y} < 274)$ = normdist(274, 268, 2.53, 1) = 0.991
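Parts a) through d) of the pregnancy example can be sketched in Python, with `NormalDist.cdf` playing the role of normdist and `NormalDist.inv_cdf` the role of norminv. The variable names are assumptions introduced for this sketch.

```python
from math import sqrt
from statistics import NormalDist

pregnancy = NormalDist(mu=268, sigma=16)   # model for one pregnancy

# a) P(255 < x < 270)
prob_between = pregnancy.cdf(270) - pregnancy.cdf(255)

# b) Cutoff for the longest 30%: the 70th percentile
cutoff_longest_30 = pregnancy.inv_cdf(0.70)

# c) Sampling distribution of the mean for n = 40 patients
n = 40
sd_mean = pregnancy.stdev / sqrt(n)
mean_model = NormalDist(mu=pregnancy.mean, sigma=sd_mean)

# d) P(sample mean < 274)
prob_mean_below_274 = mean_model.cdf(274)

print(f"a) {prob_between:.3f}")
print(f"b) {cutoff_longest_30:.1f} days")
print(f"c) mean = {mean_model.mean}, SD = {sd_mean:.2f}")
print(f"d) {prob_mean_below_274:.3f}")
```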
The score distribution shown in the table is for all students who took a yearly AP statistics exam. An AP statistics teacher had 46 students preparing to take the AP exam. He considered his students to be typical of all the national students.
Score   Percent of students
5       13.9
4       22.5
3       25.3
2       17.2
1       21.1
What is the probability that his students will achieve an average score of at least 3?
1. Find the mean and standard deviation of the population.
$\mu = E(X) = \sum x \cdot P(x) = 2.909$
$\sigma = \sqrt{\sum (x - \mu)^2 \cdot P(x)} = 1.337$
2. Find the mean and standard deviation of the sampling distribution of the sample mean.
Mean = 2.909
$SD(\bar{x}) = \sigma/\sqrt{n} = 1.337/\sqrt{46} \approx 0.197$
3. Find the probability:
$P(\bar{x} \ge 3)$ = 1 - normdist(3, 2.909, 0.197, 1) = 0.322
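The three steps of the AP score example can be sketched in Python as below. The dictionary and variable names are assumptions. Evaluated directly, $1.337/\sqrt{46}$ is about 0.197, which gives a probability of roughly 0.32 for an average score of at least 3.

```python
from math import sqrt
from statistics import NormalDist

# Score -> percent of students, as given in the table.
score_percent = {5: 13.9, 4: 22.5, 3: 25.3, 2: 17.2, 1: 21.1}
probabilities = {score: pct / 100 for score, pct in score_percent.items()}

# Step 1: population mean and standard deviation of a discrete distribution.
mu = sum(x * p for x, p in probabilities.items())
sigma = sqrt(sum((x - mu) ** 2 * p for x, p in probabilities.items()))

# Step 2: sampling distribution of the mean for n = 46 students.
n = 46
sd_mean = sigma / sqrt(n)

# Step 3: probability that the class average is at least 3.
prob_at_least_3 = 1 - NormalDist(mu=mu, sigma=sd_mean).cdf(3)

print(f"mu = {mu:.3f}, sigma = {sigma:.3f}")
print(f"SD of the sample mean = {sd_mean:.3f}")
print(f"P(mean >= 3) = {prob_at_least_3:.3f}")
```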
The weight of potato chips in a large-size bag is stated to be 16 ounces. The amount that the packaging machine puts in these bags is believed to have a Normal model with a mean of 16.3 ounces and a standard deviation of 0.21 ounces.
a) What fraction of all bags sold are underweight?
$P(x < 16)$ = normdist(16, 16.3, 0.21, 1) = 0.0766
b) Some of the chips are sold in bargain packs of 5 bags. What is the probability that none of the 5 is underweight?
With $p = 0.0766$ and $q = 1 - p = 0.9234$: $P(\text{none underweight}) = p^0 q^5 = (0.9234)^5 \approx 0.6715$
c) What is the probability that the mean weight of the 5 bags is below the stated amount?
$P(\bar{x} < 16)$ = normdist(16, 16.3, 0.21/sqrt(5), 1) = 0.0007
d) What is the probability that the mean weight of a 30-bag case of potato chips is below 16 ounces?
$P(\bar{x} < 16)$ = normdist(16, 16.3, 0.21/sqrt(30), 1) ≈ 0.0000
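The potato chip calculations can be sketched in Python as follows. The variable names are assumptions, and part b) treats the five bags as independent, which is the assumption behind raising q to the fifth power.

```python
from math import sqrt
from statistics import NormalDist

stated_weight = 16.0
bag = NormalDist(mu=16.3, sigma=0.21)      # model for a single bag

# a) Fraction of individual bags that are underweight.
p_underweight = bag.cdf(stated_weight)

# b) None of 5 independent bags underweight: q ** 5 with q = 1 - p.
p_none_of_5 = (1 - p_underweight) ** 5

# c), d) Mean weight of n bags below 16 oz, using SD = sigma / sqrt(n).
def prob_mean_below(n: int) -> float:
    mean_model = NormalDist(mu=bag.mean, sigma=bag.stdev / sqrt(n))
    return mean_model.cdf(stated_weight)

print(f"a) {p_underweight:.4f}")
print(f"b) {p_none_of_5:.4f}")
print(f"c) {prob_mean_below(5):.4f}")
print(f"d) {prob_mean_below(30):.4f}")
```

Averaging more bags shrinks the spread of the mean, so an underweight average becomes far less likely than an underweight single bag.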
Suppose that the IQs of university A's students can be described by a Normal model with mean 130 and standard deviation 7 points. Also suppose that IQs of students from university B can be described by a Normal model with mean 110 and standard deviation 12.
a) Select a student at random from university A. Find the probability that the student's IQ is at least 125 points.
$P(x \ge 125)$ = 1 - normdist(125, 130, 7, 1) = 0.762
b) Select a student at random from each school. Find the probability that the university A student's IQ is at least 5 points higher than the university B student's IQ.
Define $Z = A - B$
$\mu = 130 - 110 = 20$
$\sigma = \sqrt{7^2 + 12^2} \approx 13.89$
$P(Z \ge 5)$ = 1 - normdist(5, 20, 13.89, 1) = 0.860
c) Select 3 university B students at random. Find the probability that this group's average IQ is at least 115 points.
$P(\bar{x} \ge 115)$ = 1 - normdist(115, 110, 12/sqrt(3), 1) = 0.235
d) Also select 3 university A students at random. What is the probability that their average IQ is at least 5 points higher than the average for the 3 university B students?
Define $Z = \bar{A} - \bar{B}$
$\mu = 130 - 110 = 20$
$\sigma = \sqrt{7^2/3 + 12^2/3} \approx 8.02$
$P(Z \ge 5)$ = 1 - normdist(5, 20, 8.02, 1) = 0.969
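The four IQ calculations can be sketched in Python as below. The helper name `difference_model` is an assumption. The key step is that variances of independent quantities add, so the standard deviation of a difference is the square root of the summed variances.

```python
from math import sqrt
from statistics import NormalDist

univ_a = NormalDist(mu=130, sigma=7)
univ_b = NormalDist(mu=110, sigma=12)


def difference_model(n: int) -> NormalDist:
    """Model for (mean of n A students) - (mean of n B students), assuming independence."""
    mu = univ_a.mean - univ_b.mean
    sigma = sqrt(univ_a.variance / n + univ_b.variance / n)
    return NormalDist(mu=mu, sigma=sigma)


# a) One A student with IQ of at least 125.
prob_a = 1 - univ_a.cdf(125)
# b) One A student at least 5 points above one B student.
prob_b = 1 - difference_model(1).cdf(5)
# c) Average of 3 B students at least 115.
prob_c = 1 - NormalDist(mu=univ_b.mean, sigma=univ_b.stdev / sqrt(3)).cdf(115)
# d) Average of 3 A students at least 5 points above the average of 3 B students.
prob_d = 1 - difference_model(3).cdf(5)

for label, prob in zip("abcd", (prob_a, prob_b, prob_c, prob_d)):
    print(f"{label}) {prob:.3f}")
```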