Sampling Distributions

AP Statistics Ch. 7 Notes: Sampling Distributions

A major field of statistics is statistical inference, which uses information from a sample to draw conclusions about a wider population.

Parameter: A number that describes some characteristic of the population. In practice, the value of the parameter is usually unknown because it is often impossible to examine the entire population.
µ (population mean), σ² (population variance) and σ (population standard deviation), p (population proportion: the proportion of individuals in the population with a certain characteristic)

Statistic: A number that describes some characteristic of a sample. The value of a statistic can be computed directly from the sample data. We often use a statistic to estimate an unknown parameter.
x̄ (sample mean), s² (sample variance) and s (sample standard deviation), p̂ (sample proportion: the proportion of individuals in the sample with a certain characteristic)

Remember: statistics come from samples and parameters come from populations.

Example: Identify the population, the parameter, the sample, and the statistic in each of the following settings.
a) A pediatrician wants to know the 75th percentile for the distribution of heights of 10-year-old boys, so she takes a sample of 50 patients and calculates Q3 = 56 inches.
b) A Pew Research Center poll asked 1102 people aged 12 to 17 in the United States if they have a cell phone. Of the respondents, 71% said yes.

Sampling Variability: The value of a statistic varies from sample to sample in repeated samples from the same population. To answer a question about a parameter based on a sample, we need to know exactly how the value of a statistic varies from sample to sample. We ask ourselves, "What would happen if we took many samples of the same size from this population?"
1. Take a large number of samples of the same size from the same population.
2. Calculate the statistic (like x̄ or p̂) for each sample.
3. Make a graph of the different values of the statistic.
4. Examine the distribution displayed in the graph for shape, center, and spread, as well as for outliers or other deviations.
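The four-step procedure above can be sketched in Python. This is a minimal illustration, not part of the original notes: the population (the whole numbers 1 through 1000), the sample size of 25, the 1000 repetitions, and the helper names `simulate_sampling_distribution` and `describe` are all hypothetical choices. It covers steps 1, 2, and 4 numerically; the graph in step 3 is left to whatever plotting tool you prefer.

```python
import random
import statistics


def simulate_sampling_distribution(population, n, statistic, reps=1000, seed=0):
    """Draw `reps` SRSs of size n (without replacement) and return the statistic of each."""
    rng = random.Random(seed)
    return [statistic(rng.sample(population, n)) for _ in range(reps)]


def describe(values):
    """Numerical summary of center and spread for the simulated statistic values."""
    return {
        "mean": statistics.fmean(values),
        "sd": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


# Hypothetical population for illustration: the whole numbers 1 through 1000.
population = list(range(1, 1001))
sample_means = simulate_sampling_distribution(population, n=25, statistic=statistics.fmean)
summary = describe(sample_means)
print(summary)
```

Passing `statistics.median` instead of `statistics.fmean`, or a different population list, simulates the sampling distribution of another statistic in the same way.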

Sampling Distribution: The distribution of values taken by the statistic in all possible samples of the same size from the same population.

Examples: The aces and face cards are removed from a deck of cards so that only cards 2 through 10 remain. The deck is thoroughly shuffled and a sample of 5 cards is selected. The median value of the five cards is recorded. This process is repeated 25 times, and the following values of the sample median are observed.
[Dotplot: Sample Median for 25 samples; horizontal axis from 2 to 10]
Describe what you see: shape, center, spread, and any unusual values.

A computer was used to simulate choosing 500 SRSs of size 5 from the deck of cards described above. The graph below shows the distribution of the sample median for these 500 samples.
[Dotplot: SampleMedian for 500 simulated samples; horizontal axis from 2 to 10]
Is this the sampling distribution of the sample median? Why or why not? Describe the distribution.

Suppose that another student prepared a different deck of cards and claimed that it was exactly the same as the one used previously. When you took an SRS of size 5, the median was 4. Does this provide convincing evidence that the student's deck is different?
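One way to approach the last question is to simulate the sampling distribution of the sample median for the original deck and see how often a median of 4 or less turns up. The sketch below assumes the deck holds one card of each rank 2 through 10 in each of four suits (36 cards) and uses 10,000 repetitions rather than 500 so the estimate is steadier; the function name `simulate_medians` is hypothetical. Because the notes pose the question without an answer, the exact hypergeometric probability is computed alongside as a check.

```python
import random
import statistics
from math import comb

# Deck from the example: aces and face cards removed, leaving ranks 2-10 in four suits (36 cards).
deck = [rank for rank in range(2, 11) for _suit in range(4)]


def simulate_medians(reps, n=5, seed=1):
    """Median of each of `reps` SRSs of n cards drawn without replacement."""
    rng = random.Random(seed)
    return [statistics.median(rng.sample(deck, n)) for _ in range(reps)]


medians = simulate_medians(reps=10_000)
p_sim = sum(m <= 4 for m in medians) / len(medians)

# Exact check: the median of 5 cards is at most 4 exactly when at least 3 of the
# 5 cards come from the 12 cards ranked 2, 3, or 4 (a hypergeometric count).
p_exact = sum(comb(12, k) * comb(24, 5 - k) for k in range(3, 6)) / comb(36, 5)
print(f"simulated P(median <= 4) = {p_sim:.3f}, exact = {p_exact:.3f}")
```

The exact probability works out to roughly 0.19, so under these assumptions a median of 4 or less occurs in about one sample in five from the original deck. A single sample with median 4 is therefore not convincing evidence that the student's deck is different.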

There are three distributions involved when we sample repeatedly, and it is very important to be clear which one we are talking about.
Population distribution: Gives the values of the variable of interest for all individuals in the population.
Distribution of sample data: Gives the values of the variable of interest for the individuals in the sample.
Sampling distribution: Gives the values of the statistic for all possible samples of the same size taken from the population.
The population distribution and the distribution of sample data describe individuals. A sampling distribution describes how a statistic varies in many samples from the population.

When we calculate the value of a statistic, we usually want to use it to estimate the value of a population parameter. To determine how reliable an estimate based on a statistic is, we consider the shape, center, and spread of the sampling distribution of the statistic.

Unbiased estimator: A statistic used to estimate a parameter is an unbiased estimator if the mean of its sampling distribution is equal to the true value of the parameter being estimated. Unbiased doesn't mean perfect. An unbiased estimator will almost always provide an estimate that is not exactly equal to the true value of the population parameter. It is called unbiased because in repeated samples, the estimates won't be consistently too high or consistently too low.
When we talk about biased and unbiased estimators, we are assuming that the sampling process itself has no bias: no sampling or non-sampling errors are present, just sampling variability. If the sampling process is flawed, there can be bias even if we are using what is otherwise considered an unbiased estimator.

Example: A teacher thoroughly mixes identically sized slips of paper numbered 1 through 342 in a bag. Students draw out repeated samples of four numbers each and work to develop a formula to estimate the total number of slips, N, in the bag. The graph below shows the estimates produced by the following five different methods:
(1) Partition = sample maximum × (5/4)
(2) Max = sample maximum
(3) MeanMedian = sample mean + sample median
(4) SumQuartiles = Q1 + Q3 (both sample values)
(5) TwiceIQR = 2 × sample IQR
[Dotplots: estimates from Partition, Max, MeanMedian, SumQuartiles, and TwiceIQR; horizontal axis from 0 to 700]
The thick line through the graph marks the true value N = 342.
a) Which of these statistics appear to be biased estimators? Explain.
b) Of the unbiased estimators, which is best? Why?
c) Explain why a biased estimator might be preferred over an unbiased estimator.

Variability of a statistic: How much the value of the statistic varies from sample to sample. It is described by the spread of its sampling distribution. This spread is determined primarily by the size of the random sample: larger samples give smaller variability in the values of a statistic. Larger samples reduce the variability of a statistic, but they don't eliminate bias!

Example: Suppose that the heights of adult males are approximately Normally distributed with a mean of 70 inches and a standard deviation of 3 inches. To see why sample size matters, we took 1000 SRSs of size 100 and calculated the sample mean height of each, and then took 1000 SRSs of size 1500 and calculated the sample mean height of each. Here are the results, graphed on the same scale for easy comparison:
[Histograms: SampleMean100 and SampleMean1500; horizontal axis from 69.0 to 71.0; vertical axis: count, up to 500]
Compare the shape, center, and spread of the distributions. What does this tell you about the relationship between sample size and the sampling distribution of the sample mean?
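The bias of three of the slip estimators can be checked by simulation. This is a sketch under stated assumptions, not the method used to make the graph: it draws 20,000 samples of four slips without replacement, uses only the Partition, Max, and MeanMedian formulas, and leaves out SumQuartiles and TwiceIQR because their values depend on which quartile convention is used. The names `ESTIMATORS` and `simulate` are hypothetical.

```python
import random
import statistics

N_TRUE = 342                          # slips numbered 1 through 342, as in the example
slips = list(range(1, N_TRUE + 1))

# Three of the five methods from the example (quartile-based methods omitted).
ESTIMATORS = {
    "Partition": lambda s: max(s) * 5 / 4,
    "Max": max,
    "MeanMedian": lambda s: statistics.mean(s) + statistics.median(s),
}


def simulate(reps=20_000, n=4, seed=2):
    """Return (mean, SD) of each estimator over `reps` samples of n slips."""
    rng = random.Random(seed)
    results = {name: [] for name in ESTIMATORS}
    for _ in range(reps):
        sample = rng.sample(slips, n)
        for name, estimator in ESTIMATORS.items():
            results[name].append(estimator(sample))
    return {name: (statistics.fmean(v), statistics.stdev(v)) for name, v in results.items()}


summary = simulate()
for name, (center, spread) in summary.items():
    print(f"{name:>10}: mean = {center:6.1f}, sd = {spread:5.1f}")
```

An estimator whose simulated mean sits near 342 behaves like an unbiased estimator; Max comes out well below 342 because the sample maximum can never exceed N. The SD column is one way to think about part (b): among estimators centered near N, the one with the smaller spread gives more precise estimates.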

We can represent the true value of the parameter we are trying to estimate by the bull's-eye of a target. The values of the statistic from sample to sample are represented by arrows repeatedly shot at the target. Bias means our aim is off and we consistently miss the true value of the parameter. High variability means that the repeated shots (the values of the statistic from sample to sample) are widely scattered. When we select a statistic to estimate the value of a parameter, we want one that is accurate (unbiased) and precise (low variability).
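The heights comparison from the earlier example can be reproduced to see bias and variability separately: the center of each simulated distribution shows whether the sample mean is on target, and its spread shows how precise it is. This sketch treats the population as a Normal model with mean 70 inches and SD 3 inches, so each sample is drawn as independent Normal values rather than as an SRS from a finite list of men; the function name `simulated_sample_means` is hypothetical.

```python
import random
import statistics

MU, SIGMA = 70, 3  # adult male heights in inches, from the example


def simulated_sample_means(n, reps=1000, seed=3):
    """Means of `reps` samples of size n drawn from a Normal(MU, SIGMA) model."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(n)) for _ in range(reps)]


for n in (100, 1500):
    means = simulated_sample_means(n)
    print(f"n = {n:4d}: center = {statistics.fmean(means):.3f}, "
          f"spread (SD) = {statistics.stdev(means):.3f}")
```

Both centers should be close to 70 (no bias), while the spread for n = 1500 should be roughly a quarter of the spread for n = 100 (about 0.077 versus 0.3), so the larger samples hit the bull's-eye more tightly.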

Sample Proportions
How good is the statistic p̂ (the proportion of individuals in a sample with a given characteristic) as an estimate of the parameter p (the proportion of individuals in the population with the given characteristic)? To answer, we ask, "What would happen if we took many samples?"

Example: In a population of N = 616 pennies, the proportion that were minted after 2005 is p = 0.175. Five hundred samples each of sizes 5, 10, 20, and 50 are taken. The distributions of p̂ for the 500 samples of each size are shown below. Compare the shape, center, and spread of the distributions.
[Dotplots: p-hat 5, p-hat 10, p-hat 20, p-hat 50; horizontal axis: Sample Proportions, 0.0 to 0.8; each symbol represents up to 18 observations]

Sampling Distribution of a Sample Proportion
Choose an SRS of size n from a population of size N with proportion p of successes and proportion $q = 1 - p$ of failures. Let p̂ be the sample proportion of successes. Then:
The mean of the sampling distribution of p̂ is $\mu_{\hat{p}} = p$.
The standard deviation of the sampling distribution of p̂ is $\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}$ as long as the observations are independent or the 10% condition is satisfied: $n \le \frac{1}{10}N$, or equivalently $N \ge 10n$.
As n increases, the sampling distribution of p̂ becomes approximately Normal. Before you use Normal calculations, check that the Normal condition is satisfied: $np \ge 10$ and $nq \ge 10$.

Since larger random samples give better information, it sometimes makes sense to sample more than 10% of the population. In this case, there's a more accurate formula for calculating the standard deviation $\sigma_{\hat{p}}$. It uses something called a finite population correction (FPC). The formula without the FPC will always give a larger (more conservative) estimate of the standard deviation than the actual standard deviation. In case you are dying to know, the formula is $\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}\sqrt{1 - \frac{n}{N}}$.
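The sample-proportion formulas can be collected into a small helper. This is a sketch, and the function name `phat_sampling_distribution` is hypothetical: it returns $\mu_{\hat{p}}$, $\sigma_{\hat{p}}$ (applying the FPC only when the sample is more than 10% of the population, following the notes), and whether the Normal condition $np \ge 10$, $nq \ge 10$ holds. It is applied here to the pennies population with $N = 616$ and $p = 0.175$.

```python
from math import sqrt


def phat_sampling_distribution(p, n, N=None):
    """Mean and SD of the sampling distribution of p-hat, plus the Normal-condition check.

    If N is given and n > N/10, the finite population correction sqrt(1 - n/N) is applied.
    """
    q = 1 - p
    mean = p
    sd = sqrt(p * q / n)
    if N is not None and n > N / 10:
        sd *= sqrt(1 - n / N)
    normal_ok = n * p >= 10 and n * q >= 10
    return mean, sd, normal_ok


# Pennies example: N = 616, p = 0.175 (proportion minted after 2005).
for n in (5, 10, 20, 50):
    mean, sd, normal_ok = phat_sampling_distribution(0.175, n, N=616)
    print(f"n = {n:2d}: mean = {mean:.3f}, sd = {sd:.4f}, Normal condition met: {normal_ok}")
```

For these pennies even n = 50 gives $np = 8.75 < 10$, so the helper reports that the Normal condition is not met for any of the four sample sizes, which fits the right-skewed shape expected for the smaller samples.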

When solving problems involving sample proportions:
1. Justify using the Normal distribution by checking $np \ge 10$ and $nq \ge 10$.
2. Find $\mu_{\hat{p}}$ and $\sigma_{\hat{p}}$. Check the 10% condition to justify using the formula for $\sigma_{\hat{p}}$.
3. Write a probability statement and draw and shade a Normal curve.
4. Perform Normal calculations either by using z-scores, $z = \frac{\hat{p} - \mu_{\hat{p}}}{\sigma_{\hat{p}}}$, and a table, or by using normalcdf on your calculator.
5. Write your answer in context.

Examples: About 75% of young adult Internet users (ages 18 to 29) watch online video. Suppose that a sample survey contacts an SRS of 1000 young adult Internet users and calculates the proportion p̂ in this sample who watch online video.
(a) What is the mean of the sampling distribution of p̂? Explain the meaning of $\mu_{\hat{p}}$.
(b) Find the standard deviation of the sampling distribution of p̂. Check that the 10% condition is satisfied. Then explain the meaning of $\sigma_{\hat{p}}$.
(c) Is the sampling distribution of p̂ approximately Normal? Check that the Normal condition is met.
(d) If the sample size were 9000 rather than 1000, how would this change the sampling distribution of p̂?

Example: The superintendent of a large school district wants to know what proportion of middle school students in her district are planning to attend a four-year college or university. Suppose that 80% of all middle school students in her district are planning to attend a four-year college or university. What is the probability that an SRS of size 125 will give an estimate of this proportion that is within 7 percentage points of the true value?
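The two sample-proportion examples can be worked numerically with Python's standard library, where `statistics.NormalDist` plays the role of normalcdf. This sketch assumes, as the examples imply, that the populations are large enough for the 10% condition (at least 10,000 young adult Internet users for n = 1000, at least 90,000 for n = 9000, and at least 1250 middle school students in the district).

```python
from math import sqrt
from statistics import NormalDist

# Online video: p = 0.75 with n = 1000, and n = 9000 for part (d).
p = 0.75
for n in (1000, 9000):
    sd = sqrt(p * (1 - p) / n)
    print(f"n = {n}: mean = {p}, sd = {sd:.4f}, np = {n * p:.0f}, nq = {n * (1 - p):.0f}")

# Superintendent: p = 0.80, n = 125; find P(0.73 <= p-hat <= 0.87).
p, n = 0.80, 125
sd = sqrt(p * (1 - p) / n)
dist = NormalDist(mu=p, sigma=sd)
prob_within_7 = dist.cdf(p + 0.07) - dist.cdf(p - 0.07)
print(f"sd = {sd:.4f}, P(within 7 points) = {prob_within_7:.4f}")
```

Going from n = 1000 to n = 9000 divides the standard deviation by $\sqrt{9} = 3$. For the superintendent, the probability comes out to about 0.95, and the Normal condition holds because $np = 100$ and $nq = 25$ are both at least 10.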

Sample Means
Example: The histogram below shows the distribution of mint dates on a population of N = 616 pennies.
[Histogram: population distribution of mint Year, 1960 to 2008; vertical axis: Proportion, 0.00 to 0.07]
500 samples of size n = 5 and 500 samples of size n = 25 were taken from this population. The sample mean for each sample was calculated. The distributions of sample means for each sample size are shown below.
[Histograms: Sample Means, n=5 and Sample Means, n=25; horizontal axis from 1980 to 2008; vertical axis: Proportion, 0.00 to 0.20]
Compare the population distribution to the two distributions of sample means.

Mean and Standard Deviation of the Sampling Distribution of x̄
Suppose that x̄ is the mean of an SRS of size n drawn from a large population with mean µ and standard deviation σ. Then:
The mean of the sampling distribution of x̄ is $\mu_{\bar{x}} = \mu$.
The standard deviation of the sampling distribution of x̄ is $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ as long as the observations are independent or the 10% condition is satisfied: $n \le \frac{1}{10}N$, or equivalently $N \ge 10n$.
These facts about the mean and standard deviation of x̄ are true no matter what shape the population distribution has.
If the sample is larger than 10% of the population, the finite population correction (FPC) is used, and the formula for the standard deviation of x̄ is $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\sqrt{1 - \frac{n}{N}}$.
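The claim that $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ holds whatever the population's shape can be checked by simulation. This sketch swaps in a hypothetical right-skewed population (an exponential distribution with mean 1 and standard deviation 1) in place of the penny data, which the notes do not reproduce; drawing from a model like this keeps observations independent, so the 10% condition is not an issue. The names `exponential_draw` and `sd_of_sample_means` are hypothetical.

```python
import random
import statistics
from math import sqrt


def exponential_draw(rng):
    """One observation from a hypothetical right-skewed population (exponential, mean 1, SD 1)."""
    return rng.expovariate(1.0)


def sd_of_sample_means(draw, n, reps=5000, seed=4):
    """Simulated SD of x-bar over `reps` independent samples of size n."""
    rng = random.Random(seed)
    means = [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]
    return statistics.stdev(means)


SIGMA = 1.0  # SD of the exponential population above
for n in (5, 25):
    simulated = sd_of_sample_means(exponential_draw, n)
    print(f"n = {n:2d}: simulated SD of x-bar = {simulated:.3f}, "
          f"sigma/sqrt(n) = {SIGMA / sqrt(n):.3f}")
```

The simulated and formula values should agree closely for both sample sizes, even though the population is strongly skewed.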

Example: Suppose that the number of movies viewed in the last year by high school students has an average of 19.3 with a standard deviation of 15.8. Suppose we take an SRS of 100 high school students and calculate the mean number of movies viewed by the members of the sample.
(a) What is the mean of the sampling distribution of x̄? Explain the meaning of $\mu_{\bar{x}}$.
(b) What is the standard deviation of the sampling distribution of x̄? Check that the 10% condition is satisfied. Explain the meaning of $\sigma_{\bar{x}}$.

Sampling Distribution of a Sample Mean from a Normal Population
Suppose that a population is Normally distributed with mean µ and standard deviation σ. Then the sampling distribution of x̄ has the Normal distribution with mean µ and standard deviation $\sigma/\sqrt{n}$, provided that the 10% condition is met. This is true no matter what the sample size is.

Example: At the P. Nutty Peanut Company, dry-roasted, shelled peanuts are placed in jars by a machine. The distribution of weights in the jars is approximately Normal, with a mean of 16.1 ounces and a standard deviation of 0.15 ounces.
(a) Without doing any calculations, explain which outcome is more likely: randomly selecting a single jar and finding that the contents weigh less than 16 ounces, or randomly selecting 10 jars and finding that the average contents weigh less than 16 ounces.
(b) Find the probability of each event described above.

The fact that averages of several observations are less variable than individual observations is important in many settings. It is common practice to repeat a measurement several times and report the average of the results. Think of the results of n repeated measurements as an SRS from the population of outcomes we would get if we repeated the measurement forever. The average of the n results is less variable than a single measurement.
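The movie and peanut examples can be computed directly, again with `statistics.NormalDist` standing in for normalcdf. This sketch takes the numbers from the examples and assumes the 10% condition holds (at least 1000 high school students for the movies example, and far more than 10 jars for the peanuts). For the peanuts the population is Normal, so x̄ is Normal even for n = 10.

```python
from math import sqrt
from statistics import NormalDist

# Movies example: mu = 19.3, sigma = 15.8, n = 100, so the mean of x-bar is 19.3.
sd_xbar_movies = 15.8 / sqrt(100)

# Peanut example: jar weights approximately Normal with mean 16.1 oz and SD 0.15 oz.
MU, SIGMA = 16.1, 0.15
p_one_jar = NormalDist(MU, SIGMA).cdf(16)                # P(X < 16) for a single jar
p_mean_of_10 = NormalDist(MU, SIGMA / sqrt(10)).cdf(16)  # P(x-bar < 16) for n = 10

print(f"movies: mean of x-bar = 19.3, SD of x-bar = {sd_xbar_movies:.2f}")
print(f"P(one jar < 16 oz) = {p_one_jar:.4f}")
print(f"P(mean of 10 jars < 16 oz) = {p_mean_of_10:.4f}")
```

The single-jar probability is about 0.25, while the probability for the mean of 10 jars is below 0.02. This matches the reasoning in part (a): averages are less variable than individual observations.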

Most population distributions are not Normal, so we need to figure out what shape the sampling distribution of x̄ has for a non-Normal population or a population of unknown shape.

The Central Limit Theorem (CLT): If a random sample of n observations is selected from any population with mean µ and finite standard deviation σ, and the sample size is sufficiently large (n ≥ 30), then the sampling distribution of x̄ is approximately Normal.

Normal Condition for Sample Means
If the population distribution is Normal, then so is the sampling distribution of x̄. This is true no matter what the sample size n is.
If the population distribution is not Normal, the CLT tells us that the sampling distribution of x̄ will be approximately Normal in most cases if n ≥ 30.

When solving problems involving sample means:
1. Justify using the Normal distribution using the conditions above.
2. Find $\mu_{\bar{x}}$ and $\sigma_{\bar{x}}$. Check the 10% condition to justify using the formula for $\sigma_{\bar{x}}$.
3. Write a probability statement and draw and shade a Normal curve.
4. Perform Normal calculations either by using z-scores, $z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}}$, and a table, or by using normalcdf on your calculator.
5. Write your answer in context.

Examples: Suppose that the number of texts sent during a typical day by a randomly selected high school student follows a right-skewed distribution with a mean of 15 and a standard deviation of 35. Assuming that students at your school are typical texters, how likely is it that a random sample of 50 students will have sent more than a total of 1000 texts in the last 24 hours?
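The texting example follows the five steps for sample means. This sketch assumes the school has at least 500 students so the 10% condition holds; because the population is right-skewed, the Normal approximation rests on the CLT with n = 50 ≥ 30. A total above 1000 texts is converted to the equivalent statement about x̄.

```python
from math import sqrt
from statistics import NormalDist

MU, SIGMA, N_STUDENTS = 15, 35, 50   # texts per day: mean, SD, and sample size from the example
TOTAL = 1000

# A total of more than 1000 texts from 50 students is the same event as x-bar > 1000 / 50 = 20.
xbar_cutoff = TOTAL / N_STUDENTS
sd_xbar = SIGMA / sqrt(N_STUDENTS)          # valid if the school has at least 10 * 50 = 500 students
z = (xbar_cutoff - MU) / sd_xbar
p_more_than_1000 = 1 - NormalDist().cdf(z)  # CLT: n = 50 >= 30, so x-bar is approximately Normal
print(f"sd of x-bar = {sd_xbar:.3f}, z = {z:.3f}, P(total > 1000) = {p_more_than_1000:.4f}")
```

The probability comes out to roughly 0.16, so a total above 1000 texts would not be especially unusual for a random sample of 50 typical texters.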