Lecture 8 - Sampling Distributions and the CLT

Lecture 8 - Sampling Distributions and the CLT Statistics 102 Kenneth K. Lopiano September 18, 2013

Outline
1 Basics
    Improvements
2 Variability of Estimates
    Activity
    Sampling distributions - via simulation
    Sampling distributions - via CLT

Basics: Histograms of number of successes
Hollow histograms of samples from the binomial model where p = 0.10 and n = 10, 30, 100, and 300. What happens as n increases?
[Figure: four hollow histograms, one panel each for n = 10, n = 30, n = 100, and n = 300]

Basics: How large is large enough?
The sample size is considered large enough if the expected numbers of successes and failures are both at least 10:
np ≥ 10 and n(1 − p) ≥ 10
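
The rule of thumb above is easy to check mechanically. The following is a minimal sketch in Python; the function name `large_enough` and its `threshold` parameter are illustrative choices, not part of the lecture. The cutoff is left as a parameter because the exact value is a convention.

```python
def large_enough(n, p, threshold=10):
    """Return True if the expected successes (n*p) and failures (n*(1-p))
    are both at least `threshold`."""
    return n * p >= threshold and n * (1 - p) >= threshold


# The four sample sizes from the histogram slide, all with p = 0.10
for n in (10, 30, 100, 300):
    print(f"n = {n:3d}: expected successes = {n * 0.10:5.1f}, "
          f"large enough: {large_enough(n, 0.10)}")
```

With p = 0.10 the small samples fail the check because too few successes are expected, which matches the skewed histograms for n = 10 and n = 30.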

Basics: An analysis of Facebook users
A recent study found that Facebook users get more than they give. For example:
- 40% of Facebook users in our sample made a friend request, but 63% received at least one request
- Users in our sample pressed the like button next to friends' content an average of 14 times, but had their content liked an average of 20 times
- Users sent 9 personal messages, but received 12
- 12% of users tagged a friend in a photo, but 35% were themselves tagged in a photo
Any guesses for how this pattern can be explained?
Power users - add much more content than the typical user
http://www.pewinternet.org/Reports/2012/Facebook-users/Summary.aspx

Basics: Facebook cont.
This study found that approximately 25% of Facebook users are considered power users. The same study found that the average Facebook user has 245 friends. What is the probability that the average Facebook user with 245 friends has 70 or more friends who would be considered power users?
We are given that n = 245, p = 0.25, and we are asked for the probability P(X ≥ 70).
P(X ≥ 70) = P(X = 70 or X = 71 or X = 72 or ... or X = 245)
          = P(X = 70) + P(X = 71) + P(X = 72) + ... + P(X = 245)
This seems like an awful lot of work...

Basics: Normal approximation to the binomial
When the sample size is large enough, the binomial distribution with parameters n and p can be approximated by the normal model with parameters µ = np and σ = √(np(1 − p)).
In the case of the Facebook power users, n = 245 and p = 0.25.
µ = 245 × 0.25 = 61.25
σ = √(245 × 0.25 × 0.75) = 6.78
Bin(n = 245, p = 0.25) ≈ N(µ = 61.25, σ = 6.78)
[Figure: Bin(245, 0.25) probabilities overlaid with the N(61.25, 6.78) density, plotted against k]
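
As a rough illustration of the overlay in the figure, the sketch below computes µ and σ for the Facebook example and compares the binomial probability with the normal density at a few values of k. It uses only the Python standard library; the helper names `normal_params` and `binom_pmf` and the chosen k values are assumptions made for this example.

```python
from math import comb, sqrt
from statistics import NormalDist


def normal_params(n, p):
    """Mean and standard deviation of the normal model that approximates Bin(n, p)."""
    return n * p, sqrt(n * p * (1 - p))


def binom_pmf(k, n, p):
    """Exact binomial probability P(X = k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 245, 0.25
mu, sigma = normal_params(n, p)
approx = NormalDist(mu, sigma)
print(f"mu = {mu:.2f}, sigma = {sigma:.2f}")

# Binomial probability next to the normal density at a few values of k
for k in (50, 61, 70):
    print(f"k = {k}: binomial = {binom_pmf(k, n, p):.4f}, normal density = {approx.pdf(k):.4f}")
```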

Basics: Facebook cont.
What is the probability that the average Facebook user with 245 friends has 70 or more friends who would be considered power users?
[Figure: normal curve centered at 61.25 with the area to the right of 70 shaded]
Z = (obs − mean) / SD = (70 − 61.25) / 6.78 = 1.29
P(Z > 1.29) = 1 − 0.9015 = 0.0985
         Second decimal place of Z
Z      0.05     0.06     0.07     0.08     0.09
1.0    0.8531   0.8554   0.8577   0.8599   0.8621
1.1    0.8749   0.8770   0.8790   0.8810   0.8830
1.2    0.8944   0.8962   0.8980   0.8997   0.9015
For comparison, the exact binomial probability is P(X ≥ 70) = 0.1128.
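
The table lookup can be cross-checked numerically. The sketch below, in Python with the standard library only, computes the Z score and the normal tail area, and also adds up the exact binomial terms P(X = 70) + ... + P(X = 245) that looked like "an awful lot of work" by hand. Variable names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, k = 245, 0.25, 70
mu = n * p
sigma = sqrt(n * p * (1 - p))

# Normal approximation without continuity correction, as on the slide
z = (k - mu) / sigma
normal_tail = 1 - NormalDist().cdf(z)

# Exact binomial tail: sum of P(X = j) for j = 70, ..., 245
exact_tail = sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

print(f"Z = {z:.2f}")
print(f"normal approximation: {normal_tail:.4f}")
print(f"exact binomial:       {exact_tail:.4f}")
```

The two answers differ in the second decimal place, which is the gap that the continuity correction is meant to narrow.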

Improvements: Improving the approximation
Take for example a binomial distribution where n = 20 and p = 0.5. We should be able to approximate the distribution of X using N(µ = 10, σ = √5 ≈ 2.24).
[Figure: Bin(20, 0.5) probabilities with the approximating normal curve; x-axis from 2 to 18]
If we approximate P(7 ≤ X ≤ 13) by the area under the normal curve between 7 and 13, it is clear that our approximation is missing about 1/2 of P(X = 7) and 1/2 of P(X = 13). As n → ∞ this error becomes very small. In this case P(X = 7) = P(X = 13) = 0.073, so our approximation is off by about 7%.

Improvements: Improving the approximation, cont.
Binomial probability:
$$P(7 \le X \le 13) = \sum_{k=7}^{13} \binom{20}{k}\, 0.5^{k} (1 - 0.5)^{20-k} = 0.88468$$
Naive approximation:
$$P(7 \le X \le 13) \approx P\left(Z \le \frac{13 - 10}{\sqrt{5}}\right) - P\left(Z \le \frac{7 - 10}{\sqrt{5}}\right) = 0.82029$$
Continuity corrected approximation:
$$P(7 \le X \le 13) \approx P\left(Z \le \frac{13 + 1/2 - 10}{\sqrt{5}}\right) - P\left(Z \le \frac{7 - 1/2 - 10}{\sqrt{5}}\right) = 0.88248$$
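
The three numbers on this slide can be reproduced with a few lines of code. This is a minimal sketch in Python using only the standard library; the variable names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 20, 0.5
lo, hi = 7, 13
mu = n * p                      # 10
sigma = sqrt(n * p * (1 - p))   # sqrt(5)
std_normal = NormalDist()       # standard normal, for P(Z <= z)

# Exact binomial probability P(7 <= X <= 13)
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(lo, hi + 1))

# Naive approximation: area under the normal curve between 7 and 13
naive = std_normal.cdf((hi - mu) / sigma) - std_normal.cdf((lo - mu) / sigma)

# Continuity correction: widen the interval by 1/2 on each side
corrected = (std_normal.cdf((hi + 0.5 - mu) / sigma)
             - std_normal.cdf((lo - 0.5 - mu) / sigma))

print(f"exact = {exact:.5f}, naive = {naive:.5f}, corrected = {corrected:.5f}")
```

Widening the interval by 1/2 on each side recovers the half-bars at 7 and 13 that the naive area leaves out.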

Improvements: Improving the approximation, cont.
This correction also lets us do moderately useless things, like calculating the probability for a particular value of k. For example, what is the chance of 50 heads in 100 tosses of a slightly unfair coin (p = 0.55)?
Binomial probability:
$$P(X = 50) = \binom{100}{50}\, 0.55^{50} (1 - 0.55)^{50} = 0.04815$$
Naive approximation:
$$P(X = 50) \approx P\left(Z \le \frac{50 - 55}{4.97}\right) - P\left(Z \le \frac{50 - 55}{4.97}\right) = 0$$
Continuity corrected approximation:
$$P(X = 50) \approx P\left(Z \le \frac{50 + 1/2 - 55}{4.97}\right) - P\left(Z \le \frac{50 - 1/2 - 55}{4.97}\right) = 0.04839$$
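
The single-value case works the same way: the probability of exactly k is approximated by the normal area between k − 1/2 and k + 1/2. The sketch below is in Python with the standard library; the function name `point_prob_corrected` is a hypothetical helper introduced only for this example.

```python
from math import comb, sqrt
from statistics import NormalDist


def point_prob_corrected(k, n, p):
    """Continuity-corrected normal approximation to P(X = k) for X ~ Bin(n, p)."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    std_normal = NormalDist()
    return (std_normal.cdf((k + 0.5 - mu) / sigma)
            - std_normal.cdf((k - 0.5 - mu) / sigma))


n, p, k = 100, 0.55, 50
exact = comb(n, k) * p**k * (1 - p) ** (n - k)
corrected = point_prob_corrected(k, n, p)
print(f"exact = {exact:.5f}, continuity corrected = {corrected:.5f}")
```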

Improvements: Example - Rolling lots of dice
Roll a fair die 500 times. What's the probability of rolling at least 100 ones?
Exact binomial probability:
$$P(X \ge 100) = \sum_{k=100}^{500} \binom{500}{k} (1/6)^{k} (5/6)^{500-k} = 1 - \sum_{k=0}^{99} \binom{500}{k} (1/6)^{k} (5/6)^{500-k}$$
= 1 − pbinom(99, 500, 1/6) = 1 − 0.9717129 = 0.0282871
Since n is large, X is approximately normal with mean µ = np = 500/6 = 83.33 and SD σ = √(npq) = √(2500/36) = 8.333.
$$P(X \ge 100) \approx P\left(Z \ge \frac{100 - 1/2 - \mu}{\sigma}\right) = P\left(Z \ge \frac{100 - 1/2 - 83.33}{8.333}\right) = 1 - P(Z \le 1.94) = 1 - 0.9738 = 0.0262$$
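
For readers without R at hand, the dice calculation can be sketched in Python with the standard library. The sum over k = 0, ..., 99 plays the role of `pbinom(99, 500, 1/6)`; the variable names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, k = 500, 1 / 6, 100

# Exact: 1 - P(X <= 99), the same quantity as 1 - pbinom(99, 500, 1/6) in R
cdf_99 = sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k))
exact = 1 - cdf_99

# Normal approximation with continuity correction: P(X >= 100) ~ P(Y >= 99.5)
mu = n * p
sigma = sqrt(n * p * (1 - p))
approx = 1 - NormalDist(mu, sigma).cdf(k - 0.5)

print(f"mu = {mu:.2f}, sigma = {sigma:.3f}")
print(f"exact = {exact:.4f}, normal approximation = {approx:.4f}")
```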


Variability of Estimates: Parameter estimation
- We are often interested in population parameters.
- Since complete populations are difficult (or impossible) to collect data on, we use sample statistics as point estimates for the unknown population parameters of interest.
- Sample statistics vary from sample to sample.
- Quantifying how sample statistics vary provides a way to estimate the margin of error associated with our point estimate.
- But before we get to quantifying the variability among samples, let's try to understand how and why point estimates vary from sample to sample.
Suppose we randomly sample 1,000 adults from each state in the US. Would you expect the sample means to be the same, somewhat different, or very different?

Variability of Estimates: Activity: Estimate the avg. # of drinks it takes to get drunk
We would like to estimate the average (self-reported) number of drinks it takes a person to get drunk. We assume that we have the population data:
[Figure: histogram of the number of drinks to get drunk, x-axis from 0 to 10]

Variability of Estimates: Activity: Estimate the avg. # of drinks it takes to get drunk (cont.)
- Sample, without replacement, ten respondents and record the number of drinks it takes them to get drunk.
- Use RStudio to generate 10 random numbers between 1 and 146:
    sample(1:146, size = 10, replace = FALSE)
- If you don't have a computer, ask a neighbor to generate a sample for you.
- Find the sample mean, round it to 1 decimal place, and record it.
- Time permitting, obtain another sample.
If we randomly select observations from this data set, which values are most likely to be selected, and which are least likely?

Variability of Estimates: Activity: Estimate the avg. # of drinks it takes to get drunk (cont.)
    sample(1:146, size = 10, replace = FALSE)
    ## [1] 59 121 88 46 58 72 82 81 5 10
(8 + 6 + 10 + 4 + 5 + 3 + 5 + 6 + 6 + 6)/10 = 5.9

Variability of Estimates: Activity
Each pair gives the respondent number followed by that respondent's number of drinks (respondents 1 to 146):
  1 7   |  21 6   |  41 6   |  61 10  |  81 6   | 101 4   | 121 6   | 141 4
  2 5   |  22 2   |  42 10  |  62 7   |  82 5   | 102 7   | 122 5   | 142 6
  3 4   |  23 6   |  43 3   |  63 4   |  83 6   | 103 6   | 123 3   | 143 6
  4 4   |  24 7   |  44 6   |  64 5   |  84 8   | 104 8   | 124 2   | 144 4
  5 6   |  25 3   |  45 10  |  65 6   |  85 4   | 105 3   | 125 2   | 145 5
  6 2   |  26 6   |  46 4   |  66 6   |  86 10  | 106 6   | 126 5   | 146 5
  7 3   |  27 5   |  47 3   |  67 6   |  87 5   | 107 2   | 127 10  |
  8 5   |  28 8   |  48 3   |  68 7   |  88 10  | 108 5   | 128 4   |
  9 5   |  29 0   |  49 6   |  69 7   |  89 8   | 109 1   | 129 1   |
 10 6   |  30 8   |  50 8   |  70 5   |  90 5   | 110 5   | 130 4   |
 11 1   |  31 5   |  51 8   |  71 10  |  91 4   | 111 5   | 131 10  |
 12 10  |  32 9   |  52 8   |  72 3   |  92 0.5 | 112 4   | 132 8   |
 13 4   |  33 7   |  53 2   |  73 5.5 |  93 3   | 113 4   | 133 10  |
 14 4   |  34 5   |  54 4   |  74 7   |  94 3   | 114 9   | 134 6   |
 15 6   |  35 5   |  55 8   |  75 10  |  95 5   | 115 4   | 135 6   |
 16 3   |  36 7   |  56 3   |  76 6   |  96 6   | 116 3   | 136 6   |
 17 10  |  37 4   |  57 5   |  77 6   |  97 4   | 117 3   | 137 7   |
 18 8   |  38 0   |  58 5   |  78 5   |  98 4   | 118 4   | 138 3   |
 19 5   |  39 4   |  59 8   |  79 4   |  99 2   | 119 4   | 139 10  |
 20 10  |  40 3   |  60 4   |  80 5   | 100 5   | 120 8   | 140 4   |

Variability of Estimates: Activity: Sampling distribution
What we just constructed is called a sampling distribution.
What are the shape and center of this distribution? Based on this distribution, what do you think is the true population average?
µ = 5.39, σ = 2.37
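
The classroom activity can also be run entirely in code. The sketch below is in Python rather than R and uses only the standard library. The `population` list is transcribed from the table of 146 responses; the number of repetitions (2000), the seed, and the function name `sample_means` are arbitrary choices for illustration, not part of the lecture.

```python
import random
from statistics import mean, stdev

# Number of drinks for respondents 1..146, transcribed from the activity table
population = [
    7, 5, 4, 4, 6, 2, 3, 5, 5, 6, 1, 10, 4, 4, 6, 3, 10, 8, 5, 10,   # 1-20
    6, 2, 6, 7, 3, 6, 5, 8, 0, 8, 5, 9, 7, 5, 5, 7, 4, 0, 4, 3,      # 21-40
    6, 10, 3, 6, 10, 4, 3, 3, 6, 8, 8, 8, 2, 4, 8, 3, 5, 5, 8, 4,    # 41-60
    10, 7, 4, 5, 6, 6, 6, 7, 7, 5, 10, 3, 5.5, 7, 10, 6, 6, 5, 4, 5, # 61-80
    6, 5, 6, 8, 4, 10, 5, 10, 8, 5, 4, 0.5, 3, 3, 5, 6, 4, 4, 2, 5,  # 81-100
    4, 7, 6, 8, 3, 6, 2, 5, 1, 5, 5, 4, 4, 9, 4, 3, 3, 4, 4, 8,      # 101-120
    6, 5, 3, 2, 2, 5, 10, 4, 1, 4, 10, 8, 10, 6, 6, 6, 7, 3, 10, 4,  # 121-140
    4, 6, 6, 4, 5, 5,                                                # 141-146
]


def sample_means(pop, n, reps, rng):
    """Draw `reps` samples of size n without replacement and return their means."""
    return [mean(rng.sample(pop, n)) for _ in range(reps)]


rng = random.Random(102)  # fixed seed so the run is reproducible
means = sample_means(population, n=10, reps=2000, rng=rng)

print(f"population: mean = {mean(population):.2f}, SD = {stdev(population):.2f}")
print(f"sample means (n = 10): mean = {mean(means):.2f}, SD = {stdev(means):.2f}")
```

The sample means center near the population mean but are much less spread out than the individual responses.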

Variability of Estimates: Sampling distributions - via simulation: Average number of Duke games attended
Next let's look at the population data for the number of Duke basketball games attended:
[Figure: histogram of the number of Duke games attended (x-axis, 0 to 70) against frequency (y-axis)]

Variability of Estimates: Sampling distributions - via simulation: Average number of Duke games attended (cont.)
Sampling distribution, n = 10:
[Figure: histogram of sample means from samples of n = 10 (x-axis) against frequency (y-axis)]
What does each observation in this distribution represent?
Sample mean, x̄, of samples of size n = 10.
Is the variability of the sampling distribution smaller or larger than the variability of the population distribution? Why?
Smaller, sample means will vary less than individual observations.

Variability of Estimates: Sampling distributions - via simulation: Average number of Duke games attended (cont.)
Sampling distribution, n = 30:
[Figure: histogram of sample means from samples of n = 30 (x-axis) against frequency (y-axis)]
How did the shape, center, and spread of the sampling distribution change going from n = 10 to n = 30?
Shape is more symmetric, center is about the same, spread is smaller.

Variability of Estimates: Sampling distributions - via simulation: Average number of Duke games attended (cont.)
Sampling distribution, n = 70:
[Figure: histogram of sample means from samples of n = 70 (x-axis) against frequency (y-axis)]

Variability of Estimates: Sampling distributions - via CLT: Central Limit Theorem
Central limit theorem: The distribution of the sample mean is well approximated by a normal model:
x̄ ~ N(mean = µ, SE = σ/√n)
If σ is unknown, use s.
- So it wasn't a coincidence that the sampling distributions we saw earlier were symmetric.
- We won't go into proving why SE = σ/√n, but note that as n increases SE decreases.
- As the sample size increases, we would expect samples to yield more consistent sample means, hence the variability among the sample means would be lower.
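
The claim that SE = σ/√n can be checked by simulation. The sketch below is in Python with the standard library only. The population is an assumption made for illustration: a right-skewed exponential distribution with rate 1 (so µ = 1 and σ = 1), not the Duke games data, which is not available here. The sample sizes mirror the earlier slides; the seed, the number of repetitions, and the function name `simulated_se` are arbitrary.

```python
import random
from math import sqrt
from statistics import stdev

rng = random.Random(8)   # fixed seed for reproducibility
SIGMA = 1.0              # SD of the assumed Exponential(rate = 1) population


def simulated_se(n, reps=3000):
    """SD of `reps` simulated sample means, each from a sample of size n."""
    means = []
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(sum(sample) / n)
    return stdev(means)


for n in (10, 30, 70):
    print(f"n = {n:2d}: sigma/sqrt(n) = {SIGMA / sqrt(n):.3f}, "
          f"simulated SE = {simulated_se(n):.3f}")
```

The simulated values should track σ/√n closely and shrink as n grows, in line with the narrowing histograms for n = 10, 30, and 70.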

Variability of Estimates: Sampling distributions - via CLT: CLT - Conditions
Certain conditions must be met for the CLT to apply:
1. Independence: Sampled observations must be independent. This is difficult to verify, but is more likely if random sampling/assignment is used, and n < 10% of the population.
2. Sample size/skew: The population distribution must be nearly normal, or n > 30 and the population distribution is not extremely skewed. This is also difficult to verify for the population, but we can check it using the sample data, and assume that the sample mirrors the population.

Variability of Estimates: Sampling distributions - via CLT: CLT - sample size/skew condition - simulations (1) to (3)
http://onlinestatbook.com/stat_sim/sampling_dist/index.html