AMS 7 Sampling Distributions, Central limit theorem, Confidence Intervals Lecture 4


Department of Applied Mathematics and Statistics, University of California, Santa Cruz, Summer 2014

Sampling Distributions

If we sample cookies, what is the distribution of the number of chips in a cookie? Poisson. This is the sampling distribution of the number of chips.

What if we think about the average number of chips per cookie in a bag? Is this average the same for all bags? No, it is random, and its distribution is derived from the sampling distribution of an individual cookie. This is the sampling distribution of the mean.

Suppose we record the sample means $\bar{x}_1, \bar{x}_2, \bar{x}_3, \ldots, \bar{x}_M$, where each $\bar{x}_i$ is the mean number of chocolate chips in a cookie PER box, and $M$ is the total number of boxes (the number of trials). We are interested in what the distribution of the sample means looks like: $\bar{X} \sim\ ?$

Formal definitions:

The sampling distribution of the mean is the probability distribution of the sample means, $\bar{x}$, with all samples having the same sample size $n$.

The sampling distribution of the proportion is the probability distribution of the sample proportions, $\hat{p}$, with all samples having the same sample size $n$.

The Central Limit Theorem tells us that, for a large enough sample size, the sampling distribution of the mean will be approximately normal no matter what the distribution of the individual observations is. Formally...

Central Limit Theorem: If samples of size $n$ are drawn from a population with mean $\mu$ and standard deviation $\sigma$, then the sampling distribution of the sample means, $\bar{x}$, will be approximately normally distributed with mean $\mu_{\bar{x}} = \mu$ and sd $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.

That is, if $X \sim\ ?$ (any distribution) with mean $\mu$ and sd $\sigma$, then

$$\bar{X} \approx N\left(\mu_{\bar{x}} = \mu,\ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\right),$$

where $\sigma_{\bar{x}}$ is called the (population) standard error (of the mean).

NOTE: If the population of individuals is normally distributed, then $\bar{x}$ is exactly normally distributed. Otherwise $\bar{x}$ is approximately normally distributed, with the approximation getting better as $n$ increases. In general, $n \geq 30$ is good.

E.g. The population of the number of chocolate chips in a cookie follows a Poisson distribution, so the distribution of the mean number of chocolate chips in a box will be approximately normally distributed.

E.g. The population of the length of sharks follows a normal distribution, so the distribution of the mean length of sharks is exactly normally distributed.

Also note that as $n \to \infty$, $\sigma_{\bar{x}} \to 0$.
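The theorem can be checked with a small simulation. The sketch below is illustrative only: it assumes a skewed Exponential population with mean 1 and sd 1 (chosen because Python's standard library can sample it directly, not because the slides use it), and the function name `simulate_sample_means` is hypothetical.

```python
import random
import statistics

def simulate_sample_means(n, num_samples, rate=1.0, seed=0):
    """Draw num_samples samples of size n from an Exponential(rate)
    population and return the list of their sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(rate) for _ in range(n))
        for _ in range(num_samples)
    ]

# Assumed population: Exponential(rate=1), so mu = 1 and sigma = 1.
n = 30
means = simulate_sample_means(n=n, num_samples=5000)

# The CLT predicts the sample means centre on mu with sd sigma / sqrt(n).
print("mean of sample means:", round(statistics.fmean(means), 3))
print("sd of sample means:  ", round(statistics.stdev(means), 3))
print("sigma / sqrt(n):     ", round(1 / n ** 0.5, 3))
```

A histogram of `means` should look roughly bell-shaped even though the individual observations are strongly right-skewed.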

Example: The weight of a cookie is normally distributed with mean $\mu = 11$ grams and sd $\sigma = 0.5$ grams. What is the distribution of the mean weight of cookies in a sample of size 32?

$$X \sim N(11, 0.5) \quad\Rightarrow\quad \bar{X} \sim N\left(\mu_{\bar{x}} = \mu = 11,\ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{0.5}{\sqrt{32}} = 0.088\right)$$

So what is the probability that the average weight of a cookie in a bag of 32 will be at least 10.8?

$$P(\bar{X} > 10.8) = P\left(\frac{\bar{X} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} > \frac{10.8 - \mu_{\bar{x}}}{\sigma_{\bar{x}}}\right) = P\left(Z > \frac{10.8 - 11}{0.088}\right) = P(Z > -2.27) = 1 - P(Z < -2.27) = 1 - 0.0116 = 0.9884$$

whereas the probability of selecting a SINGLE cookie from the population with weight at least 10.8 would be...

$$P(X > 10.8) = P\left(\frac{X - \mu}{\sigma} > \frac{10.8 - \mu}{\sigma}\right) = P\left(Z > \frac{10.8 - 11}{0.5}\right) = P(Z > -0.4) = 1 - P(Z < -0.4) = 1 - 0.3446 = 0.6554$$
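The two probabilities in this example can be reproduced without a z-table. This is a minimal sketch using `statistics.NormalDist` from the Python standard library; the variable names are illustrative. Because it does not round $\sigma_{\bar{x}}$ or $z$, the results agree with the table-based values only to about two or three decimal places.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 11.0, 0.5, 32

single = NormalDist(mu, sigma)              # weight of one cookie
bag_mean = NormalDist(mu, sigma / sqrt(n))  # mean weight of a bag of 32

# P(at least 10.8) is the upper tail: 1 - CDF(10.8).
p_bag = 1 - bag_mean.cdf(10.8)
p_single = 1 - single.cdf(10.8)

print("standard error:", round(bag_mean.stdev, 3))
print("P(bag mean > 10.8):     ", round(p_bag, 3))
print("P(single cookie > 10.8):", round(p_single, 3))
```

The bag mean is far more likely than a single cookie to exceed 10.8 g because averaging shrinks the spread by a factor of $\sqrt{32}$.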

Now, suppose the manufacturer wants to label their packages so that 99% of the bags will have an average cookie weight at least that large. What mean cookie weight should they specify?

$$P(Z < z) = 0.01 \Rightarrow z = -2.33$$

$$P(Z\sigma_{\bar{x}} + \mu_{\bar{x}} < -2.33\,\sigma_{\bar{x}} + \mu_{\bar{x}}) = P(\bar{X} < -2.33(0.088) + 11) = P(\bar{X} < 10.79) = 0.01$$

Therefore, 99% of the bags will have an average cookie weight of at least 10.79, whereas for the population of individual cookie weights the same calculation with $\sigma = 0.5$ gives $-2.33(0.5) + 11 \approx 9.84$, so 99% of the cookies will have a weight of at least 9.84.

What if the manufacturer wants to claim that 95% of the bags will have an average weight in some interval? How do we find this interval? (Assume symmetry.)

We need to find $z$ such that $P(-z < Z < z) = 0.95$:

$$P(Z < -z) = \frac{1 - 0.95}{2} = 0.025, \qquad P(Z < -1.96) = 0.025 \text{ (from z-table)}$$

so $P(-1.96 < Z < 1.96) = 0.95$, and

$$P(-1.96\,\sigma_{\bar{x}} + \mu_{\bar{x}} < Z\sigma_{\bar{x}} + \mu_{\bar{x}} < 1.96\,\sigma_{\bar{x}} + \mu_{\bar{x}}) = P(-1.96(0.088) + 11 < \bar{X} < 1.96(0.088) + 11) = P(10.83 < \bar{X} < 11.17) = 0.95$$

So, 95% of the bags will have an average cookie weight between 10.83g and 11.17g.
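Both questions here run the z-table backwards: given a probability, find the value. A hedged sketch with the standard library's `NormalDist.inv_cdf` follows; the variable names are illustrative, and the unrounded quantiles differ from the table-based answers only in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

single = NormalDist(11.0, 0.5)               # one cookie
bag_mean = NormalDist(11.0, 0.5 / sqrt(32))  # mean of a bag of 32

# "99% are at least this heavy" is the 1st percentile of each distribution.
label_bag = bag_mean.inv_cdf(0.01)
label_single = single.inv_cdf(0.01)

# Central 95% interval for the bag mean: cut 2.5% off each tail.
lower = bag_mean.inv_cdf(0.025)
upper = bag_mean.inv_cdf(0.975)

print("99% of bag means exceed:", round(label_bag, 2))
print("99% of cookies exceed:  ", round(label_single, 2))
print("central 95% of bag means:", (round(lower, 2), round(upper, 2)))
```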

(Some more) Central Limit Theorem Examples

Example: IQ scores are normally distributed with a mean of 100 points and a sd of 15 points, so $\mu = 100$ and $\sigma = 15$.

1. Find the probability that a randomly selected person will have a score below 97.

$$P(X < 97) = P\left(\frac{X - 100}{15} < \frac{97 - 100}{15}\right) = P(Z < -0.2) = 0.4207$$

2. If a random sample of 100 people is taken, find the probability that the mean score of the sample is below 97. By the CLT, $\bar{X} \sim N(\mu_{\bar{x}} = 100,\ \sigma_{\bar{x}} = 15/\sqrt{100})$:

$$P(\bar{X} < 97) = P\left(\frac{\bar{X} - 100}{15/\sqrt{100}} < \frac{97 - 100}{15/\sqrt{100}}\right) = P(Z < -2) = 0.0228$$

Example: Suppose a batch of pepper seeds has a mean time to germination of 10.4 days with a sd of 2.13 days. What is the probability that a random sample of 49 seeds will have a mean germination time between 10 and 11 days? We need to find $P(10 < \bar{X} < 11)$:

$$P(10 < \bar{X} < 11) = P\left(\frac{10 - 10.4}{2.13/\sqrt{49}} < \frac{\bar{X} - 10.4}{2.13/\sqrt{49}} < \frac{11 - 10.4}{2.13/\sqrt{49}}\right) = P(-1.31 < Z < 1.97) = P(Z < 1.97) - P(Z < -1.31) = 0.9756 - 0.0951 = 0.8805$$
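The three calculations above follow one pattern: build the sampling distribution of the mean, then take a difference of CDF values. The helper below is a sketch under that reading; `prob_mean_between` is a hypothetical name, and setting `n=1` recovers the single-observation case.

```python
from math import inf, sqrt
from statistics import NormalDist

def prob_mean_between(mu, sigma, n, low=-inf, high=inf):
    """P(low < sample mean < high) for samples of size n, using the
    normal sampling distribution N(mu, sigma / sqrt(n))."""
    dist = NormalDist(mu, sigma / sqrt(n))
    upper = 1.0 if high == inf else dist.cdf(high)
    lower = 0.0 if low == -inf else dist.cdf(low)
    return upper - lower

p_person = prob_mean_between(100, 15, n=1, high=97)      # one person
p_sample = prob_mean_between(100, 15, n=100, high=97)    # mean of 100 people
p_seeds = prob_mean_between(10.4, 2.13, n=49, low=10, high=11)

print(round(p_person, 4), round(p_sample, 4), round(p_seeds, 3))
```

The seed result differs slightly from a table-based answer because the code does not round the z-scores to two decimals.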

Normal Approximation to the Binomial

Recall the Binomial distribution...

Binomial Probability Distribution: If the following are met

1. Fixed number of trials, $n$
2. Trials are independent
3. Each trial has only two possible outcomes (success, failure)
4. The probability of success, $p$, is the same for each trial

then the number of successes follows a binomial distribution, where the mean is given by $\mu = np$ and the sd is given by $\sigma = \sqrt{np(1-p)}$.

Although the binomial is a discrete distribution and the normal is a continuous distribution, the normal distribution is a good approximation to the binomial distribution provided that $np \geq 5$ and $n(1-p) \geq 5$, i.e. there are (on average) at least 5 successes and 5 failures. This is sort of like averaging over coin flips, so the CLT applies.

We approximate a binomial distribution with $n$ and $p$, $\text{Bin}(n, p)$, with a normal distribution having mean $\mu = np$ and sd $\sigma = \sqrt{np(1-p)}$:

$$\text{Bin}(n, p) \approx N\left(\mu = np,\ \sigma = \sqrt{np(1-p)}\right)$$

Note that because the binomial is discrete but the normal is continuous, we typically use a continuity correction, moving the count by 1/2 such that the inequality still holds.

Example: A study found that 62% of the households in Alaska have a computer. If we take a random sample of 1000 Alaskan households, what is the probability that at least 640 have a computer?

Basic idea: $X$ = number of households in the sample with a computer.

$$X \sim \text{Bin}(n = 1000,\ p = 0.62), \qquad \mu = 1000(0.62) = 620, \qquad \sigma = \sqrt{1000(0.62)(0.38)} = 15.35$$

So, $X \approx N(\mu = 620,\ \sigma = 15.35)$.

Detail: $X$ is discrete, so you could never get $X = 639.6$ or $X = 639.78$.

Continuity Correction:

$$P(X \geq 640) = P(X > 639.5) \approx P\left(\frac{X - 620}{15.35} > \frac{639.5 - 620}{15.35}\right) = P(Z > 1.27) = 1 - P(Z < 1.27) = 1 - 0.8980 = 0.1020$$

How about a range? What is the probability that the number of households in the sample having a computer is between 615 and 625, non-inclusive?

$$P(615 < X < 625) = P(615.5 < X < 624.5) \approx P\left(\frac{615.5 - 620}{15.35} < \frac{X - 620}{15.35} < \frac{624.5 - 620}{15.35}\right) = P(Z < 0.29) - P(Z < -0.29) = 0.6141 - 0.3859 = 0.2282$$
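One way to see how good the approximation is in this example is to compare it with the exact binomial tail. The sketch below is illustrative: `binom_pmf` is a hypothetical helper that works in log space (via `math.lgamma`) so that the very large binomial coefficients for $n = 1000$ do not overflow.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist

def binom_pmf(k, n, p):
    """Exact binomial probability P(X = k), computed in log space."""
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1 - p))
    return exp(log_pmf)

n, p = 1000, 0.62
approx_dist = NormalDist(n * p, sqrt(n * p * (1 - p)))

# P(X >= 640): exact sum versus normal tail with continuity correction.
exact = sum(binom_pmf(k, n, p) for k in range(640, n + 1))
approx = 1 - approx_dist.cdf(639.5)

# P(615 < X < 625), non-inclusive, i.e. X in {616, ..., 624}.
exact_range = sum(binom_pmf(k, n, p) for k in range(616, 625))
approx_range = approx_dist.cdf(624.5) - approx_dist.cdf(615.5)

print("P(X >= 640):      exact", round(exact, 4), "approx", round(approx, 4))
print("P(615 < X < 625): exact", round(exact_range, 4), "approx", round(approx_range, 4))
```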

Recall the example of finding an interval such that the mean cookie weight of the cookies in a randomly chosen bag of cookies has probability 0.95 of being in that interval. We want to find $x_1$ and $x_2$ such that $P(x_1 < \bar{X} < x_2) = 0.95$. Instead we can find $z$ such that $P(-z < Z < z) = 0.95$.

From the z-table we get $P(-1.96 < Z < 1.96) = 0.95$. We use $\mu_{\bar{x}}$ and $\sigma_{\bar{x}}$ to convert this interval into one in terms of the mean weight of the cookies in a bag of 32 cookies:

$$Z = \frac{\bar{X} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} \quad\Rightarrow\quad \bar{X} = Z\sigma_{\bar{x}} + \mu_{\bar{x}}$$

$$P(-1.96\,\sigma_{\bar{x}} + \mu_{\bar{x}} < Z\sigma_{\bar{x}} + \mu_{\bar{x}} < 1.96\,\sigma_{\bar{x}} + \mu_{\bar{x}}) = P(-1.96(0.088) + 11 < \bar{X} < 1.96(0.088) + 11) = P(10.83 < \bar{X} < 11.17) = 0.95$$

What if we don't know the true population mean weight of a cookie, and want to learn it from the data (measurements of the sample)?

Inference: Our best guess (point estimate) of the population mean $\mu$ is the sample mean $\bar{x}$. How good do we think this guess is? It depends on the data: How much data? How were they collected? How much variability is in the data?

Example: Assume we know that the standard deviation of the weight of a cookie is 0.5g, but we don't know the mean weight of a cookie. We get a bag (sample) of 32 cookies and find the average weight is 10.9g. How confident are we in this estimate? What would be an interval of plausible values?

Assume that cookie weights are approximately normally distributed (or use the CLT for the mean with $n \geq 30$), that $\sigma = 0.5$ is known, and that we have a simple random sample. Starting with a standard normal, before we observe $\bar{X}$ we know $P(-1.96 \leq Z \leq 1.96) = 0.95$, so

$$\begin{aligned}
P(-1.96 \leq Z \leq 1.96) &= P\left(-1.96 \leq \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} \leq 1.96\right) \\
&= P\left(-1.96 \leq \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \leq 1.96\right) \\
&= P\left(-1.96\frac{\sigma}{\sqrt{n}} \leq \bar{x} - \mu \leq 1.96\frac{\sigma}{\sqrt{n}}\right) \\
&= P\left(1.96\frac{\sigma}{\sqrt{n}} \geq \mu - \bar{x} \geq -1.96\frac{\sigma}{\sqrt{n}}\right) \\
&= P\left(-1.96\frac{\sigma}{\sqrt{n}} \leq \mu - \bar{x} \leq 1.96\frac{\sigma}{\sqrt{n}}\right) \\
&= P\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}} \leq \mu \leq \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95
\end{aligned}$$

So, the random interval $\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right)$ has a 95% probability of containing the true population mean, $\mu$!! That is, 95% of the intervals constructed this way will contain the true population mean $\mu$!
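The claim that 95% of intervals built this way contain $\mu$ can be checked by simulation. In the sketch below the "true" mean of 11.0 is an assumption made purely so that coverage can be counted (in practice $\mu$ is unknown), and the function name `coverage` is hypothetical.

```python
import random
from math import sqrt

def coverage(mu, sigma, n, trials, z=1.96, seed=0):
    """Fraction of simulated intervals xbar +/- z*sigma/sqrt(n)
    that contain the (assumed known) true mean mu."""
    rng = random.Random(seed)
    half_width = z * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        # One simulated bag: n normally distributed cookie weights.
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / trials

rate = coverage(mu=11.0, sigma=0.5, n=32, trials=4000)
print("fraction of intervals containing mu:", rate)
```

The printed fraction should land close to 0.95; each individual interval either contains $\mu$ or does not.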

An axiom (assumption) of Frequentist Statistics is that unknown parameters have a fixed, right answer. So once we observe $\bar{x}$, the interval is fixed, and the value of $\mu$ is fixed. So $\mu$ is either in the interval, or it isn't, but we don't know which. The interval we get when we use the observed $\bar{x}$ is called a confidence interval.

Example: For a sample bag of 32 cookies we have $\bar{x} = 10.9$g and $\sigma = 0.5$g. Find the 95% CI for the mean weight of all cookies.

$$\bar{x} - 1.96\frac{\sigma}{\sqrt{n}} = 10.9 - 1.96\frac{0.5}{\sqrt{32}} = 10.73, \qquad \bar{x} + 1.96\frac{\sigma}{\sqrt{n}} = 10.9 + 1.96\frac{0.5}{\sqrt{32}} = 11.07$$

So (10.73, 11.07) is a 95% CI for $\mu$.

Technical Interpretation: We are 95% confident that $\mu$ is in this interval. On average 95% of the intervals constructed this way will contain $\mu$, but we don't know if this particular interval does contain it or not. Note that this is not a probability. The probability that $\mu$ is in this particular interval is 0 or 1, but we don't know which.

A CI is an interval estimate for $\mu$. It provides a range of plausible values for $\mu$.

More general form: a $(1-\alpha)$ CI is $\bar{x} \pm E$, where

$$E = z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = \text{margin of error}.$$

$z_{\alpha/2}$ is the upper $\alpha/2$ quantile of the standard normal distribution, i.e. $P(Z \geq z_{\alpha/2}) = \alpha/2$.

Example: Suppose a soda distributor is filling 20oz bottles and that from historical data, the sd of the contents of a bottle is known to be 0.03oz. Is the right amount of soda going into each bottle? Suppose a random sample of 34 bottles is found to have an average of 19.98oz. Find a 90% CI for the population mean contents.

$$1 - \alpha = 0.9 \Rightarrow \alpha = 0.1 \Rightarrow \frac{\alpha}{2} = 0.05 = P(Z < -1.645) \text{ (z-table)}$$

So, $z_{\alpha/2} = 1.645$ and

$$E = z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 1.645\frac{0.03}{\sqrt{34}} = 0.0085, \qquad \bar{x} \pm E = 19.98 \pm 0.0085 = (19.9715,\ 19.9885).$$

So (19.9715, 19.9885) is a 90% CI for $\mu$. Is it reasonable that 20oz are going into each bottle?
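The general form translates directly into a short function. This is a sketch assuming a known population sd, as in the slides; `z_interval` is a hypothetical name, and $z_{\alpha/2}$ is taken from `NormalDist().inv_cdf` rather than from a table.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence=0.95):
    """(1 - alpha) confidence interval for mu when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 quantile
    margin = z * sigma / sqrt(n)             # margin of error E
    return xbar - margin, xbar + margin

# Soda example: 34 bottles, mean 19.98oz, sigma 0.03oz, 90% confidence.
soda_low, soda_high = z_interval(19.98, 0.03, 34, confidence=0.90)
print("90% CI for soda:", (round(soda_low, 4), round(soda_high, 4)))
print("contains 20oz?", soda_low <= 20 <= soda_high)

# Cookie example: 32 cookies, mean 10.9g, sigma 0.5g, 95% confidence.
cookie_low, cookie_high = z_interval(10.9, 0.5, 32)
print("95% CI for cookies:", (round(cookie_low, 2), round(cookie_high, 2)))
```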

How do we determine the sample size needed for a desired margin of error?

... example continued: Suppose we want to be able to estimate soda contents with a margin of error of 0.001.

$$0.001 = E = z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 1.645\frac{0.03}{\sqrt{n}} \quad\Rightarrow\quad \sqrt{n} = \frac{1.645(0.03)}{0.001} = 49.35 \quad\Rightarrow\quad n = (49.35)^2 = 2435.42$$

We need to round up, so that the margin of error is no larger than specified, so $n = 2436$. In general,

$$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$$

Note: the sample size increases rapidly as the margin of error is reduced.

Key Concepts

Sampling Distribution of the mean
Sampling Distribution of the proportion
Central Limit Theorem
Normal Approximation to the Binomial
Confidence Interval
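The sample-size formula can be sketched as a small function. `required_sample_size` is a hypothetical name, and $z$ is passed in explicitly so that the table value 1.645 used in the example can be supplied as-is.

```python
from math import ceil

def required_sample_size(z, sigma, margin):
    """Smallest n with z * sigma / sqrt(n) <= margin, i.e. the
    formula n = (z * sigma / E)^2 rounded up."""
    return ceil((z * sigma / margin) ** 2)

# Soda example with the z-table value for 90% confidence.
n_needed = required_sample_size(z=1.645, sigma=0.03, margin=0.001)
print("bottles needed for E = 0.001:", n_needed)

# Halving the margin of error roughly quadruples the sample size.
for margin in (0.008, 0.004, 0.002, 0.001):
    print(margin, required_sample_size(1.645, 0.03, margin))
```

Rounding up with `ceil` rather than to the nearest integer keeps the achieved margin of error at or below the target.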