
Sampling distributions
Cécile Ané, Stat 371, Spring 2006

Outline
1 Introduction
2 Sampling distribution of a proportion
3 Sampling distribution of the mean
4 Normal approximation to the binomial
5 The continuity correction

Introduction

What does it mean to take a sample of size $n$? The random variables $Y_1, \ldots, Y_n$ form a random sample if they are independent and have a common distribution. From a sample, we can calculate a sample statistic such as the sample mean $\bar Y$. $\bar Y$ is random too! It can differ from sample to sample. The textbook refers to a meta-experiment (the sampling experiment repeated many times). The distribution of $\bar Y$ is called a sampling distribution.

Sampling distribution of a proportion

Example: cross of two heterozygotes, Aa × Aa. Probability distribution of the offspring's genotype:

Offspring genotype    AA      Aa      aa
Probability           0.25    0.50    0.25

An offspring is dominant if it has genotype AA or Aa, so the true proportion of dominant offspring is $p = 0.25 + 0.50 = 0.75$.

Experiment: get $n = 2$ offspring, count the number $Y$ of dominant offspring, and calculate the sample proportion $\hat p = Y/2$. We would like $\hat p$ to be close to the true value $p = 0.75$, but $\hat p$ is random. Its distribution follows from the binomial distribution, $Y \sim B(2, 0.75)$:

$Y$               0         1         2
$\hat p$          0.0       0.5       1.0
Probability       0.0625    0.3750    0.5625
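The distribution of $\hat p$ for $n = 2$ can be reproduced with R's `dbinom`, the same function these notes use later for the vaccine example. This is a minimal sketch assuming base R only; the variable names are illustrative, and the final check (that the mean of $\hat p$ equals $p$) is an extra step not on the original slide.

```r
# Sampling distribution of the sample proportion p_hat = Y / n
# for n = 2 offspring, where Y ~ B(2, 0.75) counts dominant offspring.
n <- 2
p <- 0.75
y <- 0:n
p_hat <- y / n
prob <- dbinom(y, size = n, prob = p)
print(data.frame(Y = y, p_hat = p_hat, prob = prob))

# The probabilities sum to 1, and the mean of p_hat equals p (0.75).
print(sum(prob))
print(sum(p_hat * prob))
```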

Larger sample size: let $Y$ be the number of dominant offspring out of $n = 20$, and $\hat p = Y/20$ the sample proportion. We still want $\hat p$ to be close to the true value $p = 0.75$, and $\hat p$ is still random. What is the probability that $\hat p$ is within 0.05 of $p$? Translate into a binomial question, with $Y \sim B(20, 0.75)$:

$$\mathbb{P}\{0.70 \le \hat p \le 0.80\} = \mathbb{P}\{0.70 \le Y/20 \le 0.80\} = \mathbb{P}\{14 \le Y \le 16\} = \mathbb{P}\{Y = 14\} + \mathbb{P}\{Y = 15\} + \mathbb{P}\{Y = 16\} = 0.56$$

A sample size of 20 is better than a sample size of 2!

Sampling distribution of the mean

Example: weight of seeds of some variety of beans. Sample size $n = 4$.

Student    Observations (mg)       Sample mean $\bar y$
1          462  368  607  483      480
2          346  535  650  451      495.5
3          579  677  636  529      605.25

$\bar Y$ is random. How do we know its distribution? We will see 3 key facts.

Key fact #1. If $Y_1, \ldots, Y_n$ is a random sample, and if the $Y_i$'s have mean $\mu$ and standard deviation $\sigma$, then $\bar Y$ has mean $\mu_{\bar Y} = \mu$ and variance $\operatorname{var}(\bar Y) = \sigma^2/n$, i.e. standard deviation $\sigma_{\bar Y} = \sigma/\sqrt{n}$.

Seed weight example: assume beans have mean $\mu = 500$ mg and $\sigma = 120$ mg. In a sample of size $n = 4$, the sample mean $\bar Y$ has mean $\mu_{\bar Y} = 500$ mg and standard deviation $\sigma_{\bar Y} = 120/\sqrt{4} = 60$ mg.

Key fact #2. If $Y_1, \ldots, Y_n$ is a random sample, and if the $Y_i$'s all come from $N(\mu, \sigma)$ (the normal distribution with mean $\mu$ and standard deviation $\sigma$), then

$$\bar Y \sim N\!\left(\mu, \frac{\sigma}{\sqrt{n}}\right).$$

Actually, $Y_1 + \cdots + Y_n = n \bar Y$ is normally distributed too.

Seed weight example: 100 students do the same experiment.

[Figure: histograms of the 100 sample means (mg) for n = 4 and n = 16. Counts per bin, from left to right: n = 4: 3, 14, 30, 32, 17, 4; n = 16: 0, 5, 56, 32, 7, 0.]
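The arithmetic on these slides can be checked in a few lines of R. This is a minimal sketch assuming base R; part 4 simulates the "100 students" meta-experiment with many more students, and the normal distribution of bean weights it uses is an illustrative assumption (key fact #1 itself does not require normality).

```r
# 1) Y ~ B(20, 0.75): P(0.70 <= p_hat <= 0.80) = P(14 <= Y <= 16)
p_within <- sum(dbinom(14:16, size = 20, prob = 0.75))
print(round(p_within, 2))

# 2) Sample means of the three students' bean samples (mg)
seeds <- rbind(c(462, 368, 607, 483),
               c(346, 535, 650, 451),
               c(579, 677, 636, 529))
means <- rowMeans(seeds)
print(means)

# 3) Key fact #1: the standard deviation of Y-bar is sigma / sqrt(n)
sigma <- 120
n <- 4
sd_ybar <- sigma / sqrt(n)
print(sd_ybar)

# 4) Simulated meta-experiment: many students each average n = 4 beans.
#    Assumption for illustration only: weights are N(500, 120).
set.seed(1)
ybar <- replicate(10000, mean(rnorm(n, mean = 500, sd = sigma)))
print(c(mean = mean(ybar), sd = sd(ybar)))
```

The simulated mean and standard deviation of `ybar` should land close to 500 mg and 60 mg, as key fact #1 predicts.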

Key fact #3: the central limit theorem. If $Y_1, \ldots, Y_n$ is a random sample from (almost) any distribution, then, as $n$ gets large, $\bar Y$ is approximately normally distributed. Note: $Y_1 + \cdots + Y_n$ is approximately normal too.

Example: beans are filtered, and discarded if too small.

[Figure: distribution of $\bar Y$ for filtered beans, n = 1, 5, 10 and 30.]

How big must $n$ be? Usually, $n = 30$ is big enough, unless the distribution is strongly skewed.

Remarkable result! It explains why the normal distribution is so common, so "normal": it is what we get when we average over lots of pieces. Example: human height. Results from...

Example: mixture of 2 bean varieties.

[Figure: distribution of $\bar Y$ for the mixture, n = 1, 5, 10 and 30.]

Exercise. Snowfall on winter days is $Y \sim N(0.53, 0.21)$ (inches). Take the sample mean $\bar Y$ of a random sample of 30 winter days over the 10 previous years. What is the probability that $\bar Y \le 0.50$ in? $\bar Y$ has mean 0.53 inches and standard deviation $0.21/\sqrt{30} = 0.0383$ inches, and $\bar Y$'s distribution is approximately normal because the sample size is large enough ($n = 30$). So

$$\mathbb{P}\{\bar Y \le 0.50\} = \mathbb{P}\left\{\frac{\bar Y - 0.53}{0.0383} \le \frac{0.50 - 0.53}{0.0383}\right\} \approx \mathbb{P}\{Z \le -0.782\} = 0.217$$
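The snowfall calculation can be reproduced with `pnorm`, and the central limit theorem can be watched at work by simulation. This is a minimal sketch assuming base R; the exponential population in the second part is a hypothetical stand-in for a strongly skewed distribution (it is not the bean data from the figures), and skewness is used here only as a rough measure of how far from normal the distribution of $\bar Y$ is.

```r
# Snowfall exercise: mean 0.53 in, SD 0.21 in, n = 30 winter days
mu <- 0.53
sigma <- 0.21
n <- 30
se <- sigma / sqrt(n)        # SD of Y-bar, about 0.0383
z <- (0.50 - mu) / se        # standardized cutoff, about -0.78
p_below <- pnorm(z)          # P(Y-bar <= 0.50), about 0.217
print(round(c(se = se, z = z, prob = p_below), 4))

# CLT illustration with a strongly right-skewed population
# (exponential with rate 1; a hypothetical stand-in for illustration).
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3
set.seed(2)
sizes <- c(1, 5, 30)
skew_of_means <- sapply(sizes, function(k)
  skewness(replicate(20000, mean(rexp(k, rate = 1)))))
# Skewness of Y-bar shrinks toward 0 as k grows (theory: 2 / sqrt(k))
print(round(skew_of_means, 2))
```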

The normal approximation to the binomial

Example: $X$ = number of children with side effects after a vaccine, out of $n = 200$ children. The probability of a side effect is $p = 0.05$, so $X \sim B(200, 0.05)$. What is $\mathbb{P}\{X \le 15\}$?

Direct calculation:

$$\mathbb{P}\{X = 0\} + \mathbb{P}\{X = 1\} + \cdots + \mathbb{P}\{X = 15\} = \binom{200}{0} 0.05^{0}\, 0.95^{200} + \cdots + \binom{200}{15} 0.05^{15}\, 0.95^{185}$$

Heavy! Or we can use a trick: the binomial might be close to a normal distribution. Pretend $X$ is normally distributed!

[Figure: binomial probabilities for n = 10, p = 0.05; n = 20, p = 0.1; n = 50, p = 0.05; n = 20, p = 0.5; n = 200, p = 0.05; n = 20, p = 0.9.]

Why this works: $X = Y_1 + \cdots + Y_{200}$, where $Y_i = 1$ if child $i$ has side effects and $Y_i = 0$ otherwise. Apply key fact #3: if $n$ (the number of children) is large enough, then $Y_1 + \cdots + Y_n$ has approximately a normal distribution. Use the normal distribution with $X$'s mean and standard deviation:

$$\mu = np = 10, \qquad \sigma = \sqrt{np(1-p)} = 3.08$$

If $X \sim B(n, p)$ and $n$ is large enough, that is, $np \ge 5$ and $n(1-p) \ge 5$ (rule of thumb), then $X$'s distribution is approximately $N\!\left(np, \sqrt{np(1-p)}\right)$.

Back to our question, $\mathbb{P}\{X \le 15\}$: $np = 10$ and $n(1-p) = 190$ are both $\ge 5$, so $X \approx N(10, 3.08)$ and

$$\mathbb{P}\{X \le 15\} = \mathbb{P}\left\{\frac{X - 10}{3.08} \le \frac{15 - 10}{3.08}\right\} \approx \mathbb{P}\{Z \le 1.62\} = 0.9474$$

True value:

    > sum( dbinom(0:15, size=200, prob=0.05))
    [1] 0.9556444
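The same comparison, exact binomial versus normal approximation, can be written with R's cumulative functions. This is a minimal sketch assuming base R: `pbinom(15, ...)` gives the same value as the `sum(dbinom(0:15, ...))` call above, and `pnorm` is used with the unrounded $\sigma$, so the approximation may differ from the slide's table-based 0.9474 in the fourth decimal.

```r
n <- 200
p <- 0.05
# Rule of thumb from the notes: np >= 5 and n(1 - p) >= 5
print(c(np = n * p, n_times_q = n * (1 - p)))

mu <- n * p                                  # 10
sigma <- sqrt(n * p * (1 - p))               # about 3.08
exact <- pbinom(15, size = n, prob = p)      # same as sum(dbinom(0:15, n, p))
normal_approx <- pnorm(15, mean = mu, sd = sigma)  # no continuity correction
print(round(c(sigma = sigma, exact = exact, normal = normal_approx), 4))
```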

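The continuity correction

$X$ is binomial $B(200, 0.05)$, and $Y$ is normal $N(10, 3.08)$.

No continuity correction:

$$\mathbb{P}\{X \le 15\} \approx \mathbb{P}\{Y \le 15\} = \mathbb{P}\left\{\frac{Y - 10}{3.08} \le \frac{15 - 10}{3.08}\right\} = \mathbb{P}\{Z \le 1.62\} = 0.9474$$

[Figure: horizontal axis is the number of children with side effects.]

The continuity correction gives a better approximation. Each integer value $k$ of $X$ is represented by the interval from $k - 0.5$ to $k + 0.5$ under the normal curve, so $X \le 15$ corresponds to $Y \le 15.5$:

$$\mathbb{P}\{X \le 15\} \approx \mathbb{P}\{Y \le 15.5\} = \mathbb{P}\left\{\frac{Y - 10}{3.08} \le \frac{15.5 - 10}{3.08}\right\} = \mathbb{P}\{Z \le 1.78\} = 0.9624$$

(the true value was 0.9556).

What is the probability that between 8 and 15 children get side effects?

$$\mathbb{P}\{8 \le X \le 15\} \approx \mathbb{P}\{7.5 \le Y \le 15.5\} = \mathbb{P}\left\{\frac{7.5 - 10}{3.08} \le \frac{Y - 10}{3.08} \le \frac{15.5 - 10}{3.08}\right\} = \mathbb{P}\{-0.81 \le Z \le 1.78\} = \mathbb{P}\{Z \le 1.78\} - \mathbb{P}\{Z \le -0.81\} = 0.7535$$

True value:

    > sum( dbinom(8:15, size=200, prob=0.05) )
    [1] 0.7423397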
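Both continuity-correction calculations can be checked side by side against the exact binomial values. This is a minimal sketch assuming base R; it uses unrounded $z$-scores, so the corrected normal values may differ slightly in the fourth decimal from the slides, which round $z$ to two decimals before looking up the table.

```r
n <- 200
p <- 0.05
mu <- n * p
sigma <- sqrt(n * p * (1 - p))

# P(X <= 15): exact, plain normal, and normal with continuity correction
exact_le15 <- pbinom(15, n, p)
plain_le15 <- pnorm(15, mu, sigma)
cc_le15 <- pnorm(15.5, mu, sigma)

# P(8 <= X <= 15): exact, and normal on the corrected interval [7.5, 15.5]
exact_8_15 <- pbinom(15, n, p) - pbinom(7, n, p)
cc_8_15 <- pnorm(15.5, mu, sigma) - pnorm(7.5, mu, sigma)

print(round(c(exact_le15 = exact_le15, plain_le15 = plain_le15,
              cc_le15 = cc_le15, exact_8_15 = exact_8_15,
              cc_8_15 = cc_8_15), 4))
```

Differences of `pbinom` values are used for the interval because $\mathbb{P}\{8 \le X \le 15\} = \mathbb{P}\{X \le 15\} - \mathbb{P}\{X \le 7\}$, which equals the `sum(dbinom(8:15, ...))` call above.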