Section Sampling Distributions for Counts and Proportions


Section 5.1 - Sampling Distributions for Counts and Proportions
Statistics 104, Autumn 2004
Copyright © 2004 by Mark E. Irwin

Chapter 5 - Introduction

Distributions

When dealing with inference procedures, there are two different distributions that you need to keep track of.

Population Distribution: The population distribution of a variable is the distribution of its values for all members of the population. The population distribution is also the probability distribution of the variable when we choose one individual from the population at random.

Sampling Distribution: A statistic from a random sample or randomized experiment is a random variable. The probability distribution of the statistic is its sampling distribution.

[Figure: density of the LSAT population distribution (left) and of the sampling distribution of the sample mean $\bar{X}_{15}$ (right); both horizontal axes run from 450 to 750.]

Binomial Distribution

Example: Did you attend church or synagogue in the previous week? Of the 1785 people sampled, 750 said yes. This gives a sample proportion of

$\hat{p} = 750 / 1785 = 0.42$

What is the sampling distribution of $\hat{p}$? This can be modelled with the binomial distribution.

Binomial Distribution

1. There is a fixed number of observations $n$.
2. The $n$ observations are all independent.
3. Each observation falls into one of two categories, which for convenience get called Success and Failure.
4. The probability of success (call it $p$) is the same for each observation.

We are interested in the number of successes (call it $X$). $X$ is said to have a binomial distribution with parameters $n$ and $p$, written $X \sim \mathrm{Bin}(n, p)$.

Binomial or not?

1. Flip a coin 20 times and count the number of heads.
Yes. $\mathrm{Bin}(n = 20, p = 0.5)$ if it's a fair coin.

2. Draw 5 cards from a standard deck of cards and count the number of black cards.
No. The draws are not independent, which implies that the probabilities change as you go through the draws.

$P[\text{1st card black}] = 1/2$
$P[\text{2nd card black} \mid \text{1st card black}] = 25/51$
$P[\text{2nd card black} \mid \text{1st card red}] = 26/51$

3. Number of faulty switches out of 6 from one company, with $P[\text{Faulty}] = 0.2$.
Probably ok.

4. The number of successful field goals that Adam Vinatieri will kick in Sunday's Patriots game.
No. The number of kicks $n$ is random and currently unknown.

5. Take a simple random sample of 1000 voters. Count the number who say that they voted to re-elect President Bush.
Close, but not quite. It's similar to the deck example.

When the population is much larger than the sample size, the count of successes in an SRS of size $n$ has approximately a $\mathrm{Bin}(n, p)$ distribution if the population proportion of successes is $p$.

Rule of thumb for the approximation to be ok: population size $> 10n$.

Let's suppose that we have a population of 100,000 individuals and that 20% are successes.

$P[\text{Success on draw 1}] = 0.2$
$P[\text{Success on draw 2} \mid \text{Success on draw 1}] = 19999/99999 = 0.199992$
$P[\text{Success on draw 2} \mid \text{Failure on draw 1}] = 20000/99999 = 0.200002$

The success probabilities won't change much as the various units get sampled.

Now suppose that the population size is 5, still with a 20% success rate.

$P[\text{Success on draw 1}] = 0.2$
$P[\text{Success on draw 2} \mid \text{Success on draw 1}] = 0/4 = 0$
$P[\text{Success on draw 2} \mid \text{Failure on draw 1}] = 1/4 = 0.25$
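The two population sizes above can be compared with a few lines of Python. This is only an illustrative sketch: the helper name `second_draw_probs` is made up for this example, and it assumes sampling without replacement from a population in which exactly 20% are successes.

```python
from fractions import Fraction

def second_draw_probs(population_size, successes):
    """Conditional success probabilities on draw 2 when sampling without replacement.

    Returns (P[success on draw 2 | success on draw 1],
             P[success on draw 2 | failure on draw 1]).
    """
    remaining = population_size - 1
    after_success = Fraction(successes - 1, remaining)  # one success already removed
    after_failure = Fraction(successes, remaining)      # all successes still available
    return after_success, after_failure

for size in (100_000, 5):
    successes = size // 5  # 20% of the population are successes
    after_s, after_f = second_draw_probs(size, successes)
    print(f"N = {size}: {float(after_s):.6f} after a success, {float(after_f):.6f} after a failure")
```

For the large population both conditional probabilities stay within about 0.00001 of 0.2, while for the population of 5 they move to 0 and 0.25.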

Calculating binomial probabilities

The probability of exactly $k$ successes when $X \sim \mathrm{Bin}(n, p)$ is

$P[X = k] = \binom{n}{k} p^k (1 - p)^{n - k}$

where

$\binom{n}{k} = \frac{n!}{k!\,(n - k)!}$

is the number of ways of choosing $k$ items from $n$. It's often pronounced "n choose k" for this reason.
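As a rough illustration of the formula, the sketch below evaluates it with Python's standard library. The function name `binom_pmf` is an assumption of this example, and the Bin(6, 0.2) parameters are borrowed from the faulty-switch example.

```python
from math import comb

def binom_pmf(k, n, p):
    """P[X = k] for X ~ Bin(n, p), straight from the formula."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Full distribution of the number of faulty switches, Bin(6, 0.2).
switch_pmf = [binom_pmf(k, 6, 0.2) for k in range(7)]
for k, prob in enumerate(switch_pmf):
    print(f"P[X = {k}] = {prob:.4f}")
```

The seven probabilities sum to 1, which is a quick sanity check on any hand calculation.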

Motivation: For each trial

$P[\text{Success}] = p; \quad P[\text{Failure}] = 1 - p$

Assume that $k$ successes are followed by $n - k$ failures. This has probability

$\underbrace{p \cdot p \cdots p}_{k} \cdot \underbrace{(1 - p)(1 - p) \cdots (1 - p)}_{n - k} = p^k (1 - p)^{n - k}$

Every other arrangement with $k$ successes has exactly the same probability, and there are $\binom{n}{k}$ such arrangements, which implies

$P[X = k] = \binom{n}{k} p^k (1 - p)^{n - k}$

Why is $\binom{n}{k}$ the number of ways of choosing $k$ items from $n$?

You have $n$ ways of picking the first success, then $n - 1$ ways of picking the second success after the first one, and so on down to $n - k + 1$ ways of picking the $k$th success. Multiplying these together gives

$n (n - 1)(n - 2) \cdots (n - k + 1) = \frac{n!}{(n - k)!}$

Now the order of the successes doesn't matter. Given $k$ items, there are $k!$ different ways of ordering them: you have $k$ choices for the first item in the list, which leaves $k - 1$ choices for the 2nd item in the list, and so on. Combining this with the above gives

$\binom{n}{k} = \frac{n!}{k!\,(n - k)!}$
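The counting argument can be checked by brute force for a small case. The sketch below is illustrative only; the choice of n = 5 and k = 3 is arbitrary.

```python
from itertools import combinations, permutations
from math import comb, factorial

n, k = 5, 3

# Ordered selections of k distinct trials out of n: n!/(n-k)! of them.
ordered = sum(1 for _ in permutations(range(n), k))

# Selections where order is ignored: each one is counted k! times above.
unordered = sum(1 for _ in combinations(range(n), k))

print(ordered, unordered, comb(n, k))
```

Dividing the ordered count by k! recovers the unordered count, which is what `math.comb` returns.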

One way of getting probabilities involving binomials is to work with the earlier probability formula. For example, if $X \sim \mathrm{Bin}(6, 0.2)$,

$P[X > 4] = P[X = 5] + P[X = 6] = \binom{6}{5} 0.2^5\, 0.8^1 + \binom{6}{6} 0.2^6\, 0.8^0 = 0.0016$
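The same tail probability can be computed in code, either by summing the two terms or through the complement. This is a minimal sketch; `binom_pmf` and `upper_tail` are names introduced here for illustration.

```python
from math import comb

def binom_pmf(k, n, p):
    """P[X = k] for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def upper_tail(k_min, n, p):
    """P[X >= k_min], summing the pmf directly."""
    return sum(binom_pmf(k, n, p) for k in range(k_min, n + 1))

direct = upper_tail(5, 6, 0.2)  # P[X = 5] + P[X = 6]
via_complement = 1 - sum(binom_pmf(k, 6, 0.2) for k in range(5))  # 1 - P[X <= 4]
print(round(direct, 4), round(via_complement, 4))
```

Both routes agree up to floating-point rounding; the direct sum is preferable when the tail is tiny, since subtracting from 1 loses precision.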

Another option is to work with binomial probability tables (Table C in Moore and McCabe). This table gives binomial probabilities for certain choices of $n$ and $p$.

For the $X \sim \mathrm{Bin}(6, 0.2)$ example, we need to look at the block with $n = 6$ and $p = 0.2$.

The table doesn't have anything for $p > 0.5$. This is not a problem, as we can just switch the definition of success and failure to fit the problem. Let $X \sim \mathrm{Bin}(n, p)$ and $Y \sim \mathrm{Bin}(n, 1 - p)$. Then

$P[X = k] = \binom{n}{k} p^k (1 - p)^{n - k} = P[Y = n - k]$

Most stat packages, Excel, and scientific calculators can also be used to get binomial probabilities. There is one big advantage to using software: $n$ and $p$ are not restricted. For example, if $X \sim \mathrm{Bin}(11, 0.78)$, then $P[X = 7] = 0.1358$, which isn't available from the table.
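A short sketch of both points, using only the standard library: it checks the success/failure swap numerically and evaluates the Bin(11, 0.78) probability that the table cannot supply. The function name `binom_pmf` is an assumption of this example.

```python
from math import comb

def binom_pmf(k, n, p):
    """P[X = k] for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 11, 0.78

# P[X = 7] for X ~ Bin(11, 0.78); the slides quote 0.1358.
prob_7 = binom_pmf(7, n, p)

# Swapping success and failure: P[X = k] should equal P[Y = n - k] with Y ~ Bin(n, 1 - p).
swap_holds = all(
    abs(binom_pmf(k, n, p) - binom_pmf(n - k, n, 1 - p)) < 1e-12
    for k in range(n + 1)
)
print(round(prob_7, 4), swap_holds)
```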

[Figure: probability histograms of Bin(5, 0.25), Bin(5, 0.5), Bin(10, 0.75) and Bin(10, 0.5); horizontal axis: number of successes, vertical axis: $P[X = x]$.]

The binomial distribution is always unimodal, but can be symmetric or skewed. It is symmetric if $p = 0.5$, skewed right (long right tail) if $p < 0.5$, and skewed left if $p > 0.5$.

Mean and Variance of a Binomial

$\mu_x = \sum_{x=0}^{n} x \binom{n}{x} p^x (1 - p)^{n - x}$

by the definition of the mean for a discrete random variable. However, this is somewhat ugly, though it can be solved with a little algebra. The variance is even worse (though still solvable this way):

$\sigma_x^2 = \sum_{x=0}^{n} (x - \mu_x)^2 \binom{n}{x} p^x (1 - p)^{n - x}$

There is an easier way to get a handle on this though. Define $Z_i$ to be the result of trial $i$, where

$Z_i = \begin{cases} 1 & \text{trial } i \text{ is a success} \\ 0 & \text{trial } i \text{ is a failure} \end{cases}$

Therefore $X = Z_1 + Z_2 + \ldots + Z_n$, the sum of $n$ independent random variables. So we need to figure out $\mu_z$ and $\sigma_z^2$. These are easy, as

$\mu_z = 0 \cdot (1 - p) + 1 \cdot p = p$
$\sigma_z^2 = (0 - p)^2 (1 - p) + (1 - p)^2 p = p(1 - p)$

These give

$\mu_x = \mu_{z_1} + \mu_{z_2} + \ldots + \mu_{z_n} = p + p + \ldots + p = np$
$\sigma_x^2 = \sigma_{z_1}^2 + \sigma_{z_2}^2 + \ldots + \sigma_{z_n}^2 = p(1 - p) + p(1 - p) + \ldots + p(1 - p) = np(1 - p)$
$\sigma_x = \sqrt{np(1 - p)}$

So for the switch example ($\mathrm{Bin}(6, 0.2)$)

$\mu_x = 6 \times 0.2 = 1.2$
$\sigma_x^2 = 6 \times 0.2 \times 0.8 = 0.96$
$\sigma_x = \sqrt{6 \times 0.2 \times 0.8} = \sqrt{0.96} = 0.9798$
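To see that the shortcut formulas agree with the "ugly" definitions, the sketch below computes the mean and variance of Bin(6, 0.2) both ways. Variable names are illustrative.

```python
from math import comb, sqrt

def binom_pmf(k, n, p):
    """P[X = k] for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 6, 0.2

# Definition: weighted sums over all possible counts.
mean_by_sum = sum(k * binom_pmf(k, n, p) for k in range(n + 1))
var_by_sum = sum((k - mean_by_sum) ** 2 * binom_pmf(k, n, p) for k in range(n + 1))

# Shortcut formulas from the sum-of-indicators argument.
mean_by_formula = n * p
var_by_formula = n * p * (1 - p)
sd = sqrt(var_by_formula)

print(mean_by_sum, mean_by_formula)
print(var_by_sum, var_by_formula, round(sd, 4))
```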

Sample Proportions

$\hat{p} = \frac{\text{\# successes}}{\text{sample size}} = \frac{X}{n}$

So if we know $X$ we know $\hat{p}$, and vice versa.

Probability Calculations

We can use this one-to-one relationship between sample proportions and counts to do probability calculations.

Example: Switch example ($\mathrm{Bin}(6, 0.2)$)

$P[\hat{p} \ge 0.5] = P[X \ge 3] = P[X = 3] + P[X = 4] + P[X = 5] + P[X = 6] = 0.0989$
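The translation from a statement about the proportion to a statement about the count can be sketched as follows. The helper `prob_phat_at_least` is hypothetical, and the threshold is passed as a `Fraction` so that the cutoff count is computed without floating-point rounding.

```python
from fractions import Fraction
from math import ceil, comb

def binom_pmf(k, n, p):
    """P[X = k] for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def prob_phat_at_least(threshold, n, p):
    """P[p_hat >= threshold], computed as P[X >= ceil(threshold * n)]."""
    k_min = ceil(threshold * n)  # smallest count whose proportion reaches the threshold
    return sum(binom_pmf(k, n, p) for k in range(k_min, n + 1))

# Switch example: p_hat >= 1/2 with n = 6 is the same event as X >= 3.
switch_prob = prob_phat_at_least(Fraction(1, 2), 6, 0.2)
print(round(switch_prob, 4))
```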

We can also use this idea to get means and variances for proportions.

$\mu_{\hat{p}} = \frac{1}{n} \mu_x = \frac{1}{n} np = p$
$\sigma_{\hat{p}}^2 = \frac{1}{n^2} \sigma_x^2 = \frac{1}{n^2} np(1 - p) = \frac{p(1 - p)}{n}$
$\sigma_{\hat{p}} = \sqrt{\sigma_{\hat{p}}^2} = \sqrt{\frac{p(1 - p)}{n}}$

This is based on the rules discussed earlier for linear transformations of random variables.

So for the switch example

$\mu_{\hat{p}} = 0.2$
$\sigma_{\hat{p}}^2 = \frac{0.2 \times 0.8}{6} = 0.02667$
$\sigma_{\hat{p}} = \sqrt{\frac{0.2 \times 0.8}{6}} = \sqrt{0.02667} = 0.1633$

Notice that as $n$ increases,

$\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$

decreases. This implies that with a larger sample size, you are more likely to have your sample proportion close to the true population proportion.

It's also a justification for using long-run frequencies to motivate probabilities. With a little more work (take Stat 110 to see it), you can show that as $n \to \infty$,

$\hat{p}_n \to p$
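The effect of the sample size can be made concrete with an exact calculation. The sketch below assumes p = 0.2 and a tolerance of 0.05, both chosen only for illustration, and computes the standard deviation of the sample proportion together with the exact probability that it lands within the tolerance of p.

```python
from fractions import Fraction
from math import comb, sqrt

def binom_pmf(k, n, p):
    """P[X = k] for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def sd_phat(n, p):
    """Standard deviation of the sample proportion."""
    return sqrt(p * (1 - p) / n)

def prob_within(n, p, tol):
    """Exact P[|p_hat - p| <= tol]; boundaries are compared with exact fractions."""
    p_exact, tol_exact = Fraction(str(p)), Fraction(str(tol))
    return sum(
        binom_pmf(k, n, p)
        for k in range(n + 1)
        if abs(Fraction(k, n) - p_exact) <= tol_exact
    )

sizes = (20, 100, 500)
coverage = [prob_within(n, 0.2, 0.05) for n in sizes]
for n, cov in zip(sizes, coverage):
    print(f"n = {n}: sd = {sd_phat(n, 0.2):.4f}, P[|p_hat - 0.2| <= 0.05] = {cov:.4f}")
```

The standard deviation shrinks like one over the square root of n, and the probability of landing near p climbs towards 1.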

Example: Flip a coin 100 times. Count the number of heads. What is $P[\hat{p} \ge 0.6]$? Similarly for 1000 flips.

100 flips: $P[\hat{p} \ge 0.6] = P[X \ge 60] = P[X = 60] + P[X = 61] + \ldots + P[X = 100]$
1000 flips: $P[\hat{p} \ge 0.6] = P[X \ge 600] = P[X = 600] + P[X = 601] + \ldots + P[X = 1000]$

In theory it's easy to get the answer: just add up a whole bunch of terms. In fact it's easy in Stata, as there is a function (Binomial(n,k,p)) which gives probabilities of the form $P[X \ge x]$. Other packages have similar functions, though most are based on $P[X \le x]$, the binomial CDF.
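Outside Stata, the "add up a whole bunch of terms" approach is easy to sketch in plain Python. The function below is an illustration written for this example (it is not the Stata function); it handles only the fair-coin case, which lets it use exact integer arithmetic.

```python
from math import comb

def fair_coin_upper_tail(n, k_min):
    """Exact P[X >= k_min] for X ~ Bin(n, 0.5).

    Each of the 2**n head/tail sequences is equally likely, so the
    probability is (number of sequences with at least k_min heads) / 2**n.
    """
    favourable = sum(comb(n, k) for k in range(k_min, n + 1))
    return favourable / 2**n

p_100 = fair_coin_upper_tail(100, 60)     # P[p_hat >= 0.6] with 100 flips
p_1000 = fair_coin_upper_tail(1000, 600)  # P[p_hat >= 0.6] with 1000 flips
print(f"{p_100:.4f}  {p_1000:.3e}")
```

Python integers have arbitrary precision, so the sums are exact and only the final division is rounded.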

[Figure: sampling distributions of $\hat{p}$ for 100 flips (horizontal axis from 0 to 1) and for 1000 flips (horizontal axis from 0.4 to 0.6); vertical axis: density.]

Both of these cases are symmetric and unimodal. In fact, both are close to normal distributions.

Normal Approximation to the Binomial

When $n$ is large, $\hat{p}$ is approximately normally distributed with

$\mu_{\hat{p}} = p, \quad \sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$

and $X$ is also approximately normal with

$\mu_x = np, \quad \sigma_x = \sqrt{np(1 - p)}$

For $n = 100$ flips

$\mu_{\hat{p}} = 0.5, \quad \sigma_{\hat{p}} = \sqrt{\frac{0.5 \times 0.5}{100}} = 0.05$

$Z = \frac{\hat{p} - 0.5}{0.05}$ is approximately $N(0, 1)$

$P[\hat{p} \ge 0.6] = P\left[\frac{\hat{p} - 0.5}{0.05} \ge \frac{0.6 - 0.5}{0.05}\right] = P[Z \ge 2] \approx 0.0228$

The true probability is 0.0284.

[Figure: density of $\hat{p}$ for 100 flips with the approximating normal curve; horizontal axis from 0.3 to 0.7.]

For $n = 1000$ flips

$\mu_{\hat{p}} = 0.5, \quad \sigma_{\hat{p}} = \sqrt{\frac{0.5 \times 0.5}{1000}} = 0.0158$

$P[\hat{p} \ge 0.6] = P\left[\frac{\hat{p} - 0.5}{0.0158} \ge \frac{0.6 - 0.5}{0.0158}\right] = P[Z \ge 6.329] \approx 1.234 \times 10^{-10}$

The true probability is $1.364 \times 10^{-10}$.

[Figure: density of $\hat{p}$ for 1000 flips with the approximating normal curve; horizontal axis from 0.3 to 0.7.]
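The two normal-approximation calculations can be sketched with the standard library's complementary error function. Function names here are illustrative. Note one assumption that differs from the slides: the code keeps the unrounded standard deviation, so for 1000 flips it gets a z-score of about 6.325 rather than 6.329, and a correspondingly slightly different tail probability.

```python
from math import erfc, sqrt

def normal_upper_tail(z):
    """P[Z >= z] for a standard normal Z."""
    return 0.5 * erfc(z / sqrt(2))

def approx_prob_phat_at_least(threshold, n, p):
    """Normal approximation to P[p_hat >= threshold]; returns (z, probability)."""
    sd = sqrt(p * (1 - p) / n)
    z = (threshold - p) / sd
    return z, normal_upper_tail(z)

z_100, tail_100 = approx_prob_phat_at_least(0.6, 100, 0.5)
z_1000, tail_1000 = approx_prob_phat_at_least(0.6, 1000, 0.5)
print(f"n = 100:  z = {z_100:.3f}, P = {tail_100:.4f}")
print(f"n = 1000: z = {z_1000:.3f}, P = {tail_1000:.3e}")
```

Using `erfc` rather than `1 - cdf` matters in the second case: a tail of order 1e-10 would lose most of its digits if it were computed by subtracting from 1.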

Should John Kerry have conceded Ohio while the provisional and absentee ballots still needed to be counted?

Assumptions:

Kerry is behind by 140,000 votes (it's slightly less than this).
There are 200,000 valid ballots still to be counted (probably a bit higher than is actually the case).
For each ballot, $P[\text{Kerry}] = 2/3$ and $P[\text{Bush}] = 1/3$ (this is the split in Cuyahoga county, the county in which John Kerry had his highest percentage in Ohio).

For John Kerry to win Ohio, he needs to get over 170,000 (85%) of the 200,000 votes to be counted.

Assuming that these ballots can be described by a binomial model with the probabilities given above, what is the probability that John Kerry would get enough votes?

$\mu_x = 200000 \times \frac{2}{3} = 133333.3$
$\sigma_x = \sqrt{200000 \times \frac{2}{3} \times \frac{1}{3}} = 210.82$

$P[X \ge 170000] = P\left[\frac{X - 133333.3}{210.82} \ge \frac{170000 - 133333.3}{210.82}\right] = P[Z \ge 173.92] \approx 0 \quad (< 10^{-6570})$

This is the most extreme z-score I have ever seen. Remember that the table in the book only goes up to 3.49.

Kerry has no chance of passing Bush, assuming everything is on the up and up in Ohio.
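A sketch of this calculation in Python follows. The tail probability itself is far too small for double precision, so the code also reports a bound on a log scale, using the standard inequality P[Z >= z] < phi(z)/z for z > 0, where phi is the standard normal density. Variable names are illustrative.

```python
from math import erfc, log, log10, pi, sqrt

n, p = 200_000, 2 / 3   # ballots left to count, assumed P[Kerry] per ballot
needed = 170_000        # votes Kerry needs out of those ballots

mu = n * p
sigma = sqrt(n * p * (1 - p))
z = (needed - mu) / sigma

# Direct evaluation underflows to 0.0 in double precision.
tail = 0.5 * erfc(z / sqrt(2))

# log10 of the bound phi(z)/z = exp(-z^2/2) / (z * sqrt(2*pi)).
log10_tail_bound = -(z * z / 2) / log(10) - log10(z * sqrt(2 * pi))

print(f"mu = {mu:.1f}, sigma = {sigma:.2f}, z = {z:.2f}")
print(f"tail = {tail}, log10 bound = {log10_tail_bound:.1f}")
```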

Now let's look at different combinations of $n$ and $p$ to see how well the approximation works. Let $p = 0.2$ and $0.5$, and $n = 6, 50, 100, 1000$.

[Figure: sampling distributions of $\hat{p}$ with the approximating normal curve for $p = 0.2$ and $n = 6, 50, 100, 1000$; horizontal axis: $\hat{p}$.]

[Figure: sampling distributions of $\hat{p}$ with the approximating normal curve for $p = 0.5$ and $n = 6, 50, 100, 1000$; horizontal axis: $\hat{p}$.]

The approximation appears to work better when $n$ is bigger and when $p$ is close to 0.5.

Rule of Thumb: The approximation is ok if

$np \ge 10$ and $n(1 - p) \ge 10$

i.e. the expected numbers of successes and failures are both at least 10.

So the closer $p$ gets to 0 or 1, the bigger $n$ needs to be.

So for $p = 0.2$, what is $P[\hat{p} \le 0.1]$ for various sample sizes?

n      Normal Approximation   True Probability
10     0.21460                0.37581
50     0.03855                0.04803
100    0.00621                0.00570
200    0.00020                0.00011
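The comparison above can be reproduced with a short script. This is a sketch using only the standard library; the helper names are made up for this example, and the exact probability is obtained by summing the binomial pmf up to the count n/10.

```python
from math import comb, erfc, sqrt

def binom_cdf(k_max, n, p):
    """Exact P[X <= k_max] for X ~ Bin(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_max + 1))

def normal_cdf(z):
    """P[Z <= z] for a standard normal Z."""
    return 0.5 * erfc(-z / sqrt(2))

p = 0.2
rows = []
for n in (10, 50, 100, 200):
    sd = sqrt(p * (1 - p) / n)
    approx = normal_cdf((0.1 - p) / sd)   # normal approximation for p_hat
    exact = binom_cdf(n // 10, n, p)      # p_hat <= 0.1 is the event X <= n/10
    rows.append((n, approx, exact))
    print(f"{n:5d}  {approx:.5f}  {exact:.5f}  np = {n * p:.0f}")
```

The case n = 10 has np = 2, well short of the rule of thumb, and it is also where the approximation is worst in absolute terms.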

Continuity correction

Suppose we want to get $P[X \le 12]$ by using the normal approximation. Notice that for the bar corresponding to $X = 12$, the normal curve picks up only about half the area, as the bar gets drawn from 11.5 to 12.5.

The normal approximation for this problem can be improved if we ask for the area under the normal curve up to 12.5.

[Figure: probability histogram of $X$ for $p = 0.3$, $n = 50$ with the approximating normal curve, shown for $x$ from 10 to 15.]

True Prob = 0.2229
Estimated Prob (no correction) = 0.1773
Estimated Prob (correction) = 0.2202

While this does give a better answer for many problems, I recommend ignoring it. If the correction makes an important difference, you probably want to be doing an exact probability calculation instead.
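The three numbers in this example can be recomputed with the sketch below, which assumes the same setting (n = 50, p = 0.3, cutoff 12) and uses only the standard library. Helper names are illustrative.

```python
from math import comb, erfc, sqrt

def binom_cdf(k_max, n, p):
    """Exact P[X <= k_max] for X ~ Bin(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_max + 1))

def normal_cdf(z):
    """P[Z <= z] for a standard normal Z."""
    return 0.5 * erfc(-z / sqrt(2))

n, p, cutoff = 50, 0.3, 12
mu = n * p
sigma = sqrt(n * p * (1 - p))

exact = binom_cdf(cutoff, n, p)
no_correction = normal_cdf((cutoff - mu) / sigma)      # area up to 12
corrected = normal_cdf((cutoff + 0.5 - mu) / sigma)    # area up to 12.5

print(f"exact = {exact:.4f}, no correction = {no_correction:.4f}, corrected = {corrected:.4f}")
```

Since the exact sum takes one line, it is usually the simpler choice whenever the correction would matter.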