Standard Normal, Inverse Normal and Sampling Distributions


Section 5.5 & 6.6. Cathy Poliak, Ph.D., cathy@math.uh.edu, Office in Fleming 11c, Department of Mathematics, University of Houston. Lecture 9 - 3339

Outline
1. Using the z-table
2. Inverse Normal
3. Sums of Random Variables
4. Sampling Distributions
5. Sampling Distribution of $\bar{X}$
6. Finding Probabilities for $\bar{X}$
7. Approximating the Binomial Distribution
8. Proportions
9. Sampling Distribution of $\hat{p}$

Introduction Questions
Let a random variable $X$ have a Normal distribution with mean $\mu = 10$ and standard deviation $\sigma = 2$. For each of the following, determine the proper way to compute the probability.
1. $P(X < 7.25)$
   a) pnorm(7.25,10,2)
   b) 1-pnorm(7.25,10,2)
   c) pnorm(7,10,2)
   d) dnorm(7.25, 10, 2)
2. $P(X \geq 5)$
   a) pnorm(5, 10, 2)
   b) 1 - pnorm(5, 10, 2)
   c) 1 - pnorm(4, 10, 2)
   d) dnorm(6, 10, 2)
3. $P(9 \leq X \leq 11)$
   a) pnorm(11, 10, 2) - pnorm(8, 10, 2)
   b) pnorm(11, 10, 2) - 1- pnorm(9, 10, 2)
   c) pnorm(11, 10, 2) - pnorm(9, 10, 2)
   d) dnorm(11, 10, 2) - dnorm(9, 10, 2)

Normal Distribution Calculations
Areas under a Normal curve represent proportions (probabilities) of observations within a range of values. There is no simple formula for the area under a Normal curve, so we use a table or software to calculate the desired areas. The table we use is the Z-table, which gives cumulative proportions. A cumulative proportion is the proportion (probability) of observations in a distribution that lie at or below a given value. For the standard Normal distribution this is $\Phi(z)$. When the distribution is given by a density curve, the cumulative proportion is the area under the curve to the left of a given value.

Using the Z-table
The vertical margin gives the leftmost digits of a z-score. The top margin gives the hundredths place of a z-score. The numbers inside the table represent the area from $-\infty$ to that z-score. Remember that the standard Normal density curve is symmetric and the total area is equal to 1. Note: R and some calculators can compute these probabilities directly, without converting to z-scores.
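As a small illustration of the note above, the following R sketch checks that pnorm returns the same cumulative proportion whether or not the value is first converted to a z-score. The numbers ($\mu = 10$, $\sigma = 2$, $x = 7.25$) are borrowed from the introduction questions, and the variable names are only illustrative.

```r
# Cumulative proportion P(X < x) for X ~ N(mu, sigma), computed two ways.
mu <- 10
sigma <- 2
x <- 7.25

# Directly on the original scale
p_direct <- pnorm(x, mean = mu, sd = sigma)

# Standardize first, then use the standard Normal (what the Z-table does)
z <- (x - mu) / sigma
p_standardized <- pnorm(z)

print(c(z = z, direct = p_direct, standardized = p_standardized))
```

Both routes give the same area, which is why the table only needs to cover the standard Normal curve.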

$P(Z \leq -1.52)$
[Figure: standard Normal density curve illustrating $P(Z \leq -1.52)$, with $-1.52$ marked on the horizontal axis.]

$P(Z \leq -1.52) = 0.0643$
R: pnorm(-1.52) = 0.06425549
Table A: P(Z < z)
z      0.00    0.01    0.02    0.03
-3.4   0.0003  0.0003  0.0003  0.0003
-3.3   0.0005  0.0005  0.0005  0.0004
-3.2   0.0007  0.0007  0.0006  0.0006
...
-1.5   0.0668  0.0655  0.0643  0.0630

$P(Z \geq 0.95)$
[Figure: standard Normal density curve illustrating $P(Z \geq 0.95)$.]

$P(Z \geq 0.95) = 0.1711$
R: 1 - pnorm(0.95) = 0.1710561
Table A: P(Z < z)
z      0.00    0.01    0.02    0.03    0.04    0.05
0.0    0.5000  0.5040  0.5080  0.5120  0.5160  0.5199
0.1    0.5398  0.5438  0.5478  0.5517  0.5557  0.5596
0.2    0.5793  0.5832  0.5871  0.5910  0.5948  0.5987
...
0.9    0.8159  0.8186  0.8212  0.8238  0.8264  0.8289
$P(Z \geq 0.95) = 1 - P(Z < 0.95) = 1 - 0.8289 = 0.1711$

$P(1.3 < Z < 1.72)$
[Figure: standard Normal density curve illustrating $P(1.3 < Z < 1.72)$.]

$P(1.3 < Z < 1.72) = 0.0541$
R: pnorm(1.72) - pnorm(1.3) = 0.05408426
z      0.00    0.01    0.02    0.03
0.0    0.5000  0.5040  0.5080  0.5120
...
1.3    0.9032  0.9049  0.9066  0.9082
...
1.7    0.9554  0.9564  0.9573  0.9582
$P(1.3 < Z < 1.72) = 0.9573 - 0.9032 = 0.0541$
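The three table lookups above (a left tail, a right tail, and an area between two z-scores) can be reproduced in R. This is a minimal sketch using only pnorm. The variable names are illustrative, and the lower.tail = FALSE form is an alternative the slides do not use.

```r
# Left tail: P(Z <= -1.52)
left_tail <- pnorm(-1.52)

# Right tail: P(Z >= 0.95) = 1 - P(Z < 0.95)
right_tail <- 1 - pnorm(0.95)
# Equivalent form that asks for the upper tail directly
right_tail_alt <- pnorm(0.95, lower.tail = FALSE)

# Between two values: P(1.3 < Z < 1.72)
between <- pnorm(1.72) - pnorm(1.3)

print(round(c(left = left_tail, right = right_tail, between = between), 4))
```

Rounded to four places, these should agree with the table-based answers 0.0643, 0.1711 and 0.0541.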

Example
Let $X$ = amount of juice in ounces in an orange, $X \sim N(4.7, 0.4)$.
1. Determine the probability (using the z-table) that less than 5 ounces of juice are in an orange.
2. Determine the probability (using the z-table) that between 4 and 4.5 ounces of juice are in an orange.

Finding a value when given a proportion
This is called the inverse Normal calculation. It works backwards through the Z-table: given a percent (an area), find the corresponding observed value. In R: qnorm(proportion,mean,sd).

Backward Normal Calculations Using the Z-table
1. State the problem. Since the Z-table and qnorm are based on areas to the left of z-scores or x-values, always state the problem in terms of the area to the left of x. Keep in mind that the total area under the standard Normal curve is 1.
2. Use Table A to find c. This is a value read from the table, not a value that we calculate.
3. Unstandardize to transform the solution from the z-score back to the original x scale. Solving $c = \frac{x - \mu}{\sigma}$ for $x$ gives $x = \sigma c + \mu$.

Examples to Work "Backwards" with the Normal Distribution
Find the value of c so that:
1. $P(Z < c) = 0.7704$
2. $P(Z > c) = 0.006$
3. $P(-c < Z < c) = 0.966$
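One way to approach these three cases in R is to restate each as an area to the left of c and pass that area to qnorm. The sketch below is hedged: the variable names c1, c2 and c3 are illustrative, and the table method gives the same z-scores up to rounding.

```r
# 1. P(Z < c) = 0.7704: the area to the left is given directly
c1 <- qnorm(0.7704)

# 2. P(Z > c) = 0.006: convert to the area to the left, 1 - 0.006
c2 <- qnorm(1 - 0.006)

# 3. P(-c < Z < c) = 0.966: the two tails share 1 - 0.966 equally,
#    so the area to the left of c is 0.966 + (1 - 0.966) / 2
c3 <- qnorm(0.966 + (1 - 0.966) / 2)

print(round(c(c1 = c1, c2 = c2, c3 = c3), 2))
```

Each answer can be checked by feeding it back into pnorm and confirming that the stated area is recovered.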

MPG for Prius
The miles per gallon for a Toyota Prius has a Normal distribution with mean $\mu = 49$ mpg and standard deviation $\sigma = 3.5$ mpg. 25% of Priuses have an MPG at or below what value?
1. We want c such that $P(Z < c) = 0.25$. That is, we want to know what z-score cuts off the lowest 25%.
[Figure: standard Normal curve marking $P(Z < ?) = 0.25$.]

Find c such that $P(Z < c) = 0.25$
3. From Table A, find something close to 0.25 inside the table.
z      0.00    0.01    ...  0.07    0.08    0.09
-3.4   0.0003  0.0003  ...  0.0003  0.0003  0.0002
...
-0.7   0.2420  0.2389  ...  0.2206  0.2177  0.2148
-0.6   0.2743  0.2709  ...  0.2514  0.2483  0.2451
$P(Z < ?) = 0.25$ (the closest value is 0.2514), so z = -0.67 (-0.6 row + 0.07 column).

Find c such that $P(Z < c) = 0.25$
4. Unstandardize: $x = \sigma c + \mu = 3.5(-0.67) + 49 = 46.655$
5. This means that 25% of Priuses get less than 46.655 mpg.
Using R: qnorm(0.25,49,3.5) = 46.63929
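The small gap between 46.655 and 46.63929 comes from rounding the z-score to two decimals in the table. A minimal R sketch of both routes, with illustrative variable names:

```r
mu <- 49
sigma <- 3.5

# Table-style approach: look up the z-score, then unstandardize
z_table <- -0.67                  # closest table entry to an area of 0.25
x_table <- sigma * z_table + mu

# Software approach: qnorm finds the z-score without table rounding
z_exact <- qnorm(0.25)
x_exact <- sigma * z_exact + mu

# qnorm can also work on the original scale in one step
x_direct <- qnorm(0.25, mean = mu, sd = sigma)

print(c(table = x_table, exact = x_exact, direct = x_direct))
```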

Top 10%
Suppose you rank in the top 10% of your class. If the mean GPA is 2.7 and the standard deviation is 0.59, what is your GPA? (Assume a Normal distribution.)
1. We want c such that $P(Z > c) = 0.10$. That is, we want to know what z-score cuts off the highest 10%.
[Figure: standard Normal curve marking $P(Z > ?) = 0.10$.]

Find c such that $P(Z > c) = 0.1$
3. In Table A, the areas are below (to the left of) a z-score, so we want to find something close to 0.90 inside the table.
z      0.00    0.01    ...  0.07    0.08    0.09
0.0    0.5000  0.5040  ...  0.5279  0.5319  0.5359
0.1    0.5398  0.5438  ...  0.5675  0.5714  0.5753
0.2    0.5793  ...
...
1.2    0.8849  0.8869  ...  0.8980  0.8997  0.9015
$P(Z < ?) = 0.90$ (the closest value is 0.8997), so z = 1.28 (1.2 row + 0.08 column).

Find c such that $P(Z > c) = 0.1$
4. Unstandardize: $x = \sigma c + \mu = 0.59(1.28) + 2.7 = 3.4552$
5. This means that your GPA is about 3.4552 if you rank exactly at the top 10% cutoff of your class.
In R: qnorm(0.9,2.7,0.59) = 3.456115
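For upper-tail questions like this one, the restatement "top 10% = bottom 90%" can be done by hand, or qnorm can be asked for the upper tail directly through its lower.tail argument. The slides use only the first form; the second is shown here as an alternative.

```r
mu <- 2.7
sigma <- 0.59

# Restate the top 10% as the bottom 90%
gpa_left <- qnorm(0.90, mean = mu, sd = sigma)

# Or ask for the upper tail directly
gpa_upper <- qnorm(0.10, mean = mu, sd = sigma, lower.tail = FALSE)

print(c(left_form = gpa_left, upper_form = gpa_upper))
```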

Example
Let $X$ = amount of juice in ounces in an orange, $X \sim N(4.7, 0.4)$.
1. Determine the third quartile.
2. Determine the 95th percentile.

Recall E(X + Y)
If $X$ and $Y$ are two different random variables, then the expected value (mean) of their sum is the sum of their means: $\mu_{X+Y} = E(X + Y) = E(X) + E(Y) = \mu_X + \mu_Y$. This is called the addition rule for means. Likewise, the expected value (mean) of their difference is the difference of their means: $\mu_{X-Y} = E(X - Y) = E(X) - E(Y) = \mu_X - \mu_Y$.

Recall Var(X + Y)
If $X$ and $Y$ are independent random variables, then
$\sigma^2_{X+Y} = Var(X + Y) = Var(X) + Var(Y) = \sigma^2_X + \sigma^2_Y$
and
$\sigma^2_{X-Y} = Var(X - Y) = Var(X) + Var(Y) = \sigma^2_X + \sigma^2_Y$

If X & Y are dependent
If $X$ and $Y$ are dependent random variables, then
$\sigma^2_{X+Y} = Var(X + Y) = Var(X) + Var(Y) + 2cov(X, Y) = \sigma^2_X + \sigma^2_Y + 2cov(X, Y)$
and
$\sigma^2_{X-Y} = Var(X - Y) = Var(X) + Var(Y) - 2cov(X, Y) = \sigma^2_X + \sigma^2_Y - 2cov(X, Y)$

Example
Suppose we have two independent random variables, $X$ and $Y$, where $\mu_X = 10$, $\sigma_X = 2$, $\mu_Y = 10$ and $\sigma_Y = 2$.
a. Determine $\mu_{X+Y}$ and $\sigma_{X+Y}$.
b. Suppose we want the mean of $X$ and $Y$. What would be the expected value of the mean?
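The rules for sums can be applied to this example with a few lines of R. This is a sketch under the stated independence assumption. Part b treats "the mean of X and Y" as $(X + Y)/2$ and uses the standard scaling rule $SD(aX) = |a| \, SD(X)$, which the slides do not state explicitly.

```r
mu_x <- 10; sigma_x <- 2
mu_y <- 10; sigma_y <- 2

# a. Sum: means add, and for independent variables variances add
mu_sum <- mu_x + mu_y
sigma_sum <- sqrt(sigma_x^2 + sigma_y^2)

# b. Mean of X and Y, (X + Y) / 2: dividing by 2 halves the mean of the sum
#    and (by the scaling rule, an added assumption) halves its standard deviation
mu_avg <- mu_sum / 2
sigma_avg <- sigma_sum / 2

print(c(mu_sum = mu_sum, sigma_sum = sigma_sum,
        mu_avg = mu_avg, sigma_avg = sigma_avg))
```

Note that standard deviations do not add directly: the variances are added first and the square root is taken afterwards.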

Popper 12 Questions
Consider one family as a population of five children. We are looking at the ages of these five children: 3, 5, 9, 11, 14.
1. Determine the population mean, $\mu$, of the ages of these children.
   a. 9
   b. 10
   c. 8.4
   d. 11
2. Determine the population standard deviation, $\sigma$, of the ages of these children.
   a. 10
   b. 4
   c. 8.4
   d. 0
3. Suppose we take a sample of 2 children from this population. What would we expect the sample mean, $\bar{x}$, from the 2 children to be?
   a. 2
   b. 8.4
   c. 4
   d. 16

Sampling Distribution of size 2
From the five children, we want to list all possible samples of size 2 and determine their means. Ages are: 3, 5, 9, 11, 14.
Pairs      Sample mean, $\bar{x}$
(3, 5)     4
(3, 9)     6
(3, 11)    7
(3, 14)    8.5
(5, 9)     7
(5, 11)    8
(5, 14)    9.5
(9, 11)    10
(9, 14)    11.5
(11, 14)   12.5
The list above is the sampling distribution of $\bar{x}$ for samples of size 2: all the possible values of the sample mean. What is the mean of the sample means, $\mu_{\bar{x}}$? What is the standard deviation of the sample means, $\sigma_{\bar{x}}$?

Sampling Distribution of size 3
What about the sampling distribution for samples of size 3 from the family of five?
Sets          $\bar{x}$
(3, 5, 9)     5.6667
(3, 5, 11)    6.3333
(3, 5, 14)    7.3333
(3, 9, 11)    7.6667
(3, 9, 14)    8.6667
(3, 11, 14)   9.3333
(5, 9, 11)    8.3333
(5, 9, 14)    9.3333
(5, 11, 14)   10
(9, 11, 14)   11.3333
What is the mean of these means, $\mu_{\bar{x}}$? What is the standard deviation of these means, $\sigma_{\bar{x}}$?
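The two listings above can be generated and summarized in R with combn, which enumerates every sample drawn without replacement. This is a sketch: the helper names all_sample_means and pop_sd_of are hypothetical, and the standard deviations use the population form (divide by N), since R's built-in sd() divides by N - 1.

```r
ages <- c(3, 5, 9, 11, 14)

# Population-style SD helper, since sd() in R divides by N - 1
pop_sd_of <- function(v) sqrt(mean((v - mean(v))^2))

# Sample mean of every possible sample of size n, drawn without replacement
all_sample_means <- function(values, n) {
  combn(values, n, FUN = mean)
}

means_2 <- all_sample_means(ages, 2)
means_3 <- all_sample_means(ages, 3)

print(c(pop_mean = mean(ages), pop_sd = pop_sd_of(ages)))
print(c(mean_n2 = mean(means_2), sd_n2 = pop_sd_of(means_2)))
print(c(mean_n3 = mean(means_3), sd_n3 = pop_sd_of(means_3)))
```

The mean of the sample means matches the population mean for both sample sizes, while the spread shrinks as the sample size grows. Because this population is tiny and sampled without replacement, the spread here is smaller than $\sigma/\sqrt{n}$, which applies to large populations.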

Sampling distribution
When we describe distributions we use three characteristics: shape, center and spread. To describe the sampling distribution we can use the same three characteristics. This can be shown through histograms or numerical values.

Sampling Distribution of $\bar{X}$
Suppose that $\bar{X}$ is the sample mean of a simple random sample of size $n$ from a large population with mean $\mu$ and standard deviation $\sigma$. $\bar{X}$ is a random variable because different random samples will generally give different sample means. Thus we want to know the distribution of the sample means $\bar{X}$. The center of the sample means (mean of the sample means) $\mu_{\bar{X}}$ is $\mu$. This is also called the expected value. The spread of the sample means (standard deviation of the sample means) $\sigma_{\bar{X}}$ is $\sigma/\sqrt{n}$.

Sampling Distribution Example
Assume that cans of Pepsi are filled so that the actual amounts have a mean $\mu = 12$ oz and a standard deviation $\sigma = 0.09$ oz. We take a sample of 25 cans and find the mean amount $\bar{X}$ in these 25 cans. What would we expect the mean to be? Would the sample mean be exactly that value? If not, how far off could the sample mean be?

Sampling Distribution Example
Assume that cans of Pepsi are filled so that the actual amounts have a mean $\mu = 12$ oz and a standard deviation $\sigma = 0.09$ oz. We take a sample of 100 cans and find the mean amount $\bar{X}$ in these 100 cans. What would we expect the mean to be? Would the sample mean be exactly that value? If not, how far off could the sample mean be?
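To compare the two Pepsi scenarios side by side, the center and spread of $\bar{X}$ can be computed from $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \sigma/\sqrt{n}$. A minimal sketch with illustrative variable names:

```r
mu <- 12        # population mean (oz)
sigma <- 0.09   # population standard deviation (oz)

n <- c(25, 100)
se <- sigma / sqrt(n)   # standard deviation of the sample mean for each n

print(data.frame(n = n, expected_mean = mu, sd_of_mean = se))
```

Quadrupling the sample size from 25 to 100 halves the typical distance between the sample mean and 12 oz.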

Shape of the Sample Mean Distribution
If a population has a Normal distribution, then the sample mean $\bar{X}$ of $n$ independent observations also has a Normal distribution with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$.
Central limit theorem: For any population, when $n$ is large ($n > 30$), the sampling distribution of the sample mean $\bar{X}$ is approximately a Normal distribution with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$.
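The central limit theorem can be explored with a short simulation. The sketch below is not from the slides: it assumes a skewed exponential population with rate 1 (so $\mu = 1$ and $\sigma = 1$), a sample size of 36, and an arbitrary seed, all chosen only for illustration.

```r
set.seed(3339)   # arbitrary seed so the sketch is reproducible

n <- 36          # sample size (above the n > 30 guideline)
reps <- 10000    # number of simulated samples

# Skewed population: exponential with rate 1, so mu = 1 and sigma = 1
sample_means <- replicate(reps, mean(rexp(n, rate = 1)))

# Center and spread should be close to mu = 1 and sigma / sqrt(n) = 1/6
print(c(mean = mean(sample_means), sd = sd(sample_means)))

# Shape check: for a Normal distribution, roughly 68% of values fall
# within one standard deviation of the mean
within_one_sd <- mean(abs(sample_means - 1) <= 1 / sqrt(n))
print(within_one_sd)
```

Calling hist(sample_means) after running the sketch gives a visual check that the simulated means are roughly bell-shaped even though the population is not.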

Example: Amount of Pepsi
Assume that cans of Pepsi are filled so that the actual amounts have a mean $\mu = 12$ oz and a standard deviation $\sigma = 0.09$ oz. Suppose that a random sample of 4 cans is examined. Describe the distribution of the sample means $\bar{X}$.
Center: $\mu_{\bar{X}} = \mu = 12$
Spread: $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{0.09}{\sqrt{4}} = 0.045$
Shape: Unknown, because we do not know the original distribution and the sample size is small.

Example: Amount of Pepsi
Assume that cans of Pepsi are filled so that the actual amounts have a mean $\mu = 12$ oz and a standard deviation $\sigma = 0.09$ oz. Suppose that a random sample of 100 cans is examined. Describe the distribution of the sample means $\bar{X}$.
Center: $\mu_{\bar{X}} = \mu = 12$
Spread: $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{0.09}{\sqrt{100}} = 0.009$
Shape: Approximately Normal, because we have a large sample and can apply the Central Limit Theorem.

Finding Probabilities
Assume that cans of Pepsi are filled so that the actual amounts have a mean $\mu = 12$ oz and a standard deviation $\sigma = 0.09$ oz. Suppose that a random sample of 36 cans is examined. Determine the probability that a sample of 36 cans will have a sample mean amount, $\bar{X}$, of at least 12.01 oz. To find this probability we first need to describe the distribution:
Shape: Normal because of the Central Limit Theorem
Center: $E[\bar{X}] = \mu_{\bar{X}} = \mu = 12$
Spread: $SD[\bar{X}] = \sigma_{\bar{X}} = \sigma/\sqrt{n} = 0.09/\sqrt{36} = 0.015$. This is the standard deviation we use.
We want to know: $P(\bar{X} \geq 12.01)$
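The slide stops at the setup, so here is one hedged way to finish the calculation in R. The key point is that pnorm is given the standard deviation of $\bar{X}$ (0.015), not the population standard deviation (0.09). Variable names are illustrative.

```r
mu <- 12
sigma <- 0.09
n <- 36

se <- sigma / sqrt(n)        # standard deviation of the sample mean
z <- (12.01 - mu) / se       # z-score of the sample mean 12.01

# P(X-bar >= 12.01) is the area to the right of 12.01
p <- 1 - pnorm(12.01, mean = mu, sd = se)

print(c(se = se, z = z, p = p))
```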

Notes about finding probabilities for $\bar{X}$
We have a sample of size $n$, so the standard deviation is divided by $\sqrt{n}$: $SD(\bar{X}) = \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$. The mean stays the same: $mean(\bar{X}) = \mu_{\bar{X}} = \mu$. If we know that the original distribution is Normal, or we have a large enough sample ($n > 30$), we can use the Normal distribution to find the probabilities.

Orange Juice
An orange juice producer buys all his oranges from a large orange grove. The amount of juice squeezed from each of these oranges is approximately normally distributed, with a mean of 4.70 ounces and a standard deviation of 0.40 ounce. Suppose we take a random sample of 4 oranges and determine the mean of this sample, $\bar{X}$.
1. What is the shape of the sampling distribution of $\bar{X}$?
2. What is the mean of the sampling distribution of $\bar{X}$?
3. What is the standard deviation of the sampling distribution of $\bar{X}$?
4. What is the probability that the sample mean of the 4 oranges will be 4.5 ounces or less?

Approximation for Binomial
Suppose a random variable $X$ has a binomial distribution with $p = 0.1$. The slides show its histogram (density scale) for an increasing number of trials:
[Figure: histogram for n = 10, p = 0.1]
[Figure: histogram for n = 20, p = 0.1]
[Figure: histogram for n = 50, p = 0.1]
[Figure: histogram for n = 100, p = 0.1]

Theorem 5.3
Let $X$ be a binomial random variable based on $n$ trials with success probability $p$. Then if the binomial probability histogram is not too skewed, $X$ has an approximate Normal distribution with $\mu = np$ and $\sigma = \sqrt{np(1-p)}$. In particular, for $x$ = a possible value of $X$,
$P(X \leq x) = \mathrm{Binom}(x; n, p) \approx$ (area under the normal curve to the left of $x + 0.5$) $= \Phi\left(\frac{x + 0.5 - np}{\sqrt{np(1-p)}}\right)$
In practice, the approximation is adequate provided that both $np \geq 10$ and $n(1-p) \geq 10$.
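To see the theorem at work, the exact binomial probability from pbinom can be compared with the Normal approximation, with and without the 0.5 continuity correction. The sketch below uses $n = 100$ and $p = 0.1$ from the last histogram. The cutoff $x = 12$ and the variable names are arbitrary choices for illustration.

```r
n <- 100
p <- 0.1
x <- 12   # illustrative value; any possible value of X could be used

mu <- n * p
sigma <- sqrt(n * p * (1 - p))

# Rule-of-thumb check from the theorem
condition_ok <- (n * p >= 10) && (n * (1 - p) >= 10)

exact <- pbinom(x, size = n, prob = p)              # exact P(X <= x)
approx_cc <- pnorm(x + 0.5, mean = mu, sd = sigma)  # with continuity correction
approx_plain <- pnorm(x, mean = mu, sd = sigma)     # without the correction

print(condition_ok)
print(c(exact = exact, with_correction = approx_cc, without = approx_plain))
```

In this setting the corrected approximation lands noticeably closer to the exact value than the uncorrected one, which is the reason for evaluating the Normal curve at $x + 0.5$.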

Example of Normal Approximation
Suppose that your mail-order company advertises that it ships 90% of its orders within three working days. Suppose you take a simple random sample of 100 orders:
1. What is the probability that 86 or fewer of the orders are shipped on time?
2. What is the probability that more than 95 of the orders are shipped on time?
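A possible way to work this example in R is sketched below. Here $np = 90$ and $n(1-p) = 10$, so the rule of thumb is only just met and the approximation should be read with some caution. The exact binomial values from pbinom are included for comparison, and the variable names are illustrative.

```r
n <- 100
p <- 0.90
mu <- n * p                      # 90
sigma <- sqrt(n * p * (1 - p))   # 3, up to floating-point rounding

# 1. P(X <= 86), with continuity correction
approx_1 <- pnorm(86 + 0.5, mean = mu, sd = sigma)
exact_1 <- pbinom(86, size = n, prob = p)

# 2. P(X > 95) = 1 - P(X <= 95), with continuity correction
approx_2 <- 1 - pnorm(95 + 0.5, mean = mu, sd = sigma)
exact_2 <- 1 - pbinom(95, size = n, prob = p)

print(c(approx_1 = approx_1, exact_1 = exact_1))
print(c(approx_2 = approx_2, exact_2 = exact_2))
```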

Sample Proportions
The population proportion $p$ is a parameter. In some cases we do not know the population proportion, so we use the sample proportion $\hat{p}$ to estimate $p$. The sample proportion is calculated by $\hat{p} = \frac{X}{n}$, where
$X$ = the number of observations of interest in the sample, or the number of "successes" in the sample.
$n$ = the sample size or number of observations.

Example
According to the National Retail Federation, 34% of taxpayers used computer software to do their taxes. A sample of 50 taxpayers was selected. What do we expect the sample proportion $\hat{p}$ to be? If we take other samples, will the sample proportions always be the same value? If not, how far off would $\hat{p}$ be?

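Sample Distribution of n = 50
[Figure: histogram of sample proportions $\hat{p}$ for samples of size n = 50]

Sample Distribution of n = 125
[Figure: histogram of sample proportions $\hat{p}$ for samples of size n = 125]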

Shape of the distribution of $\hat{p}$
Notice from the previous histograms that $\hat{p}$ appears to have a Normal distribution. We can use the Normal distribution as long as $np \geq 10$ (the expected number of successes is at least 10) and $n(1-p) \geq 10$ (the expected number of failures is at least 10).

Center of the distribution of $\hat{p}$
The center is the mean (expected value): $\mu_{\hat{p}} = p$, the proportion of successes.
$\hat{p} = \frac{X}{n}$, where $X$ is the number of successes out of $n$ observations. Thus $X$ has a binomial distribution with parameters $n$ and $p$. The mean of $X$ is $\mu_X = E(X) = np$. Thus by rule 1b for means, the mean of $\hat{p}$ is:
$\mu_{\hat{p}} = E(\hat{p}) = E\left(\frac{X}{n}\right) = \frac{\mu_X}{n} = \frac{np}{n} = p$

Spread of the distribution of $\hat{p}$
The spread is the standard deviation $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$.
The variance of $X$ is $\sigma^2_X = Var(X) = np(1-p)$. By rule 1b for variance, the variance of $\hat{p}$ is:
$\sigma^2_{\hat{p}} = Var(\hat{p}) = Var\left(\frac{X}{n}\right) = \frac{Var(X)}{n^2} = \frac{np(1-p)}{n^2} = \frac{p(1-p)}{n}$

Assumptions
The sampled values must be random and independent of each other. This can be checked with the 10% Condition: the sample size must be no larger than 10% of the population.
The sample size, $n$, must be large enough. This can be checked with the Success / Failure Condition: the sample size has to be big enough so that both $np$ and $n(1-p)$ are at least 10.

Example for distribution of $\hat{p}$
According to the National Retail Federation, 34% of taxpayers used computer software to do their taxes. A sample of 125 taxpayers was selected. What is the distribution of $\hat{p}$, the sample proportion of the 125 taxpayers that used computer software to do their taxes?
1. Check if we can use the Normal distribution. $p = 0.34$, $n = 125$
   $np = 125(0.34) = 42.5$
   $n(1-p) = 125(1 - 0.34) = 125(0.66) = 82.5$
   Both $np$ and $n(1-p)$ are greater than 10, so we can use the Normal distribution.
2. The mean is $\mu_{\hat{p}} = p = 0.34$. If we take a sample, we "expect" 34% to have used computer software to do their taxes.
3. The standard deviation is $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.34(1 - 0.34)}{125}} = 0.0424$

Example continued
According to the National Retail Federation, 34% of taxpayers used computer software to do their taxes. A sample of 125 taxpayers was selected. What is the probability that between 28% and 40% of the taxpayers from the sample of 125 used computer software to do their taxes?
1. We want: $P(0.28 < \hat{p} < 0.40)$
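The slide leaves the final probability open. One hedged way to carry the calculation through in R, using the Normal approximation for $\hat{p}$ with mean $p$ and standard deviation $\sqrt{p(1-p)/n}$, is sketched below. Variable names are illustrative.

```r
p <- 0.34
n <- 125

# Success / Failure condition: both counts should be at least 10
print(c(np = n * p, nq = n * (1 - p)))

# Standard deviation of the sample proportion
se <- sqrt(p * (1 - p) / n)

# P(0.28 < p-hat < 0.40) as a difference of two left-tail areas
prob <- pnorm(0.40, mean = p, sd = se) - pnorm(0.28, mean = p, sd = se)

print(c(se = se, prob = prob))
```

Because 0.28 and 0.40 are the same distance from 0.34, the interval is symmetric about the mean, so the same answer can be obtained as $2\Phi(z) - 1$ with $z = 0.06/\sigma_{\hat{p}}$.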

Facebook Example
The Social Media and Personal Responsibility Survey in 2010 found that 69% of parents are "friends" with their children on Facebook. A random sample of 140 parents was selected and we determined the proportion of parents from this sample, $\hat{p}$, that are "friends" with their children on Facebook.
1. What is the shape of the sampling distribution of $\hat{p}$?
2. What is the mean of the sampling distribution of $\hat{p}$?
3. What is the standard deviation of the sampling distribution of $\hat{p}$?
4. What is the probability that the sample proportion of 140 parents is greater than 72%?