CHAPTER 5 SAMPLING DISTRIBUTIONS

Sampling Variability

We will view our data as a random sample from a population with unknown parameter μ. Our sample mean Ȳ is intended to estimate the population mean μ. Each sample we take will give us a different estimate; the variability among the samples is called sampling variability. The probability distribution of the Ȳ values from all possible samples from our population is called the sampling distribution of Ȳ.

In general: the sampling distribution of a statistic is the distribution of all possible values of the statistic for samples of a given size.

Sampling error is the error resulting from using a sample characteristic (a statistic) to estimate a population characteristic (a parameter).

SAMPLING DISTRIBUTION OF A SAMPLE MEAN

We will observe the behavior of Ȳ for an unrealistically small population of size N = 5.

Example 1. Weight of a certain breed of dogs. Below is an unrealistically small population of 5 dogs and their weights in pounds.

dog      A    B    C    D    E
weight   42   48   52   58   60

The population mean weight is

μ = (42 + 48 + 52 + 58 + 60)/5 = 52 pounds,   σ = 6.57 pounds.

The possible samples of size two and their means are summarized in the following table.

Samples   A,B    A,C    A,D    A,E    B,C    B,D    B,E    C,D    C,E    D,E
Weights  42,48  42,52  42,58  42,60  48,52  48,58  48,60  52,58  52,60  58,60
Ȳ         45     47     50     51     50     53     54     55     56     59

Any sample of size 2 we take will be one of the above 10 possible samples, so the probability of obtaining each value of Ȳ is 1/10.

How confident can we be that a sample mean of size two will estimate the population mean weight within 2 pounds? In other words, what is P(50 ≤ Ȳ ≤ 54)?

Since there are 5 samples ({A,D}, {A,E}, {B,C}, {B,D} and {B,E}) whose means lie within 2 pounds of the population mean 52, P(50 ≤ Ȳ ≤ 54) = 5/10 = 50%.

We can also compute the mean and standard deviation of all Ȳ values: μ_Ȳ = 52 and σ_Ȳ = 4.02. Notice that the mean of all Ȳ values is the same as the population mean, while the standard deviation is smaller than the population standard deviation.

Now we repeat our example for samples of size 4. The possible samples of size four and their means are summarized in the following table.

Samples   A,B,C,D      A,B,C,E      A,B,D,E      A,C,D,E      B,C,D,E
Weights   42,48,52,58  42,48,52,60  42,48,58,60  42,52,58,60  48,52,58,60
Ȳ          50           50.5         52           53           54.5

Any sample of size 4 we take will be one of the above 5 possible samples, so the probability of obtaining each value of Ȳ is 1/5. This time P(50 ≤ Ȳ ≤ 54) = 4/5 = 80%.

We can again compute the mean and standard deviation of all Ȳ values: μ_Ȳ = 52 and σ_Ȳ = 1.64.

In conclusion, we can clearly see that the mean of the distribution of Ȳ remains the same as the population mean regardless of the sample size, while the standard deviation of that distribution decreases as n increases.

Sample Size and Sampling Error

As the sample size increases, the sample means cluster more tightly around the population mean, and the sampling error from estimating μ by Ȳ becomes smaller.

The Mean and Standard Deviation of Ȳ

We use the sampling distribution of the sample mean to make inferences about a population mean based on the mean of a sample from the population. But generally we do not know the exact distribution of the sample mean (the sampling distribution). Under certain conditions, we can approximate the sampling distribution of the sample mean Ȳ by the normal distribution. A normal distribution is determined by its mean and standard deviation, so let us denote the mean of Ȳ by μ_Ȳ and its standard deviation by σ_Ȳ.
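The exhaustive enumeration above is easy to reproduce in a few lines of Python. This sketch rebuilds both tables from the dog weights of Example 1 and verifies the two probabilities and standard deviations.

```python
from itertools import combinations
from math import sqrt

# The five-dog population from Example 1
weights = {"A": 42, "B": 48, "C": 52, "D": 58, "E": 60}

def sampling_distribution(n):
    """Means of all possible samples of size n, drawn without replacement."""
    return [sum(s) / n for s in combinations(weights.values(), n)]

for n in (2, 4):
    means = sampling_distribution(n)
    mu_ybar = sum(means) / len(means)
    sd_ybar = sqrt(sum((m - mu_ybar) ** 2 for m in means) / len(means))
    p = sum(50 <= m <= 54 for m in means) / len(means)
    print(n, mu_ybar, round(sd_ybar, 2), p)
    # n=2: mean 52.0, SD 4.02, P(50 <= Ybar <= 54) = 0.5
    # n=4: mean 52.0, SD 1.64, P(50 <= Ybar <= 54) = 0.8
```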

Mean of the variable Ȳ

For samples of size n, the mean of the variable Ȳ equals the mean of the variable Y under consideration; i.e., the mean of all possible sample means equals the population mean:

    μ_Ȳ = μ

Standard Deviation of the variable Ȳ

For samples of size n, the standard deviation of the variable Ȳ equals the standard deviation of the variable under consideration divided by the square root of the sample size; i.e., the standard deviation of all possible sample means equals the population standard deviation divided by the square root of the sample size:

    σ_Ȳ = σ/√n

(In Example 1, σ_Ȳ came out smaller than σ/√n because we sampled without replacement from a very small population; the formula above applies to sampling with replacement or from a large population.)

Sample Size and Sampling Error

1. The larger the sample size, the smaller the standard deviation of Ȳ.
2. The smaller the standard deviation of Ȳ, the more closely its possible values cluster around the mean of Y.

NOTE: The standard deviation of Ȳ determines the amount of sampling error to be expected when a population mean is estimated by a sample mean.

The Shape of the Sampling Distribution of the Sample Mean Ȳ

If the variable Y of a population is normally distributed with mean μ and standard deviation σ, then, for any sample of size n ≥ 1, the variable Ȳ is also normally distributed, with mean μ and standard deviation σ/√n.

So what is the shape of the distribution if Y is not normally distributed? We can then apply the following theorem.

The Central Limit Theorem (CLT), one of the most important theorems in statistics:

For a relatively large sample size (n ≥ 30), the variable Ȳ is approximately normally distributed, regardless of the distribution of the variable Y under consideration. The approximation becomes better and better with increasing sample size.

Note: If the departure from normality is not extreme, samples smaller than 30 may be used.
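The formula σ_Ȳ = σ/√n can be verified exhaustively. This sketch enumerates all ordered samples of size 2 drawn with replacement from the dog population of Example 1, where the formula holds exactly.

```python
from itertools import product
from math import sqrt

pop = [42, 48, 52, 58, 60]          # dog weights from Example 1
N = len(pop)
mu = sum(pop) / N
sigma = sqrt(sum((y - mu) ** 2 for y in pop) / N)   # population SD, about 6.57

n = 2
# All N**n ordered samples of size n drawn WITH replacement
means = [sum(s) / n for s in product(pop, repeat=n)]
mu_ybar = sum(means) / len(means)
sd_ybar = sqrt(sum((m - mu_ybar) ** 2 for m in means) / len(means))

print(mu_ybar)                      # 52.0, equal to mu
print(round(sd_ybar, 4))            # equals sigma / sqrt(n)
print(round(sigma / sqrt(n), 4))    # the same value
```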

The following graphs represent the distribution of IQ scores (Y) in some population (a) and the sampling distributions of the sample mean Ȳ for n = 4 (b) and n = 16 (c). Notice that all three distribution curves are centered at 100 (μ) and that the graphs get narrower as the sample size increases.

The following graph illustrates the distribution of household sizes in the USA (clearly not a normal distribution). The accompanying histogram shows the distribution of the sample mean for 1000 samples of size n = 30: a clearly nearly normal distribution.
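We do not have the household-size data here, so as an illustrative stand-in, the following simulation draws 1000 samples of size n = 30 from a strongly right-skewed (exponential) population with mean 2. The sample means nonetheless center on the population mean with spread close to σ/√n, just as the CLT predicts.

```python
import random
from statistics import mean, stdev

random.seed(1)  # reproducible illustration

# Right-skewed stand-in population: exponential with mean 2 (so sigma = 2)
mu_pop, n, reps = 2.0, 30, 1000

sample_means = [mean(random.expovariate(1 / mu_pop) for _ in range(n))
                for _ in range(reps)]

print(round(mean(sample_means), 2))   # close to mu_pop = 2
print(round(stdev(sample_means), 2))  # close to sigma/sqrt(n) = 2/sqrt(30), about 0.37
```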

Example. Let Y be the height of males in a certain population. Assume Y has an approximately normal distribution with mean μ = 69.7 inches and SD σ = 2.8 inches.

a) Suppose we randomly select one individual from that population. What is the probability that his height will exceed 72 inches?

P(Y > 72) = P(Z > 0.82) = 0.2061, where Z = (72 − 69.7)/2.8 = 0.82, and the probability equals the area under N(69.7, 2.8) to the right of 72.

b) Suppose we randomly select a sample of 4 individuals from that population. What is the probability that their average height Ȳ will estimate the population mean with an error of no more than 1 inch?

P(68.7 ≤ Ȳ ≤ 70.7) = P(−0.71 ≤ Z ≤ 0.71) = 0.5223, where −0.71 = (68.7 − 69.7)/(2.8/√4) and 0.71 = (70.7 − 69.7)/(2.8/√4), and the probability equals the area under N(69.7, 2.8/√4 = 1.4) between 68.7 and 70.7.

c) Without computations, will the answer in part b change or remain the same if we take a sample of size 16?

If n = 16, the distribution of Ȳ will be N(69.7, 2.8/√16 = 0.7), which is narrower than the distribution for n = 4. The percentage of all Ȳ values within 1 inch of the mean will be larger, so our probability will increase.

d) Suppose our population was not normal, but severely left-skewed. What would be the answers to questions b and c?

If the population is not normal, we need to use the Central Limit Theorem, but that requires a large sample (of size at least 30). If samples are as small as 4 and 16, we cannot assume that the distribution of Ȳ will be normal or approximately normal, so we cannot answer the questions in either part.
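Parts a and b can be checked with Python's standard library. The results differ from the text in the later decimal places only because the text rounds z to two places before looking up the table.

```python
from statistics import NormalDist
from math import sqrt

mu, sigma = 69.7, 2.8
Z = NormalDist()  # standard normal

# a) P(Y > 72) for one randomly chosen individual
p_a = 1 - Z.cdf((72 - mu) / sigma)

# b) P(68.7 <= Ybar <= 70.7) for the mean of a sample of n = 4
se = sigma / sqrt(4)
p_b = Z.cdf((70.7 - mu) / se) - Z.cdf((68.7 - mu) / se)

print(round(p_a, 3))  # about 0.206 (the text's 0.2061 uses the rounded z = 0.82)
print(round(p_b, 3))  # about 0.525 (the text's 0.5223 uses the rounded z = 0.71)
```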

Example (Central Limit Theorem). Based on service records from the past year, the time (in hours) that a technician requires to complete preventative maintenance on an air conditioner follows a distribution that is strongly right-skewed and whose most likely outcomes are close to 0. The mean time is μ = 1 hour and the standard deviation is σ = 1. Your company will service an SRS of 70 air conditioners. You budgeted 1.1 hours per unit. Will that be enough?

The Central Limit Theorem states that the sampling distribution of the mean time spent working on 70 units is approximately normal with mean 1 and SD 1/√70 ≈ 0.12 (since n = 70 > 30).

z = (1.1 − 1)/0.12 = 0.83

P(Ȳ > 1.1) = P(Z > 0.83) = 1 − 0.7967 = 0.2033

If you budgeted 1.1 hours per unit, there is over a 20% chance that the technicians will not complete the work within the budgeted time.

SAMPLING DISTRIBUTION OF COUNTS AND SAMPLE PROPORTIONS.

The Normal Approximation to the Binomial Distribution.

The Central Limit Theorem tells us that the sampling distribution of a mean becomes bell-shaped as the sample size increases. Suppose we have a large dichotomous population with classes that can be labeled success and failure. If we take a sample, let Y = the number of successes, and compute the sample proportion of successes p̂ = Y/n, then p̂ is also governed by the Central Limit Theorem. This means that if the sample is large, then the distributions of Y and of p̂ will both be approximately normal.

Recall from Chapter 3 that if p = the probability of a success, then Y, and consequently p̂, will have a binomial distribution.
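A quick check of the maintenance-time calculation, again using the standard library's normal cdf:

```python
from statistics import NormalDist
from math import sqrt

mu, sigma, n = 1.0, 1.0, 70
budget = 1.1

se = sigma / sqrt(n)   # about 0.1195, rounded to 0.12 in the text
p_over = 1 - NormalDist().cdf((budget - mu) / se)

print(round(p_over, 2))  # about 0.20: over a 20% chance of running past the budget
```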

For a Binomial Random Variable Y:

Average (expected) value of Y:  E(Y) = μ_Y = np
Standard deviation of Y:  σ_Y = √(np(1 − p))

Mean and SD of p̂ = Y/n:  μ_p̂ = p,  σ_p̂ = √(p(1 − p)/n)

Normal Approximation for Counts and Proportions.

Draw an SRS from a dichotomous population with proportion of successes equal to p. Let Y be the number of successes in that sample. For large n:

1) the sampling distribution of Y is approximately normal, with μ_Y = np and σ_Y = √(np(1 − p));
2) the sampling distribution of p̂ = Y/n is approximately normal, with μ_p̂ = p and σ_p̂ = √(p(1 − p)/n).

The approximation gets better with increasing n. It is best if p is close to 0.5, less so for very small or very large p. We will consider n large enough if np ≥ 5 and n(1 − p) ≥ 5 (our book); some researchers suggest a stronger requirement, namely np ≥ 10 and n(1 − p) ≥ 10.

Example. Financial audit

Suppose the financial records of a business are examined by tax authorities. Auditors examine an SRS of 150 sales records for compliance with tax laws. Suppose that in fact 8% of the company's records have errors. What is the probability that the auditors will find no more than 10 erroneous records?

Solution:

0) First check whether n = 150 is large enough: np = 150(0.08) = 12 ≥ 10 and n(1 − p) = 150(0.92) = 138 ≥ 10, OK.

1) Let X = the number of erroneous records found in a sample of 150. Then X ~ N(12, 3.3226) approximately (mean = np = 150(0.08) = 12, SD of X = √(np(1 − p)) = √(150(0.08)(0.92)) = 3.3226).
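The audit example's setup, the rule-of-thumb check of step 0 and the two parameter pairs, can be sketched as:

```python
from math import sqrt

n, p = 150, 0.08

# Step 0: rule-of-thumb check for the normal approximation
assert n * p >= 10 and n * (1 - p) >= 10

mu_Y, sd_Y = n * p, sqrt(n * p * (1 - p))      # 12.0 and about 3.3226
mu_phat, sd_phat = p, sqrt(p * (1 - p) / n)    # 0.08 and about 0.02215

print(mu_Y, round(sd_Y, 4), mu_phat, round(sd_phat, 5))
```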

P(X ≤ 10) = P(Z ≤ (10 − 12)/3.3226) = P(Z ≤ −0.60) = 0.2743

Using the binomial distribution, the exact answer is binomcdf(150, 0.08, 10) = 0.338, so our approximation is not that close.

2) We can get the same answer using the sample proportion. Let p̂ be the proportion of erroneous records in a sample of 150. Then p̂ ~ N(0.08, 0.02215) approximately (mean = p = 0.08, SD = √(p(1 − p)/n) = √(0.08(0.92)/150) = 0.02215).

P(X ≤ 10) = P(p̂ ≤ 0.06667) = P(Z ≤ (0.06667 − 0.08)/0.02215) = P(Z ≤ −0.60) = 0.2743

(In these computations, in particular in part 2, try to keep several decimal places; do not round too much.)

The Continuity Correction.

Normal curves are continuous, so probabilities associated with normally distributed variables are computed as areas under their normal distribution curves. In the case of the normal approximation to the binomial distribution, we may want to use it to compute the probability P(Y = k). The area under a normal curve over a single point is zero, so instead we compute

P(Y = k) ≈ P(k − 0.5 < Y < k + 0.5),

and similarly

P(Y ≤ k) ≈ P(Y < k + 0.5)  and  P(Y ≥ k) ≈ P(Y > k − 0.5).

We can use that approach in our previous example:

P(X ≤ 10) = P(X < 10.5) = P(Z ≤ (10.5 − 12)/3.3226) = P(Z ≤ −0.45) = 0.3264

or, using X = 10.5, p̂ = 10.5/150 = 0.07:

P(X ≤ 10) = P(p̂ ≤ 0.07) = P(Z ≤ (0.07 − 0.08)/0.02215) = P(Z ≤ −0.45) = 0.3264

Comparing with the exact binomial value P(X ≤ 10) = binomcdf(150, 0.08, 10) = 0.3384, we can see that using the continuity correction gives a slightly better approximation to the exact value.
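The three answers above (the exact binomial, the plain normal approximation, and the continuity-corrected one) can be compared directly; binomcdf is replicated here as an explicit sum of binomial probabilities.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, k = 150, 0.08, 10
mu, sd = n * p, sqrt(n * p * (1 - p))
Z = NormalDist()

# Exact: P(X <= k) under Binomial(n, p), i.e. binomcdf(150, 0.08, 10)
exact = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

plain = Z.cdf((k - mu) / sd)            # no continuity correction
corrected = Z.cdf((k + 0.5 - mu) / sd)  # with continuity correction

print(round(exact, 4))      # about 0.3384
print(round(plain, 3))      # about 0.274 (0.2743 in the text, from z rounded to -0.60)
print(round(corrected, 3))  # about 0.326 (0.3264 in the text, from z rounded to -0.45)
```

The continuity-corrected value lands much closer to the exact binomial probability than the plain approximation does, confirming the text's conclusion.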