Sampling. Marc H. Mehlman, University of New Haven.

Sampling
Marc H. Mehlman (marcmehlman@yahoo.com)
University of New Haven

Table of Contents
1. Sampling Distributions
2. Central Limit Theorem
3. Binomial Distribution

Sampling Distributions

Parameters and Statistics

As we begin to use sample data to draw conclusions about a wider population, we must be clear about whether a number describes a sample or a population.

A parameter is a number that describes some characteristic of the population. In statistical practice, the value of a parameter is not known because we cannot examine the entire population.

A statistic is a number that describes some characteristic of a sample. The value of a statistic can be computed directly from the sample data. We often use a statistic to estimate an unknown parameter.

Remember s and p: statistics come from samples and parameters come from populations.

We write $\mu$ (the Greek letter mu) for the population mean and $\sigma$ for the population standard deviation. We write $\bar{x}$ (x-bar) for the sample mean and $s$ for the sample standard deviation.

Statistical Estimation

The process of statistical inference involves using information from a sample to draw conclusions about a wider population. Different random samples yield different statistics. We need to be able to describe the sampling distribution of possible statistic values in order to perform statistical inference.

We can think of a statistic as a random variable because it takes numerical values that describe the outcomes of the random sampling process.

[Figure: data are collected from a representative sample of the population, and an inference is made about the population.]

Sampling Variability

Different random samples yield different statistics. This basic fact is called sampling variability: the value of a statistic varies in repeated random sampling. To make sense of sampling variability, we ask, "What would happen if we took many samples?"

[Figure: many samples drawn from a single population.]

Sampling Distributions

The law of large numbers assures us that if we measure enough subjects, the statistic $\bar{x}$ will eventually get very close to the unknown parameter $\mu$.

If we took every one of the possible samples of a certain size, calculated the sample mean for each, and graphed all of those values, we'd have a sampling distribution.

The population distribution of a variable is the distribution of values of the variable among all individuals in the population.

The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population.

Mean and Standard Deviation of a Sample Mean

Mean of the sampling distribution of a sample mean: There is no tendency for a sample mean to fall systematically above or below $\mu$, even if the distribution of the raw data is skewed. Thus the sample mean is an unbiased estimator of the population mean $\mu$: the mean of its sampling distribution equals $\mu$.

Standard deviation of the sampling distribution of a sample mean: The standard deviation of the sampling distribution measures how much the sample statistic varies from sample to sample. It is smaller than the standard deviation of the population by a factor of $\sqrt{n}$. Averages are less variable than individual observations.

The Sampling Distribution of a Sample Mean

When we choose many SRSs from a population, the sampling distribution of the sample mean is centered at the population mean $\mu$ and is less spread out than the population distribution. Here are the facts.

The Sampling Distribution of Sample Means: Suppose that $\bar{x}$ is the mean of an SRS of size $n$ drawn from a large population with mean $\mu$ and standard deviation $\sigma$. Then:

The mean of the sampling distribution of $\bar{x}$ is $\mu_{\bar{x}} = \mu$.

The standard deviation of the sampling distribution of $\bar{x}$ is $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.

Note: These facts about the mean and standard deviation of $\bar{x}$ are true no matter what shape the population distribution has.

If individual observations have the $N(\mu, \sigma)$ distribution, then the sample mean of an SRS of size $n$ has the $N(\mu, \sigma/\sqrt{n})$ distribution regardless of the sample size $n$.
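The two facts above can be checked with a short simulation in R. This is only a sketch under an assumption that is not in the slides: the population is taken to be Exponential with rate 1, a right-skewed distribution with $\mu = 1$ and $\sigma = 1$. The sample size and number of replications are arbitrary illustrative choices.

```r
# Simulation sketch: sampling distribution of the mean for a skewed population.
# Assumption (not from the slides): population is Exponential(rate = 1),
# so mu = 1 and sigma = 1.
set.seed(1)                      # fixed seed so the run is reproducible
n    <- 25                       # sample size (illustrative)
reps <- 10000                    # number of simulated samples (illustrative)

# Draw `reps` samples of size n and record each sample mean.
xbar <- replicate(reps, mean(rexp(n, rate = 1)))

mean(xbar)                       # should be close to mu = 1
sd(xbar)                         # should be close to sigma / sqrt(n)
1 / sqrt(n)                      # theoretical standard deviation, 0.2
```

A histogram of the simulated means, `hist(xbar)`, should also look far less skewed than the exponential population itself.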

Central Limit Theorem

Central Limit Theorem

"I know of scarcely anything so apt to impress the imagination as the wonderful form of cosmic order expressed by the 'law of frequency of error' [the normal distribution]. The law would have been personified by the Greeks and deified, if they had known of it. It reigns with serenity and in complete self-effacement amidst the wildest confusion. The huger the mob, and the greater the anarchy, the more perfect is its sway. It is the supreme law of Unreason." (Francis Galton)

In the previous slide, the sampling distribution of $\bar{X}$ is depicted as:
1. with mean $\mu$, i.e. unbiased.
2. with standard deviation $\sigma/\sqrt{n}$.
3. with normal distribution.

The first two depictions are always true, regardless of sample size or population distribution. The Central Limit Theorem (below) says the third depiction is approximately true, regardless of population distribution, for large sample sizes $n$. As Francis Galton said, the averaged effects of random acts from a large mob form a familiar pattern.

Theorem (Central Limit Theorem, CLT): Consider a random sample of size $n$ from a population with mean $\mu$ and standard deviation $\sigma$. For large $n$, the sampling distribution of $\bar{X}$ is approximately $N(\mu, \sigma/\sqrt{n})$.

Example

Based on service records from the past year, the time (in hours) that a technician requires to complete preventative maintenance on an air conditioner follows a distribution that is strongly right-skewed and whose most likely outcomes are close to 0. The mean time is $\mu = 1$ hour and the standard deviation is $\sigma = 1$ hour. Your company will service an SRS of 70 air conditioners. You have budgeted 1.1 hours per unit. Will this be enough?

The sampling distribution of the mean time spent working on the 70 units has mean and standard deviation

$$\mu_{\bar{x}} = \mu = 1, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{1}{\sqrt{70}} \approx 0.12.$$

By the central limit theorem, the sampling distribution of the mean time spent working is approximately $N(1, 0.12)$ because $n = 70 \ge 30$. Then

$$z = \frac{1.1 - 1}{0.12} = 0.83, \qquad P(\bar{x} > 1.1) = P(Z > 0.83) = 1 - 0.7967 = 0.2033.$$

If you budget 1.1 hours per unit, there is about a 20% chance the technicians will not complete the work within the budgeted time.
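The same calculation can be sketched in R instead of using the table. The variable names below are illustrative and not from the slides. Because R does not round $\sigma_{\bar{x}}$ to 0.12 or $z$ to 0.83, the result differs slightly from the table-based 0.2033 but is still about 0.20.

```r
# Air-conditioner example: P(sample mean > budget) under the CLT approximation.
mu     <- 1      # population mean time (hours), from the example
sigma  <- 1      # population standard deviation (hours), from the example
n      <- 70     # number of units serviced
budget <- 1.1    # budgeted hours per unit

se <- sigma / sqrt(n)                            # standard deviation of the sample mean
p_over <- 1 - pnorm(budget, mean = mu, sd = se)  # upper-tail normal probability
p_over
```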

A Few More Facts

Any linear combination of independent Normal random variables is also Normally distributed.

More generally, the central limit theorem implies that the distribution of a sum or average of many small independent random quantities is close to Normal.

Finally, the central limit theorem also applies to discrete random variables.

Binomial Distribution

Definition (Bernoulli Distribution, $X \sim \mathrm{BIN}(1, p)$): Model: $X$ = # heads after tossing a coin once, where the probability of heads on each toss equals $p$.

Definition (Binomial Distribution, $X \sim \mathrm{BIN}(n, p)$): Model: $X$ = # heads after tossing a coin $n$ times, where the probability of heads on each toss equals $p$.

Theorem: If $X \sim \mathrm{BIN}(n, p)$ and $j$ is an integer between 0 and $n$ inclusive, then

$$P(X = j) = \binom{n}{j} p^{j} (1 - p)^{n - j}.$$

Furthermore, $\mu_X = np$, $\sigma_X^2 = np(1 - p)$ and $\sigma_X = \sqrt{np(1 - p)}$.
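As a quick numerical check of the theorem, the following R sketch evaluates the formula directly and compares it with R's built-in `dbinom`. The values $n = 20$ and $p = 0.5$ are illustrative choices, not taken from the slides.

```r
# Check the binomial pmf formula and the mean/variance formulas numerically.
n <- 20          # number of tosses (illustrative)
p <- 0.5         # probability of heads (illustrative)
j <- 0:n         # all possible values of X

# P(X = j) from the formula choose(n, j) * p^j * (1 - p)^(n - j)
pmf_formula <- choose(n, j) * p^j * (1 - p)^(n - j)

# Compare with R's built-in binomial pmf.
all.equal(pmf_formula, dbinom(j, size = n, prob = p))

# Mean and variance computed from the pmf, next to np and np(1 - p).
mu_X  <- sum(j * pmf_formula)
var_X <- sum((j - mu_X)^2 * pmf_formula)
c(mu_X, n * p)
c(var_X, n * p * (1 - p))
```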

Let $Y_1, Y_2, \ldots, Y_n$ be a random sample from $\mathrm{BIN}(1, p)$. Then
1. $X \overset{\text{def}}{=} \sum_{j=1}^{n} Y_j \sim \mathrm{BIN}(n, p)$.
2. $\hat{p} \overset{\text{def}}{=} \bar{Y} = \dfrac{\text{\# of heads}}{\text{\# of tosses}}$ is an unbiased estimator of $p$.
3. For large $n$, the distribution of $\hat{p} = \bar{Y}$ is approximately $N\left(p, \sqrt{\dfrac{p(1 - p)}{n}}\right)$ by the Central Limit Theorem.

Since $X = n\bar{Y}$, one has

Theorem (Normal Approximation for Binomial Distribution): For large $n$, $X \sim \mathrm{BIN}(n, p)$ is approximately distributed as $N\left(np, \sqrt{np(1 - p)}\right)$.

How large must $n$ be for this approximation to be good?

Convention: when $np \ge 10$ and $n(1 - p) \ge 10$.
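Items 2 and 3 can be illustrated with a small simulation in R. This is a sketch only: the values $n = 100$ and $p = 0.3$, the seed, and the number of replications are illustrative assumptions. Both values satisfy the convention, since $np = 30$ and $n(1 - p) = 70$.

```r
# Simulation sketch: sampling distribution of the sample proportion p-hat.
set.seed(2)      # fixed seed so the run is reproducible
n    <- 100      # tosses per sample (illustrative)
p    <- 0.3      # true probability of heads (illustrative)
reps <- 10000    # number of simulated samples (illustrative)

# Each binomial draw is the number of heads in n tosses; divide by n for p-hat.
p_hat <- rbinom(reps, size = n, prob = p) / n

mean(p_hat)              # should be close to p (unbiasedness)
sd(p_hat)                # should be close to sqrt(p * (1 - p) / n)
sqrt(p * (1 - p) / n)    # theoretical standard deviation
```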

When dealing with discrete random variables such as binomial random variables, a continuity correction can greatly improve accuracy. For instance, consider the following example.

Example (Exact): Joe always runs red lights. The probability of an accident for each red light run is 0.3. Of the last 100 red lights run, what is the probability that there were 25 or fewer accidents?

Solution: Let $X \sim \mathrm{BIN}(100, 0.3)$ be the number of accidents. The exact answer is

$$\sum_{j=0}^{25} P(X = j) = \sum_{j=0}^{25} \binom{100}{j} (0.3)^{j} (0.7)^{100 - j} = 0.1631$$

(obtained with Mathematica). Or, using R,

    > pbinom(25,100,0.3)
    [1] 0.1631301

The exact answer can't easily be obtained without a computer.

Example (Normal approximation without continuity correction): Joe always runs red lights. The probability of an accident for each red light run is 0.3. Of the last 100 red lights run, what is the probability, approximately, that there were 25 or fewer accidents?

Solution: Let $X \sim \mathrm{BIN}(100, 0.3)$. Since $100(0.3) \ge 10$ and $100(1 - 0.3) \ge 10$, $X$ has approximately the same distribution as $Y \sim N\left(30, \sqrt{100(0.3)(1 - 0.3)}\right) = N(30, 4.582576)$. Thus

$$P[X \le 25] \approx P[Y \le 25] = P\left[\frac{Y - 30}{4.582576} \le \frac{25 - 30}{4.582576}\right] = P[Z \le -1.091089] = 0.1379$$

using the Table. Instead of using a table, one can get more accuracy using R for the normal approximation without continuity correction:

    > pnorm(25,30,sqrt(100*0.3*(1-0.3)))
    [1] 0.1376168

The approximation is unsatisfactory: the exact value is 0.1631.

Continuity Correction

Let $X \sim \mathrm{BIN}(n, p)$ and let $j, k$ be integers such that $0 \le j \le k \le n$. Then it is common practice to use the following approximation when $np \ge 10$ and $n(1 - p) \ge 10$:

$$P[j \le X \le k] \approx P[j - 0.5 \le Y \le k + 0.5], \quad \text{where } Y \sim N\left(np, \sqrt{np(1 - p)}\right).$$
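The continuity-corrected approximation can be wrapped in a small R helper and compared with the exact binomial probability. This is a sketch: the function name `binom_normal_cc` is hypothetical, introduced here only for illustration, and it is applied to Joe's red-light example with $j = 0$ and $k = 25$.

```r
# Hypothetical helper (name not from the slides): normal approximation to
# P(j <= X <= k) for X ~ BIN(n, p), with continuity correction.
binom_normal_cc <- function(j, k, n, p) {
  stopifnot(0 <= j, j <= k, k <= n)
  mu    <- n * p                     # mean of X
  sigma <- sqrt(n * p * (1 - p))     # standard deviation of X
  # P(j - 0.5 <= Y <= k + 0.5) for Y ~ N(mu, sigma)
  pnorm(k + 0.5, mean = mu, sd = sigma) - pnorm(j - 0.5, mean = mu, sd = sigma)
}

# Joe's example: 25 or fewer accidents in 100 red lights, p = 0.3.
p_approx <- binom_normal_cc(0, 25, n = 100, p = 0.3)
p_exact  <- sum(dbinom(0:25, size = 100, prob = 0.3))
c(approx = p_approx, exact = p_exact)
```

Writing the approximation as a function of `j` and `k` makes it easy to repeat the comparison for other intervals, for example to see how the agreement degrades when `n * p` falls below 10.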

Example (Normal approximation with continuity correction): Joe always runs red lights. The probability of an accident for each red light run is 0.3. Of the last 100 red lights run, what is the probability, approximately, that there were 25 or fewer accidents?

Since $100(0.3) \ge 10$ and $100(0.7) \ge 10$, the above convention says, letting $Y \sim N\left(30, \sqrt{100(0.3)(1 - 0.3)}\right) = N(30, 4.582576)$,

$$P(X \le 25) \approx P(Y \le 25.5) = P\left(\frac{Y - 30}{4.582576} \le \frac{25.5 - 30}{4.582576}\right) = P(Z \le -0.9819805) \approx 0.1635$$

using the Table. Instead of using a table, one can get more accuracy using R for the normal approximation with continuity correction:

    > pnorm(25.5,30,sqrt(100*0.3*(1-0.3)))
    [1] 0.1630547

This approximation is much closer to the exact value, 0.1631, than the normal approximation without continuity correction.