Chapter 7. Sampling Distributions and the Central Limit Theorem

Contents

1. Introduction
2. Sampling distributions related to the normal distribution
3. The central limit theorem
4. The normal approximation to the binomial distribution

1. Introduction

Assume that $Y_1, Y_2, \ldots, Y_n$ is a random sample from a population with a common distribution. Suppose one is interested in estimating the population mean $\mu$ from the observed values $Y_1 = y_1, Y_2 = y_2, \ldots, Y_n = y_n$. A natural choice is the sample mean $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$. Given the observed sample, $\bar{y}$ is just a single number, so how can one judge how good this estimate of $\mu$ is? Note that $\bar{y}$ is computed from the formula $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$, a function of the observable random variables $Y_1, Y_2, \ldots, Y_n$ and the (constant) sample size $n$. Since $\bar{Y}$ is itself a random variable, it has a probability distribution. If one knows the probability distribution of $\bar{Y}$, one can assess how good $\bar{Y}$ is as an estimator of $\mu$.

Assumption: $Y_1, Y_2, \ldots, Y_n$ is a random sample from a population with probability mass function $p(y)$ or probability density function $f(y)$; that is, the random variables (r.v.'s) $Y_1, Y_2, \ldots, Y_n$ are independent with common probability mass function $p(y)$ or common density function $f(y)$. In short, $Y_1, \ldots, Y_n \overset{\text{iid}}{\sim} p(y)$ or $f(y)$.

(Def 7.1) A statistic is a function of the observable random variables in a sample and known constants, computed for a parameter of interest. A statistic is itself a random variable. (e.g.) $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$.

(Def 7.2) The sampling distribution of a statistic is the probability distribution of the statistic (that is, the distribution of the statistic over all possible samples of a given size). (e.g.) What is the sampling distribution of $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$?

How can one obtain the sampling distribution of a statistic?

[M1] The sampling distribution of a statistic is the probability distribution, under repeated sampling of the population, of that statistic.

(Example 1) The sample mean $\bar{Y}$ is to be calculated from a random sample of size 2 taken from a population consisting of the ten values (2, 3, 4, 5, 6, 7, 8, 9, 10, 11). Find the sampling distribution of $\bar{Y}$ based on a random sample of size 2. There are $\binom{10}{2} = 45$ possible samples of two items selected from the ten items (see [Table 1]). Assuming each sample of size 2 is equally likely, [Table 2] shows the sampling distribution of $\bar{Y}$ based on $n = 2$ observations selected from the population (2, 3, 4, 5, 6, 7, 8, 9, 10, 11).

(Example 2) Consider a large normal population. Assume we repeatedly take samples of a given size from the population and calculate the sample mean $\bar{y}$ of the data values in each sample. Different samples lead to different sample means. The distribution of these means is the sampling distribution of $\bar{Y}$ (for the given sample size).

[M2] One can derive the sampling distribution of a statistic mathematically if one knows the distribution of the random variables $Y_1, \ldots, Y_n$ (using the methods of Chapter 6).

[Table 1] for (Example 1): the 45 possible samples of size 2 and their sample means

Sample   ȳ    | Sample   ȳ    | Sample   ȳ    | Sample   ȳ
2,3      2.5  | 2,4      3    | 2,5      3.5  | 2,6      4
2,7      4.5  | 2,8      5    | 2,9      5.5  | 2,10     6
2,11     6.5  | 3,4      3.5  | 3,5      4    | 3,6      4.5
3,7      5    | 3,8      5.5  | 3,9      6    | 3,10     6.5
3,11     7    | 4,5      4.5  | 4,6      5    | 4,7      5.5
4,8      6    | 4,9      6.5  | 4,10     7    | 4,11     7.5
5,6      5.5  | 5,7      6    | 5,8      6.5  | 5,9      7
5,10     7.5  | 5,11     8    | 6,7      6.5  | 6,8      7
6,9      7.5  | 6,10     8    | 6,11     8.5  | 7,8      7.5
7,9      8    | 7,10     8.5  | 7,11     9    | 8,9      8.5
8,10     9    | 8,11     9.5  | 9,10     9.5  | 9,11     10
10,11    10.5 |

[Table 2] for (Example 1): sampling distribution of $\bar{Y}$

ȳ      2.5   3     3.5   4     4.5   5     5.5   6     6.5
p(ȳ)   1/45  1/45  2/45  2/45  3/45  3/45  4/45  4/45  5/45

ȳ      7     7.5   8     8.5   9     9.5   10    10.5
p(ȳ)   4/45  4/45  3/45  3/45  2/45  2/45  1/45  1/45

This sampling distribution provides a way to make statistical inferences about $\bar{Y}$ in this example; for instance, $P(3.5 \le \bar{Y} \le 9.5) = 41/45$.
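The enumeration in Example 1 is easy to reproduce by brute force. A minimal sketch in Python (standard library only; the population values are those of the example) lists all $\binom{10}{2} = 45$ samples, tabulates the sampling distribution of $\bar{Y}$, and checks $P(3.5 \le \bar{Y} \le 9.5) = 41/45$:

```python
from itertools import combinations
from fractions import Fraction
from collections import Counter

population = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# All 45 equally likely samples of size 2 (without replacement, order ignored).
samples = list(combinations(population, 2))
means = [sum(s) / 2 for s in samples]

# Sampling distribution of the sample mean: p(ybar) = count / 45.
counts = Counter(means)
dist = {ybar: Fraction(c, len(samples)) for ybar, c in sorted(counts.items())}
for ybar, p in dist.items():
    print(f"ybar = {ybar:4.1f}   p(ybar) = {p}")

# P(3.5 <= Ybar <= 9.5) should come out to 41/45.
prob = sum(p for ybar, p in dist.items() if 3.5 <= ybar <= 9.5)
print("P(3.5 <= Ybar <= 9.5) =", prob)
```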

2. Sampling distributions related to the normal distribution

In many applied problems it is reasonable to assume that the observed random variables in a random sample, $Y_1, Y_2, \ldots, Y_n$, are independent with a common normal density function. In this section, we will develop the sampling distributions of various statistics calculated by using the observations in a random sample from a normal population (or independent random samples from two normal populations).

Inference about $\mu$ of a normal population with known variance $\sigma^2$:

(Theorem 7.1) Let $Y_1, Y_2, \ldots, Y_n$ be a random sample of size $n$ from a normal distribution with mean $\mu$ and variance $\sigma^2$. Then
$$\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i \sim N\!\left(\mu, \sigma^2/n\right),$$
and hence
$$Z = \frac{\bar{Y} - \mu}{\sigma_{\bar{Y}}} = \sqrt{n}\,\frac{\bar{Y} - \mu}{\sigma} \sim N(0, 1).$$
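A quick simulation can make Theorem 7.1 concrete. The sketch below uses Python with NumPy; the values $\mu = 10$, $\sigma = 2$, and $n = 9$ are illustrative choices, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 10.0, 2.0, 9, 100_000   # illustrative values

# reps samples of size n from N(mu, sigma^2), one sample per row.
samples = rng.normal(mu, sigma, size=(reps, n))
ybar = samples.mean(axis=1)

print("mean of Ybar:", ybar.mean())   # close to mu = 10
print("var of Ybar: ", ybar.var())    # close to sigma^2 / n = 4/9
```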

(Example 7.2) A bottling machine can be regulated so that it discharges an average of $\mu$ ounces per bottle. It has been observed that the amount of fill dispensed by the machine is normally distributed with $\sigma = 1.0$ ounce. A sample of $n = 9$ filled bottles is randomly selected from the output of the machine on a given day, and the ounces of fill are measured for each. Find the probability that the sample mean will be within 0.3 ounce of the true mean $\mu$ for the chosen machine setting.

(Example 7.3) In Example 7.2, how many observations should be included in the sample if we wish $\bar{Y}$ to be within 0.3 ounce of $\mu$ with (at least) probability 0.95?
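Both examples reduce to standard normal calculations via Theorem 7.1: Example 7.2 uses $P(|\bar{Y} - \mu| \le 0.3) = P(|Z| \le 0.3\sqrt{n}/\sigma)$, and Example 7.3 solves $0.3\sqrt{n}/\sigma \ge z_{0.025}$ for $n$. A minimal sketch in Python (assuming SciPy's scipy.stats.norm is available):

```python
from math import ceil, sqrt
from scipy.stats import norm

sigma = 1.0

# Example 7.2: n = 9, so P(|Ybar - mu| <= 0.3) = P(|Z| <= 0.3*sqrt(9)/1) = P(|Z| <= 0.9)
n = 9
z = 0.3 * sqrt(n) / sigma
print(2 * norm.cdf(z) - 1)            # about 0.6318

# Example 7.3: require 0.3*sqrt(n)/sigma >= z_{0.025}, i.e. n >= (z_{0.025}*sigma/0.3)^2
z_crit = norm.ppf(0.975)              # about 1.96
n_required = ceil((z_crit * sigma / 0.3) ** 2)
print(n_required)                     # 43
```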

3. The central limit theorem

By Theorem 5.12, $E(\bar{Y}) = \mu$ and $V(\bar{Y}) = \sigma^2/n$ whenever $Y_1, Y_2, \ldots, Y_n$ is a random sample from any distribution with mean $\mu$ and variance $\sigma^2$. If one samples from a normal distribution, $\bar{Y}$ itself has a normal distribution (Theorem 7.1).

[Question] What can we say about the sampling distribution of $\bar{Y}$ if the variables $Y_i$ are not normally distributed?

[Answer] Under mild conditions, $\bar{Y}$ has a sampling distribution that is approximately normal as long as the sample size is large. In this section we develop an approximation to the sampling distribution of $\bar{Y}$ that can be used regardless of the distribution of the population from which the sample is taken: the central limit theorem.

(Theorem 7.4) Let $Y_1, Y_2, \ldots, Y_n$ be independent and identically distributed random variables with $E(Y_i) = \mu < \infty$ and $V(Y_i) = \sigma^2 < \infty$. Define
$$U_n = \frac{\bar{Y} - E(\bar{Y})}{\sqrt{V(\bar{Y})}} = \sqrt{n}\,\frac{\bar{Y} - \mu}{\sigma}, \qquad \text{where } \bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i.$$
Then the distribution function of $U_n$ converges to the standard normal distribution function as $n \to \infty$.

Note that, for large $n$,
$$P(a \le \bar{Y} - \mu \le b) = P\!\left(\frac{a}{\sigma/\sqrt{n}} \le U_n \le \frac{b}{\sigma/\sqrt{n}}\right) \approx P\!\left(\frac{a}{\sigma/\sqrt{n}} \le Z \le \frac{b}{\sigma/\sqrt{n}}\right), \qquad Z \sim N(0, 1).$$

The central limit theorem can be applied to a random sample $Y_1, Y_2, \ldots, Y_n$ from any distribution, so long as $E(Y_i) = \mu$ and $V(Y_i) = \sigma^2$ are both finite and the sample size is large.
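The convergence in Theorem 7.4 can be spot-checked by simulation from a clearly non-normal population. A minimal sketch in Python with NumPy and SciPy; the exponential population with mean 1 (so $\mu = \sigma = 1$) and the sample sizes are illustrative choices, not from the text.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
mu, sigma = 1.0, 1.0          # exponential(1) has mean 1 and variance 1
reps = 200_000

for n in (5, 30, 100):
    samples = rng.exponential(scale=1.0, size=(reps, n))
    u_n = np.sqrt(n) * (samples.mean(axis=1) - mu) / sigma
    # Empirical P(U_n <= 1) should approach Phi(1), about 0.8413, as n grows.
    print(n, (u_n <= 1.0).mean(), norm.cdf(1.0))
```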

(Example 7.8) Achievement test scores of all high school seniors in a state have mean 60 and variance 64. A random sample of $n = 100$ students from one large high school had a mean score of 58. Is there evidence to suggest that this high school is inferior? (Calculate the probability that the sample mean is at most 58 when $n = 100$.)

(Example 7.9) The service times for customers at a checkout counter in a retail store are independent random variables with mean 1.5 minutes and variance 1.0. Approximate the probability that 100 customers can be served in less than 2 hours of total service time.
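Both examples are direct applications of the central limit theorem, using $P(\bar{Y} \le a) \approx \Phi\!\big(\sqrt{n}\,(a - \mu)/\sigma\big)$. A minimal sketch in Python (assuming SciPy is available):

```python
from math import sqrt
from scipy.stats import norm

# Example 7.8: mu = 60, sigma^2 = 64, n = 100, want P(Ybar <= 58)
z = sqrt(100) * (58 - 60) / sqrt(64)       # = -2.5
print(norm.cdf(z))                         # about 0.0062

# Example 7.9: mu = 1.5, sigma^2 = 1.0, n = 100; serving 100 customers in under
# 120 minutes of total time means Ybar <= 1.2.
z = sqrt(100) * (1.2 - 1.5) / sqrt(1.0)    # = -3.0
print(norm.cdf(z))                         # about 0.0013
```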

4. The normal approximation to the binomial distribution

The central limit theorem can also be used to approximate probabilities for some discrete random variables when the exact probabilities are tedious to calculate. One useful example involves the binomial distribution for large values of the number of trials, $n$.

Suppose that $Y$ has a binomial distribution with $n$ trials and probability of success $p$ on any one trial. How can we obtain $P(Y \le b)$?

[M1] Direct calculation: $P(Y \le b) = \sum_{i=0}^{b} P(Y = i)$, where $Y \sim b(n, p)$. For some values of $n$ tables are available, but direct calculation is tedious for large values of $n$, for which tables may not be available.
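As a sketch of the direct calculation in [M1] (Python, assuming SciPy is available; the values $n = 20$, $p = 0.3$, $b = 5$ are illustrative only, not from the text):

```python
from scipy.stats import binom

n, p, b = 20, 0.3, 5        # illustrative values

# Direct sum of the pmf: P(Y <= b) = sum_{i=0}^{b} P(Y = i)
direct = sum(binom.pmf(i, n, p) for i in range(b + 1))
print(direct)

# Same quantity from the cdf
print(binom.cdf(b, n, p))
```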

[M2] We can use the central limit theorem for large values of $n$: we can regard $Y$, the number of successes in $n$ trials, as the sum of a sample consisting of 0s and 1s; that is,
$$Y = \sum_{i=1}^{n} X_i, \qquad \text{where } X_i = \begin{cases} 1 & \text{if the } i\text{th trial results in success}, \\ 0 & \text{otherwise}. \end{cases}$$
The $X_i$, $i = 1, 2, \ldots, n$, are independent Bernoulli random variables with $E(X_i) = p$ and $V(X_i) = p(1-p)$. Consequently, when $n$ is large, the sample fraction of successes,
$$\frac{Y}{n} = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X},$$
possesses an approximately normal sampling distribution with mean $E(Y/n) = E(X_i) = p$ and variance $V(Y/n) = V(X_i)/n = p(1-p)/n$.

Thus, by the central limit theorem, if $Y \sim b(n, p)$ and $n$ is large, then $Y/n$ has an approximately normal sampling distribution with mean $E(Y/n) = p$ and variance $V(Y/n) = p(1-p)/n$ (in other words, $Y$ has an approximately normal sampling distribution with mean $E(Y) = np$ and variance $V(Y) = np(1-p)$).

(Example 7.10) Candidate A believes that she can win a city election if she can earn at least 55% of the votes in precinct I. She also believes that about 50% of the city's voters favor her. If $n = 100$ voters show up to vote at precinct I, what is the probability that candidate A will receive at least 55% of their votes?
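For Example 7.10, $Y/n$ is approximately $N(p, p(1-p)/n)$ with $p = 0.5$ and $n = 100$, so $P(Y/n \ge 0.55) \approx P\big(Z \ge (0.55 - 0.5)/\sqrt{0.25/100}\big) = P(Z \ge 1)$. A minimal sketch in Python (assuming SciPy is available), with the exact binomial probability included for comparison:

```python
from math import sqrt
from scipy.stats import norm, binom

n, p = 100, 0.5

# Normal approximation to P(Y/n >= 0.55), i.e. P(Y >= 55)
z = (0.55 - p) / sqrt(p * (1 - p) / n)      # = 1.0
print(1 - norm.cdf(z))                      # about 0.159

# Exact binomial probability P(Y >= 55) for comparison
print(binom.sf(54, n, p))                   # about 0.184
```

The gap between the approximation and the exact value is largely removed by the 0.5 continuity correction discussed on the next slide.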

In this approximation:
(1) We approximate a discrete distribution, represented by a histogram, with a continuous density function.
(2) A slight adjustment of the boundaries (the 0.5 continuity correction) can lead to a substantial improvement in the approximation.

(Example) Suppose $Y \sim b(6, 0.5)$. Calculate $P(2 \le Y \le 4)$.

(Example 7.10 revisited)
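A minimal sketch of the continuity-correction example in Python (assuming SciPy is available, and assuming the intended distribution is $Y \sim b(6, 0.5)$): the exact $P(2 \le Y \le 4)$ is compared with the normal approximation, with and without the 0.5 correction.

```python
from math import sqrt
from scipy.stats import norm, binom

n, p = 6, 0.5
mu, sigma = n * p, sqrt(n * p * (1 - p))       # 3 and sqrt(1.5)

# Exact: P(2 <= Y <= 4) = P(Y <= 4) - P(Y <= 1)
exact = binom.cdf(4, n, p) - binom.cdf(1, n, p)
print(exact)                                               # 50/64, about 0.7813

# Normal approximation without continuity correction
print(norm.cdf((4 - mu) / sigma) - norm.cdf((2 - mu) / sigma))      # about 0.586

# With the 0.5 continuity correction: approximate P(1.5 <= Y <= 4.5)
print(norm.cdf((4.5 - mu) / sigma) - norm.cdf((1.5 - mu) / sigma))  # about 0.779
```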