STAT 241/251 - Chapter 7: Central Limit Theorem


In this chapter we introduce the most important theorem in statistics: the central limit theorem.

What have we seen so far? First, we saw that for an i.i.d. random sample $X_1, X_2, \ldots, X_n$, where $E(X_i) = \mu$ and $\mathrm{Var}(X_i) = \sigma^2$, the sample mean $\bar{X}$ and the sample sum $S = X_1 + \cdots + X_n$ satisfy

$$E(\bar{X}) = \mu_{\bar{X}} = \mu, \qquad \mathrm{Var}(\bar{X}) = \sigma^2_{\bar{X}} = \frac{\sigma^2}{n}$$

$$E(S) = \mu_S = n\mu, \qquad \mathrm{Var}(S) = \sigma^2_S = n\sigma^2$$

Further, in Chapter 5 we saw that if the $X_i \sim N(\mu, \sigma^2)$, then

$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right), \qquad S \sim N(n\mu, n\sigma^2)$$

(i.e.) Linear combinations of independent normal RVs are also normally distributed.

These are nice results, but we often have a random sample in which the $X_i$'s do not follow a normal distribution (the distribution we are sampling from is not normal).

(e.g.) We may take a random sample of new fluorescent lightbulbs and measure the lifetime of each bulb. In this case, we might model the lifetime of each bulb as an exponential RV. For this random sample, we will still probably look at the mean lifetime of all the bulbs, or maybe the sum of the lifetimes. So, what distribution do $\bar{X}$ and $S$ follow then?

The Central Limit Theorem (CLT)

If our random sample $X_1, X_2, \ldots, X_n$ is i.i.d. and comes from any particular distribution (which may not be normal) with mean $\mu$ and variance $\sigma^2$, then when $n$ is large enough (and the observations are independent of one another and drawn at random), the sample mean $\bar{X}$ and the sample sum $S$ approximately follow a normal distribution.

(i.e.) If $X_i \sim$ almost any shape of distribution with mean $\mu$ and variance $\sigma^2$, and $n$ is large, then

$$\bar{X} \overset{\text{approx.}}{\sim} N\left(\mu, \frac{\sigma^2}{n}\right), \qquad S \overset{\text{approx.}}{\sim} N(n\mu, n\sigma^2)$$

This result is very important and is used extensively throughout statistics. It tells us that no matter what distribution our random sample comes from, the sample mean and sample sum approximately follow a normal distribution, as long as certain conditions are met.

The CLT is an asymptotic (limit) result: in the limit as $n \to \infty$, $\bar{X}$ and $S$ (suitably standardized) are normally distributed, but for any finite $n$, $\bar{X}$ and $S$ are only approximately normally distributed. This raises the question: when is $n$ large enough to say that $\bar{X}$ and $S$ are normally distributed?

Answer: The more symmetric and light-tailed the distribution of the $X_i$'s, the more quickly $\bar{X}$ and $S$ converge to normality. Provided that the distribution of the $X_i$'s is not too skewed or asymmetric, a sample size of $n \geq 20$ is usually adequate for the CLT to kick in. If the distribution we are sampling from is very skewed, then we need a larger sample size. For the sake of this class, we will use $n = 20$ as the magic number. In reality, it depends more on the shape of the distribution that we are sampling from.
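The effect of skewness on the speed of convergence can be checked with a small simulation. The sketch below is only an illustration, not part of the course material: it assumes exponential data with mean 8 (an arbitrary choice), and the helper names `sample_means` and `skewness` are made up for this example. It draws many samples of size $n$ and reports the skewness of the resulting sample means, which should shrink toward 0 (the skewness of a normal distribution) as $n$ grows.

```python
import random
import statistics


def sample_means(n, reps=4000, mean=8.0, seed=0):
    """Return `reps` sample means, each computed from n i.i.d. exponential draws.

    The exponential distribution and mean=8.0 are illustrative assumptions.
    """
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0 / mean) for _ in range(n))
        for _ in range(reps)
    ]


def skewness(xs):
    """Sample skewness: close to 0 for symmetric data such as normal data."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)


# Print n, the average of the sample means, and their skewness.
for n in (1, 5, 20, 100):
    means = sample_means(n)
    print(n, round(statistics.fmean(means), 2), round(skewness(means), 2))
```

For exponential data the skewness of the sample mean is $2/\sqrt{n}$ in theory, so even at $n = 20$ some asymmetry remains; this is why very skewed populations need larger samples.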

First we will look at a simulation to explain the central limit theorem. Then I will draw a picture to illustrate the concept. The CLT can also be proven mathematically, although that is beyond the scope of this course.

Simulating the Sampling Distribution of a Mean

Below is a picture of histograms of some simulated rolls of dice. I will talk about these in class.

Note: I used dice as the example because they cover many areas. (i.e.) A die can lead to proportions, it can be binomial, and it is an ordinal variable, which is similar to a quantitative variable.

[Figure: five histograms, titled "10000 Rolls of 1 Die", "The Mean of 10000 Rolls of 2 Dice", "The Mean of 10000 Rolls of 3 Dice", "The Mean of 10000 Rolls of 5 Dice", and "The Mean of 10000 Rolls of 25 Dice"; the horizontal axes are labelled die1, die2, die3, die5, and die25.]

We can see that as the sample size increases (1, 2, 3, 5, 25), the sampling distribution of the mean begins to look like a normal distribution.

Here is another picture interpretation of the CLT...
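The data behind histograms like these can be regenerated with a few lines of code. The following is a minimal sketch, not the code used to produce the figure in these notes: the function name `dice_means` and the seed are assumptions made for illustration. For each number of dice $k$ it simulates 10000 rolls, takes the mean of the $k$ dice on each roll, and compares the variance of those means with the theoretical value $\sigma^2/k$, where $\sigma^2 = 35/12$ for a single fair die.

```python
import random
import statistics


def dice_means(k, rolls=10000, seed=1):
    """Simulate `rolls` throws of k fair dice and return the mean of each throw."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.randint(1, 6) for _ in range(k))
        for _ in range(rolls)
    ]


SINGLE_DIE_VARIANCE = 35 / 12  # variance of one fair six-sided die

# Print k, the simulated mean and variance of the means, and sigma^2 / k.
for k in (1, 2, 3, 5, 25):
    means = dice_means(k)
    print(
        k,
        round(statistics.fmean(means), 3),
        round(statistics.pvariance(means), 4),
        round(SINGLE_DIE_VARIANCE / k, 4),
    )
```

Plotting a histogram of `dice_means(k)` for each $k$ (with any plotting tool) should show the same flattening-then-bell-shaped progression described above.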

Examples:

1. You have designed a new satellite that is planned to orbit in space for the next 150 years. It is set up in the following way. It has 25 battery packs. One powers the satellite, and when it burns out the next battery takes over, and when that one burns out the next takes over, and so on. The lifetime of each battery follows an exponential distribution with a mean of 8 years and a standard deviation of 8 years. What is the probability that the satellite runs out of batteries before the 150 years are up?

2. A standard bottle of beer advertises that it contains 341 mL of beer. In fact, the machine that pours the beer into the bottle pours a mean amount of 343 mL with a standard deviation of 2 mL. The amount of beer poured follows a normal distribution.
   (a) What is the probability that a randomly selected bottle of beer is underfilled?
   (b) If you buy a two-four (a case of 24 bottles), what is the probability that no more than 4 bottles are underfilled?
   (c) If you buy a 6-pack, what is the probability that the average amount of liquid is less than 341 mL?

3. An elevator has a limit of 10 people or 2000 lbs. If 10 people get on the elevator, what is the probability that they surpass the limit? Suppose that the weights of people follow a normal distribution with a mean of 170 lbs and a standard deviation of 30 lbs.

4. You have developed a new type of concrete that is reinforced with steel fibers. Suppose you know that a concrete block of this type has a true mean breaking strength of 500 lbs with a standard deviation of 12 lbs. If you take a sample of 25 blocks and test their breaking strengths, what is the probability that the 25 blocks have a mean breaking strength greater than 505 lbs?

5. Suppose you will go to the casino and make 81 bets, each of $1, on the colour of the number coming up. What are the mean and variance of the gain/loss you will make from these 81 bets? What is the probability that you leave the casino having made money?
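One possible way to set up Examples 1 and 4 numerically is sketched below, using `statistics.NormalDist` from the Python standard library. This is an illustrative setup under the CLT approximation, not an official solution: the variable names are made up for this sketch, and because the exponential distribution is quite skewed, the answer to Example 1 with $n = 25$ should be read as a rough approximation.

```python
from math import sqrt
from statistics import NormalDist

# Example 1: total lifetime S of 25 batteries, each with mean 8 and sd 8 years.
# By the CLT, S is approximately N(n * mu, n * sigma^2) = N(200, 40^2).
n, mu, sigma = 25, 8.0, 8.0
total_life = NormalDist(n * mu, sqrt(n) * sigma)
p_out_before_150 = total_life.cdf(150)  # P(S < 150), about 0.106

# Example 4: mean breaking strength of 25 blocks, population mean 500, sd 12.
# By the CLT, the sample mean is approximately N(500, (12 / sqrt(25))^2).
mean_strength = NormalDist(500.0, 12.0 / sqrt(25))
p_mean_above_505 = 1 - mean_strength.cdf(505)  # P(mean > 505), about 0.019

print(round(p_out_before_150, 4), round(p_mean_above_505, 4))
```

Note that `NormalDist` takes the standard deviation, not the variance, as its second argument, so the variances $n\sigma^2$ and $\sigma^2/n$ are converted with a square root first.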

So far, we have only dealt with the CLT and continuous distributions. But what happens when we have a random sample where each $X_i$ comes from some discrete distribution? We present a few results here that follow from the CLT. Shortly, we will see that under certain conditions, we can use a normal distribution to approximate the binomial and the Poisson distributions.

Normal Approximation to the Binomial

Recall: If $X \sim BIN(n, p)$, then $\mu_X = np$ and $\sigma^2_X = np(1-p)$.

When $n$ is large and $p$ is not too close to 0 or 1, then

$$BIN(n, p) \approx N(np, np(1-p))$$

The rule of thumb for this approximation to work is that $\min\{np, n(1-p)\} \geq 10$.

Continuity Correction: Because we are approximating a discrete distribution with a continuous distribution, we must make a continuity correction. (i.e.) In the discrete case, $P(X \geq x) \neq P(X > x)$. For integer values $k$, $a$, and $b$, we rewrite the event as follows, and each right-hand side is then evaluated using the approximating normal distribution:

$$P(X = k) = P(k - 0.5 \leq X \leq k + 0.5)$$
$$P(a \leq X \leq b) = P(a - 0.5 \leq X \leq b + 0.5)$$
$$P(a < X < b) = P(a + 0.5 \leq X \leq b - 0.5)$$
$$P(X < a) = P(X \leq a - 0.5)$$
$$P(X \leq a) = P(X \leq a + 0.5)$$
$$P(X > a) = P(X \geq a + 0.5)$$
$$P(X \geq a) = P(X \geq a - 0.5)$$

Note: The continuity correction makes little difference when $n$ is large. I will explain the idea of the continuity correction in more depth during lecture, and the above should make more sense then.
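To see what the continuity correction buys, the sketch below compares an exact binomial probability with its normal approximation, with and without the correction. The values $n = 50$, $p = 0.4$, $a = 15$, $b = 25$ are arbitrary choices for illustration (they satisfy $\min\{np, n(1-p)\} \geq 10$), and the function names are hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist


def binom_prob(n, p, a, b):
    """Exact P(a <= X <= b) for X ~ BIN(n, p), summing the binomial pmf."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(a, b + 1))


def normal_approx(n, p, a, b, correction=True):
    """Normal approximation to P(a <= X <= b) using N(np, np(1-p)).

    With correction=True the interval is widened to [a - 0.5, b + 0.5].
    """
    approx = NormalDist(n * p, sqrt(n * p * (1 - p)))
    shift = 0.5 if correction else 0.0
    return approx.cdf(b + shift) - approx.cdf(a - shift)


n, p, a, b = 50, 0.4, 15, 25  # illustrative values only
exact = binom_prob(n, p, a, b)
with_cc = normal_approx(n, p, a, b, correction=True)
without_cc = normal_approx(n, p, a, b, correction=False)
print(round(exact, 4), round(with_cc, 4), round(without_cc, 4))
```

Without the correction, the approximation leaves out half of the probability bar at each endpoint, which is why the uncorrected value comes out too small for an interval of the form $a \leq X \leq b$.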

Examples:

1. Consider rolling a die 150 times. What is the probability that you get...
   (a) Exactly 23 sixes?
   (b) Between 15 and 35 sixes?
   (c) More than 3 sixes?

2. You go to the casino and make 81 bets, each of $1, on the colour of the number coming up. What is the probability that you leave the casino having made money?

3. It is believed that 4% of children have a gene that may be linked to juvenile diabetes. Researchers are hoping to track 20 or more of these children (with the defect) for several years. They will test 732 newborn babies for the presence of this gene, and if the gene is present, they will track the child for several years. What is the probability that they find 20 or more subjects to be in the study?

4. Suppose you will toss a coin 100 times. Create an interval that you are 95% sure the number of heads tossed will be in. Center this interval around the expected number of heads.
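Examples 3 and 4 can be set up along the following lines. This is a hedged sketch rather than a worked solution from the course: the variable names are invented for illustration, Example 4 assumes a fair coin (the notes do not say so explicitly), and the exact binomial sum is included only as a cross-check on the normal approximation.

```python
from math import comb, sqrt
from statistics import NormalDist

# Example 3: X ~ BIN(732, 0.04); we want P(X >= 20).
n, p = 732, 0.04
approx_dist = NormalDist(n * p, sqrt(n * p * (1 - p)))
# Continuity correction: P(X >= 20) becomes P(X >= 19.5) under the normal curve.
p_approx = 1 - approx_dist.cdf(19.5)
# Exact value for comparison: 1 - P(X <= 19).
p_exact = 1 - sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(20))

# Example 4: heads in 100 tosses, assuming a fair coin, so X ~ BIN(100, 0.5).
heads = NormalDist(100 * 0.5, sqrt(100 * 0.5 * 0.5))
z = NormalDist().inv_cdf(0.975)  # about 1.96 for a central 95% interval
lower, upper = heads.mean - z * heads.stdev, heads.mean + z * heads.stdev

print(round(p_approx, 4), round(p_exact, 4), round(lower, 1), round(upper, 1))
```

The interval in Example 4 comes out to roughly 40 to 60 heads; whether to round the endpoints inward or outward, and whether to apply the continuity correction to them, is a judgment call that the lecture may settle differently.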

Normal Approximation to the Poisson

Recall: If $X \sim POISSON(\lambda)$, then $\mu_X = \lambda$ and $\sigma^2_X = \lambda$.

When $\lambda$ is large, then

$$POISSON(\lambda) \approx N(\lambda, \lambda)$$

The rule of thumb for this approximation to work is that $\lambda \geq 20$.

As in the binomial case, we are approximating a discrete distribution with a continuous distribution, so we must use the same continuity correction as in the binomial case.

Examples:

1. Recall example (1) from Chapter 6, in the section on Poisson processes. We were monitoring the number of earthquakes in California over 6.7, and there was an average of 1.5 per year.
   (a) What is the probability of having more than 14 large earthquakes in the next 15 years?
   (b) What is the probability of having exactly 1 large earthquake in the coming year?

2. A factory produces sheet metal. Impurities occur at a rate of 0.1 per square meter and are equally likely to occur anywhere on a given sheet. Occurrences of impurities are independent of one another. If you buy 500 square meters of this sheet metal, what is the probability that there are more than 40 impurities on the sheet?
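As a rough illustration of when the approximation applies, the sketch below sets up Example 1. In part (a) the 15-year rate is $\lambda = 1.5 \times 15 = 22.5 \geq 20$, so the normal approximation with continuity correction is reasonable; in part (b) $\lambda = 1.5$ is far below the rule of thumb, so the exact Poisson pmf is used instead. The helper `poisson_pmf` and the variable names are assumptions made for this sketch.

```python
from math import exp, factorial, sqrt
from statistics import NormalDist


def poisson_pmf(k, lam):
    """Exact P(X = k) for X ~ POISSON(lam)."""
    return exp(-lam) * lam**k / factorial(k)


# Part (a): number of large earthquakes in 15 years, lambda = 1.5 * 15 = 22.5.
lam_15_years = 1.5 * 15
normal_15 = NormalDist(lam_15_years, sqrt(lam_15_years))
# Continuity correction: P(X > 14) becomes P(X >= 14.5) under the normal curve.
approx_more_than_14 = 1 - normal_15.cdf(14.5)
# Exact value for comparison: 1 - P(X <= 14).
exact_more_than_14 = 1 - sum(poisson_pmf(k, lam_15_years) for k in range(15))

# Part (b): one year, lambda = 1.5 is too small for the normal approximation,
# so compute P(X = 1) = 1.5 * e^(-1.5) directly, about 0.335.
p_exactly_one = poisson_pmf(1, 1.5)

print(round(approx_more_than_14, 4), round(exact_more_than_14, 4),
      round(p_exactly_one, 4))
```

The same pattern applies to Example 2, where the rate for 500 square meters is $\lambda = 0.1 \times 500 = 50$.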