Chapter 7 Sampling Distributions and Point Estimation of Parameters

Part 1: Sampling Distributions, the Central Limit Theorem, Point Estimation & Estimators
Sections 7-1 to 7-2

Statistical Inferences

A random sample is collected from a population to draw conclusions, or make statistical inferences, about the population.

Definition (Random Sample)
The random variables $X_1, X_2, \ldots, X_n$ are a random sample of size $n$ if:
1) the $X_i$'s are independent
2) every $X_i$ has the same probability distribution

Types of statistical inference:
1. Parameter estimation (e.g. estimating $\mu$) with a confidence interval
   For estimating $\mu$, we collect data, use the observed sample mean $\bar{x}$ as a point estimate for $\mu$, and create a confidence interval to report a likely range in which $\mu$ lies.
2. Hypothesis testing about a population parameter (e.g. $H_0: \mu = 50$)
   We wish to compare the mean time that women and men spend at the CRWC. Is $H_0: \mu_M = \mu_W$ plausible? Or perhaps there is evidence against this hypothesis.

Sample Mean $\bar{X}$, a Point Estimate for $\mu$

The sample mean $\bar{X}$ is used as a point estimate for the population parameter $\mu$. It is a point estimate because it is a single value.

NOTATION: $\hat{\mu} = \bar{X}$ (a hat over a parameter represents an estimator). $\bar{X}$ is the estimator here.

Prior to data collection, $\bar{X}$ is a random variable, and it is the statistic of interest calculated from the data when estimating $\mu$. The value we get for $\bar{X}$ (the sample mean) depends on the specific sample chosen!

If $\bar{X}$ is a random variable, then it has a certain expected value, variance, and distribution. The distribution of the random variable $\bar{X}$ is called the sampling distribution of $\bar{X}$.

Sample-to-Sample Variability

As stated earlier, there is randomness in the $\bar{X}$ value we get from a random sample. Suppose I want to estimate a population mean height $\mu$ using a sample mean $\bar{X}$.

1. I randomly select 50 individuals from a population, measure their heights, and find the sample mean $\bar{x}$ = 5 foot 6 inches.
2. I repeat the process: I again randomly select 50 individuals from the population, measure their heights, and find the sample mean $\bar{x}$ = 5 foot 8 inches.
3. I repeat the process once more: I again randomly select 50 individuals from the population, measure their heights, and find the sample mean $\bar{x}$ = 5 foot 5 inches.

I didn't do anything wrong in my data collection; this is just SAMPLING VARIABILITY!

[NOTE: In reality, we only take one sample. The above is meant to emphasize the existence of sample-to-sample variability.]

The Sampling Distribution of $\bar{X}$

Definition (Sampling Distribution)
The probability distribution of a statistic is called a sampling distribution.

$\bar{X}$ is a statistic calculated from a random sample $X_1, X_2, \ldots, X_n$. $\bar{X}$ is a linear combination of random variables:

$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{1}{n} X_1 + \frac{1}{n} X_2 + \cdots + \frac{1}{n} X_n$

For a random sample $X_1, X_2, \ldots, X_n$ drawn from any distribution with $E(X_i) = \mu$ and $V(X_i) = \sigma^2$, or $X_i \sim\ ?(\mu, \sigma^2)$, we have

$E(\bar{X}) = \mu$ and $V(\bar{X}) = \frac{\sigma^2}{n}$

But a mean and a variance do not fully specify a distribution. Do we know the probability distribution of $\bar{X}$?...
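The mean and variance of the sample mean stated above follow from the rules for linear combinations of independent random variables. The short derivation below is a sketch under the slide's assumptions (independent $X_i$ with common mean $\mu$ and variance $\sigma^2$). It is not taken from the slides themselves.

```latex
\begin{align*}
E(\bar{X}) &= E\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
            = \frac{1}{n}\sum_{i=1}^{n} E(X_i)
            = \frac{1}{n}\,(n\mu) = \mu, \\
V(\bar{X}) &= V\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
            = \frac{1}{n^2}\sum_{i=1}^{n} V(X_i)
            \quad \text{(independence: no covariance terms)} \\
           &= \frac{1}{n^2}\,(n\sigma^2) = \frac{\sigma^2}{n}.
\end{align*}
```

The first line uses only linearity of expectation. The second line is where independence of the $X_i$ is needed.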

The Sampling Distribution of $\bar{X}$

It turns out that $\bar{X}$ has some predictable behavior...

If the $X_1, X_2, \ldots, X_n$ are drawn from a normal distribution, or by notation $X_i \sim N(\mu, \sigma^2)$ for all $i$, then

$\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right)$ for any sample size $n$.

Example
Suppose IQ scores are normally distributed with mean $\mu = 100$ and variance $\sigma^2 = 256$. If $n = 9$ IQ scores are drawn at random from this population, what is the probability that the sample mean is less than 93?

ANSWER: Find $P(\bar{X} < 93)$ (next page).

The Sampling Distribution of $\bar{X}$

Example
Suppose IQ scores are normally distributed with mean $\mu = 100$ and variance $\sigma^2 = 256$. If $n = 9$ IQ scores are drawn at random from this population, what is the probability that the sample mean is less than 93?

ANSWER: Find $P(\bar{X} < 93)$. We first need a distribution for $\bar{X}$ (it follows a normal distribution!), and then we'll use it to create a $Z$ random variable and use the Z-table.

The Sampling Distribution of $\bar{X}$

Notation:

$E(\bar{X}) = \mu_{\bar{X}} = E(X) = \mu$

$V(\bar{X}) = \sigma^2_{\bar{X}} = \frac{V(X)}{n} = \frac{\sigma^2}{n}$

Terminology:

The term standard deviation refers to the population standard deviation, or $\sqrt{V(X)} = \sigma$, and...

$Z = \frac{X - \mu}{\sigma}$

The term standard error is a value related to $\bar{X}$. It is more fully stated as the standard error of the sample mean, and it is the square root of the variance of $\bar{X}$.

Std. Error of $\bar{X}$ is $\sqrt{V(\bar{X})} = \sqrt{\frac{\sigma^2}{n}} = \frac{\sigma}{\sqrt{n}}$

And then...

$Z = \frac{\bar{X} - \mu}{\sqrt{\sigma^2 / n}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$
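To connect the standard error and the Z formula above to the IQ example, here is a minimal Python sketch that standardizes the sample mean and evaluates the standard normal CDF in place of a Z-table lookup. The helper name `prob_sample_mean_below` is hypothetical and chosen for illustration. It assumes a normal parent population, so that $\bar{X} \sim N(\mu, \sigma^2/n)$ holds exactly.

```python
from math import sqrt
from statistics import NormalDist


def prob_sample_mean_below(threshold, mu, sigma, n):
    """Return P(X-bar < threshold), assuming X-bar ~ N(mu, sigma^2 / n)."""
    standard_error = sigma / sqrt(n)        # sigma / sqrt(n)
    z = (threshold - mu) / standard_error   # Z = (x-bar - mu) / (sigma / sqrt(n))
    return NormalDist().cdf(z)              # standard normal CDF replaces the Z-table


# IQ example: mu = 100, sigma^2 = 256 (so sigma = 16), n = 9
p_iq = prob_sample_mean_below(93, mu=100, sigma=sqrt(256), n=9)
print(f"z = {(93 - 100) / (16 / 3):.4f}, P(X-bar < 93) = {p_iq:.4f}")
```

Here the standard error is 16/3, the z-value is about -1.31, and the resulting probability is roughly 0.095, which should agree with a Z-table lookup up to rounding.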

The Sampling Distribution of $\bar{X}$

Even when the $X_i$ are NOT drawn from a normal distribution, it turns out that $\bar{X}$ has some predictable behavior...

If the $X_1, X_2, \ldots, X_n$ were NOT drawn from a normal distribution, or by notation $X_i \sim\ ?(\mu, \sigma^2)$ for all $i$, then $\bar{X}$ is approximately normally distributed as long as $n$ is large enough, or

$\bar{X} \overset{\text{approx.}}{\sim} N\!\left(\mu, \frac{\sigma^2}{n}\right)$ for $n > 25$ or 30.

Thus, $\bar{X}$ follows an approximately normal distribution (for a sufficiently large $n$)!

This is an incredibly useful result for calculating probabilities for $\bar{X}$!!

The Sampling Distribution of $\bar{X}$

Example (Probability for $\bar{X}$, Flaws in a copper wire)
Let $X$ denote the number of flaws in a 1 inch length of copper wire. The probability mass function of $X$ is presented in the following table:

  x    P(X = x)
  0    0.48
  1    0.39
  2    0.12
  3    0.01

Suppose $n = 100$ wires are sampled from this population. What is the probability that the average number of flaws per wire in the sample is less than 0.5? (i.e. find $P(\bar{X} < 0.5)$... next page)

The Sampling Distribution of $\bar{X}$

Example (Probability for $\bar{X}$, Flaws in a copper wire)

ANSWER: $P(\bar{X} < 0.5) =$
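The slide leaves the answer blank. One way to work it, sketched below in Python, is to compute $\mu$ and $\sigma^2$ from the probability mass function and then apply the normal approximation for $\bar{X}$ with $n = 100$. This is an illustrative calculation, not the instructor's worked solution. It assumes the plain CLT approximation with no continuity correction, and the variable names are made up for the example.

```python
from math import sqrt
from statistics import NormalDist

# Probability mass function of X = number of flaws in a 1 inch length of wire
pmf = {0: 0.48, 1: 0.39, 2: 0.12, 3: 0.01}

# Population mean and variance from the pmf
mu = sum(x * p for x, p in pmf.items())               # E(X)
var = sum((x - mu) ** 2 * p for x, p in pmf.items())  # V(X)

n = 100
standard_error = sqrt(var / n)                        # sigma / sqrt(n)

# CLT: X-bar is approximately N(mu, var / n) for large n
p_flaws = NormalDist(mu, standard_error).cdf(0.5)

print(f"mu = {mu:.2f}, sigma^2 = {var:.4f}, SE = {standard_error:.4f}")
print(f"P(X-bar < 0.5) is approximately {p_flaws:.4f}")
```

With these inputs $\mu = 0.66$ and $\sigma^2 = 0.5244$, so the z-value is about -2.21 and the approximate probability is a little over 0.01.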

Central Limit Theorem (CLT)

Definition (Central Limit Theorem)
Let $X_1, X_2, \ldots, X_n$ be a random sample drawn from any population (or distribution) with mean $\mu$ and variance $\sigma^2$. If the sample size is *sufficiently large*, then $\bar{X}$ follows an approximate normal distribution.

We write: $\bar{X} \xrightarrow{d} N\!\left(\mu, \frac{\sigma^2}{n}\right)$ as $n \to \infty$

Or: $Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \xrightarrow{d} N(0, 1)$ as $n \to \infty$

If the random sample is drawn from a non-normal population, then $\bar{X}$ is approximately normal for sufficiently large $n$ (at least 25 or 30), and the approximation gets better and better as $n$ increases.

NOTE: If the original parent population from which the sample was drawn is normal, then $\bar{X}$ follows a normal distribution for any $n$ (a linear combination of normals is normal), and the CLT is not needed to achieve normality.

The Sampling Distribution of $\bar{X}$ (simulation)

Let's simulate this situation...

Case 1: Original population is normally distributed
1. Choose a sample of size $n$ from a normal distribution
2. Compute $\bar{x}$
3. Plot the $\bar{x}$ on our frequency histogram
4. Do steps 1-3 many times, such as 1000 times
5. Draw a histogram of the 1000 $\bar{x}$ values (to see the sampling distribution of $\bar{X}$)

See applet at: http://onlinestatbook.com/stat_sim/sampling_dist/index.html

The Sampling Distribution of $\bar{X}$ (simulation)

Case 1: Original population is normally distributed (with n=2)

The empirical distribution for $\bar{X}_{n=2}$ is in the lower plot (in blue). Its mean is very close to the parent population mean $\mu = 16$, and its standard error of 3.59 is very close to the theoretical $\sigma / \sqrt{n} = 5 / \sqrt{2} = 3.54$.

The Sampling Distribution of $\bar{X}$ (simulation)

Case 1: Original population is normally distributed (with n=25)

The empirical distribution for $\bar{X}_{n=25}$ is in the lower plot (in blue). Its mean is very close to the parent population mean $\mu = 16$, and its standard error of 1.0 is the same as the theoretical $\sigma / \sqrt{n} = 5 / \sqrt{25} = 1$.
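The Case 1 simulation can also be reproduced without the applet. The following is a rough Python sketch that repeats the sampling step 1000 times for n = 2 and n = 25 and compares the empirical standard error of the sample means with $\sigma / \sqrt{n}$. It uses the parent values $\mu = 16$ and $\sigma = 5$ quoted above. The function name, seed, and use of Python's `random` module are choices made for this illustration, and the histogram step is left out.

```python
import random
from math import sqrt
from statistics import mean, stdev

MU, SIGMA = 16, 5  # parent population mean and standard deviation from the slides


def simulate_sample_means(n, reps=1000, seed=0):
    """Draw `reps` samples of size n from N(MU, SIGMA^2); return their sample means."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    return [mean(rng.gauss(MU, SIGMA) for _ in range(n)) for _ in range(reps)]


for n in (2, 25):
    xbars = simulate_sample_means(n)
    print(f"n={n:2d}  mean of x-bars={mean(xbars):.2f}  "
          f"empirical SE={stdev(xbars):.2f}  sigma/sqrt(n)={SIGMA / sqrt(n):.2f}")
```

The empirical standard errors will differ slightly from run to run (or seed to seed), just as the applet's 3.59 differs slightly from the theoretical 3.54.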

The Sampling Distribution of $\bar{X}$ (simulation)

RESULT - If the parent population (the one you are drawing from) is normal, then $\bar{X}$ will follow a normal distribution for any sample size $n$, with known mean and variance as shown below.

$\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right)$

The Sampling Distribution of $\bar{X}$ (simulation)

Let's simulate this situation...

Case 2: Original population is NOT normally distributed...
1. Choose a sample of size $n$ from a NON-normal distribution
2. Compute $\bar{x}$
3. Plot the $\bar{x}$ on our frequency histogram
4. Do steps 1-3 many times, such as 1000 times
5. Draw a histogram of the 1000 $\bar{x}$ values (to see the sampling distribution of $\bar{X}$)

See applet at: http://onlinestatbook.com/stat_sim/sampling_dist/index.html

The Sampling Distribution of $\bar{X}$ (simulation)

Case 2: Original population is NOT normally distributed (with right-skewed parent population and n=10)

The empirical distribution for $\bar{X}_{n=10}$ is in the lower plot (in blue). It is bell-shaped with a mean equal to the parent population mean $\mu = 8.08$. Its standard error of 1.96 is very close to the theoretical $\sigma / \sqrt{n} = 6.22 / \sqrt{10} = 1.97$.

The Sampling Distribution of $\bar{X}$ (simulation)

Case 2: Original population is NOT normally distributed (with very non-normal parent population and n=2)

FAIL!!!! The empirical distribution for $\bar{X}_{n=2}$ is in the lower plot (in blue), and it is not normally distributed. The sample size is just too small to overcome the very non-normal parent population.

The Sampling Distribution of $\bar{X}$ (simulation)

Case 2: Original population is NOT normally distributed (with very non-normal parent population and n=25)

The empirical distribution for $\bar{X}_{n=25}$ is in the lower plot (in blue). It is bell-shaped with a mean close to the parent population mean $\mu = 16.92$. Its standard error of 2.46 is very close to the theoretical $\sigma / \sqrt{n} = 12.29 / \sqrt{25} = 2.458$.

The Sampling Distribution of $\bar{X}$ (simulation)

RESULT - If the parent population (the one you are drawing from) is NOT normal, then $\bar{X}$ will follow an approximate normal distribution for sufficiently large $n$ (we'll say $n > 25$ or 30).

$\bar{X} \overset{\text{approx.}}{\sim} N\!\left(\mu, \frac{\sigma^2}{n}\right)$

This is the Central Limit Theorem. The approximation improves as $n$ increases.
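To see the Case 2 behavior numerically rather than by eye, the sketch below draws sample means from a right-skewed parent and reports their skewness, which is 0 for a perfectly symmetric distribution such as the normal. The exponential parent with rate 1 is an assumption made for this illustration, since the slides do not say which non-normal population the applet used. The function names are also hypothetical.

```python
import random
from math import sqrt
from statistics import mean, pstdev


def exp_sample_means(n, reps=1000, seed=0):
    """Sample means of `reps` samples of size n from an Exponential(1) parent."""
    rng = random.Random(seed)
    return [mean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]


def skewness(values):
    """Average cubed z-score; close to 0 when the values are symmetric."""
    m, s = mean(values), pstdev(values)
    return mean(((v - m) / s) ** 3 for v in values)


for n in (2, 30):
    xbars = exp_sample_means(n)
    print(f"n={n:2d}  skewness of x-bars={skewness(xbars):.2f}  "
          f"empirical SE={pstdev(xbars):.3f}  sigma/sqrt(n)={1 / sqrt(n):.3f}")
```

For this particular parent the theoretical skewness of $\bar{X}$ is $2/\sqrt{n}$, so the sample means should look clearly right-skewed at n = 2 and much closer to symmetric at n = 30, in line with the RESULT above.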

The Sampling Distribution of $\bar{X}$

A couple of comments:

- Averages are less variable than individual observations. The distribution for $\bar{X}$ has less variability than the distribution for $X$.
- The distribution of our estimator $\bar{X}_n$ is squeezed closer to, or is tighter around, the thing we're trying to estimate as $n$ increases.
- For some non-normal distributions, the approximation is pretty good for $n$ lower than 25 or 30, so it depends on the parent population from which we are drawing.

The Sampling Distribution of $\bar{X}$

The next graphic shows 3 different original populations (one nearly normal, two that are not), and the sampling distribution for $\bar{X}$ based on a sample of size $n = 5$ and size $n = 30$.

The three original distributions are on the far left (one that is nearly symmetric and bell-shaped, one that is right skewed, and one that is highly right skewed).

The graphic emphasizes the concept that the normal approximation becomes better as $n$ increases.

The Sampling Distribution of $\bar{X}$

As shown in: Navidi, W. Statistics for Engineers and Scientists, McGraw Hill, 2006

The Sampling Distribution of $\bar{X}$

- The variability of $\bar{X}$ decreases as $n$ increases. Recall: $V(\bar{X}) = \frac{\sigma^2}{n}$.
- If the original population has a shape that's closer to normal, a smaller $n$ is sufficient for $\bar{X}$ to be approximately normal.
- The normal approximation gets better with larger $n$ when you're starting with a non-normal population.
- Even when $X$ has a very non-normal distribution, $\bar{X}$ still has an approximately normal distribution with a large enough $n$.