Data Analysis and Statistical Methods Statistics 651


Data Analysis and Statistical Methods, Statistics 651
http://www.stat.tamu.edu/~suhasini/teaching.html
Suhasini Subba Rao

Review of previous lecture: why confidence intervals?

Suppose you want to know the mean height of people at A&M. You take as your sample all the people in this class and evaluate their average height (the sample mean). Suppose the sample mean based on this class is 5.5 feet. What does 5.5 feet tell you about the mean height in the university? It is highly unlikely that the population mean is exactly 5.5 feet. A more informative piece of information than the average alone is an interval in which, with a certain degree of confidence, the mean should lie. This is known as a confidence interval (CI). The width of the interval tells us how accurate the estimate 5.5 may be.

Without any assumptions on the distribution of the estimator, however, we cannot construct such an interval. We show today that it is often reasonable to assume normality.

Example I

Suppose we have an estimator of the mean, which we call X̄. This estimator is a random variable, which, for now, we assume is normally distributed: X̄ ~ N(µ, σ²). Suppose X̄ = 3. On its own this does not tell us much about the location of the true mean µ. But:

(i) [3 − 1.96σ, 3 + 1.96σ] tells us, with 95% confidence, that the unknown mean µ lies in this interval.
(ii) [3 − 1.64σ, 3 + 1.64σ] tells us, with 90% confidence, that the mean µ lies in this interval.
(iii) [3 − 2.58σ, 3 + 2.58σ] tells us, with 99% confidence, that the mean µ lies in this interval.

The smaller the confidence level (95% is smaller than 99%), the narrower the interval. Conversely, the more confidence we require, the wider the interval needs to be. Hence, as Example I illustrates, there is a trade-off between pinpointing the location of the mean and how much confidence we have in the interval.
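The three intervals in Example I can be reproduced numerically. This is an illustrative sketch using only Python's standard library; the values X̄ = 3 and σ = 1 are just the numbers from the example (σ = 1 is an assumed choice, since Example I leaves σ general).

```python
from statistics import NormalDist

# Numbers from Example I: estimate xbar = 3; sigma = 1 is an assumed illustration.
xbar, sigma = 3.0, 1.0

for level in (0.90, 0.95, 0.99):
    # Two-sided critical value z: P(-z < Z < z) = level for Z ~ N(0, 1).
    z = NormalDist().inv_cdf((1 + level) / 2)
    lower, upper = xbar - z * sigma, xbar + z * sigma
    print(f"{level:.0%} CI: [{lower:.2f}, {upper:.2f}]  (z = {z:.2f})")
```

Running this reproduces the multipliers 1.64, 1.96, and 2.58 quoted above, and shows directly that the 99% interval is wider than the 90% one.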
If we want to pinpoint the mean, the interval should be narrow, but then the confidence we have in that interval will be lower. If we want more confidence that the mean lies in the interval, the interval must be wider. But a wide interval is not very informative about the location of the mean. An extreme example is the interval from minus infinity to plus infinity: the mean is definitely inside it (100% confidence), but it tells us nothing about the location of the true mean!

Example II

Consider the example above.

(i) If the standard deviation is σ = 1, then the 95% CI is

[3 − 1.96, 3 + 1.96].

(ii) If the standard deviation is σ = 100, then the 95% CI is [3 − 196, 3 + 196].

The larger the variance, the wider the CI: when the variance is large we need a wider interval to ensure that it includes the unknown mean µ. Note that the interval [X̄ − 1.96σ, X̄ + 1.96σ] tells us that, over every 100 draws of the random variable X̄, the mean µ should lie in the resulting interval approximately 95 times.

Elections on planet Frog

In the local newspaper I found this: "A poll of likely voters put candidate Smith in second place with 21%..." Below the article I found this: "The telephone survey of 500 likely voters was conducted by Ramissen Reports. The margin of error of sampling for the survey is +/− 4.5 percentage points at the midpoint with 95% confidence." What is the survey saying? What does +/− 4.5% tell us?

Interpretation of the poll report

The proportion of people in the population who will vote for Smith is unknown; call it π. We use the sample to estimate it. The figure 21% is calculated by dividing the number of people who said they will vote for Smith by the total number of people interviewed; it is used as an estimator of the population proportion π. Of course the population proportion does not equal 21% exactly, but the true proportion may lie in some interval about 21%. The newspaper is saying that, with 95% confidence, the proportion lies in the interval [21 − 4.5, 21 + 4.5] = [16.5, 25.5].

This interval is calculated by assuming the sample proportion is approximately normally distributed; it can be shown that this proportion is indeed close to normal. In the telephone poll, each person was asked whether they would vote for Smith. The outcome is either yes (indicated by a one) or no (indicated by a zero). Therefore, the response of a person is a binary random variable Xᵢ.
The probability of a yes is the true population proportion π (the number of people in the population who will vote yes, divided by the total number of people in the population), which we are trying to estimate. Remember that P(Xᵢ = 1) = π. The number of people, out of the 500 sampled in the telephone survey, who say they will vote for Smith is S₅₀₀ = X₁ + X₂ + ... + X₅₀₀; this is a binomial random variable with S₅₀₀ ~ Bin(500, π).
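The poll's margin of error can be checked with a short sketch. Note the article does not say how its ±4.5 figure was computed; using the worst-case proportion p = 1/2 for the reported margin is an assumption on our part, though it is a common polling convention.

```python
import math
from statistics import NormalDist

n, phat = 500, 0.21                # poll numbers from the article
z = NormalDist().inv_cdf(0.975)    # about 1.96 for 95% confidence

# CI centred at the observed proportion, using SE = sqrt(p(1-p)/n).
se = math.sqrt(phat * (1 - phat) / n)
print(f"95% CI for pi: [{phat - z*se:.3f}, {phat + z*se:.3f}]")

# Worst-case margin of error (p = 1/2 maximises p(1-p)); gives about
# 4.4 points, close to the +/- 4.5 quoted in the article.
worst = z * math.sqrt(0.25 / n)
print(f"worst-case margin of error: +/- {100*worst:.1f} points")
```

The interval based on p̂ = 0.21 itself is slightly narrower than the quoted ±4.5, which is consistent with the article reporting a single conservative margin for the whole poll.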

Therefore the estimator of π is the proportion estimator π̂ = S₅₀₀/500.

The sample mean

Remember that we showed in lecture 7 that, under certain conditions, Sₙ (the number of people who said they would vote for Smith in the telephone poll) is close to normally distributed. Hence the proportion π̂ = Sₙ/n is close to normal, which means that it is okay to assume normality of the proportion estimator, and the CI makes sense. This argument is true for any average, not just a proportion; we now show why.

In statistics it is often of interest to estimate the population mean, but we usually only have a sample from the population. We can evaluate the sample mean. This raises several questions: How close is the sample mean to the population mean? How good an estimator is the sample mean? The confidence interval gives us information about the location of the population mean and about the estimator's accuracy.

The sample mean is random too

The sample X₁, ..., Xₙ consists of random variables with mean µ and variance σ². The sample mean, X̄ = (1/n) Σᵢ₌₁ⁿ Xᵢ, is also a random variable, and it is an estimator of the population mean µ. Generally it will never hit the population mean spot on, but an interval constructed about the sample mean may contain the population mean. Every time I draw a sample it will be different, so each time I calculate the sample mean I get a different value. Since X₁, ..., Xₙ are random variables with a distribution, the sample mean is also a random variable with a distribution. This raises several questions:

- What is the standard deviation of this estimator (this is the standard error)?
- How does the standard error of the estimator relate to the standard deviation of the original distribution?
- What is the distribution of the sample mean (i.e. if I drew all possible size-n samples from the population and made a density plot of all the resulting sample means, what would it look like)?
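The point that every sample produces a different sample mean is easy to see by simulation. A minimal sketch, assuming purely for illustration that heights follow N(5.5, 0.3²):

```python
import random
import statistics

random.seed(2)
n = 10          # size of each sample
means = []

# Each fresh draw of a size-n sample gives a different sample mean.
for trial in range(3):
    sample = [random.gauss(5.5, 0.3) for _ in range(n)]  # assumed height model
    means.append(statistics.fmean(sample))
    print(f"sample {trial + 1}: xbar = {means[-1]:.3f}")
```

All three printed means differ, yet each is an estimate of the same population mean 5.5: the sample mean is itself a random variable.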
The variance of the estimator is σ²/n, so the standard error is σ/√n (observe how it relates to the variance of the original distribution). In fact, if n is large, it can be shown that the distribution of the sample mean is approximately normal (this is the central limit theorem). Armed with these two facts we can construct confidence intervals.
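The σ/√n formula can be checked empirically by drawing many samples and computing the standard deviation of their means. A sketch with assumed illustrative values σ = 2 and n = 25:

```python
import random
import statistics

random.seed(0)
sigma, n, reps = 2.0, 25, 10_000   # assumed illustrative values

# Draw many size-n samples and record each sample mean.
means = [statistics.fmean(random.gauss(0, sigma) for _ in range(n))
         for _ in range(reps)]

print(f"empirical standard error:   {statistics.stdev(means):.3f}")
print(f"theoretical sigma/sqrt(n):  {sigma / n**0.5:.3f}")  # 2/5 = 0.400
```

The empirical standard deviation of the 10,000 sample means lands very close to the theoretical value 0.4.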

The central limit theorem

Suppose X₁, ..., Xₙ is a sample from a population with mean µ and variance σ². If the sample size n is large, then the sample mean X̄ = (1/n) Σᵢ₌₁ⁿ Xᵢ approximately has the distribution X̄ ~ N(µ, σ²/n).

This means that if the random variable Xᵢ has mean µ and variance σ² (that is, the height of a randomly chosen person has mean µ and variance σ²), then the average of a sample of n individuals, X̄, has mean µ and variance σ²/n. Observe how the variance goes from σ² (the variance of an individual) to σ²/n (the variance of the average of n people). Look at CLT lecture12.pdf pages 1-3. Look at http://onlinestatbook.com/stat_sim/sampling_dist/index.htm to see how the distribution of the sample mean becomes more normal as we increase the sample size, regardless of whether the underlying population distribution is normal or not.

How large is large?

How large is "large" is a difficult question, and it varies from data set to data set, but a rule of thumb is that about 30 observations suffice. Also notice how the standard deviation (the spread of the histogram) gets narrower as you increase the sample size from 5 to 25. If the data are highly non-normal (you can check this by making a QQ plot), more observations are required for the sample mean to be close to normal.

Go to http://onlinestatbook.com/stat_sim/sampling_dist/index.htm and select your underlying population distribution; it can be normal, or highly non-normal, such as a skewed or a uniform distribution. Choose the sample size from 5 to 25 (this is the number of observations in each sample), select 1000 samples, and plot the histograms of the sample means. You will see that for sample size 25 the histograms of the sample means all look quite normal, but for smaller sample sizes they look less normal.
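The same experiment as the online demonstration can be run in a few lines of code. This sketch uses an exponential population (highly skewed, mean 1, standard deviation 1) as an assumed illustration and compares sample sizes 5 and 25:

```python
import random
import statistics

random.seed(1)
reps = 5_000   # number of samples drawn for each sample size

# Highly skewed population: Exponential(1), which has mean 1 and sd 1.
for n in (5, 25):
    means = [statistics.fmean(random.expovariate(1.0) for _ in range(n))
             for _ in range(reps)]
    print(f"n={n:>2}: mean of sample means = {statistics.fmean(means):.3f}, "
          f"sd = {statistics.stdev(means):.3f}  (CLT predicts sd = {1/n**0.5:.3f})")
```

In both cases the sample means centre on the population mean 1, and the spread shrinks from roughly 1/√5 to 1/√25 as n grows; a histogram of the n = 25 means would look close to a normal curve despite the skewed population.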

The larger the sample, the smaller the variance

To recap: suppose Xᵢ is a random variable with mean µ and variance σ², and we have a sample X₁, ..., Xₙ. Then X̄ is a random variable, and if the sample size n is large, X̄ ~ N(µ, σ²/n) approximately. What else do we notice about the variance? As the sample size n gets larger, the variance gets smaller. This is what we would expect: the larger the sample size, the more accurate the sample mean is as an estimator of the population mean.

Calculating standard errors

Suppose X₁, ..., Xₙ is a random sample with mean µ and variance 6.

- What is a suitable estimator of µ?
- What is the variance of the estimator when n = 5? When n = 25? When n = 60?
- What is the standard error of the estimator when n = 5? When n = 25? When n = 60?

Can we construct confidence intervals for the mean?

We have the random sample X₁, X₂, ..., Xₙ. We do not know the distribution it was drawn from, but we do know that the variance is σ² = 4; the mean µ is unknown. What is a suitable estimator of µ? Is it possible to construct a 95% CI for µ? Look at CLT lecture12.pdf pages 4-5.

Example

Suppose we observe the random sample X₁, X₂, ..., Xₙ, and the sample mean is X̄ = 6. We know that the population variance is 4. Under what assumptions can we construct a 95% CI for the population mean µ? If these assumptions are satisfied, construct a 95% CI.
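The arithmetic behind the standard-error exercise and the final example can be sketched as follows. Note that the example above leaves n unspecified, so the value n = 30 below is an assumed, purely illustrative sample size (chosen to match the rule of thumb for the CLT):

```python
import math

# Exercise: a random sample with variance 6; the estimator of mu is the
# sample mean, with Var(xbar) = sigma^2 / n and SE = sigma / sqrt(n).
for n in (5, 25, 60):
    var = 6 / n
    print(f"n={n:>2}: Var(xbar) = {var:.3f}, standard error = {math.sqrt(var):.3f}")

# Example: xbar = 6, known sigma^2 = 4. Assuming n is large enough for the
# CLT to apply (n = 30 here is an assumed value), a 95% CI is
# xbar +/- 1.96 * sigma / sqrt(n).
xbar, sigma, n = 6.0, 2.0, 30
half_width = 1.96 * sigma / math.sqrt(n)
print(f"95% CI: [{xbar - half_width:.2f}, {xbar + half_width:.2f}]")
```

The key assumption in the example is exactly the CLT: we do not know the population's distribution, but for large n the sample mean is approximately N(µ, σ²/n), which is what justifies the 1.96 multiplier.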