Data Analysis and Statistical Methods Statistics 651


Lecture 13 (MWF) Designing the experiment: Margin of Error
Suhasini Subba Rao
http://www.stat.tamu.edu/~suhasini/teaching.html

Terminology: The population and sample mean
The population mean $\mu$ is the mean of the entire population. The population can be (and often is) infinite.
Suppose that $X_1, \ldots, X_n$ are numbers drawn from the population. The sample mean $\bar{X}$ is the average of $X_1, \ldots, X_n$.

Terminology: Standard deviations and errors
The standard deviation is a measure of the variation/spread of a variable (in the population). It is typically denoted by $\sigma$. See Lecture 4.
The standard error is a measure of the variation/spread of the sample mean. The standard error of the sample mean is $\sigma/\sqrt{n}$. See Lecture 12.
Usually, $\sigma$ is unknown. To get some idea of the spread, we estimate it from the sample $\{X_i\}_{i=1}^{n}$ using the formula
$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2}.$$
We call $s$ the sample standard deviation. It is an estimator of the standard deviation $\sigma$. Usually $s \neq \sigma$. Often $s < \sigma$ (especially when the sample size is not large).
Since $\sigma$ is usually unknown, the standard error $\sigma/\sqrt{n}$ is usually unknown too. Instead we estimate it using the sample standard error $s/\sqrt{n}$.

Review of previous lecture
A student did several experiments (measuring the size of some cells). Due to the variability in conditions, the numbers she got varied from experiment to experiment. Here is a summary of her data: 0.025, 0.025, 0.057, 0.064, 0.054, 0.035, 0.047, 0.059, 0.045 (see http://www.stat.tamu.edu/~suhasini/teaching651/msa.txt).
The raw data do not convey much information. One is interested in understanding a feature of the population the data came from. The mean, $\mu$, is one such feature. It informs us about the center.
The sample mean of this data set is $\bar{X} = 0.046$. It is very unlikely that the population mean is exactly equal to 0.046, so what can we do?

Without using the fact that $\bar{X}$ has a distribution there is nothing we can do. But we know that 0.046 is a number from the distribution of all sample means (of samples of this size drawn from the distribution of measurements), so 0.046 is likely to be within a few standard errors of the unknown population mean (the truth). Here "standard error" means the standard deviation of the sample mean. Equivalently, the unknown population mean is likely to be within a few standard errors of 0.046.
We can make "likely" and "few" precise if the distribution of the sample mean is normal. Then we know that with a very large confidence, indeed 99.7% (think chance, but this is not strictly correct), the population mean is within 3 standard errors of 0.046:
$$[\,0.046 - 3 \times \text{standard error},\ 0.046 + 3 \times \text{standard error}\,].$$
We are usually willing to drop the level of confidence to reduce the length of the confidence interval.
The standard error in this example is $\sigma/\sqrt{9}$, where $\sigma$ is the population standard deviation. However, the population standard deviation is unknown. But it can be estimated from the data (n = 9, so we divide by n - 1 = 8) using the formula
$$s = \sqrt{\tfrac{1}{8}\left([0.025 - 0.046]^2 + \ldots + [0.045 - 0.046]^2\right)} = 0.0145.$$
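
The numbers in this review can be checked with a short Python sketch. It uses only the nine measurements quoted above; the variable names are illustrative, and the interval below ignores the correction for replacing $\sigma$ by $s$ that is covered in the next lecture.

```python
import math
import statistics

# Cell-size measurements quoted on the slide.
cells = [0.025, 0.025, 0.057, 0.064, 0.054, 0.035, 0.047, 0.059, 0.045]

n = len(cells)
x_bar = statistics.mean(cells)   # sample mean
s = statistics.stdev(cells)      # sample standard deviation (divides by n - 1)
std_err = s / math.sqrt(n)       # estimated standard error of the sample mean

# Interval of +/- 3 estimated standard errors around the sample mean.
# Assumption: s is plugged in for sigma without any further correction.
lower, upper = x_bar - 3 * std_err, x_bar + 3 * std_err

print(f"n = {n}, mean = {x_bar:.3f}, s = {s:.4f}, standard error = {std_err:.4f}")
print(f"3-standard-error interval: [{lower:.3f}, {upper:.3f}]")
```

Note that `statistics.stdev` divides by n - 1, which matches the formula for $s$; `statistics.pstdev` would divide by n instead.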

We replace the unknown population standard deviation with the sample standard deviation. However, we need to correct for the fact that it is an estimate; we cover this in Lecture 14. Using this correction we can:
Evaluate probabilities.
Construct confidence intervals.
Do statistical tests (later on).

Margin of Error
According to a recent survey, Americans walk on average 40 miles a week with a margin of error of 2.5 miles. What does this mean in terms of confidence intervals?
40 miles is the average number of miles walked in the sample; the margin of error is the plus and minus in the confidence interval, so the interval here is [40 - 2.5, 40 + 2.5] = [37.5, 42.5] miles. In other words, for a 95% confidence interval,
$$\text{Margin of Error} = 1.96 \times \frac{\sigma}{\sqrt{n}}.$$
The smaller the margin of error, the more precisely we can pinpoint the population mean. Of course, it is worth bearing in mind that we can never be sure that our confidence interval contains the mean, which is why we attach a level (such as 95%) to the interval. In other words, we can never be sure that the population mean is within the prescribed margin of error of the sample mean.

Relationships: Sample size and MoE
We compare the 95% confidence intervals for n = 9 and n = 25:
n = 9: $[\,\bar{X} - 1.96\,\sigma/\sqrt{9},\ \bar{X} + 1.96\,\sigma/\sqrt{9}\,]$
n = 25: $[\,\bar{X} - 1.96\,\sigma/\sqrt{25},\ \bar{X} + 1.96\,\sigma/\sqrt{25}\,]$
What are the lengths of the above intervals?
For n = 9 the margin of error is $1.96\,\sigma/\sqrt{9}$.
For n = 25 the margin of error is $1.96\,\sigma/\sqrt{25}$.
Observe that the length and the margin of error do not depend on $\bar{X}$.

Example: $\bar{X} = 10.38$, $\sigma = \sqrt{33}$.
n = 9: $[\,10.38 - 1.96\sqrt{33}/\sqrt{9},\ 10.38 + 1.96\sqrt{33}/\sqrt{9}\,] = [6.63, 14.13]$, MoE = 3.75
n = 25: $[\,10.38 - 1.96\sqrt{33}/\sqrt{25},\ 10.38 + 1.96\sqrt{33}/\sqrt{25}\,] = [8.13, 12.63]$, MoE = 2.25
The second interval has a smaller margin of error. When the sample size is large, the estimator tends to be closer to the true parameter. Thus the confidence interval is narrower, since the margin of error is smaller.

Relationships: standard deviation and MoE
The variability in the population, measured by the standard deviation $\sigma$, has an impact on the reliability of an estimator and its margin of error.
Example: Suppose $\bar{X} = 10.38$ and n = 9, but the variability in the two populations is different:
$\sigma = \sqrt{33} \approx 5.7$: $[\,10.38 - 1.96\sqrt{33}/\sqrt{9},\ 10.38 + 1.96\sqrt{33}/\sqrt{9}\,] = [6.63, 14.13]$, MoE = 3.75
$\sigma = 10$: $[\,10.38 - 1.96 \times 10/\sqrt{9},\ 10.38 + 1.96 \times 10/\sqrt{9}\,] = [3.85, 16.91]$, MoE = 6.53
The more variability within the population (as measured by the standard deviation), the more variability in the sample mean (as measured by the standard error). The only way we can compensate for this variability is to use a larger sample size (recall that the standard error is $\sigma/\sqrt{n}$).
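
The two relationships above (MoE against sample size, and MoE against standard deviation) can be tabulated with a minimal Python sketch. The helper names `margin_of_error` and `confidence_interval` are illustrative, and the sketch assumes the garbled value of $\sigma$ in the sample-size example is $\sqrt{33}$, which is the value that reproduces MoE = 3.75 for n = 9.

```python
import math

def margin_of_error(sigma, n, z=1.96):
    """Margin of error z * sigma / sqrt(n) when sigma is treated as known."""
    return z * sigma / math.sqrt(n)

def confidence_interval(x_bar, sigma, n, z=1.96):
    """Interval [x_bar - MoE, x_bar + MoE]; z = 1.96 gives a 95% CI."""
    moe = margin_of_error(sigma, n, z)
    return (x_bar - moe, x_bar + moe)

x_bar = 10.38
# (sigma, n) pairs: same sigma with two sample sizes, then a larger sigma.
for sigma, n in [(math.sqrt(33), 9), (math.sqrt(33), 25), (10, 9)]:
    low, high = confidence_interval(x_bar, sigma, n)
    moe = margin_of_error(sigma, n)
    print(f"sigma = {sigma:.2f}, n = {n}: [{low:.2f}, {high:.2f}], MoE = {moe:.2f}")
```

Changing `x_bar` shifts every interval but leaves each margin of error unchanged, which is the observation made on the sample-size slide.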

How large an interval to use?
You read in a newspaper that the proportion of the public that supports same-sex marriage is 55% ± 15%. This means a survey was done, the proportion in the survey who said they supported same-sex marriage was 55%, and the confidence interval for the population proportion is [55 - 15, 55 + 15]% = [40, 70]%.
This interval is so wide that it is uninformative about the majority opinion of the public.
The reason it is too wide is that the sample size is too small. This experiment was not designed well.

Typically, before the data are collected, we need to decide how large a sample to collect. This is usually done by deciding how much error above and below the estimate is acceptable. For example, an interval of the type [55 - 3, 55 + 3]% = [52, 58]% tells us that the majority appear to support same-sex marriage. The 3% is the margin of error. Given a margin of error, we can then determine the sample size to collect.

Choosing the sample size for estimating $\mu$
In an ideal world we would have a very large sample size. A large sample size gives a small standard error, which in turn gives a narrow confidence interval and a small margin of error.
However, a very large sample is often too costly or infeasible to obtain, while a sample that is too small is not informative.
How can one determine the number of observations to be included in a sample? How should we choose the sample size n?
Answer: Usually we have a margin of error in mind. We can accept the reliability of an estimator up to a certain margin of error. Once we know what margin of error is acceptable, we can choose the sample size.

Formula for choosing the sample size
To choose the sample size according to the margin of error, we need to know (or guess a priori) the standard deviation $\sigma$ (if we don't know what it is, then we err on the cautious side and use a value that seems reasonable but large).
We recall that in the confidence interval
$$\left[\,\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\,\right]$$
the margin of error is $\text{MoE} = 1.96\,\sigma/\sqrt{n}$.
Therefore, if we want to choose the sample size such that the margin of error equals a given $E$, we need to solve $E = 1.96\,\sigma/\sqrt{n}$ for n. Solving for n gives
$$n = \left(\frac{1.96\,\sigma}{E}\right)^2.$$

Example: Suppose we guess that $\sigma = \sqrt{3}$ and we want the margin of error MoE = 0.25. The confidence interval is
$$\left[\,\bar{X} - 1.96\,\frac{\sqrt{3}}{\sqrt{n}},\ \bar{X} + 1.96\,\frac{\sqrt{3}}{\sqrt{n}}\,\right]$$
and we solve $1.96\,\sqrt{3}/\sqrt{n} = 0.25$. This gives
$$n = \left(\frac{1.96\,\sqrt{3}}{0.25}\right)^2 = 184.4.$$
Of course, a larger value of n will give a smaller margin of error, so we round up and use n = 185. In other words, for this experiment we need to choose a sample size of at least 185 to be sure that the margin of error is at most 0.25.
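
The sample-size formula and the rounding-up step can be written as a small Python helper. This is a sketch, not part of the original slides: `sample_size` is an illustrative name, and it assumes the example's answer of 184.4 corresponds to $\sigma = \sqrt{3}$ (that is, $\sigma^2 = 3$), since that is the value that reproduces it.

```python
import math

def sample_size(sigma, moe, z=1.96):
    """Smallest integer n with z * sigma / sqrt(n) <= moe (sigma assumed known)."""
    return math.ceil((z * sigma / moe) ** 2)

# Assumption: sigma = sqrt(3), which reproduces the 184.4 quoted in the example.
n = sample_size(sigma=math.sqrt(3), moe=0.25)
print("required sample size:", n)

# Check that n meets the target and n - 1 does not.
print("MoE with n:    ", 1.96 * math.sqrt(3) / math.sqrt(n))
print("MoE with n - 1:", 1.96 * math.sqrt(3) / math.sqrt(n - 1))
```

Rounding up with `math.ceil` rather than rounding to the nearest integer is what guarantees that the margin of error does not exceed the target.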

General CIs and tolerable error
How should we choose n for the 99% confidence interval? If we want to use a 99% CI, we first look up 0.5% in the z-tables: $z_{0.5\%} = 2.57$. Then we need to solve
$$\text{MoE} = 2.57\,\frac{\sigma}{\sqrt{n}} \quad\Rightarrow\quad n = \left(\frac{2.57\,\sigma}{\text{MoE}}\right)^2.$$
In general, for the $100(1-\alpha)\%$ CI use the formula $\text{MoE} = z_{\alpha/2}\,\sigma/\sqrt{n}$ and solve it to give
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{\text{MoE}}\right)^2.$$
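
Instead of reading $z_{\alpha/2}$ from a table, it can be computed. The sketch below uses `NormalDist.inv_cdf` from Python's standard `statistics` module; the function name `z_critical` is illustrative. The computed value for 99% is about 2.576, whereas a table rounded to two decimals gives the 2.57 used above, so sample sizes computed both ways can differ slightly.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value z_{alpha/2} for a confidence level in (0, 1)."""
    alpha = 1.0 - confidence
    # inv_cdf returns the point with the given lower-tail probability.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.80, 0.95, 0.99):
    print(f"{level:.0%} CI: z = {z_critical(level):.3f}")
```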

Example 4: Heights
Researchers want to estimate the mean height of students at a university (in meters) with a margin of error of 0.04 (using a 95% CI level). The sample standard deviation from a small sample taken previously is 0.113. How many students must they sample to achieve their specifications?
Solution 4: Since the true population standard deviation is unknown, they use the sample standard deviation in the calculation. Using the formula with E = 0.04 we have
$$n = \frac{(1.96)^2 (0.113)^2}{(0.04)^2} = 30.66.$$
Hence they must sample 31 people so that a 95% confidence interval has length at most 2 × 0.04 = 0.08.

Example 5: Caffeine content
The caffeine content of coffee is being analysed, and it is known that the standard deviation of the caffeine content of a randomly selected cup of coffee is 7.1 mg. Suppose 100 cups of coffee are analysed, and the total weight of caffeine in all the cups is $\sum_{i=1}^{100} X_i = 11000$ mg.
Construct a 95% CI for the mean caffeine content.
Construct an 80% CI for the mean caffeine content.
Find the minimum number of coffees which must be analysed for the 80% CI to have MoE 0.45 mg.
Solution 5: The total weight of caffeine for the 100 cups is 11000 mg. Therefore the sample average of caffeine per cup is $\bar{x} = 11000/100 = 110$ mg.

Calculating the 95% CI (use the formula or calculate it yourself): $z_{0.05/2} = z_{0.025} = 1.96$, n = 100, $\sigma = 7.1$ and $\bar{X} = 110$. The CI is
$$\left[\,110 - 1.96\,\frac{7.1}{\sqrt{100}},\ 110 + 1.96\,\frac{7.1}{\sqrt{100}}\,\right] = [108.6, 111.4].$$
To construct an 80% CI only one thing has to change: we replace the 1.96 above with another number. To find this value, go to the normal table and look inside it for 0.1; you should see 1.28. Replace 1.96 with 1.28 to give
$$\left[\,110 - 1.28\,\frac{7.1}{\sqrt{100}},\ 110 + 1.28\,\frac{7.1}{\sqrt{100}}\,\right] = [109.1, 110.9].$$

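If we want the MoE to be 0.45, then the interval
$$\left[\,\bar{X} - 1.28\,\frac{7.1}{\sqrt{n}},\ \bar{X} + 1.28\,\frac{7.1}{\sqrt{n}}\,\right]$$
must have length 0.9. This means that $\text{MoE} = 0.45 = 1.28 \times 7.1/\sqrt{n}$. Solve this (or use the formula) to give
$$n = \left(\frac{1.28 \times 7.1}{0.45}\right)^2 = 407.9,$$
which we round up to 408. Hence we need to sample at least 408 cups to obtain a margin of error of at most 0.45. This is roughly half the margin of error obtained with 100 cups (0.91), which is why the required sample size is roughly four times as large; n = 400 gives a margin of error of about 0.454, slightly above the target.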
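
The three parts of Example 5 can be checked numerically. The sketch below uses the table values 1.96 and 1.28 from the solution; the variable and function names are illustrative. Taken literally, the formula with MoE = 0.45 gives 408 after rounding up, while n = 400 gives a margin of error of about 0.454.

```python
import math

sigma = 7.1          # known standard deviation (mg)
n = 100              # cups analysed
total = 11000.0      # total caffeine in all cups (mg)
x_bar = total / n    # sample mean per cup

def interval(z):
    """CI x_bar +/- z * sigma / sqrt(n) for the mean caffeine content."""
    moe = z * sigma / math.sqrt(n)
    return (x_bar - moe, x_bar + moe)

ci95 = interval(1.96)   # 95% CI, z from the normal table
ci80 = interval(1.28)   # 80% CI, z from the normal table

# Smallest n for which the 80% margin of error is at most 0.45 mg.
n_needed = math.ceil((1.28 * sigma / 0.45) ** 2)
moe_at_400 = 1.28 * sigma / math.sqrt(400)

print(f"mean = {x_bar}")
print(f"95% CI: [{ci95[0]:.1f}, {ci95[1]:.1f}]")
print(f"80% CI: [{ci80[0]:.1f}, {ci80[1]:.1f}]")
print(f"n needed for MoE <= 0.45: {n_needed}; MoE at n = 400: {moe_at_400:.4f}")
```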

Example 6
How large a sample size do we require such that the margin of error for a 95% confidence interval for the mean of human heights is at most 0.25 inches? The standard deviation is unknown, but it is believed that $\sigma$ lies somewhere between 2 and 5 inches.
Why this question matters: In general the standard deviation will be unknown. But we can guess limits on how large or small it is based on our own expertise.

Solution 6
The more variable the data, the wider the confidence interval. Therefore, when we are given a range of standard deviations and our aim is that the margin of error should be no larger than 0.25 (i.e. 0.25 or less), we need to use the largest standard deviation in the given range in the calculation. In other words,
$$n = \left(\frac{1.96\,\sigma_{\text{largest}}}{0.25}\right)^2 = \left(\frac{1.96 \times 5}{0.25}\right)^2 = 1536.64,$$
which we round up to n = 1537.
For any other $\sigma < 5$, using n = 1537 will lead to a margin of error which is less than 0.25. To see why, recall
$$\text{MoE} = 1.96\,\frac{\sigma}{\sqrt{n}} \approx \frac{1.96\,\sigma}{\left(\frac{1.96 \times 5}{0.25}\right)} = 0.25 \times \frac{\sigma}{5}.$$
Thus we see that if the true $\sigma < 5$, then the MoE will be less than 0.25, since $\sigma/5 < 1$.
If we use $\sigma = 2$ in the margin of error calculation, then n = 246. However, if the true $\sigma > 2$, using n = 246 will lead to a margin of error which is larger than 0.25.
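
The effect of planning with the largest versus the smallest plausible $\sigma$ can be seen in a short Python sketch. The helper names are illustrative; the grid of $\sigma$ values between 2 and 5 is chosen only for display.

```python
import math

def sample_size(sigma, moe, z=1.96):
    """Smallest integer n with z * sigma / sqrt(n) <= moe."""
    return math.ceil((z * sigma / moe) ** 2)

def margin_of_error(sigma, n, z=1.96):
    """Margin of error z * sigma / sqrt(n)."""
    return z * sigma / math.sqrt(n)

target = 0.25
n_cautious = sample_size(5, target)     # plan with the largest sigma in the range
n_optimistic = sample_size(2, target)   # plan with the smallest sigma in the range

print(f"n (sigma = 5): {n_cautious}, n (sigma = 2): {n_optimistic}")
for sigma in (2, 3, 4, 5):
    print(f"true sigma = {sigma}: "
          f"MoE with n = {n_cautious} is {margin_of_error(sigma, n_cautious):.3f}, "
          f"MoE with n = {n_optimistic} is {margin_of_error(sigma, n_optimistic):.3f}")
```

With the cautious choice every row stays at or below 0.25; with the optimistic choice only the row for $\sigma = 2$ does.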

Margin of Error calculations using software
There are various software tools on the web that will do margin of error calculations. For example, https://www.emathhelp.net/calculators/probability-statistics/margin-of-error-calculator/
Here is one by SurveyMonkey: https://www.surveymonkey.com/mp/margin-of-error-calculator/. This calculator is specifically designed for calculating the MoE of proportions, where the standard deviation need not be specified.
Here is another one, http://www.raosoft.com/samplesize.html, which can give smaller sample sizes if a proportion is specified. We cover this later on in the course.
The calculations done in class assume that the population size is infinite (or that the sample is drawn with replacement, that is, the same person can be sampled again).

However, when surveying certain populations the population size will be finite. Therefore, some calculators also ask for the population size. Using the finite population size, they make what is called a finite population correction. You can read more about it here: https://en.wikipedia.org/wiki/margin_of_error#effect_of_population_size.
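
As a rough illustration of what such a correction does, the sketch below applies one common form, $n = n_0 / (1 + (n_0 - 1)/N)$, where $n_0$ is the sample size computed for an infinite population and $N$ is the population size. This formula is not given in the lecture and is an assumption here; individual calculators may implement the correction differently. The function name and the population sizes are illustrative.

```python
import math

def finite_population_sample_size(n_infinite, population_size):
    """Adjust an infinite-population sample size for a finite population.

    Assumption: uses n = n0 / (1 + (n0 - 1) / N), one common form of the
    finite population correction for sampling without replacement.
    """
    return math.ceil(n_infinite / (1 + (n_infinite - 1) / population_size))

n0 = 185  # sample size from the earlier example (infinite population)
for N in (500, 2000, 10**9):
    print(f"population size {N}: sample size {finite_population_sample_size(n0, N)}")
```

For a very large population the adjusted size equals the infinite-population answer, which is why the correction is usually ignored in class calculations.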

Example 7
A confidence interval for the length of parrots is [4, 10] inches. It is based on a sample of size n. By what factor should the sample size increase such that the margin of error reduces to 1?

Solution 7
The original margin of error is (10 - 4)/2 = 3. Thus $1.96\,\sigma/\sqrt{n} = 3$. We want to increase the sample size such that the margin of error decreases to 1:
$$\frac{1.96\,\sigma}{\sqrt{\text{Factor} \times n}} = 1 \quad\Rightarrow\quad \frac{1}{\sqrt{\text{Factor}}} \times \frac{1.96\,\sigma}{\sqrt{n}} = 1 \quad\Rightarrow\quad \frac{3}{\sqrt{\text{Factor}}} = 1.$$
Solving this gives Factor = 9: we need to increase the sample size by a factor of 9 in order to decrease the margin of error by a factor of 3. An extremely large increase in sample size has to be made for a moderate reduction in margin of error.
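
Because the margin of error is proportional to $1/\sqrt{n}$, the required factor is the square of the ratio of the old to the new margin of error. A minimal Python sketch of this rule, with an illustrative function name:

```python
def sample_size_factor(current_moe, target_moe):
    """Factor by which n must grow, since MoE is proportional to 1 / sqrt(n)."""
    return (current_moe / target_moe) ** 2

lower, upper = 4.0, 10.0            # confidence interval from Example 7
current_moe = (upper - lower) / 2   # half the interval length
factor = sample_size_factor(current_moe, target_moe=1.0)
print(f"current MoE = {current_moe}, required factor = {factor}")
```

The result does not depend on the confidence level or on $\sigma$, because both cancel in the ratio.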