Lecture 9 - Sampling Distributions and the CLT

Lecture 9 - Sampling Distributions and the CLT Sta102/BME102 Colin Rundel September 23, 2015

1. Variability of Estimates
   - Activity
   - Sampling distributions - via simulation
   - Sampling distributions - via CLT
2. Confidence intervals
   - Why do we report confidence intervals?
   - Constructing a confidence interval
   - A more accurate interval

Mean
Sample mean ($\bar{X}$):
$\bar{X} = \frac{1}{n}(x_1 + x_2 + x_3 + \cdots + x_n) = \frac{1}{n}\sum_{i=1}^{n} x_i$
Population mean ($\mu$):
$\mu = \frac{1}{N}(x_1 + x_2 + x_3 + \cdots + x_N) = \frac{1}{N}\sum_{i=1}^{N} x_i$
The sample mean ($\bar{X}$) is a point estimate of the population mean ($\mu$). The estimate may not be perfect, but if the sample is good (representative of the population) it should be close. Today we will discuss how close.

Variance
Sample variance ($s^2$):
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{X})^2$
Population variance ($\sigma^2$):
$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$
Similarly, the sample variance ($s^2$) is a point estimate of the population variance ($\sigma^2$). For a decent sample, this should also be close to the population variance.
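As a small illustration of the two variance formulas, the R sketch below uses a made-up vector `x` (an assumption for illustration, not data from the lecture). R's built-in `var()` uses the n - 1 denominator, so the population formula is written out by hand here.

```r
# Hypothetical data, used only to illustrate the formulas
x <- c(4, 7, 5, 6, 3)
n <- length(x)

x_bar   <- mean(x)                       # sample mean: sum(x) / n
s2      <- sum((x - x_bar)^2) / (n - 1)  # sample variance, n - 1 denominator
pop_var <- sum((x - x_bar)^2) / n        # population formula, treating x as a whole population

all.equal(s2, var(x))  # var() matches the n - 1 version
```

The two versions differ only in the denominator, so the gap between them shrinks as n grows.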

Parameter estimation
- We are usually interested in population parameters.
- Since full populations are difficult (or impossible) to collect data on, we use sample statistics as point estimates for the unknown population parameters of interest.
- Sample statistics vary from sample to sample.
- Quantifying how much sample statistics vary provides a way to estimate the margin of error associated with our point estimates.
- First we will look at how much point estimates vary from sample to sample.

Estimate the avg. # of drinks it takes to get drunk
We would like to estimate the average number of drinks it takes a person to get drunk, using self-reported data from students in a Duke Statistics class. We will assume that this is population data:
[Figure: histogram of number of drinks to get drunk; $\mu = 5.39$, $\sigma = 2.37$]

Activity
- Use RStudio to generate 10 random numbers between 1 and 146 (with replacement):
  sample(1:146, size = 10, replace = TRUE)
- If you don't have a computer, ask a neighbor to generate a sample for you.
- Using the handout, find the 10 data points associated with your sampled values, then:
  - Calculate the sample mean of these 10 values
  - Round this mean to 1 decimal place
  - Report it using http://bit.ly/sta102_clt

Activity
sample(1:146, size = 10, replace = TRUE)
## [1] 17 91 89 92 126 94 2 34 98 76
Handout data (ID: number of drinks):
  1: 7      21: 6      41: 6      61: 10     81: 6     101: 4     121: 6     141: 4
  2: 5      22: 2      42: 10     62: 7      82: 5     102: 7     122: 5     142: 6
  3: 4      23: 6      43: 3      63: 4      83: 6     103: 6     123: 3     143: 6
  4: 4      24: 7      44: 6      64: 5      84: 8     104: 8     124: 2     144: 4
  5: 6      25: 3      45: 10     65: 6      85: 4     105: 3     125: 2     145: 5
  6: 2      26: 6      46: 4      66: 6      86: 10   106: 6     126: 5     146: 5
  7: 3      27: 5      47: 3      67: 6      87: 5     107: 2     127: 10
  8: 5      28: 8      48: 3      68: 7      88: 10   108: 5     128: 4
  9: 5      29: 0      49: 6      69: 7      89: 8     109: 1     129: 1
 10: 6      30: 8      50: 8      70: 5      90: 5     110: 5     130: 4
 11: 1      31: 5      51: 8      71: 10     91: 4     111: 5     131: 10
 12: 10     32: 9      52: 8      72: 3      92: 0.5   112: 4     132: 8
 13: 4      33: 7      53: 2      73: 5.5    93: 3     113: 4     133: 10
 14: 4      34: 5      54: 4      74: 7      94: 3     114: 9     134: 6
 15: 6      35: 5      55: 8      75: 10     95: 5     115: 4     135: 6
 16: 3      36: 7      56: 3      76: 6      96: 6     116: 3     136: 6
 17: 10     37: 4      57: 5      77: 6      97: 4     117: 3     137: 7
 18: 8      38: 0      58: 5      78: 5      98: 4     118: 4     138: 3
 19: 5      39: 4      59: 8      79: 4      99: 2     119: 4     139: 10
 20: 10     40: 3      60: 4      80: 5     100: 5     120: 8     140: 4
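For the sampled IDs shown above (17, 91, 89, 92, 126, 94, 2, 34, 98, 76), the corresponding handout values can be looked up and averaged. The sketch below simply types in those ten values; the variable names are illustrative.

```r
# Handout values for the sampled IDs 17 91 89 92 126 94 2 34 98 76
sampled_drinks <- c(10, 4, 8, 0.5, 5, 3, 5, 5, 4, 6)

x_bar <- mean(sampled_drinks)  # 50.5 / 10 = 5.05
x_bar
```

A different random draw would give a different set of IDs and therefore a different sample mean, which is the variability the activity is meant to expose.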

Sampling distribution
What we just constructed is called a sampling distribution: it is an empirical distribution of sample statistics ($\bar{X}$ in this case).
What is the shape and center of this distribution?

Increasing number of samples
If we increase the number of $\bar{X}$s we calculated to 1000, the sampling distribution looks like the following:
[Figure: histogram of means (x-axis: means, y-axis: frequency) and normal Q-Q plot (x-axis: theoretical quantiles, y-axis: sample quantiles)]
avg($\bar{X}$) = 5.4, SD($\bar{X}$) = 0.74
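The 1000-sample version of the activity can be sketched in R as follows. The `drinks` vector is transcribed from the handout table; the seed and variable names are arbitrary choices, and sampling the values directly with replacement is assumed to be equivalent to sampling IDs and looking them up. The exact summary numbers depend on the seed, so they will only be close to the values on the slide.

```r
# Population data transcribed from the handout (146 self-reported values)
drinks <- c(
  7, 5, 4, 4, 6, 2, 3, 5, 5, 6, 1, 10, 4, 4, 6, 3, 10, 8, 5, 10,    # IDs 1-20
  6, 2, 6, 7, 3, 6, 5, 8, 0, 8, 5, 9, 7, 5, 5, 7, 4, 0, 4, 3,       # IDs 21-40
  6, 10, 3, 6, 10, 4, 3, 3, 6, 8, 8, 8, 2, 4, 8, 3, 5, 5, 8, 4,     # IDs 41-60
  10, 7, 4, 5, 6, 6, 6, 7, 7, 5, 10, 3, 5.5, 7, 10, 6, 6, 5, 4, 5,  # IDs 61-80
  6, 5, 6, 8, 4, 10, 5, 10, 8, 5, 4, 0.5, 3, 3, 5, 6, 4, 4, 2, 5,   # IDs 81-100
  4, 7, 6, 8, 3, 6, 2, 5, 1, 5, 5, 4, 4, 9, 4, 3, 3, 4, 4, 8,       # IDs 101-120
  6, 5, 3, 2, 2, 5, 10, 4, 1, 4, 10, 8, 10, 6, 6, 6, 7, 3, 10, 4,   # IDs 121-140
  4, 6, 6, 4, 5, 5                                                  # IDs 141-146
)

set.seed(102)  # arbitrary seed so the run is reproducible

# Draw 1000 samples of size 10 (with replacement) and keep each sample mean
means <- replicate(1000, mean(sample(drinks, size = 10, replace = TRUE)))

mean(means)  # center of the sampling distribution
sd(means)    # spread of the sampling distribution

# Uncomment to draw the two plots described on the slide:
# hist(means); qqnorm(means); qqline(means)
```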

Average number of Duke games attended
Next let's look at the population data for the number of basketball games attended by a class of Duke students:
[Figure: histogram; x-axis: number of Duke games attended, y-axis: frequency]

Average number of Duke games attended (cont.)
Sampling distribution, n = 10:
[Figure: histogram of sample means from samples of n = 10; y-axis: frequency]
What does each observation in this distribution represent?
A sample mean, $\bar{X}$, from a sample of size n = 10.
Is the variability of the sampling distribution smaller or larger than the variability of the population distribution? Why?
Smaller: sample means will vary less than individual observations.

Average number of Duke games attended (cont.)
Sampling distribution, n = 30:
[Figure: histogram of sample means from samples of n = 30; y-axis: frequency]
How did the shape, center, and spread of the sampling distribution change going from n = 10 to n = 30?
Shape is more symmetric, center is about the same, spread is smaller.

Average number of Duke games attended (cont.)
Sampling distribution, n = 70:
[Figure: histogram of sample means from samples of n = 70; y-axis: frequency]
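The shrinking spread across n = 10, 30 and 70 can be reproduced with a simulation. The Duke games data are not included in these notes, so the sketch below substitutes a hypothetical right-skewed population drawn from an exponential distribution; the helper `sd_of_means`, the seed, and the population itself are all assumptions made for illustration.

```r
set.seed(9)  # arbitrary seed for reproducibility

# Hypothetical right-skewed population; NOT the Duke games data
population <- rexp(10000, rate = 1 / 5)

# SD of `reps` simulated sample means for a given sample size n
sd_of_means <- function(n, reps = 5000) {
  sd(replicate(reps, mean(sample(population, size = n, replace = TRUE))))
}

spread <- sapply(c(10, 30, 70), sd_of_means)
names(spread) <- c("n = 10", "n = 30", "n = 70")
spread  # the spread of the sample means shrinks as n grows
```

Plotting a histogram of the simulated means for each n would also show the shape becoming more symmetric, as described on the slides.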

Sums of iid Random Variables
Let $X_1, X_2, \ldots, X_n \overset{iid}{\sim} D$, where $D$ is some probability distribution with $E(X_i) = \mu$ and $Var(X_i) = \sigma^2$.
If we define $S_n = X_1 + X_2 + \cdots + X_n$, what are the expected value and variance of $S_n$?
$E(S_n) = E(X_1 + X_2 + \cdots + X_n) = E(X_1) + E(X_2) + \cdots + E(X_n) = \mu + \mu + \cdots + \mu = n\mu$
$Var(S_n) = Var(X_1 + X_2 + \cdots + X_n) = Var(X_1) + Var(X_2) + \cdots + Var(X_n) = \sigma^2 + \sigma^2 + \cdots + \sigma^2 = n\sigma^2$
The variance step relies on the $X_i$ being independent.

Average of iid Random Variables
Let $X_1, X_2, \ldots, X_n \overset{iid}{\sim} D$, where $D$ is some probability distribution with $E(X_i) = \mu$ and $Var(X_i) = \sigma^2$.
If we define $\bar{X}_n = (X_1 + X_2 + \cdots + X_n)/n = S_n/n$, what are the expected value and variance of $\bar{X}_n$?
$E(\bar{X}_n) = E(S_n/n) = E(S_n)/n = \mu$
$Var(\bar{X}_n) = Var(S_n/n) = \frac{1}{n^2} Var(S_n) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$
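These four results can be checked numerically. The sketch below assumes a Uniform(0, 1) population (so $\mu = 1/2$ and $\sigma^2 = 1/12$), which is a choice made purely for illustration; the sample size, number of repetitions, and seed are likewise arbitrary.

```r
set.seed(1)   # arbitrary seed
n    <- 25    # size of each iid sample
reps <- 20000 # number of simulated samples

# Each row is one iid sample of size n from Uniform(0, 1): mu = 1/2, sigma^2 = 1/12
samples <- matrix(runif(n * reps), nrow = reps, ncol = n)
s_n   <- rowSums(samples)   # simulated sums S_n
x_bar <- rowMeans(samples)  # simulated averages

c(simulated = mean(s_n),   theory = n * 0.5)       # E(S_n)    vs n * mu
c(simulated = var(s_n),    theory = n / 12)        # Var(S_n)  vs n * sigma^2
c(simulated = mean(x_bar), theory = 0.5)           # E(X_bar)  vs mu
c(simulated = var(x_bar),  theory = 1 / (12 * n))  # Var(X_bar) vs sigma^2 / n
```

Each simulated value should land close to its theoretical counterpart, with the gap shrinking as `reps` increases.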

Central Limit Theorem
Central limit theorem - sum of iid RVs ($S_n$):
The distribution of the sum of n independent and identically distributed random variables X is approximately normal when n is large.
$S_n \sim N\left(\mu = n\,E(X),\ \sigma^2 = n\,Var(X)\right)$
Central limit theorem - average of iid RVs ($\bar{X}$):
The distribution of the average of n independent and identically distributed random variables X is approximately normal when n is large.
$\bar{X} \sim N\left(\mu = E(X),\ \sigma^2 = Var(X)/n\right)$
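As a sketch of how the CLT for averages is used, the code below plugs in the population values from the drinks example ($\mu = 5.39$, $\sigma = 2.37$) with n = 10. The threshold of 6 drinks is a hypothetical value chosen for illustration, and with n this small the normal approximation is only rough.

```r
mu    <- 5.39  # population mean from the drinks example
sigma <- 2.37  # population SD from the drinks example
n     <- 10    # sample size used in the activity

se <- sigma / sqrt(n)  # SD of the sample mean under the CLT

# Approximate P(X_bar > 6) using the normal distribution (6 is an arbitrary threshold)
p_above_6 <- pnorm(6, mean = mu, sd = se, lower.tail = FALSE)

se
p_above_6
```

The value of `se` here is close to the SD of 0.74 seen in the simulated sampling distribution.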

CLT - Conditions
Certain conditions must be met for the CLT to apply:
1. Independence: Sampled observations must be independent and identically distributed. This is difficult to verify, but is usually reasonable if random sampling/assignment is used and n < 10% of the population.
2. Sample size/skew: The population distribution must be nearly normal or the sample size must be large (the less normal the population distribution, the larger the sample size needs to be). This is also difficult to verify for the population, but we can check it using the sample data, and assume that the sample distribution is similar to the population distribution.

CLT - Simulation
http://bit.ly/clt_mean

Review
To the right is a plot of a population distribution. Match each of the following descriptions to one of the three plots below.
1. a single random sample of 100 observations from this population
2. a distribution of 100 sample means from random samples with size 7
3. a distribution of 100 sample means from random samples with size 49
[Figure: population distribution with $\mu = 10$, $\sigma = 7$; Plot A (x-axis from 4 to 18), Plot B (x-axis from 0 to 35), Plot C (x-axis from 8 to 12)]
Answers: Plot A is (2), Plot B is (1), Plot C is (3).
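The matching follows from how the spread of sample means depends on n. A short calculation, using only the $\sigma = 7$ given for this population, shows the expected SD of the sample means for the two sample sizes:

```r
sigma <- 7        # population SD given on the slide
n <- c(7, 49)     # sample sizes in descriptions 2 and 3

se <- sigma / sqrt(n)  # SD of the sample mean for each n
se  # about 2.65 for n = 7 and 1 for n = 49
```

The narrowest plot (C) therefore corresponds to n = 49, the intermediate one (A) to n = 7, and the widest (B), whose spread is on the scale of the population itself, to the single sample of raw observations.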


Confidence intervals
- A plausible range of values for the population parameter is called a confidence interval.
- Using only a point estimate to estimate a parameter is like fishing in a murky lake with a spear, and using a confidence interval is like fishing with a net.
- We can throw a spear where we saw a fish, but we are more likely to miss. If we toss a net in that area, we have a better chance of catching the fish.
- If we report a point estimate, we probably will not hit the exact population parameter. If we report a range of plausible values (a confidence interval), we have a good shot at capturing the parameter.

Confidence intervals and the CLT
- We have a point estimate $\bar{X}$ for the population mean $\mu$, but we want to design a net to have a reasonable chance of capturing $\mu$.
- From the CLT we know that we can think of $\bar{X}$ as a sample from a normal distribution with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$ (the standard error, SE).
- Therefore, about 96% of observed $\bar{X}$s should be within 2 SEs ($2\sigma/\sqrt{n}$) of $\mu$.
- Equivalently, for about 96% of random samples from the population, $\mu$ is within 2 SEs of $\bar{X}$.
- Note that we are being very careful about the language here: the 96% only applies to random samples in the abstract. Once we have actually taken a sample, $\bar{X}$ will either be within 2 SEs of $\mu$ or outside of 2 SEs of $\mu$.
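The 96% figure is a rounded version of the normal probability of landing within two standard deviations of the mean, which is closer to 95.4%. A one-line check with `pnorm()`:

```r
# Probability that a normal random variable falls within 2 SDs of its mean
within_2 <- pnorm(2) - pnorm(-2)
within_2  # about 0.954
```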

Example - Cardinals
A transect was sampled 50 times by counting the number of cardinals seen when walking a 1 mile path in the Duke forest. The mean of these samples was 13.2. Estimate the true average number of cardinals along this path, assuming the population distribution is nearly normal with a population standard deviation of 1.74.
The 96% confidence interval is defined as
point estimate ± 2 × SE
$\bar{X} = 13.2$, $\sigma = 1.74$, $SE = \frac{\sigma}{\sqrt{n}} = \frac{1.74}{\sqrt{50}} \approx 0.25$
$\bar{X} \pm 2 \times SE = 13.2 \pm 2 \times 0.25 = (13.2 - 0.5,\ 13.2 + 0.5) = (12.7,\ 13.7)$
We are 96% confident that the true average number of cardinals on the transect is between 12.7 and 13.7.
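The same calculation in R, as a minimal sketch using the numbers from the example (variable names are illustrative):

```r
x_bar <- 13.2  # sample mean from the example
sigma <- 1.74  # population SD, assumed known
n     <- 50    # number of times the transect was sampled

se <- sigma / sqrt(n)                                    # standard error
ci <- c(lower = x_bar - 2 * se, upper = x_bar + 2 * se)  # point estimate +/- 2 SE

round(se, 2)
round(ci, 1)
```

Without rounding the SE first, the interval endpoints are about 12.71 and 13.69, which agree with (12.7, 13.7) to one decimal place.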

What does "96% confident" mean?
- Suppose we took many samples and built a confidence interval from each sample using the equation point estimate ± 2 SE.
- Then about 96% of those intervals would contain the true population mean ($\mu$).
- The figure on the left shows this process with 25 samples, where 24 of the resulting confidence intervals contain the true average number of exclusive relationships ($\mu = 3.207$), and one does not.
- It does not mean there is a 96% probability that the CI contains the true value.
[Figure: 25 confidence intervals plotted against $\mu = 3.207$]
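The repeated-sampling interpretation can be simulated. The sketch below assumes a normal population and borrows the cardinals numbers as the hypothetical "true" parameters; the seed and number of repetitions are arbitrary.

```r
set.seed(23)  # arbitrary seed
mu    <- 13.2 # hypothetical true mean
sigma <- 1.74 # hypothetical true SD, treated as known
n     <- 50
reps  <- 10000

covered <- replicate(reps, {
  x  <- rnorm(n, mean = mu, sd = sigma)  # one random sample
  se <- sigma / sqrt(n)
  abs(mean(x) - mu) <= 2 * se            # TRUE if x_bar +/- 2 SE contains mu
})

mean(covered)  # long-run proportion of intervals that capture mu
```

Each individual interval either contains `mu` or it does not; the percentage describes the procedure over many samples.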

A more accurate interval
Confidence interval, a general formula:
point estimate ± CV × SE
Conditions when the point estimate = $\bar{X}$:
1. Independence: observations in the sample must be independent
   - random sample/assignment
   - n < 10% of population
2. Normality: nearly normal population distribution
3. Population variance: so far we've assumed this is known, which is almost never true. We'll talk about a more general approach after the midterm.

Changing the confidence level
In general,
point estimate ± CV × SE
- In order to change the confidence level, all we need to do is adjust the critical value (CV) in the above formula.
- Commonly used confidence levels in practice are 90%, 95%, 98%, and 99%.
- If the conditions for the CLT are met, then for a 95% confidence interval, CV = Z = 1.96.
- Using the Z table it is possible to find the appropriate Z for any desired confidence level.

Example - Calculating Z
What is the appropriate value for Z when calculating a 98% confidence interval?
[Figure: standard normal curve with area 0.98 between z = -2.33 and z = 2.33, and area 0.01 in each tail]
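Instead of a Z table, the critical value can be obtained from `qnorm()`. The helper `z_star` below is a hypothetical name introduced for this sketch; it leaves half of the remaining probability in each tail.

```r
# Critical value for a given confidence level: (1 - level) / 2 in each tail
z_star <- function(level) qnorm(1 - (1 - level) / 2)

conf_levels <- c(0.90, 0.95, 0.98, 0.99)
z_values <- round(sapply(conf_levels, z_star), 2)
z_values  # 1.64 1.96 2.33 2.58
```

For the 98% level this gives 2.33, matching the figure: 0.01 in each tail and 0.98 in the middle.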

Width of an interval
If we want to be very certain that we capture the population parameter, i.e. increase our confidence level, should we use a wider interval or a narrower interval?
A wider interval.
Can you see any drawbacks to using a wider interval?
If the interval is too wide it may not be very informative.
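To see the trade-off numerically, the sketch below reuses the standard error from the cardinals example and computes the full interval width at three confidence levels. The choice of levels is an assumption made for illustration.

```r
se <- 1.74 / sqrt(50)                  # SE from the cardinals example
conf_levels <- c(0.90, 0.95, 0.99)

z     <- qnorm(1 - (1 - conf_levels) / 2)  # critical value for each level
width <- 2 * z * se                        # full width: upper bound minus lower bound

data.frame(conf_levels, z = round(z, 2), width = round(width, 2))
```

With the standard error held fixed, a higher confidence level always gives a wider interval.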

Example - Sample Size
Coca-Cola wants to estimate the per capita number of Coke products consumed each year in the United States. In order to properly forecast market demands, they need their margin of error to be 5 items at the 95% confidence level. From previous years they know that $\sigma \approx 30$. How many people should they survey to achieve the desired accuracy? What if the requirement was at the 99% confidence level?
At the 95% and 99% confidence levels Z is 1.96 and 2.58 respectively. Therefore,
$MoE = Z \frac{\sigma}{\sqrt{n}} = 5 \implies \sqrt{n} = \frac{Z\sigma}{5} \implies n = \left(\frac{Z\sigma}{5}\right)^2$
$n_{95} = \left(\frac{1.96 \times 30}{5}\right)^2 \approx 138.30$, rounded up to 139
$n_{99} = \left(\frac{2.58 \times 30}{5}\right)^2 \approx 239.63$, rounded up to 240
The result is rounded up so that the margin of error does not exceed 5.
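The sample size formula can be wrapped in a small function. `sample_size` is a hypothetical helper written for this sketch; it uses `ceiling()` to round up to the next whole person.

```r
# Smallest n such that z * sigma / sqrt(n) is at most the target margin of error
sample_size <- function(z, sigma, moe) ceiling((z * sigma / moe)^2)

n_95 <- sample_size(1.96, sigma = 30, moe = 5)
n_99 <- sample_size(2.58, sigma = 30, moe = 5)
c(n_95, n_99)  # 139 and 240
```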

Common Misconceptions
1. The confidence level of a confidence interval is the probability that the interval contains the true population parameter.
   This is incorrect. CIs are part of the frequentist paradigm, and as such the population parameter is fixed but unknown. Consequently, the probability that any given CI contains the true value must be 0 or 1 (it does or does not).
2. A narrower confidence interval is always better.
   This is incorrect since the width is a function of both the confidence level and the standard error.
3. A wider interval means less confidence.
   This is incorrect since it is possible to make very precise statements with very little confidence.