Lecture 6: Confidence Intervals

Lecture 6: Confidence Intervals Taeyong Park Washington University in St. Louis February 22, 2017 Park (Wash U.) U25 PS323 Intro to Quantitative Methods February 22, 2017 1 / 29

Today...
- Review of sampling distributions
- Answer key for problem set 1 on Blackboard
- Confidence interval
- Lab: central limit theorem; confidence interval

Roadmap
The ultimate goal of this course:
- Conduct linear regression analysis using real-world data;
- Interpret the results;
- Present the results effectively using plots.
Topics on the roadmap:
- Sampling distributions; standard error; central limit theorem
- Confidence intervals and hypothesis testing
- Interpretations

Three types of distributions
- Population distribution
- Sample data distribution
- Sampling distribution

Sampling distribution
A sampling distribution is the distribution of a statistic under repeated sampling.

Sampling distribution: Example
- Population: American voters
- A multitude of polls (samples)
- Statistic: e.g., the proportion of respondents who voted for Trump; the mean age

Sampling distribution
Standard error: the standard deviation of a sampling distribution.

Central limit theorem
For random sampling with a large sample size $n$, the sampling distribution of the sample mean $\bar{y}$ is approximately normal.
- The mean of the distribution is equal to the population mean $\mu$.
- The standard deviation of the distribution is equal to $\sigma/\sqrt{n}$.
$\bar{y} \sim N(\mu, \sigma/\sqrt{n})$
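The theorem can be checked with a small simulation. The sketch below is only an illustration: the population (exponential with rate 1, so $\mu = 1$ and $\sigma = 1$), the sample size, the number of repetitions, and the seed are all assumptions chosen for the demo, not values from the lecture.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Assumed population: exponential with rate 1 (skewed, clearly not normal),
# which has mean mu = 1 and standard deviation sigma = 1.
mu, sigma = 1.0, 1.0
n = 50        # size of each sample
reps = 5000   # number of repeated samples

# Draw many samples and record the mean of each one.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(reps)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
theoretical_se = sigma / n ** 0.5

print(f"mean of sample means: {mean_of_means:.3f} (mu = {mu})")
print(f"SD of sample means:   {sd_of_means:.3f} (sigma/sqrt(n) = {theoretical_se:.3f})")
```

The mean of the sample means should land close to $\mu$, and their standard deviation close to $\sigma/\sqrt{n}$, even though the population itself is skewed.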

Central limit theorem: Some notes
- If $Y$ is normally distributed, the result holds for all $n$.
- Otherwise, you need a large enough sample. Usually $n = 30$ is good enough, but it depends on the distribution.
- As $n \to \infty$, the standard error gets smaller and smaller.

What is next?
- We are hunting population parameters [$\mu$, $\sigma$]. What percentage of Americans approve of President Trump? What is the average age of Missourians?
- We sample from the population and calculate sample statistics [$\bar{y}$, $S$].
- Today we are going to learn how to use sample statistics to estimate population parameters.
- How? Probability theory and sampling distributions: $\bar{y} \sim N(\mu, \sigma/\sqrt{n})$
- This will be our first true statistical inference.

Estimation basics
Point estimation: a sample statistic that gives a good guess about a population parameter.
- Example: point estimate for the population mean ($\hat{\mu}$): $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$
- Example: point estimate for the population standard deviation ($\hat{\sigma}$): $S = \sqrt{\frac{\sum_{i=1}^{n}(y_i - \bar{y})^2}{n-1}}$
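As a minimal sketch, the two point estimates can be computed directly from their definitions. The `ages` data and the function names are hypothetical, made up for illustration.

```python
import math

def sample_mean(ys):
    """Point estimate of mu: the arithmetic mean of the sample."""
    return sum(ys) / len(ys)

def sample_sd(ys):
    """Point estimate of sigma: square root of the sum of squared
    deviations from the mean, divided by n - 1."""
    ybar = sample_mean(ys)
    return math.sqrt(sum((y - ybar) ** 2 for y in ys) / (len(ys) - 1))

ages = [23, 35, 41, 29, 52]  # hypothetical sample of five ages

print(sample_mean(ages))          # 36.0
print(round(sample_sd(ages), 3))  # 11.18
```

Python's built-in `statistics.mean` and `statistics.stdev` give the same values; the explicit versions are shown only to mirror the formulas.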

Estimation basics
How do we choose among multiple possible estimators? Sample mean $\bar{y}$? Sample median? 1st quartile? Maximum?
We want our estimators to be:
- Unbiased (i.e., accurate): $E(\hat{\mu}) = \mu$ with repeated sampling
- Efficient (i.e., precise): $\sigma_{\hat{\mu}}$ is small(er)

Point estimates
The point estimates for the population parameters $\mu$ and $\sigma$ are:
- denoted $\hat{\mu}$ and $\hat{\sigma}$
- best estimated by $\bar{y}$ and $S$.
They are best in terms of bias and efficiency.

Point and interval estimates
You are the campaign manager for a candidate who is deciding whether or not to publish a new deficit reduction proposal. You commission a poll of voters in the district to find out whether they approve or disapprove of this proposal. Which of the following statements would you find most useful from your pollster?
1. We can be 25% confident that between 54 and 55 percent of voters approve of the plan.
2. We can be 95% confident that between 48.5 and 59.5 percent of voters approve of the plan.
3. We can be 99% confident that between 45.75 and 62.25 percent of voters approve of the plan.
4. We can be 100% confident that between 0 and 100 percent of voters approve of the plan.

Confidence intervals
A point estimate is OK, but it is not very useful without knowing how much confidence to have in it. Solution: interval estimation.
- Confidence interval: a confidence interval for a population parameter is a range of numbers within which the parameter is believed to fall.
- Confidence level: the probability that an interval would contain the parameter with repeated sampling.
Examples: 0.95 → 95% confidence interval; 0.70 → 70% confidence interval
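The "repeated sampling" reading of the confidence level can be illustrated by simulation. The sketch below draws many samples from an assumed normal population, builds the large-sample 95% interval $\bar{y} \pm z\,S/\sqrt{n}$ from each one, and counts how often the interval contains the true mean. The population values, sample size, repetition count, and seed are assumptions for the demo.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

mu, sigma = 50.0, 10.0   # hypothetical population mean and SD
n, reps = 100, 2000      # sample size and number of repeated samples
z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% level

covered = 0
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    ybar = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # estimated standard error
    if ybar - z * se <= mu <= ybar + z * se:
        covered += 1

coverage = covered / reps
print(f"share of intervals containing mu: {coverage:.3f}")
```

The printed share should be close to 0.95: roughly 95 out of every 100 intervals built this way contain $\mu$, which is what the confidence level describes.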

Confidence intervals
Confidence interval = Point estimate ± Margin of error

Confidence interval for population means (large samples)
We can use the sampling distribution of $\bar{y}$ (assuming a large sample) to calculate a confidence interval for the population mean.
- Parameter: $\mu$. We want to estimate $\mu$ with $\hat{\mu}$.
- We use a sampling distribution and the CLT: $\bar{y} \sim N(\mu, \sigma/\sqrt{n})$
- An unbiased and efficient point estimate ($\hat{\mu}$) is the sample mean $\bar{y}$.
- Then, how do we calculate the margin of error in Point estimate ± Margin of error?
- Margin of error = z-score × standard error.
- The z-score depends on the confidence level: 95% level → 1.96; 99% level → 2.58.
- Standard error: $\sigma_{\bar{y}} = \sigma/\sqrt{n}$ by the CLT, estimated by $\hat{\sigma}_{\bar{y}} = S/\sqrt{n}$.
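The z-scores quoted for each confidence level come from the standard normal distribution. As a minimal sketch, they can be looked up with `statistics.NormalDist` from the Python standard library instead of a z-table; the helper name `z_critical` is hypothetical.

```python
from statistics import NormalDist

def z_critical(confidence_level):
    """z-score that leaves (1 - confidence_level) / 2 in each tail
    of the standard normal distribution."""
    tail = (1 - confidence_level) / 2
    return NormalDist().inv_cdf(1 - tail)

print(round(z_critical(0.95), 2))  # 1.96
print(round(z_critical(0.99), 2))  # 2.58
```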

Confidence interval for population means (large samples)
- We plug in the estimate of $\sigma$, the sample standard deviation $S$, to get $\hat{\sigma}_{\bar{y}}$.
- We use $\bar{y}$ to estimate $\mu$; the estimate is sometimes denoted $\hat{\mu}$.
- Now we have an estimated sampling distribution, $N(\bar{y}, \hat{\sigma}_{\bar{y}})$.
- We use our knowledge of the normal distribution to find a CI. E.g., for a 95% CI we want 2.5% of the probability to be outside of our interval on each side.

Steps
1. Calculate $\bar{y}$.
2. Calculate $S$ and then $\hat{\sigma}_{\bar{y}} = S/\sqrt{n}$.
3. How much area do we need under the curve to the right? (1 − confidence coefficient)/2
4. Find the z-score associated with that number.
5. Use these values to calculate $\bar{y} \pm Z\,\hat{\sigma}_{\bar{y}}$.
Exercise: If $\bar{y} = 9.6$, $n = 100$, and $S = 4$, what is the 99% confidence interval for $\mu$?

Example
If we know $\bar{y} = 9.6$, $S = 4$, $n = 100$, how can we find a 95% confidence interval for the population mean $\mu$?
- Find values for $L$ and $R$, using the standard normal distribution, such that $\Pr(L \le \mu \le R) = 0.95$.
- Plug in our estimates: $\bar{y} \sim N(\mu, \sigma_{\bar{y}}) \approx N(\bar{y}, S/\sqrt{n})$, so $L = \bar{y} - (Z\,\hat{\sigma}_{\bar{y}})$ and $R = \bar{y} + (Z\,\hat{\sigma}_{\bar{y}})$.
- Look for $(1 - 0.95)/2 = 0.025$ on the z-table.
- Answer: $\bar{y} \pm 1.96\,\hat{\sigma}_{\bar{y}} = 9.6 \pm 1.96 \times \frac{4}{10} = [8.816, 10.384]$
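The calculation in this example can be reproduced with a few lines of Python. This is a sketch of the large-sample interval only; the function name `z_confidence_interval` is hypothetical, and the z-score is taken from `statistics.NormalDist` rather than a printed table.

```python
import math
from statistics import NormalDist

def z_confidence_interval(ybar, s, n, level=0.95):
    """Large-sample CI for the mean: ybar +/- z * s / sqrt(n)."""
    se = s / math.sqrt(n)                          # estimated standard error
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # z-score for the level
    margin = z * se
    return ybar - margin, ybar + margin

low, high = z_confidence_interval(9.6, 4, 100, level=0.95)
print(round(low, 3), round(high, 3))  # 8.816 10.384
```

Passing a different `level` changes only the z-score and therefore the margin of error; the point estimate and standard error stay the same.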

When to use z-score?
- The sample size is large, or
- We know $\sigma$ and the population is normally distributed.

When the sample size is small
To compute a CI for any sample size, we need to assume that the population is normally distributed:
$CI = \bar{y} \pm t_{n-1}\left(\frac{s}{\sqrt{n}}\right)$
- We use the t-distribution (t-score) instead of the normal distribution (z-score), because the error produced by estimating $\sigma$ using $s$ is large when the sample size is small.
- A t-score is larger than the corresponding z-score → a wider CI, which accounts for the increased error.
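The Python standard library has no t-distribution, so the sketch below computes the t-score numerically: it integrates the t density with Simpson's rule and inverts the resulting CDF by bisection. This is an illustrative approach, not the lecture's method (a t-table or statistical software would normally be used), and all function names and the sample values ($\bar{y} = 9.6$, $s = 4$, $n = 10$) are assumptions for the demo.

```python
import math

def t_cdf(x, df, steps=2000):
    """CDF of Student's t via Simpson's rule on [0, |x|], using symmetry.
    Intended for small df (math.gamma overflows for df above roughly 340)."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(t):
        return coef * (1 + t * t / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps  # steps is even, as Simpson's rule requires
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_quantile(p, df):
    """t-score with cumulative probability p (0.5 < p < 1, df >= 2), by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def t_confidence_interval(ybar, s, n, level=0.95):
    """Small-sample CI for the mean: ybar +/- t_{n-1} * s / sqrt(n)."""
    t = t_quantile(1 - (1 - level) / 2, n - 1)
    margin = t * s / math.sqrt(n)
    return ybar - margin, ybar + margin

t_score = t_quantile(0.975, 9)  # 95% level, n = 10, so df = 9
low, high = t_confidence_interval(9.6, 4, 10)  # hypothetical small sample
print(round(t_score, 3))  # about 2.262, larger than the z-score 1.96
print(round(low, 2), round(high, 2))
```

With only 10 observations the t-score is about 2.262 rather than 1.96, so the interval is noticeably wider than the z-based one.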

[Figure: The t-distribution and the normal]

Notes on the t-distribution
- It has thicker tails than the normal distribution.
- Symmetric and bell-shaped.
- Dispersion depends on the degrees of freedom, sometimes listed as df or DOF.
- As df → ∞, the t-distribution becomes essentially the normal distribution.
NOTE: The use of the t-distribution is not related to the CLT. We are assuming the data are normally distributed.

What about for categorical variables?
Useful for any categorical variable:
- Citizens who plan to vote
- Nations with low tariffs
- Congressmen who support a balanced budget amendment
- Students with blue eyes
To summarize categorical data, we record the proportions of observations in the categories.
Some new notation:
- Population parameter: $0 \le \pi \le 1$
- Estimator: $\hat{\pi}$ = sample proportion
- Population parameter: $\sigma = \sqrt{\pi(1 - \pi)}$
- Estimator: $\hat{\sigma} = \sqrt{\hat{\pi}(1 - \hat{\pi})}$

Confidence intervals for proportions: A how-to guide
$y_i = 0$ or $y_i = 1$ for all $i$. You either have blue eyes or you don't, etc.
So how to calculate a confidence interval? Calculate:
- an estimator ($\hat{\pi}$): $\hat{\pi} = \frac{1}{n}\sum_{i=1}^{n} y_i$ (psst... this is $\bar{y}$)
- a standard error for the estimator ($\sigma_{\hat{\pi}}$): $\hat{\sigma}_{\hat{\pi}} = \frac{\hat{\sigma}}{\sqrt{n}} = \frac{\sqrt{\hat{\pi}(1 - \hat{\pi})}}{\sqrt{n}} = \sqrt{\frac{\hat{\pi}(1 - \hat{\pi})}{n}}$
- and find the right $Z$ for the confidence coefficient.
The confidence interval is $\hat{\pi} \pm Z\,\hat{\sigma}_{\hat{\pi}}$

Take-away message
- The major conceptual difference is that this works for all categorical data.
- The major computational difference is in the calculation of $\hat{\sigma}$: we can calculate it using only $\hat{\pi}$, and the formula is quite different from the one for $S$.
- This is for large-sample confidence intervals ($n > 30$).

Examples: Confidence intervals with proportions
- Source: Gallup poll
- Sample size: $n = 1785$
- Population: U.S. adults
- Question: Did you, yourself, attend church or synagogue in the past 7 days?
- Sample data: 750 said yes. How many No's? 1035
Find a 95% confidence interval for the proportion who went to church or synagogue.

Examples: Confidence intervals with proportions
- Sample statistic: $\hat{\pi} = 750/1785 = 0.420$, $\hat{\sigma}_{\hat{\pi}} = \sqrt{\frac{\hat{\pi}(1 - \hat{\pi})}{n}}$
- Confidence interval: $0.420 \pm 1.96\sqrt{\frac{0.420(1 - 0.420)}{1785}}$
- Final answer: $0.420 \pm 0.023 = [0.397, 0.443]$ (the margin of error is $1.96 \times 0.01168 \approx 0.0229$)
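The Gallup example can be recomputed from the raw counts (750 yes out of 1785). The sketch below follows the how-to steps for proportions; the function name `proportion_confidence_interval` is hypothetical, and the z-score comes from `statistics.NormalDist` rather than a z-table.

```python
import math
from statistics import NormalDist

def proportion_confidence_interval(successes, n, level=0.95):
    """Large-sample CI for a proportion: pi_hat +/- z * sqrt(pi_hat * (1 - pi_hat) / n)."""
    pi_hat = successes / n                          # sample proportion
    se = math.sqrt(pi_hat * (1 - pi_hat) / n)       # estimated standard error
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # z-score for the level
    return pi_hat, pi_hat - z * se, pi_hat + z * se

pi_hat, low, high = proportion_confidence_interval(750, 1785)
print(round(pi_hat, 3), round(low, 3), round(high, 3))  # 0.42 0.397 0.443
```

Working from the unrounded proportion gives a margin of error of about 0.0229, so the interval endpoints round to 0.397 and 0.443.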