Interval estimation. September 29, Outline Basic ideas Sampling variation and CLT Interval estimation using X More general problems


Interval estimation September 29, 2017 STAT 151 Class 7 Slide 1

Outline of Topics
1. Basic ideas
2. Sampling variation and CLT
3. Interval estimation using $\bar{X}$
4. More general problems

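Example: Studying household expenditure from a population
Population: $X_1, X_2, \ldots, X_N$, with mean $\mu = \frac{1}{N}\sum_{i=1}^{N} X_i$ and variance $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (X_i - \mu)^2$.
Sample: $X_1, X_2, \ldots, X_n$, with sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ and sample variance $s^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2$.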

Point estimation
Expenditure example: Suppose household expenditure ($X$) in the population has mean $\mu$ and variance $\sigma^2$, and we wish to estimate the parameter $\mu$.
We can take a sample of observations of $X$ (a simple random sample, SRS, of independent observations) to estimate $\mu$.
Suppose our sample consists of $n = 10$ household expenditures: $(X_1, \ldots, X_{10}) = (1874, 1642, 1603, 1931, 2103, 2068, 1948, 1798, 2364, 1918)$. An estimate of $\mu$ using $\bar{X}$ is
$\bar{X} = \frac{1874 + \cdots + 1918}{10} = 1924.9$
This type of estimate is called a point estimate.
A point estimate carries no reference to its level of accuracy.
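
The arithmetic of the point estimate can be checked in a few lines of Python. This is only a sketch; the variable names are illustrative and do not come from the slides.

```python
from statistics import mean

# The n = 10 household expenditures listed on the slide.
expenditures = [1874, 1642, 1603, 1931, 2103, 2068, 1948, 1798, 2364, 1918]

# The point estimate of mu is simply the sample mean.
x_bar = mean(expenditures)
print(f"n = {len(expenditures)}, point estimate of mu = {x_bar:.1f}")
```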

Interval estimation
Our point estimate is a statistic calculated from a sample to estimate $\mu$.
Recall that any statistic has a sampling error, defined by statistic $-$ parameter $=$ point estimate $- \mu$.
Sampling error arises because we use a sample (only a part of the population) to infer about the entire population.
When we use $\bar{X}$ to estimate $\mu$, we must account for the sampling error. Instead of saying $\mu = 1924.9$ with no reference to our level of belief in the estimate, we say $\mu$ is within $(a, b)$ with $k\%$ confidence.
This type of estimate is called a confidence interval estimate.

Confidence interval (CI)
There are two basic components in a confidence interval estimate:
Level of confidence: a measure of our level of belief
Margin of error: a measure of the precision of our estimate
Expenditure example: We wish to say something like "we are 95% confident that the population mean household expenditure $\mu$ is within $1924.9 \pm 131.2$". In this case:
Level of confidence = 95%
Margin of error = 131.2
The CI can also be written as $(1924.9 - 131.2,\ 1924.9 + 131.2) = (1793.7,\ 2056.1)$, where 1793.7 is the lower limit and 2056.1 is the upper limit.
The width of a CI is upper limit $-$ lower limit $= (1924.9 + 131.2) - (1924.9 - 131.2) = 2 \times 131.2 = 2 \times$ margin of error.

Sampling error, variation and distribution
The sampling error $\Delta = 1924.9 - \mu$ is unknown and cannot be estimated.
The distribution of sampling errors can be studied, and it tells us the likely values of the sampling error when $\bar{X}$ is used to estimate $\mu$. The sampling error distribution is sometimes called a sampling distribution.
[Figure: repeated samples drawn from the same population]
Sample 1: $\bar{X} = 1924.9$, so $1924.9 - \mu = \Delta$
Sample 2: $\bar{X} = 1995$, so $1995 - \mu = \Delta$
...
Sample $k$: $\bar{X} = \ldots$, so $\bar{X} - \mu = \Delta$
Sampling distribution = distribution of $\Delta = \bar{X} - \mu$.
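
The idea of a sampling distribution can be illustrated by simulation. The sketch below assumes a made-up population of 5000 expenditures (the mean 1900 and spread 220 are illustrative values, not taken from the slides), draws many simple random samples of size 10, and records the sampling error of each.

```python
import random
from statistics import mean, pstdev

rng = random.Random(151)  # fixed seed so the sketch is reproducible

# Hypothetical population of N = 5000 expenditures (assumed, not from the slides).
population = [rng.gauss(1900, 220) for _ in range(5000)]
mu = mean(population)

def sampling_errors(n, repeats):
    """Draw `repeats` simple random samples of size n; return each X-bar minus mu."""
    return [mean(rng.sample(population, n)) - mu for _ in range(repeats)]

errors = sampling_errors(n=10, repeats=2000)
print(f"mean of sampling errors: {mean(errors):.2f}")
print(f"spread (sd) of sampling errors: {pstdev(errors):.2f}")
```

In a real study only one of these samples is observed, so its own error stays unknown; the simulation only shows how such errors are spread around 0.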

Sampling (error) distribution and Central Limit Theorem (CLT)
[Figure: possible sampling errors $\Delta$ scattered around 0]
The Central Limit Theorem (CLT) says that when using $\bar{X}$ from a reasonably big sample of $n$ independent observations to estimate $\mu$, the sampling (error) distribution is approximately a normal distribution.
The mean of the sampling (error) distribution is 0 and the variance of the sampling errors is $\mathrm{var}(\Delta)$, which is also called the sampling variation.
CLT: $\Delta = \bar{X} - \mu \sim \mathrm{Normal}(0, \mathrm{var}(\Delta))$, where $\Delta$ is the sampling error, the normal distribution is the sampling distribution, and $\mathrm{var}(\Delta)$ is the sampling variation.
We do not know exactly where our $\Delta$ lies among the possible values (the red $\Delta$s in the figure). However, using the empirical rule, we can be 95% certain that $\Delta$ lies within $0 \pm 2\sqrt{\mathrm{var}(\Delta)}$.

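From CLT to interval estimation
In fact, the CLT says: $\Delta = \bar{X} - \mu \sim \mathrm{Normal}(0, \sigma^2/n) \Leftrightarrow \bar{X} \sim \mathrm{Normal}(\mu, \sigma^2/n)$.
Using the CLT, our sampling error $\bar{X} - \mu$ behaves like one of the $\Delta$s in the following distribution:
68%: $\Delta$ within $0 \pm \sigma/\sqrt{n}$, i.e. $\bar{X}$ within $\mu \pm \sigma/\sqrt{n}$
95%: $\Delta$ within $0 \pm 1.96\,\sigma/\sqrt{n}$, i.e. $\bar{X}$ within $\mu \pm 1.96\,\sigma/\sqrt{n}$
99.7%: $\Delta$ within $0 \pm 3\,\sigma/\sqrt{n}$, i.e. $\bar{X}$ within $\mu \pm 3\,\sigma/\sqrt{n}$
Question: How do we translate this information into statements about $\mu$?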
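
The 68%, 95% and 99.7% figures are properties of the normal distribution and can be checked with Python's standard library. This is a small sketch; the helper name `central_probability` is introduced here for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def central_probability(z):
    """P(-z <= Z <= z) for a standard normal Z."""
    return std_normal.cdf(z) - std_normal.cdf(-z)

# Multiples of the standard error sigma / sqrt(n) used on the slide.
for z in (1, 1.96, 3):
    print(f"within {z} standard errors: {central_probability(z):.4f}")
```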

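95% confidence interval
The statement "we are 95% certain that our sampling error $\bar{X} - \mu$ is no more than $0 \pm 1.96\,\sigma/\sqrt{n}$" (sometimes $\pm 2$ is used as an approximation) is equivalent to
$-1.96\frac{\sigma}{\sqrt{n}} \le \mu - \bar{X} \le 1.96\frac{\sigma}{\sqrt{n}}$
$\Leftrightarrow\ -1.96\frac{\sigma}{\sqrt{n}} + \bar{X} \le \mu - \bar{X} + \bar{X} \le 1.96\frac{\sigma}{\sqrt{n}} + \bar{X}$
$\Leftrightarrow\ \bar{X} - 1.96\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}$
We are 95% confident that $\mu$ is within $\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right)$.
$\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right)$ is called a 95% confidence interval for $\mu$.
The level of confidence is 95% and the margin of error is $1.96\,\sigma/\sqrt{n}$.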
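
The final inequality translates directly into a small function. The sketch below assumes $\sigma$ is known; the value 220 is made up for illustration, since the slides do not give $\sigma$.

```python
from math import sqrt

def ci_known_sigma(x_bar, sigma, n, z=1.96):
    """Return (lower, upper) for x_bar +/- z * sigma / sqrt(n)."""
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

# sigma = 220 is an assumed value for illustration only.
lower, upper = ci_known_sigma(x_bar=1924.9, sigma=220, n=10)
print(f"95% CI for mu: ({lower:.1f}, {upper:.1f})")
```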

Interpretation of a confidence level
A confidence interval (CI) is a method for finding a plausible range for $\mu$. Each time a CI is calculated from a new random sample, we obtain a different interval.
For example, a 95% CI has the following property: if the method is used repeatedly, then 95% of the intervals will actually include $\mu$.
However, once a particular 95% CI has been calculated, the chance that $\mu$ is included in that particular interval is NOT 95%. It is either 0% ($\mu$ not inside the CI, wrong estimate!) or 100% ($\mu$ inside the CI, correct estimate!).
Therefore, our confidence in our interval is based on the fact that it may be one of the 95 (out of 100) that actually include the unknown $\mu$.
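
The repeated-use property can be illustrated by simulation. The sketch below assumes a normal population with known $\sigma$; the values of `mu`, `sigma` and `n` are chosen only for the demonstration and are not from the slides.

```python
import random
from math import sqrt
from statistics import mean

def coverage(mu, sigma, n, repeats, z=1.96, seed=0):
    """Fraction of simulated intervals x_bar +/- z*sigma/sqrt(n) that contain mu."""
    rng = random.Random(seed)
    margin = z * sigma / sqrt(n)
    hits = 0
    for _ in range(repeats):
        x_bar = mean(rng.gauss(mu, sigma) for _ in range(n))
        # Each individual interval either contains mu (counts 1) or does not (counts 0).
        hits += (x_bar - margin <= mu <= x_bar + margin)
    return hits / repeats

# mu, sigma and n are assumed values chosen only for the simulation.
rate = coverage(mu=1900, sigma=220, n=30, repeats=2000)
print(f"share of intervals containing mu: {rate:.3f}")
```

The share should land close to 0.95, while every single interval in the loop is simply right or wrong.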

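Other confidence intervals
90% confidence interval: $\bar{X} \pm 1.64\,\sigma/\sqrt{n}$
95% confidence interval: $\bar{X} \pm 1.96\,\sigma/\sqrt{n}$
99% confidence interval: $\bar{X} \pm 2.58\,\sigma/\sqrt{n}$
In practice $\sigma$ is approximated by $\hat{\sigma} = \sqrt{\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n}}$ or $\hat{\sigma} = \sqrt{\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}}$. The divisor $n - 1$ makes $\hat{\sigma}^2$ an unbiased estimator of $\sigma^2$.
90% confidence interval: $\bar{X} \pm 1.64\,\hat{\sigma}/\sqrt{n}$
95% confidence interval: $\bar{X} \pm 1.96\,\hat{\sigma}/\sqrt{n}$
99% confidence interval: $\bar{X} \pm 2.58\,\hat{\sigma}/\sqrt{n}$
These approximations are reasonable as long as $n$ is not too small. For very small $n$, the values 1.64, 1.96 and 2.58 are replaced by larger values taken from a table called the t-table.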
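
The multipliers 1.64, 1.96 and 2.58 are rounded quantiles of the standard normal distribution. A minimal sketch of how they can be recovered, with `z_value` as an illustrative helper name:

```python
from statistics import NormalDist

def z_value(confidence):
    """Two-sided normal critical value, e.g. confidence=0.95 gives about 1.96."""
    # A central area of `confidence` leaves (1 - confidence) / 2 in each tail.
    return NormalDist().inv_cdf(0.5 + confidence / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_value(level):.3f}")
```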

Which confidence interval?
For any reasonably large sample of size $n$, we can construct a 90%, 95%, 99%, etc. CI. In other words, we can make the following statements:
We are 90% confident that $\mu$ is within $\bar{X} \pm 1.64\,\hat{\sigma}/\sqrt{n}$
We are 95% confident that $\mu$ is within $\bar{X} \pm 1.96\,\hat{\sigma}/\sqrt{n}$
We are 99% confident that $\mu$ is within $\bar{X} \pm 2.58\,\hat{\sigma}/\sqrt{n}$
In fact, there are infinitely many CIs we can construct. However, we report one interval that is meaningful. A meaningful interval should have:
(a) a high level of confidence
(b) a width that is not too large
These two goals pull in opposite directions, since a higher level of confidence requires a wider interval. As a compromise between (a) and (b), we often use a 95% confidence interval.

Expenditure example
Using $(X_1, \ldots, X_{10}) = (1874, 1642, 1603, 1931, 2103, 2068, 1948, 1798, 2364, 1918)$, point estimates of $(\mu, \sigma^2)$ are
$\hat{\mu} = \bar{X} = 1924.9, \qquad \hat{\sigma}^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1} = 49774.54,$
and a 95% CI for $\mu$ is given by
$\hat{\mu} \pm 1.96\,\frac{\hat{\sigma}}{\sqrt{n}} = 1924.9 \pm 1.96\sqrt{\frac{49774.54}{10}} \approx 1924.9 \pm 138.3 = (1786.6,\ 2063.2).$
The margin of 131.2 quoted earlier corresponds to the divisor-$n$ estimate $\hat{\sigma}^2 = 44797.09$: $1.96\sqrt{44797.09/10} \approx 131.2$, giving $(1793.7,\ 2056.1)$.
For comparison, we replace 1.96 with a value from a t-table. A t-table depends on a quantity called degrees of freedom (df), defined as df $= n - 1$. Values for selected df are below:
df = n - 1:  6      7      8      9      10     20     120    >120
value:       2.447  2.365  2.306  2.262  2.228  2.086  1.98   1.96
In this example, $n = 10$, which gives df $= 10 - 1 = 9$, so we choose the value 2.262 in the table to replace 1.96 and arrive at a 95% confidence interval of
$1924.9 \pm 2.262\sqrt{\frac{49774.54}{10}} \approx 1924.9 \pm 159.6 = (1765.3,\ 2084.5),$
which is wider than the interval using 1.96. In fact, a confidence interval based on a t-table is always wider than its equivalent using the CLT. The idea is that, for small samples, $\hat{\sigma}^2$ may not be a very accurate estimate of $\sigma^2$, and a wider interval accounts for this extra layer of uncertainty.
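
The margins in this example depend on which variance estimate and which multiplier are used, so it can help to recompute them side by side. The sketch below uses only the data and the df = 9 table value given above; the names `margin`, `var_n1` and `var_n` are illustrative.

```python
from math import sqrt
from statistics import mean, pvariance, variance

x = [1874, 1642, 1603, 1931, 2103, 2068, 1948, 1798, 2364, 1918]
n = len(x)
x_bar = mean(x)

var_n1 = variance(x)   # divisor n - 1
var_n = pvariance(x)   # divisor n

T_9 = 2.262  # 95% t-table value for df = 9, copied from the table above

def margin(multiplier, var_hat):
    """Margin of error: multiplier * sqrt(var_hat / n)."""
    return multiplier * sqrt(var_hat / n)

print(f"z, divisor n - 1: {x_bar:.1f} +/- {margin(1.96, var_n1):.1f}")
print(f"z, divisor n    : {x_bar:.1f} +/- {margin(1.96, var_n):.1f}")
print(f"t, divisor n - 1: {x_bar:.1f} +/- {margin(T_9, var_n1):.1f}")
```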

Improving upon a CI: Reducing the margin of error
The margin of error for a 95% CI is $1.96\,\sigma/\sqrt{n}$, which depends on $\sigma$ and $n$.
$\sigma^2 = \mathrm{var}(X)$ measures the variation of $X$ in the population, which is beyond our control.
However, $n$ is the sample size, which is under our control: $n$ can be increased to reduce the margin of error.
Example: What sample size $m$ would cut the margin of error in half? We want
$1.96\frac{\sigma}{\sqrt{m}} = \frac{1}{2}\left(1.96\frac{\sigma}{\sqrt{n}}\right)$ (new margin of error on the left, old margin of error in brackets on the right)
$\Rightarrow\ \frac{1}{\sqrt{m}} = \frac{1}{2}\cdot\frac{1}{\sqrt{n}}$
$\Rightarrow\ \frac{1}{m} = \frac{1}{4}\cdot\frac{1}{n}$
$\Rightarrow\ m = 4n$
To reduce the margin of error to $1/k$ of its current value, the sample size needs to be multiplied by $k^2$.
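
The $k^2$ rule can be checked numerically. In the sketch below, `sigma = 220` is an assumed value; it cancels in the ratio of the two margins, so any positive value gives the same result.

```python
from math import sqrt

def margin_of_error(sigma, n, z=1.96):
    """Margin of error z * sigma / sqrt(n) of a CI for the mean."""
    return z * sigma / sqrt(n)

def sample_size_for_reduction(n, k):
    """Sample size needed to shrink the margin of error to 1/k of its value."""
    return k ** 2 * n

sigma, n = 220, 10  # sigma is an assumed value; it cancels in the ratio
m = sample_size_for_reduction(n, k=2)
ratio = margin_of_error(sigma, m) / margin_of_error(sigma, n)
print(f"n = {n} -> m = {m}, new margin / old margin = {ratio:.2f}")
```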

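Estimating parameters other than the mean: CLT for MLE
A 95% CI (see notes 1 and 2) for
$\mu$ based on $\bar{X}$ is: $\bar{X} \pm 1.96\sqrt{\mathrm{var}(\bar{X})}$
$\theta$ based on the MLE $\hat{\theta}$ is: $\hat{\theta} \pm 1.96\sqrt{\mathrm{var}(\hat{\theta})}$
This result holds because a similar CLT (notes 1 and 2) says
$\Delta = \hat{\theta} - \theta \sim \mathrm{Normal}(0, \mathrm{var}(\Delta)) = \mathrm{Normal}(0, \mathrm{var}(\hat{\theta})),$
where $\Delta$ is the sampling error, the normal distribution is the sampling distribution, and $\mathrm{var}(\Delta)$ is the sampling variation.
[Figure: sampling distribution of $\Delta$ centred at 0] 95% of sampling errors lie within $0 \pm 1.96\sqrt{\mathrm{var}(\hat{\theta})}$, i.e. $\hat{\theta}$ lies within $\theta \pm 1.96\sqrt{\mathrm{var}(\hat{\theta})}$.
Note 1: Using a reasonably large sample of $n$ independent observations of $X$.
Note 2: True for $X$ from most distributions.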
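
As one illustration of the general recipe, consider a proportion, which is not discussed on the slides. For Bernoulli observations the MLE is the sample proportion $\hat{p}$, and its variance $p(1-p)/n$ can be estimated by plugging in $\hat{p}$. The counts below are hypothetical, and the sketch assumes $n$ is large enough for the normal approximation.

```python
from math import sqrt

def mle_ci(theta_hat, var_hat, z=1.96):
    """Generic interval theta_hat +/- z * sqrt(var_hat)."""
    margin = z * sqrt(var_hat)
    return theta_hat - margin, theta_hat + margin

# Hypothetical data: 130 of 400 sampled households exceed some spending threshold.
successes, n = 130, 400
p_hat = successes / n                 # MLE of a Bernoulli proportion
var_hat = p_hat * (1 - p_hat) / n     # plug-in estimate of var(p_hat)
lower, upper = mle_ci(p_hat, var_hat)
print(f"p_hat = {p_hat:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```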

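Expenditure example: Difference between two population means
Suppose in addition to a sample of $n$ household expenditures $(X_1, \ldots, X_n)$ with a population mean $\mu$, we obtained a sample of $m$ household expenditures $(Y_1, \ldots, Y_m)$ with a population mean $\nu$, and we wish to estimate the parameter $\theta = \mu - \nu$.
We can estimate $\mu$ and $\nu$ by $\bar{X}$ and $\bar{Y}$, respectively, and hence $\hat{\theta} = \bar{X} - \bar{Y}$.
$-1.96\sqrt{\mathrm{var}(\hat{\theta})} \le \theta - \hat{\theta} \le 1.96\sqrt{\mathrm{var}(\hat{\theta})}$
$\Leftrightarrow\ -1.96\sqrt{\mathrm{var}(\hat{\theta})} + \hat{\theta} \le \theta - \hat{\theta} + \hat{\theta} \le 1.96\sqrt{\mathrm{var}(\hat{\theta})} + \hat{\theta}$
$\Leftrightarrow\ \hat{\theta} - 1.96\sqrt{\mathrm{var}(\hat{\theta})} \le \theta \le \hat{\theta} + 1.96\sqrt{\mathrm{var}(\hat{\theta})}$
$\Leftrightarrow\ \bar{X} - \bar{Y} - 1.96\sqrt{\mathrm{var}(\bar{X} - \bar{Y})} \le \theta \le \bar{X} - \bar{Y} + 1.96\sqrt{\mathrm{var}(\bar{X} - \bar{Y})}$
Notes: Assume the populations follow a normal distribution. Chapter 9 exercise.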
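
A numerical sketch of this interval follows. It rests on assumptions not stated on the slide: the two samples are taken to be independent, so that $\mathrm{var}(\bar{X} - \bar{Y}) = \mathrm{var}(\bar{X}) + \mathrm{var}(\bar{Y})$, each variance is estimated with the divisor $n - 1$, and the second sample `y` is invented for illustration. With samples this small, 1.96 is only a rough multiplier.

```python
from math import sqrt
from statistics import mean, variance

x = [1874, 1642, 1603, 1931, 2103, 2068, 1948, 1798, 2364, 1918]  # from the slides
y = [1710, 1825, 1590, 1902, 1766, 1688, 1843, 1759]              # hypothetical second sample

theta_hat = mean(x) - mean(y)
# Assumes the two samples are independent, so the variances of the two means add.
var_hat = variance(x) / len(x) + variance(y) / len(y)
margin = 1.96 * sqrt(var_hat)
print(f"theta_hat = {theta_hat:.1f}, "
      f"95% CI = ({theta_hat - margin:.1f}, {theta_hat + margin:.1f})")
```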