STAT 503: Sampling Distribution and Statistical Estimation
Simple Random Sampling. Sampling Distribution

Simple Random Sampling

Slide 1. Simple random sampling selects with equal chance from the (available) members of a population; the resulting sample is a simple random sample.

Consider an urn containing N balls with numbers x_i written on them, and draw n balls from the urn.
- Equal chance for each of the C(N,n) possible samples without replacement: finite population.
- Equal chance for each of the N^n possible samples with replacement: infinite population.
The x_i's need not all be different. One can let N go to infinity, but then one cannot sample without replacement. A sample from an infinite population consists of independent, identically distributed (i.i.d.) observations.

Sampling Distribution

Slide 2. The sampling distribution describes the behavior of sample statistics such as X̄ and s².

In a certain human population, 30% of the individuals have superior distance vision (20/15 or better). Consider the sample proportion with superior vision, p̂ = X/n, where X is the number of people in the sample with superior vision, and find the sampling distribution of p̂ for n = 20. Clearly, X = 20p̂ ~ Bin(20, .3). The possible values of p̂ are {0, .05, .10, ..., .95, 1}, with probabilities P(p̂ = x) = C(20, 20x) (.3)^{20x} (.7)^{20(1−x)}. For example, P(p̂ = .3) = C(20, 6) (.3)^6 (.7)^14 = .192. In R, use dbinom(0:20,20,.3).
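A quick numerical check of the vision example (a minimal sketch; the object names are just for illustration):

phat <- (0:20)/20                     ## possible values of p-hat
prob <- dbinom(0:20, 20, .3)          ## P(p-hat = x) for each value
round(dbinom(6, 20, .3), 3)           ## P(p-hat = .3) = P(X = 6), about .192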

Sampling Distribution, Sampling

Slide 3. The sampling distribution of the total of 2 rolls of a fair die:

x    2     3     4     5     6     7     8     9    10    11    12
p  1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

The sampling distribution of the total of 4 rolls of a fair die:

x <- outer(1:6,1:6,"+")                      ## outer matrix
dist2 <- table(x); dist2/sum(dist2)          ## total of 2 rolls
xx <- outer(x,x,"+")                         ## 4-way array
dist4 <- table(xx); dist4/sum(dist4)         ## total of 4 rolls

Sampling from a finite collection of numbers:

sample(1:200,10,rep=FALSE)                   ## w/o replacement, default
sample(1:6,33,replace=TRUE)                  ## with replacement
sample(0:3,30,rep=TRUE,prob=c(1,3,3,1))      ## rbinom(30,3,.5)

Simulating Sampling Distributions

Slide 4. When analytical derivation is cumbersome or infeasible, one may use simulation to obtain the sampling distribution.

Example: the sample median of 17 r.v.'s from N(0,1).

x <- matrix(rnorm(170000),17,10000)
md <- apply(x,2,median)
hist(md,nclass=50); plot(density(md))

Example: the largest of 5 Poisson counts from Poisson(3.3).

x <- matrix(rpois(50000,3.3),5,10000)
mx <- apply(x,2,max)
table(mx); table(mx)/10000
ppois(1:13,3.3)^5-ppois(0:12,3.3)^5

Sampling Distribution of X̄

Slide 5. Use upper-case X̄ to denote the sample mean as a random variable. For an infinite population with mean µ and standard deviation σ,

µ_X̄ = µ and σ_X̄ = σ/√n.

Example: the heights of male students on a large university campus have µ = 68 and σ = 5. Consider X̄ with sample size n = 25: µ_X̄ = µ = 68 and σ_X̄ = σ/√n = 5/√25 = 1. X̄ is more concentrated around µ than a single observation. To double the accuracy of X̄, one needs to quadruple the sample size n. For a finite population, σ_X̄ = (σ/√n)√((N − n)/(N − 1)).

Central Limit Theorem

Slide 6. Consider an infinite population with mean µ and standard deviation σ. For n large,

P((X̄ − µ_X̄)/σ_X̄ ≤ z) ≈ Φ(z).

The shape of the sampling distribution approaches the normal as n grows. Usually n ≥ 30 is sufficiently large for the CLT to kick in. For a normal population, X̄ is always normal, regardless of n.

Example: given µ = 68 and σ = 5, find the probability that the average height of n = 25 students exceeds 70. P(X̄ > 70) = P((X̄ − µ_X̄)/σ_X̄ > (70 − 68)/1) ≈ 1 − Φ(2) = .0228.
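The height example can be checked numerically in R (a small sketch; pnorm gives the normal probability, and the simulation check assumes a normal population):

1 - pnorm(70, mean = 68, sd = 5/sqrt(25))             ## about .0228
xbar <- colMeans(matrix(rnorm(25*10000, 68, 5), 25))  ## 10000 sample means, n = 25
mean(xbar > 70)                                        ## simulated P(Xbar > 70)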

Effects of Sample Size

Slide 7. [Figure: sampling distributions of X̄ for n = 1, 4, 16; the spread shrinks as n grows.]

Normal Approximation of Binomial

Slide 8. Recall that if X ~ Bin(n, p), then X = Σ_{i=1}^n X_i, where X_i ~ Bin(1, p). By the Central Limit Theorem,

P((X − np)/√(np(1 − p)) ≤ z) = P((X/n − p)/√(p(1 − p)/n) ≤ z) ≈ Φ(z).

Example: consider X ~ Bin(25, .3), so np = 7.5 and √(np(1 − p)) = 2.291.
Exact: P(X ≤ 8) = .6769. Approximation: P(X ≤ 8) = P(X ≤ 8.5) ≈ Φ((8.5 − 7.5)/2.291) = .6687.
Exact: P(X = 8) = .1651. Approximation: P(X = 8) = P(7.5 ≤ X ≤ 8.5) ≈ Φ((8.5 − 7.5)/2.291) − Φ(0) = .1687.

When a continuous distribution is used to approximate a discrete one, a continuity correction is needed to preserve accuracy. For np, n(1 − p) ≥ 5, the approximation is reasonably accurate.
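The Bin(25, .3) example in R (a minimal sketch comparing the exact values with the continuity-corrected approximation):

pbinom(8, 25, .3)                                  ## exact P(X <= 8), .6769
pnorm(8.5, 7.5, sqrt(5.25))                        ## approximation, .6687
dbinom(8, 25, .3)                                  ## exact P(X = 8), .1651
pnorm(8.5, 7.5, sqrt(5.25)) - pnorm(7.5, 7.5, sqrt(5.25))   ## approximation, .1687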

Basic Structure of Inference

Slide 9. Statistical inference makes educated guesses about the population based on information from the sample. All guesses are prone to error, and quantifying the imprecision is an important part of statistical inference.

1. Estimation estimates the state of the population, which is typically characterized by some parameter, say θ.
2. Hypothesis testing chooses among postulated states of the population, such as H_0: θ = θ_0 versus H_a: θ ≠ θ_0, where θ_0 is a known number.

Examples of Estimation and Testing

Slide 10. A plant physiologist grew 13 soybean seedlings of the type Wells II. She measured the total stem length (cm) for each plant after 16 days of growth, and got x̄ = 21.34 and s = 1.22. She may estimate the average stem length by a point estimate, µ ≈ 21.34, or by an interval estimate, 18.68 < µ < 24.00.

As reported by the AMA, 16 out of every 100 doctors in any given year are subject to malpractice claims. A hospital of 300 physicians received claims against 58 of their doctors in one year. Was the hospital simply unlucky, or does the number possibly indicate some systematic wrongdoing at the hospital? The number 58/300 ≈ .193 is within chance variation of θ_0 = .16.
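One way to check the malpractice example numerically (an illustrative sketch, not part of the original slides; it uses the large-sample standard error of a proportion developed later, plus R's exact binomial test):

phat <- 58/300
z <- (phat - .16)/sqrt(.16*.84/300)     ## about 1.58 standard errors above .16
2*(1 - pnorm(z))                        ## two-sided tail probability, about .11
binom.test(58, 300, p = .16)$p.value    ## exact version, similar conclusion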

Estimating Population Mean

Slide 11. Observing X_1, ..., X_n from a population with mean µ and variance σ², one is to estimate µ. The procedure (or formula) one uses is called an estimator, which yields an estimate after the data are plugged in.

Observing X_1, ..., X_5, one may use one of the following point estimators for µ:
µ̂_1 = X̄, µ̂_2 = X_1, µ̂_3 = (X_1 + X_3)/2, µ̂_4 = X̃ (the sample median), µ̂_5 = µ_0 (a fixed guess).
Observing 5.1, 5.1, 5.3, 5.2, 5.2, the corresponding point estimates for µ are:
µ̂_1 = x̄ = 5.18, µ̂_2 = x_1 = 5.1, µ̂_3 = (x_1 + x_3)/2 = 5.2, µ̂_4 = x̃ = 5.2, µ̂_5 = 5.

Properties of Point Estimators

Slide 12. To choose among all possible estimators, one compares properties of the estimators.
Unbiasedness: µ_θ̂ = θ. Small standard deviation: σ_θ̂.
µ̂_1, µ̂_2, and µ̂_3 are all unbiased: µ_X̄ = µ_{X_1} = µ_{(X_1+X_3)/2} = µ, while σ²_X̄ = σ²/5, σ²_{X_1} = σ², and σ²_{(X_1+X_3)/2} = σ²/2.
A better estimator yields better estimates on average, but a better estimator may not always yield a better estimate.
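A small simulation comparing the three unbiased estimators (illustrative only; the population is arbitrarily taken to be N(5.2, .1²)):

x <- matrix(rnorm(5*10000, mean = 5.2, sd = .1), 5)     ## 10000 samples of size 5
est1 <- colMeans(x); est2 <- x[1,]; est3 <- (x[1,] + x[3,])/2
c(mean(est1), mean(est2), mean(est3))    ## all near 5.2: unbiased
c(sd(est1), sd(est2), sd(est3))          ## roughly .1/sqrt(5), .1, .1/sqrt(2)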

Sample Mean as Estimator of Population Mean

Slide 13. One usually uses the sample mean x̄ to estimate the population mean µ, as X̄ has the smallest standard deviation among all unbiased estimators of µ.

To quantify the imprecision of estimating µ by x̄, one estimates σ_X̄ = σ/√n by s/√n, the standard error of the sample mean.

Soybean stem length: n = 13, x̄ = 21.34, and s = 1.22, so σ̂_X̄ = s/√n = 1.22/√13 = .338.

For X̄ nearly normal, X̄ lies within ±2σ/√n of µ about 95% of the time. Do not confuse σ_X̄, the standard deviation of the sample mean, with σ, the standard deviation of a single observation.

Confidence Intervals

Slide 14. A point estimate will almost surely miss the target, although its standard error indicates by how far the miss is likely to be. An interval estimate provides a range for the parameter.

Soybean stem length: assume normality with σ = 1.2 known. One has X̄ ~ N(µ, (1.2)²/13), so P(|X̄ − µ|/(1.2/√13) ≤ 1.96) = .95. Solving for µ, one obtains

X̄ − 1.96(1.2/√13) ≤ µ ≤ X̄ + 1.96(1.2/√13).

For X_i ~ N(µ, σ²), i = 1, ..., n, with σ² known, X̄ ± z_{1−α/2} σ/√n provides an interval estimator that covers µ with probability (1 − α). It yields a (1 − α)100% confidence interval for µ, with confidence coefficient (1 − α)100%.
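The soybean z-interval in R (a minimal sketch under the slide's assumption that σ = 1.2 is known):

xbar <- 21.34; sigma <- 1.2; n <- 13
xbar + c(-1, 1)*qnorm(.975)*sigma/sqrt(n)     ## about (20.69, 21.99)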

Coverage, Large Sample CIs

Slide 15. As an estimator, a CI is a moving bracket chasing a fixed target. As an estimate, a CI may or may not cover the truth.

With a large sample from an arbitrary distribution and σ unknown, a confidence interval for µ with an approximate confidence coefficient (1 − α)100% is given by

X̄ ± z_{1−α/2} s/√n.

Normality comes from the CLT, and the unknown σ is estimated by s; replace s by σ if it is known.

Small Sample CIs Based on t-Distribution

Slide 16. For a small sample with σ unknown, one has to assume normality.

Consider Z_i ~ N(0, 1), i = 1, ..., n. The distribution of Z̄/(s/√n) is called a t-distribution with degrees of freedom (df) ν = n − 1. A t-distribution with ν = ∞ reduces to N(0, 1). [Figure: t densities for df = 1, 10, 100.]

For X_i ~ N(µ, σ²), i = 1, ..., n, P(|X̄ − µ|/(s/√n) ≤ t_{1−α/2,n−1}) = 1 − α, so X̄ ± t_{1−α/2,n−1} s/√n provides a (1 − α)100% CI for µ. t_{1−α,ν} decreases toward z_{1−α} as ν increases. For σ known, use z_{1−α/2} and σ. Table C.4 lists t_{1−α,ν}, though the notation in the text drops (α, ν).
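The soybean data again, now with σ treated as unknown (a small sketch; qt supplies the t quantile):

xbar <- 21.34; s <- 1.22; n <- 13
xbar + c(-1, 1)*qt(.975, n - 1)*s/sqrt(n)     ## about (20.60, 22.08)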

Confidence Intervals for µ: Summary

Slide 17. An agronomist measured stem diameter (mm) in 8 plants of a variety of wheat, and calculated x̄ = 2.275 and s = .2375. Assuming normality, a 95% CI for µ is given by 2.275 ± 2.365(.2375)/√8, or (2.076, 2.474), where t_{.975,7} = 2.365. If one further knows that σ = .25, then one can use 2.275 ± 1.96(.25)/√8, or (2.102, 2.448).

In the ideal situation with normality and known σ, always use X̄ ± z_{1−α/2} σ/√n. With a small normal sample but unknown σ, estimate σ by s and replace z_{1−α/2} by t_{1−α/2,n−1} to allow for the extra uncertainty. When n is large, the CLT grants normality of X̄, s estimates σ reliably, and z_{1−α/2} ≈ t_{1−α/2,n−1}.

Coverage versus Precision

Slide 18. To cover the truth more often, one needs a higher confidence coefficient, but at the expense of wider intervals. The interval (−∞, ∞) has 100% coverage but is useless; a point estimate is the most precise but always misses. Given sample size n, X̄ ± z_{1−α/2} σ/√n is the shortest interval estimate for µ among all those with confidence coefficient (1 − α)100%. To achieve both coverage and precision, one has to take a large enough sample.
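The wheat example in R (a sketch mirroring the two cases above):

xbar <- 2.275; s <- .2375; n <- 8
xbar + c(-1, 1)*qt(.975, n - 1)*s/sqrt(n)     ## sigma unknown: (2.076, 2.474)
xbar + c(-1, 1)*qnorm(.975)*.25/sqrt(n)       ## sigma = .25 known: (2.102, 2.448)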

Planning Sample Size

Slide 19. The agronomist is planning a new study of wheat stem diameter, and wants a 95% CI for µ no wider than .2 mm. From experience and a pilot study, he believes that σ = .25 is about right. The half-width of the CI is z_{.975} σ/√n = 1.96(.25)/√n. Solving 1.96(.25)/√n ≤ .1 for n, one gets n ≥ (1.96(.25)/.1)² = 24.01, about 24 plants.

In general, let h be the desired half-width for a (1 − α)100% CI. Solving z_{1−α/2} σ/√n ≤ h for n gives

n ≥ (z_{1−α/2} σ/h)².

One needs a conservative estimate of σ, and z_{1−α/2} ≈ t_{1−α/2,n−1} for large n. To cut the width by half, one needs to quadruple the sample size n.

CI for Population Proportion

Slide 20. Consider X_i ~ Bin(1, p), i = 1, ..., n, independent. One has X = Σ_i X_i ~ Bin(n, p). For n large, by the CLT,

P((X/n − p)/√(p(1 − p)/n) ≤ z) ≈ Φ(z).

The sample proportion p̂ = X/n is actually an X̄. As an estimate of σ_p̂ = √(p(1 − p)/n) one may use √(p̂(1 − p̂)/n). A (1 − α)100% CI for p is thus p̂ ± z_{1−α/2} √(p̂(1 − p̂)/n). Note that σ = √(p(1 − p)) ≤ 0.5.

Example: 123 adult female deer were captured and 97 were found to be pregnant. Construct a 95% CI for the pregnant proportion in the population. Since p̂ = 97/123 = .7886 and σ̂_p̂ = √(.7886(1 − .7886)/123) = .0368, the 95% CI is given by .7886 ± 1.96(.0368), or (.7165, .8607). For a 95% CI with half-width h ≤ 3%, it is safe to have n ≥ (1.96(0.5)/0.03)² = 1067.1.
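The deer example in R (a minimal sketch of the large-sample interval above, plus the sample-size bound):

phat <- 97/123; n <- 123
se <- sqrt(phat*(1 - phat)/n)           ## about .0368
phat + c(-1, 1)*qnorm(.975)*se          ## about (.7165, .8607)
(qnorm(.975)*0.5/0.03)^2                ## n needed for half-width <= 3%, about 1067.1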

Confidence Interval for σ²

Slide 21. For X_1, ..., X_n from a population with variance σ², the sample variance s² = Σ_i (X_i − X̄)²/(n − 1) is an unbiased estimate of σ².

With Z_i ~ N(0, 1), i = 1, ..., n, Σ_i (Z_i − Z̄)² follows a χ²-distribution with degrees of freedom (df) ν = n − 1. [Figure: χ² densities for df = 5, 25.] For X_i ~ N(µ, σ²), i = 1, ..., n, (n − 1)s²/σ² follows χ²_{n−1}, so

P(χ²_{.025,n−1} < (n − 1)s²/σ² < χ²_{.975,n−1}) = 0.95.

Solving for σ², a 95% CI is

(n − 1)s²/χ²_{.975,n−1} < σ² < (n − 1)s²/χ²_{.025,n−1}.

For n = 13 and s = .2375, χ²_{.025,12} = 4.4038 and χ²_{.975,12} = 23.337, so a 95% CI for σ² is 12(.2375)²/23.337 < σ² < 12(.2375)²/4.4038; taking square roots, 0.1703 < σ < 0.3920.

Simulations of Coverage, Robustness

Slide 22. Simulation can be used to check the coverage of the intervals above and their robustness when the population is not normal; the commented lines switch the population from U(0,1) back to N(0,1).

## generate data and set parameters
## n <- 10; x <- matrix(rnorm(10000*n),ncol=10000)
## mu <- 0; sig <- 1                       ## N(0,1)
n <- 30; x <- matrix(runif(10000*n),ncol=10000)
mu <- .5; sig <- sqrt(1/12)                ## U(0,1)
## calculate CIs and coverage
mn <- apply(x,2,mean); v <- apply(x,2,var)
hwd <- qnorm(.975)*sig/sqrt(n); lcl <- mn-hwd; ucl <- mn+hwd
mean((lcl<mu)&(ucl>mu))                    ## z-interval for mu
hwd <- qt(.975,n-1)*sqrt(v/n); lcl <- mn-hwd; ucl <- mn+hwd
mean((lcl<mu)&(ucl>mu))                    ## t-interval for mu
lcl <- sqrt(v*(n-1)/qchisq(.975,n-1))
ucl <- sqrt(v*(n-1)/qchisq(.025,n-1))
mean((lcl<sig)&(ucl>sig))                  ## chisq-interval for sig
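The χ² interval from Slide 21 can also be checked directly (a small sketch; qchisq supplies the quantiles):

n <- 13; s <- .2375
lcl <- sqrt((n - 1)*s^2/qchisq(.975, n - 1))   ## about .1703
ucl <- sqrt((n - 1)*s^2/qchisq(.025, n - 1))   ## about .3920
c(lcl, ucl)                                    ## 95% CI for sigma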