Sampling and sampling distribution


Sampling and sampling distribution September 12, 2017 STAT 101 Class 5 Slide 1

Outline of Topics
1. Sampling
2. Sampling distribution of a mean
3. Sampling distribution of a proportion

Statistical Inference
- Many economic and social decisions are based on figures from the entire population, e.g., how many homeless people are there? What is the household income?
- A census, in which every unit in the population is studied, is the gold standard but very costly
- Statisticians use a representative portion of the population, a sample (a "pseudo-population"), to solve these problems
- The method of using a sample to study a population is called statistical inference

Population and sample
Population: the set of all units of interest
- Finite: the population size $N$ is enumerable
- Infinite: $N$ is not finite (note that infinite $\neq$ continuous as in the definition of random variables)
Sample: any subset of a population. The sample size $n$ can be as small as one unit of the population
A finite population can be analysed as an infinite population if
(1) $N$ is very big,
(2) $n/N < 0.05$, or
(3) $N$ is small but sampling is carried out with replacement
We assume an infinite population or a finite population with (1), (2) or (3)

Parameters and statistics
- Every problem about a population can be characterised by some summaries called parameters, e.g.,
  - the proportion of homeless people
  - the mean income
- A statistic is the counterpart of a parameter calculated from a sample, e.g.,
  - the proportion of homeless people in the sample
  - the sample mean income
- Parameters are usually unknown whereas statistics are known
- Inferential statistics uses a statistic to infer about a parameter

Common population quantities and sample counterparts

Parameter (population)             Statistic (sample)
Probability distribution           Histogram
Mean, $\mu$                        Mean, $\bar{X}$
Variance, $\sigma^2$               Variance, $s^2$
Standard deviation, $\sigma$       Standard deviation, $s$
Proportion, $p$                    Proportion, $\hat{p}$

Simple random sample
- A simple random sample (SRS) is chosen in such a way that every member of the population has the same probability of being selected
- An SRS is a pseudo-population that mimics the true population, and hence we can use the SRS statistic(s) to answer questions about the unknown population parameter(s)
- We assume the members of our sample are drawn independently from the population, so that each unit in the sample contributes a separate piece of information about the parameter of interest
- There are other sampling schemes but we focus on SRS here
- Hereafter, we refer to an SRS of independent observations as a sample

Sampling from a population
Population: $X_1, X_2, \ldots, X_N$
$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$, $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
Sample: $X_1, X_2, \ldots, X_n$
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$, $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n}$
Intuition tells us $\bar{X}$ is similar to $\mu$ and $s^2$ is similar to $\sigma^2$

Sampling error
Example: sampling with replacement from a finite population

          Population                                        Sample
Units     $X_1, \ldots, X_7 = 1, 2, 3, 4, 5, 6, 7$          $X_1, \ldots, X_5 = 3, 6, 5, 1, 6$
Size      $N = 7$                                           $n = 5$
Mean      $\mu = \frac{X_1 + \ldots + X_N}{N} = \frac{1 + \ldots + 7}{7} = 4$      $\bar{X} = \frac{X_1 + \ldots + X_n}{n} = \frac{3 + 6 + 5 + 1 + 6}{5} = 4.2$

- $\bar{X} - \mu = 4.2 - 4 = 0.2$ is called a sampling error
- Every sample of size $n$ is subject to sampling error because only a subset of the population is used to infer about the whole
- In practice, $\mu$ is unknown, hence the sampling error is also unknown and it cannot be estimated
- In the sample, $X_1, \ldots, X_5$ are generic symbols for five units randomly selected with replacement from the population; they are not necessarily the first five units in the population
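The arithmetic in this example can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative and not from the slides, and the population and sample are the ones given in the example.

```python
from statistics import mean

# Population and sample taken from the example on this slide.
population = [1, 2, 3, 4, 5, 6, 7]
sample = [3, 6, 5, 1, 6]

mu = mean(population)         # population mean (unknown in practice)
x_bar = mean(sample)          # sample mean
sampling_error = x_bar - mu   # sample mean minus population mean

print("mu =", mu)
print("x_bar =", x_bar)
print("sampling error =", round(sampling_error, 10))
```

The sampling error can be computed here only because the whole population is listed; with a real population, `mu` would not be available.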

Sampling distribution
- Every SRS is randomly drawn from the population, hence $\bar{X}$ and its sampling error $\bar{X} - \mu$ are both random
- We cannot make definitive statements about anything random (cf. class 1 slide 10)
- We look for the probabilities of different values of $\bar{X}$, i.e., we want to find its distribution
- The distribution of $\bar{X}$ is sometimes called a sampling distribution
- The distribution of $\bar{X}$ also tells us the distribution of its sampling error $\bar{X} - \mu$, since $\mu$ is just a constant even though it is unknown
- The distribution of the sampling error tells us its likely values

Sampling distribution (2)
Population: 1, 2, 3, 4, 5, 6, 7

Sample    Units            $\bar{X}$    Sampling error
1         3, 6, 5, 1, 6    4.2          $4.2 - \mu$
2         1, 4, 6, 2, 2    3            $3 - \mu$
...
k         4, 5, 6, 1, 7    4.6          $4.6 - \mu$

Finding sampling distribution: Method 1
- Different SRSs give an empirical sampling distribution of $\bar{X}$
- Few samples have $\bar{X}$ near 1 or 7; such values only appear if the SRS gives nearly all 1s or all 7s, a rare event
- The highest frequencies of $\bar{X}$ are near the population mean $\mu = 4$; very often $\bar{X}$ is similar to $\mu$ and the sampling error is small
- The distributions of $\bar{X}$ and of the sampling error are identical except that the values are translated
- This method is not feasible because:
  1. Drawing many SRSs is time consuming
  2. Usually the entire population, such as 1, 2, 3, 4, 5, 6, 7, and hence $\mu$, are unknown
[Figure: histogram of the distribution of $\bar{X}$ and of the sampling error; frequency axis from 0 to 4000, $\bar{X}$ axis from 0 to 8 centred at $\mu$, matching sampling error axis from -4 to 4]
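On a computer, Method 1 can at least be imitated for the toy population 1, ..., 7. The sketch below is an illustration only: the function name, the number of samples and the seed are assumptions, not part of the slides. It draws many SRSs of size 5 with replacement and prints a crude text histogram of the sample means.

```python
import random
from collections import Counter

def empirical_sampling_distribution(population, n, num_samples, seed=0):
    """Draw num_samples SRSs of size n (with replacement); return their means."""
    rng = random.Random(seed)
    means = []
    for _ in range(num_samples):
        sample = rng.choices(population, k=n)  # sampling with replacement
        means.append(sum(sample) / n)
    return means

population = [1, 2, 3, 4, 5, 6, 7]
means = empirical_sampling_distribution(population, n=5, num_samples=10_000)

# Crude text histogram: sample means rounded to the nearest integer.
counts = Counter(round(m) for m in means)
for value in sorted(counts):
    print(value, "#" * (counts[value] // 100))
```

Subtracting the population mean from every entry of `means` gives the empirical distribution of the sampling error, which has the same shape shifted to be centred at 0.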

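Finding sampling distribution: Method 2
Possible sampling errors = (possible values of $\bar{X}$) $- \mu$
[Figure: axis of $\bar{X}$ centred at $\mu$, aligned with an axis of the sampling error centred at 0]
Assuming a sufficiently large sample of $n$ observations, the Central Limit Theorem (CLT, P.-S. Laplace, 1810, 1811) shows that, when $\bar{X}$ is used to estimate $\mu$ of any population, the sampling distribution of $\bar{X}$ (and of its sampling error) is approximately normal:
$\bar{X} \sim \text{Normal}(\mu, \operatorname{var}(\bar{X}))$ and $\bar{X} - \mu \sim \text{Normal}(0, \operatorname{var}(\bar{X} - \mu))$
Here $\bar{X}$ is the statistic, the normal distribution is its sampling distribution and $\operatorname{var}(\bar{X})$ is its sampling variation; likewise, the second normal distribution is the sampling distribution of the sampling error and $\operatorname{var}(\bar{X} - \mu)$ is its sampling variation.
We do not know exactly where our sampling error lies among the possible values (marked in red on the slide). However, using the empirical rule, we can be 95% certain that it lies within $0 \pm 2\sqrt{\operatorname{var}(\bar{X} - \mu)}$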
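The "95% within two standard deviations" statement can be checked by simulation on the toy population 1, ..., 7. This is a rough sketch under stated assumptions: the seed, the number of simulated samples and the variable names are illustrative, and the standard deviation of the sampling error is estimated from the simulated errors themselves. With $n = 5$ the normal approximation is only rough, so the share should be close to, but not exactly, 95%.

```python
import random
from statistics import pstdev

rng = random.Random(1)
population = [1, 2, 3, 4, 5, 6, 7]
mu = sum(population) / len(population)  # known here only because the population is listed
n = 5

# Simulated sampling errors (sample mean minus mu) from many SRSs drawn with replacement.
errors = [sum(rng.choices(population, k=n)) / n - mu for _ in range(20_000)]

sd_error = pstdev(errors)  # empirical standard deviation of the sampling error
within = sum(abs(e) <= 2 * sd_error for e in errors) / len(errors)

print(f"sd of sampling error: {sd_error:.3f}")
print(f"share within 2 sd:    {within:.3f}")
```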

Sampling variation

Sample    Units                          $\bar{X}$                        Sampling error
1         3, 6, 5, 1, 6                  4.2                              $4.2 - \mu$
2         1, 4, 6, 2, 2                  3                                $3 - \mu$
...
k         4, 5, 6, 1, 7                  4.6                              $4.6 - \mu$
Any       $X_1, X_2, X_3, X_4, X_5$      $\frac{X_1 + \ldots + X_5}{5}$   $\frac{X_1 + \ldots + X_5}{5} - \mu$

1. Sampling variation, $\operatorname{var}(\bar{X})$, tells us that if we wish to estimate $\mu$ using $\bar{X}$, different samples may give different estimates and different sampling errors
2. $\bar{X} = \frac{X_1 + \ldots + X_5}{5}$, so $\operatorname{var}(\bar{X})$ is due to $\operatorname{var}(X_1), \ldots, \operatorname{var}(X_5)$
3. Because $X_1, X_2, \ldots, X_5$ are randomly drawn from the population, they must have the same behaviour as any $X$ randomly drawn from the population, i.e., $\operatorname{var}(X_1) = \operatorname{var}(X_2) = \ldots = \operatorname{var}(X_5) = \operatorname{var}(X)$

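Sampling variation (2)
$\operatorname{var}(\bar{X}) = \operatorname{var}\left(\frac{X_1 + \ldots + X_n}{n}\right)$
$= \frac{1}{n^2} \operatorname{var}(X_1 + \ldots + X_n)$
$= \frac{1}{n^2} [\operatorname{var}(X_1) + \ldots + \operatorname{var}(X_n)]$ (since $X_1, \ldots, X_n$ are independent)
$= \frac{1}{n^2} \, n \operatorname{var}(X)$ (since $\operatorname{var}(X_1) = \ldots = \operatorname{var}(X_n) = \operatorname{var}(X)$)
$= \frac{\operatorname{var}(X)}{n}$ (depends on $\operatorname{var}(X)$ and $n$)
$\operatorname{var}(\text{sampling error}) = \operatorname{var}(\bar{X} - \mu) = \operatorname{var}(\bar{X})$ (since $\mu$ is a constant)
Sampling variation depends on
(1) $\operatorname{var}(X)$, how different the values of $X$ are in the population
(2) $n$, the sample size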
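The result $\operatorname{var}(\bar{X}) = \operatorname{var}(X)/n$ can be compared with simulation on the toy population 1, ..., 7. The sketch below is illustrative only: the helper name `simulated_var_of_mean`, the sample sizes and the seed are assumptions, not from the slides.

```python
import random
from statistics import pvariance

population = [1, 2, 3, 4, 5, 6, 7]
var_x = pvariance(population)  # population variance, var(X)

def simulated_var_of_mean(n, num_samples=20_000, seed=2):
    """Variance of the sample mean over many SRSs of size n drawn with replacement."""
    rng = random.Random(seed)
    means = [sum(rng.choices(population, k=n)) / n for _ in range(num_samples)]
    return pvariance(means)

for n in (5, 20, 80):
    simulated = simulated_var_of_mean(n)
    print(f"n={n:3d}  simulated={simulated:.4f}  var(X)/n={var_x / n:.4f}")
```

The two columns should agree closely, and both shrink as $n$ grows, which is point (2) above.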

Why does sampling variation matter?
[Figure: two sampling distributions of the sampling error, each centred at 0]
- Large sampling variation: our sampling error is somewhere among widely spread possible values and so may be large
- Small sampling variation: our sampling error is somewhere among tightly clustered possible values and so is unlikely to be large

What is a proportion?
Example: We wish to estimate the proportion, $p$, of homeless people in a population of $N$ individuals. Let $X$ indicate whether someone is homeless:
$X = 1$ if homeless, $X = 0$ if not homeless
Suppose the values of $X$ in the population are $X_1 = 1$ (homeless), $X_2 = 0$ (not homeless), $X_3 = 0, \ldots, X_N = 1$, which is a collection of 1's and 0's. Then
$p = \frac{\#1\text{'s}}{N} = \frac{1 + 0 + 0 + \ldots + 1}{N} = \frac{X_1 + X_2 + X_3 + \ldots + X_N}{N} = \mu$
Hence a proportion is a special case of $\mu$ with only 1's and 0's

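Sampling to estimate a proportion
Example (cont'd): We take a sample $X_1, \ldots, X_n$ and estimate $p \equiv \mu$ using
$\bar{X} \equiv \hat{p} = \frac{X_1 + \ldots + X_n}{n}$
Each of $X_1, \ldots, X_n$ is 1 with probability $p$ and 0 with probability $1 - p$
We use the CLT for $\bar{X}$, i.e., $\bar{X} \sim N\left(\mu, \frac{\operatorname{var}(X)}{n}\right)$, where $\frac{\operatorname{var}(X)}{n} = \operatorname{var}(\bar{X})$
$\operatorname{var}(X) = E(X^2) - E(X)^2 = [(1)^2 p + (0)^2 (1 - p)] - \mu^2 = p - p^2 = p(1 - p)$, using $E(X)^2 = \mu^2 = p^2$
Hence the CLT for $\hat{p}$ is $\hat{p} \sim N\left(p, \frac{p(1 - p)}{n}\right)$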
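The CLT for $\hat{p}$ can be illustrated by simulating many samples of 0/1 values. The sketch below is an illustration only: the values $p = 0.1$ and $n = 200$, the seed and the function name are hypothetical choices, not from the slides.

```python
import random

def simulate_p_hat(p, n, num_samples, seed=3):
    """Return num_samples sample proportions, each from n independent 0/1 draws."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(num_samples)]

p, n = 0.1, 200  # hypothetical population proportion and sample size
p_hats = simulate_p_hat(p, n, num_samples=5_000)

mean_p_hat = sum(p_hats) / len(p_hats)
var_p_hat = sum((x - mean_p_hat) ** 2 for x in p_hats) / len(p_hats)

print(f"mean of p_hat:     {mean_p_hat:.4f}  (p = {p})")
print(f"variance of p_hat: {var_p_hat:.6f}  (p(1-p)/n = {p * (1 - p) / n:.6f})")
```

The simulated mean and variance of `p_hats` should be close to $p$ and $p(1-p)/n$ respectively, matching the two parameters of the normal approximation.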