Unit 5: Sampling Distributions of Statistics


Statistics 571: Statistical Methods
Ramón V. León
6/12/2004 Unit 5 - Stat 571 - Ramon V. Leon

Definitions and Key Concepts
A sample statistic used to estimate an unknown population parameter is called an estimate.
The discrepancy between the estimate and the true parameter value is known as sampling error.
Sampling error is due to sampling variation.

Frequentist Approach to Statistics
The frequentist approach assesses the accuracy of a sample estimate by considering how the estimate would vary around the true parameter value if repeated random samples were drawn from the same population.
A statistic is a random variable with a probability distribution, called the sampling distribution, which is generated by repeated sampling.
We use the sampling distribution of a statistic to assess the sampling error of an estimate.

Sample Mean
A random sample is a set of independent, identically distributed (i.i.d.) observations $X_1, X_2, \ldots, X_n$ (when sampling from a large population or with replacement).
Assume that the population has mean $\mu = E(X_i)$ and variance $\sigma^2 = \mathrm{Var}(X_i)$.
How does the sample mean
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
vary on repeated random samples of size $n$? This is called the sampling distribution of the sample mean.

Mean and Variance of a Die Toss

Simulating a Die Toss in JMP
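The content of these two slides did not survive transcription. As a minimal sketch, assuming a fair six-sided die, the mean and variance of a single toss can be computed exactly from the probability mass function. The function name `die_mean_variance` is illustrative and does not come from the course materials.

```python
from fractions import Fraction

def die_mean_variance(faces=6):
    """Exact mean and variance of one toss of a fair die (fairness is assumed)."""
    p = Fraction(1, faces)                            # each face is equally likely
    values = range(1, faces + 1)
    mean = sum(p * x for x in values)                 # E(X)
    var = sum(p * (x - mean) ** 2 for x in values)    # Var(X) = E[(X - mu)^2]
    return mean, var

mean, var = die_mean_variance()
print(mean, var)  # 7/2 35/12
```

So a single toss has $\mu = 3.5$ and $\sigma^2 = 35/12 \approx 2.92$.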

Rolling Two Dice
Each of these 36 outcomes is equally likely, i.e., each one occurs with probability 1/36.

Rolling Two Dice: Sampling Distribution
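The table behind the sampling-distribution slide is not in the transcription. A sketch of how it could be rebuilt, assuming two fair dice: enumerate the 36 equally likely outcomes and tabulate the sample mean of each pair. The helper name is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def two_dice_mean_distribution():
    """Exact sampling distribution of the mean of two fair dice."""
    counts = Counter(Fraction(a + b, 2) for a, b in product(range(1, 7), repeat=2))
    return {m: Fraction(c, 36) for m, c in sorted(counts.items())}

dist = two_dice_mean_distribution()
for m, p in dist.items():
    print(float(m), p)          # value of the sample mean and its probability

mean = sum(m * p for m, p in dist.items())
var = sum((m - mean) ** 2 * p for m, p in dist.items())
print(mean, var)  # 7/2 35/24
```

The sample mean keeps the population mean 7/2, while its variance is half the single-toss variance, in line with $\mathrm{Var}(\bar{X}) = \sigma^2/n$ for $n = 2$.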

Homework To be Done Right Away
Use the Sampling Distribution simulation Java applet at the Rice Virtual Lab in Statistics to do the following. In each case, draw 10,000 random samples and construct the histograms listed.
1. Size N=5 from the normal distribution provided: histograms of the sampling distributions of the sample mean and the sample variance.
2. Size N=20 from the normal distribution provided: sample mean and sample variance.
3. Size N=5 from a uniform distribution on [0,32]: sample mean and sample variance.
4. Size N=20 from a uniform distribution on [0,32]: sample mean and sample variance.
5. Size N=5 from the skewed distribution provided: sample mean, sample variance, and sample median.
6. Size N=20 from the skewed distribution provided: sample mean, sample variance, and sample median.
Turn in this output with the rest of the homework for Unit 5.

Distribution of Sample Means
If the i.i.d. r.v.'s are Bernoulli, Normal, or Exponential, the distribution of the sample mean can be calculated exactly.
However, in general the exact distribution of the sample mean is difficult to calculate.
What can be said about the distribution of the sample mean when the sample is drawn from an arbitrary population?
In many cases we can approximate the distribution of the sample mean by a normal distribution when n is large. This result is called the Central Limit Theorem.
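For readers without access to the applet, the repeated-sampling idea can be sketched with the Python standard library. This is not the applet's code. It is a minimal simulation for the uniform [0,32] case, with hypothetical function names and an arbitrary seed, comparing the spread of the simulated sample means with $\sigma/\sqrt{n}$.

```python
import random
import statistics

def simulate_sample_means(draw, n, reps=10_000, seed=0):
    """Return `reps` sample means, each computed from an i.i.d. sample of size n."""
    rng = random.Random(seed)
    return [sum(draw(rng) for _ in range(n)) / n for _ in range(reps)]

def uniform_0_32(rng):
    """One draw from the uniform distribution on [0, 32]."""
    return rng.uniform(0, 32)

sigma = 32 / 12 ** 0.5    # standard deviation of Uniform[0, 32]
for n in (5, 20):
    means = simulate_sample_means(uniform_0_32, n)
    # columns: n, average of the sample means, their sd, theoretical sigma / sqrt(n)
    print(n,
          round(statistics.fmean(means), 2),
          round(statistics.stdev(means), 2),
          round(sigma / n ** 0.5, 2))
```

A histogram of `means` (with any plotting tool) plays the role of the applet's sampling-distribution panel. Replacing `uniform_0_32` with another draw function covers the other homework cases.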

Central Limit Theorem
Let $X_1, X_2, \ldots, X_n$ be a random sample drawn from an arbitrary distribution with a finite mean $\mu$ and variance $\sigma^2$. Then if $n$ is sufficiently large,
$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \approx N(0,1).$$
Sometimes the theorem is given in terms of the sums:
$$\frac{\sum_{i=1}^{n} X_i - n\mu}{\sigma\sqrt{n}} \approx N(0,1).$$

Central Limit Theorem Illustration

Screen Shots of the Output of the Sampling Distribution Simulation Java Applet
$$\frac{\sigma}{\sqrt{n}} = \frac{6.22}{\sqrt{5}} = 2.78 \approx 2.81$$

Central Limit Theorem and Law of Large Numbers
Both are asymptotic results about the sample mean.
The Law of Large Numbers says that as $n$ goes to infinity the sample mean converges to the population mean, i.e., $\bar{X} - \mu$ converges to 0 as $n \to \infty$.
The CLT says that $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ converges to $N(0,1)$ as $n \to \infty$.
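The contrast between the two results can be seen in a small simulation. The sketch below assumes an Exponential(1) population (mean 1, standard deviation 1), chosen only because it is skewed. The function name, seed, and repetition count are illustrative.

```python
import random
import statistics

def lln_clt_summary(n, reps=2_000, seed=1):
    """Spread of (xbar - mu) and of the standardized mean over repeated samples."""
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0          # Exponential(1): mean 1, sd 1 (assumed population)
    deviations, z_scores = [], []
    for _ in range(reps):
        xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        deviations.append(xbar - mu)                        # LLN: shrinks toward 0
        z_scores.append((xbar - mu) / (sigma / n ** 0.5))   # CLT: stays near N(0,1)
    return statistics.pstdev(deviations), statistics.pstdev(z_scores)

for n in (10, 100, 1000):
    spread_dev, spread_z = lln_clt_summary(n)
    print(n, round(spread_dev, 3), round(spread_z, 3))
```

As $n$ grows, the spread of $\bar{X} - \mu$ shrinks like $1/\sqrt{n}$, while the spread of the standardized mean stays close to 1.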

Normal Approximation to the Binomial
A binomial r.v. is the sum of i.i.d. Bernoulli r.v.'s, so the CLT can be used to approximate its distribution.
Suppose that $Z$ is Bernoulli. Then the mean of $Z$ is $p$ and its variance is $p(1-p)$.
By the CLT we have for the Binomial$(n, p)$ r.v. $X = \sum_{i=1}^{n} Z_i$:
$$\frac{X - np}{\sqrt{np(1-p)}} = \frac{\sum_{i=1}^{n} Z_i - np}{\sqrt{np(1-p)}} = \frac{\sum_{i=1}^{n} Z_i - nE(Z)}{\sqrt{\mathrm{Var}(Z)}\,\sqrt{n}} \approx N(0,1).$$
How large a sample, $n$, do we need for the approximation to be good?
Rule of Thumb: $np \ge 10$ and $n(1-p) \ge 10$.

CLT Approximation to the Binomial When p is Close to 0.5
For a good approximation, $np = n(1-p) = 0.5n$ should be at least 10, so $n$ should be at least 20.

CLT Approximation to the Binomial When p is Not Close to 0.5
With $p = 0.1$, $np = 0.1n$ should be at least 10, so $n$ should be at least 100.
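The arithmetic on these two slides follows from the rule of thumb $np \ge 10$ and $n(1-p) \ge 10$, which together require $n \ge 10/\min(p, 1-p)$. A small sketch of that calculation, with a hypothetical helper name and exact fractions to avoid floating-point rounding:

```python
import math
from fractions import Fraction

def min_n_for_normal_approx(p, threshold=10):
    """Smallest n with n*p >= threshold and n*(1-p) >= threshold."""
    p = Fraction(p)                       # accepts strings such as "0.1"
    return math.ceil(threshold / min(p, 1 - p))

print(min_n_for_normal_approx("0.5"))  # 20
print(min_n_for_normal_approx("0.1"))  # 100
```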

Continuity Correction
$$P(X \le 8) \approx \Phi\left(\frac{8.5 - np}{\sqrt{np(1-p)}}\right)$$
Similarly:
$$P(X \ge 8) \approx 1 - \Phi\left(\frac{7.5 - np}{\sqrt{np(1-p)}}\right)$$

Screen Shots of the Output of the Java Applet Normal Approximation to the Binomial Distribution
Homework: See the Homework Log.
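To see what the continuity correction buys, the exact binomial probability can be compared with the normal approximation with and without the half-unit shift. The values $n = 20$ and $p = 0.5$ below are illustrative choices, not taken from the slides. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k + 1))

def normal_approx_cdf(k, n, p, continuity=True):
    """CLT approximation to P(X <= k), optionally with the continuity correction."""
    mu, sd = n * p, math.sqrt(n * p * (1 - p))
    x = k + 0.5 if continuity else k      # shift by half a unit when correcting
    return NormalDist().cdf((x - mu) / sd)

n, p, k = 20, 0.5, 8                      # assumed example values
exact = binom_cdf(k, n, p)
with_cc = normal_approx_cdf(k, n, p)
without_cc = normal_approx_cdf(k, n, p, continuity=False)
print(round(exact, 4), round(with_cc, 4), round(without_cc, 4))
```

In this example the corrected approximation lands much closer to the exact probability than the uncorrected one.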

Why the Normal Approximation to the Binomial Distribution Works in Pictures
The green area is approximately the same as the red area.

Java Applet for N=100 and p=.1

Example: CLT Approximation to the Binomial

Rolling Two Dice
Each of these 36 outcomes is equally likely, i.e., each one occurs with probability 1/36.
Now we pay attention to the sample variance.
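The same enumeration used for the sample mean can be applied to the sample variance. The sketch below assumes two fair dice and the usual $n - 1$ divisor, so with $n = 2$ the sample variance is just the sum of squared deviations. The helper name is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def two_dice_variance_distribution():
    """Exact sampling distribution of S^2 for a sample of two fair dice."""
    counts = Counter()
    for a, b in product(range(1, 7), repeat=2):
        xbar = Fraction(a + b, 2)
        s2 = (a - xbar) ** 2 + (b - xbar) ** 2   # divisor is n - 1 = 1
        counts[s2] += 1
    return {s2: Fraction(c, 36) for s2, c in sorted(counts.items())}

s2_dist = two_dice_variance_distribution()
for s2, p in s2_dist.items():
    print(float(s2), p)

expected_s2 = sum(s2 * p for s2, p in s2_dist.items())
print(expected_s2)  # 35/12
```

The expected value of $S^2$ equals the single-toss variance $35/12$, and the distribution is visibly right-skewed, with most of its mass on small values.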

Sampling Distribution of the Sample Variance: Two Dice Example

Chi-Square Distribution

Using JMP to Simulate a Chi-Square Random Sample with 5 d.f.
The number of rows is the size of the random sample.
See the JMP tutorial "Chi-Square Simulation" on the course home page.

Sample of 1000 Random Chi-Square Random Variables
Notice the right skewness.

Fitted Chi-Square Based on the Sample

Chi-Square Density Function Curves
Notice how similar this density function is to the histogram on the previous page.

Critical Values for the Chi-Square
See the JMP tutorial "Tabled Values of Common Distributions".

Distribution of Sample Variance
Assuming that the random sample comes from a normal distribution.

Application of the Distribution of Sample Variance: Measurement Precision
Introduction to the ideas of hypothesis testing.

Application of the Distribution of Sample Variance: Measurement Precision
Upper-tail area 0.05: $\chi^2_{9,0.05} = 16.92$
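The worked example on these slides is not in the transcription, but the critical value can be checked by simulation. The sketch relies on the standard result that $(n-1)S^2/\sigma^2$ has a chi-square distribution with $n - 1$ d.f. for normal samples, and it assumes $n = 10$ to match the 9 d.f. in the critical value. Function names, seed, and repetition count are illustrative.

```python
import random

def scaled_sample_variances(n, sigma=1.0, reps=20_000, seed=2):
    """Simulate (n - 1) * S^2 / sigma^2 for normal samples of size n."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)   # sample variance
        out.append((n - 1) * s2 / sigma ** 2)
    return out

values = scaled_sample_variances(n=10)        # n = 10 is an assumption (9 d.f.)
mean_value = sum(values) / len(values)        # chi-square mean equals its d.f.
tail = sum(v > 16.92 for v in values) / len(values)
print(round(mean_value, 2), round(tail, 3))
```

The simulated mean should be near 9 and the fraction of values above 16.92 near 0.05.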

Student's t-distribution
Consider a random sample $X_1, X_2, \ldots, X_n$ drawn from a $N(\mu, \sigma^2)$ distribution.
It is known that $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ is exactly distributed as $N(0,1)$ for any $n$.
But $T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$ is no longer distributed as $N(0,1)$.
The distribution of $T$ is named Student's t-distribution. (There is a different distribution for each number $\nu = n - 1$ of degrees of freedom.)
Play with the Java applet "Student's t Distribution".

t-distribution Table
See the JMP tutorial "Tabled Values of Common Distributions".

Application of the t-distribution Calculation: Process Control

Example: t-distribution Calculation
$t_{\nu,0.005} = 3.250$
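The degrees of freedom for this example are not legible in the transcription. The value 3.250 agrees with the standard tabled upper 0.005 critical value for 9 d.f., so the sketch below assumes $n = 10$. It simulates $T = (\bar{X} - \mu)/(S/\sqrt{n})$ from normal samples and compares its upper tail beyond 3.250 with the standard normal tail. All names and settings are illustrative.

```python
import random
from statistics import NormalDist

def simulate_t(n, reps=40_000, seed=3, mu=0.0, sigma=1.0):
    """Simulate T = (xbar - mu) / (S / sqrt(n)) for normal samples of size n."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        x = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(x) / n
        s = (sum((xi - xbar) ** 2 for xi in x) / (n - 1)) ** 0.5   # sample sd
        out.append((xbar - mu) / (s / n ** 0.5))
    return out

t_values = simulate_t(n=10)                       # n = 10 is an assumption (9 d.f.)
tail_t = sum(t > 3.250 for t in t_values) / len(t_values)
tail_z = 1 - NormalDist().cdf(3.250)              # N(0,1) tail beyond the same point
print(round(tail_t, 4), round(tail_z, 4))
```

The simulated tail for $T$ should be close to 0.005, several times larger than the normal tail. This is the heavier-tail behavior that makes $T$ "no longer $N(0,1)$".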

F-Distribution
Consider two independent random samples, $X_1, X_2, \ldots, X_{n_1}$ from an $N(\mu_1, \sigma_1^2)$ distribution and $Y_1, Y_2, \ldots, Y_{n_2}$ from an $N(\mu_2, \sigma_2^2)$ distribution. Then
$$\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}$$
has an F distribution with $\nu_1 = n_1 - 1$ d.f. in the numerator and $\nu_2 = n_2 - 1$ d.f. in the denominator.

F-Distribution Table
See the JMP tutorial "Tabled Values of Common Distributions".
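A simulation sketch of the ratio above, under assumed sample sizes and parameters ($n_1 = 6$, $n_2 = 11$, different means and variances in the two populations). It checks the simulated average of the ratio against the known mean of an F distribution, $\nu_2/(\nu_2 - 2)$ for $\nu_2 > 2$. Function names, seed, and parameter values are hypothetical.

```python
import random

def sample_variance(xs):
    """Sample variance with the n - 1 divisor."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def simulate_f_ratio(n1, n2, sigma1, sigma2, reps=20_000, seed=4):
    """Simulate (S1^2 / sigma1^2) / (S2^2 / sigma2^2) for two independent normal samples."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(reps):
        x = [rng.gauss(0.0, sigma1) for _ in range(n1)]
        y = [rng.gauss(5.0, sigma2) for _ in range(n2)]    # a different mean is fine
        ratios.append((sample_variance(x) / sigma1 ** 2) /
                      (sample_variance(y) / sigma2 ** 2))
    return ratios

ratios = simulate_f_ratio(n1=6, n2=11, sigma1=2.0, sigma2=0.5)
nu2 = 11 - 1
sim_mean = sum(ratios) / len(ratios)
print(round(sim_mean, 2), round(nu2 / (nu2 - 2), 2))   # simulated vs. theoretical mean
```

Because each sample variance is divided by its own population variance, the result does not depend on the particular values of $\mu_1$, $\mu_2$, $\sigma_1$, or $\sigma_2$ chosen here.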