Statistics 251: Statistical Methods Sampling Distributions Module


Sampling Distributions: Module 7 (2018)

Three Types of Distributions
- data distribution: the distribution of a variable in a sample
- population distribution: the probability distribution of a single observation of a variable
- sampling distribution: the probability distribution of a statistic

Terms I
Sampling distribution: the probability distribution of a statistic. It describes the values the statistic takes over all possible random samples of the same size n from a population, and how often each value occurs in repeated sampling. Given simple random samples of size n from a population, with a measured characteristic such as the mean ($\bar{x}$), the proportion ($\hat{p}$), or the standard deviation ($s$) computed for each sample, the probability distribution of all those measured characteristics is called a sampling distribution. Using a statistic to estimate a parameter is the main function of inferential statistics, and the sampling distribution supplies the properties of that statistic (its center, spread, and shape).

Terms II
Law of large numbers: as the number of repetitions of an experiment increases, the relative frequency of an outcome tends to get ever closer to its theoretical probability. Individual outcomes follow no set pattern or order, but the long-run observed relative frequency approaches the theoretical probability.

Central Limit Theorem (CLT) Definition
The sampling distribution of the sample mean $\bar{X}$ is approximately normal with mean $\mu_{\bar{X}} = \mu$ and standard deviation (the standard error of the sample mean) $se = \sigma_{\bar{X}} = \sigma/\sqrt{n}$, provided n is sufficiently large.

Sampling distribution of the Sample Mean
If we take n observations of a quantitative variable and compute the mean $\bar{x}$ of those observations, then $\bar{x}$ is the sample mean statistic.
Assumptions: each observation has the same probability distribution, with mean $\mu$ and standard deviation $\sigma$, and the observations are independent.
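The law of large numbers from Terms II can be seen in a short simulation. The sketch below is only an illustration under assumed settings (a fair coin with p = 0.5, 100,000 tosses, and an arbitrary seed of 251, none of which come from the notes); it tracks the running relative frequency of heads and prints how far it is from 0.5 at a few checkpoints.

```r
# Law of large numbers: running relative frequency of heads in fair-coin tosses.
# Assumed settings for illustration only: p = 0.5, 100000 tosses, seed 251.
set.seed(251)
p <- 0.5
tosses <- rbinom(100000, size = 1, prob = p)      # 1 = head, 0 = tail
running_freq <- cumsum(tosses) / seq_along(tosses)

# Distance from the theoretical probability after 10, 100, ..., 100000 tosses
checkpoints <- c(10, 100, 1000, 10000, 100000)
print(data.frame(tosses    = checkpoints,
                 rel_freq  = running_freq[checkpoints],
                 abs_error = abs(running_freq[checkpoints] - p)))
```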

Properties of the Sampling Distribution of $\bar{x}$
(1) The mean of the sampling distribution is $\mu$.
(2) The standard deviation of the sampling distribution is $se = \sigma/\sqrt{n}$.
(3) The shape of the sampling distribution becomes more like a normal distribution as n increases.

Sampling distribution of the Sample Mean
$\bar{X} \sim N(\mu, se_{mean})$
Standard error of the mean: $\sigma_{\bar{X}} = se_{mean} = \sigma/\sqrt{n}$
$z = \dfrac{\bar{X} - \mu}{se_{mean}}$
Sample sizes should be $n \ge 30$ for the sample mean. If the population distribution is itself normal, $\bar{X}$ is normal for any n, and the sample-size requirement can be ignored.

Sampling distribution of the Sample Proportion ($\hat{p}$)
If we make n observations and count the number of observations in which an outcome happens (call this x), then $\hat{p} = x/n$ is the sample proportion statistic.
Assumptions: x has a binomial distribution, where n is the number of trials and p is the probability of the outcome on each trial.

Properties of the Sampling Distribution of $\hat{p}$
(1) The mean of the sampling distribution is p.
(2) The standard deviation of the sampling distribution is $\sqrt{pq/n}$, where $q = 1 - p$.
(3) The shape of the sampling distribution becomes more like a normal distribution as n increases.

Sampling distribution of $\hat{p}$
$\hat{p} \sim N(p, se_{\hat{p}})$
Standard error of the proportion: $\sigma_{\hat{p}} = se_{\hat{p}} = \sqrt{pq/n}$
$z = \dfrac{\hat{p} - p}{se_{\hat{p}}}$
Sample sizes should be $n \ge 60$ for the sample proportion.
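The standard-error and z-score formulas above translate directly into R. The helper names se_mean(), se_prop(), z_mean() and z_prop() are hypothetical, chosen for this sketch; they are not base R functions.

```r
# Standard errors and z-scores for the sample mean and the sample proportion.
# Hypothetical helper functions (not part of base R).
se_mean <- function(sigma, n) sigma / sqrt(n)          # sigma / sqrt(n)
se_prop <- function(p, n) sqrt(p * (1 - p) / n)        # sqrt(pq / n), q = 1 - p

z_mean <- function(xbar, mu, sigma, n) (xbar - mu) / se_mean(sigma, n)
z_prop <- function(phat, p, n) (phat - p) / se_prop(p, n)

# Example: sigma = 10 and n = 25 give se = 2, so xbar = 104 lies
# 2 standard errors above mu = 100
se_mean(10, 25)            # 2
z_mean(104, 100, 10, 25)   # 2
se_prop(0.1, 60)           # about 0.0387
```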

Sampling distribution of the Sample Sum (Total)
If we take n observations of a quantitative variable and compute the total (sum) of those observations, $\hat{\tau} = n\bar{x}$, then $\hat{\tau}$ is the sample total statistic.
Assumptions: each observation has the same probability distribution, with mean $\mu$ and standard deviation $\sigma$, and the observations are independent. (The total then has mean $n\mu$ and standard deviation $\sqrt{n}\,\sigma$.)

Properties of the Sampling Distribution of $\hat{\tau}$
(1) The mean of the sampling distribution is $\tau = n\mu$.
(2) The standard deviation of the sampling distribution (the standard error of the total) is $se_{sum} = \sqrt{n}\,\sigma$.
(3) The shape of the sampling distribution becomes more like a normal distribution as n increases.

Sampling distribution of the Sample Sum (Total)
$\hat{\tau} = n\bar{x}$, $\tau = n\mu$, $se_{sum} = \sqrt{n}\,\sigma$
$\hat{\tau} \sim N(\tau, se_{sum})$
$z = \dfrac{n\bar{x} - n\mu}{se_{sum}} = \dfrac{\hat{\tau} - \tau}{se_{sum}}$

Simulation Examples to Show CLT
To show how the CLT works, we simulate from three different distributions: normal, exponential, and binomial. For each, we first look at a single random sample of 500 observations, then at the distribution of 500 sample means. Whatever the shape of the original distribution, the distribution of the sample mean is approximately normal once the sample size is large enough.

CLT with normal
Simulation of a random sample of 500 observations from a normal distribution with mean 100 and standard deviation 10.
rnorm() randomly generates values from the normal distribution in R:
    rnorm(n, mean = , sd = )
- n: number of observations
- mean: mean to use for the random sample
- sd: standard deviation for the random sample

CLT with normal: sample with n = 500
    x = rnorm(500, mean = 100, sd = 10); cbind(mean(x), sd(x))
         [,1]    [,2]
    [1,] 100.653 10.1904
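The properties of the sample total listed earlier (mean $\tau = n\mu$, standard error $\sqrt{n}\,\sigma$) can be checked with the same rnorm() generator. This is a sketch under assumed settings (normal population with mean 100 and standard deviation 10, samples of size n = 5, 10,000 repetitions, seed 251); the simulated mean and standard deviation of the totals should land close to 500 and about 22.36.

```r
# Simulation check of the sample-total formulas: tau = n * mu, se_sum = sqrt(n) * sigma.
# Assumed settings: normal population (mu = 100, sigma = 10), n = 5, 10000 samples.
set.seed(251)
mu <- 100; sigma <- 10; n <- 5
totals <- replicate(10000, sum(rnorm(n, mean = mu, sd = sigma)))

c(sim_mean = mean(totals), tau = n * mu)              # both near 500
c(sim_sd = sd(totals), se_sum = sqrt(n) * sigma)      # both near 22.36
```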

    hist(x, prob = TRUE, main = 'random sample 1')
[Figure: histogram "Random sample 1" of x, density scale, x from 70 to 130]

CLT with normal: a second sample
    x = rnorm(500, mean = 100, sd = 10); mean(x)
    [1] 100.021
    hist(x, prob = TRUE, ylim = c(0, 0.04), main = 'random sample 2')
    curve(dnorm(x, mean = 100, sd = 10), 70, 130, add = TRUE, lwd = 2, col = "red")
[Figure: histogram "Random sample 2" of x with the normal density (mean 100, sd 10) overlaid in red]

CLT simulation with normal
Simulation process:
- Set the mean, standard deviation, and sample size.

- Create an empty vector to hold the sample means from the for-loop.
- The for-loop calculates the sample means from 500 simulated samples of size 5 (each sample has 5 observations; simulating 500 samples of size 5 gives 500 sample means).

CLT simulation with normal
    mu = 100; sigma = 10; n = 5; xbar = rep(0, 500)
    for (i in 1:500) {
      xbar[i] = mean(rnorm(n, mean = mu, sd = sigma))
    }
    hist(xbar, prob = TRUE, breaks = 12, xlim = c(70, 130), ylim = c(0, 0.1))
[Figure: "Histogram of xbar", density scale, xbar from 70 to 130]

Exponential Distribution
We will look at a random sample of 500 observations from an exponential distribution with rate 1. The exponential distribution models (describes) the time between events in a Poisson process, i.e. a process in which events occur continuously and independently at a constant average rate $\lambda$:
$f(x) = \lambda e^{-\lambda x}, \; x \ge 0$, with $E[X] = 1/\lambda$ and $V[X] = 1/\lambda^2$.

CLT with exponential
rexp() randomly generates values from the exponential distribution in R:
    rexp(n, rate = )
- n: number of observations
- rate: the rate, where rate = 1/mean (default = 1)
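The for-loop above can also be written with R's replicate(), which repeats an expression and collects the results. The sketch below assumes a seed (251) purely for reproducibility and overlays the normal density with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$ that the CLT predicts for $\bar{x}$.

```r
# The for-loop from the notes, rewritten with replicate(); same mu, sigma, n.
set.seed(251)   # assumed seed, for reproducibility only
mu <- 100; sigma <- 10; n <- 5
xbar <- replicate(500, mean(rnorm(n, mean = mu, sd = sigma)))

# Compare the simulated sampling distribution with the CLT values
c(mean_xbar = mean(xbar), mu = mu)                  # both near 100
c(sd_xbar = sd(xbar), se_mean = sigma / sqrt(n))    # both near 4.47

# Histogram of the 500 sample means with the CLT normal curve on top
hist(xbar, prob = TRUE, breaks = 12, xlim = c(70, 130))
curve(dnorm(x, mean = mu, sd = sigma / sqrt(n)), add = TRUE, lwd = 2, col = "red")
```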

CLT with exponential: sample with n = 500
    x = rexp(500); mean(x)
    [1] 1.00412
    hist(x, prob = TRUE, main = 'random sample 1')
[Figure: histogram "Random sample 1" of x, density scale, x from 0 to 8]

CLT with exponential: a second sample
    x = rexp(500); mean(x)
    [1] 0.989368
    hist(x, prob = TRUE, main = 'random sample 2')
    curve(dexp(x), add = TRUE, lwd = 2, col = "red")
[Figure: histogram "Random sample 2" of x with the exponential (rate 1) density overlaid in red]
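The rexp() argument list above says rate = 1/mean. A quick check under assumed settings (rate 2, 100,000 draws, seed 251) should give a sample mean near $1/\lambda = 0.5$ and a sample variance near $1/\lambda^2 = 0.25$.

```r
# Check E[X] = 1/lambda and V[X] = 1/lambda^2 for the exponential distribution.
# Assumed settings: rate lambda = 2, 100000 draws, seed 251.
set.seed(251)
lambda <- 2
x <- rexp(100000, rate = lambda)

c(sim_mean = mean(x), theory = 1 / lambda)      # both near 0.5
c(sim_var = var(x), theory = 1 / lambda^2)      # both near 0.25
```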

CLT simulation with exponential
Simulation process:
- Set the mean, standard deviation, and sample size.
- Create an empty vector to hold the sample means from the for-loop.
- The for-loop calculates the sample means from 500 simulated samples of size 30 (each sample has 30 observations; simulating 500 samples of size 30 gives 500 sample means).

CLT simulation with exponential
    mu = 1; sigma = 1; n = 30; xbar = rep(0, 500)  # mean and sd for rate 1; rexp() below uses the default rate = 1
    for (i in 1:500) {
      xbar[i] = mean(rexp(n))
    }
    hist(xbar, prob = TRUE, breaks = 12)
[Figure: "Histogram of xbar", density scale, xbar from about 0.6 to 1.4]

Binomial Distribution
The binomial distribution with parameters n and p is the discrete probability distribution of the number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability p. We will look at a random sample of 500 observations from a binomial distribution with n = 10 and p = 0.8 (so q = 1 - p = 1 - 0.8 = 0.2):
$P(X = x) = \binom{n}{x} p^x q^{n-x}$, with $E[X] = np$ and $V[X] = npq$.

CLT with binomial
rbinom() randomly generates values from the binomial distribution in R:
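One way to quantify what the exponential histograms show is skewness. The exponential distribution has skewness 2, and the mean of n independent draws has skewness $2/\sqrt{n}$, about 0.37 for n = 30. The skew() helper below is hypothetical (base R has no skewness function), and the seed and the 5,000 repetitions are assumed settings for this sketch.

```r
# CLT in terms of shape: individual exponential draws are strongly right-skewed,
# while means of n = 30 draws are much closer to symmetric (normal).
# skew() is a hypothetical helper: sample skewness, third central moment / sd^3.
skew <- function(v) mean((v - mean(v))^3) / sd(v)^3

set.seed(251)                                # assumed seed
n <- 30
raw  <- rexp(5000)                           # individual observations, rate = 1
xbar <- replicate(5000, mean(rexp(n)))       # 5000 sample means of size 30

c(skew_raw = skew(raw), skew_xbar = skew(xbar))     # near 2 and near 0.37
c(sd_xbar = sd(xbar), se_mean = 1 / sqrt(n))        # se = sigma / sqrt(n), sigma = 1
```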

    rbinom(n, size = , prob = )
- n: number of observations
- size: number of trials
- prob: probability of success on each trial
(R's n is the number of observations drawn; the binomial n above corresponds to size.)

CLT with binomial: sample with p = 0.8, size = 10, and n = 500
    y = rbinom(500, 10, .8); mean(y)
    [1] 7.992
    hist(y, prob = TRUE, main = 'random sample 1')
[Figure: histogram "Random sample 1" of y, density scale, y from 4 to 10]

CLT with binomial: a second sample
    y = rbinom(500, 10, .8); mean(y)
    [1] 8.114
    hist(y, prob = TRUE, main = 'random sample 2')
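As a check on the binomial formula $P(X = x) = \binom{n}{x} p^x q^{n-x}$ with $E[X] = np$ and $V[X] = npq$, the sketch below evaluates the formula by hand for the values used in these notes (10 trials, p = 0.8) and compares it with R's dbinom().

```r
# Binomial pmf written out by hand vs. R's dbinom(), for 10 trials and p = 0.8.
size <- 10; p <- 0.8; q <- 1 - p
x <- 0:size
pmf_by_hand <- choose(size, x) * p^x * q^(size - x)

all.equal(pmf_by_hand, dbinom(x, size = size, prob = p))   # TRUE
sum(x * pmf_by_hand)                      # E[X] = np = 8
sum((x - size * p)^2 * pmf_by_hand)       # V[X] = npq = 1.6
```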

[Figure: histogram "Random sample 2" of y, density scale, y from 4 to 10]

CLT simulation with binomial
Simulation process:
- Set the mean, standard deviation, and sample size.
- Create an empty vector to hold the sample means from the for-loop.
- The for-loop calculates the sample means from 500 simulated samples of size 10 (each sample has 10 observations; simulating 500 samples of size 10 gives 500 sample means).

CLT simulation with binomial
    mu = 8; sigma = 1.26; n = 10; xbar = rep(0, 500)  # mu = np = 8, sigma = sqrt(npq) = sqrt(1.6), about 1.26
    for (i in 1:500) {
      xbar[i] = mean(rbinom(n, 10, .8))
    }
    hist(xbar, prob = TRUE, breaks = 15)
[Figure: "Histogram of xbar", density scale, xbar from about 6.5 to 9.0]
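For the binomial simulation, the CLT predicts that the sample means have mean $np = 8$ and standard error $\sqrt{npq}/\sqrt{n} = \sqrt{1.6}/\sqrt{10} = 0.4$. This sketch uses replicate() in place of the loop, with an assumed seed of 251, to compare those values with the simulated ones.

```r
# Sampling distribution of the mean of n = 10 binomial(size = 10, p = 0.8) observations.
set.seed(251)                                # assumed seed
size <- 10; p <- 0.8; n <- 10
sigma <- sqrt(size * p * (1 - p))            # sqrt(npq) = sqrt(1.6), about 1.26
xbar <- replicate(500, mean(rbinom(n, size = size, prob = p)))

c(mean_xbar = mean(xbar), mu = size * p)             # both near 8
c(sd_xbar = sd(xbar), se_mean = sigma / sqrt(n))     # both near 0.4
```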

CLT for the sample mean ($\bar{X}$) and the sample sum/total ($\hat{\tau}$)
The level of a particular pollutant, nitrogen dioxide (NO$_2$), in the exhaust of a hypothetical car model driven in city traffic has a mean of 2.1 grams per mile (g/m) and a standard deviation of 0.3 g/m. Suppose a company has a fleet of 35 of these cars.

CLT for $\bar{X}$ and $\hat{\tau}$ solutions
(a) What are the mean and standard deviation of the sampling distribution of the sample mean?
Mean: $\mu_{\bar{X}} = \mu = 2.1$, and $se_{mean} = \sigma/\sqrt{n} = 0.3/\sqrt{35} = 0.0507$, so $\bar{X} \sim N(\mu, se_{mean}) = N(2.1, 0.0507)$.

(b) Find the probability that the mean NO$_2$ level is less than 2.03 g/m.
$P(\bar{X} < 2.03) = P\left(Z < \dfrac{2.03 - 2.1}{0.0507}\right) = P(Z < -1.38) = 0.083793$

(c) EPA mandates state that the fleet average for these cars cannot exceed 2.2 g/m. Find the probability that the fleet's mean NO$_2$ level exceeds the EPA mandate.
$P(\bar{X} > 2.2) = 1 - P\left(Z < \dfrac{2.2 - 2.1}{0.0507}\right) = 1 - P(Z < 1.97) = 1 - 0.975581 = 0.024419$

(d) What mean NO$_2$ value is exceeded by the fleet mean at most 25% of the time?
Find the z-score that marks off the top 25%, which is the same as the bottom 75% (the third quartile, Q3): $z_{0.75} = 0.67449$. Then solve $z = (\bar{X} - \mu)/se_{mean}$ for $\bar{X}$:
$\bar{X} = z(se_{mean}) + \mu = (0.67449)(0.0507) + 2.1 = 2.134197$

(e) What are the mean and standard deviation of the total amount (sum) of NO$_2$, in g/m, in the exhaust of the fleet?
$\tau = n\mu = 35(2.1) = 73.5$ and $se_{sum} = \sqrt{n}\,\sigma = \sqrt{35}(0.3) = 1.774824$, so $\hat{\tau} \sim N(\tau, se_{sum}) = N(73.5, 1.7748)$.
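Parts (a) through (e) can be reproduced with pnorm() and qnorm() instead of a z-table. This sketch rounds z-scores to two decimals and uses the rounded standard error 0.0507 in part (d), so that it reproduces the table-based values above; without that rounding the last digits would differ slightly.

```r
# NO2 fleet example, parts (a)-(e), using pnorm()/qnorm() in place of a z-table.
mu <- 2.1; sigma <- 0.3; n <- 35

se_xbar <- sigma / sqrt(n)                        # (a) about 0.0507
z_b <- round((2.03 - mu) / se_xbar, 2)            # -1.38
p_b <- pnorm(z_b)                                 # (b) P(xbar < 2.03)
z_c <- round((2.2 - mu) / se_xbar, 2)             # 1.97
p_c <- 1 - pnorm(z_c)                             # (c) P(xbar > 2.2)
xbar_75 <- qnorm(0.75) * round(se_xbar, 4) + mu   # (d) 75th percentile of xbar

tau <- n * mu                                     # (e) 73.5
se_sum <- sqrt(n) * sigma                         # (e) about 1.7748

round(c(se_xbar = se_xbar, p_b = p_b, p_c = p_c, xbar_75 = xbar_75,
        tau = tau, se_sum = se_sum), 6)
```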

(f) Find the probability that the total amount of NO$_2$ for the fleet is between 70 and 75 g/m.
$P(70 < \hat{\tau} < 75) = P\left(\dfrac{70 - 73.5}{1.7748} < Z < \dfrac{75 - 73.5}{1.7748}\right) = P(-1.97 < Z < 0.85) = P(Z < 0.85) - P(Z < -1.97) = 0.802337 - 0.024419 = 0.777918$

CLT for the proportion ($\hat{p}$)
The Mars company claims that 10% of the M&M's it produces are green. Suppose that candies are packaged at random in bags that contain 60 candies.
(a) Describe the sampling distribution of the sample proportion (what should the distribution look like?), and calculate the mean and standard deviation of the sampling distribution of the sample proportion of green M&M's in bags of 60 candies (calculate p and se).
(b) What is the probability that a bag of 60 candies will have more than 13% green M&M's?

CLT for $\hat{p}$ solutions
(a) The distribution of the sample proportion will be approximately normal, since $n \ge 60$. The mean proportion is $p = 0.1$, and the standard error (the standard deviation of the sampling distribution of the sample proportion) is $\sqrt{pq/n} = \sqrt{(0.1)(0.9)/60} = 0.0387$. Thus $\hat{p} \sim N(0.1, 0.0387)$.
(b) $P(\hat{p} > 0.13) = P\left(Z > \dfrac{0.13 - 0.1}{0.0387}\right) = P(Z > 0.78) = 1 - P(Z < 0.78) = 1 - 0.782305 = 0.2177$
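Part (f) and the M&M's example can be reproduced the same way. To match the hand calculations, this sketch uses the rounded standard errors (1.7748 and 0.0387) and two-decimal z-scores; it also computes the M&M's probability without intermediate rounding, for comparison.

```r
# Part (f): total NO2 for the fleet, using the rounded se_sum from part (e).
tau <- 73.5; se_sum <- 1.7748
z_lo <- round((70 - tau) / se_sum, 2)        # -1.97
z_hi <- round((75 - tau) / se_sum, 2)        # 0.85
p_f <- pnorm(z_hi) - pnorm(z_lo)             # P(70 < tau_hat < 75), about 0.7779

# M&M's: p = 0.10, bags of n = 60, P(phat > 0.13)
p <- 0.10; n <- 60
se_p <- round(sqrt(p * (1 - p) / n), 4)      # 0.0387, rounded as in the notes
z_mm <- round((0.13 - p) / se_p, 2)          # 0.78
p_mm <- 1 - pnorm(z_mm)                      # about 0.2177

# Without intermediate rounding, z is about 0.7746, so the probability is slightly larger
p_mm_exact <- 1 - pnorm((0.13 - p) / sqrt(p * (1 - p) / n))

round(c(p_f = p_f, se_p = se_p, p_mm = p_mm, p_mm_exact = p_mm_exact), 4)
```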