Epidemiology Principle of Biostatistics Chapter 7: Sampling Distributions (continued) John Koval

Principle of Biostatistics Chapter 7: Sampling Distributions (continued) John Koval Department of Epidemiology and Biostatistics University of Western Ontario

Next we want to look at histograms of sample statistics (sample mean, median, sample variance, sample standard deviation) to see what their distributions look like.

Sample mean of Bernoullis
Consider a sample of 10 observations from a Bernoulli distribution, that is, the 10 responses to the question "Do you smoke?", where Yes is coded as 1 and No is coded as 0.
In what are we interested? The proportion, p, which is the sample mean of a bunch of 0s and 1s.

Random variables - some math
Let us call $X_1$ the random variable which measures the response (0 or 1) of the first person, $X_2$ the response of the second person, etc., up to $X_{10}$, the response of the 10th person.
Let $Y$ be the sum of the responses of all ten subjects.
Then $P$, the sample proportion, is the average (sample mean) of all ten responses, that is,
$P = \frac{Y}{n} = \frac{\sum_{i=1}^{10} X_i}{n} = \frac{0 + 1 + 1 + \dots + 0}{10}$

Distribution of a sample mean of Bernoullis
Remember that Y is the sum of 10 Bernoullis. What, then, is the distribution of Y (which can be thought of as the number of successes in a sample of size 10)?
Binomial(10, 0.2), where π = 0.2 is the population proportion of smokers, or the probability of picking a smoker at random.
Hence the distribution of the sample proportion is that of a multiple (1/n) of a binomial variable; that is, its chart has the same bars as the binomial, except that the x-axis is marked in proportions rather than integers.
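The relationship between the count Y and the proportion P = Y/n can be written out as a short worked derivation. This is a sketch that assumes the ten responses are independent with a common π = 0.2 (the random-sample setting described above); the mean and variance lines are standard binomial results rather than something stated on the slide.

```latex
\begin{align*}
\Pr\!\left(P = \tfrac{k}{n}\right) &= \Pr(Y = k)
  = \binom{n}{k}\,\pi^{k}(1-\pi)^{n-k}, \qquad k = 0, 1, \dots, n, \\
E(P) &= \frac{E(Y)}{n} = \frac{n\pi}{n} = \pi = 0.2, \\
\operatorname{Var}(P) &= \frac{\operatorname{Var}(Y)}{n^{2}}
  = \frac{n\pi(1-\pi)}{n^{2}} = \frac{\pi(1-\pi)}{n}
  = \frac{0.2 \times 0.8}{10} = 0.016 .
\end{align*}
```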

Binomial Distribution B(10, 0.2)

 x    Pr(X=x)
 0    0.10737
 1    0.26844
 2    0.30199
 3    0.20133
 4    0.08808
 5    0.02642
 6    0.00551
 7    0.00079
 8    0.00007
 9    0.00000
10    0.00000

[Figure: bar chart of Bin(10, 0.2); vertical axis: Probability (0.00 to 0.30); horizontal axis: x = 0 to 10]

Distribution of proportion

 p     Pr(P=p)
0.0    0.10737
0.1    0.26844
0.2    0.30199
0.3    0.20133
0.4    0.08808
0.5    0.02642
0.6    0.00551
0.7    0.00079
0.8    0.00007
0.9    0.00000
1.0    0.00000

[Figure: bar chart of the proportion of 10 Bern(0.2)'s; vertical axis: Probability (0.00 to 0.30); horizontal axis: proportion = 0.0 to 1.0]
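The two tables above can be reproduced with a few lines of code. The following is a minimal Python sketch (Python is used here purely for illustration, not the SAS of these slides); the function name `binomial_pmf` and the variable `proportion_table` are hypothetical names chosen for this example.

```python
from math import comb

def binomial_pmf(n: int, pi: float) -> list[float]:
    """Return [Pr(Y = 0), ..., Pr(Y = n)] for Y ~ Binomial(n, pi)."""
    return [comb(n, k) * pi**k * (1 - pi) ** (n - k) for k in range(n + 1)]

n, pi = 10, 0.2
pmf = binomial_pmf(n, pi)

# Same probabilities, two x-axis labellings: counts k and proportions k/n.
proportion_table = [(k / n, pmf[k]) for k in range(n + 1)]

for k, (p, prob) in enumerate(proportion_table):
    print(f"{k:2d}  {p:.1f}  {prob:.5f}")
```

Each printed row pairs a count with its proportion and the shared probability, which is the sense in which the proportion's distribution has "the same bars" as the binomial.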

Distribution of proportions
If the proportion is the average of a number of Bernoulli random variables, its distribution is exactly that of a multiple (1/n) of a Binomial. Hence we can always plot its distribution and calculate probabilities.
From a previous lecture, we know that for large sample size, n, and nπ > 5, the binomial distribution can be approximated by a Normal distribution.
Similarly, the distribution of the proportion, for large sample size, n, and nπ > 5, can be approximated by the same multiple of a Normal distribution.
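To see how close the Normal approximation is, one can compare an exact binomial probability with its Normal counterpart. The sketch below is illustrative only: the values n = 100 and k = 25 are hypothetical choices (so that nπ = 20 > 5), the helper names are made up for this example, and the +0.5 continuity correction is a common refinement that is an assumption here, not something stated in these slides.

```python
from math import comb, erf, sqrt

def binomial_cdf(k: int, n: int, pi: float) -> float:
    """Exact Pr(Y <= k) for Y ~ Binomial(n, pi)."""
    return sum(comb(n, j) * pi**j * (1 - pi) ** (n - j) for j in range(k + 1))

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """Pr(Z <= x) for Z ~ Normal(mu, sigma^2), via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

n, pi = 100, 0.2                 # hypothetical large sample; n * pi = 20 > 5
mu = n * pi                      # mean of the count Y
sigma = sqrt(n * pi * (1 - pi))  # standard deviation of the count Y

k = 25                           # Pr(Y <= 25), equivalently Pr(P <= 0.25)
exact = binomial_cdf(k, n, pi)
approx = normal_cdf(k + 0.5, mu, sigma)  # +0.5: continuity correction (assumption)
print(f"exact = {exact:.4f}, normal approximation = {approx:.4f}")
```

Because P = Y/n, the same comparison applies to the proportion: dividing k, mu and sigma by n leaves the standardized value, and hence the approximate probability, unchanged.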

Sample means from other distributions
The easy stuff ends here.
If the data of which we are calculating sample means come from more complicated distributions, we cannot get the distribution of the sample mean as easily as we did for the proportion.
However, for large samples, the distribution can be approximated.

Sampling from a Binomial
Consider taking a random sample of 10 people to whom you have administered the earlier-described Stress Scale.
We assume that the distribution of the Stress Scale is Binomial(10, 0.2).
From what we have just done, we know that if we simulate the taking of such a sample many times, we can plot the resulting statistic and see the distribution of the statistic; in this case, that of the sample mean.

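Distribution of sample mean - 1000 simulations

    title 'distribution of sample mean';
    options ps=24 ls=64;
    data samples;
      seed = 25487;
      nsim = 1000;    /* number of simulated samples */
      nsam = 10;      /* sample size (people per sample) */
      nquest = 10;    /* number of questions: the Binomial n */
      pi = 0.2;       /* Binomial success probability */
      do nrun = 1 to nsim;
        sumx = 0;
        do i = 1 to nsam;
          x = ranbin(seed, nquest, pi);   /* one Binomial(nquest, pi) score */
          sumx = sumx + x;
        end;
        xbar = sumx / nsam;               /* sample mean for this run */
        output;
      end;
    run;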

Distribution of sample mean (continued)
This is a default plot:

    proc means;
      var xbar;
      title 'sampling distribution of sample means';
    proc chart;
      vbar xbar / type=pct space=0;
    proc gchart;
      vbar xbar / type=pct space=0;
    run;
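For readers without SAS, the same simulation can be sketched in Python. This is an illustrative translation under stated assumptions, not the author's program: `simulate_sample_means` is a hypothetical helper, each Binomial(nquest, pi) score is built as a sum of Bernoulli draws, and because Python's random number generator differs from SAS's `ranbin`, the numbers will not match the SAS output digit for digit even with the same seed value.

```python
import random
from statistics import mean, stdev

def simulate_sample_means(nsim, nsam, nquest, pi, seed):
    """Return nsim sample means, each from nsam draws of Binomial(nquest, pi)."""
    rng = random.Random(seed)
    xbars = []
    for _ in range(nsim):
        # One Binomial(nquest, pi) score = number of successes in nquest Bernoulli trials.
        scores = [sum(rng.random() < pi for _ in range(nquest)) for _ in range(nsam)]
        xbars.append(sum(scores) / nsam)
    return xbars

# Summary statistics of the simulated sample means for three sample sizes.
for nsam in (10, 30, 100):
    xbars = simulate_sample_means(nsim=1000, nsam=nsam, nquest=10, pi=0.2, seed=25487)
    print(nsam, round(mean(xbars), 4), round(stdev(xbars), 4), min(xbars), max(xbars))
```

The printed columns correspond to the mean, standard deviation, minimum and maximum that `proc means` reports for `xbar`.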

Statistics
Sample statistics

nsam   Mean        Std Dev     Minimum     Maximum
---------------------------------------------------
 10    1.9980000   0.3982510   0.6000000   3.7000000
 30    1.9983867   0.2340997   1.1666667   2.9000000
100    1.9984980   0.1279179   1.5600000   2.5600000
---------------------------------------------------

As the sample size increases
1. the standard deviation gets smaller
2. the range gets smaller, and more symmetric

CHART output for sample size 10
Graphical representation of changes with sample size
[PROC CHART bar chart of xbar; vertical axis: Percentage (ticks 2 to 10); midpoints 1.1 to 3.1 by 0.4]

CHART output for sample size 30
[PROC CHART bar chart of xbar; vertical axis: Percentage (ticks 2 to 12); midpoints 1.5 to 2.9 by 0.2]

CHART output for sample size 100
[PROC CHART bar chart of xbar; vertical axis: Percentage (ticks 2 to 10); midpoints 1.7 to 2.5 by 0.2]

Fancier graphs
[Figure: sample size 10, default plot]

[Figure: sample size 30, default plot]

[Figure: sample size 100, default plot]

Distribution of sample mean (continued again)
This is a plot with a defined range, so that we can compare the output for sample sizes 10, 30 and 100:

    proc gchart;
      vbar xbar / type=pct space=0 midpoints = 0.6 to 3.4 by 0.2;
    run;

[Figure: sample size 10, plot with defined range]

[Figure: sample size 30, plot with defined range]

[Figure: sample size 100, plot with defined range]
We can see that the plots centre around the population mean (2.0).

Conclusions
1. As the sample size gets larger, the variance decreases
2. As the sample size gets larger, the curve looks more symmetric

Distribution of sample mean (more)
Alternatively, use PROC UNIVARIATE's HISTOGRAM statement to get both the histogram and the approximating normal curve:

    proc univariate;
      var xbar;
      histogram / normal(mu = 2.0 sigma = 0.4);
    run;

Here sigma = 0.4 is for nsam = 10; use sigma = 0.2309 for nsam = 30 and sigma = 0.1265 for nsam = 100.
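The values of mu and sigma passed to the normal option can be traced back to the assumed Binomial(10, 0.2) distribution of a single score. The derivation below is a sketch that assumes independent scores, so that the standard deviation of the sample mean is the population standard deviation divided by the square root of the sample size; the subscripted symbols are notation introduced here for illustration.

```latex
\begin{align*}
\mu &= n_{\text{quest}}\,\pi = 10 \times 0.2 = 2.0, \\
\sigma &= \sqrt{n_{\text{quest}}\,\pi(1-\pi)}
  = \sqrt{10 \times 0.2 \times 0.8} = \sqrt{1.6} \approx 1.2649, \\
\sigma_{\bar{X}} &= \frac{\sigma}{\sqrt{n_{\text{sam}}}} \approx
\begin{cases}
1.2649/\sqrt{10} = 0.4000, & n_{\text{sam}} = 10, \\
1.2649/\sqrt{30} \approx 0.2309, & n_{\text{sam}} = 30, \\
1.2649/\sqrt{100} \approx 0.1265, & n_{\text{sam}} = 100.
\end{cases}
\end{align*}
```

These theoretical values sit close to the simulated standard deviations reported earlier (0.398, 0.234 and 0.128).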

[Figure: sample size 10, histogram and theoretical distribution]

[Figure: sample size 30, histogram and theoretical distribution]

[Figure: sample size 100, histogram and theoretical distribution]

Conclusions
1. As the sample size gets larger, the curve looks more Normal

Sampling from other distributions
1. Normal - perfect: the distribution of the sample mean is Normal regardless of sample size
2. Symmetric, e.g. Uniform: the distribution of the sample mean is symmetric (for the uniform, the tails may be truncated); for smallish samples, the distribution is approximately normal
3. Asymmetric - the continuous counterpart of the Binomial behaves like the Binomial
   3.1 for large sample size, the distribution is approximately normal
   3.2 for small sample size, the approximation to normal is poor

The Central Limit Theorem
Take a sample of size nsam.
For nsam large enough, the distribution of the sample mean will be approximately Normal.

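The Central Limit Theorem (statistically)
Sample nsam times from a distribution with mean µ and variance σ².
For nsam large enough, approximately,
$\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n_{\text{sam}}}\right)$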
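The theorem can be checked informally by simulation with an asymmetric population. The Python sketch below is an illustration under assumptions that are not in the slides: the population is taken to be Exponential with mean 1 (so µ = 1 and σ = 1), `sample_mean_stats` is a hypothetical helper, and sample skewness is used as a rough stand-in for "how far from Normal" the distribution of the sample mean is.

```python
import random
from math import sqrt
from statistics import mean, stdev

def sample_mean_stats(nsam, nsim=2000, seed=1):
    """Simulate nsim sample means of nsam Exponential(1) draws.

    Returns (mean, standard deviation, skewness) of the simulated sample means.
    """
    rng = random.Random(seed)
    xbars = [sum(rng.expovariate(1.0) for _ in range(nsam)) / nsam
             for _ in range(nsim)]
    m, s = mean(xbars), stdev(xbars)
    skew = mean([((x - m) / s) ** 3 for x in xbars])
    return m, s, skew

for nsam in (2, 10, 30, 100):
    m, s, skew = sample_mean_stats(nsam)
    # The CLT suggests s should be near sigma / sqrt(nsam) = 1 / sqrt(nsam).
    print(nsam, round(m, 3), round(s, 3), round(1 / sqrt(nsam), 3), round(skew, 3))
```

If the theorem is doing its job, the simulated standard deviation should track 1/√nsam and the skewness should shrink towards 0 as nsam grows.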