Review: Population, sample, and sampling distributions

Review: Population, sample, and sampling distributions. Start with a population with mean µ and standard deviation σ (for instance, µ = 0, σ = 1). Draw many samples of size N = 30; each sample has its own interquartile range (e.g., 1.25 for Sample 1, 1.7 for Sample 2, 0.65 for another). The distribution of that statistic over all such samples is the sampling distribution of the interquartile range for samples of size N = 30.

How's your IQ? Suppose population IQ follows a normal distribution with mean 100 and standard deviation 20. The mean IQ in this class, 23 students, is 130. Should we reject the null hypothesis that this class is no different in IQ from the population?

The Logic. Our result: R = 130. Assume Ho: µ = 100, i.e., this class is a random sample drawn from the population of people with mean IQ 100. If the result is very unlikely under Ho, that is, if Pr(R = 130 | µ = 100) ≤ α, then we are inclined to reject Ho. Pick a value of α (say, .01) and calculate the conditional probability p = Pr(R = 130 | µ = 100). If we reject Ho only when p ≤ α, our residual uncertainty that Ho might be right is less than or equal to α.

Calculate p = Pr(R = 130 | µ = 100). Find the sampling distribution of R for N = 23. Since we know the population parameters (normal, mean = 100, standard deviation = 20), we can get the sampling distribution by Monte Carlo sampling:

(defun sampling-distribution (n mean std k)
  "N is the sample size, MEAN and STD are the parameters of a normal
distribution, K is the size (number of samples) of the sampling distribution."
  (loop repeat k collect (mean (sample-normal-to-list mean std n))))

[Histogram of the simulated sample means for N = 23, centered near 100 and spanning roughly 90 to 110.]
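
The code above relies on two helpers from the course library, mean and sample-normal-to-list, that the slides don't show. A minimal illustrative sketch of what they might look like (a Box-Muller normal sampler; these are assumptions, not the course's own definitions):

(defun mean (xs)
  "Arithmetic mean of a list of numbers."
  (/ (reduce #'+ xs) (length xs)))

(defun sample-normal (mean std)
  "One draw from a normal distribution, via the Box-Muller transform."
  (+ mean (* std
             (sqrt (* -2.0 (log (- 1.0 (random 1.0)))))
             (cos (* 2.0 pi (random 1.0))))))

(defun sample-normal-to-list (mean std n)
  "A list of N independent draws from the normal distribution."
  (loop repeat n collect (sample-normal mean std)))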

Calculate p = Pr(R = 130 | µ = 100). Find the sampling distribution of R for N = 23. Since we know the population parameters (normal, mean = 100, standard deviation = 20), we can get the sampling distribution by Monte Carlo sampling. [The same histogram of simulated sample means, spanning roughly 90 to 110, with the observed result 130 far off its right edge.] The probability of getting a sample of size 23 with mean 130 by random sampling from a population with mean 100 and standard deviation 20 is virtually zero.

Another way to write the code: count how many of k simulated sample means exceed the observed result r.

(defun sampling-distribution (n mean std r k)
  (loop repeat k
        counting (> (mean (sample-normal-to-list mean std n)) r)))

(sampling-distribution 23 100 20 130 1000) => 0

None of 1000 simulated samples of size 23 had a mean as large as 130.
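
To turn that count into an estimated p value, divide by the number of simulated samples. A minimal sketch of that variant (the name estimate-p is an addition, not the slides' code):

(defun estimate-p (n mean std r k)
  "Monte Carlo estimate of Pr(sample mean > R) for samples of size N
drawn from a normal population with parameters MEAN and STD."
  (/ (loop repeat k
           counting (> (mean (sample-normal-to-list mean std n)) r))
     k))

;; (estimate-p 23 100 20 130 1000) => 0, i.e., p is essentially zero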

Parametric statistical inference. Testing hypotheses by simulating the process of sampling is cool but not always necessary. The probability of tossing 15 heads in 20 with a fair coin can be worked out exactly. The probability that a sample from a population has a particular mean can be estimated analytically, too. However, theory tells us about the sampling distributions of very few statistics; for the rest, simulation works great.
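
For the coin example, the exact calculation is just binomial arithmetic. A small illustrative sketch (the helper choose is an addition, not part of the slides):

(defun choose (n k)
  "Binomial coefficient C(n, k), computed exactly with rationals."
  (loop with acc = 1
        for i from 1 to k
        do (setf acc (/ (* acc (+ (- n i) 1)) i))
        finally (return acc)))

;; Probability of exactly 15 heads in 20 tosses of a fair coin:
(* (choose 20 15) (expt 1/2 20))   ; => 969/65536, about 0.0148
;; Summing such terms for 15 through 20 heads gives the exact tail probability.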

Central Limit Theorem. The sampling distribution of the mean of samples of size N drawn from a population with mean µ and standard deviation σ approaches a normal distribution with mean µ and standard deviation σ / √N as N becomes large. Good news! We know the sampling distribution of the mean and can estimate the probability of sample results!

The Logic. Our result: R = 130. Assume Ho: µ = 100, i.e., this class is a random sample drawn from the population of people with mean IQ 100. If the result is very unlikely under Ho, that is, if Pr(R = 130 | µ = 100) ≤ α, then we are inclined to reject Ho. Pick a value of α (say, .01) and calculate the conditional probability p = Pr(R = 130 | µ = 100). The sampling distribution of the mean approaches a normal distribution with mean = 100 and std = 20 / √23 ≈ 4.17. So our sample result is 30 / 4.17 ≈ 7.2 standard deviations above the mean of the sampling distribution!

Standard error: the standard deviation of the sampling distribution. Standard error of the mean: under Ho: µ = 100, the sampling distribution is normal, its mean is 100, and its standard deviation is 20 / √23 ≈ 4.17. The standard error is 4.17. The sample result, 130, is about 7.2 standard error units above the mean under Ho. [Number line: mean 100, one standard error at about 104, sample result 130 far to the right.] About 95% of a normal distribution lies within two standard deviations of the mean, and about 99% within 2.58. How probable is our sample result?

Try it again with a less extreme result. Our result: R = 108. Assume Ho: µ = 100, i.e., this class is a random sample drawn from the population of people with mean IQ 100. If the result is very unlikely under Ho, that is, if Pr(R = 108 | µ = 100) ≤ α, then we are inclined to reject Ho. Pick a value of α (say, .01) and calculate the conditional probability p = Pr(R = 108 | µ = 100). The sampling distribution of the mean approaches a normal distribution with mean = 100 and std = 20 / √23 ≈ 4.17. So our sample result is 8 / 4.17 ≈ 1.92 standard errors above the mean of the sampling distribution.

p values. Under Ho: µ = 100, the sampling distribution is normal; its mean is 100 and its standard deviation (the standard error) is 20 / √23 ≈ 4.17. The sample result, R = 108, is 1.92 standard error units above the mean under Ho. Now it isn't so obvious that we should reject Ho. How can we find p = Pr(R = 108 | µ = 100)? State the result in standard error units and look up its probability in a table.

Standardizing: subtract the mean, divide by the standard error. Under Ho: µ = 100, the sampling distribution is normal, its mean is 100, its standard deviation is 20 / √23 ≈ 4.17, and the sample result is 108. After standardizing, the sampling distribution is normal with mean 0 and standard deviation 1.0, and the sample result is (108 - 100) / (20 / √23) = 1.92.

Z scores or standard scores: subtract the mean, divide by the standard error. Z = (x̄ - µ) / s.e. = (x̄ - µ) / (σ / √N). For our example: (108 - 100) / 4.17 = 1.92.

The Z test. Z = (x̄ - µ) / s.e. Z is the number of standard error units the sample mean is from the mean of the sampling distribution under the null hypothesis. Here, 1.92 = (108 - 100) / (20 / √23). For a one-tailed test: if Z ≥ 1.645, the sample result has probability p ≤ .05 given the null hypothesis; if Z ≥ 2.326, it has probability p ≤ .01.
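
The same computation as a small function, used again below (the name z-score is an illustrative addition, not the course library's):

(defun z-score (sample-mean mu sigma n)
  "Number of standard errors SAMPLE-MEAN lies from MU, given the population
standard deviation SIGMA and sample size N."
  (/ (- sample-mean mu) (/ sigma (sqrt n))))

;; (z-score 108 100 20 23) => about 1.92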

The Z test. Our result: R = 108. Assume Ho: µ = 100, i.e., this class is a random sample drawn from the population of people with mean IQ 100. If the result is very unlikely under Ho, that is, if Pr(R = 108 | µ = 100) ≤ α, then we are inclined to reject Ho. Pick a value of α (say, .01) and calculate the conditional probability p = Pr(R = 108 | µ = 100). The sampling distribution of the mean approaches a normal distribution with mean = 100 and std = 20 / √23 ≈ 4.17. So our sample result is 8 / 4.17 ≈ 1.92 standard errors above the mean of the sampling distribution. Equivalently, Z = (108 - 100) / 4.17 = 1.92. p = Pr(Z ≥ 1.92) ≈ .0274; since .0274 > α = .01, do not reject Ho.
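
Rather than a table, the one-tailed p can be computed from Z directly. A sketch using a standard polynomial approximation to the normal tail (the function name is an addition; the constants are the Abramowitz & Stegun 26.2.17 coefficients):

(defun normal-upper-tail (z)
  "Approximate Pr(Z >= z) for a standard normal, valid for z >= 0
(Abramowitz & Stegun 26.2.17; use symmetry for negative z)."
  (let* ((u (/ 1.0 (+ 1.0 (* 0.2316419 z))))
         (poly (* u (+ 0.319381530
                       (* u (+ -0.356563782
                               (* u (+ 1.781477937
                                       (* u (+ -1.821255978
                                               (* u 1.330274429))))))))))
         (density (/ (exp (* -0.5 z z)) (sqrt (* 2 pi)))))
    (* density poly)))

;; (normal-upper-tail 1.92) => about 0.0274, matching the table value above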

You do it: A sample of size 25 has mean 8. Test the hypothesis that the sample is drawn from a population with mean 12, standard deviation 10.

You do it (solution): Z = (8 - 12) / (10 / √25) = -4 / 2 = -2.
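
Checking the exercise with the illustrative z-score and normal-upper-tail sketches from above:

(z-score 8 12 10 25)                             ; => -2.0
(normal-upper-tail (abs (z-score 8 12 10 25)))   ; => about 0.023 (one-tailed)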

Central limit theorem demo. For each sample size N, simulate the sampling distribution of the mean:

(loop repeat 1000 collect (mean (sample-from-population n)))

Std(population) = 1.11, so the theoretical standard errors are s.e.(20) = 1.11 / √20 = .248, s.e.(30) = 1.11 / √30 = .203, and s.e.(50) = 1.11 / √50 = .157. [Histograms: the population, and the simulated sampling distributions of the mean for N = 20 (std ≈ .25), N = 30 (std ≈ .21), and N = 50 (std ≈ .16); the empirical spreads match the theoretical standard errors.]
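
A runnable version of the demo under an assumed population: the slides don't show sample-from-population, so here it draws from a hypothetical exponential population with mean and standard deviation 1, reusing the mean helper sketched earlier. This is illustrative, not the course's demo code.

(defun sample-from-population (n)
  "Draw N values from a hypothetical non-normal population
(an exponential distribution with mean 1), just for this demo."
  (loop repeat n collect (- (log (- 1.0 (random 1.0))))))

(defun std (xs)
  "Standard deviation of a list of numbers."
  (let* ((m (mean xs))
         (devs (mapcar (lambda (x) (expt (- x m) 2)) xs)))
    (sqrt (/ (reduce #'+ devs) (length xs)))))

;; Standard deviation of 1000 simulated sample means, for several N:
(loop for n in '(20 30 50)
      collect (std (loop repeat 1000
                         collect (mean (sample-from-population n)))))
;; => roughly (0.22 0.18 0.14), shrinking like 1 / sqrt(N)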

Three components of all test statistics: effect size, background variance, and sample size. Z = (x̄ - µ) / (σ / √N) = ((x̄ - µ) / σ) × √N, i.e., Z = (effect size / background variance) × √(sample size). You can make any Z score significant with a big enough sample, but you shouldn't. Always try to control variance before increasing N.

Parametric and computer-intensive hypothesis testing. Under Ho: µ = 100, the parametric answer: the sampling distribution has mean 100 and standard deviation 20 / √23 ≈ 4.17. Empirically (by simulation), this distribution has a mean of 100.05 and a standard deviation of 4.38. [Normal curve and simulated histogram, both centered near 100, with the observed result 130 far in the right tail.]

We do not know the sampling distribution of most statistics, but we can estimate them empirically! The same Monte Carlo code works; just replace mean with the statistic of interest:

(defun sampling-distribution (n mean std k)
  "N is the sample size, MEAN and STD are the parameters of a normal
distribution, K is the size (number of samples) of the sampling distribution."
  (loop repeat k collect (mean (sample-normal-to-list mean std n))))

For example: median, interquartile-range, trimmed-mean, median-divided-by-mom's-age.
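
One way to make that substitution explicit is to pass the statistic in as a function. A minimal sketch (the starred name and the keyword argument are additions, not the course code):

(defun sampling-distribution* (n mean std k &key (statistic #'mean))
  "Simulated sampling distribution of STATISTIC for samples of size N
drawn from a normal population with parameters MEAN and STD."
  (loop repeat k
        collect (funcall statistic (sample-normal-to-list mean std n))))

;; e.g., the sampling distribution of the median for N = 23, assuming a
;; median function is available:
;; (sampling-distribution* 23 100 20 1000 :statistic #'median)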

Some issues for parametric and computer-intensive tests. Z is fine if you know σ (recall, z = (x̄ - µ) / (σ / √n)), but what if you don't? Estimate σ from the sample standard deviation s and, for smaller samples, run t tests. Monte Carlo tests are fine if you know the parameters of the population from which samples are drawn, but what if you don't? Estimate these parameters from the sample and run bootstrap or randomization tests.
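
A rough sketch of the bootstrap idea under those assumptions: when the population is unknown, resample the observed data with replacement instead of sampling from a known normal. The function names here are illustrative additions, not the course library's.

(defun resample (data)
  "One bootstrap sample: (length DATA) values drawn from DATA with replacement."
  (loop repeat (length data)
        collect (nth (random (length data)) data)))

(defun bootstrap-distribution (data k &key (statistic #'mean))
  "K bootstrap replicates of STATISTIC, each computed on a resample of DATA."
  (loop repeat k
        collect (funcall statistic (resample data))))

;; e.g., (bootstrap-distribution observed-scores 1000) gives 1000 bootstrap
;; means whose spread estimates the standard error of the mean;
;; observed-scores stands for a hypothetical list of the observed data.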