Chapter 15: Sampling distributions

Objectives:
(1) Get a "big picture" view of drawing inferences from statistical studies.
(2) Understand the concept of sampling distributions & sampling variability.
(3) Learn the Central Limit Theorem & how to use it.

Concept briefs:
* Statistical inference = a rigorous, statistically valid conclusion drawn from sampled data via probability-based analysis.
* Sampling distribution = a hypothetical quantitative variable whose values consist of the same statistic estimated using different samples.
* Central limit theorem for proportions = sample proportions follow the normal model \(N\left(p, \sqrt{p(1-p)/n}\right)\), where \(p\) is the true population parameter and \(n\) is the sample size.
* Central limit theorem for means = sample means follow the normal model \(N\left(\mu, \sigma/\sqrt{n}\right)\), where \(\mu\) is the population mean and \(\sigma\) is the population SD.
* The assumptions under which the above theorems hold are very important.

Statistical inference: What & why

Illustration 1: A survey of smoking among college students is conducted in 2 different states, IN and CA. The % of smokers in each surveyed sample is CA: 35%, IN: 30%. Can we conclude that the % of college students who smoke is larger in CA than in IN? Why, or why not? [Statistically significant difference?] Assume both surveys used good sampling methods, with the same sample size (e.g., 2000 people) and identical questions.

Illustration 2: A study of dropout rates at 4-year colleges is conducted in the years 2000 and 2004. The mean dropout rates in the samples studied were 2000: 15%, 2004: 12%. Can we conclude that the % of students who dropped out was lower in 2004 than in 2000? Why, or why not? Again, assume sound sampling methods, equivalent samples, etc.

Suppose you have more info. In the smoking study, the margin of error for the IN sample is 3.0%, and for the CA sample it is 2.5%.
[Figure: the two intervals on a % scale. IN: 27% to 33% (30% ± 3.0%). CA: 32.5% to 37.5% (35% ± 2.5%).]
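The margin-of-error reasoning in the smoking example can be sketched in a few lines of Python. This is a minimal illustration only. It rebuilds each interval as estimate ± margin of error and checks whether the two intervals overlap. Overlap is just an informal first signal. Deciding whether the difference is statistically significant requires a formal test of significance.

```python
def interval(estimate_pct, margin_pct):
    """Return (low, high) for estimate +/- margin of error, in percent."""
    return (estimate_pct - margin_pct, estimate_pct + margin_pct)


def overlaps(a, b):
    """True if the closed intervals a and b share at least one point."""
    return max(a[0], b[0]) <= min(a[1], b[1])


# Values from the smoking survey illustration.
indiana = interval(30.0, 3.0)     # (27.0, 33.0)
california = interval(35.0, 2.5)  # (32.5, 37.5)

print("IN:", indiana)
print("CA:", california)
print("Intervals overlap:", overlaps(indiana, california))
```

The intervals share the stretch from 32.5% to 33%. For that reason, the 5-point gap between the sample percentages cannot be taken at face value.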

Two key strategies of statistical inference that we will study: (1) Finding error margins. [Technical term for this: Confidence intervals] (2) Determining when sampled differences are statistically significant. [Technical term for this: Hypothesis test OR Test of significance] Central to everything we study is the concept of: * Sampling distributions; and * Theorems about sampling distributions.

Illustration of the sampling distribution concept
* Suppose we want to determine the following two quantities for students currently enrolled at Earlham: (1) what % of them live off-campus, and (2) the mean GPA of all students.
* Suppose we randomly sample 100 students to estimate these quantities, and we ask the questions: (1) Do you live off-campus? (Y or N) {categorical data} (2) What is your GPA? {quantitative data}. We find that 12.4% of the students respond "Y" to living off-campus, and that 3.06 is the mean GPA of this sample.
Q: How close are these estimates to the true population parameters we want?
Q: If we pick another random sample of 100 students, how much would these same statistics differ?
Q: How can we accommodate sampling differences in our interpretation of statistical data?

Sampling distribution models give a sound theoretical basis for answering these questions.
Concept 1: Distributions that consist of sampled statistics.
* In the above example, imagine calculating the same two statistics from 500 different random samples of size 100 each.
* The resulting collection of 500 values of each statistic is what we call a sampling distribution.
Concept 2: How to describe & analyze sampling distributions.
* We can construct histograms, boxplots, etc.
* We can calculate the mean, SD, median, IQR, ...
* In the above example, imagine calculating the mean of the % of students who live off-campus, and the mean of the mean GPA!
Concept 3: Normal models for sampling distributions.
* Histograms of sampling distributions are often symmetric and unimodal.
* The normal model can then be used to analyze them.
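Concept 1 can be made concrete with a small simulation. The sketch below assumes a hypothetical population in which the true off-campus proportion is p = 0.124. That value is the sample figure from the illustration, reused here purely as a stand-in. The sketch draws 500 independent random samples of size 100, and each sample contributes one p-hat. The 500 values together form an approximate sampling distribution. Its mean and SD can then be compared with p and \(\sqrt{pq/n}\). Drawing each answer independently is a simplifying assumption; it mimics a population much larger than the sample.

```python
import math
import random
import statistics

random.seed(1)          # fixed seed so the run is reproducible

P_TRUE = 0.124          # hypothetical true proportion living off-campus
N = 100                 # size of each sample
NUM_SAMPLES = 500       # number of repeated samples


def sample_proportion(p, n):
    """Draw n independent yes/no answers with P(yes) = p; return the share of yes."""
    yes = sum(1 for _ in range(n) if random.random() < p)
    return yes / n


# Each entry is one p-hat; the list as a whole approximates the sampling distribution.
p_hats = [sample_proportion(P_TRUE, N) for _ in range(NUM_SAMPLES)]

sim_mean = statistics.mean(p_hats)
sim_sd = statistics.stdev(p_hats)
theory_sd = math.sqrt(P_TRUE * (1 - P_TRUE) / N)

print(f"mean of p-hats: {sim_mean:.4f}   (CLT: {P_TRUE})")
print(f"SD of p-hats:   {sim_sd:.4f}   (CLT: {theory_sd:.4f})")
```

Plotting a histogram of `p_hats` with any plotting tool should show the roughly symmetric, unimodal shape described in Concept 3.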

Suppose we have collected data in the above example by asking the following 2 questions (with sample size = 10):
(1) Do you live off-campus? (Y or N)
(2) What is your GPA?

Student #   Off campus?   GPA
1           N             2.81
2           N             3.29
3           Y             3.30
4           N             2.72
5           Y             3.75
6           N             1.91
7           N             2.99
8           N             3.80
9           N             3.27
10          N             2.91

"Off campus?" is categorical: such data yields a "proportion" type of statistic. "GPA" is quantitative: such data yields a "mean" type of statistic.

Distinguish between 3 different distributions here: (1) the population GPA, (2) the sample GPA, and (3) the sampling distribution of the mean GPA. Thus, there is a separate value of mean GPA for the population, the sample, and the sampling distribution.

Think about it: Suppose you know the shape of the true population distribution for a quantitative variable. [E.g., the GPA distribution of students is bimodal with strong left skew.] What shape would you expect the following distributions to have: (1) a random sample of size, say, 10% of the population, and (2) the sampling distribution of mean values within such random samples?
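As a quick check on the table, you can compute the two sample statistics directly. The sketch below simply types the ten rows into Python. It shows that the categorical column yields a proportion (2 of the 10 students answered "Y") and the quantitative column yields a mean.

```python
import statistics

# The 10-student sample from the table: (off campus?, GPA)
sample = [
    ("N", 2.81), ("N", 3.29), ("Y", 3.30), ("N", 2.72), ("Y", 3.75),
    ("N", 1.91), ("N", 2.99), ("N", 3.80), ("N", 3.27), ("N", 2.91),
]

# Categorical column -> "proportion" type statistic
p_hat = sum(1 for off, _ in sample if off == "Y") / len(sample)

# Quantitative column -> "mean" type statistic
mean_gpa = statistics.mean(gpa for _, gpa in sample)

print(f"proportion off campus: {p_hat:.2f}")
print(f"mean GPA: {mean_gpa:.3f}")
```

Each of these values is just one draw from its sampling distribution. A different random sample of 10 students would give different numbers.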

Central limit theorem for "proportion" type statistics
* This refers to %-type statistics, which typically come from categorical variables.
* We always convert percentages to fractions (so 100% becomes 1).
* Note that there are only 2 statistics we can have here: the proportion & 1 minus the proportion.
* Notation:
  \(\hat{p}\) = proportions estimated from different samples (i.e., \(\hat{p}\) denotes the sampling distribution).
  \(p\) = true proportion (i.e., the population parameter) that we want.
  \(n\) = size of the samples.
  \(q = 1 - p\).
  \(\mu(\hat{p})\) = mean of the sampling distribution of \(\hat{p}\).
  \(\sigma(\hat{p})\) = standard deviation of the sampling distribution of \(\hat{p}\).
* The sampling distribution of a sample proportion approximately follows the normal model with mean \(\mu(\hat{p}) = p\) and standard deviation \(\sigma(\hat{p}) = \sqrt{pq/n}\).
* The following conditions must be satisfied for this result to hold:
  (1) The samples must be independent. In practice, this holds if the sample is random and n < 10% of the population.
  (2) The samples must be sufficiently large. In practice, this holds if np > 10 and nq > 10.
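The recipe above can be packaged as a small helper. It checks the conditions that can be checked from numbers alone and then returns the mean and SD of the normal model. This is a minimal sketch: `proportion_model` is a hypothetical name, and whether the sample is random cannot be verified in code. The example values p = 0.124 and p = 0.05 are illustrative stand-ins, not data from these notes.

```python
import math


def proportion_model(p, n, population_size=None):
    """Return (mean, sd) of the normal model for p-hat, after checking the CLT conditions.

    Raises ValueError if a checkable condition fails. Randomness of the sample
    cannot be verified here and must be judged from the study design.
    """
    q = 1 - p
    # Condition (1), independence: n should be less than 10% of the population.
    if population_size is not None and not n < 0.10 * population_size:
        raise ValueError("n is not less than 10% of the population")
    # Condition (2), large enough sample: np > 10 and nq > 10.
    if not (n * p > 10 and n * q > 10):
        raise ValueError(f"np = {n * p:.1f}, nq = {n * q:.1f}: sample too small")
    return p, math.sqrt(p * q / n)


mean, sd = proportion_model(0.124, 100)
print(f"N({mean}, {sd:.4f})")

try:
    proportion_model(0.05, 100)       # np = 5 fails the size condition
except ValueError as err:
    print("CLT not applicable:", err)
```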

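Central limit theorem for "means"
* The above result for proportions can be generalized to sample means.
* Notation: \(\bar{y}\) = mean values estimated from different samples; \(n\) = size of the samples; \(\mu\) and \(\sigma\) = true mean and true SD of the population; \(\mu(\bar{y})\) and \(\sigma(\bar{y})\) = mean and SD of the sampling distribution of \(\bar{y}\).
* This is summarized in the classic Central Limit Theorem of statistics: the sampling distribution of a sample mean approximately follows the normal model with mean \(\mu(\bar{y}) = \mu\) (the true mean of the population) and standard deviation \(\sigma(\bar{y}) = \sigma/\sqrt{n}\) (the true SD of the population divided by \(\sqrt{n}\)).
* The following conditions must be satisfied for this result to hold:
  (1) Independent samples. (In practice: random, and n < 10% of the population.)
  (2) Large enough samples, i.e., adequately large n. There is no simple rule of thumb to verify this in practice; the more skewed the population, the larger n needs to be.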
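The sampling distribution of \(\bar{y}\) is approximately normal even when the population is not. A simulation can check this. The sketch below assumes a hypothetical, strongly right-skewed population: an exponential distribution with mean 1 and SD 1, chosen only for illustration. It draws 2000 independent samples of size n = 40 and compares the simulated mean and SD of the sample means with \(\mu\) and \(\sigma/\sqrt{n}\). As a rough normality check, it also counts how many means fall within 2 SDs of \(\mu\); about 95% should.

```python
import math
import random
import statistics

random.seed(2)          # fixed seed so the run is reproducible

POP_MEAN = 1.0          # hypothetical exponential population: mean 1, SD 1 (right-skewed)
POP_SD = 1.0
N = 40                  # size of each sample
NUM_SAMPLES = 2000      # number of repeated samples


def sample_mean(n):
    """Mean of n independent draws from the hypothetical exponential population."""
    return math.fsum(random.expovariate(1 / POP_MEAN) for _ in range(n)) / n


y_bars = [sample_mean(N) for _ in range(NUM_SAMPLES)]
se = POP_SD / math.sqrt(N)   # CLT: SD of the sampling distribution of y-bar

print(f"mean of y-bars: {statistics.mean(y_bars):.3f}  (CLT: {POP_MEAN})")
print(f"SD of y-bars:   {statistics.stdev(y_bars):.3f}  (CLT: {se:.3f})")

# Rough normality check: about 95% of y-bars should lie within 2 SDs of the mean.
share = sum(abs(y - POP_MEAN) <= 2 * se for y in y_bars) / NUM_SAMPLES
print(f"share within 2 SDs: {share:.3f}")
```

With a smaller n, or a more skewed population, the 95% check tends to drift further off. In practice, that is what "large enough n" means.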

The Normal model as a probability model
Recall: Probability = relative frequency (over the long term). The normal model, as we've used it previously, tells us what % of a distribution lies within a given number of SDs of the mean (about 68% within 1 SD, and about 95% within 2 SDs).
Normal model for probabilities:
* Simply convert % to fractions and treat them as probabilities.
* Use z-tables as normal probability distribution tables.
* Note that total probability = total area under the normal curve = 1.0.
E.g.: Suppose the GPA distribution of students is approximately normal, with \(\mu = 3.06\) and \(\sigma = 0.4\). What is the probability that a randomly selected student has GPA > 3.86?
Ans: 3.86 is exactly 2 SDs above the mean, since (3.86 - 3.06)/0.4 = 2. About 95% of the distribution lies within 2 SDs of the mean, which leaves 5% split equally between the two tails. So, P(GPA > 3.86) = [1 - 0.95] / 2 = 0.025.
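Python's standard library has a normal-distribution object, `statistics.NormalDist` (Python 3.8+), that can stand in for the z-table lookup. The sketch below reproduces the GPA example and compares the 68-95-99.7 approximation with the exact area under the normal curve. The small gap between the two (0.025 vs. about 0.0228) arises because "95% within 2 SDs" is itself a rounded figure; the exact share is about 95.45%.

```python
from statistics import NormalDist

gpa = NormalDist(mu=3.06, sigma=0.4)   # normal model for GPA from the example

z = (3.86 - gpa.mean) / gpa.stdev      # how many SDs above the mean
rule_of_thumb = (1 - 0.95) / 2         # 5% outside 2 SDs, half of it in the upper tail
exact = 1 - gpa.cdf(3.86)              # area above 3.86 under the normal curve

print(f"z = {z:.2f}")
print(f"rule of thumb: {rule_of_thumb:.3f}")
print(f"normal-curve area: {exact:.4f}")
```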

Exercise 28, pg. 422
Solution:
* The question pertains to the sampling distribution of a proportion.
* We can apply the CLT if we check the conditions & verify they're satisfied:
  (1) Independent sample? Check random & n < 10% of the population.
      Random: true, since 100 students are randomly picked.
      n < 10%: true, if we assume 100 is less than 10% of the students on campus.
  (2) Large enough? Check whether np > 10 and nq > 10. Here p = 0.3 and q = 0.7, so np = 30 > 10 and nq = 70 > 10.
* Thus, the normal model is appropriate for the sampling distribution of this proportion. From the CLT, its mean and SD are \(\mu(\hat{p}) = p = 0.3\) and \(\sigma(\hat{p}) = \sqrt{pq/n} = \sqrt{(0.3)(0.7)/100} \approx 0.0458\). The model is N(0.3, 0.0458).
(b) Find the z-score for \(\hat{p} = 1/3\): z = [1/3 - 0.3] / 0.0458 = 0.7278. Look up the z-table & find the area for z > 0.7278. We get 1 - 0.7673 = 0.2327.
Answer: The probability that more than 1/3 of this sample wear contacts is 0.2327.

Exercise 36, pg. 423
Strategy:
* Check the conditions for the normal model for proportions.
* Identify the "proportion" of interest: the % of children with the genetic condition.
* Identify p; find the mean & SD of the normal model; and sketch a rough graph.
* We want to find 20 subjects out of 732, so \(\hat{p} = 20/732 \approx 0.0273\).
* Interpret the question as: "What % of samples have \(\hat{p}\) larger than 0.0273?"
* Find the z-score corresponding to \(\hat{p} = 0.0273\).
* Look up the standard normal table & find the area above this z-score. This is the probability of finding enough subjects for the study.
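The z-score and table steps for a proportion can be sketched as one reusable function. Here `prob_phat_above` is a hypothetical helper, not part of any library. It assumes the CLT conditions have already been checked, and it uses `statistics.NormalDist` in place of the z-table. Because it carries full precision, it gives z ≈ 0.7274 and a probability of about 0.2335 for Exercise 28. The hand solution rounds the SD to 0.0458 and reads the table at z ≈ 0.73, which gives 0.2327.

```python
import math
from statistics import NormalDist


def prob_phat_above(threshold, p, n):
    """Return (z, P(p-hat > threshold)) under the CLT normal model N(p, sqrt(pq/n))."""
    sd = math.sqrt(p * (1 - p) / n)
    z = (threshold - p) / sd
    return z, 1 - NormalDist().cdf(z)   # upper-tail area of the standard normal


# Exercise 28: p = 0.3, n = 100, threshold = 1/3
z28, prob28 = prob_phat_above(1 / 3, 0.3, 100)
print(f"Exercise 28: z = {z28:.4f}, P(p-hat > 1/3) = {prob28:.4f}")
```

For Exercise 36, the same call would use `threshold = 20 / 732`, `n = 732`, and the value of p stated in the exercise. That value is not reproduced in these notes, so it is left for the reader to supply.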

Exercises 14 & 16, pg. 421
Strategy for 14 [part (c) only]:
* Assume 50 is large enough to meet the conditions of the normal model for means.
* Find the mean & SD of the normal model (using the Central Limit Theorem (CLT) with true mean = $32, true SD = $20, and n = 50).
* Find the z-score corresponding to \(\bar{y}\) = $40 (carefully, using the right SD: \(\sigma/\sqrt{n}\), not \(\sigma\)).
* Look up the standard normal table & find the area above this z-score.
Strategy for 16(a):
* Find the mean purchase per customer from the given total revenue & number of customers.
* Use the CLT with \(\mu\) = $32, \(\sigma\) = $20, and n = 312 to construct the normal model.
* Find the z-score for the mean purchase per customer (watch which SD you use).
* Look up the standard normal table & find the area above this z-score.
Strategy for 16(b):
* Recognize that the 10% worst days correspond to \(\bar{y}\) values in the bottom 10% of the normal distribution curve (of the means).
* Look up 0.10 in the standard normal table & find the corresponding z-score.
* Find \(\bar{y}\) for this z-score (being careful to use the right SD, \(\sigma/\sqrt{n}\)).
* Total revenue = 312 × \(\bar{y}\).
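The strategies for 14(c) and 16(b) can be sketched numerically as follows, with `statistics.NormalDist` replacing the table lookups. One assumption is labeled in the code: exercise 16 is taken to share exercise 14's \(\mu\) = $32 and \(\sigma\) = $20. Part 16(a) is omitted because the total revenue it needs is not given in these notes. Note how the SD of \(\bar{y}\) (\(\sigma/\sqrt{n}\)), not \(\sigma\), enters both z calculations.

```python
import math
from statistics import NormalDist

MU, SIGMA = 32.0, 20.0        # true mean and SD of a purchase (from the strategy for 14)
std_normal = NormalDist()     # standard normal, used in place of the z-table

# Exercise 14(c): P(y-bar > $40) for samples of n = 50 customers
se_50 = SIGMA / math.sqrt(50)            # SD of the sampling distribution, not SIGMA itself
z_14 = (40 - MU) / se_50
p_14 = 1 - std_normal.cdf(z_14)
print(f"14(c): SD(y-bar) = {se_50:.3f}, z = {z_14:.3f}, P = {p_14:.4f}")

# Exercise 16(b): revenue cutoff for the worst 10% of days with n = 312 customers.
# Assumption: the same mu = $32 and sigma = $20 apply to exercise 16.
se_312 = SIGMA / math.sqrt(312)
z_10 = std_normal.inv_cdf(0.10)          # z-score with 10% of the area below it
y_bar_cutoff = MU + z_10 * se_312        # mean purchase on a "bottom 10%" day
revenue_cutoff = 312 * y_bar_cutoff      # total revenue = 312 x y-bar
print(f"16(b): z = {z_10:.4f}, y-bar = ${y_bar_cutoff:.2f}, revenue = ${revenue_cutoff:,.2f}")
```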