HOMEWORK: Due Mon 11/8, Chapter 9: #15, 25, 37, 44


This week: Chapter 9 (will do 9.6 to 9.8 later, with Chap. 11)
Understanding Sampling Distributions: Statistics as Random Variables

ANNOUNCEMENTS:
- Shandong Min will give the lecture on Friday. See the website for different office hours Fri, Mon, Tues.
- New use of clickers: to test for understanding. I will give many more clicker questions and will randomly choose five each week to count for credit.
- Homework from today and Friday is due Monday, Nov 8. Homework to be assigned Monday is not due.
- Midterm in one week. You are allowed two sheets of notes.

HOMEWORK: Due Mon 11/8, Chapter 9: #15, 25, 37, 44

Chapters 9 to 13: Statistical Inference
See picture drawn on board. Five situations we will cover for the rest of this quarter. For each parameter we will:
- Learn how to find a confidence interval for its true value
- Test hypotheses about its true value

EXAMPLES OF EACH OF THE 5 SITUATIONS

One proportion: Binomial situation with n and p.
Question: What proportion of households watched Dancing with the Stars the week of Oct 18? Get a confidence interval.
Population parameter: p = proportion of the population of all US households that watched it.
Sample statistic: Nielsen ratings measure n = 25,000 households. X = number in sample who watched the show = 3,075.
$\hat{p} = \frac{X}{n} = \frac{3{,}075}{25{,}000} = .123$ = proportion of sample who watched. This is called p-hat.

Difference in two proportions: Compare two population proportions using independent samples of size $n_1$ and $n_2$.
Question: What is the difference in the proportion of smokers who would quit if wearing a nicotine patch versus a placebo patch? Get a confidence interval for the population difference. Test to see if it is statistically significantly different from 0.
Population parameter: $p_1 - p_2$ = population difference in proportions who would quit if everyone were to use each type of patch (nicotine - placebo).
Sample statistic: Difference in the proportions in the sample who did quit: $\hat{p}_1 - \hat{p}_2 = .46 - .20 = .26$. This is read as p-one-hat minus p-two-hat.
Note that the parameter and statistic can range from -1 to +1.

One mean: Population mean for a quantitative variable.
Question: An airline would like to know the average weight of checked luggage per passenger, for fuel calculations. Get a confidence interval for the population mean. There is no logical value to test, so we would not do a test.
Population parameter: $\mu$ = mean weight of the luggage for the population of all passengers who check luggage.
Sample statistic: Collect a sample of n observations. For instance, suppose they sample 100 passengers and find the mean is 30 pounds. $\bar{x} = 30$ = the mean for the sample of 100 passengers. Remember, this is read as x-bar.

Mean for paired differences: Population mean for the difference in two quantitative measurements in a matched pairs situation.
Question: How different, on average, would IQ be after listening to Mozart compared to after sitting in silence?
Population parameter: $\mu_d$ = population mean for the difference in IQ if everyone in the population were to listen to Mozart versus silence.
Sample statistic: For the experiment done with n = 36 UCI students, the mean difference was 9 IQ points. $\bar{d} = 9$ = the mean difference for the sample of 36 students. Read as d-bar.

Difference in two means: Comparing two population means when independent samples of size $n_1$ and $n_2$ are available.
Question: What is the difference in mean IQ of 4-year-old children for the population of mothers who smoked during pregnancy and the population who did not? Get a confidence interval for the difference. Test to see if the difference is statistically significantly different from 0.
Population parameter: $\mu_1 - \mu_2$ = difference in the means for the two populations.
Sample statistic: Based on a study done at Cornell, the difference in means for two samples was 9 IQ points. $\bar{x}_1 - \bar{x}_2 = 9$ = difference in the means for the two samples. Read as x-bar-one minus x-bar-two.

GOAL: Estimate and test parameters based on statistics. Get confidence intervals and do hypothesis tests.

SOME LOGICAL NOTES:
1. Assuming the sample is representative of the population, the sample statistic should represent the population parameter fairly well. (Better for larger samples.)
2. But the sample statistic will have some error associated with it, i.e. it won't equal the parameter exactly. Recall the margin of error from Chapter 3!
3. If repeated samples are taken from the same population and the sample statistic is computed each time, these sample statistics will vary, but in a predictable way, i.e. they will have a distribution. It is a pdf for the statistic, called the sampling distribution for the statistic.

RATIONALE AND DEFINITION FOR SAMPLING DISTRIBUTIONS

When a sample is taken from a population, the resulting numbers are the outcome of a random circumstance.
Dancing with the Stars example: The random circumstance is taking a random sample of 25,000 households with TVs. The resulting number is the proportion of those households that were watching Dancing with the Stars that week = .123 (or 12.3%).

Remember that a random variable is a number associated with the outcome of a random circumstance, which can change each time the random circumstance occurs. Example: For each different sample of 25,000 households that week, we would have had a different sample proportion (sample statistic) watching the show. Therefore, a sample statistic is a random variable, and so it has a pdf associated with it. The pdf of a sample statistic can be used to find the probability that the sample statistic will fall into specified intervals when a new sample is taken.
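A small simulation makes the idea concrete. The Python sketch below assumes a hypothetical true proportion p = 0.12, because the real value for the show is unknown. It draws 100 independent random samples of 25,000 households and records $\hat{p}$ for each one. The values change from sample to sample, and that is what it means for $\hat{p}$ to be a random variable. The function name `simulate_sample_proportions` is illustrative only.

```python
import random

def simulate_sample_proportions(n, p, reps, seed=0):
    """Take `reps` random samples of size n, where each household watches
    with probability p, and return the sample proportion p-hat for each."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    p_hats = []
    for _ in range(reps):
        x = sum(rng.random() < p for _ in range(n))  # X = number who watched
        p_hats.append(x / n)
    return p_hats

# Hypothetical true proportion (an assumption for illustration only)
p_hats = simulate_sample_proportions(n=25_000, p=0.12, reps=100)
print(f"smallest p-hat: {min(p_hats):.4f}, largest p-hat: {max(p_hats):.4f}")
```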

Definition: The pdf of a sample statistic is called the sampling distribution for that statistic.
Example: The random variable is $\hat{p}$ = sample proportion = sample statistic. The pdf of $\hat{p}$ will be defined next. It is the distribution of possible sample proportions in this scenario. We already know the pdf for X = number of households out of 25,000 that are watching the show. It is binomial with n = 25,000 and p = true proportion of households in the US that watched.

Familiar example: Suppose 48% (p = 0.48) of a population supports a candidate. In a poll of 1000 randomly selected people, what do we expect to get for the sample proportion $\hat{p}$ who support the candidate in the poll?
In the last few lectures, we looked at the pdf for X = the number who support the candidate. X was binomial, and X was also approximately normal with mean = 480 and s.d. = 15.8.
Now let's look at the pdf for the proportion who do: $\hat{p} = \frac{X}{n}$, where X is a binomial random variable. We have seen a picture of the possible values of X. Divide all values by n to get the picture for the possible values of $\hat{p}$.

[Figure, two panels (Binomial, n=1000, p=0.48). Left: PDF for X = number of successes (number who support candidate), x-axis 420 to 540. Right: PDF for p-hat = proportion of successes, x-axis .42 to .54. y-axis in both: probability for each possible value, 0 to 0.025.]

What's different and what's the same about these two pictures? Everything is the same except the values on the x-axis! On the left, values are numbers 0, 1, 2, ..., 1000. On the right, values are proportions 0, 1/1000, 2/1000, ..., 1.
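You can check numerically that only the x-axis labels change. This Python sketch uses n = 1000 and p = 0.48 from the poll example. It computes the binomial probabilities once and attaches them either to X = k or to $\hat{p}$ = k/n. The helper `binomial_pmf` is a hypothetical name. It works in log space (via `math.lgamma`) so the very large binomial coefficients do not overflow.

```python
from math import exp, lgamma, log

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    log_prob = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                + k * log(p) + (n - k) * log(1 - p))
    return exp(log_prob)

n, p = 1000, 0.48
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Same probabilities, two labelings of the x-axis:
pdf_x = {k: probs[k] for k in range(n + 1)}         # X = 0, 1, 2, ..., 1000
pdf_phat = {k / n: probs[k] for k in range(n + 1)}  # p-hat = 0, 1/1000, ..., 1

print(pdf_x[480], pdf_phat[0.48])  # same number: P(X = 480) = P(p-hat = .48)
```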

Recall the normal approximation for the binomial: For a binomial random variable X with parameters n and p (with np and n(1 - p) at least 5), X is approximately a normal random variable with:
mean $\mu = np$
standard deviation $\sigma = \sqrt{np(1-p)}$

NOW: Divide everything by n to get a similar result for $\hat{p} = \frac{X}{n}$. $\hat{p}$ is approximately a normal random variable with:
mean $\mu = p$
standard deviation $\sigma = \text{s.d.}(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}$

So we can find probabilities that $\hat{p}$ will be in specific intervals if we know n and p.
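Here is a quick check of "divide everything by n". This Python sketch uses the poll numbers n = 1000 and p = 0.48 to compute the mean and standard deviation of X and of $\hat{p}$. Dividing the s.d. of X by n gives the same value as the $\hat{p}$ formula.

```python
from math import sqrt

n, p = 1000, 0.48

# X = number of successes
mean_x = n * p
sd_x = sqrt(n * p * (1 - p))

# p-hat = X / n: divide the mean and the standard deviation by n
mean_phat = p
sd_phat = sqrt(p * (1 - p) / n)

print(f"X:     mean = {mean_x:.1f}, s.d. = {sd_x:.1f}")
print(f"p-hat: mean = {mean_phat:.2f}, s.d. = {sd_phat:.4f}")
print(f"s.d.(X) / n = {sd_x / n:.4f}")
```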

The Sampling Distribution for a Sample Proportion $\hat{p}$
1. The physical situation: binomial. An actual population with a fixed proportion w/trait or opinion (e.g. polls, TV ratings, etc.) OR a repeatable situation with a fixed probability of a certain outcome (e.g. birth is a boy, probability of heart attack if one takes aspirin).
2. The experiment: A random sample of n from the population, $\hat{p}$ = proportion w/trait OR repeat the situation n times, $\hat{p}$ = proportion with the specified outcome.
3. Sample size requirement: In either case, we must have np and n(1 - p) at least 5, and prefer at least 10.
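The sample size requirement in item 3 is a simple check. Below is a minimal Python sketch. The function name `normal_approx_ok` is hypothetical, and the threshold is a parameter: 5 is the minimum, and 10 is the preferred value.

```python
def normal_approx_ok(n, p, minimum=5):
    """True if both n*p and n*(1 - p) are at least `minimum`,
    i.e. the normal approximation for p-hat may be used."""
    return n * p >= minimum and n * (1 - p) >= minimum

print(normal_approx_ok(1000, 0.48))              # poll example: np = 480, n(1-p) = 520
print(normal_approx_ok(1000, 0.48, minimum=10))  # also meets the preferred threshold
print(normal_approx_ok(100, 0.02))               # np = 2, too small
```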

Assuming the above conditions are met, the distribution of possible values of $\hat{p}$ is approximately normal with:
mean $\mu = p$
standard deviation $\sigma = \sqrt{\frac{p(1-p)}{n}}$
The resulting normal distribution is called the sampling distribution of $\hat{p}$.

Notation: $\text{s.d.}(\hat{p})$ = standard deviation of $\hat{p}$ = $\sqrt{\frac{p(1-p)}{n}}$

But suppose p is unknown (which it will be if we are estimating it!). Then instead we approximate the s.d. using
$\text{s.e.}(\hat{p})$ = standard error of $\hat{p}$ = $\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ = estimate of the standard deviation of $\hat{p}$
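The s.d. needs the true p, while the s.e. plugs in $\hat{p}$ instead. This Python sketch contrasts the two, using the hypothetical helper names `sd_phat` and `se_phat`. It applies the standard error to the Dancing with the Stars sample, where p is unknown but $\hat{p}$ = 3,075/25,000 = .123.

```python
from math import sqrt

def sd_phat(p, n):
    """Standard deviation of p-hat; needs the true population proportion p."""
    return sqrt(p * (1 - p) / n)

def se_phat(p_hat, n):
    """Standard error of p-hat: the same formula with p-hat in place of p."""
    return sqrt(p_hat * (1 - p_hat) / n)

# Dancing with the Stars: X = 3,075 of n = 25,000 households watched
n = 25_000
p_hat = 3_075 / n
print(f"p-hat = {p_hat:.3f}, s.e.(p-hat) = {se_phat(p_hat, n):.4f}")
```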

This result is also called the normal curve approximation rule for sample proportions.
For the poll example: Poll of n = 1000 people, where the true population proportion is p = 0.48. The distribution of possible values of $\hat{p}$ is approximately normal with mean $\mu = p = 0.48$ and s.d. $\sigma = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{.48(1-.48)}{1000}} = 0.0158$.

[Figure: actual distribution of p-hat (tiny rectangles; Binomial, n=1000, p=0.48; y-axis: probability, 0 to 0.025) next to its normal approximation (smooth curve; Normal, Mean=0.48, StDev=0.0158; y-axis: density, 0 to 25). x-axis in both: possible values of p-hat, 0.42 to 0.54.]

For example, to find the probability that $\hat{p}$ is greater than 0.50: We could add up the areas of the rectangles for .501, .502, ..., 1, but that would be too much work! Instead, use the normal approximation:
$P(\hat{p} > 0.50) \approx P\left(z > \frac{0.50 - .48}{.0158}\right) = P(z > 1.266) = .103$
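Both routes to $P(\hat{p} > 0.50)$ can be computed directly. For the normal approximation, this Python sketch uses `statistics.NormalDist` from the standard library. For comparison, it also adds up the exact binomial probabilities for X = 501, ..., 1000. That is the "too much work" sum, which a computer does easily. The helper `binomial_pmf` is a hypothetical name.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    log_prob = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                + k * log(p) + (n - k) * log(1 - p))
    return exp(log_prob)

n, p = 1000, 0.48
sd = sqrt(p * (1 - p) / n)

# Normal approximation: standardize, then take the upper-tail area
z = (0.50 - p) / sd
approx = 1 - NormalDist().cdf(z)

# Exact: p-hat > 0.50 means X >= 501
exact = sum(binomial_pmf(k, n, p) for k in range(501, n + 1))

print(f"z = {z:.3f}, normal approx = {approx:.3f}, exact binomial = {exact:.3f}")
```

The two answers are close but not identical, since the smooth normal curve only approximates the rectangles.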

Going back to the Big Picture
The sampling distribution for $\hat{p}$ describes the distribution of possibilities for it if we were to take millions of samples of size n and compute $\hat{p}$ each time. It tells us what ranges we can expect $\hat{p}$ to fall in, and with what probability. To find the sampling distribution, we would need to know the true value of the parameter p. In practice, we don't know the true value of the parameter p. In fact, the whole point of statistical inference is to estimate the parameter, or test for possible values of it.

BUT, the standard deviation (or standard error) of the sampling distribution tells us how far the sample statistic is likely to fall from the parameter p, even if we don't know what that value of p is. For example, in our poll of n = 1000, we know that the standard deviation of $\hat{p}$ is about .0158 (or .016). So (from the Empirical Rule) we know that for approximately 68% of all samples, $\hat{p}$ will be within one standard deviation = .016 of the true parameter p. We can use that to estimate p! For instance, if $\hat{p}$ is 0.45, we can be 68% certain that the true p is somewhere in the range 0.45 ± .016, or between 0.434 and 0.466.
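The "within one standard deviation" reasoning can be sketched in Python. Using the poll setting (n = 1000, p = 0.48), the code first builds the interval $\hat{p} \pm .016$ for $\hat{p} = 0.45$. It then adds up the exact binomial probabilities of all samples whose $\hat{p}$ lands within one s.d. of p. That total should come out near the Empirical Rule's 68%. The function names are hypothetical.

```python
from math import exp, lgamma, log, sqrt

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    log_prob = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                + k * log(p) + (n - k) * log(1 - p))
    return exp(log_prob)

def one_sd_interval(p_hat, sd):
    """Range p-hat +/- one standard deviation."""
    return p_hat - sd, p_hat + sd

low, high = one_sd_interval(0.45, 0.016)
print(f"interval: {low:.3f} to {high:.3f}")

# Fraction of all possible samples with p-hat within one s.d. of the true p
n, p = 1000, 0.48
sd = sqrt(p * (1 - p) / n)
coverage = sum(binomial_pmf(k, n, p) for k in range(n + 1)
               if abs(k / n - p) <= sd)
print(f"share of samples within one s.d.: {coverage:.3f}")
```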

PREPARING FOR THE REST OF CHAPTER 9
For all 5 situations we are considering, the sampling distribution of the sample statistic:
- Is approximately normal
- Has mean = the corresponding population parameter
- Has a standard deviation that involves the population parameter(s) and thus can't be known without it (them)
- Has a standard error that doesn't involve the population parameters and is used to estimate the standard deviation
- Has a standard deviation (and standard error) that gets smaller as the sample size(s) n get larger
The summary table on pages 382-383 will help you with these!
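For one proportion, it is easy to see that the standard deviation shrinks as n grows. This Python sketch keeps p = 0.48 from the poll example (an illustrative choice) and evaluates $\sqrt{p(1-p)/n}$ for three sample sizes. Because n sits under a square root, multiplying n by 4 cuts the s.d. in half.

```python
from math import sqrt

p = 0.48
sizes = [250, 1000, 4000]
sds = [sqrt(p * (1 - p) / n) for n in sizes]

for n, sd in zip(sizes, sds):
    print(f"n = {n:5d}: s.d.(p-hat) = {sd:.4f}")
```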

New Example
In 2005, according to the Census Bureau, 67% of all children in the United States were living with 2 parents. (Includes step-parents and adoptive parents, but not foster parents.) In our class, there are about 180 of you who participate in clicker questions. Are you a representative sample for this question? If so, what should we expect the class proportion to be?
n = 180, p = .67
The sampling distribution of $\hat{p}$ is approximately normal with mean = .67 and standard deviation = $\sqrt{\frac{(.67)(.33)}{180}} = .035$.
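If the class behaves like a random sample, combining this s.d. with the Empirical Rule gives the ranges where the class proportion should fall. Here is a Python sketch using n = 180 and p = .67 as above:

```python
from math import sqrt

n, p = 180, 0.67
sd = sqrt(p * (1 - p) / n)

# Empirical Rule: about 68%, 95%, 99.7% of values within 1, 2, 3 s.d. of the mean
ranges = {k: (p - k * sd, p + k * sd) for k in (1, 2, 3)}
for k, pct in zip((1, 2, 3), ("68%", "95%", "99.7%")):
    low, high = ranges[k]
    print(f"about {pct} of class proportions: {low:.3f} to {high:.3f}")
```

The 3-s.d. range, roughly .565 to .775, matches the span of the x-axis in the plot for this example.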

Clicker question (not for credit, answers anonymous): In 2005, were you living with 2 parents? (Step-parents and adoptive parents count, but foster parents do not.)
A. Yes
B. No

[Figure: Sampling distribution of p-hat for n=180, p=.67 (Normal, Mean=0.67, StDev=0.035). x-axis: possible values of p-hat, 0.565 to 0.775; y-axis: density.]