The Idea of a Confidence Interval


AP Statistics Ch. 8 Notes: Estimating with Confidence

In the last chapter, we answered questions about what samples should look like assuming that we knew the true values of population parameters (like $\mu$, $\sigma$, and $p$). In practice, we usually don't know what the population actually looks like, and we don't have the resources to take a ton of different samples. Our goal is to estimate a parameter based on a single sample.

A point estimator is a statistic that provides an estimate of a population parameter. The value of that statistic from a sample is called a point estimate. Ideally, a point estimate is our best guess at the value of an unknown parameter. We like our point estimators to be unbiased and have low variability. For example, we usually use $\bar{x}$ as a point estimator of $\mu$. The value of $\bar{x}$ from a given sample is our best guess for the true value of $\mu$. Similarly, we use $\hat{p}$ as a point estimator of $p$.

Example: In each of the following settings, determine the point estimator you would use and calculate the value of the point estimate.
a) The makers of a new golf ball want to estimate the median distance the new balls will travel when hit by a mechanical driver. They select a random sample of 10 balls and measure the distance each ball travels after being hit by the mechanical driver. Here are the distances (in yards):
285 286 284 285 282 284 287 290 288 285
b) The math department wants to know what proportion of its students own a graphing calculator, so they take a random sample of 100 students and find that 28 own a graphing calculator.

The Idea of a Confidence Interval

While it's great to have a single number to use as an estimate, we expect the point estimate to be different from sample to sample, and we don't expect the parameter to be exactly equal to the point estimate. We can use what we know about the sampling distribution of the statistic to give an interval of plausible values for the parameter based on the value of our point estimate. A C% confidence interval gives an interval of plausible values for a parameter.
The interval is calculated from the sample data and has the form

point estimate ± margin of error

The difference between the point estimate and the true parameter value will be less than the margin of error in about C% of all possible samples. The confidence level C gives the overall success rate of the method used to calculate the confidence interval. That is, in C% of all possible samples, the interval computed from the sample data will capture the true parameter value.

If we take 25 samples of the same size from the same population, each will give us a different point estimate, $\bar{x}$ (the dot in the middle). Based on $\bar{x}$ and what we know about the sampling distribution, we can calculate a confidence interval (the line segment) that we believe captures the true value of $\mu$. This is an example of a 95% confidence interval. In most of the samples, we ended up with an interval that does contain $\mu$, but in one unlucky sample, we missed. In the long run, we expect that 95% of all our samples will give us intervals that capture $\mu$.

Constructing a Confidence Interval

The confidence interval for estimating a population parameter has the form

statistic ± (critical value)(standard deviation of statistic)

where the statistic we use is the point estimator for the parameter. A confidence interval for $\mu$ might look similar to $\bar{x} \pm 2\sigma_{\bar{x}}$. A confidence interval for $p$ might look similar to $\hat{p} \pm 3\sigma_{\hat{p}}$.

The second half of the formula for confidence intervals is called the margin of error:

margin of error = (critical value)(standard deviation of statistic)

The margin of error of an estimate describes how far, at most, we expect the estimate to vary from the true value of the parameter. That is, in a C% confidence interval, the distance between the point estimate and the true parameter value will be less than the margin of error in C% of all samples.

The critical value depends on the confidence level C and the shape of the sampling distribution. It tells us within how many standard deviations of the true parameter value we expect the statistic to be in C% of samples. For example, if the statistic is Normally distributed, it should follow the 68-95-99.7 rule. We expect the value of the statistic to be within about 2 standard deviations of the parameter value in 95% of samples, which also means that we expect the unknown parameter value to be within 2 standard deviations of the statistic in 95% of samples. For this reason, we use a critical value of about 2 for a 95% confidence interval.
(It's actually 1.960 for proportions and depends on the sample size for means, but we'll talk about that later!)

Interpreting Confidence Levels and Confidence Intervals

WARNING: Misinterpreting confidence intervals is one of the most common mistakes made on the AP test! Treat this as a fill-in-the-blank. Memorize the wording. Don't try to use your own wording, and don't add additional interpretation. Also, pay attention to whether you were asked to interpret the confidence level or the confidence interval. Every time you construct a confidence interval, you should interpret it, but you should only interpret the confidence level if the problem asks you to.
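The phrase "success rate of the method" can be checked by simulation. The sketch below is only an illustration under stated assumptions: a Normal population whose $\sigma$ is known (so $z^* = 1.96$ and $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ apply exactly), with hypothetical settings $\mu = 50$, $\sigma = 10$, $n = 10$. It builds many intervals of the form statistic ± (critical value)(standard deviation of statistic) and reports the fraction that capture $\mu$.

```python
import math
import random

def coverage_rate(mu=50.0, sigma=10.0, n=10, z_star=1.96, reps=10_000, seed=1):
    """Fraction of intervals x-bar +/- z* * sigma/sqrt(n) that capture mu.

    Assumes a Normal population with KNOWN sigma (hypothetical settings).
    """
    rng = random.Random(seed)
    sd_xbar = sigma / math.sqrt(n)   # standard deviation of the sampling distribution of x-bar
    margin = z_star * sd_xbar        # margin of error
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(sample) / n      # point estimate from this sample
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / reps

print(coverage_rate())               # close to 0.95; the exact value depends on the seed
```

Any single interval either captures $\mu$ or it doesn't; only the long-run fraction over many samples is near 95%.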

Confidence Level: If we were to repeat this sampling procedure/experiment many times and construct a C% confidence interval each time, then about C% of the resulting intervals would capture the true [parameter in context]. (Specify what the parameter is: the mean weight of domestic cats, the proportion of all U.S. adults with a certain opinion, etc.)

Confidence Interval: We are C% confident that the interval from ___ to ___ captures the true [parameter in context].

"95% confident" is shorthand for "We got these numbers with a method that gives correct results 95% of the time." In practice, we won't know whether our sample is one of the lucky 95% that captures the true value or one of the unlucky 5%.

DO NOT DO THIS! THESE ARE WRONG! DON'T EVEN THINK ABOUT USING THIS WORDING! YOU WILL INCUR MS. LINFORD'S WRATH (AND HER WRATH IS NOT PLEASANT) IF YOU EVEN CONSIDER PHRASING SOMETHING LIKE THIS! CONSIDER YOURSELF WARNED!

"There is a 95% probability that the interval from ___ to ___ captures the actual value of ___." (Either it does or it doesn't; the probability is either 1 or 0.)

"95% of samples have a [statistic value] between ___ and ___." (The statistic values will be centered on the parameter value (which is unknown), not on the value of the statistic from your sample.)

"95% of [population values] fall between ___ and ___." (We are estimating a parameter, not describing individuals in the population or sample. The interval is based on the sampling distribution of the statistic, not the population distribution. The standard deviation of population values is very different from the standard deviation of the statistic.)

"We are 95% confident that the interval from ___ to ___ contains the sample proportion/mean, etc." (You already know the value of the sample statistic; you just used it to construct the interval. It is in the interval, right in the middle! You want to know the value of the population parameter.)

Examples: According to a Pew Research Center report published in September 2018, a 95% confidence interval for the true proportion of U.S. teens who have been the target of cyberbullying is (0.54, 0.64).
a) Interpret the confidence interval.
b) Interpret the confidence level.
c) What is the point estimate that was used to create the interval? What is the margin of error?
d) The report claims that "A majority of teens have been the target of cyberbullying." Evaluate this claim based on the confidence interval.
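Because every interval here has the form point estimate ± margin of error, parts (c) and (d) of the cyberbullying example reduce to arithmetic: the point estimate is the midpoint of the interval and the margin of error is half its width. A minimal sketch (the function name is ours, chosen for illustration):

```python
def estimate_and_margin(lower, upper):
    """Recover (point estimate, margin of error) from a symmetric interval."""
    point_estimate = (lower + upper) / 2    # midpoint of the interval
    margin_of_error = (upper - lower) / 2   # half the width of the interval
    return point_estimate, margin_of_error

pe, me = estimate_and_margin(0.54, 0.64)
print(round(pe, 2), round(me, 2))  # 0.59 0.05

# Part (d): the whole interval lies above 0.5, so every plausible
# value of the true proportion is a majority.
print(0.54 > 0.5)  # True
```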

How much does the fat content of Brand X hot dogs vary? To find out, researchers measured the fat content (in grams) of a random sample of 10 Brand X hot dogs. A 90% confidence interval for the population standard deviation $\sigma$ is 2.84 to 7.55 grams.
a) Interpret the confidence interval.
b) Interpret the confidence level.
c) Explain what is wrong with each of the following statements:
"In 90% of samples, the standard deviation of the fat content will be between 2.84 and 7.55 grams."
"90% of Brand X hot dogs have a fat content between 2.84 and 7.55 grams."
"There is a 90% probability that the true standard deviation of fat content for all Brand X hot dogs is between 2.84 and 7.55 grams."
"We are 90% confident that the interval from 2.84 to 7.55 grams captures the standard deviation of fat content for the sample of hot dogs."

Margin of Error vs. Confidence Level

Why do we settle for 95%? Why not 99.9% or 100%? If we want to be more confident, we need a wider interval so that we can be more sure it contains the correct value. If we want a smaller interval, we have to settle for being a little less confident. If our interval gets too wide, it's essentially useless. It doesn't help much to say, "We are 100% confident that the interval from 0 feet to 12 feet captures the true average height of adult males" or "We are 100% confident that the interval from 0 to 1 captures the true proportion of people who like pizza."

Ways to lower the margin of error:

Decrease the confidence level C.
[Figure: 80% and 95% confidence intervals for $\mu$ from 25 samples, plotted against values of $\bar{x}$; image not reproduced.]
The 80% confidence intervals for $\mu$ are narrow (low margin of error), but they only captured the true value in 20 out of 25 samples. The 95% confidence intervals for $\mu$ are wider (high margin of error), but they succeeded in capturing the true value in 24 out of 25 samples.

Increase the sample size $n$. The statistics from larger samples will be closer, on average, to the parameter than the statistics from smaller samples. Turning this around, it means that the parameter should be closer, on average, to the statistic from a given sample. The standard deviation of the statistic is lower for larger samples, so the margin of error is also lower.

Beware! The margin of error in a confidence interval covers only chance variation due to random sampling or random assignment. It does not account for practical difficulties or bad study design! Problems like undercoverage and nonresponse can produce estimates that are much farther from the parameter than the margin of error suggests, so always be cautious and a bit skeptical when you read poll results that include a margin of error!
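Both ways of lowering the margin of error can be seen numerically in the margin of error for a mean when $\sigma$ is known, $z^*\sigma/\sqrt{n}$. This sketch uses Python's built-in `statistics.NormalDist` to get $z^*$; the value $\sigma = 10$ and the sample sizes are hypothetical, chosen only for illustration.

```python
import math
from statistics import NormalDist

def margin_of_error(confidence, sigma, n):
    """z* * sigma / sqrt(n), assuming sigma is known and x-bar is Normal."""
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)  # area to the left of z*
    return z_star * sigma / math.sqrt(n)

sigma = 10  # hypothetical population standard deviation
for c in (0.80, 0.95, 0.99):
    print(f"C = {c:.0%}, n = 100: ME = {margin_of_error(c, sigma, 100):.3f}")
for n in (25, 100, 400):
    print(f"C = 95%, n = {n}: ME = {margin_of_error(0.95, sigma, n):.3f}")
```

Because $n$ enters through $\sqrt{n}$, quadrupling the sample size only cuts the margin of error in half.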

Estimating a Population Proportion

Ms. Linford asked an SRS (it was definitely not a convenience sample) of 41 BHS students how often they wear a seatbelt. In the sample, 30 of the 41 students responded that they wear a seatbelt 100% of the time. Ms. Linford wants to construct a confidence interval for the proportion of all BHS students who always wear a seatbelt.

Confidence intervals always have the same form:

point estimate ± margin of error = statistic ± (critical value)(standard deviation of statistic)

Let's figure out what each of these values is if we are dealing with proportions.

Statistic: If we are interested in a proportion, we should use the sample proportion $\hat{p}$ as the statistic.
Example: What is $\hat{p}$ for the seatbelt example?

Critical value: The critical value $z^*$ for a C% confidence level is the value of $z$ such that C% of the area under the standard Normal curve is between $-z^*$ and $z^*$.
Finding a Critical Value:
1. Draw a Normal curve with the central C% marked. The boundaries of this region are $-z^*$ and $z^*$.
2. Figure out the area to the left of $z^*$ (the central C% plus the left tail). Look up the z-score corresponding to this value using the bottom row of the t table, or using the invNorm command on your calculator with $\mu = 0$ and $\sigma = 1$.
For a 95% interval, the tails contain 5% of the area, so 2.5% in each tail. The area below the upper bound is 97.5%. Using invNorm(area = 0.9750) gives $z^* = 1.96$. 95% of all samples will result in a value of $\hat{p}$ that is within $1.96\sigma_{\hat{p}}$ of the true value of $p$. This also means that $p$ will be within $1.96\sigma_{\hat{p}}$ of $\hat{p}$ in 95% of samples.
Examples: Find the value of $z^*$ for the following confidence levels: a) 85% b) 98% c) 99.5%
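On a computer, the role of invNorm can be played by the inverse CDF of the standard Normal distribution. A sketch using Python's built-in `statistics.NormalDist` (Python 3.8+), following step 2 of the recipe: add the left tail to the central C%, then find the z-score with that much area to its left.

```python
from statistics import NormalDist

def z_critical(confidence):
    """z* with `confidence` of the standard Normal area between -z* and z*."""
    left_tail = (1 - confidence) / 2
    area_to_left = confidence + left_tail   # central C% plus the left tail
    return NormalDist(mu=0, sigma=1).inv_cdf(area_to_left)

for c in (0.95, 0.85, 0.98, 0.995):
    print(f"{c:.1%}: z* = {z_critical(c):.3f}")
```

To three decimal places these come out to 1.960, 1.440, 2.326, and 2.807, matching what invNorm gives for areas 0.975, 0.925, 0.99, and 0.9975.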

Standard deviation of the statistic: We know that $\hat{p}$ from our sample is part of the sampling distribution of $\hat{p}$. If we took $\hat{p}$'s from many different samples, those values would be centered on the true value of $p$ and would have a standard deviation of $\sigma_{\hat{p}} = \sqrt{\dfrac{pq}{n}}$. Since we don't know $p$ and $q$, we have to use our best estimates for them in this formula instead: $\hat{p}$ and $\hat{q}$. When we do this, we call the result a standard error instead of a standard deviation.

Standard Error (SE): When the standard deviation of a statistic ($\sigma_{\hat{p}}$ or $\sigma_{\bar{x}}$) is estimated from data (using $\hat{p}$ in the formula to estimate $p$, or $s$ to estimate $\sigma$), the result is called the standard error of the statistic. That means that the standard error of $\hat{p}$ is $SE_{\hat{p}} = \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$, as long as the Independent/10% condition is met. The standard error is just an estimate of $\sigma_{\hat{p}}$ (the standard deviation of the sampling distribution of $\hat{p}$) that uses $\hat{p}$ and $\hat{q}$ in the formula instead of $p$ and $q$. It describes the typical distance of the sample proportion $\hat{p}$ from the population proportion $p$ in repeated SRSs of size $n$.

Example: Calculate and interpret $SE_{\hat{p}}$ for the seatbelt example.
Example: Now that you have all the pieces, put them together to form a 95% confidence interval for the true proportion of all BHS students who always wear a seatbelt. Then interpret the interval.

One-Sample z Interval for a Population Proportion
Choose an SRS of size $n$ from a large population that contains an unknown proportion $p$ of successes. An approximate C% confidence interval for $p$ is

$$\hat{p} \pm z^* \sqrt{\frac{\hat{p}\hat{q}}{n}}$$

where $z^*$ is the critical value for the standard Normal curve with C% of its area between $-z^*$ and $z^*$. This formula is valid only when the Random, Normal, and Independent/10% conditions are met.
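The seatbelt calculation can be checked step by step. This is a minimal sketch that assumes the conditions are met; it just plugs the sample counts into $SE_{\hat{p}}$ and the one-sample z interval, with $z^*$ taken from Python's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

successes, n = 30, 41                 # seatbelt SRS: 30 of 41 always wear one
p_hat = successes / n                 # point estimate
q_hat = 1 - p_hat
se = math.sqrt(p_hat * q_hat / n)     # standard error of p-hat
z_star = NormalDist().inv_cdf(0.975)  # 95% confidence: 97.5% of area to the left
margin = z_star * se                  # margin of error
lower, upper = p_hat - margin, p_hat + margin

print(f"p-hat = {p_hat:.4f}, SE = {se:.4f}")
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

Here $\hat{p} \approx 0.732$ and $SE_{\hat{p}} \approx 0.069$: in repeated SRSs of 41 students, the sample proportion would typically differ from the true proportion by about 0.069. The resulting interval is roughly (0.596, 0.867).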

Conditions for Estimating $p$: There are three conditions that must be met in order for the formula for a one-sample confidence interval for a proportion to be valid, one for each of the three components in the formula.

1. Random Condition and $\hat{p}$: If we want to make any sort of inference about the population of interest, the data must come from a random sample from the population of interest. If the data come from a convenience sample or a voluntary response sample, we shouldn't have any confidence that the value of $\hat{p}$ from our sample is a good estimate of $p$, which means we shouldn't have any confidence in an interval based on it. If we can't make any conclusions beyond our crappy data, there's no point constructing a confidence interval! Throw the whole study out the window and start over!

2. Normal/Large Counts Condition and $z^*$: The critical value $z^*$ depends on the assumption that the sampling distribution of $\hat{p}$ is approximately Normal. (For example, $\hat{p}$ will be within 1.96 standard deviations of $p$ approximately 95% of the time only if the sampling distribution is approximately Normal.) If the Normal condition is violated, the actual capture rate of our intervals will usually be less than the stated confidence level. For example, we may use a critical value for 95% confidence, but less than 95% of the intervals will actually capture the true parameter value. The sampling distribution of $\hat{p}$ is approximately Normal if $np$ and $nq$ are both at least 10. However, we have a problem: We can't check this because we don't know $p$ and $q$, so we have to use our best guess for what they are instead: $\hat{p}$ and $\hat{q}$.
Normal/Large Counts Condition for Proportions (when $p$ is unknown): $n\hat{p} \ge 10$ and $n\hat{q} \ge 10$.
Since $\hat{p} = \dfrac{\text{number of successes in sample}}{n}$, $n\hat{p}$ = number of successes in the sample. Similarly, $n\hat{q}$ = number of failures in the sample. Both numbers should be integers!

3. Independent/10% Condition and $\sqrt{\hat{p}\hat{q}/n}$: This condition is the least important! Remember that the standard deviation formulas for sampling distributions only work if the Independent/10% condition is met.
If the sample consists of more than 10% of the population, we should use a more complicated formula for the standard deviation of the sampling distribution. If we sample more than 10% of the population, we actually end up with intervals that are wider than necessary and capture the true proportion more often than the stated confidence level. This isn't a huge problem; it's just that we could get an interval with a smaller margin of error if we used a better formula.
Example: Check to make sure that all conditions are met for the seatbelt example.
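The two numeric conditions can be written as simple checks. This is a sketch with hypothetical function names; the Random condition cannot be checked by code, since it depends on how the data were collected. The BHS population size isn't given, so the sketch reports the smallest population for which the 10% condition holds instead of assuming one.

```python
def large_counts_ok(successes, n):
    """Normal/Large Counts: at least 10 successes and at least 10 failures."""
    failures = n - successes
    return successes >= 10 and failures >= 10

def min_population_for_10_percent(n):
    """Smallest population size for which a sample of size n is at most 10% of it."""
    return 10 * n

successes, n = 30, 41
print(large_counts_ok(successes, n))       # True: 30 successes and 11 failures
print(min_population_for_10_percent(n))    # 410: BHS needs at least this many students
```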

Things you must write on confidence interval problems for a proportion:
1. Name the procedure ("One-sample z interval for $p$").
2. Define the parameter of interest (what is $p$ in this scenario?).
3. Check conditions:
Random: The data come from a random sample from the population of interest.
Normal/Large Counts: The sampling distribution of $\hat{p}$ is approximately Normal. This is true if $n\hat{p} \ge 10$ and $n\hat{q} \ge 10$ (the sample has at least 10 successes and at least 10 failures).
Independent/10% Condition: Observations are independent. If they aren't (they probably won't be), check the good old 10% condition fudge factor instead: If sampling without replacement, the sample size is no more than 10% of the population size/the population size is at least ten times the sample size.
4. Report the confidence interval, calculated either from the formula or with a calculator. If you show work, it's best to show numbers plugged into the formula, but don't write the formula with symbols. (Errors with symbols are a very common way to lose credit!)
5. Interpret your interval in context.

Confidence Intervals for Proportions on TI-83/TI-84 Calculators
1. Choose 1-PropZInt on the STAT TESTS menu.
2. Enter the requested information:
x: number of successes (must be an integer)
n: sample size
C-Level: confidence level, as a decimal
3. Choose Calculate.

Example: In an SRS of 50 pennies from a large collection, 35 were more than 10 years old.
a) Calculate and interpret a 96% confidence interval for the proportion of all the pennies in the collection that are more than 10 years old.
b) Is it plausible that less than 55% of the pennies are more than 10 years old? Explain.
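For checking calculator output, here is a sketch of a function that takes the same three inputs as 1-PropZInt (x, n, C-Level). It is our own illustration of the one-sample z interval formula, not the calculator's internal code, and it assumes the conditions have already been checked.

```python
import math
from statistics import NormalDist

def one_prop_z_interval(x, n, c_level):
    """One-sample z interval for p from x successes in n trials (hypothetical helper)."""
    if not isinstance(x, int):
        raise ValueError("x must be an integer count of successes")
    p_hat = x / n
    z_star = NormalDist().inv_cdf((1 + c_level) / 2)   # central C-Level plus left tail
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

lower, upper = one_prop_z_interval(35, 50, 0.96)       # pennies example
print(f"96% CI: ({lower:.3f}, {upper:.3f})")
print("Any value below 0.55 inside the interval?", lower < 0.55)
```

The interval is about (0.567, 0.833). Its lower end is above 0.55, so a true proportion below 55% is not plausible.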

Example: According to an article in the San Gabriel Valley Tribune (February 13, 2003), "Most people are kissing the right way." That is, according to the study, the majority of couples tilt their heads to the right when kissing. In the study, a researcher observed a random sample of 124 couples kissing in various public places and found that 66.9% of the couples tilted their heads to the right.
a) Construct and interpret a 95% confidence interval for the proportion of all couples who tilt their heads to the right when kissing.
b) Does the article's claim that more than half of all couples tilt their heads to the right while kissing seem justified? Explain.

Choosing a Sample Size

In planning a study, it's important to make sure that we choose a large enough sample to achieve a desired margin of error, so that we don't find out after the fact that our sample was too small.

The margin of error (ME) in the confidence interval for $p$ is $ME = z^*\sqrt{\dfrac{\hat{p}\hat{q}}{n}}$. We want to solve this formula for $n$. The problem is that we won't know $\hat{p}$ and $\hat{q}$ until after we've done the experiment. This leaves us two choices:
1. Use a guess for $\hat{p}$ based on past experience with similar studies.
2. Use $\hat{p} = 0.5$ as the guess. The margin of error will always be largest when $\hat{p} = 0.5$, so this guess will give us a sample size that guarantees that the margin of error will be no higher than a certain value. It helps us plan for a worst-case scenario: if we get any other $\hat{p}$ when we do our study, we will get a margin of error smaller than planned.
Once you've picked a guessed value to use for $\hat{p}$, solve $z^*\sqrt{\dfrac{\hat{p}\hat{q}}{n}} \le ME$ for $n$, and round up to the next whole number.
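Solving the margin-of-error inequality for $n$ takes two algebra steps. Squaring both sides is allowed because both sides are positive:

```latex
\begin{aligned}
z^*\sqrt{\frac{\hat{p}\hat{q}}{n}} &\le ME \\
\frac{\hat{p}\hat{q}}{n} &\le \left(\frac{ME}{z^*}\right)^2 \\
n &\ge \left(\frac{z^*}{ME}\right)^2 \hat{p}\hat{q}
\end{aligned}
```

Since $\hat{p}\hat{q} = \hat{p}(1-\hat{p})$ reaches its largest value, $0.25$, at $\hat{p} = 0.5$, the conservative guess always produces the largest required $n$.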

Example: Suppose that you wanted to estimate $p$ = the true proportion of students at your school who have a tattoo with 95% confidence and a margin of error of no more than 0.10.
a) An article in the school newspaper reported that 20% of the students at the school have a tattoo. Use this value as an estimate of $\hat{p}$ to determine how many students must be sampled to achieve the desired confidence level and margin of error.
b) Repeat the calculations using the more conservative guess of $\hat{p} = 0.5$.
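The sample-size rule $n \ge (z^*/ME)^2\,\hat{p}\hat{q}$ translates directly into code. This sketch (the function name is ours) rounds up with `math.ceil`, because rounding down would leave the margin of error slightly above the target.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(me, confidence, p_guess=0.5):
    """Smallest n with z* * sqrt(p(1-p)/n) <= me, for a guessed p-hat (hypothetical helper)."""
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    n_exact = (z_star / me) ** 2 * p_guess * (1 - p_guess)
    return math.ceil(n_exact)   # always round UP so the ME is no larger than planned

print(sample_size_for_proportion(0.10, 0.95, p_guess=0.20))  # part (a): 62
print(sample_size_for_proportion(0.10, 0.95, p_guess=0.5))   # part (b): 97
```

The exact values are about 61.5 and 96.0 before rounding up. The conservative guess costs 35 more students but guarantees the target margin of error whatever $\hat{p}$ turns out to be.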

Estimating a Population Mean

When we try to estimate a mean, we would ideally like to use an interval of the form $\bar{x} \pm z^*\dfrac{\sigma}{\sqrt{n}}$, but if we don't know the population mean $\mu$, we probably don't know the population standard deviation $\sigma$ either. This means that we have to use the sample standard deviation $s$ in the formula instead.

Standard error of the sample mean ($SE_{\bar{x}}$ or SEM): $SE_{\bar{x}} = \dfrac{s}{\sqrt{n}}$, where $s$ is the sample standard deviation, as long as the Independent/10% condition is met. The standard error is just an estimate of $\sigma_{\bar{x}}$ (the standard deviation of the sampling distribution of $\bar{x}$) that uses $s$ in the formula instead of $\sigma$. It describes the typical distance between the sample mean $\bar{x}$ and the population mean $\mu$ in repeated SRSs of size $n$.

Unfortunately, we run into a problem if we try to use intervals of the form $\bar{x} \pm z^*\dfrac{s}{\sqrt{n}}$. Because the value of $s$ varies from sample to sample, the intervals end up being different widths, and we end up capturing the true value of $\mu$ much less often than we would expect based on our confidence level. The figure at the right shows 50 confidence intervals calculated with the formula above. They are based on samples of size $n = 10$ taken from a population in which $\mu = 50$ and $\sigma = 10$. Notice that even though the confidence level is 95%, only 88% of the intervals captured the true value of $\mu$.

A chemistry graduate by the name of William Sealy Gosset worked in a brewery in Dublin, Ireland. He often had to estimate the amount of yeast in large jars of beer based on extremely small samples. He described a problem he ran into in a paper entitled "The Probable Error of a Mean," published in 1908 in the journal Biometrika:

"The usual method of determining the probability that the mean of the population lies within a given distance of the mean of the sample is to assume a normal distribution about the mean of the sample with a standard deviation equal to $s/\sqrt{n}$, where $s$ is the standard deviation of the sample, and to use the tables of probability [based on a normal curve]. But as we decrease the [sample size], the value of the standard deviation found from the sample becomes itself subject to an increasing error, until judgments used in this way become altogether misleading."

Gosset set out to fix the problem. He described a new type of distribution, called a t distribution, that worked much better than a Normal distribution for estimating $\mu$ when $\sigma$ was unknown. The brewery would not allow him to publish his results under his own name, so he published them under the pseudonym "Student." For this reason, t distributions are often referred to as Student's t distributions in his honor.
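The shortfall described for the figure (intervals $\bar{x} \pm z^* s/\sqrt{n}$ capturing $\mu$ less than 95% of the time) can be reproduced by simulation. This sketch assumes the same setting as the figure (Normal population, $\mu = 50$, $\sigma = 10$, $n = 10$) and takes $t^* = 2.262$ from a t table (df = 9, 95% confidence); that hard-coded value is only valid for $n = 10$.

```python
import math
import random

N = 10            # sample size in the figure's setting
T_STAR = 2.262    # t* for 95% confidence with df = N - 1 = 9 (t table); valid only for N = 10
Z_STAR = 1.96     # z* for 95% confidence

def capture_rates(mu=50.0, sigma=10.0, reps=20_000, seed=2):
    """Capture rates of x-bar +/- z* s/sqrt(N) and x-bar +/- t* s/sqrt(N)."""
    rng = random.Random(seed)
    z_hits = t_hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(N)]
        x_bar = sum(sample) / N
        s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (N - 1))  # sample SD
        se = s / math.sqrt(N)                                          # standard error of x-bar
        if abs(x_bar - mu) <= Z_STAR * se:
            z_hits += 1
        if abs(x_bar - mu) <= T_STAR * se:
            t_hits += 1
    return z_hits / reps, t_hits / reps

z_rate, t_rate = capture_rates()
print(f"z* with s: {z_rate:.3f}   t* with s: {t_rate:.3f}")
```

With only 50 intervals the capture rate bounces around (the figure got 88%). Over many repetitions the z-with-$s$ intervals settle noticeably below 95% (around 92%), while the t intervals come close to 95%.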

Moral of the story: Use $z$ for proportions and $t$ for means! (Unless you are in the extremely rare situation where you don't know $\mu$ but you somehow know $\sigma$, or are pretending you do to figure out how large of a sample you need.)

t Distributions and $t^*$ Critical Values

There is a different t distribution for each sample size. We specify which t distribution we are using by giving its degrees of freedom (df). For inference about a population mean, we find the degrees of freedom by subtracting one from the sample size: $df = n - 1$.

The density curves of the t distributions are similar in shape to the standard Normal curve. They are symmetric about 0, single-peaked, and bell-shaped. The spread of the t distributions is a bit greater than that of the standard Normal distribution: there is more probability in the tails and less in the center than in the standard Normal curve. This is because substituting the estimate $s$ for the fixed parameter $\sigma$ introduces more variation into the statistic. As the degrees of freedom increase, the t density curve approaches the standard Normal curve ever more closely. This happens because using $s$ in place of $\sigma$ causes less extra variation when the sample is large.

The critical value for confidence intervals about means is called $t^*$ instead of $z^*$ because it comes from a t distribution instead of a Normal distribution. The interpretation of $t^*$ is the same as the interpretation of $z^*$: it measures within how many standard deviations of the estimate (in this case, the sample mean) we expect the true value of the parameter (in this case, the population mean) to be in C% of samples.

Using the t Table

The t table has tail areas (the areas to the right of a certain value of $t$) along the top and confidence levels along the bottom. (It's easier to look at the confidence level than the tail area!) The left side of the table lists degrees of freedom. The values of $t^*$ are on the inside of the table. If there isn't a line for the degrees of freedom that you need, use the line with the next smaller number of degrees of freedom, or better yet, use a calculator!
Finding t Critical Values on a Calculator
TI-84: Use invT(area, df) on the DISTR menu (2nd VARS). DO NOT enter the confidence level! Enter the area to the left of the critical value: center area (confidence level) plus left tail!
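Python's standard library has no t distribution, so here is a self-contained sketch of what invT computes: it integrates the t density numerically (Simpson's rule) and uses bisection to find the value with the requested area to its left. This is our own illustration, not the calculator's algorithm; in practice a statistics library (for example SciPy's `scipy.stats.t.ppf`) does this directly. As on the calculator, the input is the area to the left, not the confidence level.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """Area to the left of x >= 0: 0.5 plus a Simpson's-rule integral from 0 to x."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def inv_t(area, df):
    """Value with `area` to its left (area > 0.5), found by bisection on t_cdf."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < area:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# area to the left = confidence level + left tail
print(round(inv_t(0.95, 9), 3))    # 90% confidence, df = 9
print(round(inv_t(0.99, 13), 3))   # 98% confidence, df = 13
print(round(inv_t(0.975, 44), 3))  # 95% confidence, df = 44
```

These three calls correspond to the three situations in the example that follows (n = 10, 14, and 45) and agree with t-table values of about 1.833, 2.650, and 2.015.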

TI-83: Use the InverseT app. When it asks for a probability, enter the area to the left of the critical value: center area (confidence level) plus left tail!

Example: Find the critical value $t^*$ that you would use for a confidence interval for a population mean $\mu$ in each of the following situations.
a) A 90% confidence interval from an SRS of size 10.
b) A 98% confidence interval based on $n = 14$ observations.
c) A 95% confidence interval from a sample of size 45.

The One-Sample t Interval for a Population Mean ($\sigma$ unknown)
Choose an SRS of size $n$ from a population having unknown mean $\mu$. An approximate C% confidence interval for $\mu$ is

$$\bar{x} \pm t^* \frac{s}{\sqrt{n}}$$

where $t^*$ is the critical value for the t distribution with $n - 1$ degrees of freedom (df) and C% of the area between $-t^*$ and $t^*$. This formula is only valid when the Random, Normal, and Independent/10% conditions are met.

The Normal condition for t procedures: The mathematics behind t procedures is based on the assumption that the population distribution is exactly Normal. However, t procedures still give fairly accurate results when this condition is violated, especially when the sample size is large. We say that t procedures are robust against skewness in the population distribution. An inference procedure is called robust if the probability calculations involved in that procedure remain fairly accurate even when a condition for using the procedure is violated. The t procedures are robust against skewness in the population distribution (but not against outliers). Larger samples improve the accuracy of the t procedures when the population distribution is not Normal. If the sample size is large enough ($n \ge 30$), t procedures give surprisingly accurate results even when the population distribution is clearly skewed. The t procedures are not robust against violations of the Random condition!

Things you must write on confidence interval problems for a mean:
1. Name the procedure ("One-sample t interval for $\mu$").
2. Define the parameter of interest (what is $\mu$ in this scenario?).
3. Check conditions:
Random: The data come from a random sample from the population of interest.

Normal/Large Sample Size: The population distribution is Normal or the sample size is large (n ≥ 30). If n < 30 and the population distribution has an unknown shape, DRAW A GRAPH of the sample data. (Don't just look at it on your calculator; you have to actually draw it!) As long as the graph doesn't show strong skewness or outliers, it's okay to use t procedures. (No strong skewness or outliers means it's plausible that the population distribution is Normal, but don't go into that much detail when writing down your check!)
Independent: Observations are independent. If they aren't (they probably won't be), check the good old 10% condition fudge factor instead: if sampling without replacement, the sample size is no more than 10% of the population size (equivalently, the population size is at least ten times the sample size).
4. Report the confidence interval, calculated either from the formula or with a calculator. Also write down the degrees of freedom. INCLUDE UNITS!!!
5. Interpret your interval in context.

Examples: Determine whether we can safely use a one-sample t interval to estimate the population mean in each of the following settings.
(a) The dotplot below shows the amount of time it took (in minutes) to order and receive a regular coffee in five randomly selected visits to a local coffee shop.
[Dotplot: time (minutes)]
(b) The boxplot below shows the SAT Math scores for a random sample of 20 students at your high school.
[Boxplot: SAT_Math]
(c) To estimate the average GPA of students at your school, you randomly select 50 students from classes you take. Here is a histogram of their GPAs:
[Histogram: GPA]

Confidence Intervals on TI-83/TI-84 Calculators
1. Choose TInterval on the STAT TESTS menu.
2. Choose Data if you have a list of sample data. Choose Stats if you have values for x̄ and sx.
3. Enter the requested information:
For the Data option, input the sample values into a list and indicate which list they are in.
For the Stats option, enter x̄: sample mean, Sx: sample standard deviation, n: sample size, C-Level: confidence level, as a decimal.
4. Choose Calculate.

Example: As part of their final project in AP Statistics, two students randomly selected 18 rolls of a generic brand of toilet paper to measure how well this brand could absorb water. To do this, they poured ¼ cup of water onto a hard surface and counted how many squares of toilet paper it took to completely absorb the water. Here are the results from their 18 rolls:
29 20 25 29 21 24 27 25 24 29 24 27 28 21 25 26 22 23
Construct and interpret a 99% confidence interval for μ = the mean number of squares of generic toilet paper needed to absorb ¼ cup of water.

Example: The principal at a large high school claims that students spend at least 10 hours per week doing homework, on average. To investigate this claim, an AP Statistics class selected a random sample of 250 students from their school and asked them how long they spent doing homework during the last week. The sample mean was 10.2 hours and the sample standard deviation was 4.2 hours.
a) Construct and interpret a 95% confidence interval for the mean time that students at this school spent doing homework in the last week.
b) Based on your interval in part (a), what can you conclude about the principal's claim?
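
Here is a short sketch of the formula x̄ ± t* · sx/√n applied to both examples, with one helper for each TInterval option (Data and Stats). To keep the block to the standard library, the critical values are hard-coded as assumptions read from invT: invT(0.995, 17) ≈ 2.898 for the toilet paper data and invT(0.975, 249) ≈ 1.970 for the homework data. The function names are hypothetical.

```python
import math
from statistics import mean, stdev


def t_interval_from_stats(xbar, s, n, t_star):
    """One-sample t interval xbar ± t* * s / sqrt(n) (like TInterval's Stats option)."""
    margin = t_star * s / math.sqrt(n)
    return xbar - margin, xbar + margin


def t_interval_from_data(data, t_star):
    """Same interval from raw data (like the Data option); stdev divides by n - 1."""
    return t_interval_from_stats(mean(data), stdev(data), len(data), t_star)


# Toilet paper: 18 rolls, 99% confidence, df = 17, t* = 2.898 (assumed from invT(0.995, 17))
squares = [29, 20, 25, 29, 21, 24, 27, 25, 24, 29, 24, 27, 28, 21, 25, 26, 22, 23]
low, high = t_interval_from_data(squares, t_star=2.898)
print(f"99% CI for mean number of squares: ({low:.2f}, {high:.2f})")

# Homework: n = 250, xbar = 10.2 h, s = 4.2 h, 95% confidence, df = 249, t* = 1.970 (assumed)
low, high = t_interval_from_stats(10.2, 4.2, 250, t_star=1.970)
print(f"95% CI for mean homework hours: ({low:.3f}, {high:.3f})")
print("Interval contains values below 10 hours:", low < 10)
```

The toilet paper interval comes out to about (22.99, 26.90) squares and the homework interval to about (9.677, 10.723) hours. Because the homework interval contains plausible values both below and above 10 hours, it neither confirms nor rules out the principal's claim.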

Choosing Sample Size for a Desired Margin of Error When Estimating μ
The margin of error for a one-sample t interval for μ is $ME = t^* \frac{s_x}{\sqrt{n}}$. It would be lovely if we could simply plug in the desired margin of error, t*, and sx in order to find out how large our sample size should be. Problems:
1. We haven't taken a sample yet, so we don't have a value to use for sx.
2. We don't know how many degrees of freedom to use to figure out t* because we don't know the sample size!
Solution: Pretend we know σ. That way, we can use z* instead of t* (which we have to do anyway; we don't have another choice since we can't figure out the degrees of freedom!)
To determine the sample size n that will yield a level C confidence interval for a population mean with a specified margin of error ME:
1. Get a reasonable estimate of the population standard deviation σ from an earlier or pilot study.
2. Find the critical value z* from a standard Normal curve for confidence level C.
3. Solve $ME = z^* \frac{\sigma}{\sqrt{n}}$ for n.

Example: A lab supply company sells pieces of Douglas fir for force experiments in science classes. From experience, the strength of these pieces of wood follows a Normal distribution with standard deviation 3000 pounds. You want to estimate the mean load needed to pull apart these pieces of wood to within 1000 pounds with 95% confidence. How large a sample is needed?
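
The three steps above can be sketched with Python's `statistics.NormalDist`, which supplies z*. Solving ME = z* · σ/√n for n gives n = (z* · σ / ME)². The function name is hypothetical, and the one convention added here is rounding n up, since rounding down would give a margin of error slightly larger than the one requested.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sigma, margin, confidence):
    """Smallest whole n with z* * sigma / sqrt(n) <= margin, i.e. (z* * sigma / ME)^2 rounded UP."""
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)  # area to the left = C + left tail
    return math.ceil((z_star * sigma / margin) ** 2)


# Douglas fir: sigma = 3000 lb, desired ME = 1000 lb, 95% confidence
print(sample_size_for_mean(sigma=3000, margin=1000, confidence=0.95))
```

For the Douglas fir example this prints 35: (1.96 × 3000 / 1000)² ≈ 34.57, which rounds up to 35 pieces of wood.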