Confidence Intervals: σ unknown, small samples. The t-statistic

Homework
Read Sec 7-3. Discussion Question, pg 365.
Do Ex 7-3: 1-4, 6, 9, 12, 14, 15, 17

Objective
Find the confidence interval for a mean when σ is unknown.

Confidence Interval
When the population standard deviation (σ) is unknown (as is usually the case), the sample standard deviation (s) can be used in its place. If the sample size is at least 30, we can still use the standard normal (z) distribution. Your book states that if σ is unknown and the sample size is less than 30, we change from the z distribution to a new distribution (t). I have a more general rule: if σ is unknown, use t regardless of sample size.

Z distribution
We learned from the central limit theorem (CLT) that the sampling distribution of a statistic (like a sample mean) will be approximately normal, as long as the sample size is sufficiently large. The CLT tells us that the standard deviation of the sampling distribution of the mean is equal to the standard deviation of the population divided by the square root of the sample size. When we know the standard deviation of the population, we can compute a z-score and use the normal distribution to evaluate probabilities for the sample mean. However, we rarely know the standard deviation of the population, and a large sample may not be practical to collect.
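
As a small illustration of the z-score for a sample mean, here is a minimal Python sketch. The function name and the numbers (µ = 100, σ = 15, n = 25, x̄ = 103) are hypothetical values chosen for illustration; they are not from the lesson.

```python
import math
from statistics import NormalDist

def z_score_of_sample_mean(sample_mean, pop_mean, pop_sd, n):
    """z = (x-bar - mu) / (sigma / sqrt(n)); only usable when sigma is known."""
    standard_error = pop_sd / math.sqrt(n)  # sd of the sampling distribution
    return (sample_mean - pop_mean) / standard_error

# Hypothetical values, for illustration only.
z = z_score_of_sample_mean(sample_mean=103.0, pop_mean=100.0, pop_sd=15.0, n=25)

# Probability of a sample mean this large or larger if mu really is 100.
upper_tail = 1 - NormalDist().cdf(z)
print(z, round(upper_tail, 4))
```

The standard error here is 15 / √25 = 3, so the sample mean sits one standard error above µ.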

t distribution
Sample sizes are sometimes small, and we rarely know the standard deviation of the population. When either of these situations occurs, statisticians rely on the distribution of the Student's t statistic (named by a beer maker), also known as the t score, whose values are given by:
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$
Look familiar? Here x̄ is the sample mean, µ is the population mean, s is the standard deviation of the sample, and n is the sample size. The distribution of the t statistic is called the t distribution or the Student's t distribution.

Degrees of Freedom
Unlike the z distribution, there are actually many different t distributions. The particular form of the t distribution is determined by its degrees of freedom. The degrees of freedom refer to the number of independent observations in a set of data. When estimating a mean from a single sample, the number of independent observations is equal to the sample size minus one. Hence, the distribution of the t statistic from samples of size 8 would be described by a t distribution having 8 - 1 = 7 degrees of freedom. Similarly, a t distribution having 15 degrees of freedom would be used with a sample of size 16.
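
The t statistic and its degrees of freedom can be computed together. The sketch below is only illustrative: the function name and the sample values (x̄ = 52, µ = 50, s = 4) are hypothetical, while the sample sizes 16 and 8 echo the ones mentioned above.

```python
import math

def t_statistic(sample_mean, pop_mean, sample_sd, n):
    """Return (t, degrees of freedom) for a single-sample mean."""
    if n < 2:
        raise ValueError("need at least two observations to estimate s")
    standard_error = sample_sd / math.sqrt(n)  # s / sqrt(n)
    t = (sample_mean - pop_mean) / standard_error
    return t, n - 1  # d.f. = n - 1

# Hypothetical sample of size 16, so 15 degrees of freedom.
t, df = t_statistic(sample_mean=52.0, pop_mean=50.0, sample_sd=4.0, n=16)
print(t, df)
```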

t distribution
The t distribution has the following properties:
The mean of the distribution is equal to 0.
The variance is equal to d.f. / (d.f. - 2), which requires d.f. > 2. The symbol ν (nu) is sometimes used to denote the degrees of freedom.
The variance is always greater than 1, although it is close to 1 when there are many degrees of freedom.
With infinite degrees of freedom, the t distribution is the same as the standard normal (z) distribution.
You do not need to compute the variance (or standard deviation) of the t curve in order to use the t statistic.
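
To see how quickly the variance d.f. / (d.f. - 2) settles toward 1, a short Python sketch can tabulate it. The function name and the particular d.f. values are chosen here for illustration.

```python
def t_variance(df):
    """Variance of a t distribution, d.f. / (d.f. - 2), valid for d.f. > 2."""
    if df <= 2:
        raise ValueError("this formula requires more than 2 degrees of freedom")
    return df / (df - 2)

# The variance is always above 1 and shrinks toward 1 as d.f. grows.
for df in (3, 7, 15, 30, 100, 1000):
    print(df, round(t_variance(df), 4))
```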

t Distribution
The t distribution resembles a normal distribution: it is a symmetric curve with mean and median 0, just like the standard normal curve. The primary difference is that the standard deviation of a t distribution is greater than 1. The standard deviation is determined by the degrees of freedom (d.f. = n - 1). As the sample size increases, the t distribution approaches the standard normal distribution.

t distribution
As the degrees of freedom increase, the t-models get closer to the z-distribution. The t-model with infinite degrees of freedom is exactly Normal.
[Figure: Normal distribution (z) and t distribution curves]

How do we find t?
To find the value for t, we use a table very similar to the z table. Instead of probabilities, the body of the table gives t values. Of course, we will actually use the calculator. Table F on page 771 is the t table. To find the appropriate t value, locate the column for the desired confidence level and the row for the degrees of freedom (d.f. = n - 1); the entry where that column and row meet is the correct t score. Note that in the bottom row of the t table, where d.f. is infinite, the t score is the same as the z score.
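
For readers without the table at hand, the lookup can be approximated numerically. This is a rough sketch using only the Python standard library, not how the table or the calculator does it: it integrates the t density with Simpson's rule and finds the critical value by bisection. All function names are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coeff = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coeff = math.exp(log_coeff) / math.sqrt(df * math.pi)
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), by Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_critical(confidence, df):
    """t value that leaves `confidence` of the area in the middle of the curve."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:  # widen the bracket until it contains the answer
        hi *= 2
    for _ in range(60):  # bisection
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# 95% confidence: small, moderate, and very large degrees of freedom.
for df in (13, 27, 10000):
    print(df, round(t_critical(0.95, df), 3))
```

With a very large d.f. the result is close to the familiar z value of 1.96, which matches the bottom row of the t table.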

t table
For the present, ignore the one-tail and two-tail rows.

Confidence Interval: Mean and Standard Deviation
Calculating the confidence interval is exactly the same as with the z statistic; simply replace the z with t:
$$\bar{x} \pm t_{\alpha/2} \left( \frac{s}{\sqrt{n}} \right)$$
Some researchers (and I) prefer to use the t statistic almost exclusively when the population standard deviation is unknown. Using the t statistic will result in a slightly more conservative (wider) interval. The book suggests using the z statistic when the sample size is greater than or equal to 30. You are free to make your own choice.

Example
A sample of 28 football players has a mean weight of 212 lbs and a standard deviation of 21 lbs. Find a 95% confidence t interval for the mean weight of the population of football players. Find a 99% confidence t interval.
95%: $t_{\alpha/2} \left( \frac{s}{\sqrt{n}} \right) = 2.052 \left( \frac{21}{\sqrt{28}} \right) \approx 8.1$
The interval would be 212 ± 8.1, or 203.9 < µ < 220.1.
99%: $t_{\alpha/2} \left( \frac{s}{\sqrt{n}} \right) = 2.771 \left( \frac{21}{\sqrt{28}} \right) \approx 11$
The interval would be 212 ± 11, or 201 < µ < 223.
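
The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch: the function name is hypothetical, and the t values 2.052 and 2.771 are simply the ones quoted in the example for 27 degrees of freedom.

```python
import math

def t_interval(sample_mean, sample_sd, n, t_crit):
    """Return (lower, upper, margin) for x-bar +/- t * s / sqrt(n)."""
    margin = t_crit * sample_sd / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin, margin

# Football example: n = 28, x-bar = 212, s = 21, d.f. = 27.
low95, high95, margin95 = t_interval(212, 21, 28, t_crit=2.052)
low99, high99, margin99 = t_interval(212, 21, 28, t_crit=2.771)

print(round(low95, 1), round(high95, 1))
print(round(low99), round(high99))
```

Passing the critical value in as an argument keeps the sketch independent of any particular table or library.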

Sentence Frame
When writing the conclusion for a confidence interval, use the following sentence structure.
Based on the data from our sample of size ___ (sample size n), we are ___% (confidence level) confident the true value of the population ___ (parameter, variable) is between ___ (lower boundary) and ___ (upper boundary).
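
If you write many conclusions, the sentence frame can be filled in mechanically. The helper below is a hypothetical illustration; the sample values are taken from the 90% interval example in this lesson (n = 20, interval 1445.8 to 1478.2).

```python
def ci_sentence(n, confidence_pct, parameter, low, high):
    """Fill in the sentence frame for a confidence interval conclusion."""
    return (
        f"Based on the data from our sample of size {n}, we are "
        f"{confidence_pct}% confident the true value of the population "
        f"{parameter} is between {low} and {high}."
    )

sentence = ci_sentence(20, 90, "mean", 1445.8, 1478.2)
print(sentence)
```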

To t or not to t
When do we use the t statistic instead of the z statistic?
If the population σ is known, use z with σ in the equation.
If σ is unknown and the sample size is ≥ 30, you may use z or t with s in the equation.
If σ is unknown and the sample size is < 30, use t.
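
The three rules above amount to a short decision procedure. Here is one way to sketch it in Python; the function name and the returned labels are made up for illustration.

```python
def choose_statistic(sigma_known, n):
    """Apply the book's rule of thumb for picking z or t."""
    if sigma_known:
        return "z with sigma"
    if n >= 30:
        return "z or t with s"  # either is acceptable under the book's rule
    return "t with s"

print(choose_statistic(sigma_known=True, n=14))
print(choose_statistic(sigma_known=False, n=28))
print(choose_statistic(sigma_known=False, n=40))
```

Under the more general rule stated earlier in the lesson (if σ is unknown, use t), the middle branch would simply return t as well.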

Example
Find a 90% confidence interval for the population mean if a sample of size 20 has a mean of 1462 with a standard deviation of 42. Since we do not know σ and the sample size is < 30, we use the t distribution.
90%: $t_{\alpha/2} \left( \frac{s}{\sqrt{n}} \right) = 1.729 \left( \frac{42}{\sqrt{20}} \right) \approx 16.2$
CI = 1462 ± 16.2 = (1445.8, 1478.2), or 1445.8 < µ < 1478.2
Based on data from a sample of 20, we are 90% confident the true population mean is between 1445.8 and 1478.2.

Example
Let us say that in previous years the average temperature for this time of year has been 67 °F. Students are complaining that this year it is much warmer. To find out if it is actually warmer, students record the temperature at noon for a two-week period. For this example we will assume σ = 5 °F.
74 72 69 75 62 64 70 72 78 68 71 62 71 70
Test the students' conjecture at a significance level of .05. Remember to answer completely while using the calculator.
Find the mean and standard deviation of our sample: x̄ = 69.8571, s = 4.6716
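
The sample mean and standard deviation can also be checked without the calculator. A minimal sketch with Python's standard `statistics` module, using the 14 temperatures listed above (variable names are arbitrary):

```python
from statistics import mean, stdev

# Noon temperatures recorded over the two-week period.
temps = [74, 72, 69, 75, 62, 64, 70, 72, 78, 68, 71, 62, 71, 70]

n = len(temps)
x_bar = mean(temps)  # sample mean
s = stdev(temps)     # sample standard deviation (divides by n - 1)

print(n, round(x_bar, 4), round(s, 4))
```

Note that `stdev` is the sample standard deviation, which corresponds to Sx on the calculator, not σx.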

Calculator & Z
Z with data: enter the data into a list.
STAT → TESTS → 7:ZInterval
Inpt: Data
σ: 5
List: L1
Freq: 1
C-Level: .95
Calculate
Result: (67.238, 72.476), x̄ = 69.8571, Sx = 4.6716, n = 14
Note the values for x̄ and s.
Based on data from a random sample of 14 days, we are 95% confident the true mean temperature for February is between 67.2° and 72.5°.
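
As a cross-check on the calculator, the same z interval can be reproduced in Python. This is only a sketch of the computation x̄ ± z·σ/√n with the assumed σ = 5; it is not a description of how the calculator works internally.

```python
import math
from statistics import NormalDist, mean

temps = [74, 72, 69, 75, 62, 64, 70, 72, 78, 68, 71, 62, 71, 70]
sigma = 5.0        # population standard deviation assumed in the example
confidence = 0.95

# z value with 95% of the area in the middle (about 1.96).
z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

x_bar = mean(temps)
margin = z_crit * sigma / math.sqrt(len(temps))
z_low, z_high = x_bar - margin, x_bar + margin

print(round(z_low, 3), round(z_high, 3))
```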

Calculator & Z
Z with statistics (x̄, σ), no data.
STAT → TESTS → 7:ZInterval
Inpt: Stats
σ: 5
x̄: 69.8571
n: 14
C-Level: .95
Calculate
Result: (67.238, 72.476), x̄ = 69.8571, n = 14
Based on data from a random sample of 14 days, we are 95% confident the true mean temperature for February is between 67.2° and 72.5°.

Calculator & t
t with data: enter the data into a list.
STAT → TESTS → 8:TInterval
Inpt: Data
List: L1
Freq: 1
C-Level: .95
Calculate
Result: (67.16, 72.554), x̄ = 69.8571, Sx = 4.6716, n = 14
Note the interval width: it is slightly wider than the z interval (67.238, 72.476).
Based on data from a random sample of 14 days, we are 95% confident the true mean temperature for February is between 67.2° and 72.6°.
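
The t interval can be checked the same way. In this sketch the critical value 2.160 is an assumption: it is the usual table value for 95% confidence with 13 degrees of freedom and is not given in the lesson itself. The z margin is computed alongside it only to compare widths.

```python
import math
from statistics import NormalDist, mean, stdev

temps = [74, 72, 69, 75, 62, 64, 70, 72, 78, 68, 71, 62, 71, 70]
n = len(temps)
x_bar, s = mean(temps), stdev(temps)

t_crit = 2.160  # assumed t table value: 95% confidence, d.f. = n - 1 = 13
t_margin = t_crit * s / math.sqrt(n)
t_low, t_high = x_bar - t_margin, x_bar + t_margin

# For comparison: the z margin with the assumed sigma = 5.
z_margin = NormalDist().inv_cdf(0.975) * 5.0 / math.sqrt(n)

print(round(t_low, 2), round(t_high, 2))
print("t interval is wider than z interval:", t_margin > z_margin)
```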

Calculator & t
t with statistics, no data.
STAT → TESTS → 8:TInterval
Inpt: Stats
x̄: 69.8571
Sx: 4.6716
n: 14
C-Level: .95
Calculate
Result: (67.16, 72.554), x̄ = 69.8571, Sx = 4.6716, n = 14
Based on data from a random sample of 14 days, we are 95% confident the true mean temperature for February is between 67.2° and 72.6°.