
Chapter 8 Estimation

There are two important forms of statistical inference: estimation (confidence intervals) and hypothesis testing.

Statistical Inference
Drawing conclusions about a population based on a sample from that population.

Parameter (unknown): μ, σ
A parameter is a number that describes the population of interest. Since we usually cannot examine the entire population of interest, parameters are generally unknown.

Statistic (known): x̄, s
A statistic is a number that is computed from sample data. We often use a statistic to estimate an unknown population parameter.

Notation
μ = population mean (unknown)
x̄ = sample mean (computed from the data we have on hand from a sample of the population)
σ = population standard deviation (unknown)
s = sample standard deviation (computed from the data we have on hand from a sample of the population)

x̄ estimates μ
s estimates σ

Point Estimate
A point estimate of a population parameter is an estimate of the parameter given by a single number.
x̄ is a point estimate of μ
s is a point estimate of σ

Sampling Variability

Example: What is the average weight of women 5'1" tall between the ages of 21 and 45?
The American Medical Association takes a sample of 1000 women between the ages of 21 and 45 with height 5'1". They find that the mean weight is x̄ = 136.2 lbs.

Question: If our goal is to estimate the mean weight of the population, how should we deal with the fact that different samples yield different estimates of the mean weight?

The basic fact that the value of a sample statistic varies in (hypothetical) repeated random sampling is called sampling variability.

Example: If another sample of 1000 women was chosen from the same population of 5'1" women between 21 and 45 years old, the value of x̄ would almost certainly be different, something other than 136.2 lbs.

Answer: Allow a margin of error that takes sampling variability into account.

Confidence Intervals
Confidence intervals are generally of the form
   point estimate ± margin of error
   x̄ ± margin of error

Question: Why should we estimate μ, the true population mean, with an interval of numbers? Why not just use the point estimate x̄ as our estimate of μ?

Answer: (1) Using an interval estimate (i.e. a confidence interval) takes sampling variability into consideration, and (2) we can attach a level of confidence to an interval estimate, which we cannot do with a point estimate.

A confidence interval for μ has two parts:
1) A margin of error says how close x̄ lies to μ.
2) A level of confidence says what percent of all possible samples satisfy the margin of error.

A confidence level, c, is any value between 0 and 1 that corresponds to the area under the standard normal curve between −z_c and +z_c.

Margin of Error
Even if we take a very large sample size, x̄ may differ from μ.

Critical Values
For an interval of numbers there is a left endpoint and a right endpoint: (lower bound, upper bound).
For a confidence level c, the critical value z_c is the number such that the area under the standard normal curve between −z_c and z_c equals c (your confidence level).

Example - Which of the following correctly expresses the confidence interval shown below?
[Figure: standard normal curve]
a) P(0.99 < z < 1) = 2.58
b) P(−2.58 < z < 2.58) = 0.99
c) P(0 < z < 0.99) = 5.16
d) P(−2.58 < z < 2.58) = 0.01

Common Confidence Levels

   Area (confidence level c)     Critical values −z_c, z_c
   0.90 or 90%                   −1.645, 1.645
   0.95 or 95%                   −1.96, 1.96
   0.98 or 98%                   −2.33, 2.33
   0.99 or 99%                   −2.58, 2.58

Notice that as the confidence level increases, the interval gets wider.

When constructing a confidence interval, you must decide on the risk you are willing to take of being wrong. A confidence interval is wrong if it doesn't contain the true value of the population parameter.
   99% confidence ==> 1% chance of being wrong
   95% confidence ==> 5% chance of being wrong
   90% confidence ==> 10% chance of being wrong

How confidence intervals behave
High confidence says that our method almost always gives correct answers.
A small margin of error says that we have pinned down the parameter quite precisely.
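The critical values in the table of common confidence levels can be reproduced numerically. The following is a minimal sketch using Python's standard library; the function name critical_value is an illustrative choice, not part of the notes. It relies on the fact that the two tails outside (−z_c, z_c) share the area 1 − c, so z_c is the (1 + c)/2 quantile of the standard normal distribution.

```python
from statistics import NormalDist


def critical_value(c: float) -> float:
    """Return z_c such that the area under N(0, 1) between -z_c and +z_c is c."""
    if not 0 < c < 1:
        raise ValueError("confidence level must be strictly between 0 and 1")
    # Each tail holds (1 - c) / 2, so z_c is the (1 + c) / 2 quantile.
    return NormalDist().inv_cdf((1 + c) / 2)


# Reproduce the common confidence levels, rounded to three decimals.
z_table = {c: round(critical_value(c), 3) for c in (0.90, 0.95, 0.98, 0.99)}
for c, z in z_table.items():
    print(f"c = {c:.2f}  z_c = {z:.3f}")
```

To three decimals the 98% and 99% values come out as 2.326 and 2.576; printed tables commonly round these to 2.33 and 2.58.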

The margin of error determines the width of the confidence interval.
1) The margin of error is larger for higher confidence levels. To obtain a smaller margin of error from the same data, you must be willing to accept lower confidence.
2) The margin of error is larger for smaller sample sizes.
3) The margin of error is larger for populations that have lots of variability.

Interpreting confidence levels
Take 95% confidence, for example.

Practical Interpretation: We are 95% confident that the ______ is between ______ and ______, on average.

Statistical Interpretation: If we repeatedly take random samples of size n from the population and construct a 95% confidence interval for each sample, then in the long run 95% of these confidence intervals will capture the true value of μ.

Our sample is either one of the 95% for which the calculated interval captures μ, or one of the unlucky 5% for which it does not.

The idea of a sampling distribution
Take many samples from the same population.
Collect the x̄'s from all the samples.
Display the distribution of the x̄'s (in a histogram, for example).
The histogram will be bell-shaped and symmetric, centered at the population mean.
The sampling distribution of x̄ is a normal distribution!
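The statistical interpretation can be checked with a small simulation. This is only a sketch under stated assumptions: the population is taken to be normal with μ = 50 and σ = 10 (made-up values), σ is treated as known, and each interval is x̄ ± z_c·σ/√n. The function name coverage and all parameter values are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def coverage(mu=50.0, sigma=10.0, n=25, c=0.95, trials=2000, seed=1):
    """Fraction of simulated confidence intervals that capture the true mean mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z_c = NormalDist().inv_cdf((1 + c) / 2)
    margin = z_c * sigma / n ** 0.5  # same margin for every sample (sigma known)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = fmean(sample)
        if x_bar - margin < mu < x_bar + margin:
            hits += 1
    return hits / trials


rate = coverage()
print(f"fraction of intervals capturing mu: {rate:.3f}")
```

The printed fraction should land close to 0.95, although any single run will differ slightly because of sampling variability.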

Sampling Distribution
The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population.

Sampling distribution of x̄: This means that if we collect all the possible samples of size n from the population of interest, collect all of the x̄'s, and plot them in a histogram, that histogram will be bell-shaped, symmetric, and centered at μ with standard deviation σ/√n.

Facts about the sampling distribution of x̄
These facts describe how x̄ varies from one sample to the next:
1) In repeated sampling, x̄ will sometimes fall above the true value of μ and sometimes below it, but there is no systematic tendency for x̄ to overestimate or underestimate μ. The sampling distribution of x̄ is centered at μ, and so x̄ is called an unbiased estimator of μ.
2) The values of x̄ from larger samples are less variable than those from smaller samples. The standard deviation of the sampling distribution of x̄ is σ/√n.

   Mean of x̄:                 μ_x̄ = μ
   Standard deviation of x̄:   σ_x̄ = σ/√n

Confidence interval for μ for 95% confidence
   σ is known:     x̄ ± z_c·(σ/√n), with z_c = 1.96
   σ is unknown:   x̄ ± t_c·(s/√n)
If σ is known, then we use z_c.
If σ is unknown, then we have s, and we use t_c.
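The two facts about the sampling distribution of x̄ (centered at μ, standard deviation σ/√n) can be illustrated by simulation. The sketch below assumes a normal population with μ = 100 and σ = 15 and samples of size n = 36; these numbers and the function name simulate_sample_means are illustrative only.

```python
import random
from statistics import fmean, stdev


def simulate_sample_means(mu, sigma, n, num_samples, seed=0):
    """Draw num_samples random samples of size n and return their sample means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        fmean([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(num_samples)
    ]


mu, sigma, n = 100.0, 15.0, 36
means = simulate_sample_means(mu, sigma, n, num_samples=5000)

center = fmean(means)        # should be close to mu
spread = stdev(means)        # should be close to sigma / sqrt(n)
expected_spread = sigma / n ** 0.5

print(f"mean of the sample means: {center:.2f} (mu = {mu})")
print(f"sd of the sample means:   {spread:.2f} (sigma/sqrt(n) = {expected_spread:.2f})")
```

Rerunning with a larger n shrinks the spread of the sample means, which is the second fact in the list.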

Maximal Margin of Error
Since μ is unknown, the margin of error |x̄ − μ| is unknown.
Using confidence level c, we can say that x̄ differs from μ by at most:
   E = z_c·σ_x̄ = z_c·(σ/√n)

The Probability Statement
   P(−E < x̄ − μ < E) = c
   P(x̄ − E < μ < x̄ + E) = c
In words, c is the probability that the sample mean, x̄, will differ from the population mean, μ, by at most E, the margin of error.

Confidence Intervals
A c confidence interval for μ is an interval computed from sample data in such a way that c is the probability of generating an interval containing the actual value of μ.

Example - For a population of domesticated geese, the standard deviation of the mass is 1.3 kg. A sample of 45 geese has a mean mass of 5.7 kg. Find the confidence interval for the population mean at the 95% confidence level.

Notice that we have σ (the population standard deviation), so we can use z_c.
Calculator: STAT, TESTS, 7: Z-Interval, choose Stats.

Critical Thinking
Since x̄ is a random variable, so are the endpoints x̄ − E and x̄ + E.
After the confidence interval is numerically fixed for a specific sample, it either does or does not contain μ.
If we repeated the confidence interval process by taking multiple random samples of equal size, some intervals would capture μ and some would not!
The equation P(x̄ − E < μ < x̄ + E) = c states that the proportion of all intervals containing μ will be c.
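For the geese example, the interval x̄ ± z_c·σ/√n can be computed directly as a check on the calculator result. This is a minimal sketch; the function name z_interval is an illustrative choice, and it assumes σ is known and that the normal model applies.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(x_bar, sigma, n, c):
    """Confidence interval for mu when the population sd sigma is known."""
    z_c = NormalDist().inv_cdf((1 + c) / 2)  # critical value for level c
    margin = z_c * sigma / sqrt(n)           # maximal margin of error E
    return x_bar - margin, x_bar + margin


# Geese example: sigma = 1.3 kg, n = 45, sample mean 5.7 kg, 95% confidence.
low, high = z_interval(x_bar=5.7, sigma=1.3, n=45, c=0.95)
print(f"95% confidence interval for mu: ({low:.2f}, {high:.2f}) kg")
```

The margin of error here is about 0.38 kg, so the interval runs from roughly 5.32 kg to 6.08 kg.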

Estimating μ When σ is Unknown
In most cases, researchers will have to estimate σ with s (the standard deviation of the sample).
When s replaces σ, the standardized sample mean follows a non-normal distribution called Student's t distribution.

The t Distribution
Assume that x has a normal distribution with mean μ. For samples of size n with sample mean x̄ and sample standard deviation s, the t variable
   t = (x̄ − μ) / (s/√n)
has a Student's t distribution with degrees of freedom d.f. = n − 1.

Properties of the t-distribution
   It is bell-shaped, symmetric, and centered at zero.
   There is more area in the tails of the t-distribution than there is in the N(0,1) distribution.
   The t-distribution is really a family of density curves, each one differing according to its degrees of freedom.
   As the degrees of freedom get larger and larger, the t-density curve looks more and more like the N(0,1) curve.

For different levels of confidence:
[Figures: t curves for a 95% confidence interval, a 90% confidence interval, and a 99% confidence interval]

Example - Find the t-value for the following data:
   x̄ = 55.2, μ = 58.1, s = 4.2, n = 40
a) −27.62
b) −0.11
c) −8.95
d) −4.37

To find values of t_c, use Table 6 of Appendix II, which gives the critical values t_c for a confidence level c.
Degrees of freedom, d.f., are the row headings.
Confidence levels, c, are the column headings.

Maximal Margin of Error
If we are using the t distribution:
   E = t_c·(s/√n)
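The t-value in the example follows from the definition t = (x̄ − μ)/(s/√n). The short sketch below evaluates it, reading the four given numbers as x̄ = 55.2, μ = 58.1, s = 4.2, and n = 40 (an assumption about which symbol goes with which value); the function name t_statistic is illustrative.

```python
from math import sqrt


def t_statistic(x_bar, mu, s, n):
    """t = (x_bar - mu) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    return (x_bar - mu) / (s / sqrt(n))


t = t_statistic(x_bar=55.2, mu=58.1, s=4.2, n=40)
df = 40 - 1  # degrees of freedom

print(f"t = {t:.2f} with d.f. = {df}")
```

The result is about −4.37, which corresponds to choice d; the sign is negative because the sample mean lies below μ.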

What Distribution Should We Use?

Notes on Calculator:

                          Normal distribution,      Normal distribution,      Proportion
                          σ is unknown              σ is known
   Test statistic         t_obs                     z_obs                     z_obs
   Calculator             STAT, TESTS, T-Test       STAT, TESTS, Z-Test       STAT, TESTS, 1-Prop Z-Test
   Confidence interval    x̄ ± t_c·(s/√n)            x̄ ± z_c·(σ/√n)            p̂ ± z_c·√(p̂q̂/n)
   Calculator             STAT, TESTS, T-Interval   STAT, TESTS, Z-Interval   STAT, TESTS, 1-Prop Z-Interval

Example - A study was done to determine the average number of homes that a homeowner owns in his or her lifetime. For the 60 homeowners surveyed, the sample average was 4.2 and the sample standard deviation was 2.1. Calculate the 95% confidence interval for the true average number of homes that a person owns in his or her lifetime.

Example - A study was done to determine the average number of homes that a homeowner owns in his or her lifetime. Suppose that this time σ is known to be 2.8. Assume that we collect a sample of 60 homeowners and compute the sample average to be 4.2. Calculate the 95% confidence interval for the true average number of homes that a person owns in his or her lifetime.
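The two homeowner examples differ only in which spread and which critical value they use, so one helper covers both. This is a sketch under stated assumptions: the helper name interval is illustrative, and the t critical value 2.001 for d.f. = 59 at 95% confidence is an assumed table lookup, because Python's standard library has no t distribution. A printed table that lacks a row for d.f. = 59 may give a slightly different value.

```python
from math import sqrt
from statistics import NormalDist


def interval(x_bar, spread, n, critical):
    """Return x_bar ± critical * spread / sqrt(n) as a (low, high) pair."""
    margin = critical * spread / sqrt(n)
    return x_bar - margin, x_bar + margin


n, x_bar = 60, 4.2

# First example: sigma unknown, s = 2.1, so use t_c with d.f. = n - 1 = 59.
T_C_95_DF59 = 2.001  # assumed value read from a t table, not computed here
t_low, t_high = interval(x_bar, 2.1, n, T_C_95_DF59)

# Second example: sigma known to be 2.8, so use z_c.
z_c = NormalDist().inv_cdf(0.975)
z_low, z_high = interval(x_bar, 2.8, n, z_c)

print(f"t-interval (s = 2.1):     ({t_low:.2f}, {t_high:.2f})")
print(f"z-interval (sigma = 2.8): ({z_low:.2f}, {z_high:.2f})")
```

With these inputs the t-interval is roughly (3.66, 4.74) and the z-interval roughly (3.49, 4.91). The second interval is wider here because σ = 2.8 is larger than s = 2.1.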

Example: The numbers of advertisements seen or heard in one week for 30 randomly selected people in the United States are listed below. Construct a 95% confidence interval for the true mean number of advertisements.

   598 494 441 595 728 690 684 486 735 808
   481 298 135 846 764 317 649 732 582 677
   734 588 590 540 673 727 545 486 702 703
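Because σ is not given for the advertisement data, a t-interval with d.f. = 29 is the natural choice. The sketch below computes x̄ and s from the listed values and applies x̄ ± t_c·s/√n. The critical value 2.045 is an assumed lookup from a t table (95% confidence, d.f. = 29), since the standard library does not provide the t distribution.

```python
from math import sqrt
from statistics import fmean, stdev

# Advertisements seen or heard in one week by 30 randomly selected people.
ads = [
    598, 494, 441, 595, 728, 690, 684, 486, 735, 808,
    481, 298, 135, 846, 764, 317, 649, 732, 582, 677,
    734, 588, 590, 540, 673, 727, 545, 486, 702, 703,
]

n = len(ads)
x_bar = fmean(ads)   # sample mean
s = stdev(ads)       # sample standard deviation (divides by n - 1)

T_C_95_DF29 = 2.045  # assumed table value for c = 0.95, d.f. = 29
margin = T_C_95_DF29 * s / sqrt(n)
low, high = x_bar - margin, x_bar + margin

print(f"n = {n}, x_bar = {x_bar:.1f}, s = {s:.1f}")
print(f"95% confidence interval for mu: ({low:.1f}, {high:.1f})")
```

The same result should come from the calculator's T-Interval with the data entered in a list.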

Estimating p in the Binomial Distribution
We will use large-sample methods in which the sample size, n, is fixed.
We assume the normal curve is a good approximation to the binomial distribution if both np > 5 and nq = n(1 − p) > 5.

Point Estimates in the Binomial Case
   p̂ = r/n   (r = number of successes in n trials)
   q̂ = 1 − p̂

Margin of Error
The magnitude of the difference between the actual value of p and its estimate p̂, |p̂ − p|, is the margin of error.

The Distribution of p̂
For large samples, the distribution of p̂ is well approximated by a normal distribution.

A Probability Statement
   P(−E < p̂ − p < E) = c, where E ≈ z_c·√(p̂q̂/n)
with confidence level c, as before.

Example - Suppose that 800 students were randomly selected from the student body of 20,000 and given shots to prevent a certain type of flu. All 800 students were exposed to the flu, and 600 of them did not get the flu. Let p represent the probability that the shot will be successful for any single student selected at random from the entire population of 20,000.

a) What are the point estimates for p and q? What are the values of n and r?

b) Is the number of trials large enough to justify a normal approximation to the binomial?

c) Find a 99% confidence interval for p.
Calculator: STAT, TESTS, A: 1-PropZInt. The value of x is r.

Example: A survey of 300 fatal accidents showed that 123 were alcohol related. Construct a 98% confidence interval for the proportion of fatal accidents that were alcohol related.
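Both proportion examples can be worked with the large-sample interval p̂ ± z_c·√(p̂q̂/n), after checking that np̂ > 5 and nq̂ > 5. The sketch below is one way to do that; the function name proportion_interval is an illustrative choice, and the check uses p̂ in place of the unknown p.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(r, n, c):
    """Large-sample confidence interval for p from r successes in n trials."""
    p_hat = r / n
    q_hat = 1 - p_hat
    # Normal approximation to the binomial needs both expected counts above 5.
    if n * p_hat <= 5 or n * q_hat <= 5:
        raise ValueError("need n*p_hat > 5 and n*q_hat > 5 for the normal approximation")
    z_c = NormalDist().inv_cdf((1 + c) / 2)
    margin = z_c * sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin


# Flu shots: r = 600 successes out of n = 800, 99% confidence.
flu_low, flu_high = proportion_interval(600, 800, 0.99)

# Fatal accidents: 123 alcohol related out of 300, 98% confidence.
acc_low, acc_high = proportion_interval(123, 300, 0.98)

print(f"flu shot success, 99% CI: ({flu_low:.3f}, {flu_high:.3f})")
print(f"alcohol related, 98% CI:  ({acc_low:.3f}, {acc_high:.3f})")
```

For the flu example p̂ = 0.75 and q̂ = 0.25, so np̂ = 600 and nq̂ = 200, both well above 5. The intervals come out near (0.711, 0.789) and (0.344, 0.476); hand calculations with the rounded table values 2.58 and 2.33 may differ in the third decimal place.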

Choosing Sample Sizes
When designing statistical studies, it is good practice to decide in advance:
   The confidence level
   The maximal margin of error
Then, we can calculate the required minimum sample size to meet these goals.

Sample Size for Estimating μ
   n = (z_c·σ / E)²
If σ is unknown, use σ from a previous study or conduct a pilot study to obtain s.
Always round n up to the next integer!

Sample Size for Estimating p
If we have a preliminary estimate for p, use the following:
   n = p(1 − p)·(z_c / E)²
If we have no preliminary estimate for p, use the following modification:
   n = (1/4)·(z_c / E)²

Example - A wildlife study is designed to find the mean weight of salmon caught by an Alaskan fishing company. A preliminary study of a random sample of 50 salmon showed s = 2.15 pounds. How large a sample should be taken to be 90% confident that the sample mean, x̄, is within 0.20 pounds of the true mean weight μ?
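The salmon example can be worked with the standard sample-size formula n = (z_c·σ/E)², rounded up. The sketch below assumes the 2.15 pounds reported by the preliminary study is the sample standard deviation s, used in place of σ; the function name sample_size_for_mean is illustrative.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(sigma, margin, c):
    """Smallest n with z_c * sigma / sqrt(n) <= margin, i.e. n = (z_c*sigma/E)^2 rounded up."""
    z_c = NormalDist().inv_cdf((1 + c) / 2)
    return ceil((z_c * sigma / margin) ** 2)


# Salmon study: pilot s = 2.15 lb (assumed to stand in for sigma),
# desired margin E = 0.20 lb, confidence level 90%.
n_salmon = sample_size_for_mean(sigma=2.15, margin=0.20, c=0.90)
print(f"required sample size: {n_salmon}")
```

The unrounded value is about 312.7, so the study needs at least 313 salmon. Rounding down would leave the margin of error slightly above 0.20 pounds.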

Example: A researcher wishes to estimate the proportion of households with two cars. How large a sample is needed in order to be 98% confident that the sample proportion will not differ from the true proportion by more than 5%? A previous study indicates that the proportion of households with two cars is 19%.
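For the two-car example, the usual large-sample formulas are n = p(1 − p)·(z_c/E)² when a preliminary estimate of p is available and n = (1/4)·(z_c/E)² when it is not, both rounded up. The sketch below applies them with E = 0.05; the function name sample_size_for_proportion is an illustrative choice.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(margin, c, p=None):
    """Minimum n for estimating p to within margin at confidence level c.

    If no preliminary estimate p is given, use p(1 - p) = 1/4, its largest possible value.
    """
    z_c = NormalDist().inv_cdf((1 + c) / 2)
    pq = 0.25 if p is None else p * (1 - p)
    return ceil(pq * (z_c / margin) ** 2)


# Two-car households: E = 0.05, 98% confidence, preliminary estimate p = 0.19.
n_cars = sample_size_for_proportion(margin=0.05, c=0.98, p=0.19)

# Same goals with no preliminary estimate for p.
n_no_prior = sample_size_for_proportion(margin=0.05, c=0.98)

print(f"with p = 0.19:       n = {n_cars}")
print(f"with no estimate:    n = {n_no_prior}")
```

This gives n = 334 with the preliminary estimate and n = 542 without one. A hand calculation with the rounded table value z_c = 2.33 gives 335 and 543 instead, so answers may differ by one depending on how z_c is rounded.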