Statistics 13 Elementary Statistics


Summer Session I 2012
Lecture Notes 5: Estimation with Confidence Intervals (last update: July 9, 2012)

Our goal is to estimate the value of an unknown population parameter, such as a population mean or a proportion from a binomial population. We want to use the sample information to estimate the population parameter of interest (called the target parameter) and to assess the reliability of the estimate. Different techniques are used for estimating a mean or a proportion, depending on whether a sample contains a large or a small number of measurements.

Definition 5.1 The unknown population parameter (e.g., mean or proportion) that we are interested in estimating is called the target parameter.

Determining the Target Parameter

Parameter   Key Words or Phrases                      Type of Data
$\mu$       Mean; average                             Quantitative
$p$         Proportion; percentage; fraction; rate    Qualitative (success or failure)

For example, the word "mean" in "mean gas mileage" and the word "average" in "average life expectancy" imply that the target parameter is the population mean $\mu$. The word "proportion" in "proportion of Iraq War veterans with post-traumatic stress syndrome" indicates that the target parameter is the binomial proportion $p$.

5.2 Large-Sample Confidence Interval for a Population Mean

Motivating example: Suppose a large hospital wants to estimate the average length of time patients remain in the hospital. Hence, the hospital's target parameter is the population mean $\mu$. To accomplish this objective, the hospital administrators plan to randomly sample 100 of all previous patients' records and to use the sample mean $\bar{X}$ of the lengths of stay to estimate $\mu$, the mean length of stay over all patients. The sample mean $\bar{X}$ is a point estimator of the population mean $\mu$. How can we assess the accuracy of this large-sample point estimator?

By the central limit theorem, we know that the sampling distribution of the sample mean is approximately normal for large samples. What are the chances that the interval $\bar{X} \pm 2\sigma_{\bar{X}} = \bar{X} \pm 2\sigma/\sqrt{n}$ will enclose $\mu$, the population mean?
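The question can be answered numerically. Because $\bar{X}$ is approximately normal with mean $\mu$ and standard deviation $\sigma_{\bar{X}}$, the interval $\bar{X} \pm 2\sigma_{\bar{X}}$ encloses $\mu$ exactly when $-2 < Z < 2$, where $Z = (\bar{X} - \mu)/\sigma_{\bar{X}}$. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`. That tool is our choice for illustration and is not part of the notes.

```python
from statistics import NormalDist

# Standard normal Z (mean 0, standard deviation 1).
Z = NormalDist()

# X_bar +/- 2*sigma_xbar encloses mu exactly when -2 < Z < 2,
# where Z = (X_bar - mu) / sigma_xbar is approximately standard normal (CLT).
prob_enclose = Z.cdf(2) - Z.cdf(-2)

print(f"P(-2 < Z < 2) = {prob_enclose:.4f}")  # P(-2 < Z < 2) = 0.9545
```

So roughly 95% of intervals built this way enclose $\mu$. This observation motivates the definitions that follow.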

Definition 5.2 An interval estimator (or confidence interval) is a formula that tells us how to use sample data to calculate an interval that estimates a population parameter.

Definition 5.3 The confidence coefficient is the probability that an interval estimator encloses the population parameter. Equivalently, it is the relative frequency with which the interval estimator encloses the population parameter when the estimator is used repeatedly a very large number of times. The confidence level is the confidence coefficient expressed as a percentage. For example, if our confidence level is 95%, then in the long run, 95% of our sample confidence intervals will contain $\mu$.

Definition 5.4 The value $z_\alpha$ is defined as the value of the standard normal random variable $Z$ such that the area $\alpha$ lies to its right. In other words, $P(Z > z_\alpha) = \alpha$.

We can construct a confidence interval with any desired confidence coefficient by increasing or decreasing the area (call it $\alpha$) assigned to the tails of the sampling distribution. For example, suppose we place the area $\alpha/2$ in each tail, and let $z_{\alpha/2}$ be the $z$ value such that $\alpha/2$ lies to its right. Then the confidence interval with confidence coefficient $(1-\alpha)$ is $\bar{x} \pm z_{\alpha/2}\sigma_{\bar{X}}$.

Confidence Level $100(1-\alpha)\%$    $\alpha$    $\alpha/2$    $z_{\alpha/2}$
90%                                    0.10        0.05          1.645
95%                                    0.05        0.025         1.96
99%                                    0.01        0.005         2.575

Large-Sample $100(1-\alpha)\%$ Confidence Interval for $\mu$
The large-sample $100(1-\alpha)\%$ confidence interval for $\mu$ is
$$\bar{x} \pm z_{\alpha/2}\sigma_{\bar{X}} = \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
where $z_{\alpha/2}$ is the $z$ value with an area $\alpha/2$ to its right and $\sigma_{\bar{X}} = \sigma/\sqrt{n}$. The parameter $\sigma$ is the standard deviation of the sampled population and $n$ is the sample size.
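The $z_{\alpha/2}$ column of the table and the large-sample interval can be computed directly. The sketch below assumes Python's `statistics.NormalDist`, whose `inv_cdf` returns the value with a given area to its *left*. For that reason, area $\alpha/2$ to the right is requested as $1 - \alpha/2$. The hospital figures in the usage lines are hypothetical and do not come from the notes.

```python
import math
from statistics import NormalDist


def z_critical(confidence: float) -> float:
    """Return z_{alpha/2}: the z value with area alpha/2 to its right."""
    alpha = 1.0 - confidence
    # inv_cdf takes the area to the LEFT, so use 1 - alpha/2.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def large_sample_ci(x_bar: float, sigma: float, n: int, confidence: float = 0.95):
    """Large-sample CI for mu: x_bar +/- z_{alpha/2} * sigma / sqrt(n)."""
    margin = z_critical(confidence) * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin


# Reproduce the z_{alpha/2} column of the table.
for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_critical(level):.4f}")

# Hypothetical hospital data (not from the notes): n = 100, x_bar = 4.5 days, sigma = 3 days.
low, high = large_sample_ci(4.5, 3.0, 100, 0.95)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

The computed 99% value is 2.5758. The table's 2.575 is a common textbook rounding of it; to three decimals it is 2.576.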

Note: When $\sigma$ is unknown (as is almost always the case) and $n$ is large (say, $n \ge 30$), the confidence interval is approximately
$$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}$$
where $s$ is the sample standard deviation.

Conditions Required for a Valid Large-Sample Confidence Interval for $\mu$
1. A random sample is selected from the target population.
2. The sample size $n$ is large (i.e., $n \ge 30$). (Due to the central limit theorem, this condition guarantees that the sampling distribution of $\bar{X}$ is approximately normal.)

Interpretation of a Confidence Interval for a Population Mean
When we form a $100(1-\alpha)\%$ confidence interval for $\mu$, we usually express our confidence in the interval with a statement such as "We can be $100(1-\alpha)\%$ confident that $\mu$ lies between the lower and upper bounds of the confidence interval." For a particular application, we substitute the appropriate numerical values for the level of confidence and for the lower and upper bounds. The statement reflects our confidence in the estimation process, rather than in the particular interval that is calculated from the sample data. We know that repeated application of the same procedure will result in different lower and upper bounds on the interval. Furthermore, we know that in the long run $100(1-\alpha)\%$ of the resulting intervals will contain $\mu$. There is (usually) no way to determine whether any particular interval is one of those that contain $\mu$ or one of those that do not. However, unlike point estimators, confidence intervals have a measure of reliability associated with them: the confidence coefficient. For that reason, they are generally preferred to point estimators.
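The long-run interpretation can be checked by simulation: draw many samples, build $\bar{x} \pm z_{\alpha/2}\, s/\sqrt{n}$ from each, and count how often the interval contains $\mu$. The following is a minimal sketch. The Uniform(0, 10) population (so $\mu = 5$), $n = 40$, 2,000 repetitions, and the seed are all illustrative assumptions, not values from the notes.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def coverage_rate(n: int = 40, reps: int = 2000, confidence: float = 0.95, seed: int = 1) -> float:
    """Fraction of large-sample intervals x_bar +/- z * s / sqrt(n) that contain mu.

    Population (assumed for illustration): Uniform(0, 10), so mu = 5.
    """
    rng = random.Random(seed)
    mu = 5.0
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    hits = 0
    for _ in range(reps):
        sample = [rng.uniform(0, 10) for _ in range(n)]
        x_bar = fmean(sample)
        margin = z * stdev(sample) / math.sqrt(n)  # s (divisor n - 1) replaces sigma
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / reps


rate = coverage_rate()
print(f"Observed coverage: {rate:.3f}")
```

The observed rate should land near 0.95 rather than exactly on it. Part of the gap is ordinary sampling variation across runs. At moderate $n$, part of it also comes from using $s$ in place of $\sigma$.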

5.3 Small-Sample Confidence Interval for a Population Mean

The use of a small sample in making an inference about $\mu$ presents two immediate problems.

Problem 1: The shape of the sampling distribution of the sample mean $\bar{X}$ now depends on the shape of the population that is sampled. We can no longer assume that the sampling distribution of $\bar{X}$ is approximately normal, because the central limit theorem ensures normality only for samples that are sufficiently large.

Solution: The sampling distribution of $\bar{X}$ is exactly normal, even for relatively small samples, if the population is normal. It is approximately normal if the sampled population is approximately normal.

Problem 2: The population standard deviation $\sigma$ is almost always unknown. It is still true that $\sigma_{\bar{X}} = \sigma/\sqrt{n}$. However, the sample standard deviation $s$ may provide a poor approximation to $\sigma$ when the sample size is small.

Solution: Instead of using the standard normal statistic
$$Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}},$$
which requires knowledge of, or a good approximation to, $\sigma$, we define and use the statistic
$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}},$$
in which the sample standard deviation $s$ replaces the population standard deviation $\sigma$.

If we are sampling from a normal distribution, the $t$-statistic has a sampling distribution very much like that of the $z$-statistic: mound-shaped, symmetric, and with mean 0. The primary difference between the sampling distributions of $t$ and $Z$ is that the $t$-statistic is more variable than $Z$. This property follows intuitively once you realize that $t$ contains two random quantities ($\bar{X}$ and $s$), whereas $Z$ contains only one ($\bar{X}$).

The actual amount of variability in the sampling distribution of $t$ depends on the sample size $n$. A convenient way of expressing this dependence is to say that the $t$-statistic has $(n-1)$ degrees of freedom (df). Recall that the quantity $(n-1)$ is the divisor that appears in the formula for $s^2$. This number plays a key role in the sampling distribution of $s^2$ and appears in discussions of other statistics in later lectures. In particular, the smaller the number of degrees of freedom associated with the $t$-statistic, the more variable its sampling distribution will be.
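One way to see that fewer degrees of freedom mean a more variable $t$-statistic is to compare the critical values $t_{\alpha/2}$ with $z_{\alpha/2} = 1.96$. The notes do not include a $t$-table, and SciPy provides `scipy.stats.t.ppf` for this. To stay within the standard library, the sketch below instead integrates the Student's $t$ density
$$f(x) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\!\left(\frac{\nu}{2}\right)}\left(1 + \frac{x^2}{\nu}\right)^{-(\nu+1)/2}$$
with Simpson's rule and inverts the CDF by bisection. This numerical approach is an illustrative assumption, not the method statistical software uses.

```python
import math
from statistics import NormalDist


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x) for x >= 0: 0.5 plus a Simpson's-rule integral of the t density on [0, x]."""
    # Log of the normalizing constant Gamma((df+1)/2) / (sqrt(df*pi) * Gamma(df/2)).
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u: float) -> float:
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    h = x / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3


def t_critical(confidence: float, df: int) -> float:
    """t_{alpha/2}: the t value with area alpha/2 to its right, found by bisection."""
    target = 1 - (1 - confidence) / 2  # area to the LEFT of t_{alpha/2}
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


z = NormalDist().inv_cdf(0.975)
for df in (2, 5, 10, 30):
    print(f"df = {df:2d}: t_0.025 = {t_critical(0.95, df):.3f}   (z_0.025 = {z:.3f})")
```

The printed values should match standard $t$-tables: 4.303, 2.571, 2.228, and 2.042. They shrink toward 1.96 as the degrees of freedom grow.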

Small-Sample Confidence Interval for $\mu$
The small-sample confidence interval for $\mu$ is
$$\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$$
where $t_{\alpha/2}$ is based on $(n-1)$ degrees of freedom.

Conditions Required for a Valid Small-Sample Confidence Interval for $\mu$
1. A random sample is selected from the target population.
2. The population has a relative frequency distribution that is approximately normal.

5.4 Large-Sample Confidence Interval for a Population Proportion

Problem: Public-opinion polls are conducted regularly to estimate the fraction of U.S. citizens who trust the president. Suppose 1,000 people are randomly chosen and 637 answer that they trust the president. How would you estimate the true fraction of all U.S. citizens who trust the president?

Solution: What we have really asked is how you would estimate the probability $p$ of success in a binomial experiment, where $p$ is the probability that a chosen person trusts the president. One logical method of estimating $p$ for the population is to use the proportion of successes in the sample. That is, we can estimate $p$ by calculating
$$\hat{p} = \frac{\text{Number of people sampled who trust the president}}{\text{Number of people sampled}},$$
where $\hat{p}$ is read "p hat." Thus, in this case,
$$\hat{p} = \frac{637}{1{,}000} = 0.637.$$

Sampling Distribution of $\hat{p}$
1. The mean of the sampling distribution of $\hat{p}$ is $p$; that is, $\hat{p}$ is an unbiased estimator of $p$.
2. The standard deviation of the sampling distribution of $\hat{p}$ is $\sqrt{pq/n}$; that is, $\sigma_{\hat{p}} = \sqrt{pq/n}$, where $q = 1 - p$.
3. For large samples, the sampling distribution of $\hat{p}$ is approximately normal. A sample size is considered large if the interval $\hat{p} \pm 3\sigma_{\hat{p}}$ does not include 0 or 1.
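For the poll data, the point estimate and the large-sample check in item 3 can be computed in a few lines. Because $\sigma_{\hat{p}}$ depends on the unknown $p$, the sketch substitutes $\hat{p}$ for $p$. This is a common practical choice and is our assumption here.

```python
import math

# Poll data from the notes: 637 of 1,000 sampled people trust the president.
x, n = 637, 1000
p_hat = x / n
# sigma_p_hat = sqrt(p q / n); p is unknown, so p_hat stands in for it.
sigma_p_hat = math.sqrt(p_hat * (1 - p_hat) / n)

# Large-sample check: p_hat +/- 3 sigma_p_hat must stay strictly inside (0, 1).
low, high = p_hat - 3 * sigma_p_hat, p_hat + 3 * sigma_p_hat
is_large = low > 0 and high < 1

print(f"p_hat = {p_hat:.3f}, sigma_p_hat ~ {sigma_p_hat:.4f}")
print(f"p_hat +/- 3 sigma = ({low:.3f}, {high:.3f}); large sample: {is_large}")
```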

Large-Sample Confidence Interval for $p$
The large-sample confidence interval for $p$ is
$$\hat{p} \pm z_{\alpha/2}\sigma_{\hat{p}} = \hat{p} \pm z_{\alpha/2}\sqrt{\frac{pq}{n}} \approx \hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$
where $\hat{p} = x/n$ and $\hat{q} = 1 - \hat{p}$.

Note: When $n$ is large, $\hat{p}$ can approximate the value of $p$ in the formula for $\sigma_{\hat{p}}$.

Conditions Required for a Valid Large-Sample Confidence Interval for $p$
1. A random sample is selected from the target population.
2. The sample size $n$ is large. (This condition will be satisfied if both $n\hat{p} \ge 15$ and $n\hat{q} \ge 15$. Note that $n\hat{p}$ and $n\hat{q}$ are simply the number of successes and the number of failures, respectively, in the sample.)

Unless $n$ is extremely large, the large-sample procedure presented in this section performs poorly when $p$ is near 0 or 1. Overcoming this potential problem requires an extremely large sample size. Since it is difficult to determine how large $n$ must be to count as "extremely large," statisticians have proposed an alternative method based on the Wilson (1927) point estimator of $p$. Researchers have shown that the confidence interval with Wilson's adjustment for estimating $p$ works well for any $p$, even when the sample size $n$ is very small.

Adjusted $(1-\alpha)100\%$ Confidence Interval for a Population Proportion $p$
An adjusted confidence interval for $p$ is
$$\tilde{p} \pm z_{\alpha/2}\sqrt{\frac{\tilde{p}(1-\tilde{p})}{n+4}}$$
where $\tilde{p} = \dfrac{x+2}{n+4}$ is the adjusted sample proportion of observations with the characteristic of interest, $x$ is the number of successes in the sample, and $n$ is the sample size.
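The two intervals can be compared side by side. The adjusted formula, with $\tilde{p} = (x+2)/(n+4)$, is often called the "plus four" interval. The sketch implements both formulas as written above, along with the 15-successes/15-failures rule. The small-sample case (2 successes in 20 trials) is a hypothetical example chosen to show the large-sample bound dropping below 0 while the adjusted bound stays above it.

```python
import math
from statistics import NormalDist


def z_critical(confidence: float) -> float:
    """z_{alpha/2}, the z value with area alpha/2 to its right."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def large_sample_ci_p(x: int, n: int, confidence: float = 0.95):
    """p_hat +/- z * sqrt(p_hat * q_hat / n)."""
    p_hat = x / n
    margin = z_critical(confidence) * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def adjusted_ci_p(x: int, n: int, confidence: float = 0.95):
    """p_tilde +/- z * sqrt(p_tilde * (1 - p_tilde) / (n + 4)), p_tilde = (x + 2) / (n + 4)."""
    p_tilde = (x + 2) / (n + 4)
    margin = z_critical(confidence) * math.sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    return p_tilde - margin, p_tilde + margin


def is_large_enough(x: int, n: int) -> bool:
    """Rule of thumb from the notes: at least 15 successes and 15 failures."""
    return x >= 15 and n - x >= 15


# Poll example from the notes: 637 of 1,000; both intervals nearly coincide.
print(is_large_enough(637, 1000), large_sample_ci_p(637, 1000), adjusted_ci_p(637, 1000))

# Hypothetical small sample (not from the notes): 2 successes in 20 trials.
print(is_large_enough(2, 20))   # False: fails the 15/15 rule
print(large_sample_ci_p(2, 20))  # lower bound falls below 0
print(adjusted_ci_p(2, 20))      # lower bound stays above 0
```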

5.5 Determining the Sample Size

In this section, we show that the appropriate sample size for making an inference about a population mean or proportion depends on the desired reliability.

Determination of Sample Size for $100(1-\alpha)\%$ Confidence Intervals for $\mu$
To estimate $\mu$ with a sampling error SE (the half-width of the confidence interval) and with $100(1-\alpha)\%$ confidence, we find the required sample size by solving
$$z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right) = SE$$
for $n$, which gives
$$n = \frac{(z_{\alpha/2})^2\sigma^2}{(SE)^2}.$$
The value of $\sigma$ is usually unknown. It can be estimated by the standard deviation $s$ from a previous sample. Alternatively, we may approximate the range $R$ of observations in the population and (conservatively) estimate $\sigma \approx R/4$. In any case, you should round the value of $n$ obtained upward to ensure that the sample size will be sufficient to achieve the specified reliability.

Determination of Sample Size for $100(1-\alpha)\%$ Confidence Interval for $p$
To estimate a binomial probability $p$ with sampling error SE and with $100(1-\alpha)\%$ confidence, we find the required sample size by solving the following equation for $n$:
$$z_{\alpha/2}\sqrt{\frac{pq}{n}} = SE$$
The solution for $n$ can be written as follows:
$$n = \frac{(z_{\alpha/2})^2(pq)}{(SE)^2}$$
Since the value of the product $pq$ is unknown, it can be estimated using the sample fraction of successes, $\hat{p}$, from a previous sample. We can show that the value of $pq$ is at its maximum when $p$ equals 0.5. You can therefore obtain conservatively large values of $n$ by approximating $p$ by 0.5 or values close to 0.5. In any case, you should round the value of $n$ obtained upward to ensure that the sample size will be sufficient to achieve the specified reliability.
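Both sample-size formulas reduce to a single line each once $z_{\alpha/2}$ is available. The sketch rounds upward with `math.ceil`, as the notes advise. The inputs in the usage lines are hypothetical: a range of $R = 40$ (so $\sigma \approx R/4 = 10$) with SE = 2, and SE = 0.03 for a proportion.

```python
import math
from statistics import NormalDist


def z_critical(confidence: float) -> float:
    """z_{alpha/2}, the z value with area alpha/2 to its right."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def n_for_mean(sigma: float, se: float, confidence: float = 0.95) -> int:
    """n = z^2 * sigma^2 / SE^2, rounded up to the next whole observation."""
    return math.ceil((z_critical(confidence) * sigma / se) ** 2)


def n_for_proportion(se: float, p: float = 0.5, confidence: float = 0.95) -> int:
    """n = z^2 * p * q / SE^2, rounded up; p = 0.5 maximizes pq, giving the conservative n."""
    return math.ceil(z_critical(confidence) ** 2 * p * (1 - p) / se ** 2)


# Hypothetical inputs (not from the notes).
sigma_guess = 40 / 4                            # sigma ~ R / 4 with R = 40
print(n_for_mean(sigma_guess, se=2))            # mean, 95% confidence
print(n_for_proportion(se=0.03))                # proportion, conservative p = 0.5
print(n_for_proportion(se=0.03, p=0.637))       # proportion, p from a previous sample
```

Using $p = 0.5$ gives the largest $n$ for a given SE. A prior estimate such as $\hat{p} = 0.637$ from the poll therefore yields a somewhat smaller required sample.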