
Estimation

We have noted that the polling problem, which attempts to estimate the proportion $p$ of Successes in some population, and the measurement problem, which attempts to estimate the mean value $\mu$ of some quantity within a population, are situations we handle by sampling data. To estimate a population proportion $p$, it makes sense to compute the sample proportion $\bar{p}$; but since it is wisest to choose random samples, the computation of this statistic is best regarded as a random variable $\bar{P}$. Similarly, to estimate the population mean $\mu$, it makes sense to compute the sample mean $\bar{x}$; but since it is wisest to choose random samples, the computation of this statistic is best regarded as a random variable $\bar{X}$.

point estimator: a random variable (like $\bar{P}$ or $\bar{X}$) whose values are used to estimate a population parameter
point estimate: any value of a point estimator (like $\bar{p}$ or $\bar{x}$) derived from a particular sample

A point estimator is said to be unbiased if its expected value is equal to the parameter being estimated; efficient if its standard error is less than that of other point estimators; and consistent if, as the sample size gets larger, it more accurately estimates the parameter.

interval estimator: a range of values which is used to estimate a population parameter
confidence interval: an interval estimator which contains the parameter being estimated with a certain level of confidence (i.e., with a high probability)
margin of error: half the width of a confidence interval; confidence intervals are generally constructed as centered on some point estimate, that is, they have the form

Point estimate ± Margin of error

Confidence intervals for the population mean

Recall that the sampling distribution of the sample mean $\bar{X}$ is normally distributed provided that either we are sampling from a normal population of $X$ values, or the sample is large (on the order of $n \ge 30$). Under this assumption, the Empirical Rule tells us that (roughly) 95% of all sampled values of $\bar{X}$ will lie within two standard deviations ($SD(\bar{X}) = \sigma/\sqrt{n}$) of its expected value ($E(\bar{X}) = \mu$). That is,

$$P\left(\mu - 2\frac{\sigma}{\sqrt{n}} \le \bar{X} \le \mu + 2\frac{\sigma}{\sqrt{n}}\right) \approx 0.95.$$

We can do a bit better than this by replacing this approximate probability with a more accurate value: by inverting the normal probability distribution function, we find that 95% of values of a standard normal random variable $Z$ fall within 1.96 standard deviations of the mean:

$$P(-1.96 \le Z \le 1.96) = 0.95.$$

Thus,

$$P\left(\mu - 1.96\frac{\sigma}{\sqrt{n}} \le \bar{X} \le \mu + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95.$$

Next, we observe that since 95% of all sampled values of $\bar{X}$ will lie within 1.96 standard deviations of the mean $\mu$, it is equally true that for those same samples the mean $\mu$ will lie within 1.96 standard deviations of the sampled mean $\bar{X}$. That is,

$$P\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95.$$

This seemingly innocuous conclusion has far-reaching consequences: it states that 95% of all sampled values of $\bar{X}$ have the property that the unknown value of $\mu$ lies in the confidence interval

$$\bar{X} \pm 1.96\frac{\sigma}{\sqrt{n}}.$$

The choice of 95% in this statement is merely convenient; it is not a necessary feature of the confidence interval. We may replace it with any suitably large level of confidence.

confidence coefficient, or level of significance ($\alpha$): the probability that a confidence interval will not contain the parameter it intends to estimate
confidence level ($100(1-\alpha)\%$): the percentage of sampled values of a point estimate that produce confidence intervals containing the parameter they intend to estimate
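The interpretation of the confidence level as a long-run percentage of samples can be checked by simulation. The following is a minimal sketch, not part of the original notes: the population values (`mu`, `sigma`), the sample size, the number of trials, and the function name `coverage_rate` are all illustrative assumptions. It draws many random samples from a normal population with known $\sigma$ and counts how often the interval $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ contains $\mu$.

```python
import random
from statistics import NormalDist, fmean


def coverage_rate(mu=50.0, sigma=10.0, n=25, trials=10_000, alpha=0.05, seed=0):
    """Fraction of simulated intervals xbar +/- z*sigma/sqrt(n) that contain mu.

    All default values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)                 # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 when alpha = 0.05
    margin = z * sigma / n ** 0.5             # sigma is treated as known here
    hits = 0
    for _ in range(trials):
        # Draw one random sample of size n and compute its sample mean.
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / trials


rate = coverage_rate()
print(f"Observed coverage: {rate:.3f}")
```

The observed fraction should land close to $1 - \alpha = 0.95$, differing only by simulation noise.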

Thus, the confidence interval above estimates $\mu$ at the $\alpha = 0.05$ significance level and has a 95% confidence level. These choices both lead to the normal critical value $z_{\alpha/2} = z_{0.025} = 1.96$ (the point that cuts off an upper tail of probability $\alpha/2$) that appears in the confidence interval $\bar{X} \pm 1.96\,\sigma/\sqrt{n}$. More generally, at any significance level $\alpha$, we obtain the following:

confidence interval for the mean: assuming that we are sampling from a normally distributed population of values of the random variable $X$, or if the sample is large enough ($n \ge 30$), the sampled mean value $\bar{X} = \bar{x}$ leads to a confidence interval for $\mu$ at level of significance $\alpha$ given by

$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}.$$
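As an illustration of the formula for the case where $\sigma$ is known, here is a small sketch using only Python's standard library. The function name `z_interval` and the sample figures (a sample mean of 100 from 36 observations, with $\sigma = 15$) are hypothetical values made up for the example.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, alpha=0.05):
    """Confidence interval for the mean when sigma is known.

    Returns (lower, upper) = xbar -/+ z_{alpha/2} * sigma / sqrt(n).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)   # critical value z_{alpha/2}
    margin = z * sigma / sqrt(n)              # margin of error
    return xbar - margin, xbar + margin


# Hypothetical data: xbar = 100 from n = 36 observations, sigma = 15.
low, high = z_interval(xbar=100.0, sigma=15.0, n=36)
print(f"95% confidence interval: ({low:.2f}, {high:.2f})")
```

Passing a different `alpha` (for example `alpha=0.01`) changes only the critical value, and therefore the width of the interval.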

Notice the three quantities that determine the margin of error:

$$\text{margin of error} = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}.$$

The margin of error in a confidence interval estimate
- is smaller for populations with smaller standard deviation $\sigma$;
- is smaller for samples of greater size $n$;
- is smaller when larger significance levels $\alpha$ are employed.

However, since $\sigma$ is constant, the investigator has no means to lower this parameter. Only sample size $n$ and significance level $\alpha$ can be manipulated.

A more serious practical concern is manifest here: the confidence interval formula given above presumes that the investigator knows the value of $\sigma$. If the parameter $\mu$ is unknown, it is highly unlikely that $\sigma$ will also be known! That is, the investigator may be forced to estimate $\sigma$ first in order to develop an estimate for $\mu$.

Confidence intervals and the $t_{df}$ distribution

There is an obvious solution to this problem: estimate the standard deviation $SD(\bar{X}) = \sigma/\sqrt{n}$ of the sampling distribution with the standard error $SE(\bar{X}) = s/\sqrt{n}$. This replaces the unknown parameter $\sigma$ with the known sample statistic $s$. However, this estimation carries with it its own sampling variability, so when we replace $SD(\bar{X})$ with $SE(\bar{X})$ in the confidence interval formula, the underlying sampling distribution no longer behaves enough like a normal distribution. The additional variability introduced by estimating $\sigma$ with $s$ causes more variability in the sampling distribution of $\bar{X}$, producing a symmetric bell-shaped distribution, but with thicker tails than the normal distribution.

The precise nature of this new sampling distribution was discovered by W. S. Gosset in the early 1900s, who published his findings under the pseudonym Student. Thus, it is known as the Student $t_{df}$ distribution, or the $t$ distribution with $df$ degrees of freedom. Degrees of freedom is an additional parameter that satisfies $df = n - 1$.

Properties of the $t_{df}$ distribution
- like the standard normal distribution, the $t$ distribution is symmetric, asymptotic, and bell-shaped, with mean equal to 0;
- unlike the standard normal distribution, the $t$ distribution has thicker tails and a less prominent central peak, so its standard deviation is somewhat greater than 1 (it equals $\sqrt{df/(df-2)}$ for $df > 2$);
- the smaller the $df$, the thicker the tails;
- as the number of $df$ increases, the tails of the $t$ distribution thin out and its central prominence increases in density, approximating more closely the standard normal density function.

Whereas the sampling distribution of $\bar{X}$ with $\sigma$ known is normal, so that the standardized variable

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

follows the standard normal distribution, the sampling distribution of $\bar{X}$ with $\sigma$ unknown is standardized so that the variable

$$T = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$

follows the Student $t_{df}$ distribution with $df = n - 1$ degrees of freedom.

confidence interval for the mean ($\sigma$ unknown): assuming that we are sampling from a normally distributed population of values of the random variable $X$, or if the sample is large enough ($n \ge 30$), the sampled mean value $\bar{X} = \bar{x}$ leads to a confidence interval for $\mu$ at level of significance $\alpha$ given by

$$\bar{x} \pm t_{\alpha/2,\,df}\frac{s}{\sqrt{n}}, \qquad df = n - 1.$$
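A short worked example may help fix the idea. The numbers are hypothetical and are not taken from the original notes: suppose a sample of $n = 16$ values from a normal population gives $\bar{x} = 50$ and $s = 8$, and we want a 95% confidence interval ($\alpha = 0.05$). The critical value $t_{0.025,\,15} = 2.131$ is read from a standard $t$ table.

```latex
\begin{align*}
df &= n - 1 = 15, \qquad t_{\alpha/2,\,df} = t_{0.025,\,15} = 2.131, \\
\text{margin of error} &= t_{\alpha/2,\,df}\,\frac{s}{\sqrt{n}}
  = 2.131 \cdot \frac{8}{\sqrt{16}} = 4.262, \\
\bar{x} \pm \text{margin of error} &= 50 \pm 4.262
  \quad\Longrightarrow\quad 45.738 \le \mu \le 54.262 .
\end{align*}
```

Had $\sigma = 8$ been known, the critical value would have been $z_{0.025} = 1.96$ and the margin of error $1.96 \cdot 8/4 = 3.92$; the wider $t$ interval reflects the extra uncertainty from estimating $\sigma$ with $s$.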

Confidence intervals for the population proportion

Recall that the sampling distribution of the sample proportion $\bar{P}$ is, by the Central Limit Theorem, approximated well by a normal distribution. This approximation is valid provided that we are sampling randomly and that the sample is large enough so that it contains at least 5 expected successes (that is, $np \ge 5$) and at least 5 expected failures (that is, $n(1-p) \ge 5$). Under these assumptions, we conclude that $\bar{P}$ is approximately normally distributed, with expected value $E(\bar{P}) = p$ (the population proportion) and standard deviation given by the formula $SD(\bar{P}) = \sqrt{p(1-p)/n}$. It follows that, for the value of $\bar{P}$ produced by a random sample from the population,

$$P\left(p - 1.96\sqrt{\frac{p(1-p)}{n}} \le \bar{P} \le p + 1.96\sqrt{\frac{p(1-p)}{n}}\right) = 0.95,$$

and more generally, that

$$P\left(p - z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} \le \bar{P} \le p + z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}\right) = 1 - \alpha,$$

which can be recast in the form

$$P\left(\bar{P} - z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} \le p \le \bar{P} + z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}\right) = 1 - \alpha.$$

Consequently, in an entirely analogous manner to how we developed a confidence interval for the mean in the case that $\sigma$ is known, we can produce a confidence interval for the proportion $p$ with the formula

$$\bar{p} \pm z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}.$$

The difficulty with this is that the margin of error in this last formula depends on knowing $p$; but as $p$ is what we are trying to estimate in the first place, we do not know its value! Instead, we replace the standard deviation $SD(\bar{P}) = \sqrt{p(1-p)/n}$ in this formula with the standard error $SE(\bar{P}) = \sqrt{\bar{p}(1-\bar{p})/n}$

to obtain the following:

confidence interval for the proportion $p$: assuming that we compute a value $\bar{p}$ of $\bar{P}$ from a sample of size $n$ randomly selected from a population with true proportion of Success equal to $p$, where the sample is large enough so that it contains at least 5 expected successes ($np \ge 5$) and at least 5 expected failures ($n(1-p) \ge 5$), then a confidence interval for $p$ at level of significance $\alpha$ is given by

$$\bar{p} \pm z_{\alpha/2}\sqrt{\frac{\bar{p}(1-\bar{p})}{n}}.$$
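The proportion interval can be sketched in a few lines of Python. The function name `proportion_interval` and the polling figures (120 successes in a sample of 400) are hypothetical. Because the true $p$ is unknown, the sketch checks the "at least 5 successes and 5 failures" condition against the observed counts; that substitution is an assumption of this example rather than something the notes prescribe.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(successes, n, alpha=0.05):
    """Confidence interval for a population proportion.

    Returns (lower, upper) = p_bar -/+ z_{alpha/2} * sqrt(p_bar * (1 - p_bar) / n).
    """
    # Assumption: observed counts stand in for the expected counts np and n(1 - p).
    if successes < 5 or n - successes < 5:
        raise ValueError("need at least 5 successes and 5 failures")
    p_bar = successes / n                          # sample proportion
    z = NormalDist().inv_cdf(1 - alpha / 2)        # critical value z_{alpha/2}
    margin = z * sqrt(p_bar * (1 - p_bar) / n)     # z times the standard error
    return p_bar - margin, p_bar + margin


# Hypothetical poll: 120 of 400 respondents answer "yes".
p_low, p_high = proportion_interval(successes=120, n=400)
print(f"95% confidence interval for p: ({p_low:.3f}, {p_high:.3f})")
```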

Choosing the sample size

We have observed that the size of the margin of error in a confidence interval estimate decreases as the size of the sample increases; this is because the margin of error equals

$$z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

in the case of estimating a mean (with $\sigma$ known), and

$$z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$$

in the case of estimating a proportion. We can use these facts to help choose the sample size so as to achieve a desired margin of error $D$: simply set $D$ equal to the appropriate quantity above and solve the equation for $n$:

choosing $n$ when estimating $\mu$: for a desired margin of error $D$, the minimum sample size required to estimate $\mu$ with a confidence interval at significance level $\alpha$ is

$$n = \left(\frac{z_{\alpha/2}\,\hat{\sigma}}{D}\right)^2$$

where $\hat{\sigma}$ represents some reasonable estimate of $\sigma$ (perhaps the value of $s$, or even the crude estimate [range/4] taken from a pilot sample)
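The sample-size rule for a mean is easy to sketch in code. The function name `sample_size_for_mean` and the inputs ($\hat{\sigma} = 15$, $D = 2$) are hypothetical. The formula rarely produces a whole number, so this sketch rounds up to the next integer, which is the smallest sample size that meets the target.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(sigma_hat, D, alpha=0.05):
    """Smallest n for which z_{alpha/2} * sigma_hat / sqrt(n) is at most D."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # critical value z_{alpha/2}
    return ceil((z * sigma_hat / D) ** 2)     # round up to a whole observation


# Hypothetical planning values: sigma estimated as 15, desired margin of error 2.
n_mean = sample_size_for_mean(sigma_hat=15.0, D=2.0)
print(n_mean)
```

Halving `D` roughly quadruples the required sample size, since $n$ grows with $1/D^2$.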

choosing $n$ when estimating $p$: for a desired margin of error $D$, the minimum sample size required to estimate $p$ with a confidence interval at significance level $\alpha$ is

$$n = \left(\frac{z_{\alpha/2}}{D}\right)^2 \hat{p}(1-\hat{p})$$

where $\hat{p}$ represents some reasonable estimate of $p$ (perhaps the value of $\bar{p}$ taken from a pilot sample, or even the conservative estimate $\hat{p} = 50\%$)
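The same idea applies to proportions. In the sketch below, the function name `sample_size_for_proportion` and the target margin of 3 percentage points are hypothetical. The default `p_hat=0.5` is the conservative choice mentioned above, since $\hat{p}(1-\hat{p})$ is largest at 0.5. As before, the result is rounded up to a whole number.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(D, p_hat=0.5, alpha=0.05):
    """Smallest n for which z_{alpha/2} * sqrt(p_hat * (1 - p_hat) / n) is at most D."""
    z = NormalDist().inv_cdf(1 - alpha / 2)            # critical value z_{alpha/2}
    return ceil((z / D) ** 2 * p_hat * (1 - p_hat))    # round up to a whole respondent


# Hypothetical target: margin of error of 3 percentage points at 95% confidence.
n_conservative = sample_size_for_proportion(D=0.03)            # uses p_hat = 0.5
n_pilot = sample_size_for_proportion(D=0.03, p_hat=0.2)        # pilot estimate of 20%
print(n_conservative, n_pilot)
```

A pilot estimate far from 50% lowers the required sample size, which is why the 50% choice is called conservative.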