Sampling Distributions & Estimators

API-209 TF Session 2
Teddy Svoronos
September 18, 2015

I. Estimators

The Importance of Sampling Randomly

Three Properties of Estimators
1. Unbiased
2. Consistent
3. Efficient

I am grateful to previous TFs for many of these materials.

From the Estimators Module Quiz: Suppose you are interested in estimating the mean household income of a population and collect data on a random sample of households. Consider the following estimator:

Estimator = $X_1$, the value of the first income listed in the sample (i.e. the income for the household we happened to survey first).

More formally, if the sample incomes are $\{x_1, x_2, x_3, \ldots, x_n\}$, the estimate is $x_1$. If you are using this estimator to estimate the population mean, is this estimator unbiased? Is it consistent?

Practice: For all of the following questions, assume the population parameter is $\mu$.
1. Show that $\bar{X}$ is unbiased.
2. Derive the variance of $\bar{X}$.
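The quiz question can be checked by simulation. The sketch below assumes a hypothetical, right-skewed income population (log-normal, with parameters chosen only for illustration, not taken from the handout) and compares the first-observation estimator $X_1$ with the sample mean $\bar{X}$ over many repeated samples. Both should average out to the population mean, but only the spread of $\bar{X}$ should shrink as $n$ grows.

```python
import math
import random
import statistics

# Hypothetical income population (log-normal); parameters are illustrative, not from the handout.
MU_LOG, SIGMA_LOG = 10.5, 0.8
TRUE_MEAN = math.exp(MU_LOG + SIGMA_LOG ** 2 / 2)  # population mean of a log-normal


def compare_estimators(n, reps=2000, seed=209):
    """Return ((mean, sd) of X_1, (mean, sd) of X-bar) over `reps` random samples of size n."""
    rng = random.Random(seed)
    first_obs, sample_means = [], []
    for _ in range(reps):
        sample = [rng.lognormvariate(MU_LOG, SIGMA_LOG) for _ in range(n)]
        first_obs.append(sample[0])                    # estimator 1: first income listed
        sample_means.append(statistics.fmean(sample))  # estimator 2: sample mean
    return ((statistics.fmean(first_obs), statistics.stdev(first_obs)),
            (statistics.fmean(sample_means), statistics.stdev(sample_means)))


if __name__ == "__main__":
    print(f"population mean: {TRUE_MEAN:,.0f}")
    for n in (10, 100, 400):
        (m1, s1), (mb, sb) = compare_estimators(n)
        print(f"n={n:4d}  X_1: mean={m1:,.0f} sd={s1:,.0f}   X-bar: mean={mb:,.0f} sd={sb:,.0f}")
```

In the printed output, the standard deviation of $X_1$ should stay roughly constant across sample sizes, while that of $\bar{X}$ should fall roughly like $1/\sqrt{n}$.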

3. Fill out the following table:

Estimator | Unbiased? | Consistent? | Most Efficient?
$\frac{Y_1 + Y_{25} + Y_{99}}{3}$ | | |
$\frac{\sum_{i=1}^{n} Y_i}{n} + 5$ | | |
$\frac{\sum_{i=1}^{n} Y_i}{n}$ | | |
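One way to check your table is to simulate all three estimators. This is a minimal sketch, assuming a hypothetical normal population with $\mu = 50$ and $\sigma = 10$ (illustrative values, not from the handout); it reports the bias and the standard deviation of each estimator across repeated samples, at two sample sizes.

```python
import random
import statistics

MU, SIGMA = 50.0, 10.0  # hypothetical population parameters (illustration only)

ESTIMATORS = {
    "(Y1+Y25+Y99)/3": lambda y: (y[0] + y[24] + y[98]) / 3,  # 1-based positions 1, 25, 99
    "mean + 5": lambda y: statistics.fmean(y) + 5,
    "mean": lambda y: statistics.fmean(y),
}


def simulate(n, reps=2000, seed=209):
    """Return {name: (bias, sd)} for each estimator over `reps` samples of size n."""
    if n < 99:
        raise ValueError("the first estimator needs at least 99 observations")
    rng = random.Random(seed)
    draws = {name: [] for name in ESTIMATORS}
    for _ in range(reps):
        y = [rng.gauss(MU, SIGMA) for _ in range(n)]
        for name, estimator in ESTIMATORS.items():
            draws[name].append(estimator(y))
    return {name: (statistics.fmean(v) - MU, statistics.stdev(v))
            for name, v in draws.items()}


if __name__ == "__main__":
    for n in (100, 1000):
        for name, (bias, sd) in simulate(n).items():
            print(f"n={n:5d}  {name:15s} bias={bias:+.3f}  sd={sd:.3f}")
```

Compare how each estimator's bias and spread behave as $n$ goes from 100 to 1000: an estimator whose spread does not shrink with $n$ cannot be consistent.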

Target Parameter ($\theta$) | Sample Size(s) | Point Estimator ($\hat{\theta}$) | $E(\hat{\theta})$ | Standard Deviation of Sampling Distribution ($\sigma_{\hat{\theta}}$)
$\mu$ | $n$ | $\bar{Y}$ | $\mu$ | $\sigma/\sqrt{n}$
$p$ | $n$ | $\hat{p}$ | $p$ | $\sqrt{p(1-p)/n}$
$\mu_1 - \mu_2$ | $n_1, n_2$ | $\bar{Y}_1 - \bar{Y}_2$ | $\mu_1 - \mu_2$ | $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$
$p_1 - p_2$ | $n_1, n_2$ | $\hat{p}_1 - \hat{p}_2$ | $p_1 - p_2$ | $\sqrt{p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2}$

Notes:
1. The expected values and standard errors shown in the table are valid regardless of the form of the population probability density function.
2. All four estimators have sampling distributions that are approximately normal for large samples.
3. For the last two rows, the two samples are assumed to be independent.
4. For all rows, the samples are assumed to be simple random samples.
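The last column of the table translates directly into code. This sketch plugs in the true population values ($\sigma$, $p$) exactly as the table does; in practice you would substitute sample estimates, since the true values are unknown. The function names are hypothetical helpers, not part of any library.

```python
import math


def se_mean(sigma, n):
    """SD of the sampling distribution of Y-bar: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def se_prop(p, n):
    """SD of the sampling distribution of p-hat: sqrt(p(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)


def se_diff_means(sigma1, n1, sigma2, n2):
    """SD of Y-bar_1 - Y-bar_2 for independent samples: the variances add."""
    return math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)


def se_diff_props(p1, n1, p2, n2):
    """SD of p-hat_1 - p-hat_2 for independent samples: the variances add."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


print(se_mean(15, 100), se_prop(0.5, 100))
print(se_diff_means(3, 9, 4, 16), se_diff_props(0.5, 100, 0.5, 100))
```

Note that the two-sample formulas add variances, not standard deviations, which is why note 3 (independence) matters.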

II. The Central Limit Theorem

If $X_1, X_2, \ldots, X_n$ constitute a simple random sample from a population with mean $\mu$ and variance $\sigma^2$, then the sample mean $\bar{X}$ has an approximately normal distribution with mean $\mu$ and standard error $\sigma/\sqrt{n}$, provided the sample size $n$ is large enough (a common rule of thumb is $n > 30$). The approximation improves as $n$ increases. This result holds regardless of the form of the population distribution.

See this applet to visualize the CLT: http://onlinestatbook.com/stat_sim/sampling_dist/

Experiment with different sample sizes and population distributions to see the CLT at work.

Keep in mind this distinction from lecture:

[Figure, three panels: "Distribution in the Population" (relative frequency histogram of age, population N = 1049); "Distribution in the Sample" (relative frequency histogram of age, sample n = 50); "Sampling" Distribution (density plotted over age).]
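If the applet is unavailable, the same experiment can be run locally. This sketch assumes an exponential population with rate 1 (mean 1, standard deviation 1, strongly right-skewed), an arbitrary choice made to show that the CLT does not require a normal population. It draws many samples of size $n$ and summarizes the resulting sample means.

```python
import math
import random
import statistics


def sample_means(n, reps=5000, rate=1.0, seed=209):
    """Means of `reps` simple random samples of size n from an Exponential(rate) population."""
    rng = random.Random(seed)
    return [statistics.fmean([rng.expovariate(rate) for _ in range(n)])
            for _ in range(reps)]


if __name__ == "__main__":
    # Population mean and SD are both 1 / rate = 1.
    for n in (2, 10, 50):
        means = sample_means(n)
        print(f"n={n:3d}  mean of X-bar={statistics.fmean(means):.3f}  "
              f"sd of X-bar={statistics.stdev(means):.3f}  sigma/sqrt(n)={1 / math.sqrt(n):.3f}")
```

The center of the sample means should stay near $\mu = 1$, their spread should track $\sigma/\sqrt{n}$, and a histogram of `sample_means(50)` should look far more symmetric than the population itself.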

Example: Survey of Registered Voters

A telephone survey of 2374 adult Americans finds that 1912 of the respondents are registered voters. Construct a 90% confidence interval for the proportion of adults in the United States who are registered voters.
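The arithmetic for this example can be sketched as follows, assuming the survey behaves like a simple random sample and using the large-sample (normal-approximation) interval $\hat{p} \pm z\sqrt{\hat{p}(1-\hat{p})/n}$, with the sample proportion plugged in for the unknown $p$. `proportion_ci` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def proportion_ci(successes, n, level=0.90):
    """Large-sample (normal-approximation) confidence interval for a proportion."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)        # plug-in standard error
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.645 for a 90% interval
    return p_hat, p_hat - z * se, p_hat + z * se


p_hat, lo, hi = proportion_ci(1912, 2374, level=0.90)
print(f"p-hat = {p_hat:.4f}, 90% CI = ({lo:.4f}, {hi:.4f})")
```

By hand: $\hat{p} = 1912/2374 \approx 0.805$, the standard error is about 0.0081, and the margin of error is about $1.645 \times 0.0081 \approx 0.013$, giving an interval of roughly 0.792 to 0.819.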

III. Bringing It All Together

Means

Estimator: $\bar{X}$   Parameter: $\mu$

Why is the mean of the distribution $\mu$? Since $\bar{X}$ is an unbiased estimator for $\mu$ ($E(\bar{X}) = \mu$), its sampling distribution is centered around $\mu$.

Why is this normally distributed? The Central Limit Theorem shows that the sampling distribution of some estimators (including $\bar{X}$) is approximately normal, assuming a large enough sample size.

[Figure: normal sampling distribution of $\bar{X}$ centered at $\mu$, with $\mu - 1.96\sigma_{\bar{X}}$ and $\mu + 1.96\sigma_{\bar{X}}$ marked, and an interval ( ) drawn around a single estimate $\bar{X}$.]

Why this width? The size of our interval depends on:
1. The standard error of our estimator, $\sigma_{\bar{X}} = \sigma/\sqrt{n}$, which depends on the standard deviation of the population distribution ($\sigma$) and the sample size ($n$).
2. Our multiplier (1.96 in this case), which depends on the shape of our sampling distribution (normal in this case) and our chosen confidence level (95% in this case).

Why this particular estimate of $\bar{X}$? No reason! It's just the estimate we happened to get from a single sample of the population. In real life, this value (and the standard deviation from that same sample) is all we have to work with.
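The point that our estimate is just one draw can be made concrete: repeat the sampling many times and count how often $\bar{X} \pm 1.96\,\sigma/\sqrt{n}$ captures $\mu$. This sketch assumes a hypothetical normal population of ages with $\mu = 40$ and $\sigma = 12$ (illustrative values) and treats $\sigma$ as known, matching the formula above.

```python
import math
import random
import statistics


def coverage(mu=40.0, sigma=12.0, n=50, reps=10000, z=1.96, seed=209):
    """Share of intervals X-bar +/- z*sigma/sqrt(n) that contain mu (sigma treated as known)."""
    rng = random.Random(seed)
    se = sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        xbar = statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        hits += (xbar - z * se <= mu <= xbar + z * se)  # True counts as 1
    return hits / reps


if __name__ == "__main__":
    print(f"95% intervals covering mu: {coverage():.3f}")
    print(f"90% intervals covering mu: {coverage(z=1.645):.3f}")
```

Each individual interval either contains $\mu$ or does not; the 95% describes the procedure over repeated samples, not any single interval.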

Proportions

Estimator: $\hat{p}$   Parameter: $p$

Why is the mean of the distribution $p$? Since $\hat{p}$ is an unbiased estimator for $p$ ($E(\hat{p}) = p$), its sampling distribution is centered around $p$.

Why is this normally distributed? The Central Limit Theorem shows that the sampling distribution of some estimators (including $\hat{p}$) is approximately normal, assuming a large enough sample size.

[Figure: normal sampling distribution of $\hat{p}$ centered at $p$, with $p - 1.96\sigma_{\hat{p}}$ and $p + 1.96\sigma_{\hat{p}}$ marked, and an interval ( ) drawn around a single estimate $\hat{p}$.]

Why this width? The size of our interval depends on:
1. The standard error of our estimator, $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$, which depends on the standard deviation of the population distribution ($\sigma = \sqrt{p(1-p)}$ for a 0/1 variable) and the sample size ($n$).
2. Our multiplier (1.96 in this case), which depends on the shape of our sampling distribution (normal in this case) and our chosen confidence level (95% in this case).

Why this particular estimate of $\hat{p}$? No reason! It's just the estimate we happened to get from a single sample of the population. In real life, this value (and the standard deviation from that same sample) is all we have to work with.
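Because $p$ is unknown in real life, the interval is usually built with $\hat{p}$ plugged into the standard error. This sketch checks how often that plug-in interval still covers the true $p$, assuming a hypothetical population with $p = 0.3$ and samples of $n = 500$ (illustrative values).

```python
import math
import random


def plugin_coverage(p=0.3, n=500, reps=5000, z=1.96, seed=209):
    """Share of intervals p-hat +/- z*sqrt(p-hat(1-p-hat)/n) that contain the true p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        successes = sum(rng.random() < p for _ in range(n))  # one sample of n 0/1 outcomes
        p_hat = successes / n
        se_hat = math.sqrt(p_hat * (1 - p_hat) / n)          # uses only the sample
        hits += (p_hat - z * se_hat <= p <= p_hat + z * se_hat)
    return hits / reps


if __name__ == "__main__":
    print(f"plug-in 95% intervals covering p: {plugin_coverage():.3f}")
```

With a sample this large and $p$ away from 0 or 1, coverage should land close to the nominal 95%.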

IV. Confidence Intervals

Think about the various factors that contribute to a confidence interval for the mean age of a population. What would happen to the sampling distribution (e.g. would it change shape? center?) and to the confidence interval (e.g. would it get wider? narrower?) as each of the following increased?
1. Sample size
2. Standard deviation of age in the population
3. Confidence level
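To check your answers about the interval's width, the half-width $z\,\sigma/\sqrt{n}$ can be computed directly while varying one factor at a time. This sketch assumes a large-sample normal interval and a hypothetical population standard deviation of 12 years of age; `ci_half_width` is an illustrative helper name.

```python
import math
from statistics import NormalDist


def ci_half_width(sigma, n, level=0.95):
    """Half-width z * sigma / sqrt(n) of a large-sample confidence interval for a mean."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return z * sigma / math.sqrt(n)


base = ci_half_width(sigma=12, n=100)  # hypothetical SD of age: 12 years
print(f"baseline:            {base:.3f}")
print(f"4x the sample size:  {ci_half_width(12, 400):.3f}")
print(f"2x the population SD: {ci_half_width(24, 100):.3f}")
print(f"99% confidence:      {ci_half_width(12, 100, level=0.99):.3f}")
```

The code only addresses the interval; for the sampling distribution, ask separately which of these factors moves its center, which change its spread, and which (if any) change its shape.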