Lecture 5: Point Estimators and Sampling Distributions. Fall 2013. Prof. Yao Xie, yao.xie@isye.gatech.edu. H. Milton Stewart School of Industrial and Systems Engineering, Georgia Tech
Road map: Point Estimation → Confidence Interval Estimation → Hypothesis Testing
Population, sample, statistics
Population: all possible observations.
Sample: observed measurements taken from the population, {x_1, x_2, x_3, …, x_n}.
Statistics: data summaries S(x_1, x_2, x_3, …, x_n).
Why we need point estimators
Assume the data are realizations of random variables generated from a distribution (e.g., Normal, Exponential, Binomial), but the distribution's PARAMETERS are unknown (e.g., mean, variance, rate, proportion).
Point estimator
Goal of estimation: create a best approximation of the unknown parameter using a statistic (a summary, or function, of the data) S(x_1, x_2, …, x_n), called a point estimator. Examples: normal, binomial.
Sample proportion
An automobile manufacturer has developed a new type of bumper, which is supposed to absorb impacts with less damage than previous bumpers. The manufacturer has used this bumper in a sequence of 25 controlled crashes against a wall, each at 10 mph, using one of its compact models. Let X be the number of crashes that result in no visible damage to the automobile. The manufacturer observes that only 15 out of 25 cars have no damage after the crash. The manufacturer is interested in the proportion of all such crashes that result in no damage. We define this feature through the parameter p, which is the probability of no damage in a single crash.
Common point estimators
Estimating the population mean → sample mean: μ̂ = (1/n) Σ_i X_i.
Estimating the population variance → sample variance: σ̂² = S² = (1/(n − 1)) Σ_i (X_i − X̄)².
Estimating the population proportion → sample proportion: p̂ = X/n, if X ~ BIN(n, p).
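The three estimators above can be computed directly. A minimal sketch in Python, using made-up sample values (the data below are illustrative, not from the lecture):

```python
# Hypothetical data (our own illustrative values, not from the lecture)
x = [4.2, 5.1, 3.8, 4.9, 5.5, 4.4, 5.0, 4.7]
n = len(x)

# Sample mean: estimator of the population mean mu
mean_hat = sum(x) / n

# Sample variance S^2: estimator of sigma^2 -- note the (n - 1) divisor
var_hat = sum((xi - mean_hat) ** 2 for xi in x) / (n - 1)

# Sample proportion: X successes out of n Bernoulli trials estimates p
successes, trials = 15, 25       # e.g., crashes with no damage
p_hat = successes / trials
```
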
Sampling distribution
If we treat the data x_1, …, x_n as realizations of random variables X_1, …, X_n, then point estimators, which are functions of these RVs, are also random variables, and so they also have distributions, called SAMPLING DISTRIBUTIONS.
[Figure: sampling distribution of p̂ = X/n when X ~ Bin(25, p = 1/2).]
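Because p̂ = X/n takes the value k/n with probability P(X = k), its sampling distribution can be tabulated exactly from the binomial pmf. A sketch, assuming n = 25 and p = 1/2 as in the figure:

```python
import math

# Assumed values matching the figure: n = 25, p = 1/2
n, p = 25, 0.5

def binom_pmf(k, n, p):
    # P(X = k) for X ~ Bin(n, p)
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

# p_hat = X/n puts mass P(X = k) on the value k/n, for k = 0, 1, ..., n
sampling_dist = {k / n: binom_pmf(k, n, p) for k in range(n + 1)}

total = sum(sampling_dist.values())                        # probabilities sum to 1
mean_p_hat = sum(v * w for v, w in sampling_dist.items())  # E(p_hat) = p
```
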
Sample mean: sampling distribution
For X_1, …, X_n with mean μ and variance σ², estimate the mean μ using the sample mean μ̂ = (X_1 + … + X_n)/n.
1. For X_1, …, X_n ~ N(μ, σ²) with σ² known, the sampling distribution is exactly μ̂ ~ N(μ, σ²/n).
2. For large n, an approximate sampling distribution (by the CLT) is μ̂ ~ N(μ, σ²/n).
Distribution of a sample difference? Example: testing drug effectiveness.
Example: airport wait time
The amount of time that a customer spends waiting in airport security is a random variable with mean 8.2 minutes and standard deviation 1.5 minutes. Suppose that a random sample of 49 customers is observed. Find the probability that the average waiting time for these customers is:
(a) Less than 10 minutes
(b) Between 5 and 10 minutes
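Both parts follow from the sampling distribution of the sample mean, X̄ ≈ N(μ, σ²/n): standardize and evaluate the standard normal CDF. A sketch, assuming mean 8.2 minutes, standard deviation 1.5 minutes, and n = 49 (the `norm_cdf` helper is our own, built on `math.erf`):

```python
import math

def norm_cdf(z):
    # Standard normal CDF via the error function (helper, not a library call)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma, n = 8.2, 1.5, 49
se = sigma / math.sqrt(n)        # standard error of X-bar: 1.5/7 ~ 0.214

# (a) P(Xbar < 10): z = (10 - 8.2)/se = 8.4
p_a = norm_cdf((10 - mu) / se)

# (b) P(5 < Xbar < 10)
p_b = norm_cdf((10 - mu) / se) - norm_cdf((5 - mu) / se)
```

Both probabilities come out essentially equal to 1: ten minutes is 8.4 standard errors above the mean of X̄.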
Sample Mean: Sampling Distribution
Example: standard error computation for the sample mean.
1. For σ known, the standard error is √V(μ̂) = σ/√n.
2. For σ unknown, the standard error estimate is S/√n, where S² = Σ_i (x_i − x̄)²/(n − 1).
What is a good estimator?
What is a good estimator?
Suppose we are estimating the unknown parameter θ with a point estimator θ̂(X_1, X_2, …, X_n). There are two properties of a good estimator:
1. Unbiased: E[θ̂(X_1, X_2, …, X_n)] = θ. The quantity E(θ̂) − θ is called the BIAS; we want the bias to be small.
2. Small variance VAR(θ̂).
Consistency: unbiased, and the variance goes to 0 as n goes to infinity.
Evaluation of estimators: Example
Take two instruments measuring blood pressure. One of them has been calibrated, whereas the second one gives values larger than the true ones. For one individual, we measure the blood pressure 10 times: x_1, …, x_10 with the first instrument and y_1, …, y_10 with the second instrument. Say θ is the true blood pressure of this individual. We estimate θ by x̄ = (x_1 + … + x_10)/10 and ȳ = (y_1 + … + y_10)/10, based on the first and the second set of measurements. Which of the two estimates will be more biased?
Sample proportion
Our data x is sampled from BIN(n, p). Point estimation of p: p̂ = X/n.
Sampling distribution of p̂: E(p̂) = p, VAR(p̂) = p(1 − p)/n.
1. As X ~ BIN(n, p), the exact sampling distribution of X = n p̂ is BIN(n, p).
2. For large n (CLT), an approximate distribution is p̂ ~ N(p, p(1 − p)/n).
Car crash example
Example: we observe x = 15 cars that have no damage after the crash, when n = 25 cars are tested. What is an approximate standard error for the estimated sample proportion? If we want the standard error to be less than 0.1, how many tests are needed?
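A sketch of both parts, assuming x = 15 successes in n = 25 trials and a target standard error of 0.1, with the plug-in estimate √(p̂(1 − p̂)/n). Exact integer arithmetic via `fractions` avoids a floating-point pitfall right at the boundary:

```python
import math
from fractions import Fraction

# Observed: x = 15 undamaged cars out of n = 25 crashes
x, n = 15, 25
p_hat = x / n                               # 0.6

# Estimated standard error of p_hat
se = math.sqrt(p_hat * (1 - p_hat) / n)     # about 0.098

# Tests m needed so that sqrt(p_hat (1 - p_hat) / m) < 0.1, plugging in
# p_hat = 0.6: need m > p_hat (1 - p_hat) / 0.1^2 = 24, i.e. m >= 25.
p = Fraction(x, n)
m_min = int(p * (1 - p) / Fraction(1, 100)) + 1
```
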
Sample mean: unbiasedness and consistency
For X_1, …, X_n with mean μ and variance σ², estimate the mean μ using the sample mean μ̂, and
E(μ̂) = (E(X_1) + … + E(X_n))/n = μ,
so μ̂ is unbiased. As the bias is zero, we only need to check whether the variance goes to zero:
VAR(μ̂) = σ²/n → 0,
so μ̂ is consistent.
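Both properties can be seen in a small Monte Carlo experiment: the average of μ̂ over many replications sits near μ for any n (unbiasedness), while the spread of μ̂ shrinks as n grows (consistency). A sketch, with illustrative values μ = 5, σ = 2 of our own choosing:

```python
import random

# Illustrative population parameters (our choice, not from the lecture)
random.seed(0)
mu, sigma = 5.0, 2.0

def sample_mean(n):
    # One realization of mu_hat from a sample of size n
    return sum(random.gauss(mu, sigma) for _ in range(n)) / n

reps = 2000
means_small = [sample_mean(10) for _ in range(reps)]    # n = 10
means_large = [sample_mean(100) for _ in range(reps)]   # n = 100

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    m = mean(xs)
    return sum((v - m) ** 2 for v in xs) / (len(xs) - 1)

# Unbiased: both Monte Carlo averages are close to mu = 5.
# Consistent: the variance of mu_hat shrinks roughly like sigma^2 / n.
```
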
Chi-square distribution
Let Z_1, …, Z_n ~ N(0, 1), independent. Then Y = Z_1² + Z_2² + … + Z_n² ~ χ²_n.
f(x) > 0 only for x > 0, and the chi-square distribution has one parameter, called its degrees of freedom.
[Figure: density of X ~ χ²_5.]
X ~ χ²_k: f(x | k) = x^{k/2 − 1} e^{−x/2} / (2^{k/2} Γ(k/2)), where Γ(k) = (k − 1)! for integer k.
E(X) = k, VAR(X) = 2k.
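As a numerical sanity check of this density and its moments, one can integrate x·f(x | k) and x²·f(x | k) over (0, ∞) (truncated at 100, where the tail is negligible) and confirm E(X) ≈ k and VAR(X) ≈ 2k for k = 5. A sketch:

```python
import math

def chi2_pdf(x, k):
    # Chi-square density: f(x | k) = x^(k/2 - 1) e^(-x/2) / (2^(k/2) Gamma(k/2))
    if x <= 0:
        return 0.0
    return x ** (k / 2 - 1) * math.exp(-x / 2) / (2 ** (k / 2) * math.gamma(k / 2))

k = 5
dx = 0.001
grid = [i * dx for i in range(1, 100_000)]                # Riemann grid on (0, 100)

total = sum(chi2_pdf(x, k) * dx for x in grid)            # ~ 1 (density integrates to 1)
mean = sum(x * chi2_pdf(x, k) * dx for x in grid)         # ~ k = 5
second = sum(x * x * chi2_pdf(x, k) * dx for x in grid)   # ~ k^2 + 2k = 35
variance = second - mean ** 2                             # ~ 2k = 10
```
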
Chi-square distribution
If X_1, …, X_n ~ N(μ, σ²) are independent, what is the distribution of Y = Σ_i (X_i − μ)²/σ²? (By the definition above, Y ~ χ²_n.)
Let X̄ = (1/n) Σ_i X_i. Then W = Σ_i (X_i − X̄)²/σ² ~ χ²_{n−1}.
Y and W differ only by one degree of freedom (n vs. n − 1).
Sample Variance: Sampling Distribution
If X_1, …, X_n ~ N(μ, σ²), then (n − 1)S²/σ² ~ χ²_{n−1}.
Sample Variance: Unbiasedness and Consistency
E(S²) = σ²: since E[(n − 1)S²/σ²] = n − 1, we get E(S²) = (n − 1)σ²/(n − 1) = σ².
VAR(S²) = 2σ⁴/(n − 1): since VAR[(n − 1)S²/σ²] = 2(n − 1), we get VAR(S²) = 2(n − 1)σ⁴/(n − 1)² = 2σ⁴/(n − 1) → 0.
Therefore, S² is unbiased and consistent.
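These facts can be checked by simulation: draw many samples from N(μ, σ²), compute S² each time, and verify that (n − 1)S²/σ² has mean n − 1 and variance 2(n − 1), as a χ²_{n−1} variable should. A sketch, with illustrative values μ = 0, σ = 2, n = 10 of our own choosing:

```python
import random

# Illustrative parameters (our choice, not from the lecture)
random.seed(1)
mu, sigma, n, reps = 0.0, 2.0, 10, 5000

def s2():
    # One realization of the sample variance S^2
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)

draws = [s2() for _ in range(reps)]
avg_s2 = sum(draws) / reps                    # near sigma^2 = 4 (unbiasedness)

# Scale to (n-1) S^2 / sigma^2, which should behave like chi-square(9):
scaled = [(n - 1) * d / sigma**2 for d in draws]
avg_scaled = sum(scaled) / reps               # near n - 1 = 9
var_scaled = sum((v - avg_scaled) ** 2 for v in scaled) / (reps - 1)  # near 2(n-1) = 18
```
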
Summary of Sampling Distributions
Sample mean, one population:
- Known variance: (X̄ − μ)/(σ/√n) ~ N(0, 1)
- Unknown variance: (X̄ − μ)/(S/√n) ~ t(n − 1)
Sample mean, two populations:
- Known variances: (X̄_1 − X̄_2 − (μ_1 − μ_2))/√(σ_1²/n_1 + σ_2²/n_2) ~ N(0, 1)
- Unknown (equal) variances: (X̄_1 − X̄_2 − (μ_1 − μ_2))/(S_p √(1/n_1 + 1/n_2)) ~ t(n_1 + n_2 − 2)
Sample variance, one population: (n − 1)S²/σ² ~ χ²_{n−1}
Sample variance, two populations: (S_1²/σ_1²)/(S_2²/σ_2²) ~ F(n_1 − 1, n_2 − 1)
Mean Square Error (MSE)
The quality of an estimator is usually measured by another quantity, the mean square error:
MSE(θ̂) = E(θ̂ − θ)² = [E(θ̂) − θ]² + VAR(θ̂),
that is, MSE(Θ̂) = [Bias(Θ̂)]² + VAR(Θ̂).
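The decomposition MSE = Bias² + Variance is an algebraic identity, so it holds exactly for Monte Carlo averages as well. A sketch comparing the unbiased sample mean with a deliberately biased competitor 0.9·X̄ (the parameters and the 0.9 shrinkage factor are our own illustrative choices, not from the lecture):

```python
import random

# Illustrative parameters (our choice, not from the lecture)
random.seed(2)
mu, sigma, n, reps = 5.0, 2.0, 10, 20000

def xbar():
    # One sample mean from n normal observations
    return sum(random.gauss(mu, sigma) for _ in range(n)) / n

means = [xbar() for _ in range(reps)]
shrunk = [0.9 * m for m in means]       # biased competitor: smaller variance, nonzero bias

def mse(vals, theta):
    return sum((v - theta) ** 2 for v in vals) / len(vals)

def bias_and_var(vals, theta):
    m = sum(vals) / len(vals)
    return m - theta, sum((v - m) ** 2 for v in vals) / len(vals)

b, v = bias_and_var(means, mu)
# Check: mse(means, mu) equals b**2 + v (the bias-variance decomposition),
# and here the biased competitor has the larger MSE overall.
```
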
General Concepts of Point Estimation: Minimum Variance Unbiased Estimator (MVUE)
Θ̂ is an MVUE if:
a) it is an unbiased estimator of θ, and
b) its variance attains the Cramér–Rao bound:
VAR(Θ̂) = 1 / ( n E[ (∂ ln f(X; θ)/∂θ)² ] ).
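As a concrete instance (our own worked example, not from the lecture): for Bernoulli(p) data, the per-observation Fisher information E[(∂ ln f(X; p)/∂p)²] works out to 1/(p(1 − p)), so the Cramér–Rao bound for n observations is p(1 − p)/n, which is exactly VAR(p̂). The sample proportion is unbiased and attains the bound, hence is the MVUE:

```python
# Cramer-Rao bound for estimating p from n Bernoulli(p) observations.
# Illustrative values (our choice): p = 0.6, n = 25.
p, n = 0.6, 25

# Fisher information of ONE observation:
# E[(d/dp ln f(X; p))^2] = E[((X - p) / (p(1-p)))^2] = 1 / (p(1-p))
fisher_one = 1.0 / (p * (1.0 - p))

cr_bound = 1.0 / (n * fisher_one)   # equals p(1-p)/n
var_p_hat = p * (1.0 - p) / n       # exact variance of p_hat = X/n

# p_hat is unbiased and its variance equals the bound, so it is the MVUE.
```
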