Topic 14: Maximum Likelihood Estimation

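November 2009

As before, we begin with a sample $X = (X_1, \ldots, X_n)$ of random variables chosen according to one of a family of probabilities $P_\theta$. In addition, $f(x|\theta)$, $x = (x_1, \ldots, x_n)$, will be used to denote the density function for the data when $\theta$ is the true state of nature.

Definition 1. The likelihood function is the density function regarded as a function of $\theta$,

$$L(\theta|x) = f(x|\theta), \quad \theta \in \Theta. \tag{1}$$

The maximum likelihood estimator (MLE) is

$$\hat\theta(x) = \arg\max_{\theta} L(\theta|x). \tag{2}$$

Note that if $\hat\theta(x)$ is a maximum likelihood estimator for $\theta$, then $g(\hat\theta(x))$ is a maximum likelihood estimator for $g(\theta)$. For example, if $\theta$ is a parameter for the variance and $\hat\theta$ is the maximum likelihood estimator, then $\sqrt{\hat\theta}$ is the maximum likelihood estimator for the standard deviation. This flexibility in the estimation criterion is not available in the case of unbiased estimators.

Typically, maximizing the log-likelihood $\ln L(\theta|x)$ will be easier. Because the logarithm is increasing, the likelihood and the log-likelihood are maximized at the same value of $\theta$.

1 Examples

Example 2 (Bernoulli trials). If the experiment consists of $n$ Bernoulli trials with success probability $\theta$, then

$$L(\theta|x) = \theta^{x_1}(1-\theta)^{1-x_1} \cdots \theta^{x_n}(1-\theta)^{1-x_n} = \theta^{x_1+\cdots+x_n}(1-\theta)^{n-(x_1+\cdots+x_n)},$$

$$\ln L(\theta|x) = \ln\theta \left(\sum_{i=1}^n x_i\right) + \ln(1-\theta)\left(n - \sum_{i=1}^n x_i\right) = n\bar{x}\ln\theta + n(1-\bar{x})\ln(1-\theta),$$

$$\frac{\partial}{\partial\theta}\ln L(\theta|x) = n\left(\frac{\bar{x}}{\theta} - \frac{1-\bar{x}}{1-\theta}\right) = n\,\frac{\bar{x}-\theta}{\theta(1-\theta)}.$$

This equals zero when $\theta = \bar{x}$. Check that this is a maximum. Thus,

$$\hat\theta(x) = \bar{x}.$$

Example 3 (Normal data). Maximum likelihood estimation can be applied to a vector valued parameter. For a simple random sample of $n$ normal random variables,

$$L(\mu,\sigma^2|x) = \left(\frac{1}{\sqrt{2\pi\sigma^2}}\exp\frac{-(x_1-\mu)^2}{2\sigma^2}\right) \cdots \left(\frac{1}{\sqrt{2\pi\sigma^2}}\exp\frac{-(x_n-\mu)^2}{2\sigma^2}\right) = \frac{1}{\sqrt{(2\pi\sigma^2)^n}}\exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^n (x_i-\mu)^2\right).$$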

Introduction to Statistical Methodology

[Figure 1: Likelihood function (top row) and its logarithm (bottom row) for Bernoulli trials, plotted against the success probability from 0.2 to 0.7. The left column is based on 20 trials having 8 and 11 successes. The right column is based on 40 trials having 16 and 22 successes. Notice that the maximum likelihood is approximately $10^{-6}$ for 20 trials and $10^{-12}$ for 40. Note also that the peaks are narrower for 40 trials than for 20.]
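As a numerical check on the Bernoulli trials example, the following R sketch maximizes the log-likelihood for 8 successes in 20 trials and compares the result with the closed-form answer $\bar{x}$. The helper name `loglik` and the search interval are choices made for this illustration only; they are not part of the original notes.

```r
# Log-likelihood for n Bernoulli trials with s successes (illustrative helper).
loglik <- function(theta, s, n) {
  s * log(theta) + (n - s) * log(1 - theta)
}

n <- 20  # number of trials
s <- 8   # number of successes

# Numerically maximize the log-likelihood on an interval inside (0, 1).
fit <- optimize(loglik, interval = c(0.001, 0.999), s = s, n = n,
                maximum = TRUE)

theta_hat <- fit$maximum       # numerical MLE, to be compared with s / n
max_lik <- exp(fit$objective)  # value of the likelihood at its maximum

print(c(theta_hat = theta_hat, closed_form = s / n, max_lik = max_lik))
```

The numerical maximizer should agree with $s/n$ up to the tolerance of `optimize`, and the maximized likelihood should be of the order $10^{-6}$, the order of magnitude reported for 20 trials.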

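The log-likelihood for the normal sample is

$$\ln L(\mu,\sigma^2|x) = -\frac{n}{2}\ln 2\pi\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^n (x_i-\mu)^2,$$

$$\frac{\partial}{\partial\mu}\ln L(\mu,\sigma^2|x) = \frac{1}{\sigma^2}\sum_{i=1}^n (x_i-\mu) = \frac{n}{\sigma^2}(\bar{x}-\mu).$$

Because the second partial derivative with respect to $\mu$ is negative,

$$\hat\mu(x) = \bar{x}$$

is the maximum likelihood estimator. For the variance,

$$\frac{\partial}{\partial\sigma^2}\ln L(\mu,\sigma^2|x) = -\frac{n}{2\sigma^2} + \frac{1}{2(\sigma^2)^2}\sum_{i=1}^n (x_i-\mu)^2 = -\frac{n}{2(\sigma^2)^2}\left(\sigma^2 - \frac{1}{n}\sum_{i=1}^n (x_i-\mu)^2\right).$$

Recalling that $\hat\mu(x) = \bar{x}$, we obtain

$$\hat\sigma^2(x) = \frac{1}{n}\sum_{i=1}^n (x_i-\bar{x})^2.$$

Note that this maximum likelihood estimator is a biased estimator of $\sigma^2$: its mean is $(n-1)\sigma^2/n$.

Example 4 (Linear regression). Our data are $n$ observations with one explanatory variable and one response variable. The model is that

$$y_i = \alpha + \beta x_i + \epsilon_i$$

where the $\epsilon_i$ are independent mean 0 normal random variables. The (unknown) variance is $\sigma^2$. The likelihood function is

$$L(\alpha,\beta,\sigma^2|y,x) = \frac{1}{\sqrt{(2\pi\sigma^2)^n}}\exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^n (y_i-(\alpha+\beta x_i))^2\right),$$

$$\ln L(\alpha,\beta,\sigma^2|y,x) = -\frac{n}{2}\ln 2\pi\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^n (y_i-(\alpha+\beta x_i))^2.$$

For any value of $\sigma^2$, maximizing over $\alpha$ and $\beta$ is the same as minimizing the sum of squares. Thus, the maximum likelihood estimators $\hat\alpha$ and $\hat\beta$ are also the least squares estimators. The predicted value for the response variable is

$$\hat{y}_i = \hat\alpha + \hat\beta x_i.$$

The maximum likelihood estimator for $\sigma^2$ is

$$\hat\sigma^2_{MLE} = \frac{1}{n}\sum_{i=1}^n (y_i-\hat{y}_i)^2.$$

The unbiased estimator is

$$\hat\sigma^2_{U} = \frac{1}{n-2}\sum_{i=1}^n (y_i-\hat{y}_i)^2.$$

For the measurements of the lengths in centimeters of the femur and humerus for the five specimens of Archeopteryx, we have the following R output for linear regression.

    > femur<-c(38,56,59,64,74)
    > humerus<-c(41,63,70,72,84)
    > summary(lm(humerus~femur))

    Call: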

    lm(formula = humerus ~ femur)

    Residuals:
          1       2       3       4       5
    -0.8226 -0.3668  3.0425 -0.9420 -0.9110

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept) -3.65959    4.45896  -0.821 0.471944
    femur        1.19690    0.07509  15.941 0.000537 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 1.982 on 3 degrees of freedom
    Multiple R-squared: 0.9883, Adjusted R-squared: 0.9844
    F-statistic: 254.1 on 1 and 3 DF,  p-value: 0.0005368

The residual standard error of 1.982 centimeters is obtained by squaring the 5 residuals, summing, dividing by 3 = 5 - 2, and taking a square root.

Example 5 (Uniform random variables). If our data $X = (X_1, \ldots, X_n)$ are a simple random sample drawn from a uniformly distributed random variable whose maximum value $\theta$ is unknown, then each random variable has density

$$f(x|\theta) = \begin{cases} 1/\theta & \text{if } 0 \le x \le \theta, \\ 0 & \text{otherwise.} \end{cases}$$

Therefore, the likelihood is

$$L(\theta|x) = \begin{cases} 1/\theta^n & \text{if, for all } i,\ 0 \le x_i \le \theta, \\ 0 & \text{otherwise.} \end{cases}$$

Consequently, to maximize $L(\theta|x)$, we should minimize the value of $\theta$ in the first alternative for the likelihood. This is achieved by taking

$$\hat\theta(x) = \max_{1\le i\le n} x_i.$$

However, $\hat\theta(X) = \max_{1\le i\le n} X_i < \theta$ with probability 1, and so the maximum likelihood estimator is biased.

For $0 \le x \le \theta$, the distribution of $X_{(n)} = \max_{1\le i\le n} X_i$ is

$$F_{(n)}(x) = P\{\max_{1\le i\le n} X_i \le x\} = P\{X_1 \le x\}^n = (x/\theta)^n.$$

Thus, the density is

$$f_{(n)}(x) = \frac{n x^{n-1}}{\theta^n}.$$

The mean is

$$E_\theta X_{(n)} = \int_0^\theta x\,\frac{n x^{n-1}}{\theta^n}\,dx = \frac{n}{n+1}\theta$$

and thus

$$d(X) = \frac{n+1}{n}X_{(n)}$$

is an unbiased estimator of $\theta$.
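The bias of the sample maximum for uniform data, and its removal by the factor $(n+1)/n$, can be illustrated by simulation. The sketch below is only an illustration: the true value `theta`, the sample size `n`, the number of replications `reps`, and the seed are arbitrary assumptions, not values taken from the notes.

```r
set.seed(1)     # arbitrary seed so that the run is reproducible
theta <- 3      # hypothetical true maximum value
n <- 10         # sample size
reps <- 20000   # number of simulated samples

# Each row of the matrix is one sample of size n from Uniform(0, theta).
samples <- matrix(runif(reps * n, min = 0, max = theta), nrow = reps)

mle <- apply(samples, 1, max)   # maximum likelihood estimate for each sample
unbiased <- (n + 1) / n * mle   # rescaled estimate (n + 1) / n * max

print(c(mean_mle = mean(mle),
        theory_mle = n / (n + 1) * theta,
        mean_unbiased = mean(unbiased)))
```

Under these assumptions the average of the maximum likelihood estimates should fall near $n\theta/(n+1)$, below $\theta$, while the average of the rescaled estimates should fall near $\theta$ itself.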

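2 Asymptotic Properties

Much of the attraction of maximum likelihood estimators is based on their properties for large sample sizes.

2.1 Consistency

If $\theta_0$ is the state of nature, then

$$L(\theta_0|X) > L(\theta|X)$$

if and only if

$$\frac{1}{n}\sum_{i=1}^n \ln\frac{f(X_i|\theta_0)}{f(X_i|\theta)} > 0.$$

By the strong law of large numbers, this average converges to

$$E_{\theta_0}\left[\ln\frac{f(X_1|\theta_0)}{f(X_1|\theta)}\right],$$

which, by Jensen's inequality, is greater than 0 whenever $\theta$ and $\theta_0$ give different distributions. From this, we obtain

$$\hat\theta(X) \to \theta_0 \quad \text{as } n \to \infty.$$

We call this property of the estimator consistency.

2.2 Asymptotic normality and efficiency

Under some assumptions that are meant to ensure some regularity, a central limit theorem holds. Here we have that

$$\sqrt{n}\,(\hat\theta(X) - \theta_0)$$

converges in distribution as $n \to \infty$ to a normal random variable with mean 0 and variance $1/I(\theta_0)$, the reciprocal of the Fisher information for one observation. Thus

$$\mathrm{Var}_{\theta_0}(\hat\theta(X)) \approx \frac{1}{n I(\theta_0)},$$

the lowest possible under the Cramér-Rao lower bound. This property is called asymptotic efficiency.

2.3 Properties of the log likelihood surface

For large sample sizes, the variance of an MLE of a single unknown parameter is approximately the reciprocal of the Fisher information, here computed from the likelihood of the whole sample,

$$I(\theta) = -E\left[\frac{\partial^2}{\partial\theta^2}\ln L(\theta|X)\right].$$

Thus, the estimate of the variance given data $x$ is

$$\hat\sigma^2 = -1\Big/\frac{\partial^2}{\partial\theta^2}\ln L(\hat\theta|x),$$

the negative reciprocal of the second derivative, also known as the curvature, of the log-likelihood function evaluated at the MLE. If the curvature is small, then the likelihood surface is flat around its maximum value (the MLE). If the curvature is large and thus the variance is small, the likelihood is strongly curved at the maximum.

For a multidimensional parameter space $\theta = (\theta_1, \theta_2, \ldots, \theta_d)$, the Fisher information $I(\theta)$ is a matrix whose $ij$-th entry is

$$I(\theta_i,\theta_j) = E_\theta\left[\frac{\partial}{\partial\theta_i}\ln L(\theta|X)\,\frac{\partial}{\partial\theta_j}\ln L(\theta|X)\right] = -E_\theta\left[\frac{\partial^2}{\partial\theta_i\,\partial\theta_j}\ln L(\theta|X)\right].$$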
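The link between curvature and variance can be checked on Bernoulli data, where the Fisher information for one observation is $1/(\theta(1-\theta))$ and so the approximate variance of the MLE is $\hat\theta(1-\hat\theta)/n$. The R sketch below approximates the second derivative of the log-likelihood by a central finite difference; the data (16 successes in 40 trials), the helper name `loglik`, and the step size `h` are assumptions made for illustration.

```r
# Bernoulli log-likelihood for s successes in n trials (illustrative helper).
loglik <- function(theta, s, n) {
  s * log(theta) + (n - s) * log(1 - theta)
}

n <- 40
s <- 16
theta_hat <- s / n   # closed-form MLE, the sample mean

# Central finite-difference approximation to the second derivative at the MLE.
h <- 1e-4
curvature <- (loglik(theta_hat + h, s, n) - 2 * loglik(theta_hat, s, n) +
              loglik(theta_hat - h, s, n)) / h^2

# Variance estimate: negative reciprocal of the curvature at the MLE.
var_from_curvature <- -1 / curvature

# Closed form from the Fisher information of n Bernoulli observations.
var_closed_form <- theta_hat * (1 - theta_hat) / n

print(c(curvature = curvature,
        var_from_curvature = var_from_curvature,
        var_closed_form = var_closed_form))
```

The two variance estimates should agree closely, which is the one-parameter case of the statement that the curvature at the maximum determines the approximate variance.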

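Example 6. To obtain the maximum likelihood estimate for the gamma family of random variables, write

$$L(\alpha,\beta|x) = \left(\frac{\beta^\alpha}{\Gamma(\alpha)}x_1^{\alpha-1}e^{-\beta x_1}\right) \cdots \left(\frac{\beta^\alpha}{\Gamma(\alpha)}x_n^{\alpha-1}e^{-\beta x_n}\right),$$

$$\ln L(\alpha,\beta|x) = n(\alpha\ln\beta - \ln\Gamma(\alpha)) + (\alpha-1)\sum_{i=1}^n \ln x_i - \beta\sum_{i=1}^n x_i.$$

To determine the parameters that maximize the likelihood, solve the equations

$$\frac{\partial}{\partial\alpha}\ln L(\hat\alpha,\hat\beta|x) = n\left(\ln\hat\beta - \frac{d}{d\alpha}\ln\Gamma(\hat\alpha)\right) + \sum_{i=1}^n \ln x_i = 0, \qquad \overline{\ln x} = \frac{d}{d\alpha}\ln\Gamma(\hat\alpha) - \ln\hat\beta,$$

and

$$\frac{\partial}{\partial\beta}\ln L(\hat\alpha,\hat\beta|x) = n\frac{\hat\alpha}{\hat\beta} - \sum_{i=1}^n x_i = 0, \qquad \bar{x} = \frac{\hat\alpha}{\hat\beta}.$$

To compute the Fisher information matrix, note that

$$I(\alpha,\beta)_{11} = -\frac{\partial^2}{\partial\alpha^2}\ln L(\alpha,\beta|x) = n\frac{d^2}{d\alpha^2}\ln\Gamma(\alpha), \qquad I(\alpha,\beta)_{22} = -\frac{\partial^2}{\partial\beta^2}\ln L(\alpha,\beta|x) = n\frac{\alpha}{\beta^2},$$

$$I(\alpha,\beta)_{12} = -\frac{\partial^2}{\partial\alpha\,\partial\beta}\ln L(\alpha,\beta|x) = -\frac{n}{\beta}.$$

This gives a Fisher information matrix

$$I(\alpha,\beta) = n\begin{pmatrix} \frac{d^2}{d\alpha^2}\ln\Gamma(\alpha) & -\frac{1}{\beta} \\ -\frac{1}{\beta} & \frac{\alpha}{\beta^2} \end{pmatrix}.$$

The inverse is

$$I(\alpha,\beta)^{-1} = \frac{1}{n\left(\alpha\frac{d^2}{d\alpha^2}\ln\Gamma(\alpha) - 1\right)}\begin{pmatrix} \alpha & \beta \\ \beta & \beta^2\frac{d^2}{d\alpha^2}\ln\Gamma(\alpha) \end{pmatrix}.$$

For the example for the distribution of fitness effects, $\alpha = 0.23$, $\beta = 5.35$, and $n = 100$. The second derivative of $\ln\Gamma$ at 0.23 is 20.12804, and

$$I(0.23, 5.35)^{-1} = \frac{1}{100\,(0.23 \cdot 20.12804 - 1)}\begin{pmatrix} 0.23 & 5.35 \\ 5.35 & 5.35^2 \cdot 20.12804 \end{pmatrix} \approx \begin{pmatrix} 0.000634 & 0.01474 \\ 0.01474 & 1.587 \end{pmatrix}.$$

Thus

$$\mathrm{Var}_{(0.23,5.35)}(\hat\alpha) \approx 0.000634, \qquad \mathrm{Var}_{(0.23,5.35)}(\hat\beta) \approx 1.587.$$

Compare these to the empirical values found for the method of moments estimators.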
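The inverse of the gamma Fisher information matrix can be checked numerically. The R sketch below takes $\alpha = 0.23$, $\beta = 5.35$, and $n = 100$ as the values for the fitness effects example, builds the information matrix with the base R function `trigamma` (the second derivative of $\ln\Gamma$), inverts it with `solve`, and compares the result with the closed-form inverse. The variable names are chosen for this illustration only.

```r
alpha <- 0.23   # shape parameter
beta <- 5.35    # rate parameter
n <- 100        # sample size

# Fisher information matrix for a sample of size n; trigamma() is the
# second derivative of log Gamma.
info <- n * matrix(c(trigamma(alpha), -1 / beta,
                     -1 / beta,       alpha / beta^2),
                   nrow = 2, byrow = TRUE)

# Numerical inverse: its diagonal holds the approximate variances of the MLEs.
info_inv <- solve(info)

# Closed-form inverse, for comparison with the numerical one.
closed_form <- matrix(c(alpha, beta,
                        beta,  beta^2 * trigamma(alpha)),
                      nrow = 2, byrow = TRUE) /
  (n * (alpha * trigamma(alpha) - 1))

print(info_inv)
print(closed_form)
```

The diagonal entries of `info_inv` are the large-sample approximations to the variances of $\hat\alpha$ and $\hat\beta$; the off-diagonal entry approximates their covariance.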