ECON 5350 Class Notes
Maximum Likelihood Estimation

1 Maximum Likelihood Estimation

Example #1. Consider the random sample {X_1 = 0.5, X_2 = 2.0, X_3 = 10.0, X_4 = 1.5, X_5 = 7.0} generated from an exponential distribution. What is the maximum likelihood (ML) estimator of $\beta$?

Answer. Begin by forming the likelihood function, $L(\theta)$:
$$L = f(x_1, x_2, x_3, x_4, x_5; \beta) = \prod_{i=1}^{5} f(x_i) = \prod_{i=1}^{5} \frac{1}{\beta} \exp(-x_i/\beta) = \frac{1}{\beta^5} \exp\left(-\sum_{i=1}^{5} x_i/\beta\right),$$
where $\theta = 1/\beta$. It is often more convenient to work with the monotonic (log) transformation:
$$\ln L(\theta) = \ln(\theta^5) - \theta(x_1 + x_2 + x_3 + x_4 + x_5) = 5\ln(\theta) - 21\theta.$$
The ML estimator of $\theta$, $\hat{\theta}$, is the value of $\theta$ that maximizes $L(\theta)$ or $\ln L(\theta)$. Now we calculate $\hat{\theta}$:
$$\frac{d \ln L(\theta)}{d\theta} = \frac{5}{\theta} - 21 = 0 \;\Longrightarrow\; \hat{\theta} = 5/21 \;\Longrightarrow\; \hat{\beta} = 4.2.$$
Next, we check the second-order condition to ensure that $\hat{\theta} = 5/21$ is indeed a maximum:
$$\frac{d^2 \ln L(\theta)}{d\theta^2} = -5\theta^{-2} < 0.$$
Therefore, $\hat{\beta} = 4.2$ is the maximum likelihood estimator of $E(X) = \beta$.

Notes:

1. The information number is $I(\theta) = -E\left[\frac{\partial^2 \ln L(\theta)}{\partial \theta^2}\right] = E\left[\left(\frac{\partial \ln L(\theta)}{\partial \theta}\right)^2\right]$.

2. The information matrix is $I(\theta) = -E\left[\frac{\partial^2 \ln L(\theta)}{\partial \theta\, \partial \theta'}\right] = E\left[\frac{\partial \ln L(\theta)}{\partial \theta}\, \frac{\partial \ln L(\theta)}{\partial \theta'}\right]$, where $\theta = (\theta_1, \ldots, \theta_k)'$ is a $(k \times 1)$ column vector.

3. The Cramer-Rao lower bound, $I(\theta)^{-1}$, is the lowest value the variance of an unbiased estimator $\hat{\theta}$ can attain, given that certain regularity conditions are satisfied.
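The closed-form result above can be checked numerically. Below is a minimal sketch (not part of the original notes) that maximizes $\ln L(\theta) = 5\ln\theta - 21\theta$ with SciPy; the variable names are illustrative.

```python
import numpy as np
from scipy.optimize import minimize_scalar

x = np.array([0.5, 2.0, 10.0, 1.5, 7.0])      # sample from Example #1

def neg_log_lik(theta):
    # ln L(theta) = n*ln(theta) - theta*sum(x_i); minimize the negative
    return -(len(x) * np.log(theta) - theta * x.sum())

theta_hat = minimize_scalar(neg_log_lik, bounds=(1e-6, 10), method="bounded").x
print(theta_hat, 5 / 21)      # numerical and analytic theta_hat, both about 0.238
print(1 / theta_hat)          # beta_hat = sample mean = 4.2
```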
Example #2. Find the ML estimators for $\mu$ and $\sigma^2$ from a normal distribution. Let $X_1, \ldots, X_n$ be a random sample from $N(\mu, \sigma^2)$. Then
$$L(\mu, \sigma^2) = \prod_{i=1}^{n} \left[(2\pi\sigma^2)^{-0.5} \exp\left(-\frac{1}{2\sigma^2}(x_i - \mu)^2\right)\right].$$
Taking natural logs:
$$\ln L(\mu, \sigma^2) = -0.5\, n \ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2.$$
First take partial derivatives with respect to $\mu$ and $\sigma^2$:
$$\frac{\partial \ln L}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu); \qquad \frac{\partial \ln L}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i - \mu)^2$$
$$\frac{\partial^2 \ln L}{\partial \mu^2} = -\frac{n}{\sigma^2}; \qquad \frac{\partial^2 \ln L}{\partial \mu\, \partial \sigma^2} = -\frac{1}{\sigma^4}\sum_{i=1}^{n}(x_i - \mu); \qquad \frac{\partial^2 \ln L}{\partial (\sigma^2)^2} = \frac{n}{2\sigma^4} - \frac{1}{\sigma^6}\sum_{i=1}^{n}(x_i - \mu)^2.$$
Now set the first derivatives equal to zero and solve for the ML estimators:
$$\frac{\partial \ln L}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu) = 0 \;\Longrightarrow\; \hat{\mu} = \bar{X}$$
$$\frac{\partial \ln L}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i - \mu)^2 = 0 \;\Longrightarrow\; \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{X})^2.$$

Cramer-Rao Lower Bound

Let $\theta = (\mu, \sigma^2)'$. The information matrix is
$$I(\theta) = -E\begin{pmatrix} -\frac{n}{\sigma^2} & -\frac{1}{\sigma^4}\sum(x_i - \mu) \\ -\frac{1}{\sigma^4}\sum(x_i - \mu) & \frac{n}{2\sigma^4} - \frac{1}{\sigma^6}\sum(x_i - \mu)^2 \end{pmatrix} = \begin{pmatrix} \frac{n}{\sigma^2} & 0 \\ 0 & \frac{n}{2\sigma^4} \end{pmatrix}$$
and the CRLB is
$$I(\theta)^{-1} = \begin{pmatrix} \frac{\sigma^2}{n} & 0 \\ 0 & \frac{2\sigma^4}{n} \end{pmatrix}.$$

Question. Are $\bar{X}$, $s^2$ and $\hat{\sigma}^2$ efficient estimators?

Answer. Recall that $E(\bar{X}) = \mu$, $E(s^2) = \sigma^2$ and $E(\hat{\sigma}^2) = \frac{n-1}{n}\sigma^2$.

- $\mathrm{var}(\bar{X}) = \sigma^2/n$, which attains the CRLB $\Longrightarrow$ $\bar{X}$ is a minimum variance (linear) unbiased estimator.

- $\mathrm{var}(s^2) = 2\sigma^4/(n-1)$, which exceeds the CRLB $\Longrightarrow$ $s^2$ is unbiased, but the bound alone does not establish whether it is efficient.

- $\hat{\sigma}^2 \xrightarrow{p} \sigma^2$ and $\mathrm{asy.var}(\hat{\sigma}^2) = 2\sigma^4/n$ $\Longrightarrow$ $\hat{\sigma}^2$ is asymptotically efficient.
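As a quick numerical check of these closed-form estimators (not part of the original notes; the simulated sample and parameter values are purely illustrative), one can maximize the normal log-likelihood directly and compare:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=500)   # illustrative N(2, 9) sample
n = len(x)

def neg_log_lik(params):
    mu, sigma2 = params
    # -ln L = 0.5*n*ln(2*pi*sigma2) + sum((x_i - mu)^2) / (2*sigma2)
    return 0.5 * n * np.log(2 * np.pi * sigma2) + ((x - mu) ** 2).sum() / (2 * sigma2)

res = minimize(neg_log_lik, x0=[0.0, 1.0], bounds=[(None, None), (1e-8, None)])
mu_hat, sigma2_hat = res.x
print(mu_hat, x.mean())                          # numerical MLE vs. sample mean
print(sigma2_hat, ((x - x.mean()) ** 2).mean())  # numerical MLE vs. (1/n)*sum of squares
```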
Properties of ML Estimators (under regularity conditions; Greene, p. 515).

1. Consistency: $\hat{\theta}_{ML} \xrightarrow{p} \theta$.

2. Asymptotic normality: $\hat{\theta}_{ML} \overset{asy}{\sim} N\!\left(\theta, I^{-1}(\theta)\right)$.

3. $\hat{\theta}_{ML}$ achieves the CRLB and is therefore asymptotically efficient.

4. Invariance (i.e., $\gamma = g(\theta) \Longrightarrow \hat{\gamma}_{ML} = g(\hat{\theta}_{ML})$).

Notes: The asymptotic covariance matrix of $\hat{\theta}_{ML}$ is often hard or impossible to estimate. Three possible (asymptotically equivalent) estimators are:

1. $I^{-1}(\hat{\theta}_{ML})$, which is often not feasible.

2. $\left(-\frac{\partial^2 \ln L(\hat{\theta})}{\partial \hat{\theta}\, \partial \hat{\theta}'}\right)^{-1}$, the inverse of the negative Hessian, which is sometimes quite complicated.

3. The BHHH estimator: $\left(\sum_{i=1}^{n} \frac{\partial \ln f(x_i, \hat{\theta})}{\partial \hat{\theta}}\, \frac{\partial \ln f(x_i, \hat{\theta})}{\partial \hat{\theta}'}\right)^{-1}$.
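To make these three estimators concrete, here is a minimal sketch (not part of the original notes) for the exponential model of Example #1, where the score is $\partial \ln f(x_i, \theta)/\partial \theta = 1/\theta - x_i$ and the Hessian of the log likelihood is $-n/\theta^2$; the simulated sample is illustrative. For this model the first two estimators coincide exactly, and BHHH differs only through sampling variation.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=4.2, size=200)   # illustrative exponential sample
n, theta_hat = len(x), 1 / x.mean()        # ML estimate: theta_hat = 1/xbar

# 1. Inverse information: I(theta) = n/theta^2, so I^{-1}(theta_hat) = theta_hat^2/n
v_info = theta_hat ** 2 / n

# 2. Inverse negative Hessian: -d^2 lnL/dtheta^2 = n/theta^2, same value at theta_hat
v_hess = theta_hat ** 2 / n

# 3. BHHH: inverse of the sum of squared scores, d ln f(x_i)/dtheta = 1/theta - x_i
scores = 1 / theta_hat - x
v_bhhh = 1 / np.sum(scores ** 2)

print(v_info, v_hess, v_bhhh)   # three asymptotically equivalent estimates of var(theta_hat)
```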
2 Likelihood Ratio, Wald and Lagrange Multiplier Tests

The likelihood ratio (LR), Wald (W) and Lagrange multiplier (LM) tests are asymptotically equivalent tests that may produce different results in small samples. When no other information exists, you can choose the test that is the easiest to compute. See the attached figure for a graphical representation of each test.

2.1 Likelihood Ratio Test

Let $\hat{\theta}_R$ ($\hat{\theta}_U$) and $\hat{L}_R$ ($\hat{L}_U$) be the restricted (unrestricted) estimate and likelihood value, respectively. Let the null and alternative hypotheses be
$$H_0: c(\theta) = q \qquad H_1: c(\theta) \neq q.$$
The likelihood ratio is defined as $\lambda = \hat{L}_R/\hat{L}_U$, where $0 \leq \lambda \leq 1$. The LR statistic is then
$$LR = -2 \ln \lambda \overset{asy}{\sim} \chi^2(r)$$
where $r$ is the number of restrictions imposed.

2.2 Wald Test

In the LR test, one needs to calculate both $\hat{L}_U$ and $\hat{L}_R$. An advantage of the Wald test is that $\hat{\theta}_R$ does not need to be calculated. The Wald statistic is
$$W = (c(\hat{\theta}_U) - q)' \left[\mathrm{var}(c(\hat{\theta}_U) - q)\right]^{-1} (c(\hat{\theta}_U) - q) \overset{asy}{\sim} \chi^2(r).$$
If $c(\hat{\theta})$ is normally distributed, then $W$ is a quadratic form in a normal vector and is distributed chi-square for all sample sizes.

2.3 Lagrange Multiplier Test

This test is based on the restricted model.

Derivation. Begin by forming the Lagrangian:
$$\ln L^*(\theta) = \ln L(\theta) + \lambda'(c(\theta) - q).$$
The first-order conditions are
$$\frac{\partial \ln L^*}{\partial \theta} = \frac{\partial \ln L}{\partial \theta} + \left(\frac{\partial c(\theta)}{\partial \theta}\right)' \lambda = 0$$
$$\frac{\partial \ln L^*}{\partial \lambda} = c(\theta) - q = 0.$$
At $\hat{\theta}_R$,
$$\frac{\partial \ln L(\hat{\theta}_R)}{\partial \hat{\theta}_R} = -\left(\frac{\partial c(\hat{\theta}_R)}{\partial \hat{\theta}_R}\right)' \hat{\lambda} = \hat{g}_R.$$
If $H_0: c(\theta) = q$ is correct, $\hat{g}_R$ should be close to zero in large samples. This fact is used as motivation for
$$LM = \hat{g}_R'\, I^{-1}(\hat{\theta}_R)\, \hat{g}_R \overset{asy}{\sim} \chi^2(r).$$

2.3.1 An Example Using the LR, W and LM Tests

Consider an artificial random sample ($n = 100$) from an exponential($\beta = 0.1$) distribution. The log likelihood function is
$$\ln L(\theta) = n \ln(\theta) - \theta \sum_{i=1}^{n} x_i$$
where $\theta = 1/\beta$. The first-order condition and unrestricted ML estimator are
$$\frac{\partial \ln L}{\partial \theta} = \frac{n}{\theta} - \sum_{i=1}^{n} x_i = 0 \;\Longrightarrow\; \hat{\theta}_U = \bar{X}^{-1}.$$
The second-order condition is
$$\frac{\partial^2 \ln L(\theta)}{\partial \theta^2} = -\frac{n}{\theta^2} < 0,$$
so $\hat{\theta}_U$ is indeed a maximum. Now consider testing the following hypothesis:
$$H_0: \theta = 7.5 \qquad H_1: \theta \neq 7.5,$$
so that $\hat{\theta}_R = 7.5$.

1. Likelihood Ratio Test. The likelihood values are
$$\hat{L}_U = \hat{\theta}_U^{100} \exp\left(-\hat{\theta}_U \textstyle\sum_i x_i\right) \qquad \hat{L}_R = \hat{\theta}_R^{100} \exp\left(-\hat{\theta}_R \textstyle\sum_i x_i\right)$$
and the LR statistic is $LR = -2\ln(\hat{L}_R/\hat{L}_U)$.

2. Wald Test. The Wald statistic is
$$W = \frac{(\hat{\theta}_U - 7.5)^2}{\mathrm{var}(\hat{\theta}_U - 7.5)} = \frac{(\hat{\theta}_U - 7.5)^2}{\mathrm{var}(\hat{\theta}_U)}$$
where $\mathrm{var}(\hat{\theta}_U) = \hat{I}^{-1}(\hat{\theta}_U) = \left(-\frac{\partial^2 \ln L(\hat{\theta}_U)}{\partial \hat{\theta}_U^2}\right)^{-1} = \hat{\theta}_U^2/n$.

3. Lagrange Multiplier Test. The LM statistic is
$$LM = \frac{\hat{g}_R^2}{I(\hat{\theta}_R)}$$
where $\hat{g}_R = n/\hat{\theta}_R - \sum_i x_i$ and $I(\hat{\theta}_R) = n/\hat{\theta}_R^2$.
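The three statistics in this example are easy to compute directly. The following is a minimal simulation sketch (not part of the original notes; the random seed and draws are illustrative), testing $H_0: \theta = 7.5$ when the true value is $\theta = 1/\beta = 10$:

```python
import numpy as np

rng = np.random.default_rng(42)
n, beta = 100, 0.1
x = rng.exponential(scale=beta, size=n)   # artificial sample; true theta = 1/beta = 10

theta_u = 1 / x.mean()                    # unrestricted MLE
theta_r = 7.5                             # restricted value under H0

def log_lik(theta):
    return n * np.log(theta) - theta * x.sum()

# Likelihood ratio: LR = -2*ln(L_R/L_U) = 2*(lnL_U - lnL_R)
LR = 2 * (log_lik(theta_u) - log_lik(theta_r))

# Wald: W = (theta_U - 7.5)^2 / var(theta_U), with var(theta_U) = theta_U^2/n
W = (theta_u - theta_r) ** 2 / (theta_u ** 2 / n)

# Lagrange multiplier: LM = g_R^2 / I(theta_R), g_R = n/theta_R - sum(x), I = n/theta_R^2
g_r = n / theta_r - x.sum()
LM = g_r ** 2 / (n / theta_r ** 2)

print(LR, W, LM)   # compare each with the chi-square(1) critical value of 3.84
```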
Finally, the critical region is defined by the chi-square critical value with $r = 1$ degree of freedom and a 95% confidence level. Using the chi-square table (inside cover of Greene's text), the critical value is 3.84. Therefore,

- If LR, W or LM is greater than 3.84, we reject the null $H_0: \theta = 7.5$ in favor of the alternative.

- If LR, W or LM is less than or equal to 3.84, we fail to reject the null $H_0: \theta = 7.5$.

3 Maximum Likelihood Estimation: Regression Model with Ω Known

Now consider efficient estimation via maximum likelihood when the errors are normally distributed. The log likelihood function is
$$\ln L(\beta, \sigma^2 \mid Y, X) = -0.5\, n \ln(2\pi) - 0.5 \ln|\sigma^2 \Omega| - 0.5 (Y - X\beta)'(\sigma^2 \Omega)^{-1}(Y - X\beta). \qquad (1)$$
Taking derivatives of (1) with respect to $\beta$ and $\sigma^2$ and setting them equal to zero gives
$$\frac{\partial \ln L}{\partial \beta} = \frac{1}{\sigma^2}\left(X'\Omega^{-1}Y - X'\Omega^{-1}X\beta\right) = 0 \qquad (2)$$
$$\frac{\partial \ln L}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}(Y - X\beta)'\Omega^{-1}(Y - X\beta) = 0. \qquad (3)$$
Solving this set of equations gives
$$\hat{\beta}_{ML} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}Y \qquad \hat{\sigma}^2_{ML} = \frac{(Y - X\hat{\beta}_{ML})'\Omega^{-1}(Y - X\hat{\beta}_{ML})}{n},$$
so that when $\epsilon \sim N(0, \sigma^2\Omega)$, the ML estimator is the GLS estimator.
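Equations (2) and (3) have the closed-form GLS solution shown above, which is easy to verify numerically. The sketch below is not part of the original notes; the design matrix, the diagonal weights used for Ω, and the parameter values are purely illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true, sigma2_true = np.array([1.0, 2.0, -0.5]), 0.5

# Known Omega: diagonal with weights w_i (an assumed, purely illustrative structure)
w = np.exp(rng.uniform(0, 1, size=n))
y = X @ beta_true + rng.normal(size=n) * np.sqrt(sigma2_true * w)

Omega_inv = np.diag(1 / w)
beta_ml = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)  # GLS = ML, from (2)
resid = y - X @ beta_ml
sigma2_ml = (resid @ Omega_inv @ resid) / n                           # from (3); divides by n

print(beta_ml, sigma2_ml)
```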
4 Maximum Likelihood Estimation: Regression Model with Ω Unknown

Consider maximization of (1) by choosing $\Omega$, as well as $\beta$ and $\sigma^2$. The problem with treating $\Omega$ as a free parameter is that it includes $n(n+1)/2$ unknown elements, while there are only $n$ data points. This can also be seen by taking the first-order condition with respect to $\Omega^{-1}$, setting it equal to zero and solving:
$$\frac{\partial \ln L}{\partial \Omega^{-1}} = 0.5\left(\Omega - \frac{1}{\sigma^2}\, e e'\right) = 0,$$
which implies that $\hat{\Omega}_{ML} = e e'/\hat{\sigma}^2_{ML}$, a singular (rank one) matrix that cannot be used in the GLS formula.

The obvious solution is to parameterize $\Omega$ with a smaller number of parameters $\theta$, i.e., $\Omega(\theta)$. Then we would instead take the derivative of $\ln L$ with respect to $\theta$, set it equal to zero, and solve jointly with (2) and (3). This is a nonlinear optimization problem, for which the search methods outlined earlier could be applied.

Alternatively, an (iterative) two-step procedure, credited to Oberhofer and Kmenta, is possible:

1. Find a consistent estimate of $\theta$ and use it to calculate $\hat{\beta}_{FGLS}$ and $\hat{\sigma}^2_{FGLS}$.

2. Reestimate $\theta$ using $\hat{\beta}_{FGLS}$, $\hat{\sigma}^2_{FGLS}$ and the equation $\partial \ln L/\partial \theta = 0$.

This procedure is asymptotically efficient at step #2 (and all subsequent iterations) and, under fairly innocuous conditions, can be shown to converge to the ML estimator. Further iterations of steps #1 and #2, while providing no asymptotic benefits, may produce better results in smaller samples.
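As a concrete illustration of the two-step idea (not part of the original notes), suppose $\Omega(\theta)$ is diagonal with elements $\exp(\theta z_i)$ for an observed variable $z_i$; this parameterization, the data-generating values, and the starting value $\theta = 0$ (plain OLS) are all assumptions made for the sketch. Step 2 is carried out by numerically solving $\partial \ln L/\partial \theta = 0$ at the current $\hat{\beta}_{FGLS}$ and $\hat{\sigma}^2_{FGLS}$, and the two steps are then iterated a few times.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
n = 300
z = rng.uniform(0, 2, size=n)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true, theta_true, sigma2_true = np.array([1.0, 2.0]), 1.0, 0.5
y = X @ beta_true + rng.normal(size=n) * np.sqrt(sigma2_true * np.exp(theta_true * z))

def fgls(theta):
    """Step 1: GLS estimates of beta and sigma^2 given Omega(theta) = diag(exp(theta*z))."""
    w_inv = np.exp(-theta * z)                              # diagonal of Omega(theta)^{-1}
    beta = np.linalg.solve(X.T @ (w_inv[:, None] * X), X.T @ (w_inv * y))
    e = y - X @ beta
    sigma2 = (w_inv * e ** 2).sum() / n
    return beta, sigma2

def neg_log_lik_theta(theta, beta, sigma2):
    """Negative log likelihood as a function of theta, holding beta and sigma^2 fixed."""
    e = y - X @ beta
    return 0.5 * (n * np.log(sigma2) + theta * z.sum()
                  + (np.exp(-theta * z) * e ** 2).sum() / sigma2)

theta_hat = 0.0                                             # starting value: theta = 0 is plain OLS
for _ in range(5):                                          # iterate steps 1 and 2
    beta_hat, sigma2_hat = fgls(theta_hat)                  # step 1
    theta_hat = minimize_scalar(lambda t: neg_log_lik_theta(t, beta_hat, sigma2_hat),
                                bounds=(-5, 5), method="bounded").x   # step 2

print(beta_hat, sigma2_hat, theta_hat)
```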