ECON 5350 Class Notes Maximum Likelihood Estimation

1 Maximum Likelihood Estimation

Example #1. Consider the random sample {X_1 = 0.5, X_2 = 2.0, X_3 = 10.0, X_4 = 1.5, X_5 = 7.0} generated from an exponential distribution. What is the maximum likelihood (ML) estimator of \beta?

Answer. Begin by forming the likelihood function:

\[ L = f(x_1, x_2, x_3, x_4, x_5; \beta) = \prod_{i=1}^{5} f(x_i) = \prod_{i=1}^{5} \frac{1}{\beta} \exp(-x_i/\beta) = \frac{1}{\beta^5} \exp\Big(-\sum_{i=1}^{5} x_i/\beta\Big). \]

It is often more convenient to work with the monotonic transformation \ln L(\theta), where \theta = 1/\beta:

\[ \ln L(\theta) = \ln(\theta^5) - \theta(x_1 + x_2 + x_3 + x_4 + x_5) = 5\ln(\theta) - 21\theta. \]

The ML estimator of \theta, \hat{\theta}, is the value of \theta that maximizes L(\theta) or \ln L(\theta). Now we calculate \hat{\theta}:

\[ \frac{d \ln L(\theta)}{d\theta} = \frac{5}{\theta} - 21 = 0 \implies \hat{\theta} = 5/21 \implies \hat{\beta} = 21/5 = 4.2. \]

Next, we check the second-order condition to ensure that \hat{\theta} = 5/21 is indeed a maximum:

\[ \frac{d^2 \ln L(\theta)}{d\theta^2} = -5\theta^{-2} < 0. \]

Therefore, \hat{\beta} = 4.2 is the maximum likelihood estimator of E(X) = \beta.

Notes:

1. The information number is I(\theta) = -E\big[\partial^2 \ln L(\theta)/\partial \theta^2\big] = E\big[\big(\partial \ln L(\theta)/\partial \theta\big)^2\big].

2. The information matrix is I(\theta) = -E\big[\partial^2 \ln L(\theta)/\partial \theta \, \partial \theta'\big] = E\big[\big(\partial \ln L(\theta)/\partial \theta\big)\big(\partial \ln L(\theta)/\partial \theta'\big)\big], where \theta = (\theta_1, \ldots, \theta_k)' is a (k x 1) column vector.

3. The Cramer-Rao lower bound, I(\theta)^{-1}, is the lowest value the variance of an unbiased estimator \hat{\theta} can attain, given that certain regularity conditions are satisfied.
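The closed-form result above can be verified numerically. The following is a small sketch (not part of the original notes) that recomputes \hat{\theta} = 5/21 by minimizing the negative log likelihood with scipy; the variable names are illustrative only.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# The five observations from Example #1
x = np.array([0.5, 2.0, 10.0, 1.5, 7.0])
n = len(x)

# Closed-form ML estimates: theta_hat = n / sum(x_i), beta_hat = sample mean
theta_hat = n / x.sum()        # 5/21
beta_hat = 1.0 / theta_hat     # 21/5 = 4.2

# Numerical check: minimize the negative log likelihood -[n ln(theta) - theta * sum(x_i)]
neg_loglik = lambda t: -(n * np.log(t) - t * x.sum())
res = minimize_scalar(neg_loglik, bounds=(1e-6, 10.0), method="bounded")

print(theta_hat, beta_hat, res.x)
```

The numerical minimizer lands on the same \hat{\theta} as the analytic first-order condition, which is a useful sanity check before moving to models without closed forms.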

Example #2. Find the ML estimators for \mu and \sigma^2 from a normal distribution. Let X_1, \ldots, X_n be a random sample from N(\mu, \sigma^2).

\[ L(\mu, \sigma^2) = \prod_{i=1}^{n} \Big[ (2\pi\sigma^2)^{-0.5} \exp\Big(-\frac{1}{2\sigma^2}(x_i - \mu)^2\Big) \Big]. \]

Taking natural logs:

\[ \ln L(\mu, \sigma^2) = -\frac{n}{2} \ln(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2. \]

First take partial derivatives with respect to \mu and \sigma^2:

\[ \frac{\partial \ln L}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu); \qquad \frac{\partial \ln L}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (x_i - \mu)^2. \]

The second partial derivatives are

\[ \frac{\partial^2 \ln L}{\partial \mu^2} = -\frac{n}{\sigma^2}; \qquad \frac{\partial^2 \ln L}{\partial \mu \, \partial \sigma^2} = -\frac{1}{\sigma^4} \sum_{i=1}^{n} (x_i - \mu); \qquad \frac{\partial^2 \ln L}{\partial (\sigma^2)^2} = \frac{n}{2\sigma^4} - \frac{1}{\sigma^6} \sum_{i=1}^{n} (x_i - \mu)^2. \]

Now set the first derivatives equal to zero and solve for the ML estimators:

\[ \frac{\partial \ln L}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu) = 0 \implies \hat{\mu} = \bar{X} \]

\[ \frac{\partial \ln L}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (x_i - \mu)^2 = 0 \implies \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{X})^2. \]

Cramer-Rao Lower Bound. Let \theta = (\mu, \sigma^2)'. The information matrix is

\[ I(\theta) = -E \begin{pmatrix} -\frac{n}{\sigma^2} & -\frac{1}{\sigma^4} \sum (x_i - \mu) \\ -\frac{1}{\sigma^4} \sum (x_i - \mu) & \frac{n}{2\sigma^4} - \frac{1}{\sigma^6} \sum (x_i - \mu)^2 \end{pmatrix} = \begin{pmatrix} \frac{n}{\sigma^2} & 0 \\ 0 & \frac{n}{2\sigma^4} \end{pmatrix} \]

and the CRLB is

\[ I(\theta)^{-1} = \begin{pmatrix} \frac{\sigma^2}{n} & 0 \\ 0 & \frac{2\sigma^4}{n} \end{pmatrix}. \]

Question. Are \bar{X}, s^2 and \hat{\sigma}^2 efficient estimators?

Answer. Recall that E(\bar{X}) = \mu, E(s^2) = \sigma^2 and E(\hat{\sigma}^2) = \frac{n-1}{n}\sigma^2.

- var(\bar{X}) = \sigma^2/n \implies \bar{X} attains the CRLB and is a minimum variance unbiased estimator.
- var(s^2) = 2\sigma^4/(n-1) > 2\sigma^4/n \implies s^2 is unbiased but does not attain the CRLB.
- \hat{\sigma}^2 \to_p \sigma^2 and asy. var(\hat{\sigma}^2) = 2\sigma^4/n \implies \hat{\sigma}^2 is asymptotically efficient.
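As a quick illustration of these formulas (a sketch with simulated data, not from the original notes), one can compare \hat{\sigma}^2, which divides by n, with the unbiased s^2, which divides by n - 1:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=10_000)  # true mu = 3, sigma^2 = 4
n = len(x)

mu_hat = x.mean()                              # ML estimator of mu
sigma2_hat = ((x - mu_hat) ** 2).sum() / n     # ML estimator, divides by n
s2 = ((x - mu_hat) ** 2).sum() / (n - 1)       # unbiased estimator, divides by n - 1

# sigma2_hat = ((n - 1) / n) * s2, so the ML estimate is always the smaller of the two
print(mu_hat, sigma2_hat, s2)
```

For large n the two variance estimates are nearly identical, which is the finite-sample face of the asymptotic efficiency claim above.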

Properties of ML Estimators (under regularity conditions; Greene p. 515):

1. Consistency: \hat{\theta}_{ML} \to_p \theta.
2. Asymptotic normality: \hat{\theta}_{ML} \sim_a N(\theta, I^{-1}(\theta)).
3. \hat{\theta}_{ML} achieves the CRLB and is therefore asymptotically efficient.
4. Invariance (i.e., \gamma = g(\theta) \implies \hat{\gamma}_{ML} = g(\hat{\theta}_{ML})).

Notes: The asymptotic covariance matrix of \hat{\theta}_{ML} is often hard or impossible to estimate. Three possible (asymptotically equivalent) estimators are:

1. I^{-1}(\hat{\theta}_{ML}), which is often not feasible.
2. \Big( -\frac{\partial^2 \ln L(\hat{\theta})}{\partial \hat{\theta} \, \partial \hat{\theta}'} \Big)^{-1}, the inverse of the negative Hessian, which is sometimes quite complicated.
3. The BHHH estimator: \Big( \sum_{i=1}^{n} \frac{\partial \ln f(x_i, \hat{\theta})}{\partial \hat{\theta}} \frac{\partial \ln f(x_i, \hat{\theta})}{\partial \hat{\theta}'} \Big)^{-1}.

2 Likelihood Ratio, Wald and Lagrange Multiplier Tests

The likelihood ratio (LR), Wald (W) and Lagrange multiplier (LM) tests are asymptotically equivalent tests that may produce different results in small samples. When no other information exists, you can choose the test that is easiest to compute. See the attached figure for a graphical representation of each test.

2.1 Likelihood Ratio Test

Let \hat{\theta}_R (\hat{\theta}_U) and \hat{L}_R (\hat{L}_U) be the restricted (unrestricted) estimate and likelihood value, respectively. Let the null and alternative hypotheses be

H_0: c(\theta) = q
H_1: c(\theta) \neq q.

The likelihood ratio is defined as \lambda = \hat{L}_R / \hat{L}_U, where 0 \leq \lambda \leq 1. The LR statistic is then

\[ LR = -2 \ln \lambda \sim_a \chi^2(r) \]
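For the exponential model of Example #1, the three covariance estimators can be compared directly. This sketch (illustrative, not from the notes) uses the per-observation score \partial \ln f(x_i, \theta)/\partial \theta = 1/\theta - x_i:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.exponential(scale=0.1, size=100)   # true theta = 1/beta = 10
n = len(x)
theta_hat = 1.0 / x.mean()

# 1. Expected information, which is feasible here: I(theta) = n / theta^2
var_info = theta_hat ** 2 / n
# 2. Inverse of the negative Hessian of ln L at theta_hat (the same expression here)
var_hess = 1.0 / (n / theta_hat ** 2)
# 3. BHHH: inverse of the sum of squared scores, s_i = 1/theta_hat - x_i
scores = 1.0 / theta_hat - x
var_bhhh = 1.0 / (scores @ scores)

print(var_info, var_hess, var_bhhh)
```

In this model the expected information and the observed Hessian coincide, while the BHHH estimate differs in finite samples but converges to the same limit.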

where r is the number of restrictions imposed.

2.2 Wald Test

In the LR test, one needs to calculate both \hat{L}_U and \hat{L}_R. An advantage of the Wald test is that \hat{\theta}_R does not need to be calculated. The Wald statistic is

\[ W = (c(\hat{\theta}_U) - q)' \, [\text{var}(c(\hat{\theta}_U) - q)]^{-1} \, (c(\hat{\theta}_U) - q) \sim_a \chi^2(r). \]

If c(\hat{\theta}) is normally distributed, then W is a quadratic form in a normal vector and is distributed chi-square for all sample sizes.

2.3 Lagrange Multiplier Test

This test is based on the restricted model.

Derivation. Begin by forming the Lagrangian:

\[ \ln L^*(\theta) = \ln L(\theta) + \lambda'(c(\theta) - q). \]

The first-order conditions are

\[ \frac{\partial \ln L^*}{\partial \theta} = \frac{\partial \ln L}{\partial \theta} + \Big(\frac{\partial c(\theta)}{\partial \theta}\Big)' \lambda = 0 \]
\[ \frac{\partial \ln L^*}{\partial \lambda} = c(\theta) - q = 0. \]

At \hat{\theta}_R,

\[ \frac{\partial \ln L(\hat{\theta}_R)}{\partial \hat{\theta}_R} = -\Big(\frac{\partial c(\hat{\theta}_R)}{\partial \hat{\theta}_R}\Big)' \hat{\lambda} = \hat{g}_R. \]

If H_0: c(\theta) = q is correct, \hat{g}_R should be close to zero in large samples. This fact is used as motivation for

\[ LM = \hat{g}_R' \, I^{-1}(\hat{\theta}_R) \, \hat{g}_R \sim_a \chi^2(r). \]

2.3.1 An Example Using the LR, W and LM Tests

Consider an artificial random sample (n = 100) from an exponential(\beta = 0.1) distribution. The log likelihood function is

\[ \ln L(\theta) = n \ln(\theta) - \theta \sum_{i=1}^{n} x_i \]

where \theta = 1/\beta. The first-order condition gives the unrestricted ML estimator:

\[ \frac{\partial \ln L}{\partial \theta} = \frac{n}{\theta} - \sum_{i=1}^{n} x_i = 0 \implies \hat{\theta}_U = \bar{X}^{-1}. \]

The second-order condition is

\[ \frac{\partial^2 \ln L(\theta)}{\partial \theta^2} = -\frac{n}{\theta^2} < 0, \]

so \hat{\theta}_U is indeed a maximum. Now consider testing the hypothesis

H_0: \theta = 7.5
H_1: \theta \neq 7.5,

so that \hat{\theta}_R = 7.5.

1. Likelihood Ratio Test. The likelihood values are

\[ \hat{L}_U = \hat{\theta}_U^{100} \exp\Big(-\hat{\theta}_U \sum x_i\Big), \qquad \hat{L}_R = \hat{\theta}_R^{100} \exp\Big(-\hat{\theta}_R \sum x_i\Big), \]

and the LR statistic is LR = -2 \ln(\hat{L}_R / \hat{L}_U).

2. Wald Test. The Wald statistic is

\[ W = \frac{(\hat{\theta}_U - 7.5)^2}{\text{var}(\hat{\theta}_U - 7.5)} = \frac{(\hat{\theta}_U - 7.5)^2}{\text{var}(\hat{\theta}_U)}, \]

where \text{var}(\hat{\theta}_U) = \hat{I}^{-1}(\hat{\theta}_U) = \Big( -\frac{\partial^2 \ln L(\hat{\theta}_U)}{\partial \hat{\theta}_U^2} \Big)^{-1} = \hat{\theta}_U^2 / n.

3. Lagrange Multiplier Test. The LM statistic is

\[ LM = \frac{\hat{g}_R^2}{I(\hat{\theta}_R)}, \]

where \hat{g}_R = \frac{n}{\hat{\theta}_R} - \sum x_i and I(\hat{\theta}_R) = n / \hat{\theta}_R^2.
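The three statistics can be computed side by side. Below is a sketch that simulates an artificial sample like the one described above (the seed and rng calls are illustrative choices, not from the notes) and compares each statistic to the chi-square critical value:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(42)
n, beta = 100, 0.1
x = rng.exponential(scale=beta, size=n)   # true theta = 1/beta = 10

theta_u = 1.0 / x.mean()   # unrestricted MLE
theta_r = 7.5              # restricted value under H0

loglik = lambda t: n * np.log(t) - t * x.sum()

LR = -2.0 * (loglik(theta_r) - loglik(theta_u))              # likelihood ratio
W = (theta_u - theta_r) ** 2 / (theta_u ** 2 / n)            # Wald
g_r = n / theta_r - x.sum()                                  # restricted score
LM = g_r ** 2 / (n / theta_r ** 2)                           # Lagrange multiplier

crit = chi2.ppf(0.95, df=1)   # about 3.84
print(LR, W, LM, crit)
```

Because \hat{\theta}_U maximizes the log likelihood, LR is nonnegative by construction, and W and LM are nonnegative as squared quantities; in small samples the three need not agree on the reject/fail-to-reject decision.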

Finally, the critical region is defined by the chi-square critical value with r = 1 degree of freedom at the 5% significance level. Using the chi-square table (inside cover of Greene's text), the critical value is 3.84. Therefore:

- If LR, W or LM is greater than 3.84, we reject the null H_0: \theta = 7.5 in favor of the alternative.
- If LR, W or LM is less than or equal to 3.84, we fail to reject the null H_0: \theta = 7.5.

3 Maximum Likelihood Estimation: Regression Model with \Omega Known

Now consider efficient estimation via maximum likelihood when the errors are normally distributed. The log likelihood function is

\[ \ln L(\beta, \sigma^2 \mid Y, X) = -\frac{n}{2} \ln(2\pi) - \frac{1}{2} \ln|\sigma^2 \Omega| - \frac{1}{2}(Y - X\beta)'(\sigma^2 \Omega)^{-1}(Y - X\beta). \tag{1} \]

Taking derivatives of (1) with respect to \beta and \sigma^2 and setting them equal to zero gives

\[ \frac{\partial \ln L}{\partial \beta} = \frac{1}{\sigma^2}(X'\Omega^{-1}Y - X'\Omega^{-1}X\beta) = 0 \tag{2} \]
\[ \frac{\partial \ln L}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}(Y^* - X^*\beta)'(Y^* - X^*\beta) = 0, \tag{3} \]

where Y^* = \Omega^{-1/2} Y and X^* = \Omega^{-1/2} X are the transformed data. Solving this set of equations gives

\[ \hat{\beta}_{ML} = (X^{*\prime} X^*)^{-1}(X^{*\prime} Y^*), \qquad \hat{\sigma}^2_{ML} = e^{*\prime} e^* / n, \]

so that when \epsilon \sim N(0, \sigma^2 \Omega), the ML estimator is the GLS estimator.

4 Maximum Likelihood Estimation: Regression Model with \Omega Unknown

Now consider maximization of (1) by choosing \Omega, as well as \beta and \sigma^2. The problem with treating \Omega as a free parameter is that it includes n(n+1)/2 unknown elements, while there are only n data points. This can
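To see Section 3 in action, the following sketch (with made-up data and variable names; not from the notes) builds a known diagonal \Omega and confirms that the ML/GLS estimator is just OLS on the transformed (starred) data:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, 0.5])

# Known heteroskedasticity: Omega = diag(w_i), errors ~ N(0, sigma^2 * Omega), sigma^2 = 1
w = rng.uniform(0.5, 2.0, size=n)
y = X @ beta_true + rng.normal(size=n) * np.sqrt(w)

# Transform by Omega^(-1/2) and run OLS on the starred data
Xs = X / np.sqrt(w)[:, None]
ys = y / np.sqrt(w)
beta_ml = np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)   # GLS = ML estimator
e_s = ys - Xs @ beta_ml
sigma2_ml = (e_s @ e_s) / n                       # ML divides by n

print(beta_ml, sigma2_ml)
```

Dividing each row by \sqrt{w_i} is exactly the premultiplication by \Omega^{-1/2} in the starred formulas, so no explicit n x n matrix ever needs to be inverted when \Omega is diagonal.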

also be seen by taking the first-order condition with respect to \Omega^{-1}, setting it equal to zero, and solving:

\[ \frac{\partial \ln L}{\partial \Omega^{-1}} = \frac{1}{2}\big(\Omega - \sigma^{-2} e e'\big) = 0, \]

which implies that \hat{\Omega}_{ML} = e e' / \hat{\sigma}^2_{ML}. This is a singular (rank one) matrix and cannot be used in the GLS formula.

The obvious solution is to parameterize \Omega with a smaller number of parameters \theta, i.e., \Omega(\theta). Then we would instead take the derivative of \ln L with respect to \theta, set it equal to zero, and solve jointly with (2) and (3). This is a nonlinear optimization problem, for which the search methods outlined earlier could be applied. Alternatively, an (iterative) two-step procedure, credited to Oberhofer and Kmenta, is possible:

1. Find a consistent estimate of \theta and use it to calculate \hat{\beta}_{FGLS} and \hat{\sigma}^2_{FGLS}.
2. Re-estimate \theta using \hat{\beta}_{FGLS} and \hat{\sigma}^2_{FGLS} and the equation \partial \ln L / \partial \theta = 0.

This procedure is asymptotically efficient at step #2 (and all subsequent iterations) and, under fairly innocuous conditions, can be shown to converge to the ML estimator. Further iterations of steps #1 and #2, while providing no asymptotic benefits, may produce better results in smaller samples.
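A minimal sketch of the Oberhofer-Kmenta idea, under an assumed multiplicative-heteroskedasticity parameterization var(\epsilon_i) = exp(\gamma z_i) (the parameterization, data, and variable names are my own illustration, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
z = rng.uniform(0.0, 1.0, size=n)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true, gamma_true = np.array([1.0, 0.5]), 1.0

# Omega(theta): var(eps_i) = exp(gamma * z_i)
y = X @ beta_true + rng.normal(size=n) * np.exp(0.5 * gamma_true * z)

beta = np.linalg.solve(X.T @ X, X.T @ y)   # initial consistent (OLS) estimate
for _ in range(5):                         # iterate the two steps
    e = y - X @ beta
    # Step 1: consistent estimate of gamma from a regression of ln(e_i^2) on z_i
    Z = np.column_stack([np.ones(n), z])
    gamma = np.linalg.lstsq(Z, np.log(e ** 2 + 1e-300), rcond=None)[0][1]
    # Step 2: FGLS using Omega(gamma)
    w = np.exp(gamma * z)
    Xs, ys = X / np.sqrt(w)[:, None], y / np.sqrt(w)
    beta = np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)

print(beta, gamma)
```

The log-squared-residual regression gives a consistent slope for \gamma (its intercept is biased, but that only rescales the weights and drops out of the FGLS fit), so after the first pass the \beta estimate is already asymptotically efficient; the extra loop iterations only help in small samples.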