The method of Maximum Likelihood.


Maximum Likelihood

The method of Maximum Likelihood. In developing the least squares estimator there is no mention of probabilities: we simply minimize the distance between the predicted linear regression and the observed data. To say anything about the distribution of the OLS estimator we need to assume normality, or appeal to large sample results. Maximum Likelihood starts at the opposite end. We make probability assumptions: assume we know the probability distribution that generated the data, and then find the parameters that make the observed data most likely to have been observed.

Maximum Likelihood - evaluation relative to OLS. Benefit: we can consider models beyond the simple linear models used in regression settings. Cost: we need to make stronger assumptions about the distribution of the error term. Given that trade-off, ML lets us handle a much wider range of estimation problems.

Intuition about construction. Setup: y is the data, θ the parameters. Likelihood function L(y, θ): how likely we are to have observed y, viewed as a function of the parameters. In the applications we are going to look at, the observations will be independent, and we can write the likelihood function as

L(y, \theta) = \prod_{t=1}^{T} L_t(y_t, \theta)

where y_t is observation number t, and L_t(y_t, θ) is the probability distribution of y_t.

As a rule we can work with the log of the likelihood function instead of the likelihood function directly. A maximum of one will be a maximum of the other, and the log is typically much easier to find a maximum of. Since

L(y, \theta) = \prod_{t=1}^{T} L_t(y_t, \theta)

the log-likelihood is

l(y, \theta) = \log L(y, \theta) = \log \left( \prod_{t=1}^{T} L_t(y_t, \theta) \right) = \sum_{t=1}^{T} \log L_t(y_t, \theta) = \sum_{t=1}^{T} l_t(y_t, \theta)

Definition: The maximum likelihood estimate is the set of parameters θ that maximizes the value of the likelihood function, or equivalently the log-likelihood function:

\hat{\theta}_{ml} = \arg\max_{\theta} \, l(y, \theta)

or

l(y, \hat{\theta}_{ml}) \geq l(y, \theta) \quad \text{for all } \theta \in \Theta

An alternative formulation can be found by looking at the first order conditions for a maximum of the likelihood function:

\frac{\partial}{\partial \theta} l(y, \theta) = \frac{\partial}{\partial \theta} \sum_{t=1}^{T} l_t(y_t, \theta) = \sum_{t=1}^{T} \frac{\partial}{\partial \theta} l_t(y_t, \theta) = 0

These give two definitions of how to find an ML estimate. The max of the log-likelihood function: Type I. The first order condition for a max of the log-likelihood function: Type II.

General about Maximum Likelihood. It can be shown that, provided the assumed probability model is correct, maximum likelihood estimators have a number of desirable properties.
1. ML estimators are consistent (in large samples they converge to the true parameter).
2. ML estimators are asymptotically normal (as the number of observations increases, their distribution approaches the normal distribution).
3. ML estimators are asymptotically efficient (as the number of observations increases, ML estimators achieve the so-called Cramér-Rao lower bound, which is the minimum possible covariance matrix for an unbiased estimator).
4. Once the probability distribution is specified and the problem is set up, ML estimators are straightforward to implement as nonlinear optimization problems, and are easy to solve on a computer.
A small simulation illustrating the first two properties follows below.
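The sketch below is illustrative only and not part of the original slides; the Bernoulli setup, the true value p = 0.3, and the sample sizes are assumptions. It shows the ML estimate of a probability concentrating around the true value as the sample grows, with a shrinking, roughly normal spread.

set.seed(1)
p_true <- 0.3
for (T in c(20, 200, 2000)) {
  # ML estimate of p in each of 5000 simulated samples is the sample mean
  p_hat <- replicate(5000, mean(rbinom(T, size = 1, prob = p_true)))
  cat("T =", T, " mean =", round(mean(p_hat), 4), " sd =", round(sd(p_hat), 4), "\n")
}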

The ML estimators thus have a number of desirable properties, as well as being easy to work with. For example, the usual test statistics, based on the Wald, LM and LR principles, are easily accessible. Let us look at the LR statistic. Letting θ be the parameters and X the data, L(θ, X) is the likelihood function. We want to compare the fit of an unrestricted estimate, call it \hat{\theta}, to a restricted estimate \tilde{\theta}. The restricted estimate \tilde{\theta} is found by maximizing the likelihood function while imposing the restrictions. The LR statistic is calculated as

LR = 2 \ln \left( \frac{L(\hat{\theta}, X)}{L(\tilde{\theta}, X)} \right)

(This is where the name likelihood ratio comes from: it is the ratio of two likelihoods.)
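For concreteness, a minimal R sketch (not from the original slides, and anticipating the binomial example worked out below) of how an LR statistic could be computed for a simple hypothesis; the simulated 0/1 data, the true probability 0.6, and the null value p = 0.5 are all assumptions made for illustration.

set.seed(1)
y  <- rbinom(100, size = 1, prob = 0.6)            # hypothetical 0/1 sample
n  <- sum(y); T <- length(y)
ll <- function(p) n*log(p) + (T - n)*log(1 - p)    # binomial log-likelihood
LR <- 2 * (ll(n/T) - ll(0.5))                      # unrestricted fit vs fit under p = 0.5
pchisq(LR, df = 1, lower.tail = FALSE)             # p-value, one restriction imposed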

Computational device. Even if one has trouble swallowing the assumed distributional assumption, the ML method is still a useful computational device: it allows calculation of estimates in situations where it would be very hard to obtain an estimator any other way.

ML estimation of binomial variable. We are observing outcomes y_t from a binomial distribution:

y_t = \begin{cases} a & \text{with probability } p \\ b & \text{with probability } 1-p \end{cases}

1. Determine the Maximum Likelihood estimator of p.

ML estimation of binomial variable - Solution. The inference problem is to estimate the probability p from a sample of T observations of y, \{y_t\}_{t=1}^{T}. Suppose we observe n outcomes of y_t = a, and (T - n) outcomes of y_t = b. The probability of observing this outcome for a given p is

p^n (1-p)^{T-n}

To find the maximum likelihood estimator we will maximize this with respect to p, the parameter of interest. Formally, ML proceeds by creating a likelihood function L, a function of the data (y) and parameters (p).

In this case the likelihood function is

L(y, p) = p^n (1-p)^{T-n}

This likelihood function is to be maximized with respect to p, the parameter. In practice we often work with an equivalent formulation and take logs to get the log-likelihood function

l(y, p) = \log L(y, p) = n \log(p) + (T-n) \log(1-p)

A maximum of this log-likelihood function is also a maximum of the likelihood function, but it is easier to work with.

The first order condition for a maximum of the log-likelihood function is

\frac{\partial}{\partial p} l(y, p) = \frac{n}{p} - (T-n) \frac{1}{1-p}

Set this equal to zero and solve for p:

\frac{n}{p} - (T-n) \frac{1}{1-p} = 0
n(1-p) = (T-n)p
n - np = Tp - np
n = Tp
p = \frac{n}{T}

Thus, the Maximum Likelihood estimator of p is

\hat{p}_{ml} = \frac{n}{T}
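As a quick numeric check (not in the original slides), with the outcomes coded so that y_t = 1 corresponds to a, the closed-form estimator is just the sample fraction of ones; the data vector below is hypothetical.

y     <- c(1, 0, 1, 1, 0, 1, 0, 1)   # hypothetical 0/1 sample
p_hat <- sum(y) / length(y)          # ML estimate p = n/T
p_hat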

ML estimation of binomial variable - using R. y_t follows a binomial distribution:

y_t = \begin{cases} a & \text{with probability } p \\ b & \text{with probability } 1-p \end{cases}

1. Set p = 0.5, simulate a number of outcomes, and estimate the model using ML.

ML estimation of binomial variable - Solution. Suppose we observe n outcomes of y_t = a, and (T - n) outcomes of y_t = b. The probability of observing this outcome for a given p is

p^n (1-p)^{T-n}

To find the maximum likelihood estimator we will maximize this with respect to p, the parameter of interest. Formally, ML proceeds by creating a likelihood function L, a function of the data (y) and parameters (p).

In this case the likelihood function is

L(y, p) = p^n (1-p)^{T-n}

This likelihood function is to be maximized with respect to p, the parameter. In practice we often work with an equivalent formulation and take logs to get the log-likelihood function

l(y, p) = \log L(y, p) = n \log(p) + (T-n) \log(1-p)

loglik <- function(p) {
    # log-likelihood of the binomial sample, with y coded as 1 for outcome a
    T <- length(y)
    n <- sum(y)
    ll <- n*log(p) + (T-n)*log(1-p)
    return(ll)
}

y <- c(1,0,1,0,1,0,1,0,1,0,1,0)

library(maxLik)
ml <- maxLik(loglik, start=c(0.25))
summary(ml)

Result in

> summary(ml)
--------------------------------------------
Maximum Likelihood estimation
Newton-Raphson maximisation, 4 iterations
Return code 1: gradient close to zero
Log-Likelihood: -8.317766
1 free parameters
Estimates:
     Estimate Std. error t value  Pr(> t)
[1,]  0.50000    0.14434  3.4641 0.000532 ***
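The exercise asked for simulated outcomes with p = 0.5, while the solution above hard-codes an alternating 0/1 sequence. A minimal sketch of how the simulation step could be done instead; the rbinom call, the seed, and the coding of outcome a as 1 are assumptions.

set.seed(123)                          # for reproducibility (assumed)
p <- 0.5
T <- 1000
y <- rbinom(T, size = 1, prob = p)     # 1 corresponds to outcome a, 0 to b
ml <- maxLik(loglik, start = c(0.25))  # reuses loglik and library(maxLik) from above
summary(ml)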

ML estimation of uniform distribution

ML estimation of uniform distribution. A variable y_t is drawn from a uniform distribution on the interval [0, b] if the probability distribution of y_t is

p(y_t) = \begin{cases} \frac{1}{b} & \text{if } y_t \in [0, b] \\ 0 & \text{otherwise} \end{cases}

1. Determine the maximum likelihood estimator of b.

ML estimation of uniform distribution. The only unknown parameter to estimate is the value b. Given a sample y_t, by the definition of the distribution we know that

b \geq \max_t y_t

The likelihood of observing a set of y_t is

L(y, b) = \left( \frac{1}{b} \right)^T

Note that this problem can not be solved the usual way. If we take logs,

\log L = T(\log(1) - \log(b)) = -T \log(b)

and try to solve the first order condition,

\frac{\partial}{\partial b} \log L = -T \frac{1}{b} = 0 \quad \text{or} \quad \frac{1}{b} = 0

we find that the derivative can not be set equal to zero; it only goes towards zero as b \to \infty.

Thus, the first order conditions can not be used to find an estimate of b. But from the likelihood function itself,

L(y, b) = \left( \frac{1}{b} \right)^T

it should be obvious that it will have a maximum at the lowest possible b, which in this case is

\hat{b} = \max_t y_t
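A minimal R sketch of this estimator on simulated data (illustrative only; the true value b = 3 and the sample size are assumed):

set.seed(1)
b <- 3
y <- runif(500, min = 0, max = b)   # draw a sample from U[0, b]
b_hat <- max(y)                     # ML estimate: the largest observation
b_hat                               # slightly below the true b, as expected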

ML estimation of linear regression

Max Likelihood estimation of OLS regression. Suppose we are given data x_t and outcomes y_t, where the model postulates that y is related to x by

y_t = x_t' b + u_t

where u_t is some error term. To do Maximum Likelihood, we need to make distributional assumptions about the error term u_t. The simplest assumption is that all errors are independently, identically normally distributed, with mean zero and variance σ² < ∞:

u_t \sim N(0, \sigma^2)

1. Determine the Maximum Likelihood estimator of b.
2. Determine the Maximum Likelihood estimator of σ².

Max Likelihood estimation of OLS regression. Recall the density function of the normal distribution:

f(u_t) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2\sigma^2} u_t^2}

Replace u_t with y_t - x_t' b:

f(y_t - x_t' b) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2\sigma^2} (y_t - x_t' b)^2}

We are interested in estimating the parameters b and σ. Form the likelihood function L:

L_T(b, \sigma, X_T, Y_T) = \prod_{t=1}^{T} f(y_t - x_t' b)

We include the data X_T = \{x_1, \ldots, x_T\} and Y_T = \{y_1, \ldots, y_T\} in the arguments to make explicit the fact that the likelihood function is also a function of the observed data.

We find the ML estimates from

\hat{b}_T^{ml} = \arg\max_b L_T(b, \sigma, X_T, Y_T)

\hat{\sigma}_T^{ml} = \arg\max_\sigma L_T(b, \sigma, X_T, Y_T)

Intuitively, by this maximisation we find the parameters b and σ that make the observed data most likely to have happened.

Let us calculate the explicit estimates. It is easier to find the maximum of the log-likelihood function:

l_T = l_T(b, \sigma, X_T, Y_T) = \ln L_T(b, \sigma, X_T, Y_T)
    = \ln \left( \prod_{t=1}^{T} f(y_t - x_t' b) \right)
    = \sum_{t=1}^{T} \ln f(y_t - x_t' b)
    = T \ln\left(\frac{1}{\sigma}\right) + T \ln\left(\frac{1}{\sqrt{2\pi}}\right) - \frac{1}{2} \sum_{t=1}^{T} \frac{1}{\sigma^2} \left( y_t - x_t' b \right)^2

We use the first order conditions:

\frac{\partial l_T}{\partial b} = \frac{1}{\sigma^2} \sum_{t=1}^{T} x_t (y_t - x_t' b) = 0

\frac{\partial l_T}{\partial \sigma} = -\frac{T}{\sigma} + \frac{1}{\sigma^3} \sum_{t=1}^{T} (y_t - x_t' b)^2 = 0

Solve for b:

\sum_{t=1}^{T} x_t y_t - \sum_{t=1}^{T} x_t x_t' b = 0

\left[ \sum_{t=1}^{T} x_t y_t \right] = \left[ \sum_{t=1}^{T} x_t x_t' \right] b

\hat{b}_T^{ml} = \left[ \sum_{t=1}^{T} x_t x_t' \right]^{-1} \left[ \sum_{t=1}^{T} x_t y_t \right]

Solve for σ²:

-\frac{T}{\sigma} + \frac{1}{\sigma^3} \sum_{t=1}^{T} (y_t - x_t' b)^2 = 0

-T \sigma^2 + \sum_{t=1}^{T} (y_t - x_t' b)^2 = 0

\hat{\sigma}^2_{ml} = \frac{1}{T} \sum_{t=1}^{T} (y_t - x_t' \hat{b}_{ml})^2

Note that \hat{b}_T^{ml} in this case is the same as the OLS estimate. This will in general not be the case; the two are derived under different assumptions.
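To make the closed-form expressions concrete, a small R sketch (illustrative, not from the slides; the simulated regressors and true parameters are assumed) computing \hat{b} and \hat{\sigma}^2 from the sums above and checking against lm():

set.seed(1)
T <- 200
x <- cbind(1, rnorm(T))                    # regressors, including a constant
y <- drop(x %*% c(1, 2)) + rnorm(T)        # true coefficients 1 and 2 (assumed)
b_hat    <- solve(t(x) %*% x, t(x) %*% y)  # [sum x_t x_t']^{-1} [sum x_t y_t]
sig2_hat <- sum((y - x %*% b_hat)^2) / T   # ML variance estimate (divides by T)
b_hat
coef(lm(y ~ x - 1))                        # OLS gives the same coefficients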

Max Likelihood estimation of OLS regression. Consider the model

y_t = a + b x_t + u_t

where u_t is some error term. Suppose the constant a = 2 and b = 2, and the error term is normally distributed with mean 0 and variance 1. Simulate 100 observations of this model, and show the estimation of the model using Maximum Likelihood.

Max Likelihood estimation of OLS regression. Recall the density function of the normal distribution:

f(u_t) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2\sigma^2} u_t^2}

Replace u_t with y_t - a - b x_t:

f(y_t - a - b x_t) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2\sigma^2} (y_t - a - b x_t)^2}

We are interested in estimating the parameters a, b and σ. Form the likelihood function L:

L_T(a, b, \sigma, X_T, Y_T) = \prod_{t=1}^{T} f(y_t - a - b x_t)

As a rule, it is easier to find the maximum of the log-likelihood function:

l_T = l_T(a, b, \sigma, X_T, Y_T) = \ln L_T(a, b, \sigma, X_T, Y_T)
    = \ln \left( \prod_{t=1}^{T} f(y_t - a - b x_t) \right)
    = \sum_{t=1}^{T} \ln f(y_t - a - b x_t)
    = T \ln\left(\frac{1}{\sigma}\right) + T \ln\left(\frac{1}{\sqrt{2\pi}}\right) - \frac{1}{2} \sum_{t=1}^{T} \frac{1}{\sigma^2} (y_t - a - b x_t)^2

We apply this log-likelihood function directly to the R maximum likelihood routine. First, the simulation of the model. The form of the x variable was not specified, so let us use the integers from 1 to 100.

a <- 2
b <- 2
sigma <- 1
N <- 100
x <- 1:N
y <- a + b*x + rnorm(N, 0, sigma)

Then, the ML estimation. We first need to write the likelihood function as an R function.

loglik <- function(param) {
    # unpack the parameters: intercept, slope, and error standard deviation
    N <- length(x)
    alpha <- param[1]
    beta  <- param[2]
    sigma <- param[3]
    e <- y - (alpha + beta*x)   # residuals for these parameter values
    ll <- -0.5*N*log(2*pi) - N*log(sigma) - sum(0.5*e^2/sigma^2)
    return(ll)
}

This is then fed to the ML implementation in the library maxLik:

library(maxLik)
ml <- maxLik(loglik, start=c(1,1,1))
summary(ml)

With output

> summary(ml)
--------------------------------------------
Maximum Likelihood estimation
Newton-Raphson maximisation, 15 iterations
Return code 1: gradient close to zero
Log-Likelihood: -141.5555
3 free parameters
Estimates:
      Estimate Std. error  t value   Pr(> t)
[1,] 1.9069817  0.2009801   9.4884 < 2.2e-16 ***
[2,] 2.0013569  0.0034545 579.3429 < 2.2e-16 ***
[3,] 0.9966221  0.0704751  14.1415 < 2.2e-16 ***
--------------------------------------------
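As a quick cross-check (not part of the original slides), the intercept and slope can also be recovered with ordinary least squares, which coincides with ML under normal errors:

summary(lm(y ~ x))   # coefficients should be close to the ML estimates above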

Summarizing Maximum Likelihood estimation.
Starting point: the underlying probability distribution that generated the data.
Powerful: the whole distribution potentially carries more information than minimizing a distance.
Potential problem: ML always depends on the specified probability distribution being close to correct.
Some important examples of estimation problems where estimation is done using maximum likelihood:
Limited dependent variable models (Probit/Logit)
ARCH
VARs
Factor analysis