Likelihood Methods of Inference


Methods of Inference

Toss a coin 6 times and get heads twice. Let p be the probability of getting H. The probability of getting exactly 2 heads is

15p²(1 − p)⁴.

This function of p is the likelihood function.

Definition: The likelihood function is the map L with domain Θ and values given by

L(θ) = f_θ(X).

Key point: think about how the density depends on θ, not about how it depends on X.

Notice: X, the observed value of the data, has been plugged into the formula for the density.

Notice: the coin tossing example uses the discrete density for f.
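As a quick check, here is a minimal sketch (assuming numpy is available; the grid resolution is an arbitrary choice) that evaluates this likelihood on a grid and confirms the maximizer is near 2/6:

import numpy as np

p = np.linspace(0.001, 0.999, 999)  # grid over the parameter space (0, 1)
L = 15 * p**2 * (1 - p) ** 4        # P(exactly 2 heads in 6 tosses) as a function of p
print(p[np.argmax(L)])              # grid maximizer, approximately 1/3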

We use likelihood for most inference problems:

1. Point estimation: we must compute an estimate ˆθ = ˆθ(X) which lies in Θ. The maximum likelihood estimate (MLE) of θ is the value ˆθ which maximizes L(θ) over θ ∈ Θ, if such a ˆθ exists.

2. Point estimation of a function of θ: we must compute an estimate ˆφ = ˆφ(X) of φ = g(θ). We use ˆφ = g(ˆθ), where ˆθ is the MLE of θ.

3. Interval (or set) estimation: we must compute a set C = C(X) in Θ which we think will contain θ_0. We will use {θ ∈ Θ : L(θ) > c} for a suitable c.

4. Hypothesis testing: decide whether or not θ_0 ∈ Θ_0, where Θ_0 ⊂ Θ. We base our decision on the likelihood ratio

sup{L(θ); θ ∈ Θ \ Θ_0} / sup{L(θ); θ ∈ Θ_0}.

Maximum Likelihood Estimation

To find the MLE, maximize L. This is a typical function maximization problem:

1. Set the gradient of L equal to 0.
2. Check that the root is a maximum, not a minimum or saddle point.

We examine some likelihood plots in examples.

Cauchy data: an iid sample X_1, ..., X_n from the Cauchy(θ) density

f(x; θ) = 1 / (π(1 + (x − θ)²)).

The likelihood function is

L(θ) = ∏_{i=1}^n 1 / (π(1 + (X_i − θ)²)).

[Examine likelihood plots.]

[Figures: likelihood functions for the Cauchy model, six simulated samples each with n = 5 and n = 25; θ on the horizontal axis from −1 to 1, L(θ)/L(ˆθ) from 0 to 1 on the vertical axis.]

I want you to notice the following points:

- The likelihood functions have peaks near the true value of θ (which is 0 for the data sets I generated).
- The peaks are narrower for the larger sample size.
- The peaks have a more regular shape for the larger value of n.
- I actually plotted L(θ)/L(ˆθ), which has exactly the same shape as L but runs from 0 to 1 on the vertical scale.
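The following sketch (assuming numpy and matplotlib are available; the seed, grid, and one sample per sample size are arbitrary choices) generates normalized likelihood plots of this kind:

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)  # arbitrary seed for reproducibility

def cauchy_likelihood(theta_grid, x):
    # L(theta) = prod_i 1 / (pi * (1 + (x_i - theta)^2)), vectorized over the grid
    return np.prod(1.0 / (np.pi * (1.0 + (x[:, None] - theta_grid) ** 2)), axis=0)

theta = np.linspace(-1.0, 1.0, 401)
for n in (5, 25):
    x = rng.standard_cauchy(n)                      # data with true theta = 0
    L = cauchy_likelihood(theta, x)
    plt.plot(theta, L / L.max(), label=f"n = {n}")  # L(theta)/L(theta_hat) on the grid
plt.xlabel("theta")
plt.ylabel("normalized likelihood")
plt.legend()
plt.show()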

To maximize this likelihood: differentiate L and set the result equal to 0. Notice that L is a product of n terms; its derivative is

Σ_{i=1}^n [2(X_i − θ) / (π(1 + (X_i − θ)²)²)] ∏_{j≠i} 1 / (π(1 + (X_j − θ)²)),

which is quite unpleasant. It is much easier to work with the logarithm of L: the log of a product is a sum, and the logarithm is monotone increasing.

Definition: The log-likelihood function is l(θ) = log{L(θ)}.

For the Cauchy problem we have

l(θ) = −Σ_i log(1 + (X_i − θ)²) − n log(π).

[Examine log likelihood plots.]

[Figures: log-likelihood functions with likelihood ratio intervals for the Cauchy model, six samples each with n = 5 and n = 25; θ from −1 to 1.]

Notice the following points:

- Plots of l for n = 25 are quite smooth, rather parabolic.
- For n = 5 there are many local maxima and minima of l.
- L tends to 0 as θ → ±∞, so the maximum of l occurs at a root of l′, the derivative of l with respect to θ.

Definition: The score function is the gradient of l:

U(θ) = ∂l/∂θ.

The MLE ˆθ is usually a root of the likelihood equations U(θ) = 0. In our Cauchy example we find

U(θ) = Σ_i 2(X_i − θ) / (1 + (X_i − θ)²).

[Examine plots of score functions.]

Notice: there are often multiple roots of the likelihood equations.
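A sketch (assuming numpy and scipy are available; the seed, grid range, and sample size are arbitrary) that brackets sign changes of U on a grid and polishes each root, illustrating the multiple-root phenomenon for small n:

import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(4)   # arbitrary seed
x = rng.standard_cauchy(5)       # small sample: several roots are common

def score(theta):
    # U(theta) = sum_i 2(X_i - theta) / (1 + (X_i - theta)^2)
    return np.sum(2 * (x - theta) / (1 + (x - theta) ** 2))

grid = np.linspace(-10.0, 10.0, 2001)
u = np.array([score(t) for t in grid])
brackets = np.nonzero(np.diff(np.sign(u)))[0]  # indices where U changes sign
roots = [brentq(score, grid[i], grid[i + 1]) for i in brackets]
print(roots)  # stationary points of l: the MLE plus any local minima and maxima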

[Figures: score functions for the Cauchy model, six samples each with n = 5 and n = 25.]

Example: X ∼ Binomial(n, θ).

L(θ) = (n choose X) θ^X (1 − θ)^{n−X}
l(θ) = log(n choose X) + X log(θ) + (n − X) log(1 − θ)
U(θ) = X/θ − (n − X)/(1 − θ)

The function L is 0 at θ = 0 and at θ = 1 unless X = 0 or X = n, so for 1 ≤ X ≤ n − 1 the MLE is found by setting U = 0, which gives

ˆθ = X/n.

For X = n the log-likelihood has derivative U(θ) = n/θ > 0 for all θ, so the likelihood is an increasing function of θ, maximized at ˆθ = 1 = X/n. Similarly, when X = 0 the maximum is at ˆθ = 0 = X/n.
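A sketch (assuming scipy is available; the values n = 6 and X = 2 echo the coin example) checking ˆθ = X/n by direct numerical minimization of −l:

from scipy.optimize import minimize_scalar
from scipy.stats import binom

n, X = 6, 2  # the coin example: 2 heads in 6 tosses

def neg_loglik(theta):
    return -binom.logpmf(X, n, theta)  # -l(theta)

res = minimize_scalar(neg_loglik, bounds=(1e-9, 1 - 1e-9), method="bounded")
print(res.x, X / n)                    # both approximately 0.3333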

The Normal Distribution

Now we have X_1, ..., X_n iid N(µ, σ²). There are two parameters, θ = (µ, σ). We find

L(µ, σ) = exp(−Σ_i (X_i − µ)² / (2σ²)) / ((2π)^{n/2} σ^n)
l(µ, σ) = −(n/2) log(2π) − Σ_i (X_i − µ)² / (2σ²) − n log(σ)

and U is

U(µ, σ) = ( Σ_i (X_i − µ)/σ² , Σ_i (X_i − µ)²/σ³ − n/σ ).

Notice that U is a function with two components because θ has two components. Setting the score equal to 0 and solving gives

ˆµ = X̄ and ˆσ = √(Σ_i (X_i − X̄)² / n).
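A minimal numerical check (assuming numpy is available; the true parameter values and seed are arbitrary) of these closed forms, noting that ˆσ uses divisor n rather than n − 1:

import numpy as np

rng = np.random.default_rng(2)                # arbitrary seed
x = rng.normal(loc=1.0, scale=2.0, size=100)  # sample with mu = 1, sigma = 2

mu_hat = x.mean()                                # X bar
sigma_hat = np.sqrt(np.mean((x - mu_hat) ** 2))  # divisor n, i.e. x.std(ddof=0)
print(mu_hat, sigma_hat)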

Check this is a maximum by computing one more derivative. The matrix H of second derivatives of l is

H(µ, σ) = [ −n/σ²                −2 Σ_i (X_i − µ)/σ³
            −2 Σ_i (X_i − µ)/σ³  −3 Σ_i (X_i − µ)²/σ⁴ + n/σ² ]

Plugging in the MLE gives

H(ˆθ) = [ −n/ˆσ²   0
          0        −2n/ˆσ² ]

which is negative definite: both its eigenvalues are negative. So ˆθ must be a local maximum.

[Examine contour and perspective plots of l.]

[Figures: perspective plots of the normal likelihood for n = 10 and n = 100, and contour plots of l in the (Mu, Sigma) plane for n = 10 and n = 100.]

Notice that the contours are quite ellipsoidal for the larger sample size.

For X_1, ..., X_n iid the log-likelihood is

l(θ) = Σ_i log(f(X_i, θ)).

The score function is

U(θ) = Σ_i ∂ log f(X_i, θ)/∂θ.

The MLE ˆθ maximizes l. If the maximum occurs in the interior of the parameter space and the log-likelihood is continuously differentiable, then ˆθ solves the likelihood equations

U(θ) = 0.

Some examples concerning existence of roots:

Solving U(θ) = 0: Examples

N(µ, σ²): The unique root of the likelihood equations is a global maximum.

[Remark: Suppose we called τ = σ² the parameter. The score function still has two components: the first component is the same as before, but the second component is now

∂l/∂τ = Σ_i (X_i − µ)² / (2τ²) − n/(2τ).

Setting the new likelihood equations equal to 0 still gives ˆτ = ˆσ².

General invariance (or equivariance) principle: if φ = g(θ) is a reparametrization of the model (a one-to-one relabelling of the parameter values) then ˆφ = g(ˆθ). This does not apply to other estimators.]
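A sketch (assuming numpy and scipy are available; the sample, seed, and optimizer bounds are arbitrary) of invariance in action: maximizing the normal log-likelihood directly over τ = σ² reproduces ˆσ²:

import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(3)  # arbitrary seed
x = rng.normal(size=50)
mu_hat = x.mean()

def neg_loglik_tau(tau):
    # -l(mu_hat, tau), dropping the constant (n/2) log(2 pi)
    return np.sum((x - mu_hat) ** 2) / (2 * tau) + 0.5 * len(x) * np.log(tau)

res = minimize_scalar(neg_loglik_tau, bounds=(1e-6, 10.0), method="bounded")
print(res.x, np.mean((x - mu_hat) ** 2))  # tau_hat agrees with sigma_hat^2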

Cauchy, location θ: There is at least one root of the likelihood equations, but often several more. One root is a global maximum; the others, if they exist, may be local minima or maxima.

Binomial(n, θ): If X = 0 or X = n there is no root of the likelihood equations; the likelihood is monotone. For other values of X there is a unique root, a global maximum. The global maximum is at ˆθ = X/n even if X = 0 or n.

The 2-parameter exponential

The density is

f(x; α, β) = (1/β) e^{−(x−α)/β} 1(x > α).

The log-likelihood is −∞ for α > min{X_1, ..., X_n} and otherwise is

l(α, β) = −n log(β) − Σ_i (X_i − α)/β.

This is an increasing function of α until α reaches

ˆα = X_(1) = min{X_1, ..., X_n},

which gives the MLE of α. Now plug ˆα in for α to get the so-called profile likelihood for β:

l_profile(β) = −n log(β) − Σ_i (X_i − X_(1))/β.

Set the β derivative equal to 0 to get

ˆβ = Σ_i (X_i − X_(1)) / n.

Notice that the MLE ˆθ = (ˆα, ˆβ) does not solve the likelihood equations; we had to look at the edge of the possible parameter space. α is called a support or truncation parameter. ML methods behave oddly in problems with such parameters.
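A sketch (assuming numpy is available; the true α, β, seed, and sample size are arbitrary) computing both edge-of-space MLEs for simulated data:

import numpy as np

rng = np.random.default_rng(5)  # arbitrary seed
alpha, beta = 2.0, 1.5          # arbitrary true values
x = alpha + rng.exponential(scale=beta, size=200)

alpha_hat = x.min()                # MLE of the support parameter: X_(1)
beta_hat = np.mean(x - alpha_hat)  # maximizer of the profile likelihood
print(alpha_hat, beta_hat)         # close to (2.0, 1.5) for large n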

Three-parameter Weibull

The density in question is

f(x; α, β, γ) = (γ/β) ((x − α)/β)^{γ−1} exp[−{(x − α)/β}^γ] 1(x > α).

There are three likelihood equations. Setting the β derivative equal to 0 gives

ˆβ(α, γ) = [Σ_i (X_i − α)^γ / n]^{1/γ},

where the notation ˆβ(α, γ) indicates that the MLE of β could be found by finding the MLEs of the other two parameters and then plugging them into the formula above.

It is not possible to find the remaining two parameters explicitly; numerical methods are needed. However, taking γ < 1 and letting α ↑ X_(1) makes the log-likelihood go to ∞, so the MLE is not uniquely defined: any γ < 1 and any β will do.

If the true value of γ is more than 1 then the probability that there is a root of the likelihood equations is high; in this case there must be two more roots: a local maximum and a saddle point! For a true value of γ > 1 the theory we detail below applies to the local maximum, not to the global maximum of the likelihood.
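A sketch (assuming numpy and scipy's weibull_min are available; the sample, the choice γ = 0.5, and the sequence of ε values are arbitrary) demonstrating the degeneracy: with γ < 1 fixed and ˆβ(α, γ) plugged in, the log-likelihood grows without bound as α ↑ X_(1):

import numpy as np
from scipy.stats import weibull_min

rng = np.random.default_rng(6)  # arbitrary seed
x = weibull_min.rvs(c=2.0, loc=0.0, scale=1.0, size=50, random_state=rng)

def loglik(alpha, beta, gamma):
    # l = sum_i log f(x_i; alpha, beta, gamma) for the three-parameter Weibull
    z = (x - alpha) / beta
    return np.sum(np.log(gamma / beta) + (gamma - 1) * np.log(z) - z ** gamma)

gamma = 0.5                      # any gamma < 1 exhibits the effect
for eps in (1e-2, 1e-4, 1e-6, 1e-8):
    alpha = x.min() - eps        # alpha approaching X_(1) from below
    beta_hat = np.mean((x - alpha) ** gamma) ** (1 / gamma)  # profile beta
    print(eps, loglik(alpha, beta_hat, gamma))  # increases without bound as eps -> 0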