Chapter 4: Asymptotic Properties of MLE (Part 3)

Daniel O. Scharfstein, 09/30/13

Breakdown of Assumptions

- Non-Existence of the MLE
- Multiple Solutions to the Maximization Problem
- Multiple Solutions to the Score Equations
- Number of Parameters Increases with the Sample Size
- Support of p(x; θ) Depends on θ
- Non-I.I.D. Data

Non-Existence of the MLE

The non-existence of the MLE may occur for all values of $x^n$ or for only some of them. In general, this is due either to the fact that the parameter space is not compact or to the fact that the log-likelihood is discontinuous in $\theta$.

Example 4.1: Suppose that $X \sim \text{Bernoulli}(1/(1+\exp(\theta)))$, where $\Theta = \mathbb{R}$. If we observe $x = 1$, then $L(\theta; 1) = 1/(1+\exp(\theta))$. The likelihood function is a decreasing function of $\theta$ and the maximum is not attained on $\Theta$. If $\Theta$ were compact, i.e., $\Theta = \bar{\mathbb{R}} = [-\infty, \infty]$, the MLE would be $-\infty$.

Example 4.2: Suppose that $X \sim \text{Normal}(\mu, \sigma^2)$. So, $\theta = (\mu, \sigma^2)$ and $\Theta = \mathbb{R} \times \mathbb{R}^+$. Now,
$$\ell(\theta; x) \propto -\log \sigma - \frac{1}{2\sigma^2}(x - \mu)^2.$$
Take $\mu = x$. Then as $\sigma \to 0$, $\ell(\theta; x) \to +\infty$. So, the MLE does not exist.
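To see both failure modes numerically, here is a minimal Python/NumPy sketch (not from the original slides; the grid of θ values and the observed value x = 1.3 are arbitrary choices for illustration).

```python
import numpy as np

# Example 4.1: Bernoulli(1/(1 + exp(theta))) likelihood after observing x = 1.
# L(theta; 1) = 1/(1 + exp(theta)) is strictly decreasing in theta, so no
# maximizer exists on Theta = R; the supremum is approached as theta -> -inf.
thetas = np.linspace(-10, 10, 5)
lik = 1.0 / (1.0 + np.exp(thetas))
print(dict(zip(np.round(thetas, 1), np.round(lik, 4))))

# Example 4.2: Normal(mu, sigma^2) log-likelihood for a single observation x,
# evaluated at mu = x.  As sigma -> 0 the log-likelihood diverges to +inf,
# so the MLE of (mu, sigma^2) does not exist.
x = 1.3                                   # arbitrary observed value
sigmas = np.array([1.0, 0.1, 0.01, 0.001])
loglik = -np.log(sigmas) - (x - x) ** 2 / (2 * sigmas ** 2)
print(loglik)                             # grows without bound as sigma shrinks
```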

Multiple Solutions

One reason for multiple solutions to the maximization problem is non-identification of the parameter $\theta$.

Example 4.3: Suppose that $Y \sim \text{Normal}(X\theta, I)$, where $X$ is an $n \times k$ matrix with rank smaller than $k$ and $\theta \in \Theta \subseteq \mathbb{R}^k$. The density function is
$$p(y; \theta) = (2\pi)^{-n/2} \exp\left(-\tfrac{1}{2}(y - X\theta)'(y - X\theta)\right).$$
Since $X$ is not of full rank, there exists an infinite number of solutions to $X\theta = 0$. That means that there exists an infinite number of $\theta$'s that generate the same density function. So, $\theta$ is not identified. Furthermore, note that the likelihood is maximized at all values of $\theta$ satisfying $X'X\theta = X'y$.
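A small numerical illustration of the non-identification (a sketch under assumed values: the 50 × 3 design with a redundant third column and the particular θ used to generate y are made up for this example):

```python
import numpy as np

rng = np.random.default_rng(0)

# Rank-deficient design: third column is the sum of the first two (rank 2 < k = 3).
X = rng.normal(size=(50, 2))
X = np.column_stack([X, X[:, 0] + X[:, 1]])
theta_gen = np.array([1.0, -2.0, 0.5])
y = X @ theta_gen + rng.normal(size=50)

# Minimum-norm solution of the normal equations X'X theta = X'y.
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Any vector in the null space of X can be added without changing X theta,
# hence without changing the likelihood: theta is not identified.
null_dir = np.array([1.0, 1.0, -1.0])              # X @ null_dir == 0 by construction
theta_alt = theta_hat + 3.7 * null_dir

print(np.allclose(X @ theta_hat, X @ theta_alt))   # True: identical fitted values
```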

Multiple Roots to the Score Equations

Even though the score equations may have multiple roots for fixed $n$, we can still use our theorems to show consistency and asymptotic normality. This will work provided that, as $n$ gets large, there is a unique maximum with large probability.

Example 4.4: Suppose that $X^n = (X_1, \ldots, X_n)$, where the $X_i$'s are i.i.d. Cauchy$(\theta, 1)$. We assume that $\theta_0$ lies in the interior of a compact set $\Theta \subset \mathbb{R}$. So,
$$p(x; \theta) = \frac{1}{\pi(1 + (x - \theta)^2)}.$$
The log-likelihood for the full sample is
$$\ell(\theta; x^n) = -n \log \pi - \sum_{i=1}^n \log(1 + (x_i - \theta)^2).$$
Note that as $\theta \to \pm\infty$, $\ell(\theta; x^n) \to -\infty$.

Multiple Roots to the Score Equations

The score for $\theta$ is given by
$$\frac{d\ell(\theta; x^n)}{d\theta} = \sum_{i=1}^n \frac{2(x_i - \theta)}{1 + (x_i - \theta)^2}.$$
There can be multiple roots to the score equations. Regardless, the MLE is consistent (see Homework 2).
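The multiple-root phenomenon is easy to see by evaluating the Cauchy log-likelihood and score on a grid. The following sketch (not from the notes; the sample size of 5 and the seed are arbitrary) counts sign changes of the score; with small Cauchy samples, more than one root often appears.

```python
import numpy as np

rng = np.random.default_rng(42)
theta0 = 0.0
x = theta0 + rng.standard_cauchy(size=5)     # small sample: multiple roots are common

grid = np.linspace(-15, 15, 4001)

def loglik(theta, x):
    # Cauchy(theta, 1) log-likelihood: -n*log(pi) - sum log(1 + (x_i - theta)^2)
    return -len(x) * np.log(np.pi) - np.log1p((x[:, None] - theta) ** 2).sum(axis=0)

def score(theta, x):
    # d/dtheta of the log-likelihood: sum 2(x_i - theta) / (1 + (x_i - theta)^2)
    return (2 * (x[:, None] - theta) / (1 + (x[:, None] - theta) ** 2)).sum(axis=0)

s = score(grid, x)
sign_changes = np.sum(np.diff(np.sign(s)) != 0)
print("number of score roots on the grid:", sign_changes)
print("grid maximizer of the log-likelihood:", grid[np.argmax(loglik(grid, x))])
```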

Number of Parameters Increases with the Sample Size

Up to now, we have implicitly assumed that the number of parameters is equal to a fixed constant $k$. In some cases the number of parameters increases naturally with the number of observations. In such cases, the MLE may
(i) no longer converge,
(ii) converge to a parameter value different from $\theta_0$, or
(iii) still converge to $\theta_0$.
In general, the outcome depends on how fast the number of parameters grows relative to the number of observations.

Example 4.5 (Neyman-Scott, Econometrica, 1948): Suppose that $X^n = (X_1, \ldots, X_n)$, where the $X_i$'s are independent with $X_i = (X_{i1}, X_{i2})$, $X_{i1}$ independent of $X_{i2}$, and $X_{ip} \sim N(\mu_i, \sigma^2)$ for $p = 1, 2$. We are interested in estimating the $\mu_i$'s and $\sigma^2$. In this problem, we have $n + 1$ parameters.

The likelihood function is
$$L(\mu_1, \ldots, \mu_n, \sigma^2; x^n) = \prod_{i=1}^n \frac{1}{2\pi\sigma^2} \exp\left(-\frac{1}{2\sigma^2} \sum_{p=1}^2 (x_{ip} - \mu_i)^2\right).$$
It is easy to show that the MLEs are
$$\hat{\mu}_i = \tfrac{1}{2}(X_{i1} + X_{i2}) \;\; \text{for } i = 1, \ldots, n, \qquad \hat{\sigma}^2 = \frac{1}{2n} \sum_{i=1}^n \sum_{p=1}^2 (X_{ip} - \hat{\mu}_i)^2.$$

Example 4.5 (Neyman-Scott, Econometrica, 1948): Note that $\hat{\mu}_i$ does not converge to $\mu_i$, and we can show that $\hat{\sigma}^2$ converges in probability to $\sigma^2/2$, not $\sigma^2$.

To show this latter fact, note that we can express $\hat{\sigma}^2$ as
$$\hat{\sigma}^2 = \frac{1}{4n} \sum_{i=1}^n (X_{i1} - X_{i2})^2.$$
Let $Z_i = \frac{X_{i1} - X_{i2}}{\sqrt{2}\,\sigma}$. Then $Z_i \sim N(0, 1)$ and $Z_i^2 \sim \chi^2_1$. Since we have an i.i.d. sample of $Z_i^2$'s, we can employ the WLLN to show that
$$\frac{1}{n} \sum_{i=1}^n Z_i^2 \xrightarrow{P} 1.$$
This implies that
$$\hat{\sigma}^2 = \frac{\sigma^2}{2} \cdot \frac{1}{n} \sum_{i=1}^n Z_i^2 \xrightarrow{P} \frac{\sigma^2}{2}.$$
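A quick Monte Carlo check of this limit (a sketch, not part of the original notes; σ = 1.5, n, and the distribution of the µ_i's are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma = 200_000, 1.5
mu = rng.uniform(-5, 5, size=n)          # arbitrary individual means mu_i

# Two replicates per individual: X_ip ~ N(mu_i, sigma^2), p = 1, 2.
x1 = rng.normal(mu, sigma)
x2 = rng.normal(mu, sigma)

mu_hat = (x1 + x2) / 2
sigma2_hat = np.sum((x1 - mu_hat) ** 2 + (x2 - mu_hat) ** 2) / (2 * n)

print(sigma2_hat)        # close to sigma^2 / 2 = 1.125, not sigma^2 = 2.25
```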

Example 4.6: Suppose that $X^n = (X_1, \ldots, X_n)$, where the $X_i$'s are independent with $X_i = (X_{i1}, X_{i2}, \ldots, X_{in})$ and the $X_{ip}$'s are independent $N(\mu_i, \sigma^2)$ random variables for $p = 1, 2, \ldots, n$. We are interested in estimating the $\mu_i$'s and $\sigma^2$. Again, we have $n + 1$ parameters.

The likelihood function is
$$L(\mu_1, \ldots, \mu_n, \sigma^2; x^n) = \prod_{i=1}^n (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{1}{2\sigma^2} \sum_{p=1}^n (x_{ip} - \mu_i)^2\right).$$
It is easy to show that the MLEs are
$$\hat{\mu}_i = \frac{1}{n} \sum_{p=1}^n X_{ip} \;\; \text{for } i = 1, \ldots, n, \qquad \hat{\sigma}^2 = \frac{1}{n^2} \sum_{i=1}^n \sum_{p=1}^n (X_{ip} - \hat{\mu}_i)^2.$$
By the WLLN, we know that $\hat{\mu}_i$ converges in probability to $\mu_i$, and we can also show that $\hat{\sigma}^2$ converges in probability to $\sigma^2$.
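In contrast to Example 4.5, here the number of replicates per individual grows with n, and a simulation sketch (again with arbitrary σ and µ_i's, not from the notes) shows both σ̂² and the µ̂_i's settling down:

```python
import numpy as np

rng = np.random.default_rng(2)
sigma = 1.5

for n in (10, 100, 1000):
    mu = rng.uniform(-5, 5, size=n)                  # arbitrary mu_i
    x = rng.normal(mu[:, None], sigma, size=(n, n))  # n replicates per individual
    mu_hat = x.mean(axis=1)
    sigma2_hat = np.sum((x - mu_hat[:, None]) ** 2) / n**2
    print(n, round(sigma2_hat, 4), round(np.abs(mu_hat - mu).max(), 4))
# sigma2_hat approaches sigma^2 = 2.25 and the worst mu_i error shrinks as n grows
```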

Support of p(x; θ) Depends on θ

In this case, the MLE is frequently consistent, but not asymptotically normal.

Example 4.7: Suppose $X^n = (X_1, \ldots, X_n)$, where the $X_i$'s are i.i.d. from a shifted exponential distribution. That is,
$$p(x; \theta) = \exp(-(x - \theta)) \, I(x \geq \theta).$$
Then, the likelihood for the full sample is
$$L(\theta; x^n) = \prod_{i=1}^n \exp(-(x_i - \theta)) \, I\left(\min_i x_i \geq \theta\right).$$

Support of p(x; θ) Depends on θ

The MLE for $\theta$ is $\min_i X_i$, i.e., the first order statistic $X_{(1)}$. Note that the likelihood is not differentiable at the MLE. This violates condition (iv) of Theorem 4.6.

We can show that the MLE is consistent:
$$P_{\theta_0}[|X_{(1)} - \theta_0| > \epsilon] = P_{\theta_0}[X_{(1)} - \theta_0 > \epsilon] + P_{\theta_0}[X_{(1)} - \theta_0 < -\epsilon]$$
$$= P_{\theta_0}[X_{(1)} > \theta_0 + \epsilon] + P_{\theta_0}[X_{(1)} < \theta_0 - \epsilon]$$
$$= \prod_{i=1}^n P_{\theta_0}[X_i > \theta_0 + \epsilon] = \exp(-n\epsilon) \to 0,$$
where the second term vanishes because $X_{(1)} \geq \theta_0$ with probability one.

Support of p(x; θ) Depends on θ

It is obvious that $\sqrt{n}(X_{(1)} - \theta_0)$ cannot be centered at zero, since $X_{(1)}$ is always greater than $\theta_0$. We can show that
$$n(X_{(1)} - \theta_0) \xrightarrow{D} \text{Exponential}(1).$$
To see this, note that
$$P_{\theta_0}[n(X_{(1)} - \theta_0) \geq a] = P_{\theta_0}[X_{(1)} \geq a/n + \theta_0] = \prod_{i=1}^n P_{\theta_0}[X_i \geq a/n + \theta_0] = \exp(-a).$$
Here the rate of convergence is $n$ instead of $\sqrt{n}$.
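A simulation sketch of this limit (not from the slides; θ0 = 2, n = 500, and the number of replications are arbitrary): scaling the MLE error by n, rather than √n, gives approximately an Exponential(1) draw.

```python
import numpy as np

rng = np.random.default_rng(3)
theta0, n, reps = 2.0, 500, 20_000

# Shifted exponential samples: X = theta0 + standard exponential.
x = theta0 + rng.exponential(size=(reps, n))
mle = x.min(axis=1)                       # X_(1), the MLE of theta0

scaled = n * (mle - theta0)               # approximately Exponential(1) for large n
print(scaled.mean(), scaled.var())        # both close to 1
print(np.mean(scaled <= 1.0))             # close to 1 - exp(-1) ≈ 0.632
```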

Non-I.I.D. Data

Example 4.8: Consider independent random variables $Y_i \sim \text{Normal}(\theta x_i, 1)$, where the $x_i$'s are given constants. The MLE of $\theta$ is
$$\hat{\theta} = \frac{\sum_{i=1}^n x_i Y_i}{\sum_{i=1}^n x_i^2} \sim \text{Normal}\left(\theta, \frac{1}{\sum_{i=1}^n x_i^2}\right).$$
This estimator may not be consistent. Suppose that $\sum_{i=1}^n x_i^2 \to 1$. Then, we know that $\hat{\theta} \xrightarrow{D} N(\theta_0, 1)$, which is not degenerate at $\theta_0$. If $\sum_{i=1}^n x_i^2 \to \infty$, then $\hat{\theta}$ is consistent. To see this, note that $\hat{\theta}$ is unbiased and its variance goes to zero.

Non-I.I.D. Data

What about the limiting distribution of $\sqrt{n}(\hat{\theta} - \theta_0)$? We know that
$$\sqrt{\textstyle\sum_{i=1}^n x_i^2}\,(\hat{\theta} - \theta_0) \xrightarrow{D} N(0, 1).$$
If $n / \sum_{i=1}^n x_i^2 \to 1$, then $\hat{\theta}$ converges at the $\sqrt{n}$ rate. In general, it converges at the $\sqrt{\sum_{i=1}^n x_i^2}$ rate.
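A final sketch contrasting the two regimes (the design sequences x_i = 2^{-i/2}, which makes Σ x_i² → 1, and x_i ≡ 1, which makes Σ x_i² = n, are illustrative choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(4)
theta0 = 3.0

def mle(x):
    # theta_hat = sum(x_i Y_i) / sum(x_i^2) for Y_i ~ N(theta0 * x_i, 1)
    y = theta0 * x + rng.normal(size=x.size)
    return np.sum(x * y) / np.sum(x * x)

for n in (100, 10_000):
    x_bounded = 2.0 ** (-np.arange(1, n + 1) / 2)    # sum of x_i^2 -> 1
    x_diverge = np.ones(n)                           # sum of x_i^2 = n -> infinity
    est_b = [mle(x_bounded) for _ in range(1000)]
    est_d = [mle(x_diverge) for _ in range(1000)]
    print(n, round(np.std(est_b), 3), round(np.std(est_d), 3))
# the spread of theta_hat stays near 1 in the bounded case but shrinks like
# 1/sqrt(n) when sum x_i^2 diverges
```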