Chapter 4: Asymptotic Properties of MLE (Part 3)

Daniel O. Scharfstein, 09/30/13

Breakdown of Assumptions

- Non-Existence of the MLE
- Multiple Solutions to the Maximization Problem
- Multiple Solutions to the Score Equations
- Number of Parameters Increases with the Sample Size
- Support of p(x; θ) Depends on θ
- Non-I.I.D. Data

Non-Existence of the MLE

The non-existence of the MLE may occur for all values of $x^n$ or for only some of them. In general, this is due either to the fact that the parameter space is not compact or to the fact that the log-likelihood is discontinuous in $\theta$.

Example 4.1: Suppose that $X \sim \text{Bernoulli}(1/(1+\exp(\theta)))$, where $\Theta = \mathbb{R}$. If we observe $x = 1$, then $L(\theta; 1) = 1/(1+\exp(\theta))$. The likelihood function is a decreasing function of $\theta$ and the maximum is not attained on $\Theta$. If $\Theta$ were compact, i.e., $\Theta = \bar{\mathbb{R}} = [-\infty, \infty]$, the MLE would be $-\infty$.

Example 4.2: Suppose that $X \sim \text{Normal}(\mu, \sigma^2)$. So, $\theta = (\mu, \sigma^2)$ and $\Theta = \mathbb{R} \times \mathbb{R}^+$. Now,
$$\ell(\theta; x) \propto -\log \sigma - \frac{1}{2\sigma^2}(x - \mu)^2.$$
Take $\mu = x$. Then as $\sigma \to 0$, $\ell(\theta; x) \to +\infty$. So, the MLE does not exist.
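To see both failure modes numerically, here is a minimal Python/NumPy sketch (not from the original slides; the grid of θ values and the observed value x = 1.3 are arbitrary choices for illustration).

```python
import numpy as np

# Example 4.1: Bernoulli(1/(1 + exp(theta))) likelihood after observing x = 1.
# L(theta; 1) = 1/(1 + exp(theta)) is strictly decreasing in theta, so no
# maximizer exists on Theta = R; the supremum is approached as theta -> -inf.
thetas = np.linspace(-10, 10, 5)
lik = 1.0 / (1.0 + np.exp(thetas))
print(dict(zip(np.round(thetas, 1), np.round(lik, 4))))

# Example 4.2: Normal(mu, sigma^2) log-likelihood for a single observation x,
# evaluated at mu = x.  As sigma -> 0 the log-likelihood diverges to +inf,
# so the MLE of (mu, sigma^2) does not exist.
x = 1.3                                   # arbitrary observed value
sigmas = np.array([1.0, 0.1, 0.01, 0.001])
loglik = -np.log(sigmas) - (x - x) ** 2 / (2 * sigmas ** 2)
print(loglik)                             # grows without bound as sigma shrinks
```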

Multiple Solutions

One reason for multiple solutions to the maximization problem is non-identification of the parameter $\theta$.

Example 4.3: Suppose that $Y \sim \text{Normal}(X\theta, I)$, where $X$ is an $n \times k$ matrix with rank smaller than $k$ and $\theta \in \Theta \subseteq \mathbb{R}^k$. The density function is
$$p(y; \theta) = (2\pi)^{-n/2} \exp\left(-\tfrac{1}{2}(y - X\theta)'(y - X\theta)\right).$$
Since $X$ is not of full rank, there exists an infinite number of solutions to $X\theta = 0$. That means that there exists an infinite number of $\theta$'s that generate the same density function. So, $\theta$ is not identified. Furthermore, note that the likelihood is maximized at all values of $\theta$ satisfying $X'X\theta = X'y$.
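A small numerical illustration of the non-identification (a sketch under assumed values: the 50 × 3 design with a redundant third column and the particular θ used to generate y are made up for this example):

```python
import numpy as np

rng = np.random.default_rng(0)

# Rank-deficient design: third column is the sum of the first two (rank 2 < k = 3).
X = rng.normal(size=(50, 2))
X = np.column_stack([X, X[:, 0] + X[:, 1]])
theta_gen = np.array([1.0, -2.0, 0.5])
y = X @ theta_gen + rng.normal(size=50)

# Minimum-norm solution of the normal equations X'X theta = X'y.
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Any vector in the null space of X can be added without changing X theta,
# hence without changing the likelihood: theta is not identified.
null_dir = np.array([1.0, 1.0, -1.0])              # X @ null_dir == 0 by construction
theta_alt = theta_hat + 3.7 * null_dir

print(np.allclose(X @ theta_hat, X @ theta_alt))   # True: identical fitted values
```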

Multiple Roots to the Score Equations

Even though the score equations may have multiple roots for fixed $n$, we can still use our theorems to show consistency and asymptotic normality. This will work provided that, as $n$ gets large, there is a unique maximum with large probability.

Example 4.4: Suppose that $X^n = (X_1, \ldots, X_n)$, where the $X_i$'s are i.i.d. Cauchy$(\theta, 1)$. We assume that $\theta_0$ lies in the interior of a compact set $\Theta \subset \mathbb{R}$. So,
$$p(x; \theta) = \frac{1}{\pi(1 + (x - \theta)^2)}.$$
The log-likelihood for the full sample is
$$\ell(\theta; x^n) = -n \log \pi - \sum_{i=1}^n \log(1 + (x_i - \theta)^2).$$
Note that as $\theta \to \pm\infty$, $\ell(\theta; x^n) \to -\infty$.

Multiple Roots to the Score Equations

The score for $\theta$ is given by
$$\frac{d\ell(\theta; x^n)}{d\theta} = \sum_{i=1}^n \frac{2(x_i - \theta)}{1 + (x_i - \theta)^2}.$$
There can be multiple roots to the score equations. Regardless, the MLE is consistent (see Homework 2).
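The multiple-root phenomenon is easy to see by evaluating the Cauchy log-likelihood and score on a grid. The following sketch (not from the notes; the sample size of 5 and the seed are arbitrary) counts sign changes of the score; with small Cauchy samples, more than one root often appears.

```python
import numpy as np

rng = np.random.default_rng(42)
theta0 = 0.0
x = theta0 + rng.standard_cauchy(size=5)     # small sample: multiple roots are common

grid = np.linspace(-15, 15, 4001)

def loglik(theta, x):
    # Cauchy(theta, 1) log-likelihood: -n*log(pi) - sum log(1 + (x_i - theta)^2)
    return -len(x) * np.log(np.pi) - np.log1p((x[:, None] - theta) ** 2).sum(axis=0)

def score(theta, x):
    # d/dtheta of the log-likelihood: sum 2(x_i - theta) / (1 + (x_i - theta)^2)
    return (2 * (x[:, None] - theta) / (1 + (x[:, None] - theta) ** 2)).sum(axis=0)

s = score(grid, x)
sign_changes = np.sum(np.diff(np.sign(s)) != 0)
print("number of score roots on the grid:", sign_changes)
print("grid maximizer of the log-likelihood:", grid[np.argmax(loglik(grid, x))])
```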

Number of Parameters Increases with the Sample Size

Up to now, we have implicitly assumed that the number of parameters is equal to a fixed constant $k$. In some cases the number of parameters increases naturally with the number of observations. In such cases, the MLE may
(i) no longer converge,
(ii) converge to a parameter value different from $\theta_0$, or
(iii) still converge to $\theta_0$.
In general, the outcome depends on how fast the number of parameters grows relative to the number of observations.

Example 4.5 (Neyman-Scott, Econometrica, 1948): Suppose that $X^n = (X_1, \ldots, X_n)$, where the $X_i$'s are independent with $X_i = (X_{i1}, X_{i2})$, $X_{i1}$ independent of $X_{i2}$, and $X_{ip} \sim N(\mu_i, \sigma^2)$ for $p = 1, 2$. We are interested in estimating the $\mu_i$'s and $\sigma^2$. In this problem, we have $n + 1$ parameters.

The likelihood function is
$$L(\mu_1, \ldots, \mu_n, \sigma^2; x^n) = \prod_{i=1}^n \frac{1}{2\pi\sigma^2} \exp\left(-\frac{1}{2\sigma^2} \sum_{p=1}^2 (x_{ip} - \mu_i)^2\right).$$
It is easy to show that the MLEs are
$$\hat{\mu}_i = \tfrac{1}{2}(X_{i1} + X_{i2}) \;\; \text{for } i = 1, \ldots, n, \qquad \hat{\sigma}^2 = \frac{1}{2n} \sum_{i=1}^n \sum_{p=1}^2 (X_{ip} - \hat{\mu}_i)^2.$$

Example 4.5 (Neyman-Scott, Econometrica, 1948): Note that $\hat{\mu}_i$ does not converge to $\mu_i$, and we can show that $\hat{\sigma}^2$ converges in probability to $\sigma^2/2$, not $\sigma^2$.

To show this latter fact, note that we can express $\hat{\sigma}^2$ as
$$\hat{\sigma}^2 = \frac{1}{4n} \sum_{i=1}^n (X_{i1} - X_{i2})^2.$$
Let $Z_i = \frac{X_{i1} - X_{i2}}{\sqrt{2}\,\sigma}$. Then $Z_i \sim N(0, 1)$ and $Z_i^2 \sim \chi^2_1$. Since we have an i.i.d. sample of $Z_i^2$'s, we can employ the WLLN to show that
$$\frac{1}{n} \sum_{i=1}^n Z_i^2 \xrightarrow{P} 1.$$
This implies that
$$\hat{\sigma}^2 = \frac{\sigma^2}{2} \cdot \frac{1}{n} \sum_{i=1}^n Z_i^2 \xrightarrow{P} \frac{\sigma^2}{2}.$$
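A quick Monte Carlo check of this limit (a sketch, not part of the original notes; σ = 1.5, n, and the distribution of the µ_i's are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma = 200_000, 1.5
mu = rng.uniform(-5, 5, size=n)          # arbitrary individual means mu_i

# Two replicates per individual: X_ip ~ N(mu_i, sigma^2), p = 1, 2.
x1 = rng.normal(mu, sigma)
x2 = rng.normal(mu, sigma)

mu_hat = (x1 + x2) / 2
sigma2_hat = np.sum((x1 - mu_hat) ** 2 + (x2 - mu_hat) ** 2) / (2 * n)

print(sigma2_hat)        # close to sigma^2 / 2 = 1.125, not sigma^2 = 2.25
```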

Example 4.6: Suppose that $X^n = (X_1, \ldots, X_n)$, where the $X_i$'s are independent with $X_i = (X_{i1}, X_{i2}, \ldots, X_{in})$ and the $X_{ip}$'s are independent $N(\mu_i, \sigma^2)$ random variables for $p = 1, 2, \ldots, n$. We are interested in estimating the $\mu_i$'s and $\sigma^2$. Again, we have $n + 1$ parameters.

The likelihood function is
$$L(\mu_1, \ldots, \mu_n, \sigma^2; x^n) = \prod_{i=1}^n (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{1}{2\sigma^2} \sum_{p=1}^n (x_{ip} - \mu_i)^2\right).$$
It is easy to show that the MLEs are
$$\hat{\mu}_i = \frac{1}{n} \sum_{p=1}^n X_{ip} \;\; \text{for } i = 1, \ldots, n, \qquad \hat{\sigma}^2 = \frac{1}{n^2} \sum_{i=1}^n \sum_{p=1}^n (X_{ip} - \hat{\mu}_i)^2.$$
By the WLLN, we know that $\hat{\mu}_i$ converges in probability to $\mu_i$, and we can also show that $\hat{\sigma}^2$ converges in probability to $\sigma^2$.
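In contrast to Example 4.5, here the number of replicates per individual grows with n, and a simulation sketch (again with arbitrary σ and µ_i's, not from the notes) shows both σ̂² and the µ̂_i's settling down:

```python
import numpy as np

rng = np.random.default_rng(2)
sigma = 1.5

for n in (10, 100, 1000):
    mu = rng.uniform(-5, 5, size=n)                  # arbitrary mu_i
    x = rng.normal(mu[:, None], sigma, size=(n, n))  # n replicates per individual
    mu_hat = x.mean(axis=1)
    sigma2_hat = np.sum((x - mu_hat[:, None]) ** 2) / n**2
    print(n, round(sigma2_hat, 4), round(np.abs(mu_hat - mu).max(), 4))
# sigma2_hat approaches sigma^2 = 2.25 and the worst mu_i error shrinks as n grows
```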

Support of p(x; θ) Depends on θ

In this case, the MLE is frequently consistent, but not asymptotically normal.

Example 4.7: Suppose $X^n = (X_1, \ldots, X_n)$, where the $X_i$'s are i.i.d. from a shifted exponential distribution. That is,
$$p(x; \theta) = \exp(-(x - \theta)) \, I(x \geq \theta).$$
Then, the likelihood for the full sample is
$$L(\theta; x^n) = \prod_{i=1}^n \exp(-(x_i - \theta)) \, I\left(\min_i x_i \geq \theta\right).$$

Support of p(x; θ) Depends on θ

The MLE for $\theta$ is $\min_i X_i$, i.e., the first order statistic $X_{(1)}$. Note that the likelihood is not differentiable at the MLE. This violates condition (iv) of Theorem 4.6.

We can show that the MLE is consistent:
$$P_{\theta_0}[|X_{(1)} - \theta_0| > \epsilon] = P_{\theta_0}[X_{(1)} - \theta_0 > \epsilon] + P_{\theta_0}[X_{(1)} - \theta_0 < -\epsilon]$$
$$= P_{\theta_0}[X_{(1)} > \theta_0 + \epsilon] + P_{\theta_0}[X_{(1)} < \theta_0 - \epsilon]$$
$$= \prod_{i=1}^n P_{\theta_0}[X_i > \theta_0 + \epsilon] = \exp(-n\epsilon) \to 0,$$
where the second term vanishes because $X_{(1)} \geq \theta_0$ with probability one.

Support of p(x; θ) Depends on θ

It is obvious that $\sqrt{n}(X_{(1)} - \theta_0)$ cannot be centered at zero, since $X_{(1)}$ is always greater than $\theta_0$. We can show that
$$n(X_{(1)} - \theta_0) \xrightarrow{D} \text{Exponential}(1).$$
To see this, note that
$$P_{\theta_0}[n(X_{(1)} - \theta_0) \geq a] = P_{\theta_0}[X_{(1)} \geq a/n + \theta_0] = \prod_{i=1}^n P_{\theta_0}[X_i \geq a/n + \theta_0] = \exp(-a).$$
Here the rate of convergence is $n$ instead of $\sqrt{n}$.
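A simulation sketch of this limit (not from the slides; θ0 = 2, n = 500, and the number of replications are arbitrary): scaling the MLE error by n, rather than √n, gives approximately an Exponential(1) draw.

```python
import numpy as np

rng = np.random.default_rng(3)
theta0, n, reps = 2.0, 500, 20_000

# Shifted exponential samples: X = theta0 + standard exponential.
x = theta0 + rng.exponential(size=(reps, n))
mle = x.min(axis=1)                       # X_(1), the MLE of theta0

scaled = n * (mle - theta0)               # approximately Exponential(1) for large n
print(scaled.mean(), scaled.var())        # both close to 1
print(np.mean(scaled <= 1.0))             # close to 1 - exp(-1) ≈ 0.632
```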

Non-I.I.D. Data

Example 4.8: Consider independent random variables $Y_i \sim \text{Normal}(\theta x_i, 1)$, where the $x_i$'s are given constants. The MLE of $\theta$ is
$$\hat{\theta} = \frac{\sum_{i=1}^n x_i Y_i}{\sum_{i=1}^n x_i^2} \sim \text{Normal}\left(\theta, \frac{1}{\sum_{i=1}^n x_i^2}\right).$$
This estimator may not be consistent. Suppose that $\sum_{i=1}^n x_i^2 \to 1$. Then, we know that $\hat{\theta} \xrightarrow{D} N(\theta_0, 1)$, which is not degenerate at $\theta_0$. If $\sum_{i=1}^n x_i^2 \to \infty$, then $\hat{\theta}$ is consistent. To see this, note that $\hat{\theta}$ is unbiased and its variance goes to zero.

Non-I.I.D. Data

What about the limiting distribution of $\sqrt{n}(\hat{\theta} - \theta_0)$? We know that
$$\sqrt{\textstyle\sum_{i=1}^n x_i^2}\,(\hat{\theta} - \theta_0) \xrightarrow{D} N(0, 1).$$
If $n / \sum_{i=1}^n x_i^2 \to 1$, then $\hat{\theta}$ converges at the $\sqrt{n}$ rate. In general, it converges at the $\sqrt{\sum_{i=1}^n x_i^2}$ rate.
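A final sketch contrasting the two regimes (the design sequences x_i = 2^{-i/2}, which makes Σ x_i² → 1, and x_i ≡ 1, which makes Σ x_i² = n, are illustrative choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(4)
theta0 = 3.0

def mle(x):
    # theta_hat = sum(x_i Y_i) / sum(x_i^2) for Y_i ~ N(theta0 * x_i, 1)
    y = theta0 * x + rng.normal(size=x.size)
    return np.sum(x * y) / np.sum(x * x)

for n in (100, 10_000):
    x_bounded = 2.0 ** (-np.arange(1, n + 1) / 2)    # sum of x_i^2 -> 1
    x_diverge = np.ones(n)                           # sum of x_i^2 = n -> infinity
    est_b = [mle(x_bounded) for _ in range(1000)]
    est_d = [mle(x_diverge) for _ in range(1000)]
    print(n, round(np.std(est_b), 3), round(np.std(est_d), 3))
# the spread of theta_hat stays near 1 in the bounded case but shrinks like
# 1/sqrt(n) when sum x_i^2 diverges
```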