Consistency and Delta Method
MIT 18.655, Dr. Kempthorne, Spring 2016
Outline
1. Asymptotics
Consistency

Statistical Estimation Problem: X_1, ..., X_n i.i.d. P_θ, θ ∈ Θ.
q(θ): target of estimation; q̂(X_1, ..., X_n): estimator of q(θ).

Definition: q̂ is a consistent estimator of q(θ), i.e., q̂(X_1, ..., X_n) →^{P_θ} q(θ), if for every ε > 0,
lim_{n→∞} P_θ(|q̂(X_1, ..., X_n) − q(θ)| > ε) = 0.

Example: Consider P_θ such that E[X_1 | θ] = θ and q(θ) = θ, with
q̂ = (1/n) Σ_{i=1}^n X_i = X̄.
When is q̂ consistent for θ?
Consistency: Example

Example: Consistency of the sample mean q̂_n(X_1, ..., X_n) = X̄.

If Var(X_1 | θ) = σ²(θ) < ∞, apply Chebyshev's inequality. For any ε > 0:
P_θ(|X̄ − θ| ≥ ε) ≤ Var(X̄ | θ)/ε² = σ²(θ)/(nε²) → 0.

If Var(X_1 | θ) = ∞, X̄ is consistent if E[|X_1| | θ] < ∞.
Proof: Lévy Continuity Theorem.
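As an illustrative sketch (not from the slides), the Chebyshev argument above can be checked by Monte Carlo for Bernoulli(θ) samples. The function name `exceed_prob` and all parameter values are hypothetical choices.

```python
import random

def exceed_prob(n, theta=0.3, eps=0.05, reps=2000, seed=1):
    """Monte Carlo estimate of P_theta(|Xbar_n - theta| > eps) for
    i.i.d. Bernoulli(theta) samples of size n."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(reps):
        xbar = sum(rng.random() < theta for _ in range(n)) / n
        if abs(xbar - theta) > eps:
            exceed += 1
    return exceed / reps

# The estimated probability should fall as n grows, and stay below the
# Chebyshev bound sigma^2(theta)/(n eps^2), with sigma^2 = theta(1 - theta).
for n in (50, 200, 800):
    print(n, exceed_prob(n), 0.3 * 0.7 / (n * 0.05**2))
```

The Chebyshev bound is loose (it is vacuous at n = 50), but both the bound and the estimated probability shrink as n grows, which is the consistency statement.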
Consistency: A Stronger Definition

Definition: q̂ is a uniformly consistent estimator of q(θ) if for every ε > 0,
lim_{n→∞} sup_{θ∈Θ} P_θ(|q̂(X_1, ..., X_n) − q(θ)| > ε) = 0.

Example: Consider the sample mean q̂ = X̄, for which E[X̄ | θ] = θ and Var[X̄ | θ] = σ²(θ)/n.
The proof of consistency of q̂ = X̄ extends to uniform consistency if
sup_{θ∈Θ} σ²(θ) ≤ M < ∞.   (*)

Examples satisfying (*):
X_i i.i.d. Bernoulli(θ).
X_i i.i.d. Normal(µ, σ²), where θ = (µ, σ²) ∈ Θ = (−∞, +∞) × [0, M], for finite M < ∞.
Consistency: The Strongest Definition

Definition: q̂ is a strongly consistent estimator of q(θ) if
P_θ(lim_{n→∞} q̂(X_1, ..., X_n) = q(θ)) = 1.
Compare to: q̂ →^{a.s.} q(θ) ("a.s." = almost surely).

Definition: q̂ is a (weakly) consistent estimator of q(θ) if for every ε > 0,
lim_{n→∞} P_θ(|q̂(X_1, ..., X_n) − q(θ)| > ε) = 0.
Consistency of Plug-In Estimators

Plug-In Estimators: Discrete Case
Discrete outcome space of size K: X = {x_1, ..., x_K}.
X_1, ..., X_n i.i.d. P_θ, θ ∈ Θ, where θ = (p_1, ..., p_K):
P(X_1 = x_k | θ) = p_k, k = 1, ..., K,
p_k ≥ 0 for k = 1, ..., K and Σ_{k=1}^K p_k = 1.
Θ = S_K (the K-dimensional simplex).

Define the empirical distribution θ̂ = (p̂_1, ..., p̂_K), where
p̂_k = N_k/n = (1/n) Σ_{i=1}^n 1(X_i = x_k);  θ̂ ∈ S_K.
Consistency of Plug-In Estimators

Proposition/Theorem (5.2.1): Suppose X = (X_1, ..., X_n) is a random sample of size n from a discrete distribution with θ ∈ S_K. Then:
θ̂ is uniformly consistent for θ ∈ S_K.
For any continuous function q : S_K → R^d, q̂ = q(θ̂) is uniformly consistent for q(θ).

Proof: For any ε > 0, P_θ(|θ̂ − θ| ≥ ε) → 0. This follows upon noting that
{x : |θ̂(x) − θ|² < ε²} ⊃ ∩_{k=1}^K {x : (θ̂(x) − θ)_k² < ε²/K},
so
P({x : |θ̂(x) − θ|² ≥ ε²}) ≤ Σ_{k=1}^K P({x : (θ̂(x) − θ)_k² ≥ ε²/K}) ≤ Σ_{k=1}^K [1/(4n)]/(ε²/K) = K²/(4nε²) → 0,
using Chebyshev's inequality and Var(p̂_k) = p_k(1 − p_k)/n ≤ 1/(4n).
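A minimal sketch of the plug-in recipe (helper names hypothetical). Here q(θ) = Σ_k p_k², the collision probability, which is continuous on the simplex, so Theorem 5.2.1 gives consistency of q(θ̂):

```python
import random
from collections import Counter

def empirical_theta(xs, support):
    """The empirical distribution (p_hat_1, ..., p_hat_K): p_hat_k = N_k / n."""
    counts = Counter(xs)
    n = len(xs)
    return [counts[x] / n for x in support]

# Plug-in estimate q(theta_hat) for q(theta) = sum_k p_k^2.
rng = random.Random(0)
support = ["a", "b", "c"]
theta = [0.5, 0.3, 0.2]          # true (p_1, p_2, p_3); q(theta) = 0.38
xs = rng.choices(support, weights=theta, k=20000)
theta_hat = empirical_theta(xs, support)
q_hat = sum(p * p for p in theta_hat)
print(q_hat)                     # should be close to 0.38
```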
Consistency of Plug-In Estimators

Proof (continued): q(·) continuous on compact S_K ⇒ q(·) uniformly continuous on S_K. For every ε > 0 there exists δ(ε) > 0 such that
|θ_1 − θ_0| < δ(ε) ⇒ |q(θ_1) − q(θ_0)| < ε, uniformly for all θ_0, θ_1 ∈ Θ.
It follows that
{x : |q(θ̂(x)) − q(θ)| < ε}^c ⊂ {x : |θ̂ − θ| < δ(ε)}^c
⇒ P_θ[|q̂ − q(θ)| ≥ ε] ≤ P_θ[|θ̂ − θ| ≥ δ(ε)] → 0.
Note: uniform consistency can be shown; see Bickel & Doksum.
Consistency of Plug-In Estimators

Proposition 5.2.1: Suppose:
g = (g_1, ..., g_d) : X → Y ⊂ R^d;
E[|g_j(X_1)| | θ] < ∞ for j = 1, ..., d, for all θ ∈ Θ.
Define m_j(θ) = E[g_j(X_1) | θ] for j = 1, ..., d, and
q(θ) = h(m(θ)), where m(θ) = (m_1(θ), ..., m_d(θ)) and h : Y → R^p is continuous.
Then
q̂ = h(ḡ) = h((1/n) Σ_{i=1}^n g(X_i))
is a consistent estimator of q(θ).
Consistency of Plug-In Estimators

Proposition 5.2.1 applied to non-parametric models:
P = {P : E_P(|g(X_1)|) < ∞} and ν(P) = h(E_P[g(X_1)]).
Then ν(P̂) →^P ν(P), where P̂ is the empirical distribution.
Consistency of MLEs in Exponential Families

Theorem 5.2.2: Suppose:
P is a canonical exponential family of rank d generated by T = (T_1(X), ..., T_d(X))^T:
p(x | η) = h(x) exp{T(x)^T η − A(η)};
E = {η} is open;
X_1, ..., X_n are i.i.d. P_η ∈ P.
Then:
P_η[the MLE η̂ exists] → 1;
η̂ is consistent.
Consistency of MLEs in Exponential Families

Proof: η̂(X_1, ..., X_n) exists iff T̄ = (1/n) Σ_{i=1}^n T(X_i) ∈ C_T^o (the interior of the convex support of T).
If η_0 is true, then t_0 = E[T(X_1) | η_0] ∈ C_T^o and ∇A(η_0) = t_0.
By definition of the interior of the convex support, there exists δ > 0 such that
S_δ = {t : |t − E_{η_0}[T(X_1)]| < δ} ⊂ C_T^o.
By the WLLN:
T̄ = (1/n) Σ_{i=1}^n T(X_i) →^{P_{η_0}} E_{η_0}[T(X_1)]
⇒ P_{η_0}(T̄ ∈ C_T^o) → 1.
η̂ exists if it solves ∇A(η) = T̄, i.e., if T̄ ∈ C_T^o. The map η ↦ ∇A(η) is 1-to-1 onto C_T^o and continuous on E, so the inverse ∇A^{−1} is continuous, and Prop. 5.2.1 applies.
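A concrete one-dimensional instance, as a sketch with hypothetical names: for Poisson data in canonical form, η = log λ, T(x) = x, and A(η) = e^η, so solving A'(η) = T̄ gives η̂ = log X̄. The MLE exists iff X̄ > 0, i.e., iff T̄ lies in the interior of the convex support.

```python
import math
import random

def sample_poisson(rng, lam):
    """Poisson sampler via Knuth's inversion method (adequate for small lam)."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        k += 1
        p *= rng.random()
        if p <= L:
            return k - 1

def poisson_mle(xs):
    """MLE of eta = log(lambda): solves A'(eta) = exp(eta) = Xbar.
    Returns None when Xbar = 0 (Tbar on the boundary: the MLE does not exist)."""
    xbar = sum(xs) / len(xs)
    return math.log(xbar) if xbar > 0 else None

rng = random.Random(0)
xs = [sample_poisson(rng, 2.0) for _ in range(5000)]
print(poisson_mle(xs))   # consistent for eta_0 = log(2), about 0.693
```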
Consistency of Minimum-Contrast Estimates

Minimum-Contrast Estimates:
X_1, ..., X_n i.i.d. P_θ, θ ∈ Θ ⊂ R^d.
ρ(X, θ) : X × Θ → R, a contrast function.
D(θ_0, θ) = E[ρ(X, θ) | θ_0]: the discrepancy function; θ = θ_0 uniquely minimizes D(θ_0, θ).
The minimum-contrast estimate θ̂ minimizes
ρ_n(X, θ) = (1/n) Σ_{i=1}^n ρ(X_i, θ).

Theorem 5.2.3: Suppose
sup_{θ∈Θ} |(1/n) Σ_{i=1}^n ρ(X_i, θ) − D(θ_0, θ)| →^{P_{θ_0}} 0, and
inf_{|θ−θ_0|≥ε} D(θ_0, θ) > D(θ_0, θ_0) for all ε > 0.
Then θ̂ is consistent.
Delta Method

Theorem 5.3.1 (Applying Taylor Expansions): Suppose that:
X_1, ..., X_n i.i.d. with outcome space X = R;
E[X_1] = µ and Var[X_1] = σ² < ∞;
E[|X_1|^m] < ∞;
h : R → R, m-times differentiable on R, with m ≥ 2;
h^{(j)} = d^j h(x)/dx^j, j = 1, 2, ..., m;
||h^{(m)}||_∞ = sup_x |h^{(m)}(x)| ≤ M < ∞.
Then
E[h(X̄)] = h(µ) + Σ_{j=1}^{m−1} (h^{(j)}(µ)/j!) E[(X̄ − µ)^j] + R_m,
where |R_m| ≤ (M E[|X_1|^m]/m!) n^{−m/2}.
Proof: Apply a Taylor expansion to h(X̄):
h(X̄) = h(µ) + Σ_{j=1}^{m−1} (h^{(j)}(µ)/j!) (X̄ − µ)^j + (h^{(m)}(X*)/m!) (X̄ − µ)^m,
where |X* − µ| ≤ |X̄ − µ|.
Take expectations and apply Lemma 5.3.1: if E|X_1|^j < ∞, j ≥ 2, then there exist constants C_j > 0 and D_j > 0 such that
E|X̄ − µ|^j ≤ C_j E|X_1|^j n^{−j/2}, and
|E[(X̄ − µ)^j]| ≤ D_j E|X_1|^j n^{−(j+1)/2} for j odd.
Applying Taylor Expansions

Corollary 5.3.1 (a): If E|X_1|³ < ∞ and ||h^{(3)}||_∞ < ∞, then
E[h(X̄)] = h(µ) + 0 + [h^{(2)}(µ)/2](σ²/n) + O(n^{−3/2}).
Corollary 5.3.1 (b): If E|X_1|⁴ < ∞ and ||h^{(4)}||_∞ < ∞, then
E[h(X̄)] = h(µ) + 0 + [h^{(2)}(µ)/2](σ²/n) + O(n^{−2}).
For (b), Lemma 5.3.1 with j = 3 (odd) gives
|E[(X̄ − µ)³]| ≤ D_3 E[|X_1|³] n^{−2} = O(n^{−2}).

Note: asymptotic bias of h(X̄) for h(µ):
If h^{(2)}(µ) ≠ 0, it is O(n^{−1}).
If h^{(2)}(µ) = 0, it is O(n^{−3/2}) if the third moment is finite, and O(n^{−2}) if the fourth moment is finite.
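The O(n⁻¹) bias term can be seen exactly with h(x) = x², where [h^{(2)}(µ)/2](σ²/n) = σ²/n and indeed E[X̄²] − µ² = Var(X̄) = σ²/n. A Monte Carlo sketch (all parameter choices hypothetical):

```python
import random

def mc_bias(n, theta=0.5, reps=50000, seed=0):
    """Monte Carlo estimate of E[h(Xbar)] - h(theta) for h(x) = x^2 and
    i.i.d. Bernoulli(theta) data; the exact bias is sigma^2/n = theta(1-theta)/n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        xbar = sum(rng.random() < theta for _ in range(n)) / n
        total += xbar * xbar
    return total / reps - theta**2

for n in (10, 20):
    print(n, mc_bias(n), 0.25 / n)   # estimate vs. exact bias sigma^2/n
```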
Applying Taylor Expansions: Asymptotic Variance

Corollary 5.3.2 (a): If ||h^{(j)}||_∞ < ∞ for j = 1, 2, 3, and E|X_1|³ < ∞, then
Var[h(X̄)] = σ²[h^{(1)}(µ)]²/n + O(n^{−3/2}).

Proof: Evaluate Var[h(X̄)] = E[(h(X̄))²] − (E[h(X̄)])².
From Corollary 5.3.1 (a):
E[h(X̄)] = h(µ) + 0 + [h^{(2)}(µ)/2](σ²/n) + O(n^{−3/2})
⇒ (E[h(X̄)])² = (h(µ) + [h^{(2)}(µ)/2](σ²/n) + O(n^{−3/2}))²
= (h(µ))² + [h(µ)h^{(2)}(µ)](σ²/n) + O(n^{−3/2}).
Taking the expectation of the Taylor expansion of [h(X̄)]²:
E([h(X̄)]²) = [h(µ)]² + E[X̄ − µ] · 2h(µ)h^{(1)}(µ)
+ (1/2) E[(X̄ − µ)²] · (2[h^{(1)}(µ)]² + 2h(µ)h^{(2)}(µ))
+ (1/6) E[(X̄ − µ)³ ([h²]^{(3)})(X*)].
The difference gives the result.
Notes:
The asymptotic bias of h(X̄) for h(µ) is O(n^{−1}).
The asymptotic standard deviation of h(X̄) is O(n^{−1/2}), unless (!) h^{(1)}(µ) = 0.
More terms in the Taylor series, with finite expectations E[|X̄ − θ|^j], yield finer approximations of order O(n^{−j/2}).
Taylor series expansions apply to functions of vector-valued statistics (see Theorem 5.3.2).
Theorem 5.3.3: Suppose:
X_1, ..., X_n i.i.d. with X = R;
E[X_1²] < ∞, µ = E[X_1], and σ² = Var(X_1);
h : R → R is differentiable at µ.
Then
√n (h(X̄) − h(µ)) →^L N(0, σ²(h)), where σ²(h) = [h^{(1)}(µ)]² σ².

Proof: Apply a Taylor expansion of h(X̄) about µ:
h(X̄) = h(µ) + (X̄ − µ)[h^{(1)}(µ) + R_n]
⇒ √n (h(X̄) − h(µ)) = [√n (X̄ − µ)][h^{(1)}(µ) + R_n] →^L N(0, σ²) · h^{(1)}(µ).
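A simulation sketch of Theorem 5.3.3 (function and parameter choices hypothetical): for h = exp and X_i ~ Uniform(0, 1), the sample standard deviation of √n(h(X̄) − h(µ)) should approach |h^{(1)}(µ)|σ = e^{0.5}/√12 ≈ 0.476.

```python
import math
import random
import statistics

def delta_sd(n=200, reps=10000, seed=0):
    """Sample sd of sqrt(n)*(exp(Xbar) - exp(mu)) for X_i ~ Uniform(0, 1),
    to compare with the delta-method value |h'(mu)| * sigma."""
    rng = random.Random(seed)
    mu = 0.5
    vals = []
    for _ in range(reps):
        xbar = sum(rng.random() for _ in range(n)) / n
        vals.append(math.sqrt(n) * (math.exp(xbar) - math.exp(mu)))
    return statistics.stdev(vals)

print(delta_sd(), math.exp(0.5) * math.sqrt(1 / 12))  # the two should nearly agree
```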
Limiting Distributions of t-Statistics

Example 5.3.3: One-sample t-statistic.
X_1, ..., X_n i.i.d. P ∈ P, with E_P[X_1] = µ and Var_P(X_1) = σ² < ∞.
For a given µ_0, define the t-statistic for testing H_0 : µ = µ_0 versus H_1 : µ > µ_0:
T_n = √n (X̄ − µ_0)/s, where s² = Σ_{i=1}^n (X_i − X̄)²/(n − 1).
If H_0 is true, then T_n →^L N(0, 1).
Proof: Apply Slutsky's theorem to the limit of {U_n/v_n}, where
U_n = √n (X̄ − µ_0)/σ →^L N(0, 1), and
v_n = s/σ →^P 1.
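The statistic itself is straightforward to compute; a minimal sketch (function name and data values hypothetical):

```python
import math
import statistics

def t_statistic(xs, mu0):
    """One-sample t-statistic T_n = sqrt(n) * (Xbar - mu0) / s,
    with s^2 = sum (X_i - Xbar)^2 / (n - 1)."""
    n = len(xs)
    xbar = sum(xs) / n
    s = statistics.stdev(xs)   # sample sd with the (n - 1) divisor
    return math.sqrt(n) * (xbar - mu0) / s

print(t_statistic([5.1, 4.8, 5.4, 5.0, 4.9, 5.2], mu0=5.0))
```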
Limiting Distributions of t-Statistics

Example 5.3.3: Two-sample t-statistic.
X_1, ..., X_{n_1} i.i.d. with E[X_1] = µ_1 and Var[X_1] = σ_1² < ∞.
Y_1, ..., Y_{n_2} i.i.d. with E[Y_1] = µ_2 and Var[Y_1] = σ_2² < ∞.
Define the t-statistic for testing H_0 : µ_2 = µ_1 versus H_1 : µ_2 > µ_1:
T_n = (Ȳ − X̄)/(s √(1/n_1 + 1/n_2)),
where the pooled variance is
s² = [Σ_{i=1}^{n_1} (X_i − X̄)² + Σ_{j=1}^{n_2} (Y_j − Ȳ)²]/(n − 2),  n = n_1 + n_2.
If H_0 is true, σ_1² = σ_2², and all distributions are Gaussian, then
T_n ~ t_{n−2} (a t-distribution).
In general, if H_0 is true and n_1, n_2 → ∞ with n_1/n → λ (0 < λ < 1), then
T_n →^L N(0, τ²), where τ² = [(1 − λ)σ_1² + λσ_2²]/[λσ_1² + (1 − λ)σ_2²].
(τ² = 1, when?)
Additional Topics

Monte Carlo simulations/studies: evaluating asymptotic distribution approximations.
Variance-stabilizing transformations: when E[X̄] = µ but Var[X̄] = σ²(µ)/n, consider h(X̄) such that σ²(µ)[h^{(1)}(µ)]² = constant. The asymptotic distribution approximation for h(X̄) will then have a constant variance.
Edgeworth approximations: refining the Central Limit Theorem to match nonzero skewness and non-Gaussian kurtosis.
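A sketch of a variance-stabilizing transformation (setup and names hypothetical): for X_i ~ Poisson(λ), σ²(λ) = λ and h(x) = √x gives σ²(λ)[h^{(1)}(λ)]² = λ · 1/(4λ) = 1/4, so the sd of √n(√X̄ − √λ) should sit near 1/2 for every λ.

```python
import math
import random
import statistics

def sample_poisson(rng, lam):
    """Poisson sampler via Knuth's inversion method (adequate for small lam)."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        k += 1
        p *= rng.random()
        if p <= L:
            return k - 1

def stabilized_sd(lam, n=150, reps=4000, seed=0):
    """Sample sd of sqrt(n)*(sqrt(Xbar) - sqrt(lam)); the delta method
    predicts a value near 1/2 regardless of lam."""
    rng = random.Random(seed)
    vals = []
    for _ in range(reps):
        xbar = sum(sample_poisson(rng, lam) for _ in range(n)) / n
        vals.append(math.sqrt(n) * (math.sqrt(xbar) - math.sqrt(lam)))
    return statistics.stdev(vals)

for lam in (1.0, 4.0):
    print(lam, stabilized_sd(lam))   # both near 0.5
```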
Taylor Series Review

Power series representation of a function f(x):
f(x) = Σ_{j=0}^∞ c_j (x − a)^j, for x : |x − a| < d;
a = center, and d = radius of convergence.
Theorem: if f(x) has a power series representation, then
c_j = f^{(j)}(a)/j! = [d^j f(x)/dx^j]|_{x=a} / j!.
Define T_m(x) = Σ_{j=0}^m c_j (x − a)^j, and R_m(x) = f(x) − T_m(x). Then
lim_{m→∞} T_m(x) = f(x)  (⇔)  lim_{m→∞} R_m(x) = 0.
Power Series Approximation of f(x), where f^{(j)}(x) is finite for 1 ≤ j ≤ m and sup_x |f^{(m)}(x)| ≤ M.
For m = 2: |f^{(2)}(x)| ≤ M, so
∫_a^x f^{(2)}(t) dt ≤ ∫_a^x M dt
⇒ f^{(1)}(x) − f^{(1)}(a) ≤ M(x − a)
⇒ f^{(1)}(x) ≤ f^{(1)}(a) + M(x − a).
Integrate again:
∫_a^x f^{(1)}(t) dt ≤ ∫_a^x [f^{(1)}(a) + M(t − a)] dt
⇒ f(x) − f(a) ≤ f^{(1)}(a)(x − a) + M(x − a)²/2
⇒ f(x) ≤ f(a) + f^{(1)}(a)(x − a) + M(x − a)²/2.
Reversing the inequality with −M gives
f(x) ≥ f(a) + f^{(1)}(a)(x − a) − M(x − a)²/2
⇒ f(x) = f(a) + f^{(1)}(a)(x − a) + R_2(x), where |R_2(x)| ≤ M(x − a)²/2.
Delta Method for a Function of a Random Variable

X a r.v. with µ = E[X]; h(·) a function with m derivatives.
h(X) = h(µ) + (X − µ)h^{(1)}(µ) + R_2(X), where |R_2(X)| ≤ M(X − µ)²/2.
h(X) = h(µ) + (X − µ)h^{(1)}(µ) + (X − µ)² h^{(2)}(µ)/2 + R_3(X), where |R_3(X)| ≤ M|X − µ|³/3!.
h(X) = h(µ) + (X − µ)h^{(1)}(µ) + (X − µ)² h^{(2)}(µ)/2 + (X − µ)³ h^{(3)}(µ)/3! + R_4(X), where |R_4(X)| ≤ M|X − µ|⁴/4!.
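The remainder bound above can be checked numerically; a sketch with h = sin, where every derivative is bounded by M = 1 (the evaluation points are arbitrary choices):

```python
import math

# Verify |R_2(x)| = |h(x) - h(mu) - (x - mu) h'(mu)| <= M (x - mu)^2 / 2
# for h = sin (so M = 1), at a few arbitrary points.
mu = 0.7
for x in (-1.0, 0.0, 0.5, 1.3, 2.0):
    r2 = abs(math.sin(x) - math.sin(mu) - (x - mu) * math.cos(mu))
    bound = (x - mu) ** 2 / 2
    assert r2 <= bound
    print(x, round(r2, 4), "<=", round(bound, 4))
```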
Key Example: X̄ = (1/n) Σ_{i=1}^n X_i, for i.i.d. X_i with
E[X_1] = θ, E[(X_1 − θ)²] = σ², E[(X_1 − θ)³] = µ_3, E[|X_1 − θ|³] = κ_3.
With X̄ for a given sample size n:
E[X̄] = θ, E[(X̄ − θ)²] = σ²/n, E[(X̄ − θ)³] = µ_3/n², E[|X̄ − θ|³] = O(n^{−3/2}).

Taking expectations of the delta formulas (cases m = 2, 3):
E[h(X̄)] = E[h(θ) + (X̄ − θ)h^{(1)}(θ) + R_2(X̄)]
= h(θ) + E[(X̄ − θ)] h^{(1)}(θ) + E[R_2(X̄)]
= h(θ) + 0 + E[R_2(X̄)],
where |E[R_2(X̄)]| ≤ E[|R_2(X̄)|] ≤ (M/2) E[(X̄ − θ)²] = (M/2)(σ²/n).
E[h(X̄)] = E[h(θ) + (X̄ − θ)h^{(1)}(θ) + (X̄ − θ)² h^{(2)}(θ)/2 + R_3(X̄)]
= h(θ) + (σ²/n) h^{(2)}(θ)/2 + E[R_3(X̄)],
where |E[R_3(X̄)]| ≤ E[|R_3(X̄)|] ≤ (M/3!) E[|X̄ − θ|³] = (M/3!) O(n^{−3/2}).
Taking expectations of the delta formula (case m = 4):
E[h(X̄)] = E[h(θ) + (X̄ − θ)h^{(1)}(θ) + (X̄ − θ)² h^{(2)}(θ)/2]
+ E[(X̄ − θ)³ h^{(3)}(θ)/3! + R_4(X̄)]
= h(θ) + (σ²/n) h^{(2)}(θ)/2 + E[(X̄ − θ)³] h^{(3)}(θ)/3! + E[R_4(X̄)]
= h(θ) + (σ²/n) h^{(2)}(θ)/2 + (µ_3/n²) h^{(3)}(θ)/3! + E[R_4(X̄)]
= h(θ) + (σ²/n) h^{(2)}(θ)/2 + O(n^{−2}),
because |E[R_4(X̄)]| ≤ E[|R_4(X̄)|] ≤ (M/4!) E[|X̄ − θ|⁴] = (M/4!) O(n^{−2}).
Limit laws from the delta formula (case m = 2):
h(X̄) = h(θ) + (X̄ − θ)h^{(1)}(θ) + R_2(X̄)
⇒ √n [h(X̄) − h(θ)] = √n (X̄ − θ) h^{(1)}(θ) + √n R_2(X̄)
= √n (X̄ − θ) h^{(1)}(θ) + O_p(n^{−1/2})
→^L N(0, σ²[h^{(1)}(θ)]²),
since E[|R_2(X̄)|] ≤ (M/2)(σ²/n).
Note: if h^{(1)}(θ) = 0, then √n [h(X̄) − h(θ)] →^{Pr} 0. Consider increasing the scaling to n [h(X̄) − h(θ)].
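The degenerate case h^{(1)}(θ) = 0 can be sketched numerically (setup hypothetical): with h(x) = x², θ = 0, and X_i ~ N(0, 1), we have n(h(X̄) − h(θ)) = (√n X̄)², which converges in law to σ²χ²_1 with mean σ² = 1, while the √n-scaled quantity collapses to 0.

```python
import random
import statistics

rng = random.Random(0)
n, reps = 100, 10000
# n * Xbar^2 = (sqrt(n) * Xbar)^2, the square of an (exactly) N(0, 1) variable here.
vals = [n * (sum(rng.gauss(0, 1) for _ in range(n)) / n) ** 2 for _ in range(reps)]
print(statistics.mean(vals))   # near E[chi-squared_1] = 1
```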
MIT OpenCourseWare
http://ocw.mit.edu
18.655 Mathematical Statistics
Spring 2016
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.