High Dimensional Edgeworth Expansion. Applications to Bootstrap and Its Variants

Department of Statistics, UC Berkeley
Stanford-Berkeley Colloquium, 2016

Francis Ysidro Edgeworth (1845-1926) Peter Gavin Hall (1951-2016)

Table of Contents 1 Background 2 High Dimensional Settings 3 Main Results

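Setup

Given the data $X = (X_1, \ldots, X_n)$ i.i.d. $\sim F$, with $X_i \in \mathbb{R}^p$, $E X_i = 0$, $E X_i X_i^T = V$.
Sample mean: $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$.
Goal: approximate the distribution of $\bar{X}$, i.e. approximate
$$P\left(\sqrt{n}\, V^{-1/2} \bar{X} \in A\right) \quad \text{for } A \in \mathcal{C},$$
where $\mathcal{C}$ denotes the collection of all convex sets in $\mathbb{R}^p$.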


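CLT with Fixed Dimensions

Let $\Phi$ be the measure of $N(0, I_{p \times p})$. When the dimension $p$ is fixed:

Central Limit Theorem (CLT):
$$\sup_{A \in \mathcal{C}} \left| P\left(\sqrt{n}\, V^{-1/2} \bar{X} \in A\right) - \Phi(A) \right| = o(1);$$

Berry-Esseen Bound (with third-order moments):
$$\sup_{A \in \mathcal{C}} \left| P\left(\sqrt{n}\, V^{-1/2} \bar{X} \in A\right) - \Phi(A) \right| = O\left(n^{-1/2}\right).$$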
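The $n^{-1/2}$ rate can be checked exactly in a toy case. The sketch below is an illustration, not part of the talk: it assumes $p = 1$, centered Bernoulli($q$) data, and restricts the supremum to half-lines $(-\infty, x]$, which are a subclass of the convex sets. The function names and the constant 0.56 (a published, non-sharp Berry-Esseen constant for the i.i.d. univariate case) are assumptions introduced here.

```python
import math

def norm_cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def binom_pmf(k, n, q):
    """Binomial(n, q) pmf, computed in log space for numerical stability."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(q) + (n - k) * math.log(1.0 - q))
    return math.exp(log_pmf)

def kolmogorov_distance(n, q):
    """sup_x |P(sqrt(n) * Xbar / sigma <= x) - Phi(x)| for centered Bernoulli(q) data.

    The CDF of the standardized sum is a step function, so the supremum is
    attained at a jump point, either at the left limit or at the value itself.
    """
    sd = math.sqrt(n * q * (1.0 - q))
    cdf_left, worst = 0.0, 0.0
    for k in range(n + 1):
        phi = norm_cdf((k - n * q) / sd)
        cdf_right = cdf_left + binom_pmf(k, n, q)
        worst = max(worst, abs(cdf_left - phi), abs(cdf_right - phi))
        cdf_left = cdf_right
    return worst

def berry_esseen_bound(n, q, c=0.56):
    """c * E|X|^3 / (sigma^3 * sqrt(n)); c = 0.56 is an assumed valid constant."""
    sigma = math.sqrt(q * (1.0 - q))
    rho = q * (1.0 - q) * (q ** 2 + (1.0 - q) ** 2)  # E|X - q|^3 for Bernoulli(q)
    return c * rho / (sigma ** 3 * math.sqrt(n))

for n in (25, 100, 400):
    print(n, kolmogorov_distance(n, 0.2), berry_esseen_bound(n, 0.2))
```

Quadrupling $n$ should roughly halve the distance, which is the $n^{-1/2}$ behaviour stated in the bound.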

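Edgeworth Expansion with Fixed Dimensions

Edgeworth Expansion (with $(\nu + 3)$-order moments):
$$\sup_{A \in \mathcal{C}} \left| P\left(\sqrt{n}\, V^{-1/2} \bar{X} \in A\right) - \Phi(A) - \sum_{j=1}^{\nu} n^{-j/2} P_j(A) \right| = O\left(n^{-\frac{\nu+1}{2}}\right),$$
where $P_j(\cdot)$ are signed measures determined by the cumulants of $F$.

When $\nu = 1$ and $p = 1$,
$$\sup_{A \in \mathcal{C}} \left| P\left(\sqrt{n}\, V^{-1/2} \bar{X} \in A\right) - \Phi(A) - n^{-1/2} P_1(A) \right| = O\left(n^{-1}\right),$$
where $P_1(A)$ has density
$$p_1(x) = \frac{\kappa_3}{6} \left(x^3 - 3x\right) \frac{1}{\sqrt{2\pi}} e^{-x^2/2},$$
with $\kappa_3$ the third cumulant of the standardized $X_i$.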
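To see the gain from the first correction term, the following sketch compares the normal approximation with the one-term Edgeworth approximation in a case where the exact law is available. It is an illustration under assumptions not made in the slides: $p = 1$, $X_i = E_i - 1$ with $E_i \sim \mathrm{Exp}(1)$ (so $V = 1$ and $\kappa_3 = 2$), and sets restricted to half-lines, for which the standard one-term correction is $-\frac{\kappa_3}{6\sqrt{n}}(x^2 - 1)\varphi(x)$. All function names are hypothetical.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def exact_cdf(x, n):
    """P((S_n - n) / sqrt(n) <= x), where S_n is a sum of n Exp(1) variables.

    S_n ~ Gamma(n, 1), whose CDF for integer n is
    1 - exp(-s) * sum_{k < n} s^k / k!.
    """
    s = n + x * math.sqrt(n)
    if s <= 0.0:
        return 0.0
    term, total = 1.0, 1.0
    for k in range(1, n):
        term *= s / k
        total += term
    return 1.0 - math.exp(-s) * total

def edgeworth_cdf(x, n, kappa3):
    """Normal CDF plus the one-term Edgeworth correction for half-lines."""
    return norm_cdf(x) - kappa3 / (6.0 * math.sqrt(n)) * (x * x - 1.0) * norm_pdf(x)

def max_errors(n, kappa3=2.0):
    """Max error over a grid on [-3, 3] of the normal and Edgeworth approximations."""
    grid = [i / 20.0 for i in range(-60, 61)]
    err_normal = max(abs(exact_cdf(x, n) - norm_cdf(x)) for x in grid)
    err_edgeworth = max(abs(exact_cdf(x, n) - edgeworth_cdf(x, n, kappa3)) for x in grid)
    return err_normal, err_edgeworth

for n in (10, 40, 160):
    print(n, max_errors(n))
```

The normal error should shrink like $n^{-1/2}$ and the Edgeworth error like $n^{-1}$, so the second column should fall noticeably faster as $n$ grows.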


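Edgeworth Expansion and Bootstrap

Draw a bootstrap sample $X_1^*, \ldots, X_n^*$ of $X_1, \ldots, X_n$, i.e. $X_1^*, \ldots, X_n^*$ i.i.d. $\sim \hat{F}_n$, where $\hat{F}_n$ is the ecdf.

Heuristically, a first-order Edgeworth expansion implies
$$\sup_{A \in \mathcal{C}} \left| P\left(\sqrt{n}\, (V^*)^{-1/2} (\bar{X}^* - \bar{X}) \in A \mid X\right) - \Phi(A) - n^{-1/2} P_1^*(A) \right| = O\left(n^{-1}\right),$$
where $V^* = \mathrm{Var}(X_1^* \mid X)$ and $P_1^*(\cdot)$ is determined by the cumulants of $X_1^*$.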


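Edgeworth Expansion and Bootstrap

The cumulants of $F$ and those of $\hat{F}_n$ are close, and thus
$$\sup_{A \in \mathcal{C}} \left| P_1(A) - P_1^*(A) \right| = O\left(n^{-1/2}\right).$$

As a consequence,
$$\sup_{A \in \mathcal{C}} \left| P\left(\sqrt{n}\, (V^*)^{-1/2} (\bar{X}^* - \bar{X}) \in A \mid X\right) - P\left(\sqrt{n}\, V^{-1/2} \bar{X} \in A\right) \right| = O\left(n^{-1}\right).$$

This is called Higher-Order Accuracy (Hall, 1992).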
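The bootstrap quantity on the left-hand side can be approximated by Monte Carlo. The sketch below is a minimal illustration for $p = 1$; the function name, the number of resamples, the seeds, and the centered exponential toy data are all assumptions introduced here, not details from the talk.

```python
import math
import random

def bootstrap_standardized_means(x, n_boot=2000, seed=0):
    """Draw n_boot values of sqrt(n) * (Xbar* - Xbar) / sqrt(V*), given the data x.

    V* is the variance of one draw from the ecdf, i.e. the divisor-n sample variance.
    """
    rng = random.Random(seed)
    n = len(x)
    xbar = sum(x) / n
    v_star = sum((xi - xbar) ** 2 for xi in x) / n
    stats = []
    for _ in range(n_boot):
        resample = rng.choices(x, k=n)  # i.i.d. draws from the ecdf
        xbar_star = sum(resample) / n
        stats.append(math.sqrt(n) * (xbar_star - xbar) / math.sqrt(v_star))
    return stats

# Hypothetical toy data: centered Exp(1) observations.
data_rng = random.Random(1)
data = [data_rng.expovariate(1.0) - 1.0 for _ in range(50)]

stats = bootstrap_standardized_means(data)
# Bootstrap estimate of P(sqrt(n) * V^{-1/2} * Xbar <= 0), a half-line event.
frac_below_zero = sum(t <= 0.0 for t in stats) / len(stats)
print(frac_below_zero)
```

Because the resampled statistic inherits the skewness of the data, its distribution is generally not symmetric around zero, which is the effect the $n^{-1/2} P_1^*(A)$ term captures.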



CLT in High Dimensions

The CLT in high dimensions has been investigated since the 1960s, e.g. Sazonov, 1968; Portnoy, 1986; Götze, 1991.

A sharp result was obtained by Bentkus (2003):
$$\sup_{A \in \mathcal{C}} \left| P\left(\sqrt{n}\, V^{-1/2} \bar{X} \in A\right) - \Phi(A) \right| = O\left(\frac{p^{7/4}}{\sqrt{n}}\right).$$

Edgeworth Expansion in High Dimensions

In contrast to the CLT, there are very few works on Edgeworth expansion in high dimensions.
There are some results on Banach spaces, but they focus on $E f(\bar{X})$ for smooth $f$ instead of the law of $\bar{X}$ (Götze, 1981; Bentkus, 1984).
Using existing techniques (Bhattacharya & Rao, 1986): $p \lesssim \mathrm{PolyLog}(n)$.
Fundamental limit: $n^{-1/2} P_1(A)$ is of order $n^{-1/2} p^3$, so without further constraints one needs $p \ll n^{1/6}$.
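The $n^{1/6}$ threshold follows from requiring $n^{-1/2} p^3$ to be small. A rough arithmetic sketch, ignoring all constants (the function names are hypothetical and the cutoff of 1 is an arbitrary choice made for illustration):

```python
import math

def first_order_term_scale(n, p):
    """Order of magnitude n^{-1/2} * p^3 of the first correction term (constants ignored)."""
    return p ** 3 / math.sqrt(n)

def largest_p_below_one(n):
    """Largest integer p with p^3 / sqrt(n) < 1, i.e. p^6 < n (exact integer arithmetic)."""
    p = 0
    while (p + 1) ** 6 < n:
        p += 1
    return p

for n in (10 ** 3, 10 ** 6, 10 ** 12):
    print(n, largest_p_below_one(n))
```

Even with a trillion observations, this crude criterion allows fewer than a hundred dimensions, which shows how restrictive the limit is.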


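Edgeworth Expansion in High Dimension

Theorem 1. Let $X_1, \ldots, X_n$ be i.i.d. samples with zero mean and covariance matrix $V$. Assume that
1. $\lambda_{\max}(V) = O(1)$, $\lambda_{\min}(V) = \Omega(1)$;
2. $p = O(n^{\gamma})$ for some $\gamma < \frac{1}{14}$;
3. $|X_{ij}| \le B = O(1)$.
Then for any positive integer $\nu$,
$$\sup_{A \in \mathcal{C}_n} \left| P\left(\sqrt{n}\, V^{-1/2} \bar{X} \in A\right) - \Phi(A) - \sum_{j=1}^{\nu} n^{-j/2} P_j(A) \right| = O\left(\left(\frac{p^9}{n}\right)^{\frac{\nu+1}{2}}\right).$$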


Here $\mathcal{C}_n$ includes all convex sets plus all sets of the form $\{F^{-1}(A) : A \text{ is convex}\}$ for arbitrary smooth non-linear functions $F : \mathbb{R}^p \to \mathbb{R}$.

As a corollary, for a smooth $F$,
$$P\left(F(\sqrt{n}\, \bar{X}) \in A\right) \approx \Phi_{0,V}\left(F^{-1}(A)\right) + \sum_{j=1}^{\nu} n^{-j/2} P_j\left(V^{-1/2} F^{-1}(A)\right).$$
This gives an Edgeworth expansion for smooth transforms of the mean.

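Bootstrap in High Dimension

Theorem 2. Let $X_1, \ldots, X_n$ be i.i.d. samples with zero mean and covariance matrix $V$, and let $X_1^*, \ldots, X_n^*$ be i.i.d. $\sim \hat{F}_n$. Assume that
1. $\lambda_{\max}(V) = O(1)$, $\lambda_{\min}(V) = \Omega(1)$;
2. $p = \Theta(n^{\gamma})$ for some $0 < \gamma < \frac{1}{17}$;
3. $|X_{ij}| \le B = O(1)$.
Then with probability $1 - \exp\{-\Omega(n^{\gamma})\}$,
$$\sup_{A \in \mathcal{C}_n} \left| P\left(\sqrt{n}\, (V^*)^{-1/2} (\bar{X}^* - \bar{X}) \in A \mid X\right) - P\left(\sqrt{n}\, V^{-1/2} \bar{X} \in A\right) \right| \le \frac{C p^9}{n},$$
where $V^* = \mathrm{Var}(X_1^* \mid X)$.

This is strictly better than the bound given by the CLT, since $\frac{p^9}{n} \ll \frac{p^{7/4}}{\sqrt{n}}$ when $p = O(n^{1/17})$.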
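The comparison of the two rates reduces to comparing exponents of $n$ once $p = n^{\gamma}$ is substituted. The sketch below does that arithmetic exactly; it ignores the constant $C$ and all hidden constants, and the function names are hypothetical.

```python
from fractions import Fraction

def bootstrap_exponent(gamma):
    """Exponent of n in p^9 / n when p = n^gamma."""
    return 9 * gamma - 1

def clt_exponent(gamma):
    """Exponent of n in p^(7/4) / sqrt(n) when p = n^gamma."""
    return Fraction(7, 4) * gamma - Fraction(1, 2)

def crossover_gamma():
    """Solve 9 * gamma - 1 = (7/4) * gamma - 1/2 for gamma."""
    return Fraction(1, 2) / (9 - Fraction(7, 4))

gamma = Fraction(1, 17)
print(crossover_gamma(), bootstrap_exponent(gamma), clt_exponent(gamma))
```

The bootstrap exponent is the smaller one whenever $\gamma$ is below the crossover value, and $\frac{1}{17}$ lies below it, so the condition of Theorem 2 is inside the range where the bootstrap rate wins.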

Thanks!

References

Bentkus, V. (1984). Asymptotic expansions in the central limit theorem in Hilbert space. Lithuanian Mathematical Journal, 24(3), 210-225.
Bentkus, V. (2003). On the dependence of the Berry-Esseen bound on dimension. Journal of Statistical Planning and Inference, 113(2), 385-402.
Bhattacharya, R. N., & Rao, R. R. (1986). Normal approximation and asymptotic expansions (Vol. 64). SIAM.
Götze, F. (1981). On Edgeworth expansions in Banach spaces. The Annals of Probability, 852-859.
Götze, F. (1991). On the rate of convergence in the multivariate CLT. The Annals of Probability, 724-739.
Hall, P. (1992). The bootstrap and Edgeworth expansion.
Portnoy, S. (1986). On the central limit theorem in $\mathbb{R}^p$ when $p \to \infty$. Probability Theory and Related Fields, 73(4), 571-583.
Sazonov, V. (1968). On the multi-dimensional central limit theorem. Sankhyā: The Indian Journal of Statistics, Series A, 181-204.