With Applications to Bootstrap and Its Variants Department of Statistics, UC Berkeley Stanford-Berkeley Colloquium, 2016
Francis Ysidro Edgeworth (1845-1926) Peter Gavin Hall (1951-2016)
Table of Contents 1 Background 2 High Dimensional Settings 3 Main Results
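Setup
Given the data $X = (X_1, \ldots, X_n)$ i.i.d. $\sim F$, with $X_i \in \mathbb{R}^p$, $E X_i = 0$, $E X_i X_i^T = V$;
Sample mean $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$;
Goal: approximate the distribution of $\bar{X}$, i.e. approximate
\[ P\left( \sqrt{n}\, V^{-1/2} \bar{X} \in A \right) \quad \text{for } A \in \mathcal{C}, \]
where $\mathcal{C}$ denotes the collection of all convex sets in $\mathbb{R}^p$.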
CLT with Fixed Dimensions
Let $\Phi$ be the measure of $N(0, I_{p \times p})$. When the dimension $p$ is fixed:
Central Limit Theorem (CLT):
\[ \sup_{A \in \mathcal{C}} \left| P\left( \sqrt{n}\, V^{-1/2} \bar{X} \in A \right) - \Phi(A) \right| = o(1); \]
Berry-Esseen Bound (with third-order moments):
\[ \sup_{A \in \mathcal{C}} \left| P\left( \sqrt{n}\, V^{-1/2} \bar{X} \in A \right) - \Phi(A) \right| = O\left( n^{-1/2} \right). \]
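Edgeworth Expansion with Fixed Dimensions
Edgeworth Expansion (with $(\nu + 3)$-order moments):
\[ \sup_{A \in \mathcal{C}} \left| P\left( \sqrt{n}\, V^{-1/2} \bar{X} \in A \right) - \Phi(A) - \sum_{j=1}^{\nu} n^{-j/2} P_j(A) \right| = O\left( n^{-(\nu+1)/2} \right), \]
where $P_j(\cdot)$ are signed measures determined by the cumulants of $F$.
When $\nu = 1$ and $p = 1$,
\[ \sup_{A \in \mathcal{C}} \left| P\left( \sqrt{n}\, V^{-1/2} \bar{X} \in A \right) - \Phi(A) - n^{-1/2} P_1(A) \right| = O\left( n^{-1} \right), \]
where $P_1(A)$ has density $p_1(x) = \frac{\kappa_3}{6} (x^3 - 3x) \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ and $\kappa_3$ is the third cumulant of the standardized $X_1$.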
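As a rough numerical illustration of the $\nu = 1$, $p = 1$ case, the sketch below compares the plain normal approximation with the one-term Edgeworth approximation. It rests on assumptions that are not in the slides: the summands are centered Exp(1) variables (so $\kappa_3 = 2$ and the exact law is a Gamma distribution), only half-lines $A = (-\infty, x]$ on a finite grid are checked rather than all convex sets, and all function names are hypothetical. On half-lines the correction $n^{-1/2} P_1(A)$ integrates to $-\frac{\kappa_3}{6\sqrt{n}} (x^2 - 1) \varphi(x)$.

```python
import math


def normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def exact_cdf_exp_mean(x, n):
    """P(sqrt(n) * (mean - 1) <= x) for n i.i.d. Exp(1) variables.

    The sum of n Exp(1) variables is Gamma(n, 1), whose distribution
    function has the closed form 1 - exp(-t) * sum_{k<n} t^k / k!.
    """
    t = n + x * math.sqrt(n)
    if t <= 0.0:
        return 0.0
    term, total = 1.0, 1.0
    for k in range(1, n):
        term *= t / k
        total += term
    return 1.0 - math.exp(-t) * total


def edgeworth_cdf(x, n, kappa3):
    """One-term Edgeworth approximation on the half-line (-inf, x]."""
    correction = kappa3 * (x * x - 1.0) * normal_pdf(x) / (6.0 * math.sqrt(n))
    return normal_cdf(x) - correction


def sup_errors(n, kappa3=2.0):
    """Largest error of each approximation over a grid on [-4, 4]."""
    grid = [i / 100.0 for i in range(-400, 401)]
    err_normal = max(abs(exact_cdf_exp_mean(x, n) - normal_cdf(x)) for x in grid)
    err_edgeworth = max(
        abs(exact_cdf_exp_mean(x, n) - edgeworth_cdf(x, n, kappa3)) for x in grid
    )
    return err_normal, err_edgeworth


if __name__ == "__main__":
    for n in (25, 100, 400):
        print(n, sup_errors(n))
```

Running the script prints, for each $n$, the grid maximum of the normal error next to that of the Edgeworth error, so the $n^{-1/2}$ versus $n^{-1}$ decay can be read off directly.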
Edgeworth Expansion and Bootstrap
Draw a bootstrap sample $X_1^*, \ldots, X_n^*$ of $X_1, \ldots, X_n$; i.i.d. $\sim \hat{F}_n$, where $\hat{F}_n$ is the ecdf.
Heuristically, a first-order Edgeworth expansion implies
\[ \sup_{A \in \mathcal{C}} \left| P\left( \sqrt{n}\, (V^*)^{-1/2} (\bar{X}^* - \bar{X}) \in A \,\middle|\, X \right) - \Phi(A) - n^{-1/2} P_1^*(A) \right| = O\left( n^{-1} \right), \]
where $V^* = \mathrm{Var}(X_1^*)$ and $P_1^*(\cdot)$ is determined by the cumulants of $X_1^*$.
Recall that
\[ \sup_{A \in \mathcal{C}} \left| P\left( \sqrt{n}\, V^{-1/2} \bar{X} \in A \right) - \Phi(A) - n^{-1/2} P_1(A) \right| = O\left( n^{-1} \right). \]
Edgeworth Expansion and Bootstrap
The cumulants of $F$ and those of $\hat{F}_n$ are close, and thus
\[ \sup_{A \in \mathcal{C}} \left| P_1^*(A) - P_1(A) \right| = O\left( n^{-1/2} \right). \]
As a consequence,
\[ \sup_{A \in \mathcal{C}} \left| P\left( \sqrt{n}\, (V^*)^{-1/2} (\bar{X}^* - \bar{X}) \in A \,\middle|\, X \right) - P\left( \sqrt{n}\, V^{-1/2} \bar{X} \in A \right) \right| = O\left( n^{-1} \right). \]
This is called Higher-Order Accuracy (Hall, 1992).
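The resampling step behind this comparison can be sketched for the scalar case $p = 1$. This is a minimal illustration under stated assumptions, not the procedure used in the talk: the function name `bootstrap_standardized_means` is hypothetical, $V^*$ is taken to be the empirical variance with divisor $n$ (the variance of one draw from $\hat{F}_n$), and the conditional law is approximated by Monte Carlo.

```python
import math
import random
import statistics


def bootstrap_standardized_means(data, num_draws, rng):
    """Monte Carlo draws of sqrt(n) * (V*)^(-1/2) * (mean* - mean), given the data.

    data      : list of floats, the observed sample X_1, ..., X_n (p = 1)
    num_draws : number of bootstrap resamples
    rng       : a random.Random instance, so results are reproducible
    """
    n = len(data)
    xbar = statistics.fmean(data)
    # Variance of a single draw from the empirical distribution (divisor n).
    v_star = sum((x - xbar) ** 2 for x in data) / n
    scale = math.sqrt(n / v_star)
    draws = []
    for _ in range(num_draws):
        # Sample n points with replacement, i.e. i.i.d. from the ecdf.
        resample = rng.choices(data, k=n)
        draws.append(scale * (statistics.fmean(resample) - xbar))
    return draws


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed example data: centered Exp(1) observations.
    sample = [rng.expovariate(1.0) - 1.0 for _ in range(100)]
    z = bootstrap_standardized_means(sample, 2000, rng)
    print(statistics.fmean(z), statistics.pvariance(z))
```

Conditionally on the data, the draws have mean 0 and variance 1 by construction, so their empirical distribution can be compared with $\Phi$ or with the law of $\sqrt{n}\, V^{-1/2} \bar{X}$.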
CLT in High Dimensions
The CLT in high dimensions has been investigated since the 1960s, e.g. Sazonov, 1968; Portnoy, 1986; Götze, 1991.
A sharp result was obtained by Bentkus (2003):
\[ \sup_{A \in \mathcal{C}} \left| P\left( \sqrt{n}\, V^{-1/2} \bar{X} \in A \right) - \Phi(A) \right| = O\left( \frac{p^{7/4}}{\sqrt{n}} \right); \]
Fundamental limit: $p = o(n^{2/7})$ for the CLT to hold.
Edgeworth Expansion in High Dimensions
In contrast to the CLT, there are very few works on Edgeworth expansion in high dimensions;
Some results exist on Banach spaces, but they focus on $E f(\bar{X})$ for smooth $f$ instead of the law of $\bar{X}$ (Götze, 1981; Bentkus, 1984).
Using existing techniques (Bhattacharya & Rao, 1986): $p \lesssim \mathrm{PolyLog}(n)$;
Fundamental limit: $n^{-1/2} P_1(A)$ is of order $n^{-1/2} p^3$, so, without further constraints, $p \ll n^{1/6}$.
Question: How fast can the dimension grow with $n$?
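The exponent in this fundamental limit follows from simple arithmetic. The short derivation below assumes a polynomial growth rate $p = n^{\gamma}$, which is introduced here only for illustration.

```latex
\[
  n^{-1/2} p^{3} = n^{3\gamma - 1/2} \longrightarrow 0
  \quad\Longleftrightarrow\quad
  3\gamma < \tfrac{1}{2}
  \quad\Longleftrightarrow\quad
  \gamma < \tfrac{1}{6}.
\]
```

So the first correction term can only vanish when $p$ grows more slowly than $n^{1/6}$.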
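Edgeworth Expansion in High Dimension
Theorem 1. Let $X_1, \ldots, X_n$ be i.i.d. samples with zero mean and covariance matrix $V$. Assume that
1. $\lambda_{\max}(V) = O(1)$, $\lambda_{\min}(V) = \Omega(1)$;
2. $p = O(n^{\gamma})$ for some $\gamma < 1/14$;
3. $|X_{ij}| \le B = O(1)$.
Then for any positive integer $\nu$,
\[ \sup_{A \in \mathcal{C}_n} \left| P\left( \sqrt{n}\, V^{-1/2} \bar{X} \in A \right) - \Phi(A) - \sum_{j=1}^{\nu} n^{-j/2} P_j(A) \right| = O\left( \left( \frac{p^9}{n} \right)^{\frac{\nu+1}{2}} \right). \]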
Here $\mathcal{C}_n$ includes all convex sets plus all sets of the form $\{F^{-1}(A) : A \text{ is convex}\}$ for arbitrary smooth non-linear functions $F : \mathbb{R}^p \to \mathbb{R}$.
As a corollary, for a smooth $F$,
\[ P\left( F(\sqrt{n}\, \bar{X}) \in A \right) \approx \Phi_{0,V}\left( F^{-1}(A) \right) + \sum_{j=1}^{\nu} n^{-j/2} P_j\left( V^{-1/2} F^{-1}(A) \right). \]
This gives an Edgeworth expansion for a smooth transform of the mean.
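Bootstrap in High Dimension
Theorem 2. Let $X_1, \ldots, X_n$ be i.i.d. samples with zero mean and covariance matrix $V$, and let $X_1^*, \ldots, X_n^*$ be i.i.d. $\sim \hat{F}_n$. Assume that
1. $\lambda_{\max}(V) = O(1)$, $\lambda_{\min}(V) = \Omega(1)$;
2. $p = \Theta(n^{\gamma})$ for some $0 < \gamma < 1/17$;
3. $|X_{ij}| \le B = O(1)$.
Then with probability $1 - \exp\{-\Omega(n^{\gamma})\}$,
\[ \sup_{A \in \mathcal{C}_n} \left| P\left( \sqrt{n}\, V^{-1/2} (\bar{X}^* - \bar{X}) \in A \,\middle|\, X \right) - P\left( \sqrt{n}\, V^{-1/2} \bar{X} \in A \right) \right| \le \frac{C p^9}{n}. \]
This is strictly better than the bound given by the CLT, since $\frac{p^9}{n} \ll \frac{p^{7/4}}{\sqrt{n}}$ when $p = O(n^{1/17})$.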
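The comparison with the CLT rate can be checked by taking the ratio of the two bounds. The derivation below assumes $p \le c\, n^{\gamma}$ with $\gamma \le 1/17$, where the constant $c$ is introduced only for this check.

```latex
\[
  \frac{p^{9}/n}{p^{7/4}/\sqrt{n}}
  = \frac{p^{29/4}}{n^{1/2}}
  \le c^{29/4}\, n^{\frac{29}{4}\gamma - \frac{1}{2}}
  \longrightarrow 0
  \quad\text{whenever}\quad
  \gamma < \tfrac{2}{29},
\]
\[
  \text{and}\quad \tfrac{1}{17} < \tfrac{2}{29}
  \quad\text{because}\quad 29 < 34 .
\]
```

The ratio therefore tends to zero throughout the range of $\gamma$ allowed by Theorem 2.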
Thanks!
References
Bentkus, V. (1984). Asymptotic expansions in the central limit theorem in Hilbert space. Lithuanian Mathematical Journal, 24(3), 210–225.
Bentkus, V. (2003). On the dependence of the Berry–Esseen bound on dimension. Journal of Statistical Planning and Inference, 113(2), 385–402.
Bhattacharya, R. N., & Rao, R. R. (1986). Normal approximation and asymptotic expansions (Vol. 64). SIAM.
Götze, F. (1981). On Edgeworth expansions in Banach spaces. The Annals of Probability, 852–859.
Götze, F. (1991). On the rate of convergence in the multivariate CLT. The Annals of Probability, 724–739.
Hall, P. (1992). The bootstrap and Edgeworth expansion.
Portnoy, S. (1986). On the central limit theorem in $\mathbb{R}^p$ when $p \to \infty$. Probability Theory and Related Fields, 73(4), 571–583.
Sazonov, V. (1968). On the multi-dimensional central limit theorem. Sankhyā: The Indian Journal of Statistics, Series A, 181–204.