Optimally Thresholded Realized Power Variations for Lévy Jump Diffusion Models
José E. Figueroa-López (Department of Statistics, Purdue University)
High Dimensional Probability VII, Institut d'Études Scientifiques de Cargèse, May 29th, 2014
(Joint work with Jeff Nisen)
Outline
1. The Statistical Problems and the Main Estimators
2. Optimally Thresholded Estimators for Finite-Jump Activity Models
3. Main Results
4. Extensions: Additive Processes; Stochastic Volatility Processes
5. Conclusions
Set-up

1. Itô semimartingales: a continuous-time stochastic process {X_t} with dynamics
      dX_t = γ_t dt + σ_t dW_t + dJ_t,
   where {W_t} is a standard Brownian motion, J_t := Σ_{j=1}^{N_t} ζ_j is a piecewise-constant process of finite jump activity, and {γ_t} and {σ_t} are adapted processes.
2. Finite-Jump Activity (FJA) Lévy model:
      X_t = γt + σW_t + Σ_{j=1}^{N_t} ζ_j,
   where {N_t}_{t≥0} is a homogeneous Poisson process with jump intensity λ, {ζ_j}_{j≥1} are i.i.d. with density f_ζ : ℝ → ℝ_+, and the components of the triplet ({W_t}, {N_t}, {ζ_j}) are mutually independent.
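The FJA Lévy model above is straightforward to simulate on a regular grid. A minimal pure-Python sketch (the function names and the inversion-based Poisson sampler are illustrative choices, not from the talk):

```python
import math
import random

def sample_poisson(mu, rng):
    """Sample Poisson(mu) by inversion; adequate for the small means
    mu = lam*h that arise over a single sampling interval."""
    u, k, p = rng.random(), 0, math.exp(-mu)
    c = p
    while u > c:
        k += 1
        p *= mu / k
        c += p
    return k

def simulate_fja_increments(n, T, gamma, sigma, lam, jump_sampler, rng):
    """Increments Delta_i X = X_{t_i} - X_{t_{i-1}} of
    X_t = gamma*t + sigma*W_t + sum_{j<=N_t} zeta_j on the grid t_i = i*T/n."""
    h = T / n
    incs = []
    for _ in range(n):
        dW = rng.gauss(0.0, math.sqrt(h))               # Brownian increment
        dN = sample_poisson(lam * h, rng)               # jumps in the interval
        dJ = sum(jump_sampler(rng) for _ in range(dN))  # summed jump sizes
        incs.append(gamma * h + sigma * dW + dJ)
    return incs
```

For instance, Merton-model increments are obtained by passing a Gaussian `jump_sampler`, e.g. `lambda r: r.gauss(0.1, 0.1)`.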
Statistical Problems

Given a discrete record of observations X_{t_0}, X_{t_1}, ..., X_{t_n}, with 0 = t_0 < t_1 < ⋯ < t_n = T, the following problems are of interest in a high-frequency sampling setting (i.e., mesh(π) := max_i (t_i − t_{i−1}) → 0):
1. Estimating the integrated variance (or continuous quadratic variation): σ²_T := ∫_0^T σ_t² dt.
2. Estimating the jump features of the process: the jump times τ_1 < τ_2 < ⋯ < τ_{N_T} and the corresponding jump sizes ζ_1, ..., ζ_{N_T}.
Two main classes of estimators

Precursor — realized quadratic variation:
      QV(X)_π := Σ_{i=0}^{n−1} (X_{t_{i+1}} − X_{t_i})²,   (π : 0 = t_0 < ⋯ < t_n = T).
Under very general conditions, QV(X)_π → σ²_T + Σ_{j=1}^{N_T} ζ_j² as mesh(π) → 0.
1. Multipower realized variations (Barndorff-Nielsen and Shephard (2004)):
      BPV(X)_T := Σ_{i=0}^{n−2} |X_{t_{i+1}} − X_{t_i}| |X_{t_{i+2}} − X_{t_{i+1}}|,
      MPV(X)^{(r_1,...,r_k)}_T := Σ_{i=0}^{n−k} |X_{t_{i+1}} − X_{t_i}|^{r_1} ⋯ |X_{t_{i+k}} − X_{t_{i+k−1}}|^{r_k}.
2. Threshold realized variations (Mancini (2003), Jacod (2007)):
      TRV(X)[B]^π_T := Σ_{i=0}^{n−1} (X_{t_{i+1}} − X_{t_i})² 1_{{|X_{t_{i+1}} − X_{t_i}| ≤ B}},   (B ∈ (0, ∞)).
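Each of the sample statistics above reduces to a few lines of code; a sketch with the threshold B kept explicit (function names are illustrative):

```python
def realized_qv(incs):
    # QV(X)_pi = sum of squared increments; converges to sigma^2*T + sum zeta_j^2
    return sum(dx * dx for dx in incs)

def bipower_variation(incs):
    # BPV(X)_T = sum |Delta_i X| |Delta_{i+1} X|; to estimate integrated variance
    # it is customarily rescaled by (pi/2), since E|N(0,1)| = sqrt(2/pi)
    return sum(abs(a) * abs(b) for a, b in zip(incs, incs[1:]))

def trv(incs, B):
    # TRV(X)[B]_T keeps only increments of magnitude at most B
    return sum(dx * dx for dx in incs if abs(dx) <= B)
```
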
Advantages and Drawbacks

1. Multipower realized variations (MPV)
   Pros: easy to implement.
   Cons: can exhibit high levels of bias and variability in the presence of jumps; e.g., for a FJA Lévy model and regular sampling (t_i = i h_n with h_n = T/n),
      E[MPV(X)^{(r_1,...,r_k)}_T] − C_r σ² T ≍ T C̃_r h_n^{1 − (1/2) max_i r_i},
   with C_r = Π_{i=1}^k E|N(0,1)|^{r_i}, whenever r_1 + ⋯ + r_k = 2.
2. Threshold realized variations (TRV)
   Pros: can be adapted for estimating other jump features:
      N̂[B]^π_T := Σ_{i=0}^{n−1} 1_{{|X_{t_{i+1}} − X_{t_i}| > B}},   Ĵ[B]^π_T := Σ_{i=0}^{n−1} (X_{t_{i+1}} − X_{t_i})² 1_{{|X_{t_{i+1}} − X_{t_i}| > B}}.
   Cons: performance strongly depends on a "good" choice of the threshold B; e.g., for a FJA Lévy model and regular sampling (t_i = i h_n with h_n = T/n),
      E[TRV(X)[B_n]^π_T] − σ² T ≈ T h_n (γ² − λσ²) − 2Tσ (B_n/√h_n) φ(B_n/√h_n) + (2/3) T λ B_n³ C(f_ζ).
Remark: the bias can attain the rate h_n for suitably chosen B_n's.
Selection of the threshold parameter

The literature consists mostly of somewhat "ad hoc" selection methods for B, which satisfy sufficient conditions for the consistency of the associated estimators.
- Power threshold (Mancini (2003)): Pow(α, ω) := α mesh(π)^ω, for α > 0 and ω ∈ (0, 1/2).
- Bonferroni threshold (Gegler & Stadtmüller (2010) and Bollerslev et al. (2007)): BF_n(σ̄, C) := σ̄ mesh(π)^{1/2} Φ^{−1}(1 − C mesh(π)/2), for C > 0 and σ̄ > 0.
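Both ad hoc rules are one-liners; a sketch (the parameter defaults are illustrative, not the talk's recommendations):

```python
import math
from statistics import NormalDist

def power_threshold(mesh, alpha=1.0, omega=0.495):
    # Pow(alpha, omega) = alpha * mesh(pi)^omega, with omega in (0, 1/2)
    return alpha * mesh ** omega

def bonferroni_threshold(mesh, sigma_bar, C=0.05):
    # BF_n(sigma_bar, C) = sigma_bar * mesh^{1/2} * Phi^{-1}(1 - C*mesh/2)
    return sigma_bar * math.sqrt(mesh) * NormalDist().inv_cdf(1.0 - C * mesh / 2.0)
```
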
Numerical illustration I

[Figure: Merton model — boxplots of diffusion volatility parameter (DVP) estimates against the actual DVP, comparing multipower-variation-style estimators (MPV, MinRV, MedRV), thresholded multipower-style estimators (TBPV with power, optimal, BF, and BH thresholds), and multiple-testing-style estimators.]

Based on 2,500 sample paths (T = 1 year, 5-minute sampling frequency). Parameters: σ = 0.3, λ = 20, ζ ~ N(0.1, 0.1²).
Numerical illustration II

[Figure: Merton model — boxplots of diffusion volatility parameter estimates, comparing calibrated TRV estimators (GE, Hist, BF), uncalibrated TRV estimators (power thresholds), and power variation estimators (MPV, MinRV, MedRV).]

Based on 1,000 sample paths. Parameters: σ = 0.35, λ = 197.55, ζ ~ N(0, 0.037²), T = 6 months, sampling frequency = 5 minutes.
Optimal Threshold Realized Estimators

1. Question: can the threshold parameter be chosen in a meaningfully objective manner?
2. Aims:
   - Develop a well-posed optimal selection criterion for the threshold B that minimizes a suitable statistical loss function of estimation.
   - Develop a feasible implementation method for B.
3. Assumptions:
   - Finite-jump activity Lévy model: X_t = γt + σW_t + Σ_{j=1}^{N_t} ζ_j, with {N_t}_{t≥0} ~ Poisson(λ) and {ζ_j} i.i.d. with density f_ζ;
   - Regular sampling scheme with mesh h_n := T/n; i.e., π : t_i = iT/n;
   - The jump density f_ζ takes the mixture form f_ζ(x) = p f_+(x) 1_{{x≥0}} + q f_−(−x) 1_{{x<0}}, with p + q = 1, p, q ≥ 0, and f_± : [0, ∞) → ℝ_+ in C¹_b(0, ∞).
Loss Functions

1. Natural loss function:
      Loss^{(1)}_n(B) := E[|TRV(X)[B]_T − σ²T|²] + E[|N̂[B]_T − N_T|²].
2. Alternative loss function:
      Loss^{(2)}_n(B) := E Σ_{i=1}^n ( 1_{[|Δⁿ_iX| > B, Δⁿ_iN = 0]} + 1_{[|Δⁿ_iX| ≤ B, Δⁿ_iN ≠ 0]} ),
   where, as usual, Δⁿ_iX := X_{t_i} − X_{t_{i−1}} and Δⁿ_iN := N_{t_i} − N_{t_{i−1}}.
3. Interpretation:
   - Loss^{(1)}_n(B) favors sequences that minimize the estimation errors of both the continuous and the jump components.
   - Loss^{(2)}_n(B) favors sequences that minimize the total number of misclassifications: flagging a jump within [t_{i−1}, t_i] when there is none, and failing to identify a jump within [t_{i−1}, t_i] when there is one.
   - Loss^{(2)}_n(B) is much more tractable than Loss^{(1)}_n(B).
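The misclassification loss Loss^{(2)}_n(B) has no simple closed form for finite n, but it is easy to approximate by Monte Carlo; a sketch in which the Poisson jump count over one interval is approximated by a Bernoulli(λh) draw (an assumption adequate only when λh is small; all names are illustrative):

```python
import math
import random

def mc_loss2(B, n, T, gamma, sigma, lam, jump_sampler, n_paths, seed=0):
    """Monte Carlo estimate of Loss2_n(B): the expected number of intervals
    misclassified by the threshold B, i.e. flagged (|dX| > B) with no jump,
    or not flagged with a jump present."""
    rng = random.Random(seed)
    h = T / n
    total = 0
    for _ in range(n_paths):
        for _ in range(n):
            jumped = rng.random() < lam * h     # Bernoulli approx. of the jump count
            dx = gamma * h + sigma * math.sqrt(h) * rng.gauss(0.0, 1.0)
            if jumped:
                dx += jump_sampler(rng)
            if (abs(dx) > B) != jumped:         # misclassified interval
                total += 1
    return total / n_paths
```

Scanning this estimate over a grid of B values gives a brute-force check of the quasi-convexity discussed in the next section.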
Asymptotic Comparison of Loss Functions

Theorem (FL & Nisen (2013)). Given a threshold sequence (B_n)_n satisfying B_n → 0 and B_n√n → ∞, there exists a positive sequence (ε_n)_{n≥1} with lim_n ε_n = 0 such that
   Loss^{(2)}_n(B) + R_n(B) ≤ Loss^{(1)}_n(B) ≤ (1 + ε_n) Loss^{(2)}_n(B) + R_n(B) + R̄_n(B),
where, as n → ∞,
   R_n(B) ~ 2T²λ²σ²h_n + T²( σ (B_n/√h_n) φ(B_n/√h_n) − 2λB_n C(f_ζ) )²,
   R̄_n(B) ~ 6[ Tσ⁴h_n + 2B_n⁶ T² λ² C(f_ζ)² ],   C(f_ζ) := p f_+(0) + q f_−(0).
In particular, lim_n inf_{B>0} Loss^{(1)}_n(B) / inf_{B>0} Loss^{(2)}_n(B) = 1.
Note: Loss^{(2)}_n is more analytically tractable; accordingly, set B★_n := argmin_{B>0} Loss^{(2)}_n(B).
Optimal Threshold Parameter Results

Theorem (FL & Nisen (2013)).
(1) There exists an N ∈ ℕ such that, for all n ≥ N, the loss function Loss^{(2)}_n(B) is quasi-convex and hence possesses a unique global minimum B★_n.
(2) Furthermore, as n → ∞, the optimal threshold sequence (B★_n)_n satisfies
   B★_n = √(3σ²h_n log(1/h_n)) − [ln(√(2π) σλC(f_ζ)) σ h_n^{1/2}] / √(3 log(1/h_n)) + o(√(h_n / log(1/h_n))),
where C(f_ζ) = p f_+(0) + q f_−(0).
Corollary. For the optimally thresholded TRV estimator, as n → ∞,
   Bias(TRV(X)[B★_n]) ~ h_n (γ² − λσ²),   MSE(TRV(X)[B★_n]) ~ 2σ⁴ h_n.
Remarks

1. The leading term of the optimal sequence is proportional to the Lévy modulus of continuity of Brownian motion:
      lim sup_{h→0} (1/√(2h ln(1/h))) sup_{|t−s|<h, s,t∈[0,1]} |W_t − W_s| = 1, a.s.
2. The leading-order sequence B^{★,1}_n := (3σ² h_n ln(1/h_n))^{1/2} also attains the same rates as B★_n and provides a "blueprint" for devising a good threshold sequence.
3. The second-order term accounts for the volatility σ, the intensity of jumps λ, and the mass concentration of f_ζ near the origin, as measured by C(f_ζ).
A Feasible Iterative Algorithm to Find B★_n

1. Key issue: the optimal threshold B★ would allow us to find an optimal estimate σ̂² of σ² of the form
      σ̂² := (1/T) TRV(X)[B★(σ²)]_n,
   but B★ depends on precisely σ².
2. The previous issue suggests a "fixed-point" type of implementation:
   - Set σ̂²_{n,0} := (1/T) Σ_{i=1}^n |X_{t_i} − X_{t_{i−1}}|² and B̂★_{n,0} := (3 σ̂²_{n,0} h_n ln(1/h_n))^{1/2}.
   - while σ̂²_{n,k−1} > σ̂²_{n,k} do
       σ̂²_{n,k+1} ← (1/T) TRV(X)[B̂★_{n,k}]  and  B̂★_{n,k+1} ← (3 σ̂²_{n,k+1} h_n ln(1/h_n))^{1/2}
     end while
   - Let k★_n := inf{k ≥ 1 : σ̂²_{n,k+1} = σ̂²_{n,k}} and take σ̂²_{n,k★_n} as the final estimate of σ² and the corresponding B̂★_{n,k★_n} as an estimate of B★_n.
3. The previous algorithm generates a non-increasing sequence of estimators {σ̂²_{n,k}}_k and finishes in finite time.
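The fixed-point scheme above takes only a few lines of Python; a sketch (function names are illustrative) in which the stopping rule compares successive σ̂² values exactly, as in the algorithm, and the blueprint threshold B = (3σ̂²h_n ln(1/h_n))^{1/2} is used throughout:

```python
import math

def optimal_threshold_fixed_point(incs, T, max_iter=100):
    """Fixed-point iteration: sigma2 <- TRV[B]/T, B <- sqrt(3*sigma2*h*ln(1/h)).
    Returns (sigma2_hat, B_hat). Assumes a regular grid with mesh h = T/n < 1."""
    n = len(incs)
    h = T / n
    sigma2 = sum(dx * dx for dx in incs) / T      # initial, jump-contaminated estimate
    B = math.sqrt(3.0 * sigma2 * h * math.log(1.0 / h))
    for _ in range(max_iter):
        new_sigma2 = sum(dx * dx for dx in incs if abs(dx) <= B) / T
        if new_sigma2 == sigma2:                  # fixed point reached
            break
        sigma2 = new_sigma2
        B = math.sqrt(3.0 * sigma2 * h * math.log(1.0 / h))
    return sigma2, B
```

Because each pass can only remove increments, the σ̂² iterates are non-increasing, which is what guarantees termination in finitely many steps.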
A numerical illustration

Merton model (4 years / 1-day sampling): σ = 0.3, λ = 5, µ = 0, δ = 0.6.

Method                     Mean TRV   S_TRV    Mean Loss   S_Loss
B̂★_{n,k★_n}               0.2985     0.0070   2.0588      1.4267
Pow (ω = 0.495, α = 1)     0.2967     0.0066   2.2992      1.4972
BF                         0.2983     0.0071   2.1756      1.4749

Table: finite-sample performance of the threshold realized variation (TRV) estimators based on K = 5,000 sample paths for the Merton model (ζ_i i.i.d. N(µ, δ²)). "Loss" is the total number of jump misclassification errors; the columns report the sample means and standard deviations of TRV and Loss.
A numerical illustration (S2)

Kou model (1 week / 5-minute sampling): σ = 0.5, λ = 50, p = 0.45, α₊ = 0.05, α₋ = 0.1.

Method                     Mean TRV   S_TRV    Mean Loss   S_Loss
B̂★_{n,k★_n}               0.5004     0.0186   0.2232      0.4706
Pow (ω = 0.495, α = 1)     0.4407     0.0142   13.5302     3.6392
BF                         0.4917     0.0193   1.180       1.0775

Table: finite-sample performance of the threshold realized variation (TRV) estimators based on K = 5,000 sample paths for the Kou model, f_Kou(x) = (p/α₊) e^{−x/α₊} 1_{[x≥0]} + ((1−p)/α₋) e^{−|x|/α₋} 1_{[x<0]}. "Loss" is the total number of jump misclassification errors; the columns report the sample means and standard deviations of TRV and Loss.
A numerical illustration (S3)

Kou model (1 year / 5-minute sampling): σ = 0.4, λ = 1000, p = 0.5, α₊ = α₋ = 0.1.

Method                     Mean TRV   S_TRV    Mean Loss   S_Loss
B̂★_{n,k★_n}               0.4039     0.0028   139.6776    12.2193
Pow (ω = 0.495, α = 1)     0.3767     0.0019   230.0170    15.0308
BF                         0.6495     0.0315   375.5850    24.3999

Table: finite-sample performance of the threshold realized variation (TRV) estimators based on K = 5,000 sample paths for the Kou model, f_ζ(x) = (p/α₊) e^{−x/α₊} 1_{[x≥0]} + (q/α₋) e^{−|x|/α₋} 1_{[x<0]}. "Loss" is the total number of jump misclassification errors; the columns report the sample means and standard deviations of TRV and Loss.
Additive Processes

1. The model:
      X_s := ∫_0^s γ(u) du + ∫_0^s σ(u) dW_u + Σ_{j=1}^{N_s} ζ_j =: X^c_s + J_s,
   where (N_s)_{s≥0} ~ Poiss({λ(s)}_{s≥0}) is independent of W, and σ, λ : [0, ∞) → ℝ_+ and γ : [0, ∞) → ℝ are deterministic smooth functions with σ and λ bounded away from 0.
2. Optimal threshold problem: given a sampling scheme π : t_0 < ⋯ < t_n = T, determine the vector B^{π,★} = (B^{π,★}_{t_1}, ..., B^{π,★}_{t_n}) that attains
      inf_{B=(B_{t_1},...,B_{t_n}) ∈ ℝ₊ⁿ} E Σ_{i=1}^n ( 1_{[|X_{t_i} − X_{t_{i−1}}| > B_{t_i}, N_{t_i} − N_{t_{i−1}} = 0]} + 1_{[|X_{t_i} − X_{t_{i−1}}| ≤ B_{t_i}, N_{t_i} − N_{t_{i−1}} ≠ 0]} )
      = Σ_{i=1}^n inf_{B_{t_i}} { P(|Δ_iX| > B_{t_i}, Δ_iN = 0) + P(|Δ_iX| ≤ B_{t_i}, Δ_iN ≠ 0) },
   where Δ_iX := X_{t_i} − X_{t_{i−1}} and Δ_iN := N_{t_i} − N_{t_{i−1}}.
Well-posedness and Asymptotic Characterization

Theorem (FL & Nisen (2013)). For any fixed T > 0, there exists h_0 := h_0(T) > 0 such that, for all t ∈ [0, T] and h ∈ (0, h_0], the functions
   L_{t,h}(B) := P(|X_{t+h} − X_t| > B, N_{t+h} − N_t = 0) + P(|X_{t+h} − X_t| ≤ B, N_{t+h} − N_t ≠ 0)
are quasi-convex and possess a unique global minimum B★_{t,h} such that, as h → 0⁺,
   B★_{t,h} = √(3σ²(t) h ln(1/h)) − [log(√(2π) σ(t) λ(t) C(f_ζ)) σ(t) h^{1/2}] / √(3 log(1/h)) + o(√(h/ln(1/h))).
Spot Volatility Estimation via Kernel Methods

Notation: h_i = t_i − t_{i−1} (mesh), K_θ(t) = (1/θ) K(t/θ) (kernel), θ = bandwidth.

Algorithm [based on kernel-type estimators by Kristensen (2010)]:
- For each i ∈ {1, 2, ..., n}, set
     σ̂²_0(t_i) := Σ_{j=−l}^{l} (1/h_{i+j}) (Δ_{i+j}X)² K_θ(t_i − t_{i+j})  and  B̂^{★,0}_{t_i} := [3 σ̂²_0(t_i) h_i ln(1/h_i)]^{1/2}.
- while there exists i ∈ {1, 2, ..., n} such that σ̂²_{k−1}(t_i) > σ̂²_k(t_i) do
     σ̂²_{k+1}(t_i) ← Σ_{j=−l}^{l} (1/h_{i+j}) (Δ_{i+j}X)² 1_{[|Δ_{i+j}X| ≤ B̂^{★,k}_{t_{i+j}}]} K_θ(t_i − t_{i+j})  and  B̂^{★,k+1}_{t_i} ← [3 σ̂²_{k+1}(t_i) h_i ln(1/h_i)]^{1/2}
  end while
- Let k★ := inf{k ≥ 1 : σ̂²_{k+1}(t_i) = σ̂²_k(t_i) for all i = 1, 2, ..., n} and take σ̂²_{k★}(t_i) as the final estimate of σ²(t_i) and the corresponding B̂^{★,k★}_{t_i} as an estimate of B★_{t_i}.
The algorithm generates, for each i, a non-increasing sequence of estimators {σ̂²_k(t_i)}_k and finishes in finite time.
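A minimal sketch of one thresholded kernel spot-variance estimate and its fixed-point refinement, using a uniform kernel and an explicit weight normalization (the talk's exact weighting and boundary handling may differ; all names are illustrative):

```python
import math

def spot_variance(incs, T, t, theta, B=float("inf")):
    """Thresholded kernel estimate of sigma^2(t) from regular-grid increments,
    with a uniform kernel K(u) = 1/2 on [-1, 1]; the weights are normalized
    explicitly so they integrate to one regardless of boundary effects."""
    n = len(incs)
    h = T / n
    num = den = 0.0
    for j, dx in enumerate(incs):
        tj = (j + 1) * h                   # right endpoint of the j-th interval
        u = (tj - t) / theta
        if abs(u) <= 1.0:                  # inside the uniform kernel's support
            w = 0.5 / theta                # K_theta(tj - t)
            den += w * h
            if abs(dx) <= B:               # threshold out jump increments
                num += w * dx * dx
    return num / den if den > 0.0 else float("nan")

def iterate_spot_variance(incs, T, t, theta, max_iter=20):
    """Fixed-point refinement: sigma2 -> B = sqrt(3*sigma2*h*ln(1/h)) -> sigma2."""
    h = T / len(incs)
    s2 = spot_variance(incs, T, t, theta)  # initial, unthresholded estimate
    for _ in range(max_iter):
        B = math.sqrt(3.0 * s2 * h * math.log(1.0 / h))
        new_s2 = spot_variance(incs, T, t, theta, B)
        if new_s2 == s2:
            break
        s2 = new_s2
    return s2
```

In practice one would run `iterate_spot_variance` over all grid points jointly, as in the algorithm above, rather than pointwise.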
Illustration of the Opt. Thresh. Spot Vol. Estimation Algorithm

[Figure: estimation of spot volatility using adaptive kernel-weighted realized volatility. Panel (A): initial estimates. Panel (B): intermediate estimates. Each panel plots the actual spot volatility, the uniform- and quadratic-kernel estimates, and the sample increments over the time horizon [0, 1].]

Parameters: γ(t) = 0.1t, σ(t) = 4.5t sin(2πe^{t²})² + 0.2, λ(t) = 25(e^{3t} − 1), ζ_i i.i.d. N(µ = 0.025, δ = 0.025).
Illustration of the Opt. Thresh. Spot Vol. Estimation Algorithm (cont.)

[Figure: estimation of spot volatility using adaptive kernel-weighted realized volatility. Panel (C): terminal estimates, plotting the actual spot volatility, the uniform- and quadratic-kernel estimates, and the sample increments. Panel (D): estimation variability, based on 100 generated sample paths, for the quadratic-kernel-based estimator.]

Parameters: γ(t) = 0.1t, σ(t) = 4.5t sin(2πe^{t²})² + 0.2, λ(t) = 25(e^{3t} − 1), ζ_i i.i.d. N(µ = 0.025, δ = 0.025).
Stochastic Volatility Models with FJA

1. Motivation: in practice (e.g., finance), one usually encounters models of the form
      X_s := ∫_0^s γ_u du + ∫_0^s σ_u dW_u + Σ_{j=1}^{N_s} ζ_j =: X^c_s + J_s,
   where {σ_t}_{t≥0} is itself stochastic, independent of (W, N, ζ).
2. Prototypical example: mean-reverting square-root process (CIR model):
      dσ²_t = κ(α − σ²_t) dt + γ σ_t dW^{(σ)}_t,   (2κα − γ² > 0, W^{(σ)} ⟂ W).
3. Pitfalls of the estimation methods for σ:
   - Spot volatility estimation has received "limited" attention in the literature: e.g., Kristensen (2010) and Renò & Mancini (2013) consider kernel estimators; Fan & Wang (2008), Alvarez et al. (2010), and others consider estimators of [X, X]_t = ∫_0^t σ²_s ds together with a finite-difference approximation of its derivative.
   - Available kernel estimation methods for {σ²_t}_{t≥0} (say, Kristensen's method) are quite sensitive to the bandwidth, and there are few bandwidth selection methods (cf. Kristensen (2010)).
A Proposed Model Selection Method

1. Technical assumption (Kristensen, 2010):
      σ²_{t+δ} − σ²_t = L_t(δ) δ^υ + o_P(δ^υ), a.s. (δ → 0),   (★)
   where υ ∈ (0, 1], δ ↦ L_t(δ) is a slowly varying (random) function at 0, and t ↦ L_t(0) := lim_{δ→0⁺} L_t(δ) is continuous.
2. Under (★), Kristensen (2010) argues that
      bw^{loc}_{opt,t} = ( 2σ⁴_t ‖K‖²₂ / (2υ L²_t(0)) )^{1/(2υ+1)} n^{−1/(2υ+1)},   (★★)
      bw^{glb}_{opt} = ( 2∫_0^T σ⁴_t dt ‖K‖²₂ / (2υ ∫_0^T L²_t(0) dt) )^{1/(2υ+1)} n^{−1/(2υ+1)}.
3. Pitfall: in general, it is hard to check (★) with explicit (nonzero) L_t(0) and υ.
4. A (heuristic) alternative method: suppose that
      E[ (σ²_{t+δ} − σ²_t)² | F_t ] = L²_t(0) δ^{2υ} + o_P(δ^{2υ}), a.s. (δ → 0),
   for some positive adapted process {L_t(0)}_{t≥0}; then use (★★).
Computation for the CIR model

1. Recall that σ²_{t+δ} − σ²_t = ∫_t^{t+δ} κ(α − σ²_u) du + γ ∫_t^{t+δ} σ_u dW^{(σ)}_u.
2. The leading term of (σ²_{t+δ} − σ²_t)² is γ² ( ∫_t^{t+δ} σ_u dW^{(σ)}_u )², which satisfies
      E[ γ² ( ∫_t^{t+δ} σ_u dW^{(σ)}_u )² | F_t ] = γ² E[ ∫_t^{t+δ} σ²_u du | F_t ] = γ² σ²_t δ + o(δ).
3. The previous heuristics suggest that E[ (σ²_{t+δ} − σ²_t)² | F_t ] = γ² σ²_t δ + o(δ).
4. Hence, the heuristic moment condition above holds with υ = 1/2 and L²_t(0) = γ² σ²_t.
5. This implies the following local optimal bandwidth:
      bw^{loc}_{opt,t} = ( 2σ²_t ‖K‖²₂ / γ² )^{1/2} n^{−1/2}.
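With υ = 1/2 and L²_t(0) = γ²σ²_t, the local bandwidth rule is a one-liner; a sketch (the default ‖K‖²₂ = 1/2 corresponds to the uniform kernel K = ½·1_{[−1,1]}, an illustrative choice):

```python
import math

def cir_local_bandwidth(sigma_t, gamma, n, K_norm2_sq=0.5):
    """Local optimal bandwidth bw = (2*sigma_t^2*||K||_2^2/gamma^2)^{1/2} * n^{-1/2}
    for kernel spot-volatility estimation under CIR variance dynamics
    (upsilon = 1/2, L_t(0)^2 = gamma^2 * sigma_t^2). K_norm2_sq is ||K||_2^2;
    the default 1/2 is the value for the uniform kernel on [-1, 1]."""
    return math.sqrt(2.0 * sigma_t ** 2 * K_norm2_sq / gamma ** 2) / math.sqrt(n)
```

Note that a larger vol-of-vol γ (a rougher variance path) shrinks the bandwidth, trading variance for bias in the kernel average.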
Outline of the Estimation Method

1. Get a "rough" estimate of t ↦ σ²_t; e.g., using Alvarez et al. (2010),
      σ̂²_0(t_i) = ([X, X]_{t_i+δ_n} − [X, X]_{t_i}) / δ_n = (1/δ_n) Σ_{j: t_j ∈ (t_i, t_i+δ_n]} (Δⁿ_jX)².
2. Refine σ̂²_0(t_i) using thresholding:
      σ̂²_1(t_i) = (1/δ_n) Σ_{j: t_j ∈ (t_i, t_i+δ_n]} (Δⁿ_jX)² 1_{[|Δⁿ_jX| ≤ B̂^{(0)}_{t_i}]},   with B̂^{(0)}_{t_i} := [3 σ̂²_0(t_i) h_i ln(1/h_i)]^{1/2}.
3. Estimate γ using the realized variation of {σ̂_1(t_i)}_{i=1,...,n}, since d⟨σ, σ⟩_t/dt = γ²/4.
4. Apply iterated kernel estimation with thresholding and (estimated) optimal local bandwidth selection.
Numerical Results: Uniform Kernel

Model: normal jump sizes and CIR stochastic volatility.
Parameters: γ_drift = 0.05, λ = 60, ζ_i ~ N(0, 5(0.3)), κ = 5, α = 0.04, γ_volatility = 0.5.
Sampling scheme: T = 1/12 (one month), h_n = 5 minutes.
Monte Carlo results for MSE = Σ_{i=1}^n (σ̂²(t_i) − σ²(t_i))², based on 500 runs:

Method                                                        MSE
Alvarez et al. method                                         0.6531
Alvarez et al. method with thresholding                       0.0919
Kernel est. with local bw selection (no thresholding)         0.2909
Kernel est. with thresholding and local bw selection          0.0526
Oracle kernel est. with thresholding and local bw selection   0.0478
Conclusions

1. Introduced an objective threshold selection procedure based on statistical optimality reasoning, via a well-posed optimization problem.
2. Characterized precisely the infill asymptotic behavior of the optimal threshold sequence.
3. Proposed an iterative algorithm to find the optimal threshold sequence.
4. Extended the approach to more general stochastic models, allowing for time-varying volatility and jump intensity.
For Further Reading

Figueroa-López, J.E. & Nisen, J. Optimally thresholded realized power variations for Lévy jump diffusion models. Stochastic Processes and their Applications 123(7), 2648-2677, 2013. Available at www.stat.purdue.edu/ figueroa.

Figueroa-López, J.E. & Nisen, J. Optimality properties of thresholded multipower variation estimators. In preparation, 2014.