
Ultra High Frequency Volatility Estimation with Market Microstructure Noise

Yacine Aït-Sahalia, Princeton University
Per A. Mykland, The University of Chicago
Lan Zhang, Carnegie-Mellon University

1. Introduction

Observed transaction price = unobservable efficient price + a noise component due to the imperfections of the trading process:
$$Y_{\tau} = X_{\tau} + \varepsilon_{\tau}$$
ε summarizes a diverse array of market microstructure effects, informational or not: bid-ask bounces, discreteness of price changes, differences in trade sizes or in the informational content of price changes, gradual response of prices to a block trade, the strategic component of the order flow, inventory control effects, etc.

We study the implications of such a data generating process for the estimation of the volatility of the efficient log-price process
$$dX_t = \mu_t\,dt + \sigma_t\,dW_t$$
using discretely sampled data on the transaction price process at times $0, \Delta, 2\Delta, \ldots, n\Delta = T$.

Without noise, the realized volatility $\sum_{i=1}^{n}(X_{t_{i+1}} - X_{t_i})^2$ estimates the quadratic variation $\int_0^T \sigma_t^2\,dt$. In theory, sampling as often as possible will produce in the limit a perfect estimate of that quantity.
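To make the two regimes concrete, here is a minimal simulation sketch (not from the paper; all parameter values are illustrative assumptions) contrasting realized volatility computed on the efficient price X with realized volatility computed on the noisy observations Y:

```python
import numpy as np

# One 6.5-hour trading day sampled every second, constant volatility,
# iid Gaussian noise: sigma and noise_sd are assumed values.
rng = np.random.default_rng(0)
T = 1 / 252                   # one trading day, in years
n = 23_400                    # one observation per second
sigma = 0.30                  # assumed annualized volatility
noise_sd = 0.0005             # assumed standard deviation of epsilon

X = np.cumsum(sigma * np.sqrt(T / n) * rng.standard_normal(n + 1))  # efficient log-price
Y = X + noise_sd * rng.standard_normal(n + 1)                       # observed log-price

print(f"quadratic variation sigma^2*T: {sigma**2 * T:.6f}")
print(f"RV on X (no noise)           : {np.sum(np.diff(X)**2):.6f}")  # close to sigma^2*T
print(f"RV on Y (with noise)         : {np.sum(np.diff(Y)**2):.6f}")  # ~ 2*n*E[eps^2] instead
```

On these assumed values, the noise term $2nE[\varepsilon^2] = 0.0117$ swamps the daily quadratic variation of about 0.00036, which is exactly the breakdown described next.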

We show that the situation changes radically in the presence of market microstructure noise:

Computing RV using all the data (say every second) leads to an estimate of the variance of the noise, not of the quadratic variation that one seeks to estimate.

In practice, people sample sparsely at some lower frequency (usually 5 to 30 minutes).

If one insists upon sampling sparsely, what is the right frequency? We show how to determine the optimal sparse frequency.

But even if sampling optimally, one is throwing away a large amount of data.

For example, if T = 1 NYSE day and transactions occur every δ = 1 second, the original sample size is n = T/δ = 23,400.

But sampling sparsely even at the highest frequency used by empirical researchers (once every 5 minutes) means throwing away 299 out of every 300 observations: the sample size used is only $n_{sparse} = 78$.

This violates one of the most basic principles of statistics.
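The sample-size arithmetic, as a trivial check (the 6.5-hour NYSE session is the only assumption):

```python
seconds_per_day = 6.5 * 3600          # 6.5-hour NYSE trading session
n = int(seconds_per_day)              # one transaction per second
n_sparse = n // (5 * 60)              # one observation every 5 minutes
print(n, n_sparse, n // n_sparse)     # 23400 78 300: 299 of every 300 points dropped
```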

We propose a solution which makes use of the full data sample:

Our estimator (TSRV) is based on subsampling, averaging, and bias-correction.

This is the only consistent estimator (that we know of) for quadratic variation in the presence of market microstructure noise.

2. Outline

Without market microstructure noise
What happens when noise is present
The fifth best approach: use all the data to compute RV
The fourth best approach: sample sparsely (5, 10 or 15 minutes) to compute RV
The third best approach: sample sparsely at an optimally determined frequency to compute RV
The second best approach: subsampling and averaging
The first best approach, TSRV: use the fifth best to bias-correct the second best
Monte Carlo simulations
Two extensions: time series dependence in the noise, and MSRV
Data analysis: INTC, MSFT

3. Without Market Microstructure Noise

Suppose first that σ is constant. Without noise, the log-returns are iid $N(0, \sigma^2\Delta)$. The MLE for σ² coincides with the realized volatility of the process,
$$\hat{\sigma}^2 = \frac{1}{T}\sum_{i=1}^{n}(X_{t_{i+1}} - X_{t_i})^2, \qquad T^{1/2}\big(\hat{\sigma}^2 - \sigma^2\big) \xrightarrow{L} N\big(0,\ 2\sigma^4\Delta\big).$$
Thus selecting Δ as small as possible is optimal for the purpose of estimating σ².
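As a sanity check on that limit, here is the one-step variance calculation under the iid Gaussian returns assumption above:

```latex
% Each increment X_{t_{i+1}} - X_{t_i} is N(0, \sigma^2\Delta), so its square has
% variance 2\sigma^4\Delta^2; summing the n = T/\Delta independent squares and
% scaling by 1/T:
\[
  \operatorname{Var}\big(\hat{\sigma}^2\big)
    = \frac{n \cdot 2\sigma^4\Delta^2}{T^2}
    = \frac{2\sigma^4\Delta}{T}
  \qquad\Longrightarrow\qquad
  T^{1/2}\big(\hat{\sigma}^2 - \sigma^2\big) \xrightarrow{L} N\big(0,\ 2\sigma^4\Delta\big),
\]
% a variance that vanishes as \Delta \to 0: absent noise, sample as often as possible.
```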

When volatility is stochastic, $dX_t = \sigma_t\,dW_t$:

Realized volatility $\sum_{i=1}^{n}(X_{t_{i+1}} - X_{t_i})^2$ estimates the quadratic variation $\int_0^T \sigma_t^2\,dt$.

The sum converges to the integral, with a known distribution: Jacod (1994), Jacod and Protter (1998), etc.

As in the constant-σ case, selecting Δ as small as possible (i.e., n as large as possible) is optimal.

4. With Noise

When $dX_t = \sigma_t\,dW_t$ but we observe X with noise, the object of interest remains the quadratic variation of X, $\langle X, X\rangle_T = \int_0^T \sigma_t^2\,dt$, over a fixed time period [0, T], or possibly several such time periods.

Asymptotics are in Δ → 0, with T fixed.

The usual estimator of $\langle X, X\rangle_T$ is the realized volatility $[Y, Y]_T = \sum_{i=1}^{n}(Y_{t_{i+1}} - Y_{t_i})^2$.

4.1. Completely Ignoring the Noise: Fifth Best

We show that, if one uses all the data (say sampled every second), then conditionally on the X process,
$$[Y, Y]_T^{(all)} \xrightarrow{L} \underbrace{\langle X, X\rangle_T}_{\text{object of interest}} + \underbrace{2nE[\varepsilon^2]}_{\text{bias due to noise}} + \bigg[\underbrace{4nE[\varepsilon^4]}_{\text{due to noise}} + \underbrace{\frac{2T}{n}\int_0^T \sigma_t^4\,dt}_{\text{due to discretization}}\bigg]^{1/2} Z_{\mathrm{total}},$$
where $Z_{\mathrm{total}}$ is standard normal and the bracketed sum is the total variance.
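A quick Monte Carlo check of the bias term (a sketch reusing the simulated-day assumptions from the earlier snippet):

```python
import numpy as np

rng = np.random.default_rng(1)
T, n, sigma, noise_sd = 1 / 252, 23_400, 0.30, 0.0005  # assumed values
reps = 200
rv_all = np.empty(reps)
for r in range(reps):
    X = np.cumsum(sigma * np.sqrt(T / n) * rng.standard_normal(n + 1))
    Y = X + noise_sd * rng.standard_normal(n + 1)
    rv_all[r] = np.sum(np.diff(Y) ** 2)    # [Y,Y]^(all) on each simulated day

print(f"mean bias of [Y,Y]^(all): {rv_all.mean() - sigma**2 * T:.6f}")
print(f"predicted 2*n*E[eps^2]  : {2 * n * noise_sd**2:.6f}")  # the two should agree
```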

4.2. Sampling Sparsely at an Arbitrary Frequency: Fourth Best

Of course, sampling as prescribed by $[Y, Y]_T^{(all)}$ is not what people do in practice.

Instead, they use the estimator $[Y, Y]_T^{(sparse)}$ constructed by summing squared log-returns at some lower frequency: typically 5, 10, 15, or 30 minutes.

Here is the fourth best estimator for different values of Δ, averaged over the 30 DJIA stocks and the last 10 trading days in April 2004:

[Figure: average RV as a function of the sampling interval, from 5 seconds to 5 minutes; the estimate rises sharply as the interval shrinks.]

As Δ = T/n → 0, the graph shows that the estimator diverges, as predicted by our result (the $2nE[\varepsilon^2]$ term), instead of converging to the object of interest $\langle X, X\rangle_T$ as predicted by standard asymptotic theory.

4.3. Sampling Sparsely at an Optimal Frequency: Third Best

If one insists upon sampling sparsely, what is the right answer? Is it 5 minutes, 10 minutes, 15 minutes?

To determine the sparse sampling frequency optimally, we show that:
$$n^*_{sparse} = \left(\frac{T}{4E[\varepsilon^2]^2}\int_0^T \sigma_t^4\,dt\right)^{1/3}.$$
This gives rise to the third best estimator, which we define as $[Y, Y]_T^{(sparse,opt)}$.
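A small sketch of this formula in use (the inputs are illustrative assumptions, not estimates from the paper):

```python
def optimal_sparse_n(T, noise_var, integrated_quarticity):
    """n* = (T * integral(sigma_t^4 dt) / (4 * E[eps^2]^2))^(1/3)."""
    return (T * integrated_quarticity / (4.0 * noise_var**2)) ** (1.0 / 3.0)

# One trading day, constant sigma = 0.30 annualized, noise sd = 0.0005 (assumed):
T, sigma, noise_sd = 1 / 252, 0.30, 0.0005
iq = sigma**4 * T                      # integrated quarticity for constant sigma
n_star = optimal_sparse_n(T, noise_sd**2, iq)
print(f"n* = {n_star:.0f}, i.e. one observation every "
      f"{6.5 * 3600 / n_star:.0f} seconds")  # about 80 obs, roughly every 5 minutes
```

Under these assumed parameters the optimum lands near the 5-minute convention, which is one way to rationalize common practice.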

4.4. Correcting for Microstructure Noise: Second Best

We have just argued that one could benefit from using infrequently sampled data. Indeed, the fourth and third best estimators do better than the fifth best.

But this entails discarding a very substantial fraction of the data sample. And yet, one of the most basic lessons of statistics is that one should not do this.

We present a method to tackle the problem:

We partition the original grid of observation times, $G = \{t_0, \ldots, t_n\}$, into K subsamples $G^{(k)}$, $k = 1, \ldots, K$, where $n/K \to \infty$ as $n \to \infty$.

For example, for $G^{(1)}$ start at the first observation and take an observation every 5 minutes; for $G^{(2)}$, start at the second observation and take an observation every 5 minutes; etc.

Then we average the estimators obtained on the subsamples. To the extent that there is a benefit to subsampling, this benefit can now be retained, while the variation of the estimator is lessened by the averaging.

This gives rise to the estimator
$$[Y, Y]_T^{(avg)} = \frac{1}{K}\sum_{k=1}^{K}[Y, Y]_T^{(k)}$$
constructed by averaging the estimators $[Y, Y]_T^{(k)}$ obtained on the K grids, of average size $\bar{n} = n/K$.

We show that:
$$[Y, Y]_T^{(avg)} \xrightarrow{L} \underbrace{\langle X, X\rangle_T}_{\text{object of interest}} + \underbrace{2\bar{n}E[\varepsilon^2]}_{\text{bias due to noise}} + \bigg[\underbrace{\frac{4\bar{n}}{K}E[\varepsilon^4]}_{\text{due to noise}} + \underbrace{\frac{4T}{3\bar{n}}\int_0^T \sigma_t^4\,dt}_{\text{due to discretization}}\bigg]^{1/2} Z_{\mathrm{total}}.$$
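A compact sketch of the subsample-and-average step (my own illustrative implementation, on the same simulated day as before):

```python
import numpy as np

def rv(p):
    """Realized volatility: sum of squared log-price increments."""
    return np.sum(np.diff(p) ** 2)

def rv_avg(prices, K):
    """[Y,Y]^(avg): average RV over the K offset subgrids of step K."""
    return np.mean([rv(prices[k::K]) for k in range(K)])

rng = np.random.default_rng(0)
T, n, sigma, noise_sd = 1 / 252, 23_400, 0.30, 0.0005  # assumed values
X = np.cumsum(sigma * np.sqrt(T / n) * rng.standard_normal(n + 1))
Y = X + noise_sd * rng.standard_normal(n + 1)

K = 300                        # every 5 minutes on a 1-second grid
print(f"true QV            : {sigma**2 * T:.6f}")
print(f"[Y,Y]^(avg)        : {rv_avg(Y, K):.6f}")               # still biased upward...
print(f"bias 2*nbar*E[e^2] : {2 * (n / K) * noise_sd**2:.6f}")  # ...by about this much
```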

4.5. Two Scales Realized Volatility: First Best

While a better estimator than $[Y, Y]_T^{(all)}$, $[Y, Y]_T^{(avg)}$ remains biased: its bias is $2\bar{n}E[\varepsilon^2]$.

But recall that $E[\varepsilon^2]$ can be consistently approximated by the fifth best estimator:
$$\widehat{E[\varepsilon^2]} = \frac{1}{2n}[Y, Y]_T^{(all)}$$
Hence the bias of $[Y, Y]_T^{(avg)}$ can be consistently estimated by $\frac{\bar{n}}{n}[Y, Y]_T^{(all)}$.

A bias-adjusted estimator for ⟨X, X⟩ can thus be constructed as
$$\widehat{\langle X, X\rangle}_T = \underbrace{[Y, Y]_T^{(avg)}}_{\text{slow time scale}} - \frac{\bar{n}}{n}\underbrace{[Y, Y]_T^{(all)}}_{\text{fast time scale}}$$
We call this estimator Two Scales Realized Volatility (TSRV).
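Putting the pieces together, a minimal TSRV sketch (my own illustration; K ≈ n^{2/3} anticipates the optimal rate stated next, and the constant c = 1 is an arbitrary choice):

```python
import numpy as np

def rv(p):
    return np.sum(np.diff(p) ** 2)

def tsrv(prices):
    """TSRV: subsample-averaged RV on K grids, bias-corrected by the
    full-grid RV. Here K = n^(2/3), i.e. c = 1 (arbitrary)."""
    n = len(prices) - 1
    K = max(2, round(n ** (2 / 3)))
    rv_avg = np.mean([rv(prices[k::K]) for k in range(K)])
    nbar = n / K                         # average subsample size
    return rv_avg - (nbar / n) * rv(prices)

rng = np.random.default_rng(0)
T, n, sigma, noise_sd = 1 / 252, 23_400, 0.30, 0.0005  # assumed values
X = np.cumsum(sigma * np.sqrt(T / n) * rng.standard_normal(n + 1))
Y = X + noise_sd * rng.standard_normal(n + 1)

print(f"true QV: {sigma**2 * T:.6f}")
print(f"TSRV   : {tsrv(Y):.6f}")         # correctly centered, uses all the data
```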

We show that if the number of subsamples is optimally selected as $K = cn^{2/3}$, then TSRV has the following distribution:
$$\widehat{\langle X, X\rangle}_T \xrightarrow{L} \underbrace{\langle X, X\rangle_T}_{\text{object of interest}} + \frac{1}{n^{1/6}}\bigg[\underbrace{\frac{8}{c^2}E[\varepsilon^2]^2}_{\text{due to noise}} + \underbrace{c\,\frac{4T}{3}\int_0^T \sigma_t^4\,dt}_{\text{due to discretization}}\bigg]^{1/2} Z_{\mathrm{total}}$$
Unlike all the previously considered estimators, this one is correctly centered.

To the best of our knowledge, this is the only consistent estimator for $\langle X, X\rangle_T$ in the presence of market microstructure noise.

5. Monte Carlo Simulations

Comparison of the RV variants with TSRV:

                          Fifth Best    Fourth Best      Third Best           Second Best   First Best (TSRV)
                          [Y,Y]^(all)   [Y,Y]^(sparse)   [Y,Y]^(sparse,opt)   [Y,Y]^(avg)   <X,X>^(adj)
Small sample bias         1.1699e-2     3.89e-5          2.18e-5              1.926e-5      2e-8
Asymptotic bias           1.1700e-2     3.90e-5          2.20e-5              1.927e-5      0
Small sample variance     1.791e-8      1.4414e-9        1.59e-9              9.41e-10      9e-11
Asymptotic variance       1.788e-8      1.4409e-9        1.58e-9              9.37e-10      8e-11
Small sample RMSE         1.1699e-2     5.437e-5         4.543e-5             3.622e-5      9.4e-6
Asymptotic RMSE           1.1700e-2     5.442e-5         4.546e-5             3.618e-5      8.9e-6
Small sample rel. bias    182           0.61             0.18                 0.15          0.00045
Small sample rel. var.    82502         1.15             0.11                 0.053         0.0043
Small sample rel. RMSE    340           1.24             0.37                 0.28          0.065

6. Data Analysis

Here is a comparison of RV to TSRV for INTC, last 10 trading days in April 2004:

[Figure: TSRV and RV for INTC on April 19, 2004, as a function of the sampling interval, from 5 seconds to 10 minutes.]

Zooming in around the 5-minute sampling frequency:

[Figure: RV vs. TSRV for sampling intervals between 4 and 6 minutes.]

7. Dependent Market Microstructure Noise

So far, we have assumed that the noise ε is iid. In that case, log-returns are MA(1):
$$Y_{\tau_i} - Y_{\tau_{i-1}} = \int_{\tau_{i-1}}^{\tau_i} \sigma_t\,dW_t + \varepsilon_{\tau_i} - \varepsilon_{\tau_{i-1}}$$
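Under iid noise, the observed returns should therefore show a single negative autocorrelation spike at lag 1 and essentially nothing beyond; a quick simulated check (illustrative parameters as before) before turning to the data:

```python
import numpy as np

rng = np.random.default_rng(0)
T, n, sigma, noise_sd = 1 / 252, 23_400, 0.30, 0.0005  # assumed values
X = np.cumsum(sigma * np.sqrt(T / n) * rng.standard_normal(n + 1))
Y = X + noise_sd * rng.standard_normal(n + 1)
r = np.diff(Y)                           # observed log-returns

def autocorr(x, lag):
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

for lag in range(1, 6):
    print(f"lag {lag}: {autocorr(r, lag):+.3f}")
# Lag 1 is strongly negative (near -0.5 when noise dominates) and higher lags are
# near zero: the MA(1) signature. Persistent higher-order autocorrelations, as for
# INTC below, are evidence against iid noise.
```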

For example, here is the autocorrelogram for AIG transactions, last 10 trading days in April 2004:

[Figure: sample autocorrelations of AIG transaction returns, lags 1 to 20.]

But here is the autocorrelogram for INTC transactions, same last 10 trading days in April 2004:

[Figure: sample autocorrelations of INTC transaction returns, lags 1 to 20, showing dependence beyond lag 1.]

A simple model to capture this higher-order dependence is $\varepsilon_{t_i} = U_{t_i} + V_{t_i}$, where U is iid, V is AR(1), and U ⊥ V.

Fitted autocorrelogram for INTC:

[Figure: INTC sample autocorrelations, lags 1 to 20, together with the autocorrelations fitted by the U + V model.]

The TSRV Estimator with (J, K) Time Scales
$$\widehat{\langle X, X\rangle}_T = \underbrace{[Y, Y]_T^{(K)}}_{\text{slow time scale}} - \frac{\bar{n}_K}{\bar{n}_J}\underbrace{[Y, Y]_T^{(J)}}_{\text{fast time scale}}$$
We show that if we select J/K → 0 as n → ∞, then this estimator is robust to (essentially) arbitrary time series dependence in the microstructure noise.

Specifically, we let the noise process $\varepsilon_{t_i}$ be stationary and strong mixing with exponential decay. We also suppose that $E[\varepsilon^{4+\kappa}] < \infty$ for some κ > 0.
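A sketch of this (J, K) generalization, read directly off the formula above (the J and K values, and the AR(1) noise used to exercise it, are illustrative assumptions):

```python
import numpy as np

def rv_avg(prices, scale):
    """RV averaged over the `scale` offset subgrids of step `scale`."""
    return np.mean([np.sum(np.diff(prices[k::scale]) ** 2) for k in range(scale)])

def tsrv_jk(prices, J, K):
    """(J,K)-scale TSRV: slow-scale averaged RV minus (nbar_K/nbar_J) times the
    fast-scale averaged RV. J = 1, K = n^(2/3) recovers the basic TSRV."""
    n = len(prices) - 1
    return rv_avg(prices, K) - ((n / K) / (n / J)) * rv_avg(prices, J)

rng = np.random.default_rng(0)
T, n, sigma, noise_sd, rho = 1 / 252, 23_400, 0.30, 0.0005, 0.5  # assumed values
X = np.cumsum(sigma * np.sqrt(T / n) * rng.standard_normal(n + 1))
eps = np.zeros(n + 1)
for i in range(1, n + 1):                # serially dependent AR(1) noise
    eps[i] = rho * eps[i - 1] + noise_sd * rng.standard_normal()
Y = X + eps

print(f"true QV          : {sigma**2 * T:.6f}")
print(f"TSRV (J=5, K=800): {tsrv_jk(Y, 5, 800):.6f}")  # J/K kept small
```

Taking J of a few ticks rather than J = 1 lets the fast scale step over the noise dependence, which is the intuition behind the J/K → 0 condition.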

Robustness to the selection of the slow (K) and fast (J) time scales, INTC again:

[Figure: TSRV for INTC under time-dependent noise, across slow time scales K from 5 to 10 minutes and fast time scales J from 5 seconds to 1 minute 30 seconds.]

8. Further Refinement: Multiple Scales Realized Volatility

We have seen that TSRV provides the first consistent and asymptotically (mixed) normal estimator of the quadratic variation $\langle X, X\rangle_T$; that it can be made robust to arbitrary serial dependence in the market microstructure noise; and that it has the rate of convergence $n^{1/6}$.

At the cost of higher complexity, it is possible to generalize TSRV to multiple time scales, by averaging not on two time scales but on multiple time scales (Zhang 2004). The resulting estimator, MSRV, has the form
$$\widehat{\langle X, X\rangle}_T^{(msrv)} = \underbrace{\sum_{i=1}^{M} a_i [Y, Y]_T^{(K_i)}}_{\text{weighted sum of } M \text{ slow time scales}} + \underbrace{\frac{1}{n}[Y, Y]_T^{(all)}}_{\text{fast time scale}}$$
TSRV corresponds to the special case M = 1, i.e., where one uses a single slow time scale in conjunction with the fast time scale to bias-correct it.
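Structurally, MSRV is a weighted combination of averaged RVs at M slow scales plus the fast-scale term; here is a skeleton of that combination (a sketch only: the weights below merely solve the bias-cancellation constraints, weights summing to one with the 1/K_i-weighted sum equal to zero, by a minimum-norm rule, and are NOT the variance-optimal closed-form weights of Zhang (2004)):

```python
import numpy as np

def rv(p):
    return np.sum(np.diff(p) ** 2)

def rv_avg(prices, scale):
    return np.mean([rv(prices[k::scale]) for k in range(scale)])

def msrv_skeleton(prices, scales):
    """Weighted sum of averaged RVs at M slow scales plus (1/n)[Y,Y]^(all).
    Weights solve sum(a_i) = 1 and sum(a_i / K_i) = 0 (minimum-norm choice),
    which kills the leading 2*nbar*E[eps^2] noise bias; they are placeholders,
    not the optimal weights derived in Zhang (2004)."""
    n = len(prices) - 1
    K = np.asarray(scales, dtype=float)
    A = np.vstack([np.ones_like(K), 1.0 / K])      # the two linear constraints
    a, *_ = np.linalg.lstsq(A, np.array([1.0, 0.0]), rcond=None)
    slow = sum(w * rv_avg(prices, int(k)) for w, k in zip(a, K))
    return slow + rv(prices) / n                   # fast-scale remainder term

rng = np.random.default_rng(0)
T, n, sigma, noise_sd = 1 / 252, 23_400, 0.30, 0.0005  # assumed values
X = np.cumsum(sigma * np.sqrt(T / n) * rng.standard_normal(n + 1))
Y = X + noise_sd * rng.standard_normal(n + 1)

print(f"true QV : {sigma**2 * T:.6f}")
print(f"MSRV-ish: {msrv_skeleton(Y, [200, 400, 600, 800, 1000]):.6f}")
```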

For suitably selected weights $a_i$ and $M = O(n^{1/2})$, $\widehat{\langle X, X\rangle}_T^{(msrv)}$ converges to $\langle X, X\rangle_T$ at rate $n^{1/4}$.

Optimal weights are given in closed form.

We also provide an analysis of this estimator under dependence of the noise.

9. Conclusions

Two Scales Realized Volatility:

In the limit where all the data are used, realized volatility converges to the variance of the noise, not the quadratic variation of the log-returns.

The practical response so far has been to use sparse sampling: once every 5 or 10 minutes.

But it is possible instead to correct for the noise by subsampling, averaging, and bias-correcting, and to obtain a well-behaved estimator that makes use of all the data: TSRV.

The difference matters:

In Monte Carlo simulations, the RMSE of TSRV is orders of magnitude smaller than that of RV.

In empirical examples, the difference is also meaningful: for INTC, TSRV = 0.0025 vs. RV in the range (0.0029, 0.0035).

And one final important message:

Any time one has an impulse to discard data, one can usually do better: using likelihood corrections in the parametric volatility case, or subsampling and averaging in the stochastic volatility case.

No matter what the model is, no matter what quantity is being estimated.