Estimating Order Imbalance Using Low Frequency Data


JinGi Ha and Jianfeng Hu

November 19, 2016

ABSTRACT

We estimate net order flow based on the Kyle (1985) model, in which price impact is the product of order imbalance and market illiquidity. Taking well-known illiquidity measures as given, we calculate the net order imbalance of individual stocks using contemporaneous returns and illiquidity at the daily frequency. The estimated low frequency order imbalance (LFOI) performs comparably to the imbalance estimated from high frequency data (HFOI) in the Nasdaq market. Both LFOI and HFOI positively predict the next day's returns in the cross section. While HFOI's price impact later reverses completely, LFOI exhibits a permanent price impact. Subsample analysis shows that LFOI has a stronger price impact for small and illiquid stocks and for Nasdaq stocks. An investment strategy based on LFOI is profitable in all G10 countries except the UK. We also find that LFOI increases significantly around corporate events and becomes more informative. Together with its computational efficiency, the evidence suggests that the proposed LFOI is a good proxy for net order imbalance in empirical studies at daily or longer horizons.

JinGi Ha and Jianfeng Hu are at Singapore Management University.
We would like to thank Fangjian Fu, Luis Goncalves-Pinto, Allaudeen Hameed, Dashan Huang, Sheng Huang, Jin-Woo Kim, Ji-Chai Lin, Roger Loh, Claudia Moise, Dmitriy Muravyev, Gilbert Nartea, David Ng, Antoine Noel, Lin Peng, Wenlan Qian, David Reeb, Dominik Rosch, Eduardo Schwartz, Johan Sulaeman, Yuehua Tang, Qing Tong, Kam-Ming Wan, Christine Wang, John Wei, Joe Zhang, Weina Zhang, and the seminar participants at Hong Kong Polytechnic University, National University of Singapore, Singapore Management University, Zhejiang University, the 2016 Asian Finance Association Annual Conference, the LKCSB Summer Research Camp 2016, the Tenth Annual Risk Management Conference 2016, the 12th Annual Conference of the Asia-Pacific Association of Derivatives, the 2016 FMA Annual Meeting, and the 2016 Vietnam Symposium in Banking and Finance for comments. All remaining errors are ours. This research was supported by the Singapore Ministry of Education (MOE) Academic Research Fund (AcRF) Tier 1 grant and IREC, The Institute of Finance and Banking, and Seoul National University. Jianfeng Hu also acknowledges financial support from the Lee Kong Chian Fund for Excellence. Please address correspondence to JinGi Ha (jingiha.2014@pbs.smu.edu.sg) and Jianfeng Hu (jianfenghu@smu.edu.sg) at Lee Kong Chian School of Business, Singapore Management University, 50 Stamford Road, Singapore, 178899.

I. Introduction

A large body of the market microstructure literature examines the relation between investors' order flow and asset prices. In the seminal study of Kyle (1985), the relation can be summarized as ΔP = λx, where the price change ΔP results from the net order flow x and the price sensitivity λ, whose inverse Kyle terms market depth. λ is determined by the relative amount of expected informed trading and essentially measures the illiquidity of an asset.[1] The pricing relation can be rewritten as x = ΔP/λ. Since the price change is directly observable, in this article we propose to use established low-frequency measures of illiquidity (λ) to estimate the net order flow (x) of individual stocks at the daily level. To the best of our knowledge, this method of order flow estimation has not been systematically analyzed and documented in the literature. Specifically, we consider three illiquidity measures that can be calculated every day without intraday data: the inverse share volume turnover ratio, the closing percentage bid-ask spread as in Amihud and Mendelson (1986), and the high-low spread as in Corwin and Schultz (2012).[2] By dividing each stock's same-day return by these illiquidity proxies, we arrive at three proxies of net daily order imbalance in the cross section of stocks. The Amihud (2002) illiquidity measure can also be calculated every day using only dollar trading volume and stock returns. Indeed, Pastor and Stambaugh (2003, PS hereafter) estimate an individual stock's illiquidity as the price sensitivity to the previous day's return-signed dollar volume. Although PS do not highlight the underlying rationale, the signed volume is consistent with our methodology when the Amihud measure serves as the illiquidity proxy. Therefore, we use the return-signed dollar volume in our analysis as a benchmark low frequency order imbalance from prior research.

[1] The asserted price impact of order flow is a generic result in market microstructure theories, although its exact form can vary across studies. In sequential trade models such as Glosten and Milgrom (1985) and Easley and O'Hara (1992), for example, the asset price set by the market maker is updated after each order arrival to reflect the probability of new information in the order flow. Motivated by inventory costs rather than asymmetric information, dynamic inventory models such as Ho and Stoll (1983) and Spiegel and Subrahmanyam (1995) also study the price changes that follow transactions as a result of the price pressure faced by market makers.

[2] There are several other well-known low-frequency measures of illiquidity in the literature. We do not use the serial correlation of returns as in Roll (1984) or the effective spread based on zero-return days as in Lesmond, Ogden, and Trzcinka (1999), because we want to update the illiquidity measure every day to calculate daily order imbalance for price discovery analysis. We do not use the effective bid-ask spreads developed by Holden (2009) and Goyenko, Holden, and Trzcinka (2009), because we want to avoid intraday data for ease and efficiency of computation. These illiquidity measures could potentially be used in our framework under additional assumptions.

The proposed low-frequency order imbalance (LFOI) has two significant advantages over order flow estimates based on intraday data, such as those developed by Lee and Ready (1991), Ellis, Michaely, and O'Hara (2000), Odders-White (2000), Chakrabarty, et al. (2007), and Easley, Lopez de Prado, and O'Hara (2013). First, because the estimation requires only daily after-market data, it is easy to use and suitable for large-scale empirical analysis. This feature is particularly desirable in today's markets, where the volume of intraday data keeps growing due to high-frequency trading. Second, the method can estimate order flow when high frequency tick data are not available. The underlying assumption of a positive contemporaneous price impact from order flow is generic and intuitive, with support from almost all market microstructure theories. The method can therefore be applied to various markets with different market structures. Although we focus only on daily intervals in this study, the method can potentially be applied to shorter intervals by interacting high-frequency liquidity proxies with returns.

We test the performance of LFOIs by comparing them with the order imbalance of Nasdaq stocks calculated from the exchange's detailed order history (Historical TotalView-ITCH) data between June 1999 and December 2010. The time-series average of cross-sectional correlations between LFOIs and ITCH order imbalance ranges between 0.135 and 0.286, depending on the illiquidity measure used. All LFOIs outperform the PS order imbalance, which has a correlation of 0.121 with the ITCH order imbalance.
Of the three LFOI measures examined, the share turnover ratio based low frequency order imbalance (TLFOI) consistently generates the highest correlations with ITCH order imbalance, both in the full sample and in individual years. To our surprise, the cross-sectional performance of TLFOI is even comparable to that of a high frequency order imbalance (HFOI) measure following the Lee and Ready (1991) algorithm, which has a correlation of 0.316 with ITCH order imbalance in the full sample. The results suggest that both HFOI and LFOI contain measurement errors, and that using tick level data to calculate net order flow does not guarantee superior performance despite the much greater computing power needed. We suspect that the noise in HFOI comes mainly from imperfect matching of trades and quotes, and that increasing trading speed can be detrimental to its performance. Indeed, the correlation between HFOI and ITCH order imbalance falls from 0.457 in 1999 to 0.288 in 2010. Trading speed also affects the accuracy of LFOI: the correlation between TLFOI and ITCH order imbalance decreases from 0.541 in 1999 to 0.255 in 2010. In the cross section, however, the performance of LFOI is more stable than that of HFOI. HFOI performs better on small stocks (0.353) and illiquid stocks (0.34) than on large stocks (0.291) and liquid stocks (0.304), while TLFOI's performance does not seem to depend on firm size (0.279 for small stocks and 0.287 for large stocks) or liquidity (0.28 for illiquid stocks and 0.274 for liquid stocks). The correlation tests with ITCH order imbalance suggest that LFOI, especially TLFOI, can be a reasonable proxy for daily net order flow in the cross section.

We turn next to two applications of low frequency order imbalance. The first application is cross-sectional return prediction for the period from 1993 to 2013. Market microstructure theories based on both inventory management and asymmetric information suggest that net order flow affects stock returns. The contemporaneous price impact is positive in both strands of models.
However, the inventory management theory predicts an eventual price reversal following the initial price impact because the stock's fundamental value has not changed, while the asymmetric information theory predicts only a partial price adjustment upon the arrival of informed orders, followed by a subsequent and permanent price impact once the private information becomes public. We find that our proposed LFOIs have positive and significant predictive power for future stock returns at the daily frequency after controlling for past returns, HFOI, and stock liquidity, consistent with the findings of Chordia and Subrahmanyam (2004) using the Lee and Ready (1991) order imbalance. Although a reversal occurs beyond one day, the stock price does not fully revert to its previous level even after 20 days, suggesting that LFOIs capture both transitory price pressure and information content. We also find that LFOIs are complementary to HFOI, as the coefficient estimates on the previous day's HFOI are also positive and significant in the same return regressions. However, the price impact from HFOI is short-lived and weaker than that of LFOIs in terms of both statistical and economic significance. The predictive power of LFOIs is robust in subsamples based on firm size, liquidity, exchange, and time period. Consistent with the effect of opaqueness and transaction costs on informed trading, LFOIs have stronger return predictability for small and illiquid stocks and for Nasdaq stocks, and the predictability weakens in the recent period while remaining statistically and economically significant. We also confirm that the return predictability is not a spurious result of the long bull market during our sample period, because both net buy and net sell order imbalances predict returns in the right direction.

Investment strategies that are long stocks in the highest LFOI decile and short stocks in the lowest LFOI decile, with daily rebalancing, generate statistically significant next-day returns between 0.306% and 0.855%, with similar alphas relative to the Fama-French (1993) three factors. To take advantage of the flexible computing method, we also extend the sample in the investment analysis. The turnover and high-low spread based LFOIs can be calculated for the period between 1927 and 2013 using CRSP data, and we find that abnormal returns from the LFOI strategy are around 0.36% a day in this period. The bid-ask spread based LFOI can be calculated for the period between 1983 and 2013, when spread data are available in CRSP. The resulting daily abnormal return from the LFOI strategy is around 0.7%.
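The daily-rebalanced decile strategy described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes equal weights within each leg, assumes `next_ret[date]` already holds each stock's return on the following trading day, drops the remainder when the cross-section does not divide evenly into groups, and reports raw long-minus-short returns rather than Fama-French alphas. The dictionary layout and function name are hypothetical.

```python
from statistics import mean


def decile_long_short(signal, next_ret, n_groups=10):
    """Daily long-short spread from sorting stocks on an order imbalance signal.

    signal:   {date: {stock: LFOI on date t}}
    next_ret: {date: {stock: return on the trading day after t}}
    Each day, stocks with both values are ranked on the signal; the strategy is
    long the top group and short the bottom group, equally weighted.
    Days with fewer stocks than n_groups are skipped.
    Returns a list of (date, long-minus-short return) in date order.
    """
    spreads = []
    for date in sorted(signal):
        rets = next_ret.get(date, {})
        sig = {s: v for s, v in signal[date].items() if s in rets}
        if len(sig) < n_groups:
            continue
        ranked = sorted(sig, key=sig.get)  # ascending signal
        size = len(ranked) // n_groups     # stocks per leg (remainder dropped)
        short_leg, long_leg = ranked[:size], ranked[-size:]
        spread = mean(rets[s] for s in long_leg) - mean(rets[s] for s in short_leg)
        spreads.append((date, spread))
    return spreads
```

The time-series mean of the returned spreads corresponds to the raw next-day strategy return; risk adjustment would require a separate regression on factor returns.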
Because both components of LFOIs (past return and illiquidity) are significant return predictors, we perform a double-sorting investment analysis. We find that the LFOI strategy is highly profitable in all past return and illiquidity quintile portfolios except the middle return quintile, which has past returns close to zero and hence the smallest cross-sectional dispersion in LFOI. Finally, we also examine the profitability of the turnover based LFOI strategy in G10 countries using Datastream data. The results show that the LFOI strategy generates positive and significant abnormal returns in all markets except the UK, suggesting that the predictive ability of LFOI is internationally pervasive.

The second application concerns fundamental information flow around corporate events. If informed traders exploit their advance information, we expect net order flow to contain valuable information about upcoming corporate announcements. Specifically, we investigate the estimated LFOI around earnings announcements, extreme price movements, analyst recommendation changes, value-related 8-K filings, and Schedule 13D filings. We find that LFOIs increase significantly in the right direction approaching all of these corporate events, consistent with the presence of informed trading ahead of the announcements. LFOIs also significantly predict abnormal announcement returns. Moreover, we find that the price sensitivity to LFOIs strengthens before large price jumps and analyst recommendation changes, possibly due to the unscheduled nature of such events. The results from this event study suggest that the simple LFOI measure can be sufficient to detect the flow of private information in financial markets.

Our method is related to the tick test on time bulks by Easley, Lopez de Prado, and O'Hara (2012). However, their method still relies on intraday tick data, as the tick test is performed on volume-weighted transaction prices. In contrast, our low frequency order signing algorithm uses end-of-day prices only. As a result, our method computes daily stock order imbalance much faster. The cross-sectional pricing tests show that the low frequency order imbalances we propose have even stronger return predictability than the high frequency constructs. Campbell, Grossman, and Wang (1993) also interact returns and turnover to predict subsequent returns at the market level.
But they do not interpret the interaction as order imbalance. Rather, the turnover ratio is used as a conditioning variable, in the same way as volatility, to study market return reversals. Unlike Campbell, Grossman, and Wang (1993), our focus is to propose an order flow measure at the individual stock level.

The rest of the paper is organized as follows. Section II describes the construction of our empirical measures of order flow and the sample selection. Section III tests the performance of the order flow measures. Section IV presents two applications of the low frequency order imbalance: cross-sectional return prediction and informed trading around corporate events. Section V concludes.

II. Methodology and data

A. Variable definition

We first define the low frequency order imbalance (LFOI) measures. Based on the Kyle (1985) model, net order flow is the contemporaneous price change divided by stock illiquidity. We employ three low-frequency liquidity measures that are easy to calculate, namely the share volume turnover ratio (TURN), the percentage bid-ask spread (BASPRD), and the spread implied by daily high and low prices (HLSPRD) of Corwin and Schultz (2012). TURN is calculated as the ratio of the daily number of shares traded to the number of shares outstanding. When the turnover ratio is low, it is hard to find potential trading partners and the transaction cost is high (Karpoff (1986)). Hence, TURN is inversely related to illiquidity. BASPRD is the closing bid-ask spread scaled by the midpoint of the bid and ask prices. This variable directly measures the transaction cost of immediate execution and should increase when the market is more illiquid (Amihud and Mendelson (1986)). Corwin and Schultz (2012) develop an estimator of the effective spread based on daily high and low prices. Like BASPRD, HLSPRD increases when the stock becomes illiquid.

We use two stock return measures to calculate LFOI. Theoretically, the contemporaneous price impact of net order flow pertains only to the price movement during trading hours and should not include price adjustment due to overnight information flow when the market is closed. Therefore, stock returns in LFOI calculations should be based on the open-to-close return.
We use open-to-close midpoint returns of the National Best Bid and Offer (NBBO) for this reason, and also to address the bid-ask bounce in transaction prices. However, opening NBBO quotes are not available in standard after-market databases such as the daily file of the Center for Research in Security Prices (CRSP), so we obtain this information from the NYSE TAQ database. To avoid TAQ tick data entirely, we also use the close-to-close transaction return recorded in CRSP as an alternative, more convenient return measure. Because the price impact from order imbalance is idiosyncratic, we calculate LFOI from the residual returns of full-sample time-series regressions of each stock's returns on the Fama-French (1993) three factors.[3]

Interacting contemporaneous stock returns with illiquidity generates the following six LFOI measures:

TLFOI1: daily risk-adjusted close-to-close transaction return × TURN.
TLFOI2: daily risk-adjusted open-to-close mid quote return × TURN.
BALFOI1: daily risk-adjusted close-to-close transaction return / BASPRD.
BALFOI2: daily risk-adjusted open-to-close mid quote return / BASPRD.
HLLFOI1: daily risk-adjusted close-to-close transaction return / HLSPRD.
HLLFOI2: daily risk-adjusted open-to-close mid quote return / HLSPRD.

For comparison, we also calculate a high frequency order imbalance measure (HFOI) and the daily order imbalance measure used by Pastor and Stambaugh (2003, PS). We follow Lee and Ready (1991) to compute HFOI. Specifically, we match transactions in TAQ to the prevailing NBBO prices. A transaction is classified as buyer-initiated if the transaction price is above the midpoint of the matched NBBO prices, and as seller-initiated if it is below. If the trade price equals the NBBO midpoint, the transaction is classified as buyer-initiated (seller-initiated) if the price change prior to the transaction is positive (negative). We then calculate the net order imbalance of a stock as the total number of buyer-initiated shares minus the total number of seller-initiated shares on the same day.
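The six LFOI definitions, the illiquidity proxies they rely on, and the PS benchmark can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, returns are assumed to be already risk-adjusted (for example, Fama-French residuals, which the sketch does not compute), and the Corwin and Schultz (2012) estimator is written in its basic two-day form without any adjustment for overnight price changes. Corwin and Schultz discuss setting negative two-day estimates to zero; this sketch instead drops them, mirroring the exclusion of negative estimates described in Section III.

```python
import math


def turnover(shares_traded, shares_outstanding):
    """TURN: daily shares traded divided by shares outstanding."""
    return shares_traded / shares_outstanding


def pct_spread(bid, ask):
    """BASPRD: closing quoted spread scaled by the bid-ask midpoint."""
    return (ask - bid) / ((ask + bid) / 2)


def corwin_schultz(high_prev, low_prev, high, low):
    """Two-day Corwin-Schultz (2012) high-low spread estimate for days t-1 and t.

    Returns None when the estimate is negative (observation excluded).
    """
    k = 3 - 2 * math.sqrt(2)
    # beta: sum of squared single-day log ranges over the two days
    beta = math.log(high_prev / low_prev) ** 2 + math.log(high / low) ** 2
    # gamma: squared log range over the combined two-day window
    gamma = math.log(max(high_prev, high) / min(low_prev, low)) ** 2
    alpha = (math.sqrt(2 * beta) - math.sqrt(beta)) / k - math.sqrt(gamma / k)
    spread = 2 * (math.exp(alpha) - 1) / (1 + math.exp(alpha))
    return spread if spread >= 0 else None


def tlfoi(ret, turn):
    """TLFOI: TURN is inversely related to illiquidity, so x = r * TURN."""
    return ret * turn


def spread_lfoi(ret, spread):
    """BALFOI / HLLFOI: x = r / spread, the spread serving as illiquidity (lambda)."""
    return ret / spread


def amihud(ret, dollar_volume):
    """Amihud (2002) daily illiquidity: |r| / dollar volume."""
    return abs(ret) / dollar_volume


def ps_imbalance(ret, dollar_volume):
    """PS benchmark: sign of the daily return times dollar volume."""
    sign = (ret > 0) - (ret < 0)
    return sign * dollar_volume
```

For any non-zero return `r`, `spread_lfoi(r, amihud(r, dv))` equals `ps_imbalance(r, dv)` up to rounding, since r / (|r| / DVOL) = sign(r) × DVOL. This is the equivalence between the PS measure and an Amihud-based LFOI noted in the text.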
[3] We also experiment with rolling-window regressions to avoid look-ahead bias, and with raw returns instead of risk-adjusted returns. The results are largely the same and are reported in the internet appendix to the paper.

For cross-sectional normalization, we scale the net order imbalance by the number of shares outstanding to arrive at our HFOI measure. The PS measure of daily net order imbalance is the sign of the daily close-to-close transaction return multiplied by daily dollar trading volume. This measure is equivalent to our LFOI measure if we use Amihud's (2002) illiquidity proxy: because the Amihud measure is the absolute return divided by dollar volume, dividing the return by it yields the sign of the return times dollar volume. Although the PS measure is closely related to our turnover based imbalance (TLFOI), our prior is that TLFOI is more desirable in the cross section for two reasons. First, the turnover ratio adjusts for market capitalization and normalizes trading intensity across stocks. Therefore, the turnover ratio can better describe the cross-sectional variation in liquidity than the dollar volume used by PS. Second, the sign of the return in the PS calculation takes binary values and ignores the magnitude of the resulting price impact. This treatment can overstate order imbalance when the resulting stock return is marginal and understate it when the stock return is large.

It is hard to assess the performance of different order imbalance measures without observing the true order imbalance. To establish a benchmark, we acquire Nasdaq's Historical TotalView-ITCH data and calculate the imbalance for all Nasdaq stocks. The ITCH data record the detailed history of order messages submitted to Nasdaq markets, including limit order addition, execution, and replacement or cancellation messages. The addition messages include the buy/sell indicator of every limit order. When a marketable order is executed, an execution message is generated, which also includes the reference number of the limit order it executed against. Following Odders-White (2000), we determine the direction of a transaction by reversing the buy/sell indicator in the limit order's addition message after linking the trade execution message to the addition message.
The rationale is that the later arrival in the transaction is likely to be the liquidity taker, and hence the initiator of the transaction. As with HFOI, we then aggregate the signed trading volumes in ITCH data for each stock each day, and scale the aggregate order imbalance by the total number of shares outstanding to obtain the benchmark imbalance measure, which we term ITCH OI.
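The two trade-signing rules and the daily aggregation step can be sketched as follows. This is a simplified illustration under assumptions, not the authors' code: quote matching (including any quote lag) and the linking of ITCH execution messages to add-order messages are assumed to have been done by the caller, the `'B'`/`'S'` side codes are an assumption about how the buy/sell indicator is stored, and the function names are hypothetical. For the tick-rule fallback, the caller supplies the prior price change; a typical tick test uses the most recent non-zero change.

```python
def lee_ready_sign(price, bid, ask, prior_price_change):
    """HFOI trade sign: quote rule, with a tick-rule fallback at the midpoint.

    bid/ask: the NBBO matched to the trade.
    Returns +1 (buyer-initiated), -1 (seller-initiated), or 0 (unclassified).
    """
    mid = (bid + ask) / 2
    if price > mid:
        return 1
    if price < mid:
        return -1
    # Trade at the midpoint: fall back on the direction of the prior price change.
    if prior_price_change > 0:
        return 1
    if prior_price_change < 0:
        return -1
    return 0


def itch_sign(resting_side):
    """ITCH benchmark sign: the initiator is opposite to the resting limit order.

    resting_side: buy/sell indicator ('B' or 'S') from the add-order message
    that the execution message references.
    """
    if resting_side == "B":
        return -1  # a resting buy was hit, so the trade is seller-initiated
    if resting_side == "S":
        return 1   # a resting sell was lifted, so the trade is buyer-initiated
    raise ValueError(f"unexpected side indicator: {resting_side!r}")


def daily_imbalance(signed_trades, shares_outstanding):
    """(buyer-initiated shares - seller-initiated shares) / shares outstanding.

    signed_trades: iterable of (sign, shares) for one stock on one day;
    unclassified trades (sign 0) drop out of the sum.
    """
    net = sum(sign * shares for sign, shares in signed_trades)
    return net / shares_outstanding
```

The same `daily_imbalance` step serves both HFOI (signs from `lee_ready_sign`) and ITCH OI (signs from `itch_sign`), which keeps the two measures on a common scale.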

B. Data and sample selection

We obtain Nasdaq Historical TotalView-ITCH data between June 1999 and December 2010 and match them with the TAQ and CRSP databases over the same period. We focus on common stocks (CRSP share codes 10 and 11) listed on Nasdaq markets with prices above five dollars to form our main sample. There are 2,495 trading days in the sample period and 1,458 stocks per day on average. When calculating order imbalance from tick data in either ITCH or TAQ, we restrict the analysis to trades executed during trading hours (9:30 AM to 4 PM EST). Following Lee and Ready (1991), before 1998 we match trades in TAQ data to the NBBO prices prevailing five seconds earlier. The quote lag is not applied after 1998, because Madhavan, Porter, and Weaver (2005) and Chordia, Roll, and Subrahmanyam (2005) show that the market becomes more efficient in the recent period and the reporting delay is absent. To address the concern about fast-moving quotes in the recent sample period, we follow Holden and Jacobsen's (2014) quote adjustment for monthly TAQ data to construct NBBO prices after 2001. Since Nasdaq trading volume could be overstated due to interdealer trades, we adjust the trading volume in the turnover calculation following Gao and Ritter (2010). When calculating BASPRD, we exclude stock-day observations with negative or zero bid-ask spreads and those with percentage bid-ask spreads above 50%. We winsorize all order imbalance variables at the 1 and 99 percent levels in the cross section each day.

III. Performance of low frequency order imbalance

In this section, we test the performance of the proposed low frequency order imbalance by examining its correlations with the order imbalance calculated from Nasdaq's ITCH data for all stocks listed on Nasdaq. Table I Panel A presents the time-series averages of cross-sectional statistics of all order imbalance measures.
There are 2,495 days in our sample from June 1999 to December 2010, and the average number of stocks per day is 1,458. Due to missing quotes in TAQ, the order imbalance calculated using mid quote returns has slightly fewer observations per day than the imbalance calculated using CRSP returns. Because Corwin and Schultz's (2012) estimation can produce negative estimates of the high-low spread and we exclude such observations, HLLFOI1 and HLLFOI2 have fewer stocks per day than the other imbalance measures. The ITCH order imbalance has a mean of 0.001, a standard deviation of 0.055, and a median of 0, indicating that order flow is well balanced on average. In extreme cases, however, the ITCH order imbalance can reach -0.19 or 0.201. The Lee and Ready (1991) order imbalance (HFOI) has a mean of 0.012, a standard deviation of 0.199, and a median of 0.001, indicating that HFOI is more positive than ITCH imbalance on average and more dispersed, even though both imbalance measures are normalized by total shares outstanding. Surprisingly, the two turnover based low frequency order imbalance measures, TLFOI1 and TLFOI2, have statistics similar to those of ITCH order imbalance. Given that TLFOI can also be interpreted as a relative volume measure, the closely matched statistics suggest that, on average, the turnover based low frequency order imbalance represents the distribution of ITCH order imbalance well. BALFOI, HLLFOI, and PS have different units from the other imbalance measures. Interestingly, BALFOI1 and BALFOI2, as well as HLLFOI1 and HLLFOI2, have means of opposite sign but similar standard deviations.

[Place Table I about here]

We examine the average cross-sectional correlations between ITCH order imbalance and the other imbalance estimates in Panel B. We first compare the correlations in the full sample period from June 1999 to December 2010. The Lee and Ready measure, HFOI, has an average correlation coefficient of 0.316.
This correlation, viewed as a performance indicator, is lower than the accuracy rate of the same algorithm reported by Lee and Ready (1991), Ellis, Michaely, and O'Hara (2000), Lee and Radhakrishna (2000), Odders-White (2000), and Chakrabarty, Moulton, and Shkilko (2012). We suspect the underperformance has several causes. First, we examine all Nasdaq stocks during our sample period, whereas the other studies typically choose a relatively small sample of stocks. The trade-signing algorithm may perform better for stocks with certain characteristics, and that performance may be diluted in the full Nasdaq sample. Second, our test period spans almost eleven years, and the prevalence of high frequency trading in the more recent part of the sample can potentially impair the algorithm's performance. Third, the ITCH data include odd-lot orders, whereas such information is not available in TAQ. O'Hara, Yao, and Ye (2014) show that the extensive use of odd lots causes trading activity to be understated in popular databases such as TAQ. Finally, we focus on the cross-sectional relation of daily net order imbalance rather than on trade-level accuracy, and for this purpose we need to normalize imbalance by the number of shares outstanding. Because of these differences in testing methods, the results may not be directly comparable. Turning next to the low frequency order imbalance measures, we find that the turnover ratio based imbalance using CRSP returns, TLFOI1, has a correlation of 0.279. Although this correlation is slightly lower than that of HFOI, it is surprising that a simple order imbalance measure calculated using only CRSP data reaches such a high correlation with the ITCH order imbalance. Using open-to-close mid-quote returns in the calculation raises the correlation of TLFOI2 to 0.286. The bid-ask spread based imbalance measures have average correlations of 0.207 and 0.201 for calculations using CRSP returns and mid-quote returns, respectively. The high-low spread based imbalance measures have average correlations of 0.14 and 0.135, respectively. Thus, using open-to-close mid-quote returns does not appear to improve the performance of the two spread-based order imbalance measures.
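The statistics in Panel B are averages of daily cross-sectional correlations: on each day the imbalance estimate is correlated with the ITCH imbalance across stocks, and the daily coefficients are then averaged over time. The following is a minimal sketch of that computation, assuming the data arrive as flat stock-day arrays; the function name and the `min_obs` threshold are our own illustrative choices, not details taken from the paper.

```python
import numpy as np


def avg_cross_sectional_corr(dates, x, y, min_obs=3):
    """Time-series mean of daily cross-sectional Pearson correlations of x and y.

    Each day, x and y are correlated across the stocks observed that day
    (missing values dropped); the daily coefficients are then averaged.
    Returns (mean correlation, number of days used).
    """
    dates = np.asarray(dates)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    daily = []
    for d in np.unique(dates):
        m = (dates == d) & np.isfinite(x) & np.isfinite(y)
        xs, ys = x[m], y[m]
        # Skip days with too few stocks or no cross-sectional variation.
        if len(xs) < min_obs or xs.std() == 0 or ys.std() == 0:
            continue
        daily.append(np.corrcoef(xs, ys)[0, 1])
    if not daily:
        return float("nan"), 0
    return float(np.mean(daily)), len(daily)


# Hypothetical usage: day d1 is perfectly correlated, day d2 perfectly anti-correlated.
dates = ["d1"] * 3 + ["d2"] * 3
itch_oi = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
other_oi = [2.0, 4.0, 6.0, 3.0, 2.0, 1.0]
print(avg_cross_sectional_corr(dates, itch_oi, other_oi))
```

Averaging day by day, rather than pooling all stock-days, keeps days with many stocks from dominating and matches the "average cross-sectional correlation" wording.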
All of the proposed low frequency order imbalance measures have higher correlations than the PS measure used in prior research, which has a correlation of 0.121 with the ITCH order imbalance. To check for time trends in performance, we also calculate the correlations for individual years and report them in Panel B. Over time, the correlations of all the order imbalance measures decline, indicating that the measures become noisier. For example, the correlation of HFOI falls from 0.457 in 1999 to 0.288 in 2010, and that of TLFOI1 declines from 0.541 in 1999 to 0.255 in 2010. However, the performance ranking of the imbalance measures is largely consistent over the sample period: HFOI, TLFOI1, and TLFOI2 outperform the other imbalance measures most of the time, and the PS measure tends to underperform the others.

In Panel C, we investigate cross-sectional heterogeneity in the performance of the imbalance measures. Because trading speed adversely affects performance, we consider two firm characteristics related to trading speed, namely firm size and liquidity as measured by BASPRD. Our conjecture is that firms with larger market capitalization and lower bid-ask spreads are likely to be traded at higher speed. We therefore divide the sample into tertile portfolios based on size and illiquidity, respectively, and calculate the correlations with the ITCH imbalance within each tertile group. Both firm size and spread affect the performance of HFOI: its correlation is 0.353 for small stocks, 0.291 for large stocks, 0.304 for liquid stocks, and 0.34 for illiquid stocks. Indeed, as firm size grows or liquidity improves, the accuracy of HFOI decreases. In contrast, the cross-sectional variation in the performance of the low frequency order imbalance measures seems smaller. For example, TLFOI1 has a correlation of 0.279 for small stocks, 0.287 for large stocks, 0.274 for liquid stocks, and 0.28 for illiquid stocks, suggesting that the accuracy of this measure could be more stable in the cross section than that of HFOI. Interestingly, the PS imbalance is also heavily affected by stock characteristics, and its correlations within the tertile groups are generally higher than the full-sample result in Panel B. This is likely because the PS measure is calculated from dollar volume and the sign of the return and hence lacks cross-sectional normalization in itself.
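The tertile correlations in Panel C can be sketched by re-forming tertiles every day on a characteristic (size or BASPRD) and averaging the daily within-tertile correlations. This is a minimal sketch assuming complete data (no missing characteristic or imbalance values) and rank-based tertile breakpoints; the helper names are hypothetical and the paper may form its breakpoints differently.

```python
import numpy as np


def tertile_labels(values):
    """Rank-based tertile labels for one cross-section: 0 = bottom, 1 = middle, 2 = top."""
    values = np.asarray(values, dtype=float)
    ranks = values.argsort(kind="stable").argsort(kind="stable")
    return (3 * ranks) // len(values)


def avg_corr_by_tertile(dates, char, x, y, min_obs=3):
    """Average daily within-tertile correlation of x and y; tertiles re-formed each day on char."""
    dates = np.asarray(dates)
    char = np.asarray(char, dtype=float)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    daily = {0: [], 1: [], 2: []}
    for d in np.unique(dates):
        m = dates == d
        labels = tertile_labels(char[m])
        xd, yd = x[m], y[m]
        for k in daily:
            xs, ys = xd[labels == k], yd[labels == k]
            # Only correlate groups with enough stocks and some variation.
            if len(xs) >= min_obs and xs.std() > 0 and ys.std() > 0:
                daily[k].append(np.corrcoef(xs, ys)[0, 1])
    return {k: float(np.mean(v)) if v else float("nan") for k, v in daily.items()}
```

Because correlations are computed within each group, differences in scale across groups (for example, the dollar-volume scale of PS across size tertiles) no longer enter the comparison.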
It is possible that the characteristic groups reduce cross-sectional variation within each group and make the PS measure of imbalance more comparable. In summary, HFOI calculated using the Lee and Ready (1991) method is the best proxy for the order imbalance calculated from Nasdaq's order-level data. The turnover ratio based low frequency order imbalance (TLFOI), however, performs largely comparably to HFOI. Given the convenience and computational efficiency of this alternative measure, the cost of adopting it appears modest in terms of effectiveness. The bid-ask spread based measure (BALFOI) and the high-low spread based measure (HLLFOI) also outperform the prior low frequency measure employed by Pastor and Stambaugh (2003). Using open-to-close mid-quote returns does not significantly improve the performance of the low frequency imbalance measures. Therefore, in the empirical applications that follow, we use only the low frequency measures calculated from CRSP close-to-close returns.

IV. Applications

Based on the premise that the proposed low frequency order imbalance is a good proxy for net order imbalance in the cross section, we apply this method in two empirical tests. The first application is cross-sectional return prediction. The second is an event study to detect fundamental information flow around corporate events.

A. Cross-sectional return prediction

A.1. Sample description

For asset pricing tests in the cross section, a large sample is desirable, so we expand the sample to include all common stocks on NYSE, AMEX, and Nasdaq between 1993 and 2013 that have information in CRSP and TAQ and prices above five dollars. [Place Table II about here]

Table II Panel A reports time-series averages of cross-sectional statistics for the order imbalance and liquidity measures. There are 5,289 trading days in the sample. The average number of stocks per year is about 3,800. The average statistics are quite similar to those in Panel A of Table I. Panel B reports the cross-sectional averages of autocorrelations up to five lags for all imbalance measures. The autocorrelation of HFOI declines monotonically from 0.187 at the first lag to 0.100 at the fifth lag, consistent with the findings in Chordia and Subrahmanyam (2004). LFOIs have much lower and less persistent autocorrelations. For example, although the first-order autocorrelation of TLFOI (0.039) is statistically different from zero, the autocorrelations at the other lags are statistically insignificant. Panel C presents time-series averages of cross-sectional correlations among order imbalance, liquidity, and stock returns. Although HFOI and LFOIs have similar correlations with the ITCH order imbalance, the correlations between HFOI and LFOIs are not high. TLFOI has the highest correlation with HFOI, at 0.232, and HLLFOI has the lowest, at 0.087. The PS imbalance measure has a correlation of 0.075 with HFOI. HFOI and LFOIs thus seem to capture different information about order flow dynamics. Panel C also shows that LFOIs are not highly correlated with the liquidity proxies: the correlation coefficients of all the LFOIs with BASPRD, HLSPRD, and AMIHUD are close to zero. Finally, the contemporaneous correlation between HFOI and stock returns averages 0.207. Not surprisingly, all the low frequency imbalance measures have much higher correlations with returns because they use the contemporaneous return in their construction. Since stock returns are serially correlated, it is important to separate the order imbalance effect from return reversal in asset pricing tests.

A.2. Main result

We test the return predictive ability of order imbalance. Information-based market microstructure theories such as Glosten and Milgrom (1985), Kyle (1985), and Easley and O'Hara (1987) predict a positive relation between order flow and future returns. Informed traders can generate imbalance in the order flow submitted to a market maker, depending on the type of information they receive. The market maker can update the prior based on the order imbalance.
However, the market maker is unable to fully learn the information if the informed traders face capital constraints or litigation concerns. Therefore, the stock price adjusts only partially upon the arrival of informed trades. When the fundamental information is released to the public, there will be additional price movement in the direction of the previous order imbalance. If order flow does not contain private information, on the other hand, the price impact will eventually reverse because the fundamental value of the stock has not changed, which can produce a negative relation between order imbalance and future returns. We examine the question empirically using Fama-MacBeth (1973) two-stage regressions. To address autocorrelation in the coefficient estimates, we report t-statistics based on Newey-West (1987) standard errors with eight lags. [Place Table III about here]

Table III Panel A presents the estimated coefficients of the following model:

$$
R_{i,t} = \alpha_t + \sum_{k=1}^{5} \beta^{oi}_{k,t}\,\mathrm{OI}_{i,t-k} + \sum_{k=1}^{5} \beta^{hfoi}_{k,t}\,\mathrm{HFOI}_{i,t-k} + \beta^{T}_{t}\,\mathrm{TURN}_{i,t-1} + \beta^{S}_{t}\,\mathrm{BASPRD}_{i,t-1} + \sum_{k=1}^{5} \beta^{r}_{k,t}\,R_{i,t-k} + \sum_{k=1}^{5} \beta^{R2}_{k,t}\,R^{2}_{i,t-k} + \epsilon_{i,t},
$$

where, for stock i on day t, R is the mid-quote stock return risk-adjusted with respect to the Fama-French three factors; OI is an alternative imbalance measure (ITCH OI, TLFOI, BALFOI, HLLFOI, or PS); and BASPRD, lagged returns, and lagged squared returns are control variables for liquidity, return reversal, and volatility, respectively (see footnote 4). The signs of the independent variables are as expected. The proposed LFOIs have positive and significant coefficients at the first lag, consistent with our hypothesis. The first-lag ITCH OI coefficient is not significant but is positive. However, the finding that all the lagged PS coefficients are negative and significant is inconsistent with our hypothesis as well as with Chordia and Subrahmanyam (2004). For TLFOI and BALFOI, negative coefficients follow at the second and third lags, but the predictive power is not fully reversed. This may imply that OIs capture not only temporary but also permanent price impact. We examine this aspect of LFOIs more closely in the following analyses.

4 We choose the risk-adjusted mid-quote return as the dependent variable in the regression model. The mid-quote return frees us from concerns about bid-ask bounce, and the risk adjustment mitigates potential bias arising from market conditions. We report the estimated coefficients of the regression models using transaction returns, mid-quote returns, and risk-adjusted returns in the internet appendix. The implications of these regression models are the same regardless of the type of return used as the dependent variable.

Table III Panel A supports our hypothesis. The positive and significant first-lag coefficients on LFOIs indicate that a stock's price is more likely to rise (drop) on day t if its LFOI is cross-sectionally higher (lower) than those of other stocks on day t-1. Only PS is inconsistent with our hypothesis, possibly because PS does not represent the true OI as well as the proposed LFOIs do. The predictive power of LFOIs for future stock returns is also economically meaningful. The coefficient on TLFOI is 2.428. Given that the standard deviation of TLFOI is 0.030 (Table II), this coefficient implies that a one-standard-deviation increase in TLFOI raises the stock return on day t by 0.073%, which amounts to roughly 73% of the mean stock return of 0.1% and roughly 2.5% of the standard deviation of stock returns of 2.9% (Table II). Likewise, one-standard-deviation increases in BALFOI and HLLFOI imply next-day return increases of 0.224% and 0.089%, respectively. Furthermore, LFOIs outperform HFOI in terms of economic significance. The coefficient on HFOI is 0.167 in the regression model without LFOIs, so a one-standard-deviation increase in HFOI implies a next-day return increase of 0.029%. The economic significance of TLFOI is thus about two and a half times that of HFOI.

We run regressions using weekly observations as a robustness check in Panel B. The rationale for this extension is that the price impact of OI should be permanent, rather than temporary, if OI captures informed order flow. We regress weekly stock returns on LFOIs and HFOI aggregated over the past week and report the estimated coefficients in Table III Panel B. The first lagged terms of the ITCH OI and of the LFOIs except HLLFOI are positive and significant.
In particular, TLFOI has the highest t-statistics among the LFOIs, and even its second and third lagged terms are positively related to the current weekly stock return. In contrast, the lagged HFOI coefficients are insignificant or negative, and they remain insignificant in a weekly regression model without LFOIs (not reported).
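The two-stage Fama-MacBeth procedure behind Table III can be sketched as a cross-sectional OLS in each period (day or week), followed by the time-series mean of each coefficient with a Newey-West t-statistic. This is a minimal sketch, assuming the inputs are lists with one (y, X) pair per period and that the Newey-West correction uses the usual Bartlett weights; the helper names are hypothetical, and eight lags follows the text.

```python
import numpy as np


def newey_west_se(series, lags):
    """Newey-West (Bartlett-kernel) standard error of the mean of a time series."""
    e = np.asarray(series, dtype=float)
    e = e - e.mean()
    T = len(e)
    lrv = e @ e / T  # lag-0 autocovariance
    for L in range(1, lags + 1):
        w = 1.0 - L / (lags + 1.0)  # Bartlett weight
        lrv += 2.0 * w * (e[L:] @ e[:-L] / T)
    return np.sqrt(lrv / T)


def fama_macbeth(y_by_period, X_by_period, lags=8):
    """Stage 1: OLS per period. Stage 2: mean coefficients and Newey-West t-statistics.

    Returns (mean coefficients, t-statistics); index 0 is the intercept.
    """
    coefs = []
    for y, X in zip(y_by_period, X_by_period):
        y = np.asarray(y, dtype=float)
        Xc = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
        beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
        coefs.append(beta)
    coefs = np.vstack(coefs)
    mean = coefs.mean(axis=0)
    se = np.array([newey_west_se(coefs[:, j], lags) for j in range(coefs.shape[1])])
    return mean, mean / se
```

The Newey-West step matters here because the period-by-period coefficients, rather than the stock-level residuals, are the observations in the second stage, and they can be serially correlated.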

The regression results indicate that TLFOI outperforms HFOI in return predictive power, suggesting that TLFOI is a better proxy for the true OI: HFOI loses its predictive power at the weekly horizon, while lagged TLFOIs remain positively associated with the current weekly stock return. TLFOI also outperforms PS in terms of economic significance. The standard deviations of TLFOI and PS in our weekly sample are 0.094 and 6.458, respectively. Therefore, a one-standard-deviation increase in TLFOI raises the weekly stock return by 2.414 per mil, whereas a one-standard-deviation increase in PS raises it by 0.426 per mil. Under the same rationale as the weekly return prediction model, we also examine the dynamics of the price impact of OI over twenty trading days. If LFOIs contain information on informed order flow, they should have a permanent price impact on a given stock. Figure 1 illustrates this permanent price impact more directly. [Place Figure 1 about here]

Figure 1 presents, for each horizon k, the estimated coefficient on the first lagged OI from the following regression model:

$$
CR_{i,t,t+k} = \alpha_t + \beta_t\,\mathrm{OI}_{i,t-1} + \beta^{T}_{t}\,\mathrm{TURN}_{i,t-1} + \beta^{B}_{t}\,\mathrm{SPRD}_{i,t-1} + \beta^{R}_{t}\,\mathrm{RET}_{i,t-1} + \beta^{R2}_{t}\,\mathrm{RET}^{2}_{i,t-1} + \epsilon_{i,t},
$$

where subscripts i and t denote stock i on day t, CR_{i,t,t+k} is the cumulative raw return from day t to t+k, OI is one of HFOI, TLFOI, BALFOI, HLLFOI, or PS, and SPRD is BASPRD or HLSPRD, depending on the type of OI. The price impact of HFOI in Figure 1 (a) is positive for the first seven trading days but then reverses and becomes negative. Although HLLFOI in Figure 1 (d) shows dynamics similar to those of HFOI, the other low frequency measures, including PS, maintain a positive price impact over twenty trading days. Note that TLFOI maintains its positive price impact at the same level after the first three days. Figure 1 suggests that LFOIs can capture informed order flow, while HFOI cannot, because the price impact of OI is not reversed if and only if the OI contains information.
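The economic magnitudes quoted in this subsection are products of a coefficient and a standard deviation. The sketch below simply reproduces that arithmetic from the figures reported in the text (daily returns in percent, weekly effects in per mil); it introduces no new estimates, and the helper name is our own.

```python
def one_sd_effect(coef, sd):
    """Change in the dependent variable implied by a one-standard-deviation move in a regressor."""
    return coef * sd


# Daily figures from Table III Panel A and Table II, as quoted in the text (percent).
tlfoi_effect = one_sd_effect(2.428, 0.030)  # roughly 0.073% per day
share_of_mean = tlfoi_effect / 0.1          # relative to the 0.1% mean daily return
share_of_sd = tlfoi_effect / 2.9            # relative to the 2.9% return standard deviation
ratio_vs_hfoi = tlfoi_effect / 0.029        # relative to the 0.029% HFOI effect

# Weekly one-standard-deviation effects quoted in the text (per mil).
weekly_ratio = 2.414 / 0.426                # TLFOI relative to PS

print(f"{tlfoi_effect:.4f} {share_of_mean:.3f} {share_of_sd:.4f} {ratio_vs_hfoi:.2f} {weekly_ratio:.2f}")
```

The ratios make the comparisons concrete: the daily TLFOI effect is about 2.5 times the HFOI effect, and the weekly TLFOI effect is between five and six times the PS effect.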
One might be concerned about the bull market in equities during our sample period. Given the positive, albeit weak, averages of order imbalance and stock returns, the positive relation between them could be spurious. To address this concern, we decompose the order imbalance into positive and negative components and re-estimate the model. Specifically, for an order imbalance measure X, the positive imbalance X+ is defined as max(X, 0), and the negative imbalance X− is defined as min(X, 0). In Table IV, we use the positive and negative order imbalances instead of the original imbalance to estimate the return prediction model. For brevity, we report coefficient estimates only for LFOIs and HFOI, although the regressions always include the full set of control variables as in Table III. Neither positive nor negative ITCH order imbalance has a significant coefficient at any lag, although both have positive coefficients at the first lag. The LFOIs have positive and significant first-lag coefficients for both the positive and negative components regardless of the liquidity proxy used in their construction, indicating that the predictive ability of LFOIs does not stem from a positive return bias in the sample. The PS imbalance has positive first-lag coefficients for both components, and the coefficient is statistically significant for the negative component, contradicting the negative and significant coefficient in Table III. The pricing effect of PS thus seems less robust than that of LFOIs. We also find that both positive and negative HFOI lead to higher future returns: the coefficient on positive HFOI is positive and significant, and the coefficient on negative HFOI is negative and significant. It is possible that the positive return bias during our sample period has a larger impact on HFOI than on LFOIs. Alternatively, short-sale constraints may make it more difficult for informed traders to exploit negative news, reducing the informativeness of the selling pressure measured by HFOI.
[Place Table IV about here]
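The decomposition used in Table IV is elementwise: X+ keeps the buying-pressure days and X− keeps the selling-pressure days, and the two components sum back to X. A minimal sketch, with a hypothetical helper name and made-up values, is:

```python
import numpy as np


def split_imbalance(x):
    """Split an imbalance series X into X+ = max(X, 0) and X- = min(X, 0), so that X+ + X- = X."""
    x = np.asarray(x, dtype=float)
    return np.maximum(x, 0.0), np.minimum(x, 0.0)


# Hypothetical imbalance values for three stock-days.
x = np.array([-0.20, 0.00, 0.30])
x_pos, x_neg = split_imbalance(x)
# Both components then replace X (lag by lag) in the return regression.
# A positive coefficient on x_neg means larger selling pressure predicts lower returns.
print(x_pos, x_neg)
```

Because X− is non-positive, a positive coefficient on it has the same economic reading as a positive coefficient on X+: more imbalance in a given direction predicts returns in that direction.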

A.3. Subsample tests

The return predictive power of the proposed order imbalances can be strengthened or weakened by various factors. This section examines potential moderators of return predictability: information asymmetry, liquidity, the exchange, and changes in market conditions over time. We investigate these moderating effects through subsample tests: size subsamples for information asymmetry, percentage bid-ask spread subsamples for liquidity, exchange subsamples, and sub-periods for market change. [Place Table V about here]

The predictive power of order imbalance should be stronger for stocks subject to more severe information asymmetry. For such stocks, a market maker finds it hard to value the stock as accurately as informed investors can, so the stock price is likely to reflect only partial information. Accordingly, informed order flow can predict future stock returns until the stock price reflects all the information investors hold. It is also natural to conjecture that differences in market capitalization indicate differences in information asymmetry among investors: small stocks attract less investor attention than large stocks, so information on small stocks is less likely to spread across the market than information on large stocks. For these reasons, we expect order imbalances to have stronger predictive power for small stocks than for large stocks. We separate our sample into tertile portfolios based on market capitalization and report the estimated coefficients from our base regression model for the bottom and top tertile portfolios in Table V Panel A. Consistent with our expectation, order imbalance shows stronger return predictive power in the small-stock subsample than in the large-stock subsample. ITCH has a significant and positive first-lag coefficient in the small-stock subsample.
LFOIs are also positive and significant at the first lag in both subsamples, and the small-stock subsample has a larger first-lag coefficient than the large-stock subsample. In terms of economic significance, the difference between the small-stock and large-stock subsamples persists: a one-standard-deviation increase in TLFOI raises the next-day return by 0.10% for small stocks and by 0.04% for large stocks. Although not reported here, the other LFOIs have the same implication as TLFOI.

Order imbalance also shows stronger predictive power for illiquid stocks. Liquidity is, by definition, the degree to which investors can trade securities without moving prices. That is, liquid stocks can be traded without much price impact from order imbalance, whereas illiquid stocks are expected to experience large price impact from order imbalance. The percentage bid-ask spread is a well-known proxy for liquidity. Therefore, the predictability of order imbalance is likely to be stronger among stocks with wider bid-ask spreads. Table V Panel B reports the estimated coefficients for the top (illiquid) and bottom (liquid) spread subsamples, as in Panel A. The regression results support our expectation: order imbalance has relatively strong predictive power for future returns in the illiquid subsample. For ITCH, neither subsample has a significant first-lag coefficient, but the first-lag ITCH coefficient is positive in the illiquid subsample and negative in the liquid subsample. LFOIs show the same pattern as ITCH. For example, a one-standard-deviation increase in TLFOI raises the return by 0.02% in the liquid subsample and by 0.11% in the illiquid subsample. Interestingly, PS has a negative first-lag coefficient in the liquid subsample, implying that it loses predictive power for liquid stocks.

The characteristics of an exchange matter for the informativeness of order imbalance. For instance, NYSE-listed stocks are larger than NASDAQ-listed stocks in terms of market capitalization.
Also, trades in a given stock occur more frequently on NYSE and AMEX than on NASDAQ. These properties suggest that information spreads more efficiently on NYSE and AMEX because their stocks are more likely to attract investor attention. Furthermore, frequent trades help investors value a stock correctly by signaling news arrivals to the market. Therefore, order imbalance is likely to be more informative about future stock returns on NASDAQ than on NYSE and AMEX. Table V Panel C presents the estimated coefficients for two exchange subsamples: NYSE and AMEX versus NASDAQ. The regression results are consistent with our expectation. In terms of economic significance, a one-standard-deviation increase in TLFOI implies a 0.07% stock return on NYSE and AMEX versus 0.08% on NASDAQ. In addition, the estimated coefficients other than the first-lag LFOI terms are generally lower in absolute value on NYSE and AMEX than on NASDAQ, implying quicker price adjustment on NYSE and AMEX.

Financial markets have become increasingly efficient over time. A large literature provides evidence that high frequency trading enhances market efficiency (Hendershott, Jones, and Menkveld (2011) and Brogaard, Hendershott, and Riordan (2014)). In addition, markets have grown larger and market liquidity has improved over time. As a market becomes more efficient, order imbalance becomes less informative about future stock returns because prices reflect all available information more quickly and are therefore more likely to be efficient. We therefore expect order imbalance predictability and transitory price impact to be weaker in the later sample period than in the earlier one. Table V Panel D reports the estimated coefficients after dividing our sample into two sub-periods: 1993 to 2003 (early period) and 2004 to 2013 (late period). The two sub-periods do not show a clear difference. ITCH, TLFOI, and PS are more informative in the late subsample, while BALFOI and HLLFOI are more informative in the early subsample. However, regardless of the type of order imbalance measure, the estimated coefficients in the late subsample tend to be lower in absolute value overall. The regression results suggest that price adjustment may be quicker in the late subsample than in the early subsample.

A.4. Investment strategy

[Place Table VI about here]

Table VI documents the profitability of investment strategies based on one-trading-day lagged LFOIs. Each day, we rank all the stocks in our sample by their one-trading-day lagged LFOIs and sort them into decile portfolios. Stocks with the lowest (highest) LFOI belong to the Low (High) portfolio. On day t, we take short positions in the stocks in the Low portfolio and long positions in the stocks in the High portfolio. Strategies based on the raw measures, TLFOI, BALFOI, HLLFOI, and PS, are not profitable at all. First, the performance of the decile portfolios does not increase monotonically, and the High-Minus-Low strategy therefore does not produce positive and significant profits. These results are inconsistent with the return predictive power of LFOIs documented in the previous tables. We conjecture that the lack of profitability arises because the raw LFOIs are contaminated: they contain information not only on order imbalances but also on returns and illiquidity. To remove the return and illiquidity content, we use the residuals from regressions of a given LFOI on the stock return and its illiquidity measure, which we denote residual LFOIs. Strategies based on residual LFOIs are strongly profitable. For example, the High-Minus-Low portfolio formed on residual TLFOI generates a daily return of 0.855% with a t-statistic of 35.79, and its annualized Sharpe ratio is 18.682. Even after adjusting the HML returns for the Fama-French three factors, the profitability does not disappear: the alpha is positive and significant, with a t-statistic of 86.44. The other residual LFOIs have the same implication as residual TLFOI. The Sharpe ratio is 15.963 for residual BALFOI, 7.624 for residual HLLFOI, and 11.814 for PS. We construct another sample using only daily CRSP data from January 1927 to December 2013 and report the results in Table VI Panel B.
Because of missing closing bid and ask prices, the sample period for BALFOI and residual BALFOI is restricted to the roughly 30 years from February 1983 to December 2013. The investment strategy produces results very similar to those in Panel A, indicating that its profitability is not specific to our dataset and sample period. [Place Table VII about here]
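The residual-LFOI strategy of Table VI can be sketched in three steps: residualize the LFOI against the return and an illiquidity measure, sort stocks into deciles of the lagged residual, and annualize the daily High-Minus-Low returns. This is a minimal sketch under stated assumptions, not the paper's code: the residual comes from a daily cross-sectional OLS with a constant, portfolios are equal-weighted, the Sharpe ratio uses 252 trading days with no risk-free adjustment (the portfolio is zero-cost), and all names and the simulated data are hypothetical.

```python
import numpy as np


def residualize(lfoi, ret, illiq):
    """Residual of one day's cross-sectional OLS of LFOI on a constant, the return, and illiquidity."""
    lfoi = np.asarray(lfoi, dtype=float)
    X = np.column_stack([np.ones(len(lfoi)), ret, illiq])
    beta, *_ = np.linalg.lstsq(X, lfoi, rcond=None)
    return lfoi - X @ beta


def decile_hml(signal, next_ret, n_groups=10):
    """Equal-weighted top-group minus bottom-group return, sorting on the lagged signal."""
    signal = np.asarray(signal, dtype=float)
    next_ret = np.asarray(next_ret, dtype=float)
    ranks = signal.argsort(kind="stable").argsort(kind="stable")
    groups = (n_groups * ranks) // len(signal)
    return next_ret[groups == n_groups - 1].mean() - next_ret[groups == 0].mean()


def annualized_sharpe(daily_returns, periods_per_year=252):
    """Annualized Sharpe ratio of a zero-cost daily return series."""
    r = np.asarray(daily_returns, dtype=float)
    return np.sqrt(periods_per_year) * r.mean() / r.std(ddof=1)


# Usage with simulated data (hypothetical numbers, not the paper's sample).
rng = np.random.default_rng(1)
hml = []
for _ in range(250):
    ret, illiq, next_ret = rng.normal(size=(3, 500))
    lfoi = 0.5 * ret + rng.normal(size=500)  # raw signal contaminated by the return
    hml.append(decile_hml(residualize(lfoi, ret, illiq), next_ret))
print(round(annualized_sharpe(hml), 2))
```

With purely random simulated returns the Sharpe ratio should hover near zero; the point of the sketch is the mechanics, not the magnitude reported in Table VI.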