LONG-RUN ABNORMAL STOCK PERFORMANCE: SOME ADDITIONAL EVIDENCE

J.F. BACMANN (a) AND M. DUBOIS (a)

First Draft: February 2002

(a) Université de Neuchâtel, Pierre-à-Mazel 7, 2000 Neuchâtel, Switzerland
Tel: +41 32 718 13 60
Fax: +41 32 718 13 61
E-mail: jean-francois.bacmann@unine.ch and michel.dubois@unine.ch

Financial support by the Swiss National Science Foundation (grant no. 1214-056849.99) and by the National Centre of Competence in Research "Financial Valuation and Risk Management" is gratefully acknowledged. The National Centres of Competence in Research are research programmes supported by the Swiss National Science Foundation.

Abstract: In this research we study the specification and the power of the classic test statistics used in long-term event studies. Using simulations on random samples, we show that test statistics based on an arbitrary benchmark are well specified and as powerful as the ones based on the size and book-to-market benchmark. However, when the samples are conditioned on past stock return performance, we show that a good matching procedure is required in order to obtain well-specified and powerful tests. Finally, we examine the specification and the power of calendar-time portfolios. The cross-sectional standardized t-stat is well specified in random samples in which the frequency of the events is random or depends on past market return performance. However, when the frequency of events is conditioned on past market return performance and the stocks are selected among those with the most extreme returns, the test statistics are misspecified.

Since Ibbotson (1975), the analysis of long-run abnormal stock returns has attracted a lot of interest in corporate finance. This topic is important for at least two reasons: first, it is a means of exploring which actions undertaken by management create value and, second, it helps determine the sources of stock mispricing. Routinely, empirical studies have reported negative long-run abnormal returns for some events and positive ones for others,¹ casting some doubt on market efficiency. Recently, several models based on non-rational agents' behavior have tackled the problem. However, as underlined by Fama (1998), these models have trouble explaining empirical facts for which they were not designed. From an epistemological standpoint, they have not yet gained the status of a full theory. Moreover, empirical findings show that, depending on the information, investors underreact or overreact half of the time. Therefore, it could be that long-term abnormal returns are not correctly estimated and/or that the statistical tests are biased.

¹ Three- to five-year negative abnormal returns are found after IPOs, SEOs, mergers, dividend omissions and listings on the NYSE, while the converse is obtained for stock repurchases, splits, spinoffs and earnings announcements; see Ritter (1991), Loughran and Ritter (1995), Ikenberry et al. (1995), Ikenberry et al. (1996) among others.

Barber and Lyon (1997) and Kothari and Warner (1997) examined this methodological issue extensively. They identify three potential sources of misspecification: the survivor bias, the rebalancing bias and the skewness bias. Lyon, Barber and Tsai (1999) (LBT hereafter) show that cross-sectional dependence and a bad asset-pricing model are two additional sources of misspecification. In fact, based on simulations à la Brown and Warner (1980), LBT show that a traditional event study framework and buy-and-hold abnormal returns calculated using carefully constructed reference portfolios yield well-specified test statistics in random samples. However, as underlined by Fama (1998, p. 290, Table 1), most of the events seem selective, so that the experimental design suggested by LBT is misleading.

In this research, we use a randomly selected benchmark because Ferson, Sarkissian and Simin (1999) show that a portfolio constructed with stocks sorted alphabetically helps explain the cross-sectional dispersion of stock returns even though this method lacks any financial content. In fact, when applied to the measurement of the long-run performance of stock returns, the same test statistics calculated from a non-financial benchmark provide identical or superior results in terms of specification and power compared to size and book-to-market benchmarking.

Unfortunately, from a statistical perspective, financial events are rarely random. When examining stock returns before the event, previous studies have found that abnormal performance is likely to occur for a wide variety of events. LBT (p. 185) suggest that matching firms to firms with similar pre-event return performance would also control well for the misspecification of the size and book-to-market matching-firm method. For that purpose, we split NYSE and AMEX stocks into quintiles according to their stock return performance over the past twelve months. Then, we determine two samples by randomly selecting firms within each quintile. For each firm in both samples, we randomly select a matching-firm in three different ways. The matching-firm is drawn among a) all NYSE-AMEX stocks, b) the first quintile (highest returns) and c) the fifth quintile (lowest returns). The simulations show that a good matching procedure (a firm with prior abnormal returns is matched with a firm with similar returns) leads to well-specified tests.

In addition, real event studies frequently exhibit strong clustering during specific periods of time. For example, several empirical studies document that SEOs are far more frequent when markets are bullish. Under these conditions, cross-sectional dependence of stock returns makes the grouping of event-firms into portfolios preferable. Finally, we examine the specification and the power of calendar-time portfolios with two benchmarks, namely the Fama and French (1993) and Carhart (1997) models. Abnormal returns are estimated as in-sample errors (see Mitchell and Stafford (2000)) and as forecast errors (see Kothari and Warner (1997)), using a variety of t-statistics (standard t-stat, cross-sectional t-stat, standardized t-stat, and cross-sectional standardized t-stat) inspired by the short-term event study setting. When the event-sample is selected randomly, our main results can be summarized as follows: a) the standardized t-stat has the best specification and is the most powerful test statistic, b) the period of estimation is not of major concern and, c) tests constructed with the Carhart (1997) model are more conservative and less powerful than those based on the Fama and French (1993) model. When the frequency of events conditioned on past market return performance is high, the standardized t-statistic is still well specified. However, the test statistics are misspecified whenever the frequency of events conditioned on past stock return performance is high.

The remainder of the paper is organized as follows. We present the methods used to calculate abnormal returns and the test statistics in Section I. In Section II, we examine the specification and the power of various test statistics based on a non-financial matching procedure. In Section III, we study the specification and the power of these test statistics when the matching is based on past returns. Additional results using calendar-time portfolios and test statistics adjusted for cross-sectional heteroscedasticity are provided in Section IV. Section V concludes.

I. Abnormal Returns and Statistical Tests

In this section we briefly summarize the various calculations of abnormal returns and the statistical tests used in the literature.

A. Cumulated Abnormal Returns over a Long Horizon

The model for measuring normal returns is the following:

$$E(R_{it} \mid I_t) = R_{ct}$$

where $E(R_{it} \mid I_t)$ is the monthly expected return for security $i$ during month $t$ given the information set $I_t$, and $R_{ct}$ is the monthly return of the matching-firm or the control portfolio over the same period. The abnormal return $AR_{it}$ over month $t$ is calculated as:

$$AR_{it} = R_{it} - R_{ct} + \varepsilon_{it}$$

where $\varepsilon_{it} \sim N(0, \sigma^2)$ is an error term independent of $i$ and $t$, with zero mean and constant variance. As in other studies, the temporal aggregation of the abnormal returns is done via a rebalancing strategy (CAR hereafter) and a buy-and-hold strategy (BHAR hereafter).

A.1. Cumulated Abnormal Returns

$$CAR_{i,h} = \sum_{t=T_{i,1}}^{T_{i,2}} AR_{it}$$

where $CAR_{i,h}$ is an estimate of the cumulated abnormal return for stock $i$ over the period $h = T_{i,1}, \ldots, T_{i,2}$. The null hypothesis of no average cumulated abnormal returns is stated as follows:

$$H_1: \frac{1}{N}\sum_{i=1}^{N} CAR_{i,h} = 0 \quad \text{vs} \quad H_{1,A}: \frac{1}{N}\sum_{i=1}^{N} CAR_{i,h} \neq 0$$

As is well known, the average cumulated abnormal return can be obtained by rebalancing the portfolio (1 USD long in stock $i$, 1 USD short in the control $c$) at the end of each period (month). Because of transaction costs, average cumulated abnormal returns are not attainable in practice. However, Fama (1998) recommends using this method because the bad-model problem is less acute than with buy-and-hold abnormal returns.

A.2. Buy and Hold Abnormal Returns

Instead of rebalancing the portfolio at the end of each period, a more realistic strategy consists in buying a portfolio that is 1 USD long in stock $i$ and 1 USD short in the control $c$. This portfolio is held until the end of the period $h = T_{i,1}, \ldots, T_{i,2}$. The abnormal performance of stock $i$ is computed as:

$$BHAR_{i,h} = \prod_{t=T_{i,1}}^{T_{i,2}} (1 + R_{it}) - \prod_{t=T_{i,1}}^{T_{i,2}} \bigl(1 + E(R_{it} \mid I_t)\bigr)$$

where $BHAR_{i,h}$ is the buy and hold abnormal return and $h$ is the holding period. The null hypothesis of no average buy and hold abnormal returns at the horizon $h$ is stated as follows:

$$H_2: \overline{BHAR}_h = \frac{1}{N}\sum_{i=1}^{N} BHAR_{i,h} = 0 \quad \text{vs} \quad H_{2,A}: \frac{1}{N}\sum_{i=1}^{N} BHAR_{i,h} \neq 0$$

B. Common Statistical Tests

The most commonly used statistical test of the null hypothesis of no abnormal return ($H_1$ and $H_2$) is the standard t-test:

$$t\text{-stat} = \frac{\bar{\omega}_h}{\sigma(\omega_h)/\sqrt{N}}$$

where $\bar{\omega}_h$ is the sample mean (of the CARs or BHARs) and $\sigma(\omega_h)$ is the cross-sectional sample standard deviation for the sample of $N$ firms. However, the distribution of the CARs is often asymmetric and the t-stat must be adjusted in order to get the proper critical values. This problem is even more acute for the BHARs. Johnson (1979) proposed the following correction:

$$t_{sa} = \sqrt{N}\left(\hat{S} + \frac{1}{3}\,\widehat{skw}\,\hat{S}^2 + \frac{1}{6N}\,\widehat{skw}\right)$$

where

$$\hat{S} = \frac{\bar{\omega}}{\hat{\sigma}(\omega_i)}, \qquad \widehat{skw} = \frac{\sum_{i=1}^{N} (\omega_i - \bar{\omega})^3}{N\,\hat{\sigma}(\omega_i)^3}$$

and $\widehat{skw}$ is an estimate of the skewness of the CARs (BHARs).

Sutton (1993) recommends bootstrapping $t_{sa}$ in order to obtain a well-specified test statistic. Hence, we proceed as in Lyon, Barber and Tsai (1999): 1000 bootstrapped samples of size $N_b = N/4$ are drawn from the original sample. For each bootstrapped sample $b$, $t_{sa}^b$ is calculated as before:

$$t_{sa}^b = \sqrt{N_b}\left(\hat{S}^b + \frac{1}{3}\,\widehat{skw}^b\,(\hat{S}^b)^2 + \frac{1}{6N_b}\,\widehat{skw}^b\right)$$

with

$$\hat{S}^b = \frac{\bar{\omega}^b - \bar{\omega}}{\hat{\sigma}^b(\omega_i)} \qquad \text{and} \qquad \widehat{skw}^b = \frac{\sum_{i=1}^{N_b} (\omega_i^b - \bar{\omega}^b)^3}{N_b\,\hat{\sigma}^b(\omega_i)^3}$$

The critical values $x_l^*$ (lower bound) and $x_u^*$ (upper bound) are obtained from the empirical distribution of $t_{sa}^b$ for a given significance level $\alpha$:

$$\Pr\left[t_{sa}^b \le x_l^*\right] = \Pr\left[t_{sa}^b \ge x_u^*\right] = \frac{\alpha}{2}$$
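As a rough illustration of the definitions in this section, the following Python sketch computes a CAR, a BHAR, the standard t-statistic, the skewness-adjusted statistic and bootstrapped critical values. The function names, the use of the sample standard deviation (n-1 denominator) and resampling with replacement are assumptions of the sketch, not details confirmed by the text.

```python
import math
import random
import statistics


def car(abnormal_returns):
    """Cumulated abnormal return: sum of the monthly abnormal returns."""
    return sum(abnormal_returns)


def bhar(stock_returns, benchmark_returns):
    """Buy-and-hold abnormal return: compounded stock return minus
    compounded benchmark return over the same months."""
    return (math.prod(1.0 + r for r in stock_returns)
            - math.prod(1.0 + r for r in benchmark_returns))


def t_stat(x):
    """Standard t-statistic of the cross-sectional mean."""
    return statistics.mean(x) / (statistics.stdev(x) / math.sqrt(len(x)))


def skewness(x):
    """Sum of cubed deviations divided by N times the cubed std deviation."""
    m, s = statistics.mean(x), statistics.stdev(x)
    return sum((v - m) ** 3 for v in x) / (len(x) * s ** 3)


def t_sa(x, center=0.0):
    """Skewness-adjusted t-statistic. `center` is 0 for the original
    sample and the original-sample mean for a bootstrap resample."""
    n = len(x)
    s_hat = (statistics.mean(x) - center) / statistics.stdev(x)
    g = skewness(x)
    return math.sqrt(n) * (s_hat + g * s_hat ** 2 / 3.0 + g / (6.0 * n))


def bootstrap_critical_values(x, alpha=0.05, n_boot=1000, seed=0):
    """Lower and upper critical values from the empirical distribution
    of t_sa over resamples of size N/4 (drawn with replacement here)."""
    rng = random.Random(seed)
    n_b = len(x) // 4
    mean_x = statistics.mean(x)
    stats = sorted(t_sa(rng.choices(x, k=n_b), center=mean_x)
                   for _ in range(n_boot))
    k = int(alpha / 2 * n_boot)  # number of draws cut off in each tail
    return stats[k], stats[n_boot - 1 - k]
```

In use, one would compute `t_sa(sample)` on the original CARs or BHARs and reject the null when it falls outside the interval returned by `bootstrap_critical_values(sample)`.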

C. The Data

In this analysis we use all the NYSE/AMEX firms with available data on the CRSP Daily files. The period covered runs from July 1962 through December 1996. In general, research on long-term stock performance focuses mainly on ordinary common shares so that CRSP share codes 10 and 11 are eliminated from our analysis. We use the Daily files to compute arithmetic monthly returns. This allows us to swap the matching-firms in real time whenever they are delisted. Nasdaq stocks are excluded to mitigate the new listing bias. However, there is no specific reason to eliminate firms that have experienced a specific event like a new listing and not those involved in seasoned equity offerings or splits, which are also known to produce abnormal returns.

II. Does Book-to-Market and Size Matching Matter?

Barber and Lyon (1997) and Lyon, Barber and Tsai (1999) claim that the matching criterion is crucial. For random samples, they show that size and book-to-market matching is required in order to obtain well-specified tests, both for matching-firms and for control portfolios. However, Fama and French (1993) show that the market itself is an important factor, which casts some doubt on a matching procedure relying on two criteria only. A relevant matching procedure should indeed produce well-specified and powerful tests. Conversely, irrelevant matching criteria should produce the opposite result. To check this, we choose a criterion without any financial content, based on the alphabetical ranking of the stock; see Ferson, Sarkissian and Simin (1999). In order to compare our results with Lyon, Barber and Tsai (1999), we use two benchmarks: a matching-firm and a control portfolio.

A. Data and the Sampling Design

A.1. Matching-Firm

The firms and the event-dates are drawn randomly from the subset of NYSE/AMEX firms previously defined and from July 1962 through December 1991. Whenever the five-year stock return series is missing or incomplete for a given pair, a new pair is drawn. We generate 1000 samples of 200 firms each.

The selection of the matching-firm is based on two alternative criteria. First, the matching-firm is drawn randomly from the initial population of firms available at the event date. Second, we select the firm at the event date whose CRSP share code is the next available in the CRSP File. If the matching-firm disappears during the five-year period, it is replaced by another firm selected randomly in case 1 and by the next firm in case 2. The swap is made at the delisting time. Obviously, both criteria lack any financial content. The question we address is whether this matching procedure also leads to well-specified and powerful test statistics.

A.2. Control Portfolio

During the period covered by our analysis, 2000 stocks are available on the CRSP Files so that 50 equally-weighted reference portfolios consisting of 40 securities each are constructed. In each portfolio, stocks are selected randomly with replacement. When a firm is delisted before the end of the five-year period, it is not replaced in the portfolio. Brav, Geczy and Gompers (2000) use a similar sampling technique.

B. Results

B.1. Specification of Test Statistics

We study the specification of the four test statistics presented in Section I.B for the one-year, three-year and five-year horizons. The results are presented in Table I for a theoretical rejection rate of 5%.
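To make the notion of specification concrete, the sketch below estimates an empirical rejection rate when each event firm is benchmarked against a randomly drawn matching firm. It is only a stylized illustration: the universe of holding-period returns is synthetic (lognormal gross returns with hypothetical parameters), delisting is ignored, and the function names are invented for the example.

```python
import math
import random
import statistics


def make_universe(n=2000, seed=1):
    """Hypothetical holding-period returns for n firms
    (lognormal gross returns, so returns are right-skewed)."""
    rng = random.Random(seed)
    return [math.exp(rng.gauss(0.08, 0.35)) - 1.0 for _ in range(n)]


def simulate_rejection_rate(universe_returns, n_samples=1000, n_firms=200,
                            crit=1.96, seed=0):
    """Share of random samples in which the two-sided t-test rejects
    the null of zero mean abnormal return."""
    rng = random.Random(seed)
    ids = list(range(len(universe_returns)))
    rejections = 0
    for _ in range(n_samples):
        events = rng.sample(ids, n_firms)      # random event firms
        abnormal = []
        for i in events:
            j = rng.choice(ids)                # random matching firm
            while j == i:                      # never match a firm with itself
                j = rng.choice(ids)
            abnormal.append(universe_returns[i] - universe_returns[j])
        t = statistics.mean(abnormal) / (
            statistics.stdev(abnormal) / math.sqrt(n_firms))
        if abs(t) > crit:
            rejections += 1
    return rejections / n_samples
```

A well-specified test should give a rejection rate close to the nominal 5% when no abnormal return has been induced, for example via `simulate_rejection_rate(make_universe())`.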

Insert Table I

First, all the test statistics considered here are well specified at the one-year and three-year horizons. There is no difference between the random matching and the criterion based on the firm with the next CRSP code. Interestingly, the two non-financial matching procedures produce test statistics that are as well specified in random samples as the ones based on book-to-market and size criteria. In other words, there is no gain in using these criteria. From a practical point of view, the matching of the Compustat Files and the CRSP Files is not necessary. This has two advantages: the matching is simpler and no bias due to presence in both databases is introduced.

Second, concerning the five-year horizon, the test statistics based on the control portfolio remain well specified. Conversely, the rejection rates of the test statistics based on the matching-firm are significantly different at the 1% level from the theoretical rejection rate of 5%. Thus, the control portfolio is the best benchmark for short, medium and long horizons up to five years.

Third, the correction introduced in the test statistics to account for skewness and the bootstrapping of the statistics do not outperform the classic t-stat. In some cases (BHAR and matching-firm), the classic t-stat is the only statistic that is well specified at the five-year horizon.

B.2. Power of Test Statistics

We study the power of the test statistics by adding a constant abnormal return to each stock. Four different values are examined depending on the horizon. We consider -20 percent, -10 percent, 10 percent and 20 percent for the one-year horizon and -50 percent, -20 percent, 20 percent and 50 percent for both the three-year and the five-year horizons. The results of the simulations are presented in Table II and summarized in Figure 1.

Insert Table II

As far as the power of the tests is concerned, our matching criteria (Random Matching or Next CRSP Code Matching) lead to similar results. In fact, this is not surprising because these criteria are independent of any financial theory. The control portfolio is a better benchmark than the matching-firm. The power of the test statistics based on the CARs and a control portfolio is always higher than 90%, even with a small increment (10 percent for a one-year horizon and 20 percent for three- to five-year horizons).

Strikingly, our method produces more powerful tests than a size and book-to-market based matching. Let us consider two examples. When 10 percent (-10 percent) is added over a one-year period, the standard Student-t applied to the BHARs has a power of 43% (39%) in LBT compared to 63.4% (58.6%) here; see Table II-A. The difference is even larger with the bootstrapped t-test corrected for skewness: we find 95.4% (63.3%) against 70% (55%) in LBT. In general, bootstrapping the statistics improves the specification of the test statistics, which is even better after correcting for skewness. However, this result no longer holds for the power of the tests. Contrary to LBT, the bootstrapped statistics are less powerful than their standard counterparts and sometimes the difference is huge. In particular, the power of the bootstrapped t-stat adjusted for skewness (calculated with the BHARs and the control portfolio) is 63.3 percent as opposed to 82.5 percent with the standard t-stat. This is troubling because this technique was supposed to perform well in that setting.
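The power experiment described above (adding a constant abnormal return and counting rejections) can be sketched in a few lines of Python. The abnormal returns here are synthetic normal noise with a hypothetical cross-sectional standard deviation `sigma`; the paper's simulations use actual CRSP returns, so the numbers produced by this sketch are not comparable to Table II.

```python
import math
import random
import statistics


def rejection_rate(induced, n_samples=1000, n_firms=200, sigma=0.5,
                   crit=1.96, seed=0):
    """Share of simulated samples in which the two-sided t-test rejects
    the null of zero mean abnormal return, for a given induced level."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_samples):
        # Synthetic abnormal returns: pure noise plus the induced constant.
        sample = [rng.gauss(0.0, sigma) + induced for _ in range(n_firms)]
        t = statistics.mean(sample) / (
            statistics.stdev(sample) / math.sqrt(n_firms))
        if abs(t) > crit:
            rejections += 1
    return rejections / n_samples


def power_curve(increments=(-0.20, -0.10, 0.0, 0.10, 0.20), **kwargs):
    """Rejection rate for each induced abnormal return; the value at 0.0
    is the empirical size of the test."""
    return {x: rejection_rate(x, **kwargs) for x in increments}
```

Calling `power_curve()` returns the empirical size at a zero increment together with the power at the one-year increments considered in the text.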

Insert Figure 1

We see that the test statistics constructed from a benchmark based on size and book-to-market are less powerful than the ones constructed from an arbitrary benchmark. However, these discrepancies may be explained by the sampling designs of the two studies. Nevertheless, our main conclusion remains: simulations based on randomly selected event-firms do not help validate criteria for forming matching-firm or control portfolio benchmarks.

III. Matching and Past Performance

A. Conditional Samples Based on Past Returns

The characteristics of our sample remain unchanged. Our analysis is based on NYSE/AMEX firms from July 1962 through December 1996 (CRSP share codes 10 and 11 excluded). Each month, the securities are sorted according to their prior twelve-month returns and assigned to the corresponding quintile. Quintile Q1 (Q5) contains the stocks with high (low) previous returns. Two types of event-firm samples are determined depending on the previous performance. We construct these samples by randomly drawing 1000 samples of 200 firms from Q1 and Q5 separately. In order to measure the abnormal returns, a matching-firm is chosen in three different ways: a) a random selection over the whole population, b) a random selection among Q1 firms, and c) a random selection among Q5 firms. For each event-firm, we randomly draw 40 stocks from Q1 (Q5) and calculate the buy and hold abnormal returns for the corresponding equally-weighted portfolio at the one-year, three-year and five-year horizons. When a firm is delisted during the performance measurement period, it is not replaced beyond that date.
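The conditional sampling just described can be sketched as follows. The helper names are hypothetical, the treatment of ties and of quintile boundaries is an assumption, and drawing distinct firms (without replacement) for the matching pool is a choice made for the example rather than something stated in the text.

```python
import random


def assign_quintiles(prior_returns):
    """Map each firm to a quintile based on its prior twelve-month return
    (1 = highest prior returns, 5 = lowest)."""
    ranked = sorted(prior_returns, key=prior_returns.get, reverse=True)
    n = len(ranked)
    return {firm: (rank * 5) // n + 1 for rank, firm in enumerate(ranked)}


def draw_from_quintile(quintiles, q, k, rng, exclude=()):
    """Draw k distinct firms at random from quintile q,
    skipping any firm listed in `exclude` (e.g. the event firm)."""
    pool = [f for f, fq in quintiles.items() if fq == q and f not in exclude]
    return rng.sample(pool, k)


def draw_from_population(quintiles, k, rng, exclude=()):
    """Draw k distinct firms at random from the whole population."""
    pool = [f for f in quintiles if f not in exclude]
    return rng.sample(pool, k)
```

For a Q1 event firm, `draw_from_quintile(q, 1, 1, rng, exclude=(firm,))` gives a well-matched firm, `draw_from_quintile(q, 5, 1, rng)` a deliberately mismatched one, and `draw_from_population(q, 1, rng, exclude=(firm,))` the random benchmark.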

B. Results

To assess the specification of the test conditioned on past returns, we use the bootstrapped t-stat adjusted for skewness. In Table III, we report the critical values at 2.5% and 97.5% in order to highlight the asymmetry of the biases depending on a) the benchmark (matching-firm or control portfolio), and b) the adequacy of the matching procedure based on past returns.

Insert Table III

First, the bootstrapped t-stat adjusted for skewness is ill-specified when the matching-firm or the control portfolio is selected randomly. Not surprisingly, the results are even worse when firms with high (low) past performance are matched against a control with low (high) past performance. In fact, we reproduce a specific momentum-type strategy, which yields a positive performance.² Second, with matching-firms or control portfolios selected to match the past stock returns of event-firms, the test statistic is well specified and nearly symmetric (the empirical rejection rate is the same for the upper and the lower bound).

² Cooper (1999) provides evidence that a wide variety of momentum strategies (portfolio weights) produce significant abnormal returns.

The characteristics of the event-firms are generally ignored in empirical studies. Despite the fact that events are rarely random, a matching-firm or control portfolio procedure based on size and book-to-market is chosen routinely. However, this is not the best procedure whenever the event-sample exhibits specific prior patterns in stock returns.
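Checking symmetry amounts to counting rejections separately in each tail. A minimal sketch, assuming each simulated sample provides its test statistic together with its own lower and upper critical values (the function name and input layout are illustrative only):

```python
def tail_rejection_rates(results):
    """`results` holds one (t_stat, lower_crit, upper_crit) triple per
    simulated sample. Returns the share of samples rejected in the lower
    tail and the share rejected in the upper tail."""
    results = list(results)
    n = len(results)
    lower = sum(1 for t, lo, _ in results if t < lo) / n
    upper = sum(1 for t, _, hi in results if t > hi) / n
    return lower, upper
```

With a nominal two-sided level of 5%, a well-specified and symmetric test should give values near 2.5% in each tail; a momentum-type mismatch between event firms and benchmark would show up as one tail being far above the other.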

IV. Abnormal Returns in Calendar Time

A. Data and the Sampling Design

The characteristics of our sample are slightly modified. Our analysis is based on NYSE/AMEX firms from July 1968 through December 1988 (CRSP share codes 10 and 11 excluded) because of the availability of the size and book-to-market factors, which were downloaded from Ken French's website.³

³ Updated Fama and French factors are available at http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/

First, the firms and the event-dates are drawn randomly from the subset of NYSE/AMEX firms previously defined and from July 1968 through December 1991. Whenever the three-year stock return series is missing or incomplete for a given pair, a new pair is drawn. We construct 1000 samples of 200 firms each, whose returns are aggregated in order to form 1000 equally-weighted portfolios. The number of firms may not be constant up to five years within each portfolio because of delisting. There is, however, no general agreement on how to circumvent this problem.

Second, we assume that events are no longer uniformly distributed over time. Depending on the previous twelve-month market return ($R12_{mt}$), the number of events within that month is defined as follows:

$R12_{mt}$ below 10%: no event,
$R12_{mt}$ between 10% and 30%: one event (normal frequency),
$R12_{mt}$ above 30%: six events (high frequency).

Firms are drawn randomly from the population in both normal and high frequency event-periods. This sampling allows us to examine the case of events occurring mostly during bullish market periods.
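The event-frequency rule of this second design can be written as a small function. This is a sketch only: how the boundaries at exactly 10% and 30% are treated, and the convention that the twelve-month window ends in the month before the event month, are assumptions, and the function names are hypothetical.

```python
import math


def trailing_return(monthly_returns, t, window=12):
    """Compounded return over the `window` months that precede month t."""
    return math.prod(1.0 + r for r in monthly_returns[t - window:t]) - 1.0


def events_in_month(r12_market, low=0.10, high=0.30):
    """Number of events scheduled in a month, given the previous
    twelve-month market return."""
    if r12_market < low:
        return 0      # weak market: no event
    if r12_market < high:
        return 1      # normal frequency
    return 6          # bullish market: high frequency


def event_schedule(market_returns):
    """Events per month for every month with a full 12-month history."""
    return [events_in_month(trailing_return(market_returns, t))
            for t in range(12, len(market_returns))]
```

For instance, a market that gained 1% every month has a trailing twelve-month return of about 12.7%, so each month would receive one event under this rule.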

Third, in high-frequency periods firms are drawn randomly from the sub-sample of stocks having experienced high twelve-month returns (above 20%), which corresponds to firms engaged in the event because they have high past returns.

Fourth, we examine the converse setting in which the frequency of events is increased when the market was bearish (defined by a 30% threshold on $R12_{mt}$). In that case, stocks are drawn randomly from the population (bearish, random) and from the stocks with the lowest previous twelve-month returns (bearish, loser).

To study the power of the tests, each month we add a specific increment to the monthly stock returns such that the abnormal return is equal to a given increment at the end of the five-year holding period.⁴ The incremental abnormal returns take the following values: -20 percent, -10 percent, 10 percent and 20 percent.

⁴ The monthly increment corresponding to a total increment of 30 percent is equal to $(1 + 0.30)^{1/60} - 1 = 0.004382$.

B. Abnormal Returns and Statistical Tests

The general model used to calculate abnormal returns is the following:

$$R_{it} = E(R_{it} \mid I_t) + \sum_{k=T_1^i}^{T_2^i} \gamma_{ik}\,\delta_{ikt} + \varepsilon_{it}$$

where $\gamma_{ik}$ is the abnormal return of stock $i$ over month $k$, $\delta_{ikt}$ is a dummy variable equal to 1 whenever $k = t$ and 0 otherwise, $T_1^i$ is the beginning of the window for stock $i$, $T_2^i$ is the end of the window for stock $i$, and $\varepsilon_{it} \sim N(0, \sigma_i^2)$ is an error term with zero mean and constant variance.

First, expected returns are described by the Fama and French (1993) model:

$$E(R_{it} \mid I_t) = R_{ft} + \beta_i (R_{mt} - R_{ft}) + s_i\,SMB_t + h_i\,HML_t$$

where $R_{ft}$ is the return on three-month Treasury bills, $R_{mt}$ is the return of the market portfolio (CRSP value-weighted index), $SMB_t$ is the return of the size portfolio, $HML_t$ is the return of the book-to-market portfolio, and $I_t$ is the information set provided by $R_{mt}$, $SMB_t$ and $HML_t$.

Brav, Geczy and Gompers (2000) and Jegadeesh (2000) use an extension of the Fama and French model, namely the four-factor model proposed by Carhart (1997). The fourth factor is the portfolio M12. It consists in investing an equal amount in the 30% of stocks that have experienced the highest returns during the last twelve months (from $t-12$ through $t-1$) and being short in the 30% with the lowest returns. The equation is:

$$E(R_{it} \mid I_t) = R_{ft} + \beta_i (R_{mt} - R_{ft}) + s_i\,SMB_t + h_i\,HML_t + m_i\,PR12_t$$

where $PR12_t$ are the monthly returns of portfolio M12.

Four sets of parameters are estimated for both models and used to forecast both the conditional mean and the conditional variance.⁵ First, the parameters are estimated over the event-period $T_{i,1}, \ldots, T_{i,2}$ for each stock $i$ as in Mitchell and Stafford (2000), abnormal returns being in-sample error estimates. Second, the five-year period before the event is used, and abnormal returns are forecast errors. Third, the model is estimated with a five-year moving window ending at time $t-1$, and the one-period-ahead forecast error estimates the abnormal return. Fourth, the window used for parameter estimation expands until $t-1$, and the abnormal return is calculated as before.

⁵ See Tashman (2000) concerning forecasting with linear regression models.

The null hypothesis of no abnormal returns can be written as follows:

$$H_3: \frac{1}{T}\sum_{t=1}^{T}\frac{1}{n_t}\sum_{i=1}^{n_t} \gamma_{it} = 0 \quad \text{vs} \quad H_{3,A}: \frac{1}{T}\sum_{t=1}^{T}\frac{1}{n_t}\sum_{i=1}^{n_t} \gamma_{it} \neq 0$$

where $n_t$ is the number of stocks in month $t$ within the event-portfolio and $T$ is the total number of periods for which the portfolio is defined. We omit the months for which the portfolio contains no stocks.⁶

⁶ This occurs very seldom. In fact, the probability of having no stock is negligible.

Whenever the series is independent and has finite variance, the hypothesis can be tested with a standard t-test (see eq. xx). However, the portfolio weights are time varying, which is a potential source of heteroscedasticity because the variance is a decreasing function of the number of stocks within the portfolio. Therefore, the t-stat is calculated with the series of abnormal returns standardized by the series of their own residual variances, $\sum_{i=1}^{n_t} \hat{\gamma}_{it} / (n_t\, s_{it}^2)$. In so doing, low-frequency event-periods are not over-weighted. Another potential source of heteroscedasticity comes from both the variance of the stocks, which is stock-specific, and the forecasting horizon, which depends on the forecasting model.⁷ First, the series of abnormal returns is standardized by the forecasted variance and aggregated at each time $t$ given the portfolio weights, producing a new series on which the standard t-test and the cross-sectional t-test are calculated.

⁷ The calculation of these test statistics follows Patell (1976) and Boehmer, Musumeci and Poulsen (1991), in which they are found to be well specified and powerful for the short term (typically a ten-day window).

C. Results

C.1. Alternative Test Statistics in Random Samples

The analysis of the empirical specification in random samples is presented in Table IV. The results concerning the specification correspond to no abnormal returns (0 percent increment). Other things being equal, the choice of the benchmark is not of major concern as the test statistics are very similar. The rejection rate of the null (no abnormal return) tends to be slightly higher than the theoretical rate of 5 percent with the Carhart (1997) model, while the converse is true with the Fama and French (1993) model. Thus, the former is slightly more conservative.

Insert Table IV

The estimation period is not very important either. Nevertheless, estimation over the sample period provides well-specified test statistics (with the exception of the standardized t-test with the Carhart model) at the 1 percent level.⁸ This is interesting from a practical perspective because no returns are required prior to the event.

⁸ Our results apparently contradict Kothari and Warner (1997). However, the sampling methods (grouping in portfolios instead of single stocks) and the standardization do not allow a direct comparison.

As far as the specification is concerned, and independently of both the model and the estimation period, the cross-sectional variance adjustment, which accounts for the time-varying variance of the portfolio (t-cross and t-standard cross), matters. The standardization (adjustment for the specific residual variance of the stock) has a minor impact. Moreover, as was the case for the short-term analysis (see Boehmer, Musumeci and Poulsen (1991)), standardization of the abnormal returns alone deteriorates the specification of the test. When both corrections are applied (standardization and cross-sectional adjustment), the empirical specification is close to its theoretical counterpart whatever the model and the estimation period.
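The difference between estimating the factor model over the event period itself and over a pre-event window can be illustrated with a short NumPy sketch. It assumes excess returns and factor returns are already aligned month by month, and it fits no intercept so that it mirrors the expected-return equations given earlier; whether the authors' regressions include an intercept is not stated, and all function names are hypothetical.

```python
import numpy as np


def factor_loadings(excess_returns, factors):
    """OLS loadings of a stock's excess returns on the factor returns.
    No intercept is fitted, so any average abnormal return stays in
    the residuals rather than being absorbed by a constant."""
    loadings, *_ = np.linalg.lstsq(factors, excess_returns, rcond=None)
    return loadings


def abnormal_returns(excess_returns, factors, loadings):
    """Realized excess return minus the factor-implied expected excess return."""
    return excess_returns - factors @ loadings


def in_sample_errors(event_excess, event_factors):
    """Estimate and evaluate over the event period itself
    (no pre-event returns are needed)."""
    b = factor_loadings(event_excess, event_factors)
    return abnormal_returns(event_excess, event_factors, b)


def forecast_errors(pre_excess, pre_factors, event_excess, event_factors):
    """Estimate over the pre-event window, then evaluate over the event period."""
    b = factor_loadings(pre_excess, pre_factors)
    return abnormal_returns(event_excess, event_factors, b)
```

With three factor columns (market excess return, SMB, HML) this corresponds to the Fama and French specification; adding a fourth column for the momentum factor gives the Carhart variant. The moving-window and expanding-window schemes would call `factor_loadings` once per month on the relevant slice of history.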

The results presented above concerning the benchmark and the estimation period extend to the power of the tests. However, test statistics adjusted for the cross-sectional variance are less powerful than the classic t-stat and the standardized t-stat. To conclude, when the abnormal performance is of an unknown form, the t-statistic based on the standardized abnormal returns calculated with the Fama and French (1993) model offers a reasonable solution.

C.2. Alternative Test Statistics in Non-Random Samples

Following Loughran and Ritter (2000), we allow the frequency of events to depend strongly on past market performance. The purpose is to construct a more realistic sampling design, as many events are driven by past performance. The results concerning the specification of the test statistics are presented in Table V.

Insert Table V

Strikingly, whenever the event is the consequence of extreme past stock return performance, the results concerning the specification are disastrous. We almost surely find an abnormal performance. In fact, the simulation portfolios are momentum-type portfolios, long (short) in past winners (losers) and short (long) in the market for the Bullish-Winner (Bearish-Loser) sampling, which are known to produce abnormal performance. In this setting, the matching-firm procedure is by far the best solution in order to obtain well-specified test statistics.

When the event-frequency is random, the results (see Table VI) are similar to what was found in random samples (see Subsection C.1 above). Once again, the t-test based on the cross-sectional standardized abnormal returns is well specified. The power of the test statistics also presents similar patterns.

Insert Table VI

In general, previous findings are confirmed. The power of the test statistics in detecting an abnormal return is higher in bearish market periods and for positive abnormal returns as well. Test statistics calculated with the Fama and French model estimated during the event-period are the most powerful. Finally, the cross-sectional variance adjustment does not produce powerful test statistics.

V. Conclusion

The intent of this research was to study the specification and the power of the classic test statistics used in long-term event study analysis. Using random samples, we showed that an arbitrary benchmark without any financial content leads to test statistics that are as well specified as the ones based on the size and book-to-market benchmark. However, pre-event abnormal performance has been found for a wide variety of events, which casts some doubt on the reliability of simulations based on purely random samples. When conditioning the samples on past stock return performance, our simulations showed that a good matching procedure (a firm with similar returns) leads to well-specified tests.

Finally, we examined the specification and the power of calendar-time portfolios. When the event-sample is selected randomly, our main results can be summarized as follows. The cross-sectional standardized t-stat has the best specification among the four test statistics we examined. The period of estimation is not of major concern. As far as the benchmark is concerned, the Carhart (1997) model is slightly more conservative than the Fama and French (1993) model. When the frequency of events is conditioned on past market return performance, the cross-sectional standardized t-statistic remains well specified whenever the stocks are selected randomly.

However, when the frequency of events is conditioned on past market return performance and stocks are selected among the most extreme performers, all the test statistics examined are misspecified and powerless. As underlined by LBT, the analysis of long-run abnormal returns is treacherous, as there is no general method that performs well in the situations frequently encountered in empirical studies. The pattern of abnormal returns during the one-year period preceding the event has a strong impact on both the specification and the power of test statistics. Thus, it is worth paying attention to the specific characteristics of the sample.

References

Barber, B. and J. Lyon, 1997. Detecting long-run abnormal stock returns: the empirical power and specification of test-statistics. Journal of Financial Economics 43, 341-372.
Boehmer, E., J. Musumeci and A. Poulsen, 1991. Event study methodology under conditions of event-induced variance. Journal of Financial Economics 30, 253-272.
Brav, A., C. Geczy and P. Gompers, 2000. Is the abnormal return following equity issuances anomalous? Journal of Financial Economics 56, 209-250.
Brock, W., J. Lakonishok and B. LeBaron, 1992. Simple technical trading rules and the stochastic properties of stock returns. Journal of Finance 47, 1731-1764.
Brown, S. and J. Warner, 1980. Measuring security price performance. Journal of Financial Economics 8, 205-258.
Brown, S. and J. Warner, 1985. Using daily stock returns: the case of event studies. Journal of Financial Economics 14, 3-31.
Carhart, M., 1997. On persistence in mutual fund performance. Journal of Finance 52, 57-82.
Cooper, M., 1999. Filter rules based on price and volume in individual security overreaction. Review of Financial Studies 12, 901-935.
DeBondt, W. and R. Thaler, 1985. Does the stock market overreact? Journal of Finance 40, 793-805.
Eckbo, E., R. Masulis and O. Norli, 2000. Seasoned public offerings: resolution of the new issues puzzle. Journal of Financial Economics 56, 251-292.
Fama, E., 1998. Market efficiency, long-term returns, and behavioral finance. Journal of Financial Economics 49, 283-306.
Fama, E. and K. French, 1992. The cross-section of expected returns. Journal of Finance 47, 427-465.
Fama, E. and K. French, 1993. Common risk factors in the returns on stocks and bonds. Journal of Financial Economics 33, 3-56.
Ferson, W.E., E. Sarkissian and T. Simin, 1999. The alpha factor asset pricing model: a parable. Journal of Financial Markets 2, 49-68.
Ferson, W.E. and R.W. Schadt, 1996. Measuring fund strategy and performance in changing economic conditions. Journal of Finance, 425-461.
Ibbotson, R., 1975. Price performance of common stock new issues. Journal of Financial Economics 2, 235-272.
Ikenberry, D., J. Lakonishok and T. Vermaelen, 1995. Market underreaction to open market share repurchases. Journal of Financial Economics 39, 181-208.
Jaffe, J.F., 1974. Special information and insider trading. Journal of Business 47, 410-428.
Jegadeesh, N., 2000. Long-term performance of seasoned equity offerings: benchmark errors and biases in expectations. Financial Management 29, 5-30.
Kothari, S. and J. Warner, 1997. Measuring long-horizon security price performance. Journal of Financial Economics 43, 301-339.
Kothari, S. and J. Warner, 2001. Evaluating mutual fund performance. Journal of Finance, forthcoming.
Loughran, T. and J. Ritter, 1995. The new issues puzzle. Journal of Finance 50, 23-51.
Loughran, T. and J. Ritter, 2000. Uniformly least powerful tests of market efficiency. Journal of Financial Economics 55, 361-390.
Lyon, J., B. Barber and C. Tsai, 1999. Improved methods for tests of long-run abnormal stock returns. Journal of Finance 54, 165-202.
Mandelker, G., 1974. Risk and return: the case of merging firms. Journal of Financial Economics 1, 303-335.
Mitchell, M. and E. Stafford, 2000. Managerial decisions and long-term stock price performance. Journal of Business 73, 287-320.
Patell, J., 1976. Corporate forecasts of earnings per share and stock price behaviour: empirical tests. Journal of Accounting Research 14, 246-276.
Ritter, J., 1991. The long-term performance of initial public offerings. Journal of Finance 46, 3-27.
Tashman, L., 2000. Out-of-sample tests of forecasting accuracy: an analysis and review. International Journal of Forecasting 16, 437-450.

Table I: The Specification of Alternative Test Statistics Based on a Random Matching in Random Samples

This table presents the percentage of 1000 samples of 200 firms that reject the null hypothesis of no annual, three-year and five-year abnormal return at the theoretical level of 5 percent. The sample selection is based on a non-financial criterion (random matching or next CRSP code). Both the cumulative abnormal return (CAR) and the buy-and-hold abnormal return (BHAR) are used. Bold italic characters indicate that the empirical rejection rate is different at the 1 percent level from the theoretical rejection rate.

                             Random Matching Next CRSP Code Matching
Horizon               1 year 3 years 5 years  1 year 3 years 5 years
CAR Matching Firm
  t-stat                 5.4     6.4     7.8     5.3     6.6     7.7
  t-stat bootstrap       5.3     6.5     7.9     5.4     6.4     7.7
  t-skew                 5.5     6.6     8.1     5.4     6.8     8.0
  t-skew bootstrap       5.0     6.4     7.8     5.2     6.1     7.4
CAR Control Portfolio
  t-stat                 5.4     5.9     6.1     5.5     5.3     5.5
  t-stat bootstrap       5.8     5.5     5.9     5.6     5.0     5.5
  t-skew                 5.6     5.7     6.3     5.8     5.6     5.6
  t-skew bootstrap       5.7     5.4     5.8     5.2     4.8     5.4
BHAR Matching Firm
  t-stat                 5.0     4.4     4.8     6.2     4.6     4.6
  t-stat bootstrap       6.1     6.5     7.8     7.6     6.2     7.0
  t-skew                 6.1     7.2     8.5     7.1     6.3     7.6
  t-skew bootstrap       5.8     5.9     7.7     6.9     6.1     6.8
BHAR Control Portfolio
  t-stat                 5.9     5.2     5.3     6.1     4.5     4.3
  t-stat bootstrap       6.0     6.2     6.8     6.3     5.9     5.4
  t-skew                 5.6     6.2     7.3     5.9     6.3     6.0
  t-skew bootstrap       5.6     4.9     5.2     5.8     5.3     5.3
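As a rough guide to reading the rejection rates in Table I, the band within which an empirical rejection rate is statistically indistinguishable from the nominal level can be approximated as follows. This is a minimal sketch that assumes a two-sided normal approximation to the binomial distribution; the text does not state which test underlies the bold italic flags, and the function names are illustrative only.

```python
import math

def rejection_rate_band(nominal, n_samples, z=2.5758):
    """Band around the nominal level under a normal approximation.

    z = 2.5758 is the two-sided 1 percent critical value of the standard
    normal distribution (assumed test; not stated in the paper).
    """
    se = math.sqrt(nominal * (1.0 - nominal) / n_samples)
    return nominal - z * se, nominal + z * se

def is_misspecified(empirical_rate, nominal=0.05, n_samples=1000):
    """True when the empirical rate falls outside the band."""
    low, high = rejection_rate_band(nominal, n_samples)
    return empirical_rate < low or empirical_rate > high

low, high = rejection_rate_band(0.05, 1000)
print(round(100 * low, 1), round(100 * high, 1))  # prints 3.2 6.8
```

Under this approximation, with 1000 samples and a 5 percent nominal level, a rejection rate of 7.8 percent would be flagged while 6.4 percent would not.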

Table II-A: The Power of Test Statistics Based on a Random Matching in Random Samples, one-year horizon

In this table, we present the percentage of 1000 samples of 200 firms drawn randomly that reject the null hypothesis of no annual (Panel A), three-year (Panel B) and five-year (Panel C) abnormal return for various levels of abnormal returns and horizons. The sample selection is based on a non-financial criterion (random matching or next CRSP code). Both the cumulative abnormal return (CAR) and the buy-and-hold abnormal return (BHAR) are used.

                                     Random Matching         Next CRSP Code Matching
Increment               -20%    -10%     10%     20%    -20%    -10%     10%     20%
CAR Matching Firm
  t-stat               100.0    75.0    81.4   100.0   100.0    77.9    84.6   100.0
  t-stat bootstrap      99.9    77.3    78.2    99.8   100.0    75.3    80.6   100.0
  t-skew               100.0    73.7    81.5    99.9   100.0    77.6    84.1   100.0
  t-skew bootstrap      99.5    74.8    75.3    99.8    99.8    73.0    78.5    99.7
CAR Control Portfolio
  t-stat               100.0    96.3    97.2   100.0   100.0    97.2    97.4   100.0
  t-stat bootstrap     100.0    93.4    97.1   100.0   100.0    94.3    96.4   100.0
  t-skew                99.9    95.2    97.0   100.0    99.8    96.3    97.5   100.0
  t-skew bootstrap      99.3    89.7    96.4   100.0    99.3    91.2    94.8   100.0
BHAR Matching Firm
  t-stat                98.0    58.6    63.4    98.2    98.9    59.3    65.3    98.9
  t-stat bootstrap      98.3    60.5    59.0    97.1    98.1    60.5    62.3    97.4
  t-skew                96.6    58.5    63.3    96.7    97.1    58.0    65.1    97.5
  t-skew bootstrap      95.2    56.9    53.6    94.1    94.6    56.0    57.3    94.7
BHAR Control Portfolio
  t-stat                99.2    82.5    91.6   100.0    99.4    83.1    93.3   100.0
  t-stat bootstrap      98.1    71.2    96.0   100.0    98.2    73.9    94.8   100.0
  t-skew                94.6    73.7    94.4   100.0    94.1    74.0    95.4   100.0
  t-skew bootstrap      88.8    63.3    95.4   100.0    88.8    66.2    94.2   100.0

Table II-B (continued): The Power of Test Statistics Based on a Random Matching in Random Samples, three-year horizon

                                     Random Matching         Next CRSP Code Matching
Increment               -50%    -20%     20%     50%    -50%    -20%     20%     50%
CAR Matching Firm
  t-stat               100.0    79.6    94.7   100.0   100.0    81.6    95.3   100.0
  t-stat bootstrap     100.0    84.4    91.3   100.0   100.0    86.7    91.8   100.0
  t-skew               100.0    79.5    94.7   100.0   100.0    81.4    95.2   100.0
  t-skew bootstrap     100.0    83.2    88.9   100.0   100.0    84.6    89.6   100.0
CAR Control Portfolio
  t-stat               100.0    99.2    99.7   100.0   100.0    99.2    99.4   100.0
  t-stat bootstrap     100.0    98.6    99.6   100.0   100.0    99.0    99.4   100.0
  t-skew                99.9    99.0    99.7   100.0    99.9    98.8    99.3   100.0
  t-skew bootstrap      99.9    97.6    99.4   100.0    99.9    97.4    98.7   100.0
BHAR Matching Firm
  t-stat                96.5    43.2    50.0    97.8    97.7    42.0    51.6    98.6
  t-stat bootstrap      97.5    45.8    46.0    97.1    97.5    45.3    48.4    97.5
  t-skew                93.3    44.4    50.2    94.7    94.9    42.3    53.0    95.4
  t-skew bootstrap      92.4    42.3    42.7    91.6    92.9    40.9    44.5    92.2
BHAR Control Portfolio
  t-stat                98.0    58.1    92.3   100.0    98.2    62.5    92.7   100.0
  t-stat bootstrap      96.9    47.6    94.9   100.0    97.5    50.5    94.0   100.0
  t-skew                84.4    45.8    95.7   100.0    84.9    49.9    95.1   100.0
  t-skew bootstrap      78.1    42.1    94.7   100.0    78.9    44.5    94.8   100.0

Table II-C (continued): The Power of Test Statistics Based on a Random Matching in Random Samples, five-year horizon

                                     Random Matching         Next CRSP Code Matching
Increment               -50%    -20%     20%     50%    -50%    -20%     20%     50%
CAR Matching Firm
  t-stat               100.0    56.4    89.4   100.0   100.0    58.3    91.3   100.0
  t-stat bootstrap     100.0    65.4    81.0   100.0   100.0    64.0    82.9   100.0
  t-skew               100.0    56.0    89.3   100.0   100.0    57.7    91.1   100.0
  t-skew bootstrap     100.0    64.5    78.6   100.0   100.0    63.3    80.2   100.0
CAR Control Portfolio
  t-stat               100.0    96.5    95.5   100.0   100.0    96.2    96.1   100.0
  t-stat bootstrap     100.0    92.2    95.5   100.0   100.0    94.6    95.7   100.0
  t-skew               100.0    95.3    95.4   100.0   100.0    95.2    96.2   100.0
  t-skew bootstrap     100.0    90.0    94.5   100.0    99.8    92.1    94.1   100.0
BHAR Matching Firm
  t-stat                72.5    19.7    25.3    74.4    75.9    19.1    22.8    78.4
  t-stat bootstrap      77.2    22.2    21.8    74.8    76.0    23.5    22.1    75.4
  t-skew                71.2    24.5    28.8    69.9    72.4    23.3    26.2    74.8
  t-skew bootstrap      69.7    21.7    20.8    65.3    67.9    21.7    21.4    67.5
BHAR Control Portfolio
  t-stat                76.3    16.9    74.2   100.0    77.9    18.3    75.9    99.9
  t-stat bootstrap      72.0    14.7    73.9    99.9    75.5    16.9    76.0   100.0
  t-skew                59.0    11.8    85.3   100.0    62.9    13.7    85.0   100.0
  t-skew bootstrap      62.1    13.5    75.9   100.0    62.4    14.6    77.9   100.0

Figure 1: Power of the Bootstrapped Student t-test for the BHAR Strategy with Random Matching Firms

This figure presents the empirical percentage of samples that reject the null hypothesis of no annual, three-year and five-year buy-and-hold abnormal returns (BHAR) for various levels of incremental abnormal returns (x-axis) and horizons. The sample selection consists of 1000 samples of 200 random firms whose matching is based on a non-financial criterion (the matching firm is drawn randomly). The results are for the bootstrapped t-test adjusted for skewness.

[Figure: power in percent (y-axis, 0 to 100) against abnormal returns in percent (x-axis, -50 to 50), with one curve for each of the 1-year, 3-year and 5-year horizons.]
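The power figures reported in Tables II-A to II-C and in Figure 1 are rejection frequencies obtained after adding a fixed increment to the abnormal returns of each sample. The logic can be sketched as below. This is only an illustration under strong assumptions: abnormal returns are drawn as independent normal variables with a hypothetical cross-sectional standard deviation of 0.5, whereas the paper draws actual stock returns, which are skewed; a plain t-test with a 1.96 critical value stands in for the statistics of the paper. The resulting numbers are therefore not comparable to the tables.

```python
import math
import random

def t_statistic(values):
    """Conventional one-sample t-statistic for a zero mean."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean / math.sqrt(var / n)

def rejection_rate(increment, n_samples=500, n_firms=200, sigma=0.5,
                   critical=1.96, seed=0):
    """Share of simulated samples rejecting the null of no abnormal return.

    sigma, critical and the normal draws are assumptions made for this sketch.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_samples):
        # Abnormal returns under the null, shifted by the induced increment.
        sample = [rng.gauss(0.0, sigma) + increment for _ in range(n_firms)]
        if abs(t_statistic(sample)) > critical:
            rejections += 1
    return rejections / n_samples

for inc in (-0.20, -0.10, 0.0, 0.10, 0.20):
    print(f"{inc:+.2f}  {100 * rejection_rate(inc):5.1f}")
```

At a zero increment the rejection rate measures the specification of the test; at non-zero increments it measures its power.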

Table III: Specification of Bootstrapped-adjusted Test Statistics in Momentum Based Samples

This table presents the percentage of 1000 samples of 200 firms that reject the null hypothesis of no annual, three-year and five-year abnormal return at the theoretical levels of 2.5 percent and 97.5 percent. The sample selection is based on the twelve-month past performance of firms, which are classified into quintiles. The firms are randomly drawn from the population (random), the first quintile (high performance) and the fifth quintile (low performance). The buy-and-hold abnormal return (BHAR) is used in order to estimate the abnormal performance. Test statistics are bootstrapped statistics (matching-firm) adjusted for skewness (control portfolio). Bold italic characters indicate that the empirical rejection rate is different at the 1 percent level from the theoretical rejection rate.

Horizon                                         1 year         3 years         5 years
Method              Control               2.5%   97.5%    2.5%   97.5%    2.5%   97.5%
Sample with Past High Performance
Matching-firm       Random                 0.3     8.1     2.4     3.3     6.5     1.3
Control Portfolio   Random                 0.2     8.0     5.8     0.8    23.1     0.2
Matching-firm       High performance       2.3     2.7     2.6     2.5     2.8     2.4
Control Portfolio   High performance       2.1     2.8     2.4     2.7     2.8     2.7
Matching-firm       Low performance        0.0    35.3     1.2     8.3     9.8     2.0
Control Portfolio   Low performance        0.0    50.6     4.2     1.9    23.0     0.0
Sample with Past Low Performance
Matching-firm       Random                12.9     0.5     2.1     2.4     0.4     5.8
Control Portfolio   Random                13.6     0.1     0.8    11.2     0.0    45.1
Matching-firm       High performance      28.3     0.0     3.4     1.8     0.5     9.6
Control Portfolio   High performance      30.8     0.0     2.1     6.8     0.0    30.2
Matching-firm       Low performance        2.0     2.8     2.2     2.7     2.3     2.6
Control Portfolio   Low performance        2.5     2.1     2.7     2.8     2.8     2.5

Table IV: Specification and Power of Test Statistics with Calendar Portfolios in Random Samples

In this table, we present the empirical rejection rate of the null hypothesis (no abnormal returns) with increments ranging from -30 percent to 30 percent. It is calculated over 1000 samples of 200 firms drawn randomly. The firms are held for a five-year period and aggregated into 1000 equally-weighted portfolios. The following regressions are estimated:

$R_{it} - R_{ft} = \alpha_i + \beta_i (R_{mt} - R_{ft}) + s_i SMB_t + h_i HML_t + \varepsilon_{it}$

and

$R_{it} - R_{ft} = \alpha_i + \beta_i (R_{mt} - R_{ft}) + s_i SMB_t + h_i HML_t + m_i PR12_t + \varepsilon_{it}$

where $R_{it}$ is the monthly return on the calendar-time portfolio, $R_{ft}$ is the return on the three-month Treasury bills, $R_{mt}$ is the return of the market portfolio (CRSP value-weighted index), $SMB_t$ is the return of the size portfolio, $HML_t$ is the return of the book-to-market portfolio and $PR12_t$ is the monthly return of portfolio M12. The estimation period is the five-year event period, the five-year period before the event, or an expanding period beginning five years before the event. The series $\alpha_i + \varepsilon_{it}$ and its conditional variance are used to calculate the statistics. Bold italic characters indicate that the empirical rejection rate is different at the 1 percent level from the theoretical rejection rate.

Increment               -30%    -20%    -10%      0%     10%     20%     30%
Panel A: Fama and French Model
Estimation over the event period
  t-stat                99.0    85.8    33.2     3.7    36.8    89.1    99.7
  t-cross               88.5    74.8    32.6     5.3    17.1    57.5    81.8
  t-standard           100.0    93.3    33.2     4.4    60.7    98.5    99.9
  t-standard cross      88.4    72.3    22.2     4.2    36.6    75.4    86.9
Estimation before the event period
  t-stat                99.1    83.1    30.3     2.7    26.6    84.0    99.1
  t-cross               90.2    73.6    34.1     5.9    10.1    46.4    74.8
  t-standard            99.7    88.7    30.0     2.7    36.7    92.7    99.7
  t-standard cross      88.5    74.4    31.8     4.4    17.5    60.9    81.0
Estimation with an expanding sample period
  t-stat                99.3    85.5    31.3     3.0    27.4    83.7    99.1
  t-cross               88.9    74.7    35.0     6.5    10.3    47.2    76.2
  t-standard            99.5    88.5    28.6     2.3    44.3    95.8    99.4
  t-standard cross      88.8    72.5    28.6     3.6    23.4    67.4    84.6

Table IV (continued): Specification and Power of Test Statistics with Calendar Portfolios in Random Samples

Increment               -30%    -20%    -10%      0%     10%     20%     30%
Panel B: Carhart Model
Estimation over the event period
  t-stat                98.6    81.7    21.0     6.3    51.4    94.3   100.0
  t-cross               86.7    66.2    24.7     4.5    26.1    66.1    82.9
  t-standard            99.9    91.5    25.7     7.8    68.2    99.3   100.0
  t-standard cross      87.8    66.7    17.3     6.5    44.9    78.7    88.7
Estimation before the event period
  t-stat                95.8    64.6    12.3     6.6    47.6    92.8    99.8
  t-cross               81.8    58.4    19.7     4.8    22.7    59.8    81.2
  t-standard            98.5    73.5    14.0     8.4    59.3    98.0    99.9
  t-standard cross      83.7    61.3    16.1     5.2    33.1    70.9    85.3
Estimation with an expanding sample period
  t-stat                97.4    69.4    11.4     5.8    48.2    93.5   100.0
  t-cross               83.5    59.6    20.2     3.7    22.7    60.5    81.2
  t-standard            98.7    75.3    10.8     9.7    65.7    98.8   100.0
  t-standard cross      83.2    58.2    12.9     5.4    38.4    76.6    87.8
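The calendar-time regressions behind Table IV can be illustrated with a small self-contained sketch of the three-factor case. Everything below is assumed for illustration: the factor and portfolio returns are simulated, the helper names are hypothetical, and the statistic shown is a plain time-series t-statistic on the series alpha_i + eps_it. It is not claimed to reproduce any of the four statistics reported in the table.

```python
import math
import random

def ols(y, X):
    """Least-squares coefficients from the normal equations (Gaussian elimination)."""
    k = len(X[0])
    a = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, k):
            f = a[r][col] / a[col][col]
            for c in range(col, k + 1):
                a[r][c] -= f * a[col][c]
    coef = [0.0] * k
    for i in range(k - 1, -1, -1):
        tail = sum(a[i][j] * coef[j] for j in range(i + 1, k))
        coef[i] = (a[i][k] - tail) / a[i][i]
    return coef

def abnormal_return_tstat(excess, mkt, smb, hml):
    """Estimate the three-factor regression and test the mean of alpha + eps."""
    X = [[1.0, m, s, h] for m, s, h in zip(mkt, smb, hml)]
    alpha, beta, s_load, h_load = ols(excess, X)
    # alpha_i + eps_it: excess return net of the fitted factor exposures.
    abnormal = [y - (beta * m + s_load * s + h_load * h)
                for y, m, s, h in zip(excess, mkt, smb, hml)]
    n = len(abnormal)
    mean = sum(abnormal) / n
    var = sum((v - mean) ** 2 for v in abnormal) / (n - 1)
    return alpha, mean / math.sqrt(var / n)

# Simulated 60-month calendar-time portfolio (all parameter values assumed).
rng = random.Random(1)
months = 60
mkt = [rng.gauss(0.006, 0.045) for _ in range(months)]  # market excess return
smb = [rng.gauss(0.002, 0.030) for _ in range(months)]
hml = [rng.gauss(0.003, 0.030) for _ in range(months)]
excess = [0.002 + 1.0 * m + 0.5 * s + 0.3 * h + rng.gauss(0.0, 0.02)
          for m, s, h in zip(mkt, smb, hml)]
alpha, t = abnormal_return_tstat(excess, mkt, smb, hml)
print(f"alpha = {alpha:.4f}, t = {t:.2f}")
```

Because the regression includes an intercept, the mean of alpha_i + eps_it equals the estimated alpha, so the statistic is a test of a zero intercept. The Carhart variant would add the PR12_t column to the regressor matrix.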

Table V: Specification of Alternative Test Statistics Depending on the Frequency of the Event with Calendar Portfolios

In this table, we present the empirical rejection rate of the null hypothesis (no abnormal returns). It is calculated over 1000 samples of firms, the number of which is determined by the twelve-month past market returns. The frequency is high when market returns are extreme. The firms are drawn randomly from the initial population (Bullish Random and Bearish Random) and from the stocks which experienced an extreme past performance (Bullish Winner and Bearish Loser). The firms are held for a five-year period and aggregated into 1000 equally-weighted portfolios. The following regressions are estimated:

$R_{it} - R_{ft} = \alpha_i + \beta_i (R_{mt} - R_{ft}) + s_i SMB_t + h_i HML_t + \varepsilon_{it}$

and

$R_{it} - R_{ft} = \alpha_i + \beta_i (R_{mt} - R_{ft}) + s_i SMB_t + h_i HML_t + m_i PR12_t + \varepsilon_{it}$

where $R_{it}$ is the monthly return on the calendar-time portfolio, $R_{ft}$ is the return on the three-month Treasury bills, $R_{mt}$ is the return of the market portfolio (CRSP value-weighted index), $SMB_t$ is the return of the size portfolio, $HML_t$ is the return of the book-to-market portfolio and $PR12_t$ is the monthly return of portfolio M12. The estimation period is the five-year event period, the five-year period before the event, or an expanding period beginning five years before the event. The series $\alpha_i + \varepsilon_{it}$ and its conditional variance are used to calculate the statistics. Bold italic characters indicate that the empirical rejection rate is different at the 1 percent level from the theoretical rejection rate.

Model                         Fama and French (1993)                  Carhart (1997)
Market               Bullish Bearish Bullish Bearish Bullish Bearish Bullish Bearish
                      Random  Random  Winner   Loser  Random  Random  Winner   Loser
Estimation over the event period
  t-stat                 5.8     3.3   100.0   100.0     4.7     8.1    99.9   100.0
  t-cross                6.0     1.4    85.3    93.3     2.1     3.8    85.3    91.1
  t-standard             6.8     6.2   100.0   100.0     6.7    12.7   100.0   100.0
  t-standard cross       4.7     4.7    87.1    93.4     4.0     6.8    85.8    91.6
Estimation before the event period
  t-stat                 4.0     1.3    99.4   100.0     4.0     8.0    99.9   100.0
  t-cross                7.4     0.9    74.4    91.9     3.0     3.6    79.8    92.1
  t-standard             3.6     1.2    99.7   100.0     6.0     9.2   100.0   100.0
  t-standard cross       4.6     0.9    76.2    92.6     4.3     5.4    80.6    91.4
Estimation with an expanding sample period
  t-stat                 4.2     1.9    99.9   100.0     4.1     7.7    99.9   100.0
  t-cross                6.6     1.5    81.0    93.6     1.8     3.6    80.3    92.8
  t-standard             3.6     2.3   100.0   100.0     7.3    14.1   100.0   100.0
  t-standard cross       3.9     1.2    84.5    95.0     4.4     6.0    84.0    93.3