Consumption Fluctuations and Expected Returns

Victoria Atanasov, Stig Vinther Møller, and Richard Priestley

Abstract

This paper introduces a new consumption-based variable, cyclical consumption, and examines its predictive properties for excess stock returns. Future expected stock returns are high when aggregate consumption falls relative to its past values and cyclical consumption is low. This empirical evidence ties consumption decisions of agents to time-variation in expected excess returns in a manner consistent with rational asset pricing. The predictive power of cyclical consumption is not confined to bad times and subsumes the predictability of almost all popular forecasting variables which track economic recessions. Our results appear compatible with prominent explanations of asset prices based on time-varying risk premia such as models with habit formation mechanisms.

JEL Classification: G10, G12, G17.

Keywords: cyclical consumption fluctuations, time-varying expected stock returns, predictability.

Chair of Finance, University of Mannheim, L9 1-2, 68161 Mannheim, Germany. Email: atanasov@uni-mannheim.de, phone: +49 621 181 2984.

CREATES, Aarhus University, Fuglesangs allé 4, DK-8210 Aarhus V., Denmark. Email: svm@econ.au.dk, phone: +45 87164825.

Department of Finance, BI Norwegian Business School, Nydalsveien 37, N 0444 Oslo, Norway. Email: richard.priestley@bi.no, phone: +47 464 10 515.

1 Introduction

Stock return predictability has been linked to the identification of good and bad times in the economy that might capture variation in investors' preferences or the quantity of risk. Cochrane (2017) asks "just what are the bad times...in which investors are particularly anxious that their stocks do not fall? Well something about recessions is an obvious candidate." and further "But what is the feared event exactly? How do we measure the event?". Measuring recessions in the economy has been at the forefront of identifying bad times.¹ Perhaps the often reported difficulty in finding consistent results regarding predictability (see, for example, Welch and Goyal (2008)) reflects a weakness in how bad times are identified, either because it is difficult to measure recessions empirically or because bad times are better described in another way.

In this paper, we take a novel and more direct approach to linking stock return predictability to both bad and good economic times. This is the first paper that ties stock return predictability directly to fluctuations in consumption and therefore links stock return predictability to investors' consumption decisions in a manner consistent with rational asset pricing. In good times, when aggregate consumption increases relative to its past history, cyclical consumption is high and marginal utility of consumption is low. Hence, investors are willing to give up current consumption and invest, which in turn pushes stock prices up and future expected returns down. Conversely, in bad times, when aggregate consumption declines relative to its past values, cyclical consumption is low and marginal utility of consumption is high. Thus, in order to induce investors to postpone valuable current consumption and invest, future expected returns need to be high.

¹ For example, the predictive ability of interest rates and term and default spreads (Fama and Schwert (1977), Fama (1981), Keim and Stambaugh (1986), Fama and French (1989), and Ang and Bekaert (2007)), dividend price ratios (Campbell and Shiller (1988) and Fama and French (1988)), and the output gap (Cooper and Priestley (2009)) has been advocated on grounds that these variables follow business cycle patterns (recessions) that might track time-variation in expected returns.

It is our conjecture that cyclical fluctuations in consumption
should be useful in picking out bad and good times in the economy as measured from a representative agent's point of view, and therefore informative about future excess stock returns.

The empirical results we present in this paper confirm the idea that future expected returns are high (low) when consumption is falling below (rising above) its trend and cyclical consumption is low (high). These cyclical consumption fluctuations, which we also refer to as cc, capture a significant fraction of the variation in future stock market returns both in-sample and out-of-sample. These results are important because they emphasize a strong relation between expected return variation and the macroeconomy, suggesting that asset prices are driven by primitive shocks rather than by other asset prices, as is implicitly the case when the predictors are interest rate spreads, earnings and dividend yields, or other price-to-fundamentals ratios.

We also show that the predictive power of cyclical consumption is not confined to bad times alone. Cyclical consumption provides a consistent description of how positive and negative macroeconomic events, reflected through consumption decisions of investors, affect stock market returns. These results are notable because they stand in stark contrast to Rapach, Strauss, and Zhou (2010), Henkel, Martin, and Nardari (2011), Dangl and Halling (2012), and Golez and Koudijs (2018), who find that popular predictor variables can only forecast stock returns in bad times, whereas there is essentially no evidence of predictability in good times, that is, during business cycle expansions.

Our findings appear to support standard consumption-based explanations of asset prices which generate time-variation in risk premia over good and bad times. One prominent example is the habit formation model of Campbell and Cochrane (1999), in which a slow-moving external habit acts as a trend for consumption.
In the model, the dynamics of expected stock returns are driven by shocks to the current level of consumption that move consumption in relation to its past realizations. Simply speaking, the argument is
as follows: as consumption declines relative to its trend in a recession, the risk of falling short of the minimum level of consumption increases, people become more risk-averse, stock prices fall, and future expected returns rise. The time-variation in the equity premium is, therefore, linked to investors' preferences, which change over good and bad times and reflect changes in lagged consumption.

We perform a battery of robustness checks and address a number of econometric concerns surrounding predictive regressions with persistent predictors (Nelson and Kim (1993) and Stambaugh (1999)). Both the novel IVX testing approach of Kostakis, Magdalinos, and Stamatogiannis (2015), which makes inference robust to the regressor's degree of persistence, and an advanced bootstrap procedure that accounts for the regressor's time-series properties indicate strong evidence of predictability at the one-quarter horizon which extends to horizons of about five years. The predictability does not vanish during the recent post-oil-crisis period in which standard popular business cycle indicators have proven dismal as predictive variables (Welch and Goyal (2008)). In addition, we show that the forecasting power of cyclical consumption fluctuations is not confined to the aggregate U.S. stock market. Robust patterns of predictability exist across portfolios sorted on industry and across a wide spectrum of financial characteristics including size, several price-to-fundamentals ratios, momentum, and reversal, amongst others.

To extract the cyclical component of consumption, we employ a simple and robust linear projection method of Hamilton (2017). This procedure is advantageous over other detrending methods in two important respects. First, it ensures that the identified cyclical component is consistently estimated for a wide range of nonstationary processes.
Second, it produces a series which is accurately related to the underlying economic fluctuations as opposed to, for instance, the popular Hodrick and Prescott (1997) filter, which can spuriously generate dynamic relations. This feature is particularly appealing because it
implies that the predictive ability of cyclical consumption reflects true predictability rather than a statistical artifact (Hamilton (2017)). We explore a variety of alternative specifications and utilize other econometric procedures to isolate cyclical variation in consumption, such as polynomial time trends and backward-looking moving averages, and generally reach similar conclusions. Our choice of the detrending procedure of Hamilton (2017) as a benchmark specification provides a conservative and robust view of return predictability.

Generally, there is not much evidence in the extant literature in favor of returns being predictable from aggregate consumption measures. Perhaps the most prominent consumption-based predictive variable is Lettau and Ludvigson's (2001) consumption-wealth ratio. If valid, a log-linearized approximation to an aggregate budget constraint implies that an empirical analogue to the consumption-wealth ratio can be obtained as a residual from a cointegrating relation between consumption, financial asset wealth, and labor income.² We find that cyclical consumption contains predictive information which goes well beyond that of many well-recognized variables that track economic recessions, including the consumption-wealth ratio of Lettau and Ludvigson (2001), the ratio of labor income to consumption of Santos and Veronesi (2006), and the conditional volatility of consumption of Bansal, Khatchatrian, and Yaron (2005). We consider nineteen alternative popular recession-based economic variables and find that none of them can systematically generate a better out-of-sample forecast than cyclical consumption.

The paper proceeds as follows. Section 2 explains how we construct cyclical consumption. Section 3 presents the benchmark results from our predictive analysis. A number of robustness tests are summarized in Section 4.
Section 5 compares the out-of-sample forecasting ability of alternative predictor variables to that of cyclical consumption. Section 6 concludes.

² Byrne and Davis (2003) and Rudd and Whelan (2006) cast doubt on the precision of this approximation and on the out-of-sample properties of the consumption-wealth ratio, and question its robustness to the use of theoretically consistent aggregate data.

2 Extracting cyclical consumption

As our primary measure of consumption, we use aggregate seasonally adjusted consumption expenditures on nondurables and services from the National Income and Product Accounts (NIPA) Table 7.1 constructed by the Bureau of Economic Analysis (BEA) in the Department of Commerce of the United States. The data are quarterly, in real per capita terms, measured in 2009 chain weighted dollars, and span the period from the first quarter of 1947 to the fourth quarter of 2017.

To extract the cyclical component of consumption, we employ a simple and robust linear projection method of Hamilton (2017) and regress the log of the real aggregate consumption series, $c_t$, on a constant and the four most recent values of $c$ as of date $t-k$:

$$c_t = b_0 + b_1 c_{t-k} + b_2 c_{t-k-1} + b_3 c_{t-k-2} + b_4 c_{t-k-3} + e_t, \tag{1}$$

where the residual measures cyclical consumption, cc:

$$cc_t = c_t - b_0 - b_1 c_{t-k} - b_2 c_{t-k-1} - b_3 c_{t-k-2} - b_4 c_{t-k-3}. \tag{2}$$

Hamilton (2017) notes that this procedure has several advantageous features. Under mild assumptions, it ensures that the identified cyclical component is consistently estimated for a wide range of unknown and possibly nonstationary processes.³

³ Hamilton (2017) provides a formal proof for the following statement. Consider a variable $y_t$ for which a researcher seeks to remove the nonstationary component without modeling the nonstationarity. The decomposition in Equation (1) will imply a stationary process $e_t$ if either the $d$th difference of $y_t$ or the deviation of $y_t$ from a $d$th-order deterministic time polynomial is stationary for some $d$ as the sample size becomes large.

Hence, it provides
a reasonable model-free way to construct a time series which is accurately related to the actual economic fluctuations as opposed to, for instance, the Hodrick and Prescott (1997) filter, which can spuriously generate series with dynamics that have no relation to the underlying data-generating process. Moreover, because $\widehat{cc}_t$ is the output of a one-sided filter, any finding that $\widehat{cc}_t$ can predict future observations of some other variable should represent a true predictive ability of cc rather than an artifact of the choice of detrending method. In this respect, Hamilton (2017) argues that, in contrast to the HP cyclical series, which is readily forecastable from its own lagged values and likewise from past values of other variables, the realizations of $\hat{e}_t$ will by construction be difficult to predict.⁴

An empirical implementation of Equation (1) requires a choice of $k$. Given that our goal is to capture a slowly moving but transient variation in the risk premium, we follow the recommendation of Hamilton (2017) and compute cc using a value of six years ($k = 24$).⁵ Figure 1 shows a plot of cc along with recession dates as defined by the NBER. Cyclical consumption has an unconditional mean of zero by construction, a standard deviation of 3.74%, and a first-order autocorrelation of 0.97. This autocorrelation corresponds to a half-life of between five and six years ($\ln 0.5 / \ln 0.97 \approx 23$ quarters) and implies highly persistent expected returns in the return forecasting regressions, as emphasized by Campbell and Cochrane (1999), Pastor and Stambaugh (2009), and van Binsbergen and Koijen (2010).⁶ The figure illustrates that cc exhibits significant fluctuations in the postwar period: it typically reaches its highest values some time before the onset of recessions, and falls throughout economic contractions.
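To make Equations (1) and (2) concrete, the following is a minimal Python sketch of the linear projection, assuming only NumPy and a one-dimensional array of log consumption. The function name `hamilton_cycle`, its arguments `k` and `p`, and the choice to pad the unavailable initial observations with NaN are illustrative assumptions, not part of Hamilton (2017) or of the code behind the reported results.

```python
import numpy as np

def hamilton_cycle(c, k=24, p=4):
    """Residual from regressing c[t] on a constant and c[t-k], ..., c[t-k-p+1].

    c : 1-D array of log consumption (quarterly in the paper's setting).
    k : projection horizon in periods (k = 24 quarters in the benchmark).
    p : number of lagged values used as regressors (four in Equation (1)).
    Returns an array of the same length as c, NaN where lags are unavailable.
    """
    c = np.asarray(c, dtype=float)
    n = c.size
    start = k + p - 1                      # first t for which all p lags exist
    if n <= start + p:
        raise ValueError("series too short for the chosen k and p")
    y = c[start:]
    # Regressor j holds c[t-k-j] for t = start, ..., n-1.
    lags = [c[start - k - j : n - k - j] for j in range(p)]
    X = np.column_stack([np.ones(y.size)] + lags)
    b, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS estimates of b0..bp
    cc = np.full(n, np.nan)
    cc[start:] = y - X @ b                 # Equation (2): projection residual
    return cc
```

Because the regression includes a constant, the non-missing part of the returned series has a sample mean of zero, in line with the statement that cc has an unconditional mean of zero by construction. A real-time variant would re-run the same function on each data vintage with an expanding sample.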

⁴ For practical purposes, Hamilton (2017) examines the properties of this linear projection procedure in some common settings such as a random walk or a pure deterministic time trend.

⁵ We investigated the forecasting power of cyclical consumption for future stock returns at various horizons $k$ ranging from one quarter up to ten years ($k = 1, 2, \ldots, 40$) and found that the results are generally robust toward other choices of $k$ in the interval of five to ten years.

⁶ For comparison, Lettau and Ludvigson (2013) identify a risk aversion shock with a half-life of over four years.

Our contention is that these fluctuations in cyclical consumption constitute a more accurate description of good and bad economic times than previously employed recession based
variables.

3 Predictive regression analysis

We investigate the forecasting ability of cyclical consumption for stock returns on the S&P 500 index and the Center for Research in Security Prices (CRSP) value-weighted index of U.S. stocks listed on the NYSE, AMEX, or NASDAQ. Excess returns are computed by subtracting the 30-day Treasury bill return from the actual stock return. To calculate real returns, we deflate nominal returns with the U.S. inflation rate, measured by the growth rate of the aggregate consumer price index (CPI) from the Bureau of Labor Statistics. For the in-sample analysis, we use the most recently available finally revised consumption figures and full-sample parameter estimates in Equation (1). For out-of-sample tests, we use real-time consumption vintages and ensure that the estimate of cc at time $t$ is based on data and parameter estimates which were available to the investor at time $t$.⁷

⁷ We also consider a scenario in which the predictive regression is estimated recursively, but cc is computed over the full sample. This estimation procedure trades off efficiency gains against the "look-ahead bias" (Lettau and Ludvigson (2001) and Welch and Goyal (2008)).

3.1 Basic return predictive regressions

We test the ability of cyclical consumption to capture time-variation in expected returns by estimating the following predictive regression model:

$$r_{t+h} = \alpha + \beta\, cc_{t-1} + \varepsilon_{t+h}, \tag{3}$$

where $h$ denotes the horizon in quarters, $r_{t+h}$ is the $h$-quarter log stock market return, and $cc_{t-1}$ is two-quarter lagged cyclical consumption. We include a second lag of cc in the
regression to account for delays in macroeconomic releases.⁸ To test the significance of $\beta$ in Equation (3), we use the Newey and West (1987) heteroskedasticity- and autocorrelation-robust t-statistic (truncated at lag $h$; our results are robust towards other choices of truncation lags).

Table 1 reports our benchmark findings. It shows the OLS estimates of $\beta$, the corresponding t-statistics (in parentheses), and the adjusted $R^2$ statistics, $\bar{R}^2$ (in square brackets), from simple forecasting regressions of log stock returns on the two-quarter lagged values of cyclical consumption, cc. When considering the results for excess returns in Panel A of Table 1, we find that the estimated coefficient on cc is negative and significant at standard levels across all values of $h$. Thus, expected returns are low when cyclical consumption is high in economic upswings, and expected returns are high when cyclical consumption is low in economic downturns. This result is consistent with investors responding rationally to countercyclical variation in the price of consumption risk over time: a fall in consumption relative to its past history indicates bad times of high marginal utility of consumption and high future expected returns.

The predictive impact of cyclical consumption is economically large. In particular, the point estimate of $\beta$ in the quarterly regression on the S&P 500 index (first row, first column in Table 1) is -1.6 in annual terms. This figure implies that a fall in cc by one standard deviation below its mean leads to a rise in the expected return of about 6 percentage points at an annual rate (1.6 × 3.74% ≈ 6.0%). The estimate of the coefficient is strongly statistically significant and the associated $\bar{R}^2$ is 3.07%. The $\bar{R}^2$ statistics tend to increase with the horizon. The results for real and actual returns and for the broader CRSP value-weighted index are qualitatively similar.

⁸ The U.S. Bureau of Economic Analysis (BEA) at the Department of Commerce typically releases the "advance", "second", and "third" estimates of NIPA consumption expenditure for quarter $t$ near the end of the first, second, and third months of quarter $t+1$, respectively. Annual revisions, which generally cover the quarters of the three most recent calendar years, are usually carried out each summer and incorporate newly available major annual source data. Comprehensive revisions are carried out at about five-year intervals and incorporate major periodic source data and changes in concepts and methods.

A general concern with predictability regressions is that their reliability can be undermined by the uncertainty regarding the order of integration of the predictor variable. Statistical inference can be unreliable when the predictor variable is persistent and its innovations are highly correlated with returns (Nelson and Kim (1993) and Stambaugh (1999)). Modelling the predictive variables as local-to-unity processes can lead to invalid inference if the regressor contains stationary or near-stationary components (Valkanov (2003), Lewellen (2004), Campbell and Yogo (2006), and Hjalmarsson (2011)).

We address these econometric concerns in two ways. First, we compute empirical p-values for the slope estimates from a wild bootstrap procedure that accounts for the persistence in regressors and the correlation between stock return and predictor innovations, and allows for general forms of heteroskedasticity. This simulation produces an empirical distribution that better approximates the finite sample distribution of the slope estimates in Equation (3).⁹ Second, we employ the novel IVX testing approach of Kostakis, Magdalinos, and Stamatogiannis (2015), which is robust to the regressor's degree of persistence (including unit root, local-to-unit root, near-stationary, or stationary persistence classes) and has good size and power properties. This approach alleviates practical concerns about the quality of inference under possible misspecification of the (generally unobservable) time-series properties of the regressor in long-horizon predictive regressions. Table 2 reports the results using their IVX estimator to test the significance of $\beta$. We find that the null hypothesis of no predictability can usually be rejected at the 1% level.

In summary, we show that stock returns are predictable by cyclical consumption fluctuations at various business cycle horizons over the 1954-2017 period. Expected returns are high when consumption falls relative to its past history and marginal utility of consumption increases.
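The regression in Equation (3) with Newey and West (1987) standard errors can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions rather than the code behind Tables 1 and 2: the helper names `align_returns` and `newey_west_ols` are hypothetical, the h-quarter return is taken to be the sum of log returns over quarters t+1 to t+h, the Bartlett kernel is used without a small-sample correction, and neither the wild bootstrap nor the IVX test is implemented.

```python
import numpy as np

def align_returns(r, cc, h):
    """Pair the h-quarter log return over t+1..t+h with cc[t-1] (assumed timing)."""
    r = np.asarray(r, dtype=float)
    cc = np.asarray(cc, dtype=float)
    n = r.size
    csum = np.concatenate([[0.0], np.cumsum(r)])
    t = np.arange(1, n - h)                   # dates with a lag and a full window
    y = csum[t + h + 1] - csum[t + 1]         # r[t+1] + ... + r[t+h]
    x = cc[t - 1]                             # two-quarter lag relative to t+1
    keep = ~np.isnan(x)                       # drops leading NaNs in cc only
    return y[keep], x[keep]

def newey_west_ols(y, x, lags):
    """OLS of y on a constant and x with Newey-West (Bartlett) t-statistics."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    u = y - X @ beta
    Xu = X * u[:, None]                       # moment contributions x_t * u_t
    S = Xu.T @ Xu                             # lag-0 term
    for l in range(1, lags + 1):
        w = 1.0 - l / (lags + 1.0)            # Bartlett weight
        G = Xu[l:].T @ Xu[:-l]                # lag-l autocovariance of moments
        S += w * (G + G.T)
    V = XtX_inv @ S @ XtX_inv                 # sandwich covariance matrix
    return beta, beta / np.sqrt(np.diag(V))
```

Calling `newey_west_ols(*align_returns(r, cc, h), lags=h)` mirrors the truncation at lag h described in the text. The overlap in long-horizon returns is what makes the autocorrelation correction necessary for h greater than one.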

⁹ In fact, since cc is a purely macroeconomic variable, its innovations have lower correlation with the innovations in returns, which almost eliminates the small sample bias. For more powerful tests, we follow the recommendation of Inoue and Kilian (2004) and calculate p-values for a one-sided alternative hypothesis (see also Neely, Rapach, Tu, and Zhou (2014) and Rapach, Ringgenberg, and Zhou (2016)). We summarize the details of the bootstrap algorithm in the appendix.

In bad times as measured by high marginal utility from current
consumption, investors want to consume more and require a higher expected premium as compensation for bearing risk. These findings constitute new evidence of time-varying risk premia, which ties stock return predictability directly to fluctuations in consumption, as in investors' first-order conditions in the classical CCAPM of Lucas (1978) and Breeden (1979).

3.2 Predicting stock returns in good and bad times

Several popular predictor variables are able to predict returns in bad times as defined by recessions but not in good times, that is, during business cycle expansions (Rapach, Strauss, and Zhou (2010), Henkel, Martin, and Nardari (2011), Dangl and Halling (2012), and Golez and Koudijs (2018)). This finding is a concern for standard asset pricing models that emphasize the impact of either time variation in risk aversion or the quantity of risk on asset prices during both good and bad times. In our sample, for instance, bad times, as measured by NBER-dated recessions, account for less than 15% of all observations.¹⁰ In light of this, Cujean and Hasler (2017) develop a theoretical mechanism with heterogeneous agents that causes stock return predictability to concentrate in bad times. Several other studies emphasize the usefulness of financial institutions and intermediation coupled with frictions and market segmentation since the 2007-2009 sub-prime financial crisis for rationalizing stock market behavior and capturing the propagation of a shock in bad times as opposed to normal and good times.

¹⁰ Over the 1954-2017 sample period, the NBER's Business Cycle Dating Committee identifies 222 quarters as expansions and the remaining 34 quarters as recessions (contractions).

To examine whether the relationship between future returns and cyclical consumption is only present in bad economic times, we estimate a linear two-state predictive regression model similar in spirit to Boyd, Hu, and Jagannathan (2005):

$$r_{t+h} = \alpha + \beta_{bad}\, I_{bad}\, cc_t + \beta_{good}\, (1 - I_{bad})\, cc_t + \varepsilon_{t+h}, \tag{4}$$
where $I_{bad}$ denotes the state indicator that equals one during recessions and zero otherwise, $\beta_{bad}$ and $\beta_{good}$ are the slope coefficients which measure the return predictability in bad and good times, respectively, and $cc_t$ is one-quarter lagged cyclical consumption. We follow Henkel, Martin, and Nardari (2011) and use the NBER-dated chronology of expansions and recessions to identify good and bad times ex post.

For consistency with our previous analysis, Table 3 presents the results for excess, real, and actual returns. We generally find robust predictability patterns across different horizons. Similar to Henkel, Martin, and Nardari (2011) and Dangl and Halling (2012), our estimates suggest that expected returns adjust more during recessions than during expansions. However, in contrast to these studies, we find that the estimated slope coefficient is negative and statistically significant in both good and bad times. For example, at a horizon of one quarter, the coefficient estimates and t-statistics in Panel A of Table 3 are -0.72 and -1.64 in bad times, and -0.37 and -2.75 in good times, with bootstrap p-values indicating statistical significance at the 10% and 1% levels, respectively. To interpret these magnitudes, note that a one-standard-deviation fall in cc during recessions leads to a rise of approximately 250 basis points in the expected quarterly excess return, roughly 10 percentage points at an annual rate, while the corresponding change in annual returns during expansions amounts to less than 6 percentage points. These figures entail an average response of expected returns of about 6.2 percentage points per year, which is very close to our benchmark estimates reported in Panel A of Table 1. We find broadly similar evidence for real and actual returns in Panels B and C of Table 3, with $\bar{R}^2$ statistics varying from around 2-3% at a quarterly horizon to levels of 20-30% at horizons of four to five years.
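The two-state specification in Equation (4) amounts to an OLS regression in which cc is interacted with the recession indicator and with its complement. A minimal sketch, assuming NumPy, a 0/1 recession indicator already aligned with the return dates, and a hypothetical helper name `two_state_slopes`, is given below. It returns point estimates only; inference would require the robust or bootstrap procedures described in the text.

```python
import numpy as np

def two_state_slopes(y, cc_lag, bad):
    """OLS estimates (alpha, beta_bad, beta_good) for Equation (4).

    y      : h-quarter returns aligned with the predictor dates.
    cc_lag : lagged cyclical consumption.
    bad    : indicator equal to 1 in recession quarters and 0 otherwise.
    """
    y = np.asarray(y, dtype=float)
    cc_lag = np.asarray(cc_lag, dtype=float)
    bad = np.asarray(bad, dtype=float)
    X = np.column_stack([
        np.ones_like(cc_lag),        # common intercept alpha
        bad * cc_lag,                # cc in bad times
        (1.0 - bad) * cc_lag,        # cc in good times
    ])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    alpha, beta_bad, beta_good = coef
    return alpha, beta_bad, beta_good
```

Since each observation loads on exactly one of the two interaction terms, the two slopes are estimated from disjoint sets of quarters, which is why the recession slope, based on far fewer observations, tends to be less precisely estimated.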
Hence, in contrast to Henkel, Martin, and Nardari (2011) and Dangl and Halling (2012), these results highlight that stock returns are generally predictable from macroeconomic fluctuations during both good and bad times as measured by economic
expansions and recessions.¹¹

¹¹ When applying regression (4) to the nineteen popular alternative economic variables that we study in Section 5, we find that only two display significance in good and bad times, namely, the investment-to-capital ratio of Cochrane (1991) and the output gap of Cooper and Priestley (2009). The predictive power of the remaining seventeen variables widely used in the literature is generally unstable and often vanishes entirely in the post-1980 period.

3.3 Alternative detrending methods

Since there is no a priori theoretical guideline regarding the choice of an appropriate econometric procedure to isolate a cyclical component of consumption, it is instructive to compare the predictive ability of cc with other empirical measures of cyclical consumption. In the following, we consider five such definitions. First, we follow a voluminous literature in macroeconomics and finance and assume a secular linear upward trend in consumption:

$$c_t = b_0 + b_1 t + e_t, \tag{5}$$

where the residual measures the cyclical consumption, cc.

A second technique extends the linear trend formulation to allow for a breakpoint and hence makes it possible to account for a well-known fall in macroeconomic risk, or the volatility of the aggregate economy, at the beginning of the 1990s:¹²

$$c_t = \begin{cases} b_0 + b_1 t + e_t, & t \le t_1, \\ b_0 + b_1 t + b_2 (t - t_1) + e_t, & t > t_1, \end{cases} \tag{6}$$

where the breakpoint $t_1$ corresponds to the first quarter of 1992 (see also Lettau, Ludvigson, and Wachter (2008)).

¹² An extensive body of the macroeconomic literature finds evidence of a regime shift to lower volatility of real macroeconomic activity occurring in the last two decades of the 20th century (see, e.g., McConnell and Perez-Quiros (2000) and Stock and Watson (2002)).

Essentially, Equation (6) presents a piecewise OLS regression
which fits two separate lines to the disconnected data around the break date. Next, we allow for higher order time polynomials such as a quadratic time trend model, which conveniently accounts for slowly changing trends through a quadratic coefficient $b_2$ that can intensify or dampen the linear time trend:

$$c_t = b_0 + b_1 t + b_2 t^2 + e_t, \tag{7}$$

and a corresponding cubic representation:

$$c_t = b_0 + b_1 t + b_2 t^2 + b_3 t^3 + e_t. \tag{8}$$

Finally, we follow Campbell (1991) and Hodrick (1992) and calculate a "stochastically detrended" consumption series as a backward-looking moving average based on a five-year window, where cc in quarter $t$ is equal to the difference between the natural logarithm of consumption in quarter $t$ and the average of the natural logarithm of consumption in quarters $t-20$ to $t-1$.¹³ The six measures of cyclical consumption that we identify display cross-correlations of 0.39 to 0.92.

Table 4 reports estimation results for the predictive regression in Equation (3) based on alternative measures of cc. Cyclical consumption displays stable and robust predictive power. Compared to the results in Table 1, the broken-trend and cubic detrending methods yield systematically stronger predictability at any forecasting horizon. These results emphasize that our choice of the linear projection procedure of Hamilton (2017) as a benchmark specification provides a conservative view of return predictability. Further, the question of which method should be employed to isolate cyclical variation in consumption appears largely irrelevant, since all methods reveal substantial return predictability.

¹³ We obtain similar results for windows of three or four years.
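The alternative measures in Equations (5) to (8) and the backward-looking moving average can be sketched as follows, assuming NumPy and a one-dimensional array of log consumption indexed by quarter. The function names and the integer break index `t1` are illustrative assumptions. The broken-trend helper implements Equation (6) exactly as written, that is, with a change in slope at the break date.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def poly_trend_cycle(c, degree=1):
    """Residual from a deterministic time polynomial: Equations (5), (7), (8)."""
    c = np.asarray(c, dtype=float)
    t = np.arange(c.size, dtype=float)
    coefs = P.polyfit(t, c, degree)          # coefficients b0, b1, ..., b_degree
    return c - P.polyval(t, coefs)

def broken_trend_cycle(c, t1):
    """Residual from a linear trend whose slope changes after index t1: Equation (6)."""
    c = np.asarray(c, dtype=float)
    t = np.arange(c.size, dtype=float)
    X = np.column_stack([np.ones_like(t), t, np.maximum(t - t1, 0.0)])
    b, *_ = np.linalg.lstsq(X, c, rcond=None)
    return c - X @ b

def moving_average_cycle(c, window=20):
    """c[t] minus the mean of c[t-window], ..., c[t-1]; NaN for the first window."""
    c = np.asarray(c, dtype=float)
    n = c.size
    csum = np.concatenate([[0.0], np.cumsum(c)])
    cc = np.full(n, np.nan)
    cc[window:] = c[window:] - (csum[window:n] - csum[: n - window]) / window
    return cc
```

Only the moving average is one-sided. The polynomial and broken-trend residuals use the full sample to estimate the trend, so a real-time version would need to refit them recursively.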

3.4 Out-of-sample analysis

Welch and Goyal (2008) show that in-sample predictability does not necessarily imply that investors can benefit from a better portfolio allocation and that most variables that have been used to predict stock returns in the extant literature perform poorly out-of-sample. There are two reasons that can cause out-of-sample forecasts to differ from in-sample forecasts. First, in out-of-sample forecasts, the coefficients in the predictive model could change over time. Second, the macroeconomic time series available today could differ from those which were available in real time due to ongoing data revisions. To address these concerns, we construct a real-time data set for cc based on vintage data from the Archival Federal Reserve Economic Data (ALFRED) database of the Federal Reserve Bank of St. Louis, with data observations from each vintage starting in 1947Q1. Because vintage data on population estimates of the Bureau of Economic Analysis can be downloaded only for the period after 1999, we use total (rather than per capita) consumption expenditure for nondurable goods and services in the out-of-sample calculations. Following Møller and Rangvid (2015), we assume that the real-time practitioner uses the final consumption estimates, which typically become available in the last month of each quarter. To gauge the situation of an investor operating in real time, we reestimate the parameters in cc recursively every period, based upon an expanding window and data available at the time of the forecast. At the cost of a larger sampling error in the early estimation recursions, this technique provides a means to circumvent the so-called "look-ahead" bias (Brennan and Xia (2005) and Lettau and Ludvigson (2005)).

3.4.1 Out-of-sample test statistics

To guard against the possibility that our conclusions are affected by any particular period, we consider three out-of-sample forecast evaluation periods: 1980-2017, 1990-2017, and
2000-2017.¹⁴ We follow Welch and Goyal (2008) and Campbell and Thompson (2008) and choose the first sub-sample to start in 1980. The second sub-period corresponds to the sample studied by Rapach, Ringgenberg, and Zhou (2016). Finally, we evaluate a recent post-2000 period for purposes of comparison with Rapach, Strauss, and Zhou (2010).

For nested forecast comparison tests, we specify a model of constant expected returns, that is, a benchmark model where a constant is the sole explanatory variable. The constant expected return model is a restricted nested version of an unrestricted model of time-varying expected returns which includes both a constant and cc as predictive variables. Accordingly, we evaluate whether our return predictions are more precise than predictions from the prevailing mean model. Welch and Goyal (2008) show that the historical average forecast is a very stringent out-of-sample benchmark.¹⁵

The assessment of out-of-sample predictability involves four metrics. The first statistic we report is the powerful ENC-NEW statistic of Clark and McCracken (2001), which extends the encompassing test of Harvey, Leybourne, and Newbold (1998) by deriving a nonstandard asymptotic distribution of this test statistic under the null of nested forecasts. The ENC-NEW statistic tests the null hypothesis that the restricted forecasting model encompasses the unrestricted forecasting model; the alternative is that the time-varying expected return model contains information that could be used to significantly improve the forecast of the constant expected return model.
14 Starting the out-of-sample evaluation in 1980Q1 provides a reasonably long initial in-sample period for reliably estimating the parameters used to generate the first predictive regression forecast. For consistency with the in-sample analysis, we take publication lags into account and use a two-quarter lagged value of cc in out-of-sample calculations. The results for one-quarter-ahead returns are qualitatively similar.

15 We find that an autoregressive model that includes a constant and lagged dependent variable with lag length selection based on AIC and BIC information criteria does not improve, and often even degrades, the out-of-sample predictive power of a regression that uses just a constant term. Hence, we show comparison tests with the more parsimonious model of a constant expected return as a benchmark.

The second is the MSE-F statistic of McCracken (2007), which tests the null hypothesis that the restricted forecasting model has a mean squared error (MSE) that is less than or equal to that of the

unrestricted forecasting model; the alternative is that the unrestricted model has smaller MSE. The third test is the out-of-sample $R^2_{OOS}$ statistic of Campbell and Thompson (2008), which measures the proportional reduction (or increase) in the MSE of the unrestricted model relative to the MSE of the prevailing mean benchmark forecast. The $R^2_{OOS}$ statistic is measured in units that are comparable to the in-sample $R^2$. The $R^2_{OOS}$ takes positive (negative) values when the predictive regression model predicts better (worse) than the historical mean.

The critical values for the ENC-NEW and MSE-F statistics are obtained from a bootstrap procedure described in the appendix. To assess the statistical significance of the $R^2_{OOS}$ statistics, we employ the Clark and West (2007) test statistic, which allows us to test the null hypothesis that the historical average MSE is less than or equal to the predictive regression MSE against the alternative hypothesis that the historical average MSE is greater than the predictive regression MSE. This statistic is a correction of the Diebold and Mariano (1995) statistic and is demonstrated to be more suitable for nested models.

Finally, to measure the economic value of the equity premium forecasts, we follow Campbell and Thompson (2008) and compute the certainty equivalent return (CER) for an investor with mean-variance preferences who allocates across stocks and risk-free bills using the time-varying expected returns model relative to the historical mean return forecast. At the end of each quarter $t$, we calculate the optimal weight of equities as

$$ w_{t,h} = \frac{\hat{r}_{t+h}}{\gamma \hat{\sigma}^2_{t+h}}, \qquad (9) $$

where $\hat{r}_{t+h}$ is a forecast of the $h$-quarter excess return, $\hat{\sigma}^2_{t+h}$ is a forecast of its variance, and $\gamma$ is the risk aversion coefficient. The share $1 - w_{t,h}$ is allocated to risk-free bills, and the respective portfolio return is given by

$$ r_{p,t+h} = w_{t,h} r_{t+h} + r_{f,t+h}. \qquad (10) $$
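The forecast evaluation statistics described in this subsection can be illustrated with a minimal Python sketch for one-step-ahead forecasts. The function name and argument names are hypothetical, the formulas are the standard textbook forms of the MSE-F, ENC-NEW, $R^2_{OOS}$, and Clark-West statistics, and the simple t-ratio used for the Clark-West statistic is an assumption that ignores the serial correlation induced by overlapping multi-period returns. Critical values for MSE-F and ENC-NEW are nonstandard and are not computed here.

```python
import numpy as np


def oos_statistics(actual, pred_restricted, pred_unrestricted):
    """Illustrative one-step-ahead out-of-sample statistics (hypothetical helper).

    actual            : realized excess returns over the evaluation period
    pred_restricted   : forecasts from the constant (historical mean) model
    pred_unrestricted : forecasts from the model that adds the predictor
    """
    y = np.asarray(actual, dtype=float)
    f_r = np.asarray(pred_restricted, dtype=float)
    f_u = np.asarray(pred_unrestricted, dtype=float)

    e_r = y - f_r          # forecast errors of the restricted model
    e_u = y - f_u          # forecast errors of the unrestricted model
    n_fc = y.size          # number of out-of-sample forecasts

    mse_r = np.mean(e_r ** 2)
    mse_u = np.mean(e_u ** 2)

    # Proportional reduction in MSE relative to the benchmark.
    r2_oos = 1.0 - mse_u / mse_r

    # MSE-F: scaled difference in MSEs.
    mse_f = n_fc * (mse_r - mse_u) / mse_u

    # ENC-NEW: scaled covariance between e_r and (e_r - e_u).
    enc_new = n_fc * np.mean(e_r ** 2 - e_r * e_u) / mse_u

    # Clark-West: MSE difference adjusted for the noise that estimating
    # the extra parameter adds under the null, then a simple t-ratio
    # (assumption: no HAC correction).
    adj = e_r ** 2 - (e_u ** 2 - (f_r - f_u) ** 2)
    cw_t = np.mean(adj) / (np.std(adj, ddof=1) / np.sqrt(n_fc))

    return {"r2_oos": r2_oos, "mse_f": mse_f, "enc_new": enc_new, "cw_t": cw_t}
```

A positive `r2_oos` corresponds to the unrestricted model having a lower MSE than the historical mean forecast; `cw_t` would be compared with one-sided standard normal critical values.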

We use a rolling ten-year window of past returns to estimate the variance, constrain $w_{t,h}$ to lie between 0 and 1.5, i.e., preventing investors from shorting stocks or taking more than 50% leverage, and assume a risk aversion coefficient of two. The portfolio CER can then be computed as $\mu_p - (\gamma/2)\sigma^2_p$, where $\mu_p$ and $\sigma^2_p$ are the mean and variance, respectively, of the investor's portfolio return over the forecast evaluation period. The CER gain is the difference between the CER for an investor who uses a predictive regression forecast and the CER for an investor who uses the historical average forecast. We multiply this difference by 400, so that it can be interpreted as the percentage portfolio management fee that an investor would be willing to pay each year to have access to the predictive regression forecast in place of a prevailing mean forecast.

3.4.2 Baseline out-of-sample results

In Table 5, we show results of out-of-sample predictions of the log excess return on the S&P 500 index over various horizons ranging from one quarter to five years. We generally find that the unrestricted model generates significantly better forecasts than the restricted model. For instance, the ENC-NEW encompassing test rejects the null hypothesis that the forecasts from the constant expected return model encompass the forecasts from the time-varying expected return model at the 1% level for all horizons and all forecasting periods that we consider. The MSE-F test significantly rejects the null hypothesis that the MSEs from the unrestricted model are bigger than or equal to those from the historical average return. The out-of-sample $R^2_{OOS}$ statistics in Table 5 are all positive, meaning that cc delivers a lower average forecasting error than the historical average forecast. For example, at the one-quarter horizon, the $R^2_{OOS}$ is 3.73% when we forecast from 1980, 5.08% when we forecast from 1990, and 6.43% when we forecast from 2000.
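The certainty equivalent calculation of Section 3.4.1 (the weight in Equation (9), the portfolio return in Equation (10), the weight bounds of 0 and 1.5, and the annualization factor of 400) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the variance forecasts are taken as given inputs rather than computed from a rolling ten-year window, and the use of the sample variance with one degree-of-freedom correction is an assumption.

```python
import numpy as np


def equity_weight(ret_forecast, var_forecast, gamma=2.0, w_min=0.0, w_max=1.5):
    """Mean-variance equity weight as in Equation (9), clipped to [w_min, w_max]."""
    raw = np.asarray(ret_forecast, dtype=float) / (
        gamma * np.asarray(var_forecast, dtype=float)
    )
    return np.clip(raw, w_min, w_max)


def certainty_equivalent(excess_ret, rf, ret_forecast, var_forecast, gamma=2.0):
    """CER = mean - (gamma / 2) * variance of the realized portfolio return."""
    w = equity_weight(ret_forecast, var_forecast, gamma)
    # Portfolio return as in Equation (10): weight times excess return plus risk-free rate.
    r_p = w * np.asarray(excess_ret, dtype=float) + np.asarray(rf, dtype=float)
    return r_p.mean() - 0.5 * gamma * r_p.var(ddof=1)  # ddof=1 is an assumption


def cer_gain(excess_ret, rf, fc_model, fc_mean, var_forecast, gamma=2.0):
    """Annualized percentage CER gain for quarterly data (difference times 400)."""
    cer_model = certainty_equivalent(excess_ret, rf, fc_model, var_forecast, gamma)
    cer_bench = certainty_equivalent(excess_ret, rf, fc_mean, var_forecast, gamma)
    return 400.0 * (cer_model - cer_bench)
```

Under these assumptions, `cer_gain` returns zero when the two forecast series coincide and a positive number when the predictive regression forecast leads to a portfolio with a higher certainty equivalent than the historical mean forecast.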
Campbell and Thompson (2008) show that the correct way to judge the magnitude of the out-of-sample $R^2$ is to compare it with the squared Sharpe ratio for the portfolio that

is predicted. A ratio of the two provides an estimate of the increase in return that can be obtained for a mean-variance investor if this investor relies on information contained in the predictive variable when making portfolio decisions. The out-of-sample $R^2$ is 3.73% when predicting the next-quarter log S&P 500 excess return over the post-1980 period; the respective squared Sharpe ratio is 2.86%. This implies that a mean-variance investor would increase the average quarterly portfolio return by a proportional factor of about 30% if relying on the return forecast generated by cc. The absolute increase in portfolio return depends on risk aversion, but is about 15% per year for an investor with unit risk aversion, and about 7.5% per year for an investor with a risk aversion of two.

This return enhancement for a market timer who allocates his investment optimally between the stock market and the risk-free asset comes in part from taking on greater risk. The associated welfare gain for a mean-variance investor with relative risk aversion of two is provided in Table 5 in the row labeled "CER gain". We find reliably positive and sizable CER gains for each time horizon. These gains tend to increase up to time horizons of two years and decline thereafter. 16

We draw broadly consistent conclusions regarding out-of-sample predictability from a series of robustness checks. For instance, we conduct additional tests based on fixed full-sample parameter values or relying on currently available, i.e., revised, consumption data. Similar conclusions also emerge from tests with a one-quarter lagged value of cc, tests with an autoregressive benchmark model, tests with the CRSP index return, and tests with actual and real returns.

To summarize, our results show that the cyclical consumption fluctuations that we identify display statistically significant out-of-sample predictive power for aggregate stock market returns. This is the case whether an investor started forecasting in 1980, 1990, or 2000.
16 Differences between CER gains and out-of-sample $R^2$ statistics are at least partly due to the estimated variance of stock returns that is necessary to calculate the CER gains. The utility gains reported in Table 5 are limited by the leverage constraint but do not take into account transaction costs.

These

results are in contrast to Welch and Goyal (2008), who emphasize that a long list of popular business cycle predictor variables has been unsuccessful out-of-sample in the last few decades.

4 Robustness tests

In this section, we investigate the predictive ability of cyclical consumption from a cross-sectional perspective, explore the robustness of our results to changes in the length of the sample period and alternative empirical measures of consumption, and examine international evidence.

4.1 Stock portfolios sorted on characteristics

In the preceding analysis, we have assessed the predictability of stock returns by means of two commonly used stock market indices that give a broad view of the behavior of the aggregate equity premium. In what follows, we investigate how well our predictor variable can forecast U.S. portfolios of stocks sorted on industry SIC codes and 15 financial characteristics: market equity, book-to-market equity, earnings-price and cashflow-price ratios, dividend yield, momentum, short-term and long-term reversals, operating profitability, investment, accruals, market beta, net share issues, and total and residual variances. 17

17 The portfolio data are from Ken French's homepage.

Table 6 reports the estimation results from in-sample univariate predictive regressions for each of the 10 decile portfolios sorted across the 16 alternative criteria. The tests are conducted over the longest possible sample period, that is, starting in the first quarter of 1954 or the third quarter of 1963 depending on data availability. A general result is that cyclical consumption fluctuations strongly negatively predict the entire cross-section

of returns and hence further reinforce our benchmark results. We find that only two of the 160 estimated coefficients are not statistically significant at the 5% level according to bootstrap p-values. These results emphasize that time-varying expected rates of return across a large number of portfolio sorts contain a common macroeconomic component.

4.2 Temporal stability of estimates

Welch and Goyal (2008) and Campbell and Thompson (2008) highlight that many business cycle predictor variables have performed particularly poorly both in-sample and out-of-sample after the oil price crisis in the 1970s. To address this point, Table 7 reexamines the empirical evidence of predictability over the post-1980 period, which includes the great equity bull market at the end of the twentieth century. Our estimates reveal that the predictive ability of cyclical consumption in the latter part of the sample is comparable to and often stronger than that over the full sample. The estimates in Table 7 show statistical significance for returns at various horizons and $R^2$ values which are often well beyond those reported in Table 1. This observation stands in contrast to many other predictor variables, which record a reduction in the extent of predictability in the period following the oil price crisis of the mid-1970s.

We obtain similar results in three other episodes of economic history: in the post-1965 data (see also Welch and Goyal (2008)); a period predating the global financial crisis; and a sample which omits the data in the aftermath of the run-up in prices in the early 2000s. The explanatory ability of cyclical consumption fluctuations is not confined to any particular period and is not concentrated in sub-samples with severe crises, a pattern often found in the literature.

We also study the temporal stability of the β estimates in Equation (3) to structural breaks as prescribed by Elliott and Müller (2006).
Their proposed qll test statistic for the hypothesis that $\beta_t = \beta$ for all $t$ and any $h$ is particularly useful in the context of predictive regressions because it is asymptotically efficient for a wide range of data-generating

processes, has better size properties in small samples than other popular statistics, and is simple to construct. Moreover, the simulation analysis in Paye and Timmermann (2006) shows that the test of Elliott and Müller (2006) possesses excellent finite sample size properties even in the presence of highly persistent lagged endogenous predictors. We find that the qll statistics for our benchmark estimates are never significant at any horizon (not reported).

4.3 Alternative consumption measures

As explained in Section 2, our main empirical analysis conventionally focuses on real per capita NIPA expenditure on nondurable goods and services as a proxy of aggregate consumption. In this section, we consider the predictive ability of cyclical consumption extracted from various subcategories of personal consumption expenditure (PCE) including i) nondurable goods (NON), ii) services (SERV), iii) durable goods (DUR), iv) the stock of durable goods (SDUR) constructed from the year-end estimates of the chained quantity index for the net stock of consumer durable goods published by the Bureau of Economic Analysis (BEA) following Yogo (2006), v) nondurable and durable goods (GOODS), and vi) total PCE.

Table 8 shows results from the benchmark regression (3) applied to the log excess return on the S&P 500 index. The predictive power of cyclical consumption is generally qualitatively similar in terms of coefficient magnitudes, statistical significance, and $R^2$ measures across the six different expenditure aggregates that we consider. According to the $R^2$ statistics, nondurable goods emerge as the strongest predictor of stock returns at short- and medium-term horizons of one quarter to two years. At longer-term horizons, the GOODS category, i.e., nondurable and durable goods jointly, reveals the strongest predictability. Overall, we record $R^2$ values of up to 2.69% and 37.88% for quarterly and five-year returns, respectively.

Since our benchmark measure of aggregate consumption comprises nondurable goods and services, it appears instructive to compare the performance of these two individual PCE subcomponents. The results in Table 8 show that services produce higher slope coefficients than nondurables for h = 1, 4, and 8, whereas nondurables generally yield bigger $R^2$ and t-values. From this perspective, it is interesting to note that the predictive ability of nondurables and services, measured separately, compares fairly to that of the aggregate consumption proxy in Table 1. Note also that at horizons of two years and above, we often find stronger results based on alternative PCE categories in Table 8 than in Table 1. This evidence reinforces our main findings and further highlights the conservative nature of our benchmark results.

4.4 International evidence

To mitigate concerns regarding overfitting or "data snooping" (Lo and MacKinlay (1990) and Bossaerts and Hillion (1999)), we investigate the predictability of stock returns in the remaining G7 countries: Canada, France, Germany, Italy, Japan, and the United Kingdom. We follow Solnik (1993), Ang and Bekaert (2007), Hjalmarsson (2010), and Rapach, Strauss, and Zhou (2013) and collect international total return indexes in national currency from Morgan Stanley Capital International (MSCI), available since 1970. Quarterly excess return series are calculated by subtracting the available local short-term interest rate retrieved from the OECD database from nominal returns. For the G7 index, we use the effective three-month Federal Funds rate for the U.S. as a short-term rate. For each country in the sample, we compute the cyclical component of consumption by fitting the regression in Equation (1) to the logarithm of each country's real seasonally adjusted private final consumption expenditures recorded since the first quarter of 1960 in the OECD database.
18 Since no vintage series for international consumption data are available, we restrict our analysis to in-sample tests of predictability. Table 10 presents cross-country evidence regarding stock return predictability for the G7 countries over the 1970-2017 period. To facilitate comparisons, we also show results for the United States over this shorter sample and for the aggregate G7 index for which the MSCI stock market series are recorded since the beginning of 1977. We generally find a stable negative relation between cyclical consumption and future stock returns. This relation is also statistically significant for the aggregate G7 index and in every individual country apart from Canada and Germany. The consistency of the estimated sign, its size, and the statistical significance provide evidence that cyclical consumption is typically useful in tracking future movements in local market equity returns. These results are largely consistent with our benchmark findings and they suggest that our main results are probably not caused by overfitting or data snooping.

18 Over the 1970-2017 period, cyclical consumption exhibits cross-country correlations that range between -0.04 and 0.92. We find highest volatility with a standard deviation of 7.66% for the United Kingdom and lowest volatility with a standard deviation of 3.38% for France.

5 Alternative predictor variables

How does the predictive information contained in cyclical consumption compare to other well-known predictor variables that have been rationalized by their ability to track business cycle conditions? To address this question, we consider a set of out-of-sample tests with alternative popular business cycle variables in the extant literature. The forecasting variables that we consider include fifteen predictors studied by Welch and Goyal (2008), 19 the consumption-aggregate wealth ratio of Lettau and Ludvigson (2001), the share of labor income to consumption of Santos and Veronesi (2006), the consumption volatility of Bansal, Khatchatrian, and Yaron (2005), and the output gap of Cooper and Priestley (2009). We download the data on the consumption-wealth ratio from the website of Martin Lettau, compute the share of labor income to consumption using the definition
19 The source of these data is the online library of Amit Goyal.

of labor income in Lettau and Ludvigson (2001) following Santos and Veronesi (2006), calculate the consumption volatility following Bansal, Khatchatrian, and Yaron (2005) as $\sigma_{c,t-1,J} = \log\left(\sum_{j=1}^{J} |\eta_{c,t-j}|\right)$, where $\eta_{c,t}$ is the residual from an AR(1) process of the log growth rate in real per capita nondurables and services and $J = 4$, and construct the output gap from industrial production data available at the Federal Reserve Bank of St. Louis following Cooper and Priestley (2009). This gives us a total of nineteen alternative predictor variables:

1. Log dividend-price ratio (dp): log of a 12-month moving sum of dividends paid on the S&P 500 index minus the log of prices on the S&P 500 index.
2. Log dividend yield (dy): log of a 12-month moving sum of dividends paid on the S&P 500 index minus the log of lagged prices on the S&P 500 index.
3. Log earnings-price ratio (e/p): log of a 12-month moving sum of earnings on the S&P 500 index minus the log of prices on the S&P 500 index.
4. Log dividend-payout ratio (d/e): log of a 12-month moving sum of dividends minus the log of a 12-month moving sum of earnings on the S&P 500 index.
5. Stock variance (svar): sum of squared daily returns on the S&P 500 index.
6. Book-to-market ratio (b/m): ratio of book value to market value for the Dow Jones Industrial Average.
7. Net equity expansion (ntis): ratio of a 12-month moving sum of net equity issues by NYSE-listed stocks to the total end-of-year market capitalization of NYSE stocks.
8. Treasury bill rate (tbl): interest rate on a three-month Treasury bill (secondary market).
9. Long-term yield (lty): long-term government bond yield.
10. Long-term return (ltr): return on long-term government bonds.
11. Term spread (tms): long-term yield on government bonds minus the Treasury bill rate.