
FIRM CHARACTERISTICS, CROSS-SECTIONAL REGRESSION ESTIMATES, AND INTERTEMPORAL ASSET PRICING TESTS

Chris Kirby

Researchers typically employ cross-sectional regression methods to identify firm-level characteristics that help to explain the cross-section of average stock returns. I develop a straightforward approach for testing whether the coefficient estimates produced by these methods satisfy the pricing restrictions imposed by a given stochastic discount factor. The empirical analysis reveals that the evidence from cross-sectional regression studies poses a substantial challenge to existing asset pricing models. The tests produce emphatic rejections for several candidate SDF specifications that perform well in prior research. It appears that the rejections are driven in part by the presence of nonlinearities in the data.

Keywords: stochastic discount factor, size effect, book-to-market ratio, gross profitability, capital expenditures, corporate liquidity, momentum.

1. INTRODUCTION

Numerous studies in the capital markets literature use cross-sectional regressions like those pioneered by Fama and MacBeth (1973) to identify firm characteristics that help to explain the cross-section of average stock returns (see, e.g., Fama and French, 1992, 2006, 2008; Cooper et al., 2008; Novy-Marx, 2013; Ball et al., 2014). I develop a straightforward approach for testing whether the coefficient estimates produced by these regressions satisfy the pricing restrictions imposed by a given stochastic discount factor (SDF). The proposed testing framework is applicable to any regression specification in which the characteristics used as regressors are known to market participants prior to the start of the interval over which the returns are measured. Because the estimated marginal effects produced by such specifications take the form of payoffs on dynamically rebalanced, zero-investment positions, it follows that these effects can be priced in the same manner as excess stock returns.
To see this more clearly, let $R_{i,t}$ and $X_{i,t}$ denote the return on the stock of firm $i$ and some observable characteristic of firm $i$ for period $t$. The Fama and MacBeth (1973) methodology consists of two basic steps: (i) use data for a cross-section of firms to fit a regression of $R_{i,t}$ on $X_{i,t}$ for a number of different time periods; (ii) use the average values of the resulting coefficient estimates to draw inferences. If we have $N$ firms in total and fit the regressions by ordinary least squares (OLS), then the estimated slope coefficient for period $t$ can be expressed as a linear combination of the $N$ stock returns for period $t$. In this expression, the return for firm $i$ is multiplied by a term that is proportional to $X_{i,t} - (1/N)\sum_{j=1}^{N} X_{j,t}$. Hence, the OLS estimator of the slope coefficient defines a trading strategy in which stocks that fall into the upper and lower tails of the characteristic's distribution are weighted much more heavily than those that fall near the mean of the distribution. The amount invested in the stock of firm $i$ is positive if $X_{i,t}$ is greater than the cross-sectional mean of the characteristic, and negative if $X_{i,t}$ is less than the cross-sectional mean of the characteristic.

Of course, one needs to assume that the value of $X_{i,t}$ is known in period $t-1$ to make such a trading strategy feasible. This assumption is satisfied for the overwhelming majority of the cross-sectional regressions considered in the literature. Fama and French (1992), for example, use monthly stock returns to fit a sequence of cross-sectional regressions in which the regressors are the log of the firm's size, as measured by its market equity, and the log of the ratio of its book equity to market equity. These regressions are specified such that the size measure is lagged by a minimum of one month and the book-to-market ratio is lagged by a minimum of six months. This ensures that both variables are elements of the public information set at the beginning of the holding period that is used to compute the returns. Most other studies that address the relation between firm characteristics and stock returns adopt the same or similar procedures for lagging the regressors. As a consequence, the payoffs to the associated trading strategies can be priced using standard techniques.

Under the strategy associated with a cross-sectional regression of $R_{i,t}$ on $X_{i,t}$, for example, the investment in the $i$th stock is proportional to $X_{i,t} - (1/N)\sum_{j=1}^{N} X_{j,t}$. Because these deviations from the cross-sectional mean sum to zero and the constant of proportionality is the same for all $i$, the strategy is self-financing. Accordingly, the payoff to the strategy has to satisfy the same pricing restriction as excess stock returns. This follows because an excess stock return is also the payoff to a self-financing strategy: it is the payoff earned by shorting a government discount bond and investing the proceeds in the stock.

Comments welcome. Belk College of Business Administration, University of North Carolina at Charlotte, 9201 University City Blvd., Charlotte, NC 28223; ckirby10@uncc.edu
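As a minimal numerical check of this equivalence (simulated data; all names are illustrative assumptions, not the paper's code), the sketch below fits a single-characteristic OLS cross-section with numpy and confirms that the slope equals the payoff on a portfolio whose weight on stock $i$ is $X_{i,t} - (1/N)\sum_j X_{j,t}$ scaled by $1/(N\hat\sigma_X^2)$, and that these weights sum to zero.

```python
import numpy as np

def ols_slope_as_portfolio(x, r):
    """Return (portfolio payoff, weights) for a one-characteristic cross-section.

    x : characteristic values assumed known at t-1 (length N)
    r : returns for period t (length N)
    """
    n = x.size
    mu_x = x.mean()
    var_x = x.var()                     # cross-sectional variance, divisor N
    w = (x - mu_x) / (n * var_x)        # weight on stock i
    return w @ r, w

rng = np.random.default_rng(0)
x = rng.normal(size=500)                                 # hypothetical lagged characteristic
r = 1.01 + 0.02 * x + rng.normal(scale=0.1, size=500)    # hypothetical gross returns

payoff, w = ols_slope_as_portfolio(x, r)
slope_ols, _ = np.polyfit(x, r, deg=1)  # conventional OLS slope, for comparison

print(np.isclose(payoff, slope_ols))    # slope equals the portfolio payoff
print(np.isclose(w.sum(), 0.0))         # zero net investment
```

The intercept drops out because the weights sum to zero, which is why the gross-return level (1.01 here) does not affect the payoff.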
Thus the proposed approach for assessing whether the estimates produced by cross-sectional regressions satisfy the pricing restrictions imposed by a candidate SDF is straightforward. One simply compares the time-series average of the sequence of estimated marginal effects produced by the regressions to the estimated covariance of this sequence with the SDF. This approach easily generalizes to cross-sectional regressions that employ more than one explanatory variable. The sequence of estimated marginal effects for each variable can be interpreted as a sequence of payoffs produced by a self-financing trading strategy and priced accordingly. It is therefore a simple matter to construct both individual and joint tests of the hypothesis that the cross-sectional regression estimates satisfy the pricing restrictions imposed by a candidate SDF. In effect, the proposed methodology is a general procedure for computing the expected value of the cross-sectional regression coefficients under the assumption that a given SDF specification is valid.

One limitation of the methodology is that it relies on a single point estimate to determine whether the information provided by a given characteristic about the cross-section of expected stock returns is consistent with the restrictions imposed by a candidate SDF. If the restrictions are rejected, then it may be difficult to learn much about which aspects of the pricing model are at odds with the data. To address this issue, I extend the idea of pricing estimated marginal effects to nonparametric regression models. Specifically, I use the well-known k-nearest-neighbor classification scheme to construct estimates of the local marginal effects at a number of different points along the cross-sectional distribution of the characteristic. The extent to which a candidate SDF is capable of explaining the observed changes in the estimated marginal effect along this distribution provides insights regarding the reasons for the success or failure of the pricing model.

The candidate SDF specifications employed for the empirical analysis include those implied by the capital asset pricing model (CAPM) of Sharpe (1964) and Lintner (1965), the consumption CAPM of Breeden (1979), and three other consumption-based models drawn from the recent literature: the conditional consumption CAPM of Lettau and Ludvigson (2001), the ultimate consumption CAPM of Parker and Julliard (2005), and the durable consumption CAPM of Yogo (2006). I focus on these three models because they appear to perform well in pricing tests that employ quarterly excess returns on a set of 25 portfolios formed by sorting firms on size and book-to-market values. Indeed, the aforementioned studies report that the models explain most of the spread in average portfolio returns that is generated by the size and book-to-market sorting procedure. I provide an alternative perspective on the extent to which the models explain the cross-sectional variation in expected stock returns that is captured by these characteristics, and extend the pricing analysis to cover several additional characteristics of interest: the ratio of gross profits to total assets, the ratio of net capital expenditures to total assets, the ratio of cash equivalents to total assets, and two lagged return measures that capture short-term reversal and momentum phenomena. Thus I consider seven characteristics in total.

I begin the empirical analysis by fitting cross-sectional regressions to quarterly stock returns for 1963 to 2009 using various combinations of the characteristics as explanatory variables.
The regressions are specified such that the returns for quarter three of year $t$ to quarter two of year $t+1$ are matched with accounting information for fiscal years that end in calendar year $t-1$. As anticipated, the regressions produce evidence that all seven of the characteristics help to explain the cross-sectional variation in expected stock returns. Most specifications produce Fama and MacBeth (1973) t-statistics that range from around two to six in magnitude. Moreover, the local linear regressions produce substantial evidence of nonlinearity for the majority of the characteristics. The evidence suggests, for example, that the marginal effect of the ratio of gross profits to total assets on expected stock returns decreases with the magnitude of this ratio. The average value of the estimated marginal effect is around four for unprofitable firms, and around one for highly profitable firms.

The first set of pricing tests is conducted using the seven estimated marginal effects produced by the standard Fama and MacBeth (1973) procedure. To establish a set of baseline results, I implement the tests for the Sharpe-Lintner CAPM and consumption CAPM. Both models are rejected at the 1% significance level. This is not surprising given the findings of prior research. The diagnostics suggest that the Sharpe-Lintner CAPM fits the data better in some regards. For instance, regressing the average values of the estimated marginal effects on the fitted values from the model produces an R-squared of 70.1%. But both models produce large and statistically significant pricing errors. Of the three remaining models, the ultimate consumption CAPM displays the least evidence of misspecification. It is the only model for which the test does not reject at standard significance levels, producing a probability value of 16%. In comparison, the test statistics for the conditional consumption CAPM and durable consumption CAPM have probability values of less than 1%. On the other hand, the diagnostics suggest that the failure to reject the ultimate consumption CAPM stems from a lack of precision in the estimates. A number of the pricing errors for the model are larger in magnitude than those for the consumption CAPM, but the associated standard errors are too large to generate statistically significant t-statistics.

For the next set of pricing tests, I use 21 estimated local marginal effects: three per characteristic. All of the models perform poorly in these tests. Not only do the test statistics have probability values of less than 1% in every case, but the maximum pricing errors are also considerably larger than those obtained using the estimated marginal effects produced by the standard Fama and MacBeth (1973) procedure. The ultimate consumption CAPM fares no better than the other models in this regard. Indeed, it produces a larger average pricing error and a lower cross-sectional R-squared than the Sharpe-Lintner CAPM.

To develop additional insights, I use characteristic-specific pricing tests to investigate the performance of the models for each individual characteristic. These tests are constructed using ten estimated local marginal effects that correspond to ten different points along the observed support of the characteristic's cross-sectional distribution. As noted earlier, for example, the average value of the estimated marginal effect of the ratio of gross profits to total assets decreases from around four for unprofitable firms to around one for highly profitable firms.
Accordingly, the sample covariance of the sequence of estimated local marginal effects with a candidate SDF needs to mirror this decline for the model to explain the evidence of nonlinearity in the relation between firm profitability and stock returns.

The characteristic-specific tests bring into sharper focus the extent to which the regression estimates pose a challenge to asset pricing theory. Both the Sharpe-Lintner CAPM and consumption CAPM are rejected at the 1% significance level in every test. Neither model shows any sign of capturing the cross-sectional variation in the estimated local marginal effects for even one of the seven characteristics. The ultimate consumption CAPM and conditional consumption CAPM perform slightly better. The test statistic for the former has a probability value of 1% for the ratio of gross profits to total assets and 2% for the ratio of cash equivalents to total assets, while that for the latter has a probability value of 5% for the ratio of net capital expenditures to total assets. But the statistics for the remaining characteristics indicate that both models are rejected at the 1% significance level. The results for the durable consumption CAPM are slightly more encouraging. The test statistic has a probability value of 34% for the ratio of gross profits to total assets, 12% for the ratio of net capital expenditures to total assets, 2% for the log book-to-market ratio, and 1% for log size. These findings suggest that the covariance between durable consumption growth and the sequence of estimated local marginal effects is empirically relevant for at least some characteristics. Nonetheless, the model clearly falls short of fully explaining the cross-sectional regression evidence by a sizable margin. Moreover, it explains surprisingly little of the cross-sectional variation in average stock returns captured by the size and book-to-market variables. Thus the debate as to whether these characteristics proxy for systematic risk exposures appears to be far from settled.

Overall, the analysis suggests that the presence of nonlinearities in the data plays a key role in many of the rejections. This finding highlights the potential for nonparametric methods to yield new insights in asset pricing research.

2. ASSET PRICING RESTRICTIONS ON ESTIMATED MARGINAL EFFECTS

Fama and French (1992) present compelling evidence that two easily measured firm characteristics, the market value of equity (ME) and the book-to-market equity (BE/ME) ratio, help to explain the cross-section of average stock returns. They show in particular that monthly cross-sectional regressions of firm-level stock returns on lagged values of these characteristics produce statistically significant average slopes over their 1963 to 1990 sample period, even though the regressions control for the estimated market risk of each firm. This finding immediately sparked a lively debate that continues to this day. Some researchers contend that a firm's ME and BE/ME ratio capture irrational pricing phenomena (see, e.g., Lakonishok et al., 1994; LaPorta, 1996), while others argue that these characteristics proxy for firm exposures to systematic risk factors (see, e.g., Fama and French, 1993). The question at the center of this ongoing debate is whether the finding that certain firm characteristics, such as the BE/ME ratio, are correlated with average stock returns can be reconciled with the predictions of the Sharpe-Lintner CAPM or any other asset pricing model.
Most of the existing evidence on this question comes from an approach popularized by Fama and French (1993), which consists of forming a relatively small number of portfolios by grouping firms on the basis of their observed characteristics, and then evaluating the statistical significance of the pricing errors for the portfolio returns. Although this approach is simple enough to implement for one or two characteristics, the sorting procedure developed by Fama and French (1993) quickly becomes intractable as the number of characteristics under consideration increases. Moreover, there is no direct connection between the pricing tests for the characteristic-based portfolio returns and the cross-sectional regression estimates. This makes it difficult to assess the degree to which the tests using the portfolio returns deliver reliable inferences regarding the regression evidence.

In this section, I outline a direct approach for testing whether the results produced by cross-sectional regressions are consistent with rational asset pricing. The testing framework can be applied to any regression specification for which the regressors used to explain the returns over the interval $t-1$ to $t$ are contained in the time $t-1$ information set. The basic idea is to test whether the average estimated marginal effects produced by the sequence of cross-sectional regressions satisfy the pricing restrictions implied by a given SDF. Using the proposed approach, it is straightforward to evaluate the asset pricing implications of the estimated relation between any chosen set of firm characteristics and expected stock returns. I begin by describing the asset pricing framework and the type of regression covered by my approach, and then show how asset pricing models restrict the average values of the estimated marginal effects produced by the regression analysis.

2.1. Asset pricing framework

Let $R_{i,t}$ denote the gross return on the stock of firm $i$ over the interval $t-1$ to $t$. If the stock market is frictionless, then the principle of no arbitrage implies that $R_{i,t}$ satisfies a conditional moment restriction of the form

$$E(m_t R_{i,t} \mid I_{t-1}) = 1, \tag{2.1}$$

where $m_t$ denotes an admissible SDF and $I_{t-1}$ denotes the period $t-1$ information set (Harrison and Kreps, 1979). Hence, by iterated expectations, we have

$$E(R_{i,t}) = \frac{1 - \mathrm{Cov}(m_t, R_{i,t})}{E(m_t)}. \tag{2.2}$$

Equation (2.2) is the fundamental restriction on unconditional expected returns that follows from rational pricing in a frictionless-markets setting. It implies that all cross-sectional variation in unconditional expected returns is explained by cross-sectional variation in the unconditional covariance between the SDF and returns.

A similar restriction applies for the case in which stock returns are measured in excess of the known rate of return on a government-issued discount bond. To see this, note that equation (2.1) implies

$$E(m_t R^e_{i,t} \mid I_{t-1}) = 0, \tag{2.3}$$

where $R^e_{i,t} \equiv R_{i,t} - R_{f,t}$ denotes the difference between the gross stock return and the gross bond return for the interval $t-1$ to $t$. Hence, if rational pricing prevails, it follows that

$$E(R^e_{i,t}) = -\frac{\mathrm{Cov}(m_t, R^e_{i,t})}{E(m_t)}. \tag{2.4}$$

Linear specifications of $m_t$ feature prominently in the asset pricing literature. For example, the Sharpe-Lintner CAPM implies that the expected excess return on stock $i$ is given by

$$E(R^e_{i,t}) = \frac{\mathrm{Cov}(R^e_{i,t}, R^e_{mp,t})}{\mathrm{Var}(R^e_{mp,t})}\, E(R^e_{mp,t}), \tag{2.5}$$

where $R^e_{mp,t}$ denotes the excess return on the market portfolio. We can express equation (2.5) in the form of equation (2.3) by specifying

$$m_t = 1 - (R^e_{mp,t} - \mu_{mp})\lambda, \tag{2.6}$$

where $\mu_{mp} = E(R^e_{mp,t})$ and $\lambda = E(R^e_{mp,t})/\mathrm{Var}(R^e_{mp,t})$. Similarly, the arbitrage pricing theory (APT) of Ross (1976) implies that $m_t$ is linear in a set of factors that represent sources of systematic risk in the economy.
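A minimal sketch, on simulated data with hypothetical names, of how the linear SDF in equation (2.6) reproduces the CAPM restriction (2.5) when population moments are replaced by sample moments. With $\lambda$ built from the sample mean and variance (divisor $T$), the sample mean of $m_t$ is one, the market excess return is priced exactly, and the right side of (2.4) equals the sample beta times the sample market premium.

```python
import numpy as np

def cov(a, b):
    """Sample covariance with divisor T, matching numpy's default var()."""
    return np.mean((a - a.mean()) * (b - b.mean()))

rng = np.random.default_rng(1)
T = 200
r_mkt = 0.015 + 0.08 * rng.standard_normal(T)              # hypothetical market excess returns
r_i = 0.002 + 1.3 * r_mkt + 0.05 * rng.standard_normal(T)  # hypothetical stock excess returns

mu_mp = r_mkt.mean()
lam = mu_mp / r_mkt.var()              # lambda in (2.6), sample moments with divisor T
m = 1.0 - (r_mkt - mu_mp) * lam        # linear CAPM SDF, equation (2.6)

beta_i = cov(r_i, r_mkt) / r_mkt.var()
implied = -cov(m, r_i) / m.mean()      # right side of (2.4)

print(m.mean())                        # 1 up to rounding
print(np.mean(m * r_mkt))              # market excess return priced exactly (about 0)
print(implied, beta_i * mu_mp)         # equal: (2.4) reproduces (2.5) in sample
```

These identities hold by construction in sample; they say nothing about whether the CAPM actually explains `r_i.mean()`, which is what a pricing test examines.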

While nonlinear specifications of $m_t$ are the norm in the literature on consumption-based asset pricing, it is common to use linear approximations to these specifications in empirical tests (see, e.g., Lettau and Ludvigson, 2001; Parker and Julliard, 2005; Yogo, 2006). Constructing such approximations is straightforward. For any $m_t > 0$, it follows that

$$\frac{m_t}{E(m_t)} \approx 1 + \log\left(\frac{m_t}{E(m_t)}\right) \tag{2.7}$$

by performing a first-order Taylor series expansion of $\exp(\log m_t)$ around the point $\log m_t = \log E(m_t)$.[1] Because $E(m_t)$ is a constant, the right side of equation (2.4) can be written as $-\mathrm{Cov}(m_t/E(m_t), R^e_{i,t})$, and the constant terms in equation (2.7) do not affect the covariance. Substituting equation (2.7) into equation (2.4) therefore yields

$$E(R^e_{i,t}) = -\mathrm{Cov}(\log m_t, R^e_{i,t}). \tag{2.8}$$

We can therefore nest any model in which $\log m_t$ is linear or approximately linear within the familiar pricing framework of linear factor specifications. I base the proposed tests of whether the estimates produced by fitting cross-sectional regression models are consistent with rational pricing on the restriction in equation (2.8).

2.2. Sequential cross-sectional regressions using firm characteristics

Suppose we are presented with cross-sectional regression evidence that some firm characteristic, such as the BE/ME ratio, is correlated with average stock returns. How can we assess whether this evidence is consistent with the pricing restrictions imposed by a given SDF specification? My approach is motivated by the fact that the characteristics used as explanatory variables in cross-sectional regression models are generally lagged by one or more time periods to ensure that they are known to market participants prior to the start of the interval over which the returns are measured. Consequently, the estimated marginal effects can be interpreted as payoffs on dynamically rebalanced, zero-investment positions in the set of stocks used for the analysis and priced accordingly. Consider a scenario in which the dataset contains the returns for a collection of $N$ stocks in periods $t = 1, 2, \ldots, T$.[2]
Let $R_t = (R_{1,t}, R_{2,t}, \ldots, R_{N,t})'$ denote the $N \times 1$ vector of gross stock returns for period $t$. The relation between firm characteristics and expected stock returns is typically investigated using the regression methodology pioneered by Fama and MacBeth (1973). To illustrate, let $X_{i,t} \in I_{t-1}$ denote a $1 \times M$ vector of predetermined characteristics for firm $i$ that are used to explain the cross-section of stock returns for period $t$. The Fama and MacBeth (1973) methodology consists of two basic steps. First, we fit a cross-sectional regression model of the form

$$R_t = a + X_t b + e_t \tag{2.9}$$

for each $t \in \{1, 2, \ldots, T\}$, where $X_t$ is an $N \times M$ matrix with $i$th row $X_{i,t}$ and $b = (b_1, \ldots, b_M)'$ is an $M \times 1$ vector of slope coefficients. This delivers a sequence $\{\hat a_t, \hat b_t\}_{t=1}^{T}$ of estimated coefficients. The elements of $\hat b_t$ are the estimated marginal effects of the characteristics on expected stock returns for period $t$. Second, we compute the average value of $\hat b_t$ to see if the characteristics help to explain the cross-section of average stock returns. If any element of $(1/T)\sum_{t=1}^{T} \hat b_t$ is statistically significant, then we conclude that there is a relation between the corresponding characteristic and expected stock returns.

If we employ the usual strategy of fitting the regressions by OLS, then the estimator of $b$ for the period $t$ regression is given by $\hat b_t = \hat\Sigma_{XX}^{-1} \hat\Sigma_{XR}$, where

$$\hat\Sigma_{XX} = \frac{1}{N}\sum_{i=1}^{N} X_{i,t}' X_{i,t} - \left(\frac{1}{N}\sum_{i=1}^{N} X_{i,t}\right)' \left(\frac{1}{N}\sum_{i=1}^{N} X_{i,t}\right) \tag{2.10}$$

is the sample covariance matrix of $X_{i,t}$ and

$$\hat\Sigma_{XR} = \frac{1}{N}\sum_{i=1}^{N} X_{i,t}' R_{i,t} - \left(\frac{1}{N}\sum_{i=1}^{N} X_{i,t}\right)' \left(\frac{1}{N}\sum_{i=1}^{N} R_{i,t}\right) \tag{2.11}$$

is the vector of sample covariances between $X_{i,t}$ and $R_{i,t}$. Note, however, that this estimator can also be expressed as $\hat b_t = Z_t' R_t$, where $Z_t$ is an $N \times M$ matrix with $i$th row

$$Z_{i,t} = \frac{1}{N}\left(X_{i,t} - \frac{1}{N}\sum_{j=1}^{N} X_{j,t}\right)\hat\Sigma_{XX}^{-1}. \tag{2.12}$$

This alternative representation of the OLS estimator, which emphasizes the fact that the elements of $\hat b_t$ are simply linear combinations of the returns for period $t$ with coefficients that are known functions of variables in the period $t-1$ information set, makes it clear that we can interpret the estimated marginal effects from the cross-sectional regressions as payoffs on dynamically rebalanced positions in the set of stocks used for the analysis. These payoffs can be priced by applying the SDF framework of Section 2.1.

[1] A first-order Taylor series expansion yields $\exp(x) \approx \exp(x_0) + \exp(x_0)(x - x_0)$. Equation (2.7) follows by setting $x = \log m_t$ and $x_0 = \log E(m_t)$.

[2] I assume that $N$ is fixed for ease of exposition. Allowing the number of stocks used to fit the regression model to vary from one period to the next presents no difficulties.

2.3. Pricing the estimated marginal effects

Because the conditional moment restriction in equation (2.1) holds for all stocks, it implies a pricing restriction for $R_t$ of the form

$$E(m_t R_t \mid I_{t-1}) = \mathbf{1}, \tag{2.13}$$

where $\mathbf{1}$ denotes an $N \times 1$ vector of ones.
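The derivation that follows premultiplies (2.13) by $Z_t'$ and relies on $Z_t'\mathbf{1} = 0$. As a hedged numerical sketch of those two properties (simulated data; the function name `z_weights` is hypothetical), the code builds $Z_t$ from (2.12) for $M = 2$ characteristics, confirms that $Z_t' R_t$ reproduces the OLS slopes from `numpy.linalg.lstsq`, and checks that each column of $Z_t$ sums to zero.

```python
import numpy as np

def z_weights(X):
    """Rows Z_{i,t} from equation (2.12): (1/N)(X_i - Xbar) Sigma_XX^{-1}."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)                 # deviations from cross-sectional means
    sigma_xx = Xc.T @ Xc / n                # equation (2.10), divisor N
    return Xc @ np.linalg.inv(sigma_xx) / n

rng = np.random.default_rng(2)
N, M = 400, 2
X = rng.standard_normal((N, M))            # hypothetical lagged characteristics
R = 1.01 + X @ np.array([0.03, -0.01]) + 0.1 * rng.standard_normal(N)

Z = z_weights(X)
b_portfolio = Z.T @ R                       # payoffs on M zero-investment positions

design = np.column_stack([np.ones(N), X])   # intercept plus characteristics
coef, *_ = np.linalg.lstsq(design, R, rcond=None)
b_ols = coef[1:]                            # drop the intercept

print(np.allclose(b_portfolio, b_ols))      # same slopes
print(np.allclose(Z.sum(axis=0), 0.0))      # Z_t' 1 = 0
```

Each column of `Z` is the set of portfolio weights for one characteristic, so the $M$ estimated marginal effects are $M$ separate zero-investment payoffs.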
Premultiplying both sides of equation (2.13) by $Z_t'$, which is an element of $I_{t-1}$, and taking unconditional expectations yields

$$E(m_t \hat b_t) = \mathbf{0}, \tag{2.14}$$

where $\mathbf{0}$ denotes an $M \times 1$ vector of zeros. Hence, using equation (2.7), we obtain

$$E(\hat b_t) = -\mathrm{Cov}(\log m_t, \hat b_t) \tag{2.15}$$

as the fundamental restriction on the estimated marginal effects implied by rational asset pricing. This restriction can be tested using standard large-sample procedures.

Two aspects of these results are particularly noteworthy. First, the right side of equation (2.14) is a vector of zeros because each column of $Z_t$ sums to zero. Thus the estimated marginal effects are payoffs on zero-investment positions. In the $M = 1$ case, for example, the amount invested in asset $i$ in period $t-1$ is given by

$$Z_{i,t} = \frac{X_{i,t} - \hat\mu_X}{N \hat\sigma_X^2}, \tag{2.16}$$

where $\hat\mu_X$ and $\hat\sigma_X^2$ denote the estimated cross-sectional mean and estimated cross-sectional variance of the characteristic. If $X_{i,t}$ exceeds $\hat\mu_X$, then the amount invested in the $i$th stock is positive; otherwise, it is negative. Because the amount invested in the $i$th stock is proportional to the standardized value of $X_{i,t}$, stocks that fall into the upper and lower tails of the characteristic's cross-sectional distribution are weighted much more heavily than those that fall near the mean of the distribution. Equation (2.14) implies that the resulting payoff must satisfy the same pricing restriction as excess stock returns.

Second, the most straightforward estimator of $E(\hat b_t)$ is simply $(1/T)\sum_{t=1}^{T} \hat b_t$, the time-series average of the sequence of estimated marginal effects produced by the Fama and MacBeth (1973) methodology. Thus the t-ratios that are typically reported in cross-sectional regression studies and the proposed test for rational pricing focus on the same basic information. But the test for rational pricing compares the average estimated marginal effects to the values implied by their estimated covariances with the SDF rather than to zero. This is a key distinction. We may find that rational pricing is rejected for a particular specification of the SDF even if the t-ratios indicate that the average estimated marginal effects are statistically indistinguishable from zero.
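A minimal sketch of a test of (2.15) for one characteristic, under simplifying assumptions that are mine and not the paper's: the restriction is written as the moment condition $E[(1 + \log m_t - E\log m_t)\,\hat b_t] = 0$, the sample mean of $\log m_t$ is treated as known, and the standard error uses the i.i.d. formula, ignoring serial correlation. A full GMM treatment would account for both. The series are simulated and the function name is hypothetical.

```python
import numpy as np

def pricing_error_tstat(b_hat, log_m):
    """Pricing error and t-ratio for E(b) = -Cov(log m, b), equation (2.15).

    b_hat : length-T series of estimated marginal effects for one characteristic
    log_m : length-T series of the log SDF
    Assumptions: the sample mean of log_m is treated as known, and the
    standard error is the i.i.d. formula (no autocorrelation correction).
    """
    u = (1.0 + log_m - log_m.mean()) * b_hat   # moment with mean zero under (2.15)
    T = u.size
    error = u.mean()                           # = mean(b) + Cov(log m, b), divisor T
    se = u.std(ddof=1) / np.sqrt(T)
    return error, error / se

rng = np.random.default_rng(3)
T = 180
log_m = -0.01 + 0.05 * rng.standard_normal(T)   # hypothetical log SDF
b_hat = 0.004 + 0.02 * rng.standard_normal(T)   # hypothetical marginal effects

err, t = pricing_error_tstat(b_hat, log_m)
cov_lb = np.mean((log_m - log_m.mean()) * (b_hat - b_hat.mean()))
print(err, b_hat.mean() + cov_lb)               # same number
print(t)
```

The conventional Fama-MacBeth t-ratio asks whether `b_hat.mean()` differs from zero; the pricing test instead asks whether `err`, the gap between that average and the SDF-implied value, differs from zero.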
Rejection despite insignificant t-ratios might occur, for instance, if the average estimated marginal effects and the values implied by their estimated covariances with the SDF have opposite signs.

The idea of pricing estimated marginal effects can easily be extended to cover nonparametric regression models. The only requirement is that the nonparametric estimator take the form of a linear smoother. This extension holds the potential to yield more informative pricing tests. Suppose, for example, that the cross-sectional relation between a given characteristic and average stock returns is nonlinear. In this case, a nonparametric approach should capture the variation in the marginal effect over the cross-sectional distribution of the characteristic. The extent to which a given SDF is capable of explaining the observed changes in the estimated marginal effect along this distribution could provide useful insights regarding the reasons for the success or failure of the pricing model. To develop this idea more fully, I discuss how it could be implemented using local linear regression techniques.

2.4. Extension to estimated local marginal effects

The objective in local linear regression is to estimate the marginal effects of the characteristics at a particular set of values $X_{i,t} = x$. One of the most widely used local linear estimators is based on the well-known k-nearest-neighbor (k-nn) classification scheme that was first analyzed in detail by Cover and Hart (1967). The k-nn estimator for the period $t$ regression is obtained by solving the weighted-least-squares problem

$$\min_{a,b} \sum_{i=1}^{N} \omega_i \left(R_{i,t} - a - X_{i,t} b\right)^2, \tag{2.17}$$

where the weight for the $i$th firm is given by

$$\omega_i = \begin{cases} 1 & \text{if } i \in N_k(x), \\ 0 & \text{otherwise}, \end{cases} \tag{2.18}$$

with $N_k(x) = \{i : X_{i,t} \text{ is one of the } k \text{ nearest observations to } x\}$.[3] I implement the methodology using the standardized Euclidean distance, i.e.,

$$\lVert X_{i,t} - x \rVert = \left( \sum_{j=1}^{M} \left( \frac{X_{ij,t} - x_j}{\hat\sigma_{j,t}} \right)^2 \right)^{1/2}, \tag{2.19}$$

where $\hat\sigma_{j,t}$ denotes the sample standard deviation of the $j$th regressor in the cross-section for period $t$. The resulting estimator of the vector of local marginal effects can be expressed as $\hat b_t(x) = \hat\Sigma_{XX}(x)^{-1} \hat\Sigma_{XR}(x)$, where $\hat\Sigma_{XX}(x)$ and $\hat\Sigma_{XR}(x)$ denote the sample covariance matrix of $X_{i,t}$ and the vector of sample covariances between $X_{i,t}$ and $R_{i,t}$ for the sample consisting of the observations with the $k$ smallest values of $\lVert X_{i,t} - x \rVert$.

Three features of the k-nn approach make it an attractive choice for the purposes considered here. First, it is easy to assess the degree of smoothing achieved for a given value of $k$ by viewing the k-nn estimator as an OLS estimator for a sample of size $k$. Thus the implications of selecting a particular $k$ are fairly intuitive. Second, the k-nn estimator is well behaved even if the data are unevenly distributed over the support of $X_{i,t}$. This is not the case for the standard multivariate kernel regression estimator. Third, the k-nn approach for cross-sectional regressions can be viewed as a straightforward generalization of the Fama and MacBeth (1973) methodology in which the sample is restricted to firms that satisfy certain screening criteria with respect to their characteristic values. The restriction on $E(\hat b_t(x))$ implied by a given specification of $m_t$ takes the same form as that in equation (2.15).
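A minimal sketch of the k-nn estimator in (2.17) to (2.19) for a single cross-section held in numpy arrays; the function name `knn_local_effects` and the simulated kinked relation are hypothetical. It selects the $k$ observations closest to $x$ in standardized Euclidean distance and computes $\hat\Sigma_{XX}(x)^{-1}\hat\Sigma_{XR}(x)$ on that subsample, which is OLS with an intercept on the $k$ neighbors.

```python
import numpy as np

def knn_local_effects(X, R, x, k):
    """Local marginal effects b_t(x) from the k-nn estimator, eqs. (2.17)-(2.19).

    X : (N, M) characteristics for period t, R : (N,) returns, x : (M,) point.
    """
    sigma = X.std(axis=0, ddof=1)                          # sample std of each regressor
    dist = np.sqrt((((X - x) / sigma) ** 2).sum(axis=1))   # standardized Euclidean distance
    nn = np.argsort(dist)[:k]                              # indices in N_k(x)
    Xk, Rk = X[nn], R[nn]
    Xc = Xk - Xk.mean(axis=0)
    sigma_xx = Xc.T @ Xc / k                               # Sigma_XX(x)
    sigma_xr = Xc.T @ (Rk - Rk.mean()) / k                 # Sigma_XR(x)
    return np.linalg.solve(sigma_xx, sigma_xr)

# Hypothetical noiseless example: slope 4 below zero, slope 1 above zero.
rng = np.random.default_rng(4)
X = rng.uniform(-1.0, 1.0, size=(2000, 1))
R = np.where(X[:, 0] < 0.0, 4.0 * X[:, 0], 1.0 * X[:, 0])

low = knn_local_effects(X, R, np.array([-0.5]), k=200)
high = knn_local_effects(X, R, np.array([0.5]), k=200)
print(low, high)   # approximately [4.] and [1.]
```

With $k = 200$ of 2,000 uniform draws, each neighborhood spans roughly 0.2 units, so neither evaluation point's neighbors straddle the kink at zero and the local slopes recover the two marginal effects.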
The restriction on $E(\hat b_t(x))$ can be tested for a range of different values of $x$ to assess how well a candidate SDF captures the cross-sectional variation in the estimated local marginal effects. However, using the standard k-nn estimator to implement such an approach is unlikely to yield satisfactory results except in the $M = 1$ case. The reason is that the sparsity of the data in higher dimensions causes the performance of the estimator to deteriorate as the value of $M$ increases. This phenomenon is commonly referred to as the curse of dimensionality in the nonparametric regression literature. To overcome this problem, I employ an improved estimation technique known as marginal integration. This technique circumvents the curse of dimensionality for the class of additive nonparametric regression models (see, e.g., Fan et al., 1998).

[3] In general, the k-nn estimator converges at the same asymptotic rate as the local linear kernel regression estimator. This is not surprising given that the k-nn estimator can be recast as a kernel regression estimator that uses the uniform kernel with a stochastic bandwidth (see, e.g., Ouyang et al., 2006).

Motivating the marginal integration estimator is straightforward. Suppose a random variable $Y$ is described by an additive regression model of the form

$$Y = g_1(X_1) + g_2(X_2) + \cdots + g_M(X_M) + e, \tag{2.20}$$

where $g_1(\cdot), g_2(\cdot), \ldots, g_M(\cdot)$ are unknown functions. Equation (2.20) retains the additive structure of the standard linear regression model, but it does not restrict the marginal effects to be constant. Now suppose that we want to construct an estimator of $g_1(X_1)$. To identify this function, we impose the normalization $E(g_1(X_1)) = 0$ and define

$$\mu_Y = E\left(g_2(X_2) + \cdots + g_M(X_M)\right). \tag{2.21}$$

Then we note that

$$\int \left(g_1(X_1) + g_2(X_2) + \cdots + g_M(X_M)\right) f(X_2, \ldots, X_M)\, dX_2 \cdots dX_M = \mu_Y + g_1(X_1), \tag{2.22}$$

where $f(X_2, \ldots, X_M)$ denotes the marginal density of $X_2, \ldots, X_M$. In other words, we can obtain $g_1(X_1)$ up to an additive constant by integrating the unknown regression function over the marginal distribution of $X_2, \ldots, X_M$, and the constant can be estimated consistently using the sample mean of the dependent variable. To construct the marginal integration estimator of $g_1(X_1)$, we simply replace the expectations with sample averages and the unknown regression function with a suitable pilot nonparametric smoother. I use the k-nn version of the multiple regression model as the smoother, and estimate the local marginal effect for each regressor as follows. Assume for the purposes of illustration that the objective is to estimate $b_1(x_1)$, the marginal effect of $X_{i1,t}$ on the conditional expected value of $R_{i,t}$ at the point $X_{i1,t} = x_1$. Let $x^{(i)}$ denote an $M \times 1$ vector whose first element is $x_1$ and whose remaining elements are the observed sample values of $X_{i2,t}, \ldots, X_{iM,t}$.
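The construction just described, with a sample average in place of the integral in (2.22) and a k-nn pilot smoother in place of the unknown regression function, can be sketched as follows. This is a minimal illustration under assumptions of my own (simulated additive data with $M = 2$, a brute-force loop over the $N$ evaluation points $x^{(i)}$, hypothetical function names), not the paper's implementation. It averages the first element of the k-nn slope vector over the points $x^{(i)}$.

```python
import numpy as np

def knn_slopes(X, R, x, k, sigma):
    """OLS slopes (with intercept) on the k nearest observations to x."""
    dist = np.sqrt((((X - x) / sigma) ** 2).sum(axis=1))   # standardized distance
    nn = np.argsort(dist)[:k]
    Xc = X[nn] - X[nn].mean(axis=0)
    Rc = R[nn] - R[nn].mean()
    return np.linalg.solve(Xc.T @ Xc, Xc.T @ Rc)

def marginal_integration_b1(X, R, x1, k):
    """Estimate b_1(x1) by averaging k-nn slopes over x^(i) = (x1, X_i2, ..., X_iM)."""
    sigma = X.std(axis=0, ddof=1)
    points = X.copy()
    points[:, 0] = x1                     # first element fixed at x1, others observed
    slopes = [knn_slopes(X, R, p, k, sigma)[0] for p in points]
    return np.mean(slopes)

# Hypothetical additive model: g1(x) = x**2 (marginal effect 2*x1), g2 linear.
rng = np.random.default_rng(5)
N = 1500
X = rng.uniform(-1.0, 1.0, size=(N, 2))
R = X[:, 0] ** 2 + 0.5 * X[:, 1] + 0.01 * rng.standard_normal(N)

est = marginal_integration_b1(X, R, x1=0.5, k=150)
print(est)   # roughly 1.0 = 2 * 0.5
```

Averaging over the observed values of the second characteristic is what distinguishes this from a single k-nn fit at one point: the estimate reflects the whole empirical distribution of $X_{i2,t}$ rather than one location in it.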
The marginal integration estimator $\hat{b}_{1,t}(x_1)$ is given by the first element of the vector $(1/N)\sum_{i=1}^{N}\hat{b}_t(x^{(i)})$, where $\hat{b}_t(x^{(i)})$ denotes the OLS estimator of the vector of slope coefficients for the period $t$ regression using the sample comprised of the $k$ nearest observations to $x^{(i)}$. Intuitively, we obtain the estimator of $b_1(x_1)$ by computing the k-nn estimator of the regression coefficients for the specified value of the first characteristic using each of the $N$ observed joint realizations of the remaining characteristics, and then averaging (integrating) over the resulting distribution of $N$ estimated marginal effects for the first characteristic. The estimators of $b_2(x_2), \ldots, b_M(x_M)$ are obtained in an analogous fashion.

3. THE CANDIDATE SDF SPECIFICATIONS

The empirical analysis focuses primarily on consumption-based asset pricing models. This focus reflects findings that several such specifications, namely the conditional consumption CAPM of Lettau and Ludvigson (2001), the ultimate consumption CAPM of Parker and Julliard (2005), and the durable consumption CAPM of Yogo (2006), perform reasonably well in pricing the excess returns on a set of 25 portfolios formed via the ME and BE/ME sorting rules of Fama and French (1993). Indeed, these studies report that the models capture a large fraction of the cross-sectional variation in average stock returns produced by the Fama and French (1993) procedure. I provide an alternative perspective on the pricing performance of the three models in this regard, and extend the analysis to cover several additional characteristics that have recently gained prominence in the literature.

To establish a set of baseline results, I consider the pricing performance of the Sharpe-Lintner CAPM and the consumption CAPM. The SDF for the former is given in equation (2.6). That for the latter is obtained by linearizing the specification of $m_t$ implied by the intertemporal optimization problem of a representative agent whose utility function displays constant relative risk aversion (CRRA). Under CRRA preferences, we have

$$E\left[\psi\left(\frac{C_{ns,t}}{C_{ns,t-1}}\right)^{-\lambda} R_{i,t} \,\middle|\, I_{t-1}\right] = 1 \quad \forall i, \tag{3.1}$$

where $C_{ns,t}$ denotes the agent's real consumption for period $t$, which is assumed to consist of nondurable goods and services, $\psi < 1$ is his rate of time discount, and $\lambda$ is his coefficient of relative risk aversion (see, e.g., Hansen and Singleton, 1982). Hence, using the linear approximation in equation (2.7) yields a pricing restriction for excess returns of the form

$$E(R^e_{i,t}) = \mathrm{Cov}(R^e_{i,t}, \Delta c_{ns,t})\lambda \quad \forall i, \tag{3.2}$$

where $\Delta c_{ns,t} = \log C_{ns,t} - \log C_{ns,t-1}$. This is equivalent to specifying

$$m_t = 1 - (\Delta c_{ns,t} - E[\Delta c_{ns,t}])\lambda \tag{3.3}$$

as the SDF for pricing excess returns. The specifications of $m_t$ for the remaining pricing models are obtained by generalizing the CRRA model along various dimensions.
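The equivalence between the SDF in (3.3) and the restriction in (3.2) can be checked numerically: once population moments are replaced by sample averages, the pricing error $E[R^e_{i,t} m_t]$ is exactly the mean excess return minus the covariance with $\Delta c_{ns,t}$ times $\lambda$. The sketch below uses synthetic data and a hypothetical value of $\lambda$; the function name `crra_pricing_errors` is illustrative only.

```python
import numpy as np

def crra_pricing_errors(Re, dc, lam):
    """Sample analog of E[R^e_t m_t] with m_t from equation (3.3).

    Re: (T, M) excess returns; dc: (T,) log consumption growth; lam: scalar.
    The population mean of dc is replaced by its sample mean.
    """
    Re = np.asarray(Re, dtype=float)
    dc = np.asarray(dc, dtype=float)
    m = 1.0 - (dc - dc.mean()) * lam          # linearized CRRA SDF
    return (Re * m[:, None]).mean(axis=0)

# Synthetic data (hypothetical numbers, for illustration only)
rng = np.random.default_rng(2)
T, M = 400, 5
dc = 0.005 + 0.004 * rng.standard_normal(T)
Re = 0.01 + 2.0 * (dc[:, None] - 0.005) + 0.05 * rng.standard_normal((T, M))
lam = 30.0

errors = crra_pricing_errors(Re, dc, lam)

# The same quantity in the form of (3.2): mean minus covariance times lambda
cov = ((Re - Re.mean(axis=0)) * (dc - dc.mean())[:, None]).mean(axis=0)
errors_via_32 = Re.mean(axis=0) - cov * lam
```

The two arrays agree because the demeaned growth series has a sample mean of zero, so $\overline{R^e(\Delta c - \bar{\Delta c})}$ equals the sample covariance computed with divisor $T$.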
To derive the ultimate consumption CAPM, we note that equation (3.1) can be rearranged to obtain

$$C_{ns,t-1}^{-\lambda} = E\big(\psi\, C_{ns,t}^{-\lambda} R_{i,t} \,\big|\, I_{t-1}\big), \tag{3.4}$$

and hence we can use recursive substitution to express $C_{ns,t}^{-\lambda}$ as

$$C_{ns,t}^{-\lambda} = E\left(\psi^{H} C_{ns,t+H}^{-\lambda} \prod_{h=1}^{H} R_{f,t+h} \,\middle|\, I_t\right) \tag{3.5}$$

for any integer $H > 0$, where $R_{f,t+h}$ denotes the rate of return on a one-period discount bond that matures in period $t + h$. Substituting equation (3.5) into equation (3.4), applying the law of iterated expectations, and rearranging once more yields

$$E\left[\psi^{H+1}\left(\frac{C_{ns,t+H}}{C_{ns,t-1}}\right)^{-\lambda} R_{i,t} \prod_{h=1}^{H} R_{f,t+h} \,\middle|\, I_{t-1}\right] = 1 \quad \forall i. \tag{3.6}$$
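As a check on the recursion, the case $H = 1$ can be worked through explicitly. The sketch assumes, as the derivation requires, that $R_{i,t}$ is known at time $t$ and that the one-period bond maturing in $t + 1$ satisfies equation (3.4) shifted forward one period; repeating the substitution $H$ times gives equations (3.5) and (3.6).

```latex
% Step 1: equation (3.4) shifted forward one period, applied to the one-period bond
C_{ns,t}^{-\lambda} = E\big(\psi\, C_{ns,t+1}^{-\lambda} R_{f,t+1} \,\big|\, I_t\big)

% Step 2: substitute into (3.4); since R_{i,t} is in I_t, iterated expectations give
C_{ns,t-1}^{-\lambda}
  = E\Big(\psi R_{i,t}\, E\big(\psi\, C_{ns,t+1}^{-\lambda} R_{f,t+1} \,\big|\, I_t\big) \,\Big|\, I_{t-1}\Big)
  = E\big(\psi^{2} C_{ns,t+1}^{-\lambda} R_{i,t} R_{f,t+1} \,\big|\, I_{t-1}\big)

% Step 3: divide both sides by C_{ns,t-1}^{-\lambda}, which is known at t-1
E\Big[\psi^{2}\Big(\frac{C_{ns,t+1}}{C_{ns,t-1}}\Big)^{-\lambda} R_{i,t} R_{f,t+1} \,\Big|\, I_{t-1}\Big] = 1
```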

We can therefore use consumption growth measured over $H + 1$ periods rather than over a single period to evaluate the pricing performance of the CRRA model. Parker and Julliard (2005) contend that using a large value of $H$ for empirical work might improve the performance of the CRRA model because the impact of household investment choices in period $t$ is unlikely to be fully reflected in consumption data until a number of subsequent periods have passed. This misalignment, they argue, could arise from a number of phenomena, such as errors in measuring consumption using survey data, mismeasurement of marginal utility that arises from nonseparability with other factors such as leisure, and a slow response of marginal utility to wealth shocks that stems from adjustment costs. In their empirical tests, allowing for an adjustment lag of around three years ($H = 11$) produces a substantially better fit to the average excess returns for the 25 Fama-French portfolios than assuming that the adjustment takes place in a single period. Indeed, Parker and Julliard (2005) conclude that the fit of the linearized version of the model rivals that of the three-factor model of Fama and French in their study. I set $H = 11$ for the empirical analysis. The linearized version of equation (3.6) used for the tests ignores the contribution of the discount bond returns to the dynamics of $m_t$. Parker and Julliard (2005) show that this simplification has very little impact on the pricing performance of the model. Under this approach, equation (2.7) implies a pricing restriction for $R^e_{i,t}$ of the form

$$E(R^e_{i,t}) = \mathrm{Cov}\left(R^e_{i,t}, \sum_{h=0}^{H}\Delta c_{ns,t+h}\right)\lambda \quad \forall i. \tag{3.7}$$

This restriction can be obtained by specifying

$$m_t = 1 - \left(\sum_{h=0}^{H}\Delta c_{ns,t+h} - E\left[\sum_{h=0}^{H}\Delta c_{ns,t+h}\right]\right)\lambda \tag{3.8}$$

as the SDF for pricing excess returns. Lettau and Ludvigson (2001) derive a conditional version of the consumption CAPM by interacting $\Delta c_{ns,t}$ with a lagged conditioning variable.
Their approach is motivated by the observation that dynamic models with optimizing agents often imply that log consumption, log asset wealth, and log human wealth (human capital) are cointegrated, which suggests that short-term deviations of the log consumption-wealth ratio from its long-run trend should summarize investor expectations about discounted future returns on the market portfolio. Lettau and Ludvigson (2001) use the cointegrating residual between log consumption, log asset wealth, and log labor income to measure these deviations. If we let $cay_t$ denote the realization of this residual for period $t$, then the resulting version of the conditional consumption CAPM implies an SDF of the form

$$m_t = 1 - (cay_{t-1} - E[cay_{t-1}])\lambda_1 - (\Delta c_{ns,t} - E[\Delta c_{ns,t}])\lambda_2 - (cay_{t-1}\Delta c_{ns,t} - E[cay_{t-1}\Delta c_{ns,t}])\lambda_3 \tag{3.9}$$

for pricing excess returns. Lettau and Ludvigson (2001) report that this model explains 70 percent of the cross-sectional variation in average returns on the 25 Fama-French portfolios over their 1963 to 1998 sample period.
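The factor series behind equations (3.3), (3.8), and (3.9) can be assembled from a quarterly consumption growth series and a $cay$ series. The sketch below uses synthetic data and hypothetical $\lambda$ values; the helper names `ultimate_growth`, `cay_factors`, and `linear_sdf` are illustrative, and sample means stand in for the population expectations. Note how the $H + 1$ quarter sum in (3.8) consumes $H$ quarters of lead data at the end of the sample.

```python
import numpy as np

def ultimate_growth(dc, H):
    """Element t is sum_{h=0}^{H} dc[t+h]; the last H quarters are lost."""
    dc = np.asarray(dc, dtype=float)
    csum = np.concatenate([[0.0], np.cumsum(dc)])   # csum[j] = dc[0] + ... + dc[j-1]
    return csum[H + 1:] - csum[:-(H + 1)]

def cay_factors(cay, dc):
    """Factors of equation (3.9): cay_{t-1}, dc_t, and cay_{t-1} * dc_t.

    cay and dc are aligned by period; the first period is dropped for the lag.
    """
    cay_lag = np.asarray(cay, dtype=float)[:-1]
    dc_t = np.asarray(dc, dtype=float)[1:]
    return np.column_stack([cay_lag, dc_t, cay_lag * dc_t])

def linear_sdf(F, lam):
    """Mean-normalized linear SDF m_t = 1 - (F_t - mean(F))' lam."""
    F = np.asarray(F, dtype=float).reshape(len(F), -1)     # (T, K)
    return 1.0 - (F - F.mean(axis=0)) @ np.atleast_1d(np.asarray(lam, dtype=float))

# Synthetic quarterly data and hypothetical parameter values
rng = np.random.default_rng(1)
T = 200
dc = 0.005 + 0.004 * rng.standard_normal(T)
cay = 0.02 * rng.standard_normal(T)

m_ccapm = linear_sdf(dc, 50.0)                                  # equation (3.3)
m_ult = linear_sdf(ultimate_growth(dc, 11), 20.0)               # equation (3.8), H = 11
m_cay = linear_sdf(cay_factors(cay, dc), [1.0, 30.0, 100.0])   # equation (3.9)
```

Because each SDF is built from demeaned factors, its sample mean is exactly one, matching the mean normalization used throughout.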

Yogo (2006) develops a durable consumption CAPM by considering the intertemporal optimization problem of an agent whose utility is a function of both the consumption of nondurable goods and the service flow from a stock of durable goods. Under his model,

$$E\left[\left(\psi\left(\frac{C_{ns,t}}{C_{ns,t-1}}\right)^{-1/\sigma}\left(\frac{g(I_t/C_{ns,t})}{g(I_{t-1}/C_{ns,t-1})}\right)^{1/\rho - 1/\sigma} R_{W,t}^{1-1/\kappa}\right)^{\kappa} R_{i,t} \,\middle|\, I_{t-1}\right] = 1 \quad \forall i, \tag{3.10}$$

where

$$g\left(\frac{I}{C_{ns}}\right) = \left(1 - \eta + \eta\left(\frac{I}{C_{ns}}\right)^{1-1/\rho}\right)^{1/(1-1/\rho)}, \tag{3.11}$$

and $C_{ns,t}$, $I_t$, and $R_{W,t}$ denote the consumption of nondurables, stock (inventory) of durable goods, and gross return to wealth in period $t$. The time-invariant parameters of the SDF (i.e., $\psi$, $\sigma$, $\rho$, $\kappa$, and $\eta$) reflect various aspects of preferences, such as relative risk aversion and the elasticity of substitution between durable and nondurable goods. To derive a linear version of the model, Yogo (2006) approximates $\log m_t$ around the special case of Cobb-Douglas intraperiod utility, which corresponds to $\rho = 1$. Linearization yields a pricing restriction for excess returns of the form

$$E(R^e_{i,t}) = \mathrm{Cov}(R^e_{i,t}, \Delta c_{ns,t})\lambda_1 + \mathrm{Cov}(R^e_{i,t}, \Delta c_{d,t})\lambda_2 + \mathrm{Cov}(R^e_{i,t}, r_{W,t})\lambda_3 \quad \forall i, \tag{3.12}$$

where $\Delta c_{ns,t} = \log C_{ns,t} - \log C_{ns,t-1}$, $\Delta c_{d,t} = \log I_t - \log I_{t-1}$, and $r_{W,t} = \log R_{W,t}$. This restriction can be obtained by specifying

$$m_t = 1 - (\Delta c_{ns,t} - E[\Delta c_{ns,t}])\lambda_1 - (\Delta c_{d,t} - E[\Delta c_{d,t}])\lambda_2 - (r_{W,t} - E[r_{W,t}])\lambda_3 \tag{3.13}$$

as the SDF for pricing excess returns. Yogo (2006) reports that this specification explains 94 percent of the cross-sectional variation in average excess returns on the 25 Fama-French portfolios for his 1951 to 2001 sample period.

4. PARAMETER ESTIMATION AND STATISTICAL INFERENCE

I employ the generalized method of moments (GMM) of Hansen (1982) for estimation and inference. The most common approach to GMM estimation of SDF models is to use an overidentified system of moment conditions.
To illustrate, consider a scenario in which the estimation is carried out using data on excess returns for a set of $M$ portfolios (or estimated marginal effects for a set of $M$ characteristics). Let $F_t$ denote a $K \times 1$ vector of factors, and let $\mu_F = E[F_t]$. Linear specifications of $m_t$ imply a pricing restriction of the form

$$E[R^e_t(1 - (F_t - \mu_F)'\lambda)] = 0, \tag{4.1}$$

where $R^e_t$ denotes the $M \times 1$ vector of excess portfolio returns and $\lambda$ denotes a $K \times 1$ vector of parameters. If we combine equation (4.1) with the definition $E[F_t - \mu_F] = 0$, then we obtain
a set of $M + K$ moment conditions that depend on $2K$ unknown parameters. Typically, we have $M > K$, yielding $M - K$ overidentifying restrictions. While I consider GMM estimators and tests that are based on overidentified systems, I do so primarily for the purpose of drawing comparisons with the findings of Lettau and Ludvigson (2001), Parker and Julliard (2005), and Yogo (2006). Most of the pricing tests are conducted using an alternative approach in which the non-traded factors are replaced with their mimicking portfolios, and the parameters are estimated using regression methods. The mimicking-portfolio strategy is motivated by two considerations. First, using mimicking portfolios provides insights with respect to a potential identification issue that complicates inference for linear SDF specifications. Second, the Monte Carlo experiments of Balduzzi and Robotti (2008) indicate that the tests based on mimicking portfolios perform better than other GMM-based tests if the factors are measured with error.

4.1. Estimators and tests for overidentified systems

The estimators and tests for the overidentified systems are constructed as follows. Equation (4.1) implies an SDF of the form

$$m_t = 1 - (F_t - \mu_F)'\lambda \tag{4.2}$$

for pricing excess returns. Note, however, that rescaling $m_t$ has no effect on equation (4.1). It is therefore apparent that specifying an SDF for pricing excess returns necessarily entails some type of normalization. The SDF in equation (4.2) is scaled such that its mean is equal to one (the "mean normalization"). This is one of the normalizations commonly used in the empirical literature. To obtain the other, we express equation (4.1) as

$$E[R^e_t(1 - F_t'\lambda^*)] = 0, \tag{4.3}$$

where $\lambda^* = \lambda/(1 + \mu_F'\lambda)$. Equation (4.3) implies an SDF of the form

$$m_t = 1 - F_t'\lambda^* \tag{4.4}$$

for pricing excess returns. It is scaled such that the linear function that describes the relation between $m_t$ and $F_t$ has an intercept of one (the "intercept normalization").
I discuss estimation using both normalizations in Appendix A. In each case, I consider two GMM estimators. The first is based on a prespecified weighting matrix and can be computed analytically in one step. The second is based on the asymptotically optimal weighting matrix and is obtained by iterating on the standard procedure for constructing this matrix until convergence. To evaluate the statistical significance of the pricing errors, I implement standard large-sample tests of the overidentifying restrictions. The one-step approach is of interest because the resulting estimators of $\lambda^*$ and $\lambda$ are related to the cross-sectional regression estimator of the factor risk premiums in the traditional two-pass approach to testing linear factor pricing models. In particular, they are obtained by performing a cross-sectional regression of $\hat{\mu}_R$ on the columns of $\hat{\Omega}_{RF}$ and $\hat{\Sigma}_{RF}$, respectively, where $\hat{\mu}_R$, $\hat{\Omega}_{RF}$, and $\hat{\Sigma}_{RF}$ denote the sample analogs of $\mu_R = E[R^e_t]$, $\Omega_{RF} = E[R^e_t F_t']$, and $\Sigma_{RF} = \Omega_{RF} - \mu_R\mu_F'$.
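The link between the one-step estimators and cross-sectional regressions can be sketched as follows. This assumes that the prespecified weighting matrix reduces the one-step estimators to no-intercept OLS regressions of $\hat{\mu}_R$ on $\hat{\Sigma}_{RF}$ (mean normalization) and on $\hat{\Omega}_{RF}$ (intercept normalization), as in standard textbook treatments; the paper's exact weighting matrix is described in Appendix A and is not reproduced here. The function name `one_step_estimates` and the simulated data are hypothetical. The usage example constructs returns whose sample means are priced exactly, so the relation $\lambda^* = \lambda/(1 + \mu_F'\lambda)$ holds to machine precision.

```python
import numpy as np

def one_step_estimates(Re, F):
    """No-intercept cross-sectional OLS analogs of the one-step estimators.

    Re: (T, M) excess returns; F: (T, K) or (T,) factors.
    Returns (lam, lam_star, mu_F): lam regresses mu_R on Sigma_RF,
    lam_star regresses mu_R on Omega_RF.
    """
    Re = np.asarray(Re, dtype=float)
    F = np.asarray(F, dtype=float).reshape(len(F), -1)
    T = Re.shape[0]
    mu_R = Re.mean(axis=0)                     # (M,)
    mu_F = F.mean(axis=0)                      # (K,)
    Omega = Re.T @ F / T                       # (M, K) sample E[R F']
    Sigma = Omega - np.outer(mu_R, mu_F)       # (M, K) sample Cov(R, F), divisor T
    lam, *_ = np.linalg.lstsq(Sigma, mu_R, rcond=None)
    lam_star, *_ = np.linalg.lstsq(Omega, mu_R, rcond=None)
    return lam, lam_star, mu_F

# Hypothetical two-factor example
rng = np.random.default_rng(3)
T, M, K = 500, 10, 2
F = rng.standard_normal((T, K)) + np.array([0.5, -0.2])
Re = F @ rng.standard_normal((K, M)) + rng.standard_normal((T, M))
lam0 = np.array([0.3, -0.1])

# Shift sample means so that mu_R = Sigma_RF @ lam0 exactly
# (a mean shift leaves the sample covariances unchanged).
Sigma0 = (Re - Re.mean(axis=0)).T @ (F - F.mean(axis=0)) / T
Re = Re - Re.mean(axis=0) + Sigma0 @ lam0

lam, lam_star, mu_F = one_step_estimates(Re, F)
```

With pricing errors present, the two regressions generally do not satisfy the mapping exactly, which is one reason the choice of normalization can matter in finite samples.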

4.2. Using mimicking portfolios to detect identification problems

Burnside (2014) argues that several recent studies of consumption-based SDF specifications show signs of identification problems that are consistent with the presence of useless factors, i.e., factors that are either uncorrelated with excess asset returns or that are linear combinations of other factors in the model. To see why including a useless factor is problematic, consider the single-factor case (i.e., $K = 1$). It is easy to see that $\lambda$ is identified only if $F_t$ has a non-zero covariance with the excess return on at least one asset. Thus identification fails if we specify a useless factor. In contrast, $\lambda^*$ is identified even if the factor is useless. Indeed, if $\mathrm{Cov}(R^e_t, F_t) = 0$, then $E[R^e_t F_t] = E[R^e_t]E[F_t]$, which implies that the pricing restriction in equation (4.3) is satisfied for all assets for $\lambda^* = 1/E[F_t]$. Either scenario is clearly problematic. Under the mean normalization, including a useless factor invalidates standard large-sample test procedures, so the resulting inferences are unreliable. The potential consequences of including a useless factor under the intercept normalization are even more worrisome, however, because the factor can appear to explain a large part of the cross-sectional variation in average returns. While it is tempting to argue that the presence of a useless factor is unlikely to escape detection, this is not necessarily the case. For example, Burnside (2011) shows that the consumption-based SDF used by Lustig and Verdelhan (2007) to investigate the pricing of excess currency portfolio returns shows very little evidence of being correlated with these returns. Although it is impossible to completely rule out identification problems, the pricing tests can easily be structured such that these problems have a low probability of going unrecognized.
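A small simulation illustrates the intercept-normalization problem described above. It assumes hypothetical expected excess returns, no factor structure in returns, and a factor drawn independently of the returns; the no-intercept cross-sectional regression of $\hat{\mu}_R$ on $\hat{\Omega}_{RF}$ then delivers $\hat{\lambda}^* \approx 1/\bar{F}$ together with a high cross-sectional $R^2$, even though the factor carries no pricing information.

```python
import numpy as np

rng = np.random.default_rng(4)
T, M = 5000, 25
mu = np.linspace(0.5, 1.5, M)                  # hypothetical expected excess returns
Re = mu + 5.0 * rng.standard_normal((T, M))    # returns with no factor structure
F = 2.0 + rng.standard_normal(T)               # useless factor: independent of Re

mu_R = Re.mean(axis=0)
Omega = Re.T @ F / T                           # sample E[R_t F_t], shape (M,)
lam_star = (Omega @ mu_R) / (Omega @ Omega)    # no-intercept OLS of mu_R on Omega
fitted = Omega * lam_star
r2 = 1.0 - np.sum((mu_R - fitted) ** 2) / np.sum((mu_R - mu_R.mean()) ** 2)
```

Because $\hat{\Omega}_{RF}$ is approximately $\hat{\mu}_R \bar{F}$ when the factor is useless, the regression essentially fits average returns on themselves, which is why the apparent explanatory power is spurious.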
Consider the implications of expressing $F_t$ as

$$F_t = \varsigma + \omega' R^e_t + u_t, \tag{4.5}$$

where the scalar $\varsigma$ and $M \times 1$ vector $\omega$ are chosen such that $E[u_t] = 0$ and $E[u_t R^e_t] = 0$. Equation (4.5) decomposes $F_t$ into three components: a constant, the excess return for the portfolio that is maximally correlated with $F_t$ (i.e., a factor-mimicking portfolio), and a residual that is uncorrelated with the excess asset returns. The hypothesis that $F_t$ is useless can be formulated as $\omega = 0$. Hence, it can be tested using a Wald statistic of the form

$$J^{\omega}_T = T\big(\hat{\omega}'\hat{V}_{\omega}^{-1}\hat{\omega}\big), \tag{4.6}$$

where $\hat{V}_{\omega}$ denotes a consistent estimator of the asymptotic covariance matrix of $\sqrt{T}(\hat{\omega} - \omega)$. Under the null hypothesis, $J^{\omega}_T$ has a chi-square distribution with $M$ degrees of freedom in large samples. If $\omega = 0$ can be rejected at a 5% or 1% significance level, then we can be reasonably certain that $\lambda$ is identified. It is well known that replacing $F_t$ with its mimicking portfolio yields a pricing restriction that can be tested using standard regression techniques. This is because $R^e_{p,t} = \omega' R^e_t$ is an excess portfolio return that must be priced in the same way as any other excess return. To implement the approach, we simply substitute equation (4.5) into equation (4.1) to obtain

$$E[R^e_t] = \mathrm{Cov}(R^e_t, R^e_{p,t})\lambda, \tag{4.7}$$
where $\lambda = E[R^e_{p,t}]/\mathrm{Var}(R^e_{p,t})$, and then note that this restriction can be expressed as

$$E[R^e_t] = \beta E[R^e_{p,t}], \tag{4.8}$$

where $\beta = \mathrm{Cov}(R^e_t, R^e_{p,t})/\mathrm{Var}(R^e_{p,t})$.[4] This single-beta model implies that all of the cross-sectional variation in the expected excess asset returns is due to cross-sectional variation in the betas with respect to the mimicking portfolio. If we use this approach to replace every non-traded factor, then we can conduct the pricing tests by regressing the excess asset returns on the excess returns for the set of mimicking portfolios. In other words, we can use the familiar Gibbons et al. (1989) approach to test the model. Balduzzi and Robotti (2008) show how to encompass this regression-based approach within the GMM framework. The key is to estimate the mimicking-portfolio weights jointly with the parameters of the pricing model. This ensures that the test statistics properly reflect the uncertainty regarding the true portfolio weights. The procedure is as follows.

4.3. Using mimicking portfolios to conduct pricing tests

Suppose we want to test a $K$-factor specification. The weights of the factor-mimicking portfolios are defined by the $K(M + 1) \times 1$ vector of moment conditions

$$E\begin{pmatrix} F_t - \varsigma - \omega' R^e_t \\ (F_t - \varsigma - \omega' R^e_t) \otimes R^e_t \end{pmatrix} = 0, \tag{4.9}$$

where $\varsigma$ is a $K \times 1$ vector and $\omega$ is an $M \times K$ matrix. Similarly, the coefficients for the multivariate regression of the excess asset returns on $R^e_{p,t} = \omega' R^e_t$ are defined by the $M(K + 1) \times 1$ vector of moment conditions

$$E\begin{pmatrix} R^e_t - \alpha - \beta' R^e_{p,t} \\ (R^e_t - \alpha - \beta' R^e_{p,t}) \otimes R^e_{p,t} \end{pmatrix} = 0, \tag{4.10}$$

where $\alpha$ is an $M \times 1$ vector of intercepts and $\beta$ is a $K \times M$ matrix of slopes. I estimate $\varsigma$, $\omega$, $\alpha$, and $\beta$ jointly with $\gamma_F = E(R^e_{p,t})$ using an exactly-identified $(K(M + 1) + M(K + 1) + K) \times 1$ vector of moment conditions. The testable implication of the model is $\alpha = 0$. I test this hypothesis using the Wald statistic

$$J^{\alpha}_T = T\big(\hat{\alpha}'\hat{V}_{\alpha}^{-1}\hat{\alpha}\big), \tag{4.11}$$

where $\hat{V}_{\alpha}$ denotes a consistent estimator of the asymptotic covariance matrix of $\sqrt{T}(\hat{\alpha} - \alpha)$.
Under the null hypothesis, $J^{\alpha}_T$ has a chi-square distribution with $M$ degrees of freedom in large samples. Additional details are provided in Appendix A.

[Footnote 4: More specifically, I substitute for $F_t - \mu_F$ in equation (4.1) using equation (4.5) along with the relation $\varsigma = \mu_F - \omega'\mu_R$.]
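The two Wald statistics in equations (4.6) and (4.11) can be sketched with equation-by-equation OLS and a White (heteroskedasticity-robust) covariance estimator. This is a deliberate simplification, not the joint GMM system of equations (4.9) and (4.10): it treats the mimicking-portfolio returns as given and so ignores the first-step estimation error that the Balduzzi and Robotti (2008) approach accounts for, and it applies no autocorrelation correction. The helper names `j_omega` and `j_alpha` and the simulated data are hypothetical.

```python
import numpy as np

def _ols_with_influence(Y, X):
    """OLS of Y (T, n) on X (T, p); also returns residuals and X Q^{-1}."""
    T = X.shape[0]
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)      # (p, n)
    E = Y - X @ B                                  # residuals, (T, n)
    XQ = X @ np.linalg.inv(X.T @ X / T)            # row t is (Q^{-1} x_t)'
    return B, E, XQ

def _wald(b, psi):
    """T * b' V^{-1} b with the White estimate V = (1/T) sum psi_t psi_t'."""
    T = psi.shape[0]
    V = psi.T @ psi / T
    return T * b @ np.linalg.solve(V, b)

def j_omega(F, Re):
    """Wald test of omega = 0 in F_t = varsigma + omega' Re_t + u_t (K = 1)."""
    T, M = Re.shape
    X = np.column_stack([np.ones(T), Re])
    B, E, XQ = _ols_with_influence(np.asarray(F, float).reshape(T, 1), X)
    omega = B[1:, 0]
    psi = XQ[:, 1:] * E                            # influence terms for omega-hat, (T, M)
    return _wald(omega, psi), M, omega

def j_alpha(Re, Rp):
    """Wald test of alpha = 0 in Re_t = alpha + beta' Rp_t + e_t."""
    T, M = Re.shape
    X = np.column_stack([np.ones(T), np.asarray(Rp, float).reshape(T, -1)])
    B, E, XQ = _ols_with_influence(Re, X)
    alpha = B[0, :]
    psi = XQ[:, [0]] * E                           # influence terms for alpha-hat, (T, M)
    return _wald(alpha, psi), M, alpha

# Hypothetical data
rng = np.random.default_rng(5)
T, M = 2000, 5
Re = 0.5 + rng.standard_normal((T, M))
J_useless, df, _ = j_omega(rng.standard_normal(T), Re)          # F unrelated to Re
J_strong, _, w_hat = j_omega(Re @ np.full(M, 0.2) + 0.1 * rng.standard_normal(T), Re)

Rp = 0.5 + rng.standard_normal((T, 1))                         # a traded factor
Re_priced = 0.8 * Rp @ np.ones((1, M)) + rng.standard_normal((T, M))
J_a0, _, _ = j_alpha(Re_priced, Rp)          # alpha = 0 holds
J_a1, _, _ = j_alpha(Re_priced + 0.3, Rp)    # alpha = 0.3 for every asset
```

Each statistic would be compared with chi-square critical values with `df` degrees of freedom; a large $J^{\omega}_T$ signals that the factor is not useless, while a large $J^{\alpha}_T$ rejects the pricing model.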

5. DATA, DESCRIPTIVE STATISTICS, AND ESTIMATED MARGINAL EFFECTS

The empirical analysis employs data from a number of sources that are widely used in asset pricing research. These include Compustat, the Center for Research in Security Prices (CRSP), the National Income and Product Accounts (NIPA) tables published by the Bureau of Economic Analysis (BEA), and the Ken French online data library. Appendix B contains a detailed description of the methods used to construct the quarterly time series that are employed in the cross-sectional regressions and asset pricing tests. The quarterly observations begin in the second half of 1963 and end in 2009.[5] The choice of characteristics is guided by prior research. I consider the ratio of gross profits to total assets (GP/TA), the logarithm of the BE/ME ratio (log(BE/ME)), the logarithm of ME in millions (log(ME)), the stock return from 1 month prior to the beginning of the regression quarter to the beginning of the quarter ($R_{1:0}$), and the stock return from 12 months prior to the beginning of the regression quarter to 1 month prior to the beginning of the quarter ($R_{12:1}$). Novy-Marx (2013) reports that all of these variables are jointly significant in explaining the cross-section of monthly stock returns, and that the explanatory power of the GP/TA ratio rivals that of the BE/ME ratio. I also consider two other characteristics from the recent literature: the ratio of net capital expenditures to total assets (EX/TA) and the ratio of cash equivalents to total assets (CH/TA). The former is included because a number of studies report that capital investment is inversely correlated with subsequent stock returns (see, e.g., Titman et al., 2004; Cordis and Kirby, 2015), and net capital expenditures is a key input in calculating free cash flow. The latter is included because there is emerging evidence that cash holdings are directly correlated with subsequent stock returns (Huang and Wang, 2009).
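The two return characteristics can be computed from a stock's monthly return history. The sketch assumes a hypothetical convention in which index `m` of the array holds the simple return earned during month `m` and `q_start` is the first month of the regression quarter; under that convention $R_{1:0}$ is the return in month `q_start - 1`, and $R_{12:1}$ compounds the eleven returns in months `q_start - 12` through `q_start - 2`. The function name is illustrative, and the paper's exact construction is described in Appendix B.

```python
import numpy as np

def momentum_characteristics(monthly_ret, q_start):
    """Return (R_1_0, R_12_1) for a regression quarter starting at month q_start.

    monthly_ret: 1-D array of simple monthly returns (index m = month m).
    R_1_0 is the return in the month before the quarter begins; R_12_1
    compounds months q_start-12 through q_start-2, skipping the latest month.
    """
    r = np.asarray(monthly_ret, dtype=float)
    if q_start < 12:
        raise ValueError("need 12 months of history before the quarter")
    r_1_0 = r[q_start - 1]
    r_12_1 = np.prod(1.0 + r[q_start - 12:q_start - 1]) - 1.0
    return r_1_0, r_12_1

ret = np.full(24, 0.01)
r_1_0, r_12_1 = momentum_characteristics(ret, 12)
```

Skipping the most recent month in $R_{12:1}$ keeps the momentum characteristic separate from the short-term return captured by $R_{1:0}$.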
[Footnote 5: The sample used for the pricing tests ends in 2009 to reserve the data needed to construct $m_t$ for the ultimate consumption CAPM. This requires quarterly consumption growth rates through 2012 with $H = 11$.]

The definitions of the SDF variables mirror those employed in prior studies. I measure nondurable consumption by combining quarterly personal consumption expenditures on nondurable goods with those on services. To compute nominal values of this measure, I use data on the quarterly growth rate of each category of expenditures along with data on the expenditure shares over my sample period. I divide the resulting values by the U.S. population numbers reported in the NIPA tables to obtain quarterly per capita nominal consumption levels, and convert the per capita levels into a real per capita growth rate series $\{\Delta c_{ns,t}\}_{t=1}^{T}$. The series of quarterly inflation rates $\{\pi_{ns,t}\}_{t=1}^{T}$ used to convert nominal to real values is obtained by combining the quarterly deflator for nondurable goods with that for services. The real stock of durable goods is computed from quarterly personal consumption expenditures for durable goods by applying the perpetual inventory model. As in Yogo (2006) and Burnside (2011), the quarterly depreciation rates used for these calculations are imputed from the real end-of-year stock of durable goods reported in the NIPA tables. I divide the quarterly real stock of durable goods by the U.S. population numbers reported in the NIPA tables to obtain quarterly per capita values, and convert these values into a real per capita