Why Does Stock Market Volatility Change Over Time? A Time-Varying Variance Decomposition for Stock Returns

Federico Nardari
Department of Finance, W. P. Carey School of Business, Arizona State University

John T. Scruggs
Department of Banking and Finance, Terry College of Business, University of Georgia

April 25, 2005

Scruggs acknowledges financial support from a Terry-Sanford Summer Research Award. Correspondence: Federico Nardari, Main Campus, PO Box 873906, Tempe, AZ 85287-3906, tel: (480) 965-7961, fax: (480) 965-8539, e-mail: Federico.Nardari@asu.edu. Correspondence: John T. Scruggs, Department of Banking and Finance, Terry College of Business, University of Georgia, Brooks Hall, Athens, GA, 30602-6253, tel: (706) 542-3649, fax: (706) 542-9434, e-mail: jscruggs@terry.uga.edu.

Abstract

We extend the variance decomposition model of Campbell (1991) to allow for time-varying stock market volatility. Specifically, we introduce a model in which the covariance matrix of the predictive vector autoregression (VAR) follows a multivariate stochastic volatility (MSV) process. This new VAR-MSV model permits the decomposition of unexpected real stock return variance into three time-varying components: variance of news about future dividends, variance of news about future returns, and a covariance term. We develop Bayesian Markov chain Monte Carlo (MCMC) econometric techniques for estimating the VAR-MSV model. These methods are well-suited for estimating models with latent stochastic volatilities, and are not subject to the small-sample biases and unit root problems that plague frequentist estimation of predictive regressions. We report strong evidence that real stock returns are predictable when the dividend-price ratio and a stochastically detrended short-term interest rate are employed as forecasting variables. The time-varying variance of news about future returns is the primary determinant of stock market volatility (both levels and changes). The variance of news about future dividends increased dramatically during the 1973–1974 recession and peaked during the 1980 recession before descending in the 1980s. However, its contribution to stock market volatility was offset by positive correlation between news about future dividends and news about future returns from 1974 to 1984.

Key Words: Variance decomposition; Return predictability; Vector autoregression; Multivariate stochastic volatility; Markov chain Monte Carlo; Gibbs sampling.

JEL Classification: G12 (Asset Pricing); C11 (Bayesian Analysis); C15 (Statistical Simulation Methods; Monte Carlo Methods); C32 (Multiple Equation Time Series Models).

Campbell (1991) asks the fundamental question: What moves the stock market? Intuition derived from a dynamic dividend growth model suggests that stock price movements are caused by either changes in rational expectations of future dividends (i.e., "news about future dividends") or changes in rational expectations of future returns (i.e., "news about future returns"). Campbell (1991) captures this intuition in a clever model where expected returns are time-varying. Campbell's model decomposes the variance of unexpected real stock returns into components associated with uncertainty about future dividends and uncertainty about future real returns. The model is composed of: (1) an asset pricing framework based on a log-linear approximation of a dynamic dividend growth model, and (2) a vector autoregression (VAR) for forecasting stock returns. Campbell finds that the variability and persistence of expected stock returns account for a considerable degree of volatility in unexpected stock returns. He finds that the variance of news about future dividends accounts for only one-third to one-half of the variance of unexpected stock returns.

In this paper we ask a different, but closely related, question: Why does stock market volatility change over time? It is well-established that stock market volatility is time-varying. This fact has important implications for asset pricing models, portfolio allocation decisions, risk management, and derivative security pricing. Previous research on the question has examined both microstructure explanations (e.g., trading volume, margin requirements, bid-ask spreads, etc.) and macroeconomic explanations (e.g., levels and volatilities of macroeconomic variables).[1] A well-known example is Schwert (1989), which reports that stock market volatility is higher during recessions, but is puzzled to find only weak evidence that it is associated with the time-varying volatility of macroeconomic variables (e.g., inflation, monetary growth, industrial production).

[1] Bollerslev, Chou, and Kroner (1992) reviews some of this research.

Motivated by Campbell's approach, this paper is the first to explore whether time-varying stock market volatility is driven by uncertainty about future dividends or uncertainty about future real returns.[2] We introduce an extension of the Campbell (1991) model that allows the covariance matrix of the VAR to vary over time. Specifically, we assume that the covariance matrix of the VAR follows a multivariate stochastic volatility (MSV) process. The VAR-MSV model enables the decomposition of unexpected real stock return variance into three time-varying components: variance of news about future returns, variance of news about future dividends, and a covariance term. This time-varying variance decomposition allows us to investigate how the variance components vary over time, and how they contribute (in levels and changes) to the variance of unexpected returns.

[2] Using intuition similar to Campbell's, Schwert (1989) suggests (p. 1116) that "If macroeconomic data provide information about the volatility of either future expected cash flows or future discount rates, they can help explain why stock return volatility changes over time."

We find that stock market volatility changes over time primarily because of the time-varying volatility of news about future returns. A time-series plot of the variance of news about future returns closely tracks (both in levels and in changes) the variance of unexpected real returns for our post-war sample period (1952:1–2002:12). The share of return variance attributable to news about future returns ranges from 0.71 to 1.20, but is typically less than one (the time-series average is 0.84). Many, but not all, episodes of high variance of news about future returns are associated with recessions.

The other two components of stock market volatility play secondary roles. The variance of news about future dividends is typically much smaller than the variance of news about future returns. However, it did increase dramatically during the recessions of 1973–1975, 1980, and 1981–1982. The share of return variance attributable to news about future dividends ranges from 0.10 to 0.61 (the time-series average is 0.24) in the post-war period, but is always less than the share attributable to news about future returns. When the variance of news about future dividends is at its highest (i.e., 1974–1983), its contribution to return variance is tempered by a covariance term that implies positive correlation between news about future dividends and news about future returns. When this correlation is positive (negative), return variance is less (more) than the sum of the variances of news about future dividends and news about future returns. The share of return variance attributable to this covariance term ranges from 0.21 (when the correlation is negative) to -0.79 (during the 1980 recession when the correlation is positive), but is typically in the neighborhood of zero.
We find that monthly real stock returns are predictable using the dividend-price ratio and the stochastically detrended short-term interest rate. Since the predictive regressors (especially the dividend-price ratio) are highly autocorrelated, it follows that expected real stock returns are highly persistent. When expectations regarding future returns are revised, they are revised for many periods in the future. Since news about future returns is the discounted sum of such revisions in expectations, a small innovation in the expected return can cause a relatively large stock price movement. Likewise, a small increase in the volatility of expected return innovations can contribute to a relatively large increase in stock market volatility. Such changes in the variance of news about future returns are the primary reason that stock market volatility changes over time.

We employ Bayesian Markov chain Monte Carlo (MCMC) methods to analyze the model. These methods are well-suited for making inferences regarding return predictability, and this paper makes an important contribution in that area. Frequentist econometric methods (e.g., OLS) are problematic for three main reasons. First, OLS estimators are biased in small samples when lagged endogenous regressors are employed as predictive variables (see Stambaugh (1999)). Bayesian MCMC methods, in contrast, provide exact small-sample posterior densities for the model's parameters and other functions of interest. Second, the correct asymptotic distribution theory is unclear if a predictive variable is nearly integrated or exhibits a unit root (see Lewellen (2004), Campbell and Yogo (2005) and Torous, Valkanov, and Yan (2005)). Inferences based on Bayesian posterior densities are valid whether or not the predictive variable has a unit root (see Sims (1988) and Sims and Uhlig (1991)). Third, the OLS estimator bias corrections derived in Stambaugh (1999) and Lewellen (2004) apply to univariate predictive regressions. Stambaugh (1999) finds that extending these approaches to the VAR framework with multiple predictive regressors is not straightforward. In contrast, Stambaugh shows that the Bayesian approach can be readily extended to VAR predictive regressions. We implement just such an extension.[3]

[3] In related work, Hollifield, Koop, and Li (2003) employ Bayesian MCMC methods to estimate VAR predictive regressions where the VAR errors are homoskedastic.

The Bayesian approach has other attractive features. Models with latent variables (stochastic volatilities in this case) are amenable to estimation using MCMC methods. Using the Bayesian approach, the parameter space is augmented with the stochastic volatilities and the Gibbs sampler algorithm effectively integrates out these nuisance parameters. The model parameters and latent variables are estimated simultaneously, so posterior densities implicitly incorporate estimation error (i.e., parameter uncertainty). Our Bayesian MCMC estimation methodology for estimating SV models builds on techniques developed in Chib, Nardari and Shephard (2002, 2005).

The VAR approach to forecasting asset returns is used in many strands of the asset pricing literature. Following Campbell (1991), many researchers employ a VAR framework coupled with a log-linear approximation of the present value relation to decompose unexpected asset returns. Papers that employ this approach include Campbell (1993), Campbell and Ammer (1993), Campbell (1996), Ammer and Mei (1996), Lamont (1998), Lamont and Polk (2001), Hollifield, Koop, and Li (2003), and Campbell and Vuolteenaho (2004). Hodrick (1992) and Stambaugh (1999) examine the statistical properties of predictive regressions (including VARs) using lagged endogenous regressors. Kandel and Stambaugh (1996), Barberis (2000), and Shanken and Tamayo (2004) use a VAR framework to examine the sensitivity of asset allocations to evidence about return predictability.

The VAR-MSV model introduced in this paper has potential applications in all of these strands of the literature.

The remainder of the paper is organized as follows. Section I derives the VAR-MSV model. Section II discusses our Bayesian MCMC estimation methodology. The more technical aspects are included in the appendices. The data are described in Section III. We discuss our empirical results in Section IV. Section V concludes and suggests directions for future research.

I The Model

A Decomposing Unexpected Stock Returns

We begin by decomposing unexpected real stock returns into components related to changes in rational expectations of future real dividends and future real stock returns. We employ a model derived by Campbell (1991). The model is based on a clever log-linear approximation of a dynamic accounting identity.[4] Let $h_{t+1}$ denote the log real return on a stock held from time $t$ to $t+1$, and let $d_{t+1}$ denote the log real dividend paid during period $t+1$. Campbell (1991) shows that unexpected real stock returns can be decomposed into components related to changes in expectations regarding future dividends and changes in expectations regarding future real returns:

$$h_{t+1} - E_t h_{t+1} = (E_{t+1} - E_t) \sum_{j=0}^{\infty} \rho^j \Delta d_{t+1+j} - (E_{t+1} - E_t) \sum_{j=1}^{\infty} \rho^j h_{t+1+j}. \tag{1}$$

The parameter $\rho$ is the average ratio of the stock price to the sum of the stock price and the dividend. It should be slightly less than one. In our empirical work, we set $\rho = 0.997$ for the 1952:1–2002:12 sample of monthly data. It is clear from this model that an unexpected positive real return shock (i.e., good news) must be caused by either an upward revision in expected dividend growth or a downward revision in expected future real returns.

[4] Campbell derives the equation by taking a first-order Taylor series approximation of the present value relation for a dividend-paying stock. The approximate equation is solved forward, imposing the terminal condition that the log dividend-price ratio does not follow an explosive process. The log-linear approximation technique was originally proposed in Campbell and Shiller (1988a, b).

Let $\eta_{t+1} \equiv h_{t+1} - E_t h_{t+1}$ define the unexpected real stock return in period $t+1$. For notational convenience, we express (1) more simply as:

$$\eta_{t+1} = \eta_{d,t+1} - \eta_{h,t+1}, \tag{2}$$

where $\eta_{d,t+1}$ represents news about future dividends and $\eta_{h,t+1}$ represents news about future real returns.

B VAR Approach to Estimating Unobservable Components

Following Campbell (1991), we employ a VAR model to forecast future real stock returns. Let $z_{t+1}$ denote a $k \times 1$ state vector of variables known at the end of period $t+1$. The real stock return $h_{t+1}$ is the first element in $z_{t+1}$. The remaining elements of $z_{t+1}$ are forecasting variables. We assume that the state vector $z_{t+1}$ follows a first-order VAR process:

$$z_{t+1} = A z_t + w_{t+1} \tag{3}$$

where $z_t$ is the lagged state vector, $A$ is a conforming coefficient matrix, and $w_{t+1}$ is the error vector. In the homoskedastic VAR model employed by Campbell (1991), the error vector is assumed to be distributed multivariate normal, $w_{t+1} \sim N_k(0, \Sigma)$. Multi-period forecasts of the state vector are obtained by matrix multiplication, $E_t z_{t+j} = A^j z_t$.

Let $e1$ denote a $k \times 1$ vector with a 1 in the first row and zeroes in the others. $e1$ is designed to pick the first element from a state or error vector. For example, $e1' z_{t+1} = h_{t+1}$ picks the real stock return from the state vector, and $e1' w_{t+1} = \eta_{t+1} = h_{t+1} - E_t h_{t+1}$ picks the unexpected stock return from the error vector.

Campbell (1991) combines the first-order VAR forecasting model with the previously described asset pricing framework to decompose unexpected real stock returns into news about future returns and news about future dividends. It follows from (1) that the news about future returns (i.e., the discounted sum of revisions in expected real returns) can be written as

$$\eta_{h,t+1} = (E_{t+1} - E_t) \sum_{j=1}^{\infty} \rho^j h_{t+1+j} = e1' \sum_{j=1}^{\infty} \rho^j A^j w_{t+1} = e1' \rho A (I - \rho A)^{-1} w_{t+1} = \lambda' w_{t+1} \tag{4}$$

where $\lambda' = e1' \rho A (I - \rho A)^{-1}$ is a nonlinear function of the VAR coefficients. Since $\eta_{t+1} = e1' w_{t+1}$, it follows from the error decomposition in (2) that

$$\eta_{d,t+1} = (e1' + \lambda') w_{t+1}. \tag{5}$$

The vectors $\lambda$ and $(e1 + \lambda)$ map the VAR innovations into news about future returns and news about future dividends, respectively.

It should be emphasized that Campbell (1991) doesn't explicitly model dividend growth or news about future dividends. The term $\eta_{d,t+1}$, as defined in (5), is residual in nature. Although Campbell refers to this term as news about future dividends, it might reasonably include a noise component. Campbell doesn't discuss the possible contribution of irrational mispricing or noise trading (i.e., trading for reasons unrelated to fundamentals) to unexpected stock returns. It's not clear whether predictable variation in real stock returns tracks rational variation in the market risk premium or is due to predictable corrections of irrational mispricing (e.g., mean reversion). The model in (1) is based on a dynamic accounting identity, not an economic model of rational investor behavior. Distinguishing between rational and irrational stories about return predictability is an important pursuit, but it is not the focus of this paper.[5]

[5] Shanken and Tamayo (2004) investigates the relative roles played by risk and mispricing in explaining stock return predictability.

In his conclusion, Campbell emphasizes several caveats that apply equally to this paper. In particular, he cautions that the return decomposition cannot be given an unambiguous structural interpretation. Since the variables in the VAR are determined simultaneously, it is probably overly simplistic to conclude that news about future returns and news about future dividends determine unexpected stock returns.
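To make the mapping in (4) and (5) concrete, the following is a minimal Python sketch rather than the authors' code. It computes $\lambda' = e1' \rho A (I - \rho A)^{-1}$ for a hypothetical two-variable VAR and splits one VAR innovation into news about future returns and news about future dividends. The coefficient matrix `A_demo`, the innovation vector, and all function names are illustrative assumptions; only $\rho = 0.997$ is taken from the text.

```python
# Sketch of equations (4) and (5): map a VAR innovation w into news about
# future returns (eta_h) and news about future dividends (eta_d).
# All numerical inputs below are made-up illustrative values.

def mat_mul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][m] * Y[m][j] for m in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def mat_inv(M):
    """Inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [list(map(float, M[i])) + [1.0 if i == j else 0.0 for j in range(n)]
           for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def news_vector(A, rho):
    """Return lambda' = e1' rho A (I - rho A)^{-1} as a list of length k."""
    k = len(A)
    rhoA = [[rho * a for a in row] for row in A]
    i_minus = [[(1.0 if i == j else 0.0) - rhoA[i][j] for j in range(k)]
               for i in range(k)]
    M = mat_mul(rhoA, mat_inv(i_minus))
    return M[0]  # e1' selects the first row

def decompose(A, rho, w):
    """Return (eta, eta_d, eta_h) for one innovation vector w."""
    lam = news_vector(A, rho)
    eta_h = sum(l * x for l, x in zip(lam, w))  # eq. (4): lambda' w
    eta = w[0]                                   # e1' w, the unexpected return
    eta_d = eta + eta_h                          # eq. (5): (e1' + lambda') w
    return eta, eta_d, eta_h

# Hypothetical system: real return plus one highly persistent predictor.
A_demo = [[0.05, 0.30],
          [0.00, 0.95]]
eta, eta_d, eta_h = decompose(A_demo, rho=0.997, w=[0.02, -0.001])
print(eta, eta_d, eta_h)
```

Because the second variable in `A_demo` is highly persistent, its entry in $\lambda$ is large relative to its VAR coefficient, which is the amplification mechanism described in the introduction.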

C Multivariate Stochastic Volatility

Campbell (1991) assumes that the error vector in (3) is homoskedastic (i.e., $w_{t+1} \sim N_k(0, \Sigma)$). A distinctive feature of our model is the multivariate stochastic volatility (MSV) specification for the covariance matrix $\Sigma_{t+1}$. In extending the VAR model to the heteroskedastic case, we would like a specification for $\Sigma_{t+1}$ that is sufficiently rich to allow for non-zero time-varying correlations between VAR errors. However, our methodology for estimating stochastic volatility processes, discussed in Section II below, assumes that the errors are orthogonal. Thus, we need to transform the covariance matrix $\Sigma_{t+1}$. This is not an obstacle. Because $\Sigma_{t+1}$ is a positive definite symmetric matrix, there exists a unique triangular factorization $\Sigma_{t+1} = B V_{t+1} B'$ where $B$ is a $k \times k$ lower triangular matrix with unit elements on the main diagonal,

$$B = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ b_{21} & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ b_{k1} & b_{k2} & \cdots & 1 \end{bmatrix}, \tag{6}$$

and $V_{t+1}$ is a $k \times k$ diagonal matrix with positive elements on the main diagonal.[6] With this transformation, the VAR system in (3) can be rewritten

$$z_{t+1} = A z_t + B V_{t+1}^{1/2} \epsilon_{t+1} \tag{7}$$

where $\epsilon_{t+1} \sim N_k(0, I_k)$ and $V_{t+1}$ is a diagonal matrix of stochastic variances. With this specification, the VAR errors are permitted to have non-zero, time-varying contemporaneous correlations.

[6] Section 4.4 of Hamilton (1994) discusses the triangular factorization.

Let $v_{j,t+1}$ denote the log of the $j$th element of $V_{t+1}$, such that

$$V_{t+1} = \mathrm{diag}\{\exp(v_{1,t+1}), \ldots, \exp(v_{k,t+1})\}. \tag{8}$$

We assume that each of the log-variances in (8) follows an independent autoregressive (AR) process,

$$v_{j,t+1} = \mu_j + \phi_j (v_{j,t} - \mu_j) + \sigma_j \zeta_{j,t+1}, \tag{9}$$

where $\zeta_{j,t+1} \sim N(0, 1)$ and $E[\zeta_{t+1} \zeta_{t+1}'] = I_k$. Stochastic volatility (SV) models such as this offer an attractive alternative to models from the GARCH family. The chief difference between a GARCH model and (9) is the variance innovation term $\sigma_j \zeta_{j,t+1}$ in the SV model. The variance innovation in a GARCH model is typically the lagged squared innovation from the return process. SV models permit innovations in both the return and variance processes. Taylor (1994) and Ghysels, Harvey, and Renault (1996) review SV models such as (9). Articles that develop econometric techniques for estimating SV models include Taylor (1986), Harvey, Ruiz, and Shephard (1994), Jacquier, Polson, and Rossi (1994), Kim, Shephard, and Chib (1998), and Chib, Nardari and Shephard (2002, 2005).

D Variance Decomposition

From (1), the variance of unexpected real returns can be decomposed into three terms,

$$\mathrm{Var}(\eta_{t+1}) = \mathrm{Var}(\eta_{d,t+1} - \eta_{h,t+1}) = \mathrm{Var}(\eta_{d,t+1}) - 2\,\mathrm{Cov}(\eta_{d,t+1}, \eta_{h,t+1}) + \mathrm{Var}(\eta_{h,t+1}). \tag{10}$$

Uncertainty about returns in period $t+1$ can be expressed as

$$\mathrm{Var}(\eta_{h,t+1}) = \lambda' \Sigma_{t+1} \lambda. \tag{11}$$

Likewise, uncertainty about dividends in period $t+1$ can be expressed as

$$\mathrm{Var}(\eta_{d,t+1}) = (e1' + \lambda') \Sigma_{t+1} (e1 + \lambda). \tag{12}$$

And, the covariance is

$$\mathrm{Cov}(\eta_{d,t+1}, \eta_{h,t+1}) = (e1' + \lambda') \Sigma_{t+1} \lambda. \tag{13}$$

Campbell (1991) reports ratios of the unconditional versions of these three terms to the unconditional variance of real returns. We plot corresponding time-varying ratios involving these three terms. The denominator of each ratio is the variance of the unexpected real return, $\mathrm{Var}(\eta_{t+1}) = \Sigma_{11,t+1} = e1' \Sigma_{t+1} e1$. By construction, these three ratios sum to one at each point in time.

The difference between the shares of $\mathrm{Var}(\eta_{t+1})$ attributable to $\mathrm{Var}(\eta_{d,t+1})$ and $\mathrm{Var}(\eta_{h,t+1})$ is an intertemporal constant. Given the assumption that $\Sigma_{t+1} = B V_{t+1} B'$, this is a mechanical result:

$$\frac{\mathrm{Var}(\eta_{d,t+1}) - \mathrm{Var}(\eta_{h,t+1})}{\mathrm{Var}(\eta_{t+1})} = \frac{(e1' + \lambda') \Sigma_{t+1} (e1 + \lambda) - \lambda' \Sigma_{t+1} \lambda}{e1' \Sigma_{t+1} e1} = 1 + 2\lambda_1 + 2\lambda_2 b_{21} + 2\lambda_3 b_{31}. \tag{14}$$

$\mathrm{Var}(\eta_{t+1})$ and $\mathrm{Var}(\eta_{h,t+1})$ are estimated directly. Recall from equations (2) and (5) that $\eta_d$ is residual in nature. It follows that $\mathrm{Var}(\eta_{d,t+1})$ and $\mathrm{Cov}(\eta_{d,t+1}, \eta_{h,t+1})$ are also somewhat residual in nature, and are estimated indirectly conditional on the variance decomposition in (10).

II Bayesian Estimation

OLS estimation of VAR predictive regressions is problematic for several reasons. Employing lagged endogenous regressors, especially highly autocorrelated ones, can induce finite-sample biases in predictive regressions.[7] This is because return innovations are correlated with innovations in the predictive regressors. Stambaugh (1999) derives the finite-sample properties of the OLS estimator in this case and proposes a correction for the bias. However, the validity of the correction depends critically on the assumed stationarity of the predictive variable (see Lewellen (2004) and Torous, Valkanov, and Yan (2005)). For many well-known predictive variables, this is a heroic assumption. If the predictive variable's order of integration is uncertain, then the asymptotic distribution of the OLS estimator is of non-standard form and traditional inferential tools (e.g., t-tests and p-values) are invalid (see Campbell and Yogo (2005) and Torous, Valkanov, and Yan (2005)). Furthermore, the bias corrections suggested in Stambaugh (1999) and Lewellen (2004) are for univariate predictive regressions. As discussed in Stambaugh (1999), the extension to models with multiple predictive variables is not straightforward.

We employ Bayesian Markov chain Monte Carlo (MCMC) methods to estimate the model. Unlike frequentist methods, Bayesian methods treat the parameters as random variables given a likelihood function and fixed data.
The Bayesian approach is natural in this setting because the regressors are stochastic rather than deterministic (i.e., fixed in repeated samples).

The Bayesian approach to estimating predictive regressions has many attractive features. Bayesian methods deliver exact finite-sample posterior densities for both parameters and features of interest. Inferences made from posterior densities are valid even if the predictive regressor is nearly integrated or exhibits a unit root (see Sims (1988) and Sims and Uhlig (1991)). And, as shown in Stambaugh (1999), the Bayesian approach can be readily extended to models with multiple predictive regressors. We implement such an extension for the VAR forecasting model.

[7] See Mankiw and Shapiro (1986), Nelson and Kim (1993), and Stambaugh (1999) for detailed discussions.

Bayesian estimation requires three elements: the data, a likelihood function dictated by the model, and prior densities for the model's parameters. For illustration of the general principle, let $D$ denote the data, $p(D \mid \psi)$ denote the likelihood function, and $\pi(\psi)$ denote the prior density for the parameter set $\psi$. In Bayesian analysis, the object of interest is the joint posterior density of the parameters given the data, $\pi(\psi \mid D)$. Following Bayes' rule, the joint posterior density for the parameters is proportional to the product of the likelihood function and the prior density on the parameters: $\pi(\psi \mid D) \propto p(D \mid \psi)\pi(\psi)$. Bayesian inference is accomplished by analyzing the joint posterior density of the model's parameters, or other functions of interest.

A Likelihood Functions

We consider two models: the homoskedastic VAR model in equation (3), and the VAR-MSV model in (7). For notational convenience, let $z = \{z_1, \ldots, z_T\}$ denote the dependent variables of the VAR systems in (3) and (7).

A.1 Homoskedastic VAR Model

Let $\psi_H = \{A, \Sigma\}$ denote the parameters of the homoskedastic VAR model in (3). The likelihood function (i.e., sampling density) for the homoskedastic VAR model is

$$p(z \mid \psi_H) = \prod_{t=1}^{T} p(z_t \mid \psi_H, z_{t-1}) = \prod_{t=1}^{T} N_k(z_t \mid A z_{t-1}, \Sigma) \tag{15}$$

where $N_k$ denotes a $k$-variate Normal density. Appendix A describes the probability distribution functions and notation used in this paper.
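As an illustration of how the sampling density in (15) can be evaluated in practice, the sketch below computes its logarithm for given values of $A$ and $\Sigma$. It is a minimal example under stated assumptions, not the authors' implementation: the function names are hypothetical, the data are passed as a list whose first entry `z[0]` plays the role of the initial condition $z_0$, and a Cholesky factor of $\Sigma$ is used (a common numerical choice, not something the text prescribes) to obtain both the log determinant and the quadratic form.

```python
import math

def cholesky(S):
    """Lower-triangular L with L L' = S, for symmetric positive definite S."""
    k = len(S)
    L = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L

def var_loglik(z, A, Sigma):
    """Log of eq. (15): sum over t = 1..T of log N_k(z_t | A z_{t-1}, Sigma).

    z is a list of T + 1 state vectors; z[0] is treated as the observed
    initial condition.
    """
    k = len(A)
    L = cholesky(Sigma)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(k))
    total = 0.0
    for t in range(1, len(z)):
        # One-step-ahead forecast error w_t = z_t - A z_{t-1}
        w = [z[t][i] - sum(A[i][j] * z[t - 1][j] for j in range(k))
             for i in range(k)]
        # Solve L y = w by forward substitution, so w' Sigma^{-1} w = y'y
        y = []
        for i in range(k):
            y.append((w[i] - sum(L[i][m] * y[m] for m in range(i))) / L[i][i])
        quad = sum(v * v for v in y)
        total += -0.5 * (k * math.log(2.0 * math.pi) + log_det + quad)
    return total
```

Working with the log of (15) avoids numerical underflow when $T$ is large, as it is for the monthly sample used here.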

A.2 VAR-MSV Model

The parameters of the VAR-MSV model in (7) are the VAR matrix $A$ and the lower triangular matrix $B$. The latent variances in the diagonal matrix $V_{t+1}$ are generated by the $k$ stochastic volatility processes described in (9). Let $\theta_j = \{\mu_j, \phi_j, \sigma_j\}$ denote the parameters of the $j$th stochastic volatility process. And, let $\psi_{MSV} = \{A, B, \theta_1, \ldots, \theta_k\}$ denote the full parameter set for the VAR-MSV model. The likelihood function for the VAR-MSV model in (7) is

$$p(z \mid \psi_{MSV}) = \prod_{t=1}^{T} \int p(z_t \mid \psi_{MSV}, z_{t-1}, v_t)\, p(v_t \mid \psi_{MSV})\, dv_t = \prod_{t=1}^{T} \int N_k(z_t \mid A z_{t-1}, B V_t B')\, p(v_t \mid \theta, F_{t-1})\, dv_t \tag{16}$$

where $F_{t-1} = \{v_1, \ldots, v_{t-1}\}$ denotes the history of the latent log-variance processes through $t-1$. Note that the parameters $\theta = \{\theta_1, \ldots, \theta_k\}$ do not enter (16) directly. Rather, the likelihood function is evaluated by integrating over the latent log-variances $v_t = [v_{1,t} \cdots v_{k,t}]'$. Estimation of this model using frequentist methods (e.g., maximum likelihood) is impractical because the integral in (16) is analytically intractable. However, the model is amenable to estimation using Bayesian MCMC methods. The Gibbs sampler, discussed below in Section II.C, effectively integrates over the latent log-variances to obtain the joint posterior density of the model's parameters and functions of interest (e.g., variance components).

B Priors

In Bayesian econometrics, prior densities summarize the researcher's prior beliefs about the model's parameters. Following Hollifield, Koop, and Li (2003), we consider four priors:

Prior 1: An uninformative base prior.
Prior 2: Base prior with covariance stationarity imposed.
Prior 3: Base prior with covariance stationarity imposed and stochastic initial condition.
Prior 4: Prior 3 with an additional "features of interest" prior.
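Before turning to the individual priors, it may help to see the latent-variance structure behind the likelihood in (16) in simulated form. The sketch below is illustrative only and is not the authors' estimation code. It simulates the log-variance processes in (9), builds $\Sigma_t = B V_t B'$ from (6) and (8), and computes the three variance shares implied by (10) to (13). The vector `lam` stands in for $\lambda$ and is taken as given; it, the matrix `B_demo`, the SV parameters, and every function name are hypothetical choices made for the example.

```python
import math
import random

def simulate_log_variances(mu, phi, sigma, T, rng):
    """Simulate k independent AR(1) log-variance paths as in eq. (9)."""
    k = len(mu)
    v = [list(mu)]  # start at the unconditional means (an illustrative choice)
    for _ in range(T - 1):
        prev = v[-1]
        v.append([mu[j] + phi[j] * (prev[j] - mu[j])
                  + sigma[j] * rng.gauss(0.0, 1.0) for j in range(k)])
    return v

def conditional_cov(B, v_t):
    """Sigma_t = B V_t B' with V_t = diag(exp(v_t)), eqs. (6) and (8)."""
    k = len(B)
    V = [math.exp(x) for x in v_t]
    return [[sum(B[i][m] * V[m] * B[j][m] for m in range(k))
             for j in range(k)] for i in range(k)]

def bilinear(x, S, y):
    """Bilinear form x' S y."""
    return sum(x[i] * S[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))

def variance_shares(lam, Sigma):
    """Shares of Var(eta) from eqs. (10)-(13): dividends, returns, covariance."""
    k = len(lam)
    e1 = [1.0] + [0.0] * (k - 1)
    d = [e1[i] + lam[i] for i in range(k)]       # e1 + lambda
    total = bilinear(e1, Sigma, e1)              # Var(eta) = Sigma_11
    share_d = bilinear(d, Sigma, d) / total      # eq. (12) over Var(eta)
    share_h = bilinear(lam, Sigma, lam) / total  # eq. (11) over Var(eta)
    share_cov = -2.0 * bilinear(d, Sigma, lam) / total  # covariance term
    return share_d, share_h, share_cov

# Hypothetical two-variable example.
rng = random.Random(0)
B_demo = [[1.0, 0.0], [-0.9, 1.0]]
lam_demo = [0.05, 0.9]
path = simulate_log_variances(mu=[-6.0, -12.0], phi=[0.95, 0.98],
                              sigma=[0.2, 0.15], T=5, rng=rng)
for v_t in path:
    print(variance_shares(lam_demo, conditional_cov(B_demo, v_t)))
```

In this sketch the three shares sum to one at every date, and the difference between the dividend and return shares stays fixed at $1 + 2\lambda_1 + 2\lambda_2 b_{21}$, the two-variable analogue of (14), even though each share moves with the simulated variances.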

B.1 Base Prior

For the homoskedastic VAR model, we assume an independent Normal-Wishart prior. We will refer to this as the base prior for the homoskedastic VAR model. Specifically, we assume that the priors are independent and can be factored

$$\pi(A, \Sigma) = \pi(A)\pi(\Sigma), \tag{17}$$

where

$$\pi(A) = N_{k^2}(a \mid a_0, A_0^{-1}) \tag{18}$$

$$\pi(\Sigma^{-1}) = W(\Sigma^{-1} \mid S_0, s_0). \tag{19}$$

Note that $a = \mathrm{vec}(A')$ where the $\mathrm{vec}(\cdot)$ operator stacks the columns of its argument. We assume that $\Sigma^{-1}$ is distributed Wishart. This is equivalent to assuming that $\Sigma$ is distributed inverse-Wishart. Appendix A describes the probability distributions and notation used in the paper. The researcher's prior beliefs are reflected in the choice of hyperparameters $a_0$, $A_0$, $s_0$ and $S_0$. The Normal-Wishart prior converges to the diffuse or flat prior employed by Stambaugh (1999) as $a_0$, $A_0$, $s_0$ and $S_0$ all converge to zero. Our choice of hyperparameters for the base prior corresponds to relatively uninformative prior beliefs regarding the parameters. We discuss our choice of hyperparameters further with the empirical results.

For the VAR-MSV model, we assume the priors are independent and can be factored

$$\pi(A, B, \theta_1, \ldots, \theta_k) = \pi(A)\pi(B)\pi(\theta_1) \cdots \pi(\theta_k). \tag{20}$$

We assume $\pi(A)$ is the same as in the homoskedastic VAR model (see (18)). Let $b$ denote the $(k(k-1)/2) \times 1$ vector containing the free elements of $B$. The base prior for $b$ is also Normal:

$$\pi(b) = N_{k(k-1)/2}(b \mid b_0, B_0^{-1}). \tag{21}$$

The priors $\pi(\theta_j) = \pi(\mu_j)\pi(\phi_j)\pi(\sigma_j)$ are chosen to be relatively uninformative regarding the dynamics of the SV processes described in (9). For $\mu_j$, the unconditional log variance, we assume:

$$\pi(\mu_j) = N(\mu_j \mid m_{0j}, M_{0j}). \tag{22}$$

$\phi_j$ reflects the persistence of the SV process. We constrain $\phi_j$ to the region of stationarity (i.e., $|\phi_j| < 1$) by invoking the change of variable $\phi_j = 2\phi_j^* - 1$ and assuming that $\phi_j^*$ is distributed Beta:

$$\pi(\phi_j^*) = B(\phi_j^* \mid \phi_{0j}^{(1)}, \phi_{0j}^{(2)}). \tag{23}$$

We assume that $\sigma_j$, the volatility of the SV process, is distributed Inverse-Gamma:

$$\pi(\sigma_j) = IG(\sigma_j \mid \alpha_{0j}, \beta_{0j}). \tag{24}$$

The choice of hyperparameters $\{m_{0j}, M_{0j}, \phi_{0j}^{(1)}, \phi_{0j}^{(2)}, \alpha_{0j}, \beta_{0j}\}$ is discussed with the empirical results.

B.2 Base Priors with Covariance Stationarity

When lagged endogenous regressors are employed in predictive regressions (including VARs), inference is very sensitive to high autocorrelation, and especially a unit root, in a predictive variable. If the researcher has prior beliefs regarding the stationarity of the predictive variable (covariance stationarity in the case of a VAR), then inference can be enhanced by incorporating those prior beliefs. The VAR is covariance stationary if the eigenvalues of $A$ lie inside the unit circle.[8] The base prior does not restrict $A$ to the region of covariance stationarity. As demonstrated in Hollifield, Koop, and Li (2003), sampling $A$ from outside the region of covariance stationarity has dramatic (perhaps pathological) effects on posterior densities.

[8] See Hamilton (1994) p. 259.

Let $S \equiv \{A \in \mathbb{R}^{k \times k} \text{ such that the eigenvalues of } A \text{ lie inside the unit circle}\}$ denote the region of covariance stationarity, and let $I(A \in S)$ be the corresponding indicator variable. Following Stambaugh (1999) and Hollifield, Koop, and Li (2003), we entertain prior beliefs regarding the covariance stationarity of the VAR. Under this prior belief, $A$ is constrained to the region of covariance stationarity and (18) is rewritten as:

$$\pi(\mathrm{vec}(A')) = N_{k^2}(\mathrm{vec}(A') \mid a_0, A_0^{-1})\, I(A \in S). \tag{25}$$
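One simple way to draw from a truncated prior such as (25) is rejection sampling: draw from the unrestricted Normal and keep the draw only if $A$ lies in $S$. The sketch below illustrates the idea and should not be read as the sampler used in the paper. To stay within the Python standard library it checks stationarity by repeated squaring of $A$ (the powers of $A$ vanish exactly when all eigenvalues lie inside the unit circle) instead of calling an eigenvalue routine, and it assumes independent prior entries with a common standard deviation in place of a general precision matrix $A_0$. All function and argument names are hypothetical.

```python
import random

def is_covariance_stationary(A, max_squarings=60):
    """True if the powers of A shrink to zero, i.e. every eigenvalue of A
    lies strictly inside the unit circle. Checked by repeated squaring."""
    k = len(A)
    P = [list(map(float, row)) for row in A]
    for _ in range(max_squarings):
        size = max(abs(x) for row in P for x in row)
        if size < 1e-12:
            return True    # A^(2^m) has vanished: spectral radius below one
        if size > 1e100:
            return False   # powers explode: treated as spectral radius above one
        P = [[sum(P[i][m] * P[m][j] for m in range(k)) for j in range(k)]
             for i in range(k)]
    return False           # undecided (e.g. a unit root): treated as outside S

def draw_stationary_prior(mean, sd, rng, max_tries=10000):
    """Rejection sampler for a Normal prior on A truncated to S, in the
    spirit of eq. (25). Independent entries with a common standard
    deviation are a simplifying assumption of this sketch."""
    k = len(mean)
    for _ in range(max_tries):
        A = [[rng.gauss(mean[i][j], sd) for j in range(k)] for i in range(k)]
        if is_covariance_stationary(A):
            return A
    raise RuntimeError("no stationary draw found; prior mass in S may be tiny")

# Hypothetical prior centered on a persistent second variable.
rng = random.Random(0)
A_draw = draw_stationary_prior([[0.0, 0.0], [0.0, 0.9]], sd=0.2, rng=rng)
print(A_draw)
```

Rejection sampling is only practical when the untruncated prior places non-negligible mass on $S$; a prior centered near a unit root would reject most draws.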

The covariance stationarity prior on A applies to both the homoskedastic VAR and VAR-MSV models. Priors on all other parameters are identical to the base prior. B.3 Base Prior with Covariance Stationarity and Stochastic Initial Condition In the standard regression framework, the initial condition z 0 is assumed to be deterministic. In that case, it would be appropriate to work with the priors and likelihood functions discussed above. For the application we consider, the assumption of deterministic regressors is clearly invalid. Following Stambaugh (1999) and Hollifield, Koop, and Li (2003), we consider estimation of the VAR models under the alternative assumption that the initial condition z 0 is stochastic rather than deterministic. First, consider the homoskedastic VAR model. Since the initial condition is stochastic, z 0 can be treated as a parameter with its own prior density. Our prior on z 0 is a function of the other parameters of the model. Given ψ H and assuming covariance stationarity, we assume that z 0 is drawn from its unconditional or steady-state density, p(z 0 ψ H ) = N k (z 0 0, Ω 0 ), (26) where vec(ω 0 ) = [I (A A)] 1 vec(σ). (27) Equations (26) and (27) imply that the vector z 0 is drawn from a multivariate Normal density with mean zero (since we work with demeaned data) and unconditional covariance matrix Ω 0. Stambaugh (1999) distinguishes between conditional and exact likelihood functions. In Stambaugh s terminology, the likelihood function in (15) is conditional on observing the initial condition z 0. When the initial condition is stochastic rather than deterministic, Stambaugh (1999) employs an exact likelihood function which corresponds to the product of the conditional likelihood in (15) and the density in (26). Hollifield, Koop, and Li (2003) treat the density in (26) as a prior. So, depending on one s perspective, the density in (26) could be considered a prior or a component of the likelihood function. 
Since the posterior density is proportional to the product of the likelihood function and the priors, the distinction is immaterial. The inference will be the same.

Next, consider the VAR-MSV model. Given $\psi_{MSV}$, we assume that $z_0$ is drawn from its unconditional density,

$$p(z_0 \mid \psi_{MSV}) = N_k(z_0 \mid 0, \Omega_0), \tag{28}$$

where

$$\mathrm{vec}(\Omega_0) = [I - (A \otimes A)]^{-1} \mathrm{vec}(B V_0 B'), \tag{29}$$

$$V_0 = \mathrm{diag}\left\{ \exp\left(\mu_1 + \frac{1}{2}\frac{\sigma_1^2}{1 - \phi_1^2}\right), \ldots, \exp\left(\mu_k + \frac{1}{2}\frac{\sigma_k^2}{1 - \phi_k^2}\right) \right\}. \tag{30}$$

The exponential term $\exp\left(\mu_j + \sigma_j^2 / (2(1 - \phi_j^2))\right)$ in (30) is the unconditional or steady-state variance of the $j$th independent SV process described in (9). It follows that $B V_0 B'$ is the unconditional covariance matrix for the VAR system error vector.

B.4 Features of Interest Prior

Hollifield, Koop, and Li (2003) suggest that parameters for a reduced-form VAR such as (3) are difficult to interpret. This makes eliciting prior beliefs regarding the model's parameters equally difficult. However, researchers may have prior beliefs regarding features of interest. In the present model, we are interested in the terms of the variance decomposition. These features of interest are highly non-linear functions of the model's parameters. We discuss technical aspects of the features of interest prior in Appendix B.

C Prior-Posterior Analysis

Combining the likelihood function and the priors via Bayes' rule, one obtains the joint posterior density of the model's parameters given the data. Given the form of the likelihood functions in (15) and, especially, in (16), and given any of the priors described above, the joint posterior cannot be sampled directly. Fortunately, the Gibbs sampling method bypasses the computation of the likelihood function and of the joint posterior density. Rather, the Gibbs sampler algorithm generates draws from the conditional distribution of each block of parameters (i.e., the distribution of each block given the data, the prior and the other blocks of parameters). The draws from these conditional densities eventually converge to draws from the joint posterior density. Inference is based on summary statistics (e.g., mean, standard deviation, etc.)
describing the distribution of the sample draws of the model's parameters, and of functions

thereof.

Bayesian estimation of the homoskedastic VAR model is discussed in detail in Hollifield, Koop, and Li (2003). Our Bayesian treatment of the homoskedastic VAR model is essentially the same, and we refer the reader to that paper for details.

For the VAR-MSV model, note that it is the latent log-variances ($v_t$), and not the parameters of the SV processes ($\theta$), that appear in the likelihood function (16). Bayesian MCMC methods treat latent variables as parameters and, thus, produce a sample from their posterior distribution as well. The other parameters are then drawn conditioning not only on the data, but also on the simulated values of the latent variables. This approach is known as data augmentation. Conditioning on the latent variables, the sampling density of $z_t$ can be written

$$z_t \mid A, B, z_{t-1}, v_t \sim N_k(A z_{t-1}, B V_t B'). \tag{31}$$

Let $y_{jt} = \ln\left[\left(b_j^{-1}(z_t - A z_{t-1})\right)^2\right]$ and $\{v_j\} = (v_1, \ldots, v_k)$, where $v_j = (v_{j1}, \ldots, v_{jT})'$ denotes the time series of the $j$th latent SV process. In similar fashion, let $y_j = (y_{j1}, \ldots, y_{jT})'$. Draws from the (augmented) posterior density are obtained by cycling through the following steps.

1. Initialize $B$, $z_0$ and $v_j$ for $j = 1, \ldots, k$.
2. Sample $A \mid z, \{v_j\}, B, z_0$.
3. Sample $B \mid z, \{v_j\}, A, z_0$.
4. Compute $y_j$ and sample $\theta_j$ and $\{v_j\}$ by repeating the following steps for $j = 1, \ldots, k$:
   (a) Draw $\theta_j$ from $\theta_j \mid y_j$.
   (b) Draw $v_j$ from $v_j \mid y_j, \theta_j$.
5. Sample $z_0 \mid B, z, \{v_j\}, A$.
6. Go to step 2 and repeat.

Under Prior 1 and Prior 2, step 5 and any conditioning on $z_0$ in the other steps are omitted. Details on each step of the simulation are provided in Appendix B.
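The transformation used in step 4 can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: $b_j^{-1}$ is taken to be the $j$th row of $B^{-1}$, the data are stored as a $T \times k$ array whose rows are $z_1, \ldots, z_T$, and the function name is hypothetical.

```python
import numpy as np

def log_squared_shocks(z, z0, A, B):
    """Return the T x k array y with y[t, j] = ln((b_j^{-1} (z_t - A z_{t-1}))^2).

    z  : (T, k) array of demeaned observations z_1, ..., z_T
    z0 : (k,) initial condition
    A  : (k, k) VAR coefficient matrix
    B  : (k, k) invertible matrix (assumed here to map orthogonal shocks to VAR errors)
    """
    z = np.asarray(z, dtype=float)
    z_lag = np.vstack([z0, z[:-1]])        # rows are z_0, ..., z_{T-1}
    resid = z - z_lag @ A.T                # VAR errors z_t - A z_{t-1}
    eps = np.linalg.solve(B, resid.T).T    # orthogonalized shocks B^{-1} (z_t - A z_{t-1})
    return np.log(eps ** 2)
```

Column $j$ of the result is the series $y_j$ passed to steps 4(a) and 4(b). In practice a small positive offset is sometimes added inside the logarithm to guard against shocks that are numerically zero; that refinement is omitted here.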

III Data

We closely follow Campbell (1991) in constructing the data set. The vector $z_t = [h_t, (D/P)_t, rb_t]'$ includes three elements: a real stock index return, a dividend-price ratio, and a stochastically detrended short-term interest rate. The dividend-price ratio (or dividend yield) and relative T-bill rate are chosen for their well-documented ability to forecast stock returns. Fama and Schwert (1977) first document the forecasting power of short-term interest rates. Shiller (1984), Fama and French (1988), and Campbell and Shiller (1988a) were among the first papers to report the significant forecasting power of the dividend-price ratio.

These results are not without controversy. Mankiw and Shapiro (1986), Nelson and Kim (1993), and Stambaugh (1999) find that OLS estimates of the dividend-price ratio's predictive power are biased in small samples (i.e., forecasting power is overstated) since the dividend-price ratio (a lagged endogenous regressor) is highly autocorrelated. Stambaugh (1999) and Lewellen (2004) find that incorporating prior information about the stationarity of the predictive regressor strengthens evidence of predictability. Ang and Bekaert (2004) report that the ability of the dividend-price ratio to predict returns is most visible at short horizons with the short rate as an additional regressor.$^{9}$ Given the state of the literature, we believe that our choice of predictive regressors for the VAR is reasonable, and that estimation of the model using Bayesian MCMC methods may provide useful insights on the predictability debate.

We study the period 1952:1–2002:12 (612 monthly observations). We choose this period because of evidence that the data generating process was fundamentally different prior to 1952. As noted by Schwert (1989) and Kim, Nelson, and Startz (1991), stock returns in the pre-war period (particularly the depression years) were substantially more volatile than in any period since 1952.
Furthermore, interest rates were artificially smooth prior to the 1951 Fed-Treasury accord.

$^{9}$ Other recent papers in the debate include Goyal and Welch (2003) and Campbell and Yogo (2005).

We construct a real total stock return index by deflating the CRSP monthly value-weighted index of NYSE stocks (with dividends) using the Consumer Price Index (CPI) obtained from CRSP. The log total return is defined as $h_t = \ln(P_t + d_t) - \ln(P_{t-1})$, where $P_t$ is the real index level at $t$ and $d_t$ is the real dividend paid between $t-1$ and $t$. Real returns are expressed in percent per month.

We compute the dividend-price ratio in the manner of Fama and French (1988). We construct monthly dividend and price index series using the return series (with and without dividends) for

the CRSP monthly value-weighted index of NYSE stocks. The dividend-price ratio is the sum of the dividends paid over the previous twelve months divided by the current level of the stock price index,

$$\left(\frac{D}{P}\right)_t = \frac{1}{P_t} \sum_{j=0}^{11} d_{t-j}.$$

The dividend-price ratio is expressed in percent per annum.

We construct the stochastically detrended short-term interest rate in the manner of Campbell (1991) and Hodrick (1992). The relative bill rate is the one-month Treasury bill yield, $y_{1,t+1}$, less its twelve-month moving average,

$$rb_t = y_{1,t} - \frac{1}{12} \sum_{i=1}^{12} y_{1,t-i}.$$

The Treasury bill data are from Ibbotson & Associates. The relative bill rate is multiplied by 1200 to express it in percent per annum. The time series of $(D/P)_t$ and $rb_t$ are plotted in Figure 1.

Figure 1 goes about here.

IV Empirical Results

A Priors

We assume a prior for the $A$ matrix that contains almost no information. Specifically, we set the hyperparameters $a_0 = 0$ and $A_0^{-1} = 10^{6} I_{k^2}$. This implies that VAR coefficients are normally distributed with mean zero and standard deviation 1000 (i.e., nearly flat). For the homoskedastic VAR model, the hyperparameters for the prior on the $\Sigma$ matrix are $S_0 = I_k$ and $s_0 = 8$. The latter choice is the least informative within the values for which the prior mean of $\Sigma$ exists. Our choices for $A$ and $\Sigma$ are identical to those in Hollifield, Koop, and Li (2003).

For the $B$ matrix in the VAR-MSV model, we set $b_0 = 0$ and $B_0^{-1} = 10\, I_{k(k-1)/2}$. For $\mu_j$, the average log-variance for error shocks, we set $m_{0j} = 2$ and $M_{0j} = 25$ for all $j$. At the prior mean, this implies an average volatility of about 8.3% per month with a standard deviation of over

200%. Again, this prior is very uninformative. For $\phi_j$, we set $\phi_{0j}^{(1)} = 20$ and $\phi_{0j}^{(2)} = 1.5$ so that the distribution has a mean of 0.86 with standard deviation of 0.11, reflecting the expected persistence in the stochastic volatility dynamics. For $\sigma_j$, we set $\alpha_{0j} = 2.39$ and $\beta_{0j} = 0.347$, implying a mean of 0.25 and a standard deviation of 0.4 for the volatility of the log-variance.

B Homoskedastic VAR Models

The Campbell (1991) model allows us to address the question: What moves the stock market? Table I reports summary statistics describing the posterior densities for parameters of the VAR coefficient matrix $A$ and the covariance matrix $\Sigma$ for the homoskedastic VAR model. We report results obtained under the four priors discussed in Section II. Since the posterior densities for $A$ and $\Sigma$ are very robust to our different prior specifications, we discuss them together. Unless otherwise noted, our discussion will focus on the posterior densities obtained under Prior 4 (i.e., the features of interest prior).

We find that the dividend-price ratio and the relative bill rate do forecast future real stock returns. Posterior estimates of $A_{12}$ and $A_{13}$ are 0.312 and −0.599, respectively. The inference is statistically reliable. For the dividend-price ratio, the 90% Bayesian confidence interval for $A_{12}$ is [0.091, 0.548], and only 0.0077 of the posterior density is less than zero.$^{10}$ For the relative bill rate, the 90% Bayesian confidence interval for $A_{13}$ is [−0.837, −0.361], and only 0.0006 of the posterior density is greater than zero. High dividend-price ratios forecast higher than average future real stock returns, and high relative short-term interest rates forecast lower than average future real stock returns.

The parameters in the second and third rows of $A$ indicate that the dividend-price ratio and the relative bill rate would be described well by univariate first-order autoregressions. As expected, the dividend-price ratio is very highly autocorrelated.
The posterior estimate of $A_{22}$ is 0.988 with a 90% Bayesian confidence interval of [0.978, 0.997]. This is where one might reasonably expect the covariance stationarity prior to play an important role. However, posterior estimates of $A_{22}$ are nearly unaffected by the imposition of covariance stationarity. The proportion of the posterior density of $A_{22}$ greater than one ranges from 0.0154 under Prior 1 to 0.0108 under Prior 4. We conclude that the dividend-price ratio is stationary for the 1952–2002 sample period.

$^{10}$ We use the term posterior estimate to denote the mean of the posterior density. The 90% Bayesian confidence interval describes the region between the 5% and 95% quantiles of the posterior density. Alternatively, we could use the standard deviation of the posterior density to describe the precision of the posterior estimate.

It is important to note that our inferences regarding predictability are not sensitive to the existence of

a unit root in the predictive regressors. As discussed in Sims (1988) and Sims and Uhlig (1991), Bayesian posterior densities, and hence inferences, are valid even when regressors are non-stationary. Campbell and Yogo (2005) and Torous, Valkanov, and Yan (2005) show that traditional frequentist inference tools (e.g., t-tests and p-values) may be invalid when regressors are nearly integrated.

Table I goes about here.

The posterior estimates of $\Sigma$ and $R^2$ reported in Table I are also very robust to prior specification. The unconditional variance of unexpected stock returns, $\Sigma_{11}$, is 16.680. As expected, the covariance between unexpected real stock returns and innovations in the dividend-price ratio is strongly negative. The posterior estimate of $\Sigma_{21}$ is −0.618, which corresponds to a contemporaneous correlation of $\rho_{21} = -0.895$. Innovations in the relative bill rate are weakly correlated with unexpected real returns ($\Sigma_{31} = 0.260$, $\rho_{31} = 0.082$) and innovations in the dividend-price ratio ($\Sigma_{32} = 0.010$, $\rho_{32} = 0.077$). Note that the parameters of $\Sigma$ are estimated simultaneously with the parameters of $A$. Posterior estimates of $R^2$ are 0.030, 0.978 and 0.558 for $h_{t+1}$, $(D/P)_{t+1}$ and $rb_{t+1}$, respectively.$^{11}$ Campbell (1991) reported corresponding $R^2$ statistics of 0.024, 0.937 and 0.450 for 1952:1–1988:12.

Posterior densities for the variance decompositions are reported in Table II. We report the shares of unexpected real stock return variance attributable to the variance components defined in (10). It is here that the importance of the covariance stationarity prior becomes evident. Under Prior 1 (i.e., the base prior), the posterior mean and standard deviation are inflated by a few MCMC draws from outside the region of covariance stationarity. In contrast, the median, 5% and 95% quantiles of the posterior densities appear reasonable and are very similar to those obtained under the other priors.
The results for Prior 1 are consistent with the pathological results reported in Hollifield, Koop, and Li (2003).

Table II goes about here.

$^{11}$ For each equation in the VAR, we sample $R^2 = 1 - SSE/SST$ for each iteration of the Gibbs sampler.

We find that most of the variance of unexpected real returns in the post-war period is attributable to news about future real returns. Under Prior 2, posterior estimates of the shares due to $\mathrm{Var}(\eta_d)$, $\mathrm{Var}(\eta_h)$ and $-2\mathrm{Cov}(\eta_d, \eta_h)$ are 0.267, 0.786 and −0.054, respectively. Under Prior 4 (i.e., the features of interest prior), posterior estimates of the shares change very little (0.261, 0.739 and

0.000), but the estimates are more precise (i.e., lower posterior standard deviations). These results are reasonably consistent with Campbell (1991), which reports shares of 0.127, 0.772 and 0.101, respectively, for the 1952:1–1988:12 period. Even when covariance stationarity is imposed (i.e., under Priors 2, 3 and 4), variance decompositions remain sensitive to draws of $A$ near the edge of the region of stationarity. The resulting skewness in the posterior densities of the shares of $\mathrm{Var}(\eta)$ is evident in the differences between the posterior means and medians reported in Table II. Under Prior 4, the posterior medians are 0.236, 0.711 and 0.070, respectively.

Bayesian posterior densities for two other features of interest are also reported in Table II. Sample draws for $\mathrm{Corr}(\eta_d, \eta_h)$ are constructed from the sample draws of the model's other parameters. The posterior mean {median} for $\mathrm{Corr}(\eta_d, \eta_h)$ is 0.068 {−0.090}, but the 90% Bayesian confidence interval [−0.520, 0.468] is relatively wide (i.e., imprecise) and includes zero. Thus, we can offer no conclusive inference regarding the sign of $\mathrm{Corr}(\eta_d, \eta_h)$. Campbell (1991) reports the point estimate $\mathrm{Corr}(\eta_d, \eta_h) = -0.161$ for 1952:1–1988:12, but is also inconclusive regarding the sign (std. error = 0.256).

Following Campbell (1991), we also define a VAR persistence measure. Let

$$P_h \equiv \frac{\sigma(\eta_{h,t+1})}{\sigma(u_{t+1})} = \frac{(\lambda' \Sigma \lambda)^{1/2}}{(e1' A \Sigma A' e1)^{1/2}}, \tag{32}$$

where $\eta_{h,t+1}$ is the previously defined news about future returns, $u_{t+1} = e1' A w_{t+1}$ is the innovation to the expected return, and the $\sigma(\cdot)$ operator gives the standard deviation of its argument. The VAR persistence measure summarizes the economic effect of persistent expected returns: a 1% increase in the expected return causes a $P_h$% decrease in the real stock price index. We report the posterior densities of $P_h$ for the homoskedastic VAR model in Table II. Under Prior 4, the posterior mean {median} is 7.252 {6.844} and the 90% Bayesian confidence interval is [4.076, 11.720].
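The features of interest discussed above can be computed for a single draw of $(A, \Sigma)$ along the following lines. This is a hedged sketch, not the authors' code. It assumes the Campbell (1991) definitions, which are not restated in this section: $\lambda' = e1' \rho A (I - \rho A)^{-1}$, $\eta_h = \lambda' w$, $\eta_d = (e1 + \lambda)' w$, and unexpected return $e1' w = \eta_d - \eta_h$. The function name, the value of the discount coefficient `rho`, and the demonstration matrices are all hypothetical.

```python
import numpy as np

def decompose_return_variance(A, Sigma, rho=0.996):
    """Variance shares, Corr(eta_d, eta_h) and P_h for one draw of (A, Sigma)."""
    k = A.shape[0]
    e1 = np.zeros(k)
    e1[0] = 1.0
    # lambda' = e1' rho A (I - rho A)^{-1}, solved without forming the inverse explicitly.
    lam = np.linalg.solve((np.eye(k) - rho * A).T, rho * (A.T @ e1))
    var_v = e1 @ Sigma @ e1                    # variance of the unexpected return
    var_h = lam @ Sigma @ lam                  # Var(eta_h)
    var_d = (e1 + lam) @ Sigma @ (e1 + lam)    # Var(eta_d)
    cov_dh = (e1 + lam) @ Sigma @ lam          # Cov(eta_d, eta_h)
    a1 = A.T @ e1                              # e1'A written as a vector
    return {
        "share_d": var_d / var_v,
        "share_h": var_h / var_v,
        "share_cov": -2.0 * cov_dh / var_v,
        "corr_dh": cov_dh / np.sqrt(var_d * var_h),
        "P_h": np.sqrt(var_h) / np.sqrt(a1 @ Sigma @ a1),   # equation (32)
    }

# Hypothetical parameter values, chosen only to exercise the function.
A_demo = np.array([[0.05, 0.30, -0.50],
                   [0.00, 0.98, 0.00],
                   [0.00, 0.00, 0.75]])
Sigma_demo = np.array([[16.0, -0.60, -0.25],
                       [-0.60, 0.03, 0.01],
                       [-0.25, 0.01, 0.60]])
result = decompose_return_variance(A_demo, Sigma_demo)
```

By construction the three shares sum to one for every draw. Applying the function to each retained Gibbs draw would give draws of the features themselves. When an eigenvalue of $\rho A$ is close to one, $I - \rho A$ is nearly singular and the shares can become very large in magnitude, which is one way to see the sensitivity to draws near the edge of the stationarity region.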
For comparison, Campbell (1991) reports a point estimate of 5.794 (std. error = 1.469) for 1952:1–1988:12. Note that Campbell (1991) uses the delta method to compute standard errors for these features of interest. The Bayesian MCMC approach produces exact small-sample posterior densities for these features of interest.

Although the predictive regressors explain only a modest fraction of the variance in real stock returns (i.e., $R^2 = 0.030$), we conclude that news about future returns explains the lion's share of the variance in unexpected real returns. How do we reconcile these two seemingly disparate conclusions? Expected real stock returns are very persistent. This characteristic is related to the persistence of the predictive regressors, particularly the dividend-price ratio. Viewed through the

lens of the asset pricing framework, a small innovation in the expected real return can cause a large unexpected stock return (i.e., $P_h = 7.252$). If expected returns were constant (i.e., unforecastable), we would expect all of the variance in unexpected real stock returns to be attributable to news about future dividends. The data do not support this alternative hypothesis. Our results are consistent with previous empirical evidence that aggregate dividend growth is approximately constant (i.e., unforecastable).$^{12}$

C VAR-MSV Models

The VAR-MSV model introduced in this paper allows us to address an additional question: Why does stock market volatility change over time? Given the asset pricing framework and VAR-MSV model for forecasting real stock returns, the variance of unexpected real returns has three time-varying components: variance of news about future returns, variance of news about future dividends, and a covariance term.

We first examine the predictability of real stock returns. Table III reports summary statistics describing the posterior densities for parameters of the VAR-MSV model. Posterior estimates of the $A$ matrix are very robust to prior specification, and are not appreciably different from those reported in Table I for the homoskedastic VAR model. We focus primarily on the posterior densities under Prior 4. The first row of $A$ indicates strong evidence of real stock return predictability. Future real returns are positively related to the dividend-price ratio ($A_{12} = 0.295$) and inversely related to the relative bill rate ($A_{13} = -0.526$). For $A_{12}$, the proportion of the posterior density less than zero ranges from 0.0378 under Prior 1 to 0.0049 under Prior 4. For $A_{13}$, the proportion of the posterior density greater than zero is less than 0.0003 for all four priors.

The dividend-price ratio is highly persistent. The posterior mean of $A_{22}$ is 0.991 with a 90% Bayesian confidence interval of [0.983, 0.998].
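The summary statistics quoted here (posterior mean, the 5% and 95% quantiles, and the proportion of the posterior density on one side of a threshold) are simple functions of the retained Gibbs draws. A minimal sketch, assuming the draws for one parameter are held in a one-dimensional array; the function name and the simulated draws are hypothetical and do not reproduce any number in the tables.

```python
import numpy as np

def summarize_draws(draws, threshold=0.0):
    """Posterior summaries for one parameter from its MCMC draws."""
    draws = np.asarray(draws, dtype=float)
    lower, upper = np.quantile(draws, [0.05, 0.95])
    return {
        "mean": draws.mean(),                        # the "posterior estimate"
        "median": np.median(draws),
        "interval_90": (lower, upper),               # 5% and 95% quantiles
        "share_below": np.mean(draws < threshold),   # proportion of draws below the threshold
        "share_above": np.mean(draws > threshold),   # proportion of draws above the threshold
    }

# Simulated stand-in for draws of a predictive coefficient (illustration only).
rng = np.random.default_rng(1)
demo_draws = rng.normal(loc=0.3, scale=0.14, size=5000)
demo_summary = summarize_draws(demo_draws, threshold=0.0)
```

Setting `threshold=1.0` gives the proportion of draws of an autoregressive coefficient that exceed one, the quantity reported for $A_{22}$.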
The covariance stationarity prior appears to play a more important role for the VAR-MSV model than for the homoskedastic VAR model. For $A_{22}$, the proportion of the posterior density greater than one ranges from 0.0904 under Prior 1 to 0.0123 under Prior 4. The relative bill rate is less persistent. The posterior mean of $A_{33}$ is 0.756 with a 90% Bayesian confidence interval of [0.738, 0.828].

$^{12}$ Cochrane (2001) provides a lucid review and interpretation of the empirical evidence on long-run stock return predictability and excess volatility. Cochrane concludes (p. 405) that "Return forecastability follows from the fact that dividends are not forecastable, and that the dividend/price ratio is highly but not completely persistent."

Most of the persistence in expected returns is associated with the forecasting power and persistence of the dividend-price ratio. The vectors $\lambda'$ and $(e1' + \lambda')$ (not reported in Table III)