Expected Returns and Risk in the Stock Market

M. J. Brennan∗
Anderson School, UCLA

Alex P. Taylor†
Manchester Business School

January 9, 2016

Abstract

In this paper we present new evidence on the predictability of stock returns, and examine the extent to which time variation in expected returns is due to time variation in risk exposure or due simply to mispricing or sentiment. In doing this we develop two new models for the prediction of stock market returns, one risk-based, and the other purely statistical. The pricing kernel model expresses the expected excess return as the covariance of the market return with a pricing kernel that is a linear function of portfolio returns. The discount rate model is based on the log-linear present value model of Campbell and Shiller and predicts the expected excess return directly as a function of weighted past portfolio returns. For aggregate market returns the two models provide independent evidence of predictable variation in returns, with $R^2$ of 6-8% for 1-quarter returns and 10-16% for 1-year returns. For value-based arbitrage portfolios such as HML we do find evidence of predictability from the discount rate model that is not captured by the risk-based model and this additional predictability is related to measures of time-varying sentiment and liquidity.

Keywords: Predictability, Expected returns, Risk, Sentiment
JEL Classification Codes: G12, G14, G17

∗ Michael Brennan is Emeritus Professor at the Anderson School, UCLA, Professor of Finance, Manchester University.
† Corresponding Author: Alex P. Taylor, Accounting and Finance Group, Manchester Business School, The University of Manchester, Booth Street West, Manchester, M15 6PB, England, e-mail: [email protected], Tel: +44(0)161 275 0441, Fax: +44(0)161 275 4023.
1 Introduction

The two principal issues that arise in the literature on stock market predictability are whether returns are predictable, and whether the predictability arises from time variation in risk or whether it is a function of sentiment and waves of optimism and pessimism. The issue of whether returns are in fact predictable is a vexed one in view of the highly persistent nature of, and slight theoretical justification for, most of the predictor variables that have been used. These purely statistical concerns, which are exacerbated by the data mining that is implicit in the broad search for significant predictors by different researchers, are only partially alleviated by ‘out of sample’ tests. Nevertheless, despite some prominent papers challenging the existence of predictability,[1] the professional consensus seems to be moving towards the view that returns are predictable. We provide new and strong support for this view. Moreover, since our predictive variables depend only on lagged portfolio returns, it is a simple matter to bootstrap the data under the null hypothesis of no predictability to obtain powerful tests of the null hypothesis. This is not generally possible for models that rely on macro-economic or accounting data series as predictor variables.

There is less prior evidence on whether the measured time variation in returns can be attributed to time variation in risk, simply because many of the tests of predictability are motivated by either purely statistical models or simple present value models that exclude risk variation from consideration. Significant exceptions include Merton (1980) and Ghysels et al. (2005) who show that time-variation in market returns is driven, at least in part, by time-variation in market volatility, and Scruggs (1998) and Guo et al. (2009) who consider time-varying returns in relation to an ICAPM type pricing kernel.
In this paper we show that time-variation in the covariance of the market return with a pricing kernel that is spanned either by the three Fama-French factors, or by the returns on the market portfolio and three portfolios formed on the basis of lagged dividend yield, can explain 14-16% of 1-year returns in sample, and 9-13% out of sample. We find no evidence that expected returns on the market portfolio are influenced by time-varying sentiment or liquidity.

Our empirical analysis relies on two new models for the prediction of the market expected (excess) return or discount rate. The first model expresses the expected excess return as the covariance of the market return with a pricing kernel that is a linear function of portfolio returns. The second model predicts the expected excess return directly as a function of weighted past portfolio returns.

The first model, which we refer to as the pricing kernel model, constrains the predictors with the discipline of an asset pricing model and assumes that the time variation in expected returns is driven solely by time variation in risk, where the risk of the market return is measured by its covariance with a portfolio which captures innovations in the pricing kernel. This model does not seem to be consistent with irrationality except insofar as it can be shown that the pricing kernel itself reflects ‘irrational’ concerns. This is a difficult task, even for the sceptic of rational pricing, if we accept the tag de gustibus non est disputandum, since the pricing kernel mirrors the marginal utility and therefore the tastes of the representative agent.

The second model, which we refer to as the discount rate model, exploits the accounting identity of the log-linear present value model of Campbell and Shiller (1988), and combines this with the assumption of a factor structure of returns to identify shocks to the discount rate.

[1] e.g. Goyal and Welch (2008).
Our analysis rests on the intuition that past returns will be negatively related to future returns insofar as realized returns reflect shocks to discount rates. However, since returns reflect news about future cash flows as well as about discount rates, the past series of returns on a portfolio of stocks provides only limited information about future expected returns. Our solution to this problem of contamination of the discount rate signal by cash flow news is to form a linear combination of portfolio returns to ‘soak up’ the cash flow news, which enables us to identify the shocks to discount rates. By assuming a stochastic process for the discount rate and aggregating these shocks over time, we are able to arrive at an estimate of the current expected excess return. This approach is a purely statistical one that makes use of the accounting identity of Campbell and Shiller but that contains no economic assumptions. It is consistent with time variation in expected returns being generated by cycles of excessive optimism and pessimism and changing market liquidity,[2] as well as being generated by time variation in risk and/or risk aversion.

The two models offer largely independent evidence on the existence of predictability since, although they are similar in that they both extract information from past portfolio returns and assume the same AR(1) process for expected returns, they imply different sets of predictor variables. Therefore, evidence of a common predictable component from the two quite different models is strong support for predictability. Moreover, comparing the results for the two models casts light on the question of whether the predictability is rational or not.
Under rational pricing the discount rate model and the pricing kernel model imply the same expected return series and, absent empirical problems concerning the ability of the selected portfolios to satisfy the spanning requirements of the two models, any predictability that is captured by the discount rate model but not by the pricing kernel model is in a sense outside the classical asset pricing framework. For the market portfolio, the expected return series generated by the two models of the discount rate are related, with the highest correlation between the time series of quarterly risk premium estimates from the two models being 0.62. Bootstrap simulations show that under the null hypothesis the chance of observing the levels of predictability that we find for the two models and a correlation between the two model predictions as high as we find is less than 1%. The pricing kernel model estimates seem to be superior. The maximum in-sample R̄2 obtained for this model is 8.3% for quarterly returns as compared with 5.6% for the discount rate model. When the models are used to predict 1-year excess returns the in-sample R̄2 for the parsimonious version of the pricing kernel model is 15.7% as compared with 9.9% for the discount rate model. Out of sample, the discount rate model does not improve on a naive forecast, while the pricing kernel model reduces the naive forecast error of one year market excess returns by 9-13%. The simple CAPM predicts that the market return spans the pricing kernel. While our findings are not consistent with this, we do find evidence that the component of the pricing kernel that is associated with the market return itself contributes significantly (at the 1% level) to time variation in expected market returns. 
We also find evidence that the projection of the pricing kernel onto the three Fama-French (1993) factors provides strong predictive power for market returns.[3] This is consistent with these factors capturing important components of the pricing kernel, and provides evidence against those who attribute the empirical success of the Fama-French 3-factor model in pricing the cross-section of stock returns simply to data-mining (MacKinlay (1995)) or market inefficiency (Lakonishok et al. (1994)).

[2] Cf. Amihud (2002).
[3] Fama and French (1995, 1996) argue that the value and size premia move closely with investment opportunities.

We also examine the predictive properties of a restricted pricing kernel model in which the variables that enter the pricing kernel are motivated by Merton’s (1973) ICAPM. As Brennan et al. (2004) and Nielsen and Vassalou (2006) have shown, under reasonable assumptions the pricing kernel can be shown to depend on only the market return, the Sharpe ratio, and the riskless interest rate. When this restriction is imposed, the model predicts quarterly excess returns with an $R^2$ of 5.6% as compared with an $R^2$ of 8.3% for the unrestricted model.

We have noted that the pricing kernel model attributes all time-variation in expected returns to time-variation in risk, while the discount rate model takes no account of the source of the time-variation in expected returns, so that differences in the expected returns from the two models potentially reflect such factors as time-varying sentiment or liquidity. We find that for returns on the market portfolio the only evidence of predictability that is not related to changing risk is a small high frequency component associated with lead-lag effects in returns. Otherwise the two models yield very similar components of variation with persistence $\rho \approx 0.7$-$0.8$ for quarterly data.
Neither the Amihud (2002) measure of illiquidity nor the Baker-Wurgler (2006) measure of sentiment is significantly related to the expected market returns.

For returns on the spread portfolios, SMB, HML, and HMZ (the spread between high and zero dividend yield portfolios), the story is more complex. The discount rate model finds considerable time series variation in the expected returns on these portfolios, explaining 12-16% of the 1-year returns. The results for the SMB portfolio are similar to those for the market portfolio in that there is no evidence that the expected returns are affected by time-varying sentiment or liquidity. However, for the arbitrage portfolio HMZ, 30% of the difference between the expected return estimated from the unconstrained discount rate model and the expected return from the risk-based model is attributable to time-variation in the Baker-Wurgler (2006) measure of sentiment. For HML, time variation in both sentiment and the Amihud (2002) measure of illiquidity contributes to explaining the difference in the expected returns from the two models.

The paper is organized as follows. In Section 2 we discuss how the paper is related to the extensive existing literature that is concerned with return predictability. Section 3 develops the two models of expected returns and Section 4 describes the data. Section 5 presents the main empirical results for the pricing kernel model. Section 6 is concerned with the estimation of the discount rate model. Section 7 compares the time series of risk premium estimates from the two models. Section 8 reports further empirical findings, and Section 9 concludes.

2 Related Literature

The pricing kernel model originates with Merton (1980) who uses the simple CAPM pricing kernel to forecast the expected return on the market portfolio: under the CAPM the covariance of the pricing kernel with the market return is proportional to the variance of the market return.
Subsequent efforts to model the equity risk premium in terms of the volatility of the market return have met with mixed success. Several authors have reported a positive but insignificant relation between the variance of the market return and its expected value; others find a significant but negative relation; and some find both a positive and a negative relation depending on the method used.[4] More recently, Ghysels et al. (2005) establish a significant positive relation between the monthly market risk premium and the variance of returns estimated using daily data. We confirm the existence of a significant positive risk return relation at the quarterly frequency, with the market variance being captured by a distributed lag on past squared quarterly returns. Bandi et al. (2014) report a positive relation between a low frequency component of market return variance and a similarly slow moving component of the market excess return although the economic model that gives rise to this relation is not specified.

[4] For references, see Ghysels et al. (2005).

Campbell and Vuolteenaho (2004), Brennan, Wang, and Xia (2004), and Petkova (2006) all show that the value premium is correlated with innovations in their measures of investment opportunities. Scruggs (1998) employs a two-factor ‘ICAPM-type’ pricing kernel in which the second factor is the return on a bond portfolio to capture time-variation in the equity premium; he finds that the equity premium is related to the covariance of the market return with the bond return, although the results are sensitive to the assumption of a constant correlation between bond and stock returns as pointed out by Scruggs and Glabadanidis (2003). Guo et al.
(2009) follow a similar approach, using the return on the Fama-French HML portfolio as a second factor, and find that the lagged market volatility and covariance of its return with the return on the Fama-French (1993) HML portfolio predict market excess returns over the period 1963-2005, but not over earlier periods. Unlike these papers which allow the predictor variable to be pre-determined by an ICAPM interpretation of their role in cross-sectional asset pricing tests, our general pricing kernel model identifies directly the component of the pricing kernel that is correlated with market returns.

Ross (2005) develops an upper bound on the predictability of stock returns which depends on the volatility of the pricing kernel, and tighter bounds that require further specification of the pricing kernel have been provided by Zhou (2010) and Huang (2013). These tighter bounds ‘provide a new way to diagnose asset pricing models’ (Huang, 2013, p. 1). Our levels of predictability fall well within the Ross bounds, and the level of predictability that we find from the pricing kernel model is precisely that delivered by the (partial) specifications of the pricing kernel that we propose. Our approach does not require the specification of the complete stochastic discount factor and our goal is not to test any particular asset pricing model or stochastic discount factor specification.

Our discount rate model builds on the distinction between discount rate news and cash flow news that was developed by Campbell (1991), and was used in a similar context by Campbell and Ammer (1993) and Campbell and Vuolteenaho (2004). Their approach is to extract the discount rate news from the coefficients of a VAR in which the state variables are variables that are known to predict stock returns. In contrast, our state variables are constructed as distributed lags of past returns on portfolios that are chosen to capture shocks to the discount rate.
Our focus on the information about discount rate innovations that is contained in portfolio returns is related to Pastor and Stambaugh (2009) who use prior beliefs on the correlation between discount rate shocks and portfolio returns to develop a Bayesian approach to predictive regression systems. However, while we focus on the information contained in past portfolio returns, Pastor and Stambaugh are concerned primarily with the predictive power of the dividend yield and the cay variable.

Our findings are also related to research on the predictive ability of lagged equity portfolio returns. Hong, Torous and Valkanov (2007) regress market returns on previous period industry and market returns and find evidence that some industries (when combined with the lagged market return) lead the stock market at the monthly frequency; they interpret this as evidence of slow diffusion of information. In this paper our concern is with more persistent variation in the equity premium corresponding to business cycle or even lower frequencies. Our discount rate model, which nests the simple approach of regressing the market return on the lagged portfolio return, is also able to accommodate a stationary but persistent process for the expected return. Eleswarapu and Reinganum (2004) find that 1-year stock market returns are negatively related to the past return on glamour stocks. We find little evidence of this in our sample period. Ludvigson and Ng (2007) estimate the principal components of a large number of equity portfolio returns and other conditioning variables and examine the ability of these components to predict market returns.

The discount rate model is also related to research by van Binsbergen and Koijen (2009) that uses a Kalman filter to estimate the expected market return and dividend growth rate from market returns and the price-dividend ratio.
However, we use a linear combination of portfolio returns instead of the dividend growth rate to filter out the cash flow news. Like van Binsbergen and Koijen (2009), we assume that the expected excess return follows an AR(1) process. Cochrane (2008) also provides statistically significant evidence on the predictive role of the dividend yield by using the implication of the present value relation that the dividend yield must predict either returns or dividend growth and showing that it does not predict the latter.

In an interesting recent paper, Kelly and Pruitt (2013) combine cross-sectional information on book-to-market ratios to forecast 1-year stock returns and obtain out of sample $R^2$ as high as 13%. In this paper we use both quarterly and 1-year stock returns: for 1-year returns we obtain in sample $R^2$ of 14-16%, and out of sample $R^2$ of 9-13%. Like ours, their estimates of the time series of expected excess returns have low persistence relative to previous findings. However, the predictability that we identify is not strongly related to that of the Kelly and Pruitt (2013) model.[5] The cay predictor of Lettau and Ludvigson (2001) also identifies a component of the expected return that is essentially orthogonal to our model predictions.

Two important issues that arise in the extensive literature on predictability are the inference problems caused by highly persistent predictor variables, and the effects of data-mining arising from the collective search for predictor variables by the research community. Stambaugh (1999), Torous et al. (2004), and Campbell and Yogo (2006) develop test procedures that take account of persistence. Foster et al. (1997) analyze the effect of overfitting data in the context of predictive regressions. Ferson et al. (2003) examine the interaction of data-mining and spurious regression for the case of highly persistent expected returns. The same concerns over data-mining potentially arise in our empirical analysis.
However, a major advantage of our approach is that it allows us to assess whether the levels of predictability that we find can be explained by overfitting of the data. First, since we use only portfolio returns as predictor variables, it is straightforward to compute significance levels by simulation under the null hypothesis of no predictability or serial independence of returns. In contrast, when macro-economic series are used as predictor variables more extended assumptions are required to simulate the data under the null hypothesis. Secondly, whereas the previous literature has involved search over an undefined domain of potential predictors which does not lend itself to an assessment of the effects of data mining on levels of significance, in our approach, for a candidate set of spanning portfolios, the search is over a well-defined set of predictor variables characterized by a single weighting parameter, $\beta$. This allows us to assess the effects of data-mining on significance levels. Our analysis indicates that the level of predictability found cannot be explained by data-mining or the persistence of the predictor variables.

[5] The correlation between the predicted excess return series is less than 0.25.

3 Two Models of Expected Returns

In this section we derive two different models of expected returns. Both are constructed by summing up past shocks: the first, the pricing kernel model, sums up past shocks to the covariance of the pricing kernel with the market return; the second, the discount rate model, sums up past shocks to the discount rate. Both models assume that the equity premium follows an AR(1) process. The first model assumes that we can find a set of portfolios whose time varying beta coefficients span the time-varying loading of the pricing kernel on the return on the market portfolio. The second assumes that we can find a set of portfolios that spans the space of aggregate cash flow and discount rate shocks.
To see the relation between the two models of the expected market excess return, let $R_{M,t+1}$ denote the excess return on the market portfolio from time $t$ to $t+1$, and let $\tilde{m}_{t+1}$ denote the pricing kernel at time $t+1$. Then it follows from the definition of the pricing kernel that $\alpha_{M,t}$, the expected excess return on the market portfolio at time $t$, is given by:

$$\alpha_{M,t} = -\mathrm{cov}_t(\tilde{m}_{t+1}, R_{M,t+1}) \tag{1}$$

where we have imposed the normalization that $E_t(\tilde{m}_{t+1}) = 1$. We can write the kernel, $\tilde{m}_{t+1}$, as the sum of a component that is a time-varying linear function of the market return, $m_{t+1}$, and a component that is orthogonal to the market return, $\eta_{t+1}$:

$$\tilde{m}_{t+1} = a_m(t) + b_m(t)R_{M,t+1} + \eta_{t+1} \equiv m_{t+1} + \eta_{t+1} \tag{2}$$

where $\mathrm{cov}_t(\eta_{t+1}, R_{M,t+1}) = 0$, and $b_m(t)$ captures time variation in the sensitivity of the pricing kernel to the market return.[6] Then

$$\alpha_{M,t} = -\mathrm{cov}_t(b_m(t)R_{M,t+1}, R_{M,t+1}) \equiv -b_m(t)\sigma^2_{M,t} \tag{3}$$

so that time variation in the equity premium is controlled by the component of the pricing kernel that is correlated with market returns.

[6] Time variation in $a_m(t)$ and $b_m(t)$ implies that the pricing kernel (2) defines a ‘conditional factor model’ (cf. Cochrane, 2002, Ch. 8).

The pricing kernel model assumes that $b_m(t)$ can be written as a fixed linear combination of the loadings of a set of portfolio returns on the market return. We call these portfolios ‘(pricing kernel) beta-spanning portfolios’. Thus write $R_{p,t+1}$, the return on spanning portfolio $p$, $p = 1, \ldots, P$, as:

$$R_{p,t+1} = a_p(t) + b_p(t)R_{M,t+1} + u_p(t+1) \tag{4}$$

Then the pricing kernel model assumes that $b_m(t) = -\sum_{p=1}^{P}\delta_p^c b_p(t)$ for a set of constant portfolio weights $\delta_p^c$, so that the sensitivity of the pricing kernel to market returns, $b_m(t)$, can be expressed as a linear combination of the portfolio loadings on the market return, $-\sum_{p=1}^{P}\delta_p^c b_p(t)$.
Equation (3) shows that if the expected market return is to be a time-varying function of its variance, then $b_m(t)$ must be time-varying, and the requirement that $b_m(t) = -\sum_{p=1}^{P}\delta_p^c b_p(t)$ for a set of constant portfolio weights $\delta_p^c$ then means that the beta coefficients of the beta-spanning portfolios must also be time-varying.[7]

The discount rate model on the other hand assumes that innovations in $\alpha_{M,t}$, and therefore in $b_m(t)\,\mathrm{var}_t(R_{M,t+1})$, can be expressed as a linear combination of the innovations in the returns on a (possibly different) set of (factor) spanning portfolios, $\sum_{p=1}^{P}\delta_p^d (R_{p,t+1} - E_t[R_{p,t+1}])$, for a constant set of portfolio weights $\delta^d$.

The pricing kernel model illustrates the intimate nature of the relation between cross-section asset pricing and asset return dynamics, since the pricing kernel is the basis of cross-section asset pricing while the dynamics of the covariance between the market return and the pricing kernel describe the dynamics of the equity premium.[8] We should note, however, that our procedure identifies only $m_{t+1}$, the component of the pricing kernel whose covariance with the market return exhibits time series variation, rather than the whole pricing kernel that is required to price the cross-section of asset returns.

The relation between cross-sectional asset pricing and time-variation in asset returns that appears in the ICAPM has led Campbell (1993, pp. 499-500) to note that ‘the intertemporal model suggests that priced factors should be found not by running a factor analysis on the covariance matrix of returns, nor by selecting important macro-economic variables.
Instead, variables that have been shown to forecast stock-market returns should be used in cross-sectional asset pricing studies.’[9] We shall proceed in part in the reverse direction, by testing whether variables (portfolio returns) that have been shown to be important in cross-section asset pricing, and which therefore belong in the pricing kernel, also have important information for forecasting market returns.

[7] Appendix A provides sufficient conditions for the existence of a set of loading-spanning portfolios.
[8] Ross (2005), Zhou (2010) and Huang (2014) make the same point.
[9] In contrast, Fama writes of the multi-factor models of Merton (1973) and Ross (1976) that ‘they are an empiricist’s dream ... that can accommodate ... any set of factors that are correlated with returns’ (Fama 1991, p. 1594).

3.1 The Pricing Kernel and Expected Returns

Equation (3) expresses the arithmetic expected excess return on the market portfolio as the negative of the conditional covariance of the market return with the pricing kernel: $\alpha_{M,t} = -\mathrm{cov}_t(b_m(t)R_{M,t+1}, R_{M,t+1})$, where $b_m(t)$ is the conditional sensitivity of the innovation in the pricing kernel to the market return, the pricing kernel ‘beta’. Assume that there exists a set of beta-spanning portfolios, $p = 1, \ldots, P$, and constant portfolio weights $\delta_p^c$, $p = 1, \ldots, P$, such that:

$$b_m(t) = -\sum_{p=1}^{P}\delta_p^c b_p(t) \tag{5}$$

where $b_p(t)$ is the beta coefficient of beta-spanning portfolio $p$.
Then

$$\alpha_{M,t} = \mathrm{cov}_t\Big(\sum_{p=1}^{P}\delta_p^c b_p(t)R_{M,t+1},\, R_{M,t+1}\Big) = \sum_{p=1}^{P}\delta_p^c\,\mathrm{cov}_t(R_{p,t+1}, R_{M,t+1}) \tag{6}$$

We assume that the risk premium, and therefore the covariance of the market return with the pricing kernel, follows an AR(1) process:

$$\alpha_{M,t} = \bar{a} + \rho_\alpha[\alpha_{M,t-1} - \bar{a}] + \xi_{\alpha,t} \tag{7}$$

A sufficient condition for (7) is that the conditional covariances of the beta-spanning portfolio returns in equation (6) follow AR(1) processes with the same persistence parameter, $\rho_\alpha$, and with innovations which are equal up to a constant to the product of the spanning portfolio and market returns:

$$\mathrm{cov}_t(R_{p,t+1}, R_{M,t+1}) = a_p + \rho_\alpha[\mathrm{cov}_{t-1}(R_{p,t}, R_{M,t}) - a_p] + R_{p,t}R_{M,t} \tag{8}$$

Then

$$\alpha_{M,t} = \bar{a} + \rho_\alpha\Big[\sum_{p=1}^{P}\delta_p^c\,\mathrm{cov}_{t-1}(R_{p,t}, R_{M,t}) - \bar{a}\Big] + \xi_{\alpha,t} \tag{9}$$

where $\bar{a} = \sum_{p=1}^{P}\delta_p^c a_p$, and $\xi_{\alpha,t} = \sum_{p=1}^{P}\delta_p^c R_{p,t}R_{M,t}$. Then the market risk premium can be written as an affine function of the geometrically weighted average of past values of the weighted average of products of spanning portfolio and market returns:

$$\alpha_{M,t} = \bar{a} + \sum_{j=0}^{\infty}(\rho_\alpha)^j \xi_{\alpha,t-j} = \bar{a} + \sum_{j=0}^{\infty}(\rho_\alpha)^j \sum_{p=1}^{P}\delta_p^c R_{p,t-j}R_{M,t-j} \tag{10}$$

$$\phantom{\alpha_{M,t}} = \bar{a} + \sum_{p=1}^{P}\delta_p^c x^c_{p,t}(\rho_\alpha) \tag{11}$$

where $x^c_{p,t}(\rho_\alpha) \equiv \sum_{j=0}^{\infty}(\rho_\alpha)^j [R_{p,t-j}R_{M,t-j}]$. Then the predictive system for the market excess return becomes:

$$R_{M,t+1} = a_0 + \sum_{p=1}^{P}\delta_p^c x^c_{pt}(\beta) + \epsilon_{t+1} \tag{12}$$

$$x^c_{pt}(\beta) = \sum_{s=0}^{\infty}\beta^s R_{p,t-s}R_{M,t-s} \tag{13}$$

where we have substituted $\beta$ for $\rho_\alpha$ to ensure consistency of notation with the discount rate model that follows. An attractive feature of the model is that the parameters $a_0$, $\delta_p^c$, $\beta$ can be estimated relatively easily since, given an estimate of $\beta$, the estimation reduces to a standard predictive regression with predictor variables $x^c_{pt}(\beta)$. We discuss the estimation details in Section 5.

3.2 The Discount Rate Model

Our analysis for this model is motivated by the log-linear model of Campbell and Vuolteenaho (2004)[10] which decomposes the unexpected return on stocks into cash flow news and discount rate news.
Let $\mu_t$ be the expected log excess return on the market portfolio, so that the realized excess return can be written as:

$$r_{M,t+1} = \mu_t + \epsilon_{t+1} \tag{14}$$

We assume that $\mu_t$ follows an AR(1) process so that:

$$\mu_t = \bar{\mu} + \rho[\mu_{t-1} - \bar{\mu}] + z_t \tag{15}$$

We refer to $z_t$, the innovation in the expected market return, as the discount rate news.

Our second assumption is that there exists a set of $P$ well diversified (factor) spanning portfolios whose excess returns, $r_{pt}$, $p = 1, \ldots, P$, follow an exact factor model and span the space of innovations in cash flows and discount rate news, so that

$$r_{pt} = \beta_{p0} + k_p\mu_{t-1} + \sum_{j=1}^{M}\beta_{pj}y_{jt} + \gamma_p z_t \tag{16}$$

where $y_{jt}$ ($j = 1, \ldots, M$) denotes innovations in common cash flow factors, and $z_t$ is the discount rate news. We are implicitly assuming that shocks to the risk free rate are small and can be subsumed in the cash flow news.[11] The second term in (16) captures time variation in the expected returns on the spanning portfolios which depend on variation in the systematic expected return factor, $\mu_{t-1}$. The third term corresponds to cash flow news which we allow to have a factor structure, and the fourth term is the effect of aggregate discount rate news. The number of spanning portfolios, $P$, is equal to one plus the number of cash flow innovations: $P = M + 1$.

[10] See also Campbell and Shiller (1988), Campbell (1991), and Campbell and Ammer (1993).
[11] We have repeated the analysis using gross returns in place of excess returns: the proportion of the return that is attributable to discount rate news is largely independent of which definition of returns is used, which is consistent with shocks to the risk free rate playing only a minor role.

Consider a ‘z-mimicking’ portfolio whose weights on the $P$ spanning portfolios, $\delta_p^d$, are such that $\sum_{p=1}^{P}\delta_p^d\beta_{pj} = 0$, $j = 1, \ldots, M$; $\sum_{p=1}^{P}\delta_p^d\gamma_p = 1$. This ensures that the z-mimicking portfolio return loads only on the discount rate news, $z_t$.
Then it is easily shown that the discount rate news can be written as a linear function of the return on the z-mimicking portfolio:

$$z_t = \delta_0^d + \sum_{p=1}^{P}\delta_p^d r_{pt} - w\mu_{t-1}, \tag{17}$$

where $\delta_0^d = -\sum_{p=1}^{P}\delta_p^d\beta_{p0}$ and $w = \sum_{p=1}^{P}\delta_p^d k_p$. Then combining equations (15) and (17), the process of the market expected log return is:

$$\mu_t = (1-\rho)\bar{\mu} + (\rho - w)\mu_{t-1} + \delta_0^d + \sum_{p=1}^{P}\delta_p^d r_{pt} \tag{18}$$

Substituting recursively for $\mu_{t-j}$, the expected log excess market return, $\mu_t$, may be written as a linear function of geometrically weighted past returns on the $P$ spanning portfolios:

$$\mu_t = \frac{\bar{\mu}(1-\beta-w) + \delta_0^d}{1-\beta} + \sum_{p=1}^{P}\delta_p^d x^d_{pt}(\beta) \tag{19}$$

where $\beta \equiv \rho - w$, and

$$x^d_{pt}(\beta) = \sum_{s=0}^{\infty}\beta^s r_{p,t-s} \tag{20}$$

Combining (14) with (19), the log excess return on the market portfolio may be written as:

$$r_{M,t+1} = a_0 + \sum_{p=1}^{P}\delta_p^d x^d_{pt}(\beta) + \epsilon_{t+1} \tag{21}$$

where the predictor variables, $x^d_{pt}(\beta)$, are weighted averages of past log returns on the spanning portfolios as shown in (20).[12] Note that, comparing equations (15) and (18), the weighting parameter, $\beta$, may deviate from the true persistence of the expected return, $\rho$: we report both parameters below.

[12] We shall henceforth use the term ‘predictor variable’ for $x^d_{pt}$, the variables formed as distributed lags on the returns on the spanning portfolios, $r_{pt}$.

Equation (21) is the basis of our discount rate model based predictive regression. However, in most of our empirical analysis we shall substitute arithmetic returns for the logarithmic returns that follow strictly from the Campbell log-linearization. Then the predictive system becomes:

$$R_{M,t+1} = a_0 + \sum_{p=1}^{P}\delta_p^d x^d_{pt}(\beta) + \epsilon_{t+1} \tag{22}$$

$$x^d_{pt}(\beta) = \sum_{s=0}^{\infty}\beta^s R_{p,t-s} \tag{23}$$

where $R_{M,t}$ and $R_{p,t}$ denote the (arithmetic) excess returns on the market portfolio and portfolio $p$ respectively.
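As an illustration of the predictive system (22)-(23), the following Python sketch treats the simplest case of a single spanning portfolio ($P = 1$). It is a minimal sketch under stated assumptions, not the estimation procedure of the paper: the function names, the toy return series, the truncation of the infinite sum at the start of the sample, and the choice of $\beta$ by a grid search on in-sample $R^2$ are all assumptions made for the example.

```python
def distributed_lag(returns, beta):
    """x_t = sum_s beta**s * R_{t-s}, built recursively as x_t = R_t + beta * x_{t-1}.

    Assumption: the infinite sum in (23) is truncated at the start of the sample.
    """
    x, prev = [], 0.0
    for r in returns:
        prev = r + beta * prev
        x.append(prev)
    return x


def ols_one_predictor(y, x):
    """OLS of y on a constant and one regressor: (intercept, slope, r_squared)."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope, sxy * sxy / (sxx * syy)


def fit_discount_rate_model(market, portfolio, betas):
    """For each candidate beta, regress R_{M,t+1} on x_t(beta); keep the best fit."""
    best = None
    for beta in betas:
        x = distributed_lag(portfolio, beta)
        # Pair the predictor known at time t with the market return at t + 1.
        a0, delta, r2 = ols_one_predictor(market[1:], x[:-1])
        if best is None or r2 > best["r2"]:
            best = {"beta": beta, "a0": a0, "delta": delta, "r2": r2}
    return best


# Hypothetical quarterly excess returns, for illustration only.
toy_market = [0.02, -0.01, 0.03, 0.01, -0.02, 0.04, 0.00, 0.01]
toy_portfolio = [0.01, 0.02, -0.03, 0.02, 0.01, -0.01, 0.03, -0.02]
fit = fit_discount_rate_model(toy_market, toy_portfolio, [0.0, 0.5, 0.9])
```

Conditional on $\beta$ the problem is an ordinary predictive regression, which is why the sketch separates the construction of the predictor from the regression step. With several spanning portfolios the same structure applies, with the one-regressor OLS replaced by a multiple regression.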
Note first that the only difference between the predictive systems from the discount rate based model, defined by equations (22) and (23), and the pricing kernel based model, defined by (12) and (13), lies in the definition of the predictive variables which we have denoted by $x^d_{pt}(\beta)$ and $x^c_{pt}(\beta)$. The former is a weighted average of past returns on portfolio $p$, while the latter is a weighted average of past products of portfolio returns and market returns. Secondly, while we have developed the two models for the prediction of the market excess return, it is clear that the same approaches can be used, mutatis mutandis, to predict returns on any portfolio, by regressing the portfolio returns on a set of predictor variables formed as geometric weighted averages either of past returns on a set of spanning portfolios ($x^d_{pt}(\beta)$) or of the products of past returns on a set of spanning portfolios with the return on the portfolio whose return is to be predicted ($x^c_{pt}(\beta)$). In section 8.2 we shall apply the models to the prediction of returns on certain arbitrage or spread portfolios. Thirdly, we have no a priori method of identifying the spanning portfolios. Therefore we shall consider different candidate sets of spanning portfolios whose choice is discussed in the following section. While we refer to them as ‘spanning portfolios’, they are in fact only candidate sets of spanning portfolios whose adequacy must be judged by the predictive performance of the models. Finally, we observe that, in contrast to earlier studies that use financial ratios such as interest rates and dividend yields as predictors, our predictor variables, $x^d_{pt}(\beta)$ and $x^c_{pt}(\beta)$, are constructed simply from past returns on the portfolios. This will facilitate tests of the model.
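The difference between the two sets of predictor variables can be illustrated with a short sketch. It assumes that the geometric sum is truncated at the start of the sample and that the pricing kernel predictor weights the raw products of portfolio and market returns; the exact form of equation (13), including any normalization, is not reproduced here, and the function names are hypothetical.

```python
def geometric_sum(series, beta):
    """Return x_t = sum_{s>=0} beta**s * series[t-s] for every t,
    with the sum truncated at the first observation of the sample."""
    x, out = 0.0, []
    for value in series:
        x = value + beta * x  # recursion x_t = value_t + beta * x_{t-1}
        out.append(x)
    return out


def discount_rate_predictor(portfolio, beta):
    """Discount rate model: geometrically weighted past portfolio returns, as in (23)."""
    return geometric_sum(portfolio, beta)


def pricing_kernel_predictor(portfolio, market, beta):
    """Pricing kernel model (assumed form): geometrically weighted past products
    of the portfolio return and the market return."""
    products = [rp * rm for rp, rm in zip(portfolio, market)]
    return geometric_sum(products, beta)
```

Because both predictors are built by the same recursion, they differ only in the series that is fed into it, which is the point made in the first observation above.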
4 Data

The market excess return, $R_{M,t}$, is defined as the difference between the return on the Standard and Poor’s 500 portfolio for quarter $t$ and $R_{F,t}$, which is the risk free rate for the quarter, taken as the return on a 3-month Treasury Bill. The S&P500 return is taken from CRSP, and the Treasury Bill rate series is from the Federal Reserve Bank of St Louis. We use quarterly data on portfolio returns from 1927.3 to 2010.4. The estimation period is 1946.1 to 2010.4, while the earlier returns are used to calculate the value of the predictor variables at the beginning of 1946.1. The predictor variables for the discount rate model, $x^d_{pt}(\beta)$, are formed from the lagged excess returns on the spanning portfolios using equation (23). The predictor variables for the pricing kernel model, $x^c_{pt}(\beta)$, are formed from lagged cross products of market excess returns and excess returns on the spanning portfolios using equation (13).¹³ We consider the following proxies for the spanning portfolios in both the discount rate and the pricing kernel models: (i) the market portfolio (M); (ii) the three Fama-French portfolios (FF3); (iii) the market portfolio and 3 portfolios formed on the basis of dividend yield (3DP): the highest and lowest yielding quintiles of dividend paying stocks, and a portfolio of non-dividend paying stocks. We consider in addition two expanded sets of spanning portfolios: 6BM-S is the market portfolio plus the 6 Fama-French size and book-to-market sorted portfolios; and 6DP consists of the market portfolio and 5 quintiles of stocks ranked by dividend yield plus the zero dividend yield portfolio. Data on these portfolio returns were taken from the website of Ken French.

¹³ Note that for the discount rate model the spanning portfolios are assumed to span the space of discount rate and aggregate cash flow shocks, while for the pricing kernel model the betas of the loading-spanning portfolios are assumed to span the beta of the pricing kernel.
The market portfolio was included as a proxy for a spanning portfolio because of the mixed prior evidence that the market variance predicts future market returns discussed above, and because this portfolio spans the pricing kernel under the CAPM and so provides a natural baseline for the pricing kernel model. The three Fama-French portfolios were included because of the empirical success of the FF model in pricing the cross-section of asset returns, which suggests that these portfolios belong in the pricing kernel. The dividend yield portfolios were included because portfolios that differ in yield will have different durations and have different sensitivities to discount rate shocks, including those caused by shocks to the covariance between the pricing kernel and asset returns.14 The expanded sets of portfolios were included because in the discount rate model the dimensionality of cash flow shocks seems likely to be greater than the number of portfolios required to span the pricing kernel. In addition, expanding the set of spanning portfolios helps us to assess the spanning adequacy of our original candidate sets of spanning portfolios. We also compare our predictor variables with variables that have been used earlier in the literature. 
Following Goyal and Welch (2008), these include the Dividend (Earnings) yield on the market portfolio, which is defined as the log of the ratio of dividends (earnings) on the S&P500 over the past 12 months to the lagged level of the index; the Book-to-market value ratio for the Dow Jones Industrial Average; the Stock Variance which is the sum of squared daily returns on the S&P500 index over the previous quarter; the 3 month Treasury Bill rate; the Long Term Yield which is the yield on long term US government bonds; the Term Spread which is the difference between the Long Term Yield and the Treasury Bill rate; Inflation which is the one month lagged inflation rate; the Default Yield Spread which is the difference between BAA and AAA-rated corporate bond yields; and cay which is the consumption, wealth, income ratio of Lettau and Ludvigson (2001). Fuller descriptions of these variables are to be found in Goyal and Welch (2008) and the actual data series were taken from the website of Amit Goyal. The small-stock value spread is often used as a state variable in models of predictable stock returns as in Brennan et al. (2001), Cohen et al. (2003), and Campbell and Vuolteenaho (2004). It is constructed from the book-to-market values of portfolios formed by a 2 by 3 sort on size and book-to-market ratio, available from the website of Ken French. It is defined as the log(BE/ME) of the small high-book-to-market portfolio minus the log(BE/ME) of the small low-book-to-market portfolio. The book-to-market values for these portfolios are defined on a yearly basis and the method described in Campbell and Vuolteenaho (2004) is used to construct monthly values of the value spread. Eleswarapu and Reinganum (2004) find that yearly stock market returns are negatively related to the lagged returns on glamour stocks over the prior 36 month period. Following Eleswarapu and Reinganum (2004) we consider five portfolios formed by sorting on the book-to-market ratio.
The glamour portfolio is defined as the quintile with the lowest book-to-market ratio and its cumulative log return over the past 36 months is used as the predictor variable. We utilize the portfolio data from Ken French’s website rather than construct quintiles in the slightly different manner described in Eleswarapu and Reinganum (2004). We find qualitatively similar results to theirs, obtaining an $R^2 \approx 5\%$ over their sample period, and $R^2 \approx 3\%$ over our sample period in regressions on annual excess log returns.

¹⁴ See Brennan and Xia (2006) and Lettau and Wachter (2007).

We compare our model predictions with those of Kelly and Pruitt (2013). For this purpose we use their 12 month in-sample forecasts constructed using 100 portfolios.¹⁵ Finally, we explore the dependence of the expected return series from the two models on measures of sentiment and stock market illiquidity. For the former we use the Baker-Wurgler (2006) sentiment index taken from http://people.stern.nyu.edu/jwurgler. Illiquidity is measured by the Amihud (2002) measure of illiquidity and we are grateful to Sahn-Wook Huh for calculating this measure for us.

5 Empirical Tests and Estimates of the Pricing Kernel model

We start by estimating equations (12) and (13) for the pricing kernel model. A non-linear maximum likelihood estimator is implemented by choosing values of $\beta$, forming the predictor variables from equation (13), and then running an OLS regression of market excess returns on the predictor variables to estimate the regression coefficients, $\delta$. The MLE estimator of $\beta$ is the value that minimizes the sum of squared residuals (or equivalently maximizes the $R^2$) in (12).
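The estimation steps just described (choose $\beta$, form the predictors, run OLS, keep the $\beta$ with the smallest sum of squared residuals) can be sketched as a grid search. This is an illustration under stated assumptions, not the authors' code: the grid, the function names, and the truncation of the geometric sum at the start of the sample are hypothetical, and the sketch omits the bias adjustment discussed next.

```python
def solve(a, b):
    """Gauss-Jordan solution of the square system a x = b with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(y, columns):
    """OLS of y on an intercept and the given columns; returns (coefficients, SSR)."""
    X = [[1.0] + [c[t] for c in columns] for t in range(len(y))]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yt for row, yt in zip(X, y)) for i in range(k)]
    coefs = solve(xtx, xty)
    ssr = sum((yt - sum(c * x for c, x in zip(coefs, row))) ** 2 for row, yt in zip(X, y))
    return coefs, ssr


def geometric_sum(series, beta):
    """x_t = series_t + beta * x_{t-1}, started from zero at the sample start."""
    x, out = 0.0, []
    for value in series:
        x = value + beta * x
        out.append(x)
    return out


def fit_beta(market, inputs, grid):
    """For each beta on the grid, regress market[t+1] on the predictors dated t
    and keep the beta with the smallest sum of squared residuals.
    inputs: one series per spanning portfolio (return products for the pricing
    kernel model, plain returns for the discount rate model)."""
    best = None
    for beta in grid:
        predictors = [geometric_sum(s, beta)[:-1] for s in inputs]
        coefs, ssr = ols(market[1:], predictors)
        if best is None or ssr < best[2]:
            best = (beta, coefs, ssr)
    return best  # (beta_hat, [a0, delta_1, ...], SSR)
```

A finer grid or a one-dimensional optimizer could replace the coarse grid; the grid form is used here only because it mirrors the description of the procedure most directly.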
However, a problem with this MLE estimator is that, as Stambaugh (1986) shows, the small sample bias in the $R^2$ is increasing in the persistence parameter of the predictor portfolio, $\rho$, and therefore in the weighting parameter $\beta$ which is being estimated.¹⁶ The higher bias in $R^2$ associated with higher values of $\beta$ will tend to result in estimates of $\beta$ that are too high. Therefore we employ a bias-adjusted procedure which is described in Appendix B. Having estimated the parameters $(a_0, \delta^c, \beta)$, we form time series estimates of the expected market excess return, $\hat{\alpha}_{M,t} = \hat{a}_0 + \sum_{p=1}^{P} \hat{\delta}^c_p x^c_{pt}(\hat{\beta})$, and use these estimates to compute the autocorrelation of the expected market excess return, $\rho_\alpha$. We use quarterly returns on the different sets of beta-spanning portfolios to form the predictor variables, $x^c(\beta)$, and report estimation results for prediction horizons of both 1-quarter and 1-year for the sample period from 1946.1 to 2010.4.

Table 1 reports the results of tests of predictability for the pricing kernel model for the 1-quarter and 1-year horizon, using different sets of spanning portfolios, along with estimates of the weighting parameter, $\beta$, and the persistence parameter, $\rho_\alpha$. The primary sets of spanning portfolios are (i) the market portfolio (M); (ii) the three Fama-French portfolios (FF3); and (iii) the market portfolio and the three dividend yield portfolios (3DP). Panel A reports the results for prediction of the quarterly excess return on the S&P500 portfolio, and Panel B for prediction of the 1-year return. $W_c$ is the Wald statistic corrected for bias from an estimation that seeks to maximize this bias-adjusted statistic. For the 1-quarter horizon the null hypothesis of no predictability is rejected at the 1% level for all three sets of spanning portfolios except FF3 where the significance level is only 5% when the Wald criterion is used.
After bias correction the fraction of the variance of returns explained by the model ($R^2_c$) ranges from 4.1% for M to 8.3% for 3DP. The significance of the results for the single spanning portfolio, M, implies that time variation in the volatility of the simple CAPM pricing kernel has predictive power for the equity premium and that the market excess return is predicted by a weighted average of past squared market returns. This result contrasts with the findings of Goyal and Welch (2008), but is consistent with the results of Ghysels et al. (2005), although these authors estimate the market variance from daily data. The weighting parameter, $\beta$, is 0.67 ± 0.02 for FF3 and 3DP, and the estimated autocorrelation, $\rho_\alpha$, is 0.82 ± 0.02; the values for M are somewhat lower. The autocorrelation implies a half-life for shocks to the discount rate of about 3.25 quarters.

¹⁵ We thank the authors for making these forecasts available to us.

¹⁶ Bootstrap simulations show that for the BM-S model the 5% critical values of $R^2$ and the Wald statistic start to rise rapidly once $\beta$ exceeds about 0.9.

When the forecast horizon is extended to 1 year Panel B shows that the single predictive variable of the M model is no longer significant under the $R^2$ criterion, although the FF3 and 3DP sets of spanning portfolios yield $R^2_c$ of 14% and 15.7% respectively which are significant at the 1% level. Under the Wald criterion the predictions are significant at the 5% level for all three sets of spanning portfolios. As with the quarterly estimates, the estimates of $\beta$ and $\rho_\alpha$ for the 3DP and FF3 spanning portfolios are very close, and the expected return estimates for the different sets of spanning portfolios are very closely related: Panel B of Table 2 shows that the correlation between the 3DP and FF3 estimates is 0.91, while the correlations of these estimates with the M estimates are 0.62 and 0.73 respectively.
If the sets of spanning portfolios we have selected do not in fact span the innovation in the pricing kernel, we should expect that increasing the number of portfolios in the spanning set would increase the predictive power of the model. To assess this we also fit the model using two expanded sets of spanning portfolios: 6BM-S is the market portfolio plus six size and book-to-market portfolios and nests the FF3 set of portfolios; similarly 6DP is the market portfolio plus an expanded set of dividend yield sorted portfolios and nests 3DP. For the dividend yield sorted portfolios, there is no evidence that spanning is improved with the larger set of portfolios: as we move from 3DP to 6DP the corrected $R^2_c$ actually falls for both quarterly and 1-year return predictions. On the other hand, expanding the set of spanning portfolios from FF3 to 6BM-S does improve the corrected $R^2_c$ from 0.063 (0.140) to 0.073 (0.171) for the quarterly (1-year) forecasts, suggesting that FF3 may be too parsimonious to fully capture the pricing kernel. Nevertheless, in the interests of parsimony and to minimize the perils of data-mining, we focus our attention on the three primary sets of spanning portfolios for the pricing kernel model.

Panel A of Table 2 reports the estimates of $(a_0, \delta^c, \beta, \rho_\alpha)$ from the quarterly return predictions for the three sets of spanning portfolios. Significance levels for the coefficients are computed from standard errors calculated from bootstrap simulations under the alternative hypothesis.¹⁷ Components of the pricing kernel that are significant at the 5% level or better include the market portfolio (except for 3DP) and the two other FF3 portfolios (SMB only at the 10% level), as well as the zero dividend yield portfolio. Their signs are generally consistent with prior knowledge of the pricing kernel.
Thus the positive coefficients on $R_M$ and HML are consistent with a positive risk premium for the market portfolio and for the HML portfolio, while the insignificant coefficient on SMB offers no support for a small firm premium. The results for the 3DP spanning portfolios suggest that there is a positive premium associated with covariance with returns on a portfolio of high yield (value) stocks, and a negative premium associated with covariance with zero yield (growth) stocks. In summary, the results are consistent with time-variation in the covariance of the market return with the pricing kernel leading to variation in expected market returns. The high correlation between the 3DP and FF3 risk premium estimates, as well as the limited improvement from increasing the set of beta spanning portfolios, suggests that the betas of both the 3DP and FF3 sets of spanning portfolios do a good job of spanning the pricing kernel beta: $b_M(t) \approx \sum_{p=1}^{3} \delta^c_p(t)$.

¹⁷ Note that $\beta$ is undefined under the null.

5.1 Expected return series

Panel B of Table 2 shows that the average risk premium predicted for each of the three sets of spanning portfolios is 1.78% per quarter: this is equal to the sample average excess return. The 3DP estimates yield the most variability in the risk premium: 2.54%, which is 1.4 times the mean premium. The M estimates are much less variable: their standard deviation is only 1.71% while the standard deviation of the FF3 estimates is 2.21%. Figure 1 plots the expected 1-year return series obtained using the three different sets of spanning portfolios. It is apparent that the series based on FF3 and 3DP track each other well, reaching low points around the dot-com bubble and rising steeply in the wake of the financial crisis. There are three pronounced peaks in the series.
We list them in decreasing order of importance and for each we report equity premium estimates from the FF3 (3DP) models which can be compared with the mean 1-year equity premium of 7.7%. In 2009 following the collapse of Lehman Brothers in September 2008 the estimated premium was 37.9% (38.5%); in 1975 during the inflationary recession following the first oil crisis, 29.2% (33.8%); in 1956 following a period of heightened tension over Formosa, 30% (26%). These are very pronounced fluctuations in the estimated equity premium, and it is striking that the two sets of spanning portfolios yield relatively similar estimates. Interestingly, the estimated equity premium from both models was negative from the third quarter of 2000 to the last quarter of 2001:¹⁸ it seems that at least part of the runup in equity prices around the turn of the millennium can be attributed to a sharp decline in the equity premium.¹⁹

5.2 Out of Sample Tests

Table 3 reports the out of sample forecasting power of the pricing kernel model for different horizons, using different sets of spanning portfolios. Results are reported for both quarterly forecasts that are derived by estimating the model on quarterly returns and for 1-year forecasts that are based on parameter estimates from fitting the model to one year excess returns. The models are estimated initially over the period 1946.1 to 1965.4 and the parameter estimates are used to forecast the market excess return for 1966.1. Then the estimation period is extended by one quarter for the next forecast, and so on. For the multi-quarter and multi-year forecasts we compare the sum of the realized excess returns over the next $k$ quarters (years) with forecasts of the sum based on the forecast of $\alpha_{M,t+1}$ and the parameters of the AR(1) process estimated over the same period.
The table reports the ratio of the mean square model forecast errors to the mean square error of a forecast that is based on the out of sample historical mean also starting in 1928.2: a ratio less than unity implies that the model outperforms the naive historical mean forecast. The 3DP model shows generally the strongest out of sample forecast power. For the quarterly returns forecasts it reduces the error variance relative to the naive model by 3%, 10%, 12% and 16% for forecasts of 1, 2, 3 and 4 quarters, and for 1-year returns by 13%, 13% and 7% for forecasts of 1, 2 and 3 years. This compares with the error variance reduction reported by Kelly and Pruitt (2013) of ‘up to 13%’. The FF3 quarterly return forecasts improve on the naive forecast by 4-6% and the 1-year forecast improvement is 9%. Even using the single spanning portfolio, M, reduces the forecast error by 2-3% for 1 and 2 quarter forecasts.

¹⁸ Boudoukh et al. (1993) report reliable evidence that the ex ante equity market risk premium is negative in some states of the world.

¹⁹ It was in September 1999 that James K. Glassman and Kevin A. Hassett published their article Dow 36,000, which argued that future dividends on the market should be discounted at a rate below the Treasury bond rate.

In order to assess the statistical significance of the relative error variance statistics for the out of sample forecasts, 10,000 samples of the market and portfolio returns were bootstrapped under the assumption of no predictability as described in Appendix B, and the out of sample forecast procedure was applied to the generated data. Significance is then determined by comparing the sample statistic with the distribution of the bootstrapped statistics. The 3DP and FF3 forecast improvements for the quarterly forecasts and for the 1 year forecast are significant at the 1% or 5% levels. The 3DP 1-year forecast for 2 years is also significant at the 5% level.
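The relative error statistic used in the out of sample tests can be illustrated with a simplified sketch. It assumes a single predictor that is taken as given, so the weighting parameter is not re-estimated in each window as it is in the procedure described above, and it covers only the one-step-ahead case; the function names are hypothetical.

```python
def simple_ols(x, y):
    """Intercept and slope from a one-regressor OLS fit."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def expanding_window_mse_ratio(predictor, realized, first_forecast):
    """predictor[t] is known before realized[t] is observed.
    For each t >= first_forecast the model is fitted on observations before t,
    and its forecast is compared with the historical mean of realized[:t].
    Returns (sum of squared model errors) / (sum of squared naive errors)."""
    model_sq, naive_sq = 0.0, 0.0
    for t in range(first_forecast, len(realized)):
        intercept, slope = simple_ols(predictor[:t], realized[:t])
        model_forecast = intercept + slope * predictor[t]
        naive_forecast = sum(realized[:t]) / t
        model_sq += (realized[t] - model_forecast) ** 2
        naive_sq += (realized[t] - naive_forecast) ** 2
    return model_sq / naive_sq
```

A value below one indicates that the model forecast has the smaller squared error over the evaluation period. Applying the same function to each bootstrapped sample generated under the null would give the reference distribution against which the sample ratio is compared.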
5.3 Comparison with other predictor variables

The extensive prior literature on stock return predictability demands that attention be given to the performance of the pricing kernel model relative to that of earlier predictors that have been proposed. Panel A of Table 4 reports the results of quarterly regressions of the 1 quarter market excess returns for the period 1946.1-2009.4 on 13 different predictor variables that have been widely used.²⁰ With the exception of the Lettau-Ludvigson (2001) cay and the Kelly-Pruitt (2013) prediction (kp), the in sample $R^2$ are less than 1%, despite the fact that there is a small sample bias in the $R^2$ for many of the predictor variables due to their high autocorrelations, $\rho$. To account for the small sample bias we report bootstrapped significance levels for the regression coefficient which are indicated by stars in the table. For cay (kp) the in-sample $R^2$ is 4.2% (2.3%) and the autocorrelation of the predictor is 0.925 (0.931). The out of sample performance of the different predictors is represented by $REV_{OOS}$, which is the ratio of the variance of the prediction error yielded by rolling out of sample regressions starting in 1966.1 to the error variance of a simple historical mean predictor. The historical mean and the predictive models are estimated over the period from 1928.2 to the year before the forecast. A value of $REV_{OOS}$ less than unity implies that the predictive model improves on the naive historical mean forecast. Excluding cay and kp, only 4 of the predictors improve on the historical mean and then only by modest amounts. cay reduces the forecast error variance by 2.1%, kp by 1.5%, the Default Yield Spread by 1.3%, and the Glamor variable by 1.1%. For the other variables the improvement is less than 1%. The two predictive variable model reported in Panel B is the ICAPM motivated model of Guo and Savickas (2009) which performs poorly in our sample period (which is longer than theirs).
Table 5 reports correlations between 4 quarter moving averages of these other predictor variables and corresponding averages of the forecast equity premium from the pricing kernel model using different sets of spanning portfolios. Considering only correlations that are greater than 0.3 in absolute value, we see that the predicted premium from the CAPM kernel, M, has correlations of -0.36 with Glamor, 0.55 with Stock Variance, and 0.46 with the Default Yield Spread. The predictions from the FF3 kernel have correlations of 0.41 and -0.40 with Stock Variance and the T-bill yield and 0.33 with the Term Spread; the predictions of the 3DP kernel have correlations of 0.36 and -0.34 with the Stock Variance and the Term Spread.

²⁰ See Goyal and Welch (2008).

In Table 6 the risk premium estimates from the pricing kernel model for different sets of spanning portfolios are compared with the 4 most significant ‘other predictors’ by regressing the market excess returns on the competing predictors. Regressions 1-4 include the Dividend-price ratio, Glamor, and the Kelly-Pruitt prediction along with the model risk premium estimates for the three sets of spanning portfolios. None of the other predictors is significant while the coefficients of the pricing kernel model predictors are highly significant and close to their theoretical value of unity. Regressions 5-8 repeat the analysis when cay is added to the list of other predictors. cay is highly significant with t-statistic in excess of 3.6 in all the regressions. The model predictors remain highly significant in the presence of the cay variable and the coefficients of the model predictions remain within one standard error of their theoretical value of unity. It appears that the pricing kernel model predictors are capturing a component of time variation in expected returns that is largely orthogonal to that captured by cay and the other predictor variables.
6 Empirical Test and Estimates of the Discount Rate model

Tables 7 and 8 report the results of estimating equations (22) and (23) for the discount rate model. The estimation procedure parallels that for the pricing kernel model described above except that the predictor variables, $x^d(\beta)$, are geometrically weighted averages of simple portfolio returns rather than the products of the returns with the return on the market portfolio. For the discount rate model the spanning portfolios are required to span the cash flow factors as well as the discount rate news. Three or four portfolios may well be insufficient for this. Therefore, in addition to the 3DP and FF3 sets of spanning portfolios we include the two expanded sets of spanning portfolios: 6BM-S is the market portfolio plus the 6 Fama-French size and book-to-market sorted portfolios, while 6DP is the market portfolio plus 6 portfolios formed on the basis of dividend yield.

We consider first the market portfolio, M, as the candidate single spanning portfolio. The model is then unable to predict market returns and the estimation does not even converge for 1-year return predictions. This is what we should expect since the returns on the market portfolio alone cannot span the innovations in both cash flows and the discount rate, even if there is only a single cash flow factor.²¹ This candidate spanning portfolio is therefore omitted from the subsequent tables and we concentrate on sets of spanning portfolios with more than a single member.

²¹ Menzly et al. (2004) present a model in which risk aversion and expected dividend growth are varying stochastically: increases in expected dividend growth increase stock prices and are associated with increases in discount rates; this induces a positive association between lagged market returns and expected future returns. On the other hand, increases in risk aversion increase discount rates and reduce stock prices and therefore induce a negative association between market returns and expected future returns. The two effects are offsetting so that the net relation between market movements and future returns becomes insignificant. Lettau and Ludvigson (2003) and van Binsbergen and Koijen (2009) provide empirical evidence of positive covariation between expected dividend growth rates and discount rates which is consistent with Menzly et al.

Considering next the quarterly return predictions for the other sets of spanning portfolios reported in Panel A of Table 7, we find that when the spanning portfolios are either FF3 or 6BM-S, both of which are based on size and book-to-market sorts, the model identifies a high frequency component of the variation in the discount rate. While the prediction for FF3 is significant at the 5% level, the persistence parameter for the estimated predicted excess return, $\rho_\alpha$, is only 0.268, implying a half-life of only about one half of a quarter or six weeks; the persistence parameter obtained using the 6BM-S portfolios is a little higher but the predictions are statistically insignificant. On the other hand, expanding the set of dividend yield based spanning portfolios from 3DP to 6DP almost doubles the $R^2$; the predictions are significant at the 5% level for both 3DP and 6DP sets of spanning portfolios. The persistence parameter for the predicted return series obtained using these dividend yield based sets of spanning portfolios is in excess of 0.8 and is comparable to that reported for the pricing kernel model in Table 1. Panel B shows that when the model is used to predict 1-year returns, only the predictions based on the 6DP set of spanning portfolios are significant, at the 5% or 10% level depending on the criterion.
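The mapping from the persistence parameter to a half-life can be checked with a short calculation. The sketch assumes the standard AR(1) definition, under which a shock decays to half its size after $\ln(0.5)/\ln(\rho)$ periods; the paper does not state which formula it uses, and the function name is hypothetical.

```python
import math


def half_life(rho):
    """Periods needed for a shock to an AR(1) process with autocorrelation rho
    to decay to half its initial size (assumed definition: ln 0.5 / ln rho)."""
    if not 0.0 < rho < 1.0:
        raise ValueError("half-life is defined here only for 0 < rho < 1")
    return math.log(0.5) / math.log(rho)


# Persistence reported above for the FF3 discount rate model predictions.
ff3_half_life_in_quarters = half_life(0.268)
```

For a persistence of 0.268 this gives roughly 0.53 quarters, in line with the statement that the half-life is about one half of a quarter.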
However, all sets of spanning portfolios except FF3 identify a component with a persistence around 0.8.

Table 8, which corresponds to Table 2 for the pricing kernel model, reports the full set of parameter estimates for the 1-quarter implementation of the discount rate model. Virtually none of the individual parameter estimates except $\beta$ and $\rho_\alpha$ for 3DP and 6DP are significant at the 5% level, and the only $\delta^d$ parameter estimate that is significant is the coefficient of SMB when the set of spanning portfolios is FF3. The out of sample predictive power of the model was estimated following the procedure described in Section 5.2. However, in contrast to the results for the pricing kernel model the out of sample discount rate model forecasts failed to outperform the naive forecast. This is probably due to the larger number of parameter estimates required by the large number of portfolios necessary to span the space of cash flow and discount rate innovations, which is also manifest in the lack of significance of the $\delta$ parameter estimates in Table 8.

7 Comparison of Pricing Kernel and Discount Rate model estimated risk premium series

Table 9 reports comparative statistics for the two model estimates of the risk premium: for the discount rate model we use the estimates based on FF3 and 6DP spanning portfolios and for the pricing kernel model the FF3 and 3DP estimates.²² Panel A shows that the standard deviation of the risk premia is highest for the pricing kernel model estimates based on the 3DP spanning portfolios (2.54%), and then for the 6DP discount rate model estimates (2.39%). Panel B shows that the correlation between these quarterly (1-year) series is 0.59 (0.61), and their close relation is shown in Figure 2. Despite their different conceptual and empirical bases it is apparent that the two models are identifying a common component of the expected return series; we have seen that their persistence parameters, $\rho_\alpha$, are 0.83 and 0.88.
Panel B shows that the correlation between the pricing kernel model quarterly (1-year) estimates based on the FF3 and 3DP spanning portfolios is 0.92 (0.85), confirming the visual impression from Figure 1. This high correlation gives us some assurance that the two sets of spanning portfolios are spanning the same component of the pricing kernel. The cross-model correlation between the quarterly (1-year) estimates for the 3DP pricing kernel model and the 6DP discount rate model is 0.59 (0.61), and between the FF3 pricing kernel model and the 6DP discount rate model is 0.54 (0.62). Only the FF3 discount rate model estimates show relatively low correlations with the other series and, as we have noted, the FF3 portfolios do not appear to be adequate to span the cash flow and discount rate news.

²² Note that the FF3 estimates for the discount rate model show very low persistence.

Simultaneous bootstrap simulations for the two models provide further confirmation that the predictability that they identify is not spurious. 10,000 bootstrap data samples of the portfolio and market returns were generated under the null hypothesis in the manner described in Section 5. For each sample the FF3 pricing kernel model and 6DP discount rate model predictors of quarterly market excess returns were estimated. The simulations imply that the probability of obtaining the joint levels of predictability reported for the two models in Tables 1 and 7 is 0.14%; and this declines to 0.03% when the correlation of 0.54 between the estimated risk premium series is taken into account.

Panel C reports the $R^2$ from univariate and bivariate regressions of the quarterly and 1-year market excess return on the pricing kernel model predictions using FF3 and 3DP spanning portfolios and the discount rate model predictions using the FF3 and 6DP spanning portfolios.
The numbers on the diagonal are the $R^2$ from the univariate regressions while the off-diagonal numbers are the $R^2$ from the bivariate regressions with predictors given by the corresponding row and column headings. Comparing the diagonal and off-diagonal terms, we see that for both pricing kernel models the prediction can be improved by combining it with the prediction of one of the discount rate models. For example, the adjusted $R^2$ for 1-year predictions of the pricing kernel model using 3DP spanning portfolios increases from 19.4% to 25.7% when combined with the 6DP discount rate model prediction. Similarly, for both discount rate models the prediction can be improved by combining it with the prediction of one of the pricing kernel models. For example the adjusted $R^2$ for the 6DP quarterly forecasts from the discount rate model rises from 8.7% to 11-12% when combined with one of the pricing kernel model forecasts. Thus the forecasts of neither model dominate those of the other. Rather, while there are elements of predictability that are not captured by the pricing kernel model but are captured by the discount rate model and vice versa, the modest increase in the $R^2$ when the pricing kernel model predictions using the FF3 or 3DP spanning portfolios are combined with the 6DP discount rate model prediction indicates that the two models are identifying a common component of predictability which is consistent with Figure 2 and the correlations reported in Panel B.

8 Additional empirical findings

In this section we report first some results on forecasts of the covariance of the pricing kernel with the market return derived from the pricing kernel model. Secondly, we analyze the effect of restricting the pricing kernel specification to a particular version of the ICAPM. Then we consider the ability of the two models of the discount rate to capture time-variation in the expected returns on some additional portfolios.
Finally, we relate the time-variation in the expected return series from the discount rate model to time-variation in market liquidity and in sentiment.

8.1 Predicting the covariance of the pricing kernel with the market return

The pricing kernel model rests on the familiar result that the expected excess return on the market portfolio is determined by the conditional covariance of the pricing kernel with the market return as shown in Equation (1). It follows that if the model forecasts the expected excess return it should also forecast the corresponding conditional covariance. Such an implication can be tested in principle by regressing an estimate of the conditional covariance on the expected return forecast, $\hat{\alpha}_{M,t}$:

$$\hat{C}_{t+1} = a + b\,\hat{\alpha}_{M,t} + \epsilon_{t+1} \tag{24}$$

where $\hat{C}_{t+1}$ is the estimated covariance of returns in period $(t+1)$, and $\hat{\alpha}_{M,t}$ is the estimate formed at the end of period $t$ of the arithmetic excess market return for period $(t+1)$. If log returns within a quarter or a year were approximately iid then it would be possible to estimate the quarterly or 1-year covariance, $C^q$ or $C^y$, of the log of the market return with the log of the pricing kernel return by appropriately scaling estimates of the covariance derived from high frequency data. However, the covariance that we require is the covariance of the arithmetic not the log returns, and there is also extensive evidence of lagged cross auto-correlations between security and portfolio returns,23 both factors making the relation between the short run covariance that can be estimated using high frequency data and the covariance of quarterly or annual returns uncertain. To illustrate this, Table 10 reports scaled ratios of the covariance of the market return with the pricing kernel return calculated using daily, weekly, and monthly returns to the corresponding covariance calculated using quarterly returns.
If long run returns were simply sums of short run returns and the returns were iid, we would expect these ratios to be equal to unity apart from sampling error. In fact, the ratios are considerably in excess of unity and even in excess of two for daily returns. This implies that we have to be very cautious in estimating covariances of long interval returns by simply scaling up estimates of the covariance obtained from short interval or high frequency returns. Therefore we face a quandary: we cannot estimate the conditional covariance of returns over the next quarter (year) from the observed one quarter or one year return, and yet if we use higher frequency data we are uncertain of the relation between the covariance of the one quarter (year) return and the covariance of higher frequency returns; the higher the frequency of returns used, the more efficient will be the estimator but also the more biased. Therefore in Table 11 we report the results of estimating equation (24) using different proxies for the covariance. The table is based on the pricing kernel model of predicted expected excess market returns for 1 quarter and 1 year using the FF3 beta-spanning portfolios. The parameters for the 1-quarter prediction model are shown in the central column of Table 2. $\hat{\alpha}^y_{M,t}$ and $\hat{\alpha}^q_{M,t}$ are the 1-year and 1-quarter return predictions. The proxies for the 1-year and 1-quarter covariances, $\hat{C}^y_{t+1}$ and $\hat{C}^q_{t+1}$, are appropriately scaled sample covariances using daily, weekly, monthly and (for the 1-year covariance) quarterly returns. The proxies that use relatively low frequency data are very noisy: for example the proxy for the 1-year covariance that is estimated using quarterly data is based on only four observations; on the other hand, as we have mentioned, the proxies obtained from high frequency data are likely to be highly biased.
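The iid benchmark behind the scaled ratios can be illustrated with a small simulation. The sketch below is not the procedure used for Table 10; it assumes two jointly normal, serially independent daily log return series with a hypothetical correlation of 0.6 and 63 trading days per quarter, and compares the daily covariance scaled by 63 with the covariance of the summed quarterly returns. All function names and parameter values are illustrative assumptions.

```python
import random

def sample_cov(x, y):
    """Sample covariance of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

def scaled_cov_ratio(n_quarters=2000, days_per_quarter=63, rho=0.6, seed=0):
    """Ratio of the scaled daily covariance to the quarterly covariance
    for two simulated iid log return series (assumed, not actual, data)."""
    rng = random.Random(seed)
    daily_m, daily_k = [], []          # daily 'market' and 'kernel portfolio' returns
    quarterly_m, quarterly_k = [], []  # quarterly sums of the daily log returns
    for _ in range(n_quarters):
        qm = qk = 0.0
        for _ in range(days_per_quarter):
            z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            m = 0.01 * z1
            k = 0.01 * (rho * z1 + (1.0 - rho ** 2) ** 0.5 * z2)
            daily_m.append(m)
            daily_k.append(k)
            qm += m
            qk += k
        quarterly_m.append(qm)
        quarterly_k.append(qk)
    scaled_daily = days_per_quarter * sample_cov(daily_m, daily_k)
    return scaled_daily / sample_cov(quarterly_m, quarterly_k)

ratio = scaled_cov_ratio()
print(round(ratio, 2))  # close to 1 under the iid assumption, up to sampling error
```

Under these assumptions the ratio stays near unity, so ratios well above one, as reported for actual returns, point to a departure from the iid benchmark. The sketch covers only that benchmark; it does not reproduce the trade-off between noisy low frequency proxies and biased high frequency proxies described above.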
These countervailing forces are apparent in the table: in Panel A the $R^2$ for the prediction of the 1-year covariance rises from 0.003, when we use the proxy estimated from daily returns, to 0.097 when we use the proxy obtained from 12 monthly returns, but then declines to 0.063 for the proxy based on 4 quarterly returns. For the prediction of the 1-quarter covariance the $R^2$ rises monotonically as we move to proxies derived from lower return frequencies. Despite the difficulties of proxying the time-varying 1-year or 1-quarter covariances of the market return with the pricing kernel, the table provides strong evidence that in forecasting the excess return on the market portfolio with the pricing kernel model we are also forecasting the conditional covariance of the low frequency market return with the pricing kernel.

23 Levhari and Levy (1977) showed that CAPM betas vary systematically with the return interval even if returns are iid. Lo and MacKinlay (1990) show that the returns on large firms systematically lead the returns on small firms. Gilbert et al. (2014) show that the difference between daily and quarterly betas depends on the opacity of the firm accounts. Brennan and Zhang (2014) show that the ratio of long to short horizon betas depends on firm size, book-to-market ratio, number of analysts etc.

8.2 An ICAPM pricing kernel

The pricing kernel that we have specified so far depends on undefined state variables that are assumed to be spanned by the returns on the different sets of portfolios we have introduced. In this section we restrict the pricing kernel to depend on the market return, the innovation in the interest rate and the innovation in the Sharpe ratio. These three variables are motivated by the discussion of the ICAPM in Brennan et al.
(2004) and Nielsen and Vassalou (2006).24 First, a time series of realized reward to volatility ratios, $R_{M,t+1}/\sigma_{M,t+1}$, was calculated by dividing the market excess return for each quarter by a scaled version of the realized daily volatility during the quarter. The Sharpe ratio, which is the predicted value of the reward to volatility ratio, is assumed to follow an AR(1) process, and its innovations are assumed to be spanned by the market return and the returns on the 6DP portfolios. Then, following the logic of Section 3.1, the realized reward to volatility ratio is given by:

$$\frac{R_{M,t+1}}{\sigma_{M,t+1}} = a_0 + \sum_{p=1}^{P} \delta_p^{SR}\, x_{pt}^{SR}(\beta^{SR}) + \epsilon_{t+1} \tag{25}$$

where

$$x_{pt}^{SR}(\beta) = \sum_{s=0}^{\infty} \beta^{s}\, r_{p,t-s}$$

Equation (25) is essentially the discount rate model of expected returns, except that the market excess return is standardized by the estimated volatility. The innovation in the Sharpe ratio is assumed to be given by the realized return on the portfolio with weights proportional to $\delta_p^{SR}$, which we denote by $R_{SR,t}$. Equation (25) was estimated over the period 1946.1 to 2010.4 and the parameter estimates are reported in the first column of Table 12. Comparing these estimates with those for the 6DP discount rate model in Tables 7 and 8, we note that the $\bar{R}^2$ is 0.083 when the dependent variable is the reward to volatility ratio, as compared with 0.065 when the dependent variable is the raw excess return, and two of the $\delta_p^{SR}$ are significant at the 5% level. Secondly, an AR(1) model was fitted to the quarterly risk free rate series and the innovations from the model, $u_t^{RF}$, were projected onto the 6DP portfolio returns. The estimates of the parameters, $\delta_p^{RF}$, are reported in the second column of Table 12. We see that the portfolio returns capture about 63% of the variation in the residual. Portfolio weights proportional to $\delta_p^{RF}$ were used to calculate the returns on the RF mimicking portfolio, $R_{RF,t}$.
24 Briefly: in a continuous time diffusion setting the instantaneous investment opportunity set is completely described by the interest rate and the Sharpe ratio. If these two variables follow a joint Markov process, then they are sufficient statistics for the entire investment opportunity set.

Then the predictive system for the market excess return is:

$$R_{M,t+1} = a_0 - \delta_{RM}\, x^{IC}_{RM,t} - \delta_{SR}\, x^{IC}_{SR,t} - \delta_{RF}\, x^{IC}_{RF,t} \tag{26}$$

where

$$x^{IC}_{RM,t}(\beta) = \sum_{s=0}^{\infty} \beta^{s} R_{M,t-s} R_{M,t-s}, \qquad x^{IC}_{SR,t}(\beta) = \sum_{s=0}^{\infty} \beta^{s} R_{SR,t-s} R_{M,t-s}, \qquad x^{IC}_{RF,t}(\beta) = \sum_{s=0}^{\infty} \beta_{RF}^{s} R_{RF,t-s} R_{M,t-s}$$

The weighting parameter for the RF variable, $\beta_{RF}$, was set equal to the estimated autoregressive parameter of the AR(1) process for RF, and the other parameters were estimated as described above, using market and portfolio returns for the period 1946.1 to 2010.4. The results for the restricted pricing kernel model are shown in the third column of Table 12. The bias-adjusted $R^2$ of 0.052 compares with the values of 0.083 for the unrestricted 3DP pricing kernel model, and of 0.041 for the CAPM pricing kernel model (M), shown in Table 1, and the value of 0.056 for the 6DP discount rate model shown in Table 7. The t-statistics for the two ICAPM state variables, which are calculated using bootstrap standard errors, are both in excess of two. The parameter estimates imply that there is a positive risk premium associated with covariation with the Sharpe ratio, but a negative premium associated with covariation with the interest rate. Overall, the results show that time variation in the covariance of a simple ICAPM pricing kernel with the market return captures a significant fraction of the time variation in expected returns that is captured by the more general pricing kernels that we have considered.
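The predictor variables in equation (26) are geometrically weighted sums of past cross-products of a mimicking portfolio return with the market return. A minimal sketch of how such a sum can be built recursively is given below; it assumes that the infinite sum is truncated at the start of the available sample, and the function name and the toy return series are hypothetical rather than taken from the paper's data.

```python
def weighted_cross_products(r_p, r_m, beta):
    """Return x_t = sum_{s=0}^{t} beta**s * r_p[t-s] * r_m[t-s] for each t.

    Uses the recursion x_t = r_p[t] * r_m[t] + beta * x_{t-1}, with the sum
    truncated at the first observation (an assumption of this sketch).
    """
    x, prev = [], 0.0
    for rp, rm in zip(r_p, r_m):
        prev = rp * rm + beta * prev
        x.append(prev)
    return x

# Toy quarterly series (illustrative values only).
r_sr = [0.02, -0.01, 0.03, 0.00]   # return on a hypothetical Sharpe ratio mimicking portfolio
r_m = [0.01, 0.02, -0.01, 0.04]    # market excess return

x_sr = weighted_cross_products(r_sr, r_m, beta=0.8)   # analogue of x^{IC}_{SR,t}
x_rm = weighted_cross_products(r_m, r_m, beta=0.8)    # analogue of x^{IC}_{RM,t}
print(x_sr)
```

The recursion avoids recomputing the full sum at each date, which matters when the weighting parameter is chosen by searching over many candidate values.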
8.3 Sentiment, illiquidity and predicted returns

Thus far we have either left the determinants of expected returns unspecified as in the discount rate model or assumed that expected returns are determined solely by changing risk as in the pricing kernel model. Other possible determinants of expected returns include investor sentiment (Baker and Wurgler (2006)) and market illiquidity (Amihud (2002)). Moreover Baker and Wurgler have found that their measure of investor sentiment is an important determinant of the expected returns on spread portfolios that are long portfolios of small, high book-to-market, or high dividend yield stocks and short the corresponding portfolios of big, low book-to-market and low dividend yield stocks. To the extent that investor sentiment reflects only psychological factors and is independent of risk as captured by the pricing kernel, we should expect that the pricing kernel model estimates of expected return would be unrelated to investor sentiment. Similarly, we also expect that the pricing kernel estimates of expected return will be independent of illiquidity to the extent that market illiquidity is independent of aggregate risk. On the other hand, the expected returns estimated using the discount rate model may well be related to such non-risk factors.

As a preliminary to exploring this, we estimate 1-year expected returns for 3 spread portfolios using the two models: the spread portfolios are the Fama-French SMB and HML factor returns and the returns on a portfolio that is long in the top 20% of firms ranked by dividend yield and short in a portfolio of non-dividend paying stocks, which we denote by HMZ (High minus Zero). The models are estimated by simply replacing the market excess return in equations (12) and (22) with the spread portfolio return; the spanning portfolios for both models are the FF3 portfolios. Selected model parameter estimates are reported in Table 13.
Panel A shows that there is significant predictability for the 3 spread portfolio returns using the discount rate model, with the 1-year $R^2$ ranging from 12 to 16% after bias correction. The (quarterly) persistence parameter for all three portfolios is around 0.86, implying a half-life for shocks of 4.6 quarters. Out of sample tests for the discount rate model show the model predictions reducing the error in the 1-year forecast of returns by 9%, 6% and 7% relative to the naive model for the SMB, HML and HMZ portfolios respectively. This out of sample forecast performance for the spread portfolios is in marked contrast to the poor out of sample performance of the discount rate model when applied to market excess returns. On the other hand, Panel B shows little evidence of significant predictability from the pricing kernel model for the HML and HMZ spread portfolios: thus there is little evidence that time variation in the returns on these portfolios is driven by time-varying risk, which raises the question of whether the time-variation in expected returns on these portfolios is driven by time variation in sentiment or liquidity. However, for SMB the pricing kernel model yields predictability that is not only significant, but is greater than that of the discount rate model: the $R^2_c$ rises from 12.5% to 16.8%. Moreover, the correlation between the two model estimates for SMB is 0.45 while it is less than 0.3 for the other two spread portfolios. Table 14 shows the correlations between the 1-year expected returns on the spread portfolios calculated from the discount rate model using the FF3 spanning portfolios. The expected returns on HML and HMZ have a correlation of 0.77 but, while the expected returns on HML and SMB have a positive correlation (0.20), the expected returns on HMZ and SMB are almost uncorrelated.
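The half-life quoted above for the spread portfolio forecasts follows directly from the persistence parameter if the expected return is treated as an AR(1) process, in which a shock decays by the factor $\rho$ each quarter. The short sketch below makes the arithmetic explicit; the function name is an illustrative assumption.

```python
import math

def half_life(persistence):
    """Number of periods for an AR(1) shock to decay to half its initial size.

    Solves persistence**h = 0.5 for h; requires 0 < persistence < 1.
    """
    return math.log(0.5) / math.log(persistence)

print(round(half_life(0.86), 1))  # 4.6 quarters
```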
Figure 3, which plots the three expected return series, shows that the expected return on HMZ was strongly negative for most of the 1970's: high dividend yield stocks had very low expected returns relative to those on zero yield 'growth' stocks during this period.

To determine whether the difference between the discount rate model and the pricing kernel model estimates of expected returns is related to sentiment or liquidity, the difference between the quarterly FF3 discount rate model estimates of the 1-year expected returns and the corresponding pricing kernel model estimates was regressed on the Baker and Wurgler (2006) (BW) measure of investor sentiment and the average value of the Amihud (2002) measure of market illiquidity for the previous year.25 The results are reported in Table 15. For completeness, the corresponding difference for the market portfolio was also included. In this case the discount rate model estimate of the 1-year expected excess return for the market portfolio, $R_M$, was estimated using the 6DP set of predictor portfolios. The market illiquidity variable contributes to the explanation of the difference between the expected return series for the HML portfolio: the positive regression coefficient (t = 2.33) implies that value stocks have a higher expected return than growth stocks after adjusting for risk when Illiquidity is high.

25 We align the BW measure for the end of year t with the expected return for year t because most of the variation in the BW measure is associated with variables from year (t − 1) (new issue returns, the relative pricing of dividend and non-dividend paying stocks, and share turnover).

However, there is no evidence that Illiquidity affects the expected returns on the market portfolio or the other two spread portfolios after adjusting for risk.26 Consistent with BW, we find no evidence that their measure of sentiment is associated with (the non-risk based component of the) expected returns on SMB.
However, BW sentiment is significantly positively associated with the non-risk component of the expected returns on HML (t = 2.65) and HMZ (t = 5.19). When BW sentiment is high, value firms have high expected returns relative to growth firms, and high dividend yield stocks have high expected returns relative to non-dividend paying stocks. While the dividend-yield spread portfolio results are consistent with BW, those authors were unable to find a statistically significant sentiment effect for the book-to-market ratio. Standardized time series of the HMZ portfolio 1-year expected return and the Baker-Wurgler sentiment series are plotted in Figure 4. It is clear that they track each other closely.

The difference between the expected return series from the unconstrained discount rate model and that from the risk-based pricing kernel model is an estimate of the component of the expected return that is not explained by risk. We have shown that this non-risk based component of the expected returns on HMZ and HML is associated with time varying sentiment and illiquidity (HML). Stambaugh et al. (2012) have argued that anomalies in stock returns are due in large part to impediments to short sales, and that mispricing is generally over-pricing that cannot be arbitraged away. They find that anomalies are highest following periods of high sentiment, which gives rise to overpricing; and that the returns on the short leg of a strategy are more negative when sentiment is high, while the returns on the long leg are largely invariant to sentiment. This argument and the related findings suggest that the time-variation in the returns on the three arbitrage portfolios will be mainly due to time-variation in the returns on the short leg whose profits will come from periodic overpricing.
To determine whether this is the case, the returns on the long and short portfolios underlying each of the arbitrage portfolios were regressed on dummy variables that capture whether the expected return on the arbitrage portfolio as calculated using the discount rate model is above or below its median value ($D_{low}$, $D_{high}$). The market return, $R_{m,t}$, was included as a control since the long and short portfolios are not market neutral.

$$R_{p,t} = a_{low} D_{low,t-1} + a_{high} D_{high,t-1} + \beta R_{m,t} + \epsilon_{p,t}$$

The dependent variable is the one year return on the long or short portfolio, and the dummy variables were determined using estimates of the expected 1-year return on the corresponding arbitrage portfolio from the discount rate model with FF3 as spanning portfolios. The regression was estimated using overlapping quarterly data and t-statistics were calculated using Newey-West standard errors with 4 lags.

26 Amihud (2002) does find an association between expected Illiquidity and the market return although the significance of the results becomes marginal when the standard deviation of the market return is included in the regression.

The results are shown in Table 16. In the table the long component of each spread portfolio is shown above the short component. $a_{high} - a_{low}$ measures the difference between the market adjusted returns in states when the expected return on the spread portfolio is high and low, and it therefore provides a simple measure of the contribution of each individual portfolio to the time variation in the spread portfolio returns. Contrary to the expectations raised by the Stambaugh et al. findings, the time-variation in the returns on SMB and HML is mainly due to the variation in the long side returns: Small firms for SMB, and both big and small high book-to-market returns for HML. The variation in the short side returns is only approximately one eighth that of the long side returns.
Moreover, for the constituent portfolios of HML there is no significant difference between the variation in returns for large and small firms where impediments to short sales are likely to be larger. Finally, although we do not have an equilibrium model against which to assess returns, there is not much evidence that the time variation in returns on HML and SMB is due to periodic overpricing. Rather, the highest absolute market adjusted returns are for the long portfolios in the high state: 6.7% for the Small firm component of SMB and 9.5% for the small high book-to-market portfolio of HML. In contrast, returns for the supposedly overpriced short side in the high state are only 1.8% for Big firms in SMB and 1-2% for the low book-to-market portfolios in HML. Thus, even though we have seen that the returns on HML are strongly related to sentiment, there is not much evidence of the effects of short sales impediments which would imply that most of the time-variation in returns would come from the short-side portfolios. However, for the dividend spread portfolio, HMZ, the evidence favoring the Stambaugh et al. hypothesis is much stronger. Here more of the time variation comes from the short side portfolio of zero yield stocks and there is a strong suggestion of overpricing in the -6.1% average risk adjusted return on this portfolio in the high state.

9 Conclusion

In this paper we have shown that it is possible to extract the expected returns on the market portfolio from the lagged returns on a set of spanning portfolios, and have provided new evidence on the predictability of (excess) stock returns. Our first model, the pricing kernel model, assumes that time variation in expected returns is driven solely by time variation in risk, where the risk of the market return is measured by its covariance with a portfolio whose returns capture innovations in the pricing kernel.
This model expresses the expected excess market return as a weighted average of past cross-products of the returns on the beta-spanning portfolios and the market portfolio. The second model, the discount rate model, relies on the fact that in a world of time varying discount rates, returns on common stock portfolios reflect shocks to discount rates as well as to cash flow expectations. By assuming that the expected return follows an AR(1) process we are able to express the expected return as a weighted average of past returns on a portfolio whose weights are chosen so that its return has exposure only to discount rate innovations.

The pricing kernel model predicts quarterly returns with a corrected $R^2$ of 6-8% and 1-year returns with a corrected $R^2$ of 14-16%. Out of sample, it reduces the mean square forecast error by 3-4% for quarterly forecasts and by 9-13% for 1-year forecasts. These predictions and forecast improvements are highly statistically significant. The predictive power of the model is also robust to the inclusion of other predictor variables that have been examined previously, and the component of predictability that we identify is essentially orthogonal to the Lettau-Ludvigson (2001) cay variable. The persistence of the estimated expected return series is considerably less than the persistence of variables that have been commonly used as proxies for expected returns such as the dividend-price ratio, the earnings-price ratio and the T-bill rate.
While many of these variables have a persistence in quarterly data of above 0.9, we identify expected returns with persistence of approximately 0.8.

The three Fama-French factors, FF3, are found to span a significant component of the pricing kernel that gives rise to the time variation in expected market returns, although the predictive power of the model for quarterly (1-year) returns increases from 6.3% (14%) to 7.3% (17.1%) when the spanning portfolios are increased from FF3 to a full set of 6 size and book-to-market sorted portfolios, which indicates that the FF3 spanning is not perfect. Nevertheless, the major role of the FF3 factors in capturing the time-variation of expected returns is important new evidence in support of their role in rational cross-sectional asset pricing.

We also examine the predictive properties of a restricted pricing kernel model in which the variables that enter the pricing kernel are motivated by Merton's (1973) ICAPM and depend on only the market return, the Sharpe ratio, and the riskless interest rate. When this restriction is imposed, the model predicts quarterly excess returns with a corrected $R^2$ of 5.6% as compared with 8.3% for the unrestricted model.

The discount rate model predicts quarterly (1-year) market returns with a corrected $R^2$ of 6% (10%) when the set of spanning portfolios is the market portfolio and six portfolios formed on the basis of dividend yield (6DP). The larger set of spanning portfolios required by this model is due to the need to span the cash flow as well as discount rate shocks on the portfolios. Since this requires the estimation of a larger set of coefficients, the individual coefficient estimates are not significant, although the quarterly predictions themselves are significant at the 5% level; and out of sample the model does not improve on the performance of a naive forecast.
Although the two model forecasts have quite different conceptual bases and employ different predictor variables, the forecasts themselves are quite closely related; the highest correlation between the time series of forecasts from the two models is around 0.6. This gives us some confidence that the models are identifying a common component of the expected return series. But, while the pricing kernel model and the 6DP version of the discount rate model identify predictability with a persistence of around 0.8 in quarterly data, the FF3 (and 6BM-S) versions of the discount rate model pick up a high frequency component with a persistence of around 0.3 in quarterly data.

The pricing kernel model attributes all the time variation in expected returns to time variation in risk, while the discount rate model imposes no such restrictions and potentially allows for expected returns to be affected by factors such as sentiment or liquidity. Therefore, to investigate the role of non-risk factors in determining expected returns, we also fitted the FF3 discount rate model to the 1-year returns on the arbitrage or spread portfolios, HML, SMB and HMZ, where HMZ is a portfolio that is long high dividend yield stocks and short zero yield stocks. The expected return series for all three portfolios have a predictable component with a persistence above 0.85 in quarterly data. Only a portion of the time-variation in the expected returns on the HML and HMZ spread portfolios is captured by the risk-based pricing kernel model, and we find that 7% (30%) of the variation in the expected returns on HML (HMZ) that is not captured by the risk-based model is explained by the Baker-Wurgler sentiment variable and illiquidity. We find no evidence that the expected returns on either the market portfolio or SMB are affected by sentiment or illiquidity.
Thus we have shown that a significant component of the variation in expected returns on stock portfolios is attributable to time-variation in risk. For the value-based arbitrage portfolios (HML and HMZ) there is also evidence of 'mispricing' which is associated with waves of optimism and pessimism in financial markets and changing illiquidity. In support of the 'sentiment' hypothesis, we have shown that expected returns on HML and HMZ are related to the Baker-Wurgler measure of sentiment. An issue that we have not explored is that, while the evidence from the FF3 pricing kernel model suggests that the Fama-French factors capture important aspects of the pricing kernel, we have found that a significant component of the variation in the expected return on HML is due to mispricing relative to the risk-based model. Is it possible that this portfolio can capture an important element of the pricing kernel while itself being subject to mispricing? We leave this intriguing issue to future research.

10 Appendix A Pricing Kernel Spanning Assumption

In the pricing kernel model we assume that the time-varying loading of the pricing kernel on the return on the market portfolio, $b_m(t)$, is spanned by the time-varying betas of the (pricing kernel) beta-spanning portfolios, $b_p(t)$:

$$b_m(t) = \sum_{p=1}^{P} \delta_p^c\, b_p(t),$$

where $\delta_p^c$ is a set of constant portfolio weights and $b_p(t)$ is the beta of portfolio $p$. To motivate this assumption, assume that the vector of spanning portfolio betas, $b_p(t)$, can be written as an affine function of a $K$-element vector of state variables, $x_t$, so that:

$$b_p(t) = b_{p0} + b_{p1}' x_t$$

where $b_{p0}$ is a $(P \times 1)$ vector of constants and $b_{p1}$ is a $(K \times P)$ matrix of constants. Assume further that the pricing kernel sensitivity to the market return, $b_m(t)$, can also be written as an affine function of the same state variables:

$$b_m(t) = b_{m0} + b_{m1}' x_t$$

where $b_{m0}$ is a scalar and $b_{m1}$ is a $(K \times 1)$ vector.
Then a sufficient condition for the pricing kernel model spanning assumption, $b_m(t) = \sum_{p=1}^{P} \delta_p^c\, b_p(t)$, is that the $(1 \times P)$ vector $\delta^c$ satisfies:

$$\delta^c b_{p0} = b_{m0}, \qquad \delta^c b_{p1}' = b_{m1}'$$

These $(K+1)$ equations in the $P$ unknowns, $\delta_p^c$, will in general have a solution if $P \geq K + 1$.

11 Appendix Estimation Procedure

To adjust for the small sample bias arising from persistence in the predictor variables we proceed as follows. First we estimate the small sample bias in the estimated $R^2$ as a function of the estimated $\beta$ under the null hypothesis of no predictability, $B_{R^2}(\beta)$. Then our bias-corrected estimator of $\beta$ is given by $\hat{\beta} = \arg\max_{\beta} [R^2_c]$ where $R^2_c = R^2(\beta) - B_{R^2}(\beta)$. In a similar fashion we calculate a bias corrected Wald statistic, $W^c$.

To calculate the bias in the estimated $R^2$ under the null, $B_{R^2}(\beta)$, we adopt a bootstrap approach which reflects the null hypothesis of no predictability. Specifically, we fit a GARCH(1,1) to the returns on the market portfolio and each of the spanning portfolios and save both the $(T \times P)$ matrix of the fitted volatilities and the $(T \times P)$ matrix of standardized return innovations. To construct each bootstrap data sample we randomly select $T$ $(P \times 1)$-vectors of standardized innovations and construct the period $t$ vector of returns by multiplying the fitted volatilities for period $t$ by the randomly selected standardized innovations and adding the intercepts from the GARCH estimation. In this way we preserve the exact time series of volatilities for the portfolios and the vector of mean returns, as well as the cross-sectional correlation structure, while ensuring that the returns are serially independent. Then from the simulated spanning portfolio returns we generate the predictor variables for the pricing kernel model, $x^c_{pt}(\beta) = \sum_{s=0}^{\infty} \beta^{s} R_{p,t-s} R_{M,t-s}$, for different values of $\beta$ and calculate the $R^2$ from regressing the market excess return on the generated predictor variables.
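The resampling step described above can be sketched as follows. This is an illustration under stated assumptions rather than the estimation code: the GARCH(1,1) fit is taken as already done, so the fitted volatilities, standardized innovations and intercepts are supplied as plain lists, and the function name and toy inputs are hypothetical. Whole rows of innovations are drawn with replacement, which keeps the cross-sectional dependence within a period while removing serial dependence.

```python
import random

def bootstrap_returns(fitted_vol, std_innov, intercepts, rng):
    """Build one bootstrap sample of returns under the null of no predictability.

    fitted_vol : T rows of P fitted volatilities (kept in their original order)
    std_innov  : T rows of P standardized innovations (rows resampled with replacement)
    intercepts : P mean returns from the volatility model
    """
    T = len(fitted_vol)
    sample = []
    for t in range(T):
        z = std_innov[rng.randrange(T)]   # draw an entire cross-section at once
        sample.append([mu + v * e for mu, v, e in zip(intercepts, fitted_vol[t], z)])
    return sample

# Toy inputs with T = 3 periods and P = 2 portfolios (illustrative values only).
vol = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3]]
innov = [[1.0, -1.0], [0.5, 0.5], [-2.0, 0.0]]
mu = [0.01, 0.02]

boot = bootstrap_returns(vol, innov, mu, random.Random(1))
print(boot)
```

Repeating the draw many times and recomputing the statistic of interest on each sample gives its distribution under the null, from which the average bias can be taken.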
For each generated sample we estimate the parameters $(a_0, \delta, \beta)$ in equation (12) and calculate the resulting $R^2$. Repeating this 10,000 times we calculate $B_{R^2}(\beta)$ as the average value of $R^2$ in the bootstrapped samples.27 The predictor variables, $x^c_p(\beta)$, are formed by truncating the summation in equation (13) at 1927.3. The predictive regressions and the return vectors which are sampled for the bootstrap start in 1946.1.

Given the parameter estimates, we assess the statistical significance of our results by determining the proportion of the 10,000 bootstrap samples in which the calculated value of the corrected $R^2_c \equiv R^2 - B_{R^2}(\beta)$ (or corrected Wald statistic) exceeds that calculated using the actual data. Standard errors of the parameter estimates are also obtained from the bootstrap estimation.

The procedure was then repeated, replacing the quarterly excess return as dependent variable with the 1-year excess return starting in the same quarter. Although this induces overlap in the dependent variable, this is accounted for in the bootstrap simulations used to calculate standard errors and significance.

27 Estimations are performed by a grid search over 0 ≤ β ≤ 0.95.

12 References

Amihud, Y., 2002, Illiquidity and Stock Returns, Journal of Financial Markets, 5, 31-56.
Baker, M., and J. Wurgler, 2006, Investor sentiment and the cross-section of stock returns, Journal of Finance, 61, 1645-1680.
van Binsbergen, J.H., and R.S.J. Koijen, 2009, Predictive regressions: a present value approach, unpublished manuscript.
Boudoukh, J., M. Richardson, and T. Smith, 1993, Is the ex ante risk premium always positive? A new approach to testing conditional asset pricing models, Journal of Financial Economics, 34, 387-408.
Bandi, F.M., B. Perron, A. Tamoni, and C. Tebaldi, 2014, The scale of predictability, unpublished manuscript.
Brennan, M.J., A.S. Wang and Y. Xia, 2001, A Simple Model of Intertemporal Capital Asset Pricing and Its Implications for the Fama-French Three-Factor Model, unpublished manuscript.
Brennan, M., A. Wang, and Y. Xia, 2004, Estimation and Test of a Simple Model of Intertemporal Asset Pricing, Journal of Finance, 59, 1743-1775.
Brennan, M.J., and Y. Xia, 2006, Risk and Valuation under an Intertemporal Capital Asset Pricing Model, Journal of Business, 79, 1-36.
Brennan, M.J., and Y. Zhang, 2014, Capital Asset Pricing with a Stochastic Horizon, unpublished paper.
Campbell, J., 1991, A variance decomposition for stock returns, Economic Journal, 101, 157-179.
Campbell, J., 1993, Intertemporal asset pricing without consumption data, American Economic Review, 83, 487-512.
Campbell, J., and J. Ammer, 1993, What moves stock and bond markets?, Journal of Finance, 48, 3-38.
Campbell, J., and R. Shiller, 1988, The dividend-price ratio and expectations of future dividends and discount factors, Review of Financial Studies, 1, 195-228.
Campbell, J., and T. Vuolteenaho, 2004, Bad beta, good beta, American Economic Review, 94, 1249-1275.
Campbell, J., and M. Yogo, 2006, Efficient tests of stock market predictability, Journal of Financial Economics, 81, 27-60.
Cochrane, J.H., 2008, The dog that did not bark: a defence of return predictability, Review of Financial Studies, 21, 1533-75.
Cochrane, J.H., 2002, Asset Pricing, Princeton NJ.
Cohen, R.B., C. Polk and T. Vuolteenaho, 2003, The value spread, Journal of Finance, 58, 609-641.
Eleswarapu, V., and M. Reinganum, 2004, The predictability of aggregate stock market returns: evidence based on glamour stocks, Journal of Business, 77, 275-294.
Fama, E.F., 1991, Efficient Capital Markets: II, Journal of Finance, 46, 1575-1617.
Fama, E., and K. French, 1993, Common Risk Factors in the Returns on Stocks and Bonds, Journal of Financial Economics, 33, 3-56.
Fama, E., and K. French, 1995, Size and Book-to-Market Factors in Earnings and Returns, Journal of Finance, 50, 131-155.
Fama, E., and K. French, 1996, Multifactor Explanations of Asset Pricing Anomalies, Journal of Finance, 51, 55-84.
Ferson, W.E., S. Sarkissian, and T.T. Simin, 2003, Spurious regressions in financial economics, Journal of Finance, 58, 1393-1413.
Foster, F.D., T. Smith, and R.E. Whaley, 1997, Assessing goodness-of-fit of asset pricing models, Journal of Finance, 52, 591-607.
Ghysels, E., P. Santa-Clara, and R. Valkanov, 2005, There is a risk-return trade-off after all, Journal of Financial Economics, 76, 509-548.
Gilbert, T., C. Hrdlicka, J. Kalodimos, and S. Siegel, 2014, Daily data is bad for beta: Opacity and frequency-dependent betas, Review of Asset Pricing Studies, 4, 78-117.
Goyal, A., and I. Welch, 2008, A comprehensive look at the empirical performance of equity premium prediction, Review of Financial Studies, 21, 1455-1508.
Guo, H., R. Savickas, Z. Wang, and J. Yang, 2009, Is Value Premium a Proxy for Time-Varying Investment Opportunities? Some Time Series Evidence, Journal of Financial and Quantitative Analysis, 44, 133-154.
Hong, H., W. Torous, and R. Valkanov, 2007, Do industries lead the stock market?, Journal of Financial Economics, 83, 367-396.
Huang, D., 2013, What is the maximum return predictability permitted by asset pricing models?, unpublished manuscript, Washington University.
Kelly, B., and S. Pruitt, 2013, Market expectations in the cross-section of present values, Journal of Finance, 68, 1721-1756.
Lakonishok, J., A. Shleifer, and R.W. Vishny, 1994, Contrarian investment, extrapolation, and risk, Journal of Finance, 49, 1541-1578.
Lettau, M., and S. Ludvigson, 2001, Consumption, aggregate wealth and expected stock returns, Journal of Finance, 56, 815-849.
Lettau, M., and S. Ludvigson, 2003, Expected Stock Returns and Expected Dividend Growth, Journal of Financial Economics, 76, 583-626.
Lettau, M., and S. Ludvigson, 2010, Measuring and modeling variation in the risk-return tradeoff, Handbook of Financial Econometrics, vol. 1, Ait-Sahalia, Y., and L.P.
Hansen, eds., 617-690.
Lettau, M., and J. Wachter, 2007, Why is long-horizon equity less risky? A duration-based explanation of the value premium, Journal of Finance, 62, 55-92.
Levhari, D., and H. Levy, 1977, The Capital Asset Pricing Model and the Investment Horizon, The Review of Economics and Statistics, 59, 92-104.
Lo, A., and C. MacKinlay, 1990, When are contrarian profits due to stock market overreaction?, Review of Financial Studies, 3, 175-205.
Ludvigson, S., and S. Ng, 2007, The empirical risk-return relation: a factor analysis approach, Journal of Financial Economics, 83, 171-222.
MacKinlay, A.C., 1995, Multifactor Models Do Not Explain Deviations from the CAPM, Journal of Financial Economics, 38, 3-28.
Menzly, L., T. Santos, and P. Veronesi, 2004, Understanding Predictability, Journal of Political Economy, 112, 1-47.
Merton, R.C., 1973, An intertemporal capital asset pricing model, Econometrica, 41, 867-887.
Merton, R.C., 1980, On Estimating the Expected Return on the Market: An Exploratory Investigation, Journal of Financial Economics, 8, 1-39.
Newey, W.K., and K.D. West, 1987, A Simple, Positive Semi-definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix, Econometrica, 55, 703-708.
Nielsen, L.T., and M. Vassalou, 2006, The Instantaneous Capital Market Line, Economic Theory, 28, 651-664.
Pastor, L., and R.F. Stambaugh, 2009, Predictive systems: living with imperfect predictors, Journal of Finance, 64, 1583-1628.
Petkova, R., 2006, Do the Fama-French Factors Proxy for Innovations in Predictive Variables?, Journal of Finance, 61, 581-612.
Ross, S.A., 2005, Neoclassical Finance, Princeton University Press.
Scruggs, J.T., 1998, Resolving the Puzzling Intertemporal Relation between the Market Risk Premium and Conditional Market Variance: A Two-Factor Approach, Journal of Finance, 53, 575-603.
Scruggs, J.T., and P.
Glabadanidis, 2003, Risk Premia and the Dynamic Covariance between Stock and Bond Returns, Journal of Financial and Quantitative Analysis, 38, 295-316.
Stambaugh, R.F., 1986, Bias in regression with lagged stochastic regressors, CRSP Working Paper No. 156, University of Chicago.
Stambaugh, R.F., 1999, Predictive regressions, Journal of Financial Economics, 54, 375-421.
Stambaugh, R.F., J. Yu, and Y. Yuan, 2012, The short of it: Investor sentiment and anomalies, Journal of Financial Economics, 104, 288-302.
Torous, W., R. Valkanov, and S. Yan, 2004, On predicting stock returns with nearly integrated explanatory variables, Journal of Business, 77, 937-966.
Zhou, G., 2010, How much stock return predictability can we expect from an asset pricing model?, Economics Letters, 108, 184-186.

Figure 1: Market expected returns estimated from the pricing kernel model
The figure shows the expected 1-year excess returns for the S&P500 estimated from the pricing kernel model. We estimate the model with three different sets of spanning portfolios: the S&P500 portfolio (M), the 3 Fama-French portfolios (FF3), and the S&P500 portfolio and three portfolios sorted on the basis of dividend yield (3DP). The plot shows the expected 1-year excess returns for each quarter over the period 1946Q1-2010Q4.

Figure 2: Comparison of expected returns from pricing kernel and discount rate models
The figure compares the expected 1-year excess returns for the S&P500 estimated from the pricing kernel model with those from the discount rate model. The spanning portfolios for the pricing kernel model are the 3 Fama-French portfolios (FF3). For the discount rate model, the expanded set of spanning portfolios based on dividend yield is used, consisting of the S&P500 portfolio and six portfolios sorted on dividend yield (6DP). The plot shows the expected 1-year excess returns for each quarter over the period 1946Q1-2010Q4.
Figure 3: Expected 1-year returns for spread portfolios
The figure shows the expected 1-year returns for the spread portfolios (SMB, HML, HMZ) estimated from the discount rate model. SMB and HML are the Fama-French size and book-to-market factors. HMZ is a portfolio formed on dividend yields that is long a portfolio of the 20% of stocks with the highest dividend yields and short a portfolio of non-dividend paying stocks. The expected returns for the spread portfolios are derived using the discount rate model with the 3 Fama-French portfolios (FF3) as spanning portfolios. The plot shows the expected 1-year returns for each quarter over the period 1946Q1-2010Q4.

Figure 4: The Baker-Wurgler sentiment measure and expected returns on the HMZ portfolio
The figure shows the relation between the Baker-Wurgler (BW) measure of sentiment and the estimated 1-year return of the HMZ portfolio, $HMZ(t) \equiv E_t[R_{HMZ,t+1}]$. The expected return is derived using the discount rate model where the spanning portfolios are the 3 Fama-French portfolios (FF3). HMZ is a portfolio formed on dividend yields that is long a portfolio of the 20% of stocks with the highest dividend yields and short a portfolio of non-dividend paying stocks. BW(t+1) denotes the sentiment series advanced by one year in order to make its principal components more timely. The plot shows the expected 1-year returns and sentiment measure for each quarter over the period 1965Q3-2010Q4. Both series are standardized to zero mean and unit volatility.

Panel A. Predicting quarterly excess returns

Spanning portfolios   β       ρα      w       R²      W        R²c         Wc
M                     0.522   0.631   0.109   0.043   21.558   0.041∗∗∗    20.166∗∗∗
FF3                   0.655   0.801   0.146   0.066   28.756   0.063∗∗∗    23.809∗∗
3DP                   0.694   0.829   0.135   0.088   46.775   0.083∗∗∗    54.803∗∗∗
Expanded sets of spanning portfolios
6BM-S                 0.650   0.778   0.128   0.081   59.991   0.073∗∗     46.989∗∗
6DP                   0.675   0.819   0.144   0.079   60.659   0.071∗∗     60.151∗∗

Panel B.
Predicting 1-year excess returns

Spanning Portfolios β ρα w R2 W R2c M FF3 3DP 0.400 0.562 0.547 0.505 0.719 0.727 0.105 0.157 0.180 0.038 0.161 0.184 11.969 33.846 36.503 0.033 0.140 0.157 0.212 0.173 95.181 73.281 0.171 0.130 Expanded sets of spanning portfolios 6BM-S 0.486 0.653 0.167 6DP 0.501 0.693 0.192 Wc ∗∗∗ ∗∗∗ ∗∗∗ ∗∗ 10.857 28.614 37.971 75.079 53.447 ∗∗ ∗∗ ∗∗ ∗∗ ∗∗

Table 1: Pricing kernel model: tests of predictability and persistence parameter estimates
The table reports the results of tests of the null hypothesis of no predictability of the market excess return for the pricing kernel model:

$$R_{M,t+1} = a_0 + \sum_{p=1}^{P} \delta_p x^c_{pt}(\beta) + \epsilon_{t+1}, \qquad x^c_{pt}(\beta) = \sum_{s=0}^{\infty} \beta^s R_{p,t-s} R_{M,t-s}.$$

Panel A predicts quarterly and Panel B 1-year returns. $R_{M,t+1}$ is the quarterly (or 1-year) excess return on the S&P500 index; $R_{p,t}$, $p = 1, \cdots, P$, are the quarterly excess returns on a set of predictor portfolios. We estimate the model with three different sets of spanning portfolios: the S&P500 portfolio (M), the 3 Fama-French portfolios (FF3), and the S&P500 portfolio and three portfolios sorted on the basis of dividend yield (3DP). The expanded sets of spanning portfolios are the S&P500 portfolio and the six portfolios sorted on the basis of size and book-to-market ratio (6BM-S), and the S&P500 portfolio and six portfolios sorted on the basis of dividend yield (6DP). The sample period is 1946.1 to 2010.4 (the prediction period). Estimations are performed by a grid search over $0 \le \beta \le 0.95$. The parameters are chosen to maximize the $R^2_c$ of the predictive regression, where $R^2_c$ denotes the $R^2$ of the predictive regression adjusted to correct for small sample bias. Levels of significance are determined by a bootstrap procedure in which returns over the period 1946.1 to 2010.4 are sampled under the null hypothesis: ∗, ∗∗, ∗∗∗ denote significance at the 10%, 5%, and 1% levels. $\rho_\alpha$ is the first order autocorrelation of the estimated market risk premium, and $w \equiv \rho_\alpha - \beta$.
W is the Wald statistic calculated using the Newey-West (1987) correction with 4 lags. $W_c$ denotes the bias corrected value for W, which is calculated by maximizing the bias-adjusted Wald statistic. Significance levels for $W_c$ are calculated by the bootstrap procedure, and indicated by stars.

Panel A: parameter estimates

Spanning portfolios:   M                  FF3                3DP
β                      0.522∗∗∗ (0.18)    0.655∗∗∗ (0.10)    0.694∗∗∗ (0.08)
ρα                     0.631∗∗∗ (0.18)    0.801∗∗∗ (0.09)    0.829∗∗∗ (0.08)
a0                     -0.000 (0.01)      -0.004 (0.02)      -0.003 (0.02)
RM                     1.316∗∗∗ (0.43)    1.531∗∗∗ (0.39)    1.529 (1.45)
Fama-French portfolios
SMB                                       -1.165∗ (0.67)
HML                                       1.268∗∗ (0.59)
Dividend yield sorted portfolios
zero                                                         -1.658∗∗∗ (0.53)
Lo20                                                         0.787 (0.97)
Hi20                                                         1.175 (0.73)
R̄²                     0.043∗∗∗           0.066∗∗∗           0.088∗∗∗

Panel B. Descriptive statistics of risk premium estimates

        Mean     Std.Dev.   Min.     Max.
M       1.78%    1.71       0.09     10.76
FF3     1.78%    2.21       -2.64    11.96
3DP     1.78%    2.54       -5.17    12.67

Correlation matrix
        M       FF3     3DP
M       1.00
FF3     0.73    1.00
3DP     0.62    0.91    1.00

Table 2: Pricing kernel model: parameter estimates for quarterly forecast
Panel A reports parameter estimates from the regression:

$$R_{M,t+1} = a_0 + \sum_{p=1}^{P} \delta_p x^c_{pt}(\beta) + \epsilon_{t+1}$$

with different spanning portfolio quarterly returns, $R_{p,t-s}$, used to form the predictors, $x^c_{pt}(\beta) = \sum_{s=0}^{\infty} \beta^s R_{p,t-s} R_{M,t-s}$. The dependent variable is the quarterly excess return on the S&P500 portfolio, $R_{M,t}$. We estimate the model with three different sets of spanning portfolios: the S&P500 portfolio (M), the 3 Fama-French portfolios (FF3), and the S&P500 portfolio and three portfolios sorted on the basis of dividend yield (3DP). The sample period is 1946.1 to 2010.4 (the prediction period). The parameters are estimated by $\sup_\beta (R^2 - \text{bias})$. Standard errors (in parentheses) are calculated from a bootstrap simulation using 10,000 realizations. Panel B reports means and standard deviations for the risk premium estimates obtained using the different sets of spanning portfolios, as well as the correlations between the estimates.

Panel A.
Quarterly forecasts

Spanning portfolios:
Horizon      M         FF3        3DP
1 quarter    0.97∗∗    0.96∗∗∗    0.97∗∗∗
2            0.98∗     0.94∗∗     0.90∗∗∗
3            1.00      0.95∗∗     0.88∗∗∗
4            1.02      0.95∗∗     0.84∗∗∗

Panel B. 1-year forecasts

Spanning portfolios:
Horizon      M         FF3        3DP
1 year       1.02      0.91∗∗     0.87∗∗∗
2            1.06      0.99       0.87∗∗
3            1.15      1.07       0.93∗

Table 3: Relative Error Variance of Out of Sample Return forecasts for the Pricing Kernel Model
The table reports the ratio of the variance of the error in forecasting the market excess return using the pricing kernel model with different sets of spanning portfolios to the variance of the error of a naive forecast. A value below one indicates that the model outperforms the naive forecast, which is the sample mean calculated from data up to the quarter of the forecast. The quarterly forecasts are made each quarter and are extended out to 4 quarters using the estimated parameters of the AR(1) process. The 1-year forecasts are also made quarterly and are extended out to 3 years using the estimated parameters of the AR(1) process. The model and the historical mean are estimated using data starting in 1926.2 and ending in the quarter before the forecast is made. The spanning portfolios are the S&P500 portfolio (M), the 3 Fama-French portfolios (FF3), and the S&P500 portfolio and three portfolios sorted on the basis of dividend yield (3DP). Levels of significance are determined by a bootstrap procedure in which returns over the period 1946.1 to 2010.4 are sampled under the null hypothesis: ∗, ∗∗, ∗∗∗ denote significance at the 10%, 5%, and 1% levels. The 1st training sample is from 1946.1 to 1965.12. The first forecast is made in 1966.1 and the last forecast is for 2010.4.
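The relative error variance reported in Table 3 can be illustrated with a minimal sketch. It assumes that the model forecasts are already available as a list aligned with the realized returns, and that "variance of the error" means the population variance of the forecast errors; the function name and arguments are hypothetical and are not taken from the paper.

```python
from statistics import fmean, pvariance


def relative_error_variance(returns, model_forecasts, first_forecast):
    """Ratio of the model's forecast-error variance to that of a naive forecast.

    returns[t] is the realized excess return in period t, and
    model_forecasts[t] is a forecast of returns[t] made with data through t-1.
    The naive forecast for period t is the mean of returns[0..t-1].
    Forecasts are evaluated for t = first_forecast, ..., len(returns) - 1.
    """
    model_errors = []
    naive_errors = []
    for t in range(first_forecast, len(returns)):
        naive = fmean(returns[:t])  # historical mean up to the prior period
        model_errors.append(returns[t] - model_forecasts[t])
        naive_errors.append(returns[t] - naive)
    # Assumption: population variance of the errors, not mean squared error.
    return pvariance(model_errors) / pvariance(naive_errors)
```

A ratio below one corresponds to the model outperforming the expanding-window historical mean, which is how the entries of Table 3 are read.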
In sample regression

Predictor Variable   α                β                  R²adj    ρ       REVOOS
Panel A: univariate regressions
dp                   0.110 (2.66)     0.027 (2.20)       0.018    0.984   1.015
ep                   0.060 (1.48)     0.015 (1.03)       0.004    0.950   1.058
b/m                  0.003 (0.19)     0.027 (1.17)       0.003    0.980   1.177
VS                   0.105 (2.00)     -0.059 (-1.65)     0.007    0.816   1.000
glam                 0.025 (3.24)     -0.024 (-1.31)     0.003    0.900   0.989
svar                 0.018 (3.23)     -0.100 (-0.15)     -0.004   0.456   1.008
tbl                  0.030 (3.22)     -0.280 (-1.59)     0.007    0.949   1.001
lty                  0.028 (2.33)     -0.171 (-0.94)     0.000    0.981   1.019
tms                  0.008 (1.04)     0.625 (1.63)       0.008    0.834   0.994
infl                 0.022 (3.34)     -0.478 (-1.10)     0.001    0.474   0.996
dfy                  0.012 (0.88)     0.619 (0.42)       -0.003   0.885   0.987
kp                   -0.020 (-1.20)   0.313∗∗ (2.50)     0.023    0.931   0.985
cay                  0.017 (3.11)     0.897∗∗∗ (3.64)    0.042    0.925   0.979
Panel B: bivariate regression
svar                 0.007 (0.81)     1.117 (0.68)       -0.005   0.620   1.240
scov                                  -0.552 (-0.17)              0.712

Table 4: Predictive regressions using other predictor variables
The table reports estimates of the equation $R_{M,t+1} = \alpha + \beta X_t + \epsilon_{t+1}$ for the period 1946.1-2010.4. $R_{M,t+1}$ is the 1-quarter S&P500 excess return in quarter t+1 and $X_t$ is the lagged value of the predictor variable. ρ is the autocorrelation of the variable. t-statistics are in parentheses and are adjusted for serial correlation in the residuals using the Newey-West (1987) correction with 4 lags. Panel A reports univariate regression results.
The predictor variables are: the Dividend (Earnings) yield, dp (ep), defined as the log of the ratio of dividends (earnings) on the S&P500 over the past 12 months to the lagged level of the index; the Value Spread, VS, is the log book-to-market ratio of the small high-book-to-market portfolio minus the log book-to-market ratio of the small low-book-to-market portfolio; Glamour, glam, is the cumulative log return over the past 36 months of the quintile of stocks with the lowest book-to-market ratio; b/m is the book-to-market ratio for the Dow Jones Industrial Average; the Stock Variance, svar, is the sum of squared daily returns on the S&P500 index over the previous quarter; tbl is the 3-month Treasury Bill rate; the Long Term Yield, lty, is the yield on long term US government bonds; the Term Spread, tms, is the difference between the Long Term Yield and the Treasury Bill rate; Inflation, infl, is the one month lagged inflation rate; the Default Yield Spread, dfy, is the difference between BAA and AAA-rated corporate bond yields; and cay is the consumption, wealth, income ratio of Lettau and Ludvigson (2001), which is available over the period 1952.3-2010.4. Fuller descriptions of these variables are to be found in Goyal and Welch (2008), and the actual data series were taken from the website of Amit Goyal. kp is the in-sample predicted one year excess return from Kelly and Pruitt (2013). Panel B reports the results from a multivariate predictive regression in which the predictors are svar and scov, the sum of the daily cross-products of the market excess return with the HML return. Levels of significance for β, indicated by stars, are determined by a bootstrap procedure in which returns over the period 1946.1 to 2010.4 are sampled under the null hypothesis: ∗, ∗∗, ∗∗∗ denote significance at the 10%, 5%, and 1% levels.
REVOOS is the ratio of the out of sample variance of the error of a forecast of the quarterly return using the predictor variable to the variance of the error of a naive forecast equal to the historical mean. Out-of-sample forecasts start in 1966 and the model is estimated using data from 1928.06 to one year before the forecast year.

                     Spanning portfolios:
Predictor Variable   M        FF3      3DP
dp                   -0.05    0.12     0.12
ep                   -0.21    -0.12    -0.13
b/m                  0.05     0.06     0.04
VS                   0.09     -0.11    -0.10
glam                 -0.36    -0.27    -0.23
svar                 0.55     0.41     0.36
tbl                  -0.04    -0.40    -0.29
lty                  0.09     -0.27    -0.15
tms                  0.30     0.33     0.34
infl                 0.03     -0.17    -0.21
dfy                  0.46     0.19     0.25
kp                   -0.06    0.21     0.24
cay                  0.01     0.04     0.07

Table 5: Correlations of Pricing Kernel Model risk premium estimates with other predictor variables
The table reports the correlations between 4-quarter moving averages of the other predictor variables, which are defined in Table 4, and 4-quarter moving averages of the expected return estimates from the pricing kernel model for different sets of spanning portfolios. The spanning portfolios are the S&P500 portfolio (M), the three Fama-French portfolios (FF3), and the S&P500 portfolio and three portfolios formed on dividend yield (3DP). The sample period is from 1946.1 to 2010.4, except for cay, which is from 1952.3 to 2010.4.
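The smoothing step behind the correlations in Table 5 can be sketched as follows. This is only an illustration under stated assumptions: both series are equal-length quarterly lists, the moving average is a simple trailing mean, and the correlation is the ordinary Pearson coefficient. The function names are hypothetical.

```python
from statistics import fmean


def moving_average(series, window=4):
    """Trailing moving average; the first window - 1 observations are dropped."""
    return [
        fmean(series[i - window + 1 : i + 1])
        for i in range(window - 1, len(series))
    ]


def correlation(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x, mean_y = fmean(x), fmean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def smoothed_correlation(x, y, window=4):
    """Correlation between the trailing moving averages of x and y."""
    return correlation(moving_average(x, window), moving_average(y, window))
```

Averaging over four quarters before correlating damps quarter-to-quarter noise in both series, so the resulting figure reflects their lower-frequency co-movement.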
Regressor   1        2        3        4        5        6        7        8
dp          0.011    0.010    0.015    0.018    -0.009   -0.010   -0.002   -0.003
            (0.61)   (0.56)   (0.86)   (1.06)   (-0.41)  (-0.45)  (-0.10)  (-0.12)
glam        -0.020   -0.001   -0.003   -0.002   -0.033   -0.012   -0.013   -0.013
            (-1.06)  (-0.03)  (-0.15)  (-0.08)  (-1.68)  (-0.58)  (-0.66)  (-0.67)
kp          0.211    0.236    0.095    0.034    0.322    0.349    0.206    0.153
            (1.09)   (1.27)   (0.51)   (0.18)   (1.59)   (1.77)   (1.03)   (0.74)
cay                                             1.002    0.914    0.878    0.848
                                                (3.87)   (3.65)   (3.75)   (3.74)
Pricing kernel predictions:
M                    1.008                               0.944
                     (4.03)                              (3.52)
FF3                           0.923                               0.874
                              (5.04)                              (4.73)
3DP                                    0.949                               0.881
                                       (5.78)                              (5.61)
R̄²          0.022    0.061    0.078    0.101    0.066    0.100    0.116    0.137

Table 6: Regression of quarterly excess returns on Pricing kernel model predictions and other predictors
The table reports the results of regressions of quarterly market excess returns on selected other predictor variables and forecasts from the pricing kernel model for different sets of spanning portfolios. The spanning portfolios are the S&P500 portfolio (M), the three Fama-French portfolios (FF3), and the S&P500 portfolio and three portfolios formed on dividend yield (3DP). t-statistics are in parentheses. The sample period is from 1946.1 to 2010.4, except for the regressions involving cay, which are from 1952.3 to 2010.4.

Panel A. Predicting quarterly excess returns

Spanning Portfolios β ρα w R2 W R2c M FF3 3DP -0.252 0.153 0.884 -0.145 0.268 0.877 0.107 0.115 -0.007 0.009 0.046 0.034 3.498 17.452 12.365 0.008 0.044 0.030 Expanded sets of spanning portfolios 6BM-S 0.248 0.334 0.086 6DP 0.923 0.856 -0.067 0.042 0.065 21.713 33.553 0.036 0.056 Wc ∗∗ ∗ ∗∗ 2.418 14.055 13.130 11.855 22.920 ∗∗ ∗∗ ∗∗

Panel B.
Predicting 1-year excess returns

Spanning Portfolios β ρα w R2 W R2c Wc M FF3 3DP dnc 0.014 0.825 0.066 0.830 0.052 0.005 0.023 0.094 8.115 17.132 0.021 0.055 4.407 10.099 0.121 0.181 23.739 52.138 0.042 0.099 Expanded sets of spanning portfolios 6BM-S 0.887 0.767 -0.120 6DP 0.902 0.847 -0.055 ∗ 8.815 36.368 ∗∗

Table 7: Discount rate model: tests of predictability and persistence parameter estimates
The table reports the results of tests of the null hypothesis of no predictability of the market excess return for the discount rate model:

$$R_{M,t+1} = a_0 + \sum_{p=1}^{P} \delta_p x^d_{pt}(\beta) + \epsilon_{t+1}, \qquad x^d_{pt}(\beta) = \sum_{s=0}^{\infty} \beta^s R_{p,t-s}.$$

Panel A predicts quarterly and Panel B 1-year returns. $R_{M,t+1}$ is the quarterly (or 1-year) excess return on the S&P500 index; $R_{p,t}$, $p = 1, \cdots, P$, are the quarterly excess returns on a set of spanning portfolios. We estimate the model with five different sets of spanning portfolios: the market portfolio (M); the 3 Fama-French portfolios (FF3); the S&P500 portfolio and three portfolios sorted on the basis of dividend yield (3DP); six portfolios formed on the basis of firm size and book to market ratio (6BM-S); and the S&P500 portfolio and six portfolios formed on the basis of dividend yield (6DP). The sample period is 1946.1 to 2010.4 (the prediction period). Estimations are performed by a grid search over $0 \le \beta \le 0.95$. The parameters are chosen to maximize the $R^2_c$ of the predictive regression, where $R^2_c$ denotes the $R^2$ of the predictive regression adjusted to correct for small sample bias. Levels of significance are determined by a bootstrap procedure in which returns over the period 1946.1 to 2010.4 are sampled under the null hypothesis: ∗, ∗∗, ∗∗∗ denote significance at the 10%, 5%, and 1% levels. dnc denotes did not converge. $\rho_\alpha$ is the first order autocorrelation of the estimated market risk premium, and $w \equiv \rho_\alpha - \beta$. W is the Wald statistic calculated using the Newey-West (1987) correction with 4 lags.
$W_c$ denotes the bias corrected value for W, which is calculated by maximizing the bias-adjusted Wald statistic. Significance levels for $W_c$ are calculated by the bootstrap procedure, and indicated by stars.

β ρα a0 RM Fama-French portfolios smb hml Size and book-to-market sorted portfolios FF3 6BM-S 3DP 6DP 0.153 (0.24) 0.268 (0.24) 0.018∗∗ (0.01) 0.145∗ (0.08) 0.248 (0.24) 0.334 (0.23) 0.021∗ (0.01) 0.764 (0.53) 0.884∗∗∗ (0.18) 0.877∗∗∗ (0.19) 0.026 (0.02) 0.312 (0.19) 0.923∗∗∗ (0.13) 0.856∗∗∗ (0.14) 0.030 (0.02) 0.272 (0.25) -0.030 (0.06) -0.188 (0.13) -0.040 (0.06) -0.103 (0.13) -0.251 (0.16) 0.034 (0.16) 0.209 (0.16) -0.169∗ (0.10) 0.065∗∗ -0.321∗∗∗ (0.11) -0.102 (0.10) sl 0.042 (0.15) 0.000 (0.31) -0.341 (0.21) -0.406 (0.36) 0.062 (0.25) 0.010 (0.18) sn sh bl bn bh Dividend yield sorted portfolios zero Lo20 Qnt2 Qnt3 Qnt4 Hi20 R̄2 0.046∗∗ 0.042 -0.120 (0.09) 0.034∗

Table 8: Discount rate model: parameter estimates for quarterly forecasts
This table reports parameter estimates from the regression:

$$R_{M,t+1} = a_0 + \sum_{p=1}^{P} \delta_p x^d_{pt}(\beta) + \epsilon_{t+1}$$

with different spanning portfolio quarterly returns, $R_{p,t-s}$, used to form the predictors, $x^d_{pt}(\beta) = \sum_{s=0}^{\infty} \beta^s R_{p,t-s}$. The dependent variable is the quarterly excess return on the S&P500 portfolio, $R_{M,t}$. We estimate the model with four different sets of spanning portfolios: the 3 Fama-French portfolios (FF3); the S&P500 portfolio and three portfolios sorted on the basis of dividend yield (3DP); six portfolios formed on the basis of firm size and book to market ratio (6BM-S); and the S&P500 portfolio and six portfolios formed on the basis of dividend yield (6DP). The sample period is 1946.1 to 2010.4 (the prediction period). The parameters are estimated by $\sup_\beta (R^2 - \text{bias})$. Standard errors (in parentheses) are calculated from a bootstrap simulation using 10,000 realizations.

Panel A. Quarterly expected returns
Model Pricing Kernel Spanning portfolios: FF3 3DP FF3 6DP Mean Std. Dev.
1.78% 2.21 1.78% 2.54 1.78% 1.90 1.78% 2.39 Discount Rate

Panel B. Correlations between risk premium estimates
Model Pricing Kernel Spanning portfolios: FF3 Pricing Kernel - FF3 Pricing Kernel - 3DP 1.00 0.92 1.00 Discount Rate - FF3 Discount Rate - 6DP 0.22 0.54 0.28 0.59 Pricing Kernel - FF3 Pricing Kernel - 3DP 1.00 0.85 1.00 Discount Rate - FF3 Discount Rate - 6DP 0.33 0.62 0.27 0.61 Discount Rate 3DP FF3 6DP Quarterly 1.00 0.35 1.00 1.00 0.30 1.00 1-year

Panel C. R̄² from univariate and bivariate regressions
Model Pricing Kernel Discount Rate Spanning portfolios: FF3 FF3 Pricing Kernel - FF3 Pricing Kernel - 3DP 0.074 0.096 0.099 Discount Rate - FF3 Discount Rate - 6DP 0.108 0.110 0.126 0.120 Pricing Kernel - FF3 Pricing Kernel - 3DP 0.167 0.197 0.194 Discount Rate - FF3 Discount Rate - 6DP 0.241 0.172 0.200 0.257 3DP 6DP Quarterly 0.054 0.113 0.087 0.030 0.205 0.200 1-year

Table 9: Comparison of the discount rate and pricing kernel models
This table reports statistics for the quarterly and 1-year excess return forecasts from the discount rate and pricing kernel models for the period 1946.1 to 2010.4. The spanning portfolios are the S&P500 portfolio (M), the 3 Fama-French portfolios (FF3), and the S&P500 portfolio and, in turn, the three (six) portfolios sorted on the basis of dividend yield (3DP, 6DP). Panel B reports correlations between the different estimates of the market risk premium. The expected market excess returns are calculated quarterly. A rolling four-quarter average is taken to reduce noise before calculating the correlations in Panel B. Panel C reports the R² values from (i) univariate predictive regressions of the market returns on the fitted expected returns for each model (on the diagonal); and (ii) bivariate regressions on the fitted expected returns for two models (the off-diagonal elements).
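The two models compared in Table 9 differ in how past portfolio returns enter the predictor: the pricing kernel model weights past cross-products of portfolio and market returns, while the discount rate model weights past portfolio returns directly, in both cases with geometrically declining weights $\beta^s$. A minimal sketch of both constructions follows. It assumes the infinite sum is truncated at the start of the supplied series, so that each predictor can be built by a one-step recursion; the function names are hypothetical.

```python
def pricing_kernel_predictor(portfolio_returns, market_returns, beta):
    """x_t = sum_{s>=0} beta^s * R_{p,t-s} * R_{M,t-s}, truncated at the sample start."""
    x = 0.0
    predictor = []
    for r_p, r_m in zip(portfolio_returns, market_returns):
        x = r_p * r_m + beta * x  # recursion equivalent to the truncated sum
        predictor.append(x)
    return predictor


def discount_rate_predictor(portfolio_returns, beta):
    """x_t = sum_{s>=0} beta^s * R_{p,t-s}, truncated at the sample start."""
    x = 0.0
    predictor = []
    for r_p in portfolio_returns:
        x = r_p + beta * x
        predictor.append(x)
    return predictor
```

In a grid search over β, each candidate value would generate a new set of predictors of this form, on which the market excess return is then regressed.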
Return interval   Covariance ratio                         Value
Daily             cov^d(Rm, Rpk) × Nd / cov^q(Rm, Rpk)     2.267
Weekly            cov^w(Rm, Rpk) × Nw / cov^q(Rm, Rpk)     1.884
Monthly           cov^m(Rm, Rpk) × Nm / cov^q(Rm, Rpk)     1.207
Quarterly         cov^q(Rm, Rpk) / cov^q(Rm, Rpk)          1.000

Table 10: Covariance ratios
This table reports covariance ratios for different return intervals. The covariance is between the market return, $R_m$, and the portfolio with weights $\delta_c$. The portfolio weights $\delta_c$ are those estimated for the prediction of quarterly market excess returns using the pricing kernel model with FF3 spanning portfolios, reported in Table 8. $R_{pk}$ is the return on this portfolio. Covariances are calculated using daily, weekly, monthly and quarterly returns, and the sample period is 1946.1-2010.4. The covariances are then converted to a quarterly basis by multiplying by the number of days (weeks, months), $N_d$ etc., in a quarter, and divided by the covariance from quarterly data, $cov^q(R_m, R_{pk})$.

Panel A: Pricing kernel estimated from prediction of 1-year return
$\hat{C}^y_{t+1} = a + b\hat{\alpha}^y_{M,t} + \epsilon_{t+1}$

Covariance estimated:   a               b               R²
Quarterly               0.025 (1.86)    0.433 (2.78)    0.063
Monthly                 0.052 (5.78)    0.328 (3.56)    0.097
Weekly                  0.098 (6.19)    0.176 (1.13)    0.007
Daily                   0.117 (5.71)    0.184 (1.15)    0.003

Panel B: Pricing kernel estimated from prediction of 1-quarter return
$\hat{C}^q_{t+1} = a + b\hat{\alpha}^q_{M,t} + \epsilon_{t+1}$

Covariance estimated:   a               b               R²
Monthly                 0.015 (5.14)    0.455 (3.28)    0.066
Weekly                  0.020 (5.90)    0.520 (2.50)    0.043
Daily                   0.023 (5.08)    0.497 (2.63)    0.026

Table 11: Predicting the covariance of the market return with the estimated pricing kernel
The realized covariances between market returns and the pricing kernels are estimated for year (quarter) t+1, $\hat{C}^y_{t+1}$ ($\hat{C}^q_{t+1}$), using quarterly, monthly, weekly and daily returns.
Thus for Panel A (B), if n daily (weekly, monthly, quarterly) observations are used to estimate the realized covariance over the following year (quarter), the realized covariance is defined by:

$$\hat{C} \equiv \widehat{cov}_t(R, \delta_c' R_p) \equiv \sum_{i=1}^{n} \left[ (R_{t+i} - \langle R \rangle)(\delta_c' R_{p,t+i} - \langle \delta_c' R_p \rangle) \right]$$

where $\delta_c$ is the kernel weight vector estimated by using the Fama-French portfolios to forecast 1-year (1-quarter) returns. $\langle R \rangle$ is the mean return per period (day, week, etc.) during the year (quarter). The covariance estimates are regressed on the 1-year (1-quarter) predicted market excess return, $\hat{\alpha}^y_{M,t}$ ($\hat{\alpha}^q_{M,t}$):

$$\hat{C}^y_{t+1} = a + b\alpha^y_{M,t} + \epsilon_{t+1}$$
$$\hat{C}^q_{t+1} = a + b\alpha^q_{M,t} + \epsilon_{t+1}$$

where $\alpha^y_{M,t} = \sum_{s=0}^{\infty} \beta^s \delta_c' R_{p,t-s} R_{t-s}$ is the predicted 1-year market excess return using the Fama-French portfolios, $\delta_c$ is the vector of portfolio weights from the 1-year return prediction, and $\alpha^q_{M,t}$ is defined analogously. t-statistics are in parentheses and are adjusted for serial correlation in the residuals using the Newey-West (1987) correction with 4 lags. The sample period is 1946.1-2010.4.

Dependent variable: SR RF RM β 0.926∗∗∗ (0.08) 0.607∗∗ (0.24) 3.743 (3.16) 0.973∗∗∗ (0.02) 0.011∗∗∗ (0.00) -0.026 (0.02) 0.658∗∗∗ (0.11) 0.000 (0.01) 1.580∗∗∗ (0.44) -1.104 (0.73) -0.777 (1.61) -4.16∗∗ (2.02) 1.461 (2.01) 2.859 (2.02) -2.386∗∗ (1.27) 0.002∗∗∗ (0.00) 0.017∗ (0.01) 0.016∗ (0.01) -0.047∗∗∗ (0.01) 0.058∗∗∗ (0.01) -0.028∗∗ (0.01) a0 RM Dividend yield sorted portfolios: Zero Lo20 Qnt2 Qnt3 Qnt4 Hi20 0.576∗∗ (0.24) -44.823∗∗ (22.40) SR RF R̄2 0.083 R̄2c 0.626 0.056 0.052

Table 12: Restricted pricing kernel model: parameter estimates for quarterly forecast
The first column reports parameter estimates for $R_{M,t+1}/\sigma_{M,t+1} = a_0 + \sum_{p=1}^{P} \delta^{SR}_p x^{SR}_{pt}(\beta) + \epsilon_{t+1}$, where $R_{M,t}$ is the market excess return for quarter t, $\sigma_{M,t}$ is an appropriately scaled estimate of the daily return volatility, and the 6DP predictor portfolio quarterly returns, $R_{p,t-s}$, are used to form the predictor variables, $x^{SR}_{pt}(\beta) = \sum_{s=0}^{\infty} \beta^s R_{p,t-s}$.
The second column reports estimates of $\delta^{RF}_p$, the coefficients from the regression of the residual from an AR(1) model of the riskless interest rate, RF, on the portfolio returns. The third column reports estimates of

$$R_{M,t+1} = a_0 - \delta_{RM} x^{IC}_{RM,t} - \delta_{SR} x^{IC}_{SR,t} - \delta_{RF} x^{IC}_{RF,t}$$

where $x^{IC}_{RM,t}(\beta) = \sum_{s=0}^{\infty} \beta^s R_{M,t-s} R_{M,t-s}$, $x^{IC}_{RF,t}(\beta) = \sum_{s=0}^{\infty} \beta^s_{RF} R_{RF,t-s} R_{M,t-s}$, and the portfolio returns $R_{RF,t}$ and $R_{SR,t}$ are formed using weights proportional to the regression coefficients $\delta_{RF}$ and $\delta_{SR}$. $\beta_{RF}$ is set equal to the autoregressive coefficient for the riskless interest rate. $\bar{R}^2_c$ is the bias corrected estimate of $\bar{R}^2$. Standard errors (in parentheses) are calculated using a bootstrap procedure in which returns over the period 1946.1 to 2010.4 are sampled under the null hypothesis: ∗, ∗∗, ∗∗∗ denote significance at the 10%, 5%, and 1% levels. The sample period is 1946.1 to 2010.4.

Panel A. Discount rate model R2 W R2c Spread Portfolio β ρα w SMB HML HMZ 0.738 0.988 0.981 0.853 0.879 0.879 0.115 -0.109 -0.102 Spread Portfolio β ρα w SMB HML HMZ 0.976 0.464 0.510 0.943 0.618 0.690 -0.033 0.154 0.180 0.151 0.217 0.206 21.804 28.536 22.566 0.125 0.161 0.159 Wc ∗∗ ∗∗ ∗∗∗ Panel B. Pricing kernel model R2 W R2c 0.202 0.084 0.097 30.172 14.204 9.655 0.168 0.063 0.078 17.451 20.588 16.324 ∗ ∗∗ ∗ Wc ∗∗∗ ∗ 27.397 12.384 5.273 ∗∗

Table 13: Tests of predictability and persistence parameter estimates for 1-year returns on spread portfolios
This table reports selected parameter estimates for the predictability of 1-year spread portfolio returns from the discount rate and pricing kernel models using the 3 Fama-French portfolios (FF3) as spanning portfolios. The sample period is 1946.1 to 2010.4 (the prediction period). SMB and HML are the Fama-French size and book-to-market factors. HMZ is a portfolio formed on dividend yields that is long a portfolio of the 20% of stocks with the highest dividend yields and short a portfolio of non-dividend paying stocks.
Levels of significance are determined by a bootstrap procedure in which returns over the period 1946.1 to 2010.4 are sampled under the null hypothesis: ∗, ∗∗, ∗∗∗ denote significance at the 10%, 5%, and 1% levels. See notes to Table 1 for further details.

        SMB       HML       HMZ
SMB     1.000     0.199     -0.031
HML     0.199     1.000     0.774
HMZ     -0.031    0.774     1.000

Table 14: Correlations of expected 1-year returns on spread portfolios
This table reports correlations of the 1-year expected returns on the spread portfolios. SMB and HML are the Fama-French size and book-to-market factors. HMZ is a portfolio formed on dividend yields that is long a portfolio of the 20% of stocks with the highest dividend yields and short a portfolio of non-dividend paying stocks. Expected returns are calculated from the discount rate model using the three Fama-French factors (FF3) as spanning portfolios. The sample period is 1946.1 to 2010.4.

        Const.           Sentiment        Illiquidity      R²
RM      0.01 (0.88)      -0.01 (-0.83)    -2.06 (-1.09)    0.00
SMB     0.01 (0.44)      -0.01 (-0.87)    -0.65 (-0.27)    0.00
HML     -0.02 (-1.55)    0.02 (2.65)      3.80 (2.33)      0.07
HMZ     -0.04 (-2.13)    0.06 (5.19)      1.23 (0.51)      0.30

Table 15: Expected return estimates, sentiment and illiquidity
This table reports the results of regressing the difference in expected 1-year returns from the discount rate model, $\mu_{drm,t}$, and the pricing kernel model, $\mu_{pk,t}$, on the Baker-Wurgler (2006) measure of investor sentiment (Sentiment) and the average value of the Amihud (2002) measure of market illiquidity for the previous year (Illiquidity):

$$\mu_{drm,t} - \mu_{pk,t} = c + \beta_1 \text{Sentiment}_{t+1} + \beta_2 \text{Illiquidity}_t + \epsilon_t$$

The regression is estimated using expected 1-year returns on the market portfolio, RM, and on each of the spread portfolios (SMB, HML, HMZ). The expected 1-year returns for the spread portfolios and for the pricing kernel estimates for the market portfolio are derived using the 3 Fama-French portfolios (FF3) as spanning portfolios.
The discount rate model estimates for the market portfolio use the 6DP portfolios as spanning portfolios. The sentiment variable is advanced by one year in order to make its principal components more timely. SMB and HML are the Fama-French size and book-to-market factors. HMZ is a portfolio formed on dividend yields that is long a portfolio of the 20% of stocks with highest dividend yields and short a portfolio of non-dividend paying stocks. The sample period is from 1965.1 to 2010.4. The regression is estimated on a time series of quarterly estimates of 1-year expected returns, and t-statistics in parentheses are calculated using Newey-West standard errors with 4 lags. ∗, ∗∗, ∗∗∗ denote significance at the 10%, 5%, and 1% levels.

              a_low             a_high              a_high − a_low      β                   R2
SMB spread
  Small       -1.33 (-0.79)      6.68 (3.16)∗∗∗      8.00 (3.36)∗∗∗      1.04 (16.24)∗∗∗    0.69
  Big          0.52 (0.91)       1.77 (2.60)∗∗∗      1.24 (1.82)∗        0.95 (31.86)∗∗∗    0.95
  SMB         -1.85 (-1.32)      4.91 (2.61)∗∗∗      6.76 (3.17)∗∗∗      0.10 (1.70)∗       0.09
HML spread
  sh           2.60 (1.08)       9.48 (4.17)∗∗∗      6.88 (2.46)∗∗       0.99 (11.92)∗∗∗    0.62
  bh          -0.14 (-0.14)      5.91 (3.58)∗∗∗      6.06 (3.69)∗∗∗      0.98 (13.91)∗∗∗    0.77
  sl          -1.40 (-0.57)     -1.79 (-0.82)       -0.38 (-0.12)        1.12 (13.25)∗∗∗    0.61
  bl          -0.05 (-0.09)     -1.16 (-1.67)∗      -1.11 (-1.51)        0.98 (45.40)∗∗∗    0.95
  HML          1.95 (1.55)       9.17 (4.59)∗∗∗      7.21 (3.97)∗∗∗     -0.07 (-0.89)       0.09
HMZ spread
  Hi20         0.07 (0.05)       4.76 (2.52)∗∗       4.69 (2.51)∗∗∗      0.84 (10.97)∗∗∗    0.68
  zero         2.17 (0.88)      -6.07 (-3.40)∗∗∗    -8.23 (-3.15)∗∗∗     1.26 (18.64)∗∗∗    0.70
  HMZ         -2.09 (-0.68)     10.83 (4.04)∗∗∗     12.92 (4.06)∗∗∗     -0.42 (-3.84)∗∗∗    0.20

Table 16: Sources of time variation in spread portfolio returns

This table reports the results of estimating the regression:

$$R_{p,t} = a_{low} D_{low,t-1} + a_{high} D_{high,t-1} + \beta R_{m,t} + \epsilon_{p,t}$$

where $R_p$ is a portfolio 1-year return. $D_{high}$ ($D_{low}$) is a dummy variable which is equal to one when the expected 1-year return, $E_{t-1}[R_{p,t}]$, on the corresponding spread portfolio (SMB, HML, HMZ) is above (below) its median value.
The expected returns on the spread portfolios are calculated from the discount rate model using the three Fama-French factors (FF3) as spanning portfolios. The regression is estimated using a quarterly time series of data over the period from 1946.1 to 2010.4. t-statistics in parentheses are calculated using Newey-West standard errors with 4 lags. ∗, ∗∗, ∗∗∗ denote significance at the 10%, 5%, and 1% levels.
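
The regression described in the notes to Table 16 can be sketched as follows. This is a minimal illustration in Python with NumPy, not the authors' code. The function names `newey_west_ols` and `median_split_regression` are hypothetical. The Newey-West covariance here uses Bartlett weights and no small-sample correction, which is an assumption, since the notes state only that 4 lags are used. The caller is assumed to pass expected returns that are already lagged, so that element t holds $E_{t-1}[R_{p,t}]$.

```python
import numpy as np


def newey_west_ols(X, y, lags=4):
    """OLS estimates with a Newey-West (Bartlett-kernel) covariance matrix."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    xtx_inv = np.linalg.inv(X.T @ X)
    coef = xtx_inv @ (X.T @ y)
    resid = y - X @ coef
    scores = X * resid[:, None]          # x_t * e_t for each observation
    meat = scores.T @ scores             # lag-0 term
    for j in range(1, lags + 1):
        weight = 1.0 - j / (lags + 1.0)  # Bartlett weight for lag j
        gamma = scores[j:].T @ scores[:-j]
        meat += weight * (gamma + gamma.T)
    cov = xtx_inv @ meat @ xtx_inv       # assumption: no small-sample correction
    return coef, cov


def median_split_regression(r_p, r_m, expected_ret, lags=4):
    """Regress R_p on D_low, D_high and R_m (no separate intercept).

    expected_ret[t] is assumed to be E_{t-1}[R_{p,t}], aligned by the caller.
    """
    r_p = np.asarray(r_p, dtype=float)
    r_m = np.asarray(r_m, dtype=float)
    expected_ret = np.asarray(expected_ret, dtype=float)

    # D_high = 1 when the expected return is above its median; ties go to "low".
    d_high = (expected_ret > np.median(expected_ret)).astype(float)
    d_low = 1.0 - d_high

    X = np.column_stack([d_low, d_high, r_m])
    coef, cov = newey_west_ols(X, r_p, lags)
    t_stats = coef / np.sqrt(np.diag(cov))

    # a_high - a_low and its t-statistic from the same covariance matrix.
    diff = coef[1] - coef[0]
    diff_var = cov[0, 0] + cov[1, 1] - 2.0 * cov[0, 1]
    return {
        "a_low": coef[0],
        "a_high": coef[1],
        "beta": coef[2],
        "t_stats": t_stats,
        "diff": diff,
        "diff_t": diff / np.sqrt(diff_var),
    }
```

Because the two dummies sum to one, they take the place of the intercept, so `a_low` and `a_high` are the average market-adjusted returns in the two regimes and `diff` corresponds to the $a_{high} - a_{low}$ column of the table.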