Optimal Forecasts of Long-Term Returns and Asset Allocation

Compounded estimation errors in long-term expected returns:
More bad news for the equity premium
(or: which means are justified by the end ?)
Eric Jacquier (HEC, CIREQ, CIRANO)
From:
Optimal Estimation of the Risk Premium for the Long Run and Asset Allocation
With Alex Kane and Alan Marcus
Estimates of Long Term Expected Returns: Economic vs Statistical Rationales
OUTLINE OF TALK
1.
2.
3.
4.
5.
6.
Motivation
Summary of results
Biases of arithmetic and geometric methods, Unbiased forecast
Efficient forecast (Minimum mean squared error)
Robustness to distributional assumptions
Optimal long-term allocation with estimation error: Utility Based Forecast
1
1 MOTIVATION
Estimates of expected future long-term returns are crucial inputs in empirical asset pricing:
• Wealth a portfolio is expected to generate over the long term
Pension funds, Retirement policy.
• Input to the asset allocation decision, optimal mix of risky and risk-free assets for the long
run.
Current portfolio theory
• Very sophisticated models of time varying opportunity
• Effect of parameter uncertainty on the forecast and the optimal decision are not as often
discussed.
Barberis (2000)
2
Fact (i.i.d returns)
For known arithmetic mean one-period return of a portfolio 1+E(R), the unbiased estimate of
the long-term expected return is [1+E(R)]H
Question:
Given estimates of the mean (e.g., equity premium), what is the best forecast of long-term
expected returns?
Compound the arithmetic average or the geometric average over the relevant horizon H,
or something else?
Recall: we want to estimate: E[(1+R)H]
Arithmetic average:
Estimate 1+E(R) by the sample average of 1+R, and compound
− )H = 1.08575 = 454
(1+R
Appeal:
Best estimator of one-period E(R) under i.i.d assumption
−.
Maximum likelihood justification for compounding at R
3
Geometric average:
1+G = (PT/P1)1/T
defines G
=> 1/T log(PT/P1) = log (1+G)
(1+G)H = 1.0775 = 160
Appeal:
Simple intuition for compounding
Note:
For log-normal returns log(1+R) ~ N(µ, σ)
− ) estimates exp(µ + 0.5 σ2)
(1+R
(1+G) estimates exp(µ)
• Ibbotson SBBI yearbook simulates future values with arithmetic returns
Academics
Practitioners
favor arithmetic average even recently
debate fiercely; lean toward geometric average.
4
• “Forecasting US Equity Returns in the 21st Century”, John Campbell, July 2001
Geometric average or arithmetic average? The geometric average is the cumulative past return
on US equities annualized. Siegel (1998) studies long-term historical data on value weighted
US share indexes. He reports a geometric average of 7.0% over two different sample periods,
1802-1997 and 1871-1997. The arithmetic average return is the average of one-year past
returns on US equities. It is considerably higher than the geometric average return, 8.5% over
1802-1997, 8.7% over 1871-1997.
When returns are serially uncorrelated, the arithmetic average return represents the best
forecast of future return in any randomly selected future year. For long holding periods, the
best forecast is the arithmetic average compounded up appropriately: If one is making a 75year forecast, for example, one should forecast a cumulative return of 1.08575 based on 18021997 data.
5
• Cooper (1996) studies unbiased estimation of a discount factor for present value analysis
Approximations, simulations: concludes unbiased estimate is close to the arithmetic mean.
• Exception: Blume (JASA 1974);
o notion of unbiased forecast different from either approach
o no analytical results
• Recent market downturn brings back the question of the equity premium:
Fama and French (2002) , Jagannathan, McGratten, and Scherbina (2000)
The one-period equity risk premium, a.k.a. the mean return in excess of the risk-free
rate, is less than implied by the post-1926 average returns.
Estimation error in mean returns large even for long sample sizes
− ?”, not the implications for long-term returns.
They talk about “What is R
• Issue is central for
Long-term retirement planning
Growing defined-contribution investments market
Social security reform
6
2 SUMMARY OF RESULTS
• Maximum Likelihood (arithmetic) approach biased upward and very inefficient.
• Geometric approach biased, most often downward
• => Unbiased estimator (U) of expected long-term return for T, H: It is still extremely inefficient!
• => Efficient (Minimum mean squared error, MMSE) estimator: M
• Both estimators impose a “penalty” that deflates the annual compounding rate as the forecasting
horizon increases. M < U < MLE
M can be < G as H/T increases.
• Estimation uncertainty and horizon H interact to produce optimal portfolio allocations in stark
contrast with accepted wisdom:
Longer investment horizons requires lower, not higher, allocations to risky assets.
• Estimator consistent with Expected Utility and conventional risk aversion is quite “close” to the
efficient estimator.
• Results adjust easily to non i.i.d. returns. (still a one period decision)
7
3 BIASES OF ARITHMETIC AND GEOMETRIC METHODS, UNBIASED FORECAST
3.1 Basic Problem
rt = ln(1 + Rt) ~ N(µ,σ2), no autocorrelation.
H
VH = $1 × exp(µH + σ∑ εt+i ),
ε ~ i.i.d. N(0,1)
(1)
i=1
E(VH) = e(µ + ½σ2)H = [1 + E(R)]H
(2)
Ibbotson, Campbell (2001), use (2) with an estimate of E(R)
• In the following, we ignore estimation error in σ.
µ: Only the calendar span can increase precision of estimation
σ: Sampling frequency increases precision of estimation
High frequency data available, estimation of σ is a second order effect. Merton (1980)
• Jensen’s Inequality:
− )]H = [1 + E(R)]H = E(V ).
− ]H > [1 + E(R
E[1 + R
H
=> Arithmetic method is biased upward
8
(3)
Probability densities of ^
µ, the unbiased estimate of µ, and final portfolio value, E(VH) is estimated by
^
2
µ is symmetrically distributed around µ (Panel A). The distribution of the estimator of E(VH) is skewed. Skewness and bias of
e(µ + ½ σ )H. ^
the estimator of E(VH) increase with H. Panels B, C, and D show investment horizons H = 2, 10, 20 years. Annual returns are log-normal µ=
.10, σ= .20.
9
3.2 A caution: what we use as arithmetic estimator
First approach:
estimate E(R)
− ]H
A = [1 + R
and substitute in (2):
estimate µ
and substitute in (2):
1
Second approach:
^
A2 = e(µ + ½σ )H
2
Both are MLE estimators of E(VH), not exactly equal in small sample.
We will use A2 for analytics.
See Figure 1.
10
11
3.3 Bias of the arithmetic and geometric estimators
T
T


1
1

^
• µ = T ∑ ln(1 + R−i ) = T µT + σ ∑ ε−i  .


i=1
i=1
unbiased, with standard error σ/ T.
µ^ = µ + ω σ/ T ,
ω ~ N(0,1)
(4)
• Arithmetic estimator:
^
2
2
2
A = e(µ + ½ σ )H = e(µ + ωσ/ T + ½ σ )H = e(µ + ½ σ )H e(ωσ/ T )H
Substitute from equation (2):
E(A) = e(µ + ½σ )H E[eωσH/ T ] = E(VH)
2
2 2
e½σ H /T
Bias
12
(5)
• Geometric estimator
E(G)
^
2 2
= E(eµH) = E[e(µ + ωσ/ T )H ] = eµH + ½σ H /T
2
= E(VH) e½ σ (H/T − 1)H
Biased:
upward if
H> T
downward if
H<T
(6)
Q: Does it matter?
13
Table 1:
Bias induced by forecasting final portfolio value using arithmetic average return of portfolio over a
sample period. Ratio of forecast to true expected value of cumulative return. [σ: annual standard
deviation.
Sample period = 75 years
Horizon (years)
10
20
30
40
1.015
1.027
1.043
1.062
1.062
1.113
1.181
1.271
1.145
1.271
1.455
1.716
1.271
1.532
1.948
2.612
σ
0.15
0.2
0.25
0.3
Sample period = 30 years
10
Horizon (years)
20
30
1.038
1.069
1.110
1.162
1.162
1.306
1.517
1.822
40
σ
0.15
0.2
0.25
0.3
1.401
1.822
2.554
3.857
1.822
2.906
5.294
11.023
14
3.3 Unbiased estimator
• Simple inspection of (5) or (6):
^ + ½ σ2(1 − H/T) removes bias.
Compounding at µ
• Formally: for a (T, H) one can design an unbiased estimator in the class
^
2
C = e(µ + ½kσ )H
(7)
Class nests G (k = 0) and A (k = 1).
Class is a log-linear combination of A and G:
Log(C) = (1-k) Log(G) + k Log(A)
15
^
2
• Larger class e(k1µ + ½ k2σ )H
?
k1≠1 leads to infeasible estimators, i.e., functions of the true parameter.
We restrict our analysis to feasible estimators, i.e., functions solely of sample statistics.
• Class: w1 A + w2 G ?
Requires w2 = 0 for feasibility, then it maps one to one with those in equation (7)
• Result:
Unbiased estimator U:
solve for the value kU that so that E(C) = E(VH).
kU = 1 – H/T.
(8)
• Bias vanishes only if H<<T ( T/H → ∞ )
16
Estimates of compounding rates and future portfolio values. A, G, U: are annual compounding rates using arithmetic average A,
geometric average G, and unbiased estimator U (H = 40) computed over the sample period specified. Corresponding forecasts of
future portfolio values (relative to an initial $1 investment) for an investment horizon H = 40 years are denoted V(A), V(G), V(U).
a
T
(years) Begin
Sample estimates
^
^
µ
σ
Annual growth rates
A
G
U
Future portfolio value
V(A)
V(G)
V(U)
11.9
21.8
6.8
14.0
9.0
15.5
Country/Index
Freq
Canada/TSE
Canada/TSE
A
A
78
52
1914
1950
2001
2001
4.8
6.6
16.7
14.9
6.4
8.0
4.9
6.8
France/SBF250
France/SBF250
France/SBF250
A
A
A
145
82
52
1857
1920
1950
2001
2001
2001
5.1
8.5
8.7
19.7
24.7
22.2
7.3
12.2
11.8
5.2
8.9
9.1
6.7 16.7
10.6 101.5
9.7 87.0
7.7
30.0
32.5
13.5
56.0
40.8
Germany/DAX
Germany/DAX
Germany/DAX
A
A
A
145
82
52
1857
1920
1950
2001
2001
2001
1.9
5.5
8.0
32.2
37.0
22.8
7.3
13.1
11.2
1.9
5.7
8.3
5.8 17.0
9.4 139.5
9.0 69.4
2.1
9.0
24.5
9.6
36.7
31.2
UK/FTAS
UK/FTAS
UK/FTAS
A
A
A
201
82
52
1801
1920
1950
2001
2001
2001
2.4
5.5
6.4
15.6
20.0
24.7
3.7
7.8
9.9
2.4
5.7
6.6
3.4
6.7
8.3
4.2
20.1
43.8
2.6
9.0
12.9
3.9
13.6
24.2
Japan/Nikkei
Hong Kongb
MSCI/$Emg Mkt
A
D
M
52
28
14
1950
5/31/73
1/88
2001
8.8
1/2/01 10.7
7/01
8.2
24.1
30.7
24.2
12.4
16.7
11.8
9.2
11.3
8.5
9.9 107.9
9.0 475.8
2.5 85.7
33.8
72.2
26.6
44.2
31.0
2.7
End
a: Frequency: A annual; M monthly; W weekly; D daily
b: $HangSeng Without dividend yield. µ=13.5 with dividend yield
5.6
7.1
4 EFFICIENT ESTIMATION
• Unbiasedness not a goal per se.
Reduce a possibly unmanageable universe of candidate estimators
Can lead to inadmissible estimators, Stein (1956)
• Rather minimize a loss function – measure of average distance to the true parameter
• Mean squared error:
E[( β* - β)2] = E([β* - E(β*)]2) + [E(β*) -β]2
… can be viewed as generalization of the Maximum Likelihood Principle (in small sample)
• Derive the minimum MSE estimator in class C.
MSE(C)
= E[C − E(VH) ]2
^ H + ½kσ2H − eµH + ½σ2H )2
= E(eµ
^ H + kσ2H − e2µH + σ2H − 2eµ
^ H+ ½kσ2H +µH + ½σ2H)
= E(e2µ
^ = µ + ω σ/ T, evaluate, ..
Substitute µ
1
2 2
2
2
2 2
2
2
MSE(C) = e 2µH + 2σ H /T+kσ H + e 2µH + σ H - 2 e 2µH + ½σ H /T + ½kσ H + ½σ H
Minimize over k:
kM = 1 – 3 H/T
(10)
• Stronger downward penalty than the unbiased estimator
See Figure 2:
U, M penalize H by linearly decreasing the compounding rate
Effects are large, even for T=75, H=30, 12.7% down to 10%
Dramatic corrections for small T: 8% even for H = 20
Final wealths are very different. T=75: 120, 80, 25 $.
2
G
U 30
30
U, T=75
U, T=30
120
40
A
1.12
20
G
1.10
Annual Compounding Rate
1.06
1.08
1.10
1.12
10
0
M 30
0
80
20
M 75
60
U 75
(a): E(Vh) per $1 invested
40
Compounded Wealth
100
A
1.08
M, T=75
M, T=30
1.04
1.04
1.06
(b): E(Vh) in Annualized Terms
0
10
20
Horizon in years
30
Figure 2: Estimators of Long-Term Expected Returns: A, G, U, M. ^
µ = 0.1, σ = 0.2.
3
40
Q: Are the results specific to this loss function?
• Loss function seems relevant:
Error in $ at time when investor will withdraw investment.
Capital available when wage activity will cease
Relative squared error would yield the same estimator
Consistent with Maximum Likelihood, but allows for small sample
• Asymmetric loss?
Cost of over/under estimating terminal wealth.
^ > 0.
• Undesirable feature of (both) estimators: Can generate <0 estimates even if µ
Q: Do we significantly increase precision by using the “efficient” estimator?
… Let’s plot the Root Mean Squared Error of all these estimators
Top plot: Imprecision is very large, better deal with it!
Bottom plot:
Unbiased offers very mediocre precision gain even compared to basic geometric estimator
Efficient offers gain on all, (same as G when H =T/3)
4
2.5
1.5
2.0
Root Mean Squared Error / E[Vh]
A
G
1.0
U
a) Root Mean Squared Errors over E[Vh]
0.0
0.5
M
10
20
Horizon H in years, T = 60 years
30
40
60
M/A
60
0
40
0
-20
0
20
U/A
20
40
b) % RMSE Gain over A
-20
% RMSE Gain over A
G/A
0.0
0.1
0.2
0.3
0.4
H / T, for H = 1 to 40 years, T = 60 years
0.5
0.6
Figure 3: Root Mean Squared Errors of Estimators A, G, U, M. ^
µ = 0.1, σ = 0.2, T=60 years.
5
5 ROBUSTNESS TO DISTRIBUTIONAL ASSUMPTIONS
• Negative long term autocorrelation in returns
Summers (1986), Poterba and Summers (1988), and Fama and French (1988) , Richardson and
Smith (1990), Jacquier (1992)
• Effect on our analysis:
Enters through the sum of H future returns and T past returns used to estimate µ.
• Easily corrected
Correlation matrix C: TxT for past returns, HxH for future returns
Vector of ones of lengths T and H: i
Variance of a sum of past and future returns: σ2i′Ci instead of Tσ2 or Hσ2
• The future:
E(VH) in (2), exponential term becomes H(µ + ½ i′CH i σ2/H).
• The past:
µ^ in (14):
µ+ωσ
i′CTi/T.
6
• Define
FT = i′CT i/T
and FH = i′CH i/H,
Then
H FT
kU = 1 – T × F
H
(21)
3H FT
kM = 1 – T × F
H
(22)
Other non-i.i.d specifications easily extend the ration FT / FH.
See Figure 4: estimated MA(4) on annual SP500 log-returns from 26-01.
Forecasts are barely affected by the correction.
7
80
60
U
40
Estimate E(Vh)
Ignoring autocorrelation
Incorporating autocorrelation
0
20
M
10
20
30
40
Horizon H in Years
Figure 4: Effect of autocorrelation on estimators U and M, ^
µ = 0.1, σ = 0.2, T = 75, MA(4) on
annual S&P returns: θ = (-0.16, -0.02, -0.16, -0.08) estimated on 1926-2001.
8
• Heteroskedasticity?
Little problem for large H:
Volatility reverts to unconditional variance above a year
Currently studying the sensitivity to Kurtosis (will increase the estimation uncertainty of σ)
• Alternative estimation of µ
Fama-French (2002) argue that the dividend discount model amounts to reduction in the variance
of the estimator.
Just convert the variance reduction into an equivalent increase in T
• σ is estimated as well
induces non-normality in the predictive distribution of log-returns
very large degrees of freedom if higher frequency of returns is used
induces a variance inflation in the predictive distribution ν/(ν−2).
Currently extending results to estimation of σ: Unbiased estimator looks even worse
Geometric estimator is robust, it doesn’t use an estimate of σ!
9
6 OPTIMAL LONG-TERM ALLOCATION WITH ESTIMATION ERROR
6.1 Basic framework
α = µ + ½ σ2
r0 the risk-free return.
Power utility function and relative risk aversion γ, investor maximizes the expectation of her utility
of final wealth:
VH1 − γ
1
=
exp[(1 − γ) ln(VH)]
U(VH) =
1−γ
1−γ
(11)
w: risky portfolio and (1 − w): risk free asset
• Portfolio value is then log-normal with parameters:
2
ln(VH) ~ N(µH, σH) ≡ N[(r0 + w(α − r0) − ½ w2σ2H, Hw2σ2]
10
(12)
• Expected utility is then:
E[U(VH)] =
1
exp{(1 − γ)H[r0 + w(α − r0) − ½ w2σ2 + ½(1 − γ)w2σ2]}
1−γ
(13)
Maximize (13) with respect to w =>
well known optimal allocation w* =
α − r0
.
γσ2
• Independent of the horizon for i.i.d. returns,
Extensively discussed in the literature.
BUT…..
• Conventional advice is to increase allocation with the horizon.
• Largely motivated in the literature via predictability in expected returns, e.g., Garcia et al.
(2000), Wachter (2000) and others.
• These results most always assume knowledge of the parameters of the return distribution.
11
6.2 Long-term allocation with estimation error
• Optimal asset allocation is affected by estimation uncertainty in α, more so as H/T grows.
• Bawa, Brown, and Klein (1979) and others model the:
“variance inflation” induced by estimation error in a one-period framework.
Not very dramatic for a short horizon.
• We now show that the interaction between estimation error and (long) forecasting horizon is very
important.
We need to incorporate the uncertainty in α in the above asset allocation:
• Practice of substituting a point estimate in the optimal allocation in place of the unknown α is
incorrect.
12
• The investor, has a distribution for α that represents its uncertainty. a sampling distribution or, for
a Bayesian, a posterior distribution.
=> E[U(VH)] in (13) is random, as a non-linear function of a random variable α.
• Basic decision theory:
The correct expected utility to maximize, follows from first integrating α out of equation (13).
In Bayesian jargon, this integration produces the expected utility of wealth given the data,
E[U(VH | D)]
… to be then optimized by the investor.
• Specifically:
E[U(VH) | D] =
⌠
⌡E[U(VH)/α]
p(α | D) dα
(14)
13
^ , σ2/T).
• With diffuse priors, the posterior distribution of α is N(α
• Integrate (14):
E[U(VH | D)] =
1
^ − r0) − ½ w2σ2 + ½(1 − γ)w2σ2(1 + H)]}
exp{(1 − γ)H[r0 + w(α
T
1−γ
Note:
^,
1) α in (13) is indeed replaced with α
2) But a new term in (15) reflects the variance inflation due to the estimation of α.
• Finally maximize (15) for the optimal asset allocation:
w* =
^ − r0
α
(16)
H
H
σ [γ(1 + T ) − T ]
2
14
(15)
• Note:
H<<T:
It collapses to the “textbook” case.
γ > 1:
. risky asset allocation decreased relative to known α case
. The more so the greater the ratio H/T.
. Contrary to the common advice to invest more in stocks for longer horizons.
. Happens even if returns are unpredictable.
Exception:
log-utility, γ = 1.
linear in α: estimation uncertainty and the horizon H do not affect the location of the optimum.
See figure 3:
15
0.8
0.8
Gamma=2, Conventional
0.6
Gamma=2, T=30
0.6
Optimal weight in equity
Gamma=2, T=75
0.4
0.4
Gamma=4, Conventional
Gamma=4, T=75
0.2
0.2
Gamma=4, T=30
0
10
20
Horizon in Years
30
40
Figure 5: Joint effects of horizon and estimation error on optimal allocation, ^
µ = 0.1, σ = 0.2
16
• Hedging demands, learning, and other dynamic features are likely to be very minimal:
^ is unpredictable. The best forecast of a future µ
^ is the current one.
The anticipated process of µ
Barberis (2000) in his analysis concludes that estimation risk and rebalancing opportunities
interact little.
17
Estimates of Long Term Expected Returns:
Economic vs. Statistical Rationales
Eric Jacquier∗ ,
CIRANO, CIREQ, HEC Montréal, 3000 Cote Sainte-Catherine, Montréal QC H3T 2A7, Canada
This version: July 2004
Rewrite the Optimal Allocation (ignore r0 without loss of generality):
α
b
α∗
≡ 2 ,
w = 2
H
H
σ γ
σ γ(1 + T ) − T
∗
α∗ is the estimate of annualized expected return corrected for estimation risk, by a change of measure.
... in a manner consistent with the investor’s utility.
α∗ =
2H
Rewrite α∗ as α
b − k σ2
T
α
b
1
1+ H
T (1 − γ )
to compare with the other estimators.
2
Unbiased α
b − σ2 H
T
σ2 H
Minimum MSE α
b − 3 2 T Utility α
b − γ αb+ H H
T
γ−1
∗
T
Email: [email protected], Tel.: (514) 340-6194.
1
• Loss function consistent with investor’s utility.
• Can not have α∗ < 0 when µ
b > 0...since α∗ → 0+ when H
T → ∞
Result?
For conventional µ, σ, γ, the estimation risk penalty is closer to the
MMSE for low H/T , and to the unbiased for high H/T .
2
0.08
High mean: mu = 0.08, sig=0.15
Annualized Estimate
0.06
Unbiased
Min MSE
0.04
Gam=2
Gam=4
0.0
0.5
1.0
H/T
1.5
2.0
0.0
Annualized Estimate
0.02
0.04
0.06
Low mean: mu = 0.05, sig=0.15
0.0
Figure 1:
size
0.5
1.0
H/T
1.5
2.0
Annualized optimal estimates vs ratio of horizon over sample
3
7 Conclusion
• Forecasting portfolio values over long horizons when the return parameters are estimated.
• Many address the estimation of the expected portfolio growth rate, no focus on cumulative
compound returns.
• Academic literature nearly exclusively focuses on ML, practitioners on geometric estimators,
rarely on unbiased, never on efficient estimators.
• Derive analytical unbiased and efficient estimators of long term expected returns. They are
Far lower than the MLE,
Lower than geometric average estimator when H > T/3.
The efficient estimator is far lower and far more efficient than the unbiased estimators.
• Derive a utility-consistent estimate of long-term expected returns. It validates the severe
estimation error penalty of the efficient estimator.
18
• Our results compound the recent work of other studies that argue for downward revisions of the
per-period market risk premium (α-r0).
• Long-term asset allocation is contrary to conventional wisdom: longer investment horizons imply
lower allocations to risky assets.
• Results easily adjust for serial correlation and heteroskedasticity in returns. Realistic values of
autocorrelation and heteroskedasticity appear unimportant for the long-horizon forecasts.
• Current work:
1) Study the problem from the Bayesian view point to allow for the optimal incorporation of views
on future returns.
2) Robustness of the optimal allocation result
3) Generalisation to the estimation of σ, and the optimal estimation of entire distributions
(quantiles)
19