Evaluating asset pricing models using the second Hansen

ARTICLE IN PRESS
Journal of Financial Economics 97 (2010) 279–301
Contents lists available at ScienceDirect
Journal of Financial Economics
journal homepage: www.elsevier.com/locate/jfec
Evaluating asset pricing models using the second
Hansen-Jagannathan distance$
Haitao Li a, Yuewu Xu b,, Xiaoyan Zhang c
a
b
c
Stephen M. Ross School of Business, University of Michigan, Ann Arbor, MI 48109, USA
Graduate School of Business, Fordham University, New York, NY 10023, USA
Johnson Graduate School of Management, Cornell University, Ithaca, NY 14850, USA
a r t i c l e in fo
abstract
Article history:
Received 20 December 2007
Received in revised form
16 July 2008
Accepted 31 October 2008
Available online 10 March 2010
We develop a specification test and a sequence of model selection procedures for nonnested, overlapping, and nested models based on the second Hansen-Jagannathan
distance, which requires a good asset pricing model to not only have small pricing
errors but also be arbitrage free. Our methods have reasonably good finite sample
performances and are more powerful than existing ones in detecting misspecified
models with small pricing errors but are not arbitrage-free and in differentiating models
that have similar pricing errors of a given set of test assets. Using the Fama and French
size and book-to-market portfolios, we reach dramatically different conclusions on
model performances based on our approach and existing methods.
& 2010 Elsevier B.V. All rights reserved.
JEL classification:
C51
C52
G12
Keywords:
Stochastic discount factor
Specification test
Model selection
Hansen-Jagannathan distance
Arbitrage
1. Introduction
$
We thank Vikas Agarwal, Andrew Ang, Warren Bailey, Michael
Brennan, Charles Cao, Sris Chatterjee, Jin-Chuan Duan, Jim Hodder,
Kewei Hou, Jingzhi Huang, Jon Ingersoll, Ravi Jagannathan, Raymond
Kan, Andrew Karolyi, Kenneth Kavajecz, Bob Kimmel, Tao Li, Peter C. B.
Phillips, Mark Ready, Marcel Rindisbacher, Cesare Robotti, Tim Simin,
Rene Stulz, Zhenyu Wang, Chu Zhang, Feng Zhao, Guofu Zhou, and
seminar participants at the Chinese University of Hong Kong, Cornell
University, Fordham University, Georgia Institute of Technology, Georgia
State University, Ohio State University, Penn State University, Rutgers
University, Singapore Management University, Syracuse University, the
University of Hong Kong, the University of Toronto, the University of
Virginia, the University of Wisconsin at Madison, Worcester Polytechnic
Institute, the 2005 Western Finance Association Meeting, the 2006 China
International Conference in Finance, and the Asset Pricing Symposium at
Hong Kong University of Science and Technology for helpful comments.
Haitao Li acknowledges financial support from the NTT research
fellowship at the Mitsui Life Finance Center at the University of
Michigan. We are responsible for any remaining errors.
Corresponding author.
E-mail address: [email protected] (Y. Xu).
0304-405X/$ - see front matter & 2010 Elsevier B.V. All rights reserved.
doi:10.1016/j.jfineco.2010.03.002
The fundamental theorem of asset pricing, one of
the cornerstones of neoclassical finance, establishes the
equivalence between the absence of arbitrage and
the existence of a positive stochastic discount factor
(SDF) that correctly prices all assets.1 The main purpose of
this paper is to develop asset pricing tests that fully reflect
the implications of the fundamental theorem of asset
pricing. Specifically, we develop a systematic approach for
estimating, testing, and comparing asset pricing models
based on the second Hansen-Jagannathan distance (HJD).
1
See Ross (2005) for a recent review of neoclassical finance. See
Cochrane (2005) for a comprehensive treatment of both theoretical and
empirical asset pricing based on the SDF approach and all the related
references.
ARTICLE IN PRESS
280
H. Li et al. / Journal of Financial Economics 97 (2010) 279–301
The first and second HJDs developed by Hansen and
Jagannathan (1991, 1997) measure specification errors
of SDF models by least squares distances between an
SDF model and the set of admissible SDFs that can
correctly price a set of test assets.2 The first HJD considers
the set of all admissible SDFs, which we denote as M.
The second HJD considers only the smaller set of strictly
positive admissible SDFs, which we denote as M þ .
The positivity constraint of the second HJD guarantees
the admissible SDFs to be arbitrage-free and is important
for pricing derivatives associated with the test assets.
Hansen and Jagannathan (1997) show that, while the
first HJD represents the maximum pricing error of a
portfolio of the test assets with a unit norm, the second
HJD represents the minimax bound of the pricing errors
of a portfolio of both the test assets and their
related derivatives with a unit norm. The second HJD
represents a more stringent criterion for evaluating
asset pricing models and is generally larger than the first
HJD.
Using the second HJD in evaluating asset pricing
models has an important advantage, although the existing
empirical literature has mainly focused on the first HJD.3
Conceptually, the second HJD reflects more fully the
implications of the fundamental theorem of asset pricing
than the first HJD. While the first HJD tests only whether
an SDF model can correctly price the test assets, the
second HJD further tests whether the SDF model is strictly
positive. As a result, the second HJD is more powerful than
the first HJD in detecting misspecified SDF models that
can price the test assets but are not strictly positive, a
situation that is especially likely to happen to linear factor
models.
Dybvig and Ingersoll (1982) show that linear asset
pricing models are not arbitrage-free and are not appropriate for pricing derivatives because their SDFs take
negative values in certain states of the world. Although
linear factor models are seldom used directly to price
derivatives, they have been widely used in performance
evaluation of mutual funds and hedge funds. Mutual
funds and hedge funds often employ dynamic trading
strategies that generate option-like payoffs.4 Most hedge
funds directly trade derivatives, and their returns exhibit
option-like features.5 Grinblatt and Titman (1989), Glosten and Jagannathan (1994), Ferson and Khang (2002),
and Ferson, Henry and Kisgen (2006), among others,
emphasize the importance of the positivity constraint for
mutual fund performance evaluation. The fast-growing
2
Following Hansen and Jagannathan (1997), we refer to admissible
models as models that can correctly price the set of test assets.
3
Studies that use the first HJD are Jagannathan and Wang (1996),
Jagannathan, Kubota and Takehara (1998), Campbell and Cochrane
(2000), Lettau and Ludvigson (2001), Hodrick and Zhang (2001), Dittmar
(2002), and Kan and Zhou (2002), among others.
4
See Merton (1981), Dybvig and Ross (1985), and others for more
detailed discussions on this issue.
5
TASS, a hedge fund research company, reports that more than 50%
of the four thousand hedge funds it follows use derivatives. Fung and
Hsieh (2001), Agarwal and Naik (2004), Ben Dor and Jagannathan (2002),
and Mitchell and Pulvino (2001) show option-like features in hedge fund
returns.
hedge fund industry and the need to evaluate hedge fund
performances further highlight the significance of the
second HJD for empirical asset pricing studies.
Even for applications that do not involve derivatives,
using the second HJD for estimating and comparing asset
pricing models can still be beneficial. For example, one is
likely to obtain more robust and reliable parameter
estimates using the second HJD than the first HJD,
especially for linear factor models. The first HJD chooses
the parameters of a linear model to minimize the pricing
errors of the test assets. However, such estimated models
could be far from M þ , because in many cases the
estimated SDF models have to take negative values with
high probabilities to price the test assets. Therefore,
models estimated using the first HJD are likely to overfit
the test assets and to perform poorly out of sample. The
second HJD helps to overcome the overfitting problem
because it chooses parameters to minimize the distance
between an SDF model and M þ . As a result, the second
HJD provides more realistic assessments of model performance and leads to estimated SDF models that are closer
to M þ .
Moreover, the second HJD is more powerful than the
first HJD in distinguishing the relative performances
of models that have small pricing errors of the same
set of test assets. According to Lewellen, Nagel, and
Shanken (LNS, 2010), it is difficult to differentiate models
that have been developed to explain the cross-sectional
returns of the 25 size and book-to-market (BM) portfolios
of Fama and French (1993) using traditional methods,
because these models tend to have small pricing errors for
the test assets by construction. While LNS (2010) suggest
several interesting ways to improve the traditional
methods, the second HJD represents a powerful measure
of relative model performances. SDF models with
similar pricing errors for the Fama and French portfolios
could have very different probabilities in taking negative
values and thus can be differentiated based on the second
HJD. In addition, because the second HJD measures the
distance between an SDF model and M þ , a model with a
smaller second HJD is likely to be a better model. Given
that most asset pricing models are approximations
of reality and likely to be misspecified, it is important
to have a powerful measure such as the second HJD
to compare the relative performances of misspecified
models.
Despite its many advantages, the second HJD has been
rarely used in the existing literature. One main reason is
that econometric analysis of the second HJD is much more
difficult than that of the first HJD. Standard asymptotic
analysis involves a Taylor series approximation of the
second HJD near true parameter values. This procedure,
however, breaks down for the second HJD, which involves
a function that is not differentiable with respect to model
parameters in the traditional sense. As a result, the
standard econometric techniques used in Jagannathan
and Wang (1996) for analyzing the first HJD cannot be
applied to the second HJD.
To fully exploit the theoretical advantages of the
second HJD for empirical asset pricing studies, we develop
a systematic approach for evaluating asset pricing models
ARTICLE IN PRESS
H. Li et al. / Journal of Financial Economics 97 (2010) 279–301
based on the second HJD and apply the new approach
to empirical applications using the Fama and French 25
size/BM portfolios. By doing so, our paper makes the
following methodological and empirical contributions to
the existing literature.
First, we overcome the nondifferentiability issue of the
second HJD by introducing the concept of differentiability
in quadratic mean of Le Cam (1986), Pollard (1982), and
Pakes and Pollard (1989). In particular, we develop a
second-order stochastic representation of the second HJD
based on the new concept of differentiation under general
conditions (i.e., the SDF model can be either correctly
specified or misspecified). The second-order stochastic
representation forms the foundation of our econometric
analysis of the specification test and model selection
procedures based on the second HJD.
Second, we develop the asymptotic distribution of the
second HJD under the null hypothesis of a correctly
specified model, which has not been formally developed
in the literature. The new asymptotic distribution makes
it possible to conduct specification tests of SDF models
based on the second HJD and to identify misspecified
models that cannot be identified by the first HJD. We also
develop the asymptotic distribution of model parameters,
which provide diagnostic information on potential
sources of model misspecifications.
Third, we develop a sequence of model selection
procedures in the spirit of Vuong (1989) for non-nested,
overlapping, and nested models based on the second HJD.
Given that most asset pricing models are likely to
be misspecified, it is also very important to compare
the degrees of misspecifications of different models. We
compare the relative performances of two models based
on the asymptotic distribution of the difference between
the second HJDs of the two models. One challenging
aspect of the analysis is that the difference could have
either normal or weighted w2 asymptotic distributions
depending on the structures of the two models. Our
model selection procedures represent probably the first
attempt to formally compare the relative performances of
asset pricing models based on the second HJD. Simulation
evidence shows that both the specification test and model
selection tests have reasonably good finite sample
performances for sample sizes typically considered in
the existing literature.
Finally, we demonstrate the advantages of the second
HJD through empirical applications, in which we reach
dramatically different conclusions on model performances based on our approach and existing methods. In
particular, we evaluate several well-known asset pricing
models that have been developed in the literature to
explain the cross-sectional returns of the Fama and French
25 portfolios (the same set of models considered in LNS,
2010). Though certain models appear to have good
performances in pricing the Fama and French portfolios
according to the first HJD, their SDFs take negative values
with high probabilities and are overwhelmingly rejected
by the second HJD. The second HJD is also more powerful
than the first HJD in distinguishing models that have
similar pricing errors of the test assets but are not
arbitrage-free.
281
Our paper extends Hansen, Heaton, and Luttmer (HHL,
1995) and Hansen and Jagannathan (1997) in important
ways. Both papers show that the second HJD follows an
asymptotic normal distribution under the null hypothesis
that a given SDF model is misspecified. Their results,
however, cannot be applied to our setting for at least two
reasons. First, their asymptotic distribution becomes
degenerate under the null hypothesis of a correctly
specified model and therefore cannot be used for
specification test. Second, their results cannot be used
for formal model comparison, because they do not provide
the distribution of the difference between the second
HJDs of two models. One might be tempted to conclude
that the difference between the two second HJDs should
follow a normal distribution as well. Our model selection
tests reveal the full complexities of the issue: the
difference could follow either normal or weighted w2
distributions depending on model structures. Therefore,
while Hansen and Jagannathan (1997) introduce the
second HJD as an important theoretical measure of
specification errors, our model selection procedures make
it possible to formally compare the relative performances
of misspecified models based on the second HJD in
empirical studies.
Our paper is also related to Wang and Zhang (2005),
one of the first papers after Hansen and Jagannathan
(1997) that seriously studies the second HJD. Wang and
Zhang (2005) develop a simulation-based Bayesian
approach for inferences of the second HJD. Based on
Markov Chain Monte Carlo simulation and additional
assumptions on the data-generating process, they are able
to obtain the posterior distribution of the second HJD and
to demonstrate that the second HJD can make big
differences in empirical analysis of asset pricing models.
The Bayesian methodology of Wang and Zhang (2005),
though a nice contribution to the literature, is very
different from traditional methods in the literature, such
as the generalized method of moments (GMM) test of
Hansen (1982) or the Jagannathan and Wang (1996) test.
In contrast, our paper provides a systematic approach for
evaluating asset pricing models based on the second HJD
within the established econometric framework in the
existing literature.
Our model selection procedures differ from that of
Vuong (1989) in several aspects. While Vuong (1989)
compares relative model performance based on the
Kullback and Leibler information criterion, a statistical
measure, we compare model performance based on the
second HJD, an economic criterion. Whereas Vuong
(1989) considers only smooth likelihood functions, we
have to deal with the nondifferentiability issue of the
second HJD.
The rest of this paper is organized as follows. In
Section 2, we discuss the advantages of the second HJD in
evaluating asset pricing models. Section 3 develops a
specification test and a sequence of model selection tests
based on the second HJD. Section 4 provides simulation
evidence on the finite sample performances of the new
asset pricing tests. Section 5 contains the empirical
results. Section 6 concludes, and the Appendix provides
the mathematical proofs.
ARTICLE IN PRESS
282
H. Li et al. / Journal of Financial Economics 97 (2010) 279–301
2. Advantages of the second HJD in evaluating asset
pricing models
In this section, we first provide a brief introduction to
the two HJDs. Then we discuss the advantages of using the
second HJD as a criterion for evaluating asset pricing
models.6
Let the uncertainty of the economy be described by a
filtered probability space ðO,F ,P,ðF t Þt Z 0 Þ for t = 0,1,y,T.
Suppose we use n test assets with payoffs Yt (an n 1
vector) at t in an empirical analysis of asset pricing
models. Denote Y , a subspace of L2, as the payoff space of
all the test assets. In the absence of arbitrage, there must
exist a strictly positive SDF that correctly prices all the
test assets. That is, for all t, we have
E½mt þ 1 Yt þ 1 jF t ¼ Xt , mt þ 1 40, 8Yt þ 1 2 Y ,
ð1Þ
where Xt, an n 1 vector, represents the prices of the n
assets at t. The random variable mt + 1 discounts payoffs at
t + 1 state by state to yield price at t and hence is called a
stochastic discount factor. If the market is complete, then
mt + 1 is unique. Otherwise multiple mt + 1 satisfy Eq. (1).
Without loss of generality, for the rest of the paper, we
focus our discussions on the unconditional version of the
above pricing equation.7 That is,
E½mt þ 1 Yt þ 1 ¼ E½Xt , mt þ 1 40, 8Yt þ 1 2 Y :
ð2Þ
In an important paper, Hansen and Jagannathan (1997)
develop two measures of specification errors of SDF
models. The first HJD, denoted as d, measures the least
squares distance or the L2-norm between a candidate SDF
model H and M. That is,
qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
d ¼ minJHmJ ¼ min EðHmÞ2 ,
ð3Þ
m2M
m2M
where M ¼ fmt þ 1 : E½mt þ 1 Yt þ 1 ¼ E½Xt for 8Yt þ 1 2 Y g denotes the set of all admissible SDFs. The second HJD,
þ
denoted as d , is defined as
qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
d þ ¼ min JHmJ ¼ min EðHmÞ2 ,
ð4Þ
m2M þ
m2M þ
where M þ ¼fmt þ 1 : E½mt þ 1 Yt þ 1 ¼E½Xt ,mt þ 140,8Yt þ 1 2 Y g
denotes the set of positive SDFs. In general, the second
HJD is bigger than the first one, because M þ is a subset of
M. Often the SDF model H depends on some unknown
parameters g, and the two distances are defined as
qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
dðgÞ ¼ min minJHðgÞmJ ¼ min min EðHðgÞmÞ2
ð5Þ
g
m2M
g
m2M
and
d þ ðgÞ ¼ min min JHðgÞmJ ¼ min min
g m2M þ
g m2M þ
qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
EðHðgÞmÞ2 :
ð6Þ
6
The discussions in this section draw materials from Cochrane
(2005), Dybvig and Ingersoll (1982), Hansen and Jagannathan (1997),
and Wang and Zhang (2005).
7
If we include enough scaled payoffs in our analysis, the unconditional pricing equation becomes the conditional pricing equation. For
notational convenience, we omit time subscripts t whenever the
meaning is obvious.
In addition to interpretations as least squares distances,
the two HJDs have interpretations as pricing errors. Hansen
and Jagannathan (1997) show that the first HJD has an
interpretation as the maximum pricing error of a portfolio
of the test assets with a unit norm, i.e.,
d ¼ max jEðmYÞEðHYÞj, 8Y 2 Y :
JYJ ¼ 1
ð7Þ
Hansen and Jagannathan (1997) also show that the second
HJD has an interpretation as the minimax bound on pricing
errors of any payoff in L2 with a unit norm, i.e.,
d þ ¼ min max jEðmYÞEðHYÞj, 8Y 2 L2 :
m2M þ JYJ ¼ 1
ð8Þ
þ
While d considers only pricing errors of the test assets, d
considers pricing errors of both the test assets and payoffs
that are not in Y but in L2, which are derivatives that can be
constructed from trading the test assets. Therefore, the
positivity constraint on m in the definition of the second
HJD is closely related to derivatives pricing: Only a strictly
positive m is arbitrage-free and can price both the test
assets and their related derivatives.
Fig. 1 provides a graphical illustration of the
differences between the two HJDs in a one-period, twostate economy. The horizontal (vertical) axis represents
payoffs when state one (two) occurs at the end of the
period. The straight line going through the origin
represents the payoff space of the test assets Y . Let Y*
represent the SDF that is in the payoff space Y and can
correctly price all the test assets.8 Then M is represented
by the straight (dot-dash) line that intersects with Y at Y*
and is orthogonal to Y (the thin line), and M þ is
represented by the segment of M that is in the first
quadrant (the thick line). Suppose HðgÞ is an SDF model
that we want to evaluate. The dotted line represents dðgÞ,
the shortest distance between HðgÞ and M, the dashed
þ
line represents d ðgÞ, the shortest distance between HðgÞ
9
and M þ . One can reach very different conclusions on
model performances and obtain very different parameter
estimates using the two HJDs. While the first HJD chooses
g to minimize dðgÞ, the second HJD chooses g to minimize
d þ ðgÞ. Therefore, certain models might have large second
HJDs, even though they have small first HJDs.
The second HJD, as a criterion for evaluating asset
pricing models, reflects more fully the implications of the
fundamental theorem of asset pricing and therefore is
more powerful in detecting misspecified models than the
first HJD. While the first HJD requires only that m
correctly prices all the test assets, the second HJD imposes
the additional restriction of m 40. As a result, the second
HJD can detect misspecified SDF models that can price the
test assets but are not strictly positive, a situation that is
especially likely to happen to linear factor models.
The second HJD also leads to more reliable and
economically more meaningful parameter estimates than
the first HJD, especially for linear factor models. Consider
8
By the law of one price, Y* always exists, and Y þ e 2 M, 8 e ? Y .
Rigorously speaking, the solution to minm2M þ JHmJ does not
exist in our simple example, because M þ is not closed. To avoid this
technical issue, we can re-define M þ as the set of non-negative
admissible SDFs following Hansen and Jagannathan (1997).
9
ARTICLE IN PRESS
H. Li et al. / Journal of Financial Economics 97 (2010) 279–301
283
S=2
H(γ)
δ(γ)
Payoff space of primary assets Y
δ
Y*
+(γ)
M+: Set of positive admissible SDFs
S=1
M: Set of admissible SDFs
Fig. 1. The first and second Hansen-Jagannathan distances in a one-period, two-state economy. This graph illustrates the difference between the first and
second HJDs in a one-period, two-state (S = 1 or 2) economy. The stochastic discount factor is denoted by HðgÞ. The dot-dash line represents the set M of
admissible stochastic discount factors (SDFs) that can correctly price the primary assets. The Solid thick line segment represents the set M þ of admissible
positive SDFs that can correctly price the primary assets. The dotted line segment represents the first HJD dðgÞ, while the dash line segment represents the
þ
second HJD d ðgÞ. Y* represents the SDF that is in the payoff space Y and can correctly price all test assets.
a linear factor model HðgÞ in Fig. 1. If we choose g to
minimize the first HJD, the estimated model Hðg^ Þ could be
far away from M þ even though the estimated pricing
errors for the test assets are small. This is because in many
cases the small pricing errors are obtained at the expense
of the estimated model Hðg^ Þ taking negative values with
high probabilities. As a result, the estimated model tends
to overfit the test assets and is likely to perform poorly
out of sample. In contrast, the second HJD mitigates the
overfitting problem by choosing g such that HðgÞ is as
close as possible to M þ . Therefore, although it could
appear that the second HJD unfairly punishes SDF models
that can price the test assets well but are not strictly
positive, the second HJD provides more realistic assessments of the performances of such models and leads to
estimated SDF models that are closer to M þ .
The second HJD is a powerful measure of relative
performances of misspecified SDF models and helps
discriminate models that cannot be distinguished by the
first HJD. For example, many models have been proposed
in the literature to explain the cross-sectional returns of
the Fama and French size/BM portfolios. LNS (2010) point
out that because these models tend to have small pricing
errors for the test assets by construction, it is very difficult
to differentiate them using traditional methods, which
mainly focus on pricing errors for model evaluation.
Because the second HJD measures the distance between
an SDF model and M þ , models with similar pricing errors
for the Fama and French portfolios could have very
different probabilities of taking negative values and thus
can be differentiated based on the second HJD. Given that
most linear factor models are misspecified, the ability to
compare relative model performances using the second
HJD is one of its important advantages for empirical asset
pricing studies.
3. Asset pricing tests based on the second HJD
In this section, we first develop a second-order
asymptotic representation of the second HJD, which forms
the foundation of our econometric analysis of the second
HJD. Then we develop the asymptotic distribution of the
second HJD under the null hypothesis of a correctly
specified model, which can be used for specification tests
of SDF models. Finally, we develop a sequence of model
selection procedures in the spirit of Vuong (1989), which
can compare the relative performances of SDF models
based on the second HJD regardless whether the models
are correctly specified or not.
3.1. Nondifferentiability and asymptotic representation
of the second HJD
One technical difficulty we face in econometric
analysis of the second HJD is that the second HJD involves
a nonsmooth function that is not pointwise differentiable.
We overcome this difficulty by introducing the concept of
differentiability in quadratic mean which has been used in
statistics and econometrics literature in dealing with
nonsmooth objective functions. Based on the new concept
of differentiation, we develop an asymptotic representation of the second HJD.
Following Hansen and Jagannathan (1997), our econometric analysis of the second HJD focuses on the conjugate
representation of the minimization problem in Eq. (4):
þ
0
0
½d 2 ¼ maxfEH2 E½Hl Y þ 2 2l EXg,
l
ð9Þ
where l is an n 1 vector of Lagrangian multipliers of the
constraint that m has to be admissible in Eq. (4), and
0
0
½Hl Y þ ¼ max½0,Hl Y.10 The first-order condition of
the above optimization problem is given as
EXE½ðHl0 YÞ þ Y ¼ 0:
ð10Þ
0
½Hl0 Y þ
2 M þ . That is,
Suppose l0 solves Eq. (10), then
0
H½Hl0 Y þ is the necessary adjustment of H so that it
10
For more detailed discussions on the conjugate representation of
0
the second HJD, see Hansen and Jagannathan (1997). Replacing ½Hl Y þ
0
by ½Hl Y in Eq. (9), then Eq. (9) becomes the definition of ½d2 , and our
entire analysis for the second HJD can be easily adapted to the first HJD.
ARTICLE IN PRESS
284
H. Li et al. / Journal of Financial Economics 97 (2010) 279–301
becomes a member of M þ .11 For SDF models that depend
on unknown model parameters g, the second HJD is
defined as
þ
½d 2 ¼ min maxEfðyÞ,
g
ð11Þ
l
where y ¼ ðg,lÞ and fðyÞ HðgÞ2 ½HðgÞl Y þ 2 2l X.
Suppose we have the following time series observations of asset prices, payoffs, and an SDF model,
fðXt1 ,Yt, Ht ðgÞÞ : t ¼ 1,2, . . . ,Tg, where g is a k 1 vector of
model parameters. Following Hansen and Jagannathan
(1997), we use the empirical counterpart of EfðyÞ ¼
Efðg,lÞ,
0
ET fðyÞ ¼
T
1X
0
0
fHt ðgÞ2 ½Ht ðgÞl Yt þ 2 2l Xt1 g,
Tt¼1
0
ð12Þ
in our econometric analysis of the second HJD, where
P
ET ½x ¼ ð1=TÞ Tt ¼ 1 xt .12 Therefore, the main objective of
our asymptotic analysis is to characterize as T-1 the
behavior of
þ
½dT 2 ¼ min maxET fðyÞ:
g
ð13Þ
l
The standard approach for asymptotic analysis of
ET fðyÞ is to employ a pointwise quadratic Taylor
expansion of the function fðyÞ with respect to y around
true model parameters y0 , where y0 ¼ ðg0 ,l0 Þ arg ming maxl Efðg,lÞ. That is,
fðyÞ ¼ fðy0 Þ þ
@fðy0 Þ
0
@y
ðyy0 Þ þ 12ðyy0 Þ0 @2 fðy0 Þ
2
0 ðyy0 Þ þ oðJyy0 J Þ,
@y@y
ð14Þ
and then optimize the resulting quadratic representation
with respect to y:
ET fðyÞ ¼ ET fðy0 Þ þ ET
@fðy0 Þ
0
@y
@2 fðy0 Þ
0
@y@y
2
ðyy0 Þ þ op ðJyy0 J Þ:
ðyy0 Þ þ 12ðyy0 Þ0 ET
ð15Þ
However, standard Taylor expansion breaks down in
our case because the function fðyÞ is not pointwise
differentiable. To better illustrate this point, observe that
fðyÞ can be written as
fðyÞ HðgÞ2 gðHðgÞl0 YÞ2l0 X,
2
ð16Þ
þ 2
where gðxÞ ¼ ½maxðx,0Þ ½x . Observe that g(x) is firstorder differentiable everywhere with first-order derivative
(
2x if x Z 0,
ð17Þ
g ð1Þ ðxÞ ¼ 2½x þ ¼
0 if x o 0:
However, g(x) does not have a second-order derivative at
x=0, i.e., g(1) is no longer differentiable everywhere. The
second-order derivative of g(x) equals
8
if x 40,
>
<2
ð2Þ
ð18Þ
g ðxÞ ¼ not exist if x ¼ 0,
>
:
0
x o 0:
þ
0
Therefore, for dT , the function ½Ht ðgÞl Yt þ is not
pointwise differentiable with respect to ðg,lÞ for all Ht ðgÞ
and Yt.13 That is, for a given ðg,lÞ, there are combinations
0
of Ht ðgÞ and Yt such that Ht ðgÞl Yt ¼ 0, which is the kink
0
point of ½Ht ðgÞl Yt þ . As a result, the derivatives of fðg,lÞ
with respect to ðg,lÞ are not always well defined for those
Ht ðgÞ and Yt.
The key to overcome this difficulty is that pointwise
differentiability is not a necessary condition to obtain
Eq. (15). All we need is a good approximation to ET fðyÞ
[but not fðyÞ itself] around the true parameter value y0 .
To this end, the notion of differentiability in quadratic
mean in modern statistics (cf. Le Cam, 1986) plays an
important role.14 In contrast to pointwise differentiability,
which implies a good approximation to fðyÞ for all Ht and
Yt, differentiability in quadratic mean implies that the
error of approximating ET fðyÞ is negligible in quadratic
mean or L2(P) norm. In other words, all we need is an
approximation of fðyÞ that works well in an average
sense. For further discussions of nondifferentiability
issues, see Pollard (1982) and Pakes and Pollard (1989),
among others.
Our approach can be briefly described as follows and is
along the lines of Pollard (1982). First, we decompose
ET fðyÞ into a deterministic component and a (centered)
random component
ET fðyÞ ¼ EfðyÞ þðET EÞfðyÞ,
ð19Þ
where ðET EÞ½ ¼ ET ½E½. To obtain a quadratic representation such as Eq. (15), we consider a second-order
approximation to the deterministic term EfðyÞ and a firstorder approximation to the random component
ðET EÞfðyÞ. We consider a lower-order approximation of
the random component, because it is centered and thus in
general one order smaller than the deterministic component. The above analysis leads to Lemma 1, which justifies
a local asymptotic quadratic (LAQ) representation of
ET fðyÞ.
Lemma 1. Suppose Assumptions A.1 to A.7 in the Appendix
hold. Then the following LAQ representation holds for ET fðyÞ
around y0 :
ET fðyÞ ¼ Efðy0 Þ þ ðET EÞfðy0 Þ þ A0 ðyy0 Þ
þ 12ðyy0 Þ0 Gðyy0 Þ þ oðJyy0 J2 Þ þ op ðJyy0 JT 1=2 Þ,
ð20Þ
0
0
0
11
Replacing ½Hl Y þ by ½Hl Y, the adjustment term H½Hl Y þ
0
0
simplifies to l0 Y. That is, the random variable l0 Y represents the
0
necessary adjustment of H so that it is a member of M. Alternatively, l0 Y
can be used to discount future payoffs state by state to yield current
0
pricing errors of Y : E½ðl0 YÞY ¼ E½ðHmÞY, where m 2 M. Therefore,
0
while d measures average deviations of H from M, l0 Y measures H’s
deviations from M in different states of the economy.
12
For notational convenience, we suppress the dependence of fðyÞ
on t.
13
Let o be a random variable. A function f ðy,oÞ is pointwise
differentiable with respect to y if the function has partial derivatives
with respect to y in the classical sense for all possible values of o.
14
A function f ðy,oÞ is differentiable in quadratic mean with respect
to y at y0 if there exists a DðoÞ in L2 such that E½ðf ðy,oÞ
f ðy0 ,oÞÞ=ðyy0 ÞDðoÞ2 -0 as y-y0 . Similar ideas have been used by
Pakes and Pollard (1989) and others to handle nondifferentiable criteria
functions.
ARTICLE IN PRESS
H. Li et al. / Journal of Financial Economics 97 (2010) 279–301
where A ðET EÞ@fðy0 Þ=@y and
1
0
@2 fðy0 Þ
@2 fðy0 Þ
E
BE
0 C
@g@g0
@g@l C
@2 fðy0 Þ B
C¼
B
GE
0 ¼B
2
2
C
@y@y
@ E @ fðy0 Þ E @ fðy0 Þ A
0
0
@l@g
@l@l
G11
G21
!
G12
:
G22
Proof. See the Appendix.
We emphasize that the expectation of the second
derivative is always well defined, even though the second
derivative is well defined except on a set of zero
probability. The first-order derivative for the deterministic term does not appear in Eq. (20), i.e., E@fðy0 Þ=@y ¼ 0,
because y0 ¼ ðg0 ,l0 Þ solves the population optimization
problem ming maxl EfðyÞ. Compared with traditional
Taylor expansions, the key point here is to justify that
we can still use the second derivatives of f (which are not
defined everywhere) to obtain an approximation to EfðyÞ.
Based on the LAQ representation of ET fðyÞ in Lemma 1,
we can solve for the minimax problem by first maximizing
the objective function over l for a given value of g, then
minimizing the objective function over g. As a result, we
obtain the following asymptotic representation of the
second HJD.
Lemma 2. Suppose Assumptions A.1 to A.9 in the Appendix
hold. Then we have the following second-order asymptotic
representation of the second HJD at estimated model
parameter y^ :
þ
½d^ T 2 ¼ min maxET fðyÞ ¼ Efðy0 Þ þðET EÞfðy0 Þ
g
l
12A0 G1 Aþ op ðT 1 Þ:
ð21Þ
Moreover, the optimizer y^ ¼ ðg^ ,l^ Þ arg ming maxl ET fðyÞ
equals
y^ ¼ y0 G1 A þop ðT 1=2 Þ,
ð22Þ
where A and G are defined in Lemma 1.
Proof. See the Appendix.
Therefore, the technique of differentiation in quadratic
mean overcomes the nondifferentiability issue of the
second HJD and makes asymptotic analysis of the second
HJD feasible. In particular, our analyses on specification
test and model selection procedures in later sections are
all based on the second-order asymptotic representation
of the second HJD in Lemma 2.
3.2. Specification test based on the second HJD
þ
In this subsection, based on the representation of d^ T in
Lemma 2, we develop the asymptotic distributions of the
second HJD and parameter estimate y^ under the null
hypothesis of a correctly specified model. We first discuss
the intuition behind our asymptotic analysis and then
present the formal results in Theorem 1 and Proposition 1.
We also discuss the relations between our results and
those in the existing literature.
þ
Lemma 2 suggests that the behavior of d^ T is
determined by three terms: Efðy0 Þ, ðET EÞfðy0 Þ, and the
285
quadratic form 12 A0 G1 A. Under the null hypothesis H0 :
d þ ¼ 0, Efðy0 Þ ¼ 0 by definition. Because fðy0 Þ is
non-negative, we must have fðy0 Þ ¼ 0 almost everywhere
to have Efðy0 Þ ¼ 0. As a result, we must have
þ
ðET EÞfðy0 Þ ¼ 0 as well. Therefore, under H0 : d ¼ 0,
þ
the asymptotic behavior of d^ T is mainly determined by
the quadratic form. From Assumption A.9, we have
pffiffiffi
pffiffiffi
@fðy0 Þ
T A ¼ T ðET EÞ
*Nð0,LÞ,
@y
ð23Þ
where * means convergence in distribution and
þ
L ¼ E½@fðy0 Þ=@y @fðy0 Þ=@y0 . Therefore, T½d^ T 2 should
2
follow a weighted w distribution.
Theorem 1 (Specification test). Suppose Assumptions A.1
þ
þ
to A.9 in the Appendix hold. Then under H0 : d ¼ 0, T½d^ T 2
2
has an asymptotic weighted w distribution, and the weights
are the eigenvalues of the matrix
1
1
1
G12 G1
12½G1
22 G22 G21 ½G12 G22 G21 22 Ll ,
ð24Þ
where G12 , G21 , and G22 are defined in Lemma 1 and
Ll ¼ E½@fðy0 Þ=@l @fðy0 Þ=@l0 .
Proof. See the Appendix.
We can use Theorem 1 to conduct specification tests of
SDF models based on the second HJD. Next we develop the
asymptotic distribution of y^ ¼ ðg^ ,l^ Þ, which contains
important diagnostic information on potential sources of
model misspecifications.
Proposition 1 (Parameter estimation). Suppose Assumptions A.1 to A.9 in the Appendix hold. Then the estimator
of model parameters, g^ , has the following asymptotic
distribution:
pffiffiffi
T ðg^ g0 Þ*Nð0,ðJ11 J12 ÞLðJ11 J12 Þ0 Þ;
ð25Þ
and the estimator of Lagrangian multipliers, l^ , has the
asymptotic distribution
pffiffiffi
T ðl^ l0 Þ*Nð0,ðJ21 J22 ÞLðJ21 J22 Þ0 Þ,
ð26Þ
where
J11
J12
J21
J22
!
¼
G11
G21
G12
G22
!1
and Gij ’s (i,j = 1,2) are defined in Lemma 1.
Proof. See the Appendix.15
For a linear SDF model, we can examine the importance of
a specific factor by testing whether the coefficient of the
factor is significantly different from zero using Proposition 1.
The Lagrangian multipliers suggest directions for model
improvements: If the multiplier of one particular asset is
very large, then the SDF model has to be significantly
modified to correctly price this particular asset.
The specification test in Theorem 1 differs from that of
Jagannathan and Wang (1996), which is based on the first
HJD, in several important ways. First, while Jagannathan
þ
15
Proposition 1 can be greatly simplified under H0 : d ¼ 0, even
though it holds whether the model is correctly specified or not. The
limiting normal distribution could be degenerate.
ARTICLE IN PRESS
286
H. Li et al. / Journal of Financial Economics 97 (2010) 279–301
and Wang (1996) estimate model parameters by
minimizing the first HJD, we estimate model parameters
by minimizing the second HJD. Except for the rare case in
þ
which d ¼ 0, the two estimated HJDs and their
associated parameters are generally different from each
other. Second, the two methods have different powers in
rejecting misspecified models with small first HJDs but
large second HJDs. A typical example of such models is
linear SDF models that have to take negative values with
significant probabilities to fit the test assets. While such
models could appear acceptable based on the first HJD,
they could be rejected by the second HJD because they
violate the positivity constraint. Third, while Jagannathan
and Wang (1996) consider only linear factor models, our
result is applicable to nonlinear SDF models in which HðgÞ
depends on g nonlinearly. These include asset pricing
models in which investors have complicated utility
functions. Finally, the econometric techniques used in
our analysis are different from that of Jagannathan and
Wang (1996) and can be useful in other finance applications that involve nondifferentiable objective functions.
Our results in this section are closely related to that of
HHL (1995) and Hansen and Jagannathan (1997). Both
papers show that the first and second HJDs follow
asymptotic normal distributions for misspecified
models.16 These distributions become degenerate when
the models are correctly specified (i.e., the first or second
HJD is identically zero).17 Hansen and Jagannathan (1997,
p. 576) argue that the first HJD equals zero only when an
SDF model correctly prices all the test assets and one can
test this condition via GMM using the pricing errors of the
test assets as moment conditions. This argument implies
that, even if we were not aware of the asymptotic
distribution of d for correctly specified models, we can
still test H0 : d ¼ 0 using GMM. It might be tempting to
þ
conclude that a similar result could hold for d as well.
However, the second HJD equals zero only when the SDF
model can correctly price all the test assets and their
þ
related derivatives. As a result, to test H0 : d ¼ 0 using
the GMM approach, we need data on all derivatives that
can be constructed from trading the test assets. Such data
could impossible or at least very difficult to obtain.
Therefore, the asymptotic distribution of the second HJD
þ
in Theorem 1 is essential for testing H0 : d ¼ 0.
misspecified by definition because their SDFs can take
negative values with positive probabilities. Therefore,
another important issue is how to compare the relative
performances of potentially misspecified SDF models
using the second HJD. In this subsection, building on the
same technique in developing the specification test, we
develop a sequence of model selection procedures in the
spirit of Vuong (1989). Similar to that of Vuong (1989),
our procedures apply to situations in which both, only
one, or neither of the competing models could be correctly
specified.
Consider two competing models F and G. We are
interested in the following hypotheses:
H0 : F and G are equally good, i.e., dFþ ¼ dGþ ;
HF : F is better than G, i.e., dFþ o dGþ ; and
HG : F is worse than G, i.e., dFþ 4 dGþ .
In empirical studies, model comparison is conducted
using empirical estimates of the second HJDs. Hence, we
propose to test the above hypotheses using the following
test statistic:
þ
16
þ
We can easily obtain this result from Lemma 2. Under H0 : d a0,
þ
we know that Efðy0 Þa0, ðET EÞfðy0 Þa0, and the behavior of d^ T is
mainly determined by ðET EÞfðy0 Þ instead of the quadratic term. The
second HJD has an asymptotic normal distribution centered at Efðy0 Þ,
because by central limit theorem, ðET EÞfðy0 Þ converges to a normal
distribution as T-1.
17
The approach of HHL (1995) and Hansen and Jagannathan (1997)
is equivalent to a first-order Taylor approximation of the second HJD in
our setting. However, a second-order approximation is needed to
þ
develop the asymptotic distribution of the second HJD under H0 : d ¼ 0.
ð27Þ
þ
where for convenience we denote d^ F ¼
þ
d^ G ¼ dTþ ðy^ G Þ, y^ F ¼ arg mingF maxlF ET fðyF Þ, and y^ G ¼
We
also
define
y0F ¼
arg mingG maxlG ET fðyG Þ.
arg mingF maxlF EfðyF Þ and y0G ¼ arg mingG maxlG EfðyG Þ
dTþ ðy^ F Þ,
as the pseudo-true parameters for models F and G,
respectively. Following the terminology of Vuong (1989),
we say that the two models are observationally equivalent if
fðy0F Þ ¼ fðy0G Þ with probability one. For all practical
purposes, observationally equivalent models are not distinguishable using any test statistics that are functions of f.
The key to our analysis is to obtain the asymptotic
representation of the difference between the second HJDs
of the two models. From Lemma 2, we know that
þ
½d^ F 2 ¼ minmaxET fðyF Þ ¼ Efðy0F Þ
gF
lF
1
Þ
þ ðET EÞfðy0F Þ12A0F G1
F AF þop ðT
ð28Þ
and
þ
½d^ G 2 ¼ minmaxET fðyG Þ
gG
lG
1
Þ,
¼ Efðy0G Þ þ ðET EÞfðy0G Þ12A0G G1
G AG þop ðT
3.3. Model selection procedures based on the second HJD
While the specification test can tell whether a given
asset pricing model is correctly specified, most SDF
models in the existing literature are probably misspecified. In fact, one could argue that linear factor models are
þ
½d^ F 2 ½d^ G 2 ,
ð29Þ
where AF ðET EÞ@fðy0F Þ=@yF , AG ðET EÞ@fðy0G Þ=@yG ,
GF E@2 fðy0F Þ=@yF @y0F , and GG E@2 fðy0G Þ=@yG @y0G . Consequently, we have the following asymptotic representation of the test statistic for model selection:
þ
þ
½d^ F 2 ½d^ G 2 ¼ Efðy0F ÞEfðy0G Þ þðET EÞ½fðy0F Þ
1
1
1 0
fðy0G Þ12 A0F G1
Þ:
F AF þ 2AG GG AG þ op ðT
ð30Þ
One interesting as well as challenging aspect of our
þ
þ
analysis is that ½d^ 2 ½d^ 2 exhibits different asymptotic
F
G
distributions depending on whether the two models
are observationally equivalent. If fðy0F Þ ¼ fðy0G Þ with
þ
þ
probability one, then under H0 : dF ¼ dG , we have
Efðy0F ÞEfðy0G Þ ¼ 0 and ðET EÞ½fðy0F Þfðy0G Þ ¼ 0. As a
ARTICLE IN PRESS
H. Li et al. / Journal of Financial Economics 97 (2010) 279–301
þ
þ
result, the asymptotic distribution of ½d^ F 2 ½d^ G 2 is
determined by the difference between the two quadratic
forms and follows a weighted w2 distribution. However, if
fðy0F Þafðy0G Þ with positive probability, then under H0 :
dFþ ¼ dGþ , though Efðy0F ÞEfðy0G Þ ¼ 0, ðET EÞ½fðy0F Þ
fðy0G Þ does not vanish. And the asymptotic distribution
þ
þ
of ½d^ 2 ½d^ 2 is determined by ðE EÞ½fðy Þfðy Þ,
F
T
G
0F
0G
which follows a normal distribution.
To address this issue, we develop the asymptotic
þ
þ
distributions of ½d^ F 2 ½d^ G 2 for three different types of
model structures of F and G. Specifically, following
the terminology of Vuong (1989), we consider strictly
non-nested, overlapping, and nested models. By
strictly non-nested models, we mean that F yF \ GyG is an
empty set, where F yF and GyG represent the entire families
of models that we can obtain by considering all possible
values of yF and yG in their parameter spaces, respectively. The definition implies that two non-nested models
can never be observationally equivalent. Two models F yF
and GyG are called overlapping if, and only if, F yF \ GyG is
not empty and F yF JGyG and GyG JF yF . Model F yF is said
to be nested by model GyG if, and only if, F yF DGyG .
Overlapping and nested models can be observationally
equivalent for certain parameter values.
We first consider the cases of strictly non-nested and
nested models, for which we know unambiguously
whether the term ðET EÞ½fðy0F Þfðy0G Þ vanishes under
H0 : dFþ ¼ dGþ . Then we consider the more difficult case of
overlapping models, for which we do not know unambiguously whether the term ðET EÞ½fðy0F Þfðy0G Þ
þ
þ
vanishes under H0 : dF ¼ dG .
Because strictly non-nested models can never be
observationally equivalent, the term ðET EÞ½fðy0F Þ
fðy0G Þ never vanishes and is the dominating term in
þ
þ
½d^ F 2 ½d^ G 2 . As a result, we obtain the following test for
comparing strictly non-nested models based on the
second HJD.
appropriate critical values of the standard normal distribution.
Next we consider nested models. Without loss of
generality we assume that model F is nested by model G.
Because the bigger model is always at least as good as
þ
þ
þ
þ
the smaller model (i.e., dF Z dG ), under H0 : dF ¼ dG ,
we must have fðy0F Þ ¼ fðy0G Þ with probability one. That
is, the two models must be observationally equivalent
þ
þ
under H0 : dF ¼ dG , which in turn implies that
þ
þ
ðE EÞ½fðy Þfðy Þ ¼ 0. As a result, ½d^ 2 ½d^ 2 is
T
pffiffiffi þ 2
þ
T ð½d^ F ½d^ G 2 Þ=s^ T *Nð0,1Þ;
p
ffiffiffi
þ
þ
þ
þ
under HF : dF o dG , T ð½d^ F 2 ½d^ G 2 Þ=s^ T -1; and
p
ffiffiffi
þ
þ
þ
þ
under HG : dF 4 dG , T ð½d^ F 2 ½d^ G 2 Þ=s^ T -þ 1,
0F
0G
F
G
asymptotically determined by the difference between
the two quadratic forms in Eq. (30). From Assumption
A.10, we have
1
0
@fðy0F Þ
!
C
B
pffiffiffi
pffiffiffi AF
B @yF C
T
¼ T ðET EÞB
ð32Þ
C*Nð0,CÞ,
AG
@ @fðy0G Þ A
@yG
where
C¼
LF
LGF
LFG
LG
!
10
10
@fðy0F Þ
@fðy0F Þ
B @yF CB @yF C
CB
C
B
¼ EB
CB
C:
@ @fðy0G Þ A@ @fðy0G Þ A
@yG
@yG
0
þ
þ
Then the asymptotic distribution of ½d^ F 2 ½d^ G 2 should
follow a weighted w2 distribution as given in Theorem 3.
Theorem 3 (Model selection for nested models). Suppose
model F is nested by model G and Assumptions A.1 to A.10 in
the Appendix hold. Then
þ
þ
under H0 : dF ¼ dG , Tð½d^ F 2 ½d^ G 2 Þ has an asymptotic
weighted w2 distribution, and the weights are the
eigenvalues of the following matrix:
0
1
1
1
1 @ GF LF GF LFG A
;
2 G1
G1
G LGF
G LG
þ
þ
and
þ
þ
þ
þ
under HG : dF 4 dG , Tð½d^ F 2 ½d^ G 2 Þ- þ1.
Theorem 2 (Model selection for non-nested models). Suppose models F and G are strictly non-nested and Assumptions A.1 to A.10 in the Appendix hold. Then
þ
287
Proof. See the Appendix.
þ
under H0 : dF ¼ dG ,
where ½s^ T 2 is an estimator of Var½fðy0F Þfðy0G Þ and
equals
½s^ T 2 ¼ ET ½fðy^ F Þfðy^ G Þ2 ðET ½fðy^ F Þfðy^ G ÞÞ2 :
ð31Þ
Proof. See the Appendix.
Theorem 2 allows us to compare two non-nested SDF
models based on their second HJDs. The implementation
of Theorem 2 involves several steps. First, we solve the
optimization problem in Eq. (13) for F and G to obtain
y^ F ¼ ðg^ F ,l^ F Þ and y^ G ¼ ðg^ G ,l^ G Þ. Then we compute s^ T to
pffiffiffi þ
þ
form the statistic T ð½d^ 2 ½d^ 2 Þ=s^ . Finally, we make
F
G
T
inferences about the three hypotheses based on the
The case for overlapping models is much more
complicated, because the term ðET EÞ½fðy0F Þfðy0G Þ
þ
þ
might or might not vanish under H0 : dF ¼ dG . The two
models could be observationally equivalent for certain
þ
þ
parameter values under H0 : dF ¼ dG . In such cases, the
term ðET EÞ½fðy0F Þfðy0G Þ ¼ 0, and the test statistic
þ
þ
½d^ 2 ½d^ 2 is determined by the two quadratic forms
F
G
and follows a weighted w2 distribution. There is also the
possibility that the two models are not observationally
þ
þ
equivalent under H0 : dF ¼ dG . In such cases, the term
and
the
test
statistic
ðET EÞ½fðy0F Þfðy0G Þa0,
þ
þ
2
2
^
^
½d ½d follows a normal distribution. Due to this
F
G
additional complexity, we first need to test the possibility
that the two models are observationally equivalent before
we can decide which asymptotic distribution to use to test
þ
þ
the null hypothesis H0 : dF ¼ dG . Following similar
arguments in Theorem 3, we can show that if,
ARTICLE IN PRESS
288
H. Li et al. / Journal of Financial Economics 97 (2010) 279–301
þ
þ
fðy0F Þ ¼ fðy0G Þ with probability one, ½d^ F 2 ½d^ G 2 should
2
follow a weighted w distribution as given below.
Theorem 4 (Model selection for overlapping models). Suppose models F and G are overlapping models and Assumptions A.1 and A.10 in the Appendix hold. Then
under H0 : fðyF 0 Þ ¼ fðyG0 Þ with probability one,
þ
þ
Tð½d^ 2 ½d^ 2 Þ has an asymptotic weighted w2 distribuF
G
tion and the weights are the eigenvalues of the matrix
0
1
1
1
1 @ GF LF GF LFG A
;
2 G1
G1
G LGF
G LG
and
under HA : fðyF 0 ÞafðyG0 Þ with positive probability,
þ
þ
Tð½d^ F 2 ½d^ G 2 Þ-1 ðeither þ 1 or 1Þ.
Proof. See the Appendix.
In summary, the comparison of overlapping models
should be done according to the following procedures.
difference of the HJDs between two models.20 Even
though one can extend their analyses to study the
difference of the HJDs between the two models, such an
extension can cover only the case of strictly non-nested
models. Only based on the second-order Taylor approximation of the HJDs can we develop the model selection
tests for nested and overlapping models.21 Therefore, our
model selection procedures fill an important gap in the
literature by providing a systematic approach for comparing potentially misspecified SDF models based on the
second HJD.
4. Finite sample performances of asset pricing tests
In this section, we provide simulation evidence on the
finite sample performances of both the specification test
and model selection tests. We first discuss the simulation
designs and then report the simulation results. Overall,
we find that our tests have reasonably good finite sample
performances for sample sizes typically considered in the
literature.
4.1. Simulation designs
Test whether the two models are observationally
equivalent using Theorem 4,18
If the test fails to reject H0 , then the two models are
indistinguishable given the test assets.
þ
þ
If the test rejects H0 , then test H0 : dF ¼ dG using the
19
asymptotic distribution in Theorem 2.
Although the asymptotic distributions in Theorems 3
and 4 appear to be similar, the focuses of the two
theorems are totally different. Theorem 3 is a one-sided
test of the hypothesis that two nested SDF models have
the same second HJD. Theorem 4 is a two-sided test of the
hypothesis that two overlapping SDF models are observationally equivalent. We obtain such similar results in
Theorems 3 and 4, because we test both hypotheses using
the same test statistic and two nested models are
observationally equivalent if they have the same second
HJDs.
Given that most models, especially linear factor
models, are likely to be misspecified, the model selection
procedures developed in this section make important
methodological contributions to the asset pricing literature. Although HHL (1995) and Hansen and Jagannathan
(1997) develop the asymptotic distributions of the two
HJDs for misspecified models, their results can test only
the null hypothesis that a given model has a fixed
(nonzero) level of first or second HJD. Their results,
however, cannot be used for formal model comparison
because they do not provide the distribution of the
18
Vuong (1989) shows that under H0 : fðyF 0 Þ ¼ fðyG0 Þ with
probability one, ½s^ T 2 follows a weighted w2 distribution and chooses
to test H0 based on ½s^ T 2 . However, our simulation results (not reported)
show that our test in Theorem 4 has much better finite sample
performances than Vuong’s test.
19
For two overlapping but not observationally equivalent models F
þ
þ
and G, Tð½d^ F 2 ½d^ G 2 Þ has an asymptotic normal distribution as given in
Theorem 2.
We first discuss our simulation designs for the
specification test. Suppose an SDF model has the representation
Ht ¼ b0 Ft ,
ð33Þ
where b is an K 1 vector of market prices of risk and Ft is
an K 1 vector of risk factors. We obtain simulated
random samples of Ht and its associated asset returns, i.e.,
Dt ¼ ðFit ,Yjt Þ0 , for t =1,y,T, i= 1,y,K (the number of factors),
and j =1,y,N (the number of assets), from a (K + N)dimensional multivariate normal distribution
Dt NðmD ,SD Þ,
ð34Þ
where mD is an ðK þ NÞ 1 vector of the mean values of
ðFt ,Yt Þ0 and SD is an ðK þ NÞ ðK þNÞ covariance matrix of
(Ft, Yt).
To make our simulation evidence empirically relevant,
we choose simulation designs to be consistent with
empirical studies in later sections. Specifically, we allow
the market prices of risk b, the mean values of the risk
factors, i.e., the first K elements of mD , and the covariance
matrix SD to be estimated from empirical data. However,
the expected returns of the N assets are determined by the
asset pricing model we choose. That is, if Ht can correctly
price all test assets, i.e., EðHt Yt Þ ¼ EðXt1 Þ, then the
þ
For example, we can test whether the d^ of an SDF model is
significantly different from a pre-specified level, say 0.5, using the
results of HHL (1995) and Hansen and Jagannathan (1997). Suppose we
þ
þ
find that d^ F ¼ 0:5 and d^ G ¼ 1 in our empirical analysis. We cannot test
þ
þ
whether d^ G is significantly different from d^ F using the results of HHL
(1995) and Hansen and Jagannathan (1997), because their tests do not
þ
þ
simultaneously account for the estimation errors in both d^ F and d^ G . In
contrast, our model selection tests explicitly characterize the asymptotic
þ
þ
behavior of ½d^ F 2 ½d^ G 2 .
21
A second-order Taylor approximation of the second HJD is also
needed to develop the model selection tests for overlapping and nested
models.
20
ARTICLE IN PRESS
H. Li et al. / Journal of Financial Economics 97 (2010) 279–301
expected returns of the N assets can be written as
EðYt Þ ¼
EðXt1 ÞcovðYt ,Ht Þ
:
EðHt Þ
tions. In particular, all deviations from FF3 are based on
the following two redundant factors:
ð35Þ
The above simulation procedure guarantees only that the
expected returns of the test assets are determined by the
pricing kernel we choose. The pricing kernel itself, however,
can take negative values with positive probability.
Based on the above procedure, we generate five
hundred random samples of Dt with different number of
time series observations T. In our main simulation, we
choose N= 26 to mimic the risk-free asset and the Fama
and French 25 size/BM portfolios that have been widely
used in the empirical literature.22 We choose T=200,400,
and 600, where 200 (600) represents the typical number
of quarterly (monthly) observations available in standard
empirical asset pricing studies.
For each simulated random sample, we estimate model
parameters and conduct specification tests based on the
first and second HJDs. Then we report rejection rates
based on the asymptotic critical values at the 10% and 5%
significance levels of the two tests.23 If the tests have good
size performances, then the rejection rates for a correctly
specified model at the above critical values should be
close to 10% and 5%, respectively. If the tests have good
power performances, then the rejection rates for a
misspecified model should be close to one.
We examine the finite sample size performances of the
specification tests using the Fama and French three-factor
model (FF3) with the SDF
HtFF3 ¼ b0 þ b1 rMKT,t þb2 rSMB,t þb3 rHML,t ,
289
ð36Þ
where rSMB,t and rHML,t are the return differences between
small and big firms and high and low BM firms,
respectively. FF3 is a widely used model in the literature.
Although it is a linear factor model, its SDF at empirically
estimated parameter values from historical Fama and
French portfolios take negative values with negligible
probability. Therefore, we treat FF3 as the true model in
our size simulations. To examine the finite sample power
performances of the specification tests, we consider a
simple Capital Asset Pricing Model (CAPM). Because the
data are generated from FF3, the CAPM should not be able
to price the assets and should be rejected by the
specification tests.
Next we discuss our simulation designs for the model
selection tests. For all the tests we consider, we generate
simulated data using the same FF3 model in Eq. (36). We
face many choices in what types of models to use when
testing the finite sample size and power performances of
the model selection tests because of the different model
structures. Given that we simulate data from FF3, we
choose some simple deviations from FF3 in our simula22
We also consider simulation results not reported for the Fama and
French nine portfolios. We find that most tests have better finite sample
performances for a smaller number of assets.
23
We obtain the asymptotic critical values at the 10% and 5%
significance levels based on ten thousand simulated samples from
the weighted w2 distributions in Theorem 1 for the first and second HJDs.
We use similar procedures to obtain the asymptotic critical values for
the model selection tests in Theorems 3 and 4 as well.
r1,t Nðmm ,sm Þ
ð37Þ
and
r2,t Nðmm ,sm Þ,
ð38Þ
where mm ðsm Þ is the mean (volatility) of excess market
returns during our sample period and r1,t and r2,t are
independent of each other. They are also independent of
all the FF3 factors and the asset returns.
To examine the size of the test for non-nested models,
we consider the two models
Htr1 ¼ br1,t
ð39Þ
and
Htr2 ¼ br2,t :
ð40Þ
We estimate the two models using simulated data and
test the null hypothesis that the two models have the
same first or second HJDs. Because the two models are
equally wrong from FF3, the null hypothesis of equal first
or second HJDs should hold. To examine the power of the
test, we test whether Htr1 and HtFF3 have the same first or
second HJDs, a hypothesis that should be rejected by the
simulated data.
To examine the size of the test for nested models, we
consider the model
HtFF3 þ r1 ¼ b0 þ b1 rMKT,t þb2 rSMB,t þ b3 rHML,t þ b4 r1,t :
HtFF3 þ r1
ð41Þ
HFF3
t
We estimate
and
using simulated data and
test the null hypothesis that the two models have the
same first or second HJDs. This hypothesis should hold,
and r1,t is a redundant factor.
because HtFF3 þ r1 nests HFF3
t
To examine the power of the test, we test whether Htr1 and
HtFF3 þ r1 have the same first or second HJDs. This hypothesis should be rejected by the simulated data because
HtFF3 þ r1 has smaller first and second HJDs than Htr1 .
To examine the size of the test for overlapping models,
we compare HtFF3 þ r1 with
HtFF3 þ r2 ¼ b0 þ b1 rMKT,t þb2 rSMB,t þ b3 rHML,t þ b4 r2,t :
HtFF3 þ r1
ð42Þ
HtFF3 þ r2
We estimate
and
using the simulated data
and test the null hypothesis that the two models are
observationally equivalent. This hypothesis should hold
because HtFF3 þ r1 overlaps with HtFF3 þ r2 and r1,t and r2,t are
redundant factors. To examine the power of the test, we
test whether HtFF3 þ r1 is observationally equivalent to
Htr1 þ r2 , where
Htr1 þ r2 ¼ b1 r1,t þ b2 r2,t :
ð43Þ
The two models are not observationally equivalent, and
HtFF3 þ r1 should have smaller first and second HJDs than
Htr1 þ r2 .
4.2. Finite sample size and power performances
Panel A of Table 1 reports the size and power
þ
performances of the d- and d -based specification tests
using simulated data that mimic the risk-free asset and
the Fama and French 25 portfolios. Specifically, it reports
ARTICLE IN PRESS
290
H. Li et al. / Journal of Financial Economics 97 (2010) 279–301
Table 1
Finite sample performances of specification and model comparison tests.
This table reports the finite sample size and power performances of the specification and model comparison tests based on the first and second HansenJagannathan distances. In all simulations, we generate data according to the Fama and French three-factor model (FF3). That is, the expected returns of
simulated assets are determined by FF3 and the covariance matrix of simulated asset returns mimics that of the Fama and French 25 portfolios between
1952 and 2000. We generate 500 simulated samples with sample sizes of 200, 400, and 600. For each simulated sample, we test the null hypothesis using
either the specification or model comparison tests. We report rejection rates at the 10% and 5% asymptotic critical values over the 500 simulations. More
details on simulation designs can be found in Section 4.1.
Using d +
Using d
Size
Sample size
10%
Power
5%
10%
Size
Power
5%
10%
5%
10%
5%
41%
42%
46%
21%
15%
17%
13%
9%
8%
61%
89%
94%
51%
79%
89%
Panel B. Size and power performances of model comparison tests for strictly non-nested models
200
2%
1%
100%
100%
4%
400
2%
0%
100%
100%
5%
600
0%
0%
100%
100%
2%
1%
2%
0%
100%
100%
100%
100%
100%
100%
Panel C. Size and power performances of model comparison tests for nested models
200
5%
3%
99%
99%
400
4%
1%
100%
100%
600
7%
4%
99%
99%
3%
1%
4%
100%
100%
100%
100%
100%
100%
2%
3%
7%
100%
100%
100%
100%
100%
100%
Panel A. Size and power performances of specification tests
200
19%
13%
51%
400
16%
8%
55%
600
17%
7%
58%
5%
4%
6%
Panel D. Size and power performances of model comparison tests for overlapping models
200
6%
2%
100%
100%
5%
400
7%
3%
100%
100%
7%
600
11%
6%
100%
100%
11%
the rejection rates of FF3 (the null) and CAPM (the
alternative) based on the asymptotic critical values at the
10% and 5% significance levels for the two tests.
Both tests clearly tend to over-reject the null hypothesis for T=200. The rejection rates for both tests are about
20% and 13% at the 10% and 5% asymptotic critical values,
respectively. The performances of both tests become
reasonably good for T= 400 and 600. The rejection rates
for both tests are about 13–15% and 7–8% at the 10% and
5% asymptotic critical values, respectively. Therefore, for
typical sample sizes considered in the current literature,
þ
both d- and d -based specification tests have similar and
reasonably good size performances.
Both tests also have similar power performances in
rejecting the misspecified CAPM. For T = 200, the rejection
rates of both tests are about 80% and 75% at the 10% and
5% asymptotic critical values, respectively. As T increases
to 400 and 600, the rejection rates of both tests are close
to 100%.24
Panel B of Table 1 reports both the size and power
performances of the model selection tests for strictly nonþ
nested models. Both d- and d -based tests have very good
size performances with rejection rates close to corresponding asymptotic critical values for all sample sizes.
The tests also have excellent power in detecting models
with different HJDs: The rejection rates are close to 100%
at all sample sizes.
Panel C of Table 1 reports the performances of the
model selection tests for nested models. The tests for
nested models tend to slightly under reject the null
hypothesis of equal HJDs between the two models. The
þ
rejection rates for both d- and d -based tests are about
5–7% (2–3%) at the 10% (5%) asymptotic critical value for
þ
T= 600. The d -based test for nested models also has 100%
rejection rates for the two models with different HJDs.
Panel D of Table 1 reports the performances of the
model selection tests for overlapping models. The tests for
overlapping models tend to under reject the null hypothesis of two observationally equivalent models. The
þ
rejection rates of both d- and d -based tests are 3–4%
and 1% at the 10% and 5% asymptotic critical values for
þ
T= 600, respectively. The d -based test has excellent
power with 100% rejection rates for the two models that
are not observationally equivalent.25 In contrast, the
power of the d-based test is much worse with rejection
rates in the range of 60–70%. One main reason for the
different powers of the two tests is that the alternative
model Htr1 þ r2 takes negative value 40% of the time in our
þ
simulation. Therefore, the d -based test is more powerful
FF3 þ r1
from Htr1 þ r2 , because the latter is
in differentiating Ht
not arbitrage-free. The above simulation results show that
þ
24
The d based test should be more powerful than the dbased
test in rejecting misspecified models that have small pricing errors but
are not arbitrage-free. This advantage, however, is not reflected here,
because the SDF of CAPM rarely takes negative values in our simulation.
25
The two models used in the power analysis of overlapping models
have different HJDs. In results not reported here, these two models are
also rejected at 100% level based on Theorem 2.
ARTICLE IN PRESS
H. Li et al. / Journal of Financial Economics 97 (2010) 279–301
the model selection tests also have reasonably good finite
sample performances for typical sample sizes considered
in the current literature.
291
LVX2, have the SDFs
HtLVX1 ¼ b0 þ b1 DIHH,t þ b2 DICorp,t þ b3 DINcorp,t
ð47Þ
and
5. Empirical results
HtLVX2 ¼ b0 þ b1 DIHH,t þ b2 DICorp,t þ b3 DIFCorp,t
þb4 DINcorp,t þ b5 DIFM,t ,
In this section, we provide empirical evidence on the
advantages of the second HJD for empirical asset pricing
studies. In particular, we examine several models that
have been developed in the recent literature to explain
the cross-sectional returns of the Fama-French size/BM
portfolios. These models include that of Lettau and
Ludvigson (2001), Lustig and Nieuwerburgh (2004),
Santos and Veronesi (2006), Li, Vassalou, and Xing
(2006), and Yogo (2006).26 These models also have been
considered by LNS (2010) due to both their importance in
the literature and data availability. LNS (2010) show that
these models pose serious challenges to existing asset
pricing tests that have focused mainly on pricing errors
because the models tend to have small pricing errors for
the Fama and French portfolios by construction. We reach
dramatically different conclusions on model performances using the second HJD than the first HJD.
where DIHH,t ,DICorp,t , DINcorp,t , DIFCorp,t , and DIFM,t represent
log investment growth rates for households, nonfinancial
corporations, the noncorporate sector, financial corporations, and the farm sector, respectively. While LVX2 is the
original model considered in LVX (2006), LVX1 is the
simpler version considered in LNS (2010).
The next model we consider is the durable-consumption CAPM of Yogo (2006), in which the factors are the
growth of durable and nondurable consumption and the
market return. The SDF of the model has the expression
5.1. Asset pricing models
The first model we consider is the conditional
consumption CAPM of Lettau and Ludvigson (2001), in
which the conditioning variable is the aggregate consumption-to-wealth ratio. The SDF of the model has the
expression
HtYOGO ¼ b0 þ b1 DcNdur,t þb2 DcDur,t þ b3 rMKT,t ,
ð48Þ
ð49Þ
where DcNdur,t and DcDur,t represent log consumption
growth for nondurable and durable goods, respectively.
We obtain most of the factors from the corresponding
authors’ websites. Most of the models use consumption or
investment as factors, which are typically available at only
quarterly frequency. As a result, we estimate all the
models using quarterly returns of the risk-free asset and
the Fama and French 25 portfolios from 1952 to 2000. For
comparison, we also consider the Fama and French three27
factor model, HFF3
t .
5.2. Empirical results for the Fama-French portfolios
where so
t1 is the lagged labor income-to-consumption
ratio.
The fourth model we consider is the sector investment
model of Li, Vassalou, and Xing (LVX, 2006). The two
versions of the model we consider, denoted as LVX1 and
Our empirical analysis is conducted in several steps.
We first estimate all the models by minimizing their
corresponding first and second HJDs.28 We then conduct
specification tests of all the models based on the first and
second HJDs. Finally, we compare relative model performances using the model selection tests based on the first
and second HJDs.
Panel A of Table 2 reports the results of specification
tests of all the models. We first report the estimated first
and second HJDs and their differences. We then report the
^ t (estimated using the first HJD) takes
probability that H
^ t o 0Þ.
negative values during the sample period, PrðH
þ
Finally, we report the p-values of the d- and d -based
specification tests for all the models. We reach dramatically
different conclusions on model performances based on the
first and second HJDs. For example, LVX1 and LVX2 have
the smallest first HJDs among all the models. In fact, the
p-values of the d-based specification test for the two
models are 33% and 53%, respectively, while the p-values
for all other models are zero. Therefore, the two models
seem to capture the returns of the Fama and French
portfolios reasonably well based on the first HJD. However,
the probabilities that the estimated SDFs of the two models
take negative values are 14% and 15%, respectively. Because
26
Most of these models incorporate some kind of conditioning
variables to improve the fit of the data. For issues related to conditional
asset pricing models, see Ferson and Harvey (1999) and Farnsworth,
Ferson and Jackson (2002), among others.
27
The returns on all the test assets and the Fama and French factors
were downloaded from Ken French’s website on March 13, 2009.
28
For brevity, we do not report the estimates of model parameters
and Lagrangian multipliers. These results are available upon request.
HtLL ¼ b0 þb1 cayt1 þ b2 Dct þ b3 cayt1 Dct ,
ð44Þ
where cayt 1 is the lagged consumption-to-wealth ratio
and Dct is the log consumption growth rate.
The second model we consider is the conditional
consumption CAPM of Lustig and Nieuwerburgh (2004),
in which the conditioning variable is the housing
collateral ratio. Following LNS (2010), we consider only
their linear model with separate preferences. The SDF of
the model has the expression
HtLV ¼ b0 þb1 myt1 þ b2 Dct þb3 myt1 Dct ,
ð45Þ
where myt 1 is the lagged housing collateral ratio based
on mortgage data.
The third model we consider is the conditional CAPM
of Santos and Veronesi (2006) with the SDF
HtSV ¼ b0 þb1 rMKT,t þ b2 so
t1 rMKT,t ,
ð46Þ
ARTICLE IN PRESS
292
H. Li et al. / Journal of Financial Economics 97 (2010) 279–301
Table 2
Asset pricing tests based on the first and second Hansen-Jagannathan distances for the risk-free asset and the Fama and French 25 portfolios.
This table provides empirical results on seven asset pricing models based on the two HJDs for the risk-free asset and the Fama and French 25 Size/BM
portfolios from Q2.1952 to Q4.2000. The first data point is Q2.1952 because some of the factors are lagged by one quarter. The returns of all the test assets
and the Fama and French factors are downloaded from Ken French’s website on March 13, 2009. Panel A reports specification test results. The first
(second) row of Panel A contains the estimated first (second) HJD. The third row of Panel A contains the percentage difference between the two HJDs. The
fourth row reports the probabilities that SDF models estimated using the first HJD take negative values during the sample period. The last two rows of
Panel A report the p-values of specification tests based on the first and second HJDs. Panel B contains the results on model comparison for overlapping
models. The reported numbers are the p-values of the hypothesis that the model in the corresponding column is observationally equivalent to the model
in the corresponding row. Panel C contains the results on model comparison for overlapping but not observationally equivalent models. The reported
numbers are the t-statistics for the difference between the HJDs of the model in the corresponding column and the model in the corresponding row. If the
column model is better than the row model at the 5% significance level, then the t-statistic should be smaller than 1.96, and vice versa.
Panel A: Results of specification tests using the risk-free asset and the Fama and French 25 portfolios
LL
LV
SV
LVX1
LVX2
YOGO
FF3
D
dþ
þ
ðd dÞ=d
0.643
0.685
0.643
0.700
0.642
0.667
0.580
0.691
0.546
0.684
0.651
0.673
0.582
0.607
6%
9%
4%
19%
25%
3%
4%
pðH o 0Þ
pðd ¼ 0Þ
2%
0%
0%
10%
0%
0%
1%
0%
0%
14%
33%
0%
15%
53%
0%
0%
0%
0%
2%
0%
0%
pðd
þ
¼ 0Þ
Panel B: Results of model comparison tests for overlapping models
Using d þ
Using d
Model
LL
LV
LV
SV
LVX1
LVX2
YOGO
FF3
93%
90%
7%
5%
45%
23%
85%
10%
6%
50%
36%
SV
5%
3%
26%
0%
LVX1
16%
2%
18%
LVX2
2%
11%
YOGO
LL
LV
SV
LVX1
LVX2
YOGO
0%
42%
87%
71%
80%
67%
7%
30%
72%
63%
73%
1%
51%
63%
26%
0%
34%
96%
1%
84%
2%
0%
Panel C: Results of model comparison tests for overlapping but not observationally equivalent models
Using d þ
Using d
Model
LV
SV
LVX1
LVX2
YOGO
FF3
LL
0.82
LV
SV
LVX1
LVX2
0.79
1.00
YOGO
LL
LV
SV
LVX1
LVX2
YOGO
2.21
2.04
1.94
1.77
2.08
0.67
0.91
1.83
the second HJD explicitly requires a good model to be
arbitrage-free, the two models are overwhelmingly rejected
þ
by the d -based specification test. All the other models are
rejected by the second HJD as well.
In theory, the consumption-based models should have
better performances than FF3 measured by the second
HJD, because the former is designed to price both the
primary and derivatives assets while FF3 focuses mainly
on pricing the primary assets. However, FF3 has smaller
second HJD than all the consumption-based models. One
reason for this result is that the consumption-based
models we consider are linearized versions of the original
models and are not guaranteed to be arbitrage-free.
Another and more fundamental reason is that the
consumption-based models try to explain the crosssectional returns of the Fama and French portfolios
using fundamental economic variables, which is more
2.03
challenging than using factors extracted from stock
returns, such as the Fama and French factors.
Next we consider the relative performances of these
models using the model selection tests based on the first
and second HJDs. Panel B of Table 2 reports the model
comparison results based on the tests for overlapping
models in Theorem 4, because all models we consider
share at least one common constant factor and thus are
overlapping models.29 Each entry in Panel B represents
the p-value of the hypothesis that the model in the
corresponding column is observationally equivalent to the
model in the corresponding row. Among all the pairs of
models we consider, we conclude based on the first HJD
29
Because LVX2 nests LVX1, we compare this pair of models using
the tests in Theorem 3.
ARTICLE IN PRESS
H. Li et al. / Journal of Financial Economics 97 (2010) 279–301
293
Panel A: Estimated SDF for LVX1
5
4
3
2
1
0
-1
-2
-3
-4
-5
195202
using first HJD
195802
196402
197002
using second HJD
197602
198202
198802
199402
200002
Panel B: Estimated SDF for LVX2
5
4
3
2
1
0
-1
-2
-3
-4
-5
using first HJD
195202
195802
196402
197002
using second HJD
197602
198202
198802
199402
200002
Fig. 2. Time series plots of two stochastic discount models estimated using the risk-free asset and the 25 Fama and French size/book-to-market
portfolios. Panel A (B) contains time series plots of empirically estimated two stochastic discount factor models of Li, Vassalou, and Xing (2006) (denoted
by LVX1 and LVX2) using the first and second HJDs. The details of the model can be found in Section 5.1.
that LVX2 is not observationally equivalent to LL, SV, and
YOGO; LVX1 is not equivalent to SV to YOGO; and FF3 is
not equivalent to SV and YOGO. In contrast, based on the
second HJD, we conclude that FF3 is not observationally
equivalent to almost all other models, which appear to be
not distinguishable from each other. These results are
consistent with that in Panel A of Table 2, where all
models except FF3 have similar second HJDs.
For those models that are not observationally equivalent, in Panel C of Table 2, we test whether they have the
same first or second HJDs based on the tests for nonnested models in Theorem 2.30 Each entry in Panel C
represents the test statistic of the hypothesis that the
model in the corresponding column is better than the
model in the corresponding row measured by the first or
second HJD. We again reach dramatically different
conclusions based on the first and second HJDs. Only
YOGO has significantly bigger first HJD than FF3, while all
other pairs of nonequivalent models do not exhibit
significantly different first HJDs. In contrast, we find most
other models have significantly bigger second HJDs than
FF3, suggesting that none of them can capture the returns
of the Fama and French portfolios as well as FF3.
The two HJDs also lead to very different estimated SDF
^ t (estimated
models. We present time series plots of H
^ þ (estimated using the second
using the first HJD) and H
t
HJD) for LVX1 and LVX2 in Panels A and B of Fig. 2,
^ t takes negative values much
respectively. It is clear that H
^ þ for both models, and H^ þ takes
more frequently than H
t
t
negative values only on rare occasions. Therefore, for a
given sample, though linear factor models estimated
using the first HJD can take negative values with high
probabilities, they can be made closer to be arbitrage free
when estimated using the second HJD.
To summarize, the first and second HJDs could lead to
dramatically different conclusions on model performances.
Even though based on the first HJD certain models appear
to do a good job in explaining the Fama and French
portfolios, all models are overwhelmingly rejected by the
second HJD. Moreover, the second HJD is more powerful
than the first HJD in distinguishing models that have
similar pricing errors of the test assets but are not
arbitrage-free. The analysis in this subsection shows that
the second HJD can make significant differences in
empirical asset pricing studies.
5.3. Further diagnostics
30
The difference between the first or second HJDs of two overlapping but not observationally equivalent models has the same
distribution as that between two strictly non-nested models.
While we evaluate and compare asset pricing models
using the second HJD in the previous subsection, we
provide further diagnostics in this subsection to better
ARTICLE IN PRESS
294
H. Li et al. / Journal of Financial Economics 97 (2010) 279–301
Table 3
Correlation matrices for the estimated SDFs, the implied true SDFs, and the adjustment portfolios under the first and second HJDs.
This table provides correlation matrices for the estimated SDFs, the implied true SDFs, and the adjustment portfolios under the first and second HJDs
using the risk-free asset and the Fama and French 25 Size/Book-to-Market portfolios from Q2.1952 to Q4.2000. The adjustment portfolios are the
differences between the estimate SDFs and the implied true SDFs.
Using d þ
Using d
Model
LL
LV
SV
LV
SV
LVX1
LVX2
YOGO
SDFs under the first and second HJDs
0.57
0.26
0.09
0.90
0.08
0.05
0.01
0.28
0.15
0.16
0.58
0.26
0.10
0.12
0.07
0.22
0.14
0.06
0.07
0.79
0.64
0.89
0.11
0.19
0.10
0.19
0.62
Panel B: Correlation matrix of the implied true SDFs under the first and second HJDs
LV
0.77
0.94
SV
0.76
0.63
0.89
LVX1
0.40
0.36
0.47
0.87
LVX2
0.37
0.34
0.40
0.92
0.85
YOGO
0.79
0.70
0.95
0.49
0.42
0.91
FF3
0.80
0.67
0.96
0.52
0.45
0.99
0.91
0.93
0.90
0.88
0.95
0.94
0.93
0.92
0.98
0.99
0.99
0.94
0.94
0.93
0.93
1.00
Panel A: Correlation matrix of the estimated
LV
0.56
SV
0.22
0.05
LVX1
0.02
0.05
0.01
LVX2
0.06
0.10
0.01
YOGO
0.26
0.23
0.65
FF3
0.26
0.13
0.60
LVX1
LVX2
YOGO
LL
Panel C: Correlation matrix of the adjustment portfolios (the difference between estimated SDFs and the implied true SDFs) under the first and second
HJDs
LV
0.96
0.97
SV
0.94
0.89
0.95
0.93
LVX1
0.78
0.81
0.77
0.93
0.96
0.91
LVX2
0.74
0.76
0.71
0.94
0.93
0.95
0.90
0.99
YOGO
0.93
0.90
0.98
0.82
0.75
0.95
0.94
0.99
0.92
0.91
FF3
0.87
0.82
0.92
0.75
0.73
0.89
0.87
0.85
0.91
0.84
0.84
0.90
understand the differences and similarities among all the
models.
In Table 3, we report the correlation matrices of the
^ þ ), the implied true SDFs (H
^ t l^ 0 Yt
^ t and H
estimated SDFs (H
t
þ0
þ
^ l^ Yt þ ), and the adjustment portfolios (l^ 0 Yt and
and ½H
t þ
þ0
þ
^
H t ½H^ t l^ Yt þ ) for all the models under the first and
second HJDs. Panel A of Table 3 shows that the correlations of
the estimated SDFs among most models are low under both
the first and second HJDs. The only exception is the two
investment-based models, whose correlation coefficient is
0.90 (0.89) under the first (second) HJD. There is not a
uniform relation between the correlations under the first and
second HJDs: For certain models the correlations are higher
under the first HJD, but for other models, the opposite is true.
These results are not surprising given that most models focus
on different economic factors to explain asset prices. The
implied true SDFs have to correctly price all the test assets by
construction. As a result, their correlations are much higher
than those of the estimated SDFs as shown in Panel B of
Table 3. Moreover, because the true SDFs under the second
HJD have to be arbitrage-free, their correlations are uniformly
larger than that of the true SDFs under the first HJD. Panel C
of Table 3 reports the correlations of the adjustment
portfolios for all the models under the first and second
HJDs. The low correlations of the estimated SDFs and the high
correlations of the true SDFs and the adjustment portfolios of
the consumption-based models suggest that these models
fail to capture some common components that are important
for pricing the Fama and French portfolios.
To have a better understanding of the missing
components in these models, in Table 4 we regress the
adjustment portfolios under the first and second HJDs on
some well-known economic and stock market factors. The
economic factors include gross domestic product growth
(GDP), industrial production (IP) growth, credit spread
(yield difference between BAA and AAA corporate bonds),
term spread (yield difference between ten-year and oneyear government bonds), quarterly risk-free rate, market
risk premium, and market index volatility. The stock
market factors include the six Fama and French size and
BM benchmark portfolios, which help to better
understand which dimension of the size and BM effects
the models fail to capture. We report the ordinary least
squares regression coefficients and the adjusted R2s in
Table 4, where bold entries represent coefficients that are
significant at 5% level.
Panels A and B of Table 4 report the regression results
with the economic and stock market factors, respectively.
Panel A shows that under both the first and second HJDs,
term spread and risk-free rate are significant for all
models, suggesting that all models need to better capture
pricing information contained in the two factors to be
admissible. Panel B shows that under both the first and
second HJDs, the big-low (BL), small-low (SL), and smallhigh (SH) portfolios are significant for all models and that
the big-high (BH) portfolio is significant for all models
except LVX1 and LVX2. The R2’s in Panel B are generally
higher than that in Panel A, although the R2’s are low in
Using d þ
Using d
LVX1
LVX2
YOGO
FF3
LL
LV
SV
LVX1
LVX2
YOGO
FF3
2.94
2.09
13.16
16.37
0.92
16.31
0.29
5.17%
3.29
2.14
1.45
10.09
1.20
12.85
0.26
5.18%
6.10
3.52
25.66
20.04
0.39
17.16
0.15
3.43%
6.59
3.98
23.00
13.46
1.01
10.43
0.21
2.30%
5.68
3.15
10.07
16.83
0.16
16.16
0.13
1.98%
5.26
3.73
21.23
22.20
0.66
20.87
0.33
6.03%
7.06
3.13
24.58
22.11
0.41
18.42
0.16
4.46%
4.55
2.60
24.77
21.31
0.87
19.44
0.60
6.79%
4.68
2.53
25.32
20.63
0.97
18.89
0.52
6.32%
5.75
3.16
35.27
23.51
0.38
19.23
0.09
4.71%
6.32
3.56
32.11
16.58
1.08
12.11
0.15
3.22%
Panel B: Regression results based on Fama-French six benchmark portfolios
BL
3.41
4.12
3.16
4.64
4.20
BM
1.74
1.19
0.63
1.92
3.35
BH
4.96
4.23
5.54
1.23
2.38
SL
5.96
6.24
6.67
4.31
2.94
SM
1.37
1.23
1.91
4.22
5.28
SH
6.82
7.08
7.72
6.30
6.11
2
19.44%
24.47%
22.92%
23.14%
20.68%
Adjusted R
2.82
0.35
3.49
6.72
1.19
7.42
24.46%
5.34
1.24
6.25
4.53
2.02
4.72
11.84%
3.77
1.11
4.40
6.47
1.81
7.15
22.42%
4.44
1.15
3.62
6.56
1.70
6.94
27.17%
3.18
0.49
5.05
6.81
1.91
7.60
21.72%
4.50
1.24
2.72
6.01
0.70
6.32
23.83%
4.66
0.66
2.96
6.15
0.88
6.71
24.31%
2.97
0.04
3.34
6.86
1.67
7.12
22.59%
5.46
1.43
6.24
4.64
2.30
4.59
11.05%
ARTICLE IN PRESS
LL
LV
SV
Panel A: Regression results based on economic factors
GDP
6.19
4.29
7.33
IP
3.80
3.12
3.86
Credit
4.84
0.16
21.71
Term
13.37
13.78
19.85
Vol
0.04
0.58
0.32
Rf
13.82
15.10
17.26
0.05
0.24
Mkt
0.05
Adjusted. R2
1.91%
2.80%
4.01%
H. Li et al. / Journal of Financial Economics 97 (2010) 279–301
Table 4
Regressions of the adjustment portfolios under the first and second Hansen-Jagannathan distances on economic and stock market factors.
This table provides regression results of the adjustment portfolios under the first and second HJDs on economic and stock market factors using the risk free-asset and the Fama and French 25 Size/Book-toMarket portfolios from Q2.1952 to Q4.2000. Panels A and B contain regression results based on economic and stock market factors, respectively. In Panel A, GDP is the quarterly growth rate of log gross domestic
product growth, IP is the quarterly growth rate of industrial production, Credit is the quarterly yield spread between BAA corporate bond and AAA corporate bond, Term is the quarterly yield spread between
ten-year T-bond and one-year T-bond, Vol is the quarterly volatility of the market return index, Rf is the quarterly risk-free rate, and Mkt is the quarterly excess return of the market index. In Panel B, BL, BM, BH,
SL, SM, and SH represent the big/low (BL), big/median (BM), big/high (BH), small/low (SL), small/median (SM), and small/high (SH) Fama-French benchmark portfolios, respectively. Bold entries represent
ordinary least squares regression coefficients that are significant at the 5% level.
295
ARTICLE IN PRESS
296
H. Li et al. / Journal of Financial Economics 97 (2010) 279–301
both panels (the highest adjusted R2 is less than 30%). This
suggests that all the factors we consider still fail to
adequately capture the missing components of the
models, which we leave for future research.
6. Conclusion
In this paper, we develop a systematic approach for
evaluating asset pricing models based on the second HJD,
which unlike the first HJD, explicitly requires a good asset
pricing model to be arbitrage free. We develop both a
specification test and a sequence of model selection
procedures in the spirit of Vuong (1989) for non-nested,
overlapping, and nested models based on the second HJD.
Compared with existing methods, our tests are more
powerful in detecting misspecified models that have small
pricing errors but are not arbitrage-free and in differentiating the relative performances of models that have
similar pricing errors of a given set of test assets.
Simulation studies show that our tests have reasonably
good finite sample performance for typical sample sizes
considered in the literature. Using the Fama and French
size and book-to-market portfolios, we reach dramatically
different conclusions on model performances using our
approach and existing methods.
Appendix A
In this appendix, we provide the assumptions and
detailed mathematical proofs of all the results in the
paper.
Assumption A.1. The population optimization problem
has a unique solution
y0 ¼ ðg0 ,l0 Þ arg min maxEfðg,lÞ,
g
l
which is an interior point of the parameter space of
y ¼ ðg,lÞ.
EJXJ2 o1,
Assumption
A.2. EJYJ2 o1,
E½maxJgg0 J o C HðgÞ2 o 1 for some C 40.
and
Assumption A.3. The SDF model HðgÞ is twice continuously differentiable with respect to g.
0
Assumption A.4. The set fHðgÞl Y ¼ 0g has probability
zero under the true probability measure.
Assumption A.5. The first-order derivatives (which exist
everywhere)
1
0
@f
C
@f B
B @g C
¼B
C
@y @ @f A
@l
form a Donsker class for ðg,lÞ in a neighborhood of ðg0 ,l0 Þ.
Assumption A.6. The time series ðYt ,Xt1 ,Ht ðgÞÞ are
stationary and ergodic.
Assumption A.7. The matrix G defined by
1
0
@2 fðy0 Þ
@2 fðy0 Þ
E
E
C
B
0
@g@g0
@g@l C
@2 fðy0 Þ B
C
B
GE
0 B
2
2
C
@y@y
@ E @ fðy0 Þ E @ fðy0 Þ A
0
@l@g0
@l@l
G11
G21
G12
G22
!
is nonsingular, with a positive definite G22 and a negative
definite ½G11 G12 G1
22 G21 . The second derivatives are well
defined except on a set with zero probability.
The above assumptions are somewhat standard in
asymptotic analysis. Assumption A.1 is needed for
identification purpose. Assumption A.2 requires all
random variables to be square integrable, which is needed
for the existence of the asymptotic covariance matrix of
the second HJD and exchanging differentiation and
expectation operations. Assumption A.3 is a smoothness
assumption needed for quadratic Taylor series expansion.
Assumption A.4 guarantees that the set of non-differentiable points of the second HJD is not too big so that
differentiability in quadratic mean holds. It should hold
for most models in the existing literature. Assumption A.5
ensures that central limit theorem holds for the first
derivatives of f. A set F of functions is called a Donsker
class for P if a functional central limit
theorem holds for
pffiffiffi
the sequence of empirical processes T ðET EÞf for f 2 F
(see Dudley, 1981). A key property of a Donsker class is
that, for every given e 4 0, Z 4 0, there exists a B 40 and
an T0 such that, for all T 4 T0
(
)
pffiffiffi
pffiffiffi
P supj T ðET EÞf1 T ðET EÞf2 j 4 Z o e:
ð50Þ
½ B
½B means that the supremum runs over all pairs of
functions f1 and f2 in F that are less than B apart in L2(P)
norm. This property is needed to justify the first-order
approximation to the random component ðET EÞfðy,lÞ.
Assumption A.6 enables inferences of population distribution using time series counterparts. In certain applications, we might need to transform the original price
or payoff series in order to satisfy Assumption A.6.
For example, although stock prices generally are not
stationary and ergodic, stock returns generally are.
Assumption A.7 ensures that the optimization problem
ming maxl Efðg,lÞ is well defined.
Lemma 1. Suppose Assumptions A.1 to A.7 hold. Then the
following local asymptotic quadratic representation holds for
ET fðyÞ around y0 :
ET fðyÞ ¼ Efðy0 Þ þ ðET EÞfðy0 Þ þ A0 ðyy0 Þ
þ 12ðyy0 Þ0 Gðyy0 Þ þoðJyy0 J2 Þ þ op ðJyy0 JT 1=2 Þ,
0
where A ¼ ðET EÞ@fðy0 Þ=@y and G ¼ E@2 fðy0 Þ=@y@y .
Proof. We decompose ET fðyÞ into a deterministic term,
Efðy0 Þ, and a centered random term, ðET EÞfðyÞ,
ET fðyÞ ¼ EfðyÞ þðET EÞfðyÞ:
ð51Þ
The LAQ representation of ET fðyÞ is based on a secondorder Taylor approximation of ET fðyÞ and a first-order
approximation of ðET EÞfðyÞ. The random component is
ARTICLE IN PRESS
H. Li et al. / Journal of Financial Economics 97 (2010) 279–301
centered and in general one order smaller than the
deterministic term.
We first consider the second-order approximation of
EfðyÞ. Because HðgÞ is twice differentiable in g, fðyÞ has
the continuous first-order derivatives
1
0
0
1
@fðyÞ
@HðgÞ
0
þ @HðgÞ
B
2Hð
g
Þ
2½Hð
g
Þ
l
Y
@fðyÞ B @g C
C B
@g
@g C
¼B
C¼@
A:
@ @fðyÞ A
@y
0
2½HðgÞl Y þ Y2X
@l
ð52Þ
297
Note that E@fðy0 Þ=@y ¼ 0, because y0 ¼ ðg0 ,l0 Þ solves the
population optimization problem ming maxl EfðyÞ. Therefore, compared with traditional Taylor expansions, the key
point here is to justify that we can still use the secondorder derivatives of f (which are not defined everywhere)
to obtain an approximation to EfðyÞ.
Next we consider a first-order approximation of
ðET EÞfðyÞ. The first-order differentiability of fðyÞ with
respect to y and Assumption A.5 on the first-order
derivatives guarantee the stochastic differentiability of
the empirical process [see Pollard, 1982, p. 921, Eq. (4)]
From Assumption A.2, we have
@
@fðyÞ
:
EfðyÞ ¼ E
@y
@y
ð53Þ
We now demonstrate that although the first-order
derivatives of f in Eq. (52) are not differentiable everywhere, they are in fact differentiable in quadratic mean as
in Pollard (1982, p. 920). A random function f ðaÞ is
differentiable in quadratic mean with respect to a at a0 if
there exists a random vector D such that
f ðaÞ ¼ f ða0 Þ þ D0 ðaa0 Þ þ Jaa0 JR,
ð54Þ
where
EðRÞ2 -0 as a-a0 :
Intuitively, this means that
approximation to f ðaÞ around
To demonstrate that Eq.
quadratic mean, define the
following equation:
f ða0 Þ þ D0 ðaa0 Þ is a good
a0 on average.
(52) is differentiable in
remainder term r by the
@fðyÞ
@fðy0 Þ @2 fðy0 Þ
¼
þ
0 ðyy0 Þ þ Jyy0 Jr:
@y
@y
@y@y
ð55Þ
Although the second derivatives involved might not exist
everywhere, the set of points for which they are not
defined has probability zero due to Assumption A.4.
Hence as a function in the Hilbert space L2(P), the
remainder term r is well defined. The remainder term r
can be shown to be dominated by a function in L2(P) and
r-0 almost surely as y-y0 . The argument for this
assertion is similar to that of Lemma A in Pollard (1982).
By the dominated convergence theorem, this implies
differentiability in quadratic mean of the above firstorder derivatives of f. Because L2(P) convergence implies
L1(P) convergence, the quadratic mean differentiability for
Eq. (52) implies that EfðyÞ has traditional second
0
derivatives given by E@2 fðyÞ=@y@y . We emphasize again
that the second-order derivatives inside the expectation
operator is well defined except on a set of zero
probability.
It follows that the deterministic term EfðyÞ has the
quadratic approximation
@2 fðy0 Þ
0
@y@y
2
ðyy0 Þ þoðJðy,lÞðy0 ,l0 ÞJ Þ:
ðET EÞfðyÞ ¼ ðET EÞfðy0 Þ þ ðET EÞ
@fðy0 Þ
0
@y
ðyy0 Þ þ op ðJyy0 JT 1=2 Þ:
ð57Þ
Combining Eq. (A.3) and (57), we obtain the LAQ
representation for ET fðyÞ. &
Based on the LAQ representation of ET fðyÞ in Lemma 1,
we solve the minimax problem ming maxl ET fðyÞ to
obtain the asymptotic representation of the second HJD
at y^ ¼ ðg^ ,l^ Þ arg ming maxl ET fðyÞ. We first make the
following additional assumptions.
Assumption A.8. The estimator
maxl ET fðyÞ for y0 is consistent.
y^ ¼ ðg^ ,l^ Þ arg ming
Assumption A.9. A central limit theorem holds for the
empirical process
1
0
@fðy0 Þ
C
B
pffiffiffi
B @g C
T ðET EÞB
C*Nð0,LÞ,
@ @fðy0 Þ A
@l
where
20
10
10 3
@fðy0 Þ
@fðy0 Þ
6B @g CB @g C 7
CB
C7
B
L ¼ E6
CB
C 7:
6B
4@ @fðy0 Þ A@ @fðy0 Þ A 5
@l
@l
As suggested by HHL (1995), the consistency condition
in Assumption A.8 can be replaced by more primitive
assumptions.
Lemma 2. Suppose Assumptions A.1 to A.9 hold. Then we
have the following asymptotic representation of the second
HJD at estimated model parameter y^ :
þ
½d^ T 2 ¼ min maxET fðyÞ ¼ Efðy0 Þ þðET EÞfðy0 Þ
g
l
12A0 G1 A þop ðT 1 Þ:
Moreover, the optimizer y^ ¼ ðg^ ,l^ Þ arg ming maxl ET fðyÞ
equals
y^ ¼ y0 G1 Aþ op ðT 1=2 Þ,
where A and G are defined in Lemma 1.
1
2
EfðyÞ ¼ Efðy0 Þ þ ðyy0 Þ0 E
ð56Þ
Proof. Denote Ag ¼ ðET EÞ@fðy0 Þ=@g, Al ¼ ðET EÞ@fðy0 Þ=
@l, U ¼ ðgg0 Þ, and V ¼ ðll0 Þ. Based on the LAQ
ARTICLE IN PRESS
298
H. Li et al. / Journal of Financial Economics 97 (2010) 279–301
where G12 , G21 , and G22 are defined in Assumption A.7 and
Ll ¼ E½@fðy0 Þ=@l@fðy0 Þ=@l0 .
representation in Lemma 1, we have
ET fðyÞ ¼ Efðy0 Þ þ ðET EÞfðy0 Þ
Ag
þ
Al
!0 U
V
þ
1
2
U
0
V
G11
G21
!
G12 U G22
V
Proof. From Lemma 2, we know that under H0 : d
þoðJðg,lÞðg0 ,l0 ÞJ2 Þ þ op ðJðg,lÞðg0 ,l0 ÞJT 1=2 Þ
¼ Efðy0 Þ þ ðET EÞfðy0 Þ
1
1
þA0g U þ U 0 G11 U þ½Al þ G21 U0 V þ V 0 G22 V
2
2
|fflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflffl
ffl{zfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflffl
ffl}
þ
½d^ T 2 ¼ ðET EÞfðg0 ,l0 Þ
þoðJðg,lÞðg0 ,l0 ÞJ2 Þ þ op ðJðg,lÞðg0 ,l0 ÞJT 1=2 Þ:
For a fixed g, the quadratic in V in the above equation is
maximized at
1=2
Þ,
V^ ¼ ðl^ l0 Þ ¼ G1
22 ½Al þ G21 U þ op ðT
ð58Þ
J12
J21
J22
!
Ag
Al
!
þ op ðT 1 Þ:
Next we consider the second term in Eq. (63). Under
(A.2) and some simple calculations, we have
1
0 1
1
1 0
Þ
¼ A0g U þ 12 U 0 G11 U12 A0l G1
22 Al Al G22 G21 U2U ½G12 G22 G21 U þ op ðT
1
0
0 1
1
1 0
12 A0l G1
Þ:
22 Al þ ½Ag Al G22 G21 U þ 2U ½G11 G12 G22 G21 U þ op ðT
|fflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflffl{zfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflffl}
Ag ¼
@fðy0 Þ
¼0
@g
ð59Þ
The above quadratic in U is minimized at
1
1=2
U^ ¼ ðg^ g0 Þ ¼ ½G11 G12 G1
½Ag G12 G1
Þ,
22 G21 22 Al þ op ðT
ð60Þ
and its minimum at U ¼ U^ equals
1
0 1
1
1 0
12 A0l G1
22 Al 2½Ag Al G22 G21 ½G11 G12 G22 G21 1
½Ag G12 G1
Þ:
22 Al þ op ðT
ð61Þ
Therefore, the two-step optimization leads to the
following asymptotic representation of the objective
function:
min maxET fðyÞEfðy0 Þ ¼ ðET EÞfðy0 Þ
l
1
1
1 0
12 A0l G1
Ag
22 Al 2Ag ½G11 G12 G22 G21 1
þA0g ½G11 G12 G1
G12 G1
22 G21 22 Al
1
1
1
12A0l G1
G12 G1
Þ
22 G21 ½G11 G12 G22 G21 22 Al þop ðT
!0
!
!
Ag
Ag
J11 J12
¼ ðET EÞfðy0 ,l0 Þ12
þ op ðT 1 Þ,
Al
J21 J22
Al
ð62Þ
where
ð64Þ
and
quadratic in U
G11 ¼ E
@2 fðy0 Þ
¼ 0:
@g@g0
ð65Þ
Substituting these representations into Eq. (63), we
þ
obtain that under H0 : d ¼ 0
pffiffiffi
pffiffiffi
þ
T½d^ T 2 ¼ 12½ T ðET EÞAl 0 J22 ½ T ðET EÞAl þ op ð1Þ
pffiffiffi
1
1
1
¼ 12½ T ðET EÞAl 0 ½G1
22 G22 G21 ½G12 G22 G21 pffiffiffi
1
G12 G22 ½ T ðET EÞAl þop ð1Þ:
ð66Þ
pffiffiffi
Assumption A.9 implies that T ðET EÞAl -W, where W
is a multivariate normal random vector with mean zero
and covariance matrix Ll . It follows immediately that
þ
þ
under H0 : d ¼ 0, the asymptotic distribution of T½d^ T 2
1=2
1=2
can be represented as W 0 XW ¼ Z 0 Ll XLl Z, where
1
1
1
1
1 1
X ¼ 2½G22 G22 G21 ½G12 G22 G21 G12 G22 , Z is a vector
of standard multivariate normal random vector, and
1=2
1=2
1=2
W ¼ Ll Z. It can be easily shown that Ll XLl has
þ
2
the same eigenvalues as XLl . Therefore, T½d^ T has an
asymptotic distribution of weighted w2 , and the weights
are the eigenvalues of the matrix XLl ¼ 12½G1
22 1
1
1
G1
22 G21 ½G12 G22 G21 G12 G22 Ll . &
Proposition 1 (Parameter estimation). Suppose Assumptions A.1 to A.9 hold. Then
1
J11 ¼ ½G11 G12 G1
,
22 G21 0
1
¼ ½G11 G12 G1
G12 G1
J12 ¼ J21
22 G21 22 ,
and
&
Theorem 1 (Specification test). Suppose Assumptions A.1 to
þ
þ
A.9 hold. Then under H0 : d ¼ 0, T½d^ T 2 has an asymptotic
weighted w2 distribution and the weights are the eigenvalues
of the matrix
1
1
1
12½G1
G12 G1
22 G22 G21 ½G12 G22 G21 22 Ll ,
J11
First, we argue that ðET EÞfðg0 ,l0 Þ ¼ 0 under H0 :
d þ ¼ 0. The fact that d þ ¼ 0 means that Efðg0 ,l0 Þ ¼ 0.
Because fðg0 ,l0 Þ is nonnegative, we must have
fðg0 ,l0 Þ ¼ 0 almost everywhere for Efðg0 ,l0 Þ to be zero.
Consequently, ðET EÞfðg0 ,l0 Þ ¼ 0.
A0g U þ 12 U 0 G11 U12½Al þ G21 U0 G1
22 ½Al þ G21 U
1
1
1
J22 ¼ G1
G12 G1
22 þ G22 G21 ½G11 G12 G22 G21 22 :
!0
H0 : d þ ¼ 0, we must have l0 ¼ 0. Consequently, based on
and its maximum at V ¼ V^ equals
g
Ag
Al
¼0
ð63Þ
quadratic in V
¼
1
2
þ
the estimator of model parameters, g^ , has the following
asymptotic distribution:
pffiffiffi
T ðg^ g0 Þ*Nð0,ðJ11 J12 ÞLðJ11 J12 Þ0 Þ
the estimator of Lagrangian multipliers, l^ , has the
following asymptotic distribution:
pffiffiffi
T ðl^ l0 Þ*Nð0,ðJ21 J22 ÞLðJ21 J22 Þ0 Þ,
J12
11 G12 1 and G ’ s (i,j = 1,2) are defined in
where ðJJ11
Þ ¼ ðG
ij
G21 G22 Þ
21 J22
Assumption A.7.
ARTICLE IN PRESS
H. Li et al. / Journal of Financial Economics 97 (2010) 279–301
þ
Proof. From the proof of Lemma 2, we have
½d^ G 2 ¼ min maxET fðyG Þ ¼ EfðyG0 Þ þ ðET EÞfðyG0 Þ
gG
1
1=2
g^ g0 ¼ ½G11 G12 G1
½Ag G12 G1
Þ
22 G21 22 Al þop ðT
1
¼ ½G11 G12 G1
ðI
22 G21 þ op ðT
1=2
J12 Þ
Þ ¼ ðJ11
Ag
G12 G1
22 Þ
!
Al
þop ðT
1=2
Ag
Al
299
!
ð69Þ
and
þ
þ
½d^ F 2 ½d^ G 2 ¼ EfðyF 0 ÞEfðyG0 Þ þ ðET EÞ½fðyF 0 Þ
Þ:
fðyG0 Þ12 A0F JF AF þ 12A0G JG AG þ op ðT 1 Þ:
From
Assumption A.9, we know immediately that
pffiffiffi
T ðg^ g0 Þ is asymptotically normally distributed with
mean zero and covariance matrix ðJ11 J12 ÞLðJ11 J12 Þ0 .
Similarly, we have the following asymptotic representation for l^ l0 ,
1=2
^
l^ l0 ¼ G1
Þ
22 ½Al þ G21 ðg g0 Þ þ op ðT
1
1
1
¼ G22 ðAl G21 ½G11 G12 G22 G21 ½Ag G12 G1
22 Al Þ
1
1
þ op ðT 1=2 Þ ¼ G1
Ag
22 ðAl G21 ½G11 G12 G22 G21 1
1=2
þ G21 ½G11 G12 G1
G12 G1
Þ
22 G21 22 Al Þþ op ðT
1
1
¼ G1
Ag ðI þ G21
22 ½G21 ½G11 G12 G22 G21 1
1
1=2
½G11 G12 G1
Þ
22 G21 G12 G22 ÞAl þop ðT
!
Ag
¼ ðJ21 J22 Þ
þop ðT 1=2 Þ
Al
lG
12A0G JG AG þ op ðT 1 Þ,
ð67Þ
From Assumption A.9, we know immediately that
pffiffiffi
T ðl^ l0 Þ is asymptotically normally distributed with
mean zero and covariance matrix ðJ21 J22 ÞLðJ21 J22 Þ0 . &
Assumption A.10. A central limit theorem holds for the
empirical process
1
0
@fðy0F Þ
C
B
pffiffiffi
B @yF C
T ðET EÞB
C*Nð0,CÞ,
@ @fðy0G Þ A
@yG
ð70Þ
Because two non-nested models can never be observationally equivalent, the empirical process ðET EÞ
½fðyF 0 ÞfðyG0 Þ does not degenerate. Under H0 :
dFþ ¼ dGþ , EfðyF 0 ÞEfðyG0 Þ ¼ 0 and the second-order
terms in Eq. (A.11) are Op ðT 1 Þ and negligible. As a
pffiffiffi þ
þ
result, T ð½d^ F 2 ½d^ G 2 Þ is asymptotically equivalent to
pffiffiffi
T ðET EÞ½fðyF 0 ÞfðyG0 Þ, which has an asymptotic normal distribution with mean zero and variance
s2 ¼ Var½fðyF 0 ÞfðyG0 Þ. This proves the first assertion.
The assertion for HF and HG follows immediately by
observing that EfðyF 0 ÞEfðyG0 Þ is then the dominating
term in Eq. (A.11). &
Theorem 3 (Model selection for nested models). Suppose
that model F is nested by model G and that Assumptions A.1
to A.10 hold. Then
þ
þ
under H0 : dF ¼ dG , Tð½d^ F 2 ½d^ G 2 Þ has an asymptotic
weighted w2 distribution and the weights are the
eigenvalues of the matrix
0
1
1
1
1 @ GF LF GF LFG A
;
2 G1
G1
G LGF
G LG
þ
þ
and
þ
þ
þ
þ
under HG : dF 4 dG , Tð½d^ F 2 ½d^ G 2 Þ- þ1.
where
C¼
LF
LGF
LFG
LG
!
10
10
@fðy0F Þ
@fðy0F Þ
B @yF CB @yF C
CB
C
B
¼ EB
CB
C:
@ @fðy0G Þ A@ @fðy0G Þ A
@yG
@yG
0
Proof. From the proof of Theorem 2, we have
þ
þ
½d^ F 2 ½d^ G 2 ¼ EfðyF 0 ÞEfðyG0 Þ þ ðET EÞ½fðyF 0 ÞfðyG0 Þ
12 A0F JF AF þ 12A0G JG AG þ op ðT 1 Þ:
þ
Theorem 2 (Model selection for non-nested models). Suppose models F and G are strictly non-nested and Assumptions A.1 to A.10 hold. Then
pffiffiffi þ 2
þ
T ð½d^ F ½d^ G 2 Þ=s^ T *Nð0,1Þ;
pffiffiffi þ
þ
þ
þ
under HF : dF o dG , T ð½d^ F 2 ½d^ G 2 Þ=s^ T -1; and
p
ffiffiffi
þ
þ
þ
þ
under HG : dF 4 dG , T ð½d^ F 2 ½d^ G 2 Þ=s^ T -þ 1,
þ
þ
under H0 : dF ¼ dG ,
where ½s^ T equals
2
is an estimator of Var½fðy0F Þfðy0G Þ and
½s^ T 2 ¼ ET ½fðy^ F Þfðy^ G Þ2 ðET ½fðy^ F Þfðy^ G ÞÞ2 :
Proof. From Lemma 2, we have
þ
½d^ F 2 ¼ min maxET fðyF Þ ¼ EfðyF 0 Þ þðET EÞfðyF 0 Þ
gF
lF
12A0F JF AF þop ðT 1 Þ,
ð68Þ
þ
Under H0 : dF ¼ dG , because F is nested by G, we must
have fðyF 0 Þ ¼ fðyG0 Þ with probability one. Consequently,
EfðyF 0 ÞEfðyG0 Þ ¼ 0 and ðET EÞ½fðyF 0 ÞfðyG0 Þ ¼ 0.
Therefore, we obtain
þ
þ
1
1
½d^ F 2 ½d^ G 2 ¼ A0F JF AF þ A0G JG AG þ op ðT 1 Þ
2
2
0
1
!0
!
1
G
AF
A
F
1
F
@
A
¼
þ op ðT 1 Þ:
AG
2 AG
G1
G
pffiffiffi
From Assumption A.10, we have T ðAAFG Þ-W, where W is a
multivariate normal random vector with mean zero and
covariance matrix C.
Rewrite W as W ¼ C1=2 Z, where Z is a vector of standard
multivariate normal random vector. Then the limiting
þ
þ
distribution of Tð½d^ F 2 ½d^ G 2 Þ can be represented as
2
0
1
3
1
1 0 4 1=2 @ GF
1=2
AC 5Z:
Z C
2
G1
G
ARTICLE IN PRESS
300
H. Li et al. / Journal of Financial Economics 97 (2010) 279–301
Because an orthogonal transformation of a standard
normal random vector is still a standard normal random
vector, the theorem is proved if we can show that the
eigenvalues of
2
0
1
3
G1
F
4C1=2 @
AC1=2 5
G1
G
are the same as that of
0
1
G1
G1
F LF
F LFG
@
A:
G1
G LGF
G1
G LG
It is easy to see that matrix
0
1
G1
F
1=2 @
AC1=2
C
1
GG
has the same eigenvalues as matrix
20
1
3
0
G1
G1
F
F
1=2
1=2
4@
AC 5C ¼ @
0
¼@
G1
G
G1
F LF
G1
F LFG
G1
G LGF
G1
G LG
1
1
G1
G
AC
A:
þ
þ
Therefore, it follows that, under H0 , Tð½d^ F 2 ½d^ G 2 Þ is
2
asymptotically distributed as a weighted w distribution.
Under HG , the first two terms in the representation of
þ
þ
½d^ F 2 ½d^ G 2 are the leading terms, and hence the
conclusion of the theorem under HG follows. This
completes the proof of Theorem 3. &
Theorem 4 (Model selection for overlapping models). Suppose models F and G are overlapping models and Assumptions A.1 and A.10 hold. Then
under H0 : fðyF 0 Þ ¼ fðyG0 Þ with probability one,
þ
þ
Tð½d^ F 2 ½d^ G 2 Þ has an asymptotic weighted w2 distribution, and the weights are the eigenvalues of the matrix
0
1
1
1
1 @ GF LF GF LFG A
;
2 G1
G1
G LFG
G LG
and
under HA : fðyF 0 ÞafðyG0 Þ with positive probability,
þ
þ
Tð½d^ 2 ½d^ 2 Þ-1 ðeither þ 1 or1Þ.
F
G
Proof. From the proof of Theorem 3, we have
þ
þ
½d^ F 2 ½d^ G 2 ¼ EfðyF 0 ÞEfðyG0 Þ þðET EÞ½fðyF 0 Þ
fðyG0 Þ12 A0F JF AF þ 12A0G JG AG þ op ðT 1 Þ:
Under H0 : fðyF 0 ÞafðyG0 Þ with positive probability,
EfðyF 0 ÞEfðyG0 Þ ¼ 0 and ðET EÞ½fðyF 0 ÞfðyG0 Þ ¼ 0.
Therefore, we obtain
þ
þ
1
1
½d^ F 2 ½d^ G 2 ¼ A0F JF AF þ A0G JG AG þop ðT 1 Þ
2
2
0
1
!0
!
1
AF
1 AF @ GF
A
þop ðT 1 Þ:
¼
AG
2 AG
G1
G
The rest of the proof is similar to that of Theorem 3 and
thus is omitted. &
References
Agarwal, V., Naik, N., 2004. Risks and portfolio decisions involving hedge
funds. Review of Financial Studies 17, 63–98.
Ben Dor, A., Jagannathan, R., 2002. Understanding mutual fund and
hedge fund styles using return based style analysis. NBER Working
Paper No. 9111, National Bureau of Economic Research, Cambridge,
MA.
Campbell, J., Cochrane, J., 2000. Explaining the poor performance of
consumption-based asset pricing models. Journal of Finance 55,
2863–2878.
Cochrane, J., 2005. Asset Pricing. Princeton University Press, Princeton,
NJ.
Dittmar, R., 2002. Nonlinear pricing kernels, kurtosis preference, and
evidence from the cross section of equity returns. Journal of Finance
57, 369–403.
Dudley, R., 1981. Donsker classes of functions. In: Statistics and Related
Topics. North-Holland, Amsterdam, pp. 341–352.
Dybvig, P., Ingersoll, J., 1982. Mean-variance theory in complete markets.
Journal of Business 55, 233–251.
Dybvig, P., Ross, S., 1985. Differential information and performance
measurement using a security market line. Journal of Finance 40,
383–399.
Fama, E., French, K., 1993. Common risk factors in the returns on stocks
and bonds. Journal of Financial Economics 33, 3–56.
Farnsworth, H., Ferson, W., Jackson, D., Todd, S., 2002. Performance
evaluation with stochastic discount factors. Journal of Business 75,
473–504.
Ferson, W., Harvey, C.R., 1999. Conditioning variables and cross-section
of stock returns. Journal of Finance 54, 1325–1360.
Ferson, W., Khang, K., 2002. Conditional performance measurement
using portfolio weights: evidence for pension funds. Journal of
Financial Economics 65, 249–282.
Ferson, W., Henry, T., Kisgen, D., 2006. Evaluating government bond fund
performance with stochastic discount factors. Review of Financial
Studies 19, 423–456.
Fung, W., Hsieh, D., 2001. The risk in hedge fund strategies: theory and
evidence from trend followers. Review of Financial Studies 14,
313–341.
Glosten, L., Jagannathan, R., 1994. A contingent claim approach
to performance evaluation. Journal of Empirical Finance 1,
133–160.
Grinblatt, M., Titman, S., 1989. Portfolio performance evaluation:
old issues and new insights. Review of Financial Studies 2,
393–421.
Hansen, L.P., 1982. Large sample properties of generalized method of
moments estimators. Econometrica 50, 1029–1054.
Hansen, L.P., Heaton, J., Luttmer, E., 1995. Econometric evaluation of
asset pricing models. Review of Financial Studies 8, 237–274.
Hansen, L.P., Jagannathan, R., 1991. Implications of security market data
for models of dynamic economies. Journal of Political Economy 99,
225–262.
Hansen, L.P., Jagannathan, R., 1997. Assessing specification errors
in stochastic discount factor models. Journal of Finance 52,
557–590.
Hodrick, R., Zhang, X., 2001. Evaluating the specification errors of asset
pricing models. Journal of Financial Economics 62, 327–376.
Jagannathan, R., Wang, Z., 1996. The conditional CAPM and the crosssection of expected returns. Journal of Finance 51, 3–53.
Jagannathan, R., Kubota, K., Takehara, H., 1998. Relationship between
labor income risk and average return: empirical evidence from the
Japanese stock market. Journal of Business 71, 319–348.
Kan, R., Zhou, G., 2002. Hansen-Jagannathan Distance: Geometry and
Exact Distribution. Unpublished working paper. Washington University in St. Louis., St. Louis, MO.
Le Cam, L., 1986. Asymptotic Methods in Statistical Decision Theory.
Springer-Verlag, New York.
Lettau, M., Ludvigson, S., 2001. Resurrecting the (C)CAPM: A crosssectional test when risk premia are time-varying. Journal of Political
Economy 109, 1238–1287.
Li, Q., Vassalou, M., Xing, Y., 2006. Sector investment growth rates
and the cross-section of equity returns. Journal of Business 79,
1637–1665.
Lustig, H., Nieuwerburgh, S., 2004. Housing collateral, consumption
insurance, and risk premia. Journal of Finance 60, 1167–1221.
ARTICLE IN PRESS
H. Li et al. / Journal of Financial Economics 97 (2010) 279–301
Lewellen, J., Nagel, S., Shanken, J., 2010. A skeptical appraisal of assetpricing tests. Journal of Financial Economics 96, 175–194.
Merton, R., 1981. On market timing and investment performance I: an
equilibrium theory of values for markets forecasts. Journal of
Business 54, 363–406.
Mitchell, M., Pulvino, T., 2001. Characteristics of risk and return in risk
arbitrage. Journal of Finance 56, 2135–2175.
Pakes, A., Pollard, D., 1989. Simulation and the asymptotics of
optimization estimators. Econometrica 57, 1027–1057.
Pollard, D., 1982. A central limit theorem for k-means clustering. Annals
of Probability 10, 919–926.
301
Ross, S., 2005. Neoclassical Finance. Princeton University Press, New
Jersey.
Santos, T., Veronesi, P., 2006. Labor income and predictable stock
returns. Review of Financial Studies 19, 1–44.
Vuong, Q., 1989. Likelihood ratio tests for model selection and non
nested hypotheses. Econometrica 57, 307–333.
Wang, Z., Zhang, X., 2005. The role of arbitrage in the empirical
evaluation of asset pricing models. Unpublished working paper,
Cornell University, Ithaca, NY.
Yogo, M., 2006. A consumption-based explanation of expected stock
returns. Journal of Finance 61, 539–580.