Efficient Estimation Using the Characteristic Function
Marine Carrasco, Rachidi Kotchoni
To cite this version:
Marine Carrasco, Rachidi Kotchoni. Efficient Estimation Using the Characteristic Function.
2013. <hal-00867850>
HAL Id: hal-00867850
https://hal.archives-ouvertes.fr/hal-00867850
Submitted on 30 Sep 2013
Efficient Estimation Using the Characteristic Function*

Rachidi Kotchoni‡ (University of Montreal)
Marine Carrasco† (University of Montreal)

First draft: January 2010
This version: July 2013
Abstract

The method of moments proposed by Carrasco and Florens (2000) makes it possible to fully exploit the information contained in the characteristic function and yields an estimator which is asymptotically as efficient as the maximum likelihood estimator. However, this estimation procedure depends on a regularization or tuning parameter $\alpha$ that needs to be selected. The aim of the present paper is to provide a way to optimally choose $\alpha$ by minimizing the approximate mean square error (AMSE) of the estimator. Following an approach similar to that of Newey and Smith (2004), we derive a higher-order expansion of the estimator from which we characterize the finite sample dependence of the AMSE on $\alpha$. We provide a data-driven procedure for selecting the regularization parameter that relies on parametric bootstrap. We show that this procedure delivers a root-$T$ consistent estimator of $\alpha$. Moreover, the data-driven selection of the regularization parameter preserves the consistency, asymptotic normality and efficiency of the CGMM estimator. Simulation experiments based on a CIR model show the relevance of the proposed approach.

Keywords: Conditional moment restriction, Continuum of moment conditions, Generalized method of moments, Mean square error, Stochastic expansion, Tikhonov regularization.

JEL Classification: C00, C13, C15
* An earlier version of this work was joint with Jean-Pierre Florens. We are grateful for his support. This paper has been presented at various conferences and seminars and we thank the participants for their comments, especially discussant Atsushi Inoue. Partial financial support from SSHRC is gratefully acknowledged.
† Université de Montréal; CIRANO; CIREQ. Fax: (+1) 514 343 7221. E-mail: [email protected]
‡ Corresponding author: Université de Montréal, Département de Sciences Économiques. E-mail: [email protected]
1 Introduction
There is a one-to-one relationship between the characteristic function (henceforth, CF) and the probability distribution function of a random variable, the former being the Fourier transform of the latter. This implies that an inference procedure based on the empirical CF has the potential to be as efficient as one that exploits the likelihood function. Paulson et al. (1975) used a weighted modulus of the difference between the theoretical CF and its empirical counterpart to estimate the parameters of the stable law. Feuerverger and Mureika (1977) studied the convergence properties of the empirical CF and suggested that "it may be a useful tool in numerous statistical problems". Since then, many interesting applications have been proposed, including Feuerverger and McDunnough (1981b,c), Koutrouvelis (1980), Carrasco and Florens (2000), Chacko and Viceira (2003) and Carrasco, Chernov, Florens, and Ghysels (2007) (henceforth, CCFG (2007)). For a quite comprehensive review of empirical CF-based estimation methods, see Yu (2004).
The CF provides a good alternative for econometricians when the likelihood function is not available in closed form. For example, some distributions in the $\alpha$-stable family are naturally specified via their CFs while their densities are known in closed form only at isolated points of the parameter space (see e.g. Nolan, 2009). The density of the Variance-Gamma model of Madan and Seneta (1990) has an integral representation whereas its CF has a simple closed-form expression. The transition density of a discretely sampled continuous-time process is not available in closed form, except when its parameterization coincides with that of a square-root diffusion (Singleton, 2001). Even in this special case, the transition density takes the form of an infinite mixture of Gamma densities with Poisson weights. The same type of density arises in the autoregressive Gamma model studied in Gouriéroux and Jasiak (2005). Ait-Sahalia and Kimmel (2007) propose closed-form approximations for the log-likelihood function of various continuous-time stochastic volatility models, but their method cannot be applied to other situations without solving a complicated Kolmogorov forward and backward equation. Interestingly, the conditional CF can be derived in closed form for all continuous-time stochastic volatility models.
The CF, $\varphi(\tau;\theta)$, of a random vector $x_t \in \mathbb{R}^p$ ($t = 1, \dots, T$) is nothing but the expectation of $e^{i\tau' x_t}$ with respect to the distribution of $x_t$, where $\theta$ is the parameter that characterizes the distribution of $x_t$, $\tau \in \mathbb{R}^p$ is the Fourier index and $i$ is the imaginary number such that $i^2 = -1$. Hence, a candidate moment condition for the estimation of $\theta_0$ (i.e., the true value of $\theta$) is given by $h_t(\tau;\theta) = e^{i\tau' x_t} - E(e^{i\tau' x_t})$. This moment condition is valid for all $\tau \in \mathbb{R}^p$; hence $h_t(\tau;\theta)$ is a moment function, or a continuum of moment conditions. Feuerverger and McDunnough (1981b) propose an estimation procedure that consists of minimizing a norm of the sample average of the moment function. Their objective function involves an optimal weighting function that depends on the true, unknown likelihood function. Feuerverger and McDunnough (1981c) apply the Generalized Method of Moments (GMM) to a discrete set of moment conditions obtained by restricting the continuous index $\tau \in \mathbb{R}^p$ to a discrete grid $\tau \in \{\tau_1, \tau_2, \dots, \tau_N\}$. They show that the asymptotic variance of the resulting estimator can be made arbitrarily close to the Cramer-Rao bound by selecting a grid for $\tau$ that is sufficiently fine and extended. Similar discretization approaches are used in Singleton (2001) and Chacko and Viceira (2003). However, the number of points in the grid for $\tau$ must not be larger than the sample size for the covariance matrix of the discrete set of moment conditions to be invertible. In particular, the first-order optimality conditions associated with the discrete GMM procedure become ill-posed as soon as the grid $\{\tau_1, \tau_2, \dots, \tau_N\}$ is too refined or too extended. Intuitively, the discrete set of moment conditions $\{h_t(\tau_i;\theta)\}_{i=1}^{N}$ converges to the moment function $\tau \mapsto h_t(\tau;\theta)$, $\tau \in \mathbb{R}^p$, as this grid is refined and extended. As a result, it is necessary to apply operator methods in a suitable Hilbert space to be able to handle the estimation procedure at the limit.
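The ill-conditioning that motivates the operator approach is easy to visualize numerically. The following sketch (our own illustration, not part of the original paper) builds the sample covariance matrix of CF-based moment conditions on grids of increasing size for a Gaussian sample and reports its condition number, which deteriorates rapidly as the grid is refined:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(500)            # IID sample, T = 500

def cf_moment_cov(x, grid):
    # h_t(tau) = exp(i tau x_t) - phi_hat(tau), stacked over the grid points
    h = np.exp(1j * np.outer(x, grid))  # T x N matrix
    h -= h.mean(axis=0, keepdims=True)
    return (h.conj().T @ h) / len(x)    # N x N sample covariance matrix

for n_pts in (5, 20, 50):
    grid = np.linspace(-3.0, 3.0, n_pts)
    cond = np.linalg.cond(cf_moment_cov(x, grid))
    print(f"N = {n_pts:3d}, condition number = {cond:.2e}")
```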
Carrasco and Florens (2000) proposed a Continuum GMM (henceforth, CGMM) that makes it possible to use the whole continuum of moment conditions efficiently. Similarly to the classical GMM, the CGMM is a two-step procedure that delivers a consistent estimator at the first step and an efficient estimator at the second step. The ideal (unfeasible) objective function of the second-step CGMM is a quadratic form in a suitably defined Hilbert space with metric $K^{-1}$, where $K$ is the asymptotic covariance operator associated with the moment function $h_t(\tau;\theta)$. To obtain a feasible efficient CGMM estimator, one replaces the operator $K$ by an estimator $K_T$ obtained from a finite sample. However, the latter empirical operator is degenerate and not invertible, while its theoretical counterpart is invertible only on a dense subset of the reference space. To circumvent these difficulties, Carrasco and Florens (2000) resorted to a Tikhonov-type regularized inverse of $K_T$, e.g. $K_{\alpha T}^{-1} = \left( K_T^2 + \alpha I \right)^{-1} K_T$, where $I$ is the identity operator and $\alpha$ is a regularization parameter. The CGMM estimator is root-$T$ consistent and asymptotically normal for any fixed and reasonably small value of $\alpha$. However, asymptotic efficiency is obtained only by letting $\alpha T^{1/2}$ go to infinity and $\alpha$ go to zero as $T$ goes to infinity.
The main objective of this paper is to characterize the optimal rate of convergence for $\alpha$ as $T$ goes to infinity. To this end, we derive a Nagar (1959) type stochastic expansion of the CGMM estimator. This type of expansion has been used in Newey and Smith (2004) to study the higher-order properties of empirical likelihood estimators. We use our expansion to find the convergence rates of the higher-order terms of the MSE of the CGMM estimator. These rates depend on both $\alpha$ and $T$. We find that the higher-order bias of the CGMM estimator is dominated by two higher-order variance terms. By equating the rates of these dominant terms, we find an expression of the form $\alpha_T = c(\theta_0) T^{-g(C)}$, where $c(\theta_0)$ does not depend on $T$ and $g(C)$ inherits some properties from the covariance operator $K$. To implement the optimal selection of $\alpha$ empirically, we advocate a naive estimator of $\alpha_T$ obtained by minimizing an approximate MSE of the CGMM estimator computed by parametric bootstrap. Even though the CGMM estimator is consistent, there is a concern that its variance may be infinite in finite samples for certain data generating processes. This concern seems unfounded for the CIR model on which our Monte Carlo simulations are based. If applicable, this difficulty is avoided by truncating the AMSE similarly as in Andrews (1991).
The remainder of the paper is organized as follows. In Section 2, we review the properties of the CGMM estimator in the IID and Markov cases. In Section 3, we derive a higher-order expansion for the MSE of the CGMM estimator and use this expansion to obtain the optimal rate of convergence for the regularization parameter $\alpha_T$. In Section 4, we describe a simulation-based method to estimate $\alpha_T$ and show the consistency of the resulting estimator. Section 5 presents a simulation study based on the CIR term structure model and Section 6 concludes. The proofs are collected in the appendix.
2 Overview of the CGMM

This section is essentially a summary of known results about the CGMM estimator. The first subsection presents a general framework for implementing the CF-based CGMM procedure whilst the second subsection presents the basic properties of the resulting estimator.
2.1 The CGMM Based on the Characteristic Function
Let $x_t \in \mathbb{R}^p$ be a random vector process whose distribution is indexed by a finite dimensional parameter $\theta$ with true value $\theta_0$. When the process $x_t$ is IID, Carrasco and Florens (2000) propose to estimate $\theta_0$ based on the moment function given by:

$$h_t(\tau;\theta) = e^{i\tau' x_t} - \varphi(\tau;\theta), \qquad (1)$$

where $\varphi(\tau;\theta) = E^{\theta}\left(e^{i\tau' x_t}\right)$ is the CF of $x_t$ and $E^{\theta}$ is the expectation operator with respect to the data generating process indexed by $\theta$.
CCFG (2007) extend the scope of the CGMM procedure to Markov and weakly dependent models. In this paper, we restrict our attention to the IID and Markov cases. The moment function used in CCFG (2007) for the Markov case is:

$$h_t(\tau;\theta) = \left( e^{is' x_{t+1}} - \varphi_t(s;\theta) \right) e^{ir' x_t}, \qquad (2)$$

where $\varphi_t(s;\theta) = E^{\theta}\left( e^{is' x_{t+1}} \mid x_t \right)$ is the conditional CF of $x_{t+1}$ given $x_t$ and $\tau = (s,r) \in \mathbb{R}^{2p}$. In equation (2), the set of basis functions $\{e^{ir' x_t}\}$ is used as instruments. CCFG (2007) show that these instruments are optimal given the Markovian structure of the model. Moment conditions defined by (1) are IID whereas equation (2) describes a martingale difference sequence.

Note that a standard conditional moment restriction (i.e., one that is not CF-based) can be converted into a continuum of unconditional moment restrictions of the form (2). In this case, the CGMM estimator may be viewed as an alternative to the estimator proposed by Dominguez and Lobato (2004) and to the smooth minimum distance estimator of Lavergne and Patilea (2008). Subsequently, we use the generic notation $h_t(\tau;\theta)$, $\tau \in \mathbb{R}^d$, to denote a moment function defined by either (1) or (2), where $d = p$ for (1) and $d = 2p$ for (2).
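For concreteness, a minimal sketch of the two moment functions is given below (our own code, not from the paper; `phi` and `phi_cond` stand for the model-implied unconditional and conditional CFs, which are model-specific):

```python
import numpy as np

def h_iid(x, tau, phi, theta):
    """Moment function (1): h_t(tau; theta) = exp(i tau' x_t) - phi(tau; theta).
    x is a (T, p) array, tau a (p,) index; returns a length-T complex vector."""
    return np.exp(1j * x @ tau) - phi(tau, theta)

def h_markov(x, s, r, phi_cond, theta):
    """Moment function (2): (exp(i s' x_{t+1}) - phi_t(s; theta)) exp(i r' x_t),
    a martingale difference sequence; returns a length-(T-1) complex vector."""
    return (np.exp(1j * x[1:] @ s) - phi_cond(s, theta, x[:-1])) * np.exp(1j * x[:-1] @ r)
```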
Let $\pi$ be a probability density function on $\mathbb{R}^d$ and $L^2(\pi)$ be the Hilbert space of complex-valued functions that are square integrable with respect to $\pi$, i.e.:

$$L^2(\pi) = \left\{ f : \mathbb{R}^d \to \mathbb{C} \;\Big|\; \int f(\tau) \overline{f(\tau)} \pi(\tau) d\tau < \infty \right\}, \qquad (3)$$

where $\overline{f(\tau)}$ denotes the complex conjugate of $f(\tau)$. As $h_t(\cdot;\theta)$ is bounded, the function $h_t(\cdot;\theta)$ belongs to $L^2(\pi)$ for all $\theta \in \Theta$ and for any finite measure $\pi$. Hence, we consider the following scalar product on $L^2(\pi) \times L^2(\pi)$:

$$\langle f, g \rangle = \int f(\tau) \overline{g(\tau)} \pi(\tau) d\tau. \qquad (4)$$

Based on this notation, the efficient CGMM estimator is given by

$$\hat{\theta} = \arg\min_{\theta} \left\langle K^{-1} \hat{h}_T(\cdot;\theta), \hat{h}_T(\cdot;\theta) \right\rangle,$$

where $K$ is the asymptotic covariance operator associated with the moment conditions. $K$ is an integral operator and satisfies:

$$Kf(\tau_1) = \int k(\tau_1, \tau_2) f(\tau_2) \pi(\tau_2) d\tau_2, \quad \text{for any } f \in L^2(\pi), \qquad (5)$$

where $k(\tau_1, \tau_2)$ is the kernel given by:

$$k(\tau_1, \tau_2) = E^{\theta_0}\left[ h_t(\tau_1;\theta_0) \overline{h_t(\tau_2;\theta_0)} \right]. \qquad (6)$$

Some basic properties of the operator $K$ are discussed in Appendix A.
With a sample of size $T$ and a consistent first-step estimator $\hat{\theta}^1$ in hand, one estimates $k(\tau_1, \tau_2)$ by:

$$k_T(\tau_1, \tau_2; \hat{\theta}^1) = \frac{1}{T} \sum_{t=1}^{T} h_t(\tau_1; \hat{\theta}^1) \overline{h_t(\tau_2; \hat{\theta}^1)}. \qquad (7)$$

In the specific case of IID data, an estimator of the kernel that does not use a first-step estimator is given by:

$$k_T(\tau_1, \tau_2) = \frac{1}{T} \sum_{t=1}^{T} \left( e^{i\tau_1' x_t} - \hat{\varphi}_T(\tau_1) \right) \overline{\left( e^{i\tau_2' x_t} - \hat{\varphi}_T(\tau_2) \right)}, \qquad (8)$$

where $\hat{\varphi}_T(\tau) = \frac{1}{T} \sum_{t=1}^{T} e^{i\tau' x_t}$. Unfortunately, an empirical covariance operator $K_T$ with kernel function given by either (7) or (8) is degenerate and not invertible. Indeed, the inversion of $K_T$ raises a problem similar to that of the Fourier inversion of an empirical characteristic function. This problem is worsened by the fact that the inverse of $K$, which $K_T$ is aimed at estimating, exists only on a dense subset of $L^2(\pi)$. Moreover, when $K^{-1} f = g$ exists for a given function $f$, a small perturbation in $f$ may give rise to a large variation in $g$.
To circumvent these difficulties, we consider estimating $K^{-1}$ by:

$$K_{\alpha T}^{-1} = \left( K_T^2 + \alpha I \right)^{-1} K_T,$$

where the hyperparameter $\alpha$ plays two roles. First, it is a smoothing parameter, as it allows $K_{\alpha T}^{-1} f$ to exist for all $f$ in $L^2(\pi)$. Second, it is a regularization parameter, as it dampens the sensitivity of $K_{\alpha T}^{-1} f$ to perturbations in the input $f$. For any function $f$ in the range of $K$ and any consistent estimator $\hat{f}_T$ of $f$, $K_{\alpha T}^{-1} \hat{f}_T$ converges to $K^{-1} f$ as $T$ goes to infinity and $\alpha$ goes to zero at an appropriate rate. The expression for $K_{\alpha T}^{-1}$ uses a Tikhonov regularization, also called ridge regularization. Other forms of regularization could have been used, see e.g. Carrasco, Florens and Renault (2007).
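On a quadrature grid, the operators above become matrices and the regularized inverse can be applied through a spectral decomposition. The sketch below (ours; it assumes the quadrature weights have already been absorbed so that the discretized $K_T$ is Hermitian) makes the two roles of $\alpha$ visible: every eigenvalue $\lambda$ of $K_T$, including $\lambda = 0$, is mapped to the finite value $\lambda / (\lambda^2 + \alpha)$:

```python
import numpy as np

def tikhonov_apply(K, f, alpha):
    """Apply the regularized inverse (K^2 + alpha I)^{-1} K to f, where K is a
    Hermitian matrix discretization of the covariance operator on a grid."""
    lam, U = np.linalg.eigh(K)                          # K = U diag(lam) U*
    coef = (U.conj().T @ f) * lam / (lam ** 2 + alpha)  # filtered spectral coefficients
    return U @ coef                                     # bounded even when lam = 0
```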
The feasible CGMM estimator is given by:

$$\hat{\theta}_T(\alpha) = \arg\min_{\theta} \hat{Q}_T(\alpha;\theta), \qquad (9)$$

where $\hat{Q}_T(\alpha;\theta) = \left\langle K_{\alpha T}^{-1} \hat{h}_T(\cdot;\theta), \hat{h}_T(\cdot;\theta) \right\rangle$. An expression of the objective function $\hat{Q}_T(\alpha;\theta)$ in matrix form is given in CCFG (2007, Section 3.3). An alternative expression and a numerical algorithm for the numerical evaluation of this objective function based on Gauss-Hermite quadratures are described in Appendix D.
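The following sketch (our own reading of the quadrature idea, written for a scalar index $\tau$ with $\pi(\tau) \propto e^{-\tau^2}$ so that Gauss-Hermite nodes apply; `h_fun`, which returns the $T \times n$ matrix of $h_t(\tau_k;\theta)$, is a hypothetical placeholder) shows how $\hat{Q}_T(\alpha;\theta)$ could be evaluated:

```python
import numpy as np

def cgmm_objective(theta, x, h_fun, alpha, n_quad=10):
    """Evaluate <K_aT^{-1} h_T(.;theta), h_T(.;theta)> on a Gauss-Hermite grid."""
    tau, w = np.polynomial.hermite.hermgauss(n_quad)  # nodes/weights for exp(-tau^2)
    H = h_fun(x, tau, theta)                # T x n_quad matrix of h_t(tau_k; theta)
    h_bar = H.mean(axis=0)                  # empirical moment function on the grid
    Hc = H - h_bar                          # centered draws, define the kernel k_T
    sw = np.sqrt(w)
    Ks = sw[:, None] * (Hc.conj().T @ Hc / len(x)) * sw[None, :]  # symmetrized K_T
    g = sw * h_bar                          # discretized h_T with quadrature weights
    lam, U = np.linalg.eigh(Ks)
    c = U.conj().T @ g
    return float(np.real(np.sum(lam * np.abs(c) ** 2 / (lam ** 2 + alpha))))
```

Minimizing this function over $\theta$ with a standard numerical optimizer would then yield a feasible CGMM estimate for a given $\alpha$.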
2.2 Consistency and Asymptotic Normality

In order to study the properties of a CGMM estimator obtained within the framework described previously, the following assumptions are posited:
Assumption 1: The probability density function $\pi$ is strictly positive on $\mathbb{R}^d$ and admits all its moments.

Assumption 2: The equation

$$E^{\theta_0}\left( h_t(\tau;\theta) \right) = 0 \text{ for all } \tau \in \mathbb{R}^d, \ \pi\text{-almost everywhere},$$

has a unique solution $\theta_0$, which is an interior point of a compact set $\Theta$.

Assumption 3: $h_t(\tau;\theta)$ is three times continuously differentiable with respect to $\theta$. Furthermore, the first two derivatives satisfy:

$$Var\left( \frac{1}{\sqrt{T}} \sum_{t=1}^{T} \frac{\partial h_t(\tau;\theta)}{\partial \theta_j} \right) < \infty \quad \text{and} \quad Var\left( \frac{1}{\sqrt{T}} \sum_{t=1}^{T} \frac{\partial^2 h_t(\tau;\theta)}{\partial \theta_j \partial \theta_k} \right) < \infty,$$

for all $j$, $k$ and $T$.

Assumption 4: $E^{\theta_0}\left( h_t(\cdot;\theta) \right) \in \Phi_C$ for all $\theta \in \Theta$ and for some $C \geq 1$, and the first two derivatives of $E^{\theta_0}\left( h_t(\cdot;\theta) \right)$ with respect to $\theta$ belong to $\Phi_C$ for all $\theta$ in a neighborhood of $\theta_0$ and for the same $C$ as previously, where:

$$\Phi_C = \left\{ f \in L^2(\pi) \text{ such that } \left\| K^{-C} f \right\| < \infty \right\}. \qquad (10)$$

Assumption 5: The random variable $x_t$ is stationary Markov and satisfies $x_t = r(x_{t-1}, \theta_0, \varepsilon_t)$, where $r(x_{t-1}, \theta_0, \varepsilon_t)$ is three times continuously differentiable with respect to $\theta_0$ and $\varepsilon_t$ is an IID white noise whose distribution is known and does not depend on $\theta_0$.
Assumptions 1 and 2 are quite standard and have been used in Carrasco and Florens (2000). The first part of Assumption 3 ensures some smoothness properties for $\hat{\theta}_T(\alpha)$ while the second part is always satisfied for IID models. The largest real $C$ such that $f \in \Phi_C$ in Assumption 4 may be called the level of regularity of $f$ with respect to $K$: the larger $C$ is, the better $f$ is approximated by a linear combination of the eigenfunctions of $K$ associated with the largest eigenvalues. Because $Kf(\cdot)$ involves a $d$-dimensional integration, $C$ may be affected by both the dimensionality of the index $\tau$ and the smoothness of $f$. CCFG (2007) have shown that we always have $C \geq 1$ if $f = E^{\theta_0}\left( h_t(\tau;\theta) \right)$. Assumption 5 implies that the data can be simulated once one knows how to draw from the distribution of $\varepsilon_t$. It is satisfied for all random variables that can be written as a location parameter plus a scale parameter times a standardized representative of the family of distributions. Examples include the exponential family and the stable distribution. The IID case is a special case of Assumption 5 where $r(x_{t-1}, \theta_0, \varepsilon_t)$ takes the simpler form $r(\theta_0, \varepsilon_t)$. Further discussion of this type of model can be found in Gourieroux, Monfort, and Renault (1993) in the indirect inference context. Note that the function $r(x_{t-1}, \theta_0, \varepsilon_t)$ may not be available in analytical form. In particular, the relation $x_t = r(x_{t-1}, \theta_0, \varepsilon_t)$ can be the numerical solution of a general equilibrium asset pricing model (e.g., as in Duffie and Singleton, 1993).
We have the following result:

Theorem 1 Under Assumptions 1 to 5, the CGMM estimator is consistent and satisfies:

$$T^{1/2} \left( \hat{\theta}_T(\alpha) - \theta_0 \right) \xrightarrow{L} N\left( 0, I_{\theta_0}^{-1} \right)$$

as $T$ and $\alpha T^{1/2}$ go to infinity and $\alpha$ goes to zero, where $I_{\theta_0}^{-1}$ denotes the inverse of the Fisher information matrix.

See Proposition 3.2 of CCFG (2007) for a more general statement of the consistency and asymptotic normality result. A nice feature of the CGMM estimator is that its asymptotic distribution does not depend on the probability density function $\pi$.
3 Stochastic expansion of the CGMM estimator
The conditions required for the asymptotic efficiency result stated by Theorem 1 allow for a wide range of convergence rates for $\alpha$. Indeed, any sequence of the type $\alpha_T = c T^{-a}$ (with $c > 0$) satisfies these conditions as soon as $0 < a < 1/2$. Among the admissible convergence rates, we would like to find the one that minimizes the mean square error of the CGMM estimator for a given sample size $T$. To achieve this, we derive the stochastic expansion of the CGMM estimator. The higher-order properties of GMM-type estimators have been studied by Rothenberg (1983, 1984), Koenker et al. (1994), Rilstone et al. (1996) and Newey and Smith (2004). For estimators derived in the linear simultaneous equations framework, examples include Nagar (1959), Buse (1992) and Donald and Newey (2001). The approach followed here is similar to Nagar (1959) and Newey and Smith (2004), and approximates the MSE of an estimator analytically based on the leading terms of its stochastic expansion.

Two difficulties arise when analyzing the terms of the expansion of the CGMM estimator. First, when the rate of $\alpha$ as a function of $T$ is unknown, it is not always possible to write the terms of the expansion in decreasing order. The second difficulty stems from a result that dramatically differs from the case with a finite number of moment conditions. Indeed, when the number of moment conditions is finite, the quadratic form $T \hat{h}_T(\theta_0)' K^{-1} \hat{h}_T(\theta_0)$ is $O_p(1)$ and asymptotically follows a chi-square distribution with degrees of freedom given by the number of moment conditions. However, the analogue of the previous quadratic form, $\left\| K^{-1/2} \sqrt{T} \hat{h}_T(\theta_0) \right\|^2$, is not well defined in the presence of a continuum of moment conditions. Its regularized version, $\left\| K_{\alpha T}^{-1/2} \sqrt{T} \hat{h}_T(\theta_0) \right\|^2$, exists but diverges as $T$ goes to infinity and $\alpha$ goes to zero. Indeed, we have

$$\left\| K_{\alpha T}^{-1/2} \sqrt{T} \hat{h}_T(\theta_0) \right\| \leq \underbrace{\left\| \left( K_T^2 + \alpha I \right)^{-1/4} \right\|}_{\leq \alpha^{-1/4}} \underbrace{\left\| \left( K_T^2 + \alpha I \right)^{-1/4} K^{1/2} \right\|}_{\leq 1} \underbrace{\left\| \sqrt{T} \hat{h}_T(\theta_0) \right\|}_{= O_p(1)} = O_p\left( \alpha^{-1/4} \right). \qquad (11)$$
The expansion that we derive for $\hat{\theta}_T(\alpha) - \theta_0$ is of the same form for both the IID and Markov cases. Namely:

$$\hat{\theta}_T(\alpha) - \theta_0 = \Delta_1 + \Delta_2 + \Delta_3 + o_p\left( \alpha^{-1} T^{-1} \right) + o_p\left( \alpha^{\min\left(1, \frac{2C-1}{2}\right)} T^{-1/2} \right), \qquad (12)$$

where $\Delta_1 = O_p\left( T^{-1/2} \right)$, $\Delta_2 = O_p\left( \alpha^{\min\left(1, \frac{2C-1}{2}\right)} T^{-1/2} \right)$ and $\Delta_3 = O_p\left( \alpha^{-1} T^{-1} \right)$. Appendix B provides details about the above expansion, whose validity is ensured by the consistency result of Theorem 1. Given this expansion, we wish to find the rate of convergence of the $\alpha$ which minimizes the leading terms of the MSE:

$$MSE(\alpha;\theta_0) = T \cdot E\left[ \left( \hat{\theta}_T(\alpha) - \theta_0 \right) \left( \hat{\theta}_T(\alpha) - \theta_0 \right)' \right]. \qquad (13)$$

We have the following results on the higher-order MSE matrix and on the optimal convergence rate for the regularization parameter.
Theorem 2 Assume that Assumptions 1 to 5 hold. Then we have:

(i) The approximate MSE matrix of $\hat{\theta}_T(\alpha)$ up to order $O\left( \alpha^{-1} T^{-1/2} \right)$ (henceforth, AMSE) is decomposed as the sum of the squared bias and the variance:

$$AMSE(\alpha;\theta_0) = T \cdot Bias \ast Bias' + T \cdot Var,$$

where

$$T \cdot Bias \ast Bias' = O\left( \alpha^{-2} T^{-1} \right),$$
$$T \cdot Var = I_{\theta_0}^{-1} + O\left( \alpha^{\min\left(2, \frac{2C-1}{2}\right)} \right) + O\left( \alpha^{-1} T^{-1/2} \right),$$

as $T \to \infty$, $\alpha^2 T \to \infty$ and $\alpha \to 0$.

(ii) The $\alpha$ that minimizes the trace of $AMSE(\alpha;\theta_0)$, denoted $\alpha_T \equiv \alpha_T(\theta_0)$, satisfies:

$$\alpha_T = O\left( T^{-\max\left( \frac{1}{6}, \frac{1}{2C+1} \right)} \right).$$
Remarks.

1. We have the usual trade-off between a term that is decreasing in $\alpha$ and another that is increasing in $\alpha$. Interestingly, the squared bias term is dominated by two higher-order variance terms whose rates are equated to obtain the optimal rate for the regularization parameter. The same situation happens for the Limited Information Maximum Likelihood estimator, for which the bias is also dominated by variance terms (see Donald and Newey, 2001).

2. The rate for the $O\left( \alpha^{\min\left(2, \frac{2C-1}{2}\right)} \right)$ variance term does not improve for $C > 2.5$. This is due to a property of Tikhonov regularization that is well documented in the literature on inverse problems, see e.g. Carrasco, Florens and Renault (2007). The use of another regularization scheme such as spectral cut-off or Landweber-Fridman would make it possible to improve the rate of convergence for large values of $C$. However, this improvement comes at the cost of a greater complexity in the proofs (e.g., with the spectral cut-off, we lose the differentiability of the estimator with respect to $\alpha$).

3. Our expansion is consistent with the conditions of Theorem 1, since the optimal regularization parameter $\alpha_T$ satisfies $\alpha_T^2 T \to \infty$.

4. It follows from Theorem 2 that the optimal regularization parameter $\alpha_T$ is necessarily of the form:

$$\alpha_T = c(\theta_0) T^{-g(C)}, \qquad (14)$$

for some positive function $c(\theta_0)$ that does not depend on $T$ and a positive function $g(C)$ that satisfies $\max\left( \frac{1}{6}, \frac{1}{2C+1} \right) \leq g(C) < 1/2$. An expression of the form (14) is often used as a starting point for optimal bandwidth selection in nonparametric density estimation. Examples in the semiparametric context include Linton (2002) and Jacho-Chavez (2010).
4 Estimation of the Optimal Regularization Parameter
Our purpose is to select the regularization parameter $\alpha$ so as to minimize the trace of the MSE matrix of $\hat{\theta}_T(\alpha)$ for a given sample size $T$, i.e.:

$$\alpha_T(\theta_0) = \arg\min_{\alpha \in [0,1]} \Sigma_T(\alpha;\theta_0),$$

where $\Sigma_T(\alpha;\theta_0) = T \cdot E\left\| \hat{\theta}_T(\alpha) - \theta_0 \right\|^2$. This raises at least three problems. First, the MSE $\Sigma_T(\alpha;\theta_0)$ might be infinite in finite samples even though $\hat{\theta}_T(\alpha)$ is consistent.¹ Second, the true parameter value $\theta_0$ is unknown. Third, the finite sample distribution of $\hat{\theta}_T(\alpha) - \theta_0$ is not known even when $\theta_0$ is known. Each of these problems is examined below.

¹ This is due to the fact that $\hat{\theta}_T(\alpha)$ is a GMM-type estimator. The large sample properties of such estimators are well known whilst their finite sample properties can be established only in special cases.
The variance of $\hat{\theta}_T(\alpha)$ may be infinite for some data generating processes. To hedge against such situations, one may consider a truncated MSE of the form:

$$\Sigma_T(\alpha;\theta_0;\delta) = T \cdot E\left[ \Lambda_T(\alpha;\theta_0) \mid \Lambda_T(\alpha;\theta_0) < \eta_\delta \right], \qquad (15)$$

where $\Lambda_T(\alpha;\theta) \equiv \left\| \hat{\theta}_T(\alpha) - \theta \right\|^2$ and $\eta_\delta$ satisfies $\delta = \Pr\left( \Lambda_T(\alpha;\theta_0) > \eta_\delta \right)$. A similar approach has been used in Andrews (1991, p. 826). Given that the finite sample distribution of $\hat{\theta}_T(\alpha)$ is unknown in practice, it is convenient to first select the probability of truncation (e.g., $\delta = 1\%$) and then deduce the corresponding quantile $\eta_\delta$ by simulation. To account for the possibility of the pair $(\delta, \eta_\delta)$ depending on $\alpha$, one may consider instead:

$$\Sigma_T(\alpha;\theta_0;\delta) = (1 - \delta) \, T \cdot E\left[ \Lambda_T(\alpha;\theta_0) \mid \Lambda_T(\alpha;\theta_0) < \eta_\delta \right] + \delta \eta_\delta T, \qquad (16)$$

which accounts for the probability mass at the truncation boundary. Note that the truncation plays no role if the MSE of $\hat{\theta}_T(\alpha)$ is finite. In this case, we simply let:

$$\Sigma_T(\alpha;\theta_0;0) \equiv \Sigma_T(\alpha;\theta_0) = T \cdot E\left[ \Lambda_T(\alpha;\theta_0) \right]. \qquad (17)$$

As $\hat{\theta}_T(\alpha)$ is asymptotically normal, its second moment exists for $T$ large enough. Hence, the truncation disappears (i.e., each of the expressions (15) and (16) converges to (17)) if one lets $\delta$ go to zero as $T$ goes to infinity.²

We define the optimal regularization parameter as:

$$\alpha_T(\theta_0) = \arg\min_{\alpha \in [0,1]} \Sigma_T(\alpha;\theta_0;\delta), \qquad (18)$$

where $\Sigma_T(\alpha;\theta_0;\delta)$ is given by either (15), (16) or (17).
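In practice, the truncated criteria reduce to trimming the largest simulated losses. A minimal sketch (our own code, anticipating the simulation-based estimators of the next paragraphs) is:

```python
import numpy as np

def truncated_mse(losses, T, delta=0.01):
    """Empirical counterparts of (15) and (16): losses holds simulated values
    of Lambda_T = ||theta_hat(alpha) - theta||^2; eta is the (1-delta)-quantile."""
    eta = np.quantile(losses, 1.0 - delta)                      # truncation point
    kept = losses[losses <= eta]
    mse_15 = T * kept.mean()                                    # analogue of (15)
    mse_16 = (1.0 - delta) * T * kept.mean() + delta * eta * T  # analogue of (16)
    return mse_15, mse_16
```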
Our strategy for estimating $\alpha_T(\theta_0)$ relies on approximating the unknown MSE by parametric bootstrap. Let $\hat{\theta}_T^1$ be the CGMM estimator of $\theta_0$ obtained by replacing the covariance operator with the identity operator. This estimator is consistent and asymptotically normal albeit inefficient. We use $\hat{\theta}_T^1$ to simulate $M$ independent samples of size $T$, denoted $X^{(j)}\left( \hat{\theta}_T^1 \right)$ for $j = 1, 2, \dots, M$. It should be emphasized that we have adopted a fully parametric approach from the beginning by assuming that the model of interest is fully specified. Indeed, it would not be possible to obtain MLE efficiency otherwise. The model can be simulated by exploiting Assumption 5, which stipulates that the data generating process satisfies $x_t = r(x_{t-1}, \theta, \varepsilon_t)$. To start with, one first generates $MT$ IID draws $\varepsilon_t^{(j)}$ (for $j = 1, \dots, M$ and $t = 1, \dots, T$) from the known distribution of the errors. Next, $M$ time series of size $T$ are obtained by applying the recursion $x_t^{(j)} = r\left( x_{t-1}^{(j)}, \hat{\theta}_T^1, \varepsilon_t^{(j)} \right)$, $t = 1, \dots, T$, from $M$ arbitrary starting values $x_0^{(j)}$.

² As $\eta_\delta \to \infty$ as $\delta \to 0$, one might be concerned by the limiting behavior of $\eta_\delta$ as $\delta \to 0$ when $\Sigma_T(\alpha;\theta_0)$ is infinite. However, this is not an issue as long as $\eta_\delta$ is finite for all finite $T$.
Using the simulated samples, one computes $M$ IID copies of the CGMM estimator for any given $\alpha$. We let $\hat{\theta}_T^j\left( \alpha;\hat{\theta}_T^1 \right)$ denote the CGMM estimator computed from the $j$-th sample. The truncated MSE given by (15) is estimated by:

$$\hat{\Sigma}_{T,M}\left( \alpha;\hat{\theta}_T^1;\delta \right) = \frac{T}{(1-\delta) M} \sum_{j=1}^{M} \Lambda_{j,T}(\alpha;\hat{\theta}_T^1) \, 1\left\{ \Lambda_{j,T}(\alpha;\hat{\theta}_T^1) \leq \hat{\eta}_\delta \right\}, \qquad (19)$$

where $\Lambda_{j,T}(\alpha;\hat{\theta}_T^1) = \left\| \hat{\theta}_T^j\left( \alpha;\hat{\theta}_T^1 \right) - \hat{\theta}_T^1 \right\|^2$, $\delta$ is a probability selected by the econometrician and $\hat{\eta}_\delta$ satisfies:

$$\frac{1}{M} \sum_{j=1}^{M} 1\left\{ \Lambda_{j,T}(\alpha;\hat{\theta}_T^1) \leq \hat{\eta}_\delta \right\} = 1 - \delta.$$

The truncated MSE based on the alternative formula (16) is estimated by:

$$\hat{\Sigma}_{T,M}\left( \alpha;\hat{\theta}_T^1;\delta \right) = \frac{T}{M} \sum_{j=1}^{M} \Lambda_{j,T}(\alpha;\hat{\theta}_T^1) \, 1\left\{ \Lambda_{j,T}(\alpha;\hat{\theta}_T^1) \leq \hat{\eta}_\delta \right\} + \delta \hat{\eta}_\delta T. \qquad (20)$$

With no truncation, (19) and (20) are identical to the naive MSE estimator given by:

$$\hat{\Sigma}_{T,M}\left( \alpha;\hat{\theta}_T^1 \right) = \frac{T}{M} \sum_{j=1}^{M} \Lambda_{j,T}(\alpha;\hat{\theta}_T^1), \qquad (21)$$

which is aimed at estimating (17). Finally, we select the optimal regularization parameter according to:

$$\hat{\alpha}_{T,M}\left( \hat{\theta}^1 \right) = \arg\min_{\alpha \in [0,1]} \hat{\Sigma}_{T,M}\left( \alpha;\hat{\theta}^1;\delta \right), \qquad (22)$$

where $\hat{\Sigma}_{T,M}\left( \alpha;\hat{\theta}^1;\delta \right)$ is either (19), (20) or (21).
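Putting the pieces together, the selection rule (22) can be sketched as follows (our illustration; `simulate_model` and `cgmm_estimate` are hypothetical placeholders for the model simulator of Assumption 5 and the CGMM estimation routine):

```python
import numpy as np

def select_alpha(theta1, T, M, alpha_grid, simulate_model, cgmm_estimate, delta=0.0):
    """Parametric-bootstrap selection of alpha: simulate M samples from theta1
    (the first-step estimate), re-estimate theta for each alpha on the grid,
    and return the alpha minimizing the simulated (possibly truncated) MSE."""
    rng = np.random.default_rng(0)
    samples = [simulate_model(theta1, T, rng) for _ in range(M)]
    crit = []
    for alpha in alpha_grid:
        losses = np.array([np.sum((cgmm_estimate(s, alpha) - theta1) ** 2)
                           for s in samples])
        if delta > 0:                                  # truncated criterion (19)
            eta = np.quantile(losses, 1.0 - delta)
            crit.append(T * losses[losses <= eta].sum() / ((1.0 - delta) * M))
        else:                                          # naive criterion (21)
            crit.append(T * losses.mean())
    return alpha_grid[int(np.argmin(crit))]
```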
Let $\Sigma_T\left( \alpha;\hat{\theta}^1;\delta \right)$ be the limit of $\hat{\Sigma}_{T,M}\left( \alpha;\hat{\theta}^1;\delta \right)$ as the number of replications $M$ goes to infinity, and define:

$$\alpha_T\left( \hat{\theta}^1 \right) = \arg\min_{\alpha \in [0,1]} \Sigma_T\left( \alpha;\hat{\theta}^1;\delta \right).$$

Note that $\alpha_T\left( \hat{\theta}^1 \right)$ is a deterministic function of a stochastic argument while $\hat{\alpha}_{T,M}\left( \hat{\theta}^1 \right)$ is doubly random, being a stochastic function of a stochastic argument. The estimator $\alpha_T\left( \hat{\theta}^1 \right)$ is not feasible. However, its properties are the key ingredients for establishing the consistency of its feasible counterpart $\hat{\alpha}_{T,M}\left( \hat{\theta}^1 \right)$. To proceed, we need the following assumption:

Assumption 6: The regularization parameter $\alpha$ that minimizes the (possibly truncated) criterion $\Sigma_T(\alpha;\theta_0;\delta)$ is of the form $\alpha_T(\theta_0) = c(\theta_0) T^{-g(C)}$, for some continuous positive function $c(\theta_0)$ that does not depend on $T$ and a positive function $g(C)$ that satisfies $\max\left( \frac{1}{6}, \frac{1}{2C+1} \right) \leq g(C) < 1/2$.

Basically, Assumption 6 requires that the optimal rate found for the regularization parameter at (14) be insensitive to the MSE truncation scheme. This assumption ensures that $\alpha_T\left( \hat{\theta}^1 \right) = c\left( \hat{\theta}^1 \right) T^{-g(C)}$ and is necessarily satisfied as $T$ goes to infinity and $\delta$ goes to zero. The following results can further be proved.
Theorem 3 Let $\hat{\theta}^1$ be a $\sqrt{T}$-consistent estimator of $\theta_0$. Then under Assumptions 1 to 5, $\frac{\alpha_T(\hat{\theta}^1)}{\alpha_T(\theta_0)} - 1$ converges in probability to zero as $T$ goes to infinity.

In Theorem 3, the function $\alpha_T(\cdot)$ is deterministic and continuous but the argument $\hat{\theta}^1$ is stochastic. As $T$ goes to infinity, $\hat{\theta}^1$ gets closer and closer to $\theta_0$; but at the same time $\alpha_T(\theta_0)$ converges to zero at some rate that depends on $T$. This prevents us from claiming without caution that $\frac{\alpha_T(\hat{\theta}^1)}{\alpha_T(\theta_0)} - 1 = o_p(1)$, since the denominator is not bounded away from zero. The next theorem characterizes the rate of convergence of $\frac{\hat{\alpha}_{T,M}(\theta_0)}{\alpha_T(\theta_0)}$.
Theorem 4 Under Assumptions 1 to 5, $\frac{\hat{\alpha}_{T,M}(\theta_0)}{\alpha_T(\theta_0)} - 1$ converges in probability to zero at rate $M^{-1/2}$ as $M$ goes to infinity and $T$ is fixed.

In Theorem 4, $\hat{\alpha}_{T,M}(\theta_0)$ is the minimum of the empirical MSE simulated with the true $\theta_0$. In the proof, one first shows that the conditions for the uniform convergence in probability of the empirical MSE are satisfied. Next, one uses Theorem 2.1 of Newey and McFadden (1994) and the fact that $\alpha_T(\theta_0)$ is bounded away from zero for any finite $T$ to establish the consistency of $\frac{\hat{\alpha}_{T,M}(\theta_0)}{\alpha_T(\theta_0)}$. In the next theorem, we revisit the previous results when $\theta_0$ is replaced by a consistent estimator $\hat{\theta}^1$.

Theorem 5 Let $\hat{\theta}^1$ be a $\sqrt{T}$-consistent estimator of $\theta_0$. Then under Assumptions 1 to 5, $\frac{\hat{\alpha}_{T,M}(\hat{\theta}^1)}{\alpha_T(\theta_0)} - 1 = O_p\left( T^{-1/2} \right) + O_p\left( M^{-1/2} \right)$ as $M$ goes to infinity first and $T$ goes to infinity second.

The result of Theorem 5 is obtained by using a sequential limit in $M$ and $T$, which is needed here because Theorem 4 has been derived for fixed $T$. Such a sequential approach is often used in panel data econometrics, see for instance Phillips and Moon (1999). It is also used implicitly in the theoretical analysis of the bootstrap.³ Theorem 5 implies that $\hat{\alpha}_{T,M}\left( \hat{\theta}^1 \right)$ benefits from an increase in both $M$ and $T$.

The last theorem compares the feasible CGMM estimator based on $\hat{\alpha}_{T,M}$ to the unfeasible estimator $\hat{\theta}(\alpha_T)$, where $\alpha_T$ is defined in (14).

Theorem 6 Let $\hat{\alpha}_{T,M} = \hat{\alpha}_{T,M}\left( \hat{\theta}^1 \right)$ as defined in (22). Then:

$$\sqrt{T}\left( \hat{\theta}(\hat{\alpha}_{T,M}) - \hat{\theta}(\alpha_T) \right) = O_p\left( T^{-g(C)} \right),$$

provided that $M \geq T$.

³ The properties of a bootstrap estimator are usually derived using its bootstrap distribution, hence letting $M$ go to infinity before $T$.

Hence, Theorem 6 implies that the distribution of $\sqrt{T}\left( \hat{\theta}(\hat{\alpha}_{T,M}) - \theta_0 \right)$ is the same as the distribution of $\sqrt{T}\left( \hat{\theta}(\alpha_T) - \theta_0 \right)$ up to the order $T^{-g(C)}$. This ensures that replacing $\alpha_T$ by a consistent estimator $\hat{\alpha}_{T,M}$ such that $\frac{\hat{\alpha}_{T,M}}{\alpha_T} - 1 = o_p(1)$ does not affect the consistency, asymptotic normality and efficiency of the final CGMM estimator $\hat{\theta}(\hat{\alpha}_{T,M})$. The proof of this theorem relies mainly on the fact that $\hat{\theta}(\alpha)$ is continuously differentiable with respect to $\alpha$ whilst the optimal $\alpha_T$ is bounded away from zero for any finite $T$. Overall, our selection procedure for the regularization parameter is optimal and adaptive, as it does not require a priori knowledge of the regularity parameter $C$.
5 Monte Carlo Simulations

The aim of this simulation study is to investigate the properties of the MSE function $\hat{\Sigma}_{T,M}\left( \alpha;\hat{\theta}_T^1;\delta \right)$ as the regularization parameter ($\alpha$), the sample size ($T$) and the number of replications ($M$) vary. For this purpose, we consider estimating the parameters of a square-root diffusion (also known as the CIR diffusion) by CGMM. Below, the first subsection describes the simulation design whilst the second subsection presents the simulation results.
5.1 Simulation Design

A continuous-time process $r_t$ is said to follow a CIR diffusion if it obeys the following stochastic differential equation:

$$dr_t = \kappa(\mu - r_t) dt + \sigma \sqrt{r_t} \, dW_t, \qquad (23)$$

where the parameter $\kappa > 0$ is the strength of the mean reversion in the process, $\mu > 0$ is the long-run mean and $\sigma > 0$ controls the volatility of $r_t$. This model has been widely used in the asset pricing literature, see e.g. Heston (1993) or Singleton (2001). It is shown in Feller (1951) that equation (23) admits a unique and positive fundamental solution if $\sigma^2 \leq 2\kappa\mu$.

We assume that $r_t$ is observed at regularly spaced discrete times $t_1, t_2, \dots, t_T$ such that $t_i - t_{i-1} = \Delta$. The conditional distribution of $r_t$ given $r_{t-1}$ is a noncentral chi-square with possibly fractional order. Its transition density involves a Bessel function of the first kind and can be represented as an infinite mixture of Gamma densities with Poisson weights:

$$f(r_t \mid r_{t-1}) = \sum_{j=0}^{\infty} p_j \frac{r_t^{j+q-1} c^{j+q} \exp(-c r_t)}{\Gamma(j+q)},$$

where $c = \frac{2\kappa}{\sigma^2 \left( 1 - e^{-\kappa} \right)}$, $q = \frac{2\kappa\mu}{\sigma^2}$ and $p_j = \frac{\left( c e^{-\kappa} r_{t-1} \right)^j \exp\left( -c e^{-\kappa} r_{t-1} \right)}{j!}$.

To implement a likelihood-based inference for this model, one has to truncate the expression of $f(r_t \mid r_{t-1})$. However, the conditional CF of $r_t$ has a simple closed-form expression given by:

$$\varphi_t(s;\theta) \equiv E\left( e^{isr_t} \mid r_{t-1} \right) = \left( 1 - \frac{is}{c} \right)^{-q} \exp\left( \frac{is e^{-\kappa} r_{t-1}}{1 - is/c} \right), \qquad (24)$$

with $\theta = (\kappa, \mu, \sigma)'$.

To start, we simulate one sample of size $T$ from the CIR process assuming $\Delta = 1$ and the true value of $\theta$:

$$\theta_0 = (\kappa_0, \mu_0, \sigma_0) = (0.4, \ 6.0, \ 0.3).$$

These parameter values are taken from Singleton (2001). We refer the reader to Devroye (1986) and Zhou (2001) for details on how to simulate a CIR process.
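Because the CIR transition is a scaled noncentral chi-square, both the exact simulation step and the conditional CF (24) are straightforward to code. The sketch below is our own implementation of this standard approach (with $\Delta = 1$), not code from the paper:

```python
import numpy as np

def simulate_cir(theta, T, r0=None, rng=None):
    """Exact simulation of the CIR process at unit time steps via the
    noncentral chi-square transition; theta = (kappa, mu, sigma)."""
    kappa, mu, sigma = theta
    rng = rng or np.random.default_rng()
    c = 2 * kappa / (sigma ** 2 * (1 - np.exp(-kappa)))
    df = 4 * kappa * mu / sigma ** 2            # degrees of freedom (= 2q)
    r = np.empty(T)
    r[0] = mu if r0 is None else r0             # start at the long-run mean
    for t in range(1, T):
        nc = 2 * c * np.exp(-kappa) * r[t - 1]  # noncentrality parameter
        r[t] = rng.noncentral_chisquare(df, nc) / (2 * c)
    return r

def cir_ccf(s, theta, r_prev):
    """Conditional CF (24) of r_t given r_{t-1}."""
    kappa, mu, sigma = theta
    c = 2 * kappa / (sigma ** 2 * (1 - np.exp(-kappa)))
    q = 2 * kappa * mu / sigma ** 2
    z = 1 - 1j * s / c
    return z ** (-q) * np.exp(1j * s * np.exp(-kappa) * r_prev / z)
```

For example, `simulate_cir((0.4, 6.0, 0.3), 1001)` would reproduce the largest design considered below.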
We treat this simulated sample as the actual data available to the econometrician and use it to compute the first-step CGMM estimator $\hat{\theta}_T^1$ as:

$$\hat{\theta}_T^1 = \arg\min_{\theta} \int_{\mathbb{R}^2} \hat{h}_T(\tau;\theta) \overline{\hat{h}_T(\tau;\theta)} \, e^{-\tau'\tau} d\tau,$$

where $\hat{h}_T(\tau;\theta) = \frac{1}{T-1} \sum_{i=2}^{T} \left( e^{i\tau_1 r_{t_i}} - \varphi_t(\tau_1;\theta) \right) e^{i\tau_2 r_{t_{i-1}}}$, $\tau = (\tau_1, \tau_2) \in \mathbb{R}^2$ and $\varphi_t(\cdot;\theta)$ is given by (24).

Next, we simulate $M$ samples of size $T$ using $\hat{\theta}_T^1$ as the pseudo-true parameter value. Each simulated sample is used to compute the second-step CGMM estimator $\hat{\theta}_{T,j}(\alpha)$ as:

$$\hat{\theta}_{T,j}(\alpha) = \arg\min_{\theta} \int_{\mathbb{R}^2} \left( K_{\alpha T}^{-1} \hat{h}_T \right)(\tau;\theta) \overline{\hat{h}_T(\tau;\theta)} \, e^{-\tau'\tau} d\tau. \qquad (25)$$

The objective function (25) is evaluated using a Gauss-Hermite quadrature with ten points. The regularization parameter $\alpha$ is selected on a thirty-point grid that lies between $10^{-10}$ and $10^{-2}$, that is:

$$\alpha \in \left[ 10^{-10}, \ 2.5 \times 10^{-10}, \ 5 \times 10^{-10}, \ 7.5 \times 10^{-10}, \ 1 \times 10^{-9}, \ \dots, \ 1 \times 10^{-2} \right].$$

For each $\alpha$ in this grid, we compute the MSE using equation (21) (i.e., no truncation of the distribution of $\left\| \hat{\theta}_{T,j}(\alpha) - \hat{\theta}_T^1 \right\|^2$).
5.2 Simulation results

Table 1 reports the simulation results for $T = 251$, $501$, $751$ and $1001$ and for two different values of $M$. For a given sample size $T$, the scenarios with $M = 500$ and $M = 1000$ use common random numbers (i.e., the results for $M = 500$ are based on the first 500 replications of the scenarios with $M = 1000$). Curiously enough, the estimate $\hat{\alpha}_{T,M}\left( \hat{\theta}^1 \right)$ is consistently equal to $2.5 \times 10^{-6}$ across all scenarios except $(T = 251, M = 1000)$ and $(T = 1001, M = 500)$. This result might suggest that the grid on which $\alpha$ is selected is not refined enough. Indeed, the value that is immediately smaller than $2.5 \times 10^{-6}$ on that grid is $1.0 \times 10^{-6}$, which is selected for the scenario $(T = 1001, M = 500)$. Arguably, the results suggest that the rate of convergence of $\alpha_T(\theta_0)$ to zero is quite slow for this particular data generating process. Overall, $2.5 \times 10^{-6}$ seems a reasonable choice of regularization parameter at all sample sizes for this data generating process. Note that our simulation results do not allow us to infer the behavior of $\alpha_T(\theta_0)$ as $\theta_0$ varies in the parameter space.
Figure 1 presents the simulated MSE curves. For all eight scenarios, these curves are convex and have one minimum. The hump-shaped left tail of the MSE curves for $T = 251$ stems from the fact that the approximating matrix of the covariance operator (see Appendix D) is severely ill-conditioned. Hence, the shape of the MSE curve reflects the distortions inflicted on the eigenvalues of the regularized inverse of this approximating matrix as $\alpha$ varies. A smaller number of quadrature points should be used for smaller sample sizes in order to mitigate this ill-conditioning and obtain perfectly convex MSE curves.
Table 1: Estimation of $\alpha_T$ for different sample sizes.

                 M = 500                              M = 1000
  T              $\hat{\alpha}_{T,M}(\hat{\theta}^1)$   $\hat{\Sigma}_{T,M}/T$   $\hat{\alpha}_{T,M}(\hat{\theta}^1)$   $\hat{\Sigma}_{T,M}/T$
  T = 251        2.5 x 10^-6        0.0270             7.5 x 10^-7        0.0289
  T = 501        2.5 x 10^-6        0.0114             2.5 x 10^-6        0.0119
  T = 751        2.5 x 10^-6        0.0066             2.5 x 10^-6        0.0065
  T = 1001       1.0 x 10^-6        0.0057             2.5 x 10^-6        0.0053
[Figure 1: MSE curves of the CGMM estimator for different $M$ and $T$. Four panels (sample sizes $T = 251$, $501$, $751$ and $1001$), each plotting the mean square error against $\log_{10}(\alpha)$ for $M = 500$ and $M = 1000$ replications. The vertical axis shows $\frac{1}{T}\hat{\Sigma}_{T,M} = \frac{1}{M}\sum_{j=1}^{M} \Lambda_{j,T}(\alpha;\hat{\theta}_T^1)$.]

6 Conclusion
The objective of this paper is to provide a method to optimally select the regularization parameter, denoted $\alpha$, in the CGMM estimation. First, we derive a higher-order expansion of the CGMM estimator that sheds light on how the finite sample MSE depends on the regularization parameter. We obtain the convergence rate for the optimal regularization parameter $\alpha_T$ by equating the rates of two higher-order variance terms. We find an expression of the form $\alpha_T = c(\theta_0) T^{-g(C)}$, where $c(\theta_0)$ does not depend on the sample size $T$ and $0 < g(C) < 1/2$, where $C$ is the regularity of the moment function with respect to the covariance operator (see Assumption 4).

Next, we propose an estimation procedure for $\alpha_T$ that relies on the minimization of an approximate MSE criterion obtained by Monte Carlo simulations. The proposed estimator, $\hat{\alpha}_{T,M}$, is indexed by the sample size $T$ and the number of Monte Carlo replications $M$. To hedge against situations where the MSE is not finite, we propose to base the selection of $\alpha_T$ on a truncated MSE that is always finite. Under the assumption that the truncation scheme does not alter the rate of $\alpha_T$, $\hat{\alpha}_{T,M}$ is consistent for $\alpha_T$ as $T$ and $M$ increase to infinity. Our simulation-based selection procedure has the advantage of being easily applicable to other estimators; for instance, it could be used to select the number of polynomial terms in the efficient method of moments procedure of Gallant and Tauchen (1996). The optimal selection of the regularization parameter makes it possible to devise a fully feasible CGMM estimator that is a real alternative to the maximum likelihood estimator.
References

[1] Ait-Sahalia, Y. and R. Kimmel (2007), "Maximum likelihood estimation of stochastic volatility models," Journal of Financial Economics, 83, 413-452.
[2] Andrews, D. W. K. (1991), "Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimation," Econometrica, 59(3), 817-858.
[3] Buse, A. (1992), "The Bias of Instrumental Variable Estimators," Econometrica, 60(1), 173-180.
[4] Carrasco, M. (2012), "A regularization approach to the many instruments problem," Journal of Econometrics, 170(2), 383-398.
[5] Carrasco, M., Chernov, M., Florens, J. P. and E. Ghysels (2007), "Efficient estimation of general dynamic models with a continuum of moment conditions," Journal of Econometrics, 140, 529-573.
[6] Carrasco, M. and J. P. Florens (2000), "Generalization of GMM to a continuum of moment conditions," Econometric Theory, 16, 797-834.
[7] Carrasco, M., J. P. Florens, and E. Renault (2007), "Linear Inverse Problems in Structural Econometrics: Estimation Based on Spectral Decomposition and Regularization," in Handbook of Econometrics, Vol. 6, edited by J. J. Heckman and E. E. Leamer.
[8] Chacko, G. and L. Viceira (2003), "Spectral GMM estimation of continuous-time processes," Journal of Econometrics, 116, 259-292.
[9] Devroye, L. (1986), Non-Uniform Random Variate Generation, Springer-Verlag.
[10] Dominguez, M. A. and I. N. Lobato (2004), "Consistent Estimation of Models Defined by Conditional Moment Restrictions," Econometrica, 72(5), 1601-1615.
[11] Donald, S. and W. Newey (2001), "Choosing the number of instruments," Econometrica, 69, 1161-1191.
[12] Duffie, D. and K. Singleton (1993), "Simulated Moments Estimation of Markov Models of Asset Prices," Econometrica, 61, 929-952.
[13] Feller, W. (1951), "Two Singular Diffusion Problems," Annals of Mathematics, 54, 173-182.
[14] Feuerverger, A. (1990), "An efficiency result for the empirical characteristic function in stationary time-series models," Canadian Journal of Statistics, 18, 155-161.
[15] Feuerverger, A. and P. McDunnough (1981a), "On Efficient Inference in Symmetric Stable Laws and Processes," in Csorgo, M., ed., Statistics and Related Topics, New York: North Holland, 109-122.
[16] Feuerverger, A. and P. McDunnough (1981b), "On Some Fourier Methods for Inference," Journal of the American Statistical Association, 76, 379-387.
[17] Feuerverger, A. and P. McDunnough (1981c), "On the Efficiency of Empirical Characteristic Function Procedures," Journal of the Royal Statistical Society, Series B, 43, 20-27.
[18] Feuerverger, A. and R. Mureika (1977), "The empirical characteristic function and its applications," The Annals of Statistics, 5(1), 88-97.
[19] Gallant, A. R. and G. Tauchen (1996), "Which Moments to Match?" Econometric Theory, 12, 657-681.
[20] Gouriéroux, C. and A. Monfort (1996), Simulation Based Econometric Methods, CORE Lectures, Oxford University Press, New York.
[21] Gouriéroux, C., A. Monfort and E. Renault (1993), "Indirect Inference," Journal of Applied Econometrics, 8, S85-S118.
[22] Hansen, L. (1982), "Large Sample Properties of Generalized Method of Moments Estimators," Econometrica, 50, 1029-1054.
[23] Heston, S. (1993), "A Closed-Form Solution for Options with Stochastic Volatility with Applications to Bond and Currency Options," The Review of Financial Studies, 6(2), 327-343.
[24] Jacho-Chavez, D. T. (2010), "Optimal Bandwidth Choice for Estimation of Inverse Conditional-Density-Weighted Expectations," Econometric Theory, 26, 94-118.
[25] Jiang, G. and J. Knight (2002), "Estimation of Continuous Time Processes via the Empirical Characteristic Function," Journal of Business and Economic Statistics, 20, 198-212.
[26] Koenker, R., Machado, J. A. F., Skeels, C. L. and A. H. Welsh (1994), "Momentary Lapses: Moment Expansions and the Robustness of Minimum Distance Estimators," Econometric Theory, 10, 172-197.
[27] Koutrouvelis, I. A. (1980), "Regression-type estimation of the parameters of stable laws," Journal of the American Statistical Association, 75(372), 918-928.
[28] Lavergne, P. and V. Patilea (2008), "Smooth minimum distance estimation and testing with conditional estimating equations: uniform in bandwidth theory," Working paper, available at ideas.repec.org.
[29] Linton, O. (2002), "Edgeworth Approximation for Semiparametric Instrumental Variable Estimators and Test Statistics," Journal of Econometrics, 106, 325-368.
[30] Liu, Q. and D. A. Pierce (1994), "A Note on Gauss-Hermite Quadrature," Biometrika, 81(3), 624-629.
[31] Madan, D. B. and E. Seneta (1990), "The Variance Gamma Model for Share Market Returns," Journal of Business, 63(4), 511-524.
[32] Nagar, A. L. (1959), "The Bias and Moment Matrix of the General k-Class Estimators of the Parameters in Simultaneous Equations," Econometrica, 27, 573-595.
[33] Newey, W. K. and D. McFadden (1994), "Large Sample Estimation and Hypothesis Testing," in Handbook of Econometrics, Vol. IV, edited by R. F. Engle and D. L. McFadden.
[34] Newey, W. K. and R. J. Smith (2004), "Higher Order Properties of GMM and Generalized Empirical Likelihood Estimators," Econometrica, 72(1), 219-255.
[35] Nolan, J. P. (2009), Stable Distributions: Models for Heavy Tailed Data, Birkhauser, Boston. In preparation.
[36] Paulson, A. S., Holcomb, W. E. and R. A. Leitch (1975), "The estimation of the parameters of the stable laws," Biometrika, 62(1), 163-170.
[37] Phillips, P. C. B. and H. R. Moon (1999), "Linear Regression Limit Theory for Nonstationary Panel Data," Econometrica, 67(5), 1057-1112.
[38] Rilstone, P., Srivastava, V. K. and A. Ullah (1996), "The Second-Order Bias and Mean-Squared Error of Nonlinear Estimators," Journal of Econometrics, 75, 369-395.
[39] Rothenberg, T. J. (1983), "Asymptotic Properties of Some Estimators in Structural Models," in Studies in Econometrics, Time Series and Multivariate Statistics, edited by S. Karlin, T. Amemiya and L. A. Goodman, New York: Academic Press.
[40] Rothenberg, T. J. (1984), "Approximating the Distributions of Econometric Estimators and Test Statistics," in Handbook of Econometrics, Vol. 2, edited by Z. Griliches and M. D. Intriligator, New York: North-Holland.
[41] Singleton, K. (2001), "Estimation of Affine Pricing Models Using the Empirical Characteristic Function," Journal of Econometrics, 102, 111-141.
[42] Yu, J. (2004), "Empirical Characteristic Function Estimation and Its Applications," Econometric Reviews, 23(2), 93-123.
[43] Zhou, H. (2001), "Finite Sample Properties of EMM, GMM, QMLE, and MLE for a Square-Root Interest Rate Diffusion Model," Mimeo, available at http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.26.8099.
Appendix

A Some basic properties of the covariance operator

For more formal proofs of the results mentioned in this appendix, see Carrasco, Florens and Renault (2007). Let $K$ be the covariance operator defined in (5) and (6), and $h_t(\tau;\theta)$ the moment function defined in (1) and (2). Finally, let $\Phi_C$ be the subset of $L^2(\pi)$ defined in Assumption 4.

Definition 7 The range of $K$, denoted $R(K)$, is the set of functions $g$ such that $Kf = g$ for some $f$ in $L^2(\pi)$.

Proposition 8 $R(K)$ is a subspace of $L^2(\pi)$.

Note that the kernel functions $k(s,\cdot)$ and $k(\cdot,r)$ are elements of $L^2(\pi)$ because

$$|k(s,r)|^2 = \left| E\left[ h_t(\theta;s) \overline{h_t(\theta;r)} \right] \right|^2 \leq 4, \quad \forall (s,r) \in \mathbb{R}^{2p}. \qquad (1)$$

Thus, for any $f \in L^2(\pi)$, we have by the Cauchy-Schwarz inequality

$$|Kf(s)|^2 = \left| \int k(s,r) f(r) \pi(r) dr \right|^2 \leq \int |k(s,r)|^2 \pi(r) dr \int |f(r)|^2 \pi(r) dr \leq 4 \int |f(r)|^2 \pi(r) dr < \infty,$$

implying

$$\|Kf\|^2 = \int |Kf(s)|^2 \pi(s) ds < \infty \Rightarrow Kf \in L^2(\pi).$$

Definition 9 The null space of $K$, denoted $N(K)$, is the set of functions $f$ in $L^2(\pi)$ such that $Kf = 0$.

The covariance operator $K$ associated with a moment function based on the CF is such that $N(K) = \{0\}$. See CCFG (2007).

Definition 10 $\phi$ is an eigenfunction of $K$ associated with eigenvalue $\lambda$ if and only if $K\phi = \lambda\phi$.

Proposition 11 Suppose $\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_j \geq \dots$ are the eigenvalues of $K$. Then the sequence $\{\lambda_j\}$ satisfies: (i) $\lambda_j > 0$ for all $j$, (ii) $\lambda_1 < \infty$ and $\lim_{j\to\infty} \lambda_j = 0$.

Remark. The covariance operator associated with the CF-based moment function is necessarily compact.

Proposition 12 Every $f \in L^2(\pi)$ can be decomposed as $f = \sum_{j=1}^{\infty} \langle f, \phi_j \rangle \phi_j$. As a consequence, $Kf = \sum_{j=1}^{\infty} \langle f, \phi_j \rangle K\phi_j = \sum_{j=1}^{\infty} \lambda_j \langle f, \phi_j \rangle \phi_j$.

Proposition 13 If $0 < C_1 \leq C_2$, then $\Phi_{C_2} \subseteq \Phi_{C_1}$.

We recall that $\Phi_C$ is the set of functions $f$ such that $\left\| K^{-C} f \right\| < \infty$. In fact, $f \in R(K^{C_2})$ implies that $K^{-C_2} f$ exists and $\left\| K^{-C_2} f \right\|^2 = \sum_{j=1}^{\infty} \lambda_j^{-2C_2} \left| \langle f, \phi_j \rangle \right|^2 < \infty$. Thus if $f \in R(K^{C_2})$, we have:

$$\left\| K^{-C_1} f \right\|^2 = \sum_{j=1}^{\infty} \lambda_j^{-2C_1} \left| \langle f, \phi_j \rangle \right|^2 = \sum_{j=1}^{\infty} \lambda_j^{2(C_2 - C_1)} \lambda_j^{-2C_2} \left| \langle f, \phi_j \rangle \right|^2 \leq \lambda_1^{2(C_2 - C_1)} \sum_{j=1}^{\infty} \lambda_j^{-2C_2} \left| \langle f, \phi_j \rangle \right|^2 < \infty,$$

which implies that $K^{-C_1} f$ exists, hence $f \in R(K^{C_1})$. This means $R(K) \subseteq R(K^{1/2})$, so that the function $K^{-1/2} f$ is defined on a wider subset of $L^2(\pi)$ than $K^{-1} f$. When $f \in \Phi_1$, $\left\langle K^{-1/2} f, K^{-1/2} f \right\rangle = \left\langle K^{-1} f, f \right\rangle$. But when $f \in \Phi_C$ for $1/2 \leq C < 1$, the quadratic form $\left\langle K^{-1/2} f, K^{-1/2} f \right\rangle$ is well defined while $\left\langle K^{-1} f, f \right\rangle$ is not.
B Expansion of the MSE and proofs of Theorems 1 and 2

B.1 Preliminary results and proof of Theorem 1
Lemma 14 Let $K_\alpha^{-1} = (K^2 + \alpha I)^{-1} K$ and assume that $f \in \Phi_C$ for some $C > 1$. Then as $\alpha$ goes to zero and $T$ goes to infinity, we have:

$$\left\| K_{\alpha T}^{-1} - K_\alpha^{-1} \right\| = O_p\left( \alpha^{-3/2} T^{-1/2} \right), \qquad (2)$$
$$\left\| \left( K_{\alpha T}^{-1} - K_\alpha^{-1} \right) f \right\| = O_p\left( \alpha^{-1} T^{-1/2} \right), \qquad (3)$$
$$\left\| \left( K_\alpha^{-1} - K^{-1} \right) f \right\| = O\left( \alpha^{\min\left(1, \frac{C-1}{2}\right)} \right), \qquad (4)$$
$$\left\langle \left( K_\alpha^{-1} - K^{-1} \right) f, f \right\rangle = O\left( \alpha^{\min\left(1, \frac{2C-1}{2}\right)} \right). \qquad (5)$$
Proof of Lemma 14. Throughout, $\phi_j$, $j = 1, 2, \dots$, denote the eigenfunctions of the covariance operator $K$ associated respectively with the eigenvalues $\lambda_j$, $j = 1, 2, \dots$. We first consider (2). By the triangle inequality:

$$\left\| (K_T^2 + \alpha I)^{-1} K_T - (K^2 + \alpha I)^{-1} K \right\| \leq \left\| (K_T^2 + \alpha I)^{-1} (K_T - K) \right\| + \left\| (K_T^2 + \alpha I)^{-1} K - (K^2 + \alpha I)^{-1} K \right\|$$
$$\leq \underbrace{\left\| (K_T^2 + \alpha I)^{-1} \right\|}_{\leq \alpha^{-1}} \underbrace{\left\| K_T - K \right\|}_{= O_p(T^{-1/2})} + \left\| \left[ (K_T^2 + \alpha I)^{-1} - (K^2 + \alpha I)^{-1} \right] K \right\|,$$

where $\left\| K_T - K \right\| = O_p\left( T^{-1/2} \right)$ follows from Proposition 3.3 (i) of CCFG (2007). We have:

$$\left\| \left[ (K_T^2 + \alpha I)^{-1} - (K^2 + \alpha I)^{-1} \right] K \right\| = \left\| (K_T^2 + \alpha I)^{-1} \left( K^2 - K_T^2 \right) (K^2 + \alpha I)^{-1} K \right\|$$
$$\leq \underbrace{\left\| (K_T^2 + \alpha I)^{-1} \right\|}_{\leq \alpha^{-1}} \underbrace{\left\| K^2 - K_T^2 \right\|}_{= O_p(T^{-1/2})} \underbrace{\left\| (K^2 + \alpha I)^{-1/2} \right\|}_{\leq \alpha^{-1/2}} \underbrace{\left\| (K^2 + \alpha I)^{-1/2} K \right\|}_{\leq 1}.$$

This proves (2).

The difference between (2) and (3) is that in (3) we exploit the fact that $f \in \Phi_C$ with $C > 1$, hence $\left\| K^{-1} f \right\| < \infty$. We can rewrite (3) as

$$\left\| \left( K_{\alpha T}^{-1} - K_\alpha^{-1} \right) f \right\| = \left\| \left( K_{\alpha T}^{-1} - K_\alpha^{-1} \right) K K^{-1} f \right\| \leq \left\| \left( K_{\alpha T}^{-1} - K_\alpha^{-1} \right) K \right\| \left\| K^{-1} f \right\|.$$

We have

$$K_{\alpha T}^{-1} K - K_\alpha^{-1} K = (K_T^2 + \alpha I)^{-1} K_T K - (K^2 + \alpha I)^{-1} K^2$$
$$= (K_T^2 + \alpha I)^{-1} (K_T - K) K \qquad (6)$$
$$\quad + \left[ (K_T^2 + \alpha I)^{-1} - (K^2 + \alpha I)^{-1} \right] K^2. \qquad (7)$$

The term (6) can be bounded in the following manner:

$$\left\| (K_T^2 + \alpha I)^{-1} (K_T - K) K \right\| \leq \underbrace{\left\| (K_T^2 + \alpha I)^{-1} \right\|}_{\leq \alpha^{-1}} \underbrace{\left\| K_T - K \right\|}_{= O_p(T^{-1/2})} \left\| K \right\| = O_p\left( \alpha^{-1} T^{-1/2} \right).$$

For the term (7), we use the identity $A^{-1} - B^{-1} = A^{-1} (B - A) B^{-1}$. It follows that

$$\left\| \left[ (K_T^2 + \alpha I)^{-1} - (K^2 + \alpha I)^{-1} \right] K^2 \right\| = \left\| (K_T^2 + \alpha I)^{-1} \left( K^2 - K_T^2 \right) (K^2 + \alpha I)^{-1} K^2 \right\|$$
$$\leq \underbrace{\left\| (K_T^2 + \alpha I)^{-1} \right\|}_{\leq \alpha^{-1}} \underbrace{\left\| K^2 - K_T^2 \right\|}_{= O_p(T^{-1/2})} \underbrace{\left\| (K^2 + \alpha I)^{-1} K^2 \right\|}_{\leq 1} = O_p\left( \alpha^{-1} T^{-1/2} \right).$$

This proves (3).

Now we turn our attention toward equation (4). We can write

$$(K^2 + \alpha I)^{-1} K f - K^{-1} f = \sum_{j=1}^{\infty} \left[ \frac{\lambda_j}{\alpha + \lambda_j^2} - \frac{1}{\lambda_j} \right] \langle f, \phi_j \rangle \phi_j = \sum_{j=1}^{\infty} \left( \frac{\lambda_j^2}{\alpha + \lambda_j^2} - 1 \right) \frac{\langle f, \phi_j \rangle}{\lambda_j} \phi_j.$$

We now take the norm:

$$(4) = \left\| (K^2 + \alpha I)^{-1} K f - K^{-1} f \right\| = \left( \sum_{j=1}^{\infty} \left( \frac{\alpha}{\alpha + \lambda_j^2} \right)^2 \frac{\left| \langle f, \phi_j \rangle \right|^2}{\lambda_j^2} \right)^{1/2}$$
$$= \left( \sum_{j=1}^{\infty} \left( \frac{\alpha \lambda_j^{C-1}}{\alpha + \lambda_j^2} \right)^2 \frac{\left| \langle f, \phi_j \rangle \right|^2}{\lambda_j^{2C}} \right)^{1/2} \leq \sup_{j \geq 1} \frac{\alpha \lambda_j^{C-1}}{\alpha + \lambda_j^2} \left( \sum_{j=1}^{\infty} \frac{\left| \langle f, \phi_j \rangle \right|^2}{\lambda_j^{2C}} \right)^{1/2}.$$

Recall that as $K$ is a compact operator, its largest eigenvalue $\lambda_1$ is bounded. We need to find an equivalent to

$$\sup_{\lambda \leq \lambda_1} \frac{\alpha \lambda^{C-1}}{\alpha + \lambda^2}. \qquad (8)$$

Case where $1 \leq C \leq 3$: we apply the change of variables $x = \alpha / \lambda^2$, which gives $\frac{\alpha \lambda^{C-1}}{\alpha + \lambda^2} = \alpha^{(C-1)/2} \frac{x^{(3-C)/2}}{1+x}$. An equivalent to (8) is therefore $\alpha^{C/2 - 1/2}$ provided that $g(x) \equiv \frac{x^{(3-C)/2}}{1+x}$ is bounded on $\mathbb{R}^+$. Note that $g(x)$ is continuous and therefore bounded on any compact interval of $(0, +\infty)$. It goes to 0 at $+\infty$ and its limit at 0 also equals 0 for $1 \leq C < 3$. For $C = 3$, we have $g(x) = \frac{1}{1+x}$; then $g(x)$ goes to 1 at $x = 0$ and to 0 at $+\infty$.

Case where $C > 3$: we rewrite the left-hand side of (8) as

$$\frac{\alpha \lambda^{C-1}}{\alpha + \lambda^2} = \alpha \lambda^{C-3} \underbrace{\frac{\lambda^2}{\alpha + \lambda^2}}_{\in (0,1)} \leq \alpha \lambda_1^{C-3} = O(\alpha).$$

To summarize, we have for $f \in \Phi_C$: $(4) = O\left( \alpha^{\min\left(1, \frac{C-1}{2}\right)} \right)$.

Finally, we consider (5). We have:

$$(5) = \sum_{j} \left( \frac{1}{\lambda_j} - \frac{\lambda_j}{\lambda_j^2 + \alpha} \right) \left| \langle f, \phi_j \rangle \right|^2 = \sum_{j} \lambda_j^{2C-1} \left( 1 - \frac{\lambda_j^2}{\lambda_j^2 + \alpha} \right) \frac{\left| \langle f, \phi_j \rangle \right|^2}{\lambda_j^{2C}}$$
$$= \sum_{j} \lambda_j^{2C-1} \frac{\alpha}{\lambda_j^2 + \alpha} \frac{\left| \langle f, \phi_j \rangle \right|^2}{\lambda_j^{2C}} \leq \sup_{\lambda \leq \lambda_1} \left( \lambda^{2C-1} \frac{\alpha}{\lambda^2 + \alpha} \right) \sum_{j} \frac{\left| \langle f, \phi_j \rangle \right|^2}{\lambda_j^{2C}}.$$

For $C \geq 3/2$, we have $\sup_{\lambda \leq \lambda_1} \lambda^{2C-1} \frac{\alpha}{\lambda^2 + \alpha} \leq \lambda_1^{2C-3} \alpha = O(\alpha)$. For $C < 3/2$, we apply the change of variables $x = \lambda^2 / \alpha$ and obtain $\sup_{x \geq 0} \frac{x^{(2C-1)/2}}{1+x} \alpha^{(2C-1)/2} = O\left( \alpha^{\frac{2C-1}{2}} \right)$, as $\frac{x^{(2C-1)/2}}{1+x}$ is bounded on $\mathbb{R}^+$. Finally: $(5) = O\left( \alpha^{\min\left(1, \frac{2C-1}{2}\right)} \right)$. ∎
Lemma 15 Suppose we have a particular function $f(\theta) \in \Phi_C$ for some $C > 1$, and a sequence of functions $f_T(\theta) \in \Phi_C$ such that $\sup_{\theta \in \Theta} \left\| f_T(\theta) - f(\theta) \right\| = O_p\left( T^{-1/2} \right)$. Then as $\alpha$ goes to zero, we have

$$\sup_{\theta \in \Theta} \left\| K_{\alpha T}^{-1} f_T(\theta) - K^{-1} f(\theta) \right\| = O_p\left( \alpha^{-1} T^{-1/2} \right) + O\left( \alpha^{\min\left(1, \frac{C-1}{2}\right)} \right).$$

Proof of Lemma 15. We have

$$\sup_{\theta \in \Theta} \left\| K_{\alpha T}^{-1} f_T(\theta) - K^{-1} f(\theta) \right\| \leq B_1 + B_2,$$

with

$$B_1 = \sup_{\theta \in \Theta} \left\| K_{\alpha T}^{-1} \left( f_T(\theta) - f(\theta) \right) \right\| \quad \text{and} \quad B_2 = \sup_{\theta \in \Theta} \left\| \left( K_{\alpha T}^{-1} - K^{-1} \right) f(\theta) \right\|.$$

We have

$$B_1 \leq \underbrace{\left\| \left( \alpha I + K_T^2 \right)^{-1/2} \right\|}_{\leq \alpha^{-1/2}} \underbrace{\left\| \left( \alpha I + K_T^2 \right)^{-1/2} K_T \right\|}_{\leq 1} \underbrace{\sup_{\theta \in \Theta} \left\| f_T(\theta) - f(\theta) \right\|}_{= O_p(T^{-1/2})} = O_p\left( \alpha^{-1/2} T^{-1/2} \right).$$

On the other hand, Lemma 14 implies that:

$$B_2 \leq \sup_{\theta \in \Theta} \left\| \left( K_{\alpha T}^{-1} - K_\alpha^{-1} \right) f(\theta) \right\| + \sup_{\theta \in \Theta} \left\| \left( K_\alpha^{-1} - K^{-1} \right) f(\theta) \right\| = O_p\left( \alpha^{-1} T^{-1/2} \right) + O\left( \alpha^{\min\left(1, \frac{C-1}{2}\right)} \right).$$

Hence, $B_1$ is negligible with respect to $B_2$ and the result follows. ∎
Lemma 16 For all nonrandom functions $u(\cdot)$, $v(\cdot)$, we have:

$$E\left[ \left\langle u, \hat{h}_T(\cdot;\theta) \right\rangle \overline{\left\langle v, \hat{h}_T(\cdot;\theta) \right\rangle} \right] = \frac{1}{T} \langle u, Kv \rangle.$$

Proof of Lemma 16. We have:

$$E\left[ \left\langle u, \hat{h}_T(\cdot;\theta) \right\rangle \overline{\left\langle v, \hat{h}_T(\cdot;\theta) \right\rangle} \right] = E\left[ \int u(\tau_1) \overline{\hat{h}_T(\tau_1;\theta)} \pi(\tau_1) d\tau_1 \int \overline{v(\tau_2)} \hat{h}_T(\tau_2;\theta) \pi(\tau_2) d\tau_2 \right]$$
$$= \int \int E\left[ \hat{h}_T(\tau_2;\theta) \overline{\hat{h}_T(\tau_1;\theta)} \right] u(\tau_1) \overline{v(\tau_2)} \pi(\tau_1) \pi(\tau_2) d\tau_1 d\tau_2.$$

Because the $h_t$'s are uncorrelated, we have:

$$E\left[ \hat{h}_T(\tau_2;\theta) \overline{\hat{h}_T(\tau_1;\theta)} \right] = \frac{1}{T} E\left[ h_t(\tau_2;\theta) \overline{h_t(\tau_1;\theta)} \right] = \frac{1}{T} k(\tau_2, \tau_1),$$

hence

$$E\left[ \left\langle u, \hat{h}_T(\cdot;\theta) \right\rangle \overline{\left\langle v, \hat{h}_T(\cdot;\theta) \right\rangle} \right] = \frac{1}{T} \int u(\tau_1) \overline{\underbrace{\int k(\tau_1, \tau_2) v(\tau_2) \pi(\tau_2) d\tau_2}_{Kv(\tau_1)}} \, \pi(\tau_1) d\tau_1 = \frac{1}{T} \langle u, Kv \rangle. \ ∎$$
Lemma 17 Let $S$ be a neighborhood of $\hat{\theta}$ such that $\tilde{\theta} - \hat{\theta} = O_p\left( T^{-1/2} \right)$ for all $\tilde{\theta} \in S$, where $\hat{\theta}$ solves:

$$\left\langle K_{\alpha T}^{-1} \hat{G}_T(\cdot;\hat{\theta}), \hat{h}_T(\cdot;\hat{\theta}) \right\rangle = 0,$$

and $\hat{G}_T(\cdot;\theta) = \frac{\partial \hat{h}_T(\cdot;\theta)}{\partial \theta}$. We have:

$$\operatorname{Im} \left\langle K_{\alpha T}^{-1} \hat{G}_T(\cdot;\tilde{\theta}), \hat{h}_T(\cdot;\tilde{\theta}) \right\rangle = O_p\left( T^{-1} \right) \text{ for all } \tilde{\theta} \in S.$$

Proof of Lemma 17. Note that $S$ contains $\theta_0$ and:

$$\tilde{\theta} - \theta_0 = \underbrace{\tilde{\theta} - \hat{\theta}}_{O_p(T^{-1/2})} + \underbrace{\hat{\theta} - \theta_0}_{O_p(T^{-1/2})} = O_p\left( T^{-1/2} \right).$$

Hence, a first-order Taylor expansion of $\hat{h}_T(\cdot;\tilde{\theta})$ around $\theta_0$ yields:

$$\hat{h}_T(\cdot;\tilde{\theta}) = \hat{h}_T(\cdot;\theta_0) + \hat{G}_T(\cdot;\theta_0) \left( \tilde{\theta} - \theta_0 \right) + O_p\left( T^{-1} \right).$$

Likewise, a first-order Taylor expansion of $\hat{G}_T(\cdot;\tilde{\theta})$ around $\theta_0$ yields:

$$\hat{G}_T(\cdot;\tilde{\theta}) = \hat{G}_T(\cdot;\theta_0) + \sum_{j=1}^{q} \hat{H}_{j,T}(\cdot;\theta_0) \left( \tilde{\theta}_j - \theta_{j,0} \right) + O_p\left( T^{-1} \right).$$

Hence, we have:

$$\left\langle K_{\alpha T}^{-1} \hat{G}_T(\cdot;\tilde{\theta}), \hat{h}_T(\cdot;\tilde{\theta}) \right\rangle = \left\langle K_{\alpha T}^{-1} \hat{G}_T(\cdot;\theta_0), \hat{h}_T(\cdot;\theta_0) \right\rangle + \left\langle K_{\alpha T}^{-1} \hat{G}_T(\cdot;\theta_0), \hat{G}_T(\cdot;\theta_0) \left( \tilde{\theta} - \theta_0 \right) \right\rangle + O_p\left( T^{-1} \right).$$

Note that the term $\left\langle K_{\alpha T}^{-1} \hat{G}_T(\cdot;\theta_0), \hat{G}_T(\cdot;\theta_0) \left( \tilde{\theta} - \theta_0 \right) \right\rangle$ is real. At the particular point $\tilde{\theta} = \hat{\theta}$ (and for fixed $\alpha$):

$$0 = \left\langle K_{\alpha T}^{-1} \hat{G}_T(\cdot;\theta_0), \hat{h}_T(\cdot;\theta_0) \right\rangle + \left\langle K_{\alpha T}^{-1} \hat{G}_T(\cdot;\theta_0), \hat{G}_T(\cdot;\theta_0) \left( \hat{\theta} - \theta_0 \right) \right\rangle + O_p\left( T^{-1} \right).$$

Hence, the imaginary part of $\left\langle K_{\alpha T}^{-1} \hat{G}_T(\cdot;\theta_0), \hat{h}_T(\cdot;\theta_0) \right\rangle$ is $O_p\left( T^{-1} \right)$, and so is the imaginary part of $\left\langle K_{\alpha T}^{-1} \hat{G}_T(\cdot;\tilde{\theta}), \hat{h}_T(\cdot;\tilde{\theta}) \right\rangle$ for all $\tilde{\theta} \in S$. ∎
Proof of Theorem 1. The proof follows the same steps as that of Proposition 3.2 in CCFG (2007). However, we now exploit the fact that $E^{\theta} h_t(\cdot;\theta) \in \Phi_C$ with $C \geq 1$. The consistency follows from Lemma 15 provided $\alpha T^{1/2} \to \infty$ and $\alpha \to 0$. For the asymptotic normality to hold, we need to find a bound for the term B.10 of CCFG (2007). We have:

$$|B.10| = \left| \left\langle K_{\alpha T}^{-1} \nabla_\theta \hat{h}_T(\hat{\theta}_T) - K^{-1} E \nabla_\theta \hat{h}_T(\theta_0), \sqrt{T} \hat{h}_T(\theta_0) \right\rangle \right|$$
$$\leq \left\| K_{\alpha T}^{-1} \nabla_\theta \hat{h}_T(\hat{\theta}_T) - K^{-1} E \nabla_\theta \hat{h}_T(\theta_0) \right\| \underbrace{\left\| \sqrt{T} \hat{h}_T(\theta_0) \right\|}_{= O_p(1)} = O_p\left( \alpha^{-1/2} T^{-1/2} \right) + O\left( \alpha^{\min\left(1, \frac{C-1}{2}\right)} \right).$$

Hence the asymptotic normality requires the same conditions as the consistency, that is, $\alpha T^{1/2} \to \infty$ and $\alpha \to 0$. The asymptotic efficiency follows from the fact that $K_{\alpha T}^{-1} \to K^{-1}$ under the same conditions. ∎
B.2
Stochastic expansion of the CGMM estimator: IID case
The objective function is
n
D
Eo
01 b
b
hT (:; ); b
hT (:; ) :
= arg min QBT () = KBT
where b
hT ( ; ) =
1
T
PT
t=1
ei
0x
t
0 '( ; ) . The optimal b
solves:
@QBT b
@
;)
where G(:; ) = 0 @'(
@ .
D
E
01
= 2 Re KBT
G(:; b
); b
hT (:; b
) = 0
(9)
A third order expansion gives
q X
@ 3 Q 01 @QBT (0 ) @ 2 QBT (0 ) b
BT
b
b
0
0
+
+
0
;
0=
0
0
j
j;0
@
@@0
@j @@0
j=1
where lies between b
and 0 . The dependence of b
on BT is hidden for convenience. Let us de…ne
Gj (:; ) = 0
@'( ; )
@ 2 '( ; )
@ 3 '( ; )
@ 2 '( ; )
; H(:; ) = 0
;
L
=
0
;
H
(:;
)
=
0
:
j
j
@j
@@j
@@0
@j @@0
and
D
E
01
9T (0 ) = Re KBT
G(:; 0 ); b
hT (:; 0 ) ;
D
E
B
A 01
01
WT (0 ) = KBT
G(:; 0 ); G(:; 0 ) + Re KBT
H(:; 0 ); b
hT (:; 0 ) ;
D
E
A 01
B
01
Bj;T () = 2 Re KBT
G(:; ); Hj (:; ) + Re KBT
Lj (:; ); b
hT (:; )
A 01
B
+ Re KBT
H(:; ); Gj (:; ) :
26
Then we can write:
q X
b
b
j 0 j;0 Bj;T () b
0 = 9T (0 ) + WT (0 ) 0 0 +
0 0 :
j=1
Note that the derivatives of the moment functions are deterministic in the IID case. We decompose
9T (0 ), WT (0 ) and Bj;T () as follows:
e T;B (0 );
9T (0 ) = 9T;0 (0 ) + 9T;B (0 ) + 9
where
E
D
9T;0 (0 ) = Re K 01 G; b
hT = Op T 01=2
D0
E
1
C01
hT = Op Bmin(1; 2 ) T 01=2
9T;B (0 ) = Re KB01 0 K 01 G; b
D0
E
1
0 01 01 1
e T;B (0 ) = Re K 01 0 K 01 G; b
h
9
T = Op B T
B
BT
where the rates of convergence are obtained using the Cauchy-Schwarz inequality and the results of
Similarly, we decompose $W_T(\theta_0)$ into various terms with distinct rates of convergence:
$$W_T(\theta_0) = W_0(\theta_0) + W_B(\theta_0) + \tilde{W}_B(\theta_0) + W_{T,0}(\theta_0) + \tilde{W}_{T,B}(\theta_0),$$
where
$$W_0(\theta_0) = \left\langle K^{-1}G, G\right\rangle = O(1),$$
$$W_B(\theta_0) = \left\langle\left(K_B^{-1} - K^{-1}\right)G, G\right\rangle = O\left(B^{\min\left(1,\frac{2C-1}{2}\right)}\right),$$
$$\tilde{W}_B(\theta_0) = \left\langle\left(K_{BT}^{-1} - K_B^{-1}\right)G, G\right\rangle = O_p\left(B^{-1}T^{-1/2}\right),$$
$$W_{T,0}(\theta_0) = \operatorname{Re}\left\langle K^{-1}H(\cdot;\theta_0), \hat{h}_T(\cdot;\theta_0)\right\rangle = O_p\left(T^{-1/2}\right),$$
$$\tilde{W}_{T,B}(\theta_0) = \operatorname{Re}\left\langle\left(K_{BT}^{-1} - K^{-1}\right)H(\cdot;\theta_0), \hat{h}_T(\cdot;\theta_0)\right\rangle = O_p\left(B^{-1}T^{-1}\right).$$
We consider a simpler decomposition for $B_{j,T}(\bar{\theta})$:
$$B_{j,T}(\bar{\theta}) = B_j(\bar{\theta}) + \left(B_{j,T}(\bar{\theta}) - B_j(\bar{\theta})\right),$$
where
$$B_j(\bar{\theta}) = 2\operatorname{Re}\left\langle K^{-1}G(\cdot;\bar{\theta}), H_j(\cdot;\bar{\theta})\right\rangle + \operatorname{Re}\left\langle K^{-1}H(\cdot;\bar{\theta}), G_j(\cdot;\bar{\theta})\right\rangle = O(1),$$
$$B_{j,T}(\bar{\theta}) = B_j(\bar{\theta}) + O\left(B^{\min\left(1,\frac{C-1}{2}\right)}\right) + O_p\left(B^{-1}T^{-1/2}\right).$$
By replacing these decompositions into the expansion of the FOC, we can solve for $\hat{\theta} - \theta_0$ to obtain:
$$\hat{\theta} - \theta_0 = -W_0^{-1}(\theta_0)\Psi_{T,0}(\theta_0)$$
$$\quad - W_0^{-1}(\theta_0)\left[\Psi_{T,B}(\theta_0) + W_B(\theta_0)\left(\hat{\theta} - \theta_0\right)\right]$$
$$\quad - W_0^{-1}(\theta_0)\left[\tilde{\Psi}_{T,B}(\theta_0) + \tilde{W}_B(\theta_0)\left(\hat{\theta} - \theta_0\right)\right]$$
$$\quad - W_0^{-1}(\theta_0)W_{T,0}(\theta_0)\left(\hat{\theta} - \theta_0\right)$$
$$\quad - \sum_{j=1}^{q}\left(\hat{\theta}_j - \theta_{j,0}\right)W_0^{-1}(\theta_0)B_j(\bar{\theta})\left(\hat{\theta} - \theta_0\right)$$
$$\quad - \sum_{j=1}^{q}\left(\hat{\theta}_j - \theta_{j,0}\right)W_0^{-1}(\theta_0)\left(B_{j,T}(\bar{\theta}) - B_j(\bar{\theta})\right)\left(\hat{\theta} - \theta_0\right).$$
To complete the expansion, we replace $\hat{\theta} - \theta_0$ by $-W_0^{-1}(\theta_0)\Psi_{T,0}(\theta_0)$ in the higher order terms:
$$\hat{\theta} - \theta_0 = \Delta_1 + \Delta_2 + \Delta_3 + \Delta_4 + \Delta_5 + \hat{R},$$
where $\hat{R}$ is a remainder that goes to zero faster than the following terms:
$$\Delta_1 = -W_0^{-1}(\theta_0)\Psi_{T,0}(\theta_0),$$
$$\Delta_2 = -W_0^{-1}(\theta_0)\left[\Psi_{T,B}(\theta_0) - W_B(\theta_0)W_0^{-1}(\theta_0)\Psi_{T,0}(\theta_0)\right],$$
$$\Delta_3 = -W_0^{-1}(\theta_0)\left[\tilde{\Psi}_{T,B}(\theta_0) - \tilde{W}_B(\theta_0)W_0^{-1}(\theta_0)\Psi_{T,0}(\theta_0)\right],$$
$$\Delta_4 = W_0^{-1}(\theta_0)W_{T,0}(\theta_0)W_0^{-1}(\theta_0)\Psi_{T,0}(\theta_0) - \sum_{j=1}^{q}\left(W_0^{-1}(\theta_0)\Psi_{T,0}(\theta_0)\right)_j W_0^{-1}(\theta_0)B_j(\bar{\theta})W_0^{-1}(\theta_0)\Psi_{T,0}(\theta_0),$$
$$\Delta_5 = -\sum_{j=1}^{q}\left(W_0^{-1}(\theta_0)\Psi_{T,0}(\theta_0)\right)_j W_0^{-1}(\theta_0)\left(B_{j,T}(\bar{\theta}) - B_j(\bar{\theta})\right)W_0^{-1}(\theta_0)\Psi_{T,0}(\theta_0).$$
To obtain the rates of these terms, we use the fact that $|Af| \leq \|A\|\,|f|$. This yields immediately:
$$\Delta_1 = O_p\left(T^{-1/2}\right),\ \Delta_2 = O_p\left(B^{\min\left(1,\frac{2C-1}{2}\right)}T^{-1/2}\right),\ \Delta_3 = O_p\left(B^{-1}T^{-1}\right),\ \Delta_4 = O_p\left(T^{-1}\right),$$
$$\Delta_5 = O\left(B^{\min\left(1,\frac{C-1}{2}\right)}T^{-1}\right) + O_p\left(B^{-1}T^{-3/2}\right).$$
To summarize, we have:
$$\hat{\theta} - \theta_0 = \Delta_1 + \Delta_2 + \Delta_3 + o_p\left(B^{-1}T^{-1}\right) + o_p\left(B^{\min\left(1,\frac{C-1}{2}\right)}T^{-1/2}\right). \qquad (10)$$
B.3 Stochastic expansion of the CGMM estimator: Markov case
The objective function here is given by:
$$\hat{\theta} = \arg\min_{\theta}\ Q_{BT}(\theta) = \left\langle K_{BT}^{-1}\hat{h}_T(\cdot;\theta), \hat{h}_T(\cdot;\theta)\right\rangle,$$
where $\hat{h}_T(\tau;\theta) = \frac{1}{T}\sum_{t=1}^{T}\left(e^{is' x_{t+1}} - \varphi(s;\theta,x_t)\right)e^{ir' x_t}$ and $\tau = (s,r) \in \mathbb{R}^{2p}$. The optimal $\hat{\theta}$ solves
$$\frac{\partial Q_{BT}(\hat{\theta})}{\partial\theta} = 2\operatorname{Re}\left\langle K_{BT}^{-1}\hat{G}_T(\cdot;\hat{\theta}), \hat{h}_T(\cdot;\hat{\theta})\right\rangle = 0, \qquad (11)$$
where $\hat{G}_T(\tau;\theta) = -\frac{1}{T}\sum_{t=1}^{T}\frac{\partial\varphi(s;\theta,x_t)}{\partial\theta}e^{ir' x_t}$.
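As an illustration of the double index $\tau = (s,r)$, the sketch below evaluates $\hat{h}_T(\tau;\theta)$ for a Gaussian AR(1), whose conditional CF is available in closed form; the model and all names are assumptions made for the example only (the paper's simulations use a CIR model instead).

```python
import numpy as np

def h_hat(s, r, theta, x):
    """\hat h_T((s, r); theta) for a scalar Markov sample x, using the
    conditional CF of a Gaussian AR(1): x_{t+1} | x_t ~ N(rho * x_t, sig^2)
    (an illustrative assumption, not the paper's application)."""
    rho, sig = theta
    xt, xt1 = x[:-1], x[1:]
    phi = np.exp(1j * s * rho * xt - 0.5 * (s * sig) ** 2)  # phi(s; theta, x_t)
    return np.mean((np.exp(1j * s * xt1) - phi) * np.exp(1j * r * xt))

# usage: at the true parameter, h_hat is O_p(T^{-1/2}) for every (s, r)
rng = np.random.default_rng(1)
x = np.zeros(2000)
for t in range(1999):
    x[t + 1] = 0.8 * x[t] + 0.5 * rng.standard_normal()
print(abs(h_hat(0.7, -0.3, (0.8, 0.5), x)))
```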
The third order Taylor expansion of (11) around $\theta_0$ yields:
$$0 = \frac{\partial Q_{BT}(\theta_0)}{\partial\theta} + \frac{\partial^2 Q_{BT}(\theta_0)}{\partial\theta\partial\theta'}\left(\hat{\theta} - \theta_0\right) + \sum_{j=1}^{q}\left(\hat{\theta}_j - \theta_{j,0}\right)\frac{\partial^3 Q_{BT}(\bar{\theta})}{\partial\theta_j\partial\theta\partial\theta'}\left(\hat{\theta} - \theta_0\right),$$
where $\bar{\theta}$ lies between $\hat{\theta}$ and $\theta_0$. Let us define:
$$\hat{H}_T(\tau;\theta) = -\frac{1}{T}\sum_{t=1}^{T}\frac{\partial^2\varphi(s;\theta,x_t)}{\partial\theta\partial\theta'}e^{ir' x_t},\quad \hat{G}_{j,T}(\tau;\theta) = -\frac{1}{T}\sum_{t=1}^{T}\frac{\partial\varphi(s;\theta,x_t)}{\partial\theta_j}e^{ir' x_t},$$
$$\hat{H}_{j,T}(\tau;\theta) = -\frac{1}{T}\sum_{t=1}^{T}\frac{\partial^2\varphi(s;\theta,x_t)}{\partial\theta_j\partial\theta}e^{ir' x_t},\quad \hat{L}_{j,T}(\tau;\theta) = -\frac{1}{T}\sum_{t=1}^{T}\frac{\partial^3\varphi(s;\theta,x_t)}{\partial\theta_j\partial\theta\partial\theta'}e^{ir' x_t},$$
and
$$\hat{\Psi}_T(\theta_0) = \operatorname{Re}\left\langle K_{BT}^{-1}\hat{G}_T(\cdot;\theta_0), \hat{h}_T(\cdot;\theta_0)\right\rangle,$$
$$\hat{W}_T(\theta_0) = \left\langle K_{BT}^{-1}\hat{G}_T(\cdot;\theta_0), \hat{G}_T(\cdot;\theta_0)\right\rangle + \operatorname{Re}\left\langle K_{BT}^{-1}\hat{H}_T(\cdot;\theta_0), \hat{h}_T(\cdot;\theta_0)\right\rangle,$$
$$\hat{B}_{j,T}(\theta) = 2\operatorname{Re}\left\langle K_{BT}^{-1}\hat{G}_T(\cdot;\theta), \hat{H}_{j,T}(\cdot;\theta)\right\rangle + \operatorname{Re}\left\langle K_{BT}^{-1}\hat{H}_T(\cdot;\theta), \hat{G}_{j,T}(\cdot;\theta)\right\rangle + \operatorname{Re}\left\langle K_{BT}^{-1}\hat{L}_{j,T}(\cdot;\theta), \hat{h}_T(\cdot;\theta)\right\rangle.$$
Then the expansion of the FOC becomes:
$$0 = \hat{\Psi}_T(\theta_0) + \hat{W}_T(\theta_0)\left(\hat{\theta} - \theta_0\right) + \sum_{j=1}^{q}\left(\hat{\theta}_j - \theta_{j,0}\right)\hat{B}_{j,T}(\bar{\theta})\left(\hat{\theta} - \theta_0\right).$$
Unlike in the IID case, the derivatives of the moment function are not deterministic. We thus define:
$$G(\tau;\theta) = \operatorname*{plim}_{T\to\infty}\hat{G}_T(\tau;\theta),\quad H(\tau;\theta) = \operatorname*{plim}_{T\to\infty}\hat{H}_T(\tau;\theta),$$
$$G_j(\tau;\theta) = \operatorname*{plim}_{T\to\infty}\hat{G}_{j,T}(\tau;\theta),\quad H_j(\tau;\theta) = \operatorname*{plim}_{T\to\infty}\hat{H}_{j,T}(\tau;\theta).$$
It follows from Assumption 3 and Markov's inequality that:
$$G(\tau;\theta) - \hat{G}_T(\tau;\theta) = O_p\left(T^{-1/2}\right),\quad H(\tau;\theta) - \hat{H}_T(\tau;\theta) = O_p\left(T^{-1/2}\right),$$
$$G_j(\tau;\theta) - \hat{G}_{j,T}(\tau;\theta) = O_p\left(T^{-1/2}\right),\quad H_j(\tau;\theta) - \hat{H}_{j,T}(\tau;\theta) = O_p\left(T^{-1/2}\right).$$
We have the following decomposition for $\hat{\Psi}_T(\theta_0)$:
$$\hat{\Psi}_T(\theta_0) = \Psi_{T,0}(\theta_0) + \Psi_{T,B}(\theta_0) + \tilde{\Psi}_{T,B}(\theta_0) + \hat{\Psi}_{T,B}(\theta_0) + \hat{\tilde{\Psi}}_{T,B}(\theta_0).$$
By using the fact that $\|Af\| \leq \|A\|\,\|f\|$, we obtain the following rates:
$$\Psi_{T,0}(\theta_0) = \operatorname{Re}\left\langle K^{-1}G, \hat{h}_T(\cdot;\theta_0)\right\rangle = O_p\left(T^{-1/2}\right),$$
$$\Psi_{T,B}(\theta_0) = \operatorname{Re}\left\langle\left(K_B^{-1} - K^{-1}\right)G, \hat{h}_T(\cdot;\theta_0)\right\rangle = O_p\left(B^{\min\left(1,\frac{C-1}{2}\right)}T^{-1/2}\right),$$
$$\tilde{\Psi}_{T,B}(\theta_0) = \operatorname{Re}\left\langle\left(K_{BT}^{-1} - K_B^{-1}\right)G, \hat{h}_T(\cdot;\theta_0)\right\rangle = O_p\left(B^{-1}T^{-1}\right),$$
$$\hat{\Psi}_{T,B}(\theta_0) = \operatorname{Re}\left\langle K_B^{-1}\left(\hat{G}_T - G\right), \hat{h}_T(\cdot;\theta_0)\right\rangle = O_p\left(B^{-1/2}T^{-1}\right),$$
$$\hat{\tilde{\Psi}}_{T,B}(\theta_0) = \operatorname{Re}\left\langle\left(K_{BT}^{-1} - K_B^{-1}\right)\left(\hat{G}_T - G\right), \hat{h}_T(\cdot;\theta_0)\right\rangle = O_p\left(B^{-3/2}T^{-3/2}\right).$$
The difference between the above decomposition of $\hat{\Psi}_T(\theta_0)$ and the one in the IID case only comes from the additional higher order terms $\hat{\Psi}_{T,B}(\theta_0)$ and $\hat{\tilde{\Psi}}_{T,B}(\theta_0)$. Hence we can write $\hat{\Psi}_T(\theta_0)$ as:
$$\hat{\Psi}_T(\theta_0) = \Psi_{T,0}(\theta_0) + \Psi_{T,B}(\theta_0) + \tilde{\Psi}_{T,B}(\theta_0) + R_{\Psi},$$
where $R_{\Psi} = o_p\left(B^{-1}T^{-1}\right) + o_p\left(B^{\min\left(1,\frac{C-1}{2}\right)}T^{-1/2}\right)$.
We have a similar decomposition for $\hat{W}_T(\theta_0)$:
$$\hat{W}_T(\theta_0) = W_0(\theta_0) + W_B(\theta_0) + \tilde{W}_B(\theta_0) + \hat{W}_B(\theta_0) + W_1(\theta_0) + W_{1,B}(\theta_0) + \hat{\tilde{W}}_{1,B}(\theta_0) + \tilde{W}_{1,B}(\theta_0) + \hat{W}_{1,B}(\theta_0) + R_{W,1},$$
where
$$W_0(\theta_0) = \left\langle K^{-1}G, G\right\rangle = O(1),$$
$$W_B(\theta_0) = \left\langle\left(K_B^{-1} - K^{-1}\right)G, G\right\rangle = O\left(B^{\min\left(1,\frac{2C-1}{2}\right)}\right),$$
$$\tilde{W}_B(\theta_0) = \left\langle\left(K_{BT}^{-1} - K_B^{-1}\right)G, G\right\rangle = O_p\left(B^{-1}T^{-1/2}\right),$$
$$\hat{W}_B(\theta_0) = \left\langle K_B^{-1}\left(\hat{G}_T - G\right), G\right\rangle = O_p\left(B^{-1/2}T^{-1/2}\right),$$
$$W_1(\theta_0) = \operatorname{Re}\left\langle K^{-1}H, \hat{h}_T\right\rangle + \left\langle K^{-1}G, \hat{G}_T - G\right\rangle = O_p\left(T^{-1/2}\right),$$
$$W_{1,B}(\theta_0) = \left\langle\left(K_{BT}^{-1} - K_B^{-1}\right)\left(\hat{G}_T - G\right), G\right\rangle = O_p\left(B^{-3/2}T^{-1}\right),$$
$$\hat{\tilde{W}}_{1,B}(\theta_0) = \operatorname{Re}\left\langle\left(K_B^{-1} - K^{-1}\right)H, \hat{h}_T\right\rangle + \left\langle\left(K_B^{-1} - K^{-1}\right)G, \hat{G}_T - G\right\rangle = O_p\left(B^{\min\left(1,\frac{C-1}{2}\right)}T^{-1/2}\right),$$
$$\tilde{W}_{1,B}(\theta_0) = \operatorname{Re}\left\langle\left(K_{BT}^{-1} - K_B^{-1}\right)H, \hat{h}_T\right\rangle + \left\langle\left(K_{BT}^{-1} - K_B^{-1}\right)G, \hat{G}_T - G\right\rangle = O_p\left(B^{-1}T^{-1}\right),$$
$$\hat{W}_{1,B}(\theta_0) = \operatorname{Re}\left\langle K_B^{-1}\left(\hat{H}_T - H\right), \hat{h}_T\right\rangle + \left\langle K_B^{-1}\left(\hat{G}_T - G\right), \hat{G}_T - G\right\rangle = O_p\left(B^{-1/2}T^{-1}\right),\ \text{and}$$
$$R_{W,1} = \operatorname{Re}\left\langle\left(K_{BT}^{-1} - K_B^{-1}\right)\left(\hat{H}_T - H\right), \hat{h}_T\right\rangle + \left\langle\left(K_{BT}^{-1} - K_B^{-1}\right)\left(\hat{G}_T - G\right), \hat{G}_T - G\right\rangle = O_p\left(B^{-3/2}T^{-3/2}\right).$$
For the purpose of finding the optimal $B$, it is enough to consider the shorter decomposition:
$$\hat{W}_T(\theta_0) = W_0(\theta_0) + W_B(\theta_0) + \tilde{W}_B(\theta_0) + \hat{W}_B(\theta_0) + W_1(\theta_0) + W_{1,B}(\theta_0) + R_W,$$
with
$$R_W \equiv \hat{\tilde{W}}_{1,B}(\theta_0) + \tilde{W}_{1,B}(\theta_0) + \hat{W}_{1,B}(\theta_0) + R_{W,1} = O_p\left(B^{-1}T^{-1}\right) + O\left(B^{\min\left(1,\frac{C-1}{2}\right)}T^{-1/2}\right).$$
Finally, we consider again a simpler decomposition for $\hat{B}_{j,T}(\bar{\theta})$:
$$\hat{B}_{j,T}(\bar{\theta}) = B_j(\bar{\theta}) + \left(\hat{B}_{j,T}(\bar{\theta}) - B_j(\bar{\theta})\right),$$
where
$$B_j(\bar{\theta}) = 2\operatorname{Re}\left\langle K^{-1}G(\cdot;\bar{\theta}), H_j(\cdot;\bar{\theta})\right\rangle + \operatorname{Re}\left\langle K^{-1}H(\cdot;\bar{\theta}), G_j(\cdot;\bar{\theta})\right\rangle = O(1)\ \text{and}$$
$$\hat{B}_{j,T}(\bar{\theta}) = B_j(\bar{\theta}) + O\left(B^{\min\left(1,\frac{C-1}{2}\right)}\right) + O_p\left(B^{-1}T^{-1/2}\right).$$
We replace these decompositions into the expansion of the FOC and solve for $\hat{\theta} - \theta_0$ to obtain:
$$\hat{\theta} - \theta_0 = -W_0^{-1}(\theta_0)\Psi_{T,0}(\theta_0)$$
$$\quad - W_0^{-1}(\theta_0)\left[\Psi_{T,B}(\theta_0) + W_B(\theta_0)\left(\hat{\theta} - \theta_0\right)\right]$$
$$\quad - W_0^{-1}(\theta_0)\left[\tilde{\Psi}_{T,B}(\theta_0) + \tilde{W}_B(\theta_0)\left(\hat{\theta} - \theta_0\right)\right] - W_0^{-1}(\theta_0)\hat{W}_B(\theta_0)\left(\hat{\theta} - \theta_0\right)$$
$$\quad - W_0^{-1}(\theta_0)W_1(\theta_0)\left(\hat{\theta} - \theta_0\right) - W_0^{-1}(\theta_0)W_{1,B}(\theta_0)\left(\hat{\theta} - \theta_0\right) - \sum_{j=1}^{q}\left(\hat{\theta}_j - \theta_{j,0}\right)W_0^{-1}(\theta_0)B_j(\bar{\theta})\left(\hat{\theta} - \theta_0\right)$$
$$\quad - \sum_{j=1}^{q}\left(\hat{\theta}_j - \theta_{j,0}\right)W_0^{-1}(\theta_0)\left(\hat{B}_{j,T}(\bar{\theta}) - B_j(\bar{\theta})\right)\left(\hat{\theta} - \theta_0\right)$$
$$\quad - W_0^{-1}(\theta_0)R_W\left(\hat{\theta} - \theta_0\right) - W_0^{-1}(\theta_0)R_{\Psi}.$$
Next, we replace $\hat{\theta} - \theta_0$ by $-W_0^{-1}(\theta_0)\Psi_{T,0}(\theta_0) = O_p\left(T^{-1/2}\right)$ in the higher order terms. This yields:
$$\hat{\theta} - \theta_0 = \Delta_1 + \Delta_2 + \Delta_3 + \hat{R}_1 + \hat{R}_2 + \hat{R}_3 + \hat{R}_4,$$
where
$$\Delta_1 = -W_0^{-1}(\theta_0)\Psi_{T,0}(\theta_0) = O_p\left(T^{-1/2}\right),$$
$$\Delta_2 = -W_0^{-1}(\theta_0)\left[\Psi_{T,B}(\theta_0) - W_B(\theta_0)W_0^{-1}(\theta_0)\Psi_{T,0}(\theta_0)\right] = O_p\left(B^{\min\left(1,\frac{2C-1}{2}\right)}T^{-1/2}\right),$$
$$\Delta_3 = -W_0^{-1}(\theta_0)\left[\tilde{\Psi}_{T,B}(\theta_0) - \tilde{W}_B(\theta_0)W_0^{-1}(\theta_0)\Psi_{T,0}(\theta_0)\right] = O_p\left(B^{-1}T^{-1}\right),$$
$$\hat{R}_1 = W_0^{-1}(\theta_0)\hat{W}_B(\theta_0)W_0^{-1}(\theta_0)\Psi_{T,0}(\theta_0) = O_p\left(B^{-1/2}T^{-1}\right),$$
$$\hat{R}_2 = W_0^{-1}(\theta_0)W_1(\theta_0)W_0^{-1}(\theta_0)\Psi_{T,0}(\theta_0) - \sum_{j=1}^{q}\left(W_0^{-1}(\theta_0)\Psi_{T,0}(\theta_0)\right)_j W_0^{-1}(\theta_0)B_j(\bar{\theta})W_0^{-1}(\theta_0)\Psi_{T,0}(\theta_0) = O_p\left(T^{-1}\right),$$
and
$$\hat{R}_3 = W_0^{-1}(\theta_0)W_{1,B}(\theta_0)W_0^{-1}(\theta_0)\Psi_{T,0}(\theta_0) = O_p\left(B^{-3/2}T^{-3/2}\right),$$
$$\hat{R}_4 = -W_0^{-1}(\theta_0)R_{\Psi} + W_0^{-1}(\theta_0)R_W W_0^{-1}(\theta_0)\Psi_{T,0}(\theta_0) - \sum_{j=1}^{q}\left(W_0^{-1}(\theta_0)\Psi_{T,0}(\theta_0)\right)_j W_0^{-1}(\theta_0)\left(\hat{B}_{j,T}(\bar{\theta}) - B_j(\bar{\theta})\right)W_0^{-1}(\theta_0)\Psi_{T,0}(\theta_0)$$
$$\quad = o_p\left(B^{-1}T^{-1}\right) + o_p\left(B^{\min\left(1,\frac{C-1}{2}\right)}T^{-1/2}\right).$$
In summary, we have:
$$\hat{\theta} - \theta_0 = \Delta_1 + \Delta_2 + \Delta_3 + o_p\left(B^{-1}T^{-1}\right) + o_p\left(B^{\min\left(1,\frac{C-1}{2}\right)}T^{-1/2}\right), \qquad (12)$$
which is of the same form as in the IID case.
B.4 Proof of Theorem 2
Using the expansions given in (10) and (12), we obtain:
$$\hat{\theta} - \theta_0 = \Delta_1 + \Delta_2 + \Delta_3 + O_p\left(T^{-1}\right).$$
Lemma 17 ensures that all terms that are slower than $O_p\left(T^{-1}\right)$ in the expansion above are real. Hence the Re symbol may be removed from the expression of $\Delta_1$, $\Delta_2$ and $\Delta_3$.
Asymptotic Variance

The asymptotic variance of $\hat{\theta}$ is given by
$$T\,\mathrm{Var}(\Delta_1) = T\,W_0^{-1}E\left[\Psi_{T,0}(\theta_0)\Psi_{T,0}(\theta_0)'\right]W_0^{-1} = T\,W_0^{-1}E\left[\left\langle K^{-1}G, \hat{h}_T\right\rangle\left\langle K^{-1}G, \hat{h}_T\right\rangle'\right]W_0^{-1} = W_0^{-1}\left\langle K^{-1}G, G\right\rangle W_0^{-1},$$
where the last equality follows from Lemma 16. Hence,
$$T\,\mathrm{Var}(\Delta_1) = W_0^{-1}\left\langle K^{-1}G, G\right\rangle W_0^{-1} = W_0^{-1}.$$

Higher Order Bias

The terms $\Delta_1$ and $\Delta_2$ have zero expectations. Hence, the bias comes from $\Delta_3$:
$$\mathrm{Bias} \equiv E\left[\hat{\theta} - \theta_0\right] = E\left[\Delta_3\right],$$
where $\Delta_3 = -W_0^{-1}\tilde{\Psi}_{T,B} + W_0^{-1}\tilde{W}_B W_0^{-1}\Psi_{T,0}$. As $W_0^{-1}$ is a constant matrix, we focus on $\tilde{\Psi}_{T,B}$ and $\tilde{W}_B W_0^{-1}\Psi_{T,0}$.
e T;B . By applying Cauchy-Schwarz twice, we obtain:
We …rst consider the term 9
D D D0
D
ED
1
D
D
D
01
e T;B D
0 KB01 G; b
hT D
DE 9
D = DE KBT
D
D0
1 DD
D D
01
hT D 0 KB01 GD Db
E D KBT
s
D D D0
1 D2 D D2
01
01
D
D
hT D :
K 0 KB G
E
E Db
BT
Using the fact that ht ( ; ) is a martingale di¤erence sequence and is bounded, we obtain:
Z
D D Db D2
b
b
= E
hT ( ; ) hT ( ; ) ( ) d
E DhT D
Z
1
=
E
ht ( ; ) ht ( ; ) ( ) d = O(T 01 ):
T
(13)
Next, using (6) and (7), we obtain:
$$E\left\|\left(K_{BT}^{-1} - K_B^{-1}\right)G\right\|^2 \leq E\left\|\left(K_{BT}^{-1} - K_B^{-1}\right)K\right\|^2\left\|K^{-1}G\right\|^2$$
$$\leq E\left\|(K_T^2 + BI)^{-1}(K_T - K)K\right\|^2\left\|K^{-1}G\right\|^2 \qquad (14)$$
$$\quad + E\left\|(K_T^2 + BI)^{-1} - (K^2 + BI)^{-1}\right\|^2\|K\|^4\left\|K^{-1}G\right\|^2. \qquad (15)$$
Hence:
$$(14) = E\left\|(K_T^2 + BI)^{-1}(K_T - K)K\right\|^2\left\|K^{-1}G\right\|^2 \leq B^{-2}E\left[\|K_T - K\|^2\right]\|K\|^2\left\|K^{-1}G\right\|^2 = O(B^{-2}T^{-1}),$$
where $E\|K_T - K\|^2 = O(T^{-1})$ follows from Carrasco and Florens (2000, Theorem 4, p. 825). For (15), we use the fact that $A^{-1} - B^{-1} = A^{-1}(B - A)B^{-1}$ to obtain:
$$(15) = E\left\|(K_T^2 + BI)^{-1} - (K^2 + BI)^{-1}\right\|^2\|K\|^4\left\|K^{-1}G\right\|^2$$
$$\leq E\left[\left\|(K_T^2 + BI)^{-1}\right\|^2\left\|K_T^2 - K^2\right\|^2\left\|(K^2 + BI)^{-1}\right\|^2\right]\|K\|^4\left\|K^{-1}G\right\|^2$$
$$\leq B^{-2}E\left\|K_T^2 - K^2\right\|^2\|K\|^4\left\|K^{-1}G\right\|^2 \leq B^{-2}E\left[\|K_T - K\|^2\|K_T + K\|^2\right]\|K\|^4\left\|K^{-1}G\right\|^2.$$
By the triangle inequality, $\|K_T + K\| \leq \|K_T\| + \|K\|$. Hence:
$$(15) \leq B^{-2}E\left[\|K_T - K\|^2\left(\|K_T\| + \|K\|\right)^2\right]\|K\|^4\left\|K^{-1}G\right\|^2.$$
From (1) in Appendix A, we know that $|k(\tau_1,\tau_2)|^2 \leq 4$. Similarly, $\hat{k}_T(\tau_1,\tau_2;\hat{\theta}^1)$ is bounded such that $|\hat{k}_T(\tau_1,\tau_2;\hat{\theta}^1)|^2 \leq 4$. Hence:
$$\|K\| \leq \sqrt{\int\int |k(\tau_1,\tau_2)|^2\,\pi(\tau_1)\pi(\tau_2)\,d\tau_1 d\tau_2} \leq 2\quad\text{and}\quad \|K_T\| \leq \sqrt{\int\int |\hat{k}_T(\tau_1,\tau_2;\hat{\theta}^1)|^2\,\pi(\tau_1)\pi(\tau_2)\,d\tau_1 d\tau_2} \leq 2.$$
Consequently:
$$(15) \leq 16\,B^{-2}E\left[\|K_T - K\|^2\right]\|K\|^4\left\|K^{-1}G\right\|^2 = O(B^{-2}T^{-1}).$$
Finally,
$$\left\|E\left[\tilde{\Psi}_{T,B}\right]\right\| \leq \sqrt{O(B^{-2}T^{-1}) \times O(T^{-1})} = O(B^{-1}T^{-1}). \qquad (16)$$
We now consider the term $\tilde{W}_B W_0^{-1}\Psi_{T,0}$. Again, using Cauchy-Schwarz twice leads to:
$$\left\|E\left[\tilde{W}_B W_0^{-1}\Psi_{T,0}\right]\right\| \leq E\left[\left\|\tilde{W}_B\right\|\left\|W_0^{-1}\Psi_{T,0}\right\|\right] \leq \sqrt{E\left\|\tilde{W}_B\right\|^2\,E\left\|W_0^{-1}\Psi_{T,0}\right\|^2}.$$
We have:
$$E\left\|W_0^{-1}\Psi_{T,0}\right\|^2 = E\left\|W_0^{-1}\left\langle K^{-1}G, \hat{h}_T(\cdot;\theta_0)\right\rangle\right\|^2 \leq \left\|W_0^{-1}\right\|^2\left\|K^{-1}G\right\|^2 E\left\|\hat{h}_T(\cdot;\theta_0)\right\|^2 = O(T^{-1}),$$
where $E\left\|\hat{h}_T(\cdot;\theta_0)\right\|^2 = O(T^{-1})$ follows from (13). Next:
$$E\left\|\tilde{W}_B\right\|^2 = E\left\|\left\langle\left(K_{BT}^{-1} - K_B^{-1}\right)G, G\right\rangle\right\|^2 \leq \|G\|^2\,E\left\|\left(K_{BT}^{-1} - K_B^{-1}\right)G\right\|^2 = O(B^{-2}T^{-1}),$$
where the rate follows from (14) and (15). Hence, by the Cauchy-Schwarz inequality,
$$\left\|E\left[\tilde{W}_B W_0^{-1}\Psi_{T,0}\right]\right\| \leq \sqrt{O(B^{-2}T^{-1}) \times O(T^{-1})} = O(B^{-1}T^{-1}). \qquad (17)$$
By putting (16) and (17) together, we find $E[\Delta_3] = O\left(B^{-1}T^{-1}\right)$ so that the squared bias satisfies:
$$T\,\mathrm{Bias}\cdot\mathrm{Bias}' = O\left(B^{-2}T^{-1}\right).$$
Higher Order Variance

The dominant terms in the higher order variance are
$$\mathrm{Cov}(\Delta_1, \Delta_2) + \mathrm{Var}(\Delta_2) + \mathrm{Cov}(\Delta_1, \Delta_3).$$
We first consider $\mathrm{Cov}(\Delta_1, \Delta_2)$:
$$\mathrm{Cov}(\Delta_1, \Delta_2) = W_0^{-1}E\left[\Psi_{T,0}\Psi_{T,B}(\theta_0)'\right]W_0^{-1} - W_0^{-1}E\left[\Psi_{T,0}\Psi_{T,0}'\right]W_0^{-1}W_B W_0^{-1}.$$
From Lemma 16, we have:
$$E\left[\Psi_{T,0}\Psi_{T,B}'\right] = \frac{1}{T}\left\langle\left(K_B^{-1} - K^{-1}\right)G, G\right\rangle = \frac{1}{T}W_B$$
and $E\left[\Psi_{T,0}\Psi_{T,0}'\right] = \frac{1}{T}W_0$. Hence,
$$\mathrm{Cov}(\Delta_1, \Delta_2) = \frac{1}{T}W_0^{-1}W_B W_0^{-1} - \frac{1}{T}W_0^{-1}W_0 W_0^{-1}W_B W_0^{-1} = 0.$$
Now we consider the term $\mathrm{Cov}(\Delta_1, \Delta_3)$:
$$\mathrm{Cov}(\Delta_1, \Delta_3) = W_0^{-1}E\left[\Psi_{T,0}\tilde{\Psi}_{T,B}'\right]W_0^{-1} - W_0^{-1}E\left[\Psi_{T,0}\Psi_{T,0}'W_0^{-1}\tilde{W}_B\right]W_0^{-1}.$$
We first consider $E\left[\Psi_{T,0}\tilde{\Psi}_{T,B}'\right]$. By the Cauchy-Schwarz inequality:
$$\left\|E\left[\Psi_{T,0}\tilde{\Psi}_{T,B}'\right]\right\| \leq \sqrt{E\left\|\Psi_{T,0}\right\|^2\,E\left\|\tilde{\Psi}_{T,B}\right\|^2} = \sqrt{E\left\|\left\langle K^{-1}G, \hat{h}_T\right\rangle\right\|^2\,E\left\|\left\langle\left(K_{BT}^{-1} - K_B^{-1}\right)G, \hat{h}_T\right\rangle\right\|^2}.$$
Hence we have:
$$E\left\|\left\langle K^{-1}G, \hat{h}_T\right\rangle\right\|^2 \leq \left\|K^{-1}G\right\|^2 E\left\|\hat{h}_T\right\|^2 = O(T^{-1}).$$
Also,
$$E\left\|\left\langle\left(K_{BT}^{-1} - K_B^{-1}\right)G, \hat{h}_T\right\rangle\right\|^2 \leq E\left[\left\|\left(K_{BT}^{-1} - K_B^{-1}\right)G\right\|^2\left\|\hat{h}_T\right\|^2\right] \leq \sqrt{E\left\|\left(K_{BT}^{-1} - K_B^{-1}\right)G\right\|^4\,E\left\|\hat{h}_T\right\|^4}.$$
We first consider $E\left\|\hat{h}_T\right\|^4$:
$$\left\|\hat{h}_T\right\|^4 = \left(\int \hat{h}_T\overline{\hat{h}_T}\,\pi(\tau)\,d\tau\right)^2 = \left(\frac{1}{T^2}\sum_{t=1}^{T}\int h_t\overline{h_t}\,\pi(\tau)\,d\tau + \frac{1}{T^2}\sum_{t\neq s}\int h_t\overline{h_s}\,\pi(\tau)\,d\tau\right)^2$$
$$= \left(\frac{1}{T^2}\sum_{t=1}^{T}\int h_t\overline{h_t}\,\pi(\tau)\,d\tau\right)^2 + \left(\frac{1}{T^2}\sum_{t\neq s}\int h_t\overline{h_s}\,\pi(\tau)\,d\tau\right)^2 + 2\left(\frac{1}{T^2}\sum_{t=1}^{T}\int h_t\overline{h_t}\,\pi(\tau)\,d\tau\right)\left(\frac{1}{T^2}\sum_{t\neq s}\int h_t\overline{h_s}\,\pi(\tau)\,d\tau\right).$$
Consider the first squared term of $\left\|\hat{h}_T\right\|^4$:
$$E\left[\left(\frac{1}{T^2}\sum_{t=1}^{T}\int h_t\overline{h_t}\,\pi(\tau)\,d\tau\right)^2\right] = \frac{T}{T^4}E\left[\left(\int h_t\overline{h_t}\,\pi(\tau)\,d\tau\right)^2\right] + \frac{T(T-1)}{T^4}\left(E\int h_t\overline{h_t}\,\pi(\tau)\,d\tau\right)^2 = O(T^{-2}).$$
The second squared term leads to:
$$E\left[\left(\frac{1}{T^2}\sum_{t\neq s}\int h_t\overline{h_s}\,\pi(\tau)\,d\tau\right)^2\right] = E\left[\frac{1}{T^4}\sum_{t\neq s}\left(\int h_t\overline{h_s}\,\pi(\tau)\,d\tau\right)^2\right] + E\left[\frac{1}{T^4}\sum_{t\neq s,\,l\neq j,\,(t,s)\neq(l,j)}\int h_t\overline{h_s}\,\pi\,d\tau\int h_l\overline{h_j}\,\pi\,d\tau\right]$$
$$= \frac{T(T-1)}{T^4}E\left[\left(\int h_t\overline{h_s}\,\pi(\tau)\,d\tau\right)^2\right] = O(T^{-2}),\ \text{for } t \neq s.$$
As the $h_t$'s are uncorrelated, the cross-term is equal to zero:
$$E\left[\left(\frac{1}{T^2}\sum_{t=1}^{T}\int h_t\overline{h_t}\,\pi(\tau)\,d\tau\right)\left(\frac{1}{T^2}\sum_{t\neq s}\int h_t\overline{h_s}\,\pi(\tau)\,d\tau\right)\right] = 0.$$
In total, we obtain: $E\left\|\hat{h}_T\right\|^4 = O(T^{-2})$.
We now consider $E\left\|\left(K_{BT}^{-1} - K_B^{-1}\right)G\right\|^4$. Using the same decomposition as in (14) and (15) leads to:
$$E\left\|\left(K_{BT}^{-1} - K_B^{-1}\right)G\right\|^4 \leq E\left\|\left(K_{BT}^{-1} - K_B^{-1}\right)K\right\|^4\left\|K^{-1}G\right\|^4$$
$$\leq E\left\|(K_T^2 + BI)^{-1}(K_T - K)K\right\|^4\left\|K^{-1}G\right\|^4 \qquad (18)$$
$$\quad + E\left\|(K_T^2 + BI)^{-1} - (K^2 + BI)^{-1}\right\|^4\|K\|^8\left\|K^{-1}G\right\|^4. \qquad (19)$$
Hence:
$$(18) = E\left\|(K_T^2 + BI)^{-1}(K_T - K)K\right\|^4\left\|K^{-1}G\right\|^4 \leq B^{-4}E\left[\|K_T - K\|^4\right]\|K\|^4\left\|K^{-1}G\right\|^4.$$
For (19), we use $A^{-1} - B^{-1} = A^{-1}(B - A)B^{-1}$ to obtain:
$$(19) = E\left\|(K_T^2 + BI)^{-1} - (K^2 + BI)^{-1}\right\|^4\|K\|^8\left\|K^{-1}G\right\|^4$$
$$\leq E\left[\left\|(K_T^2 + BI)^{-1}\right\|^2\left\|K_T^2 - K^2\right\|^4\left\|(K^2 + BI)^{-1}\right\|^2\right]\|K\|^8\left\|K^{-1}G\right\|^4$$
$$\leq B^{-4}E\left\|K_T^2 - K^2\right\|^4\|K\|^8\left\|K^{-1}G\right\|^4 \leq B^{-4}E\left[\|K_T - K\|^4\|K_T + K\|^4\right]\|K\|^8\left\|K^{-1}G\right\|^4.$$
By the triangle inequality:
$$(19) \leq B^{-4}E\left[\|K_T - K\|^4\left(\|K_T\| + \|K\|\right)^4\right]\|K\|^8\left\|K^{-1}G\right\|^4 \leq 256\,B^{-4}E\left[\|K_T - K\|^4\right]\|K\|^8\left\|K^{-1}G\right\|^4,$$
due to $\|K_T\| \leq 2$ and $\|K\| \leq 2$.
The rates of (18) and (19) depend on the rate of $E\|K_T - K\|^4$. We have:
$$\|K_T - K\|^2 \leq \int\int\left|\frac{1}{T}\sum_{t=1}^{T}\epsilon_t(\tau_1,\tau_2)\right|^2\pi(\tau_1)\pi(\tau_2)\,d\tau_1 d\tau_2$$
$$= \frac{1}{T^2}\sum_{t=1}^{T}\int\int|\epsilon_t(\tau_1,\tau_2)|^2\,\pi(\tau_1)\pi(\tau_2)\,d\tau_1 d\tau_2 \qquad (20)$$
$$\quad + \frac{1}{T^2}\sum_{t\neq l}\int\int\epsilon_t(\tau_1,\tau_2)\overline{\epsilon_l(\tau_1,\tau_2)}\,\pi(\tau_1)\pi(\tau_2)\,d\tau_1 d\tau_2, \qquad (21)$$
where $\epsilon_t(\tau_1,\tau_2) = k_t(\tau_1,\tau_2;\hat{\theta}^1) - k(\tau_1,\tau_2)$. Hence
$$E\|K_T - K\|^4 \leq E\left[(20)\right]^2 + 2E\left[(20)(21)\right] + E\left[(21)\right]^2.$$
Because $E\left[(20)(21)\right] \leq \sqrt{E[(20)]^2\,E[(21)]^2}$, we only need to check the rates of the squared terms. We have:
$$E[(20)]^2 \leq \frac{T}{T^4}E\left[\left(\int\int|\epsilon_t(\tau_1,\tau_2)|^2\,\pi(\tau_1)\pi(\tau_2)\,d\tau_1 d\tau_2\right)^2\right] + \frac{T(T-1)}{T^4}E\left[\int\int|\epsilon_t|^2\,\pi\pi\,d\tau_1 d\tau_2\right]E\left[\int\int|\epsilon_l|^2\,\pi\pi\,d\tau_1 d\tau_2\right],\ \text{for } l \neq t.$$
Hence $E[(20)]^2 = O(T^{-2})$. Next:
$$E[(21)]^2 = \frac{1}{T^4}\sum_{t\neq l}E\left[\left(\int\int\epsilon_t\overline{\epsilon_l}\,\pi(\tau_1)\pi(\tau_2)\,d\tau_1 d\tau_2\right)^2\right] + \frac{1}{T^4}\sum_{t\neq l,\,n\neq j,\,(t,l)\neq(n,j)}E\left[\int\int\epsilon_t\overline{\epsilon_l}\,\pi\pi\,d\tau_1 d\tau_2\int\int\epsilon_n\overline{\epsilon_j}\,\pi\pi\,d\tau_1 d\tau_2\right],$$
with $\epsilon_t \equiv \epsilon_t(\tau_1,\tau_2)$. Due to the m.d.s. property, the last term has expectation zero. Hence:
$$E[(21)]^2 = \frac{T(T-1)}{T^4}E\left[\left(\int\int\epsilon_t\overline{\epsilon_l}\,\pi(\tau_1)\pi(\tau_2)\,d\tau_1 d\tau_2\right)^2\right] = O(T^{-2}).$$
By putting these together, we obtain $E\|K_T - K\|^4 = O(T^{-2})$ so that:
$$E\left\|\left(K_{BT}^{-1} - K_B^{-1}\right)G\right\|^4 \leq (18) + (19) = O\left(B^{-4}T^{-2}\right)\ \text{and}$$
$$E\left\|\left\langle\left(K_{BT}^{-1} - K_B^{-1}\right)G, \hat{h}_T\right\rangle\right\|^2 \leq \sqrt{E\left\|\left(K_{BT}^{-1} - K_B^{-1}\right)G\right\|^4\,E\left\|\hat{h}_T\right\|^4} = \sqrt{O(B^{-4}T^{-2}) \times O(T^{-2})} = O\left(B^{-2}T^{-2}\right).$$
In total:
$$\left\|E\left[\Psi_{T,0}\tilde{\Psi}_{T,B}'\right]\right\| \leq \sqrt{E\left\|\left\langle K^{-1}G, \hat{h}_T\right\rangle\right\|^2\,E\left\|\left\langle\left(K_{BT}^{-1} - K_B^{-1}\right)G, \hat{h}_T\right\rangle\right\|^2} = \sqrt{O(T^{-1}) \times O(B^{-2}T^{-2})} = O\left(B^{-1}T^{-3/2}\right).$$
We now check the rate of the second term of $\mathrm{Cov}(\Delta_1, \Delta_3)$:
$$\left\|E\left[\Psi_{T,0}\Psi_{T,0}'W_0^{-1}\tilde{W}_B\right]\right\| \leq \sqrt{E\left\|\Psi_{T,0}\Psi_{T,0}'\right\|^2\,E\left\|W_0^{-1}\tilde{W}_B\right\|^2}.$$
We first consider $E\left\|\Psi_{T,0}\Psi_{T,0}'\right\|^2$. By the Cauchy-Schwarz inequality:
$$E\left\|\Psi_{T,0}\Psi_{T,0}'\right\|^2 = E\left\|\left\langle K^{-1}G, \hat{h}_T\right\rangle\left\langle K^{-1}G, \hat{h}_T\right\rangle'\right\|^2 \leq \left\|K^{-1}G\right\|^4 E\left\|\hat{h}_T\right\|^4 = O(T^{-2}).$$
For the second term, we have:
$$E\left\|W_0^{-1}\tilde{W}_B\right\|^2 = \|W_0\|^{-2}E\left\|\left\langle\left(K_{BT}^{-1} - K_B^{-1}\right)G, G\right\rangle\right\|^2 \leq \|W_0\|^{-2}\|G\|^2\,E\left\|\left(K_{BT}^{-1} - K_B^{-1}\right)G\right\|^2 = O(B^{-2}T^{-1}),$$
according to (14)-(15). Hence,
$$\left\|E\left[\Psi_{T,0}\Psi_{T,0}'W_0^{-1}\tilde{W}_B\right]\right\| \leq \sqrt{O(T^{-2}) \times O(B^{-2}T^{-1})} = O\left(B^{-1}T^{-3/2}\right).$$
It now remains to find the rate of $\mathrm{Var}(\Delta_2)$. We recall that $\Delta_2 = -W_0^{-1}\Psi_{T,B} + W_0^{-1}W_B W_0^{-1}\Psi_{T,0}$. We have
$$\mathrm{Var}(\Delta_2) = W_0^{-1}E\left[\Psi_{T,B}\Psi_{T,B}'\right]W_0^{-1} - W_0^{-1}E\left[\Psi_{T,B}\Psi_{T,0}'\right]W_0^{-1}W_B W_0^{-1}$$
$$\quad - W_0^{-1}W_B W_0^{-1}E\left[\Psi_{T,0}\Psi_{T,B}'\right]W_0^{-1} + W_0^{-1}W_B W_0^{-1}E\left[\Psi_{T,0}\Psi_{T,0}'\right]W_0^{-1}W_B W_0^{-1}.$$
Replacing $E\left[\Psi_{T,0}\Psi_{T,B}'\right] = \frac{1}{T}W_B$ and $E\left[\Psi_{T,0}\Psi_{T,0}'\right] = \frac{1}{T}W_0$, we see immediately that the last two terms cancel out so that
$$\mathrm{Var}(\Delta_2) = W_0^{-1}E\left[\Psi_{T,B}\Psi_{T,B}'\right]W_0^{-1} - \frac{1}{T}W_0^{-1}W_B W_0^{-1}W_B W_0^{-1}.$$
For the first term of $\mathrm{Var}(\Delta_2)$, we use Lemma 16 to obtain:
$$E\left[\Psi_{T,B}\Psi_{T,B}'\right] = E\left[\left\langle\left(K_B^{-1} - K^{-1}\right)G, \hat{h}_T\right\rangle\left\langle\left(K_B^{-1} - K^{-1}\right)G, \hat{h}_T\right\rangle'\right] = \frac{1}{T}\left\langle\left(K_B^{-1} - K^{-1}\right)G, \left(K_B^{-1} - K^{-1}\right)KG\right\rangle$$
$$= \frac{1}{T}\sum_j\left(\frac{B}{\lambda_j\left(\lambda_j^2 + B\right)}\right)^2\lambda_j\left\langle G, \phi_j\right\rangle^2 = \frac{1}{T}\sum_j\left(\frac{B}{\lambda_j\left(\lambda_j^2 + B\right)}\right)^2\lambda_j^{2C+1}\frac{\left\langle G, \phi_j\right\rangle^2}{\lambda_j^{2C}} \leq \frac{1}{T}\sup_j\left[\left(\frac{B}{\lambda_j\left(\lambda_j^2 + B\right)}\right)^2\lambda_j^{2C+1}\right]\sum_j\frac{\left\langle G, \phi_j\right\rangle^2}{\lambda_j^{2C}}.$$
We focus on the square root of $\sup_\lambda\left(\frac{B}{\lambda\left(\lambda^2 + B\right)}\right)^2\lambda^{2C+1}$, namely:
$$\sup_\lambda\left(1 - \frac{\lambda^2}{\lambda^2 + B}\right)\lambda^{C-1/2}.$$
Case where $C \geq 5/2$:
$$\sup_\lambda\left(1 - \frac{\lambda^2}{\lambda^2 + B}\right)\lambda^{C-1/2} = B\sup_\lambda\frac{\lambda^{C-1/2}}{\lambda^2 + B} \leq B\sup_\lambda\lambda^{C-5/2} \leq B\,\lambda_1^{C-5/2}.$$
Case where $C < 5/2$: We apply the change of variable $x = B/\lambda^2$ and obtain
$$\sup_\lambda\left(1 - \frac{\lambda^2}{\lambda^2 + B}\right)\lambda^{C-1/2} = \sup_x\frac{x}{1 + x}\left(\frac{B}{x}\right)^{\frac{2C-1}{4}} = B^{\frac{2C-1}{4}}\sup_x\frac{x}{1 + x}\,x^{-\frac{2C-1}{4}}.$$
The function $f(x) = \frac{x}{1+x}x^{-\frac{2C-1}{4}}$ is continuous and hence bounded for $x$ away from 0 and infinity. When $x$ goes to infinity, $f(x)$ goes to zero because $2C - 1 > 0$. When $x$ goes to zero, $f(x) = \frac{x^{\frac{5-2C}{4}}}{1+x}$ goes to zero because $5 - 2C > 0$. Hence, $f(x)$ is bounded on $\mathbb{R}_+$.
In conclusion, the rate of convergence of $E\left[\Psi_{T,B}\Psi_{T,B}'\right]$ is given by $B^{\min\left(2,\frac{2C-1}{2}\right)}T^{-1}$. Note that this rate is an equivalent, not a big O.
For the second term of $\mathrm{Var}(\Delta_2)$, we use the fact that $W_B = O\left(B^{\min\left(1,\frac{2C-1}{2}\right)}\right)$ according to Equation (5) in Lemma 14:
$$\frac{1}{T}W_0^{-1}W_B W_0^{-1}W_B W_0^{-1} = \frac{1}{T}\times O(1)\times O\left(B^{\min\left(1,\frac{2C-1}{2}\right)}\right)\times O(1)\times O\left(B^{\min\left(1,\frac{2C-1}{2}\right)}\right)\times O(1) = O\left(B^{\min(2,2C-1)}T^{-1}\right).$$
Optimal Rate for B

Note that the bias term $T\,\mathrm{Bias}\cdot\mathrm{Bias}' = O\left(B^{-2}T^{-1}\right)$ goes to zero faster than the covariance term $T\,\mathrm{Cov}(\Delta_1, \Delta_3) = O\left(B^{-1}T^{-1/2}\right)$. Hence the optimal $B$ is the one that achieves the best trade-off between $T\,\mathrm{Var}(\Delta_2) \propto B^{\min\left(2,\,C-\frac{1}{2}\right)}$, which is increasing in $B$, and $T\,\mathrm{Cov}(\Delta_1, \Delta_3)$, which is decreasing in $B$. We have
$$B^{\min\left(2,\,C-\frac{1}{2}\right)} = B^{-1}T^{-1/2} \implies B^* = T^{-\max\left(\frac{1}{6},\,\frac{1}{2C+1}\right)}.$$
Note that this rate satisfies $B^{-1}T^{-1/2} = o(1)$.
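Read numerically, the rule above is trivial to implement; the helper below (a hypothetical name introduced for illustration) maps $(T, C)$ to $B^* = T^{-\max(1/6,\,1/(2C+1))}$.

```python
def B_star(T, C):
    """Optimal regularization rate B* = T^(-max(1/6, 1/(2C+1)))."""
    return T ** -max(1.0 / 6.0, 1.0 / (2.0 * C + 1.0))

# C = 2 (< 5/2) gives B* = T^(-1/5); C = 4 (>= 5/2) gives B* = T^(-1/6)
print(B_star(1000, 2), B_star(1000, 4))
```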
C Consistency of $\hat{B}_{TM}$

We first prove the following lemma.

Lemma 18: Under Assumptions 1 to 5, $\hat{\theta}_T(B,\theta_0)$ is once continuously differentiable with respect to $B$ and twice continuously differentiable with respect to $\theta_0$, and $B_T(\theta_0)$ is continuous in $\theta_0$.
Proof of Lemma 18: The objective function $\hat{Q}_T(B,\theta)$ involves the following operator:
$$K_{BT}^{-1}\hat{h}_T(\cdot;\theta) = \sum_{j=1}^{T}\frac{\hat{\lambda}_j}{B + \hat{\lambda}_j^2}\left\langle\hat{h}_T(\cdot;\theta), \hat{\phi}_j\right\rangle\hat{\phi}_j,$$
where $\hat{\phi}_j$ is the eigenfunction of $K_T$ associated with the eigenvalue $\hat{\lambda}_j$. By Assumption 3, the moment function $\hat{h}_T(\cdot;\theta)$ is three times continuously differentiable with respect to $\theta$, the argument with respect to which we minimize the objective function of the CGMM. By Assumption 5, $x_t = r(x_{t-1}, \theta_0, \varepsilon_t)$ where $r$ is three times continuously differentiable with respect to $\theta_0$ (the true unknown parameter) and $\varepsilon_t$ is an IID white noise whose distribution does not depend on $\theta_0$. Thus, as an exponential function of $x_t$, the moment function is also three times continuously differentiable with respect to $\theta_0$. Thus Assumptions 3 and 5 imply that the objective function of the CGMM is three times continuously differentiable with respect to $\theta$ and $\theta_0$. Now we turn our attention toward the differentiability with respect to $B$. It is easy to check that
$$\frac{\partial K_{BT}^{-1}\hat{h}_T(\cdot;\theta)}{\partial B} = \tilde{K}_{BT}\hat{h}_T(\cdot;\theta),$$
where $\tilde{K}_{BT} \equiv -\left(K_T^2 + B_T I\right)^{-2}K_T$, which is well defined on $L^2(\pi)$ for $B_T$ fixed. When $B_T$ goes to zero, we have to be more careful. We check that $\left|\left\langle\tilde{K}_{BT}\hat{h}_T(\cdot;\theta), \hat{h}_T(\cdot;\theta)\right\rangle\right|$ is bounded. We have
$$\left|\left\langle\tilde{K}_{BT}\hat{h}_T(\cdot;\theta), \hat{h}_T(\cdot;\theta)\right\rangle\right| \leq \left\|\tilde{K}_{BT}\hat{h}_T(\cdot;\theta)\right\|\left\|\hat{h}_T(\cdot;\theta)\right\| \leq \left\|\left(K_T^2 + B_T I\right)^{-2}K_T\hat{h}_T(\cdot;\theta)\right\|\left\|\hat{h}_T(\cdot;\theta)\right\|$$
$$\leq \underbrace{\left\|\left(K_T^2 + B_T I\right)^{-3/2}\right\|}_{\leq\, B_T^{-3/2}}\ \underbrace{\left\|\left(K_T^2 + B_T I\right)^{-1/2}K_T\right\|}_{\leq\, 1}\ \underbrace{\left\|\hat{h}_T(\cdot;\theta)\right\|^2}_{=\, O_p(T^{-1})} = O_p\left(B_T^{-3/2}T^{-1}\right) = o_p(1),$$
where the last equality follows from Theorem 2(ii). This shows that $\hat{Q}_T(B,\theta)$ is once continuously differentiable with respect to $B$ and three times continuously differentiable with respect to $\theta$. By the implicit function theorem, $\hat{\theta}_T(B,\theta_0) = \arg\min_\theta \hat{Q}_T(B,\theta)$ is once continuously differentiable with respect to $B$ and twice continuously differentiable w.r.t. $\theta_0$. The MSE $\Sigma_T(B,\theta_0)$ is an expectation of a quadratic function in $\hat{\theta}_T(B,\theta_0)$. Hence $\Sigma_T(B,\theta_0)$ is also once continuously differentiable w.r.t. $B$ and twice continuously differentiable w.r.t. $\theta_0$. Finally, the Maximum theorem implies that $B_T(\theta_0) = \arg\min_{B\in[0,1]}\Sigma_T(B,\theta_0,\pi)$ is continuous w.r.t. $\theta_0$. ■
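As an aside, the spectral form of $K_{BT}^{-1}\hat{h}_T$ used at the start of this proof is also how the regularized inverse can be applied in practice. The sketch below does so on a quadrature grid; the function name and the symmetrization device are assumptions of the example, not the paper's code.

```python
import numpy as np

def tikhonov_apply(Kker, h, w, B):
    """Apply K_BT^{-1} h = sum_j lam_j / (B + lam_j^2) <h, phi_j> phi_j
    on a grid with quadrature weights w; Kker holds \hat k_T(tau_j, tau_l)."""
    d = np.sqrt(w)
    S = Kker * d * d[:, None]          # symmetrized operator D K D (Hermitian)
    lam, V = np.linalg.eigh(S)         # eigenpairs of the discretized K_T
    coef = V.conj().T @ (d * h)        # Fourier coefficients <h, phi_j>
    return (V @ (lam / (B + lam ** 2) * coef)) / d
```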
Proof of Theorem 3: Using Assumption 6, we see that
$$\frac{B_T(\hat{\theta})}{B_T(\theta_0)} = \frac{c(\hat{\theta})}{c(\theta_0)}.$$
Moreover, by Lemma 18, $B_T(\cdot)$ and hence $c(\cdot)$ are continuous functions of $\theta$. Since $\hat{\theta}$ is a consistent estimator of $\theta_0$, the continuous mapping theorem implies that
$$\frac{c(\hat{\theta})}{c(\theta_0)} \overset{P}{\longrightarrow} 1\ \text{as } T \to \infty.\ ■$$
Proof of Theorem 4: Here, we consider the expression of the MSE given by (15), but the same proof can be easily adapted to the expressions given by (16) and (17). Consider
$$\hat{\Sigma}_{TM}(B,\theta_0,\pi) = \frac{T}{(1-\epsilon)M}\sum_{j=1}^{M}\delta_{j,T}(B)\,\mathbf{1}\left\{\delta_{j,T}(B) < \eta_{\epsilon,TM}\right\},$$
where $\delta_{j,T}(B) \equiv \delta_{j,T}(B,\theta_0)$ is IID (across $j$) and continuous in $B$. We have:
$$\hat{\Sigma}_{TM}(B,\theta_0,\pi) - \Sigma_T(B,\theta_0,\pi) = \frac{T}{(1-\epsilon)M}\sum_{j=1}^{M}\left[\delta_{j,T}(B)\,\mathbf{1}\left\{\delta_{j,T}(B) < \eta_{\epsilon,TM}\right\} - E\left(\delta_{j,T}(B)\,\mathbf{1}\left\{\delta_{j,T}(B) < \eta_{\epsilon,T}\right\}\right)\right],$$
where $\eta_{\epsilon,T} = \lim_{M\to\infty}\eta_{\epsilon,TM}$.
If we can show that there exists a function $b_T > 0$ independent of $B$ such that
$$\left\|\frac{\partial\delta_{j,T}(B)}{\partial B}\right\| < b_T, \qquad (22)$$
and $E(b_T) < \infty$, then, by Lemma 2.4 of Newey and McFadden (1994), we would have
$$\sup_{B\in[0,1]}\left\|\hat{\Sigma}_{TM}(B,\theta_0,\pi) - \Sigma_T(B,\theta_0,\pi)\right\| = O_p\left(M^{-1/2}\right),$$
and it would follow from Theorem 2.1 of Newey and McFadden (1994) that $\hat{B}_{TM}(\theta_0) - B_T(\theta_0) = O_p\left(M^{-1/2}\right)$. This would imply that $\frac{\hat{B}_{TM}(\theta_0)}{B_T(\theta_0)} - 1 = O_p\left(M^{-1/2}\right)$, given that $B_T(\theta_0)$ is bounded away from zero when $T$ is fixed.
Let $\delta_T$ be any of the $\delta_{j,T}$, $j = 1, \ldots, M$. To prove inequality (22), we first compute:
$$\frac{\partial\delta_T(B)}{\partial B} = 2\,\frac{\partial\hat{\theta}_T(B,\theta_0)'}{\partial B}\left(\hat{\theta}_T(B,\theta_0) - \theta_0\right),$$
where by the implicit function theorem:
$$\frac{\partial\hat{\theta}_T(B,\theta_0)}{\partial B} = -\left(\frac{\partial^2\hat{Q}_T(B,\theta)}{\partial\theta\partial\theta'}\right)^{-1}\frac{\partial^2\hat{Q}_T(B,\theta)}{\partial\theta\partial B}.$$
The expressions involved are:
$$\frac{\partial^2\hat{Q}_T(B,\theta)}{\partial\theta\partial\theta'} = \left\langle K_{BT}^{-1}\hat{G}_T(\cdot;\theta_0), \hat{G}_T(\cdot;\theta_0)\right\rangle + \left\langle K_{BT}^{-1}\hat{H}_T(\cdot;\theta_0), \hat{h}_T(\cdot;\theta_0)\right\rangle,$$
$$\frac{\partial^2\hat{Q}_T(B,\theta)}{\partial\theta\partial B} = \left\langle\tilde{K}_{BT}\hat{G}_T(\cdot;\theta_0), \hat{h}_T(\cdot;\theta_0)\right\rangle + \left\langle\tilde{K}_{BT}\hat{h}_T(\cdot;\theta_0), \hat{G}_T(\cdot;\theta_0)\right\rangle,$$
where $\tilde{K}_{BT} \equiv -\left(K_T^2 + BI\right)^{-2}K_T$ as in the proof of Lemma 18. Next recall that for fixed $T$, $B_T(\theta_0)$ is bounded away from zero so that there exists a sequence $\underline{B}_T$ such that $B_T(\theta_0) \geq \underline{B}_T$ for all $T$. Hence, the minimization problem for the selection of $B$ may be re-written as $B_T(\theta_0) = \arg\min_{B\in[\underline{B}_T,1]}\Sigma_T(B,\theta_0,\pi)$, so that $\Sigma_T(B,\theta_0,\pi)$ is a bounded function of $B$ on the choice set. Because $\frac{\partial^2\hat{Q}_T(B,\theta,\theta_0)}{\partial\theta\partial\theta'}$ and $\frac{\partial^2\hat{Q}_T(B,\theta,\theta_0)}{\partial\theta\partial B}$ are continuous with respect to $B$, $\frac{\partial\delta_T(B)}{\partial B}$ is also continuous with respect to $B$. Hence, we indeed have
$$\|b_T(B_T^*)\| = \left\|\frac{\partial\delta_T(B_T^*)}{\partial B}\right\| < \infty\quad\text{where}\quad B_T^* = \arg\sup_{B\in[\underline{B}_T,1]}\left\|\frac{\partial\delta_T(B)}{\partial B}\right\|.\ ■$$
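For reference, a stripped-down version of the selection procedure that this theorem (and Theorem 5 below) analyzes can be sketched as follows. The trimming indicator $\mathbf{1}\{\delta_{j,T}(B) < \eta\}$ used in the proof is omitted, and `simulate` and `estimate` are user-supplied hypothetical callables (draw one parametric-bootstrap sample at $\hat{\theta}$; return the CGMM estimate for a given $B$); this is a sketch, not the paper's implementation.

```python
import numpy as np

def select_B(theta_hat, simulate, estimate, B_grid, M, T, seed=0):
    """\hat B_TM: minimize the bootstrap MSE of theta over a grid of B values."""
    rng = np.random.default_rng(seed)
    theta_hat = np.asarray(theta_hat, dtype=float)
    mse = np.zeros(len(B_grid))
    for _ in range(M):
        x_sim = simulate(theta_hat, T, rng)       # bootstrap sample of size T
        for i, B in enumerate(B_grid):
            err = estimate(x_sim, B) - theta_hat  # CGMM error at this B
            mse[i] += np.dot(err, err) / M        # Monte Carlo average of ||err||^2
    return B_grid[int(np.argmin(mse))]
```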
Proof of Theorem 5: We first make the following decomposition:
$$\frac{\hat{B}_{TM}(\hat{\theta})}{B_T(\theta_0)} - 1 = \left(\frac{B_T(\hat{\theta})}{B_T(\theta_0)} - 1\right) + \left(\frac{\hat{B}_{TM}(\hat{\theta})}{B_T(\hat{\theta})} - 1\right) + \left(\frac{B_T(\hat{\theta})}{B_T(\theta_0)} - 1\right)\left(\frac{\hat{B}_{TM}(\hat{\theta})}{B_T(\hat{\theta})} - 1\right).$$
By first letting $M$ go to infinity, we obtain the result of Theorem 4, which has been proved for fixed $T$: $\frac{\hat{B}_{TM}(\hat{\theta})}{B_T(\hat{\theta})} - 1 = O_p(M^{-1/2})$. Next, we let $T$ go to infinity in order to obtain the result of Theorem 3: $\frac{B_T(\hat{\theta})}{B_T(\theta_0)} - 1 = O_p(T^{-1/2})$. The product of $\frac{B_T(\hat{\theta})}{B_T(\theta_0)} - 1$ and $\frac{\hat{B}_{TM}(\hat{\theta})}{B_T(\hat{\theta})} - 1$ is negligible with respect to either of the other terms. Thus, it follows that
$$\frac{\hat{B}_{TM}(\hat{\theta})}{B_T(\theta_0)} - 1 = O_p(T^{-1/2}) + O_p(M^{-1/2}).\ ■$$
Proof of Theorem 6: The mean value theorem yields:
$$\hat{\theta}(\hat{B}_{TM}) - \hat{\theta}(B_T) = \frac{\partial\hat{\theta}(\bar{B})}{\partial B}\left(\hat{B}_{TM} - B_T\right),$$
where $\bar{B}$ lies between $\hat{B}_{TM}$ and $B_T$, and $B_T$ is bounded away from zero, i.e., $\exists\,\underline{B}_T > 0 : \underline{B}_T \leq B_T \leq 1$ for all $T$. From the proof of Theorem 4, we know that $\hat{\theta}(B)$ is continuously differentiable with respect to $B$. This implies that:
$$\left\|\frac{\partial\hat{\theta}(\bar{B})}{\partial B}\right\| < \sup_{B\in[\underline{B}_T,1]}\left\|\frac{\partial\hat{\theta}(B)}{\partial B}\right\| = O_p(1).$$
Consequently, the rate of $\hat{\theta}(\hat{B}_{TM}) - \hat{\theta}(B_T)$ is determined by the rate at which $\hat{B}_{TM} - B_T$ converges to zero. We have:
$$\hat{B}_{TM} - B_T = B_T\left(\frac{\hat{B}_{TM}}{B_T} - 1\right) = c(\theta_0)\,T^{-g(C)}\left(\frac{\hat{B}_{TM}}{B_T} - 1\right) = O_p\left(T^{-g(C)-1/2}\right),$$
provided that $M \geq T$. Hence:
$$\sqrt{T}\left(\hat{\theta}(\hat{B}_{TM}) - \hat{\theta}(B_T)\right) = \sqrt{T}\,\frac{\partial\hat{\theta}(\bar{B})}{\partial B}\left(\hat{B}_{TM} - B_T\right) = O_p\left(T^{-g(C)}\right) = o_p(1),$$
which shows that $\sqrt{T}\left(\hat{\theta}(\hat{B}_{TM}) - \theta_0\right)$ and $\sqrt{T}\left(\hat{\theta}(B_T) - \theta_0\right)$ have the same asymptotic distribution. ■
which shows that T b
D
Numerical algorithms: Computing the objective function of the
CGMM
The moment function ht (; ) 2 L2 () for any …nite measure . Hence, we can take ( ) to be the
standard normal density up to a multiplicative constant: ( ) = exp f0 0 g : We have:
^ T (; ) =
KT h
Z
Rd
8
9
b
kT (s; ) b
hT (; s) exp 0s0 s ds:
This integral can be well approximated numerically by using the Gauss-Hermite quadrature. This
amounts to …nd m points s1 ; s2 ; :::sm and weighs ! 1 ; ! 2 ; :::! m such that:
Z
0
P (s) expf0s sgdx =
Rd
m
X
! k P (sk )
k=1
for any polynomial function P (:) of order smaller than or equal to 2m 0 1. See for example Liu and
Pierce (1994).
If $f$ is differentiable at any order (for example, an analytic function), it can be shown that for any positive $\varepsilon$ arbitrarily small, there exists $m$ such that:
$$\left|\int_{\mathbb{R}^d}f(s)\exp\{-s's\}\,ds - \sum_{k=1}^{m}\omega_k f(s_k)\right| < \varepsilon.$$
The choice of the quadrature points does not depend on the function $f$. The quadrature points and weights are determined by solving:
$$\int s^l\exp\{-s^2\}\,ds = \sum_{k=1}^{n}\omega_k s_k^l\quad\text{for all } l = 1, \ldots, 2n - 1.$$
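In Python, for instance, these nodes and weights are available directly; the snippet below is a sketch whose exactness check uses $\int s^2 e^{-s^2}\,ds = \sqrt{\pi}/2$.

```python
import numpy as np

# m = 20 Gauss-Hermite nodes/weights: exact for polynomials of degree <= 2m - 1
s, w = np.polynomial.hermite.hermgauss(20)
print(np.sum(w * s ** 2), np.sqrt(np.pi) / 2)   # the two values agree
```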
Applying that method to evaluate the above integral, we get
$$K_T\hat{h}_T(\tau;\theta) \simeq \sum_{k=1}^{m}\omega_k\,\hat{k}_T(\tau,s_k)\,\hat{h}_T(s_k;\theta).$$
Let $\hat{h}_T(\theta)$ denote the vector $\left(\hat{h}_T(s_1;\theta), \hat{h}_T(s_2;\theta), \ldots, \hat{h}_T(s_m;\theta)\right)'$ and $\hat{W}_T$ the matrix with elements $W_{jk} = \omega_k\hat{k}_T(s_k,s_j)$. Thus we can simply write:
$$K_T\hat{h}_T(\theta) \simeq \hat{W}_T\hat{h}_T(\theta).$$
For any given level of precision, the matrix $\hat{W}_T$ can be looked at as the best finite dimensional reduction of the operator $K_T$. From the spectral decomposition of $K_{BT}^{-1}$, it is easy to deduce the approximation:
$$K_{BT}^{-1}\hat{h}_T(\theta) \simeq \left(\hat{W}_T^2 + BI\right)^{-1}\hat{W}_T\hat{h}_T(\theta) \equiv \tilde{h}_T(\theta).$$
D
E Z
01 b
KBT
hT (; :); b
hT (; :) =
m
C
C2
C
C2
X
8 0 9
C 01=2b
C
Ce
C
K
h
(;
)
exp
0
d
h
(;
s
)
!
C BT
C
T
k C
kC T
k=1
where e
hT (; sk ) is the k th component of e
hT ().
45