A Reduced Form Representation for State Space Models
Nelson Lind∗
May 29, 2014
Abstract
Estimating structural state space models with maximum likelihood is often infeasible. If the model can be expressed as a reduced form vector-autoregression (VAR) in
the observable data, then two step techniques such as minimum chi-square estimation
can reliably recover structural parameter estimates. However, macroeconomists cannot always rely on the existence of a VAR reduced form – as is often the case when
estimating dynamic stochastic general equilibrium models.
This paper introduces a reduced form representation for general state space models. I use a prediction based criterion to normalize the state vector and generate the
reduced form. In particular, I force the state vector to be the best linear recursive
predictor of the data. As a byproduct, this representation is intrinsically related to
orthonormalized partial least squares regression. Also, a pair of quadratic parameter restrictions characterizes the representation, which enables constrained maximum
likelihood estimation.
This reduced form disentangles statistical inference from structural identification,
greatly clarifying and simplifying the analysis of a broad class of dynamic structural
econometric models. Using these results, economists can use a reduced form approach
to structural estimation without needing a VAR representation.
∗ [email protected]
1 Introduction
Time series analysis often relies on linear state space models. Estimation proceeds along
two broad themes. If the model can be re-cast as a vector-autoregression (VAR), then linear
regression provides a simple and reliable estimation method. Otherwise, the Kalman filter
can be used to evaluate the model's likelihood, facilitating maximum likelihood estimation. The latter approach can be numerically problematic. The VAR reduced form greatly
improves the reliability of estimation, and macroeconomists often work exclusively with
VAR representations of observable data1 . But this focus ignores many types of time series
which are predicted by economic theory. Indeed, the vast majority of dynamic structural
econometric models that arise from economic theory do not have VAR reduced forms.
This paper enables the reliable estimation of general state space models by introducing
reduced form representations for any covariance stationary Gaussian time series with a finite
state space. The reduced form is constructed by requiring the state vector in a state space
representation of observable data to be the best linear recursive predictor (BLRP) of the
data2 . This recursive restriction defines a sequence of nested state space models of lower
order which are pinned down by the property that they generate optimal forecasts of the
observable data. Imposing this property on the state vector leads to a well defined reduced
form state space representation.
This normalization is also directly linked to orthonormalized partial least squares regression3. As a result, this BLRP reduced form representation can be directly estimated from
observable data. This property makes it particularly convenient because regression can be
used as the first step in minimum chi-square estimation of structural state space models (as
in Hamilton and Wu (2012)).
This approach disentangles statistical inference from structural identification, greatly
clarifying and simplifying the analysis of a broad class of dynamic structural econometric
models. Reduced forms are a standard tool in econometric analysis and greatly facilitate
clean identification analysis. As discussed in Christ (1994), the existence and use of a reduced
form is a fundamental tenet of the Cowles Commission approach to econometrics. By
estimating models via their reduced form, economists can clearly differentiate between issues
of statistical inference and issues of structural identification. Given the ubiquitousness of
identification issues in structural macroeconometrics, highlighted by Canova and Sala (2009)
and studied closely by Morris (2013), this reduced form representation is an important tool
for assessing identification in dynamic models.
1 For instance, Giacomini (2013) discusses the important connection between DSGE analysis and VAR
representations.
2 The seminal use of this type of predictive efficiency criterion is Rao (1964).
3 See Sun et al. (2009) for references for OPLS. Also note that their results imply an alternative reduced
form. In particular, a parallel reduced form based on canonical correlations analysis arises from a simple
unitary transformation of the BLRP reduced form.
The paper is organized as follows. In section 2, I examine state space representations
of finite Gaussian covariance stationary time series. Section 3 contains the main result of
the paper. It introduces a prediction based criterion for differentiating between possible
state variables and shows how this criterion leads to a reduced form representation of the
observable data. Section 4 discusses estimation of the reduced form using orthonormalized
partial least squares and maximum likelihood. I present a result that characterizes BLRP
representations using a system of quadratic matrix equations, which enables constrained
likelihood estimation. Section 5 examines how to map a structural model into the BLRP
representation so that the question of identification reduces to inverting a non-linear system
of equations.
2 State-Space Representations
Suppose that {Yt } is an RM -valued Gaussian covariance stationary time series. Without
loss of generality, assume it is mean zero. The goal of this paper is to find a reduced-form
state space representation of this time series which is useful for estimation.
A reduced form for the process is a parametrization which precisely characterizes its
probability law:
Definition 1 A reduced form of $\{Y_t\}$ is a probability law $P(\{y\};\theta)$ and parameter space $\Theta \subset \mathbb{R}^K$ such that for every trajectory $\{y\}$, $P(\{y\};\theta) = P(\{y\};\theta')$ if and only if $\theta = \theta'$.
The reduced form summarizes the stochastic properties of the time series as points in the
parameter space. The reduced form is useful because it allows an econometrician to use
a single parameter estimate (and the standard errors of that estimate) to represent all of
the information contained in a dataset. From there, this summary statistic can be used
to answer economic questions through structural assumptions. This allows the problem of
statistical inference to be separated from the problem of structural identification.
Gaussian time series are conveniently represented using linear state space systems. Moreover, dynamic economic theory often leads to state space models. With this motivation, I
will look for a reduced form whose parameters come from a state space representation of
the time series.
To carefully define a state space representation, I follow Faurre (1976) closely in defining
the state space of the time series.
Definition 2 The state space of $\{Y_t\}$ is defined as the collection of Gaussian time series spanned by the conditional expectations of $Y_t$:
$$\mathbb{X} \equiv \left\{ \{Z_t\} \;:\; \exists\,\{\alpha_\ell\}_{\ell=1}^{\infty} \in (\mathbb{R}^{M})^{\mathbb{Z}_+} \text{ s.t. } Z_t = \sum_{\ell=1}^{\infty} \alpha_\ell'\, E_t Y_{t+\ell} \right\}$$
where Et [·] ≡ E[· | σ(Yt , Yt−1 , . . . )].
Note that due to the Gaussian assumption, the conditional expectation operator is equivalent
to the linear projection onto the history of the time series. Identifying two Gaussian time
series as equivalent if they have identical moments, the dimension of this set is the state space
dimension which is also sometimes referred to as the McMillan degree (see Pugh (1976)).
This dimension is the number of linearly independent time series necessary to describe (via
linear combination) an element of the state space.
I assume that the state space of the time series is finite:
Assumption 1 The state space dimension of {Yt } is finite: dim(X) ≡ N < ∞
This assumption means that any time series in the state space can be expressed as the
linear combination of exactly N time series. Just as a set of vectors can represent a given
sub-space of Euclidean space, N distinct Gaussian time series can be used to represent each
time series in the state space.
A set of N variables which can combine to represent any given time series of the state space provides a basis, represented by an N dimensional time series. A state time
series underlying {Yt } is any RN -valued time series {ξt } whose component time series span
the state space.
The state space is pinned down by {Yt }. However, the basis used to represent the state
space is not unique. Any invertible transformation T defines a new equally valid state time
series: {ξ˜t } = {T ξt }. Just as a linear sub-space of Euclidean space can be represented
by many different basis vectors, many different state time series can be chosen which all
represent X. Resolving this indeterminacy is a central issue underlying finding a reduced
form state space representation for {Yt }.
Once a state time series is selected which spans the state space, we can decompose the
time series by projecting ξt onto ξt−1 and Yt onto ξt−1 . This decomposition delivers a state
space representation of the time series.
Definition 3 A state space representation of $\{Y_t\}$ is a collection of matrices $G = (A, C, Q, S, R)$ such that there exists a Gaussian time series $\{\xi_t\}$ in $\mathbb{R}^N$, Gaussian white noise $\{\varepsilon^{\xi}_t\}$ in $\mathbb{R}^N$, and Gaussian white noise $\{\varepsilon^{Y}_t\}$ in $\mathbb{R}^M$ satisfying:
$$\begin{aligned}
\xi_t &= A\,\xi_{t-1} + \varepsilon^{\xi}_t \\
Y_t &= C\,\xi_{t-1} + \varepsilon^{Y}_t \\
\begin{bmatrix} \varepsilon^{\xi}_t \\ \varepsilon^{Y}_t \end{bmatrix} &\overset{iid}{\sim} N\!\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} Q & S \\ S' & R \end{bmatrix} \right)
\end{aligned}$$
This representation always exists under assumption 1 and its form is uniquely determined
by the choice of {ξt }. Note that this definition only requires that the components of {ξt }
span the state space. There is no requirement that the history of {Yt } span {ξt }. That is,
the components of {ξt } are not necessarily in X.
This form for a state space representation is as general as possible, and it is important
to start from here in order to correctly reduce structural models to their reduced form
representation. The generality of the present representation has a simple interpretation
based on population regressions, which shows that this representation is solely defined by
the choice of the state time series {ξt }.
In particular, let $\xi_{1,t}, \dots, \xi_{N,t}$ be any Gaussian time series which span the state space $\mathbb{X}$. Specifically, for any $\{Z_t\} \in \mathbb{X}$ there exist some weights $\omega_1, \dots, \omega_N$ satisfying $Z_t = \omega_1 \xi_{1,t} + \cdots + \omega_N \xi_{N,t}$. Define a state time series $\{\xi_t\}$ by setting $\xi_t = (\xi_{1,t}, \dots, \xi_{N,t})'$. The
associated state space representation of $\{Y_t\}$ has coefficient matrices and white noise shocks determined by:
$$\begin{aligned}
A &\equiv E[\xi_t \xi_{t-1}']\,E[\xi_{t-1}\xi_{t-1}']^{-1}, &\qquad \varepsilon^{\xi}_t &\equiv \xi_t - A\xi_{t-1} \\
C &\equiv E[Y_t \xi_{t-1}']\,E[\xi_{t-1}\xi_{t-1}']^{-1}, &\qquad \varepsilon^{Y}_t &\equiv Y_t - C\xi_{t-1} \\
\begin{bmatrix} Q & S \\ S' & R \end{bmatrix} &\equiv E\!\left[ \begin{bmatrix} \varepsilon^{\xi}_t \\ \varepsilon^{Y}_t \end{bmatrix}\begin{bmatrix} \varepsilon^{\xi}_t \\ \varepsilon^{Y}_t \end{bmatrix}' \right]
\end{aligned}$$
The dynamics of the state time series are fully described by the coefficients from a population
regression of ξt on ξt−1 . The innovations to the state time series are the residuals from this
population regression. Similarly, the relationship between the observable time series and
the state time series is fully described by the coefficients from a population regression of Yt
on ξt−1 and the white noise component is the regression residual.
This construction delivers a unique state space representation associated with each possible choice of state time series. This generality allows for arbitrary structures where latent
shocks may not be spanned by observable data, which happens often when estimating structural macroeconometric models.
Since the goal is to generate a reduced form for the observable data, any extraneous
information contained in the state vector that is not contained in the observed data allows
for indeterminacy in the timing of information. To avoid this problem, the first step I
consider is to standardize the timing convention. To do so, I perform a change of variables
which puts the state space into its innovations representation (see Anderson and Moore
(2012) and Hansen and Sargent (2013)).
This form replaces the state vector with the best linear forecast of the state vector using
observed data up through the current period. I define this new state variable as Xt ≡ Et ξt .
Then from the properties of conditional expectation:
$$\begin{aligned}
X_t &= A\,E_t\xi_{t-1} + E_t\varepsilon^{\xi}_t = A X_{t-1} + \underbrace{A(E_t\xi_{t-1} - X_{t-1}) + E_t\varepsilon^{\xi}_t}_{=BV_t} \\
Y_t &= C\,E_t\xi_{t-1} + E_t\varepsilon^{Y}_t = C X_{t-1} + \underbrace{C(E_t\xi_{t-1} - X_{t-1}) + E_t\varepsilon^{Y}_t}_{=V_t}
\end{aligned}$$
where Vt is some white noise random variable spanned by the current data and orthogonal to
all past data. In particular, it is the forecast innovation Vt ≡ Yt −Et−1 Yt . The coefficient B is
the population regression coefficient from a regression of the term A(Et ξt−1 −Xt−1 )+Et εξt on
Vt . This regression has no residual because Vt is a sufficient statistic for all new information
that became available in the current period, and the term being forecasted is spanned by
information available in observable data at time t.
Indeed, the coefficient B is nothing other than the steady state Kalman gain. It is determined from the previous parameterization of the model by solving a system of Lyapunov and Riccati equations:
$$\begin{aligned}
B &= (E[\xi_t Y_t'] - APC')(E[Y_t Y_t'] - CPC')^{-1} \\
P &= APA' + (E[\xi_t Y_t'] - APC')(E[Y_t Y_t'] - CPC')^{-1}(E[\xi_t Y_t'] - APC')' \\
E[\xi_t Y_t'] &= A\,E[\xi_t\xi_t']\,C' + S \\
E[Y_t Y_t'] &= C\,E[\xi_t\xi_t']\,C' + R \\
E[\xi_t\xi_t'] &= A\,E[\xi_t\xi_t']\,A' + Q
\end{aligned}$$
This representation always exists and is unique given that {Yt } is covariance stationary
(non-deterministic). This mapping from the arbitrary state space representation to the
innovations representation will be central when mapping structural models to reduced form
state space estimates.
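As a concrete illustration of this mapping (not part of the original derivation), the following is a minimal numerical sketch, assuming numpy and scipy are available; the function name, the zero starting value, and the convergence tolerance are illustrative choices. The expression for Ω follows from $Y_t = CX_{t-1} + V_t$ with $X_{t-1}$ orthogonal to $V_t$.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def innovations_representation(A, C, Q, S, R, tol=1e-12, max_iter=10_000):
    """Sketch: map G = (A, C, Q, S, R) into H = (A, B, C, Omega) by iterating
    the Riccati equation for P given in the text to a fixed point."""
    # Unconditional second moments implied by the state space representation
    Sxx = solve_discrete_lyapunov(A, Q)      # E[xi_t xi_t'] = A E[xi xi'] A' + Q
    Sxy = A @ Sxx @ C.T + S                  # E[xi_t Y_t']
    Syy = C @ Sxx @ C.T + R                  # E[Y_t Y_t']
    P = np.zeros_like(Sxx)
    for _ in range(max_iter):
        K = (Sxy - A @ P @ C.T) @ np.linalg.inv(Syy - C @ P @ C.T)
        P_new = A @ P @ A.T + K @ (Sxy - A @ P @ C.T).T
        if np.max(np.abs(P_new - P)) < tol:
            P = P_new
            break
        P = P_new
    B = (Sxy - A @ P @ C.T) @ np.linalg.inv(Syy - C @ P @ C.T)   # steady state Kalman gain
    Omega = Syy - C @ P @ C.T                                    # Var(V_t)
    return B, Omega, P
```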
For the purpose of estimation, I take the innovations representation as the point of
departure. Indeed, by imposing that each component of the state time series is an element
of the state space, the resulting state space representation is necessarily an innovations
representation. As shown in Faurre (1976), this representation of the data corresponds to
minimizing the variance of the state variable while holding fixed the matrices A and C. I
summarize this result as a lemma:
Lemma 1 (Faurre (1976)) For any Gaussian covariance stationary time series {Yt } which
has a state space representation of G = (A, C, Q, S, R), there exists a unique innovations
representation with parameter matrices H = (A, B, C, Ω) of the form:
$$\begin{aligned}
X_t &= A X_{t-1} + B V_t \\
Y_t &= C X_{t-1} + V_t \\
V_t &\overset{iid}{\sim} N(0, \Omega)
\end{aligned}$$
This representation arises from holding constant the matrices A and C and choosing the
state time series with minimal possible variance.
This lemma directly reduces the parameterization of the model without restricting the properties of the original time series {Yt }. Now, the parameterization of H = (A, B, C, Ω) is
sufficient to fully describe the observed time series.
In summary, for any given Gaussian covariance stationary time series with finite state
space, a given choice of state time series which spans the state space defines a unique state
space representation. Then, by removing any parts of the state time series which cannot be
predicted by the observable data the representation is uniquely reduced to this innovations
representation.
However, since the state space basis is inherently indeterminate, this representation
is also indeterminate – any invertible transformation defines a change of basis and another
equally valid state time series: X̃t = T Xt . In fact, any two innovations representations built
on N basis elements of X (minimal representations) are always related by some invertible
matrix T .
Remark 1 Given any two (minimal) state space representations $G = (A, C, Q, S, R)$ and $\tilde G = (\tilde A, \tilde C, \tilde Q, \tilde S, \tilde R)$, or given their associated innovations representations $H = (A, B, C, \Omega)$ and $\tilde H = (\tilde A, \tilde B, \tilde C, \tilde\Omega)$, there exists an invertible transformation matrix $T$ such that
$$\begin{aligned}
(\tilde A, \tilde C, \tilde Q, \tilde S, \tilde R) &= (TAT^{-1},\; CT^{-1},\; TQT',\; TS,\; R) \\
(\tilde A, \tilde B, \tilde C, \tilde\Omega) &= (TAT^{-1},\; TB,\; CT^{-1},\; \Omega)
\end{aligned}$$
These transformation matrices relate all possible choices of state time series. Any singular
transformation would drop a necessary dimension of the state space, and any extra sources
of variation, whether from extraneous white noise or unnecessary state variables, will lead
to a non-minimal representation.
To develop a reduced form state space representation, we need to standardize the choice
of basis and thereby standardize the state space representation. This remark clarifies that
an appropriately defined transformation matrix can be used to standardize the state space
basis and define a reduced form state space representation.
3 A Prediction Based Reduced Form Representation
I suggest choosing the standard basis so that the components of the state are the best
linear recursive predictors of the future path of the data. Specifically, the p-th component
of $X_t$ is the best linear predictor of $\{Y_{t+\ell}\}_{\ell=1}^{\infty}$ given the previous $p-1$ components of $X_t$.
I show that this choice leads to a reduced form state-space representation of the observable
data.
In control theory, this choice of basis leads to internally balanced model reduction (Arun
and Kung (1990)). This approach to state space modeling was first introduced by Fujishige
et al. (1975) who demonstrated that the basis maximizes predictive efficiency at each possible
truncation of the state vector. This optimal prediction property is actually sufficient to
define the basis because it provides a variable selection criterion to recursively construct a
standardized state space representation.
The formal definition of a best linear recursive predictor is as follows:
Definition 4 A state time series, $\{X_t^*\}$, is a best linear recursive predictor (BLRP) for $\{Y_t\}$ if for each $p = 1, \dots, N$ and any alternative $p$-th component $\{\tilde X_{p,t}\} \in \mathbb{X}$
$$E\sum_{\ell=1}^{\infty} \big\| E_t Y_{t+\ell} - \Pi(Y_{t+\ell} \mid X^*_{1,t},\dots,X^*_{p-1,t},\tilde X_{p,t}) \big\|_2^2 \;\ge\; E\sum_{\ell=1}^{\infty} \big\| E_t Y_{t+\ell} - \Pi(Y_{t+\ell} \mid X^*_{1,t},\dots,X^*_{p,t}) \big\|_2^2 \qquad (1)$$
where Π is the linear projection operator.
A state time series is a BLRP if any modification of the p-th component of the time series,
while holding fixed the previous p − 1 components, leads to inferior forecasts of the future
path of observable data. This is a recursive condition which defines a series of nested
forecasting models based on successive model truncation. Note that without the recursive
choice of the state vector, it would always be possible to adjust two state components
simultaneously and keep the forecast error variance constant. This indeterminacy is avoided
by choosing the components sequentially.
This definition leads directly to a procedure for constructing a BLRP state time series.
Since the candidate predictors must come from the state space of the model they can be
expressed as linear combinations of the components of a state time series associated with
an innovations representation of the data.
Fix some state time series {Xt } with associated innovations representation H = (A, B, C, Ω).
If {Xt∗ } is a BLRP state time series there must be a transformation matrix T ∗ such that
Xt∗ = T ∗ Xt . Since the components of Xt∗ must satisfy the recursive best predictor condition, this matrix can be constructed by sequentially choosing the rows of T ∗ to ensure that
condition (1) is satisfied.
To make condition (1) operational, I calculate closed form expressions for the conditional expectation, $E_t Y_{t+\ell}$, and the linear forecast, $\Pi(Y_{t+\ell} \mid X^*_{1,t}, \dots, X^*_{p,t})$. The conditional expectation is the best possible time $t$ forecast of $Y_{t+\ell}$ and is identical to the best linear forecast based on the full history of the data, or equivalently based on the state vector in an innovations representation. The linear projection using only the first $p$ components of $X_t^*$ is equivalent to the least squares prediction based on truncating the BLRP state vector to its first $p$ components.
The state space representation associated with $\{X_t\}$ gives the optimal forecast: $E_t Y_{t+\ell} = E_t[C X_{t+\ell-1}] = CA^{\ell-1}X_t$. Denote the $p$-th row of $T^*$ by $\tau_p'$ and its first $p-1$ rows by $T^*_{p-1}$. Let $LL' = E[X_t X_t']$ be any square root decomposition of the covariance matrix of the state. Note that $L^{-1}$ exists because $X_t$ contains $N$ linearly independent time series, ensuring that $E[X_t X_t']$ is full rank. Then the best linear forecast for $Y_{t+\ell}$ using the state vector truncated to its first $p$ components is:
$$\begin{aligned}
\Pi(Y_{t+\ell} \mid X^*_{1,t},\dots,X^*_{p,t})
&= E\!\left[ CA^{\ell-1} X_t X_t' \begin{bmatrix} T^*_{p-1} \\ \tau_p' \end{bmatrix}' \right]
\left( E\!\left[ \begin{bmatrix} T^*_{p-1} \\ \tau_p' \end{bmatrix} X_t X_t' \begin{bmatrix} T^*_{p-1} \\ \tau_p' \end{bmatrix}' \right] \right)^{-1}
\begin{bmatrix} T^*_{p-1} \\ \tau_p' \end{bmatrix} X_t \\
&= CA^{\ell-1} L \begin{bmatrix} T^*_{p-1}L \\ \tau_p' L \end{bmatrix}'
\left( \begin{bmatrix} T^*_{p-1}L \\ \tau_p' L \end{bmatrix}\begin{bmatrix} T^*_{p-1}L \\ \tau_p' L \end{bmatrix}' \right)^{-1}
\begin{bmatrix} T^*_{p-1}L \\ \tau_p' L \end{bmatrix} L^{-1} X_t \\
&= CA^{\ell-1} L \begin{bmatrix} U_{p-1} \\ u_p' \end{bmatrix}' \begin{bmatrix} U_{p-1} \\ u_p' \end{bmatrix} L^{-1} X_t
\end{aligned}$$
where $U_{p-1}$ is an orthonormal basis for the range of $(T^*_{p-1}L)'$, and $u_p$ is a unit vector that is orthogonal to $U_{p-1}$. This expression reveals that the choice of the $p$-th component is equivalent to choosing a new predictor from the state space which is orthogonal to the previous $p-1$ predictors.
Given that $X_t^*$ is a BLRP, any deviation of the choice for $u_p$ must increase the mean square forecast error, and so the choice of $u_p$ must solve:
$$\begin{aligned}
\min_{u_p} \;\; & E\sum_{\ell=1}^{\infty} \left\| CA^{\ell-1}\!\left( X_t - L \begin{bmatrix} U_{p-1} \\ u_p' \end{bmatrix}'\begin{bmatrix} U_{p-1} \\ u_p' \end{bmatrix} L^{-1} X_t \right) \right\|^2 \\
\text{s.t.} \;\; & u_p'u_p = 1, \qquad U_{p-1}u_p = 0
\end{aligned}$$
Applying the properties of the trace operator, it is possible to show that solving this problem is equivalent to solving
$$\begin{aligned}
\max_{u_p} \;\; & u_p' \underbrace{L'WL}_{\equiv W^*}\, u_p \\
\text{s.t.} \;\; & u_p'u_p = 1, \qquad U_{p-1}u_p = 0
\end{aligned}$$
where $W$ is the unique solution to the Lyapunov equation $W = A'WA + C'C$ (i.e., $W = \sum_{\ell=1}^{\infty}(A')^{\ell-1}C'CA^{\ell-1}$). Therefore, each basis vector $u_p$ is an eigenvector of the matrix $W^*$. The matrix $U = (u_1, \dots, u_N)'$ is found from an eigendecomposition4: $U\Sigma^2 U' = W^*$ where $\Sigma = \mathrm{diag}(\sigma_1,\dots,\sigma_N)$ with $\sigma_1 \ge \cdots \ge \sigma_N$. Note that the choice of orthonormal basis defined by $U$ is unique up to reflections. Each row of $U$ can be multiplied by $-1$ to give another valid unit basis vector. Also, the ordering may not be unique when there are repeated eigenvalues. The product $U\Sigma^2 U'$ does not change when switching the order of two rows in $U$ which correspond to two equal eigenvalues. These permutations of $U$ and $\Sigma$ lead to equally valid eigendecompositions. Resolving these indeterminacy issues will be essential to constructing a reduced form representation.
The implied transformation matrix relating the initial state choice to a BLRP state choice is
$$T^* = \Sigma^{1/2} U' L^{-1}$$
Each component matrix has an interpretation. First, the inverse root $L^{-1}$ standardizes and orthogonalizes the covariance structure of the given state vector since $E[L^{-1}X_t(L^{-1}X_t)'] = L^{-1}E[X_tX_t'](L^{-1})' = L^{-1}LL'(L')^{-1} = I_N$. This transformation undoes the specific choice of the covariance structure implied by the initial state space representation. Next, the matrix $U'$ combines these standardized components to create a BLRP state vector whose components are ordered in terms of their prediction ability. Finally, the diagonal matrix $\Sigma^{1/2}$ weights each component according to how strongly it predicts the future path of the observable data.
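To make the construction concrete, the sketch below assembles $T^*$ from an innovations representation $H = (A, B, C, \Omega)$ using the two Lyapunov equations and the eigendecomposition just described. It is only a sketch under the assumptions of the text (minimality, distinct eigenvalues); the reflection and ordering normalizations discussed below are not yet applied, and all names are illustrative.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def blrp_transformation(A, B, C, Omega):
    """Sketch: build T* = Sigma^{1/2} U' L^{-1} and the implied BLRP
    representation from an innovations representation (A, B, C, Omega)."""
    M = solve_discrete_lyapunov(A, B @ Omega @ B.T)   # E[X_t X_t'] = A M A' + B Omega B'
    W = solve_discrete_lyapunov(A.T, C.T @ C)          # W = A' W A + C' C
    L = np.linalg.cholesky(M)                          # one admissible square root of M
    evals, evecs = np.linalg.eigh(L.T @ W @ L)         # W* = U Sigma^2 U'
    order = np.argsort(evals)[::-1]                    # decreasing eigenvalue order
    sigma = np.sqrt(evals[order])                      # Hankel singular values
    U = evecs[:, order]
    T_star = np.diag(np.sqrt(sigma)) @ U.T @ np.linalg.inv(L)
    T_inv = np.linalg.inv(T_star)
    return T_star @ A @ T_inv, T_star @ B, C @ T_inv, sigma
```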
If the resulting state space basis is unique, then the transformed state space representation is a reduced form. Up to reflections which multiply the rows of U by −1 and
permutations of the rows of U which correspond to repeated eigenvalues, this representation is unique, which means a state space representation based on a BLRP state time series
is almost a reduced form.
This result is summarized as a lemma:
Lemma 2 For any Gaussian covariance stationary time series {Yt }, the set of BLRP state
time series is finite. Any two BLRP are related by combining sign changes to the components
of the state vector with permutations of those components which have equal explanatory
power.
This lemma significantly simplifies the task of finding a reduced form state space representation. Because there are only a finite number of possible BLRP state time series associated to the observable data, and they are all related by basis reflections and state vector reordering, finding a reduced form only requires normalizing the basis orientation and state vector ordering.

4 The eigenvectors contained in $U$ and the eigenvalues in $\Sigma$ arise from a principal components analysis of the matrix $(E[X_tX_t']^{1/2})'\,E[X_t(Y^+_{t+1})']\,E[Y^+_{t+1}X_t']\,E[X_tX_t']^{1/2}$, where $Y^+_{t+1} = (Y_{t+1}', Y_{t+2}', \dots)'$ is the future observable data. This is equivalent to performing orthonormalized partial least squares regression to forecast the future history of the data using the arbitrary state vector $X_t$. Provided infinite data is available, this is also equivalent to orthonormalized partial least squares to forecast $Y^+_{t+1}$ using $N$ linear combinations of the history $Y^-_t = (Y_{t-1}', Y_{t-2}', \dots)'$. The initial step of the estimation procedure in section 4 is based on this observation.
There are two loose ends that need to be accounted for in order to establish this result.
First, the choice of the square root matrix L must not impact the transformation T ∗ . Second,
the initial state time series must not influence the basis that is selected by this procedure,
meaning that the BLRP state should not depend on the initial choice of state vector.
The first loose end can be accounted for by using the fact that all matrix square roots are related by unitary transformations. All possible alternative roots must take the form $LV$ with $V'V = VV' = I_N$. When using the alternative root given by $LV$, the weight matrix is $(LV)'WLV$. The eigendecomposition of this matrix is $V'L'WLV = V'U\Sigma^2U'V$. The resulting transformation matrix is unchanged since $\Sigma^{1/2}U'V(LV)^{-1} = \Sigma^{1/2}U'VV'L^{-1} = \Sigma^{1/2}U'L^{-1} = T^*$. The rotation exactly cancels with itself, making the transformation relating an initial representation to the BLRP representation independent of the choice of the matrix square root.
To resolve the second loose end, notice that if we had started from any alternative initial
state time series, say X̃t = SXt with S some invertible matrix, then, using the same square
root for E[Xt Xt0 ], the new state covariance matrix is E[X̃t X̃t0 ] = SLL0 S 0 . The new weight
matrix does not depend on S:
$$\begin{aligned}
& L'S'\left(\sum_{\ell=1}^{\infty}\big[(SAS^{-1})'\big]^{\ell-1}(CS^{-1})'\,CS^{-1}\,(SAS^{-1})^{\ell-1}\right)SL \\
&= L'S'\left(\sum_{\ell=1}^{\infty}(S^{-1})'(A')^{\ell-1}S'(S^{-1})'\,C'C\,S^{-1}SA^{\ell-1}S^{-1}\right)SL \\
&= L'\left(\sum_{\ell=1}^{\infty}(A')^{\ell-1}C'CA^{\ell-1}\right)L = L'WL = W^*
\end{aligned}$$
Since the matrix W ∗ is invariant to the initial representation, the eigenvalues contained
in the diagonal matrix Σ2 are invariant parameters of the observable time series. Also,
given the same square root is used to factor E[Xt Xt0 ], the eigenvectors in the orthonormal
matrix U are pinned down (up to sign changes and possible re-ordering, as before). Since
$SL$ is the implied square root of $E[\tilde X_t \tilde X_t']$, the alternative choice of the state vector leads to a transformation matrix of $\Sigma^{1/2}U'L^{-1}S^{-1} = T^*S^{-1}$. The resulting BLRP state vector is $T^*S^{-1}SX_t = T^*X_t$ – the choice of the initial state vector is irrelevant for the resulting
BLRP state vector. Therefore, the BLRP representation is unique up to sign changes to the
rows of U , and permutations of state components i, j such that σi = σj .
With lemma 2 at hand, it remains to choose normalizations which remove the possibility of basis reflections and eigenvalue reordering. The latter is inherently difficult, and instead
I appeal to the fact that in any finite dataset the ordering will be unique with probability
one. The equality of eigenvalues is a knife-edge condition which is not robust to small
perturbations of the representation, and as a result the ordering is immaterial for reduced
form estimation5 . The eigenvalues of L0 W L will be distinct for almost all time series, and so
the BLRP state space representation is essentially unique once a standardization is chosen
to remove the possibility of reflecting the state vector.
To standardize the orientation of the basis, it is tempting to use positive orientation of
the matrix $U$. However, this allows the initial $C$ matrix to influence its transformed version $C^* = C(T^*)^{-1}$. The reason is that the $p$-th component of the state vector $X^*_{p,t}$ and its negative $-X^*_{p,t}$ are equally useful as latent state variables. The fundamental indeterminacy
is then the sign of the correlation between the state time series and the observed time
series. Standardizing U without accounting for the structure of C will not resolve this
indeterminacy6 .
Instead, it is essential to use some property of the observable data to place a sign restriction on the state time series. I propose a peak response criterion which standardizes the
impulse responses of the observable data to innovations to the state time series. This criterion is based on the fact that the observability matrix associated to a minimal state space
representation is full rank: rank(O∗ ) = N where O∗ ≡ [(C ∗ )0 , (C ∗ A∗ )0 , . . . , (C ∗ (A∗ )N −1 )0 ]0
is the observability matrix associated with a BLRP representation. This implies that any
time t innovation to the p-th component of Xt∗ must be eventually detectable in observable
data within the next N periods. One implication of this result is that the columns of O∗ are
not identically zero. I propose a sign normalization which forces the largest magnitude entry
of each column to be strictly positive. This choice implies that whichever observable series
responds most strongly to the p-th state component over N periods must have a positive
response at the peak during those N periods. This imposes N sign restrictions on the BLRP
representation that normalize how the state variables relate to the observed time series.
The following definition formalizes this normalization:
Definition 5 A state-space representation $G = (A, C, Q, S, R)$ or an innovations representation $H = (A, B, C, \Omega)$ has positive peak responses if, for each column $j = 1, \dots, N$ of the observability matrix
$$O = \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{N-1} \end{bmatrix},$$
the largest magnitude entry of the column is strictly positive: $O_{i^*(j),j} > 0$ for $i^*(j) \equiv \min \arg\max_i |O_{i,j}|$.

5 The ordering is important for structural identification. Identification should be robust to perturbations of the estimated reduced form which induce reordering. Checking robustness of identification to near-equal eigenvalues is left for future work.

6 If $u_p' = (u_{p,1}, \dots, u_{p,N})$ is the $p$-th row of $U$, then a positive orientation arises by requiring the entry of $u_p$ with the largest absolute value to be positive. Since $u_p$ is a unit vector, its largest entry must be non-zero, so this normalization is well defined. This choice does nothing to normalize the sign of moments such as $E\,Y_{t+\ell}X^*_{i,t}$.
Using this sign normalization leads to the following theorem which is the central result
of the paper:
Theorem 1 Almost all Gaussian covariance stationary time series {Yt } have a unique
BLRP innovations representation with positive peak responses. Therefore, this representation is almost surely a reduced form for {Yt }.
To enforce the positive peak response criterion, each component of the BLRP state vector
must be reflected to undo the orientation of the peak responses that are arbitrarily selected
in the original representation. After imposing this sign normalization both on an estimated BLRP innovations representation of observable data and on a structural state space model's BLRP innovations representation, the two can be matched for the purpose of identifying structural parameters.
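A small sketch of this reflection step, assuming a minimal system and using only standard numpy operations (the function name is illustrative, not part of the paper):

```python
import numpy as np

def positive_peak_responses(A, B, C):
    """Sketch: reflect state components so that the largest magnitude entry of
    each column of the observability matrix O = [C; CA; ...; CA^{N-1}] is positive."""
    N = A.shape[0]
    O = np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(N)])
    peaks = O[np.argmax(np.abs(O), axis=0), np.arange(N)]   # first peak entry per column
    flips = np.sign(peaks)
    flips[flips == 0] = 1.0          # guard against an exactly zero peak (non-minimal case)
    D = np.diag(flips)               # reflection matrix, D = D^{-1}
    return D @ A @ D, D @ B, C @ D
```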
4 Reduced Form Estimation
The reduced form estimation procedure proceeds in two steps. The first step uses or-
thonormalized partial least squares regression to get an initial estimate of the reduced form.
This sub-space based estimator is consistent and asymptotically normal, as documented in
Bauer (2005a)7 . Next, I use this preliminary parameter estimate as an initial guess for
maximum likelihood estimation based on the Kalman filter. Since the reduced form is unrestricted (maximally parameterized), this two step procedure greatly speeds up estimation
since the initial estimate provides an inexpensive but consistent guess to initialize numerical
likelihood maximization.
Orthonormalized partial least squares regression (Worsley et al. (1997),Arenas-Garcı́a
and Camps-Valls (2008)) is both a special case of three-pass-regression introduced by Kelly
and Pruitt (2013), and is a type of sub-space estimator studied by Bauer (2005a,b, 2009).
For completeness, I describe the implementation in detail.
7 See also Raknerud et al. (2010) for an application in macroeconomic forecasting.

Given a proposed model order of $N$ and estimation horizon of $h \ge N$, the procedure finds linear combinations of historical data $Y_t^{-,h} = (Y_t', Y_{t-1}', \dots, Y_{t-h+1}')'$ which have maximal covariance with future data $Y_t^{+,h} = (Y_{t+1}', Y_{t+2}', \dots, Y_{t+h}')'$, subject to the requirement that
the components are orthogonal. The finite sample estimation procedure solves:
$$\begin{aligned}
\max_{\Phi_{h,T} \in \mathbb{R}^{N \times Mh}} \;\; & \mathrm{trace}\!\left( \left[ \sum_{t=h}^{T-h} \Phi_{h,T} Y_t^{-,h} (Y_t^{+,h})' \right] \left[ \sum_{t=h}^{T-h} \Phi_{h,T} Y_t^{-,h} (Y_t^{+,h})' \right]' \right) \\
\text{s.t.} \;\; & \frac{1}{T+1-2h} \sum_{t=h}^{T-h} \Phi_{h,T} Y_t^{-,h} (\Phi_{h,T} Y_t^{-,h})' = I_N
\end{aligned}$$
As the sample size and the estimation horizon both tend to infinity, and given that the data is a finite Gaussian process, this maximization problem delivers a BLRP state space representation of the data. This occurs because each component of the resulting state vector
Xt∗ = Φ∞,∞ Yt−,∞ is the linear combination of the historical data which best predicts the
future path of the data, holding fixed the previous components of the vector. Because any
state time series, Xt , is a sufficient statistic for this optimal predictor, the transformation
T ∗ Xt is identical to Φ∞,∞ Yt−,∞ .
Solving this estimation problem is equivalent to solving the generalized eigenvalue problem:
$$Y_{h,T}^{-}(Y_{h,T}^{+})'\,Y_{h,T}^{+}(Y_{h,T}^{-})'\,\phi = \lambda\, Y_{h,T}^{-}(Y_{h,T}^{-})'\,\phi$$
where $Y_{h,T}^{-} = (Y_h^{-,h}, \dots, Y_{T-h}^{-,h})$ and $Y_{h,T}^{+} = (Y_h^{+,h}, \dots, Y_{T-h}^{+,h})$. The first $N$ eigenvectors correspond to the rows of the matrix $\hat\Phi_{h,T}$ which solves the orthonormalized partial least squares problem. These linear combinations of the horizon $h$ truncated history of the data account for the first $N$ maximal variation orthogonal components of the horizon $h$ truncated future data.
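In finite samples the first step therefore reduces to a single generalized symmetric eigenvalue problem. The following is a minimal sketch, assuming the data sit in a T x M numpy array and that there are enough observations for the past-block covariance to be invertible; names, the loop bounds, and the scaling convention are illustrative.

```python
import numpy as np
from scipy.linalg import eigh

def opls_first_step(Y, N, h):
    """Sketch of orthonormalized PLS: stack past blocks Y_t^{-,h} and future
    blocks Y_t^{+,h}, then solve the generalized eigenvalue problem in the text."""
    T = Y.shape[0]
    past, future = [], []
    for t in range(h, T - h):
        past.append(Y[t - h + 1:t + 1][::-1].ravel())   # (Y_t', Y_{t-1}', ..., Y_{t-h+1}')'
        future.append(Y[t + 1:t + h + 1].ravel())        # (Y_{t+1}', ..., Y_{t+h}')'
    Yp = np.array(past).T                                # Mh x n block of truncated histories
    Yf = np.array(future).T                              # Mh x n block of truncated futures
    lhs = Yp @ Yf.T @ Yf @ Yp.T                          # Y- (Y+)' Y+ (Y-)'
    rhs = Yp @ Yp.T                                      # Y- (Y-)'
    vals, vecs = eigh(lhs, rhs)                          # generalized symmetric eigenproblem
    Phi = vecs[:, np.argsort(vals)[::-1][:N]].T          # first N eigenvectors as rows
    X_hat = Phi @ Yp                                     # estimated BLRP state, up to reflections
    return Phi, X_hat
```

Note that scipy's eigh normalizes each eigenvector so that the sample orthonormality constraint holds up to the 1/(T + 1 - 2h) scaling factor, which can be reintroduced by rescaling Phi if desired.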
The estimated BLRP state process (up to basis reflections) is given by X̂t∗ = Φ̂Yt−,h
for each t = h, . . . , T . Note that due to finite samples, it is not necessarily the case that
the resulting estimates for innovations to the state equation will be spanned by the contemporaneous innovations to the observation equation, as is required in an innovations
representation. Instead, I initially use a general state-space representation which satisfies
the BLRP conditions. This representation is calculated using OLS regressions:
$$\hat A^* = \left[\sum_{t=h+1}^{T} \hat X_t^* (\hat X_{t-1}^*)'\right]\left[\sum_{t=h+1}^{T} \hat X_{t-1}^* (\hat X_{t-1}^*)'\right]^{-1}, \qquad \hat\varepsilon^{\xi}_t = \hat X_t^* - \hat A^* \hat X_{t-1}^*$$
$$\hat C^* = \left[\sum_{t=h+1}^{T} Y_t (\hat X_{t-1}^*)'\right]\left[\sum_{t=h+1}^{T} \hat X_{t-1}^* (\hat X_{t-1}^*)'\right]^{-1}, \qquad \hat\varepsilon^{Y}_t = Y_t - \hat C^* \hat X_{t-1}^*$$
$$\hat Q^* = \frac{1}{T-h-1}\sum_{t=h+1}^{T} \hat\varepsilon^{\xi}_t (\hat\varepsilon^{\xi}_t)', \qquad \hat S^* = \frac{1}{T-h-1}\sum_{t=h+1}^{T} \hat\varepsilon^{\xi}_t (\hat\varepsilon^{Y}_t)', \qquad \hat R^* = \frac{1}{T-h-1}\sum_{t=h+1}^{T} \hat\varepsilon^{Y}_t (\hat\varepsilon^{Y}_t)'$$
This gives an initial state space estimate: Ĝ∗ = (Â∗ , Ĉ ∗ , Q̂∗ , Ŝ ∗ , R̂∗ ). Next, this representation can be converted to an innovations representation by solving for the implied Kalman
gain to get Ĥ ∗ = (Â∗ , B̂ ∗ , Ĉ ∗ , Ω̂∗ ). By calculating the implied observability matrix Ô∗ and
finding the signs of its largest magnitude column entries, I then reflect its state vector components so that this representation has positive peak responses. This procedure creates an
initial reduced form estimate of a BLRP innovations representation.
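The OLS step can be written compactly; the sketch below assumes the estimated state from the previous step is stored column by column and aligned with the corresponding rows of the data (illustrative names, and no degrees-of-freedom adjustments beyond those in the text):

```python
import numpy as np

def initial_state_space_estimate(X_hat, Y_obs):
    """Sketch of the OLS step: X_hat is N x n with column t equal to X*_t,
    and Y_obs is n x M with row t the observation dated as X_hat[:, t]."""
    X_now, X_lag = X_hat[:, 1:], X_hat[:, :-1]
    Y_now = Y_obs[1:].T                         # align Y_t with X*_{t-1}
    G = np.linalg.inv(X_lag @ X_lag.T)
    A = X_now @ X_lag.T @ G                     # regression of X*_t on X*_{t-1}
    C = Y_now @ X_lag.T @ G                     # regression of Y_t on X*_{t-1}
    e_x = X_now - A @ X_lag                     # state equation residuals
    e_y = Y_now - C @ X_lag                     # observation equation residuals
    n = e_x.shape[1]
    Q, S, R = e_x @ e_x.T / n, e_x @ e_y.T / n, e_y @ e_y.T / n
    return A, C, Q, S, R
```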
Although this estimate is consistent and asymptotically normal so long as the truncation
horizon h is chosen according to a Bayesian information criterion (see Bauer (2009)), it
can have non-negligible small sample bias due to the required horizon truncation. There is little additional computational cost in obtaining a full information maximum likelihood estimate
because this first step provides a consistent initial condition which speeds up numerical
optimization.
Implementing the Kalman filter requires imposing restrictions which enforce a BLRP
representation. It turns out that the required restrictions can be easily implemented through
two quadratic constraints.
In order to construct these constraints, I first introduce the Hankel singular values of
{Yt }. These arise from the Hankel matrix:
Definition 6 The Hankel matrix of $\{Y_t\}$ is the bi-infinite matrix:
$$\mathcal{H} \equiv E\big[(Y_{t+1}', Y_{t+2}', \dots)'\,(Y_t', Y_{t-1}', \dots)\big]
= \begin{bmatrix}
E[Y_{t+1}Y_t'] & E[Y_{t+1}Y_{t-1}'] & E[Y_{t+1}Y_{t-2}'] & \cdots \\
E[Y_{t+2}Y_t'] & E[Y_{t+2}Y_{t-1}'] & & \\
E[Y_{t+3}Y_t'] & & \ddots & \\
\vdots & & &
\end{bmatrix}
= \begin{bmatrix}
\Gamma(1) & \Gamma(2) & \Gamma(3) & \cdots \\
\Gamma(2) & \Gamma(3) & & \\
\Gamma(3) & & \ddots & \\
\vdots & & &
\end{bmatrix}$$
where $\Gamma(j) = E[Y_t Y_{t-j}']$ is the auto-covariance function of $\{Y_t\}$.
Note that, up to deflating by the covariance matrix $E[Y_tY_t']$, this matrix contains the coefficients from an infinite number of population forecasting regressions. In particular, the
forecasting regression of Yt+` on the history Yt , Yt−1 , . . . is represented by the `-th block-row
of the Hankel matrix multiplied by E[Yt Yt0 ]−1 .
Since the state space of the time series is finite by assumption 1, only N time series
are necessary to completely characterize conditional expectations at any horizon. This
means that this matrix contains a large amount of redundant information because only N
predictor variables are necessary for forecasting. That is, there are N linear combinations
of the columns of the Hankel matrix which span its column space.
This implication leads to the following lemma:
Lemma 3 Under assumption 1, the Hankel matrix has rank equal to the dimension of the
state space: rank(H) = N .
In turn, the Hankel matrix has a finite spectral decomposition, and the square roots of its
eigenvalues are the Hankel singular values of {Yt }:
Definition 7 The Hankel singular values, denoted by $\sigma = (\sigma_1, \dots, \sigma_N)'$, are equal to the square roots of the ordered eigenvalues of the Hankel matrix: $\sigma_i \equiv \sqrt{\lambda_i(\mathcal{H})}$ for each $i = 1, \dots, N$, with $\sigma_1 \ge \cdots \ge \sigma_N$.
Indeed, the Hankel singular values are the invariant parameters which arose previously when
calculating the transformation matrix which related an arbitrary state space representation
and a BLRP representation.
Now, consider the Lyapunov equations:
$$E[X_t X_t'] \equiv M = AMA' + B\Omega B', \qquad W = A'WA + C'C$$
The first determines the variance of the state in an innovations representation. The second
is the weighting matrix that was central for calculating a BLRP representation. By using
the formula for the transformation matrix in these equations, it immediately follows that
in a BLRP representation, the diagonal matrix containing the Hankel singular values solves both of these equations. That is, M = W = Σ.
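Because M and W solve standard discrete Lyapunov equations, the Hankel singular values can be computed from any innovations representation without forming the Hankel matrix itself. The following sketch assumes scipy is available and uses the fact that the eigenvalues of MW are invariant to the choice of state basis and equal Σ² in a BLRP representation; the function name is illustrative.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def hankel_singular_values(A, B, C, Omega):
    """Sketch: sigma_i from any innovations representation, as the square roots
    of the eigenvalues of M W (equal to Sigma^2 in a BLRP representation)."""
    M = solve_discrete_lyapunov(A, B @ Omega @ B.T)   # M = A M A' + B Omega B'
    W = solve_discrete_lyapunov(A.T, C.T @ C)          # W = A' W A + C' C
    eigvals = np.linalg.eigvals(M @ W)
    return np.sort(np.sqrt(np.abs(eigvals.real)))[::-1]
```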
This connection leads to the following theorem which characterizes BLRP innovations
representations using equality restrictions:
Theorem 2 The state time series $\{X_t^*\}$ is a BLRP with innovations representation $H^* = (A^*, B^*, C^*, \Omega^*)$ if and only if it satisfies
$$\begin{aligned}
\Sigma &= A^*\Sigma (A^*)' + B^*\Omega^*(B^*)' = \begin{bmatrix} A^* & B^* \end{bmatrix}\begin{bmatrix} \Sigma & 0 \\ 0 & \Omega^* \end{bmatrix}\begin{bmatrix} (A^*)' \\ (B^*)' \end{bmatrix} \\
\Sigma &= (A^*)'\Sigma A^* + (C^*)'C^* = \begin{bmatrix} (A^*)' & (C^*)' \end{bmatrix}\begin{bmatrix} \Sigma & 0 \\ 0 & I_M \end{bmatrix}\begin{bmatrix} A^* \\ C^* \end{bmatrix}
\end{aligned}$$
where $\Sigma$ contains the (ordered) Hankel singular values of $\{Y_t\}$.
Imposing these constraints8 during maximum likelihood estimation ensures that the estimated innovations representation is a BLRP representation. Since a BLRP representation
is unique up to state vector reflections and re-ordering, this provides a simple method for
reduced form estimation.
Due to symmetry, each of these equations gives N (N + 1)/2 restrictions. Together, these
equations define N (N + 1) restrictions. With the N degrees of freedom that come from
choosing the diagonal of Σ, a BLRP representation is defined by N² effective restrictions
on the innovations representation.
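In practice these restrictions can be supplied to a constrained optimizer as two matrix-valued equality constraints. A minimal sketch of the constraint residuals, which are zero at any BLRP innovations representation (the function name and the choice to return raw matrices rather than their stacked unique elements are illustrative):

```python
import numpy as np

def blrp_constraint_residuals(A, B, C, Omega, sigma):
    """Sketch: residuals of the two quadratic restrictions in Theorem 2; sigma
    holds the Hankel singular values in decreasing order."""
    Sigma = np.diag(sigma)
    r1 = Sigma - (A @ Sigma @ A.T + B @ Omega @ B.T)   # state variance restriction
    r2 = Sigma - (A.T @ Sigma @ A + C.T @ C)           # prediction weight restriction
    return r1, r2
```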
Full likelihood estimation is then accomplished by using the Kalman filter to evaluate
the likelihood at any choice of H ∗ , and maximizing this likelihood function subject to these
two quadratic constraints. Note that, because these constraints are positive definite and
imply a reduced form up to basis reflections, there are 2^N maximum likelihood estimates
which are all related by sign changes to the components of the state vector. Numerical
optimization can be used to find one of these estimates, and then the implied observability
matrix, Ô, allows the state space representation to be converted to a positive peak response
representation. This procedure delivers the almost surely unique reduced form estimate.
8 Due
to the quadratic positive definite structure, the solution is unique up to reflection for any given
values of the representation invariant parameters Σ and Ω. Therefore, optimization can be improved by
using the dual to the constrained maximum likelihood problem to concentrate out these parameters.
5 Identification Issues
Identifying the structural parameters of a model requires mapping the structural model’s
state space representation into the BLRP reduced form. From there, identification amounts
to inverting the structural mapping from structural parameters to the BLRP reduced form.
This general approach simplifies the analysis by creating a clean distinction between statistical inference and issues of identification. Even still, this second step is often problematic
because many dynamic structural models are globally unidentified (Canova and Sala (2009),
Morris (2013)).
It is useful to discuss how to identify structural parameters generally using the reduced
form representation. The key is to connect a general state space model to the reduced
form provided by a BLRP innovations representation with positive peak responses. To
illustrate ideas, I examine a small structural model which can be interpreted as the state
space representation of an arbitrary dynamic stochastic general equilibrium model.
In particular, suppose a macroeconomic theory suggests the following state-space model for observable data on quarterly GDP growth, CPI inflation, and the federal funds rate:
$$\begin{bmatrix} \tilde y_t \\ \pi_t \\ i_t \\ a_t \\ \tilde k_t \end{bmatrix} = \Phi_1(\theta)\begin{bmatrix} \tilde y_{t-1} \\ \pi_{t-1} \\ i_{t-1} \\ a_{t-1} \\ \tilde k_{t-1} \end{bmatrix} + \Phi_\varepsilon(\theta)\begin{bmatrix} \varepsilon^Z_t \\ \varepsilon^a_t \\ \varepsilon^\pi_t \\ \varepsilon^i_t \end{bmatrix}, \qquad
\begin{bmatrix} \Delta GDP_t \\ \Delta CPI_t \\ FFR_t \end{bmatrix} = \begin{bmatrix} \tilde y_t - \tilde y_{t-1} - \varepsilon^Z_t \\ \pi_t \\ i_t \end{bmatrix}$$
The model has five latent theoretical factors. First, ỹt denotes output relative to a stochastic
trend in labor augmenting technology. This series is assumed to be observable via the growth
rate of GDP, up to the innovation in the stochastic trend. The quarterly inflation rate, πt , is
assumed to be directly observed as the CPI inflation rate. The nominal interest rate in the
model is observed as the Federal Funds Rate. The remaining two factors are not directly
observable and consist of a total factor productivity shock, at , and the level of the capital
stock relative to the trend in technology, $\tilde k_t$. The structural shock vector $\varepsilon_t = (\varepsilon^Z_t, \varepsilon^a_t, \varepsilon^\pi_t, \varepsilon^i_t)'$
is a standard multivariate normal shock containing innovations to the stochastic trend, total
factor productivity, a cost-push shock driving inflation dynamics, and monetary policy.
For brevity, I take the matrices Φ1 (θ) and Φε (θ) as given. They might arise by solving
a theoretical model (say through linearization of a non-linear DSGE model), and depend
on the structural parameter vector θ. The objective is to estimate θ (provided that it is
identified), by relating this state-space model to estimates of a BLRP reduced form for the
data.
First, we need to map this state-space model into the general state space representation
of the previous section. I denote the data vector by Yt = (∆GDPt , ∆CP It , F F Rt )0 . To
relate the structural state space representation of the model to the estimated BLRP reduced
form for Yt , I must transform the structural model into a BLRP reduced form representation.
In principle, the parameter vector θ can then be recovered by solving the implied system of
non-linear equations in θ.
The theoretical state vector is ξt = (ỹt , πt , it , at , k̃t )0 . Substituting in the theoretical
model’s state equation gives a measurement equation in terms of this state vector. The
implied structural state space representation for the data is:
$$\xi_t = \Phi_1(\theta)\xi_{t-1} + \underbrace{\Phi_\varepsilon(\theta)\varepsilon_t}_{\varepsilon^\xi_t}$$
$$\begin{bmatrix} \Delta GDP_t \\ \Delta CPI_t \\ FFR_t \end{bmatrix} = Y_t = \left\{ \begin{bmatrix} I_3 & 0_{3,2} \end{bmatrix}\Phi_1(\theta) - \begin{bmatrix} -1 & 0_{1,N-1} \\ 0_{2,1} & 0_{2,N-1} \end{bmatrix} \right\}\xi_{t-1} + \underbrace{\left\{ \begin{bmatrix} I_3 & 0_{3,2} \end{bmatrix}\Phi_\varepsilon(\theta) - \begin{bmatrix} 1 & 0_{1,2} \\ 0_{2,1} & 0_{2,2} \end{bmatrix} \right\}\varepsilon_t}_{\varepsilon^Y_t}$$
This structure is a restricted version of the general state space representation previously
considered.
Next, I solve for the Kalman gain, $B(\theta)$, associated with this state space system and denote the residual variance by
$$\Omega(\theta) = \left\{ \begin{bmatrix} I_3 & 0_{3,2} \end{bmatrix}\Phi_\varepsilon(\theta) - \begin{bmatrix} 1 & 0_{1,2} \\ 0_{2,1} & 0_{2,2} \end{bmatrix} \right\}\left\{ \begin{bmatrix} I_3 & 0_{3,2} \end{bmatrix}\Phi_\varepsilon(\theta) - \begin{bmatrix} 1 & 0_{1,2} \\ 0_{2,1} & 0_{2,2} \end{bmatrix} \right\}'$$
to get the innovations representation
$$\begin{aligned}
X_t &= \Phi_1(\theta)X_{t-1} + B(\theta)V_t \\
Y_t &= \left\{ \begin{bmatrix} I_3 & 0_{3,2} \end{bmatrix}\Phi_1(\theta) - \begin{bmatrix} -1 & 0_{1,N-1} \\ 0_{2,1} & 0_{2,N-1} \end{bmatrix} \right\}X_{t-1} + V_t \\
V_t &\overset{iid}{\sim} N(0, \Omega(\theta))
\end{aligned}$$
Assuming that all of the state variables matter for the observable output, so that they are all detectable, this is a minimal system which we will put into its BLRP representation in order to match up with an estimated BLRP reduced form for the observable data.

Due to theorem 1, the mapping that relates the innovations state vector $X_t$ to the positive peak response BLRP state vector $X_t^*$ is almost surely unique. Section 3 shows how to construct this mapping from any given innovations representation. Denoting it by $T^*(\theta)$,
we can write the reduced form of the structural model as:
$$\begin{aligned}
X_t^* &= T^*(\theta)\Phi_1(\theta)T^*(\theta)^{-1}X_{t-1}^* + T^*(\theta)B(\theta)V_t \\
Y_t &= \left\{ \begin{bmatrix} I_3 & 0_{3,2} \end{bmatrix}\Phi_1(\theta) - \begin{bmatrix} -1 & 0_{1,N-1} \\ 0_{2,1} & 0_{2,N-1} \end{bmatrix} \right\}T^*(\theta)^{-1}X_{t-1}^* + V_t \\
V_t &\overset{iid}{\sim} N(0, \Omega(\theta))
\end{aligned}$$
Note that the white noise component of the data does not get transformed because the
transformation matrix only influences how the BLRP state maps into the observables.
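Numerically, mapping a candidate θ into this reduced form just composes the steps developed in the previous sections. Under the same assumptions as the earlier sketches, and reusing their illustrative function names, one possible composition is:

```python
def structural_blrp(A, C, Q, S, R):
    """Sketch: map a structural state space G(theta) = (A, C, Q, S, R) into its
    positive peak response BLRP innovations representation, the object matched
    against the estimated reduced form."""
    B, Omega, _ = innovations_representation(A, C, Q, S, R)    # sketch from section 2
    A_b, B_b, C_b, sigma = blrp_transformation(A, B, C, Omega) # sketch from section 3
    A_b, B_b, C_b = positive_peak_responses(A_b, B_b, C_b)     # sign normalization
    return A_b, B_b, C_b, Omega, sigma
```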
Let $H^* = (A^*, B^*, C^*, \Omega^*)$ be the positive peak response BLRP innovations representation estimated from an infinite sample of data. Now, the identification procedure amounts to solving the following system of non-linear equations:
$$\begin{aligned}
A^* &= T^*(\theta)\Phi_1(\theta)T^*(\theta)^{-1} \\
B^* &= T^*(\theta)B(\theta) \\
C^* &= \left\{ \begin{bmatrix} I_3 & 0_{3,2} \end{bmatrix}\Phi_1(\theta) - \begin{bmatrix} -1 & 0_{1,N-1} \\ 0_{2,1} & 0_{2,N-1} \end{bmatrix} \right\}T^*(\theta)^{-1} \\
\Omega^* &= \Omega(\theta)
\end{aligned}$$
If there is a unique value for θ which solves this system of equations, then the model is
identified. Given consistent and asymptotically normal estimates of the BLRP reduced form
parameters, a minimum-chi-square estimator based on minimizing the difference between
the left hand and right hand sides of these equations will be asymptotically equivalent to
maximum likelihood estimation (Hamilton and Wu (2012)).
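A minimum chi-square step then searches over θ for the value whose implied reduced form is closest to the estimate in the metric of the estimator's asymptotic covariance. A hedged sketch follows; the stacking of parameters into a vector, the weighting matrix, and all names are illustrative, and the reduced form mapping itself is model specific:

```python
import numpy as np

def min_chi_square_objective(theta, h_hat, v_inv, reduced_form_map):
    """Sketch: quadratic distance between the estimated reduced form parameters
    (stacked in h_hat, with inverse asymptotic covariance v_inv) and the
    model-implied ones returned by reduced_form_map(theta) as a stacked vector."""
    g = reduced_form_map(theta) - h_hat
    return float(g @ v_inv @ g)
```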
Assessing whether or not the model is identified is non-trivial. A simple count of coefficients can be problematic for two reasons. First, implicit parameter restrictions need
to be accounted for. As usual, there are implicit parameter restrictions which require the
covariance matrices to be symmetric and positive semidefinite. These restrictions reduce
the number of effective parameters in the Ω∗ equation.
More subtly, the reduced form representation involves implicit parameter restrictions
which arise from normalizing the state space representation to become a positive peak response BLRP representation. Second, and generally more difficult to resolve, the right hand side is
non-linear and may have multiple solutions even if there are the same number of parameters
to be recovered as estimated degrees of freedom. This second issue significantly complicates
identification. As shown in Morris (2013), the right hand side of this expression is usually
sufficiently nonlinear so that there are multiple solutions for the structural parameters.
6 Conclusion
This paper has introduced a reduced form representation for general dynamic macroe-
conometric models. This representation enables an econometrician to separate issues of
statistical inference from identification considerations. By first estimating this state space
reduced form representation for the data and then inverting the mapping from the structural
parameters to the reduced form parameters, economists can conduct structural inference in
general dynamic models. This type of approach has been used successfully in contexts
where identification and likelihood estimation of the full structural model is problematic
(e.g. Hamilton and Wu (2012)), but until now required a VAR reduced form. The BLRP
reduced form enables any state space model to be estimated using this technique.
The key insight is to order and normalize state variables in terms of their ability to
predict the entire future path of observable data. This idea leads to the concept of a best
linear recursive predictor for the data, a choice of the latent state variable which is an optimal
predictor at any level of truncation. The resulting representation can be constructed taking
advantage this recursive criterion. Up to trivial reflection and ordering transformations, this
concept leads to a reduced form representation for general finite state linear time series.
An attractive byproduct of this choice of state normalization is that it can be directly
estimated using orthonormal partial least squares. The BLRP criterion is in fact equivalent
to the ordering and normalization that occurs in such a principal components based subspace estimator. However, the representation can also be completely characterized in terms
of a system of quadratic matrix equations, and this result enables constrained maximum
likelihood estimation for improved efficiency.
Further work will demonstrate the application of the method. Salient applications include estimation of term structure models, no-arbitrage macro-finance models, and dynamic
stochastic general equilibrium models. I plan to generalize to allow for deterministic time
varying parameter changes, which will enable the estimation of shadow rate Gaussian affine
term structure models such as Wu and Xia (2014). It will also be essential to develop simple
tools that can assess the invertibility of the mapping from the structural parameter vector to
the reduced form representation, and to examine non-standard inference in the set-identified
case.
References

B. Anderson and J. Moore. Optimal Filtering. Dover Books on Electrical Engineering. Dover Publications, 2012. ISBN 9780486136899. URL http://books.google.com/books?id=iYMqLQp49UMC.

J. Arenas-García and G. Camps-Valls. Efficient kernel orthonormalized PLS for remote sensing applications. Geoscience and Remote Sensing, IEEE Transactions on, 46(10):2872–2881, 2008.

K. Arun and S. Kung. Balanced approximation of stochastic systems. SIAM Journal on Matrix Analysis and Applications, 11(1):42–68, 1990.

D. Bauer. Estimating linear dynamical systems using subspace methods. Econometric Theory, 21(1):181–211, 2005a. ISSN 02664666. URL http://www.jstor.org/stable/3533632.

D. Bauer. Asymptotic properties of subspace estimators. Automatica, 41(3):359–376, Mar. 2005b. ISSN 0005-1098. doi: 10.1016/j.automatica.2004.11.012. URL http://dx.doi.org/10.1016/j.automatica.2004.11.012.

D. Bauer. Estimating ARMAX systems for multivariate time series using the state approach to subspace algorithms. Journal of Multivariate Analysis, 100(3):397–421, 2009. ISSN 0047-259X. doi: 10.1016/j.jmva.2008.05.008. URL http://www.sciencedirect.com/science/article/pii/S0047259X08001486.

F. Canova and L. Sala. Back to square one: Identification issues in DSGE models. Journal of Monetary Economics, 56(4):431–449, May 2009. URL http://ideas.repec.org/a/eee/moneco/v56y2009i4p431-449.html.

C. F. Christ. The Cowles Commission's contributions to econometrics at Chicago, 1939–1955. Journal of Economic Literature, 32(1):30–59, 1994. ISSN 00220515. URL http://www.jstor.org/stable/2728422.

P. Faurre. Stochastic realization algorithms. In R. Mehra and D. Lainiotis, editors, System Identification: Advances and Case Studies, volume 126 of Mathematics in Science and Engineering, pages 1–25. Academic Press, New York, 1976.

S. Fujishige, H. Nagaij, and Y. Sawaragi. System-theoretical approach to model reduction and system-order determination. International Journal of Control, 22(6):807–819, 1975.

R. Giacomini. The relationship between DSGE and VAR models. CeMMAP working papers CWP21/13, Centre for Microdata Methods and Practice, Institute for Fiscal Studies, May 2013. URL http://ideas.repec.org/p/ifs/cemmap/21-13.html.

J. D. Hamilton and J. C. Wu. Identification and estimation of Gaussian affine term structure models. Journal of Econometrics, 168(2):315–331, 2012. URL http://ideas.repec.org/a/eee/econom/v168y2012i2p315-331.html.

L. P. Hansen and T. J. Sargent. Recursive Models of Dynamic Linear Economies. Princeton University Press, 2013.

B. Kelly and S. Pruitt. The three-pass regression filter: A new approach to forecasting using many predictors. Working paper, 2013.

S. Morris. Global identification of DSGE models. Technical report, University of California, San Diego, 2013.

A. C. Pugh. The McMillan degree of a polynomial system matrix. International Journal of Control, 24(1):129–135, 1976. doi: 10.1080/00207177608932810. URL http://www.tandfonline.com/doi/abs/10.1080/00207177608932810.

A. Raknerud, T. Skjerpen, and A. R. Swensen. Forecasting key macroeconomic variables from a large number of predictors: a state space approach. Journal of Forecasting, 29(4):367–387, 2010. ISSN 1099-131X. doi: 10.1002/for.1131. URL http://dx.doi.org/10.1002/for.1131.

C. R. Rao. The use and interpretation of principal component analysis in applied research. Sankhyā: The Indian Journal of Statistics, Series A, 26(4):329–358, 1964. ISSN 0581572X. URL http://www.jstor.org/stable/25049339.

L. Sun, S. Ji, S. Yu, and J. Ye. On the equivalence between canonical correlation analysis and orthonormalized partial least squares, 2009.

K. J. Worsley, J.-B. Poline, K. J. Friston, and A. Evans. Characterizing the response of PET and fMRI data using multivariate linear models. NeuroImage, 6(4):305–319, 1997.

J. C. Wu and F. D. Xia. Measuring the macroeconomic impact of monetary policy at the zero lower bound. Working paper, 2014.