A Reduced Form Representation for State Space Models

Nelson Lind ([email protected])

May 29, 2014

Abstract

Estimating structural state space models with maximum likelihood is often infeasible. If the model can be expressed as a reduced form vector autoregression (VAR) in the observable data, then two-step techniques such as minimum chi-square estimation can reliably recover structural parameter estimates. However, macroeconomists cannot always rely on the existence of a VAR reduced form, as is often the case when estimating dynamic stochastic general equilibrium models. This paper introduces a reduced form representation for general state space models. I use a prediction based criterion to normalize the state vector and generate the reduced form. In particular, I force the state vector to be the best linear recursive predictor of the data. As a byproduct, this representation is intrinsically related to orthonormalized partial least squares regression. Also, a pair of quadratic parameter restrictions characterizes the representation, which enables constrained maximum likelihood estimation. This reduced form disentangles statistical inference from structural identification, greatly clarifying and simplifying the analysis of a broad class of dynamic structural econometric models. Using these results, economists can take a reduced form approach to structural estimation without needing a VAR representation.

1 Introduction

Time series analysis often relies on linear state space models. Estimation proceeds along two broad themes. If the model can be re-cast as a vector autoregression (VAR), then linear regression provides a simple and reliable estimation method. Otherwise, the Kalman filter can be used to evaluate the model's likelihood, facilitating maximum likelihood estimation. The latter approach can be numerically problematic. The VAR reduced form greatly improves the reliability of estimation, and macroeconomists often work exclusively with VAR representations of observable data (for instance, Giacomini (2013) discusses the important connection between DSGE analysis and VAR representations). But this focus ignores many types of time series which are predicted by economic theory. Indeed, the vast majority of dynamic structural econometric models that arise from economic theory do not have VAR reduced forms.

This paper enables the reliable estimation of general state space models by introducing reduced form representations for any covariance stationary Gaussian time series with a finite state space. The reduced form is constructed by requiring the state vector in a state space representation of observable data to be the best linear recursive predictor (BLRP) of the data (the seminal use of this type of predictive efficiency criterion is Rao (1964)). This recursive restriction defines a sequence of nested state space models of lower order which are pinned down by the property that they generate optimal forecasts of the observable data. Imposing this property on the state vector leads to a well defined reduced form state space representation. This normalization is also directly linked to orthonormalized partial least squares regression (see Sun et al. (2009) for references on OPLS; their results also imply a parallel reduced form based on canonical correlations analysis, which arises from a simple unitary transformation of the BLRP reduced form). As a result, the BLRP reduced form representation can be estimated directly from observable data. This property makes it particularly convenient because regression can be used as the first step in minimum chi-square estimation of structural state space models (as in Hamilton and Wu (2012)). This approach disentangles statistical inference from structural identification, greatly clarifying and simplifying the analysis of a broad class of dynamic structural econometric models.
Reduced forms are a standard tool in econometric analysis and greatly facilitate clean identification analysis. As discussed in Christ (1994), the existence and use of a reduced form is a fundamental tenet of the Cowles Commission approach to econometrics. By estimating models via their reduced form, economists can clearly differentiate between issues of statistical inference and issues of structural identification. Given the ubiquity of identification issues in structural macroeconometrics, highlighted by Canova and Sala (2009) and studied closely by Morris (2013), this reduced form representation is an important tool for assessing identification in dynamic models.

The paper is organized as follows. In section 2, I examine state space representations of finite Gaussian covariance stationary time series. Section 3 contains the main result of the paper. It introduces a prediction based criterion for differentiating between possible state variables and shows how this criterion leads to a reduced form representation of the observable data. Section 4 discusses estimation of the reduced form using orthonormalized partial least squares and maximum likelihood. I present a result that characterizes BLRP representations using a system of quadratic matrix equations, which enables constrained likelihood estimation. Section 5 examines how to map a structural model into the BLRP representation so that the question of identification reduces to inverting a non-linear system of equations.

2 State-Space Representations

Suppose that {Yt} is an R^M-valued Gaussian covariance stationary time series. Without loss of generality, assume it is mean zero. The goal of this paper is to find a reduced-form state space representation of this time series which is useful for estimation. A reduced form for the process is a parametrization which precisely characterizes its probability law:

Definition 1  A reduced form of {Yt} is a probability law P({y}; θ) and parameter space Θ ⊂ R^K such that for every trajectory {y}, P({y}; θ) = P({y}; θ′) if and only if θ = θ′.

The reduced form summarizes the stochastic properties of the time series as points in the parameter space. The reduced form is useful because it allows an econometrician to use a single parameter estimate (and the standard errors of that estimate) to represent all of the information contained in a dataset. From there, this summary statistic can be used to answer economic questions through structural assumptions. This allows the problem of statistical inference to be separated from the problem of structural identification.

Gaussian time series are conveniently represented using linear state space systems. Moreover, dynamic economic theory often leads to state space models. With this motivation, I will look for a reduced form whose parameters come from a state space representation of the time series. To carefully define a state space representation, I follow Faurre (1976) closely in defining the state space of the time series.
Definition 2  The state space of {Yt} is defined as the collection of Gaussian time series spanned by the conditional expectations of future values of Yt:

  X ≡ { {Zt} : there exists {αℓ}ℓ=1..∞ ∈ (R^M)^Z+ such that Zt = Σℓ=1..∞ αℓ′ Et[Yt+ℓ] }

where Et[·] ≡ E[· | σ(Yt, Yt−1, . . . )]. Note that due to the Gaussian assumption, the conditional expectation operator is equivalent to the linear projection onto the history of the time series. Identifying two Gaussian time series as equivalent if they have identical moments, the dimension of this set is the state space dimension, which is also sometimes referred to as the McMillan degree (see Pugh (1976)). This dimension is the number of linearly independent time series necessary to describe (via linear combination) an element of the state space. I assume that the state space of the time series is finite:

Assumption 1  The state space dimension of {Yt} is finite: dim(X) ≡ N < ∞.

This assumption means that any time series in the state space can be expressed as the linear combination of exactly N time series. Just as a set of vectors can represent a given sub-space of Euclidean space, N distinct Gaussian time series can be used to represent each time series in the state space. A set of N variables whose linear combinations can represent any given time series in the state space provides a basis, represented by an N dimensional time series. A state time series underlying {Yt} is any R^N-valued time series {ξt} whose component time series span the state space.

The state space is pinned down by {Yt}. However, the basis used to represent the state space is not unique. Any invertible transformation T defines a new, equally valid state time series: {ξ̃t} = {T ξt}. Just as a linear sub-space of Euclidean space can be represented by many different basis vectors, many different state time series can be chosen which all represent X. Resolving this indeterminacy is a central issue in finding a reduced form state space representation for {Yt}.

Once a state time series is selected which spans the state space, we can decompose the time series by projecting ξt onto ξt−1 and Yt onto ξt−1. This decomposition delivers a state space representation of the time series.

Definition 3  A state space representation of {Yt} is a collection of matrices G = (A, C, Q, S, R) such that there exists a Gaussian time series {ξt} in R^N, Gaussian white noise {ε^ξ_t} in R^N, and Gaussian white noise {ε^Y_t} in R^M satisfying:

  ξt = A ξt−1 + ε^ξ_t
  Yt = C ξt−1 + ε^Y_t
  (ε^ξ_t′, ε^Y_t′)′ ~ iid N( 0, [ Q  S ; S′  R ] )

This representation always exists under assumption 1 and its form is uniquely determined by the choice of {ξt}. Note that this definition only requires that the components of {ξt} span the state space. There is no requirement that the history of {Yt} span {ξt}. That is, the components of {ξt} are not necessarily in X. This form for a state space representation is as general as possible, and it is important to start from here in order to correctly reduce structural models to their reduced form representation.

The generality of the present representation has a simple interpretation based on population regressions, which shows that this representation is solely defined by the choice of the state time series {ξt}. In particular, let ξ1,t, . . . , ξN,t be any Gaussian time series which span the state space X. Specifically, for any {Zt} ∈ X there exist some weights ω1, . . . , ωN satisfying Zt = ω1 ξ1,t + · · · + ωN ξN,t. Define a state time series {ξt} by setting ξt = (ξ1,t, . . . , ξN,t)′.
The associated state space representation of {Yt} has coefficient matrices and white noise shocks determined by:

  A ≡ E[ξt ξt−1′] E[ξt−1 ξt−1′]⁻¹        ε^ξ_t ≡ ξt − A ξt−1
  C ≡ E[Yt ξt−1′] E[ξt−1 ξt−1′]⁻¹        ε^Y_t ≡ Yt − C ξt−1
  [ Q  S ; S′  R ] ≡ E[ (ε^ξ_t′, ε^Y_t′)′ (ε^ξ_t′, ε^Y_t′) ]

The dynamics of the state time series are fully described by the coefficients from a population regression of ξt on ξt−1. The innovations to the state time series are the residuals from this population regression. Similarly, the relationship between the observable time series and the state time series is fully described by the coefficients from a population regression of Yt on ξt−1, and the white noise component is the regression residual. This construction delivers a unique state space representation associated with each possible choice of state time series.

This generality allows for arbitrary structures where latent shocks may not be spanned by observable data, which happens often when estimating structural macroeconometric models. Since the goal is to generate a reduced form for the observable data, any extraneous information contained in the state vector that is not contained in the observed data allows for indeterminacy in the timing of information. To avoid this problem, the first step I consider is to standardize the timing convention. To do so, I perform a change of variables which puts the state space system into its innovations representation (see Anderson and Moore (2012) and Hansen and Sargent (2013)). This form replaces the state vector with the best linear forecast of the state vector using observed data up through the current period. I define this new state variable as Xt ≡ Et ξt. Then from the properties of conditional expectation:

  Xt = A Et ξt−1 + Et ε^ξ_t = A Xt−1 + [ A(Et ξt−1 − Xt−1) + Et ε^ξ_t ],  where the bracketed term equals B Vt
  Yt = C Et ξt−1 + Et ε^Y_t = C Xt−1 + [ C(Et ξt−1 − Xt−1) + Et ε^Y_t ],  where the bracketed term equals Vt

where Vt is white noise spanned by the current data and orthogonal to all past data. In particular, it is the forecast innovation Vt ≡ Yt − Et−1 Yt. The coefficient B is the population regression coefficient from a regression of the term A(Et ξt−1 − Xt−1) + Et ε^ξ_t on Vt. This regression has no residual because Vt is a sufficient statistic for all new information that became available in the current period, and the term being forecasted is spanned by information available in observable data at time t. Indeed, the coefficient B is nothing other than the steady state Kalman gain. It is determined from the previous parameterization of the model by solving a system of Lyapunov and Riccati equations:

  B = (E[ξt Yt′] − A P C′)(E[Yt Yt′] − C P C′)⁻¹
  P = A P A′ + (E[ξt Yt′] − A P C′)(E[Yt Yt′] − C P C′)⁻¹ (E[ξt Yt′] − A P C′)′
  E[ξt Yt′] = A E[ξt ξt′] C′ + S
  E[Yt Yt′] = C E[ξt ξt′] C′ + R
  E[ξt ξt′] = A E[ξt ξt′] A′ + Q

This representation always exists and is unique given that {Yt} is covariance stationary (non-deterministic). This mapping from the arbitrary state space representation to the innovations representation will be central when mapping structural models to reduced form state space estimates. For the purpose of estimation, I take the innovations representation as the point of departure. Indeed, by imposing that each component of the state time series is an element of the state space, the resulting state space representation is necessarily an innovations representation.
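For concreteness, the steady state Kalman gain in the system above can be computed by iterating the Riccati equation to a fixed point. The following sketch (Python with NumPy/SciPy; my own code, not anything supplied by the paper) maps G = (A, C, Q, S, R) into H = (A, B, C, Ω), assuming A is stable so that the stationary covariances exist and the iteration converges.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def innovations_representation(A, C, Q, S, R, tol=1e-12, max_iter=10_000):
    """Map G = (A, C, Q, S, R) to the innovations form H = (A, B, C, Omega).

    A sketch of the Lyapunov/Riccati system in the text, assuming A is stable
    (all eigenvalues inside the unit circle) so that the fixed point exists.
    """
    # Stationary state variance: E[xi_t xi_t'] = A E[xi xi'] A' + Q
    Sigma_xi = solve_discrete_lyapunov(A, Q)
    G_xy = A @ Sigma_xi @ C.T + S          # E[xi_t Y_t']
    Lam0 = C @ Sigma_xi @ C.T + R          # E[Y_t Y_t']

    P = np.zeros_like(Sigma_xi)            # variance of X_t = E_t[xi_t]
    for _ in range(max_iter):
        Omega = Lam0 - C @ P @ C.T          # innovation variance E[V_t V_t']
        B = (G_xy - A @ P @ C.T) @ np.linalg.inv(Omega)   # Kalman gain
        P_next = A @ P @ A.T + B @ Omega @ B.T
        if np.max(np.abs(P_next - P)) < tol:
            P = P_next
            break
        P = P_next
    Omega = Lam0 - C @ P @ C.T
    B = (G_xy - A @ P @ C.T) @ np.linalg.inv(Omega)
    return A, B, C, Omega
```

The returned Ω is recovered as E[Yt Yt′] − C P C′, the variance of the forecast innovation Vt at the fixed point.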
As shown in Faurre (1976), this representation of the data corresponds to minimizing the variance of the state variable while holding fixed the matrices A and C. I summarize this result as a lemma:

Lemma 1 (Faurre (1976))  For any Gaussian covariance stationary time series {Yt} which has a state space representation G = (A, C, Q, S, R), there exists a unique innovations representation with parameter matrices H = (A, B, C, Ω) of the form:

  Xt = A Xt−1 + B Vt
  Yt = C Xt−1 + Vt
  Vt ~ iid N(0, Ω)

This representation arises from holding constant the matrices A and C and choosing the state time series with minimal possible variance.

This lemma directly reduces the parameterization of the model without restricting the properties of the original time series {Yt}. Now, the parameterization H = (A, B, C, Ω) is sufficient to fully describe the observed time series.

In summary, for any given Gaussian covariance stationary time series with finite state space, a given choice of state time series which spans the state space defines a unique state space representation. Then, by removing any parts of the state time series which cannot be predicted by the observable data, the representation is uniquely reduced to this innovations representation. However, since the state space basis is inherently indeterminate, this representation is also indeterminate: any invertible transformation defines a change of basis and another equally valid state time series, X̃t = T Xt. In fact, any two innovations representations built on N basis elements of X (minimal representations) are always related by some invertible matrix T.

Remark 1  Given any two (minimal) state space representations G = (A, C, Q, S, R) and G̃ = (Ã, C̃, Q̃, S̃, R̃), or given their associated innovations representations H = (A, B, C, Ω) and H̃ = (Ã, B̃, C̃, Ω̃), there exists an invertible transformation matrix T such that

  (Ã, C̃, Q̃, S̃, R̃) = (T A T⁻¹, C T⁻¹, T Q T′, T S, R)
  (Ã, B̃, C̃, Ω̃) = (T A T⁻¹, T B, C T⁻¹, Ω)

These transformation matrices relate all possible choices of state time series. Any singular transformation would drop a necessary dimension of the state space, and any extra sources of variation, whether from extraneous white noise or unnecessary state variables, would lead to a non-minimal representation. To develop a reduced form state space representation, we need to standardize the choice of basis and thereby standardize the state space representation. This remark clarifies that an appropriately defined transformation matrix can be used to standardize the state space basis and define a reduced form state space representation.

3 A Prediction Based Reduced Form Representation

I suggest choosing the standard basis so that the components of the state are the best linear recursive predictors of the future path of the data. Specifically, the p-th component of Xt is the best linear predictor of {Yt+ℓ}ℓ=1..∞ given the previous p − 1 components of Xt. I show that this choice leads to a reduced form state-space representation of the observable data. In control theory, this choice of basis leads to internally balanced model reduction (Arun and Kung (1990)). This approach to state space modeling was first introduced by Fujishige et al. (1975), who demonstrated that the basis maximizes predictive efficiency at each possible truncation of the state vector.
This optimal prediction property is actually sufficient to define the basis because it provides a variable selection criterion to recursively construct a standardized state space representation. The formal definition of a best linear recursive predictor is as follows:

Definition 4  A state time series, {Xt*}, is a best linear recursive predictor (BLRP) for {Yt} if, for each p = 1, . . . , N and any alternative p-th component {X̃p,t} ∈ X,

  E Σℓ=1..∞ ‖ Et Yt+ℓ − Π(Yt+ℓ | X*1,t, . . . , X*p−1,t, X̃p,t) ‖₂²  ≥  E Σℓ=1..∞ ‖ Et Yt+ℓ − Π(Yt+ℓ | X*1,t, . . . , X*p,t) ‖₂²      (1)

where Π is the linear projection operator.

A state time series is a BLRP if any modification of the p-th component of the time series, while holding fixed the previous p − 1 components, leads to inferior forecasts of the future path of observable data. This is a recursive condition which defines a series of nested forecasting models based on successive model truncation. Note that without the recursive choice of the state vector, it would always be possible to adjust two state components simultaneously and keep the forecast error variance constant. This indeterminacy is avoided by choosing the components sequentially.

This definition leads directly to a procedure for constructing a BLRP state time series. Since the candidate predictors must come from the state space of the model, they can be expressed as linear combinations of the components of a state time series associated with an innovations representation of the data. Fix some state time series {Xt} with associated innovations representation H = (A, B, C, Ω). If {Xt*} is a BLRP state time series, there must be a transformation matrix T* such that Xt* = T* Xt. Since the components of Xt* must satisfy the recursive best predictor condition, this matrix can be constructed by sequentially choosing the rows of T* to ensure that condition (1) is satisfied.

To make condition (1) operational, I calculate closed form expressions for the conditional expectation, Et Yt+ℓ, and the linear forecast, Π(Yt+ℓ | X*1,t, . . . , X*p,t). The conditional expectation is the best possible time t forecast of Yt+ℓ and is identical to the best linear forecast based on the full history of the data, or equivalently based on the state vector in an innovations representation. The linear projection using only the first p components of Xt* is equivalent to the least squares prediction based on truncating the BLRP state vector at its p-th component. The state space representation associated with {Xt} gives the optimal forecast: Et Yt+ℓ = Et[C Xt+ℓ−1] = C A^(ℓ−1) Xt.

Denote the p-th row of T* by τp′ and its first p − 1 rows by T*p−1. Let L L′ = E[Xt Xt′] be any square root decomposition of the covariance matrix of the state. Note that L⁻¹ exists because Xt contains N linearly independent time series, ensuring that E[Xt Xt′] is full rank. Then the best linear forecast for Yt+ℓ using the state vector truncated to its first p components is:

  Π(Yt+ℓ | X*1,t, . . . , X*p,t)
    = C A^(ℓ−1) E[ Xt Xt′ Tp′ ] ( E[ Tp Xt Xt′ Tp′ ] )⁻¹ Tp Xt
    = C A^(ℓ−1) L (Tp L)′ [ (Tp L)(Tp L)′ ]⁻¹ (Tp L) L⁻¹ Xt
    = C A^(ℓ−1) L Up′ Up L⁻¹ Xt

where Tp denotes the matrix formed by stacking T*p−1 on top of τp′, Up denotes the matrix formed by stacking Up−1 on top of up′, the rows of Up−1 form an orthonormal basis for the row space of T*p−1 L, and up is a unit vector orthogonal to the rows of Up−1. This expression reveals that the choice of the p-th component is equivalent to choosing a new predictor from the state space which is orthogonal to the previous p − 1 predictors.
Given that Xt* is a BLRP, any deviation in the choice of up must increase the mean squared forecast error, and so the choice of up must solve:

  min over up of  E Σℓ=1..∞ ‖ C A^(ℓ−1) ( Xt − L Up′ Up L⁻¹ Xt ) ‖²
  s.t.  up′ up = 1,  Up−1 up = 0

Applying the properties of the trace operator, it is possible to show that solving this problem is equivalent to solving

  max over up of  up′ L′ W L up ≡ up′ W* up
  s.t.  up′ up = 1,  Up−1 up = 0

where W is the unique solution to the Lyapunov equation W = A′ W A + C′ C (i.e., W = Σℓ=1..∞ (A′)^(ℓ−1) C′ C A^(ℓ−1)). Therefore, each basis vector up is an eigenvector of the matrix W* ≡ L′ W L. The matrix U = (u1, . . . , uN) is found from an eigendecomposition, U Σ² U′ = W*, where Σ = diag(σ1, . . . , σN) with σ1 ≥ · · · ≥ σN. (The eigenvectors contained in U and the eigenvalues in Σ arise from a principal components analysis of the matrix E[Xt Xt′]^(−1/2) E[Xt (Y⁺t+1)′] E[Y⁺t+1 Xt′] (E[Xt Xt′]^(−1/2))′, where Y⁺t+1 = (Yt+1′, Yt+2′, . . . )′ is the future of the observable data. This is equivalent to performing orthonormalized partial least squares regression to forecast the future history of the data using the arbitrary state vector Xt. Provided infinite data is available, this is also equivalent to orthonormalized partial least squares forecasting of Y⁺t+1 using N linear combinations of the history Y⁻t = (Yt−1′, Yt−2′, . . . )′. The initial step of the estimation procedure in section 4 is based on this observation.)

Note that the choice of orthonormal basis defined by U is unique only up to reflections. Each column of U can be multiplied by −1 to give another valid unit basis vector. Also, the ordering may not be unique when there are repeated eigenvalues. The product U Σ² U′ does not change when switching the order of two columns of U which correspond to two equal eigenvalues. These permutations of U and Σ lead to equally valid eigendecompositions. Resolving these indeterminacy issues will be essential to constructing a reduced form representation.

The implied transformation matrix relating the initial state choice to a BLRP state choice is

  T* = Σ^(1/2) U′ L⁻¹

Each component matrix has an interpretation. First, the inverse root L⁻¹ standardizes and orthogonalizes the covariance structure of the given state vector, since E[L⁻¹ Xt (L⁻¹ Xt)′] = L⁻¹ E[Xt Xt′] (L⁻¹)′ = L⁻¹ L L′ (L′)⁻¹ = IN. This transformation undoes the specific choice of the covariance structure implied by the initial state space representation. Next, the matrix U′ combines these standardized components to create a BLRP state vector whose components are ordered in terms of their prediction ability. Finally, the diagonal matrix Σ^(1/2) weights each component according to how strongly it predicts the future path of the observable data.

If the resulting state space basis is unique, then the transformed state space representation is a reduced form. Up to reflections which multiply the columns of U by −1, and permutations of the columns of U which correspond to repeated eigenvalues, this representation is unique, which means a state space representation based on a BLRP state time series is almost a reduced form. This result is summarized as a lemma:

Lemma 2  For any Gaussian covariance stationary time series {Yt}, the set of BLRP state time series is finite. Any two BLRP state time series are related by combining sign changes to the components of the state vector with permutations of those components which have equal explanatory power.

This lemma significantly simplifies the task of finding a reduced form state space representation.
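To make the construction concrete, the sketch below (Python/SciPy; my own code, with function names of my choosing) computes T* = Σ^(1/2) U′ L⁻¹ from a given innovations representation by solving the two Lyapunov equations, forming W* = L′ W L, and taking its eigendecomposition. It assumes A is stable and the eigenvalues of W* are distinct, so the construction is unique up to sign flips.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov, cholesky

def blrp_transformation(A, B, C, Omega):
    """Compute the transformation T* mapping an innovations representation
    H = (A, B, C, Omega) into a BLRP representation, following section 3.

    A sketch: assumes A is stable and the Hankel singular values are distinct,
    so the eigendecomposition below is well defined up to sign flips.
    """
    # State covariance M = E[X_t X_t'] solves M = A M A' + B Omega B'
    M = solve_discrete_lyapunov(A, B @ Omega @ B.T)
    L = cholesky(M, lower=True)                 # any square root M = L L' works

    # Prediction weight matrix W solves W = A' W A + C' C
    W = solve_discrete_lyapunov(A.T, C.T @ C)
    W_star = L.T @ W @ L

    # Eigendecomposition W* = U Sigma^2 U' with eigenvalues sorted descending
    eigval, U = np.linalg.eigh(W_star)
    order = np.argsort(eigval)[::-1]
    sigma2, U = eigval[order], U[:, order]
    Sigma = np.diag(np.sqrt(sigma2))            # Hankel singular values on the diagonal

    T_star = np.sqrt(Sigma) @ U.T @ np.linalg.inv(L)

    # Transformed (BLRP) innovations representation, as in Remark 1
    A_star = T_star @ A @ np.linalg.inv(T_star)
    B_star = T_star @ B
    C_star = C @ np.linalg.inv(T_star)
    return A_star, B_star, C_star, Omega, T_star
```

As the text shows, the choice of square root L and the initial state basis wash out of T*, so any convenient factorization (here a Cholesky factor) can be used.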
Because there are only a finite number of possible BLRP state time series associated with the observable data, and they are all related by basis reflections and state vector reordering, finding a reduced form only requires normalizing the basis orientation and the state vector ordering. There are two loose ends that need to be accounted for in order to establish this result. First, the choice of the square root matrix L must not impact the transformation T*. Second, the initial state time series must not influence the basis that is selected by this procedure, meaning that the BLRP state should not depend on the initial choice of state vector.

The first loose end can be accounted for by using the fact that all matrix square roots are related by unitary transformations. All possible alternative roots must take the form LV with V′V = VV′ = IN. When using the alternative root LV, the weight matrix is (LV)′ W (LV). The eigendecomposition of this matrix is V′ L′ W L V = (V′U) Σ² (V′U)′. The resulting transformation matrix is unchanged since

  Σ^(1/2) (V′U)′ (LV)⁻¹ = Σ^(1/2) U′ V V⁻¹ L⁻¹ = Σ^(1/2) U′ L⁻¹ = T*

The rotation exactly cancels with itself, making the transformation relating an initial representation to the BLRP representation independent of the choice of matrix square root.

To resolve the second loose end, notice that if we had started from any alternative initial state time series, say X̃t = S Xt with S some invertible matrix, then, using the same square root for E[Xt Xt′], the new state covariance matrix is E[X̃t X̃t′] = S L L′ S′. The new weight matrix does not depend on S:

  (SL)′ [ Σℓ=1..∞ ((S A S⁻¹)′)^(ℓ−1) (C S⁻¹)′ (C S⁻¹) (S A S⁻¹)^(ℓ−1) ] (SL)
    = L′ S′ (S⁻¹)′ [ Σℓ=1..∞ (A′)^(ℓ−1) C′ C A^(ℓ−1) ] S⁻¹ S L
    = L′ W L = W*

Since the matrix W* is invariant to the initial representation, the eigenvalues contained in the diagonal matrix Σ² are invariant parameters of the observable time series. Also, given that the same square root is used to factor E[Xt Xt′], the eigenvectors in the orthonormal matrix U are pinned down (up to sign changes and possible re-ordering, as before). Since SL is the implied square root of E[X̃t X̃t′], the alternative choice of the state vector leads to a transformation matrix T̃* = Σ^(1/2) U′ L⁻¹ S⁻¹. The resulting BLRP state vector is T̃* X̃t = Σ^(1/2) U′ L⁻¹ S⁻¹ S Xt = T* Xt: the choice of the initial state vector is irrelevant for the resulting BLRP state vector. Therefore, the BLRP representation is unique up to sign changes to the columns of U and permutations of state components i, j such that σi = σj.

With lemma 2 at hand, it remains to choose normalizations which remove the possibility of basis reflections and eigenvalue reordering. The latter is inherently difficult, and instead I appeal to the fact that in any finite dataset the ordering will be unique with probability one. The equality of eigenvalues is a knife-edge condition which is not robust to small perturbations of the representation, and as a result the ordering is immaterial for reduced form estimation. (The ordering is important for structural identification: identification should be robust to perturbations of the estimated reduced form which induce reordering. Checking robustness of identification to near-equal eigenvalues is left for future work.) The eigenvalues of L′WL will be distinct for almost all time series, and so the BLRP state space representation is essentially unique once a standardization is chosen to remove the possibility of reflecting the state vector.

To standardize the orientation of the basis, it is tempting to use a positive orientation of the matrix U. However, this allows the initial C matrix to influence its transformed version C* = C (T*)⁻¹.
The reason is that the p-th component of the state vector X*p,t and its negative −X*p,t are equally useful as latent state variables. The fundamental indeterminacy is the sign of the correlation between the state time series and the observed time series. Standardizing U without accounting for the structure of C will not resolve this indeterminacy. (If up = (up,1, . . . , up,N)′ is the p-th column of U, then a positive orientation arises by normalizing its largest-magnitude entry to be positive. Since up is a unit vector, its largest entry must be nonzero, so this normalization is well defined. However, this choice does nothing to normalize the sign of moments such as E[Yt+ℓ X*i,t].) Instead, it is essential to use some property of the observable data to place a sign restriction on the state time series.

I propose a peak response criterion which standardizes the impulse responses of the observable data to innovations to the state time series. This criterion is based on the fact that the observability matrix associated with a minimal state space representation is full rank: rank(O*) = N, where O* ≡ [(C*)′, (C*A*)′, . . . , (C*(A*)^(N−1))′]′ is the observability matrix associated with a BLRP representation. This implies that any time t innovation to the p-th component of Xt* must eventually be detectable in observable data within the next N periods. One implication of this result is that the columns of O* are not identically zero. I propose a sign normalization which forces the largest magnitude entry of each column to be strictly positive. This choice implies that whichever observable series responds most strongly to the p-th state component over N periods must have a positive response at the peak during those N periods. This imposes N sign restrictions on the BLRP representation that normalize how the state variables relate to the observed time series. The following definition formalizes this normalization:

Definition 5  A state-space representation G = (A, C, Q, S, R) or an innovations representation H = (A, B, C, Ω) has positive peak responses if, for each column j = 1, . . . , N of the observability matrix

  O = [ C′, (CA)′, . . . , (CA^(N−1))′ ]′

the largest magnitude entry of the column is strictly positive: O_{i*(j),j} > 0 for i*(j) ≡ min arg max_i |O_{i,j}|.

Using this sign normalization leads to the following theorem, which is the central result of the paper:

Theorem 1  Almost all Gaussian covariance stationary time series {Yt} have a unique BLRP innovations representation with positive peak responses. Therefore, this representation is almost surely a reduced form for {Yt}.

To enforce the positive peak response criterion, each component of the BLRP state vector must be reflected to undo the orientation of the peak responses arbitrarily selected in the original representation. After imposing this sign normalization on both an estimated BLRP innovations representation of the observable data and a structural state space model's BLRP innovations representation, the two can be matched for the purpose of identifying structural parameters.
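The positive peak response normalization of definition 5 is straightforward to enforce numerically. The sketch below (my own Python illustration, not from the paper) builds the finite observability matrix, locates the largest-magnitude entry of each column, and reflects the corresponding state components of an innovations representation H = (A, B, C, Ω) through a diagonal reflection matrix, as in Remark 1.

```python
import numpy as np

def enforce_positive_peak_responses(A, B, C):
    """Reflect state components so the representation has positive peak responses.

    A sketch of Definition 5: flip the sign of state component j whenever the
    largest-magnitude entry of column j of the observability matrix is negative.
    """
    N = A.shape[0]
    # Observability matrix O = [C; CA; ...; CA^(N-1)]
    blocks, CA = [], C.copy()
    for _ in range(N):
        blocks.append(CA)
        CA = CA @ A
    O = np.vstack(blocks)

    # Peak response of each state component: entry with the largest magnitude
    peak_rows = np.argmax(np.abs(O), axis=0)
    signs = np.sign(O[peak_rows, np.arange(N)])
    D = np.diag(signs)                      # reflection matrix, D equals its own inverse

    # Apply the change of basis X_t -> D X_t (Remark 1 with T = D)
    return D @ A @ D, D @ B, C @ D
```

Because argmax picks the first index achieving the maximum, the tie-breaking rule matches the "min arg max" convention in definition 5.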
4 Reduced Form Estimation

The reduced form estimation procedure proceeds in two steps. The first step uses orthonormalized partial least squares regression to get an initial estimate of the reduced form. This subspace-based estimator is consistent and asymptotically normal, as documented in Bauer (2005a) (see also Raknerud et al. (2010) for an application to macroeconomic forecasting). Next, I use this preliminary parameter estimate as an initial guess for maximum likelihood estimation based on the Kalman filter. Since the reduced form is unrestricted (maximally parameterized), this two step procedure greatly speeds up estimation: the initial estimate provides an inexpensive but consistent guess to initialize numerical likelihood maximization.

Orthonormalized partial least squares regression (Worsley et al. (1997), Arenas-García and Camps-Valls (2008)) is both a special case of the three-pass regression filter introduced by Kelly and Pruitt (2013) and a type of subspace estimator studied by Bauer (2005a,b, 2009). For completeness, I describe the implementation in detail.

Given a proposed model order N and an estimation horizon h ≥ N, the procedure finds linear combinations of historical data Yt^(−,h) ≡ (Yt′, Yt−1′, . . . , Yt−h+1′)′ which have maximal covariance with future data Yt^(+,h) ≡ (Yt+1′, Yt+2′, . . . , Yt+h′)′, subject to the requirement that the components are orthogonal. The finite sample estimation procedure solves:

  max over Φh,T ∈ R^(N×Mh) of  trace{ [ Σt=h..T−h Φh,T Yt^(−,h) (Yt^(+,h))′ ] [ Σt=h..T−h Φh,T Yt^(−,h) (Yt^(+,h))′ ]′ }
  s.t.  (1/(T+1−2h)) Σt=h..T−h Φh,T Yt^(−,h) (Φh,T Yt^(−,h))′ = IN

As the sample size and estimation horizon both become infinite and the data is a finite Gaussian process, this maximization problem gives a BLRP state space representation of the data. This occurs because each component of the resulting state vector Xt* = Φ∞,∞ Yt^(−,∞) is the linear combination of the historical data which best predicts the future path of the data, holding fixed the previous components of the vector. Because any state time series, Xt, is a sufficient statistic for this optimal predictor, the transformation T* Xt is identical to Φ∞,∞ Yt^(−,∞).

Solving this estimation problem is equivalent to solving the generalized eigenvalue problem:

  Y⁻h,T (Y⁺h,T)′ Y⁺h,T (Y⁻h,T)′ φ = λ Y⁻h,T (Y⁻h,T)′ φ

where Y⁻h,T = (Yh^(−,h), . . . , YT−h^(−,h)) and Y⁺h,T = (Yh^(+,h), . . . , YT−h^(+,h)). The first N eigenvectors correspond to the rows of the matrix Φ̂h,T which solves the orthonormalized partial least squares problem. These linear combinations of the horizon-h truncated history of the data account for the first N maximal-variation orthogonal components of the horizon-h truncated future data. The estimated BLRP state process (up to basis reflections) is given by X̂t* = Φ̂ Yt^(−,h) for each t = h, . . . , T.
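This first step can be implemented with standard linear algebra. The sketch below (Python/NumPy/SciPy; my own illustration rather than the paper's code) stacks the truncated histories and futures, solves the generalized eigenvalue problem above with a symmetric-definite solver, and returns the estimated state Φ̂ Yt^(−,h). It assumes the sample is long enough that Y⁻(Y⁻)′ is positive definite.

```python
import numpy as np
from scipy.linalg import eigh

def opls_first_step(Y, N, h):
    """First-step OPLS/subspace estimate of the BLRP state (up to reflections).

    Y is a (T_total x M) array of observations; N is the proposed model order
    and h >= N the estimation horizon. A sketch of the procedure in section 4,
    based on the generalized eigenvalue problem described in the text.
    """
    T_total, M = Y.shape
    t_idx = range(h, T_total - h)            # periods with a full past and future window
    # Stacked past Y_t^{-,h} and future Y_t^{+,h}, one column per period
    Ym = np.column_stack([Y[t - h + 1:t + 1][::-1].ravel() for t in t_idx])
    Yp = np.column_stack([Y[t + 1:t + h + 1].ravel() for t in t_idx])

    n = Ym.shape[1]
    A_mat = Ym @ Yp.T @ Yp @ Ym.T            # Y^- (Y^+)' Y^+ (Y^-)'
    B_mat = Ym @ Ym.T                        # Y^- (Y^-)'
    vals, vecs = eigh(A_mat, B_mat)          # generalized symmetric-definite problem
    # Top-N eigenvectors as rows, rescaled so the sample variance of Phi Y^- is I_N
    Phi = np.sqrt(n) * vecs[:, ::-1][:, :N].T
    X_star = Phi @ Ym                        # estimated BLRP state, one column per period
    return Phi, X_star
```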
Note that, due to finite samples, it is not necessarily the case that the resulting estimates of the innovations to the state equation will be spanned by the contemporaneous innovations to the observation equation, as is required in an innovations representation. Instead, I initially use a general state-space representation which satisfies the BLRP conditions. This representation is calculated using OLS regressions:

  Â* = [ Σt=h+1..T X̂t* (X̂t−1*)′ ] [ Σt=h+1..T X̂t−1* (X̂t−1*)′ ]⁻¹        ε̂^ξ_t = X̂t* − Â* X̂t−1*
  Ĉ* = [ Σt=h+1..T Yt (X̂t−1*)′ ] [ Σt=h+1..T X̂t−1* (X̂t−1*)′ ]⁻¹         ε̂^Y_t = Yt − Ĉ* X̂t−1*
  Q̂* = (1/(T−h−1)) Σt=h+1..T ε̂^ξ_t (ε̂^ξ_t)′
  Ŝ* = (1/(T−h−1)) Σt=h+1..T ε̂^ξ_t (ε̂^Y_t)′
  R̂* = (1/(T−h−1)) Σt=h+1..T ε̂^Y_t (ε̂^Y_t)′

This gives an initial state space estimate Ĝ* = (Â*, Ĉ*, Q̂*, Ŝ*, R̂*). Next, this representation can be converted to an innovations representation by solving for the implied Kalman gain to get Ĥ* = (Â*, B̂*, Ĉ*, Ω̂*). By calculating the implied observability matrix Ô* and finding the signs of its largest magnitude column entries, I then reflect the state vector components so that this representation has positive peak responses. This procedure creates an initial reduced form estimate of a BLRP innovations representation.

Although this estimate is consistent and asymptotically normal so long as the truncation horizon h is chosen according to a Bayesian information criterion (see Bauer (2009)), it can have non-negligible small sample bias due to the required horizon truncation. There is little additional computational cost to obtaining a full information maximum likelihood estimate because this first step provides a consistent initial condition which speeds up numerical optimization. Implementing the Kalman filter requires imposing restrictions which enforce a BLRP representation. It turns out that the required restrictions can be implemented through two quadratic constraints. In order to construct these constraints, I first introduce the Hankel singular values of {Yt}. These arise from the Hankel matrix:

Definition 6  The Hankel matrix of {Yt} is the bi-infinite matrix

  H ≡ E[ (Yt+1′, Yt+2′, . . . )′ (Yt′, Yt−1′, . . . ) ]
    = [ Γ(1)  Γ(2)  Γ(3)  · · · ]
      [ Γ(2)  Γ(3)  · · ·       ]
      [ Γ(3)  · · ·             ]
      [ · · ·                   ]

whose (i, j) block is E[Yt+i Yt−j+1′] = Γ(i + j − 1), where Γ(j) = E[Yt Yt−j′] is the auto-covariance function of {Yt}.

Note that, up to deflating by the covariance matrix E[Yt Yt′], this matrix contains the coefficients from an infinite number of population forecasting regressions. In particular, the forecasting regression of Yt+ℓ on the history Yt, Yt−1, . . . is represented by the ℓ-th block-row of the Hankel matrix multiplied by E[Yt Yt′]⁻¹. Since the state space of the time series is finite by assumption 1, only N time series are necessary to completely characterize conditional expectations at any horizon. This means that the matrix contains a large amount of redundant information, because only N predictor variables are necessary for forecasting. That is, there are N linear combinations of the columns of the Hankel matrix which span its column space. This implication leads to the following lemma:

Lemma 3  Under assumption 1, the Hankel matrix has rank equal to the dimension of the state space: rank(H) = N.

In turn, the Hankel matrix has a finite spectral decomposition, and the square roots of its eigenvalues are the Hankel singular values of {Yt}:

Definition 7  The Hankel singular values, denoted by σ = (σ1, . . . , σN)′, are equal to the square roots of the ordered eigenvalues of the Hankel matrix: σi ≡ √λi(H) for each i = 1, . . . , N, with σ1 ≥ · · · ≥ σN.

Indeed, the Hankel singular values are the invariant parameters which arose previously when calculating the transformation matrix relating an arbitrary state space representation to a BLRP representation.
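To illustrate Lemma 3 and definition 7 numerically, the sketch below (my own illustration; the autocovariance formula Γ(j) = C A^(j−1)(A M C′ + B Ω) for j ≥ 1, with M = E[Xt Xt′], is implied by the innovations representation but is not displayed in the paper) builds a finite truncation of the Hankel matrix. Its numerical rank equals N once the truncation horizon k reaches N, and its leading singular values approximate the Hankel singular values when k is large relative to the system's persistence.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def truncated_hankel(A, B, C, Omega, k):
    """Build the k x k block truncation of the Hankel matrix of {Y_t}.

    Uses the autocovariances implied by an innovations representation:
    Gamma(j) = C A^(j-1) (A M C' + B Omega) for j >= 1, with M = E[X_t X_t'].
    """
    M = solve_discrete_lyapunov(A, B @ Omega @ B.T)
    G = A @ M @ C.T + B @ Omega                       # E[X_t Y_t']
    Gammas = []
    Aj = np.eye(A.shape[0])
    for _ in range(2 * k):                            # Gamma(1), ..., Gamma(2k)
        Gammas.append(C @ Aj @ G)
        Aj = Aj @ A
    H = np.block([[Gammas[i + j] for j in range(k)] for i in range(k)])
    return H

# Illustration of Lemma 3 for a hypothetical small system (A, B, C, Omega):
# np.linalg.matrix_rank(truncated_hankel(A, B, C, Omega, k=10)) should equal N.
```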
Now, consider the Lyapunov equations:

  M ≡ E[Xt Xt′] = A M A′ + B Ω B′
  W = A′ W A + C′ C

The first determines the variance of the state in an innovations representation. The second is the weighting matrix that was central for calculating a BLRP representation. By using the formula for the transformation matrix in these equations, it immediately follows that in a BLRP representation the diagonal matrix containing the Hankel singular values solves both of these equations. That is, M = W = Σ. This connection leads to the following theorem, which characterizes BLRP innovations representations using equality restrictions:

Theorem 2  The state time series {Xt*} is a BLRP with innovations representation H* = (A*, B*, C*, Ω*) if and only if it satisfies

  Σ = A* Σ (A*)′ + B* Ω* (B*)′ = [ A*  B* ] [ Σ  0 ; 0  Ω* ] [ (A*)′ ; (B*)′ ]
  Σ = (A*)′ Σ A* + (C*)′ C* = [ (A*)′  (C*)′ ] [ Σ  0 ; 0  IM ] [ A* ; C* ]

where Σ contains the (ordered) Hankel singular values of {Yt}.

Imposing these constraints during maximum likelihood estimation ensures that the estimated innovations representation is a BLRP representation. (Due to the quadratic positive definite structure, the solution is unique up to reflection for any given values of the representation-invariant parameters Σ and Ω. Therefore, optimization can be improved by using the dual of the constrained maximum likelihood problem to concentrate out these parameters.) Since a BLRP representation is unique up to state vector reflections and re-ordering, this provides a simple method for reduced form estimation. Due to symmetry, each of these equations gives N(N+1)/2 restrictions. Together, the equations define N(N+1) restrictions. With the N degrees of freedom that come from choosing the diagonal of Σ, a BLRP representation is defined by N² effective restrictions on the innovations representation.

Full likelihood estimation is then accomplished by using the Kalman filter to evaluate the likelihood at any choice of H*, and maximizing this likelihood function subject to these two quadratic constraints. Note that, because these constraints are positive definite and imply a reduced form up to basis reflections, there are 2^N maximum likelihood estimates which are all related by sign changes to the components of the state vector. Numerical optimization can be used to find one of these estimates, and then the implied observability matrix, Ô, allows the state space representation to be converted to a positive peak response representation. This procedure delivers the almost surely unique reduced form estimate.
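For constrained maximum likelihood estimation, the restrictions in Theorem 2 can be supplied to a numerical optimizer as equality constraints. The sketch below (my own illustration; the function and argument names are hypothetical) returns the stacked residuals of the two quadratic restrictions, exploiting symmetry to keep only the N(N+1)/2 independent entries of each.

```python
import numpy as np

def blrp_constraint_residuals(A_s, B_s, C_s, Omega_s, sigma):
    """Residuals of the two quadratic restrictions in Theorem 2.

    sigma is the vector of Hankel singular values (diagonal of Sigma). In a
    constrained maximum likelihood routine these residuals would be driven to
    zero; a sketch, not the paper's code.
    """
    Sigma = np.diag(sigma)
    r1 = Sigma - (A_s @ Sigma @ A_s.T + B_s @ Omega_s @ B_s.T)
    r2 = Sigma - (A_s.T @ Sigma @ A_s + C_s.T @ C_s)
    # By symmetry each matrix carries N(N+1)/2 independent restrictions
    iu = np.triu_indices(Sigma.shape[0])
    return np.concatenate([r1[iu], r2[iu]])
```

In practice these residuals could, for example, be passed as nonlinear equality constraints alongside a Kalman-filter log likelihood to a general-purpose constrained optimizer.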
5 Identification Issues

Identifying the structural parameters of a model requires mapping the structural model's state space representation into the BLRP reduced form. From there, identification amounts to inverting the mapping from structural parameters to the BLRP reduced form. This general approach simplifies the analysis by creating a clean distinction between statistical inference and issues of identification. Even still, this second step is often problematic because many dynamic structural models are globally unidentified (Canova and Sala (2009), Morris (2013)).

It is useful to discuss how to identify structural parameters generally using the reduced form representation. The key is to connect a general state space model to the reduced form provided by a BLRP innovations representation with positive peak responses. To illustrate ideas, I examine a small structural model which can be interpreted as the state space representation of an arbitrary dynamic stochastic general equilibrium model. In particular, suppose a macroeconomic theory suggests the following state-space model for observable data on quarterly GDP growth, CPI inflation, and the federal funds rate:

  (ỹt, πt, it, at, k̃t)′ = Φ1(θ) (ỹt−1, πt−1, it−1, at−1, k̃t−1)′ + Φε(θ) (εZ_t, εa_t, επ_t, εi_t)′

  (∆GDPt, ∆CPIt, FFRt)′ = (ỹt − ỹt−1 − εZ_t, πt, it)′

The model has five latent theoretical factors. First, ỹt denotes output relative to a stochastic trend in labor augmenting technology. This series is assumed to be observable via the growth rate of GDP, up to the innovation in the stochastic trend. The quarterly inflation rate, πt, is assumed to be directly observed as the CPI inflation rate. The nominal interest rate in the model is observed as the federal funds rate. The remaining two factors are not directly observable and consist of a total factor productivity shock, at, and the level of the capital stock relative to the trend in technology, k̃t. The structural shock vector εt = (εZ_t, εa_t, επ_t, εi_t)′ is a standard multivariate normal shock containing innovations to the stochastic trend, total factor productivity, a cost-push shock driving inflation dynamics, and monetary policy. For brevity, I take the matrices Φ1(θ) and Φε(θ) as given. They might arise by solving a theoretical model (say, through linearization of a non-linear DSGE model), and they depend on the structural parameter vector θ. The objective is to estimate θ (provided that it is identified) by relating this state-space model to estimates of a BLRP reduced form for the data.

First, we need to map this state-space model into the general state space representation of section 2. I denote the data vector by Yt = (∆GDPt, ∆CPIt, FFRt)′. To relate the structural state space representation of the model to the estimated BLRP reduced form for Yt, I must transform the structural model into a BLRP reduced form representation. In principle, the parameter vector θ can then be recovered by solving the implied system of non-linear equations in θ.

The theoretical state vector is ξt = (ỹt, πt, it, at, k̃t)′. Substituting in the theoretical model's state equation gives a measurement equation in terms of this state vector. The implied structural state space representation for the data is:

  ξt = Φ1(θ) ξt−1 + Φε(θ) εt ,   so that ε^ξ_t ≡ Φε(θ) εt

  Yt = (∆GDPt, ∆CPIt, FFRt)′
     = { [ I3  0_{3,2} ] Φ1(θ) − [ 1  0_{1,N−1} ; 0_{2,1}  0_{2,N−1} ] } ξt−1
       + { [ I3  0_{3,2} ] Φε(θ) − [ 1  0_{1,3} ; 0_{2,1}  0_{2,3} ] } εt ,   so that ε^Y_t is the last term

This structure is a restricted version of the general state space representation previously considered. Next, I solve for the Kalman gain, B(θ), associated with this state space system and denote the residual variance by

  Ω(θ) = { [ I3  0_{3,2} ] Φε(θ) − [ 1  0_{1,3} ; 0_{2,1}  0_{2,3} ] } { [ I3  0_{3,2} ] Φε(θ) − [ 1  0_{1,3} ; 0_{2,1}  0_{2,3} ] }′

to get the innovations representation

  Xt = Φ1(θ) Xt−1 + B(θ) Vt
  Yt = { [ I3  0_{3,2} ] Φ1(θ) − [ 1  0_{1,N−1} ; 0_{2,1}  0_{2,N−1} ] } Xt−1 + Vt
  Vt ~ iid N(0, Ω(θ))

Assuming that all of the state variables matter for the observable output, so that they are all detectable, this is a minimal system which we can put into its BLRP representation in order to match up with an estimated BLRP reduced form for the observable data. Due to theorem 1, the mapping that relates the innovations state vector Xt to the positive peak response BLRP state vector Xt* is almost surely unique.
Section 3 shows how to construct this mapping from any given innovations representation. Denoting it by T*(θ), we can write the reduced form of the structural model as:

  Xt* = T*(θ) Φ1(θ) T*(θ)⁻¹ Xt−1* + T*(θ) B(θ) Vt
  Yt = { [ I3  0_{3,2} ] Φ1(θ) − [ 1  0_{1,N−1} ; 0_{2,1}  0_{2,N−1} ] } T*(θ)⁻¹ Xt−1* + Vt
  Vt ~ iid N(0, Ω(θ))

Note that the white noise component of the data does not get transformed, because the transformation matrix only influences how the BLRP state maps into the observables.

Let H* = (A*, B*, C*, Ω*) be the positive peak response BLRP innovations representation estimated from an infinite sample of data. The identification procedure now amounts to solving the following system of non-linear equations:

  A* = T*(θ) Φ1(θ) T*(θ)⁻¹
  B* = T*(θ) B(θ)
  C* = { [ I3  0_{3,2} ] Φ1(θ) − [ 1  0_{1,N−1} ; 0_{2,1}  0_{2,N−1} ] } T*(θ)⁻¹
  Ω* = Ω(θ)

If there is a unique value of θ which solves this system of equations, then the model is identified. Given consistent and asymptotically normal estimates of the BLRP reduced form parameters, a minimum chi-square estimator based on minimizing the difference between the left hand and right hand sides of these equations will be asymptotically equivalent to maximum likelihood estimation (Hamilton and Wu (2012)).

Assessing whether or not the model is identified is non-trivial. A simple count of coefficients can be problematic for two reasons. First, implicit parameter restrictions need to be accounted for. As usual, there are implicit parameter restrictions which require the covariance matrices to be symmetric and positive semidefinite. These restrictions reduce the number of effective parameters in the Ω* equation. More subtly, the reduced form representation involves implicit parameter restrictions which arise from normalizing the state space representation to become a positive peak response BLRP representation. Second, and generally more difficult to resolve, the right hand side is non-linear and may have multiple solutions even if there are the same number of parameters to be recovered as estimated degrees of freedom. This second issue significantly complicates identification. As shown in Morris (2013), the right hand side of this expression is usually sufficiently nonlinear that there are multiple solutions for the structural parameters.
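In practice, the system above is solved by minimum chi-square: stack the differences between the estimated reduced form parameters and their model-implied counterparts and minimize a quadratic form in those differences. The sketch below is a hypothetical illustration (structural_map and the weighting matrix are placeholders for the model-specific mapping and for the inverse covariance of the reduced form estimates used by Hamilton and Wu (2012)), not the paper's implementation.

```python
import numpy as np

def minimum_chi_square_objective(theta, H_star_hat, weight, structural_map):
    """Distance between estimated and model-implied BLRP reduced forms.

    H_star_hat = (A*, B*, C*, Omega*) is the estimated reduced form and
    structural_map(theta) returns the model-implied BLRP innovations
    representation (e.g. by composing the steps sketched in earlier sections).
    """
    implied = structural_map(theta)
    g = np.concatenate([(hat - imp).ravel()
                        for hat, imp in zip(H_star_hat, implied)])
    return g @ weight @ g
```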
6 Conclusion

This paper has introduced a reduced form representation for general dynamic macroeconometric models. This representation enables an econometrician to separate issues of statistical inference from identification considerations. By first estimating this state space reduced form representation for the data and then inverting the mapping from the structural parameters to the reduced form parameters, economists can conduct structural inference in general dynamic models. This type of approach has been used successfully in contexts where identification and likelihood estimation of the full structural model is problematic (e.g., Hamilton and Wu (2012)), but until now it required a VAR reduced form. The BLRP reduced form enables any state space model to be estimated using this technique.

The key insight is to order and normalize state variables in terms of their ability to predict the entire future path of observable data. This idea leads to the concept of a best linear recursive predictor for the data: a choice of the latent state variable which is an optimal predictor at any level of truncation. The resulting representation can be constructed by taking advantage of this recursive criterion. Up to trivial reflection and ordering transformations, this concept leads to a reduced form representation for general finite state linear time series. An attractive byproduct of this choice of state normalization is that it can be estimated directly using orthonormalized partial least squares. The BLRP criterion is in fact equivalent to the ordering and normalization that occurs in such a principal components based subspace estimator. However, the representation can also be completely characterized in terms of a system of quadratic matrix equations, and this result enables constrained maximum likelihood estimation for improved efficiency.

Further work will demonstrate the application of the method. Salient applications include estimation of term structure models, no-arbitrage macro-finance models, and dynamic stochastic general equilibrium models. I plan to generalize the approach to allow for deterministic time-varying parameter changes, which will enable the estimation of shadow rate Gaussian affine term structure models such as Wu and Xia (2014). It will also be essential to develop simple tools that can assess the invertibility of the mapping from the structural parameter vector to the reduced form representation, and to examine non-standard inference in the set-identified case.

References

B. Anderson and J. Moore. Optimal Filtering. Dover Books on Electrical Engineering. Dover Publications, 2012. ISBN 9780486136899. URL http://books.google.com/books?id=iYMqLQp49UMC.

J. Arenas-García and G. Camps-Valls. Efficient kernel orthonormalized PLS for remote sensing applications. IEEE Transactions on Geoscience and Remote Sensing, 46(10):2872–2881, 2008.

K. Arun and S. Kung. Balanced approximation of stochastic systems. SIAM Journal on Matrix Analysis and Applications, 11(1):42–68, 1990.

D. Bauer. Estimating linear dynamical systems using subspace methods. Econometric Theory, 21(1):181–211, 2005a. URL http://www.jstor.org/stable/3533632.

D. Bauer. Asymptotic properties of subspace estimators. Automatica, 41(3):359–376, March 2005b. doi: 10.1016/j.automatica.2004.11.012.

D. Bauer. Estimating ARMAX systems for multivariate time series using the state approach to subspace algorithms. Journal of Multivariate Analysis, 100(3):397–421, 2009. doi: 10.1016/j.jmva.2008.05.008.

F. Canova and L. Sala. Back to square one: Identification issues in DSGE models. Journal of Monetary Economics, 56(4):431–449, May 2009. URL http://ideas.repec.org/a/eee/moneco/v56y2009i4p431-449.html.

C. F. Christ. The Cowles Commission's contributions to econometrics at Chicago, 1939–1955. Journal of Economic Literature, 32(1):30–59, 1994. URL http://www.jstor.org/stable/2728422.

P. Faurre. Stochastic realization algorithms. In R. Mehra and D. Lainiotis, editors, System Identification: Advances and Case Studies, volume 126 of Mathematics in Science and Engineering, pages 1–25. Academic Press, New York, 1976.

S. Fujishige, H. Nagai, and Y. Sawaragi. System-theoretical approach to model reduction and system-order determination. International Journal of Control, 22(6):807–819, 1975.

R. Giacomini. The relationship between DSGE and VAR models. CeMMAP working paper CWP21/13, Centre for Microdata Methods and Practice, Institute for Fiscal Studies, May 2013. URL http://ideas.repec.org/p/ifs/cemmap/21-13.html.
J. D. Hamilton and J. C. Wu. Identification and estimation of Gaussian affine term structure models. Journal of Econometrics, 168(2):315–331, 2012. URL http://ideas.repec.org/a/eee/econom/v168y2012i2p315-331.html.

L. P. Hansen and T. J. Sargent. Recursive Models of Dynamic Linear Economies. Princeton University Press, 2013.

B. Kelly and S. Pruitt. The three-pass regression filter: A new approach to forecasting using many predictors. Working paper, 2013.

S. Morris. Global identification of DSGE models. Technical report, University of California, San Diego, 2013.

A. C. Pugh. The McMillan degree of a polynomial system matrix. International Journal of Control, 24(1):129–135, 1976. doi: 10.1080/00207177608932810.

A. Raknerud, T. Skjerpen, and A. R. Swensen. Forecasting key macroeconomic variables from a large number of predictors: a state space approach. Journal of Forecasting, 29(4):367–387, 2010. doi: 10.1002/for.1131.

C. R. Rao. The use and interpretation of principal component analysis in applied research. Sankhyā: The Indian Journal of Statistics, Series A, 26(4):329–358, 1964. URL http://www.jstor.org/stable/25049339.

L. Sun, S. Ji, S. Yu, and J. Ye. On the equivalence between canonical correlation analysis and orthonormalized partial least squares. 2009.

K. J. Worsley, J.-B. Poline, K. J. Friston, and A. Evans. Characterizing the response of PET and fMRI data using multivariate linear models. NeuroImage, 6(4):305–319, 1997.

J. C. Wu and F. D. Xia. Measuring the macroeconomic impact of monetary policy at the zero lower bound. Working paper, 2014.