RNy, econ4160 autumn 2015
Lecture note 6 (supplement to lectures 7 and 8)
1. GIVE and 2SLS equivalence.
The equivalence of $\hat\beta_{GIV}$ and $\hat\beta_{2SLS}$ can be shown "both ways". DM, pages 323-324, give an argument that shows the equivalence by starting from $\hat\beta_{2SLS}$.

In the slide set for Lecture 6 we take the opposite direction. This note fills in a couple of steps. We start from the GIV estimator:

$$\hat\beta_{1,GIV} = (\hat W_1' X_1)^{-1} \hat W_1' y_1 \qquad (1)$$

with

$$\hat W_1 = [Z_1 : \hat Y_1], \qquad (2)$$

$$X_1 = [Z_1 : Y_1], \qquad (3)$$

$$\hat Y_1 = P_{W_1} Y_1, \qquad (4)$$

where the matrices are given in the slide set, in particular:

$$P_{W_1} = W_1 (W_1' W_1)^{-1} W_1'. \qquad (5)$$
Using the partitioning, the product $\hat W_1' X_1$ becomes:

$$\hat W_1' X_1 = \begin{pmatrix} Z_1' \\ \hat Y_1' \end{pmatrix} [Z_1 : Y_1] = \begin{pmatrix} Z_1' Z_1 & Z_1' Y_1 \\ \hat Y_1' Z_1 & \hat Y_1' Y_1 \end{pmatrix},$$

so

$$\hat\beta_{1,GIV} = \begin{pmatrix} Z_1' Z_1 & Z_1' Y_1 \\ \hat Y_1' Z_1 & \hat Y_1' Y_1 \end{pmatrix}^{-1} \begin{pmatrix} Z_1' y_1 \\ \hat Y_1' y_1 \end{pmatrix}.$$
We now focus on $Z_1' Y_1$ and $\hat Y_1' Y_1$. Start with the second:

$$\hat Y_1' Y_1 = (P_{W_1} Y_1)' Y_1.$$
The reduced-form residual matrix $e_1$ is given by

$$e_1 = Y_1 - \underbrace{P_{W_1} Y_1}_{\hat Y_1} = M_{W_1} Y_1,$$

where $M_{W_1} = I - P_{W_1}$. Using this we get

$$(P_{W_1} Y_1)' Y_1 = (P_{W_1} Y_1)' (\hat Y_1 + e_1) = \hat Y_1' \hat Y_1 = Y_1' P_{W_1} Y_1 \qquad (6)$$

since $(P_{W_1} Y_1)' e_1 = Y_1' P_{W_1} (I - P_{W_1}) Y_1 = 0$ (remember that $P_{W_1}$ is symmetric and idempotent: $P_{W_1}' = P_{W_1}$ and $P_{W_1} P_{W_1} = W_1 (W_1' W_1)^{-1} W_1' W_1 (W_1' W_1)^{-1} W_1' = P_{W_1}$).
Next, consider $Z_1' Y_1$. We want to show that $\hat Y_1' Z_1 = Y_1' Z_1$. We can start by writing $\hat Y_1' Z_1$:

$$\hat Y_1' Z_1 = (Y_1' - e_1') Z_1 = Y_1' Z_1 - e_1' Z_1.$$

Here $e_1' Z_1 = 0$ (since the effect of the exogenous $Z_1$ has been regressed out, but try to show this formally if you want). But then

$$\hat Y_1' Z_1 = Y_1' Z_1 \iff Z_1' \hat Y_1 = Z_1' Y_1. \qquad (7)$$

Replacing $\hat Y_1' Y_1$ by $\hat Y_1' \hat Y_1$ and $Z_1' Y_1$ by $Z_1' \hat Y_1$ in the partitioned matrix above, we get the expression for $\hat\beta_{1,2SLS}$.
In sum we have:

$$\hat\beta_{1,GIV} = \begin{pmatrix} Z_1' Z_1 & Z_1' Y_1 \\ \hat Y_1' Z_1 & \hat Y_1' Y_1 \end{pmatrix}^{-1} \begin{pmatrix} Z_1' y_1 \\ \hat Y_1' y_1 \end{pmatrix} \qquad (8)$$

$$= \begin{pmatrix} Z_1' Z_1 & Z_1' \hat Y_1 \\ \hat Y_1' Z_1 & \hat Y_1' \hat Y_1 \end{pmatrix}^{-1} \begin{pmatrix} Z_1' y_1 \\ \hat Y_1' y_1 \end{pmatrix} = \hat\beta_{1,2SLS}. \qquad (9)$$
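To make the equivalence concrete, here is a minimal numerical sketch (not part of the original derivation) that simulates an over-identified equation and checks that the GIVE formula (1) and the two-stage (2SLS) computation give identical results. The data-generating process, sample size and parameter values are illustrative assumptions only.

```python
# Hypothetical simulation: one endogenous regressor, l1 = 4 instruments,
# k1 = 3 structural coefficients (over-identified by one).
import numpy as np

rng = np.random.default_rng(42)
n = 500
Z1 = np.column_stack([np.ones(n), rng.normal(size=n)])  # included exogenous, k11 = 2
Z2 = rng.normal(size=(n, 2))                            # excluded instruments
W1 = np.column_stack([Z1, Z2])                          # instrument matrix, n x l1

eps = rng.normal(size=n)
v = 0.6 * eps + rng.normal(size=n)                      # makes Y1 endogenous
Y1 = W1 @ np.array([1.0, 0.5, 1.0, -0.8]) + v           # reduced form for Y1
X1 = np.column_stack([Z1, Y1])                          # structural regressors
y1 = X1 @ np.array([0.5, 1.0, 2.0]) + eps               # structural equation

PW1 = W1 @ np.linalg.solve(W1.T @ W1, W1.T)             # projection matrix (5)
Y1_hat = PW1 @ Y1                                       # first-stage fitted values (4)
W1_hat = np.column_stack([Z1, Y1_hat])                  # optimal instruments (2)

beta_giv = np.linalg.solve(W1_hat.T @ X1, W1_hat.T @ y1)        # GIVE, eq. (1)
X1_hat = PW1 @ X1                                               # second-stage regressors
beta_2sls = np.linalg.solve(X1_hat.T @ X1_hat, X1_hat.T @ y1)   # OLS on fitted regressors
print(np.allclose(beta_giv, beta_2sls))                 # True: (8) equals (9)
```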
2. Another equivalent expression of the GIV-estimator (GIVE)

Let us again consider over-identification: $l_1 > k_1 = k_{11} + k_{12}$.

A different method to obtain unique moments is to post-multiply $W_1$ by a matrix $J_1$, so that $W_1 J_1$ is an IV-matrix of dimension $n \times k_1$ which satisfies the IV requirements. $J_1$ must be $l_1 \times k_1$. If we choose

$$W_1 J_1 = P_{W_1} X_1 \qquad (10)$$

as the IV matrix (remember that $P_{W_1} = W_1 (W_1' W_1)^{-1} W_1'$), we can determine $J_1$ as

$$J_1 = (W_1' W_1)^{-1} W_1' X_1, \qquad (11)$$

which has the correct dimension $l_1 \times k_1$.
Let us, for the time being, denote the estimator that uses $W_1 J_1$ as IV-matrix by $\hat\beta_{1,JGIV}$:

$$\hat\beta_{1,JGIV} = ((W_1 J_1)' X_1)^{-1} (W_1 J_1)' y_1 = ((P_{W_1} X_1)' X_1)^{-1} X_1' P_{W_1} y_1 = (X_1' P_{W_1} X_1)^{-1} X_1' P_{W_1} y_1,$$

which is the expression for the GIV estimator in equation (8.29) in DM (p. 321).
But we already have

$$\hat\beta_{1,GIV} = (\hat W_1' X_1)^{-1} \hat W_1' y_1 \qquad (12)$$

with

$$\hat W_1 = [Z_1 : \hat Y_1], \qquad \hat Y_1 = P_{W_1} Y_1$$

from before. Is there an internal inconsistency? Can $\hat\beta_{1,GIV} \ne \hat\beta_{1,JGIV}$?
The answer is "no". Since

$$\hat W_1 = [Z_1 : \hat Y_1] = [P_{W_1} Z_1 : P_{W_1} Y_1] = P_{W_1} X_1$$

(using that $P_{W_1} Z_1 = Z_1$, because $Z_1$ is part of $W_1$), we can replace $\hat W_1$ in $\hat\beta_{1,GIV}$ by $P_{W_1} X_1$ and obtain

$$\hat\beta_{1,GIV} = (\hat W_1' X_1)^{-1} \hat W_1' y_1 = (X_1' P_{W_1} X_1)^{-1} X_1' P_{W_1} y_1 = \hat\beta_{1,JGIV}.$$
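Continuing the simulated example from Section 1 (same caveats), a short check that the $J_1$-based estimator reproduces $\hat\beta_{1,GIV}$ exactly:

```python
# J1 from (11): the l1 x k1 weighting matrix; W1 @ J1 equals PW1 @ X1, eq. (10).
J1 = np.linalg.solve(W1.T @ W1, W1.T @ X1)
WJ = W1 @ J1
beta_jgiv = np.linalg.solve(WJ.T @ X1, WJ.T @ y1)
print(np.allclose(beta_jgiv, beta_giv))   # True: no internal inconsistency
```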
3. IV-criterion function

The projection matrix $P_{W_1}$ is also central in the IV-criterion function $Q(\beta_1, y_1)$, which DM define on page 321:

$$Q(\beta_1, y_1) = (y_1 - X_1 \beta_1)' P_{W_1} (y_1 - X_1 \beta_1). \qquad (13)$$

Note the close relationship to the sum of squared residuals. Minimization of $Q(\beta_1, y_1)$ with respect to $\beta_1$ gives the first-order condition:

$$X_1' P_{W_1} (y_1 - X_1 \hat\beta_{1,GIV}) = 0, \qquad (14)$$

which gives $\hat\beta_{1,GIV}$ as the solution:

$$\hat\beta_{1,GIV} = (X_1' P_{W_1} X_1)^{-1} X_1' P_{W_1} y_1.$$
Note that from (10),

$$X_1' P_{W_1} = J_1' W_1',$$

so we can write (14) as:

$$J_1' W_1' (y_1 - X_1 \hat\beta_{1,GIV}) = 0. \qquad (15)$$

We can then represent the just-identified case by setting $J_1$ to a non-singular $k_1 \times k_1$ matrix. Pre-multiplication in (15) by $(J_1')^{-1}$ gives

$$W_1' (y_1 - X_1 \hat\beta_{1,GIV}) = 0,$$

confirming that in the just-identified case, $\hat\beta_{1,GIV} = \hat\beta_{1,IV}$.
In the just-identified case the minimized value of the IV-criterion function is zero:

$$Q(\hat\beta_{1,IV}, y_1) = (y_1 - X_1 \hat\beta_{1,IV})' P_{W_1} (y_1 - X_1 \hat\beta_{1,IV}) = \hat\varepsilon_{IV,1}' W_1 (W_1' W_1)^{-1} [W_1' \hat\varepsilon_{IV,1}] = 0, \qquad (16)$$

from the orthogonality between the instrumental variables and the IV-residuals.
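A sketch that evaluates the criterion (13) on the simulated data from Section 1 (the choice of which instrument to drop in the just-identified case is arbitrary and purely illustrative):

```python
def Q(beta, y, X, PW):
    """IV criterion (13): quadratic form of the residuals in PW."""
    u = y - X @ beta
    return float(u @ PW @ u)

print(Q(beta_giv, y1, X1, PW1) > 0)          # over-identified case: Q > 0

# Just-identified case: keep only l1 = k1 = 3 instruments.
W1j = W1[:, :3]
PW1j = W1j @ np.linalg.solve(W1j.T @ W1j, W1j.T)
beta_iv = np.linalg.solve(W1j.T @ X1, W1j.T @ y1)   # simple IV estimator
print(np.isclose(Q(beta_iv, y1, X1, PW1j), 0.0))    # True: eq. (16)
```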
4. Overidentification tests (Sargan test and J-test)

When $l_1 - k_1 \ge 1$, the validity of the over-identifying instruments can be tested, see DM Ch. 8.6.

The basic intuition is the following: if the instruments are valid, they should have no significant explanatory power in an (auxiliary) regression that has the GIV-residuals as regressand.

Simplest case: assume that there is one endogenous explanatory variable in the first structural equation. The regression of $\hat\varepsilon_{GIV,1}$ on the instruments and a constant produces an $R^2$ that we can call $R^2_{GIVres}$. The test (called "Specification test" when single-equation IV estimation in PcGive is used) is:

$$\text{Specification test} = n R^2_{GIVres} \overset{a}{\sim} \chi^2(l_1 - k_1) \qquad (17)$$

under the $H_0$ of valid instrumental variables. (Remember that since we now use the notation of DM, $n$ represents the number of observations.)

As noted by DM on p. 338, this Specification test is rightly called a Sargan test, because of the contribution of the English econometrician Denis Sargan, who worked on IV-estimation theory during the 1950s and 1960s.
In terms of computation, the Sargan test can be computed from the IV criterion function. From (16), $Q(\hat\beta_{1,IV}, y_1) = 0$: the IV-residuals are orthogonal to all the instrumental variables. In the over-identified case, $Q(\hat\beta_{1,GIV}, y_1) > 0$, since the GIV-residuals are uncorrelated with the optimal instruments, but not with each individual instrument.

Therefore, a test of the validity of the over-identification can be based on $Q(\hat\beta_{1,GIV}, y_1) - Q(\hat\beta_{1,IV}, y_1)$:

$$\text{Specification test} = \frac{Q(\hat\beta_{1,GIV}, y_1)}{\hat\sigma_1^2} \overset{a}{\sim} \chi^2(l_1 - k_1), \qquad (18)$$

where $\hat\sigma_1^2$ is the usual consistent estimator for $\sigma_1^2$. The two ways of computing the Sargan Specification test are numerically identical.
Other econometric software packages often report the "J-test" or "Hansen test" for instrument validity. This test uses the $F$-statistic from the auxiliary regression instead of $R^2_{GIVres}$:

$$J\text{-test} = l_1 F_{GIVres} \overset{a}{\sim} \chi^2(l_1 - k_1).$$
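The following sketch, continuing the simulation, computes the Specification test both ways. Because $W_1$ contains a constant and the GIVE residuals are exactly orthogonal to $\hat W_1$ (which includes that constant), the residual mean is zero, and $nR^2_{GIVres}$ coincides with $Q(\hat\beta_{1,GIV}, y_1)/\hat\sigma_1^2$:

```python
u_giv = y1 - X1 @ beta_giv                    # GIVE residuals (mean exactly zero)

# Version 1: n * R^2 from regressing the residuals on all instruments.
coef = np.linalg.lstsq(W1, u_giv, rcond=None)[0]
u_fit = W1 @ coef                             # fitted part, equals PW1 @ u_giv
R2 = (u_fit @ u_fit) / (u_giv @ u_giv)        # no centering needed: mean(u) = 0
print(n * R2)

# Version 2: IV criterion divided by the residual variance, eq. (18).
sigma2_hat = (u_giv @ u_giv) / n
print(Q(beta_giv, y1, X1, PW1) / sigma2_hat)  # identical to n * R2
# Both are asymptotically chi^2(l1 - k1) = chi^2(1) under H0.
```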
5. Distribution of the GIV estimator

As noted at the end of Lecture 7, with reference to the asymptotic nature of the three IV requirements, it is not surprising that the known properties of the IV and GIVE estimators are asymptotic.

In the case of overidentification, and maintaining

$$\varepsilon_1 = IN(0, \sigma_1^2 I_{n \times n}), \qquad (19)$$

$\hat\beta_{1,GIV}$ and $\hat\beta_{1,2SLS}$ are consistent and asymptotically efficient. The use of optimal instruments based on the reduced form of the system of equations (and (19)) is the main driver behind this result.

If (19) cannot be maintained, or if we want to take into account correlation between $\varepsilon_1$ and other disturbances in the SEM, other estimation methods are more efficient: GMM and 3SLS, for example.

Based on their own premises, the associated "t-ratios" etc. of these estimation methods have asymptotic standard normal distributions.
6. GMM

GMM is an extension of the GIV estimator that accounts for heteroskedasticity and autocorrelation. A famous paper by Lars Peter Hansen from 1982 (joint Nobel prize winner in 2013) was "the first to propose GMM estimation under that name" (Davidson and MacKinnon, page 367).

Consider again equation #1 in a SEM, as in DM page 522:

$$y_1 = X_1 \beta_1 + \varepsilon_1. \qquad (20)$$

Memo:

$$X_1 = [Z_1 : Y_1], \qquad (21)$$

$$\beta_1 = (\beta_{11}' : \beta_{21}')', \qquad (22)$$

where:

– $y_1$ is $n \times 1$, with observations of the variable that equation #1 in the SEM is normalized on.

– $Z_1$ is $n \times k_{11}$, with observations of the $k_{11}$ included predetermined or exogenous variables.

– $Y_1$, $n \times k_{12}$, holds the included endogenous explanatory variables.
Change the assumption about white-noise disturbances to autocorrelated/heteroskedastic disturbances:

$$E(\varepsilon_1 \varepsilon_1') = \Sigma_1. \qquad (23)$$

Consider the just-identified or over-identified case (by the rank and order conditions).

When $\varepsilon_1 = IN(0, \sigma_1^2 I_{n \times n})$ we know how to find optimal instruments: use the reduced form to find the linear combination of exogenous and predetermined variables that gives the best predictors of the variables in $Y_1$. When we have the more general $E(\varepsilon_1 \varepsilon_1') = \Sigma_1$, the best predictors for $Y_1$ should take the correlation structure between the random variables in $\varepsilon_1$ into account.

Generalized (weighted) LS analogy: the regressors in the model are weighted so as to whiten the residuals, to get back to the "classical assumptions". In the case of IV estimation, the weights must include the instruments as well; this leads to the Generalized Method of Moments.
In the same way as for the GIV estimator, the GMM estimator of $\beta_1$ can be found by minimization of the GMM-criterion function:

$$Q_{GMM}(\beta_1, \Sigma_1; y_1) = (y_1 - X_1 \beta_1)' P^g_{W_1} (y_1 - X_1 \beta_1), \qquad (24)$$

where

$$P^g_{W_1} = W_1 (W_1' \Sigma_1 W_1)^{-1} W_1' \qquad (25)$$

is the generalization of the GIV prediction-maker matrix.

Assume that $\Sigma_1$ is a matrix of known parameters. The first-order conditions (i.e., Method of Moments!) are then compactly written as

$$X_1' P^g_{W_1} (y_1 - X_1 \hat\beta_{1,GMM}) = 0, \qquad (26)$$

which gives $\hat\beta_{1,GMM}$ as the solution

$$\hat\beta_{1,GMM} = (X_1' P^g_{W_1} X_1)^{-1} X_1' P^g_{W_1} y_1.$$

If you "write it out", you get expression (9.10) on page 355 in DM (with some small changes in notation). The generalization of the $J_1$ matrix from GIV estimation is found from

$$X_1' P^g_{W_1} = J_1^{g\prime} W_1'$$

and becomes

$$J_1^g = (W_1' \Sigma_1 W_1)^{-1} W_1' X_1, \qquad (27)$$

which gives the optimal weights to the instruments in the GMM case.
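As a concrete illustration, continuing the simulation from the earlier sections: the infeasible GMM estimator with a known $\Sigma_1$ (here an assumed heteroskedastic diagonal matrix, chosen only to make (24)-(27) computable):

```python
Sigma1 = np.diag(1.0 + 0.5 * Z1[:, 1] ** 2)        # assumed known covariance matrix
A = np.linalg.solve(W1.T @ Sigma1 @ W1, W1.T)      # (W1' Sigma1 W1)^{-1} W1'
Pg = W1 @ A                                        # generalized projection, eq. (25)
beta_gmm = np.linalg.solve(X1.T @ Pg @ X1, X1.T @ Pg @ y1)  # GMM solution of (26)
J1g = A @ X1                                       # optimal GMM weights, eq. (27)
print(beta_gmm)
```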
GMM and IV: Summary
1. When E(
2. When 1 =
weights).
3.
2
1= 1 I
0
1 1)
=
2
1 I:
1
6= I, ^1;GM M is asymptotically more e¢ cient than ^1;GIV .
^1;GM M = ^1;GIV and Jg1 =
2
1
J1 (the reduced form gives the optimal iv
and exact identi…cation: ^1;GM M = ^1;GIV = ^IV
4. If the structural equation is a regression model, as in a recursive system of equations, we can
set Z1 = X1 and the optimal matrix with instruments is W1 = X1 . So ^1;GM M = ^1;OLS
in this case.
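A quick numerical confirmation of point 1, continuing the example; the value $\sigma_1^2 = 2.5$ is arbitrary (the scalar cancels in the formula):

```python
Pg0 = W1 @ np.linalg.solve(W1.T @ (2.5 * np.eye(n)) @ W1, W1.T)   # Sigma1 = 2.5 * I
beta_check = np.linalg.solve(X1.T @ Pg0 @ X1, X1.T @ Pg0 @ y1)
print(np.allclose(beta_check, beta_giv))   # True: GMM collapses to GIVE
```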
Some further notes

A. Feasible GMM.

We have seen that GMM is the generalization of GIVE to the case of non-white-noise error terms, just as GLS generalizes OLS estimation to that case.

In the same way as with GLS, to make GMM a feasible, practical method, $\Sigma_1$ has to be estimated by a consistent estimator $\hat\Sigma_1$. Use the GIV residuals to get a first estimate. One can choose to iterate over $\hat\beta_{1,GMM}$ and/or $\hat\Sigma_1$.

The covariance matrix of the feasible GMM estimator is

$$\widehat{Var}(\hat\beta_{1,GMM}) = (X_1' W_1 (W_1' \hat\Sigma_1 W_1)^{-1} W_1' X_1)^{-1}, \qquad (28)$$

and t-ratios are asymptotically $N(0,1)$ under their respective null hypotheses. So the inference procedure is as before.
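A minimal feasible-GMM sketch, continuing the simulation. The diagonal (White-type) estimate of $\Sigma_1$ built from squared residuals is one simple assumption, suitable for pure heteroskedasticity; with autocorrelated errors a HAC-type estimator of $W_1' \Sigma_1 W_1$ would be used instead:

```python
u = y1 - X1 @ beta_giv                         # first-round GIVE residuals
for _ in range(5):                             # iterate over beta and Sigma
    Sigma_hat = np.diag(u ** 2)                # crude diagonal estimate of Sigma_1
    A = np.linalg.solve(W1.T @ Sigma_hat @ W1, W1.T)
    Pg = W1 @ A
    beta_fgmm = np.linalg.solve(X1.T @ Pg @ X1, X1.T @ Pg @ y1)
    u = y1 - X1 @ beta_fgmm                    # update the residuals

# Covariance matrix (28) and asymptotic t-ratios.
V = np.linalg.inv(X1.T @ W1 @ np.linalg.solve(W1.T @ Sigma_hat @ W1, W1.T @ X1))
print(beta_fgmm, beta_fgmm / np.sqrt(np.diag(V)))
```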
B. Hansen-Sargan test for GMM

Since the GMM criterion function only depends on $\Sigma_1$ through the square matrix $W_1' \Sigma_1 W_1$, it is not surprising that the IV Specification test is also defined for GMM.

It is usually interpreted as a test of the validity of the instruments (the orthogonality conditions), but as Davidson and MacKinnon (DM) discuss on

– p. 338, for IV

– p. 367-368, for GMM

the second possibility is that the Specification test becomes significant when an explanatory variable has incorrectly been classified as an instrument instead of (correctly) as a predetermined or exogenous variable (i.e., a variable that belongs in $W_1$ via $Z_1$, and therefore also in $X_1$).

The general implication of a significant Specification test is therefore re-specification.
C. GMM for non-linear equations

Economic theory of intertemporal decisions leads to Euler equations that are formulated as (say) $l$ orthogonality conditions, similar to the moment conditions above. If the number of parameters to be estimated is less than or equal to $l$, we have identification. The moment conditions can be linear or non-linear in the parameters. Non-linear GMM estimators can be obtained by minimization of certain quadratic forms. Chapter 9.5 in DM is a relatively advanced chapter on non-linear GMM that you can use as a reference if you apply this method.
7. GMM and IV. An example: estimation of the Euro Area NPCM

DSGE macro models are to a large extent made up of structural equations that are:

– first-order conditions of agents' intertemporal optimization problems

– price and wage equations that are based on so-called Calvo pricing: New Keynesian Phillips curves (NPCMs)

We will look at a famous example of IV estimation of a NPCM from the euro currency area.

Let $p_t$ be the log of a price level index. The (hybrid) NPCM states that inflation, defined as $\Delta p_t \equiv p_t - p_{t-1}$, is explained by $E_t(\Delta p_{t+1})$, expected inflation one period ahead conditional upon information available at time $t$, by lagged inflation, and by a variable $x_t$, often called a forcing variable, representing excess demand or marginal costs (e.g., the output gap, the unemployment rate or the wage share in logs):

$$\Delta p_t = b^f_{p1} E_t(\Delta p_{t+1}) + b^b_{p1} \Delta p_{t-1} + b_{p2} x_t + \varepsilon_{pt}. \qquad (29)$$

The disturbance term $\varepsilon_{pt}$ is assumed to be white noise.

Theory predicts (a little simplified): $0 < b^f_{p1} < 1$, $0 < b^b_{p1} < 1$, and $b_{p2} > 0$ if $x_t$ is measured by the logarithm of the wage share.

As noted, $E_t(\Delta p_{t+1})$ represents the conditional expectation of $\Delta p_{t+1}$ given the information available in period $t$. We cannot find $E_t(\Delta p_{t+1})$ from equation (29) alone. To make any progress, we need to "complete" the system, which will then represent the information set that the firms condition their expectations on.
The term 'forcing variable' suggests that $x_t$ is exogenous, so we specify the following (simplest possible) system:

$$\Delta p_t = b^f_{p1} E_t(\Delta p_{t+1}) + b^b_{p1} \Delta p_{t-1} + b_{p2} x_t + \varepsilon_{pt} \qquad (30)$$

$$x_t = b_{x1} x_{t-1} + \varepsilon_{xt}, \qquad -1 < b_{x1} < 1, \qquad (31)$$

where we assume that $\varepsilon_{xt}$ is white noise and uncorrelated with $\varepsilon_{pt}$.
The NPC in terms of observables

A popular approach is to substitute the unobservable term $E_t(\Delta p_{t+1})$ with the lead variable $\Delta p_{t+1}$. After the substitution, the NPC is:

$$\Delta p_t = b^f_{p1} \Delta p_{t+1} + b^b_{p1} \Delta p_{t-1} + b_{p2} x_t + \eta_{pt}, \qquad (32)$$

where the disturbance $\eta_{pt}$ contains the forecast error $\{\Delta p_{t+1} - E_t(\Delta p_{t+1})\}$ (multiplied by $-b^f_{p1}$) in addition to the white-noise variable $\varepsilon_{pt}$.

As usual with errors-in-variables models, $\Delta p_{t+1}$ is correlated with $\eta_{pt}$. OLS on (32) is therefore inconsistent, since this is an example of a measurement-error model. We need (at least) IV estimation to claim consistency. However, it can be shown that $\eta_{pt}$ follows a first-order moving-average (MA(1)) process. In principle, GMM should therefore give more efficient estimation.
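The following self-contained simulation sketch illustrates both claims. It generates data from the rational-expectations solution of (30)-(31), which takes the form $\Delta p_t = \delta\,\Delta p_{t-1} + \gamma x_t + \varepsilon_{pt}/(1 - b^f_{p1}\delta)$ with $\delta$ the stable root of $b^f_{p1}\delta^2 - \delta + b^b_{p1} = 0$ (a standard derivation, stated here without proof); all parameter values are illustrative assumptions, not estimates:

```python
import numpy as np

rng = np.random.default_rng(0)
bf, bb, b2, bx = 0.6, 0.35, 0.05, 0.8          # assumed illustrative parameters
T = 100_000

d = (1 - np.sqrt(1 - 4 * bf * bb)) / (2 * bf)  # stable root: bf*d^2 - d + bb = 0
g = b2 / (1 - bf * d - bf * bx)                # coefficient on x_t in the solution

eps_p = rng.normal(size=T)
eps_x = rng.normal(size=T)
x = np.zeros(T)
pi = np.zeros(T)                               # pi[t] plays the role of Delta p_t
for t in range(1, T):
    x[t] = bx * x[t - 1] + eps_x[t]
    pi[t] = d * pi[t - 1] + g * x[t] + eps_p[t] / (1 - bf * d)

# Disturbance of the observable NPC (32):
# eta_t = Dp_t - bf*Dp_{t+1} - bb*Dp_{t-1} - b2*x_t
eta = pi[1:-1] - bf * pi[2:] - bb * pi[:-2] - b2 * x[1:-1]
lead = pi[2:]                                  # the lead variable Delta p_{t+1}

print(np.corrcoef(lead, eta)[0, 1])            # clearly nonzero: OLS on (32) is inconsistent
print(np.corrcoef(eta[:-1], eta[1:])[0, 1])    # nonzero first-order autocorrelation
print(np.corrcoef(eta[:-2], eta[2:])[0, 1])    # near zero at lag 2: an MA(1) pattern
```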
Euro-area NPC
We replicate the results in European in‡ation dynamics, Gali, Gertler and Lopez-Solado,
European Economic Review (2001).
Reference: Econometric evaluation of the New Keynesian Phillips Curve, Bårdsen, Jansen
and Nymoen Oxford Bulletin of Economics and Statistics (2004)
With the log of the wage-share wst as the forcing variable (xt ):
pt
=
0:60
pt+1 +
(0:06)
0:35
pt
1
+
(0:06)
0:03 wst +
(0:03)
0:08
(0:06)
(33)
GMM, T = 107 (1972 (4) to 1997 (4))
2
J
Note that
2
J
(8) = 6:74 [0:35]
(8) is the J form of the Speci…cation test mentioned above.
Note also that there are 8 overidentifying restrictions, indicating that implicitly GGL had a
larger NPC-system in mind.
The results in (33) are not very robust to the details of how we estimate $\hat\Sigma_1$. When we change the details of the GMM algorithm and choose to iterate over $\hat\Sigma_1$, we get

$$\Delta p_t = \underset{(0.052)}{0.731}\, \Delta p_{t+1} + \underset{(0.069)}{0.340}\, \Delta p_{t-1} - \underset{(0.029)}{0.042}\, ws_t - \underset{(0.070)}{0.102} \qquad (34)$$

GMM, $T = 107$ (1971(3) to 1998(1)), $\chi^2_J(8) = 7.34\ [0.50]$.

Note the sign change for the coefficient of the wage-share variable. So in this example GMM is not robust.
GIV estimation results:

$$\Delta p_t = \underset{(0.14)}{0.66}\, \Delta p_{t+1} + \underset{(0.12)}{0.28}\, \Delta p_{t-1} + \underset{(0.09)}{0.07}\, ws_t + \underset{(0.12)}{0.10} \qquad (35)$$

2SLS, $T = 104$ (1972(2) to 1998(1)), Specification $\chi^2(6) = 11.88\ [0.06]$.

Misspecification tests show that there is heavy residual autocorrelation in (35). This is consistent with the NPC, but it could also be a result of misspecification: $\Delta p_{t+1}$ acting as a proxy for omitted current and lagged variables.