
Erik Biørn
Version of February 9, 2005
ECON4160 ECONOMETRICS –
MODELLING AND SYSTEMS ESTIMATION
Lecture note no. 8
ESTIMATION OF OVERIDENTIFIED EQUATIONS
BY TWO-STAGE LEAST SQUARES (2SLS)
In this note, we consider estimation methods for two models, A and B, each having one overidentified equation and one non-identified (and hence non-estimable)
equation. We want to estimate the overidentified equation by Two-Stage Least
Squares (2SLS). The identifiable equation in Model A contains only two endogenous variables, i.e., only one endogenous RHS variable. In Model B we extend
the specification, by also including an exogenous variable in addition to the two
endogenous ones, so that the identifiable equation in this model contains two RHS
variables, one endogenous and one exogenous.
1. Model A
1.1. Model and OLS in two versions
The two-equation model we consider has SF

(A.1)  $y_{1t} + \beta_{12} y_{2t} + \gamma_{10} = u_{1t}$,

(A.2)  $\beta_{21} y_{1t} + y_{2t} + \gamma_{21} x_{1t} + \gamma_{22} x_{2t} + \gamma_{20} = u_{2t}$,

where

(A.3)  $\mathrm{E}(u_{kt} \mid X) = 0$,  $t = 1, \ldots, T$;  $k = 1, 2$,

(A.4)  $\mathrm{E}(u_{kt} u_{rs} \mid X) = \begin{cases} \sigma_{kr}, & s = t, \\ 0, & s \neq t, \end{cases}$  $t, s = 1, \ldots, T$,  $k, r = 1, 2$.
The model's RF equations are

(A.5)  $y_{1t} = \Pi_{10} + \Pi_{11} x_{1t} + \Pi_{12} x_{2t} + \varepsilon_{1t}$,

(A.6)  $y_{2t} = \Pi_{20} + \Pi_{21} x_{1t} + \Pi_{22} x_{2t} + \varepsilon_{2t}$,

where

(A.7)  $\varepsilon_{1t} = \frac{u_{1t} - \beta_{12} u_{2t}}{1 - \beta_{12}\beta_{21}}$,  $\varepsilon_{2t} = \frac{-\beta_{21} u_{1t} + u_{2t}}{1 - \beta_{12}\beta_{21}}$,

satisfying, because of (A.3),

(A.8)  $\mathrm{cov}(\varepsilon_{1t}, x_{1t}) = \mathrm{cov}(\varepsilon_{1t}, x_{2t}) = \mathrm{cov}(\varepsilon_{2t}, x_{1t}) = \mathrm{cov}(\varepsilon_{2t}, x_{2t}) = 0$.
From (A.5)–(A.7) it follows that y1t and y2t are both correlated with u1t and u2t if β12 ≠ 0 and β21 ≠ 0, regardless of whether the SF-disturbance covariance σ12 is unrestricted or set to zero a priori. Therefore OLS estimation of (A.1) is inconsistent (simultaneity bias). This holds both when we estimate β12 by (a) regressing y1t on y2t, i.e., using OLS on

(A.9)  $y_{1t} = -\beta_{12} y_{2t} - \gamma_{10} + u_{1t}$,

and when we estimate it by (b) regressing y2t on y1t (inverse regression), i.e., using OLS on

(A.10)  $y_{2t} = -(1/\beta_{12})\, y_{1t} - \gamma_{10}/\beta_{12} + u_{1t}/\beta_{12}$.
Regressions (a) and (b) would have given, respectively, the estimators

(A.11)  $\hat\beta_{12}^{OLS2} = -\frac{M[y_1, y_2]}{M[y_2, y_2]}$,

(A.12)  $\widehat{(1/\beta_{12})}^{OLS1} = -\frac{M[y_2, y_1]}{M[y_1, y_1]} \;\Longleftrightarrow\; \hat\beta_{12}^{OLS1} = -\frac{M[y_1, y_1]}{M[y_2, y_1]}$.

Now, since, from (A.9), (A.11), and (A.12), we have

$\hat\beta_{12}^{OLS2} = -\frac{M[-\beta_{12} y_2 - \gamma_{10} + u_1,\; y_2]}{M[y_2, y_2]} = \beta_{12} - \frac{M[u_1, y_2]}{M[y_2, y_2]}$,

$\hat\beta_{12}^{OLS1} = -\frac{M[-\beta_{12} y_2 - \gamma_{10} + u_1,\; y_1]}{M[y_2, y_1]} = \beta_{12} - \frac{M[u_1, y_1]}{M[y_2, y_1]}$,

and since both plim(M[u1, y2]) and plim(M[u1, y1]) are non-zero, as y1t and y2t are both correlated with u1t, it follows that

$\mathrm{plim}(\hat\beta_{12}^{OLS1}) \neq \beta_{12}$,  $\mathrm{plim}(\hat\beta_{12}^{OLS2}) \neq \beta_{12}$.

The OLS estimators of β12 based on original and on inverse regression are different and are both inconsistent.
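The simultaneity bias can be illustrated numerically. The following Python/NumPy sketch is not part of the original note; all parameter values, variable names and the data-generating setup are illustrative choices. It simulates Model A with β12 = 0.5 and computes the two OLS estimators (A.11) and (A.12):

```python
import numpy as np

# Simulate Model A, (A.1)-(A.2), with illustrative parameter values.
rng = np.random.default_rng(0)
T = 200_000
b12, b21 = 0.5, 0.4                       # true structural coefficients
g10, g21, g22, g20 = 1.0, 1.0, -1.0, 0.5
x1, x2 = rng.standard_normal(T), rng.standard_normal(T)
u1, u2 = rng.standard_normal(T), rng.standard_normal(T)  # sigma_12 = 0
D = 1.0 - b12 * b21
# Reduced form for y2 (solve (A.1)-(A.2)), then y1 from (A.1):
y2 = (-g21 * x1 - g22 * x2 + b21 * g10 - g20 - b21 * u1 + u2) / D
y1 = -b12 * y2 - g10 + u1

def M(a, b):
    """Empirical covariance M[a, b]."""
    return float(np.mean((a - a.mean()) * (b - b.mean())))

ols2 = -M(y1, y2) / M(y2, y2)   # (A.11): regress y1 on y2
ols1 = -M(y1, y1) / M(y2, y1)   # (A.12): inverse regression
print(ols2, ols1)
```

Under these particular parameter values both estimates settle near their plims (roughly 0.60 and 0.92), not near the true β12 = 0.5, and the two normalizations disagree with each other.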
1.2. Application of instrumental variables
We then turn to estimation of β12 by instrumental variables. We define:

(A.13)  An observable variable is said to be an instrumental variable for y2t relative to (A.9) if it is
   (i) correlated (theoretically) with y2t and
   (ii) uncorrelated (theoretically) with u1t.
Symmetrically,
(A.14)  An observable variable is said to be an instrumental variable for y1t relative to (A.10) if it is
   (i) correlated (theoretically) with y1t and
   (ii) uncorrelated (theoretically) with u1t/β12.
Assume that z, with observations zt (t = 1, . . . , T), satisfies (A.13). We then have from (A.1)

(A.15)  $\mathrm{cov}(y_{1t}, z_t) + \beta_{12}\,\mathrm{cov}(y_{2t}, z_t) = \mathrm{cov}(u_{1t}, z_t)$.

Utilizing properties (i) and (ii) in (A.13) it follows that

(A.16)  $\beta_{12} = -\frac{\mathrm{cov}(y_{1t}, z_t)}{\mathrm{cov}(y_{2t}, z_t)}$.

If we had renormalized (A.1) into

(A.17)  $(1/\beta_{12})\, y_{1t} + y_{2t} + \gamma_{10}/\beta_{12} = u_{1t}/\beta_{12}$,

we would have obtained

(A.18)  $(1/\beta_{12})\,\mathrm{cov}(y_{1t}, z_t) + \mathrm{cov}(y_{2t}, z_t) = \mathrm{cov}(u_{1t}/\beta_{12},\, z_t)$.

Assuming that z, with observations zt (t = 1, . . . , T), satisfies (A.14) and utilizing properties (i) and (ii) in (A.14), it follows that

(A.19)  $(1/\beta_{12}) = -\frac{\mathrm{cov}(y_{2t}, z_t)}{\mathrm{cov}(y_{1t}, z_t)} \;\Longleftrightarrow\; \beta_{12} = -\frac{\mathrm{cov}(y_{1t}, z_t)}{\mathrm{cov}(y_{2t}, z_t)}$.

We now define the IV estimator of β12 in (A.9) with z as IV for y2 as the estimator we obtain by replacing in (A.16) the theoretical by the empirical covariances. This gives

(A.20)  $\hat\beta_{12}^{IV} = -\frac{M[y_1, z]}{M[y_2, z]}$.

Symmetrically, we define the IV estimator for 1/β12 in (A.17) with z as IV for y1 by replacing in (A.18) the theoretical by the empirical covariances. This gives

(A.21)  $\widehat{(1/\beta_{12})}^{IV} = -\frac{M[y_2, z]}{M[y_1, z]} \;\Longleftrightarrow\; \hat\beta_{12}^{IV} = -\frac{M[y_1, z]}{M[y_2, z]}$.

Obviously, (A.20) and (A.21) give the same estimator: using z as an IV for y2 in (A.9) gives the same estimator as using z as an IV for y1 in (A.17). This common IV estimator is consistent, since

(A.22)  $\mathrm{plim}\,\hat\beta_{12}^{IV} = -\frac{\mathrm{plim}(M[y_1, z])}{\mathrm{plim}(M[y_2, z])} = -\frac{\mathrm{cov}(y_{1t}, z_t)}{\mathrm{cov}(y_{2t}, z_t)} = \beta_{12}$.
1.3. Choosing instruments
How do we choose the IV, z, here? In Model A, both exogenous variables, x1t and x2t, neither of which occurs in the equation under investigation, are candidates. First, we see from the RF equations (A.5) and (A.6) that they are correlated with both y1t and y2t and therefore satisfy claim (i) in both (A.13) and (A.14). Second, being exogenous, they are uncorrelated with u1t and therefore also satisfy claim (ii) in both (A.13) and (A.14). Inserting z = x1 and z = x2 in (A.20) or (A.21), we obtain

(A.23)  $\hat\beta_{12}^{IV1} = -\frac{M[y_1, x_1]}{M[y_2, x_1]}$,

(A.24)  $\hat\beta_{12}^{IV2} = -\frac{M[y_1, x_2]}{M[y_2, x_2]}$.

Hence, we get two different IV estimators of β12 in (A.9), both of which are consistent.
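Continuing the illustrative Model A simulation sketched earlier (true β12 = 0.5; all parameter values and names are my own choices, not the note's), both simple IV estimators (A.23) and (A.24) recover β12, while their realized values differ from each other:

```python
import numpy as np

# Same illustrative Model A data-generating process as before.
rng = np.random.default_rng(0)
T = 200_000
b12, b21 = 0.5, 0.4
g10, g21, g22, g20 = 1.0, 1.0, -1.0, 0.5
x1, x2 = rng.standard_normal(T), rng.standard_normal(T)
u1, u2 = rng.standard_normal(T), rng.standard_normal(T)
D = 1.0 - b12 * b21
y2 = (-g21 * x1 - g22 * x2 + b21 * g10 - g20 - b21 * u1 + u2) / D
y1 = -b12 * y2 - g10 + u1

def M(a, b):
    return float(np.mean((a - a.mean()) * (b - b.mean())))

iv1 = -M(y1, x1) / M(y2, x1)   # (A.23): z = x1
iv2 = -M(y1, x2) / M(y2, x2)   # (A.24): z = x2
print(iv1, iv2)
```

With a large T both estimates lie close to 0.5, unlike the OLS estimates, but the two instruments give numerically distinct estimators.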
1.4. An alternative interpretation
These two IV estimators can also be obtained by a slightly different reasoning: we can, formally, consider $\hat\beta_{12}^{IV1}$ and $\hat\beta_{12}^{IV2}$ as obtained by replacing, in the expression for the OLS estimator $\hat\beta_{12}^{OLS2}$ given by (A.11), y2 by x1 and x2, respectively, once in each second-order moment. We do not replace y2 by its IV in the equation under estimation. Symmetrically, $\hat\beta_{12}^{IV1}$ and $\hat\beta_{12}^{IV2}$ can be considered the result of replacing, in the OLS estimator based on the renormalized equation, $\hat\beta_{12}^{OLS1}$, given by (A.12), y1 by x1 and x2, respectively, once in each second-order moment.
1.5. Optimal instruments
We next consider the more general, and for practical purposes more interesting, possibility of choosing linear combinations of x1t and x2t as IVs for y2t in (A.9) or for y1t in (A.17). In particular, we want to choose the linear combination which is as strongly correlated with y2t, respectively y1t, as possible. We obtain this by

(A.25)  $\hat y_{1t} = \hat\Pi_{10} + \hat\Pi_{11} x_{1t} + \hat\Pi_{12} x_{2t}$,

(A.26)  $\hat y_{2t} = \hat\Pi_{20} + \hat\Pi_{21} x_{1t} + \hat\Pi_{22} x_{2t}$.

The rationale for this is given in Lecture note no. 7, section 6.

We therefore use $z_t = \hat y_{2t}$ as IV for y2t in (A.9), which gives the optimal IV estimator

(A.27)  $\hat\beta_{12}^{IV*2} = -\frac{M[y_1, \hat y_2]}{M[y_2, \hat y_2]}$.

Symmetrically, we can use $z_t = \hat y_{1t}$ as IV for y1t in (A.10), which gives the alternative optimal IV estimator

(A.28)  $\widehat{(1/\beta_{12})}^{IV*1} = -\frac{M[y_2, \hat y_1]}{M[y_1, \hat y_1]} \;\Longleftrightarrow\; \hat\beta_{12}^{IV*1} = -\frac{M[y_1, \hat y_1]}{M[y_2, \hat y_1]}$.
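The fitted values (A.25)–(A.26) and the optimal IV estimators (A.27)–(A.28) are easy to compute. A minimal sketch, using the same illustrative Model A simulation as before (parameter values and names are my own assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200_000
b12, b21 = 0.5, 0.4
g10, g21, g22, g20 = 1.0, 1.0, -1.0, 0.5
x1, x2 = rng.standard_normal(T), rng.standard_normal(T)
u1, u2 = rng.standard_normal(T), rng.standard_normal(T)
D = 1.0 - b12 * b21
y2 = (-g21 * x1 - g22 * x2 + b21 * g10 - g20 - b21 * u1 + u2) / D
y1 = -b12 * y2 - g10 + u1

def M(a, b):
    return float(np.mean((a - a.mean()) * (b - b.mean())))

# OLS on the RF equations (A.5)-(A.6) gives the fitted values (A.25)-(A.26).
X = np.column_stack([np.ones(T), x1, x2])
y1hat = X @ np.linalg.lstsq(X, y1, rcond=None)[0]
y2hat = X @ np.linalg.lstsq(X, y2, rcond=None)[0]

iv_star2 = -M(y1, y2hat) / M(y2, y2hat)   # (A.27)
iv_star1 = -M(y1, y1hat) / M(y2, y1hat)   # (A.28)
print(iv_star2, iv_star1)
```

Both estimates again land close to the true β12 = 0.5 in this simulation.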
Now, the normal equations for OLS applied to the RF equations (A.5) and (A.6) imply

$\sum_{t=1}^{T} \hat\varepsilon_{kt} = \sum_{t=1}^{T} \hat\varepsilon_{kt} x_{1t} = \sum_{t=1}^{T} \hat\varepsilon_{kt} x_{2t} = 0, \qquad k = 1, 2,$

where $\hat\varepsilon_{kt} = y_{kt} - \hat y_{kt}$ is the OLS residual corresponding to the RF disturbance $\varepsilon_{kt}$. This implies

$M[\hat\varepsilon_k, x_r] = \frac{1}{T}\sum_{t=1}^{T} \hat\varepsilon_{kt}(x_{rt} - \bar x_r) = \frac{1}{T}\sum_{t=1}^{T} \hat\varepsilon_{kt} x_{rt} = 0, \qquad k, r = 1, 2.$

Combining this with (A.25) and (A.26), we obtain

$M[\hat\varepsilon_k, \hat y_l] = 0, \qquad k, l = 1, 2,$

and hence we have the important identity

(A.29)  $M[y_k, \hat y_l] = M[\hat y_k + \hat\varepsilon_k,\, \hat y_l] = M[\hat y_k, \hat y_l], \qquad k, l = 1, 2.$
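The identity (A.29) holds exactly in any sample, not just asymptotically, and can be checked to floating-point precision. A sketch with the same illustrative Model A simulation (setup assumed, not from the note):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200_000
b12, b21 = 0.5, 0.4
g10, g21, g22, g20 = 1.0, 1.0, -1.0, 0.5
x1, x2 = rng.standard_normal(T), rng.standard_normal(T)
u1, u2 = rng.standard_normal(T), rng.standard_normal(T)
D = 1.0 - b12 * b21
y2 = (-g21 * x1 - g22 * x2 + b21 * g10 - g20 - b21 * u1 + u2) / D
y1 = -b12 * y2 - g10 + u1

def M(a, b):
    return float(np.mean((a - a.mean()) * (b - b.mean())))

X = np.column_stack([np.ones(T), x1, x2])
y1hat = X @ np.linalg.lstsq(X, y1, rcond=None)[0]
y2hat = X @ np.linalg.lstsq(X, y2, rcond=None)[0]

# (A.29): M[y_k, yhat_l] = M[yhat_k, yhat_l] for all k, l
ok = all(
    np.isclose(M(yk, ylhat), M(ykhat, ylhat))
    for yk, ykhat in ((y1, y1hat), (y2, y2hat))
    for ylhat in (y1hat, y2hat)
)
print(ok)
```

The equality follows because the residuals $\hat\varepsilon_k$ are orthogonal to everything in the column space of the regressors, including the fitted values themselves.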
Consequently, the optimal IV estimator, (A.27), can be written in three equivalent ways:

(A.30)  $\hat\beta_{12}^{IV*2} = -\frac{M[y_1, \hat y_2]}{M[y_2, \hat y_2]} = -\frac{M[y_1, \hat y_2]}{M[\hat y_2, \hat y_2]} = -\frac{M[\hat y_1, \hat y_2]}{M[\hat y_2, \hat y_2]}$.

Symmetrically, the alternative IV estimator, (A.28), can be written in three alternative ways as

(A.31)  $\widehat{(1/\beta_{12})}^{IV*1} = -\frac{M[y_2, \hat y_1]}{M[y_1, \hat y_1]} = -\frac{M[y_2, \hat y_1]}{M[\hat y_1, \hat y_1]} = -\frac{M[\hat y_2, \hat y_1]}{M[\hat y_1, \hat y_1]}$
$\;\Longleftrightarrow\; \hat\beta_{12}^{IV*1} = -\frac{M[y_1, \hat y_1]}{M[y_2, \hat y_1]} = -\frac{M[\hat y_1, \hat y_1]}{M[y_2, \hat y_1]} = -\frac{M[\hat y_1, \hat y_1]}{M[\hat y_2, \hat y_1]}$.
1.6. The optimal IV estimator coincides with the 2SLS estimator
Interpreted from the second expression in (A.30), we see that the optimal IV estimator of β12 coincides with the estimator obtained by OLS regression of y1 on $\hat y_2$. We denote the latter as the Two-Stage Least Squares estimator of β12 in the original equation (A.9). Formally,

(A.32)  $\hat\beta_{12}^{2SLS2} = -\frac{M[y_1, \hat y_2]}{M[\hat y_2, \hat y_2]}$.

This name indicates that the estimator is formed by replacing y2t by $\hat y_{2t}$ in (A.9) and estimating the resulting equation by OLS. The name is very natural: we use OLS in two stages, first on the RF to obtain $\hat y_{2t}$-values, and in the second stage on the actual SF equation with y2t replaced by $\hat y_{2t}$ as RHS variable.

Symmetrically, interpreted from the second expression in (A.31), we see that the optimal IV estimator of 1/β12 coincides with the estimator obtained by OLS regression of y2 on $\hat y_1$. We denote the latter as the Two-Stage Least Squares estimator of 1/β12 in the renormalized equation (A.17). Formally,

(A.33)  $\widehat{(1/\beta_{12})}^{2SLS1} = -\frac{M[y_2, \hat y_1]}{M[\hat y_1, \hat y_1]} \;\Longleftrightarrow\; \hat\beta_{12}^{2SLS1} = -\frac{M[\hat y_1, \hat y_1]}{M[y_2, \hat y_1]}$.

This name indicates that the estimator is formed by replacing y1t by $\hat y_{1t}$ in (A.10) and estimating the resulting equation by OLS. Again, we use OLS in two stages: first on the RF to obtain $\hat y_{1t}$-values, and in the second stage on the actual SF equation with y1t replaced by $\hat y_{1t}$ as RHS variable.
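The two-stage prescription can be run literally as two OLS regressions; the slope of the second stage then reproduces the optimal IV estimator (A.27) exactly, by (A.29). A sketch with the same illustrative Model A simulation as before (all names and values assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200_000
b12, b21 = 0.5, 0.4
g10, g21, g22, g20 = 1.0, 1.0, -1.0, 0.5
x1, x2 = rng.standard_normal(T), rng.standard_normal(T)
u1, u2 = rng.standard_normal(T), rng.standard_normal(T)
D = 1.0 - b12 * b21
y2 = (-g21 * x1 - g22 * x2 + b21 * g10 - g20 - b21 * u1 + u2) / D
y1 = -b12 * y2 - g10 + u1

def M(a, b):
    return float(np.mean((a - a.mean()) * (b - b.mean())))

# First stage: OLS of y2 on the exogenous variables -> fitted values (A.26)
X = np.column_stack([np.ones(T), x1, x2])
y2hat = X @ np.linalg.lstsq(X, y2, rcond=None)[0]

# Second stage: OLS of y1 on a constant and y2hat; cf. (A.9), slope = -beta12
coef = np.linalg.lstsq(np.column_stack([np.ones(T), y2hat]), y1, rcond=None)[0]
b_2sls2 = -coef[1]                        # (A.32)
iv_star2 = -M(y1, y2hat) / M(y2, y2hat)   # (A.27)
print(b_2sls2, iv_star2)
```

The two numbers agree to machine precision, illustrating that 2SLS and the optimal IV estimator are the same statistic.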
1.7. A third interpretation of the 2SLS estimator
We can also interpret the 2SLS estimator of β12 from the third expression in (A.30) and the 2SLS estimator of 1/β12 from the third expression in (A.31). We then get

(A.34)  $\hat\beta_{12}^{2SLS2} = -\frac{M[\hat y_1, \hat y_2]}{M[\hat y_2, \hat y_2]}$,

which follows when we in (A.9) replace both y1t by $\hat y_{1t}$ and y2t by $\hat y_{2t}$ and estimate the resulting equation by OLS. The prescription then is the following: we use OLS twice, once on both of the model's RF equations, to get $\hat y_{1t}$- and $\hat y_{2t}$-values, and once on the actual SF equation, with y1t and y2t replaced by, respectively, $\hat y_{1t}$ and $\hat y_{2t}$. Symmetrically,

(A.35)  $\widehat{(1/\beta_{12})}^{2SLS1} = -\frac{M[\hat y_2, \hat y_1]}{M[\hat y_1, \hat y_1]} \;\Longleftrightarrow\; \hat\beta_{12}^{2SLS1} = -\frac{M[\hat y_1, \hat y_1]}{M[\hat y_2, \hat y_1]}$,

which follows when we in (A.10) replace both y1t by $\hat y_{1t}$ and y2t by $\hat y_{2t}$ and estimate the resulting equation by OLS. In both cases we purify the LHS as well as the RHS variable (both endogenous) of its RF residual, and in the second stage we apply the "purified" values in a new OLS estimation.

We see from (A.34) and (A.35) that, in general, $\hat\beta_{12}^{2SLS1} \neq \hat\beta_{12}^{2SLS2}$, since there is nothing in our data which would ensure that $M[\hat y_1, \hat y_2]/M[\hat y_2, \hat y_2]$ equals $M[\hat y_1, \hat y_1]/M[\hat y_2, \hat y_1]$. If the latter were to hold, $\hat y_1$ and $\hat y_2$ would have to be perfectly correlated ($R[\hat y_1, \hat y_2]^2 = 1$), which could happen only by accident. This exemplifies the following general property of 2SLS: the 2SLS estimators in an overidentified equation are not invariant to the choice of normalization of the equation. Hence, as with simple IV estimation, there is a kind of arbitrariness also in the 2SLS method in situations with overidentification.
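The non-invariance to normalization shows up directly in simulation. With the same illustrative Model A setup (assumed values, true β12 = 0.5), both normalizations give consistent estimates, yet the two numbers are not identical because $R[\hat y_1, \hat y_2]^2 < 1$ in any finite sample:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200_000
b12, b21 = 0.5, 0.4
g10, g21, g22, g20 = 1.0, 1.0, -1.0, 0.5
x1, x2 = rng.standard_normal(T), rng.standard_normal(T)
u1, u2 = rng.standard_normal(T), rng.standard_normal(T)
D = 1.0 - b12 * b21
y2 = (-g21 * x1 - g22 * x2 + b21 * g10 - g20 - b21 * u1 + u2) / D
y1 = -b12 * y2 - g10 + u1

def M(a, b):
    return float(np.mean((a - a.mean()) * (b - b.mean())))

X = np.column_stack([np.ones(T), x1, x2])
y1hat = X @ np.linalg.lstsq(X, y1, rcond=None)[0]
y2hat = X @ np.linalg.lstsq(X, y2, rcond=None)[0]

b_2sls2 = -M(y1hat, y2hat) / M(y2hat, y2hat)   # (A.34): normalization on y1
b_2sls1 = -M(y1hat, y1hat) / M(y2hat, y1hat)   # (A.35): normalization on y2
r2 = M(y1hat, y2hat) ** 2 / (M(y1hat, y1hat) * M(y2hat, y2hat))
print(b_2sls2, b_2sls1, r2)
```

In this run the squared correlation of the fitted values is extremely close to, but below, one, so the two normalizations give slightly different estimates.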
2. Model B
2.1. The model
We construct Model B from Model A by including x3t as an additional exogenous
variable in (A.1) and (A.2). The SF equations of Model B therefore become
(B.1)  $y_{1t} + \beta_{12} y_{2t} + \gamma_{13} x_{3t} + \gamma_{10} = u_{1t}$,

(B.2)  $\beta_{21} y_{1t} + y_{2t} + \gamma_{21} x_{1t} + \gamma_{22} x_{2t} + \gamma_{23} x_{3t} + \gamma_{20} = u_{2t}$,

and its RF equations become

(B.3)  $y_{1t} = \Pi_{10} + \Pi_{11} x_{1t} + \Pi_{12} x_{2t} + \Pi_{13} x_{3t} + \varepsilon_{1t}$,

(B.4)  $y_{2t} = \Pi_{20} + \Pi_{21} x_{1t} + \Pi_{22} x_{2t} + \Pi_{23} x_{3t} + \varepsilon_{2t}$,

where (A.7) still holds. We rewrite (B.1) as

(B.5)  $y_{1t} = (-\beta_{12})\, y_{2t} + (-\gamma_{13})\, x_{3t} + (-\gamma_{10}) + u_{1t}$.
2.2. Application of instrumental variables
We then again turn to estimation by instrumental variables. We now, in principle,
need two instruments and define
(B.6)  Two observable variables are said to be instrumental variables for y2t and x3t relative to (B.5) if they are
   (i) correlated (theoretically) with y2t and x3t and
   (ii) uncorrelated (theoretically) with u1t.
To fulfill (i), the IVs should, more precisely, be chosen such that (i.a) each instrument is correlated with at least one of the variables y2t, x3t, and (i.b) the two IVs are linearly independent. How can we ensure this?
Assume that z1t and z2t are possible IVs. Utilizing (B.1) and claim (ii) it follows
that
(B.7)  $\mathrm{cov}(y_{1t}, z_{1t}) + \beta_{12}\,\mathrm{cov}(y_{2t}, z_{1t}) + \gamma_{13}\,\mathrm{cov}(x_{3t}, z_{1t}) = \mathrm{cov}(u_{1t}, z_{1t}) = 0,$
$\phantom{(B.7)}\;\;\mathrm{cov}(y_{1t}, z_{2t}) + \beta_{12}\,\mathrm{cov}(y_{2t}, z_{2t}) + \gamma_{13}\,\mathrm{cov}(x_{3t}, z_{2t}) = \mathrm{cov}(u_{1t}, z_{2t}) = 0.$
Claim (i), made precise by (i.a) and (i.b), implies that the matrix

$\begin{bmatrix} \mathrm{cov}(y_{2t}, z_{1t}) & \mathrm{cov}(x_{3t}, z_{1t}) \\ \mathrm{cov}(y_{2t}, z_{2t}) & \mathrm{cov}(x_{3t}, z_{2t}) \end{bmatrix}$

has full rank, 2. Then

(B.8)  $\begin{bmatrix} \beta_{12} \\ \gamma_{13} \end{bmatrix} = -\begin{bmatrix} \mathrm{cov}(y_{2t}, z_{1t}) & \mathrm{cov}(x_{3t}, z_{1t}) \\ \mathrm{cov}(y_{2t}, z_{2t}) & \mathrm{cov}(x_{3t}, z_{2t}) \end{bmatrix}^{-1} \begin{bmatrix} \mathrm{cov}(y_{1t}, z_{1t}) \\ \mathrm{cov}(y_{1t}, z_{2t}) \end{bmatrix}$

exists. Eqs. (B.7)–(B.8) can be denoted as the theoretical IV normal equations associated with (B.5). Replacing in (B.8) the theoretical covariances by their empirical counterparts, we obtain the IV estimators of β12 and γ13:

(B.9)  $\begin{bmatrix} \hat\beta_{12}^{IV} \\ \hat\gamma_{13}^{IV} \end{bmatrix} = -\begin{bmatrix} M[y_2, z_1] & M[x_3, z_1] \\ M[y_2, z_2] & M[x_3, z_2] \end{bmatrix}^{-1} \begin{bmatrix} M[y_1, z_1] \\ M[y_1, z_2] \end{bmatrix}$.
How do we find IVs, z1 and z2, for our particular problem? The model has three exogenous variables, two of which, x1t and x2t, do not occur in (B.5). One possibility is to let x1t or x2t, or a linear combination of them, be IV for y2t and let x3t be IV for itself. More generally, we could seek our IVs among linear combinations of all the model's three exogenous variables, i.e.,

(B.10)  $z_{1t} = a_{10} + a_{11} x_{1t} + a_{12} x_{2t} + a_{13} x_{3t}$,
$\phantom{(B.10)}\;\; z_{2t} = a_{20} + a_{21} x_{1t} + a_{22} x_{2t} + a_{23} x_{3t}$,

where the a's are so far unspecified constants. The only requirements they should satisfy are (i.a) and (i.b). This, for instance, implies that we cannot choose the a's such that $a_{21}/a_{11} = a_{22}/a_{12} = a_{23}/a_{13}$, which would make z1t and z2t linearly dependent.
2.3. Choosing the optimal instruments
We now want to choose $a_{10}, \ldots, a_{23}$ such that z1t shows the highest possible correlation with y2t, while z2t shows the strongest correlation with x3t. This will minimize the asymptotic variance of the estimators. (The proof of this generalizes the similar proof in Lecture note no. 7, sections 3–5.) The optimal choice in this sense is

(B.11)  $z_{1t} = \hat y_{2t} = \hat\Pi_{20} + \hat\Pi_{21} x_{1t} + \hat\Pi_{22} x_{2t} + \hat\Pi_{23} x_{3t}$,  $z_{2t} = x_{3t}$,

where $\hat\Pi_{20}, \hat\Pi_{21}, \hat\Pi_{22}, \hat\Pi_{23}$ are the OLS estimators of the RF coefficients in (B.4). The reason for this is the following:

(i) Letting $z_{1t} = \hat y_{2t}$, i.e., $(a_{10}, a_{11}, a_{12}, a_{13}) = (\hat\Pi_{20}, \hat\Pi_{21}, \hat\Pi_{22}, \hat\Pi_{23})$, maximizes the empirical correlation of z1t with y2t subject to our model.

(ii) Letting $z_{2t} = x_{3t}$, i.e., $(a_{20}, a_{21}, a_{22}, a_{23}) = (0, 0, 0, 1)$, ensures a correlation coefficient between x3t and its IV equal to one: x3 is IV for itself.
Inserting (B.11) into (B.9), we get the optimal IV estimators for β12 and γ13:

(B.12)  $\begin{bmatrix} \hat\beta_{12}^{IV*} \\ \hat\gamma_{13}^{IV*} \end{bmatrix} = -\begin{bmatrix} M[y_2, \hat y_2] & M[x_3, \hat y_2] \\ M[y_2, x_3] & M[x_3, x_3] \end{bmatrix}^{-1} \begin{bmatrix} M[y_1, \hat y_2] \\ M[y_1, x_3] \end{bmatrix}$.
2.4. The optimal IV estimator coincides with the 2SLS estimator
We define the 2SLS estimators of (β12, γ13) by

(B.13)  $\begin{bmatrix} \hat\beta_{12}^{2SLS} \\ \hat\gamma_{13}^{2SLS} \end{bmatrix} = -\begin{bmatrix} M[\hat y_2, \hat y_2] & M[x_3, \hat y_2] \\ M[\hat y_2, x_3] & M[x_3, x_3] \end{bmatrix}^{-1} \begin{bmatrix} M[y_1, \hat y_2] \\ M[y_1, x_3] \end{bmatrix}$.

The procedure we then follow is:
• First stage: Use OLS on the RF equation for y2t, (B.4), in order to obtain $\hat y_{2t}$. We thus regress y2t on all the model's three exogenous variables, x1t, x2t and x3t, including x3t, which also occurs as a RHS variable in (B.5). This transformation purifies the y2t-values of the OLS residuals, $\hat\varepsilon_{2t}$.
• Second stage: Replace the observed y2t-values with the purified ones in the SF equation (B.5), and apply OLS to the equation we then obtain. The exogenous variable, x3t, is kept unchanged.
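The two stages can be run as two OLS regressions. A sketch using the same hypothetical Model B simulation as before (all parameter values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200_000
b12, b21 = 0.5, 0.4
g10, g13 = 1.0, 0.8
g20, g21, g22, g23 = 0.5, 1.0, -1.0, -0.6
x1, x2, x3 = (rng.standard_normal(T) for _ in range(3))
u1, u2 = rng.standard_normal(T), rng.standard_normal(T)
D = 1.0 - b12 * b21
y2 = (-g21 * x1 - g22 * x2 + (b21 * g13 - g23) * x3
      + b21 * g10 - g20 - b21 * u1 + u2) / D
y1 = -b12 * y2 - g13 * x3 - g10 + u1

# First stage: OLS of y2 on ALL exogenous variables, including x3 -> y2hat
Z = np.column_stack([np.ones(T), x1, x2, x3])
y2hat = Z @ np.linalg.lstsq(Z, y2, rcond=None)[0]

# Second stage: OLS of y1 on a constant, y2hat and x3 (x3 kept unchanged);
# by (B.5) the slopes estimate -beta12 and -gamma13.
W = np.column_stack([np.ones(T), y2hat, x3])
coef = np.linalg.lstsq(W, y1, rcond=None)[0]
b12_2sls, g13_2sls = -coef[1], -coef[2]
print(b12_2sls, g13_2sls)
```

Note that x3 must be included in the first-stage regressor set; omitting an included exogenous variable from the first stage would make the procedure inconsistent.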
2.5. A third interpretation of the 2SLS estimator
The 2SLS estimators can alternatively be written as

(B.14)  $\begin{bmatrix} \hat\beta_{12}^{2SLS} \\ \hat\gamma_{13}^{2SLS} \end{bmatrix} = -\begin{bmatrix} M[\hat y_2, \hat y_2] & M[x_3, \hat y_2] \\ M[\hat y_2, x_3] & M[x_3, x_3] \end{bmatrix}^{-1} \begin{bmatrix} M[\hat y_1, \hat y_2] \\ M[\hat y_1, x_3] \end{bmatrix}$.

This form is less common than (B.13). The prescription we then follow is:
• First stage: Use OLS on the RF equations for both y1t and y2t, (B.3) and (B.4), in order to obtain $\hat y_{1t}$ and $\hat y_{2t}$. We thus regress y1t and y2t on all the model's three exogenous variables, x1t, x2t and x3t, including x3t, which also occurs as a RHS variable in (B.5). This transformation removes from the y1t- and y2t-values the OLS residuals, $\hat\varepsilon_{1t}$ and $\hat\varepsilon_{2t}$.
• Second stage: Replace the observed y1t- and y2t-values by the "purified" ones in the SF equation (B.5), and apply OLS to the equation we then obtain. The exogenous variable x3t is kept unchanged.
The equivalence of these two procedures again follows from the fact that the OLS residuals are always empirically uncorrelated with the RHS variables, and hence also uncorrelated with the computed value of the LHS variable, cf. (A.29).
Exercise: Repeat the procedures in sections 2.2–2.4 after having normalized (B.5) with respect to y2t. Show that in this case, too, the 2SLS estimators are not invariant to the normalization. Do you think a third normalization of (B.5), with respect to x3t, would have been of interest for estimation?