Econometrics (I)

Mei-Yuan Chen
Department of Finance
National Chung Hsing University
Sept. 11, 2007
1. Econometrics is the study of estimation and inference for economic models using economic data.
2. Econometric theory concerns the study and development of tools and methods for applied econometric applications.
3. Applied econometrics concerns the application of these tools to economic data.
4. Financial econometrics is the study of estimation and inference for financial models using financial data.
Economic Variables

An economic variable is a measure, well defined by economic theory, that evaluates the outcome or performance of an economic activity in a market, a society, a country, or the whole world. For example, in the national accounts, Y = C + I + G + (X − M), where Y is GDP, C is private consumption, I is private investment, G is government expenditure, X is exports, and M is imports. These are all economic variables of a country.
Economic Random Variables

An economic random variable assigns real values to the possible outcomes or performances of an economic activity or a market condition, via the formal definitions from economic theories.
For example, denote Y(2006,·) as the GDP of all countries in the world in 2006, i.e.,

Y(2006,·) : {US, Japan, UK, Taiwan, China, etc.} → {y(2006,i), i = 1, 2, 3, . . .}.

On the other hand, denote Y(·,4) as the GDP of Taiwan from 1950 to 2006, i.e.,

Y(·,4) : {1950, 1951, 1952, etc.} → {y(t,4), t = 1950, 1951, 1952, . . .}.

A sample from Y(2006,·) is a sample of cross-sectional data, and a sample from Y(·,4) is a sample of time-series data. Denote z(t,i) as the (t, i)th element of {[y(1950,i) y(1951,i) · · · y(2006,i)]′, i = 1, 2, 3, . . .}; then {z(t,i), t = 1, . . . , T, i = 1, . . . , N} is a sample of panel data.
Denote X(2006,·) as the national consumption of all countries in the world in 2006, i.e.,

X(2006,·) : {US, Japan, UK, Taiwan, China, etc.} → {x(2006,i), i = 1, 2, 3, . . .}.

On the other hand, denote X(·,4) as the national consumption of Taiwan from 1950 to 2006, i.e.,

X(·,4) : {1950, 1951, 1952, etc.} → {x(t,4), t = 1950, 1951, 1952, . . .}.

A sample from X(2006,·) is a sample of cross-sectional data, and a sample from X(·,4) is a sample of time-series data. Denote z(t,i) as the (t, i)th element of {[x(1950,i) x(1951,i) · · · x(2006,i)]′, i = 1, 2, 3, . . .}; then {z(t,i), t = 1, . . . , T, i = 1, . . . , N} is a sample of panel data.
Measures for a Random Variable
The goal of statistics for a random variable X (a univariate random variable) is to explore the distribution of all possible realizations (real values) on the real line via measures of location (expectation, E(X)), dispersion (variance, var(X)), skewness (skewness coefficient, α3(X)), and kurtosis (kurtosis coefficient, α4(X)), using a sample of data drawn randomly (independently and identically) from the random variable under study.
Distribution function of a Random Variable
Generally, the distribution of a random variable X can be represented by a cumulative distribution function (CDF) F(x), defined for all x as

F(c) = ∫_{−∞}^{c} dF(x) = ∫_{−∞}^{c} f(x) dx,  if F(x) is differentiable ∀x,

where dF(x) denotes the increment of the function F(·) at x and f(x) is called the probability density function (pdf) of X.
Definition of Measures

E(X) = ∫_{−∞}^{∞} x dF(x) = ∫_{−∞}^{∞} x f(x) dx,  if F(x) is differentiable ∀x,

σX² = ∫_{−∞}^{∞} (x − E(X))² dF(x) = ∫_{−∞}^{∞} (x − E(X))² f(x) dx,  if F(x) is differentiable ∀x,

α3(X) = E[(X − E(X))³]/σX³,

α4(X) = E[(X − E(X))⁴]/σX⁴.
Given a random sample {x1, x2, . . . , xn} from X (the sample is independent and identically distributed, i.e., i.i.d.),

1. x̄n = Σ_{i=1}^{n} xi/n ⇒ E(X);
2. sx² = Σ_{i=1}^{n} (xi − x̄n)²/(n − 1) ⇒ var(X);
3. α̂3(X) = [Σ_{i=1}^{n} (xi − x̄n)³/(n − 1)]/sx³ ⇒ α3(X);
4. α̂4(X) = [Σ_{i=1}^{n} (xi − x̄n)⁴/(n − 1)]/sx⁴ ⇒ α4(X).

Note that saying the random sample {x1, x2, . . . , xn} is i.i.d. means

1. the sampling distribution of each observation xi is the same as the population distribution of X. That is, E(xi) = E(X), var(xi) = var(X), α3(xi) = α3(X), and α4(xi) = α4(X) for all i;
2. xi and xj are independent for all i ≠ j; in particular, cov(xi, xj) = 0.
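These sample counterparts are easy to compute directly. The following sketch (assuming NumPy is available; the distribution parameters are illustrative) draws an i.i.d. sample and forms the four estimators above:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=10_000)       # i.i.d. draws from X ~ N(2, 9)

n = x.size
xbar = x.sum() / n                                    # sample mean -> E(X)
s2 = ((x - xbar) ** 2).sum() / (n - 1)                # sample variance -> var(X)
sx = np.sqrt(s2)
a3_hat = (((x - xbar) ** 3).sum() / (n - 1)) / sx**3  # sample skewness -> alpha_3(X)
a4_hat = (((x - xbar) ** 4).sum() / (n - 1)) / sx**4  # sample kurtosis -> alpha_4(X)
```

For a normal population the targets are E(X) = 2, var(X) = 9, α3 = 0, and α4 = 3, so in large samples the four estimates should land near those values.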
Assume the random variable X is normally distributed (so the sampling distribution of each sample observation xi, i = 1, . . . , n, is the same as that of X); then

1. x̄n has a normal sampling distribution with mean E(X) and variance var(X)/n;
2. (n − 1)sx²/var(X) has the sampling distribution χ²(n − 1).
For the null hypothesis H0 : E(X) = µ0, when X is normally distributed, the test statistic

t_{x̄n} = (x̄n − µ0)/s_{x̄n} = (x̄n − µ0)/(sx/√n)

has the sampling distribution t(n − 1) under the null.
For the null hypothesis H0 : var(X) = σ0², when X is normally distributed, the test statistic

φ_{sx²} = (n − 1)sx²/σ0²

has the sampling distribution χ²(n − 1) under the null.
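A minimal numerical sketch of both test statistics (assuming NumPy; the sample is simulated with both null hypotheses true, so neither statistic should be extreme):

```python
import numpy as np

rng = np.random.default_rng(1)
mu0, sigma0_sq = 5.0, 4.0                 # hypothesized mean and variance (both true here)
x = rng.normal(loc=5.0, scale=2.0, size=400)

n = x.size
xbar = x.mean()
s2 = x.var(ddof=1)                        # sample variance with n - 1 divisor

t_stat = (xbar - mu0) / np.sqrt(s2 / n)   # ~ t(n - 1) under H0: E(X) = mu0
phi_stat = (n - 1) * s2 / sigma0_sq       # ~ chi^2(n - 1) under H0: var(X) = sigma0_sq
```

Under the null, t_stat should be moderate in magnitude and phi_stat should be near the χ²(n − 1) mean of n − 1 = 399.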
Bivariate Random Variables
Combine X(2006,·) and Y(2006,·) to define bivariate random variables as

(X(2006,·), Y(2006,·)) : {US, Japan, UK, Taiwan, China, etc.} → {(x(2006,i), y(2006,i)), i = 1, 2, 3, . . .}.

Similarly, combine X(·,4) and Y(·,4):

(X(·,4), Y(·,4)) : {1950, 1951, 1952, etc.} → {(x(t,4), y(t,4)), t = 1950, 1951, 1952, . . .}.
The goal of statistics for bivariate random variables is to explore the joint distribution of all possible realizations, (x(2006,i), y(2006,i)) or (x(t,4), y(t,4)), on the X–Y plane via measures of location (expectation), dispersion (variance), skewness (skewness coefficient), and kurtosis (kurtosis coefficient) for each univariate random variable, together with the correlation between the two random variables, using a sample of data drawn randomly from the two random variables under study.
Joint Distribution Function and Probability Density Function

Generally, the joint cumulative distribution function of X and Y is written as F(x, y), defined for all (x, y) as

F(a, b) = ∫_{−∞}^{a} ∫_{−∞}^{b} dF(x, y) = ∫_{−∞}^{a} ∫_{−∞}^{b} f(x, y) dy dx,  if F(x, y) is differentiable ∀x, y,

where dF(x, y) is the increment of the function F(·, ·) and f(x, y) is the joint probability density function of X and Y at (x, y).
Covariance and Correlation between X and Y
The covariance between X and Y is defined as

cov(X, Y) = E{[X − E(X)][Y − E(Y)]}
= ∫_{−∞}^{∞} ∫_{−∞}^{∞} (x − E(X))(y − E(Y)) dF(x, y)
= ∫_{−∞}^{∞} ∫_{−∞}^{∞} (x − E(X))(y − E(Y)) f(x, y) dx dy.
The correlation coefficient between X and Y is defined as

ρ_{X,Y} = cov(Z_X, Z_Y) = E(Z_X Z_Y) = cov(X, Y)/(√var(X) √var(Y)),

where Z_X and Z_Y denote the Z-score transformations of X and Y, i.e.,

Z_X = (X − E(X))/√var(X),  Z_Y = (Y − E(Y))/√var(Y).
That a random sample of paired observations, {(x1, y1), (x2, y2), . . . , (xn, yn)}, is drawn from bivariate random variables (X, Y) indicates that

1. each pair of observations (xi, yi) has the same joint sampling distribution as (X, Y); xi has the same sampling distribution as X, and yi has the same sampling distribution as Y;
2. the joint sampling distribution of the pair (xi, yi) is independent of that of (xj, yj); the sampling distribution of xi is independent of that of xj; and the sampling distribution of yi is independent of that of yj, for i ≠ j.
The sample covariance of a random sample {(x1, y1), (x2, y2), . . . , (xn, yn)} is defined as

s_{x,y} = (1/(n − 1)) Σ_{i=1}^{n} (xi − x̄n)(yi − ȳn).

The sample correlation is defined as

ρ̂_{x,y} = s_{x,y}/(sx sy),

where sx² and sy² are the sample variances of X and Y from the sample.
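These formulas agree with NumPy's built-in estimators; a quick sketch (assuming NumPy, with an illustratively correlated pair):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000
x = rng.normal(size=n)
y = 0.6 * x + rng.normal(size=n)                    # pair with cov(X, Y) = 0.6

xbar, ybar = x.mean(), y.mean()
s_xy = ((x - xbar) * (y - ybar)).sum() / (n - 1)    # sample covariance
rho_hat = s_xy / (x.std(ddof=1) * y.std(ddof=1))    # sample correlation

# the same quantities via library calls
assert np.isclose(s_xy, np.cov(x, y, ddof=1)[0, 1])
assert np.isclose(rho_hat, np.corrcoef(x, y)[0, 1])
```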
Marginal Distribution Functions
The marginal cumulative distribution function of X is

F_X(x) = ∫_{−∞}^{∞} dF(x, y)  (the integral taken over y)

and the marginal probability density function of X is

f_X(x) = ∫_{−∞}^{∞} f(x, y) dy.

Similar definitions apply to Y.
Conditional Distribution Functions
The conditional cumulative distribution function of Y given X = c is

F_{Y|X=c}(y) = F(c, y) / ∫_{−∞}^{∞} dF(c, y)

and the conditional probability density function of Y given X = c is

f_{Y|X=c}(y) = f(c, y) / ∫_{−∞}^{∞} f(c, y) dy.

Generally, the conditional distribution of Y on X = x is written as Y|X = x; it can be viewed as a univariate random variable, and it states how the distribution of Y varies with X in the population. Therefore, Y|X is a function of random variables along different realizations of X.
Conditional Mean and Variance
The conditional mean of Y at X = c is defined as

E(Y|X = c) = ∫_{−∞}^{∞} y dF_{Y|X=c}(y) = ∫_{−∞}^{∞} y f_{Y|X=c}(y) dy

and the conditional variance of Y at X = c is defined as

var(Y|X = c) = ∫_{−∞}^{∞} (y − E(Y|X = c))² dF_{Y|X=c}(y) = ∫_{−∞}^{∞} (y − E(Y|X = c))² f_{Y|X=c}(y) dy.

In general, E(Y|X = x) = m(x) and var(Y|X = x) = v(x) are functions of x.
Regression Equation
The conditional regression error e|X = c is defined to be the difference between realizations of Y|X = c (the paired realizations are (c, y)) and its conditional mean E(Y|X = c) = m(c):

(e|X = c) = (Y|X = c) − m(c),

which is also a random variable. Its mean is

E(e|X = c) = E(Y|X = c) − E(m(c)) = m(c) − m(c) = 0.

By construction, every realization of Y conditional on X = c (denoted y|X = c) satisfies

(y|X = c) = m(c) + (e|X = c).
In general, for different values of x, all paired realizations (x, y) can be represented as

y = m(x) + e,

where e is called the regression error. Properties of the regression error:

1. E(e|x) = 0 for all x;
2. E(e) = 0;
3. E[h(x)e] = 0 for any function h(·);
4. E(xe) = 0.
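The four properties can be checked on simulated data. In this sketch (assuming NumPy), the conditional mean m(x) is deliberately nonlinear, yet the sample analogues of E(e), E(xe), and E[h(x)e] are all near zero:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
x = rng.uniform(1.0, 3.0, size=n)
m = 1.0 + 2.0 * x**2                        # a nonlinear conditional mean m(x)
y = m + rng.normal(scale=0.5, size=n)       # y = m(x) + e with E(e|x) = 0

e = y - m                                   # the regression error
mean_e = e.mean()                           # sample analogue of E(e)
mean_xe = (x * e).mean()                    # sample analogue of E(xe)
mean_hxe = (np.sin(x) * e).mean()           # E[h(x)e] for h(x) = sin(x)
```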
Best Predictor of Y on X = c
Let g(c) be an arbitrary predictor of Y given X = c. The expected squared error using this prediction function is

E[(Y|X = c) − g(c)]²
= E[m(c) + (e|X = c) − g(c)]²
= E[(e|X = c)²] + 2E[(e|X = c)(m(c) − g(c))] + E[(m(c) − g(c))²]
= E[(e|X = c)²] + E[(m(c) − g(c))²] ≥ E[(e|X = c)²],

by the third property of the regression error. The right-hand side is minimized by setting g(c) = m(c). This shows that the conditional mean m(x) is the best predictor of Y conditional on X = x.
Conditional Variance of Y on X = c
The conditional variance of Y on X = c is defined as

σ²(c) = var(Y|X = c) = E[(Y|X = c) − E(Y|X = c)]² = E[(e|X = c)²] = E[e²|X = c].

Generally, σ²(x) is a non-trivial function of x. The conditional standard deviation is

σ(x) = √σ²(x), ∀x.
1. the regression error is homoskedastic if
σ 2 (x) = E(e2 |X = x) = σ 2 is constant for all x;
2. the regression error is heteroskedastic if
σ 2 (x) = E(e2 |X = x) is not constant for all x.
The most commonly applied econometric tool is regression.
This is used when the goal is to quantify the impact of one set
of variables (the regressors, conditioning variable, or
covariates) on another variable (the dependent variable). Let y
denote the dependent variable and (x1 , x2 , . . . , xk ) denote the
k regressors. It is convenient to write the set of regressors as a
vector in Rk . As the conditional mean is the best predictor of
y on (x1 , x2 , . . . , xk ), it is usually studied in econometric
analysis.
Linear Conditional Mean Models
Suppose a linear conditional mean is considered, i.e.,

E(Y|x) = m(x) = β10 x1 + β20 x2 + · · · + βk0 xk,

which is linear in x. The goal of a linear regression analysis is to collect a random sample to obtain information about β10, β20, . . . , βk0 by estimation, and then to verify the results implied by economic (finance) theory by inference. Then,

y|x = β10 x1 + β20 x2 + · · · + βk0 xk + e|x = x′β0 + e|x.
Introduce an Intercept Term
Given the conditional mean specified previously,

E(Y|x) = m(x) = β10 x1 + β20 x2 + · · · + βk0 xk,

an implicit restriction on the unconditional mean is imposed, since E(Y|x) = 0 = E(Y) when βi0 = 0 for all i. To get rid of this restriction, an alternative specification for the conditional mean is

E(Y|x) = m(x) = β10 + β20 x2 + · · · + βk0 xk.

The unconditional mean is then captured through the intercept β10.
Best Linear Predictor
For any β ∈ R^k, a linear predictor of y on X = x is x′β, with expected squared prediction error

S(β) = E[y − x′β]² = E(y²) − 2β′E(xy) + β′E(xx′)β,

which is quadratic in β. The best linear predictor is obtained by selecting β to minimize S(β). That is,

β* = arg min_β S(β).

Setting the first-order condition for minimization to zero,

∇β S(β) = −2E(xy) + 2E(xx′)β = 0,

we have

β* = [E(xx′)]⁻¹ E(xy).
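The moment formula β* = [E(xx′)]⁻¹E(xy) can be illustrated by replacing the population moments with sample averages over a large simulated sample (a sketch assuming NumPy; the coefficients 1.0 and 2.0 are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000
x2 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x2])       # x = (1, x2)'
y = 1.0 + 2.0 * x2 + rng.normal(size=n)     # linear conditional mean

Exx = X.T @ X / n                           # sample analogue of E(xx')
Exy = X.T @ y / n                           # sample analogue of E(xy)
beta_star = np.linalg.solve(Exx, Exy)       # [E(xx')]^{-1} E(xy)
```

Since the conditional mean is linear here, beta_star recovers (1, 2) up to sampling error.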
Given β* = [E(xx′)]⁻¹E(xy), denote e* = y − x′β*; then we have

E(xe*) = E[x(y − x′β*)]
= E(xy) − E(xx′)[E(xx′)]⁻¹E(xy)
= E(xy) − E(xy) = 0.

Notice that the error e* from the linear prediction equation (x′β*) is equal to the error from the regression equation, e = y − m(x), when (and only when) the conditional mean is linear in x; otherwise they are distinct.
Linear Multiple Regression Models
Given a sample of T observations, {(yt, xt2, . . . , xtk), t = 1, . . . , T}, this specification can also be expressed as the identification condition:

yt = β10 + β20 xt2 + · · · + βk0 xtk + et,

for t = 1, . . . , T. This equation is called a linear multiple regression model. In matrix notation, the above equation can be written as

y = Xβ0 + e,  (1)

where β0 = (β10 β20 · · · βk0)′ is the vector of unknown parameters.
y and X contain all the observations of the dependent and explanatory variables, i.e.,

y = [y1 y2 · · · yT]′,  X = [ℓ X2 · · · Xk],

where the tth row of X is x′t = [1 xt2 · · · xtk], each column vector of X contains the T observations of an explanatory variable, and ℓ is a T × 1 vector with all elements 1. xt denotes the column vector [1 xt2 · · · xtk]′.
Linear Multiple Regression Models

The basic "identifiability" requirement of this specification is that the number of regressors, k, is strictly less than the number of observations, T, and that the matrix X is of full column rank k. That is, the model does not contain any "redundant" regressor. It is also typical to set the first explanatory variable to the constant one, so that the first column vector of X is a T × 1 vector of ones, ℓ. To summarize, the identification condition is ID 1:

yt = β10 + β20 xt2 + β30 xt3 + · · · + βk0 xtk + et, t = 1, . . . , T.
Estimation for the Best Linear Predictor
Given that the sample average is the counterpart of the population expectation, the sample moments

Ê(xy) = (1/T) Σ_{t=1}^{T} xt yt,  Ê(xx′) = (1/T) Σ_{t=1}^{T} xt x′t

are used to estimate the population moments E(xy) and E(xx′), respectively. Therefore, the estimator for the best linear predictor is, given that [Σ_{t=1}^{T} xt x′t]⁻¹ exists,

β̂*_T = [Ê(xx′)]⁻¹ Ê(xy) = [Σ_{t=1}^{T} xt x′t]⁻¹ Σ_{t=1}^{T} xt yt = (X′X)⁻¹X′y.
Another Derivation for Estimation of the Best Linear Predictor

Given the fourth property of the regression error, E(xe) = 0, and the linear conditional mean, denote

g(β) = E[xt(yt − x′tβ)] = 0.

The sample counterpart of g(β) is

ĝ(β) = (1/T) Σ_{t=1}^{T} xt(yt − x′tβ).
By setting ĝ(β) = 0 and solving the system of k equations, the solution for the parameters β, β̃T, satisfies

0 = ĝ(β̃T) = (1/T) Σ_{t=1}^{T} xt(yt − x′tβ̃T) = (1/T) Σ_{t=1}^{T} xt yt − [(1/T) Σ_{t=1}^{T} xt x′t] β̃T.

We have the estimator derived by the method of moments as

β̃T = [Σ_{t=1}^{T} xt x′t]⁻¹ Σ_{t=1}^{T} xt yt = (X′X)⁻¹X′y,

which is the same as β̂*_T.
OLS Estimator
Our objective now is to find a k-dimensional regression hyperplane that "best" fits the data (y, X). To obtain the OLS estimator, we minimize the average of the sum of squared errors:

Q(β) := (1/T)(y − Xβ)′(y − Xβ).  (2)
OLS Estimator: First Order Condition
The first order conditions for the OLS minimization problem
are:
∇β Q(β) = ∇β (y ′ y − 2y ′ Xβ + β ′ X ′ Xβ)/T
= −2X ′ (y − Xβ)/T
set
= 0,
the last equality can also be written as
set
X ′ Xβ T = X ′ y
which is known as the normal equation.
M.-Y. Chen
Econometrics
OLS Estimator
To have a unique solution of this system of equations for β, (X′X)⁻¹ must exist. This is the first assumption that has to be satisfied for β to be solved uniquely.

[A2] The T × k data matrix X is of full column rank.

Given that X is of full column rank, X′X is p.d. and hence invertible. The solution to the normal equations can then be expressed as

β̂T = (X′X)⁻¹X′y.  (3)

Note that β̂T = β̃T = β̂*_T.
It is easy to see that the second order condition is also
satisfied because
∇2β Q(β) = 2(X′ X)/T
is p.d. Hence, β̂T is the minimizer of the OLS criterion
function and known as the OLS estimator for β. As the matrix
inverse is unique, the OLS estimator is also unique.
The vector of OLS fitted values is

ŷ = Xβ̂T,

and the vector of OLS residuals is

ê = y − ŷ.

By the normal equations, X′ê = 0, so that ŷ′ê = 0. When the first regressor is the constant one, X′ê = 0 implies that ℓ′ê = Σ_{t=1}^{T} êt = 0. It follows that Σ_{t=1}^{T} yt = Σ_{t=1}^{T} ŷt, and the sample average of the data yt is the same as the sample average of the fitted values ŷt.
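A compact sketch of the OLS computation and the residual identities above (assuming NumPy; the design and coefficients are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
T, k = 500, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])   # first regressor constant
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=T)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # OLS via the normal equations
y_hat = X @ beta_hat                           # fitted values
e_hat = y - y_hat                              # residuals

# X'e_hat = 0, so the residuals sum to zero and mean(y) = mean(y_hat)
orthogonality = X.T @ e_hat
```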
If X is not of full column rank, then its column vectors satisfy an exact linear relationship; this is also known as the problem of exact multicollinearity. In this case, without loss of generality we can write

x1 = γ2 x2 + · · · + γk xk,

where xi is the ith column of X and γ2, . . . , γk are not all zero. Then, for any number a ≠ 0,

β1 x1 = (1 − a)β1 x1 + aβ1(γ2 x2 + · · · + γk xk).
The linear specification (1) is thus observationally equivalent to

Xβ* := (1 − a)β1 x1 + (β2 + aβ1γ2)x2 + · · · + (βk + aβ1γk)xk,

where the elements of β* vary with a and therefore could be anything. That is, the parameter vector β is not identified when exact multicollinearity is present. Practically, when X is not of full column rank, X′X is not invertible, and there are infinitely many solutions to the normal equations X′Xβ = X′y. Consequently, the OLS estimator β̂T cannot be computed as in (3). Exact multicollinearity usually arises from inappropriate model specifications.
For example, including total income, total wage income, and total non-wage income together as regressors results in exact multicollinearity, because total income is, by definition, the sum of wage and non-wage income.
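This can be seen numerically: with an exactly collinear regressor set, X loses rank and X′X becomes singular (a sketch assuming NumPy; the income variables are simulated):

```python
import numpy as np

rng = np.random.default_rng(6)
T = 100
wage = rng.normal(10.0, 2.0, size=T)       # total wage income
nonwage = rng.normal(5.0, 1.0, size=T)     # total non-wage income
total = wage + nonwage                     # total income: exact linear combination
X = np.column_stack([np.ones(T), wage, nonwage, total])

rank = np.linalg.matrix_rank(X)            # 3 < 4: exact multicollinearity
cond = np.linalg.cond(X.T @ X)             # effectively infinite condition number
```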
It is also easy to verify that the magnitudes of the coefficient estimates β̂iT are affected by the measurement units of the variables. Thus, a larger coefficient estimate does not necessarily imply that the associated explanatory variable is more important in explaining the behavior of y. In fact, the coefficient estimates are not comparable in general.
The OLS estimators are derived without resorting to the
knowledge of the “true” relationship between y and X. That
is, whether y is indeed generated according to our linear
specification is irrelevant to the computation of the OLS
estimator; it does affect the properties of the OLS estimator,
however.
Geometric Interpretations

[Figure: geometric interpretation of OLS. The vector y is decomposed into its orthogonal projection PX y = x1β̂1T + x2β̂2T on span(x1, x2) and the residual vector ê = (I − PX)y, which is orthogonal to that span.]
Frisch-Waugh-Lovell
Let X = [X1 X2], where X1 is T × k1, X2 is T × k2, and k1 + k2 = k. We can write

y = X1β1 + X2β2 + random error,

and β̂T = (β̂′1T β̂′2T)′. Consider the following regressions:

y = X1b1 + random error,
y = X2b2 + random error,

with OLS estimators b̂1T and b̂2T. Are β̂1T = b̂1T and β̂2T = b̂2T? Usually not!
Frisch-Waugh-Lovell Theorem
Let PX1 = X1 (X′1 X1 )−1 X′1 and PX2 = X2 (X′2 X2 )−1 X′2
denote the orthogonal projection matrices on span(X1 ) and
span(X2 ), respectively. Given a vector y, (I − PX2 )y and
(I − PX1 )y can be uniquely decomposed into two orthogonal
components:
(I − PX2 )y = (I − PX2 )X1 β̂1T + (I − PX )y,
(I − PX1 )y = (I − PX1 )X2 β̂2T + (I − PX )y.
Frisch-Waugh-Lovell Theorem: Illustration
1. Perform a regression of y on X1 (T × k1), and retain the resulting residual vector (T × 1), êy−X1.
2. Perform k2 regressions of each column of X2 on X1, and retain the k2 resulting residual vectors, ÊX2−X1 (T × k2).
3. Perform a regression of êy−X1 on ÊX2−X1; the resulting estimator β̃2T is the same as β̂2T in the regression ŷ = X1β̂1T + X2β̂2T.
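The three steps can be verified numerically: the coefficients from the residual-on-residual regression coincide with the X2 block of the full OLS fit (a sketch assuming NumPy; the design is illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
T = 300
X1 = np.column_stack([np.ones(T), rng.normal(size=T)])    # T x k1
X2 = rng.normal(size=(T, 2))                              # T x k2
X = np.hstack([X1, X2])
y = X @ np.array([1.0, 0.5, -1.0, 2.0]) + rng.normal(size=T)

def ols(A, b):
    """OLS coefficients of b on A via the normal equations."""
    return np.linalg.solve(A.T @ A, A.T @ b)

beta_hat = ols(X, y)               # full regression: (beta1', beta2')'
e_y = y - X1 @ ols(X1, y)          # step 1: residuals of y on X1
E_X2 = X2 - X1 @ ols(X1, X2)       # step 2: residuals of each X2 column on X1
beta2_tilde = ols(E_X2, e_y)       # step 3: regression of residuals on residuals
```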
Measures of Goodness of Fit: Absolute Measure
A natural goodness-of-fit measure is the regression variance σ̂T² = ê′ê/(T − k). This measure, however, is not invariant with respect to the measurement units of the dependent variable.
Measures of Goodness of Fit: Relative Measure
Recall that

Σ_{t=1}^{T} yt² = Σ_{t=1}^{T} ŷt² + Σ_{t=1}^{T} êt²,
(TSS)          (RSS)          (ESS)

where TSS, RSS, and ESS denote the total, regression, and error sums of squares, respectively. The non-centered coefficient of determination (or non-centered R²) is defined to be the proportion of TSS that can be explained by the regression hyperplane:

R² = RSS/TSS = 1 − ESS/TSS.  (4)
Clearly, 0 ≤ R² ≤ 1, and the larger the R², the better the model fits the data. In particular, a model has a perfect fit if R² = 1, and it does not account for any variation of y if R² = 0. Note that R² is non-decreasing in the number of variables in the model.

As ŷ′ŷ = ŷ′y, we can also write

R² = ŷ′ŷ/y′y = (ŷ′y)²/[(y′y)(ŷ′ŷ)] = cos²θ,

where θ is the angle between y and ŷ. That is, R² is a measure of the linear association between these two vectors.
Centered R2
It is also easily verified that, when the model contains a constant term,

Σ_{t=1}^{T} (yt − ȳ)² = Σ_{t=1}^{T} (ŷt − ȳ)² + Σ_{t=1}^{T} êt²,
(Centered TSS)     (Centered RSS)      (ESS)

where the average of the fitted values equals ȳ = Σ_{t=1}^{T} yt/T. Analogous to (4), the centered coefficient of determination (or centered R²) is defined as

Centered R² = Centered RSS/Centered TSS = 1 − ESS/Centered TSS.  (5)

This measure also takes on values between 0 and 1 and is non-decreasing in the number of variables in the model.
If the model does not contain a constant term, the centered R² may be negative. As

Σ_{t=1}^{T} (yt − ȳ)(ŷt − ȳ) = Σ_{t=1}^{T} (ŷt − ȳ)²,

we immediately get

Σ_{t=1}^{T} (ŷt − ȳ)² / Σ_{t=1}^{T} (yt − ȳ)² = [Σ_{t=1}^{T} (yt − ȳ)(ŷt − ȳ)]² / ([Σ_{t=1}^{T} (yt − ȳ)²][Σ_{t=1}^{T} (ŷt − ȳ)²]).

That is, the centered R² is also the squared sample correlation coefficient of y and ŷ.
Adjusted R²

The adjusted R², R̄², is the centered R² adjusted for the degrees of freedom:

R̄² = 1 − [ê′ê/(T − k)] / [(y′y − Tȳ²)/(T − 1)].

It can also be shown that

R̄² = 1 − (T − 1)/(T − k) (1 − R²) = R² − (k − 1)/(T − k) (1 − R²).

That is, R̄² is the centered R² with a penalty term depending on model complexity and explanatory ability. Clearly, R̄² < R² except for k = 1 or R² = 1. Note also that R̄² need not increase with the number of explanatory variables; in fact, R̄² < 0 when R² < (k − 1)/(T − 1).
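The goodness-of-fit measures and the identity linking the centered R² and R̄² can be computed in a few lines (a sketch assuming NumPy; data simulated from a model with a constant term):

```python
import numpy as np

rng = np.random.default_rng(8)
T, k = 200, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=T)

e_hat = y - X @ np.linalg.solve(X.T @ X, X.T @ y)
ess = e_hat @ e_hat                                  # error sum of squares
ctss = ((y - y.mean()) ** 2).sum()                   # centered TSS

r2 = 1.0 - ess / ctss                                # centered R^2
r2_adj = 1.0 - (ess / (T - k)) / (ctss / (T - 1))    # adjusted R^2
```

Note that r2_adj equals r2 − (k − 1)/(T − k)·(1 − r2), matching the identity above.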
Properties of the OLS Estimators: Unbiasedness

As

β̂T = (X′X)⁻¹X′y = (X′X)⁻¹X′(Xβ0 + e) = β0 + (X′X)⁻¹X′e,

for β̂T to be unbiased for β0, the following assumption has to be imposed:

A3: E(e|X) = 0.

That is, given ID1, A2, and A3, E(β̂T − β0|X) = 0 and E(β̂T) = β0.
Variance-Covariance Matrix of the Regression Error

The conditional variance-covariance matrix of the regression error vector e is

D = var(e|X) = E(ee′|X),  (6)

the T × T matrix whose (t, s)th element is E(et es|X). When the data are a random sample, (xt, et) is independent of (xs, es) for t ≠ s; thus

E(et²|X) = E(et²|xt) = σt²,
E(et es|X) = E(et|xt)E(es|xs) = 0, for t ≠ s.
Thus, in general,

D = var(e|X) = diag(σ1², σ2², . . . , σT²)  (7)

when the data are a random sample. Under the homoskedasticity restriction, E(et²|xt) = σ0² for all t,

D = σ0² I_T,  (8)

which is the classical assumption for the linear regression model. That is,

A4: var(e|X) = σ0² I_T.
Variance-Covariance Matrix of OLS Estimator
The conditional variance-covariance matrix of β̂T is

VT = E[(β̂T − β0)(β̂T − β0)′|X].

Since β̂T − β0 = (X′X)⁻¹X′e,

VT = E[(X′X)⁻¹X′ee′X(X′X)⁻¹|X] = (X′X)⁻¹X′E[ee′|X]X(X′X)⁻¹ = (X′X)⁻¹X′DX(X′X)⁻¹,

where D is defined in (7).
It may be helpful to observe that

X′DX = Σ_{t=1}^{T} xt x′t σt².

If E(et²|xt) = σ0² holds, then σt² = σ0², D = σ0² I_T, and X′DX = σ0² X′X. Thus, VT simplifies to

VT = (X′X)⁻¹ σ0² X′X (X′X)⁻¹ = σ0²(X′X)⁻¹.
In the linear regression model,

VT = (X′X)⁻¹X′DX(X′X)⁻¹.  (9)

If E(et²|xt) = σ0² holds,

VT = σ0²(X′X)⁻¹.  (10)

The expression VT = (X′X)⁻¹X′DX(X′X)⁻¹ is often called a "sandwich formula," because the central variance matrix X′DX is "sandwiched" between the moment matrices (X′X)⁻¹.
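A sketch (assuming NumPy) of both formulas, with heteroskedastic errors so that the sandwich and classical variances visibly differ; the conditional-variance function chosen here is illustrative:

```python
import numpy as np

rng = np.random.default_rng(9)
T = 2_000
X = np.column_stack([np.ones(T), rng.normal(size=T)])
e = rng.normal(size=T) * (0.5 + np.abs(X[:, 1]))    # var(e_t|x_t) grows with |x_t|
y = X @ np.array([1.0, 2.0]) + e

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e_hat = y - X @ beta_hat
XtX_inv = np.linalg.inv(X.T @ X)

s2 = e_hat @ e_hat / (T - 2)
V_classical = s2 * XtX_inv                          # (10): valid only under A4
XDX = (X * (e_hat**2)[:, None]).T @ X               # sample analogue of X'DX
V_sandwich = XtX_inv @ XDX @ XtX_inv                # (9): heteroskedasticity-robust
```

Here the robust slope variance exceeds the classical one, because the classical formula ignores the dependence of σt² on xt.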
Gauss-Markov Theorem
(Gauss-Markov) In the linear regression model, β̂T is the best linear unbiased estimator for β0.

Consider an arbitrary linear estimator β̌T = Ay = [(X′X)⁻¹X′ + C]y, where C is an arbitrary non-zero matrix. β̌T is unbiased if and only if CX = 0, since

β̌T = [(X′X)⁻¹X′ + C]y = [(X′X)⁻¹X′ + C](Xβ0 + e) = β0 + (X′X)⁻¹X′e + CXβ0 + Ce

and

E[β̌T|X] = β0 + E[(X′X)⁻¹X′e|X] + E[CXβ0|X] + E[Ce|X] = β0 + CXβ0.
It follows that, when β̌T is unbiased (CX = 0),

var(β̌T|X) = E[(β̌T − β0)(β̌T − β0)′|X]
= E{[(X′X)⁻¹X′e + Ce][(X′X)⁻¹X′e + Ce]′|X}
= σ0²(X′X)⁻¹ + σ0²(X′X)⁻¹X′C′ + σ0²CX(X′X)⁻¹ + σ0²CC′
= σ0²(X′X)⁻¹ + σ0²CC′,

where the first term on the right-hand side is var(β̂T|X) and the second term is clearly p.s.d. Thus, for any linear unbiased estimator β̌T, var(β̌T|X) − var(β̂T|X) is a p.s.d. matrix.
OLS Estimator of Error Variance
Under the restriction of homoskedastic errors, E(et²|xt) = σ0² is another parameter to be estimated. The OLS estimator for σ0² is

σ̂T² = ê′ê/(T − k) = (1/(T − k)) Σ_{t=1}^{T} êt²,

where k is the number of regressors. It is clear that σ̂T² is not linear in y. In the homoskedastic regression model, σ̂T² is an unbiased estimator for σ0².
Proof: Recall that I_T − P_X is orthogonal to span(X). Then,

ê = (I_T − P_X)y = (I_T − P_X)(Xβ0 + e) = (I_T − P_X)e,

and

E(ê′ê|X) = E[e′(I_T − P_X)e|X] = E[trace(ee′(I_T − P_X))|X].

As the trace and expectation operators can be interchanged, we have

E(ê′ê|X) = trace(E[ee′(I_T − P_X)|X]) = trace[D(I_T − P_X)].

By the facts that trace(I_T − P_X) = rank(I_T − P_X) = T − k and D = σ0² I_T, it follows that E(ê′ê) = (T − k)σ0² and

E(σ̂T²) = E(ê′ê)/(T − k) = σ0²,

proving the unbiasedness of σ̂T².
OLS Estimator for the Variance-Covariance Matrix
The OLS estimator for the variance-covariance matrix of β̂T in the homoskedastic regression model becomes

vâr(β̂T) = σ̂T²(X′X)⁻¹,

which is unbiased for var(β̂T) = σ0²(X′X)⁻¹ provided σ̂T² is unbiased for σ0².
Gaussian Quasi-MLE
In the normal regression model, et|xt ∼ N(0, σ²), and the likelihood for a single observation is

Lt(β, σ²) = (2πσ²)^{−1/2} exp[−(yt − x′tβ)²/(2σ²)].

Then the log-likelihood function for the full sample y_T = (y1, . . . , yT) is

L_T(y_T; β, σ²) = Σ_{t=1}^{T} log Lt(yt; β, σ²)
= −(T/2) log(2π) − (T/2) log(σ²) − (1/(2σ²))(y − Xβ)′(y − Xβ).
The MLEs (β̃T, σ̃T²) maximize L_T. The FOCs of the maximization problem are

∇β L_T = X′(y − Xβ)/σ² = 0 (set),
∂L_T/∂(σ²) = −T/(2σ²) + (y − Xβ)′(y − Xβ)/(2σ⁴) = 0 (set),

which yield the MLEs of β0 and σ²:

β̃T = (X′X)⁻¹X′y,
σ̃T² = (y − Xβ̃T)′(y − Xβ̃T)/T = ê′ê/T.

Clearly, the MLE β̃T is the same as the OLS estimator β̂T, but the MLE σ̃T² is different from σ̂T². In fact, σ̃T² is a biased estimator, because E(σ̃T²) = σ0²(T − k)/T ≠ σ0².
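The two variance estimators differ only in the divisor, so their ratio is exactly (T − k)/T; a short sketch (assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(10)
T, k = 50, 4
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
y = X @ np.ones(k) + rng.normal(size=T)

e_hat = y - X @ np.linalg.solve(X.T @ X, X.T @ y)
sse = e_hat @ e_hat

s2_ols = sse / (T - k)    # unbiased OLS estimator of sigma_0^2
s2_mle = sse / T          # Gaussian MLE: biased downward by factor (T - k)/T
```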
Minimum Variance Unbiased Estimator, MVUE
In normal regression models, the OLS estimators β̂T and σ̂T² are minimum variance unbiased estimators (MVUE).
Consider a collection of independent random variables z_T = (z1, . . . , zT), where zt has density function ft(zt, θ), with θ an r × 1 vector of parameters. Let the joint log-likelihood function of z_T be L_T(z_T; θ) = log f_T(z_T; θ). Then the score function

s_T(z_T; θ) := ∇ log f_T(z_T; θ) = [1/f_T(z_T; θ)] ∇f_T(z_T; θ)

is the r × 1 vector of first-order derivatives of log f_T with respect to θ. Under regularity conditions, differentiation and integration can be interchanged. When the postulated density function f_T is the true density function of z_T, we have

E[s_T(z_T; θ)] = ∫ [1/f_T(z_T; θ)] ∇f_T(z_T; θ) f_T(z_T; θ) dz_T = ∇ ∫ f_T(z_T; θ) dz_T = 0.
That is, s_T(z_T; θ) has mean zero. The variance of s_T is Fisher's information matrix:

B_T(θ) := var[s_T(z_T; θ)] = E[s_T(z_T; θ) s_T(z_T; θ)′].

Consider the r × r Hessian matrix of second-order derivatives of log f_T:

H_T(z_T; θ) := ∇² log f_T(z_T; θ) = ∇{[1/f_T(z_T; θ)][∇f_T(z_T; θ)]′}
= [1/f_T(z_T; θ)] ∇²f_T(z_T; θ) − [1/f_T(z_T; θ)²][∇f_T(z_T; θ)][∇f_T(z_T; θ)]′,

where ∇²f = ∇(∇f)′.
As

∫ [1/f_T(z_T; θ)] ∇²f_T(z_T; θ) f_T(z_T; θ) dz_T = ∇² ∫ f_T(z_T; θ) dz_T = 0,

the expected value of the Hessian matrix becomes

E[H_T(z_T; θ)] = −∫ [1/f_T(z_T; θ)²][∇f_T(z_T; θ)][∇f_T(z_T; θ)]′ f_T(z_T; θ) dz_T
= −E[s_T(z_T; θ) s_T(z_T; θ)′]
= −B_T(θ).
This establishes the information matrix equality: B_T(θ) + E[H_T(z_T; θ)] = 0. Suppose now r = 1 for simplicity, so that both s_T and B_T are scalars. Let θ̂T denote an unbiased estimator. Then

cov[s_T(z_T; θ), θ̂T] = ∫ θ̂T [∂f_T(z_T; θ)/∂θ] dz_T = (∂/∂θ) E(θ̂T) = 1,

by unbiasedness. By the celebrated Cauchy-Schwarz inequality,

cov[s_T(z_T; θ), θ̂T]² / {var[s_T(z_T; θ)] var(θ̂T)} = 1/{var[s_T(z_T; θ)] var(θ̂T)} ≤ 1.

It follows that var(θ̂T) ≥ 1/B_T(θ). The right-hand side, 1/B_T(θ), is also known as the Cramér-Rao lower bound. Thus, all unbiased estimators must have variance greater than or equal to the inverse of the information. When θ is multi-dimensional, var(θ̂T) − B_T(θ)⁻¹ is a positive semi-definite matrix.
In our application, the inverse of the information matrix evaluated at the true parameters β0 and σ0² can be easily calculated as

[ σ0²(X′X)⁻¹    0
  0             2σ0⁴/T ].

As β̂T achieves the Cramér-Rao lower bound, it is efficient within the class of all unbiased estimators for β0, i.e., it is the MVUE. It can be shown that any other unbiased estimator of σ0² has variance greater than or equal to that of σ̂T²; hence σ̂T² is also the MVUE.
Consider the log-likelihood function of y = (y1 , . . . , yT )′ to be
L_T(y; β, σ²) = −(T/2) log(2π) − (T/2) log(σ²) − [1/(2σ²)] (y − Xβ)′(y − Xβ).
The corresponding Hessian matrix is
H_T(β, σ²) =
[ −(1/σ²)(X′X)             −(1/σ⁴)(X′y − X′Xβ)
  −(1/σ⁴)(X′y − X′Xβ)′     T/(2σ⁴) − (1/σ⁶)(y − Xβ)′(y − Xβ) ].
The information matrix is defined as BT (β, σ 2 ) = −E[HT (β, σ 2 )].
The (1, 2)-th element of the information matrix evaluated at the true
values β_0 and σ_0² becomes
E[(1/σ_0⁴)(X′y − X′Xβ_0)] = (1/σ_0⁴) E(X′e) = (1/σ_0⁴) X′E(e) = 0.
And the (2, 2)-th element of B_T(β, σ²) evaluated at β_0 and σ_0² is
−E[ T/(2σ_0⁴) − (1/σ_0⁶)(y − Xβ_0)′(y − Xβ_0) ]
= −T/(2σ_0⁴) + (1/σ_0⁶) E(e′e)
= −T/(2σ_0⁴) + Tσ_0²/σ_0⁶
= (−T + 2T)/(2σ_0⁴)
= T/(2σ_0⁴).
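The (2, 2) calculation can be checked by simulation. The sketch below (illustrative parameters, NumPy assumed) averages the (2, 2) Hessian entry T/(2σ⁴) − e′e/σ⁶ over draws of e; the negative of that average should be close to T/(2σ_0⁴):

```python
import numpy as np

# Monte Carlo analogue of -E[H_T]_{22} evaluated at (beta_0, sigma_0^2).
rng = np.random.default_rng(1)
T, sigma0, reps = 100, 1.5, 20000
s2 = sigma0**2
e = rng.normal(0.0, sigma0, size=(reps, T))          # draws of the error vector
h22 = T / (2 * s2**2) - (e**2).sum(axis=1) / s2**3   # (2,2) entry of the Hessian
b22 = -h22.mean()                                    # estimate of -E[H_T]_{22}
print(b22, T / (2 * sigma0**4))                      # should be close to each other
```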
Sampling Distribution of β̂ T in Normal Regression
Models
In normal regression models,
y_t = x_t′β_0 + e_t,   e_t | x_t ∼ N(0, σ_0²).
In normal regression models,
(a) β̂ T ∼ N(β 0 , σ02 (X ′ X)−1 ).
(b) (T − k)σ̂T2 /σ02 ∼ χ2 (T − k).
(c) σ̂T2 has mean σ02 and variance 2σ04 /(T − k).
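Properties (a)-(c) can be illustrated by Monte Carlo. The sketch below (hypothetical fixed design X, NumPy assumed) exploits the fact that the OLS residual vector equals (I − P)e, where P is the projection matrix onto the column space of X, so σ̂_T² can be simulated without fixing β_0:

```python
import numpy as np

# Check that sigma_hat^2 (divisor T - k) has mean sigma_0^2 and
# variance 2*sigma_0^4/(T - k), as stated in (c).
rng = np.random.default_rng(2)
T, k, sigma0, reps = 40, 3, 2.0, 20000
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
P = X @ np.linalg.inv(X.T @ X) @ X.T            # projection onto span(X)
E = rng.normal(0.0, sigma0, size=(reps, T))     # one error vector per replication
resid = E - E @ P                               # OLS residuals: (I - P)e
s2_draws = (resid**2).sum(axis=1) / (T - k)     # sigma_hat^2 draws
print(s2_draws.mean(), sigma0**2)               # mean close to sigma_0^2 = 4
print(s2_draws.var(), 2 * sigma0**4 / (T - k))  # variance close to 2*sigma_0^4/(T-k)
```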
Consistency and Asymptotic Normality of OLS
Estimators
Under the linear projection model,
y_t = x_t′β_0 + e_t,   E(x_t e_t) = 0,
that is, x′β_0 is the projection of y on the linear space spanned by x.
Given E(x_t e_t) = 0, x_t may be stochastic, but it must be
uncorrelated with e_t; e_t is not required to be homoskedastic.
Consider an AR(p) model for yt :
y_t = c + ψ_1 y_{t−1} + · · · + ψ_p y_{t−p} + e_t.   (11)
It can be seen that by recursive substitution, yt is a linear
function of the current and past et . If {et } is a white noise,
then for xt = (1 yt−1 . . . yt−p )′ , we have
E(xt et ) = 0,
by the white noise property of {et }. Thus, an AR(p) model
with {et } being a white noise satisfies E(xt et ) = 0.
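A quick simulation sketch (hypothetical AR(1) parameters, NumPy assumed) shows the sample analogue of E(y_{t−1} e_t) vanishing when {e_t} is a white noise:

```python
import numpy as np

# Simulate y_t = c + psi_1 y_{t-1} + e_t with white-noise errors and check
# that the regressor y_{t-1} is uncorrelated with the current error e_t.
rng = np.random.default_rng(3)
T, c, psi1 = 100000, 0.5, 0.6
e = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = c + psi1 * y[t - 1] + e[t]
moment = np.mean(y[:-1] * e[1:])   # sample analogue of E(y_{t-1} e_t)
print(moment)                      # close to zero
```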
On the other hand, suppose that {et } is an MA(q) process:
e_t = u_t − φ_1 u_{t−1} − · · · − φ_q u_{t−q},
where {u_t} is a white noise with mean zero and variance σ_u².
In this case, y_t is known as an ARMA(p, q) process. Note that
E(e_t e_{t−i}) = −(φ_i − φ_{i+1}φ_1 − · · · − φ_q φ_{q−i}) σ_u²,   i = 1, . . . , q,
with φ_0 = 1 and φ_j = 0 for j > q. That is, e_t and e_{t−i} are
correlated for i = 1, . . . , q. It follows that
E(x_t e_t) ≠ 0.
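By contrast, when the errors are MA(1), the lagged regressand is correlated with the current error, so the OLS slope of y_t on y_{t−1} does not converge to ψ_1. An illustrative simulation (parameters chosen for the example only, NumPy assumed):

```python
import numpy as np

# AR(1) dynamics with MA(1) errors: OLS on y_{t-1} is inconsistent for psi_1.
rng = np.random.default_rng(4)
T, psi1, phi1 = 200000, 0.6, 0.5
u = rng.normal(size=T + 1)
e = u[1:] - phi1 * u[:-1]            # MA(1) errors: e_t = u_t - phi_1 u_{t-1}
y = np.zeros(T)
for t in range(1, T):
    y[t] = psi1 * y[t - 1] + e[t]
slope = np.cov(y[:-1], y[1:])[0, 1] / np.var(y[:-1])   # OLS slope of y_t on y_{t-1}
print(slope, psi1)                   # slope stays well away from psi_1
```

The probability limit of the slope here is the first autocorrelation of the ARMA(1,1) process (about 0.11 with these parameters), not ψ_1 = 0.6.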
Consistency and Asymptotic Normality of OLS
Estimators
Suppose that the following conditions hold:
[B1] y_t = x_t′β_0 + e_t such that E(x_t e_t) = 0 for t = 1, . . . , T ,
[B2] {x_t x_t′} obeys a SLLN (WLLN) such that M_T is p.d. and, for
some δ > 0, det(M_T) > δ for all T sufficiently large,
[B3] {x_t e_t} obeys a SLLN (WLLN).
Then β̂ T exists a.s. (in probability) for all T sufficiently large
and β̂ T → β 0 a.s. (in probability).
As β̂_T = (X′X)^{-1}X′y, (X′X) is a stochastic matrix, so the
existence of (X′X)^{-1} cannot be assumed. It is known that
β̂_T = (X′X)^{-1}X′y
= (X′X)^{-1}X′(Xβ_0 + e)
= (X′X)^{-1}(X′X)β_0 + (X′X)^{-1}X′e
= (X′X/T)^{-1}(X′X/T)β_0 + (X′X/T)^{-1}(X′e/T)
= β_0 + (X′X/T)^{-1}(X′e/T).
X′X/T =
[ Σ_t x_{t1}²/T        Σ_t x_{t1}x_{t2}/T   · · ·   Σ_t x_{t1}x_{tk}/T
  Σ_t x_{t2}x_{t1}/T   Σ_t x_{t2}²/T        · · ·   Σ_t x_{t2}x_{tk}/T
  ...
  Σ_t x_{tk}x_{t1}/T   Σ_t x_{tk}x_{t2}/T   · · ·   Σ_t x_{tk}²/T ]
→
[ M_11  M_12  · · ·  M_1k
  M_21  M_22  · · ·  M_2k
  ...
  M_k1  M_k2  · · ·  M_kk ] = M,
where all sums run over t = 1, . . . , T.
X′e/T =
[ Σ_t x_{t1}e_t/T
  Σ_t x_{t2}e_t/T
  ...
  Σ_t x_{tk}e_t/T ]
→
[ Σ_t E(x_{t1}e_t)/T
  Σ_t E(x_{t2}e_t)/T
  ...
  Σ_t E(x_{tk}e_t)/T ]
= [ 0  0  · · ·  0 ]′,
again with sums over t = 1, . . . , T.
Consistency and Asymptotic Normality of OLS
Estimators
Given Assumption [B1], suppose that the following conditions hold:
[B2′] {x_t x_t′} obeys a WLLN such that M_T is p.d. and, for some δ > 0,
det(M_T) > δ for all T sufficiently large, and
[B3′] {x_t e_t} obeys a CLT such that Ξ_T = O(1) is p.d. and, for some
δ > 0, det(Ξ_T) > δ for all T sufficiently large.
Then Σ_T^{-1/2} √T (β̂_T − β_0) →_d N(0, I_k), where
Σ_T = M_T^{-1} Ξ_T M_T^{-1}. If, in addition, there exists a Ξ̂_T p.s.d.
and symmetric such that Ξ̂_T − Ξ_T →_p 0, then Σ̂_T − Σ_T →_p 0, and
Σ̂_T^{-1/2} √T (β̂_T − β_0) →_d N(0, I_k),
where Σ̂_T = (Σ_{t=1}^T x_t x_t′/T)^{-1} Ξ̂_T (Σ_{t=1}^T x_t x_t′/T)^{-1}.
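One standard choice for Ξ̂_T under heteroskedasticity (not stated on this slide, but consistent with the theorem's requirements) is White's estimator Σ_t x_t x_t′ ê_t²/T. A sketch of the resulting sandwich covariance Σ̂_T and the implied standard errors, with an illustrative heteroskedastic design (NumPy assumed):

```python
import numpy as np

# White's heteroskedasticity-robust "sandwich" covariance for OLS:
# Sigma_hat = (X'X/T)^{-1} [sum x_t x_t' e_hat_t^2 / T] (X'X/T)^{-1}.
rng = np.random.default_rng(5)
T = 5000
x = rng.normal(size=T)
X = np.column_stack([np.ones(T), x])
e = rng.normal(size=T) * np.sqrt(0.5 + x**2)    # heteroskedastic errors
y = X @ np.array([1.0, 2.0]) + e
b = np.linalg.solve(X.T @ X, X.T @ y)           # OLS estimate
resid = y - X @ b
M_hat = X.T @ X / T
Xi_hat = (X * resid[:, None]**2).T @ X / T      # White's estimator of Xi_T
M_inv = np.linalg.inv(M_hat)
Sigma_hat = M_inv @ Xi_hat @ M_inv              # sandwich covariance
se = np.sqrt(np.diag(Sigma_hat) / T)            # robust standard errors for b
print(b, se)
```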
√T β̂_T = √T (X′X)^{-1}X′y
= √T (X′X)^{-1}X′(Xβ_0 + e)
= √T (X′X)^{-1}(X′X)β_0 + √T (X′X)^{-1}X′e
= √T β_0 + (X′X/T)^{-1}(X′e/√T),
so that √T (β̂_T − β_0) = (X′X/T)^{-1}(X′e/√T).