Advanced Econometrics
Week 2
Institute of Economic Studies
Faculty of Social Sciences
Charles University in Prague
Fall 2015
Outline of Today’s Talk
After last week’s revision of least squares, we will continue today with an introduction to general estimation frameworks.
What if the assumptions used by least squares are too restrictive for our data?
We will discuss the concepts and ideas of estimation in very general terms.
In the next weeks, we will introduce the most common frameworks in detail (rigorously, together with inference).
A useful application: Quantile regression.
From Least Squares (LS) to Maximum Likelihood
Estimation (MLE)
LS is motivated in two ways: it is the minimum mean squared error (MSE) predictor of y, and it is the minimum variance linear unbiased estimator of β.
But what is the role of the normality assumption on the disturbances ε?
1 Appropriate inference (although we know how to obtain inference even when the disturbances do not come from a normal distribution, so this role is questionable).
2 Much more importantly, if ε are normally distributed, then LS is also the maximum likelihood estimator (MLE).
Hence, being an MLE, least squares is asymptotically efficient among consistent and asymptotically normally distributed estimators.
This is a large-sample counterpart to the Gauss-Markov theorem (the Cramér-Rao bound).
From Least Squares (LS) to Maximum Likelihood
Estimation (MLE)
Efficiency is an important aspect of estimation
methodology.
The Gauss-Markov theorem is in this respect a powerful result for linear regression.
However, it has no counterpart in other modeling contexts.
Once we leave the linear model, we need different tools.
The MLE principle generally allows us to assert asymptotic efficiency for an estimator.
But only for the specific distribution assumed!
Estimation Frameworks in Econometrics
Let’s talk about several important estimation principles in general terms.
Estimation Frameworks in Econometrics
In many cases, assumptions on the data are too restrictive for LS estimation.
Contemporary econometrics offers a remarkable variety of estimation methods.
While it is a complex task to choose a proper estimator for the data under study, we first need to know which principles are at hand.
Today, we will try to motivate the basic modern principles of estimation.
Recent research offers several methodologies that allow us to relax assumptions and to work even under possibly unwarranted or improper assumptions (e.g., GMM).
Estimation Frameworks in Econometrics
It is possible to relax strong assumptions by going from fully parametric to semi- and non-parametric estimation.
The cost paid is a weakening of the conclusions that can be drawn from the data. (This is most pronounced in discrete choice models: if we relax the assumption on the PDF in a probit model, it may become impossible to estimate probabilities, rendering the model useless.)
We can also distinguish between estimators as “the best” (more efficient) ones.
The best parametric estimator will be more efficient than a semiparametric one.
But if the parametric model is misspecified, the semiparametric estimator may be more robust.
Today, simulation-based inference is not so costly (in terms of computing time) and can solve many problems in situations where other estimators cannot be used.
Estimation Frameworks in Econometrics
We will look more closely at the following core methods (next three lectures in detail, today only from a general point of view):
1 The maximum likelihood method of estimation (MLE).
2 Generalized method of moments (GMM) as a semiparametric estimator.
3 Simulation-based estimation and bootstrapping.
Estimation Frameworks in Econometrics
“The art of the econometrician consists in finding the set of
assumptions which are both sufficiently specific and sufficiently
realistic to allow him to take the best possible advantage of the
available data.”
Kennedy, P. (2003). A Guide to Econometrics, 5th edn, MIT Press, p. 2 (based on Malinvaud, 1966).
Parametric estimation and inference
Crucial is a full statement of the density or probability model that provides the data generating mechanism (DGM) for the random variable of interest.
The joint density of a random variable y and a random vector x is
f(y, x) = g(y | x, β) × h(x, θ)
with unknown parameters β and θ.
Consider the linear regression model with normally distributed disturbances.
This assumption produces a full statement of the conditional density (the population from which we draw observations):
y_i | x_i ∼ N(x_i′β, σ²).
THIS IS A STRONG ASSUMPTION TO MAKE.
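A minimal sketch (not from the slides; the parameter values are hypothetical) of what the full parametric statement buys us: once β and σ² are fixed, the conditional density of y_i is completely specified, so we can simulate data from the DGM.

import numpy as np

rng = np.random.default_rng(42)
n = 500
beta = np.array([1.0, 0.5])    # hypothetical parameter values
sigma = 2.0                    # hypothetical disturbance std. deviation

# one constant and one regressor; x may follow any distribution h(x, theta)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
# draw y_i | x_i ~ N(x_i'beta, sigma^2), i.e., from g(y | x, beta)
y = X @ beta + rng.normal(scale=sigma, size=n)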
Parametric estimation and inference
After estimating the unknown (but fixed) parameters, the conditional probability distribution of y_i is completely defined (mean, variance, etc.).
Parametric estimation ⇒ specifying the density and its features (parameters).
The goal is to find the parameters in a parameter space, the set of allowable values.
In the regression model, we have K regression slopes (any real value) and a variance (positive): [β, σ²] ∈ R^K × R₊.
Estimation: specify a criterion for ranking the points in the parameter space and choose the one that optimizes this criterion.
Discuss with your neighbour
What is this criterion for OLS?
Classical Likelihood-based Estimation
The most common parametric estimator used in
econometrics is the maximum likelihood estimator.
Philosophy of “sample information”.
When the density of a sample of observations is completely specified (apart from unknown parameters), the joint density of those observations is the likelihood function
f(y_1, y_2, …, x_1, x_2, …) = ∏_{i=1}^{n} f(y_i, x_i | β, θ).
The maximum likelihood estimator (MLE) is the
function of the data that maximizes the log of the
likelihood function.
MLE with normally distributed disturbances
With normally distributed disturbances, y_i | x_i is normally distributed with mean x_i′β and variance σ².
The density of y_i | x_i is
f(y_i | x_i) = exp(−(y_i − x_i′β)² / (2σ²)) / √(2πσ²).
Its log-likelihood is
ln L = −(n/2) ln 2π − (n/2) ln σ² − (1/(2σ²)) (y − Xβ)′(y − Xβ).
Maximizing this function means minimizing the familiar sum of squares, with the same solution (we will derive this next lecture).
Then e′e/n is an estimator of σ².
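A sketch of this result on simulated data (a general-purpose optimizer stands in for the closed-form solution): maximizing ln L numerically reproduces the OLS coefficients, and the implied estimate of σ² is e′e/n.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5]) + rng.normal(scale=2.0, size=n)

def neg_loglik(params):
    beta, log_s2 = params[:-1], params[-1]   # log-parametrize sigma^2 > 0
    s2 = np.exp(log_s2)
    e = y - X @ beta
    # negative of: -(n/2) ln 2pi - (n/2) ln s2 - e'e / (2 s2)
    return 0.5 * (n * np.log(2 * np.pi) + n * np.log(s2) + e @ e / s2)

res = minimize(neg_loglik, x0=np.zeros(X.shape[1] + 1), method="BFGS")
beta_mle, s2_mle = res.x[:-1], np.exp(res.x[-1])
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]   # agrees up to numerical error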
MLE with normally distributed disturbances
What if the data are far from normal?
What if we cannot find a proper assumption about the density?
Then we simply have a misspecified model, and the estimation may be invalid.
There are solutions (robust GMM, simulation-based inference).
Before we move ahead, I would like to introduce one of the modern applications of MLE in econometrics: copulas.
While they are beyond the scope of this course, it is good to be aware of the general idea and its existence, as copulas are becoming increasingly popular and useful.
Copulas: A modern application
Specifying the likelihood function means making a strong assumption about the joint distribution.
There are many situations where we have more than one random variable of interest.
Each of these variables may have a different distribution.
For the specification of the likelihood, we then need to specify the marginal distributions first.
Moreover, the variables of interest may be correlated.
Copulas, due to Sklar’s (1973) theorem, allow us to combine arbitrary marginals with a separate model of their dependence.
Most applications are quite recent (since 2000), and the approach is becoming increasingly popular.
Copulas: A modern application
Note that when both marginal distributions are Gaussian (normal), a Gaussian copula simply yields the bivariate Gaussian distribution.
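A hedged sketch of the mechanics (simulated data; the marginals are arbitrary choices for illustration): draw correlated normals, transform them to uniforms with the normal CDF, then push the uniforms through any marginal quantile functions. The Gaussian copula carries the dependence while each margin keeps its own distribution.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
rho = 0.6                                     # copula correlation parameter
z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=10_000)

u = stats.norm.cdf(z)                         # uniforms with Gaussian dependence
x1 = stats.expon.ppf(u[:, 0], scale=2.0)      # exponential marginal
x2 = stats.t.ppf(u[:, 1], df=5)               # Student-t marginal
# (x1, x2) are dependent, yet each keeps its chosen marginal distribution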
Semiparametric Estimation
One possible solution when we do not want to make an assumption as restrictive as a distributional one.
Estimators are devised from more general characteristics of the population.
Semiparametric estimators will be more robust than fully parametric ones.
They retain their properties across a greater range of specifications.
The cost we pay is a loss of efficiency.
GMM Estimation
Many recent applications are based on the (generalized)
method of moments (GMM).
GMM is based on the moment equations
E[m(y_i, x_i, β)] = 0.
Note that the essential assumption for the LS estimator is
E[x_i(y_i − x_i′β)] = 0.
The estimator is obtained by finding a parameter estimate b that mimics the population result:
(1/n) Σ_i x_i(y_i − x_i′b) = 0,
which are the normal equations for least squares.
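A minimal sketch on simulated data: the sample analogue of the moment condition is K equations in K unknowns, and solving it is exactly solving the least squares normal equations.

import numpy as np

rng = np.random.default_rng(2)
n = 1_000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)

# (1/n) sum_i x_i (y_i - x_i'b) = 0  <=>  (X'X) b = X'y
b_mm = np.linalg.solve(X.T @ X / n, X.T @ y / n)
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]    # the same vector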
Maximum Empirical Likelihood Estimation
Empirical likelihood (EL) methods are a semiparametric alternative to MLE.
They are closely related to GMM.
We let π_i be the probability that y_i | x_i takes the realized value in the sample.
The empirical likelihood function is then
EL = ∏_{i=1}^{n} π_i^{1/n}.
The maximum empirical likelihood estimator maximizes the log of the EL:
ELL = (1/n) Σ_{i=1}^{n} ln π_i.
Note that this alone lacks sufficient structure to admit a solution (the problem is unbounded). We need to impose restrictions, such as requiring that the π_i probabilities sum to one.
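An illustrative sketch of the point about structure (simulated setup): with only the adding-up restriction Σπ_i = 1 and no moment conditions, the numerical maximizer of ELL returns the uniform weights π_i = 1/n.

import numpy as np
from scipy.optimize import minimize

n = 10

def neg_ell(pi):
    return -np.mean(np.log(pi))               # -(1/n) sum_i ln(pi_i)

x0 = np.linspace(1.0, 2.0, n)
x0 /= x0.sum()                                # feasible but non-uniform start
res = minimize(neg_ell, x0=x0, method="SLSQP",
               bounds=[(1e-9, 1.0)] * n,
               constraints=[{"type": "eq", "fun": lambda p: p.sum() - 1.0}])
# res.x is (numerically) the uniform vector [1/n, ..., 1/n]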
Least Absolute Deviations Estimation (LAD)
LS can be largely affected in small samples by outlying observations.
E[y|x] = x′β
A robust estimator, LAD:
Median[y|x] = x′β(0.5)
We minimize the absolute deviations:
min_b Σ_{i=1}^{N} |y_i − x_i′b|
LAD is hence more resilient to changes in extreme values.
The median, as a measure of central tendency, is much less sensitive than the mean to changes in extreme values.
Note that this does not mean something is wrong with OLS if OLS and LAD deliver different estimates.
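A sketch of this robustness on simulated data with one gross outlier (a derivative-free optimizer stands in for the usual linear-programming solution): the LAD fit stays near the truth while OLS is pulled away.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
n = 100
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 0.5 * x + rng.normal(size=n)
y[0] += 50.0                                   # inject a single gross outlier

b_ols = np.linalg.lstsq(X, y, rcond=None)[0]   # pulled toward the outlier

def sad(b):                                    # sum of absolute deviations
    return np.abs(y - X @ b).sum()

b_lad = minimize(sad, x0=b_ols, method="Nelder-Mead").x   # stays near (1, 0.5)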
Quantile Regression
LAD is a semiparametric specification, as it specifies only a particular feature of the distribution: the median. LAD is median regression.
The median is only one possible quantile of interest.
Discuss with your neighbour
What is a quantile of a distribution?
The model can be extended to other conditional quantiles:
Q[y|x] = x′β(q)
such that Prob[y ≤ x′β(q) | x] = q for 0 < q < 1.
This is an essentially nonparametric specification (no assumption about the distribution or its conditional variance).
q can vary continuously (strictly) between zero and one ⇒ infinitely many parameter vectors to be estimated.
Quantile Regression
The estimator b_q of β_q for a specific quantile q is obtained by minimizing the function (note that y_i − x_i′β_q = e_{i,q})
F_n(β_q | y, X) = Σ_{i: e_{i,q} ≥ 0} q |y_i − x_i′β_q| + Σ_{i: e_{i,q} < 0} (1 − q) |y_i − x_i′β_q|.
Solving the minimization requires an iterative estimator: it is a linear programming problem.
b_q is consistent and asymptotically normally distributed:
√n (b_q − β_q) →d N(0, H⁻¹GH⁻¹),
where H = E[f_q(0|x_i) x_i x_i′] and G = q(1 − q) E[x_i x_i′].
For q = 0.5 and N(0, σ²) disturbances, the variance would reduce to σ²(π/2)(X′X)⁻¹.
Computation is complicated, as we need to evaluate f_q(0) ⇒ a good candidate for bootstrapping.
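A hedged sketch on simulated data: statsmodels’ QuantReg minimizes the check-function objective above, and a simple pairs bootstrap sidesteps the need to estimate f_q(0|x) for standard errors.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 500
x = rng.normal(size=n)
X = sm.add_constant(x)
y = 1.0 + 0.5 * x + rng.standard_t(df=4, size=n)

fit = sm.QuantReg(y, X).fit(q=0.75)            # estimate the 0.75-quantile line

B = 200
boot = np.empty((B, X.shape[1]))
for b in range(B):
    idx = rng.integers(0, n, size=n)           # resample (y_i, x_i) pairs
    boot[b] = sm.QuantReg(y[idx], X[idx]).fit(q=0.75).params
se_boot = boot.std(axis=0)                     # bootstrap standard errors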
Quantile Regression
Why use quantile (median) regression?
Semiparametric (less strict assumptions needed).
Robust to some violations of assumptions.
We have a complete characterisation of the conditional distribution at hand.
This feature can be very helpful in some applications.
Examples of Applications
Income elasticity of credit card users.
Financial risk.
Quantile Regression: Example
Investigation of the income elasticity of monthly expenditure (Greene, p. 208):
Q(ln Spending | x, q) = β_{1,q} + β_{2,q} ln Income + β_{3,q} Age + β_{4,q} Dependents.
β̂_{2,OLS} = 1.08344 is not surprising.
However, the variation at different quantiles actually is.
From the results (next slide), we observe saturation in the response of spending to changes in income at the highest levels of spending.
The income elasticity changes significantly with spending, hence the LS result β̂₂ = 1.08344 does not completely characterize the dependence.
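A sketch of the same exercise on simulated data (Greene’s credit card data are not reproduced here); a heteroskedastic DGP makes the estimated elasticity vary across quantiles in the same spirit.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 1_000
ln_income = rng.normal(3.0, 0.5, size=n)
# heteroskedastic disturbances so the quantile slopes genuinely differ
ln_spending = 0.5 + 1.0 * ln_income + (0.2 + 0.3 * ln_income) * rng.normal(size=n)

X = sm.add_constant(ln_income)
for q in (0.1, 0.25, 0.5, 0.75, 0.9):
    b = sm.QuantReg(ln_spending, X).fit(q=q).params
    print(f"q={q:.2f}  elasticity estimate: {b[1]:.3f}")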
Quantile Regression: Example
Figure: Example from Greene (2012), p. 209
Quantile Regression: Example
Quantile regression can be of particular interest in some
situations with
Panel data
Censored data
Particularly for corner solutions.
One needs to be careful in these applications.
If we have time, we will discuss these extensions later when
discussing Panel data.
Comparing Parametric and Semiparametric
Analyses
The strong assumptions of the fully parametric model come at a cost.
The inferences from the model are only as good (robust) as the underlying assumptions.
But when the assumptions are met, parametric models always represent the efficient framework.
Semiparametric analysis relaxes assumptions (such as normality).
The deviations to which semiparametric estimators are robust (as we relax assumptions) may render fully parametrized estimators inconsistent.
Still, by relaxing assumptions we lose efficiency, and it is much more difficult to make inferences.
All in all, the comparison of these methods is not just about efficiency. The estimators often estimate different quantities, and we need to understand when to use each.
Properties of Estimators
Studying the properties of estimators can help us in our choice.
Unbiasedness: a finite-sample property that can be established only in a very small number of cases. Asymptotically, we are instead interested in consistency.
Consistency: if we cannot establish consistency, we cannot trust the estimator at all.
Asymptotic normality: the platform for statistical inference (it can be handled, as we will see later).
Asymptotic efficiency: cannot be established in absolute terms, but only within a class (e.g., the relative efficiency of MLE and GMM). We want to know that we did not make suboptimal use of the data.
Hence we choose the estimator which is more efficient.
But sometimes this is not possible (analytical solutions do not exist, etc.).
Extremum Estimators
An extremum estimator is one obtained as the optimizer of a criterion function q(θ | data).
Discuss with your neighbour
Can you think of any extremum estimator?
ML: θ̂_ML = Argmax[(1/n) Σ_{i=1}^{n} ln f(y_i | x_i, θ_ML)]
LS: θ̂_LS = Argmax[−(1/n) Σ_{i=1}^{n} (y_i − h(x_i, θ_LS))²]
GMM: θ̂_GMM = Argmax[−m̄(data, θ_GMM)′ W m̄(data, θ_GMM)]
Note that LS and GMM carry a negative sign to cast them all as one type of problem.
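One way to see the common template (a sketch on simulated data, with LS as the example): every extremum estimator maximizes some criterion q(θ | data), i.e., minimizes its negative.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(6)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)

def extremum(criterion, theta0):
    # Argmax of the criterion = Argmin of its negative
    return minimize(lambda t: -criterion(t), x0=theta0).x

# LS criterion: q(theta | data) = -(1/n) sum_i (y_i - x_i'theta)^2
theta_ls = extremum(lambda b: -np.mean((y - X @ b) ** 2), np.zeros(2))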
M Estimator
LS and ML are examples of M estimators.
Thank You For Your Attention
Now we understand (very) general ideas of estimation
frameworks.
In the next weeks, we will learn more about the most
important ones.
MLE
GMM
simulation-based inference
Reading
This (second) week: Ch. 12 (Estimation Frameworks), pp. 432–454.
Next (third) week: Ch. 14 (MLE), pp. 509–548.