Advanced Econometrics, Week 2
Institute of Economic Studies, Faculty of Social Sciences
Charles University in Prague, Fall 2015

Outline of Today's Talk

After last week's revision of least squares, today we continue with an introduction to general estimation frameworks.
- What if the assumptions used by least squares are too restrictive for our data?
- We will discuss the concepts and ideas of estimation in general terms. In the coming weeks, we will introduce the most common frameworks in detail (rigorously, together with inference).
- A useful application: quantile regression.

From Least Squares (LS) to Maximum Likelihood Estimation (MLE)

LS is motivated in two ways: it is the minimum mean square error (MSE) predictor of y, and it is the minimum variance linear unbiased estimator of β. But what is the role of the normality assumption on the disturbances ε?
1. Appropriate inference (although we know how to obtain inference even when the disturbances do not come from a normal distribution, so this role is questionable).
2. Much more importantly, if the disturbances are normally distributed, then LS is also the maximum likelihood estimator (MLE).
Hence, being an MLE, least squares is asymptotically efficient among consistent and asymptotically normally distributed estimators. This is the large-sample counterpart to the Gauss-Markov theorem (the Cramér-Rao bound).

Efficiency is an important aspect of estimation methodology. The Gauss-Markov theorem is in this respect a powerful result for linear regression. However, it has no counterpart in any other modeling context: once we leave the linear model, we need different tools. The MLE principle generally allows us to assert asymptotic efficiency for an estimator, but only for the specific distribution assumed!
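The LS-MLE equivalence can be checked numerically. Below is a minimal sketch (not from the slides; the simulated data, seed, and sample size are illustrative): OLS is computed from the normal equations, while the Gaussian MLE maximizes the log-likelihood directly, and the two sets of slope estimates coincide.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(42)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # constant + 2 regressors
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=1.5, size=n)

# Least squares from the normal equations: b = (X'X)^(-1) X'y
b_ls = np.linalg.solve(X.T @ X, X.T @ y)
s2_ls = (y - X @ b_ls) @ (y - X @ b_ls) / n  # e'e/n

# Gaussian MLE: minimize -ln L, parameterizing ln(sigma^2) so that sigma^2 > 0
def neg_loglik(params):
    b, ln_s2 = params[:-1], params[-1]
    e = y - X @ b
    return 0.5 * (n * np.log(2 * np.pi) + n * ln_s2 + e @ e / np.exp(ln_s2))

res = minimize(neg_loglik, x0=np.zeros(X.shape[1] + 1), method="BFGS")
b_mle, s2_mle = res.x[:-1], np.exp(res.x[-1])

print(b_ls, b_mle)    # identical up to optimizer tolerance
print(s2_ls, s2_mle)  # the MLE of sigma^2 is e'e/n, not e'e/(n - K)
```

Under normality the two criteria share the same optimizer, which is exactly the point of the equivalence above.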
Estimation Frameworks in Econometrics

Let us talk about several important estimation principles in general.

In many cases, the assumptions on the data are too restrictive for LS estimation. Contemporary econometrics offers a remarkable variety of estimation methods. While it is a complex task to choose a proper estimator for the data under study, we first need to know which principles are at hand. Today, we will try to motivate the basic modern principles of estimation. Recent research offers several methodologies that allow us to relax assumptions and to work even under possibly unwarranted or improper assumptions (e.g., GMM).

It is possible to relax strong assumptions by moving from fully parametric to semi- and nonparametric estimation. The cost paid is a weakening of the conclusions that can be drawn from the data. (This is most pronounced in discrete choice models: by relaxing the assumption on the PDF in probit, it may become impossible to estimate probabilities, which makes the model useless.) We can also rank estimators by efficiency: the best parametric estimator will be more efficient than a semiparametric one, but if the parametric model is misspecified, the semiparametric estimator may be robust. Today, simulation-based inference is not so costly (in terms of time) and can solve many problems in situations where other estimators cannot be used.

We will look more closely at the following core methods (the next three lectures in detail; today only from a general point of view):
1. The maximum likelihood method of estimation (MLE).
2. The generalized method of moments (GMM) as a semiparametric estimator.
3. Simulation-based estimation and bootstrapping.

"The art of the econometrician consists in finding the set of assumptions which are both sufficiently specific and sufficiently realistic to allow him to take the best possible advantage of the available data."
Kennedy, P. (2003). A Guide to Econometrics, 5th edition, MIT Press, p. 2 (based on Malinvaud, 1966).

Parametric Estimation and Inference

Crucial is a full statement of the density or probability model that provides the data generating mechanism (DGM) for the random variable of interest. The joint density of a random variable y and a random vector x is
  f(y, x) = g(y | x, β) × h(x, θ),
with unknown parameters β and θ. Consider the linear regression model with normally distributed disturbances. This assumption produces a full statement of the conditional density (the population from which we draw observations):
  yi | xi ~ N(xi'β, σ²).
THIS IS A STRONG ASSUMPTION TO MAKE.

After estimating the unknown (but fixed) parameters, the conditional probability distribution of yi is completely defined (mean, variance, etc.). Parametric estimation means specifying the density and its features (parameters). The goal is to find the parameters in a parameter space, the set of allowable values. In the regression model, we have K regression slopes (any real value) and a variance (positive):
  [β, σ²] ∈ R^K × R+.
Estimation: specify a criterion for ranking the points in the parameter space and choose the point that optimizes this criterion.

Discuss with your neighbour: What is this criterion for OLS?

Classical Likelihood-Based Estimation

The most common parametric estimator used in econometrics is the maximum likelihood estimator, built on the philosophy of "sample information". When the density of a sample of observations is completely specified (apart from unknown parameters), the joint density of those observations is the likelihood function
  f(y1, y2, ..., x1, x2, ... | β, θ) = ∏_{i=1}^n f(yi, xi | β, θ).
The maximum likelihood estimator (MLE) is the function of the data that maximizes the log of the likelihood function.

MLE with Normally Distributed Disturbances

With normally distributed disturbances, yi | xi is normally distributed with mean xi'β and variance σ². The density of yi | xi is
  f(yi | xi) = (2πσ²)^(-1/2) exp( -(yi - xi'β)² / (2σ²) ).
Its log-likelihood is
  ln L = -(n/2) ln 2π - (n/2) ln σ² - (1/(2σ²)) (y - Xβ)'(y - Xβ).
Maximizing this function means minimizing the familiar sum of squares, with the same solution (we will derive this next lecture). Then e'e/n is an estimator of σ².

What if the data are far from normal? What if we cannot find a proper assumption about the density? Then we simply have a misspecified model, and estimation may be invalid. There are solutions (robust GMM, simulation-based inference). Before we move ahead, let us introduce one of the modern applications of MLE in econometrics: copulas. While they are beyond the scope of this course, it is good to be aware of the general idea and of their existence, as copulas are becoming increasingly popular and useful.

Copulas: A Modern Application

Specifying a likelihood function amounts to making a strong assumption about the joint distribution. In many situations, we have more than one random variable of interest, and each of these variables may have a different distribution. For the specification of the likelihood, we then need to specify the marginal distributions first. Moreover, the variables of interest may be correlated. Copulas, thanks to Sklar's (1973) theorem, allow exactly this: a joint distribution is built from the marginals and a copula that captures the dependence. Most applications are quite recent (roughly since 2000), and copulas are becoming increasingly popular.

Note that when the marginal distributions are Gaussian and they are joined by a Gaussian copula, the result is simply the bivariate Gaussian distribution.
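To make the copula idea concrete, here is a minimal simulation sketch under illustrative assumptions (a Gaussian copula with ρ = 0.6, and exponential and Student-t marginals chosen arbitrarily); it is not an estimation routine, just Sklar's construction run forwards.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, rho = 5000, 0.6

# Step 1: draw from a bivariate normal with correlation rho.
z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)

# Step 2: push each coordinate through the standard normal CDF.
# (u1, u2) have uniform marginals joined by the Gaussian copula.
u = stats.norm.cdf(z)

# Step 3: impose arbitrary marginals via inverse CDFs (Sklar's theorem).
x1 = stats.expon(scale=2.0).ppf(u[:, 0])   # exponential marginal
x2 = stats.t(df=4).ppf(u[:, 1])            # heavy-tailed Student-t marginal

# (x1, x2) keeps the dependence structure of the Gaussian copula but has
# the chosen marginals; rank correlation survives the monotone maps.
print(stats.spearmanr(x1, x2)[0])
```

Replacing the two marginals with standard normals in step 3 reproduces the bivariate normal, which is the note above.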
Semiparametric Estimation

This is one of the solutions when we do not want to make an assumption as restrictive as a distributional one. Estimators are devised from more general characteristics of the population. Semiparametric estimators will be more robust than fully parametric ones: they retain their properties across a greater range of specifications. The cost we pay is a loss of efficiency.

GMM Estimation

Many recent applications are based on the (generalized) method of moments (GMM). GMM is based on moment equations
  E[m(yi, xi, β)] = 0.
Note that the essential assumption for the LS estimator is
  E[xi (yi - xi'β)] = 0.
The estimator is obtained by finding a parameter estimate b that mimics the population result:
  (1/n) Σi xi (yi - xi'b) = 0,
which are the normal equations for least squares.

Maximum Empirical Likelihood Estimation

Empirical likelihood (EL) methods are a semiparametric alternative to MLE, closely related to GMM. Let πi be the probability that yi | xi takes the value realized in the sample. The empirical likelihood function is then
  EL = ∏_{i=1}^n πi^(1/n).
The maximum empirical likelihood estimator maximizes the log of the EL:
  ELL = (1/n) Σ_{i=1}^n ln πi.
Note that this alone does not have sufficient structure to admit a solution (it is unbounded). We need to impose restrictions, such as requiring the probabilities πi to sum to one.

Least Absolute Deviations Estimation (LAD)

In small samples, LS will be affected heavily by outlying observations:
  E[y | x] = x'β.
A robust alternative is the LAD estimator:
  Median[y | x] = x'β(0.5).
We minimize the sum of absolute deviations:
  min_b Σ_{i=1}^N |yi - xi'b|.
LAD is hence more resilient to changes in extreme values: the median, as a measure of central tendency, is much less sensitive than the mean to changes in extreme values. Note that if OLS and LAD deliver different estimates, this does not mean something is wrong with OLS.

Quantile Regression

LAD is a semiparametric specification, as it specifies only one particular feature of the distribution: the median. LAD is median regression. The median is only one possible quantile of interest.

Discuss with your neighbour: What is a quantile of a distribution?

The model can be extended to other conditional quantiles:
  Q[y | x] = x'β(q) such that Prob[y ≤ x'β(q) | x] = q for 0 < q < 1.
This is essentially a nonparametric specification (no assumption about the distribution or its conditional variance). Since q can vary continuously (strictly) between zero and one, there are infinitely many parameter vectors that could be estimated.

The estimator bq of βq for a specific quantile q is obtained by minimizing (with ei,q = yi - xi'βq)
  Fn(βq | y, X) = Σ_{i: ei,q ≥ 0} q |yi - xi'βq| + Σ_{i: ei,q < 0} (1 - q) |yi - xi'βq|.
Solving this minimization requires an iterative estimator: it is a linear programming problem. bq is consistent and asymptotically normally distributed:
  √n (bq - βq) →d N(0, H⁻¹ G H⁻¹),
where H = E[fq(0 | xi) xi xi'] and G = q(1 - q) E[xi xi']. For q = 0.5 and N(0, σ²) disturbances, the variance reduces to σ² (π/2) (X'X)⁻¹. Computation is complicated, as we need to evaluate fq(0), which makes this a good candidate for bootstrapping.
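A minimal sketch of the estimator in practice, assuming statsmodels is available and using simulated heteroskedastic data rather than any dataset from the course: because the spread of y grows with x, the slope estimates fan out across quantiles instead of being parallel shifts of each other.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
x = rng.uniform(0, 10, size=n)
# Heteroskedastic disturbances: the conditional quantiles of y have
# different slopes, which quantile regression can pick up.
y = 1.0 + 0.5 * x + rng.normal(scale=0.2 + 0.1 * x, size=n)

X = sm.add_constant(x)
for q in (0.1, 0.25, 0.5, 0.75, 0.9):
    b = sm.QuantReg(y, X).fit(q=q).params   # minimizes the check function
    print(f"q={q:.2f}  intercept={b[0]: .3f}  slope={b[1]: .3f}")

print("OLS slope:", sm.OLS(y, X).fit().params[1])  # tracks only the mean
```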
Quantile Regression

Why use quantile (median) regression?
- It is semiparametric (less strict assumptions are needed).
- It is robust to some violations of assumptions.
- We obtain a complete characterisation of the conditional distribution, which can be very helpful in some applications.
Examples of applications: the income elasticity of credit card users; financial risk.

Quantile Regression: Example

Investigation of the income elasticity of monthly expenditure (Greene, p. 208):
  Q(ln Spending | x, q) = β1,q + β2,q ln Income + β3,q Age + β4,q Dependents.
The OLS estimate β̂2,OLS = 1.08344 is not surprising; the variation across quantiles, however, actually is. From the results (next slide) we observe saturation in the response of spending to changes in income at the highest levels of spending. The income elasticity changes significantly with spending, hence the LS result β̂2 = 1.08344 does not completely characterize the dependence.

Figure: Example from Greene (2012), p. 209.

Quantile regression can be of particular interest in some situations with panel data and censored data, particularly for corner solutions. One needs to be careful in these applications. If we have time, we will discuss these extensions later, when discussing panel data.

Comparing Parametric and Semiparametric Analyses

The strong assumptions of the fully parametric model come at a cost: the inferences from the model are only as good (robust) as the underlying assumptions. But when the assumptions are met, parametric models represent efficient frameworks. Semiparametric analysis relaxes assumptions (such as normality). The specifications to which semiparametric estimators are robust, because we relax assumptions, may render parametric estimators inconsistent. Still, by relaxing assumptions we lose efficiency, and it is much more difficult to make inferences. All in all, the comparison of these methods is not just about efficiency: the estimators are often estimating different quantities, and we need to understand when to use which.

Properties of Estimators

Studying the properties of estimators can help us in our choice.
- Unbiasedness: a finite-sample property that can be established in only a very small number of cases. Asymptotically, we are still interested in consistency.
- Consistency: if we cannot establish consistency, we cannot trust the estimator at all.
- Asymptotic normality: the platform for statistical inference (it can be handled, as we will see later).
- Asymptotic efficiency: cannot be established in absolute terms, but only within a class (e.g., the relative efficiency of MLE and GMM). We want to know that we did not make suboptimal use of the data; hence we choose the more efficient estimator. But sometimes this is not possible (analytical solutions do not exist, etc.).

Extremum Estimators

An extremum estimator is one obtained as the optimizer of a criterion function q(θ | data).

Discuss with your neighbour: Can you think of any extremum estimator?

- ML: θ̂_ML = argmax [ (1/n) Σ_{i=1}^n ln f(yi | xi, θ_ML) ]
- LS: θ̂_LS = argmax [ -(1/n) Σ_{i=1}^n (yi - h(xi, θ_LS))² ]
- GMM: θ̂_GMM = argmax [ -m̄(data, θ_GMM)' W m̄(data, θ_GMM) ]
Note that LS and GMM carry a negative sign to cast all three as one type of (maximization) problem.

M estimators: LS and ML are examples of M estimators.
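To connect the extremum view back to GMM, here is a minimal sketch (simulated data; the regressors serve as their own instruments, so the model is just-identified and the weighting matrix W is immaterial): minimizing the quadratic form m̄'W m̄ reproduces the least squares solution, since both optimize an extremum criterion built from the same moment condition.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

# Sample analogue of the moment condition E[x_i (y_i - x_i'b)] = 0.
def m_bar(b):
    return X.T @ (y - X @ b) / n

W = np.eye(2)  # any positive definite W works in the just-identified case

# GMM as an extremum estimator: maximize -m̄'W m̄, i.e. minimize m̄'W m̄.
b_gmm = minimize(lambda b: m_bar(b) @ W @ m_bar(b),
                 x0=np.zeros(2), method="BFGS").x

b_ols = np.linalg.solve(X.T @ X, X.T @ y)
print(b_gmm, b_ols)  # coincide up to optimizer tolerance
```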
Thank You For Your Attention

Now we understand the (very) general ideas of estimation frameworks. In the next weeks, we will learn more about the most important ones: MLE, GMM, and simulation-based inference.

Reading
- This (second) week: Ch. 12 (Estimation Frameworks), pp. 432-454.
- Next (third) week: Ch. 14 (MLE), pp. 509-548.