Econometrics II
Seppo Pynnönen
Department of Mathematics and Statistics, University of Vaasa, Finland
Spring 2017

Part III
Limited Dependent Variable Models
As of Jan 25, 2017

1 Background
2 Binary Dependent Variable
    Linear, Logit, and Probit Regressions
        The Linear Probability Model
        The Logit and Probit Model
3 Tobit Model

1 Background

Limited dependent variables are variables whose range of values is substantially restricted.

A binary variable, which takes only two values (0/1), is one example. Another is a variable that takes a small number of integer values.

Other limited variables are those whose values are truncated for some reason, for example the number of passenger tickets sold for a flight or a sports event.

Note, however, that not all truncated cases need special treatment. An example is wage, which must be positive.

The truncated variables that typically need special treatment are those with a large concentration of observations at the limiting value.

2 Binary Dependent Variable

Linear, Logit, and Probit Regressions

The Linear Probability Model

Up until now, in the regression

$$y = x'\beta + u, \tag{1}$$

where $x'\beta = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$, y has had a quantitative meaning (e.g., wage).

What if y indicates a qualitative event (e.g., a firm has gone bankrupt), such that y = 1 indicates the occurrence of the event ("success") and y = 0 its non-occurrence ("failure"), and we want to explain it with some explanatory variables?

What does the regression $y = x'\beta + u$ mean when y is a binary variable? Because $E[u|x] = 0$,

$$E[y|x] = x'\beta. \tag{2}$$

Because y is a random variable that takes only the values 0 or 1, we can define probabilities for y as $P(y = 1|x)$ and $P(y = 0|x) = 1 - P(y = 1|x)$, such that

$$E[y|x] = 0 \cdot P(y = 0|x) + 1 \cdot P(y = 1|x) = P(y = 1|x).$$

Thus $E[y|x] = P(y = 1|x)$ is the success probability, and the regression in equation (2) models

$$P(y = 1|x) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k, \tag{3}$$

the probability of success.
This is called the linear probability model (LPM). The slope coefficients give the marginal effect of the corresponding x-variable on the success probability, i.e., the change in the probability as $x_j$ changes by $\Delta x_j$ with the other variables held fixed:

$$\Delta P(y = 1|x) = \beta_j \Delta x_j. \tag{4}$$

In the OLS-estimated model

$$\hat{y} = \hat\beta_0 + \hat\beta_1 x_1 + \cdots + \hat\beta_k x_k, \tag{5}$$

$\hat{y}$ is the estimated (predicted) probability of success.

To make the coding of the binary variable clear, it is useful to name the variable after the "success" category (e.g., in a bankruptcy study, bankrupt = 1 for bankrupt firms and bankrupt = 0 for non-bankrupt firms; thus "success" is just a generic term).

Example 1 (Married women's participation in the labor force, year 1975)

Linear probability model (see the R snippet for the R commands):

    lm(formula = inlf ~ nwifeinc + educ + exper + I(exper^2) + age +
        kidslt6 + kidsge6, data = wkng)

    Residuals:
         Min       1Q   Median       3Q      Max
    -0.93432 -0.37526  0.08833  0.34404  0.99417

    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
    (Intercept)  0.5855192  0.1541780   3.798 0.000158 ***
    nwifeinc    -0.0034052  0.0014485  -2.351 0.018991 *
    educ         0.0379953  0.0073760   5.151 3.32e-07 ***
    exper        0.0394924  0.0056727   6.962 7.38e-12 ***
    I(exper^2)  -0.0005963  0.0001848  -3.227 0.001306 **
    age         -0.0160908  0.0024847  -6.476 1.71e-10 ***
    kidslt6     -0.2618105  0.0335058  -7.814 1.89e-14 ***
    kidsge6      0.0130122  0.0131960   0.986 0.324415
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 0.4271 on 745 degrees of freedom
    Multiple R-squared: 0.2642, Adjusted R-squared: 0.2573
    F-statistic: 38.22 on 7 and 745 DF, p-value: < 2.2e-16

Example 2 (Continues ...)

All variables except kidsge6 are statistically significant, with signs as might be expected.

The coefficients give the marginal effects of the variables on the probability that inlf = 1. For example, an additional year of educ increases the probability by 0.038 (other variables held fixed).

[Figure, two panels: "Marginal effect of experience on married women labor force participation" (probability against experience in years) and "Marginal effect of education on married women labor force participation" (probability against education in years).]

Some issues associated with the LPM:

First, the left-hand side is a probability and therefore restricted to [0, 1], while the right-hand side can take any value in (−∞, ∞). This may result in predicted probabilities that are less than zero or larger than one.

Second, u is heteroskedastic. Denoting $p(x) = P(y = 1|x) = x'\beta$,

$$\mathrm{var}[u|x] = (1 - p(x))\,p(x), \tag{6}$$

which is not constant but depends on x, thus violating the constant-variance requirement of Assumption 2.

The Logit and Probit Model

The first of the above problems is technically easy to solve by mapping the linear function on the right-hand side of equation (3) to the range (0, 1) with a non-linear function G. That is, instead of equation (3) we write

$$P(y = 1|x) = G(x'\beta). \tag{7}$$

In the terminology of generalized linear models, the inverse of G is called the link function.

Although any function $G : \mathbb{R} \to [0, 1]$ applies in principle, the so-called logit and probit transformations are the most popular in practice (the former is based on the logistic distribution and the latter on the normal distribution).
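The out-of-range problem of the LPM can be illustrated with a minimal Python sketch (standard library only). It uses the OLS coefficients reported in Example 1; the covariate values in `woman` are hypothetical and chosen only to push the fitted value below zero. Passing the same index through the standard normal or logistic distribution function merely illustrates that G maps any real number into (0, 1); the results are not fitted probit or logit probabilities, because those models have their own coefficient estimates.

```python
import math

# OLS coefficients of the LPM as reported in Example 1.
LPM_COEF = {
    "const": 0.5855192,
    "nwifeinc": -0.0034052,
    "educ": 0.0379953,
    "exper": 0.0394924,
    "exper_sq": -0.0005963,
    "age": -0.0160908,
    "kidslt6": -0.2618105,
    "kidsge6": 0.0130122,
}


def linear_index(coef, x):
    """Return x'beta; the constant is added automatically."""
    return coef["const"] + sum(coef[name] * value for name, value in x.items())


def probit_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logit_cdf(z):
    """Logistic distribution function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical covariate values (an assumption, not taken from the data).
woman = {
    "nwifeinc": 50.0,
    "educ": 8.0,
    "exper": 0.0,
    "exper_sq": 0.0,
    "age": 55.0,
    "kidslt6": 2.0,
    "kidsge6": 0.0,
}

index = linear_index(LPM_COEF, woman)
print(f"LPM fitted 'probability': {index:.4f}")   # negative for these values
print(f"Phi(index):               {probit_cdf(index):.4f}")
print(f"logistic(index):          {logit_cdf(index):.4f}")
```

Whatever the value of the index, both transformed values stay strictly between 0 and 1, which is the property that motivates equation (7).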

Economists often favor the probit transformation, in which G is the distribution function of the standard normal distribution, i.e.,

$$G(z) = \Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}v^2}\, dv. \tag{8}$$

In the logit transformation

$$G(z) = \frac{e^{z}}{1 + e^{z}} = \frac{1}{1 + e^{-z}} = \int_{-\infty}^{z} \frac{e^{-v}}{(1 + e^{-v})^2}\, dv. \tag{9}$$

Both are S-shaped.

[Figure, two panels: "Probit transformation" and "Logit transformation", each showing G(z) against z for z from −3 to 3.]

The price, however, is that the marginal effects are no longer as straightforward to interpret as in the LPM. The sign still gives the direction: a negative coefficient indicates a decreasing effect on the probability and a positive coefficient an increasing effect.

More precisely, using equation (7),

$$\Delta P(y = 1|x) \approx g(x'\beta)\,\beta_j\,\Delta x_j, \tag{10}$$

where g is the derivative of G: $g(x'\beta) = (1/\sqrt{2\pi}) \exp\left(-(x'\beta)^2/2\right)$ for probit and $g(x'\beta) = \exp(-x'\beta)/\left(1 + \exp(-x'\beta)\right)^2$ for logit.

Typically the marginal effects are evaluated for unit changes in $x_j$ (i.e., $\Delta x_j = 1$) at the sample means of the x-variables with the estimated β-coefficients [partial effect at the average (PEA)].

Another commonly used approach is to replace $g(\bar{x}'\hat\beta)$ by the sample average of g over the observations,

$$\frac{1}{n} \sum_{i=1}^{n} g(x_i'\hat\beta), \tag{11}$$

which gives the average partial effect (APE).

There are various pseudo R-squared measures for binary response models. One is the McFadden measure. Another is the squared correlation between the predicted probabilities $\hat{y}_i$ and the observed $y_i$ (which take the values 0/1).

In R, the former can be computed as 1 − residual deviance / null deviance.
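The two ways of evaluating the scale factor g in equation (10), at the sample means (PEA) and averaged over the observations as in (11), can be compared in a short Python sketch. The coefficient vector `beta` and the data `rows` are hypothetical values made up for illustration; the same `beta` is used for both densities only to keep the sketch short, whereas in practice probit and logit each have their own estimates.

```python
import math


def probit_pdf(z):
    """Standard normal density, the derivative of Phi."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def logit_pdf(z):
    """Logistic density, the derivative of 1 / (1 + exp(-z))."""
    return math.exp(-z) / (1.0 + math.exp(-z)) ** 2


def index(beta, x):
    """x'beta with beta[0] as the intercept."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


def pea_scale(g, beta, rows):
    """Scale factor g evaluated at the sample means of the regressors."""
    n = len(rows)
    means = [sum(column) / n for column in zip(*rows)]
    return g(index(beta, means))


def ape_scale(g, beta, rows):
    """Sample average of g(x_i'beta), as in equation (11)."""
    return sum(g(index(beta, row)) for row in rows) / len(rows)


# Hypothetical coefficients (intercept, educ, kidslt6) and hypothetical data.
beta = [-1.0, 0.12, -0.8]
rows = [(8.0, 2.0), (12.0, 0.0), (12.0, 1.0), (16.0, 0.0), (17.0, 0.0)]

for name, g in (("probit", probit_pdf), ("logit", logit_pdf)):
    pea = pea_scale(g, beta, rows)
    ape = ape_scale(g, beta, rows)
    # Multiplying the scale factor by beta_j gives the effect of a unit change in x_j.
    print(f"{name}: PEA of educ = {pea * beta[1]:.4f}, APE of educ = {ape * beta[1]:.4f}")
```

The two scale factors generally differ because g is non-linear, so averaging g over the observations is not the same as evaluating g at the averages.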

Example 3 (Married women's labor force ...)

Probit: (family = binomial(link = "probit") in glm)

    Call:
    glm(formula = inlf ~ nwifeinc + educ + exper + I(exper^2) + age +
        kidslt6 + kidsge6, family = binomial(link = "probit"), data = wkng)

    Deviance Residuals:
        Min      1Q  Median      3Q     Max
    -2.2156 -0.9151  0.4315  0.8653  2.4553

    Coefficients:
                  Estimate Std. Error z value Pr(>|z|)
    (Intercept)  0.2700736  0.5080782   0.532  0.59503
    nwifeinc    -0.0120236  0.0049392  -2.434  0.01492 *
    educ         0.1309040  0.0253987   5.154 2.55e-07 ***
    exper        0.1233472  0.0187587   6.575 4.85e-11 ***
    I(exper^2)  -0.0018871  0.0005999  -3.145  0.00166 **
    age         -0.0528524  0.0084624  -6.246 4.22e-10 ***
    kidslt6     -0.8683247  0.1183773  -7.335 2.21e-13 ***
    kidsge6      0.0360056  0.0440303   0.818  0.41350
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Null deviance: 1029.7 on 752 degrees of freedom
    Residual deviance: 802.6 on 745 degrees of freedom
    AIC: 818.6

    Pseudo R-squared: 1 - 802.6 / 1029.7 = 0.221

Example 4 (Continues ...)

Logit: (family = binomial(link = "logit") in glm)

    glm(formula = inlf ~ nwifeinc + educ + exper + I(exper^2) + age +
        kidslt6 + kidsge6, family = binomial(link = "logit"), data = wkng)

    Deviance Residuals:
        Min      1Q  Median      3Q     Max
    -2.1770 -0.9063  0.4473  0.8561  2.4032

    Coefficients:
                 Estimate Std. Error z value Pr(>|z|)
    (Intercept)  0.425452   0.860365   0.495  0.62095
    nwifeinc    -0.021345   0.008421  -2.535  0.01126 *
    educ         0.221170   0.043439   5.091 3.55e-07 ***
    exper        0.205870   0.032057   6.422 1.34e-10 ***
    I(exper^2)  -0.003154   0.001016  -3.104  0.00191 **
    age         -0.088024   0.014573  -6.040 1.54e-09 ***
    kidslt6     -1.443354   0.203583  -7.090 1.34e-12 ***
    kidsge6      0.060112   0.074789   0.804  0.42154
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Null deviance: 1029.75 on 752 degrees of freedom
    Residual deviance: 803.53 on 745 degrees of freedom
    AIC: 819.53

    Pseudo R-squared: 1 - 803.53 / 1029.75 = 0.220

Qualitatively the results are similar to those of the LPM. (R exercise: create graphs of the marginal effects similar to those of the linear case.)

3 Tobit Model

A limited dependent variable is called a corner solution response variable if it is zero (say) for a nontrivial fraction of the population but is roughly continuously distributed over positive values.

An example is the amount of alcohol an individual consumes in a given month.

Nothing in principle prevents using a linear model for such a y. The problem is that the fitted values may be negative.

In cases where it is important to have a model that implies nonnegative predicted values for y, the Tobit model is convenient.

The Tobit model (typically) expresses the observed response, y, in terms of an underlying latent variable, $y^*$,

$$y^* = x'\beta + u \tag{12}$$

with

$$y = \max(0, y^*) \tag{13}$$

and $u|x \sim N(0, \sigma^2)$.

Accordingly, $y^*|x \sim N(x'\beta, \sigma^2)$, and $y = y^*$ for $y^* \ge 0$, but $y = 0$ for $y^* < 0$.

Given a sample of observations on y, the parameters can be estimated by the method of maximum likelihood.
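To make the maximum likelihood step concrete, the following Python sketch writes out the log-likelihood implied by equations (12) and (13): an observation with y = 0 contributes $\log \Phi(-x_i'\beta/\sigma)$, and an observation with y > 0 contributes $\log[\phi((y_i - x_i'\beta)/\sigma)/\sigma]$. The toy data and the two parameter guesses are hypothetical; an actual estimation would pass this function to a numerical optimizer, which is not shown here.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal distribution function (erfc keeps tail accuracy)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def tobit_loglik(beta, sigma, X, y):
    """Log-likelihood of a Tobit model censored at zero.

    Each row of X must start with 1.0 for the intercept.
    """
    total = 0.0
    for row, yi in zip(X, y):
        xb = sum(b * v for b, v in zip(beta, row))
        if yi <= 0.0:
            # Censored observation: P(y* <= 0 | x) = Phi(-x'beta / sigma).
            total += math.log(norm_cdf(-xb / sigma))
        else:
            # Uncensored observation: normal density of y with mean x'beta.
            total += math.log(norm_pdf((yi - xb) / sigma) / sigma)
    return total


# Hypothetical toy data: intercept plus one regressor, two censored responses.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [0.0, 0.0, 1.5, 2.5]

# Two hypothetical parameter guesses; ML prefers the one with the larger value.
print(tobit_loglik([0.0, 0.0], 1.0, X, y))
print(tobit_loglik([-1.0, 1.2], 1.0, X, y))
```

For extreme parameter values `norm_cdf` can underflow to zero and make `math.log` fail, so a production implementation would work with a log-CDF routine instead.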

Example 5 (Married women's annual working hours)

[Figure: histogram "Married women working hours", frequency against hours (0 to 5000).]

Example 6 (OLS results)

    lm(formula = hours ~ nwifeinc + educ + exper + I(exper^2) + age +
        kidslt6 + kidsge6, data = wkng)

    Residuals:
        Min      1Q  Median      3Q     Max
    -1511.3  -537.8  -146.9   538.1  3555.6

    Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
    (Intercept) 1330.4824   270.7846   4.913 1.10e-06 ***
    nwifeinc      -3.4466     2.5440  -1.355   0.1759
    educ          28.7611    12.9546   2.220   0.0267 *
    exper         65.6725     9.9630   6.592 8.23e-11 ***
    I(exper^2)    -0.7005     0.3246  -2.158   0.0312 *
    age          -30.5116     4.3639  -6.992 6.04e-12 ***
    kidslt6     -442.0899    58.8466  -7.513 1.66e-13 ***
    kidsge6      -32.7792    23.1762  -1.414   0.1577
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 750.2 on 745 degrees of freedom
    Multiple R-squared: 0.2656, Adjusted R-squared: 0.2587
    F-statistic: 38.5 on 7 and 745 DF, p-value: < 2.2e-16

Example 7 (Tobit regression)

    vglm(formula = hours ~ nwifeinc + educ + exper + I(exper^2) + age +
        kidslt6 + kidsge6, family = tobit(Lower = 0), data = wkng)

    Pearson residuals:
                Min      1Q  Median     3Q    Max
    mu       -8.429 -0.8331 -0.1352 0.8136  3.494
    loge(sd) -0.994 -0.5814 -0.2366 0.2150 11.893

    Coefficients:
                    Estimate Std. Error z value Pr(>|z|)
    (Intercept):1  965.28507  443.93450   2.174 0.029676 *
    (Intercept):2    7.02289    0.03589 195.682  < 2e-16 ***
    nwifeinc        -8.81433    4.48480  -1.965 0.049371 *
    educ            80.64715   21.56529   3.740 0.000184 ***
    exper          131.56501   17.01343   7.733 1.05e-14 ***
    I(exper^2)      -1.86417    0.52992  -3.518 0.000435 ***
    age            -54.40524    7.34462  -7.408 1.29e-13 ***
    kidslt6       -894.02622  111.46120  -8.021 1.05e-15 ***
    kidsge6        -16.21577   38.48134  -0.421 0.673468
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Number of linear predictors: 2
    Names of linear predictors: mu, loge(sd)
    Log-likelihood: -3819.095 on 1497 degrees of freedom
    Number of iterations: 6

(Intercept):2 is the intercept of the second linear predictor, loge(sd), i.e., the estimated logarithm of the residual standard deviation, so that $\hat\sigma = \exp(7.02289) \approx 1122$.

OLS generally results in biased estimates because of the censored y-values. Tobit regression accounts for this bias.

Predicted values from the Tobit model are of the form

$$\hat{y} = \Phi(x'\hat\beta/\hat\sigma)\, x'\hat\beta + \hat\sigma\, \phi(x'\hat\beta/\hat\sigma), \tag{14}$$

with Φ the standard normal cumulative distribution function and φ the standard normal density function (the derivative of Φ).

Exercise: Using R, plot the predicted values for working hours as a function of education (educ) when the other explanatory variables are set to their means.
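Equation (14) can be evaluated directly once the estimates are plugged in. The Python sketch below uses the Tobit estimates reported in Example 7, with the standard deviation recovered as exp of (Intercept):2 because that coefficient belongs to the loge(sd) predictor. The values in `others` are hypothetical round numbers, not the sample means asked for in the exercise, since the means are not reported here.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def tobit_mean(xb, sigma):
    """Predicted value of equation (14) for a Tobit model censored at zero."""
    z = xb / sigma
    return norm_cdf(z) * xb + sigma * norm_pdf(z)


# Estimates of the mu equation as reported in Example 7.
TOBIT_COEF = {
    "const": 965.28507,
    "nwifeinc": -8.81433,
    "educ": 80.64715,
    "exper": 131.56501,
    "exper_sq": -1.86417,
    "age": -54.40524,
    "kidslt6": -894.02622,
    "kidsge6": -16.21577,
}
# (Intercept):2 is on the loge(sd) scale, so exponentiate it.
sigma_hat = math.exp(7.02289)

# Hypothetical values for the other regressors (an assumption, NOT sample means).
others = {
    "nwifeinc": 20.0,
    "exper": 10.0,
    "exper_sq": 100.0,
    "age": 40.0,
    "kidslt6": 0.0,
    "kidsge6": 1.0,
}


def predicted_hours(educ):
    """Tobit prediction of hours for a given education level."""
    xb = TOBIT_COEF["const"] + TOBIT_COEF["educ"] * educ
    xb += sum(TOBIT_COEF[name] * value for name, value in others.items())
    return tobit_mean(xb, sigma_hat)


for educ in range(6, 19, 3):
    print(f"educ = {educ:2d}  predicted hours = {predicted_hours(educ):8.1f}")
```

Unlike the linear index $x'\hat\beta$, the prediction from (14) is always positive, and it is increasing in educ here because the educ coefficient is positive.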