Econometrics II - University of Vaasa

Econometrics II
Seppo Pynnönen
Department of Mathematics and Statistics, University of Vaasa, Finland
Spring 2017
Part III
Limited Dependent Variable Models
As of Jan 25, 2017
1 Background
2 Binary Dependent Variable
  Linear, Logit, and Probit Regressions
    The Linear Probability Model
    The Logit and Probit Model
3 Tobit Model
Background
Limited dependent variables are variables whose range of values is
substantially restricted.
A binary variable, which takes only two values (0/1), is an example.
Another example is a variable that takes only a small number of integer
values.
Other kinds of limited variables are those whose values are truncated
for some reason, for example the number of passenger tickets sold for
a flight or for a sports event.
Note, however, that not all truncated cases need special treatment.
An example is wage, which must be positive.
Truncated variables that typically need special treatment are those
with a large concentration of observations at the limiting value.
Binary Dependent Variable
Linear, Logit, and Probit Regressions
The Linear Probability Model
Up until now, in the regression
$$ y = \mathbf{x}'\boldsymbol{\beta} + u, \tag{1} $$
where $\mathbf{x}'\boldsymbol{\beta} = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$, $y$ has had a quantitative
meaning (e.g., wage).
What if $y$ indicates a qualitative event (e.g., a firm has gone
bankrupt), such that $y = 1$ indicates the occurrence of the
event ("success") and $y = 0$ its non-occurrence ("failure"), and we
want to explain it by some explanatory variables?
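Consider the meaning of the regression
$$ y = \mathbf{x}'\boldsymbol{\beta} + u $$
when $y$ is a binary variable. Because $E[u \mid \mathbf{x}] = 0$,
$$ E[y \mid \mathbf{x}] = \mathbf{x}'\boldsymbol{\beta}. \tag{2} $$
Because $y$ is a random variable that can take only the values 0 and 1,
we can define probabilities for $y$ as $P(y = 1 \mid \mathbf{x})$ and
$P(y = 0 \mid \mathbf{x}) = 1 - P(y = 1 \mid \mathbf{x})$, such that
$$ E[y \mid \mathbf{x}] = 0 \cdot P(y = 0 \mid \mathbf{x}) + 1 \cdot P(y = 1 \mid \mathbf{x}) = P(y = 1 \mid \mathbf{x}). $$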
Thus, $E[y \mid \mathbf{x}] = P(y = 1 \mid \mathbf{x})$ is the success probability, and the
regression in equation (2) models
$$ P(y = 1 \mid \mathbf{x}) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k, \tag{3} $$
the probability of success. This is called the linear probability
model (LPM).
The slope coefficients indicate the marginal effect of the corresponding
x-variable on the success probability, i.e., the change in the probability
as $x_j$ changes, or
$$ \Delta P(y = 1 \mid \mathbf{x}) = \beta_j \Delta x_j. \tag{4} $$
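The interpretation in equations (3) and (4) can be illustrated with a minimal R sketch. It uses simulated data, not the lecture's data set; the regressor `x`, the sample size, and the true coefficients 0.2 and 0.05 are assumptions chosen only for illustration.

```r
# Simulated illustration of the linear probability model (LPM).
set.seed(1)
n <- 5000
x <- runif(n, 0, 10)               # hypothetical regressor
p <- 0.2 + 0.05 * x                # true P(y = 1 | x), stays within [0.2, 0.7]
y <- rbinom(n, size = 1, prob = p) # binary outcome

lpm <- lm(y ~ x)                   # OLS on a 0/1 dependent variable
b <- coef(lpm)

# The slope estimates the change in P(y = 1 | x) per unit change in x,
# so raising x by 2 changes the predicted probability by 2 * b["x"].
pred <- predict(lpm, newdata = data.frame(x = c(4, 6)))
delta <- unname(pred[2] - pred[1])
print(b)
print(delta)
```

Because the model is linear, the change in predicted probability does not depend on the starting value of `x`.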
In the OLS estimated model
$$ \hat{y} = \hat\beta_0 + \hat\beta_1 x_1 + \cdots + \hat\beta_k x_k, \tag{5} $$
$\hat{y}$ is the estimated or predicted probability of success.
In order to specify the binary variable correctly, it may be useful to
name the variable according to the "success" category (e.g., in a
bankruptcy study, bankrupt = 1 for bankrupt firms and
bankrupt = 0 for non-bankrupt firms; thus "success" is just a
generic term).
Example 1 (Married women's participation in the labor force, year 1975)
Linear probability model (see the R snippet for the R commands):

lm(formula = inlf ~ nwifeinc + educ + exper + I(exper^2) + age +
    kidslt6 + kidsge6, data = wkng)

Residuals:
     Min       1Q   Median       3Q      Max
-0.93432 -0.37526  0.08833  0.34404  0.99417

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.5855192  0.1541780   3.798 0.000158 ***
nwifeinc    -0.0034052  0.0014485  -2.351 0.018991 *
educ         0.0379953  0.0073760   5.151 3.32e-07 ***
exper        0.0394924  0.0056727   6.962 7.38e-12 ***
I(exper^2)  -0.0005963  0.0001848  -3.227 0.001306 **
age         -0.0160908  0.0024847  -6.476 1.71e-10 ***
kidslt6     -0.2618105  0.0335058  -7.814 1.89e-14 ***
kidsge6      0.0130122  0.0131960   0.986 0.324415
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4271 on 745 degrees of freedom
Multiple R-squared: 0.2642, Adjusted R-squared: 0.2573
F-statistic: 38.22 on 7 and 745 DF, p-value: < 2.2e-16
Example 2 (Continued)
All variables except kidsge6 are statistically significant, with signs
as might be expected.
The coefficients indicate the marginal effects of the variables on the
probability that inlf = 1. Thus, e.g., an additional year of educ
increases the probability by about 0.038 (other variables held fixed).
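Because exper enters the model with a squared term, its marginal effect is not a single coefficient but depends on the level of experience. The following sketch uses only the estimates reported in Example 1; the helper name `me_exper` is hypothetical, and the derivative formula is the usual approximation for a one-year change.

```r
# Coefficients copied from the LPM output in Example 1.
b_exper  <-  0.0394924
b_exper2 <- -0.0005963
b_educ   <-  0.0379953

# Approximate marginal effect of one more year of experience at a given
# level of exper (derivative of b_exper * exper + b_exper2 * exper^2).
me_exper <- function(exper) b_exper + 2 * b_exper2 * exper

print(me_exper(c(0, 10, 20)))               # effect shrinks as experience grows
turning_point <- -b_exper / (2 * b_exper2)  # experience at which the effect is zero
print(turning_point)

# Four more years of education, other variables held fixed:
educ_effect <- 4 * b_educ
print(educ_effect)
```

With these estimates the effect of experience declines from about 0.039 at zero years to about 0.016 at 20 years and reaches zero at roughly 33 years.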
[Figure, two panels: "Marginal effect of experience on married women labor force participation" (x-axis: Experience (years), y-axis: Probability) and "Marginal effect of education on married women labor force participation" (x-axis: Education (years), y-axis: Probability).]
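Some issues are associated with the LPM.
The left-hand side, being a probability, is restricted to [0, 1], while
the right-hand side can take any value in (−∞, ∞), which may result in
predicted probabilities less than zero or larger than one.
The error term u is heteroskedastic: denoting
$$ p(\mathbf{x}) = P(y = 1 \mid \mathbf{x}) = \mathbf{x}'\boldsymbol{\beta}, $$
we have
$$ \mathrm{var}[u \mid \mathbf{x}] = (1 - p(\mathbf{x}))\,p(\mathbf{x}), \tag{6} $$
which is not constant but depends on $\mathbf{x}$, and hence violates the
constant-variance requirement of Assumption 2.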
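Both issues can be seen in a small simulation. This is a sketch under stated assumptions: the data are artificial, and the true success probability is taken to be a probit function of a single hypothetical regressor `x`.

```r
# Simulated illustration of the two LPM issues (data are hypothetical).
set.seed(2)
n <- 2000
x <- rnorm(n)
p <- pnorm(1.5 * x)                 # true success probability in (0, 1)
y <- rbinom(n, size = 1, prob = p)

lpm <- lm(y ~ x)
fit <- fitted(lpm)

# Issue 1: fitted "probabilities" can fall outside [0, 1].
n_outside <- sum(fit < 0 | fit > 1)
print(n_outside)

# Issue 2: for a binary y, var(y | x) = p(x) * (1 - p(x)),
# which varies with x (largest at p = 0.5, close to zero in the tails).
cond_var <- p * (1 - p)
print(range(cond_var))
```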
The Logit and Probit Model
The first of the above problems can technically be solved easily by
mapping the linear function on the right-hand side of equation (3)
to the range (0, 1) with a non-linear function. Such a function is
generally called a link function.
That is, instead of equation (3) we write
$$ P(y = 1 \mid \mathbf{x}) = G(\mathbf{x}'\boldsymbol{\beta}). \tag{7} $$
Although any function $G : \mathbb{R} \to [0, 1]$ applies in principle, the so-called
logit and probit transformations are the most popular in practice (the
former is based on the logistic distribution and the latter on the normal
distribution).
Economists often favor the probit transformation, in which G is
the distribution function of the standard normal distribution, i.e.,
$$ G(z) = \Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}v^2}\, dv. \tag{8} $$
In the logit transformation
$$ G(z) = \frac{e^{z}}{1 + e^{z}} = \frac{1}{1 + e^{-z}} = \int_{-\infty}^{z} \frac{e^{-v}}{(1 + e^{-v})^2}\, dv. \tag{9} $$
Both are S-shaped.
[Figure, two panels: "Logit transformation" and "Probit transformation", each showing G(z) (from 0.0 to 1.0) against z (from −3 to 3).]
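Both transformations are available in base R as `pnorm` (probit, equation (8)) and `plogis` (logit, equation (9)). The following sketch checks the closed forms and the integral representations numerically; the evaluation points in `z` are arbitrary.

```r
# Logit and probit transformations in base R.
z <- c(-3, -1, 0, 1, 3)

G_probit <- pnorm(z)    # standard normal CDF, equation (8)
G_logit  <- plogis(z)   # logistic CDF, equation (9)

# plogis agrees with both closed forms in equation (9).
same_form1 <- isTRUE(all.equal(G_logit, exp(z) / (1 + exp(z))))
same_form2 <- isTRUE(all.equal(G_logit, 1 / (1 + exp(-z))))

# dlogis is the integrand of equation (9).
same_dens <- isTRUE(all.equal(dlogis(z), exp(-z) / (1 + exp(-z))^2))

# Numerical check of the integral representations at z = 1.
int_probit <- integrate(dnorm,  -Inf, 1)$value
int_logit  <- integrate(dlogis, -Inf, 1)$value
print(c(int_probit, pnorm(1), int_logit, plogis(1)))
```

Both functions equal 0.5 at z = 0 and approach 0 and 1 in the tails, which produces the S-shape.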
The price, however, is that the interpretation of the marginal
effects is no longer as straightforward as with the LPM.
However, a negative sign indicates a decreasing effect on the
probability and a positive sign an increasing effect.
More precisely, using equation (7),
$$ \Delta P(y = 1 \mid \mathbf{x}) \approx g(\mathbf{x}'\boldsymbol{\beta})\,\beta_j\,\Delta x_j, \tag{10} $$
where $g$ is the derivative of $G$, i.e.,
$g(z) = (1/\sqrt{2\pi}) \exp(-z^2/2)$ for probit and
$g(z) = \exp(-z) / (1 + \exp(-z))^2$ for logit, evaluated at $z = \mathbf{x}'\boldsymbol{\beta}$.
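Typically the marginal effects are evaluated for unit changes in $x_j$
(i.e., $\Delta x_j = 1$) at the sample means of the x-variables with the estimated
β-coefficients [partial effect at the average (PEA)].
Another commonly used approach is to average the scale factor $g$ over
the sample,
$$ \frac{1}{n} \sum_{i=1}^{n} g(\mathbf{x}_i'\hat{\boldsymbol{\beta}}), \tag{11} $$
and to multiply this average by $\hat\beta_j$ [average partial effect (APE)].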
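The two ways of evaluating the scale factor in equations (10) and (11) can be sketched in R as follows. The data are simulated and the regressors `x1`, `x2` and their coefficients are hypothetical; for a logit model `dnorm` would be replaced by `dlogis`.

```r
# Simulated probit example; the data and coefficients are hypothetical.
set.seed(3)
n  <- 3000
x1 <- rnorm(n)
x2 <- runif(n)
y  <- rbinom(n, size = 1, prob = pnorm(-0.5 + 1.0 * x1 + 0.8 * x2))

fit <- glm(y ~ x1 + x2, family = binomial(link = "probit"))
b   <- coef(fit)
X   <- model.matrix(fit)   # n x 3 matrix including the intercept column

# Partial effect at the average (PEA): g evaluated at the sample means.
scale_pea <- dnorm(sum(colMeans(X) * b))
# Average partial effect (APE): g averaged over the sample, equation (11).
scale_ape <- mean(dnorm(drop(X %*% b)))

effects <- rbind(PEA = scale_pea * b[-1], APE = scale_ape * b[-1])
print(effects)
```

The two approaches give the same signs as the coefficients but generally different magnitudes, since g is non-linear.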
There are various pseudo R-squared measures for binary response
models.
One is the McFadden measure.
Another is the squared correlation between the predicted probabilities $\hat{y}_i$
and the observed $y_i$ (which have 0/1 values).
Using R, the former can be computed as
1 − residual deviance / null deviance.
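Both measures can be obtained from a fitted `glm` object, as in the following sketch on simulated data (the variable names and true coefficients are assumptions for illustration). The components `deviance` and `null.deviance` are the residual and null deviances reported in the `glm` summary.

```r
# Pseudo R-squared measures from a fitted glm object (simulated data).
set.seed(4)
n <- 1000
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(0.3 + 1.2 * x))
fit <- glm(y ~ x, family = binomial(link = "logit"))

# 1 - residual deviance / null deviance.
pseudo_r2 <- 1 - fit$deviance / fit$null.deviance

# Squared correlation between predicted probabilities and observed y.
cor_r2 <- cor(fitted(fit), y)^2

print(c(deviance_based = pseudo_r2, squared_correlation = cor_r2))
```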
Example 3 (Married women's labor force . . . )
Probit: (family = binomial(link = "probit") in glm)

Call:
glm(formula = inlf ~ nwifeinc + educ + exper + I(exper^2) + age +
    kidslt6 + kidsge6, family = binomial(link = "probit"), data = wkng)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.2156  -0.9151   0.4315   0.8653   2.4553

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.2700736  0.5080782   0.532  0.59503
nwifeinc    -0.0120236  0.0049392  -2.434  0.01492 *
educ         0.1309040  0.0253987   5.154 2.55e-07 ***
exper        0.1233472  0.0187587   6.575 4.85e-11 ***
I(exper^2)  -0.0018871  0.0005999  -3.145  0.00166 **
age         -0.0528524  0.0084624  -6.246 4.22e-10 ***
kidslt6     -0.8683247  0.1183773  -7.335 2.21e-13 ***
kidsge6      0.0360056  0.0440303   0.818  0.41350
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Null deviance: 1029.7 on 752 degrees of freedom
Residual deviance:  802.6 on 745 degrees of freedom
AIC: 818.6

Pseudo R-squared: 1 - 802.6 / 1029.7 = 0.221
Example 4 (Continued)
Logit: (family = binomial(link = "logit") in glm)

glm(formula = inlf ~ nwifeinc + educ + exper + I(exper^2) + age +
    kidslt6 + kidsge6, family = binomial(link = "logit"), data = wkng)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.1770  -0.9063   0.4473   0.8561   2.4032

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.425452   0.860365   0.495  0.62095
nwifeinc    -0.021345   0.008421  -2.535  0.01126 *
educ         0.221170   0.043439   5.091 3.55e-07 ***
exper        0.205870   0.032057   6.422 1.34e-10 ***
I(exper^2)  -0.003154   0.001016  -3.104  0.00191 **
age         -0.088024   0.014573  -6.040 1.54e-09 ***
kidslt6     -1.443354   0.203583  -7.090 1.34e-12 ***
kidsge6      0.060112   0.074789   0.804  0.42154
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Null deviance: 1029.75 on 752 degrees of freedom
Residual deviance:  803.53 on 745 degrees of freedom
AIC: 819.53

Pseudo R-squared: 1 - 803.53 / 1029.75 = 0.220

Qualitatively the results are similar to those of the LPM. (R exercise: create
graphs similar to those of the linear case for the marginal effects.)
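The raw logit and probit coefficients are not directly comparable because the scale factors g differ between the two models (at zero, g is about 0.40 for probit and 0.25 for logit). As a hedged illustration, not part of the original slides, the following sketch takes the slope estimates from Examples 3 and 4 and computes their ratios; the short names such as `exper2` are labels introduced here.

```r
# Slope estimates copied from Example 3 (probit) and Example 4 (logit).
probit <- c(nwifeinc = -0.0120236, educ = 0.1309040, exper = 0.1233472,
            exper2 = -0.0018871, age = -0.0528524,
            kidslt6 = -0.8683247, kidsge6 = 0.0360056)
logit  <- c(nwifeinc = -0.021345, educ = 0.221170, exper = 0.205870,
            exper2 = -0.003154, age = -0.088024,
            kidslt6 = -1.443354, kidsge6 = 0.060112)

ratio <- logit / probit
print(round(ratio, 2))                        # roughly 1.66 to 1.78
same_signs <- all(sign(logit) == sign(probit))
print(same_signs)
print(dnorm(0) / dlogis(0))                   # ratio of the scale factors at zero
```

The ratios are all close to the ratio of the two scale factors at zero, which is why the fitted probabilities and marginal effects of the two models tend to be similar even though the coefficients differ in size.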
Tobit Model
A limited dependent variable is called a corner solution response
variable if it is zero (say) for a nontrivial fraction of the
population but is roughly continuously distributed over positive
values.
An example is the amount of alcohol an individual consumes in a
given month.
Nothing in principle prevents using a linear model for such a y.
The problem is that the fitted values may be negative.
In cases where it is important to have a model that implies
nonnegative predicted values for y, the Tobit model is convenient.
The Tobit model (typically) expresses the observed response, $y$, in
terms of an underlying latent variable, $y^*$,
$$ y^* = \mathbf{x}'\boldsymbol{\beta} + u \tag{12} $$
and
$$ y = \max(0, y^*) \tag{13} $$
with $u \mid \mathbf{x} \sim N(0, \sigma^2)$.
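Accordingly, $y^* \mid \mathbf{x} \sim N(\mathbf{x}'\boldsymbol{\beta}, \sigma^2)$, and $y = y^*$ for $y^* \ge 0$, but $y = 0$ for
$y^* < 0$.
Given a sample of observations on $y$ and $\mathbf{x}$, the parameters $\boldsymbol{\beta}$ and $\sigma$ can be
estimated by the method of maximum likelihood.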
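To make the maximum likelihood step concrete, the following is a minimal sketch that writes the Tobit log-likelihood by hand and maximises it with `optim`. It is an illustration on simulated data, not the estimation routine used in the lecture's examples; the function name `negll`, the true parameter values, and the choice to parameterise σ through its logarithm are assumptions. Observations with y > 0 contribute the normal density, and observations with y = 0 contribute the probability that the latent variable is non-positive.

```r
# Tobit maximum likelihood by hand on simulated data (all values hypothetical).
set.seed(5)
n <- 2000
x <- rnorm(n)
ystar <- 1 + 2 * x + rnorm(n, sd = 1.5)  # latent variable, equation (12)
y <- pmax(0, ystar)                       # observed response, equation (13)

# Negative log-likelihood; sigma is parameterised as exp(par[3]) > 0.
negll <- function(par) {
  xb    <- par[1] + par[2] * x
  sigma <- exp(par[3])
  ll <- ifelse(y > 0,
               dnorm(y, mean = xb, sd = sigma, log = TRUE),  # uncensored part
               pnorm(-xb / sigma, log.p = TRUE))             # P(y* <= 0 | x)
  -sum(ll)
}

ols_est <- coef(lm(y ~ x))                # OLS on the censored y
start   <- c(ols_est, 0)                  # start values; log(sigma) = 0
fit     <- optim(start, negll, method = "BFGS")
tobit_est <- c(fit$par[1:2], sigma = exp(fit$par[3]))

print(rbind(tobit = tobit_est[1:2], ols = ols_est))
print(tobit_est["sigma"])
```

In this simulation the OLS slope is noticeably smaller than the Tobit slope, because about a third of the observations are piled up at zero.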
Example 5 (Married women's annual working hours)
[Figure: histogram "Married women working hours"; x-axis: Hours (0 to 5000), y-axis: Frequency (0 to 300).]
Example 6 (OLS results)

lm(formula = hours ~ nwifeinc + educ + exper + I(exper^2) + age +
    kidslt6 + kidsge6, data = wkng)

Residuals:
    Min      1Q  Median      3Q     Max
-1511.3  -537.8  -146.9   538.1  3555.6

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 1330.4824   270.7846   4.913 1.10e-06 ***
nwifeinc      -3.4466     2.5440  -1.355   0.1759
educ          28.7611    12.9546   2.220   0.0267 *
exper         65.6725     9.9630   6.592 8.23e-11 ***
I(exper^2)    -0.7005     0.3246  -2.158   0.0312 *
age          -30.5116     4.3639  -6.992 6.04e-12 ***
kidslt6     -442.0899    58.8466  -7.513 1.66e-13 ***
kidsge6      -32.7792    23.1762  -1.414   0.1577
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 750.2 on 745 degrees of freedom
Multiple R-squared: 0.2656, Adjusted R-squared: 0.2587
F-statistic: 38.5 on 7 and 745 DF, p-value: < 2.2e-16
Example 7 (Tobit regression)

vglm(formula = hours ~ nwifeinc + educ + exper + I(exper^2) +
    age + kidslt6 + kidsge6, family = tobit(Lower = 0), data = wkng)

Pearson residuals:
            Min      1Q  Median     3Q    Max
mu       -8.429 -0.8331 -0.1352 0.8136  3.494
loge(sd) -0.994 -0.5814 -0.2366 0.2150 11.893

Coefficients:
                Estimate Std. Error z value Pr(>|z|)
(Intercept):1  965.28507  443.93450   2.174 0.029676 *
(Intercept):2    7.02289    0.03589 195.682  < 2e-16 ***
nwifeinc        -8.81433    4.48480  -1.965 0.049371 *
educ            80.64715   21.56529   3.740 0.000184 ***
exper          131.56501   17.01343   7.733 1.05e-14 ***
I(exper^2)      -1.86417    0.52992  -3.518 0.000435 ***
age            -54.40524    7.34462  -7.408 1.29e-13 ***
kidslt6       -894.02622  111.46120  -8.021 1.05e-15 ***
kidsge6        -16.21577   38.48134  -0.421 0.673468
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Number of linear predictors: 2
Names of linear predictors: mu, loge(sd)
Log-likelihood: -3819.095 on 1497 degrees of freedom
Number of iterations: 6
(Intercept):2 is the intercept of the second linear predictor, loge(sd),
i.e., the estimate of the logarithm of the residual standard deviation,
so that $\hat\sigma = \exp(7.02289) \approx 1122$.
OLS generally results in biased estimates because of the censored y-values.
Tobit regression accounts for this biasing effect.
Predicted values from the Tobit model are of the form
$$ \hat{y} = \Phi(\mathbf{x}'\hat{\boldsymbol{\beta}}/\hat\sigma)\,\mathbf{x}'\hat{\boldsymbol{\beta}} + \hat\sigma\,\phi(\mathbf{x}'\hat{\boldsymbol{\beta}}/\hat\sigma), \tag{14} $$
with Φ the standard normal cumulative distribution function and φ the
standard normal density function (the derivative of Φ).
Exercise: Using R, plot the predicted values for working hours as a
function of education (educ) when the other explanatory variables are set to their
means.
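As a starting point for the exercise, the predicted-value formula in equation (14) can be written as a small R function. This is a sketch: the function name `tobit_mean` and the index value `xb = 500` are hypothetical, and the standard deviation is recovered from the Example 7 output under the reading that (Intercept):2 is on the log scale, as the linear predictor name loge(sd) indicates. A Monte Carlo draw from the latent-variable model is used to check the formula.

```r
# Expected value of y implied by the Tobit model, equation (14).
tobit_mean <- function(xb, sigma) {
  pnorm(xb / sigma) * xb + sigma * dnorm(xb / sigma)
}

# sigma-hat recovered from (Intercept):2 = log(sigma-hat) in Example 7.
sigma_hat <- exp(7.02289)
print(sigma_hat)

# Monte Carlo check for one hypothetical value of the linear index x'b.
set.seed(6)
xb <- 500
sim_mean <- mean(pmax(0, xb + rnorm(1e6, sd = sigma_hat)))
print(c(formula = tobit_mean(xb, sigma_hat), simulated = sim_mean))
```

Unlike the linear index itself, the predicted value from equation (14) is positive for every value of the index, which is the property that motivates the Tobit model.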