
A PARTIALLY ADAPTIVE ESTIMATOR FOR THE CENSORED
REGRESSION MODEL BASED ON A MIXTURE OF NORMAL
DISTRIBUTIONS
Steven B. Caudill
Department of Economics
216 Lowder Business Building
Auburn University, AL 36849-5242
Email: [email protected]
Fax: (334) 844-4615
March 15, 2007
Abstract: The goal of this paper is to introduce a partially adaptive estimator for the
censored regression model based on an error structure described by a mixture of two
normal distributions. The model we introduce is easily estimated by maximum
likelihood using the EM algorithm adapted from the work of Bartolucci and Scaccia
(2004). A Monte Carlo study is conducted to examine the small sample properties of this
estimator compared to some common alternatives for the estimation of a censored
regression model such as the usual tobit model and the CLAD estimator of Powell
(1984). Our partially adaptive estimator performed well. The partially adaptive
estimator is applied to the Mroz (1987) data on wife’s hours worked. The empirical
evidence supports the partially adaptive estimator over the usual tobit model.
Keywords: partially adaptive estimator, censored regression model
JEL: C240
A Partially Adaptive Estimator for the Censored Regression Model
Based on a Mixture of Normal Distributions
Introduction
Estimation of the censored normal regression model, or tobit, has become quite
common in the literature. However, the usual tobit model is based on normally
distributed errors and if errors are not normally distributed the maximum likelihood
estimator is inconsistent. This lack of consistency in the tobit model has led researchers
to develop estimators less sensitive to the normality assumption. One solution to the
problem is the development of fully adaptive estimators and quasi-maximum likelihood,
or partially adaptive, estimators in which the unknown underlying error distribution is
estimated along with the parameters. A fully adaptive estimator is one that can be applied
when the underlying distribution is unknown and yet is asymptotically as efficient as an
estimator constructed with knowledge of the distribution. The idea of an adaptive
estimator was developed by Stein (1956), Beran (1974), and Stone (1975) and extended
by Bickel (1982) and Manski (1984). Bickel (1982) gives the conditions for which a
fully adaptive estimator has the same asymptotic variance that would obtain if one knew
the true error distribution. The fully adaptive estimator is usually based on a
nonparametric estimate of the unknown distribution, whereas a partially adaptive
estimator is based on a parametric approximation to the true unknown error distribution.
Partially adaptive estimators may have some advantages over fully adaptive
estimators. For example, Bickel (1982) and McDonald and Newey (1988) suggest that a
partially adaptive estimator might be more practical when the sample size is small. They
also suggest that a partially adaptive estimator with a small number of nuisance
parameters may outperform a fully adaptive estimator in small samples.
Wu and Stengos (2005) point out other advantages of partially adaptive
estimators. Unlike fully adaptive estimators, partially adaptive estimators do not depend
critically on bandwidth. In addition, partially adaptive estimation may encounter fewer
computational difficulties than fully adaptive estimation.
Much of the research on partially adaptive estimators, at least in the area of
microeconometrics, has dealt with three issues: 1) applications of partially adaptive
estimators to different econometric models, 2) an examination of the use and effects of
different flexible parametric error structures on performance, and 3) Monte Carlo
evaluation of the performance of partially adaptive estimators, particularly in small
samples.
In the case of linear regression models, partially adaptive estimators have been
developed based on the generalized t-distribution by several authors including McDonald
and Newey (1988), Butler, McDonald, Nelson, and White (1992), and McDonald (1993).
Partially adaptive estimators of the linear regression model based on a mixture-of-normals error structure are developed by Phillips (1991), Phillips (1994), and Bartolucci
and Scaccia (2004). A partially adaptive regression estimator based on the maximum
entropy distribution is developed by Wu and Stengos (2005). For dichotomous choice
models, McDonald (1996) develops a partially adaptive estimator based on the
generalized t-distribution while Geweke and Keane (1997) use a mixture of normal
distributions to approximate the unknown error structure. For the censored regression
model, McDonald and Xu (1996) develop a partially adaptive estimator based on the
generalized t-distribution.
The goal of this paper is to introduce a partially adaptive estimator for the
censored regression model based on an error structure described by a location-scale
mixture of normal distributions. This estimator is appealing for several reasons. First,
the estimator includes the normal distribution as a special case, and so the usual tobit model is
embedded in the formulation. Second, the estimation of the model is much simpler, via
the EM algorithm, than many of the other robust estimators of the censored regression
model. Third, a mixture of normal distributions is known to be a very flexible form, able
to approximate many different error structures (see, for example, Marron and Wand
(1992)).
The model we introduce is easily estimated by maximum likelihood using an EM
algorithm which is presented below. Our EM algorithm combines an EM algorithm for
the estimation of a censored normal regression model of Amemiya (1985) with the EM
algorithm for the estimation of a regression model with a mixture-of-normals disturbance
term of Bartolucci and Scaccia (2004). A Monte Carlo study is conducted to examine the
small sample properties of our estimator compared to some common alternatives for the
estimation of a censored regression model. In particular we examine the usual tobit
model and the CLAD estimator of Powell (1984) and find that the partially adaptive
estimator performs well. Finally, the estimator is applied to the Mroz (1987) data on
wife’s hours worked. The empirical evidence supports the partially adaptive estimator
over the usual tobit model.
An EM Algorithm
This section provides the details for maximum likelihood estimation via an
expectation-maximization, or EM, algorithm for partially adaptive estimation of a censored
regression model using a mixture of normal distributions to approximate the unknown
error structure. The EM algorithm has proven to be an extremely useful algorithm for
maximum likelihood estimation in a variety of complicated problems. The EM algorithm
is developed for use in missing data problems by Dempster, Laird, and Rubin (1977). The
algorithm presented here combines the EM algorithm for a censored normal regression
model given by Amemiya (1985) with the EM algorithm of Bartolucci and Scaccia
(2004) for partially adaptive estimation of a regression model with a mixture-of-normals
error structure.
Bartolucci and Scaccia develop a partially adaptive estimator for the linear
regression model based on a mixture of normals error structure. Their approach is to
allow the intercepts to differ between regimes by including dummy variables associated
with each regime into the data matrix. In this way, each component of the mixture model
has its own intercept and variance, but the slope coefficients are equal across regimes.
We extend the approach of Bartolucci and Scaccia to develop a partially adaptive
estimator of the censored regression model. Like Bartolucci and Scaccia, we allow the
intercepts and variances to differ between regimes but the slope coefficients are held
constant. The case of a mixture of two normal distributions is considered and used in all
subsequent simulations and estimations but the model is easily extended to a mixture of
more than two components.
We begin with the usual tobit or censored normal regression model. In the latent
variable framework the model is given by
yi* = Xiβ + εi                                                        (1)
yi = max[yi*, 0],
where y* is the latent dependent variable, y is the observed dependent variable, Xi is a
vector of exogenous variables, β is a vector of parameters to be estimated, and ε is an iid
N(0, σ2) error term. Define the dummy variable, Ii, such that Ii = 1 if y*≥0 and Ii = 0 if
y=0. For each observation, the likelihood is a mixture of a probability and a density
function. We denote this likelihood by g, where
gi = Ii f((yi − Xiβ)/σ) + (1 − Ii)(1 − F(Xiβ/σ)).                     (2)
In the usual normal case, f and F are the density and distribution functions of a standard
normal random variable, respectively. The observed or incomplete loglikelihood
function for the tobit model is the sum of the logarithms of the terms in (2), or
log LT = Σ_{i=1}^{n} log(gi).                                         (3)
Maximization of the likelihood function for the tobit is routine, with several algorithms
available and several software packages currently in use to accomplish this task.
We wish to focus on maximizing the likelihood for the tobit by using the
expectation-maximization, or EM, algorithm of Dempster, Laird, and Rubin (1977). The
EM algorithm is easy to implement in situations where there are “missing” data and the
algorithm is guaranteed to achieve at least a local maximum. The tobit model is a classic
case of missing data due to the presence of the limit observations. If the true values of
the limit observations were known, OLS could be applied. In the expectations or “E”
step of the EM algorithm these missing values are estimated by their conditional
expectations given parameter values and the data. The maximization or “M” step of the
EM algorithm usually involves evaluating the resulting expressions by OLS or WLS to
update the likelihood.
Following Amemiya, we present an EM algorithm for the estimation of the tobit
model. The EM algorithm involves the maximization of the expected value of the
loglikelihood function based on the density function of the latent variable. The
complete-data loglikelihood function is given by

log L*T = Σ_{i=1}^{n} log f((yi* − Xiβ)/σ),                           (4)
where f represents the standard normal density function and y* is the latent variable.
Note that if y* were observed for all observations, maximization of (4) would reduce to
OLS. This is not possible because not all values of y* are observed, so the EM algorithm
maximizes the expected value of the loglikelihood function which requires replacing yi*
by its expected value and variance given the data and parameter values.
Following Amemiya (1985), assume the sample is reordered so that the first n1
observations are nonlimit observations and the remaining n-n1 observations are limit
observations. If one obtains the FOCs for the maximization of (4) and solves, the
following expressions are obtained for updating the parameter values at each iteration

β = (X′X)^(−1)X′ỹ, where ỹ contains yi for the nonlimit observations and
E(yi*|Ii = 0) for the limit observations, and

σ² = n^(−1) Σ_{i=1}^{n} [Ii(yi − Xiβ)² + (1 − Ii)(E(yi*|Ii = 0) − Xiβ)² + (1 − Ii)V(yi*|Ii = 0)],   (5)
where n is the sample size. Evaluation of the expressions in (5) requires the conditional
expected values and conditional variances of these “missing” or limit observations.
These expectations are given by
E(yi*|Ii = 0) = Xiβ − σfi1/Fi1,

V(yi*|Ii = 0) = σ² + Xiβ(σfi1/Fi1) − (σfi1/Fi1)²,                     (6)
where f and F are the density function and distribution function of the standard normal
evaluated at –Xiβ/σ, respectively. These values are inserted into the expressions in (5) in
the “M” step of the algorithm and the process is repeated until convergence.
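As an illustration, the E step in (6) and the M step in (5) can be sketched in a few lines of Python (not the SAS/IML code used in this paper); the function below, its starting values, and its convergence rule are illustrative only.

```python
import numpy as np
from scipy.stats import norm

def tobit_em(y, X, tol=1e-8, max_iter=500):
    """EM estimation of the censored (at zero) normal regression model.

    E step: limit observations are replaced by the conditional moments
    in (6); M step: the OLS-style updates in (5).  Starting values and
    names are illustrative.
    """
    limit = (y <= 0)                              # I_i = 0 for limit observations
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS starting values
    sigma = np.std(y - X @ beta)
    for _ in range(max_iter):
        xb = X @ beta
        F = np.clip(norm.cdf(-xb / sigma), 1e-12, None)
        lam = sigma * norm.pdf(-xb / sigma) / F   # sigma * f_i1 / F_i1
        ey = np.where(limit, xb - lam, y)                        # E(y*|I=0), eq (6)
        vy = np.where(limit, sigma**2 + xb * lam - lam**2, 0.0)  # V(y*|I=0), eq (6)
        beta_new = np.linalg.lstsq(X, ey, rcond=None)[0]         # eq (5)
        sigma_new = np.sqrt(np.mean((ey - X @ beta_new) ** 2 + vy))
        done = np.max(np.abs(beta_new - beta)) < tol and abs(sigma_new - sigma) < tol
        beta, sigma = beta_new, sigma_new
        if done:
            break
    return beta, sigma
```

Each pass requires only one least-squares solve, which is the practical appeal of the EM approach noted above.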
We wish to combine this EM algorithm with the EM algorithm developed by
Bartolucci and Scaccia (2004) to estimate a regression model with an unknown error
structure approximated by a mixture-of-normals error structure. In their approach,
Bartolucci and Scaccia estimate a mixture of two regressions with normal errors with
constraints imposed on the regression parameters. In particular, Bartolucci and Scaccia
allow the intercepts and variances to differ between regimes, but the slope coefficients
are constrained to be equal.
The model developed by Bartolucci and Scaccia is essentially the mixture model
developed by Quandt (1988) with cross-regime parameter constraints imposed.
Following Quandt (1988), we illustrate the EM algorithm for the case of a mixture of two
normal regressions (or switching regressions). The model is given by
yi = α1 + Xiδ + ε1i with probability θ          (regime 1)
yi = α2 + Xiδ + ε2i with probability 1 − θ      (regime 2)            (7)
where ε1i and ε2i are mutually independent, iid normally distributed errors with zero
means and variances given by σ12 and σ22, respectively. Bartolucci and Scaccia constrain
the slope parameters, δ. Let the regression parameter vector be denoted by β=[α1 α2 δ].
Following Bartolucci and Scaccia, let 1 be a column vector of ones of dimension n and
let 0 be a column vector of zeros of dimension n and define two data matrices denoted
X1 = [1 0 X] and X2 = [0 1 X],                                        (8)
where X is a matrix of data on the independent variables. Let fij represent the density
function of a normally distributed random variable with mean Xjβ and standard deviation
σj. Then, the incomplete or observed data density function of a typical observation in the
BS mixture model is given by
hi = θf(X1iβ, σ1) + (1 − θ)f(X2iβ, σ2).                               (9)
(9)
To write the complete-data likelihood, define the indicator variable dij, where di1 = 1 if
the observation is associated with the first regime and 0 otherwise, and di2 = 1 − di1 = 1
if the observation is associated with the second regime and 0 otherwise. In our
two-component case, di1 is a Bernoulli trial with probability θ. Thus, the typical
complete-data density function for the BS mixture of normal regressions is given by
hi* = [θf(X1iβ, σ1)]^(di1) [(1 − θ)f(X2iβ, σ2)]^(1 − di1).            (10)
Then, the complete-data loglikelihood function can be written
log L*M = Σ_{i=1}^{n} {di1(ln θ + ln fi1) + (1 − di1)(ln(1 − θ) + ln fi2)}.   (11)
In the E step of the EM algorithm, the expected value of the loglikelihood is needed
which requires replacing d by its expectation given the data. This expectation is given by
E(di1|yi) = P(di1 = 1|yi), which equals

P(di1 = 1|yi) = P(di1 = 1)P(yi|di1 = 1) / Σ_{j=1}^{2} P(dij = 1)P(yi|dij = 1)
             = θfi1/(θfi1 + (1 − θ)fi2) = wi1.                        (12)
Evaluation of (12) provides estimates of the expected values or weights, wi1 and 1-wi1.
Once these weights have been calculated, they can be substituted into the log of the
complete-data likelihood which is then maximized in the M step of the EM algorithm
with respect to the unknown parameters in the model.
To examine the M step of the EM algorithm, return to the log of the complete data
likelihood, and substitute for E(di1) to yield
E(log L*M) = Σ_{i=1}^{n} {wi1(ln θ + ln fi1) + (1 − wi1)(ln(1 − θ) + ln fi2)}.   (13)
After substituting for the unobserved regime indicator variable, solving the first order
conditions leads to the following expressions for updating the parameter estimates with
the EM algorithm
β = (X1′diag(w1)X1 + X2′diag(w2)X2)^(−1)(X1′diag(w1)y + X2′diag(w2)y)
σ1² = [(y − X1β)′diag(w1)(y − X1β)] / Σ w1i
σ2² = [(y − X2β)′diag(w2)(y − X2β)] / Σ w2i
θ = (1/n) Σ w1i,                                                      (14)
where w1 and w2 are vectors of the weights given above in (12). Iterations continue until
convergence.
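For concreteness, the E step in (12) and the M step in (14) can be sketched in Python (the computations in this paper use SAS/IML); names and starting values below are invented for the sketch.

```python
import numpy as np

def normal_pdf(y, mu, s):
    """Normal density with mean mu and standard deviation s."""
    return np.exp(-0.5 * ((y - mu) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

def mixture_regression_em(y, X, n_iter=300):
    """EM for a two-component mixture of regressions with common slopes
    and regime-specific intercepts and variances, following (12)-(14).
    Starting values are illustrative."""
    n = len(y)
    X1 = np.column_stack([np.ones(n), np.zeros(n), X])
    X2 = np.column_stack([np.zeros(n), np.ones(n), X])
    # crude start: split the OLS intercept into two separated intercepts
    ols = np.linalg.lstsq(np.column_stack([np.ones(n), X]), y, rcond=None)[0]
    beta = np.concatenate([[ols[0] - 1.0, ols[0] + 1.0], ols[1:]])
    s1 = s2 = np.std(y)
    theta = 0.5
    for _ in range(n_iter):
        # E step: posterior regime weights, eq (12)
        f1 = normal_pdf(y, X1 @ beta, s1)
        f2 = normal_pdf(y, X2 @ beta, s2)
        w1 = theta * f1 / (theta * f1 + (1.0 - theta) * f2)
        w2 = 1.0 - w1
        # M step: weighted least squares and variance updates, eq (14)
        A = X1.T * w1 @ X1 + X2.T * w2 @ X2
        b = X1.T * w1 @ y + X2.T * w2 @ y
        beta = np.linalg.solve(A, b)
        s1 = np.sqrt(np.sum(w1 * (y - X1 @ beta) ** 2) / np.sum(w1))
        s2 = np.sqrt(np.sum(w2 * (y - X2 @ beta) ** 2) / np.sum(w2))
        theta = np.mean(w1)
    return beta, s1, s2, theta
```

Because X1 and X2 share the slope block, a single stacked weighted least-squares solve imposes the cross-regime slope constraint automatically.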
The two EM algorithms described above can be combined into a single EM
algorithm for a censored regression model with an error structure approximated by a
mixture of normals. Following Hartley (1978), our adaptive estimator can be considered
part of a three equation system containing two (constrained) censored regression models
and a choice equation containing only an intercept. Following Caudill (2003), we assume
that the error term in the choice equation is independent of each error term in the
(constrained) censored regression model. This innocuous assumption greatly facilitates
calculation of the two expected values needed for insertion into the EM algorithm
developed here. In particular, our new EM algorithm requires the insertion of the
weighting matrices associated with the mixture algorithm above into the appropriate
places in the EM algorithm for maximum likelihood estimation of a censored regression
model.
The model we wish to estimate is given by
yi* = α1 + Xiδ + ε1i,  yi = max[yi*, 0], with probability θ
yi* = α2 + Xiδ + ε2i,  yi = max[yi*, 0], with probability 1 − θ       (15)
where, again, β=[α1 α2 δ]. Then the observed data likelihood for a typical observation is
ri = θg(X1iβ, σ1) + (1 − θ)g(X2iβ, σ2).                               (16)
The incomplete or observed data loglikelihood function is given by
log LMT = Σ_{i=1}^{n} log[θg(X1iβ, σ1) + (1 − θ)g(X2iβ, σ2)],         (17)
where g is the likelihood associated with a single tobit model as described above in (2).
The complete data likelihood corresponding to (17) contains two latent variables: one
corresponding to the regime membership and one corresponding to the limit observations
in the tobit model. The complete data loglikelihood is given by
log L*MT = Σ_{i=1}^{n} {di1(ln θ + ln fi1(yi*; Xi1β, σ1)) + (1 − di1)(ln(1 − θ) + ln fi2(yi*; Xi2β, σ2))},   (18)
where f again refers to the standard normal density function. Maximization of the
expected value of this complete data loglikelihood requires the insertion into the FOCs of
two kinds of expectations: one for the d values indicating regime membership and
another based on moments of the unobserved y values in the censored normal regression
model. An EM algorithm for the maximization of the likelihood function involves
inserting weighting matrices into the expressions above. In this case there are new
weights given by
w1i = θg(Xi1β, σ1)/ri   and   w2i = (1 − θ)g(Xi2β, σ2)/ri.            (19)
The weights are the posterior probabilities that an observation is associated with regime
one or regime two, respectively. In the case of a censored regression model with a
mixture-of-normals error structure, two kinds of expectations must be inserted into the
first order conditions to maximize the likelihood function: the regime weights given
above in (19) and the conditional moments of the latent variables analogous to those in
(5) and (6). With these four expectations inserted (two in each regime), expressions for
the model parameters at each iteration of this EM algorithm are given by

β = (X1′diag(w1)X1 + X2′diag(w2)X2)^(−1)(X1′diag(w1)ỹ1 + X2′diag(w2)ỹ2)
σ1² = (1/Σ w1i)[Σ₁ w1i(yi − X1iβ)² + Σ₀ w1i(E(y1i*) − X1iβ)² + Σ₀ w1i V(y1i*|di = 0, X1)]
σ2² = (1/Σ w2i)[Σ₁ w2i(yi − X2iβ)² + Σ₀ w2i(E(y2i*) − X2iβ)² + Σ₀ w2i V(y2i*|di = 0, X2)]
θ = Σ_{i=1}^{n} w1i / n,                                              (20)

where ỹj contains yi for the nonlimit observations and E(yji*|di = 0, Xj) for the limit
observations, and Σ₁ and Σ₀ denote sums over the nonlimit and limit observations,
respectively.
The conditional expectations needed, corresponding to each regime, are given by
E(y1i*|di = 0, X1i) = X1iβ − σ1 f((−X1iβ)/σ1)/F((−X1iβ)/σ1)

E(y2i*|di = 0, X2i) = X2iβ − σ2 f((−X2iβ)/σ2)/F((−X2iβ)/σ2)

V(y1i*|di = 0, X1i) = σ1² + X1iβ[σ1 f((−X1iβ)/σ1)/F((−X1iβ)/σ1)] − [σ1 f((−X1iβ)/σ1)/F((−X1iβ)/σ1)]²

V(y2i*|di = 0, X2i) = σ2² + X2iβ[σ2 f((−X2iβ)/σ2)/F((−X2iβ)/σ2)] − [σ2 f((−X2iβ)/σ2)/F((−X2iβ)/σ2)]².   (21)
Iterations of this algorithm continue until convergence.
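The combined algorithm can be sketched as follows in Python (the paper's own computations use SAS/IML); the function, starting values, and iteration count below are illustrative only, not the author's implementation.

```python
import numpy as np
from scipy.stats import norm

def pam_tobit_em(y, X, n_iter=500):
    """Sketch of the combined EM algorithm for the censored regression
    model with a two-component mixture-of-normals error: regime weights
    from (19), conditional moments from (21), updates from (20).
    """
    n = len(y)
    limit = (y <= 0)
    X1 = np.column_stack([np.ones(n), np.zeros(n), X])
    X2 = np.column_stack([np.zeros(n), np.ones(n), X])
    ols = np.linalg.lstsq(np.column_stack([np.ones(n), X]), y, rcond=None)[0]
    beta = np.concatenate([[ols[0] - 1.0, ols[0] + 1.0], ols[1:]])
    sig = [np.std(y), np.std(y)]
    theta = 0.5

    def g(mu, s):
        # observation-level censored-normal likelihood, as in (2)
        return np.where(limit, norm.cdf(-mu / s), norm.pdf(y, mu, s))

    for _ in range(n_iter):
        mu = [X1 @ beta, X2 @ beta]
        g1, g2 = g(mu[0], sig[0]), g(mu[1], sig[1])
        r = theta * g1 + (1.0 - theta) * g2
        w = [theta * g1 / r, (1.0 - theta) * g2 / r]   # posterior weights, eq (19)
        ey, vy = [], []
        for m, s in zip(mu, sig):
            lam = s * norm.pdf(-m / s) / np.clip(norm.cdf(-m / s), 1e-12, None)
            ey.append(np.where(limit, m - lam, y))                    # eq (21)
            vy.append(np.where(limit, s**2 + m * lam - lam**2, 0.0))  # eq (21)
        A = X1.T * w[0] @ X1 + X2.T * w[1] @ X2
        b = X1.T * w[0] @ ey[0] + X2.T * w[1] @ ey[1]
        beta = np.linalg.solve(A, b)                   # eq (20)
        mu = [X1 @ beta, X2 @ beta]
        for j in range(2):
            sig[j] = np.sqrt(np.sum(w[j] * ((ey[j] - mu[j]) ** 2 + vy[j])) / np.sum(w[j]))
        theta = np.mean(w[0])
    return beta, sig, theta
```

Each iteration performs both imputations at once: regime weights from the mixture step and latent-variable moments from the censoring step, exactly the two kinds of expectations described above.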
In order to assess the usefulness of the mixture-of-normals approach to estimating
model parameters in the censored regression model, a Monte Carlo experiment is
conducted. The two goals of the Monte Carlo study are: 1) to provide evidence of the
usefulness of the estimation method when the true errors are not normally distributed and,
2) to provide evidence on the small sample properties of the mixture-of-normals
approximation.
Monte Carlo Simulations
Several studies have conducted Monte Carlo experiments to compare small
sample properties of various estimators for the censored regression model. We wish to
examine the small sample properties of our adaptive estimator and we are also interested
in how well several common error structures can be approximated by a mixture of two
normal distributions. To examine these issues, we conduct a Monte Carlo study along the
lines of Moon (1989) and Paarsch (1984). Following Paarsch, the censored regression
model is given by
yi* = a + bxi + ui,   i = 1, ..., N,                                  (22)

where y* is unobserved. The observed variable, y, is given by

yi = max{0, yi*}.                                                     (23)
In the Monte Carlo experiment two sample sizes are examined: N=50 and N=200. The
error distributions considered are: normal, Cauchy, Laplace, and lognormal. In each case
the mean of the error distribution is zero and the variance is one hundred; the scale
parameter in the Cauchy is 10. All programs and random number generation in this paper
are accomplished using the IML language in SAS. The true value of the parameter of
interest, the slope parameter, is 1.0. The intercept is allowed to vary to change the degree
of censoring. We consider censoring levels of twenty-five and fifty percent. Each
experiment consisted of 200 random trials.
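A sketch of how such error draws might be generated (in Python rather than the SAS/IML used in the paper); the scalings follow the design described above, and the function name is invented for the sketch.

```python
import numpy as np

def draw_errors(dist, n, rng):
    """Mean-zero error draws matching the designs above: variance 100
    for the normal, Laplace, and lognormal; scale 10 for the Cauchy
    (which has no finite moments)."""
    if dist == "normal":
        return rng.normal(0.0, 10.0, n)
    if dist == "cauchy":
        return 10.0 * rng.standard_cauchy(n)
    if dist == "laplace":
        return rng.laplace(0.0, np.sqrt(50.0), n)      # Var of Laplace(0,b) is 2b^2
    if dist == "lognormal":
        z = rng.lognormal(0.0, 1.0, n) - np.exp(0.5)   # center at zero
        return 10.0 * z / np.sqrt((np.e - 1.0) * np.e) # rescale to variance 100
    raise ValueError(dist)
```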
For each case we considered three estimators: the usual tobit maximum
likelihood estimator (based on the normality assumption), the censored least absolute
deviations estimator (CLAD), and the partially adaptive mixture estimator (PAM).
Maximum likelihood is used to estimate parameters in the tobit and PAM models, and a
grid search is used to obtain the CLAD estimates.
The results from estimating these models are given in Tables 1A through 1D. For
each simulation we report the mean, median and root mean square error for each
estimator. Table 1A gives the results for a sample size of 50 with 25% censoring.
Considering the mean, the PAM estimator is closer to the mean for the Cauchy, Laplace,
and lognormal, but the tobit is closer if the error distribution is normal. If the median is
considered, the CLAD estimator performs best when the distribution is Cauchy, Laplace,
or lognormal. The PAM estimator performs best when the errors are normally
distributed. If one considers the all-important RMSE criterion, the PAM estimator performs
best for every distribution except the normal, surpassed in that case by the usual tobit
estimator.
Table 1B contains the results for a sample size of 50 but with 50% censoring.
When comparisons are based on the mean, the PAM estimator performs best when the
distribution is Cauchy or lognormal, the usual tobit estimator when the distribution is
Laplace and the CLAD estimator when the distribution is normal. When comparisons are
made based on the median, the PAM estimator performs best when the distribution is
Cauchy or normal. The usual tobit estimator performs best when the distribution is
Laplace, and the CLAD estimator performs best when the distribution is lognormal. The
results are mixed when the all-important RMSE criterion is examined. The PAM
estimator has smaller RMSE when the distribution is lognormal or normal, although in
the normal case the improvement over the usual tobit model is very slight. In the case of
the Laplace distribution, the usual tobit estimator performs best but when the distribution
is Cauchy, the CLAD estimator performs best.
The effects of a larger sample size are revealed in Tables 1C and 1D. Table 1C
contains the simulation results for a sample of size 200 with 25% censoring. When using
the mean as a basis for comparison, the usual tobit estimator performs best when the
distribution is Cauchy or normal, while the PAM estimator performs best when the
distribution is Laplace or lognormal. When comparisons are made using the median, the
PAM estimator performs best in every case. When the RMSE is used, the PAM
estimator performs best for every distribution but the Laplace, for which the CLAD
estimator exhibits a slight improvement.
Table 1D contains the simulation results for a sample size of 200 and 50%
censoring. When comparing on the basis of the mean, the PAM estimator performs best
for all distributions except the Laplace for which the usual tobit estimator holds a very
slight advantage. When comparing on the basis of the median, the CLAD estimator
performs best for the Cauchy and the normal, the PAM estimator performs best for the
lognormal, and the usual tobit estimator performs best for the Laplace. When the RMSEs
are compared, the PAM estimator is best in every case, although the usual tobit model is
tied for best in the case of the Laplace distribution.
The results do suggest that the PAM estimator is a useful estimator under a
variety of circumstances. In the sixteen cases presented in Tables 1A to 1D, the PAM
estimator has the best performance in terms of RMSE in eleven cases. When the sample
size is 200, the PAM estimator performs best according to RMSE in seven of eight cases.
Clearly, the PAM estimator has desirable small sample properties that tend to improve
with increases in the sample size. The PAM estimator is easy to calculate by comparison
to the CLAD estimator, for example. Having established that the partially adaptive
estimator can usefully mimic several other distributions, we now discuss the application
to the Mroz data on the annual hours worked of married women.
Application to the Mroz Data
Mroz (1987) investigated the effects of several independent variables on the
hours worked of married women. The data set contains observations on 753 married
women. Of these 753, 428 worked for a wage outside the home and the remainder, 325,
worked zero hours. The independent variables used in the model are nonwife income
(nwifeinc), wife’s education (educ), wife’s labor force experience (exper), wife’s labor
force experience squared (exper*exper), number of children less than six years of age
(kidslt6), and the number of children between the ages of six and eighteen (kidsage6).
The dependent variable is the wife’s annual hours of work.
The estimation results are contained in Table 2. For purposes of comparison, the
usual tobit model is estimated, along with OLS. The results of estimating the model by
OLS are presented in column 2 of the table. The coefficients of all variables except
nwifeinc and kidsage6 are statistically significant at the usual levels. The model R2 is
0.266 and the estimate of σ is 750.179. The tobit estimation results are given in column 3
of Table 2. The coefficients of all of the explanatory variables except kidsage6 are
statistically significant at the usual levels. Compared to the OLS estimation results, the
tobit model contains a statistically significant coefficient for nwifeinc. The other
coefficients in the two models are not directly comparable. To make them comparable,
the tobit estimates must be multiplied by the average probability of a nonlimit
observation or
(1/n) Σ_{i=1}^{n} Fi.                                                 (24)
These adjustments are made and the results reported in brackets in column 3 of Table 2.
Aside from the significance of nwifeinc, the biggest difference in these average marginal
effects between OLS and tobit is in the effect of a year of education on hours worked.
The OLS results indicate that an additional year of education leads to an increase of about
28 hours worked but the tobit results indicate a year of education increases hours worked
by about 47, exceeding the OLS estimate by about sixty-eight percent. Also worthy of
note is that the tobit estimate of σ is 1122. Of course, these tobit estimates are based on
an underlying assumption of normally distributed errors.
In order to relax this assumption, we estimate the PAM model introduced here.
The estimation results for the PAM model are contained in column 4 of Table 2. The
estimation does find two distinct intercepts associated with two regimes. The value of
arbitrarily designated intercept1 is 379.58 and the value of arbitrarily designated
intercept2 is 1366.37. The mixing weight is estimated to be .782 for regime 1 and, by
implication, 0.218 for regime 2. This evidence lends support to the PAM model and the
underlying nonnormality of the error terms. More evidence in favor of the PAM model
can be found by examining the graphical comparison of error structures for the tobit and
the partially adaptive model given in Figure 1. Compared to the tobit density, the PAM
density is shifted to the right and has a thicker left tail. The figure suggests an underlying
nonnormal error structure, but a statistical test is required to provide definitive evidence
of nonnormality.
Testing the PAM model against the usual tobit is possible using a modified
likelihood ratio test. The likelihood ratio test for testing one versus two components in a
mixture does not follow the usual chi-square distribution but is approximately a
chi-square with two degrees of freedom. For α = 0.05, the critical value for a chi-square with
2 degrees of freedom is 5.99. However, Thode, Finch, and Mendell (1988) suggest
modifying this critical value by the following
CV = 6.08 + 4.51/√n,                                                  (25)
(25)
where n is the sample size. We use this modified critical value to test for departures from
normality in the tobit model. The chi-square
statistic for testing the tobit model against the PAM model is 13.60, which exceeds the
modified critical value of 6.24. We conclude that the statistical evidence favors the PAM
model over the usual tobit.
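The arithmetic of this test can be checked directly; the small Python sketch below (illustrative, not code from the paper) reproduces the modified critical value for the Mroz sample size.

```python
import math

# Thode, Finch, and Mendell (1988) modified critical value for the
# one- versus two-component LR test, CV = 6.08 + 4.51/sqrt(n); with
# n = 753 this reproduces the 6.24 used in the text.
def modified_lr_test(lr_stat, n):
    cv = 6.08 + 4.51 / math.sqrt(n)
    return cv, lr_stat > cv

cv, reject = modified_lr_test(13.60, 753)   # LR statistic from the text
```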
As far as statistical significance is concerned, the PAM results mirror, not the
tobit results, but the OLS results. The coefficients of all variables except nwifeinc and
kidsage6 are statistically significant at the usual levels.
In order to compare the average marginal effects, the PAM coefficients are
adjusted according to the result in (24) above. In general, the average marginal effects
tend to fall between their OLS and tobit counterparts. The PAM estimates are, in some
cases, notably different from the usual tobit results. First, the coefficient of nwifeinc is
not statistically significant when the PAM model is estimated. Second, the average
marginal effect of an additional year of education (educ) falls from 47.5 in the tobit
model to 37.9 when the PAM model is estimated. In conclusion, in the case of the Mroz
data, the empirical evidence indicates a departure from normality in the error structure
which leads to a substantial change in some of the estimated marginal effects.
Conclusions
This paper introduces a partially adaptive estimator for the censored regression
model and presents an EM algorithm for estimation of the model. A Monte Carlo
experiment verifies that the estimator is useful in small samples and robust to underlying
distributional assumptions. The partially adaptive estimator has several virtues: 1) the
normal distribution and usual tobit model are special cases, 2) the estimation of the
model is much simpler, using the EM algorithm, than many of the other robust estimators
of the censored regression model, and 3) a mixture of two normal distributions is known
to be a very flexible form, able to approximate many different error structures.
The partially adaptive model is applied to the Mroz (1987) data on the yearly
hours worked of married women. A statistical test rejects the normality assumption
underlying the usual tobit model in favor of an error structure described by the partially
adaptive estimator. The partially adaptive estimation results differ from the usual tobit
estimation results in two important respects. The tobit model indicates a significant
effect of nonwife income on hours worked, but the effect is not statistically significant in
the partially adaptive model. Also, the usual tobit model estimates the effect of an
additional year of education on hours worked to be about twenty-five percent higher than
does the partially adaptive model. The application to the Mroz data demonstrates that the restrictive
normality assumption in the usual tobit can have substantial consequences for parameter
estimates and marginal effects. The restrictive normality assumption is easily relaxed
using the partially adaptive estimator presented here.
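If the test referred to above is the likelihood ratio test (the reason the simulated critical values of Thode, Finch, and Mendell (1988) appear in the references), the statistic follows directly from the log-likelihood values reported in Table 2:

```python
# Likelihood ratio statistic for the tobit (restricted) model vs. the
# partially adaptive mixture model (unrestricted), using the Table 2
# log-likelihood values.
ll_tobit = -3819.09
ll_pam = -3812.29
lr = 2 * (ll_pam - ll_tobit)
print(round(lr, 2))  # 13.6
```

Because the single-component (tobit) model lies on the boundary of the mixture parameter space, the usual chi-square critical values do not apply, which is why simulated percentage points are needed.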
Table 1A
Comparison of estimators in 200 random trials
Sample size=50, 25% censoring

DISTRIBUTION          Mean      Median      RMSE
Cauchy
  Tobit              4.877       1.371     16.677
  LAD                1.167       1.040      0.793
  PAM                1.018       0.961      0.686
Laplace
  Tobit              1.062       1.054      0.265
  LAD                1.090       1.010      0.397
  PAM                1.031       1.030      0.274
Lognormal
  Tobit              1.288       1.277      0.472
  LAD                1.049       1.001      0.241
  PAM                1.032       1.024      0.190
Normal
  Tobit              1.002       0.996      0.251
  LAD                1.149       1.068      0.543
  PAM                1.030       1.001      0.301

Table 1B
Comparison of estimators in 200 random trials
Sample size=50, 50% censoring

DISTRIBUTION          Mean      Median      RMSE
Cauchy
  Tobit             16.412       2.248     64.888
  LAD                2.250       1.277      2.404
  PAM                1.708       1.223      3.545
Laplace
  Tobit              0.992       0.999      0.279
  LAD                1.491       1.216      1.239
  PAM                1.036       1.033      0.340
Lognormal
  Tobit              1.212       1.135      0.498
  LAD                1.364       1.014      1.178
  PAM                0.994       0.905      0.333
Normal
  Tobit              1.021       1.018      0.069
  LAD                1.006       1.009      0.099
  PAM                0.993       0.994      0.068
Table 1C
Comparison of estimators in 200 random trials
Sample size=200, 25% censoring

DISTRIBUTION          Mean      Median      RMSE
Cauchy
  Tobit              9.981       2.397     32.693
  LAD                1.045       1.060      0.210
  PAM                1.067       1.030      0.229
Laplace
  Tobit              1.039       1.036      0.126
  LAD                1.018       1.018      0.131
  PAM                1.015       1.015      0.113
Lognormal
  Tobit              1.269       1.265      0.315
  LAD                1.017       1.102      0.099
  PAM                1.009       1.008      0.070
Normal
  Tobit              0.999       0.997      0.014
  LAD                0.999       0.995      0.018
  PAM                0.997       0.998      0.014

Table 1D
Comparison of estimators in 200 random trials
Sample size=200, 50% censoring

DISTRIBUTION          Mean      Median      RMSE
Cauchy
  Tobit              6.114       3.794      9.386
  LAD                1.577       1.156      1.371
  PAM                1.196       1.169      0.372
Laplace
  Tobit              1.012       1.011      0.151
  LAD                1.104       1.034      0.504
  PAM                0.987       0.975      0.151
Lognormal
  Tobit              1.356       1.270      0.510
  LAD                1.036       0.973      0.290
  PAM                0.990       0.974      0.146
Normal
  Tobit              1.024       1.019      0.042
  LAD                1.004       0.999      0.039
  PAM                1.001       0.997      0.034
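The Monte Carlo designs in Tables 1A-1D can be reproduced in outline as follows. The exact design (true coefficient values, regressor distribution, censoring point) is not given in this excerpt, so the concrete choices below, including the function name `draw_sample`, are illustrative assumptions; only the four error distributions and the censoring fractions come from the tables.

```python
import numpy as np

def draw_sample(dist, n, rng, beta=1.0, censor_frac=0.25):
    """Generate one censored-regression sample with the given error
    distribution; the design choices here are illustrative."""
    x = rng.normal(size=n)
    if dist == "cauchy":
        e = rng.standard_cauchy(n)
    elif dist == "laplace":
        e = rng.laplace(0.0, 1.0, n)
    elif dist == "lognormal":
        e = rng.lognormal(0.0, 1.0, n) - np.exp(0.5)   # centered errors
    else:                                              # "normal"
        e = rng.normal(size=n)
    y_star = beta * x + e                  # latent dependent variable
    c = np.quantile(y_star, censor_frac)   # censoring point for the target rate
    return x, np.maximum(y_star, c), c     # observed y is censored from below

rng = np.random.default_rng(1)
x, y, c = draw_sample("laplace", 200, rng)
print(round((y == c).mean(), 2))  # 0.25
```

In each of the 200 replications the tobit, LAD, and PAM estimators would then be fit to (y, x) and the slope estimates accumulated to form the means, medians, and RMSEs reported in the tables.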
Table 2
Estimation Results for the Mroz Data

VARIABLE              OLS           Tobit          PAM
NWIFEINC             -3.447        -8.814         -4.487
                     (1.35)        (1.98)         (1.15)
                                  [-5.191]       [-2.759]
EDUC                 28.761        80.646         61.753
                     (2.22)        (3.74)         (3.19)
                                  [47.500]       [37.978]
EXPER                65.673       131.564        136.115
                     (6.59)        (7.61)         (8.85)
                                  [77.491]       [83.711]
EXPER*EXPER          -0.700        -1.864         -2.100
                     (2.16)        (3.47)         (4.31)
                                  [-1.098]       [-1.292]
AGE                 -30.512       -54.405        -44.499
                     (6.99)        (7.33)         (6.14)
                                 [-32.045]      [-27.367]
KIDSLT6            -422.090      -894.022       -875.898
                     (7.51)        (7.99)         (8.26)
                                [-526.579]     [-538.677]
KIDSAGE6            -32.779       -16.218          1.091
                     (1.41)        (0.42)         (0.03)
                                  [9.552]        [0.671]
Intercept1         1330.484       965.305        379.581
                     (4.91)        (2.16)         (0.89)
Intercept2          -------       -------       1366.369
                                                  (3.27)
σ1                  750.179      1122.022       1281.484
σ2                  -------       -------        471.170
π                   -------       -------          0.782
                                                  (8.41)
R2                    0.266       -------        -------
Loglikelihood       -------      -3819.09       -3812.29

Numbers in parentheses are absolute values of t-ratios; numbers in brackets are average marginal effects.
Figure 1
Densities of Tobit and Partially Adaptive Estimator Evaluated at Sample Means

[Figure: the estimated error densities f(x) implied by the tobit model and by the partially adaptive estimator, evaluated at the sample means, plotted over the range -3000 to 3000.]
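The two curves in Figure 1 can be reconstructed from the Table 2 estimates: the tobit error density is a single normal with standard deviation 1122.022, while the partially adaptive density mixes two normals with weight 0.782 on the σ1 = 1281.484 component. The centering of each density at the sample means is not reported in this excerpt, so the sketch below centers both densities at zero, shifting the mixture components by their intercepts relative to the overall mixture mean (an assumed centering):

```python
import numpy as np

def normal_pdf(z, m, s):
    return np.exp(-0.5 * ((z - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

# Parameter estimates from Table 2
pi, s1, s2 = 0.782, 1281.484, 471.170
int1, int2 = 379.581, 1366.369            # component intercepts
grid = np.linspace(-3000.0, 3000.0, 601)  # range shown in Figure 1

# Mixture density, with each component shifted by its intercept
# relative to the mixture's overall mean
mean_mix = pi * int1 + (1 - pi) * int2
f_pam = (pi * normal_pdf(grid, int1 - mean_mix, s1)
         + (1 - pi) * normal_pdf(grid, int2 - mean_mix, s2))

# Tobit density: a single normal with sigma = 1122.022
f_tobit = normal_pdf(grid, 0.0, 1122.022)

print(f_tobit.max(), f_pam.max())
```

The peak heights of both reconstructed densities fall near the top of the f(x) axis shown in Figure 1 (a bit under 0.0004).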
References
Amemiya, T. (1985). Advanced Econometrics. Harvard University Press: Cambridge.
Bartolucci, F., and Scaccia, L. (2004). The Use of Mixtures for Dealing with Non-normal
Regression Errors. Submitted to Computational Statistics and Data Analysis.
Beran, R. (1974). Asymptotically Efficient Adaptive Rank Estimates in Location Models.
Annals of Statistics, 2, 63-74.
Bickel, P. J. (1982). On Adaptive Estimation. Annals of Statistics, 10, 647-671.
Boyer, B.H., McDonald, J.B., and Newey, W.K. (2003). A Comparison of Partially
Adaptive and Reweighted Least Squares Estimation. Econometric Reviews, 22,
115-134.
Butler, R.J., McDonald, J.B., Nelson, R.D., and White, S.B. (1990). Robust and Partially
Adaptive Estimation of Regression Models. Review of Economics and Statistics,
2, 321-327.
Caudill, S.B. (2003). Estimating A Mixture of Stochastic Frontier Regression Models Via
the EM Algorithm: A Multiproduct Cost Function Application. Empirical
Economics, 28, 581- 598.
Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum Likelihood Estimation
from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical
Society, Series B, 39, 1-38.
Geweke, J., and Keane, M. (1997). Mixture of Normals Probit. Federal Reserve Bank of
Minneapolis, Research Staff Report 237.
Hartley, M. (1978). Comment (on “Estimating Mixtures of Normal Distributions and
Switching Regressions,” by Quandt and Ramsey). Journal of the American
Statistical Association 73, 738-741.
Li, Q. and Stengos, T. (1994). Adaptive Estimation in the Panel Data Error Component
Model with Heteroskedasticity of Unknown Form. International Economic
Review, 35, 981-1000.
Manski, C. F. (1984). Adaptive Estimation of Non-linear Regression Models.
Econometric Reviews, 3, 145-194.
Marron, J.S. and Wand, M.P. (1992). Exact Mean Integrated Squared Error. Annals of
Statistics, 20, 712-736.
McDonald, J. (1996). An Application and Comparison of Some Flexible Parametric and
Semi-parametric Qualitative Response Models. Economics Letters, 53, 145-152.
McDonald, J.B. and Xu, Y.J. (1996). A Comparison of Semi-parametric and Partially
Adaptive Estimators of the Censored Regression Model with Possibly Skewed
and Leptokurtic Error Distributions. Economics Letters, 51, 153-159.
McDonald, J.B. and Moffitt, R. (1980). The Uses of Tobit Analysis. The Review of
Economics and Statistics, 62 (2), 318-321.
McDonald, J. B. and Newey, W.K. (1988). Partially Adaptive Estimation of Regression
Models via the Generalized T Distribution. Econometric Theory, 4, 428-457.
McDonald, J.B. and White, S.B. (1993). A Comparison of Some Robust, Adaptive, and
Partially Adaptive Estimators of Regression Models. Econometric Reviews, 12,
103-124.
Moon, C. (1989). A Monte Carlo Comparison of Semiparametric Tobit Estimators.
Journal of Applied Econometrics, 4, 361-382.
Mroz, T.A. (1987). The Sensitivity of an Empirical Model of Married Women's Hours of
Work to Economic and Statistical Assumptions. Econometrica, 55, 765-799.
Paarsch, H. (1984). A Monte Carlo Comparison of Estimators for Censored Regression
Models. Journal of Econometrics, 24, 197-213.
Phillips, R. F. (1994). Partially Adaptive Estimation via a Normal Mixture. Journal of
Econometrics, 64, 123-144.
Powell, J. L. (1984). Least Absolute Deviations Estimation for the Censored Regression
Model. Journal of Econometrics, 25, 303-325.
Powell, J. L. (1986). Symmetrically Trimmed Least Squares Estimation for Tobit
Models. Econometrica, 54, 1435-1460.
Steigerwald, D. G. (1992). On the Finite Sample Behavior of Adaptive Estimators.
Journal of Econometrics, 54, 371-400.
Stein, C. (1956). Efficient Nonparametric Testing and Estimation. Proceedings of the Third
Berkeley Symposium on Mathematical Statistics and Probability, 1, 187-195.
Stone, C. (1975). Adaptive Maximum Likelihood Estimators of a Location Parameter.
Annals of Statistics, 3, 267-284.
Thode, H., Finch, S.J., and Mendell, N.R. (1988). Simulated Percentage Points for the
Null Distribution of the Likelihood Ratio Test for a Mixture of Two Normals.
Biometrics, 44, 1195-1201.
Wu, X. and Stengos, T. (2005). Partially Adaptive Estimation via the Maximum Entropy
Densities. Econometrics Journal, 9, 1-15.