A PARTIALLY ADAPTIVE ESTIMATOR FOR THE CENSORED REGRESSION MODEL BASED ON A MIXTURE OF NORMAL DISTRIBUTIONS

Steven B. Caudill
Department of Economics
216 Lowder Business Building
Auburn University, AL 36849-5242
Email: [email protected]
Fax: (334) 844-4615

March 15, 2007

Abstract: The goal of this paper is to introduce a partially adaptive estimator for the censored regression model based on an error structure described by a mixture of two normal distributions. The model we introduce is easily estimated by maximum likelihood using an EM algorithm adapted from the work of Bartolucci and Scaccia (2004). A Monte Carlo study is conducted to examine the small sample properties of this estimator compared to some common alternatives for the estimation of a censored regression model, namely the usual tobit model and the CLAD estimator of Powell (1984). Our partially adaptive estimator performs well. The partially adaptive estimator is also applied to the Mroz (1987) data on wives' hours worked. The empirical evidence supports the partially adaptive estimator over the usual tobit model.

Keywords: partially adaptive estimator, censored regression model
JEL: C240

A Partially Adaptive Estimator for the Censored Regression Model Based on a Mixture of Normal Distributions

Introduction

Estimation of the censored normal regression model, or tobit, has become quite common in the literature. However, the usual tobit model is based on normally distributed errors, and if the errors are not normally distributed the maximum likelihood estimator is inconsistent. This lack of consistency has led researchers to develop estimators less sensitive to the normality assumption. One solution to the problem is the development of fully adaptive estimators and quasi-maximum likelihood, or partially adaptive, estimators in which the unknown underlying error distribution is estimated along with the parameters. A fully adaptive estimator is an estimator that can be used when the underlying distribution is unknown and is as efficient (asymptotically) as an estimator developed with knowledge of the distribution. The idea of an adaptive estimator was developed by Stein (1956), Beran (1974), and Stone (1975) and extended by Bickel (1982) and Manski (1984). Bickel (1982) gives the conditions under which a fully adaptive estimator has the same asymptotic variance that would obtain if one knew the true error distribution. The fully adaptive estimator is usually based on a nonparametric estimate of the unknown distribution, whereas a partially adaptive estimator is based on a parametric approximation to the true unknown error distribution.

Partially adaptive estimators may have some advantages over fully adaptive estimators. For example, Bickel (1982) and McDonald and Newey (1988) suggest that a partially adaptive estimator might be more practical when the sample size is small. They also suggest that a partially adaptive estimator with a small number of nuisance parameters may outperform a fully adaptive estimator in small samples. Wu and Stengos (2005) point out other advantages of partially adaptive estimators. Unlike fully adaptive estimators, partially adaptive estimators do not depend critically on a bandwidth choice. In addition, partially adaptive estimation may encounter fewer computational difficulties than fully adaptive estimation.
Much of the research on partially adaptive estimators, at least in the area of microeconometrics, has dealt with three issues: 1) applications of partially adaptive estimators to different econometric models, 2) examination of the use and effects of different flexible parametric error structures on performance, and 3) Monte Carlo evaluation of the performance of partially adaptive estimators, particularly in small samples. In the case of linear regression models, partially adaptive estimators based on the generalized t-distribution have been developed by several authors, including McDonald and Newey (1988), Butler, McDonald, Nelson, and White (1990), and McDonald and White (1993). Partially adaptive estimators of the linear regression model based on a mixture-of-normals error structure are developed by Phillips (1991), Phillips (1994), and Bartolucci and Scaccia (2004). A partially adaptive regression estimator based on the maximum entropy distribution is developed by Wu and Stengos (2005). For dichotomous choice models, McDonald (1996) develops a partially adaptive estimator based on the generalized t-distribution, while Geweke and Keane (1997) use a mixture of normal distributions to approximate the unknown error structure. For the censored regression model, McDonald and Xu (1996) develop a partially adaptive estimator based on the generalized t-distribution.

The goal of this paper is to introduce a partially adaptive estimator for the censored regression model based on an error structure described by a location-scale mixture of normal distributions. This estimator is appealing for several reasons. First, the estimator includes the normal distribution as a special case, so the usual tobit model is embedded in the formulation. Second, estimation of the model, via the EM algorithm, is much simpler than for many of the other robust estimators of the censored regression model. Third, a mixture of normal distributions is known to be a very flexible form, able to approximate many different error structures (see, for example, Marron and Wand (1992)). The model we introduce is easily estimated by maximum likelihood using an EM algorithm, presented below, that combines an EM algorithm for the estimation of a censored normal regression model from Amemiya (1985) with the EM algorithm for the estimation of a regression model with a mixture-of-normals disturbance term of Bartolucci and Scaccia (2004). A Monte Carlo study is conducted to examine the small sample properties of our estimator compared to some common alternatives for the estimation of a censored regression model. In particular, we examine the usual tobit model and the CLAD estimator of Powell (1984) and find that the partially adaptive estimator performs well. Finally, the estimator is applied to the Mroz (1987) data on wives' hours worked. The empirical evidence supports the partially adaptive estimator over the usual tobit model.

An EM Algorithm

This section provides the details of maximum likelihood estimation, via an expectation-maximization (EM) algorithm, of a censored regression model that uses a mixture of normal distributions to approximate the unknown error structure. The EM algorithm has proven to be an extremely useful algorithm for maximum likelihood estimation in a variety of complicated problems; it was developed for use in missing data problems by Dempster, Laird, and Rubin (1977).
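To fix ideas before turning to the specific algorithms, the generic EM iteration pattern used throughout this section can be sketched as follows. The sketch is ours, written in Python for concreteness (the computations reported later in the paper were carried out in SAS IML), and the function arguments are illustrative rather than the paper's notation.

```python
# A generic sketch of the EM iteration: alternate an E step (compute expected
# complete-data quantities given current parameters) and an M step (maximize
# the expected complete-data loglikelihood) until the parameters settle down.
import numpy as np

def em(y, X, params, e_step, m_step, tol=1e-8, max_iter=1000):
    for _ in range(max_iter):
        expectations = e_step(y, X, params)        # E step
        new_params = m_step(y, X, expectations)    # M step
        if all(np.allclose(p, q, atol=tol) for p, q in zip(params, new_params)):
            return new_params
        params = new_params
    return params
```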
The algorithm presented here combines the EM algorithm for a censored normal regression model given by Amemiya (1985) with the EM algorithm of Bartolucci and Scaccia (2004) for partially adaptive estimation of a regression model with a mixture-of-normals error structure. Bartolucci and Scaccia develop a partially adaptive estimator for the linear regression model based on a mixture-of-normals error structure. Their approach allows the intercepts to differ between regimes by including dummy variables associated with each regime in the data matrix. In this way, each component of the mixture model has its own intercept and variance, but the slope coefficients are equal across regimes. We extend the approach of Bartolucci and Scaccia to develop a partially adaptive estimator of the censored regression model. Like Bartolucci and Scaccia, we allow the intercepts and variances to differ between regimes while holding the slope coefficients constant. The case of a mixture of two normal distributions is considered and used in all subsequent simulations and estimations, but the model is easily extended to a mixture of more than two components.

We begin with the usual tobit, or censored normal, regression model. In the latent variable framework the model is given by

y_i^* = X_i \beta + \varepsilon_i, \qquad y_i = \max[y_i^*, 0], \qquad (1)

where y^* is the latent dependent variable, y is the observed dependent variable, X_i is a vector of exogenous variables, \beta is a vector of parameters to be estimated, and \varepsilon_i is an iid N(0, \sigma^2) error term. Define the dummy variable I_i such that I_i = 1 if y_i^* \geq 0 and I_i = 0 if y_i = 0. For each observation, the likelihood is a mixture of a density function and a probability. We denote this likelihood by g, where

g_i = I_i \, \sigma^{-1} f\!\left(\frac{y_i - X_i\beta}{\sigma}\right) + (1 - I_i)\left[1 - F\!\left(\frac{X_i\beta}{\sigma}\right)\right]. \qquad (2)

In the usual normal case, f and F are the density and distribution functions of a standard normal random variable, respectively. The observed, or incomplete, loglikelihood function for the tobit model is the sum of the logarithms of the terms in (2), or

\log L_T = \sum_{i=1}^{n} \log g_i. \qquad (3)

Maximization of the likelihood function for the tobit is routine, with several algorithms available and several software packages currently in use to accomplish this task. We focus on maximizing the tobit likelihood using the expectation-maximization, or EM, algorithm of Dempster, Laird, and Rubin (1977). The EM algorithm is easy to implement in situations where there are "missing" data, and the algorithm is guaranteed to achieve at least a local maximum. The tobit model is a classic case of missing data because of the presence of the limit observations: if the true values of the limit observations were known, OLS could be applied. In the expectation or "E" step of the EM algorithm these missing values are replaced by their conditional expectations given the current parameter values and the data. The maximization or "M" step of the EM algorithm then usually reduces to evaluating the resulting expressions by OLS or WLS to update the likelihood.

Following Amemiya, we present an EM algorithm for the estimation of the tobit model. The EM algorithm involves the maximization of the expected value of the loglikelihood function based on the density function of the latent variable. The complete-data loglikelihood function is given by

\log L_T^* = \sum_{i=1}^{n} \log\left[\sigma^{-1} f\!\left(\frac{y_i^* - X_i\beta}{\sigma}\right)\right], \qquad (4)

where f represents the standard normal density function and y^* is the latent variable. Note that if y^* were observed for all observations, maximization of (4) would reduce to OLS.
This is not possible because not all values of y^* are observed, so the EM algorithm instead maximizes the expected value of the loglikelihood function, which requires replacing y_i^* by its expected value, and using its conditional variance, given the data and the current parameter values. Following Amemiya (1985), assume the sample is reordered so that the first n_1 observations are nonlimit observations and the remaining n - n_1 observations are limit observations. Obtaining the first order conditions for the maximization of the expected value of (4) and solving yields the following expressions for updating the parameter values at each iteration:

\beta = (X'X)^{-1} X' \hat{y}^*,
\sigma^2 = n^{-1} \sum_{i=1}^{n} \Big[ I_i (y_i - X_i\beta)^2 + (1 - I_i)\big(\hat{y}_i^* - X_i\beta\big)^2 + (1 - I_i)\, V(y_i^* \mid I_i = 0) \Big], \qquad (5)

where n is the sample size and \hat{y}^* is the vector with elements \hat{y}_i^* = I_i y_i + (1 - I_i) E(y_i^* \mid I_i = 0). Evaluation of the expressions in (5) requires the conditional expected values and conditional variances of the "missing" limit observations. These expectations are given by

E(y_i^* \mid I_i = 0) = X_i\beta - \frac{\sigma f_{i1}}{F_{i1}}, \qquad V(y_i^* \mid I_i = 0) = \sigma^2 + X_i\beta\,\frac{\sigma f_{i1}}{F_{i1}} - \left(\frac{\sigma f_{i1}}{F_{i1}}\right)^2, \qquad (6)

where f_{i1} and F_{i1} are the density function and distribution function of the standard normal evaluated at -X_i\beta/\sigma, respectively. These values are inserted into the expressions in (5) in the "M" step of the algorithm, and the process is repeated until convergence.
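As an illustration, the tobit EM iteration in (5) and (6) can be sketched as follows. This is a minimal sketch, assuming censoring at zero and a design matrix X that includes a column of ones; the paper's own computations were performed in SAS IML, so the Python below is ours, not the paper's code.

```python
# EM for the censored normal (tobit) regression model, eqs. (5)-(6).
import numpy as np
from scipy.stats import norm

def tobit_em(y, X, beta, sigma2, n_iter=200):
    limit = (y <= 0)                 # I_i = 0 marks the limit observations
    Xl = X[limit]
    for _ in range(n_iter):
        sigma = np.sqrt(sigma2)
        z = -Xl @ beta / sigma       # argument of f_{i1} and F_{i1} in (6)
        lam = sigma * norm.pdf(z) / norm.cdf(z)
        # E step: conditional mean and variance of latent y* for limit obs, eq. (6)
        ey = Xl @ beta - lam
        vy = sigma2 + (Xl @ beta) * lam - lam ** 2
        ystar = np.asarray(y, dtype=float).copy()
        ystar[limit] = ey
        # M step: OLS on the completed data plus the variance correction, eq. (5)
        beta = np.linalg.solve(X.T @ X, X.T @ ystar)
        sigma2 = (((ystar - X @ beta) ** 2).sum() + vy.sum()) / len(y)
    return beta, sigma2
```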
We wish to combine this EM algorithm with the EM algorithm developed by Bartolucci and Scaccia (2004) to estimate a regression model with an unknown error structure approximated by a mixture of normals. In their approach, Bartolucci and Scaccia estimate a mixture of two regressions with normal errors with constraints imposed on the regression parameters. In particular, Bartolucci and Scaccia allow the intercepts and variances to differ between regimes, but the slope coefficients are constrained to be equal. The model developed by Bartolucci and Scaccia is essentially the mixture model developed by Quandt (1988) with cross-regime parameter constraints imposed. Following Quandt (1988), we illustrate the EM algorithm for the case of a mixture of two normal regressions (or switching regressions). The model is given by

y_i = \alpha_1 + X_i\delta + \varepsilon_{1i} \text{ with probability } \theta \quad (\text{regime } 1)
y_i = \alpha_2 + X_i\delta + \varepsilon_{2i} \text{ with probability } 1 - \theta \quad (\text{regime } 2), \qquad (7)

where \varepsilon_{1i} and \varepsilon_{2i} are mutually independent, iid normally distributed errors with zero means and variances \sigma_1^2 and \sigma_2^2, respectively. Bartolucci and Scaccia constrain the slope parameters \delta to be equal across regimes. Let the regression parameter vector be denoted \beta = [\alpha_1 \;\; \alpha_2 \;\; \delta]. Following Bartolucci and Scaccia, let \mathbf{1} be a column vector of ones of dimension n, let \mathbf{0} be a column vector of zeros of dimension n, and define two data matrices

X_1 = [\mathbf{1} \;\; \mathbf{0} \;\; X] \quad \text{and} \quad X_2 = [\mathbf{0} \;\; \mathbf{1} \;\; X], \qquad (8)

where X is the matrix of data on the independent variables. Let f(y_i; X_{ji}\beta, \sigma_j), abbreviated f_{ij}, denote the density function of a normally distributed random variable with mean X_{ji}\beta and standard deviation \sigma_j. Then the incomplete, or observed-data, density function of a typical observation in the Bartolucci and Scaccia (BS) mixture model is given by

h_i = \theta f(y_i; X_{1i}\beta, \sigma_1) + (1 - \theta) f(y_i; X_{2i}\beta, \sigma_2). \qquad (9)

To write the complete-data likelihood, define the indicator variable d_{ij}, where d_{i1} = 1 if the observation is associated with the first regime, 0 otherwise, and d_{i2} = 1 (in our two-component case, simply 1 - d_{i1}) if the observation is associated with the second regime, 0 otherwise. In our two-component case, d_{i1} is a Bernoulli trial with probability \theta. Thus, the typical complete-data density function for the BS mixture of normal regressions is given by

h_i^* = \left[\theta f(y_i; X_{1i}\beta, \sigma_1)\right]^{d_{i1}} \left[(1 - \theta) f(y_i; X_{2i}\beta, \sigma_2)\right]^{1 - d_{i1}}. \qquad (10)

Then the complete-data loglikelihood function can be written

\log L_M^* = \sum_{i=1}^{n} \left\{ d_{i1}(\ln\theta + \ln f_{i1}) + (1 - d_{i1})(\ln(1 - \theta) + \ln f_{i2}) \right\}. \qquad (11)

In the E step of the EM algorithm, the expected value of the loglikelihood is needed, which requires replacing d_{i1} by its expectation given the data. This expectation is E(d_{i1} \mid y_i) = P(d_{i1} = 1 \mid y_i), which equals

P(d_{i1} = 1 \mid y_i) = \frac{P(d_{i1} = 1)\, P(y_i \mid d_{i1} = 1)}{\sum_{j=1}^{2} P(d_{ij} = 1)\, P(y_i \mid d_{ij} = 1)} = \frac{\theta f_{i1}}{\theta f_{i1} + (1 - \theta) f_{i2}} = w_{i1}. \qquad (12)

Evaluation of (12) provides estimates of the expected values, or weights, w_{i1} and 1 - w_{i1}. Once these weights have been calculated, they can be substituted into the log of the complete-data likelihood, which is then maximized in the M step of the EM algorithm with respect to the unknown parameters of the model. To examine the M step, return to the log of the complete-data likelihood and substitute for E(d_{i1}) to yield

E(\log L_M^*) = \sum_{i=1}^{n} \left\{ w_{i1}(\ln\theta + \ln f_{i1}) + (1 - w_{i1})(\ln(1 - \theta) + \ln f_{i2}) \right\}. \qquad (13)

After substituting for the unobserved regime indicator variable, solving the first order conditions leads to the following expressions for updating the parameter estimates with the EM algorithm:

\beta = \left( X_1' \operatorname{diag}(w_1) X_1 + X_2' \operatorname{diag}(w_2) X_2 \right)^{-1} \left( X_1' \operatorname{diag}(w_1)\, y + X_2' \operatorname{diag}(w_2)\, y \right),
\sigma_1^2 = \frac{(y - X_1\beta)' \operatorname{diag}(w_1) (y - X_1\beta)}{\sum_i w_{1i}},
\sigma_2^2 = \frac{(y - X_2\beta)' \operatorname{diag}(w_2) (y - X_2\beta)}{\sum_i w_{2i}},
\theta = \frac{1}{n} \sum_i w_{1i}, \qquad (14)

where w_1 and w_2 are the vectors of weights given above in (12), with w_{2i} = 1 - w_{1i}. Iterations continue until convergence.
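The updates in (12)-(14) can be sketched in code as follows. This is a minimal sketch, assuming X1 = [1 0 X] and X2 = [0 1 X] have been formed as in (8); variable names are ours, and the paper's own estimation was done in SAS IML.

```python
# EM for the Bartolucci-Scaccia mixture of normal regressions, eqs. (12)-(14).
import numpy as np
from scipy.stats import norm

def mixture_em(y, X1, X2, beta, v1, v2, theta, n_iter=200):
    for _ in range(n_iter):
        # E step: posterior regime probabilities (weights), eq. (12)
        f1 = norm.pdf(y, X1 @ beta, np.sqrt(v1))
        f2 = norm.pdf(y, X2 @ beta, np.sqrt(v2))
        w1 = theta * f1 / (theta * f1 + (1.0 - theta) * f2)
        w2 = 1.0 - w1
        # M step: weighted least squares and weighted variances, eq. (14)
        A = X1.T @ (w1[:, None] * X1) + X2.T @ (w2[:, None] * X2)
        b = X1.T @ (w1 * y) + X2.T @ (w2 * y)
        beta = np.linalg.solve(A, b)
        v1 = (w1 * (y - X1 @ beta) ** 2).sum() / w1.sum()
        v2 = (w2 * (y - X2 @ beta) ** 2).sum() / w2.sum()
        theta = w1.mean()
    return beta, v1, v2, theta
```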
The two EM algorithms described above can be combined into a single EM algorithm for a censored regression model with an error structure approximated by a mixture of normals. Following Hartley (1978), our adaptive estimator can be considered part of a three equation system containing two (constrained) censored regression models and a choice equation containing only an intercept. Following Caudill (2003), we assume that the error term in the choice equation is independent of each error term in the (constrained) censored regression models. This innocuous assumption greatly facilitates calculation of the two expected values needed for insertion into the EM algorithm developed here. In particular, our new EM algorithm requires the insertion of the weighting matrices associated with the mixture algorithm above into the appropriate places in the EM algorithm for maximum likelihood estimation of a censored regression model. The model we wish to estimate is given by

y_i^* = \alpha_1 + X_i\delta + \varepsilon_{1i}, \quad y_i = \max[y_i^*, 0], \quad \text{with probability } \theta
y_i^* = \alpha_2 + X_i\delta + \varepsilon_{2i}, \quad y_i = \max[y_i^*, 0], \quad \text{with probability } 1 - \theta, \qquad (15)

where, again, \beta = [\alpha_1 \;\; \alpha_2 \;\; \delta]. The observed-data likelihood for a typical observation is

r_i = \theta g(X_{1i}\beta, \sigma_1) + (1 - \theta) g(X_{2i}\beta, \sigma_2), \qquad (16)

where g is the likelihood associated with a single tobit model as described above in (2). The incomplete, or observed-data, loglikelihood function is given by

\log L_{MT} = \sum_{i=1}^{n} \log\left[ \theta g(X_{1i}\beta, \sigma_1) + (1 - \theta) g(X_{2i}\beta, \sigma_2) \right]. \qquad (17)

The complete-data likelihood corresponding to (17) contains two latent variables: one corresponding to regime membership and one corresponding to the limit observations in the tobit model. The complete-data loglikelihood is given by

\log L_{MT}^* = \sum_{i=1}^{n} \left\{ d_{i1}\left[\ln\theta + \ln f(y_i^*; X_{1i}\beta, \sigma_1)\right] + (1 - d_{i1})\left[\ln(1 - \theta) + \ln f(y_i^*; X_{2i}\beta, \sigma_2)\right] \right\}, \qquad (18)

where f again refers to the normal density function. Maximization of the expected value of this complete-data loglikelihood requires the insertion into the first order conditions of two kinds of expectations: one for the d values indicating regime membership, and another based on moments of the unobserved y^* values in the censored normal regression model. An EM algorithm for the maximization of the likelihood function involves inserting weighting matrices into the expressions above. In this case the new weights are given by

w_{1i} = \frac{\theta g(X_{1i}\beta, \sigma_1)}{r_i} \quad \text{and} \quad w_{2i} = \frac{(1 - \theta) g(X_{2i}\beta, \sigma_2)}{r_i}. \qquad (19)

The weights are the posterior probabilities that an observation is associated with regime one or regime two, respectively. In the case of a censored regression model with a mixture-of-normals error structure, two sets of expectations must therefore be inserted into the first order conditions: the weights given in (19) and the conditional expectations of the latent variables analogous to those in (5). With these four expectations inserted (two in each regime), the expressions for the model parameters at each iteration of this EM algorithm are given by

\beta = \left( X_1' \operatorname{diag}(w_1) X_1 + X_2' \operatorname{diag}(w_2) X_2 \right)^{-1} \left( X_1' \operatorname{diag}(w_1)\, \hat{y}_1^* + X_2' \operatorname{diag}(w_2)\, \hat{y}_2^* \right),
\sigma_1^2 = \frac{1}{\sum_i w_{1i}} \left[ \sum_{I_i = 1} w_{1i}(y_i - X_{1i}\beta)^2 + \sum_{I_i = 0} w_{1i}\big(\hat{y}_{1i}^* - X_{1i}\beta\big)^2 + \sum_{I_i = 0} w_{1i}\, V(y_{1i}^* \mid I_i = 0, X_{1i}) \right],
\sigma_2^2 = \frac{1}{\sum_i w_{2i}} \left[ \sum_{I_i = 1} w_{2i}(y_i - X_{2i}\beta)^2 + \sum_{I_i = 0} w_{2i}\big(\hat{y}_{2i}^* - X_{2i}\beta\big)^2 + \sum_{I_i = 0} w_{2i}\, V(y_{2i}^* \mid I_i = 0, X_{2i}) \right],
\theta = \frac{1}{n} \sum_{i=1}^{n} w_{1i}, \qquad (20)

where \hat{y}_j^* is the vector with elements \hat{y}_{ji}^* = I_i y_i + (1 - I_i) E(y_{ji}^* \mid I_i = 0, X_{ji}). The conditional expectations needed, corresponding to each regime, are given by (letting \lambda_{ji} = \sigma_j f(-X_{ji}\beta/\sigma_j)/F(-X_{ji}\beta/\sigma_j) for j = 1, 2)

E(y_{ji}^* \mid I_i = 0, X_{ji}) = X_{ji}\beta - \lambda_{ji}, \qquad V(y_{ji}^* \mid I_i = 0, X_{ji}) = \sigma_j^2 + X_{ji}\beta\,\lambda_{ji} - \lambda_{ji}^2, \qquad j = 1, 2. \qquad (21)

Iterations of this algorithm continue until convergence.
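One pass through the combined updates in (19)-(21) can be sketched as follows. This is a minimal, illustrative sketch assuming X1 = [1 0 X] and X2 = [0 1 X] as in (8); the names are ours, and the paper's estimates were computed in SAS IML.

```python
# One iteration of the combined EM for the censored mixture regression,
# eqs. (19)-(21).
import numpy as np
from scipy.stats import norm

def tobit_g(y, xb, sd, limit):
    # Tobit likelihood contribution g in (2): normal density for nonlimit
    # observations, censoring probability for limit observations.
    return np.where(limit, norm.cdf(-xb / sd), norm.pdf(y, xb, sd))

def pam_step(y, X1, X2, beta, v1, v2, theta):
    limit = (y <= 0)
    sd1, sd2 = np.sqrt(v1), np.sqrt(v2)
    xb1, xb2 = X1 @ beta, X2 @ beta
    # E step, part 1: posterior regime weights, eq. (19)
    g1 = tobit_g(y, xb1, sd1, limit)
    g2 = tobit_g(y, xb2, sd2, limit)
    w1 = theta * g1 / (theta * g1 + (1.0 - theta) * g2)
    w2 = 1.0 - w1
    # E step, part 2: conditional moments of latent y* for limit obs, eq. (21)
    def moments(xb, var, sd):
        z = -xb[limit] / sd
        lam = sd * norm.pdf(z) / norm.cdf(z)
        ey = np.asarray(y, dtype=float).copy()
        ey[limit] = xb[limit] - lam
        vy = np.zeros_like(ey)
        vy[limit] = var + xb[limit] * lam - lam ** 2
        return ey, vy
    y1, var1 = moments(xb1, v1, sd1)
    y2, var2 = moments(xb2, v2, sd2)
    # M step: weighted least squares and weighted variance updates, eq. (20)
    A = X1.T @ (w1[:, None] * X1) + X2.T @ (w2[:, None] * X2)
    b = X1.T @ (w1 * y1) + X2.T @ (w2 * y2)
    beta = np.linalg.solve(A, b)
    v1 = (w1 * ((y1 - X1 @ beta) ** 2 + var1)).sum() / w1.sum()
    v2 = (w2 * ((y2 - X2 @ beta) ** 2 + var2)).sum() / w2.sum()
    theta = w1.mean()
    return beta, v1, v2, theta
```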
In order to assess the usefulness of the mixture-of-normals approach to estimating model parameters in the censored regression model, a Monte Carlo experiment is conducted. The two goals of the Monte Carlo study are: 1) to provide evidence on the usefulness of the estimation method when the true errors are not normally distributed, and 2) to provide evidence on the small sample properties of the mixture-of-normals approximation.

Monte Carlo Simulations

Several studies have conducted Monte Carlo experiments to compare the small sample properties of various estimators for the censored regression model. We wish to examine the small sample properties of our adaptive estimator, and we are also interested in how well several common error structures can be approximated by a mixture of two normal distributions. To examine these issues, we conduct a Monte Carlo study along the lines of Moon (1989) and Paarsch (1984). Following Paarsch, the censored regression model is given by

y_i^* = a + b x_i + u_i, \qquad i = 1, \ldots, N, \qquad (22)

where y^* is unobserved. The observed variable, y, is given by

y_i = \max\{0, y_i^*\}. \qquad (23)

In the Monte Carlo experiment two sample sizes are examined: N = 50 and N = 200. The error distributions considered are normal, Cauchy, Laplace, and lognormal. In each case the mean of the error distribution is zero and the variance is one hundred; the Cauchy, which has no finite moments, has location zero and scale parameter 10. All programs and random number generation in this paper are accomplished using the IML language in SAS. The true value of the parameter of interest, the slope parameter, is 1.0. The intercept is allowed to vary to change the degree of censoring; we consider censoring levels of twenty-five and fifty percent. Each experiment consists of 200 random trials. For each case we consider three estimators: the usual tobit maximum likelihood estimator (based on the normality assumption), the censored least absolute deviations (CLAD) estimator, and the partially adaptive mixture (PAM) estimator. Maximum likelihood is used to estimate the parameters of the tobit and PAM models, and a grid search is used to obtain the CLAD estimates.
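For concreteness, the data generating process just described can be sketched as follows. The regressor design and the exact lognormal parameterization are not stated in the paper, so the choices below are illustrative assumptions, and the Python stands in for the SAS IML code actually used.

```python
# Sketch of the Monte Carlo design: slope b = 1, intercept a chosen to
# control censoring, errors with mean zero and variance 100 (Cauchy: scale 10).
import numpy as np

rng = np.random.default_rng(12345)

def draw_errors(dist, n):
    if dist == "normal":
        return rng.normal(0.0, 10.0, n)
    if dist == "laplace":
        return rng.laplace(0.0, np.sqrt(50.0), n)   # variance = 2*scale^2 = 100
    if dist == "cauchy":
        return 10.0 * rng.standard_cauchy(n)        # scale parameter 10
    if dist == "lognormal":
        z = rng.lognormal(0.0, 1.0, n)              # exp(N(0,1)), then standardize
        m = np.exp(0.5)                             # mean of exp(N(0,1))
        s = np.sqrt((np.e - 1.0) * np.e)            # sd of exp(N(0,1))
        return 10.0 * (z - m) / s                   # mean 0, variance 100
    raise ValueError(dist)

def draw_sample(dist, n, a, b=1.0):
    x = rng.uniform(0.0, 20.0, n)   # regressor design is ours; not given in the paper
    ystar = a + b * x + draw_errors(dist, n)
    return x, np.maximum(ystar, 0.0)
```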
The results from estimating these models are given in Tables 1A through 1D. For each simulation we report the mean, median, and root mean square error (RMSE) of each estimator's slope estimates.

Table 1A gives the results for a sample size of 50 with 25% censoring. Considering the mean, the PAM estimator is closest to the true value for the Cauchy, Laplace, and lognormal distributions, but the tobit is closest if the error distribution is normal. If the median is considered, the CLAD estimator performs best when the distribution is Cauchy, Laplace, or lognormal; the PAM estimator performs best when the errors are normally distributed. On the all-important RMSE criterion, the PAM estimator performs best for every distribution except the normal, for which it is surpassed by the usual tobit estimator.

Table 1B contains the results for a sample size of 50 but with 50% censoring. When comparisons are based on the mean, the PAM estimator performs best when the distribution is Cauchy or lognormal, the usual tobit estimator when the distribution is Laplace, and the CLAD estimator when the distribution is normal. When comparisons are made based on the median, the PAM estimator performs best when the distribution is Cauchy or normal; the usual tobit estimator performs best when the distribution is Laplace, and the CLAD estimator performs best when the distribution is lognormal. The results are mixed when the RMSE criterion is examined. The PAM estimator has the smallest RMSE when the distribution is lognormal or normal, although in the normal case the improvement over the usual tobit model is very slight. In the case of the Laplace distribution the usual tobit estimator performs best, but when the distribution is Cauchy the CLAD estimator performs best.

The effects of a larger sample size are revealed in Tables 1C and 1D. Table 1C contains the simulation results for a sample of size 200 with 25% censoring. When using the mean as the basis for comparison, the usual tobit estimator performs best when the distribution is Cauchy or normal, while the PAM estimator performs best when the distribution is Laplace or lognormal. When comparisons are made using the median, the PAM estimator performs best in every case. When the RMSE is used, the PAM estimator performs best for every distribution but the Laplace, for which the CLAD estimator exhibits a slight improvement.

Table 1D contains the simulation results for a sample size of 200 and 50% censoring. When comparing on the basis of the mean, the PAM estimator performs best for all distributions except the Laplace, for which the usual tobit estimator holds a very slight advantage. When comparing on the basis of the median, the CLAD estimator performs best for the Cauchy and the normal, the PAM estimator performs best for the lognormal, and the usual tobit estimator performs best for the Laplace. When the RMSEs are compared, the PAM estimator is best in every case, although the usual tobit model is tied for best in the case of the Laplace distribution.

The results suggest that the PAM estimator is a useful estimator under a variety of circumstances. In the sixteen cases presented in Tables 1A to 1D, the PAM estimator has the best performance in terms of RMSE in eleven cases. When the sample size is 200, the PAM estimator performs best according to RMSE in seven of eight cases. Clearly, the PAM estimator has desirable small sample properties that tend to improve with increases in the sample size. The PAM estimator is also easy to compute compared to, for example, the CLAD estimator. Having established that the partially adaptive estimator can usefully mimic several other distributions, we now turn to the application to the Mroz data on the annual hours worked of married women.

Application to the Mroz Data

Mroz (1987) investigated the effects of several independent variables on the hours worked of married women. The data set contains observations on 753 married women. Of these 753, 428 worked for a wage outside the home and the remaining 325 worked zero hours. The independent variables used in the model are nonwife income (nwifeinc), wife's education (educ), wife's labor force experience (exper), experience squared (exper*exper), wife's age (age), number of children less than six years of age (kidslt6), and number of children between the ages of six and eighteen (kidsage6). The dependent variable is the wife's annual hours of work.

The estimation results are contained in Table 2. For purposes of comparison, the usual tobit model is estimated, along with OLS. The results of estimating the model by OLS are presented in column 2 of the table. The coefficients of all variables except nwifeinc and kidsage6 are statistically significant at the usual levels. The model R2 is 0.266 and the estimate of σ is 750.179.

The tobit estimation results are given in column 3 of Table 2. The coefficients of all of the explanatory variables except kidsage6 are statistically significant at the usual levels. Compared to the OLS results, the tobit model produces a statistically significant coefficient for nwifeinc. The other coefficients in the two models are not directly comparable. To make them comparable, the tobit estimates must be multiplied by the average probability of a nonlimit observation, or

\frac{1}{n} \sum_{i=1}^{n} F\!\left(\frac{X_i\beta}{\sigma}\right). \qquad (24)

These adjustments are made and the results are reported in brackets in Table 2. Aside from the significance of nwifeinc, the biggest difference in these average marginal effects between OLS and tobit is in the effect of a year of education on hours worked. The OLS results indicate that an additional year of education leads to an increase of about 29 hours worked, but the tobit results indicate an increase of about 47 hours, exceeding the OLS estimate by roughly two-thirds. Also worthy of note is that the tobit estimate of σ is 1122.
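As an illustration of the adjustment in (24), the average marginal effects can be computed as follows. This is a sketch with hypothetical inputs, not the paper's code.

```python
# Average marginal effects: scale tobit (or PAM) coefficients by the average
# probability of a nonlimit observation, eq. (24).
import numpy as np
from scipy.stats import norm

def avg_marginal_effects(beta, sigma, X):
    scale = norm.cdf(X @ beta / sigma).mean()   # (1/n) * sum of F(X_i beta / sigma)
    return scale * beta

# For example, the tobit education coefficient of 80.646 scaled by a factor of
# roughly 0.589 reproduces the bracketed average marginal effect of about 47.5
# reported in Table 2.
```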
Of course, these tobit estimates are based on an underlying assumption of normally distributed errors. In order to relax this assumption, we estimate the PAM model introduced here. The estimation results for the PAM model are contained in column 4 of Table 2. The estimation does find two distinct intercepts associated with the two regimes: the value of the (arbitrarily designated) intercept1 is 379.58 and the value of intercept2 is 1366.37. The mixing weight is estimated to be 0.782 for regime 1 and, by implication, 0.218 for regime 2. This evidence lends support to the PAM model and the underlying nonnormality of the error terms.

More evidence in favor of the PAM model can be found by examining the graphical comparison of the error structures for the tobit and the partially adaptive model given in Figure 1. Compared to the tobit density, the PAM density is shifted to the right and has a thicker left tail. The figure suggests an underlying nonnormal error structure, but a statistical test is required to provide definitive evidence of nonnormality.

Testing the PAM model against the usual tobit is possible using a modified likelihood ratio test. The likelihood ratio statistic for testing one versus two components in a mixture does not follow the usual chi-square distribution, but it is approximately chi-square with two degrees of freedom. For α = 0.05, the critical value for a chi-square with 2 degrees of freedom is 5.99. However, Thode, Finch, and Mendell (1988) suggest modifying this critical value as follows:

CV = 6.08 + \frac{4.51}{\sqrt{n}}, \qquad (25)

where n is the sample size. We use this modified critical value to test for departures from normality in the tobit model. The chi-square statistic for testing the tobit model against the PAM model is 13.60, which exceeds the modified critical value of 6.24. We conclude that the statistical evidence favors the PAM model over the usual tobit.

As far as statistical significance is concerned, the PAM results mirror not the tobit results but the OLS results: the coefficients of all variables except nwifeinc and kidsage6 are statistically significant at the usual levels. In order to compare the average marginal effects, the PAM coefficients are adjusted according to (24) above. In general, the average marginal effects tend to fall between their OLS and tobit counterparts. The PAM estimates are, in some cases, notably different from the usual tobit results. First, the coefficient of nwifeinc is not statistically significant when the PAM model is estimated. Second, the average marginal effect of an additional year of education (educ) falls from 47.5 in the tobit model to 37.9 when the PAM model is estimated. In conclusion, in the case of the Mroz data, the empirical evidence indicates a departure from normality in the error structure which leads to a substantial change in some of the estimated marginal effects.
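As a check on the arithmetic of the modified likelihood ratio test just described, the statistic and the critical value in (25) can be reproduced from the loglikelihood values reported in Table 2; the sketch below is illustrative.

```python
# Modified likelihood ratio test of the tobit against the PAM model, eq. (25).
import numpy as np

def modified_lr_test(loglik_tobit, loglik_pam, n):
    lr = 2.0 * (loglik_pam - loglik_tobit)   # approximately chi-square(2)
    cv = 6.08 + 4.51 / np.sqrt(n)            # Thode, Finch, and Mendell (1988), 5% level
    return lr, cv, lr > cv

lr, cv, reject = modified_lr_test(-3819.09, -3812.29, 753)
print(round(lr, 2), round(cv, 2), reject)    # 13.60, 6.24, True
```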
Conclusions

This paper introduces a partially adaptive estimator for the censored regression model and presents an EM algorithm for estimation of the model. A Monte Carlo experiment verifies that the estimator is useful in small samples and robust to underlying distributional assumptions. The partially adaptive estimator has several virtues: 1) the normal distribution, and hence the usual tobit model, is a special case, 2) estimation of the model, using the EM algorithm, is much simpler than for many of the other robust estimators of the censored regression model, and 3) a mixture of two normal distributions is known to be a very flexible form, able to approximate many different error structures.

The partially adaptive model is applied to the Mroz (1987) data on the yearly hours worked of married women. A statistical test rejects the normality assumption underlying the usual tobit model in favor of an error structure described by the partially adaptive estimator. The partially adaptive estimation results differ from the usual tobit estimation results in two important respects. The tobit model indicates a significant effect of nonwife income on hours worked, but the effect is not statistically significant in the partially adaptive model. Also, the usual tobit model estimates the effect of an additional year of education on hours worked to be about twenty-five percent higher than the partially adaptive model does. The application to the Mroz data demonstrates that the restrictive normality assumption in the usual tobit can have substantial consequences for parameter estimates and marginal effects. That assumption is easily relaxed using the partially adaptive estimator presented here.

Table 1A
Comparison of estimators in 200 random trials
Sample size = 50, 25% censoring

DISTRIBUTION      Mean     Median   RMSE
Cauchy
  Tobit           4.877    1.371    16.677
  CLAD            1.167    1.040    0.793
  PAM             1.018    0.961    0.686
Laplace
  Tobit           1.062    1.054    0.265
  CLAD            1.090    1.010    0.397
  PAM             1.031    1.030    0.274
Lognormal
  Tobit           1.288    1.277    0.472
  CLAD            1.049    1.001    0.241
  PAM             1.032    1.024    0.190
Normal
  Tobit           1.002    0.996    0.251
  CLAD            1.149    1.068    0.543
  PAM             1.030    1.001    0.301

Table 1B
Comparison of estimators in 200 random trials
Sample size = 50, 50% censoring

DISTRIBUTION      Mean     Median   RMSE
Cauchy
  Tobit           16.412   2.248    64.888
  CLAD            2.250    1.277    2.404
  PAM             1.708    1.223    3.545
Laplace
  Tobit           0.992    0.999    0.279
  CLAD            1.491    1.216    1.239
  PAM             1.036    1.033    0.340
Lognormal
  Tobit           1.212    1.135    0.498
  CLAD            1.364    1.014    1.178
  PAM             0.994    0.905    0.333
Normal
  Tobit           1.021    1.018    0.069
  CLAD            1.006    1.009    0.099
  PAM             0.993    0.994    0.068

Table 1C
Comparison of estimators in 200 random trials
Sample size = 200, 25% censoring

DISTRIBUTION      Mean     Median   RMSE
Cauchy
  Tobit           9.981    2.397    32.693
  CLAD            1.045    1.060    0.210
  PAM             1.067    1.030    0.229
Laplace
  Tobit           1.039    1.036    0.126
  CLAD            1.018    1.018    0.131
  PAM             1.015    1.015    0.113
Lognormal
  Tobit           1.269    1.265    0.315
  CLAD            1.017    1.102    0.099
  PAM             1.009    1.008    0.070
Normal
  Tobit           0.999    0.997    0.014
  CLAD            0.999    0.995    0.018
  PAM             0.997    0.998    0.014

Table 1D
Comparison of estimators in 200 random trials
Sample size = 200, 50% censoring

DISTRIBUTION      Mean     Median   RMSE
Cauchy
  Tobit           6.114    3.794    9.386
  CLAD            1.577    1.156    1.371
  PAM             1.196    1.169    0.372
Laplace
  Tobit           1.012    1.011    0.151
  CLAD            1.104    1.034    0.504
  PAM             0.987    0.975    0.151
Lognormal
  Tobit           1.356    1.270    0.510
  CLAD            1.036    0.973    0.290
  PAM             0.990    0.974    0.146
Normal
  Tobit           1.024    1.019    0.042
  CLAD            1.004    0.999    0.039
  PAM             1.001    0.997    0.034

Note: The true value of the slope parameter is 1.0.

Table 2
Estimation Results for the Mroz Data

VARIABLE        OLS               Tobit                        PAM
NWIFEINC        -3.447 (1.35)     -8.814 (1.98) [-5.191]       -4.487 (1.15) [-2.759]
EDUC            28.761 (2.22)     80.646 (3.74) [47.500]       61.753 (3.19) [37.978]
EXPER           65.673 (6.59)     131.564 (7.61) [77.491]      136.115 (8.85) [83.711]
EXPER*EXPER     -0.700 (2.16)     -1.864 (3.47) [-1.098]       -2.100 (4.31) [-1.292]
AGE             -30.512 (6.99)    -54.405 (7.33) [-32.045]     -44.499 (6.14) [-27.367]
KIDSLT6         -422.090 (7.51)   -894.022 (7.99) [-526.579]   -875.898 (8.26) [-538.677]
KIDSAGE6        -32.779 (1.41)    -16.218 (0.42) [-9.552]      1.091 (0.03) [0.671]
Intercept1      1330.484 (4.91)   965.305 (2.16)               379.581 (0.89)
Intercept2      ---               ---                          1366.369 (3.27)
σ1              750.179           1122.022                     1281.484
σ2              ---               ---                          471.170
θ               ---               ---                          0.782 (8.41)
R2              0.266             ---                          ---
Loglikelihood   ---               -3819.09                     -3812.29

Note: Numbers in parentheses are absolute values of t-ratios; numbers in brackets are average marginal effects obtained using (24).
Figure 1
Densities of the Tobit and Partially Adaptive Estimator Evaluated at Sample Means

[Figure: the two estimated error densities, f(x), plotted over the range -3000 to 3000. Compared to the tobit density, the partially adaptive density is shifted to the right and has a thicker left tail.]

References

Amemiya, T. (1985). Advanced Econometrics. Cambridge: Harvard University Press.

Bartolucci, F. and Scaccia, L. (2004). The Use of Mixtures for Dealing with Non-normal Regression Errors. Submitted to Computational Statistics and Data Analysis.

Beran, R. (1974). Asymptotically Efficient Adaptive Rank Estimates in Location Models. Annals of Statistics, 2, 63-74.

Bickel, P. J. (1982). On Adaptive Estimation. Annals of Statistics, 10, 647-671.

Boyer, B. H., McDonald, J. B., and Newey, W. K. (2003). A Comparison of Partially Adaptive and Reweighted Least Squares Estimation. Econometric Reviews, 22, 115-134.

Butler, R. J., McDonald, J. B., Nelson, R. D., and White, S. B. (1990). Robust and Partially Adaptive Estimation of Regression Models. Review of Economics and Statistics, 72, 321-327.

Caudill, S. B. (2003). Estimating a Mixture of Stochastic Frontier Regression Models via the EM Algorithm: A Multiproduct Cost Function Application. Empirical Economics, 28, 581-598.

Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum Likelihood Estimation from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, Series B, 39, 1-38.

Geweke, J. and Keane, M. (1997). Mixture of Normals Probit. Federal Reserve Bank of Minneapolis, Research Staff Report 237, August 1997.

Hartley, M. (1978). Comment (on "Estimating Mixtures of Normal Distributions and Switching Regressions," by Quandt and Ramsey). Journal of the American Statistical Association, 73, 738-741.

Li, Q. and Stengos, T. (1994). Adaptive Estimation in the Panel Data Error Component Model with Heteroskedasticity of Unknown Form. International Economic Review, 35, 981-1000.

Manski, C. F. (1984). Adaptive Estimation of Non-linear Regression Models. Econometric Reviews, 3, 145-194.

Marron, J. S. and Wand, M. P. (1992). Exact Mean Integrated Squared Error. Annals of Statistics, 20, 712-736.

McDonald, J. B. (1996). An Application and Comparison of Some Flexible Parametric and Semi-parametric Qualitative Response Models. Economics Letters, 53, 145-152.

McDonald, J. B. and Moffitt, R. (1980). The Uses of Tobit Analysis. Review of Economics and Statistics, 62, 318-321.

McDonald, J. B. and Newey, W. K. (1988). Partially Adaptive Estimation of Regression Models via the Generalized t Distribution. Econometric Theory, 4, 428-457.

McDonald, J. B. and White, S. B. (1993). A Comparison of Some Robust, Adaptive, and Partially Adaptive Estimators of Regression Models. Econometric Reviews, 12, 103-124.

McDonald, J. B. and Xu, Y. J. (1996). A Comparison of Semi-parametric and Partially Adaptive Estimators of the Censored Regression Model with Possibly Skewed and Leptokurtic Error Distributions. Economics Letters, 51, 153-159.
Moon, C. (1989). A Monte Carlo Comparison of Semiparametric Tobit Estimators. Journal of Applied Econometrics, 4, 361-382.

Mroz, T. A. (1987). The Sensitivity of an Empirical Model of Married Women's Hours of Work to Economic and Statistical Assumptions. Econometrica, 55, 765-799.

Paarsch, H. (1984). A Monte Carlo Comparison of Estimators for Censored Regression Models. Journal of Econometrics, 24, 197-213.

Phillips, R. F. (1994). Partially Adaptive Estimation via a Normal Mixture. Journal of Econometrics, 64, 123-144.

Powell, J. L. (1984). Least Absolute Deviations Estimation for the Censored Regression Model. Journal of Econometrics, 25, 303-325.

Powell, J. L. (1986). Symmetrically Trimmed Least Squares Estimation for Tobit Models. Econometrica, 54, 1435-1460.

Steigerwald, D. G. (1992). On the Finite Sample Behavior of Adaptive Estimators. Journal of Econometrics, 54, 371-400.

Stein, C. (1956). Efficient Nonparametric Testing and Estimation. Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, 1, 187-195.

Stone, C. (1975). Adaptive Maximum Likelihood Estimators of a Location Parameter. Annals of Statistics, 3, 267-284.

Thode, H., Finch, S. J., and Mendell, N. R. (1988). Simulated Percentage Points for the Null Distribution of the Likelihood Ratio Test for a Mixture of Two Normals. Biometrics, 44, 1195-1201.

Wu, X. and Stengos, T. (2005). Partially Adaptive Estimation via the Maximum Entropy Densities. Econometrics Journal, 9, 1-15.