The Stata Journal (yyyy) vv, Number ii, pp. 1–22 BETAMIX: A command for fitting mixture regression models for bounded dependent variables using the beta distribution Laura A. Gray School of Health and Related Research Health Economics and Decision Science University of Sheffield Sheffield, UK [email protected] Mónica Hernández-Alava School of Health and Related Research Health Economics and Decision Science University of Sheffield Sheffield, UK [email protected] Abstract. This article describes the betamix Stata command for fitting mixture regression models for dependent variables that are bounded in an interval. The most general model is a mixture of the trinomial distribution and a mixture of beta regressions. The model is a generalisation of the truncated inflated beta regression model introduced in Pereira et al. (2012) for variables with truncated supports either at the top or the bottom of the distribution. The command accepts dependent variables defined in any range which are then transformed to the interval (0, 1) before estimation. The command and post estimation options are presented and explained through examples. Keywords: st0001, betamix, truncated inflated beta mixture, beta regression, mixture model, cross sectional data 1 Introduction Continuous response variables which are bounded at both ends arise in many areas. Dependent variables measuring proportions, ratios and rates are common in the empirical literature. They are often limited to the open unit interval (0, 1) but in many cases values in both boundaries are not only possible but appear with high frequency. Applications where the variables are bounded in alternative intervals linearly transform the dependent variable to the (0, 1) interval. A few examples include modelling the rates of employee participation in pension plans (Papke and Wooldridge 1996), the percentage of women on Municipal Councils or Executive Committees (Paola et al. 2010), an index measuring Central Bank independence (Berggren et al. 2014), the proportion of a firm’s total capital accounted for by its long-term debt (Cook et al. 2008), Quality Adjusted Life Years (Basu and Manca 2012) and the score of reading accuracy (Smithson and Verkuilen 2006). Modelling variables bounded at both ends presents several problems. The usual linear regression model is not appropriate for bounded dependent variables since the predictions of the model can be outside the boundary limits. A solution often used is to transform the dependent variable so that it takes values in (−∞, +∞) and then c yyyy StataCorp LP st0001 2 Betamix use standard regression models on the transformed dependent variable. However, this approach has an important limitation as it ignores that the moments of the distribution of a bounded variable are related: as the mean response moves towards a boundary value, the variance and skewness of the variable will tend to decrease and increase respectively. It has been shown that the standard methods to deal with these issues are not appropriate for these type of variables and models based on the beta distribution have been put forward as an alternative (Paolino 2001; Kieschnick and McCullough 2003; Ferrari and Cribari-Neto 2004; Smithson and Verkuilen 2006). The standard beta regression model (Paolino 2001; Ferrari and Cribari-Neto 2004; Smithson and Verkuilen 2006) assumes that the dependent variable is continuous in the open unit interval (0, 1). This model has been generalised to allow for values at either or both boundaries by adding a degenerate distribution with probability masses at the boundary values (Cook et al. 2008; Ospina and Ferrari 2010; Basu and Manca 2012; Ospina and Ferrari 2012). Pereira et al. (2012, 2013) extend the framework to be able to model variables such as the ratio of the unemployment benefit to the maximum benefit where the proportion can take the value of zero (if the person is not eligible), any real number in the interval (τ, 1) where τ is the minimum benefit and is also very likely to have positive probabilities at the values τ and at 1. They termed this model the truncated inflated beta distribution. The model is a mixture of the beta distribution in the interval (τ, 1) and the trinomial distribution with probability masses at 0, τ and 1. In Gray et al. (2016) we extend this framework to allow for mixtures of C-components of beta regressions so that multimodality in the continuous part of the model can be addressed. The model presents an alternative to the aldvmm command discussed in Hernández Alava and Wailoo (2015) for modelling preference based measures in health economics. This article presents the Stata command betamix which is a general command that can be used to estimate mixture regression models for dependent variables that are bounded in an interval and can have truncated supports either at the top or at the bottom of the distribution. It extends one of the parameterizations of the user-written commands betafit and zoib and the Stata command betareg in several directions 1 . First, it generalizes them to mixtures of beta distributions allowing the model to capture multimodality. Second, it allows the user to model response variables which have a gap between one of the boundaries and the continuous part of the distribution. Third, it can deal with positive probabilities at either or both boundaries and at the truncation point. Fourth, there is no need to transform response variables defined in intervals other than (0, 1) as the command will take care of transforming the dependent variable using the supplied options. The article is organised as follows: section 2 gives a brief overview of the model, section 3 describes the betamix syntax and options including the syntax for predict, section 4 illustrates the syntax of the command and the interpretation of the model using a fictional dataset. Finally, section 5 concludes. 1. All three user-written commands betamix, betafit and zoib use a logit and a log link for the conditional mean and the conditional scale, respectively. The Stata command betareg allows for a number of different links for both. L. A. Gray and M. Hernández-Alava 2 3 A general beta mixture regression model There are two possible parameterizations of the beta distribution bounded in the interval (0, 1). The most common one uses two shape parameters (Johnson et al. 1995). The alternative parameterization presented in Ferrari and Cribari-Neto (2004) which defines the model in terms of its mean µ and a precision parameter φ is more useful for estimating regression models. In this parameterization, the mean and the variance of y are given by E (y) = µ, and var (y) = a<µ<b (µ − a) (b − µ) , 1+φ φ>0 The variance of y is a function of the mean of y, µ, and it decreases as the precision parameter φ increases. The density of the variable y can then be written as b−µ µ−a φ−1 φ−1 (b − y)( b−a ) Γ(φ)(y − a)( b−a ) i h i , f (y; µ, φ, a, b) = h φ−1 b−µ Γ µ−a b−a φ Γ b−a φ (b − a) y ∈ (a, b) (1) where Γ (.) is the gamma function. Equation (1) corresponds to the probability density of the beta distribution of the linearly transformed variable yT = (y − a) , (b − a) 0 < yT < 1 (2) with mean (µ − a) / (b − a) and precision parameter φ defined in the interval (0, 1). Beta distributions are very convenient in modelling as they are able to display a variety of shapes depending on the values of its two parameters µ and φ. They are symmetric if µ = (b − a) /2 and asymmetric for any other value of µ. They can also be bell-, J- and U-shaped. Figure 1 plots a number of beta probability densities for alternative combinations of µ and φ for a variable defined in the (0, 1) interval. At a value of µ = 0.5 and small values of φ, the beta distribution is U-shaped; if µ = 0.5 and φ = 2 the distribution becomes the uniform distribution; and as φ increases, the variance decreases and the distribution becomes more concentrated around its mean. Given a sample y1 , y2 , . . . , yn of independent random variables each following the probability density in equation (1), the beta regression model can be obtained by assuming that a function v (.) of the mean of yi can be written as a linear combination of a set of covariates in the vector zi v µi − a b−a = zi0 β (3) where β is a vector of parameters and v (.) is a strictly monotonic and twice differentiable link function which maps the open interval (0, 1) into R. A number of link functions 4 Betamix density 5 µ = 0.25, φ = 2 µ = 0.75, φ = 2 2 density 4 10 6 Figure 1: Probability density of the beta distribution for alternative combinations of µ and φ µ = 0.5, φ = 1 0 0 µ = 0.5, φ = 2 .4 .6 .8 1 0 .2 .4 .6 .8 1 20 .2 20 0 µ = 0.05, φ = 75 15 µ=0.95, φ =20 µ = 0.15, φ = 75 density 10 density 10 15 µ = 0.05, φ = 20 µ = 0.15, φ = 20 µ=0.85, φ =75 µ = 0.25, φ = 75 0 0 5 µ=0.75, φ =20 µ=0.5, φ =20 µ=0.75, φ =75 µ=0.5, φ =75 µ=0.85, φ =20 µ = 0.25, φ = 20 5 µ=0.95, φ =75 0 .2 .4 .6 .8 1 0 .2 .4 .6 .8 1 can be used but the logit link is commonly found in applications as the coefficients of the regression can be interpreted as log-odds. Using the logit link the mean of yi can be written as µi (zi ; β) = a + (b − a) exp (zi0 β) 1 + exp (zi0 β) (4) Pereira et al. (2012, 2013) present a more general beta regression model for variables defined at zero and in the interval [τ, 1]. They termed this model the truncated inflated beta regression model. It is a mixture model of a multinomial distribution with probability masses at 0, τ and 1, and a beta distribution defined in the open interval (τ, 1). In Gray et al. (2016) we extend the framework to the case where the second part can be a mixture of C−components of beta distributions. Mixtures of beta distributions can display a number of distributional shapes. Figure 2 shows two examples. The left panel plots a 50:50 mixture of two beta distributions both with the same relatively high precision parameter φ = 50 but with very different means, µ = 0.05 and µ = 0.80. This mixture displays the usual bimodal shape. The right panel of Figure 2 also plots a 50:50 mixture but the means of the two components are closer together (µ = 0.60 L. A. Gray and M. Hernández-Alava 5 and µ = 0.75) and the precision parameters are very different (φ = 10 and φ = 100 respectively). This mixture density is asymmetric with a bump on the left tail. Both densities show characteristics which cannot be captured with a single beta distribution. Figure 2: Mixture densities of beta distributions ( µ1 = 0.60; µ2 = 0.75; φ1 = 10; φ2=100 ) density 0 0 2 2 density 4 4 6 6 50:50 mixture of beta densities ( µ1 = 0.05; µ2 = 0.80; φ1 = φ2=50 ) 8 50:50 mixture of beta densities 0 .2 .4 .6 .8 1 0 .2 .4 .6 .8 1 Let us assume that the response variable yi is defined at the point a and in the interval [τ, b] with a < τ < b. The density of yi conditional on three, possibly different, column vectors of covariates xi1 , xi2 , xi3 can be written as P (yi P (yi P g (yi |xi1 , xi2 , xi3 ) = " (yi 1− = a|xi3 ) = b|xi3 ) = τ |xi3 ) # P P (yi = s|xi3 ) h (yi |xi1 , xi2 ) if yi = a if yi = b if yi = τ (5) if yi ∈ (τ, b) s=a,τ,b The probabilities P (yi |xi3 ) are derived from a multinomial logit model P (yi = k|xi3 ) = exp (x0i3 γk ) P 1+ exp (x0i3 γs ) for k = a, τ, b (6) s=a,p,b where xi3 is a column vector of variables that affect the probability of a boundary value of the response variable and γk is the vector of corresponding coefficients. One set of coefficients is normalised to zero for identification, in this case, the coefficients corresponding to the probability of a value in the continuous part of the distribution. The probability density function h (.) is a mixture of C−components of beta distributions with means µc i (zi ; βc ) and precision parameters φc , c = 1, ..., C: h (yi |xi1 , xi2 ) = C X c=1 (P (c|xi2 ) f (yi ; xi1 βc , φc , τ, b)) (7) 6 Betamix where f (.) is the beta density defined in (1). A multinomial logit model for the probability of latent class membership is assumed as follows: exp (x0i2 δc ) P (c|xi2 ) = PC 0 j=1 exp (xi2 δj ) (8) where xi2 is vector of variables that affect the probability of component membership, δc is the vector of corresponding coefficients and C is the number of classes used in the analysis. One set of coefficients δc is normalised to zero for identification. If no variables are included, then the probabilities of component membership are constant for all individuals. Using equations (5), (6), (7) and (8), the loglikelihood of the sample y1 , y2 , . . . , yn can be written as X X ln l (γ, β, δ, φ) = ln P (yi = a|xi3 , γ) + ln P (yi = b|xi3 , γ) i:yi =a i:yi =b + X ln P (yi = τ |xi3 , γ) + i:yi ∈p i:yi =τ + X i:yi ∈(τ,b) X ln " C X ln 1 − X P (yi = s|xi3 , γ) s=a,τ,b # (P (c|xi2 ) f (yi ; xi1 βc , φc , τ, b)) (9) c=1 where i = 1, . . . , n. The command betamix described in the section below can estimate models with and without truncation and models where the truncation is either at the bottom or at the top of the interval range. The simplest model it can estimate is a beta regression model using the alternative parameterization described above. This model can already be estimated in Stata using betareg or the user-written command betafit. In addition, the command betamix can estimate a finite mixture model using a beta distribution. If there are boundary values, the command warns the user and adds a small amount of noise (1e−6 ) to the boundary values after the response variable has been transformed to the [0, 1] as in Basu and Manca (2012). This solution is not satisfactory in cases where there are unusually large numbers of observations at the boundary values. In these cases, and provided it makes sense, the user can request that the command adds a second part to the model to add probability masses at any combination of the boundary points and the truncation value (if there is one). The online supplementary material in (Smithson and Verkuilen 2006) discusses alternative methods and recommends checking the sensitivity of the estimated parameters to different procedures. The description of the model above assumes constant precision parameters but the command allows the precision parameters to depend on covariates. These models tend to be more difficult to estimate and require good starting values. A good procedure to follow in this case is to start by estimating a model with constant precision as a stepping stone for the full model (Verkuilen and Smithson 2012). L. A. Gray and M. Hernández-Alava 7 We recommend the reader to familiarise herself/himself with the idiosyncracies of fitting mixture models (McLachlan and Peel 2000) before attempting to estimate one. In particular, it is important to emphasize that mixture models are known to have multiple optima and it is important that searches are carried out to ensure a global solution has been found. Determining the number of components in a mixture is also not straightforward and the analyst must exercise judgement in determining the appropriate number of components. Likelihood ratio tests cannot be used to test models with different number of components because it involves testing at the edge of the parameter space. The Bayesian Information Criterion (BIC) has been proposed as a useful indicator of the number of appropriate components but other approaches also exist. 3 Command syntax 3.1 betamix Syntax betamix depvar if in weight , muvar(varlist) phivar(varlist) ncomponents(#) probabilities(varlist) pmass(numlist) pmvar(varlist) lbound(#) ubound(#) trun(trun) tbound(#) vce(vcetype) Robust CLuster(varname) level(#) constraints(constraints) search(spec) repeat(#) maximize options Description betamix is a user-written program which fits a generalized beta regression model using the truncated inflated beta model of Pereira et al. (2013) and the mixture beta regression model in Verkuilen and Smithson (2012). The most general model is a mixture model of a) beta mixtures and b) a multinomial logit to inflate the distribution of the dependent variable at 3 points, the bottom limit lbound, the upper limit ubound and at a truncation point tbound. This allows estimation of dependent variables such as credit card payments which are discrete at zero, at the minimum amount (truncation point) and at the full amount and continuous between the minimum amount and the full amount (see Pereira et al. (2013) for more details). The second part of the model which models the continuous part is a C-component mixture of beta regressions. This part of the model uses the alternative parameterization of the beta distribution in terms of the mean and the precision parameter (Paolino 2001; Ferrari and Cribari-Neto 2004; Smithson and Verkuilen 2006). The mean and the precision of the beta regressions and the mixing probabilities may be functions of covariates. The default model allows the precision of the components to be different but can be constrained to be the same. The model allows the user to supply the lower and upper bounds for the dependent variable and thus can be used for dependent variables bounded in any (a, b) interval without the need for manual transformation before estimation of the model or afterwards to calculate predictions. 8 Betamix Options muvar(varlist) specifies a set of variables to be included in the mean of the beta regression mixtures. The default is a constant mean. phivar(varlist) specifies a set of variables to be included in the precision of the beta regression mixtures. The default is a constant precision parameter. ncomponents(#) specifies the number of mixture components. The default is a beta regression, that is, a single component. probabilities(varlist) specifies a set of variables used to model the probability of component membership. The probabilities are specified using a multinomial logit parameterization. The default is constant probabilities. pmass(numlist) specifies a list of exactly 3 number indicators (top inside bottom) showing the presence and position of the probability masses. For example, (1 0 0) specifies a probability mass at b the top limit of the dependent variable only; (0 0 0) specifies no probability masses, that is, a beta mixture regression model; (1 0 1) specifies a model with probability masses at both limits of the dependent variable but no probability mass at the truncation point. The default is no probability masses at any point. Note that pmass requires a list of exactly 3 numbers even if the model has no truncation. pmvar(varlist) specifies the variables used in the inflation part of the model. The model allows for a different set of variables to be used in this part of the model but in most cases it is reasonable for the same set of variables to appear in both, the inflation model and the mixture of beta regressions. lbound(#) user supplied lower limit of the dependent variable. The default is zero. Use this option if the dependent variable is limited in the interval (a,b). In this case the upper bound of the interval also needs to be supplied using the option below. ubound(#) user supplied upper limit of the dependent variable. The default is one. Use this option if the dependent variable is limited in the interval (a,b). In this case the lower bound of the interval also needs to be supplied using the option above. trun(trun) trun may be none, top or bottom. This option determines whether there is truncation in the model and if so whether it is at the bottom or the top end. Use none if no truncation is required; use top if the truncation (gap) is at the top (i.e. the dependent variable is only defined in the interval (lbound , tbound ) and the value ubound ); use bottom if the truncation (gap) is at the bottom (i.e. the dependent variable is only defined at the value lbound and in the interval (tbound , ubound ).The default is none. tbound(#) user supplied truncation value. vce(vcetype) specifies how to estimate the variance-covariance matrix corresponding to the parameter estimates. The supported options are oim, opg, robust or cluster. The current version of the command does not allow bootstrap or jacknife estimators. See [R] vce option. L. A. Gray and M. Hernández-Alava 9 robust synonym for vce(robust) cluster(clustervar ) synonym for vce(cluster(clustvar ) level(#); see [R] estimation options. constraints(numlist); see [R] estimation options. search(spec) specifies whether to use mls initial search algorithm or not. spec may be on or off. repeat(#) specifies the number of random attempts to be made to find a better initial value vector. This option is used in conjunction with search(on). maximize options: difficult, technique(algorithm spec), iterate(#), [no]log, trace, gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#), gtolerance(#), nrtolerance(#), nonrtolerance, from(init specs); see [R] maximize 3.2 predict Syntax if predict { stub* | varlist} if predict type varname in in , outcome(outcome) , scores Description Stata’s standard predict command can be used following betamix to obtain predicted values using the first syntax as well as the equation level scores using the second syntax. Options outcome(outcome) specifies the predictions to be stored. There are two options for outcome y or all. The default, y, stores only the dependent variable prediction in newvar. Use all to, also obtain the predicted conditional means, conditional variances and probabilities for each component in the mixture. These are stored as newvar mu1, newvar mu2,...,newvar phi1, newvar phi2,... and newvar p1, newvar p2,... respectively. If an inflation model is specified, all also stores the predicted probabilities of the multinomial logit part in newvar lb, newvar ub and newvar tb corresponding to the predicted probabilities of the lower bound, upper bound and truncated bound. The probability of an observation belonging to the beta mixture part of the model can be calculated as 1-newvar lb-newvar ub-newvar tb. 4 The betamix command in practice This section illustrates the use of the different options provided by the betamix command using a fictional dataset provided with Stata. It is important to emphasize that 10 Betamix these are just examples designed to illustrate the command and no attempt is made to find, for example, the best configuration of covariates for the models. In any empirical application though it is important to justify and test as far as possible the chosen model. We have data on 2,000 women and are interested in estimating hourly wages as a function of a number of covariates in the dataset. . webuse womenwk . describe Contains data from http://www.stata-press.com/data/r14/womenwk.dta obs: 2,000 vars: 6 3 Mar 2014 07:43 size: 18,000 variable name county age education married children wage storage type byte byte byte byte byte float display format value label %9.0g %8.0g %8.0g %8.0g %8.0g %9.0g variable label county of residence age in years years of schooling 1 if married spouse present # of children under 12 years old hourly wage; missing, if not working Summary statistics for the variables included in the dataset are shown below. . gen double wages=round(wage,0.01) (657 missing values generated) . summ wages educ age married children Variable Obs Mean wages education age married children 1,343 2,000 2,000 2,000 2,000 23.69217 13.084 36.208 .6705 1.6445 Std. Dev. Min Max 6.305397 3.045912 8.28656 .4701492 1.398963 5.88 10 20 0 0 45.81 20 59 1 5 Women not in work have a missing value for wages; there are 1,343 working women in the dataset. Note that hourly wages have been rounded to 2 decimal places to avoid potential floating problems when defining the boundary values of the beta distribution. Figure 3 shows a the distribution of wages among those women who work. The dataset will be used in two different examples in sections 4.1 and 4.2. The first example makes use of the sample of working women and shows how to fit a mixture of beta regressions. The second example will use all the sample and show how to estimate an inflated truncated mixture of beta regressions. 11 0 .02 Density .04 .06 L. A. Gray and M. Hernández-Alava 10 20 30 wages 40 50 Figure 3: Distribution of wages for women in work (n = 1, 343) 4.1 Example 1: a mixture of beta regressions For the purpose of this example we assume that the minimum and maximum hourly wages are 5.88 and 45.81 respectively, coinciding with the minimum and maximum values in the sample. Therefore, in this fictional example there are observations at the boundaries, specifically only one observation at each boundary value. Note however, that the boundary values of the beta distribution should be the theoretical values which will not necessarily coincide with the minimum and maximum in the sample. The model can be estimated by transforming the dependent variable, changing the observations at the boundaries by a small amount and then using betareg as follows: . gen double wages_t =(wages-`a´)/(`b´-`a´) (657 missing values generated) . replace wages_t = wages_t - 1e-6 if wages_t==1 (1 real change made) . replace wages_t = wages_t + 1e-6 if wages_t==0 (1 real change made) . betareg wages_t educ age i.married children Using betamix, we estimate a beta regression using the minimum and maximum wages as the endpoints of the beta distribution. . qui summ wages . local a = r(min) . local b = r(max) . betamix wages, muvar(educ age i.married children) lb(`a´) ub(`b´) Warning. Some observations are on the upper boundary but no probability mass. 12 Betamix A value of 1 is not supported by the beta distribution. -1e-6 will be added to those observations. Warning. Some observations are on the lower boundary but no probability mass. A value of 0 is not supported by the beta distribution. 1e-6 will be added to those observations. initial: log likelihood = output omitted Iteration 4: log likelihood = 1 component Beta Mixture Model Log likelihood = 482.97844 708.05189 Number of obs Wald chi2(4) Prob > chi2 708.05189 Std. Err. z P>|z| = = = 1,343 485.31 0.0000 wages Coef. [95% Conf. Interval] C1_mu education age 1.married children _cons .0964959 .0171608 -.0794301 -.0788015 -1.953179 .0056416 .0021723 .0402595 .0115869 .1054264 17.10 7.90 -1.97 -6.80 -18.53 0.000 0.000 0.049 0.000 0.000 .0854385 .0129031 -.1583373 -.1015114 -2.159811 .1075533 .0214185 -.000523 -.0560916 -1.746548 C1_lnphi _cons 2.341453 .0369527 63.36 0.000 2.269027 2.413879 C1_phi 10.39633 .3841727 9.66999 11.17724 . matrix param = e(b) . est store beta1lc The command issues warnings that the dependent variable has values at both boundaries and that it will change the 0 and 1 values (on the transformed variable) by a small amount. All variables appear significant at conventional significance levels. Age and education have a positive effect on the mean wages whereas being married and having children seem to be associated with lower mean wages. The estimated model assumes a constant precision parameter φ. As a log link is used to ensure that the precision parameter is positive, the value of the untransformed parameter φ = 10.40 is also shown at the bottom of the output table. Although the direction of the effect can be found directly from the estimates, to find the magnitude of the effects and be able to interpret the estimates it is helpful to use margins. Examples will be presented after estimation of the final model, here margins is used following estimation to find the predicted hourly wage for the estimation sample. The average predicted hourly wage is $23.72 almost identical to the sample average of $23.70. . margins Predictive margins Model VCE : OIM Expression : Prediction, predict() Margin Delta-method Std. Err. Number of obs z P>|z| = 1,343 [95% Conf. Interval] L. A. Gray and M. Hernández-Alava _cons 23.72476 13 .1569213 151.19 0.000 23.4172 24.03232 In some cases it is plausible that the variance of the distribution depends on some observed covariates through their effect on the precision parameter φ. Models where the precision parameter is a function of covariates are more difficult to estimate and it is always recommended to start by estimating a model with constant variance and use it as a stepping stone. The accompanying do file shows an example of how to specify and estimate a model where φ is a function of a covariate. Both, the model with constant variance presented earlier and the model with covariates in the variance can be estimated in Stata using the built-in command betareg. We now show how the command betamix can be used to estimate a more general model using mixtures of beta regressions. As always when estimating mixture models it is important that searches are carried out to ensure convergence to a global maximum (see Section 2). The accompanying example do file provides examples of searching procedures. Below, the estimated parameters from the beta regression are used as initial parameter values to start the optimisation using the option from in order to estimate a two component beta regression mixture model. . betamix wages, muvar(educ age i.married children) lb(`a´) ub(`b´) /// > ncomponents(2) from(param) Warning. Some observations are on the upper boundary and there is no probability mass A value of 1 is not supported by the beta distribution. -1e-6 will be added to those observations Warning. Some observations are on the lower boundary but no probability mass. A value of 0 is not supported by the beta distribution. 1e-6 will be added to those observations. initial: log likelihood = output omitted 356.55936 Iteration 21: log likelihood = 2 component Beta Mixture Model 801.45402 Log likelihood = Number of obs Wald chi2(8) Prob > chi2 801.45402 Std. Err. z P>|z| = = = 1,343 371.83 0.0000 wages Coef. [95% Conf. Interval] C1_mu education age 1.married children _cons .0862541 .0138412 -.0599449 -.0674781 -1.731981 .0055378 .0020906 .0375649 .0111419 .1019856 15.58 6.62 -1.60 -6.06 -16.98 0.000 0.000 0.111 0.000 0.000 .0754003 .0097437 -.1335708 -.0893158 -1.93187 .0971079 .0179387 .013681 -.0456405 -1.532093 C1_lnphi _cons 2.628627 .059637 44.08 0.000 2.51174 2.745513 C2_mu education age 1.married children .2521795 .0602981 -.5438286 -.2912655 .0760312 .0363617 .7015907 .1528789 3.32 1.66 -0.78 -1.91 0.001 0.097 0.438 0.057 .103161 -.0109695 -1.918921 -.5909027 .4011979 .1315657 .831264 .0083716 14 Betamix _cons -4.893927 1.536007 -3.19 0.001 -7.904445 -1.883409 C2_lnphi _cons .9858081 .2945243 3.35 0.001 .4085511 1.563065 Prob_C1 _cons 3.270903 .4894943 6.68 0.000 2.311512 4.230295 C1_phi C2_phi pi1 pi2 13.85473 2.679977 .963417 .036583 .8262551 .7893182 .0172521 .0172521 12.32637 1.504636 .909826 .0143395 15.57261 4.77343 .9856605 .090174 . est store beta2lc . matrix param2=e(b) The output now gives the parameter estimates for the two components of the model and for the multinomial logit2 which determines component membership. Note that in this model the probability of class membership is constant but it can be allowed to vary across individuals by including variables in the multinomial logit model using the probabilities(varlist) option of the command. As the probability of belonging to each component is constant, the bottom of the output table shows these probabilities (pi1 and pi2) in an interpretable metric along with the precision parameters of both components (C1 phi and C2 phi). Education and age are associated with higher average hourly wages and being married and the number of children under 12 are associated with lower average hourly wages in both components of the mixture just as they were in the beta regression model. Component 1 is a dominant component with a probability (pi1) of 0.96. It is useful to use the predict command after estimation to visualise the two different components of the mixture. . predict yhat2, outcome(all) . summ yhat2* if e(sample)==1 Variable Obs Mean Std. Dev. Min Max yhat2_mu1 yhat2_phi1 yhat2_p1 yhat2_mu2 yhat2_phi2 1,343 1,343 1,343 1,343 1,343 23.6597 13.85473 .963417 24.17528 2.679977 3.128114 1.78e-15 0 8.713329 0 17.46457 13.85473 .963417 8.269212 2.679977 32.99773 13.85473 .963417 44.19021 2.679977 yhat2_p2 yhat2 1,343 1,343 .036583 23.67856 0 3.325939 .036583 17.12817 .036583 33.40719 In the estimation sample, the first component is estimated to have average hourly wages of $23.66 and a precision parameter φ = 13.85 whereas the second component has a slightly higher mean of $24.18 and a more dispersed variance (φ = 2.68). Figure 4 plots the probability density of the mixture at this average together with the individual components. The mixture probability density is very similar to the dominant component 2. In this case, the model reduces to a logit model since there are only two components L. A. Gray and M. Hernández-Alava 15 0 1 y 2 3 but it has heavier tails. 0 .2 .4 .6 Mixture Component 2 .8 1 Component 1 Figure 4: Probability densities of the 2 component beta mixture and each individual component As noted earlier margins can be used to help interpret the model. For example, below margins is used to find the average hourly wage for the groups of women with different number of children. . margins , over(children) Predictive margins Model VCE : OIM Expression : Prediction, predict() over : children Margin children 0 1 2 3 4 5 25.7996 24.54332 23.79396 22.08443 22.04246 21.72337 Delta-method Std. Err. .2541561 .1823528 .1611243 .1873649 .2540052 .360958 Number of obs z 101.51 134.59 147.67 117.87 86.78 60.18 = 1,343 P>|z| [95% Conf. Interval] 0.000 0.000 0.000 0.000 0.000 0.000 25.30146 24.18592 23.47816 21.7172 21.54462 21.0159 26.29773 24.90073 24.10976 22.45166 22.5403 22.43083 . marginsplot Variables that uniquely identify margins: children (file margoverchild.gph saved) The plot of the predictive margins is shown in Figure 5. The group of women with no children under 12 years of age have the highest average hourly wage. Although the 16 Betamix 21 22 Prediction 23 24 25 26 average hourly wage tends to decrease as we move through the groups of women with increasing number of children, the plot flattens out for the last three groups. 0 1 2 3 # of children under 12 years old 4 5 Figure 5: Predictive margins with 95% confidence interval The model with a mixture of two components can be compared with the beta regression model using information criteria, especially BIC has been shown to give a good indication of the number of components in a mixture. The mixture model has the lowest AIC and BIC (see below) indicating that it fits the data better. Figure 6 compares the probability densities of the mixture of two components and the beta regression model with constant variance (calculated at the average). The mixture distribution is more concentrated in the middle section and has longer tails than the single component beta regression model. . est stats _all Akaike´s information criterion and Bayesian information criterion Model Obs ll(null) ll(model) df AIC BIC beta1lc beta2lc 1,343 1,343 . . 708.0519 801.454 6 13 -1404.104 -1576.908 -1372.888 -1509.273 Note: N=Obs used in calculating BIC; see [R] BIC note. Convergence was a problem when attempting to find a 3 component mixture model. Examination of the parameter estimates after several iterations showed that the model was trying to add one extra component with an almost zero probability of component membership and thus the model with 2 components was chosen as the best fitting model. 17 0 1 y 2 3 L. A. Gray and M. Hernández-Alava 0 .2 .4 Mixture .6 .8 1 Beta regression Figure 6: Probability densities of the 2 component beta mixture versus the beta regression 4.2 Example 2: an inflated truncated mixture of beta regressions In this example, we modify the data to illustrate the estimation of the inflated truncated beta regression model. For examples of more complex models in the area of health economics see Gray et al. (2016). We replace the missing values of the women who are not working by zeroes and assume that the minimum wage is $5. Figure 7 shows a histogram of the modified data on wages where almost a third of the sample has a value of zero. We first estimate an inflated truncated beta regression model using a lower bound of 0 (lb(0)), an upper bound of 45.81 (ub(45.81)) and a truncation at the bottom of the distribution with no density between 0 and 5 (tb(5) trun(bottom)). Because there are observations at the lower bound, a logit model for the inflation part of the model is used with the same variables as above (pmvar(educ age married children)). There is inflation at the lower bound but no inflation at the truncation point or at the upper bound (pmass(0 0 1)). Thus, there is one observation at the upper bound as before and the programm will alter it slightly to move it below 1. However, there are no observations in the sample at the assumed theoretical minimum wage of $5 per hour. The syntax to estimate this model is reproduced below: . betamix twage, muvar(educ age i.married children) lb(0) ub(45.81) tb(5) /// > pmass(0 0 1) pmvar(educ age i.married children) trun(bottom) Note that this model is not the same as the Heckman selection model. It is equivalent to a two part model estimated jointly under conditional independence between the two parts of the model. Using the estimated parameters to initialise the algorithm an inflated Betamix 0 .05 .1 Density .15 .2 .25 18 0 10 20 30 Hourly wages 40 50 Figure 7: Distribution of hourly wages truncated beta mixture of two components is also estimated. Both AIC and BIC are lower for the two component mixture model. . est stats infbeta1LC infbeta2LC Akaike´s information criterion and Bayesian information criterion Model Obs ll(null) ll(model) df AIC BIC infbeta1LC infbeta2LC 2,000 2,000 . . -261.0264 -198.8511 11 18 544.0528 433.7021 605.6628 534.5184 Note: N=Obs used in calculating BIC; see [R] BIC note. Results for the inflated truncated beta mixture of two components model are presented below: . betamix twage, muvar(educ age i.married children) lb(0) ub(45.81) tb(5) /// > pmass(0 0 1) pmvar(educ age i.married children) trun(bottom) /// > ncomp(2) from(param4) Warning. Some observations are on the upper boundary but no probability mass. A value of 1 is not supported by the beta distribution. -1e-6 will be added to those observations. Fitting full beta mixture model initial: log likelihood = -647.56401 output omitted Iteration 18: log likelihood = -198.85106 2 component Beta Mixture Model with inflation Log likelihood = -198.85106 Number of obs Wald chi2(12) Prob > chi2 = = = 2,000 666.11 0.0000 L. A. Gray and M. Hernández-Alava Std. Err. 19 twage Coef. z P>|z| [95% Conf. Interval] C1_mu education age 1.married children _cons .0821515 .0130726 -.0644645 -.0648105 -1.607628 .005575 .0020945 .0378997 .0110851 .1030426 14.74 6.24 -1.70 -5.85 -15.60 0.000 0.000 0.089 0.000 0.000 .0712246 .0089674 -.1387465 -.0865368 -1.809588 .0930784 .0171778 .0098175 -.0430842 -1.405668 C1_lnphi _cons 2.71935 .0591438 45.98 0.000 2.603431 2.83527 C2_mu education age 1.married children _cons .2083242 .0527565 -.0564205 -.1520258 -4.454993 .0495018 .0230223 .3966404 .1051361 1.077548 4.21 2.29 -0.14 -1.45 -4.13 0.000 0.022 0.887 0.148 0.000 .1113025 .0076335 -.8338214 -.3580887 -6.566949 .3053459 .0978795 .7209805 .0540372 -2.343038 C2_lnphi _cons 1.451851 .234219 6.20 0.000 .9927904 1.910912 Prob_C1 _cons 2.753496 .4171556 6.60 0.000 1.935886 3.571106 PM_lb education age 1.married children _cons -.0982513 -.0579303 -.7417775 -.7644882 4.159247 .0186522 .007221 .1264705 .0515289 .3320401 -5.27 -8.02 -5.87 -14.84 12.53 0.000 0.000 0.000 0.000 0.000 -.134809 -.0720833 -.9896552 -.865483 3.508461 -.0616936 -.0437773 -.4938998 -.6634935 4.810034 C1_phi C2_phi pi1 pi2 15.17046 4.271014 .9401105 .0598895 .8972382 1.000353 .023487 .023487 13.51001 2.698754 .8738995 .0273554 17.035 6.759251 .9726446 .1261005 The output table now has one more equation, PM lb, corresponding to the inflation part of the model at zero. Women are more likely to be in employment as the years of education, age and the number of children increase and if they are married. The average marginal effects for all covariates in the model can be calculated as follows: . margins, dydx(*) Average marginal effects Number of obs Model VCE : OIM Expression : Prediction, predict() dy/dx w.r.t. : education age 1.married children dy/dx education age 1.married .9790646 .3317772 2.694484 Delta-method Std. Err. .0789597 .0298328 .584154 z 12.40 11.12 4.61 = 2,000 P>|z| [95% Conf. Interval] 0.000 0.000 0.000 .8243066 .273306 1.549564 1.133823 .3902483 3.839405 20 Betamix children 2.598244 .1817673 14.29 0.000 2.241987 2.954502 Note: dy/dx for factor levels is the discrete change from the base level. The average marginal effect of education, age and the number of children are $0.98, $0.33 and $2.60 respectively. Married women earn on average $2.69 per hour more than single women. Joint tests of the parameters can also be performed after estimation. For example we could test the joint significance of the variable married in the means of the components . test [C1_mu]1.married [C2_mu]1.married ( 1) [C1_mu]1.married = 0 ( 2) [C2_mu]1.married = 0 chi2( 2) = 3.25 Prob > chi2 = 0.1973 Thus, the variable married is not jointly significant in the means of the beta mixture components at standard significance levels. 5 Concluding remarks This article described the user-written betamix command for fitting mixture regression models for bounded dependent variables using the beta distribution. The command generalises the Stata command betareg and the user-written commands betafit zoib. The command betamix allows the model to capture multimodality and other distributional shapes. betamix can easily deal with response variables which have a gap between one of the boundary values and the continuous part of the distribution and can model positive probabilities at the boundaries and at the truncation value. There is no need to manually transform variables bounded in the interval (a, b) to the (0, 1) as the command will take care of the transformation. The inflation model is globally concave but mixture models It is important to start estimating less complex models and slowly build them up, otherwise convergence problems are likely to arise. The likelihood functions of mixture models are known to have multiple optima. It is important that thorough searches around the parameter space are carried out to avoid local solutions. 6 Acknowledgements Mónica Hernández Alava acknowledges support by the Medical Research Council under grant MR/L022575/1. The views, and any errors or omissions, expressed in this article are of the authors only. L. A. Gray and M. Hernández-Alava 7 21 References Basu, A., and A. Manca. 2012. Regression Estimators for Generic Health-Related Quality of Life and Quality-Adjusted Life Years. Medical Decision Making 32(1): 56–69. http://mdm.sagepub.com/content/32/1/56.abstract. Berggren, N., S.-O. Daunfeldt, and J. Hellstrm. 2014. Social trust and centralbank independence. European Journal of Political Economy 34: 425 – 439. http://www.sciencedirect.com/science/article/pii/S0176268013000803. Cook, D. O., R. Kieschnick, and B. McCullough. 2008. Regression analysis of proportions in finance with self selection. Journal of Empirical Finance 15(5): 860 – 867. http://www.sciencedirect.com/science/article/pii/S0927539808000145. Ferrari, S., and F. Cribari-Neto. 2004. Beta Regression for Modelling Rates and Proportions. Journal of Applied Statistics 31(7): 799–815. http://dx.doi.org/10.1080/0266476042000214501. Gray, L., M. Hernández-Alava, and A. J. Wailoo. 2016. Development of methods for the mapping of utilities using mixture models: An application to asthma. HEDS discussion paper series, ScHARR, University of Sheffield. Hernández Alava, M., and A. Wailoo. 2015. Fitting adjusted limited dependent variable mixture models to EQ-5D. Stata Journal 15(3): 737–750(14). http://www.statajournal.com/article.html?article=st0401. Johnson, N., S. Kotz, and N. Balakrishnan. 1995. Continuous univariate distributions. No. v. 2 in Wiley series in probability and mathematical statistics: Applied probability and statistics, Wiley & Sons. https://books.google.co.uk/books?id=0QzvAAAAMAAJ. Kieschnick, R., and B. D. McCullough. 2003. Regression analysis of variates observed on (0, 1): percentages, proportions and fractions. Statistical Modelling 3(3): 193–213. http://smj.sagepub.com/content/3/3/193.abstract. McLachlan, G. J., and D. Peel. 2000. Finite mixture models. New York: Wiley. Ospina, R., and S. Ferrari. 2012. A general class of zero-or-one inflated beta regression models. Computational Statistics and Data Analysis 56(6): 1609–1623. Cited By 35. Ospina, R., and S. L. P. Ferrari. 2010. Inflated beta distributions. Statistical Papers 51(1): 111–126. http://dx.doi.org/10.1007/s00362-008-0125-4. Paola, M. D., V. Scoppa, and R. Lombardo. 2010. Can gender quotas break down negative stereotypes? Evidence from changes in electoral rules. Journal of Public Economics 94(56): 344 – 353. http://www.sciencedirect.com/science/article/pii/S0047272710000101. Paolino, P. 2001. Maximum likelihood estimation of models with beta-distributed dependent variables. Political Analysis 9: 325–346. 22 Betamix Papke, L. E., and J. M. Wooldridge. 1996. Econometric methods for fractional response variables with an application to 401(k) plan participation rates. Journal of Applied Econometrics 11(6): 619–632. http://dx.doi.org/10.1002/(SICI)10991255(199611)11:6¡619::AID-JAE418¿3.0.CO;2-1. Pereira, G. H. A., D. A. Botter, and M. C. Sandoval. 2012. The Truncated Inflated Beta Distribution. Communications in Statistics - Theory and Methods 41(5): 907–919. http://dx.doi.org/10.1080/03610926.2010.530370. . 2013. A regression model for special proportions. Statistical Modelling 13(2): 125–151. http://smj.sagepub.com/content/13/2/125.abstract. Smithson, M., and J. Verkuilen. 2006. A better lemon squeezer? Maximum-likelihood regression with beta-distributed dependent variables. Psychological Methods 11: 54– 71. Verkuilen, J., and M. Smithson. 2012. Mixed and mixture regression models for continuous bounded responses using the beta distribution. Journal of Educational and Behavioral Statistics 37: 82–113. About the authors Laura Gray is a Research Associate in Health Econometrics in the Health Economics and Decision Science section in ScHARR, University of Sheffield, UK. Her area of research interest is applied microeconometrics in the areas of health and lifestyle. Mónica Hernández Alava is a Senior Research Fellow in the Health Economics and Decision Science section in ScHARR, University of Sheffield, UK. Her main research interests lie in microeconometrics and recent areas of application include the analysis of health state utility data, disease progression and the dynamics of child development.
© Copyright 2025 Paperzz