BETAMIX: A command for fitting mixture

The Stata Journal (yyyy)
vv, Number ii, pp. 1–22
BETAMIX: A command for fitting mixture
regression models for bounded dependent
variables using the beta distribution
Laura A. Gray
School of Health and Related Research
Health Economics and Decision Science
University of Sheffield
Sheffield, UK
[email protected]
Mónica Hernández-Alava
School of Health and Related Research
Health Economics and Decision Science
University of Sheffield
Sheffield, UK
[email protected]
Abstract. This article describes the betamix Stata command for fitting mixture
regression models for dependent variables that are bounded in an interval. The
most general model is a mixture of the trinomial distribution and a mixture of
beta regressions. The model is a generalisation of the truncated inflated beta
regression model introduced in Pereira et al. (2012) for variables with truncated
supports either at the top or the bottom of the distribution. The command accepts
dependent variables defined in any range which are then transformed to the interval
(0, 1) before estimation. The command and post estimation options are presented
and explained through examples.
Keywords: st0001, betamix, truncated inflated beta mixture, beta regression, mixture model, cross sectional data
1
Introduction
Continuous response variables which are bounded at both ends arise in many areas.
Dependent variables measuring proportions, ratios and rates are common in the empirical literature. They are often limited to the open unit interval (0, 1) but in many cases
values in both boundaries are not only possible but appear with high frequency. Applications where the variables are bounded in alternative intervals linearly transform the
dependent variable to the (0, 1) interval. A few examples include modelling the rates of
employee participation in pension plans (Papke and Wooldridge 1996), the percentage
of women on Municipal Councils or Executive Committees (Paola et al. 2010), an index
measuring Central Bank independence (Berggren et al. 2014), the proportion of a firm’s
total capital accounted for by its long-term debt (Cook et al. 2008), Quality Adjusted
Life Years (Basu and Manca 2012) and the score of reading accuracy (Smithson and
Verkuilen 2006).
Modelling variables bounded at both ends presents several problems. The usual
linear regression model is not appropriate for bounded dependent variables since the
predictions of the model can be outside the boundary limits. A solution often used
is to transform the dependent variable so that it takes values in (−∞, +∞) and then
c yyyy StataCorp LP
st0001
2
Betamix
use standard regression models on the transformed dependent variable. However, this
approach has an important limitation as it ignores that the moments of the distribution
of a bounded variable are related: as the mean response moves towards a boundary
value, the variance and skewness of the variable will tend to decrease and increase
respectively. It has been shown that the standard methods to deal with these issues are
not appropriate for these type of variables and models based on the beta distribution
have been put forward as an alternative (Paolino 2001; Kieschnick and McCullough
2003; Ferrari and Cribari-Neto 2004; Smithson and Verkuilen 2006).
The standard beta regression model (Paolino 2001; Ferrari and Cribari-Neto 2004;
Smithson and Verkuilen 2006) assumes that the dependent variable is continuous in the
open unit interval (0, 1). This model has been generalised to allow for values at either
or both boundaries by adding a degenerate distribution with probability masses at the
boundary values (Cook et al. 2008; Ospina and Ferrari 2010; Basu and Manca 2012;
Ospina and Ferrari 2012). Pereira et al. (2012, 2013) extend the framework to be able to
model variables such as the ratio of the unemployment benefit to the maximum benefit
where the proportion can take the value of zero (if the person is not eligible), any real
number in the interval (τ, 1) where τ is the minimum benefit and is also very likely
to have positive probabilities at the values τ and at 1. They termed this model the
truncated inflated beta distribution. The model is a mixture of the beta distribution in
the interval (τ, 1) and the trinomial distribution with probability masses at 0, τ and 1.
In Gray et al. (2016) we extend this framework to allow for mixtures of C-components
of beta regressions so that multimodality in the continuous part of the model can be
addressed. The model presents an alternative to the aldvmm command discussed in
Hernández Alava and Wailoo (2015) for modelling preference based measures in health
economics.
This article presents the Stata command betamix which is a general command that
can be used to estimate mixture regression models for dependent variables that are
bounded in an interval and can have truncated supports either at the top or at the
bottom of the distribution. It extends one of the parameterizations of the user-written
commands betafit and zoib and the Stata command betareg in several directions
1
. First, it generalizes them to mixtures of beta distributions allowing the model to
capture multimodality. Second, it allows the user to model response variables which
have a gap between one of the boundaries and the continuous part of the distribution.
Third, it can deal with positive probabilities at either or both boundaries and at the
truncation point. Fourth, there is no need to transform response variables defined in
intervals other than (0, 1) as the command will take care of transforming the dependent
variable using the supplied options.
The article is organised as follows: section 2 gives a brief overview of the model,
section 3 describes the betamix syntax and options including the syntax for predict,
section 4 illustrates the syntax of the command and the interpretation of the model
using a fictional dataset. Finally, section 5 concludes.
1. All three user-written commands betamix, betafit and zoib use a logit and a log link for the
conditional mean and the conditional scale, respectively. The Stata command betareg allows for
a number of different links for both.
L. A. Gray and M. Hernández-Alava
2
3
A general beta mixture regression model
There are two possible parameterizations of the beta distribution bounded in the interval (0, 1). The most common one uses two shape parameters (Johnson et al. 1995).
The alternative parameterization presented in Ferrari and Cribari-Neto (2004) which
defines the model in terms of its mean µ and a precision parameter φ is more useful for
estimating regression models. In this parameterization, the mean and the variance of y
are given by
E (y) = µ,
and
var (y) =
a<µ<b
(µ − a) (b − µ)
,
1+φ
φ>0
The variance of y is a function of the mean of y, µ, and it decreases as the precision
parameter φ increases. The density of the variable y can then be written as
b−µ
µ−a
φ−1
φ−1
(b − y)( b−a )
Γ(φ)(y − a)( b−a )
i h
i
,
f (y; µ, φ, a, b) = h
φ−1
b−µ
Γ µ−a
b−a φ Γ
b−a φ (b − a)
y ∈ (a, b)
(1)
where Γ (.) is the gamma function. Equation (1) corresponds to the probability density
of the beta distribution of the linearly transformed variable
yT =
(y − a)
,
(b − a)
0 < yT < 1
(2)
with mean (µ − a) / (b − a) and precision parameter φ defined in the interval (0, 1).
Beta distributions are very convenient in modelling as they are able to display a
variety of shapes depending on the values of its two parameters µ and φ. They are
symmetric if µ = (b − a) /2 and asymmetric for any other value of µ. They can also
be bell-, J- and U-shaped. Figure 1 plots a number of beta probability densities for
alternative combinations of µ and φ for a variable defined in the (0, 1) interval. At a
value of µ = 0.5 and small values of φ, the beta distribution is U-shaped; if µ = 0.5
and φ = 2 the distribution becomes the uniform distribution; and as φ increases, the
variance decreases and the distribution becomes more concentrated around its mean.
Given a sample y1 , y2 , . . . , yn of independent random variables each following the
probability density in equation (1), the beta regression model can be obtained by assuming that a function v (.) of the mean of yi can be written as a linear combination of
a set of covariates in the vector zi
v
µi − a
b−a
= zi0 β
(3)
where β is a vector of parameters and v (.) is a strictly monotonic and twice differentiable
link function which maps the open interval (0, 1) into R. A number of link functions
4
Betamix
density
5
µ = 0.25, φ = 2
µ = 0.75, φ = 2
2
density
4
10
6
Figure 1: Probability density of the beta distribution for alternative combinations of µ
and φ
µ = 0.5, φ = 1
0
0
µ = 0.5, φ = 2
.4
.6
.8
1
0
.2
.4
.6
.8
1
20
.2
20
0
µ = 0.05, φ = 75
15
µ=0.95, φ =20
µ = 0.15, φ = 75
density
10
density
10
15
µ = 0.05, φ = 20
µ = 0.15, φ = 20
µ=0.85, φ =75
µ = 0.25, φ = 75
0
0
5
µ=0.75, φ =20
µ=0.5, φ =20
µ=0.75, φ =75
µ=0.5, φ =75
µ=0.85, φ =20
µ = 0.25, φ = 20
5
µ=0.95, φ =75
0
.2
.4
.6
.8
1
0
.2
.4
.6
.8
1
can be used but the logit link is commonly found in applications as the coefficients of
the regression can be interpreted as log-odds. Using the logit link the mean of yi can
be written as
µi (zi ; β) = a + (b − a)
exp (zi0 β)
1 + exp (zi0 β)
(4)
Pereira et al. (2012, 2013) present a more general beta regression model for variables
defined at zero and in the interval [τ, 1]. They termed this model the truncated inflated
beta regression model. It is a mixture model of a multinomial distribution with probability masses at 0, τ and 1, and a beta distribution defined in the open interval (τ, 1).
In Gray et al. (2016) we extend the framework to the case where the second part can
be a mixture of C−components of beta distributions. Mixtures of beta distributions
can display a number of distributional shapes. Figure 2 shows two examples. The left
panel plots a 50:50 mixture of two beta distributions both with the same relatively high
precision parameter φ = 50 but with very different means, µ = 0.05 and µ = 0.80.
This mixture displays the usual bimodal shape. The right panel of Figure 2 also plots
a 50:50 mixture but the means of the two components are closer together (µ = 0.60
L. A. Gray and M. Hernández-Alava
5
and µ = 0.75) and the precision parameters are very different (φ = 10 and φ = 100
respectively). This mixture density is asymmetric with a bump on the left tail. Both
densities show characteristics which cannot be captured with a single beta distribution.
Figure 2: Mixture densities of beta distributions
( µ1 = 0.60; µ2 = 0.75; φ1 = 10; φ2=100 )
density
0
0
2
2
density
4
4
6
6
50:50 mixture of beta densities
( µ1 = 0.05; µ2 = 0.80; φ1 = φ2=50 )
8
50:50 mixture of beta densities
0
.2
.4
.6
.8
1
0
.2
.4
.6
.8
1
Let us assume that the response variable yi is defined at the point a and in the
interval [τ, b] with a < τ < b. The density of yi conditional on three, possibly different,
column vectors of covariates xi1 , xi2 , xi3 can be written as

P (yi





 P (yi
P
g (yi |xi1 , xi2 , xi3 ) =
" (yi





 1−
= a|xi3 )
= b|xi3 )
= τ |xi3 )
#
P
P (yi = s|xi3 ) h (yi |xi1 , xi2 )
if yi = a
if yi = b
if yi = τ
(5)
if yi ∈ (τ, b)
s=a,τ,b
The probabilities P (yi |xi3 ) are derived from a multinomial logit model
P (yi = k|xi3 ) =
exp (x0i3 γk )
P
1+
exp (x0i3 γs )
for k = a, τ, b
(6)
s=a,p,b
where xi3 is a column vector of variables that affect the probability of a boundary
value of the response variable and γk is the vector of corresponding coefficients. One
set of coefficients is normalised to zero for identification, in this case, the coefficients
corresponding to the probability of a value in the continuous part of the distribution.
The probability density function h (.) is a mixture of C−components of beta distributions with means µc i (zi ; βc ) and precision parameters φc , c = 1, ..., C:
h (yi |xi1 , xi2 ) =
C
X
c=1
(P (c|xi2 ) f (yi ; xi1 βc , φc , τ, b))
(7)
6
Betamix
where f (.) is the beta density defined in (1). A multinomial logit model for the probability of latent class membership is assumed as follows:
exp (x0i2 δc )
P (c|xi2 ) = PC
0
j=1 exp (xi2 δj )
(8)
where xi2 is vector of variables that affect the probability of component membership,
δc is the vector of corresponding coefficients and C is the number of classes used in
the analysis. One set of coefficients δc is normalised to zero for identification. If no
variables are included, then the probabilities of component membership are constant
for all individuals.
Using equations (5), (6), (7) and (8), the loglikelihood of the sample y1 , y2 , . . . , yn
can be written as
X
X
ln l (γ, β, δ, φ) =
ln P (yi = a|xi3 , γ) +
ln P (yi = b|xi3 , γ)
i:yi =a
i:yi =b

+
X
ln P (yi = τ |xi3 , γ) +
i:yi ∈p
i:yi =τ
+
X
i:yi ∈(τ,b)
X
ln
" C
X
ln 1 −

X
P (yi = s|xi3 , γ)
s=a,τ,b
#
(P (c|xi2 ) f (yi ; xi1 βc , φc , τ, b))
(9)
c=1
where i = 1, . . . , n.
The command betamix described in the section below can estimate models with and
without truncation and models where the truncation is either at the bottom or at the
top of the interval range. The simplest model it can estimate is a beta regression model
using the alternative parameterization described above. This model can already be
estimated in Stata using betareg or the user-written command betafit. In addition,
the command betamix can estimate a finite mixture model using a beta distribution.
If there are boundary values, the command warns the user and adds a small amount of
noise (1e−6 ) to the boundary values after the response variable has been transformed
to the [0, 1] as in Basu and Manca (2012). This solution is not satisfactory in cases
where there are unusually large numbers of observations at the boundary values. In
these cases, and provided it makes sense, the user can request that the command adds a
second part to the model to add probability masses at any combination of the boundary
points and the truncation value (if there is one). The online supplementary material in
(Smithson and Verkuilen 2006) discusses alternative methods and recommends checking
the sensitivity of the estimated parameters to different procedures.
The description of the model above assumes constant precision parameters but the
command allows the precision parameters to depend on covariates. These models tend
to be more difficult to estimate and require good starting values. A good procedure to
follow in this case is to start by estimating a model with constant precision as a stepping
stone for the full model (Verkuilen and Smithson 2012).
L. A. Gray and M. Hernández-Alava
7
We recommend the reader to familiarise herself/himself with the idiosyncracies of
fitting mixture models (McLachlan and Peel 2000) before attempting to estimate one.
In particular, it is important to emphasize that mixture models are known to have
multiple optima and it is important that searches are carried out to ensure a global
solution has been found. Determining the number of components in a mixture is also
not straightforward and the analyst must exercise judgement in determining the appropriate number of components. Likelihood ratio tests cannot be used to test models with
different number of components because it involves testing at the edge of the parameter space. The Bayesian Information Criterion (BIC) has been proposed as a useful
indicator of the number of appropriate components but other approaches also exist.
3
Command syntax
3.1
betamix
Syntax
betamix depvar
if
in
weight
,
muvar(varlist) phivar(varlist)
ncomponents(#) probabilities(varlist) pmass(numlist) pmvar(varlist)
lbound(#) ubound(#) trun(trun) tbound(#) vce(vcetype) Robust
CLuster(varname) level(#) constraints(constraints) search(spec)
repeat(#) maximize options
Description
betamix is a user-written program which fits a generalized beta regression model using
the truncated inflated beta model of Pereira et al. (2013) and the mixture beta regression model in Verkuilen and Smithson (2012). The most general model is a mixture
model of a) beta mixtures and b) a multinomial logit to inflate the distribution of the
dependent variable at 3 points, the bottom limit lbound, the upper limit ubound and
at a truncation point tbound. This allows estimation of dependent variables such as
credit card payments which are discrete at zero, at the minimum amount (truncation
point) and at the full amount and continuous between the minimum amount and the
full amount (see Pereira et al. (2013) for more details). The second part of the model
which models the continuous part is a C-component mixture of beta regressions. This
part of the model uses the alternative parameterization of the beta distribution in terms
of the mean and the precision parameter (Paolino 2001; Ferrari and Cribari-Neto 2004;
Smithson and Verkuilen 2006). The mean and the precision of the beta regressions and
the mixing probabilities may be functions of covariates. The default model allows the
precision of the components to be different but can be constrained to be the same. The
model allows the user to supply the lower and upper bounds for the dependent variable
and thus can be used for dependent variables bounded in any (a, b) interval without
the need for manual transformation before estimation of the model or afterwards to
calculate predictions.
8
Betamix
Options
muvar(varlist) specifies a set of variables to be included in the mean of the beta regression mixtures. The default is a constant mean.
phivar(varlist) specifies a set of variables to be included in the precision of the beta
regression mixtures. The default is a constant precision parameter.
ncomponents(#) specifies the number of mixture components. The default is a beta
regression, that is, a single component.
probabilities(varlist) specifies a set of variables used to model the probability of
component membership. The probabilities are specified using a multinomial logit
parameterization. The default is constant probabilities.
pmass(numlist) specifies a list of exactly 3 number indicators (top inside bottom) showing the presence and position of the probability masses. For example, (1 0 0) specifies
a probability mass at b the top limit of the dependent variable only; (0 0 0) specifies
no probability masses, that is, a beta mixture regression model; (1 0 1) specifies
a model with probability masses at both limits of the dependent variable but no
probability mass at the truncation point. The default is no probability masses at
any point. Note that pmass requires a list of exactly 3 numbers even if the model
has no truncation.
pmvar(varlist) specifies the variables used in the inflation part of the model. The model
allows for a different set of variables to be used in this part of the model but in most
cases it is reasonable for the same set of variables to appear in both, the inflation
model and the mixture of beta regressions.
lbound(#) user supplied lower limit of the dependent variable. The default is zero.
Use this option if the dependent variable is limited in the interval (a,b). In this case
the upper bound of the interval also needs to be supplied using the option below.
ubound(#) user supplied upper limit of the dependent variable. The default is one.
Use this option if the dependent variable is limited in the interval (a,b). In this case
the lower bound of the interval also needs to be supplied using the option above.
trun(trun) trun may be none, top or bottom. This option determines whether there
is truncation in the model and if so whether it is at the bottom or the top end. Use
none if no truncation is required; use top if the truncation (gap) is at the top (i.e.
the dependent variable is only defined in the interval (lbound , tbound ) and the value
ubound ); use bottom if the truncation (gap) is at the bottom (i.e. the dependent
variable is only defined at the value lbound and in the interval (tbound , ubound ).The
default is none.
tbound(#) user supplied truncation value.
vce(vcetype) specifies how to estimate the variance-covariance matrix corresponding to
the parameter estimates. The supported options are oim, opg, robust or cluster.
The current version of the command does not allow bootstrap or jacknife estimators. See [R] vce option.
L. A. Gray and M. Hernández-Alava
9
robust synonym for vce(robust)
cluster(clustervar ) synonym for vce(cluster(clustvar )
level(#); see [R] estimation options.
constraints(numlist); see [R] estimation options.
search(spec) specifies whether to use mls initial search algorithm or not. spec may be
on or off.
repeat(#) specifies the number of random attempts to be made to find a better initial
value vector. This option is used in conjunction with search(on).
maximize options: difficult, technique(algorithm spec), iterate(#), [no]log, trace, gradient,
showstep, hessian, showtolerance, tolerance(#), ltolerance(#), gtolerance(#), nrtolerance(#),
nonrtolerance, from(init specs); see [R] maximize
3.2
predict
Syntax
if
predict { stub* | varlist}
if
predict
type
varname
in
in
, outcome(outcome)
, scores
Description
Stata’s standard predict command can be used following betamix to obtain predicted
values using the first syntax as well as the equation level scores using the second syntax.
Options
outcome(outcome) specifies the predictions to be stored. There are two options for
outcome y or all. The default, y, stores only the dependent variable prediction
in newvar. Use all to, also obtain the predicted conditional means, conditional
variances and probabilities for each component in the mixture. These are stored
as newvar mu1, newvar mu2,...,newvar phi1, newvar phi2,... and newvar p1, newvar p2,... respectively. If an inflation model is specified, all also stores the predicted
probabilities of the multinomial logit part in newvar lb, newvar ub and newvar tb
corresponding to the predicted probabilities of the lower bound, upper bound and
truncated bound. The probability of an observation belonging to the beta mixture
part of the model can be calculated as 1-newvar lb-newvar ub-newvar tb.
4
The betamix command in practice
This section illustrates the use of the different options provided by the betamix command using a fictional dataset provided with Stata. It is important to emphasize that
10
Betamix
these are just examples designed to illustrate the command and no attempt is made to
find, for example, the best configuration of covariates for the models. In any empirical
application though it is important to justify and test as far as possible the chosen model.
We have data on 2,000 women and are interested in estimating hourly wages as a
function of a number of covariates in the dataset.
. webuse womenwk
. describe
Contains data from http://www.stata-press.com/data/r14/womenwk.dta
obs:
2,000
vars:
6
3 Mar 2014 07:43
size:
18,000
variable name
county
age
education
married
children
wage
storage
type
byte
byte
byte
byte
byte
float
display
format
value
label
%9.0g
%8.0g
%8.0g
%8.0g
%8.0g
%9.0g
variable label
county of residence
age in years
years of schooling
1 if married spouse present
# of children under 12 years old
hourly wage; missing, if not working
Summary statistics for the variables included in the dataset are shown below.
. gen double wages=round(wage,0.01)
(657 missing values generated)
. summ wages educ age married children
Variable
Obs
Mean
wages
education
age
married
children
1,343
2,000
2,000
2,000
2,000
23.69217
13.084
36.208
.6705
1.6445
Std. Dev.
Min
Max
6.305397
3.045912
8.28656
.4701492
1.398963
5.88
10
20
0
0
45.81
20
59
1
5
Women not in work have a missing value for wages; there are 1,343 working women
in the dataset. Note that hourly wages have been rounded to 2 decimal places to avoid
potential floating problems when defining the boundary values of the beta distribution.
Figure 3 shows a the distribution of wages among those women who work.
The dataset will be used in two different examples in sections 4.1 and 4.2. The first
example makes use of the sample of working women and shows how to fit a mixture of
beta regressions. The second example will use all the sample and show how to estimate
an inflated truncated mixture of beta regressions.
11
0
.02
Density
.04
.06
L. A. Gray and M. Hernández-Alava
10
20
30
wages
40
50
Figure 3: Distribution of wages for women in work (n = 1, 343)
4.1
Example 1: a mixture of beta regressions
For the purpose of this example we assume that the minimum and maximum hourly
wages are 5.88 and 45.81 respectively, coinciding with the minimum and maximum
values in the sample. Therefore, in this fictional example there are observations at the
boundaries, specifically only one observation at each boundary value. Note however,
that the boundary values of the beta distribution should be the theoretical values which
will not necessarily coincide with the minimum and maximum in the sample. The model
can be estimated by transforming the dependent variable, changing the observations at
the boundaries by a small amount and then using betareg as follows:
. gen double wages_t =(wages-`a´)/(`b´-`a´)
(657 missing values generated)
. replace wages_t = wages_t - 1e-6 if wages_t==1
(1 real change made)
. replace wages_t = wages_t + 1e-6 if wages_t==0
(1 real change made)
. betareg wages_t educ age i.married children
Using betamix, we estimate a beta regression using the minimum and maximum
wages as the endpoints of the beta distribution.
. qui summ wages
. local a = r(min)
. local b = r(max)
. betamix wages, muvar(educ age i.married children) lb(`a´) ub(`b´)
Warning. Some observations are on the upper boundary but no probability mass.
12
Betamix
A value of 1 is not supported by the beta distribution.
-1e-6 will be added to those observations.
Warning. Some observations are on the lower boundary but no probability mass.
A value of 0 is not supported by the beta distribution.
1e-6 will be added to those observations.
initial:
log likelihood =
output omitted
Iteration 4:
log likelihood =
1 component Beta Mixture Model
Log likelihood =
482.97844
708.05189
Number of obs
Wald chi2(4)
Prob > chi2
708.05189
Std. Err.
z
P>|z|
=
=
=
1,343
485.31
0.0000
wages
Coef.
[95% Conf. Interval]
C1_mu
education
age
1.married
children
_cons
.0964959
.0171608
-.0794301
-.0788015
-1.953179
.0056416
.0021723
.0402595
.0115869
.1054264
17.10
7.90
-1.97
-6.80
-18.53
0.000
0.000
0.049
0.000
0.000
.0854385
.0129031
-.1583373
-.1015114
-2.159811
.1075533
.0214185
-.000523
-.0560916
-1.746548
C1_lnphi
_cons
2.341453
.0369527
63.36
0.000
2.269027
2.413879
C1_phi
10.39633
.3841727
9.66999
11.17724
. matrix param = e(b)
. est store beta1lc
The command issues warnings that the dependent variable has values at both boundaries and that it will change the 0 and 1 values (on the transformed variable) by a small
amount. All variables appear significant at conventional significance levels. Age and
education have a positive effect on the mean wages whereas being married and having
children seem to be associated with lower mean wages. The estimated model assumes
a constant precision parameter φ. As a log link is used to ensure that the precision parameter is positive, the value of the untransformed parameter φ = 10.40 is also shown
at the bottom of the output table.
Although the direction of the effect can be found directly from the estimates, to find
the magnitude of the effects and be able to interpret the estimates it is helpful to use
margins. Examples will be presented after estimation of the final model, here margins
is used following estimation to find the predicted hourly wage for the estimation sample.
The average predicted hourly wage is $23.72 almost identical to the sample average of
$23.70.
. margins
Predictive margins
Model VCE
: OIM
Expression
: Prediction, predict()
Margin
Delta-method
Std. Err.
Number of obs
z
P>|z|
=
1,343
[95% Conf. Interval]
L. A. Gray and M. Hernández-Alava
_cons
23.72476
13
.1569213
151.19
0.000
23.4172
24.03232
In some cases it is plausible that the variance of the distribution depends on some
observed covariates through their effect on the precision parameter φ. Models where
the precision parameter is a function of covariates are more difficult to estimate and it
is always recommended to start by estimating a model with constant variance and use
it as a stepping stone. The accompanying do file shows an example of how to specify
and estimate a model where φ is a function of a covariate.
Both, the model with constant variance presented earlier and the model with covariates in the variance can be estimated in Stata using the built-in command betareg. We
now show how the command betamix can be used to estimate a more general model
using mixtures of beta regressions. As always when estimating mixture models it is
important that searches are carried out to ensure convergence to a global maximum
(see Section 2). The accompanying example do file provides examples of searching procedures. Below, the estimated parameters from the beta regression are used as initial
parameter values to start the optimisation using the option from in order to estimate a
two component beta regression mixture model.
. betamix wages, muvar(educ age i.married children) lb(`a´) ub(`b´) ///
> ncomponents(2) from(param)
Warning. Some observations are on the upper boundary and there is no probability mass
A value of 1 is not supported by the beta distribution.
-1e-6 will be added to those observations
Warning. Some observations are on the lower boundary but no probability mass.
A value of 0 is not supported by the beta distribution.
1e-6 will be added to those observations.
initial:
log likelihood =
output omitted
356.55936
Iteration 21: log likelihood =
2 component Beta Mixture Model
801.45402
Log likelihood =
Number of obs
Wald chi2(8)
Prob > chi2
801.45402
Std. Err.
z
P>|z|
=
=
=
1,343
371.83
0.0000
wages
Coef.
[95% Conf. Interval]
C1_mu
education
age
1.married
children
_cons
.0862541
.0138412
-.0599449
-.0674781
-1.731981
.0055378
.0020906
.0375649
.0111419
.1019856
15.58
6.62
-1.60
-6.06
-16.98
0.000
0.000
0.111
0.000
0.000
.0754003
.0097437
-.1335708
-.0893158
-1.93187
.0971079
.0179387
.013681
-.0456405
-1.532093
C1_lnphi
_cons
2.628627
.059637
44.08
0.000
2.51174
2.745513
C2_mu
education
age
1.married
children
.2521795
.0602981
-.5438286
-.2912655
.0760312
.0363617
.7015907
.1528789
3.32
1.66
-0.78
-1.91
0.001
0.097
0.438
0.057
.103161
-.0109695
-1.918921
-.5909027
.4011979
.1315657
.831264
.0083716
14
Betamix
_cons
-4.893927
1.536007
-3.19
0.001
-7.904445
-1.883409
C2_lnphi
_cons
.9858081
.2945243
3.35
0.001
.4085511
1.563065
Prob_C1
_cons
3.270903
.4894943
6.68
0.000
2.311512
4.230295
C1_phi
C2_phi
pi1
pi2
13.85473
2.679977
.963417
.036583
.8262551
.7893182
.0172521
.0172521
12.32637
1.504636
.909826
.0143395
15.57261
4.77343
.9856605
.090174
. est store beta2lc
. matrix param2=e(b)
The output now gives the parameter estimates for the two components of the model
and for the multinomial logit2 which determines component membership. Note that
in this model the probability of class membership is constant but it can be allowed to
vary across individuals by including variables in the multinomial logit model using the
probabilities(varlist) option of the command. As the probability of belonging to
each component is constant, the bottom of the output table shows these probabilities
(pi1 and pi2) in an interpretable metric along with the precision parameters of both
components (C1 phi and C2 phi).
Education and age are associated with higher average hourly wages and being married and the number of children under 12 are associated with lower average hourly wages
in both components of the mixture just as they were in the beta regression model. Component 1 is a dominant component with a probability (pi1) of 0.96. It is useful to use
the predict command after estimation to visualise the two different components of the
mixture.
. predict yhat2, outcome(all)
. summ yhat2* if e(sample)==1
Variable
Obs
Mean
Std. Dev.
Min
Max
yhat2_mu1
yhat2_phi1
yhat2_p1
yhat2_mu2
yhat2_phi2
1,343
1,343
1,343
1,343
1,343
23.6597
13.85473
.963417
24.17528
2.679977
3.128114
1.78e-15
0
8.713329
0
17.46457
13.85473
.963417
8.269212
2.679977
32.99773
13.85473
.963417
44.19021
2.679977
yhat2_p2
yhat2
1,343
1,343
.036583
23.67856
0
3.325939
.036583
17.12817
.036583
33.40719
In the estimation sample, the first component is estimated to have average hourly
wages of $23.66 and a precision parameter φ = 13.85 whereas the second component
has a slightly higher mean of $24.18 and a more dispersed variance (φ = 2.68). Figure 4
plots the probability density of the mixture at this average together with the individual
components. The mixture probability density is very similar to the dominant component
2. In this case, the model reduces to a logit model since there are only two components
L. A. Gray and M. Hernández-Alava
15
0
1
y
2
3
but it has heavier tails.
0
.2
.4
.6
Mixture
Component 2
.8
1
Component 1
Figure 4: Probability densities of the 2 component beta mixture and each individual
component
As noted earlier margins can be used to help interpret the model. For example,
below margins is used to find the average hourly wage for the groups of women with
different number of children.
. margins , over(children)
Predictive margins
Model VCE
: OIM
Expression
: Prediction, predict()
over
: children
Margin
children
0
1
2
3
4
5
25.7996
24.54332
23.79396
22.08443
22.04246
21.72337
Delta-method
Std. Err.
.2541561
.1823528
.1611243
.1873649
.2540052
.360958
Number of obs
z
101.51
134.59
147.67
117.87
86.78
60.18
=
1,343
P>|z|
[95% Conf. Interval]
0.000
0.000
0.000
0.000
0.000
0.000
25.30146
24.18592
23.47816
21.7172
21.54462
21.0159
26.29773
24.90073
24.10976
22.45166
22.5403
22.43083
. marginsplot
Variables that uniquely identify margins: children
(file margoverchild.gph saved)
The plot of the predictive margins is shown in Figure 5. The group of women with
no children under 12 years of age have the highest average hourly wage. Although the
16
Betamix
21
22
Prediction
23
24
25
26
average hourly wage tends to decrease as we move through the groups of women with
increasing number of children, the plot flattens out for the last three groups.
0
1
2
3
# of children under 12 years old
4
5
Figure 5: Predictive margins with 95% confidence interval
The model with a mixture of two components can be compared with the beta regression model using information criteria, especially BIC has been shown to give a good
indication of the number of components in a mixture. The mixture model has the lowest
AIC and BIC (see below) indicating that it fits the data better. Figure 6 compares the
probability densities of the mixture of two components and the beta regression model
with constant variance (calculated at the average). The mixture distribution is more
concentrated in the middle section and has longer tails than the single component beta
regression model.
. est stats _all
Akaike´s information criterion and Bayesian information criterion
Model
Obs
ll(null)
ll(model)
df
AIC
BIC
beta1lc
beta2lc
1,343
1,343
.
.
708.0519
801.454
6
13
-1404.104
-1576.908
-1372.888
-1509.273
Note: N=Obs used in calculating BIC; see [R] BIC note.
Convergence was a problem when attempting to find a 3 component mixture model.
Examination of the parameter estimates after several iterations showed that the model
was trying to add one extra component with an almost zero probability of component
membership and thus the model with 2 components was chosen as the best fitting model.
17
0
1
y
2
3
L. A. Gray and M. Hernández-Alava
0
.2
.4
Mixture
.6
.8
1
Beta regression
Figure 6: Probability densities of the 2 component beta mixture versus the beta regression
4.2
Example 2: an inflated truncated mixture of beta regressions
In this example, we modify the data to illustrate the estimation of the inflated truncated
beta regression model. For examples of more complex models in the area of health
economics see Gray et al. (2016). We replace the missing values of the women who are
not working by zeroes and assume that the minimum wage is $5. Figure 7 shows a
histogram of the modified data on wages where almost a third of the sample has a value
of zero.
We first estimate an inflated truncated beta regression model using a lower bound of
0 (lb(0)), an upper bound of 45.81 (ub(45.81)) and a truncation at the bottom of the
distribution with no density between 0 and 5 (tb(5) trun(bottom)). Because there
are observations at the lower bound, a logit model for the inflation part of the model is
used with the same variables as above (pmvar(educ age married children)). There
is inflation at the lower bound but no inflation at the truncation point or at the upper
bound (pmass(0 0 1)). Thus, there is one observation at the upper bound as before
and the programm will alter it slightly to move it below 1. However, there are no
observations in the sample at the assumed theoretical minimum wage of $5 per hour.
The syntax to estimate this model is reproduced below:
. betamix twage, muvar(educ age i.married children) lb(0) ub(45.81) tb(5) ///
> pmass(0 0 1) pmvar(educ age i.married children) trun(bottom)
Note that this model is not the same as the Heckman selection model. It is equivalent
to a two part model estimated jointly under conditional independence between the two
parts of the model. Using the estimated parameters to initialise the algorithm an inflated
Betamix
0
.05
.1
Density
.15
.2
.25
18
0
10
20
30
Hourly wages
40
50
Figure 7: Distribution of hourly wages
truncated beta mixture of two components is also estimated. Both AIC and BIC are
lower for the two component mixture model.
. est stats infbeta1LC infbeta2LC
Akaike´s information criterion and Bayesian information criterion
Model
Obs
ll(null)
ll(model)
df
AIC
BIC
infbeta1LC
infbeta2LC
2,000
2,000
.
.
-261.0264
-198.8511
11
18
544.0528
433.7021
605.6628
534.5184
Note: N=Obs used in calculating BIC; see [R] BIC note.
Results for the inflated truncated beta mixture of two components model are presented below:
. betamix twage, muvar(educ age i.married children) lb(0) ub(45.81) tb(5) ///
> pmass(0 0 1) pmvar(educ age i.married children) trun(bottom) ///
> ncomp(2) from(param4)
Warning. Some observations are on the upper boundary but no probability mass.
A value of 1 is not supported by the beta distribution.
-1e-6 will be added to those observations.
Fitting full beta mixture model
initial:
log likelihood = -647.56401
output omitted
Iteration 18: log likelihood = -198.85106
2 component Beta Mixture Model with inflation
Log likelihood = -198.85106
Number of obs
Wald chi2(12)
Prob > chi2
=
=
=
2,000
666.11
0.0000
L. A. Gray and M. Hernández-Alava
Std. Err.
19
twage
Coef.
z
P>|z|
[95% Conf. Interval]
C1_mu
education
age
1.married
children
_cons
.0821515
.0130726
-.0644645
-.0648105
-1.607628
.005575
.0020945
.0378997
.0110851
.1030426
14.74
6.24
-1.70
-5.85
-15.60
0.000
0.000
0.089
0.000
0.000
.0712246
.0089674
-.1387465
-.0865368
-1.809588
.0930784
.0171778
.0098175
-.0430842
-1.405668
C1_lnphi
_cons
2.71935
.0591438
45.98
0.000
2.603431
2.83527
C2_mu
education
age
1.married
children
_cons
.2083242
.0527565
-.0564205
-.1520258
-4.454993
.0495018
.0230223
.3966404
.1051361
1.077548
4.21
2.29
-0.14
-1.45
-4.13
0.000
0.022
0.887
0.148
0.000
.1113025
.0076335
-.8338214
-.3580887
-6.566949
.3053459
.0978795
.7209805
.0540372
-2.343038
C2_lnphi
_cons
1.451851
.234219
6.20
0.000
.9927904
1.910912
Prob_C1
_cons
2.753496
.4171556
6.60
0.000
1.935886
3.571106
PM_lb
education
age
1.married
children
_cons
-.0982513
-.0579303
-.7417775
-.7644882
4.159247
.0186522
.007221
.1264705
.0515289
.3320401
-5.27
-8.02
-5.87
-14.84
12.53
0.000
0.000
0.000
0.000
0.000
-.134809
-.0720833
-.9896552
-.865483
3.508461
-.0616936
-.0437773
-.4938998
-.6634935
4.810034
C1_phi
C2_phi
pi1
pi2
15.17046
4.271014
.9401105
.0598895
.8972382
1.000353
.023487
.023487
13.51001
2.698754
.8738995
.0273554
17.035
6.759251
.9726446
.1261005
The output table now has one more equation, PM lb, corresponding to the inflation
part of the model at zero. Women are more likely to be in employment as the years
of education, age and the number of children increase and if they are married. The
average marginal effects for all covariates in the model can be calculated as follows:
. margins, dydx(*)
Average marginal effects
Number of obs
Model VCE
: OIM
Expression
: Prediction, predict()
dy/dx w.r.t. : education age 1.married children
dy/dx
education
age
1.married
.9790646
.3317772
2.694484
Delta-method
Std. Err.
.0789597
.0298328
.584154
z
12.40
11.12
4.61
=
2,000
P>|z|
[95% Conf. Interval]
0.000
0.000
0.000
.8243066
.273306
1.549564
1.133823
.3902483
3.839405
20
Betamix
children
2.598244
.1817673
14.29
0.000
2.241987
2.954502
Note: dy/dx for factor levels is the discrete change from the base level.
The average marginal effect of education, age and the number of children are $0.98,
$0.33 and $2.60 respectively. Married women earn on average $2.69 per hour more than
single women. Joint tests of the parameters can also be performed after estimation. For
example we could test the joint significance of the variable married in the means of the
components
. test [C1_mu]1.married [C2_mu]1.married
( 1) [C1_mu]1.married = 0
( 2) [C2_mu]1.married = 0
chi2( 2) =
3.25
Prob > chi2 =
0.1973
Thus, the variable married is not jointly significant in the means of the beta mixture
components at standard significance levels.
5
Concluding remarks
This article described the user-written betamix command for fitting mixture regression
models for bounded dependent variables using the beta distribution. The command
generalises the Stata command betareg and the user-written commands betafit zoib.
The command betamix allows the model to capture multimodality and other distributional shapes. betamix can easily deal with response variables which have a gap between
one of the boundary values and the continuous part of the distribution and can model
positive probabilities at the boundaries and at the truncation value. There is no need
to manually transform variables bounded in the interval (a, b) to the (0, 1) as the command will take care of the transformation. The inflation model is globally concave but
mixture models
It is important to start estimating less complex models and slowly build them up,
otherwise convergence problems are likely to arise. The likelihood functions of mixture
models are known to have multiple optima. It is important that thorough searches
around the parameter space are carried out to avoid local solutions.
6
Acknowledgements
Mónica Hernández Alava acknowledges support by the Medical Research Council under
grant MR/L022575/1. The views, and any errors or omissions, expressed in this article
are of the authors only.
L. A. Gray and M. Hernández-Alava
7
21
References
Basu, A., and A. Manca. 2012. Regression Estimators for Generic Health-Related Quality of Life and Quality-Adjusted Life Years. Medical Decision Making 32(1): 56–69.
http://mdm.sagepub.com/content/32/1/56.abstract.
Berggren, N., S.-O. Daunfeldt, and J. Hellstrm. 2014. Social trust and centralbank independence. European Journal of Political Economy 34: 425 – 439.
http://www.sciencedirect.com/science/article/pii/S0176268013000803.
Cook, D. O., R. Kieschnick, and B. McCullough. 2008. Regression analysis of proportions in finance with self selection. Journal of Empirical Finance 15(5): 860 – 867.
http://www.sciencedirect.com/science/article/pii/S0927539808000145.
Ferrari, S., and F. Cribari-Neto. 2004.
Beta Regression for Modelling
Rates and Proportions.
Journal of Applied Statistics 31(7):
799–815.
http://dx.doi.org/10.1080/0266476042000214501.
Gray, L., M. Hernández-Alava, and A. J. Wailoo. 2016. Development of methods for
the mapping of utilities using mixture models: An application to asthma. HEDS
discussion paper series, ScHARR, University of Sheffield.
Hernández Alava, M., and A. Wailoo. 2015. Fitting adjusted limited dependent variable
mixture models to EQ-5D. Stata Journal 15(3): 737–750(14). http://www.statajournal.com/article.html?article=st0401.
Johnson, N., S. Kotz, and N. Balakrishnan. 1995.
Continuous univariate distributions.
No. v. 2 in Wiley series in probability and mathematical statistics:
Applied probability and statistics, Wiley & Sons.
https://books.google.co.uk/books?id=0QzvAAAAMAAJ.
Kieschnick, R., and B. D. McCullough. 2003. Regression analysis of variates observed on
(0, 1): percentages, proportions and fractions. Statistical Modelling 3(3): 193–213.
http://smj.sagepub.com/content/3/3/193.abstract.
McLachlan, G. J., and D. Peel. 2000. Finite mixture models. New York: Wiley.
Ospina, R., and S. Ferrari. 2012. A general class of zero-or-one inflated beta regression
models. Computational Statistics and Data Analysis 56(6): 1609–1623. Cited By 35.
Ospina, R., and S. L. P. Ferrari. 2010. Inflated beta distributions. Statistical Papers
51(1): 111–126. http://dx.doi.org/10.1007/s00362-008-0125-4.
Paola, M. D., V. Scoppa, and R. Lombardo. 2010.
Can gender quotas break down negative stereotypes?
Evidence from changes in
electoral rules.
Journal of Public Economics 94(56):
344 – 353.
http://www.sciencedirect.com/science/article/pii/S0047272710000101.
Paolino, P. 2001. Maximum likelihood estimation of models with beta-distributed dependent variables. Political Analysis 9: 325–346.
22
Betamix
Papke, L. E., and J. M. Wooldridge. 1996. Econometric methods for fractional response variables with an application to 401(k) plan participation rates. Journal of Applied Econometrics 11(6): 619–632. http://dx.doi.org/10.1002/(SICI)10991255(199611)11:6¡619::AID-JAE418¿3.0.CO;2-1.
Pereira, G. H. A., D. A. Botter, and M. C. Sandoval. 2012. The Truncated Inflated Beta
Distribution. Communications in Statistics - Theory and Methods 41(5): 907–919.
http://dx.doi.org/10.1080/03610926.2010.530370.
. 2013. A regression model for special proportions. Statistical Modelling 13(2):
125–151. http://smj.sagepub.com/content/13/2/125.abstract.
Smithson, M., and J. Verkuilen. 2006. A better lemon squeezer? Maximum-likelihood
regression with beta-distributed dependent variables. Psychological Methods 11: 54–
71.
Verkuilen, J., and M. Smithson. 2012. Mixed and mixture regression models for continuous bounded responses using the beta distribution. Journal of Educational and
Behavioral Statistics 37: 82–113.
About the authors
Laura Gray is a Research Associate in Health Econometrics in the Health Economics and
Decision Science section in ScHARR, University of Sheffield, UK. Her area of research interest
is applied microeconometrics in the areas of health and lifestyle.
Mónica Hernández Alava is a Senior Research Fellow in the Health Economics and Decision
Science section in ScHARR, University of Sheffield, UK. Her main research interests lie in
microeconometrics and recent areas of application include the analysis of health state utility
data, disease progression and the dynamics of child development.