Fitting inflated beta distributions under
GAMLSS framework
Raydonal Ospina Martínez
Departamento de Estatística
Universidade Federal de Pernambuco
Fitting inflated beta distributions with GAMLSS
GAMLSS for statistical modelling
Outline
• Introduction to GAMLSS modelling
• Demonstration: zero-inflated beta distributions
• Example with real data: cable penetration data
• Concluding remarks
• References
Introduction to GAMLSS modelling
What is GAMLSS
• Generalized Additive Models for Location, Scale and Shape
(GAMLSS) is a general framework for fitting regression-type
models.
• GAMLSS was introduced by Rigby & Stasinopoulos (2001,
2005) and Akantziliotou et al. (2002) as a way of
overcoming some of the limitations associated with
Generalized Linear Models (GLM; Nelder & Wedderburn, 1972)
and Generalized Additive Models (GAM; Hastie & Tibshirani,
1990).
• In GAMLSS the exponential family distribution assumption
for the response variable (y) is relaxed and replaced by a
general distribution family $\mathcal{D}$.
What distributions can be used
• The response variable is y ∼ D(y | µ, σ, τ, ν), where
D ∈ $\mathcal{D}$ can be any distribution, including highly skew
and kurtotic continuous and discrete distributions, or
distributions that exhibit heterogeneity (e.g. where the scale
or shape of the distribution of the response variable changes
with the explanatory variable(s)).
• The first two population distribution parameters, µ and σ,
are usually characterized as location and scale parameters.
• The remaining parameter(s) are characterized as shape
parameters, e.g. skewness and kurtosis parameters.
GAMLSS model
The original formulation of a GAMLSS model is as follows. Let
$y^\top = (y_1, y_2, \ldots, y_n)$ be the response vector of length n.
Also, for k = 1, 2, 3, 4, let $g_k(\cdot)$ be known monotonic link
functions relating the distribution parameters to explanatory
variables by
\[
\begin{aligned}
g_1(\mu) &= \eta_1 = X_1\beta_1 + \sum_{j=1}^{J_1} Z_{j1}\gamma_{j1},\\
g_2(\sigma) &= \eta_2 = X_2\beta_2 + \sum_{j=1}^{J_2} Z_{j2}\gamma_{j2},\\
g_3(\tau) &= \eta_3 = X_3\beta_3 + \sum_{j=1}^{J_3} Z_{j3}\gamma_{j3},\\
g_4(\nu) &= \eta_4 = X_4\beta_4 + \sum_{j=1}^{J_4} Z_{j4}\gamma_{j4}.
\end{aligned}
\]
• Here, µ, σ, τ and ν are vectors of length n.
• $\eta_k$ is a predictor.
• $\beta_k^\top = (\beta_{1k}, \beta_{2k}, \ldots, \beta_{J'_k k})$ is a parameter
vector of length $J'_k$ (fixed effects parameters).
• $X_k$ is a known design matrix of order $n \times J'_k$ (fixed effects
design matrix).
• $Z_{jk}$ is a fixed known $n \times q_{jk}$ design matrix (random effects
design matrix).
• (Random effects parameters) $\gamma_{jk}$ is a $q_{jk}$-dimensional
random variable which is assumed to be distributed as
$\gamma_{jk} \sim N(0, G_{jk}^{-1})$, where $G_{jk}^{-1}$ is the (generalized)
inverse of a $q_{jk} \times q_{jk}$ symmetric matrix
$G_{jk} = G_{jk}(\lambda_{jk})$, which may depend on a vector of
hyperparameters $\lambda_{jk}$.
• The random effects are restricted to be normally distributed
because this simplifies some of the equations in the fitting
procedure of the model.
• This restriction can be lifted in general.
Semi-parametric GAMLSS model
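\[
\begin{aligned}
g_1(\mu) &= \eta_1 = X_1\beta_1 + \sum_{j=1}^{J_1} h_{j1}(x_{j1}),\\
g_2(\sigma) &= \eta_2 = X_2\beta_2 + \sum_{j=1}^{J_2} h_{j2}(x_{j2}),\\
g_3(\tau) &= \eta_3 = X_3\beta_3 + \sum_{j=1}^{J_3} h_{j3}(x_{j3}),\\
g_4(\nu) &= \eta_4 = X_4\beta_4 + \sum_{j=1}^{J_4} h_{j4}(x_{j4}).
\end{aligned}
\]
The function $h_{jk}$ is an unknown function of the explanatory
variable $x_{jk}$.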
Nonlinear semi-parametric GAMLSS model
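\[
\begin{aligned}
g_1(\mu) &= \eta_1 = h_1(X_1, \beta_1) + \sum_{j=1}^{J_1} h_{j1}(x_{j1}),\\
g_2(\sigma) &= \eta_2 = h_2(X_2, \beta_2) + \sum_{j=1}^{J_2} h_{j2}(x_{j2}),\\
g_3(\tau) &= \eta_3 = h_3(X_3, \beta_3) + \sum_{j=1}^{J_3} h_{j3}(x_{j3}),\\
g_4(\nu) &= \eta_4 = h_4(X_4, \beta_4) + \sum_{j=1}^{J_4} h_{j4}(x_{j4}).
\end{aligned}
\]
The function $h_k$ is a known function.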
Nonlinear parametric GAMLSS model
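\[
\begin{aligned}
g_1(\mu) &= \eta_1 = h_1(X_1, \beta_1),\\
g_2(\sigma) &= \eta_2 = h_2(X_2, \beta_2),\\
g_3(\tau) &= \eta_3 = h_3(X_3, \beta_3),\\
g_4(\nu) &= \eta_4 = h_4(X_4, \beta_4).
\end{aligned}
\]
Parametric GAMLSS model
\[
\begin{aligned}
g_1(\mu) &= \eta_1 = X_1\beta_1,\\
g_2(\sigma) &= \eta_2 = X_2\beta_2,\\
g_3(\tau) &= \eta_3 = X_3\beta_3,\\
g_4(\nu) &= \eta_4 = X_4\beta_4.
\end{aligned}
\]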
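As a rough illustration of the parametric case, the sketch below builds two linear predictors in base R and maps them back through inverse link functions. The logit and log links, the covariate x, and the coefficient values are all assumptions chosen for illustration; none of them come from the slides.

```r
# Hypothetical covariate and design matrix (intercept + one covariate)
x <- c(0.1, 0.4, 0.7, 1.0)
X <- cbind(1, x)

# Hypothetical coefficient vectors for two distribution parameters
beta_mu    <- c(-1.0, 2.0)   # used with a logit link: g1(mu)    = X beta_mu
beta_sigma <- c(0.5, -0.3)   # used with a log link:   g2(sigma) = X beta_sigma

# Linear predictors eta_k = X_k beta_k
eta_mu    <- drop(X %*% beta_mu)
eta_sigma <- drop(X %*% beta_sigma)

# Inverse links recover the parameters on their natural scale
mu    <- plogis(eta_mu)   # inverse logit, values in (0, 1)
sigma <- exp(eta_sigma)   # inverse log, values > 0

print(cbind(x, eta_mu, mu, eta_sigma, sigma))
```

The same pattern extends to the remaining parameters: each one gets its own design matrix, coefficient vector and link.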
What additive terms can be used
In a GAMLSS model, each parameter of the distribution (µ, σ, τ, ν)
is modelled using terms in the explanatory variables x.
Parametric additive terms
• All the parameters of the distribution can be modelled as
linear/non-linear parametric functions and/or smoothing
functions of the explanatory variables (e.g. cubic splines,
penalized splines, loess) and/or random effects.
  • Linear and interaction terms for variables and factors.
  • Polynomials, inverse polynomials, piecewise polynomials
  (with fixed knots), fractional polynomials (Royston and
  Altman, 1994).
  • Non-linear parametric terms.
Smoothing and random effects additive terms
• Additive smoothing terms
  • loess (Cleveland et al., 1993)
  • cubic splines (Green and Silverman, 1994)
  • P-splines (Eilers and Marx, 1996)
  • varying coefficient models (Hastie and Tibshirani, 1993)
• Random effects (overdispersion, simple random effects,
random coefficients).
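To give a feel for two of the smoothers named above, the sketch below applies a loess fit and a cubic smoothing spline to simulated data using only base R (the stats functions loess and smooth.spline). This shows the smoothers as standalone scatterplot smoothers, not the way gamlss wires them into its additive terms; the data and the span value are assumptions.

```r
set.seed(42)

# Hypothetical data: a smooth signal plus noise
x <- sort(runif(100, 0, 10))
y <- sin(x) + rnorm(100, sd = 0.3)

# Local regression smoother (loess); span = 0.5 is an arbitrary choice
fit_loess    <- loess(y ~ x, span = 0.5)
smooth_loess <- fitted(fit_loess)

# Cubic smoothing spline; smoothing parameter chosen by the function's default criterion
fit_spline    <- smooth.spline(x, y)
smooth_spline <- predict(fit_spline, x)$y

# Compare both smooths with the true signal at a few points
idx <- c(10, 50, 90)
print(cbind(x = x[idx], truth = sin(x[idx]),
            loess = smooth_loess[idx], spline = smooth_spline[idx]))
```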
Model estimation in GAMLSS
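• The parametric vectors $\beta_k$ and the random effects
parameters $\gamma_{jk}$, for $j = 1, 2, \ldots, J_k$ and $k = 1, 2, 3, 4$, are
estimated within the GAMLSS framework (for fixed values
of the smoothing hyperparameters $\lambda_{jk}$) by maximising a
penalized likelihood function $\ell_p$ given by
\[
\ell_p = \ell - \frac{1}{2} \sum_{k=1}^{p} \sum_{j=1}^{J_k} \lambda_{jk}\, \gamma_{jk}^\top G_{jk}\, \gamma_{jk},
\]
where p is the number of distribution parameters (here p = 4) and
\[
\ell = \sum_{i=1}^{n} \log\{D(y_i \mid \mu_i, \sigma_i, \tau_i, \nu_i)\}
\]
is the log-likelihood function.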
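The penalty term can be made concrete with a small base R sketch. The function name penalized_loglik, the log-likelihood value, and the choice of G as a second-difference penalty matrix are assumptions for illustration only; the double sum over k and j is flattened into a single list of terms.

```r
# l_p = l - 0.5 * sum over terms of lambda * t(gamma) %*% G %*% gamma
penalized_loglik <- function(loglik, gammas, Gs, lambdas) {
  penalty <- 0
  for (i in seq_along(gammas)) {
    g <- gammas[[i]]
    penalty <- penalty + lambdas[i] * drop(t(g) %*% Gs[[i]] %*% g)
  }
  loglik - 0.5 * penalty
}

# Assumed penalty matrix: G = t(D) %*% D with D the second-difference matrix
q <- 5
D <- diff(diag(q), differences = 2)
G <- crossprod(D)

gamma_linear <- 1:5                # second differences are all zero
gamma_wiggly <- c(0, 2, -1, 3, 0)  # second differences are -5, 7, -7

lp_linear <- penalized_loglik(-100, list(gamma_linear), list(G), lambdas = 10)
lp_wiggly <- penalized_loglik(-100, list(gamma_wiggly), list(G), lambdas = 10)
print(c(lp_linear, lp_wiggly))  # -100 and -715
```

With this G, a coefficient vector that is linear in its index is not penalized at all, while a rough one loses 0.5 × 10 × 123 = 615 units of log-likelihood.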
(Linear or nonlinear) parametric GAMLSS model
• i.e. no random effects or smoothing terms.
• i.e. fixed effects but no random effects parameters.
• i.e. β's but no γ's or λ's.
• GAMLSS estimates the β's by maximum likelihood estimation.
• Newton-Raphson or Fisher scoring is used to maximize the
(penalized) likelihood.
Random effects or semi-parametric GAMLSS model
• i.e. random effects or smoothing terms.
• i.e. both fixed effects and random effects parameters.
• i.e. β's, γ's and λ's.
• GAMLSS estimates (β, γ) by posterior mode estimation.
GAMLSS estimates the hyperparameters λ by
• minimizing a profile GAIC (Akaike, 1983) over λ,
  GAIC(#) = Deviance + # × df,
where # is a penalty for each degree of freedom used in
the model and df is the total degrees of freedom used in the
model,
• or maximising the marginal likelihood of λ.
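The GAIC formula is simple enough to compute by hand. The sketch below is a base R helper (the name gaic is hypothetical) that takes the deviance to be minus twice the log-likelihood and writes the penalty # as the argument k; the log-likelihood, degrees of freedom and sample size are made-up values.

```r
# GAIC(k) = deviance + k * df, with deviance = -2 * log-likelihood
gaic <- function(loglik, df, k = 2) {
  -2 * loglik + k * df
}

loglik <- -50   # hypothetical maximized log-likelihood
df     <- 5     # hypothetical total degrees of freedom
n      <- 100   # hypothetical sample size

aic <- gaic(loglik, df, k = 2)        # penalty 2 gives the usual AIC: 110
sbc <- gaic(loglik, df, k = log(n))   # penalty log(n) gives the Schwarz criterion
print(c(AIC = aic, SBC = sbc))
```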
Algorithms for estimating the additive terms
Backfitting algorithm
• RS: a generalization of the Mean and Dispersion Additive
Models (MADAM) algorithm of Rigby and Stasinopoulos
(1996a); it does not use the cross derivatives and is
generally more suited.
• CG: a generalization of the Cole and Green (1992) algorithm;
it uses the first and (expected, observed or approximated)
second and cross derivatives of the likelihood function with
respect to the distribution parameters.
• mixed: a mixture of RS and CG (i.e. j iterations of RS,
followed by k iterations of CG).
Advantages of algorithms
• flexible modular fitting procedure.
• easy implementation of new distributions.
• easy implementation of new additive terms.
• simple starting values for (µ, σ, τ, ν) easily found.
• stable and reliable algorithms.
• very fast fitting (for fixed hyperparameters).
How to use GAMLSS
The gamlss package
• The gamlss() function creates a gamlss object.
The gamlss function
fit = gamlss(formula = formula(data),
             sigma.formula = ~1, nu.formula = ~1, tau.formula = ~1,
             family = NO(), data = sys.parent(), weights = NULL,
             contrasts = NULL, method = RS(), start.from = NULL,
             mu.start = NULL, sigma.start = NULL,
             nu.start = NULL, tau.start = NULL,
             mu.fix = FALSE, sigma.fix = FALSE,
             nu.fix = FALSE, tau.fix = FALSE,
             control = gamlss.control(...),
             i.control = glim.control(...), ...)
Some methods for gamlss objects
• print(fit)  # prints objects returned by modelling functions
• summary(fit)  # summarizes objects returned by modelling functions
• fitted(fit, "mu")  # extracts the fitted values from the GAMLSS
object for the given parameter
• plot(fit)  # returns four plots related to the residuals of the
fitted GAMLSS model
• predict(fit, newdata = newdat)  # produces predictors for a new
data set
• residuals(fit)  # produces the residuals for a fitted model
• update(fit)  # updates and (by default) refits a GAMLSS model
• In the gamlss family many distributions are available and
each distribution (dist) has:
  • ddist: the pdf function
  • pdist: the cdf function
  • qdist: the inverse cdf function
  • rdist: the random generating function
  • dist: the fitting function
• There are around 50 different distributions available in the
current implementation of GAMLSS (gamlss package) in
R (R Development Core Team, 2007), which is free software; see
URL http://www.R-project.org.
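The d/p/q/r prefixes mirror the naming convention of base R's own distribution functions. The sketch below illustrates the four roles with base R's beta distribution (dbeta, pbeta, qbeta, rbeta) rather than a gamlss family, so it runs without any extra package; the shape values are arbitrary.

```r
shape1 <- 2
shape2 <- 3

dens <- dbeta(0.3, shape1, shape2)   # d: density (pdf) at 0.3
prob <- pbeta(0.3, shape1, shape2)   # p: cumulative probability P(Y <= 0.3)
quan <- qbeta(prob, shape1, shape2)  # q: inverse cdf, recovers 0.3

set.seed(1)
draws <- rbeta(1000, shape1, shape2) # r: random generation

print(c(density = dens, cdf = prob, quantile = quan, sample_mean = mean(draws)))
```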
• Truncated versions of these distributions can be used
(gamlss.tr package).
• Censored (or interval) and finite mixture response
variables can be used (gamlss.cens and gamlss.mx
packages).
• Different parameterisations (nonlinear models) of a
distribution can be implemented (gamlss.nl package).
• New distributions can be added easily (gamlss.dist
package).
• For new users of GAMLSS we recommend the second
edition of the GAMLSS manual,
http://studweb.north.londonmet.ac.uk/~stasinom/papers/gamlss-manual.pdf.
Parametric model
Fitting inflated beta distributions
What is an inflated beta distribution
• Inflated beta distributions are suitable models to describe
fractional data (rates, proportions) observed on [0, 1), (0, 1]
or [0, 1].
• These distributions are mixed continuous-discrete.
• The distributions capture the probability mass at 0, at 1 or
at both, depending on the case.
• The beta distribution is used to describe the continuous
component of the model since its density can have quite
different shapes depending on the values of the two
parameters that index the distribution.
• These models are special cases of the class of inflated
models.
• The word inflated suggests that the probability mass of
some points exceeds what is allowed by the proposed
model (Tu, 2002).
• For more technical details of these distributions see Ospina
& Ferrari (2008), "Inflated beta distributions", available at
www.arxiv.org/pdf/0705.0700v3 (in Statistical Papers,
DOI: 10.1007/s00362-008-0125-4).
Example
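Zero-inflated beta distribution for fitting a GAMLSS model
\[
\mathrm{bezi}(y; \alpha, \mu, \phi) =
\begin{cases}
\alpha, & \text{if } y = 0,\\
(1 - \alpha)\, f(y; \mu, \phi), & \text{if } y \in (0, 1),\\
0, & \text{if } y \notin [0, 1),
\end{cases}
\]
\[
f(y; \mu, \phi) = \frac{\Gamma(\phi)}{\Gamma(\mu\phi)\,\Gamma((1 - \mu)\phi)}\,
y^{\mu\phi - 1} (1 - y)^{(1 - \mu)\phi - 1}, \qquad y \in (0, 1),
\]
• where Γ(·) is the gamma function, 0 < α < 1, 0 < µ < 1,
and φ > 0.
• P(y = 0) = α.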
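The definition above translates almost line by line into base R. The sketch below is one possible transcription: the function names dbezi and rbezi are hypothetical helpers written for this illustration, and the beta density is evaluated with dbeta using shape parameters µφ and (1 − µ)φ.

```r
# Zero-inflated beta "density": point mass alpha at 0, scaled beta density on (0, 1)
dbezi <- function(y, alpha, mu, phi) {
  dens   <- numeric(length(y))       # 0 outside [0, 1)
  inside <- y > 0 & y < 1
  dens[y == 0]  <- alpha
  dens[inside]  <- (1 - alpha) * dbeta(y[inside], mu * phi, (1 - mu) * phi)
  dens
}

# Random generation: draw a zero with probability alpha, else a beta variate
rbezi <- function(n, alpha, mu, phi) {
  is_zero <- rbinom(n, 1, alpha) == 1
  ifelse(is_zero, 0, rbeta(n, mu * phi, (1 - mu) * phi))
}

alpha <- 0.2; mu <- 0.5; phi <- 5

# The point mass plus the continuous part should account for all probability
cont_mass  <- integrate(function(y) dbezi(y, alpha, mu, phi), 0, 1)$value
total_mass <- dbezi(0, alpha, mu, phi) + cont_mass

set.seed(2024)
draws <- rbezi(10000, alpha, mu, phi)
print(c(total_mass = total_mass, share_of_zeros = mean(draws == 0)))
```

The share of exact zeros in the simulated sample should sit close to α.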
Mean and variance
\[
E(y) = (1 - \alpha)\mu, \qquad
\mathrm{Var}(y) = (1 - \alpha)\frac{V(\mu)}{\phi + 1} + \alpha(1 - \alpha)\mu^2.
\]
• Here V(µ) = µ(1 − µ) denotes the "variance function". The
parameter φ plays the role of a precision parameter in the
sense that, for fixed µ, the larger the value of φ, the smaller
the variance of y.
• Different values of the parameters generate different
shapes.
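A quick numerical check of these two formulas, sketched in base R for the parameter values α = 0.2, µ = 0.5, φ = 5. The closed-form values are compared with moments obtained by numerical integration of the continuous part; variable names are illustrative.

```r
alpha <- 0.2; mu <- 0.5; phi <- 5

# Closed-form mean and variance
mean_y <- (1 - alpha) * mu
var_y  <- (1 - alpha) * mu * (1 - mu) / (phi + 1) + alpha * (1 - alpha) * mu^2

# Numerical cross-check: the point mass at 0 contributes nothing to E(y) or E(y^2)
f  <- function(y) dbeta(y, mu * phi, (1 - mu) * phi)
m1 <- (1 - alpha) * integrate(function(y) y   * f(y), 0, 1)$value
m2 <- (1 - alpha) * integrate(function(y) y^2 * f(y), 0, 1)$value

print(c(mean_formula = mean_y, mean_numeric = m1))          # both about 0.4
print(c(var_formula = var_y, var_numeric = m2 - m1^2))      # both about 0.0733
```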
Plotting a zero-inflated beta distribution
Figure: zero-inflated beta distribution with (α = 0.2, µ = 0.5, φ = 5)
Example with real data: Cable penetration data
Objective: examine influences on cable penetration (Federal
Communications Commission, 1993).
• We consider 282 observations of 'cable community units',
which are essentially individual franchise areas.
• Response (y) = proportion of households within a market area
that subscribe to additional cable television services.
• Covariates
  • lin = logarithm of franchise median income
  • child = percentage of franchise households with children
  • ltv = number of local broadcast television signals
  • agehe = the age of the cable system headend
Frequency histogram of cable penetration data
Figure: Distribution of the proportion of households within a market
area that subscribe to additional cable television services
The gamlss model
• We modelled the parameters of the zero-inflated beta
distribution as functions of the covariates, i.e.
\[
\begin{aligned}
\mathrm{logit}(\alpha) &= \gamma_0 + \gamma_1\,\mathrm{lin} + \gamma_2\,\mathrm{child} + \gamma_3\,\mathrm{agehe} + \gamma_4\,\mathrm{ltv},\\
\mathrm{logit}(\mu) &= \beta_0 + \beta_1\,\mathrm{lin} + \beta_2\,\mathrm{child} + \beta_3\,\mathrm{agehe} + \beta_4\,\mathrm{ltv}.
\end{aligned}
\]
• Maximum (penalised) likelihood estimation is used.
• The penalized log-likelihood function of the model is
maximized iteratively using either the RS or CG algorithm of
Rigby and Stasinopoulos (2005), which in turn uses a
backfitting algorithm to perform each step of the Fisher
scoring procedure.
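To show what such a model looks like as a likelihood problem, the sketch below fits a zero-inflated beta regression with logit links for α and µ and a constant φ by direct maximization with base R's optim. This is not the RS or CG algorithm, and it uses simulated data with a single hypothetical covariate x and made-up coefficients rather than the cable data.

```r
set.seed(123)
n <- 500
x <- runif(n)  # hypothetical covariate

# Hypothetical "true" coefficients, used only to simulate data
gamma_true <- c(-1.0, 1.0)  # logit(alpha) = gamma0 + gamma1 * x
beta_true  <- c(-0.5, 1.0)  # logit(mu)    = beta0  + beta1  * x
phi_true   <- 6

alpha   <- plogis(gamma_true[1] + gamma_true[2] * x)
mu      <- plogis(beta_true[1] + beta_true[2] * x)
is_zero <- rbinom(n, 1, alpha) == 1
y       <- ifelse(is_zero, 0, rbeta(n, mu * phi_true, (1 - mu) * phi_true))

# Average negative log-likelihood; par = (gamma0, gamma1, beta0, beta1, log(phi))
negloglik <- function(par) {
  a   <- plogis(par[1] + par[2] * x)
  m   <- plogis(par[3] + par[4] * x)
  phi <- exp(par[5])
  pos <- y > 0
  ll_zero <- sum(log(a[!pos]))                       # zeros contribute log(alpha)
  ll_pos  <- sum(log1p(-a[pos]) +                    # positives: log(1 - alpha) + log f
                 dbeta(y[pos], m[pos] * phi, (1 - m[pos]) * phi, log = TRUE))
  -(ll_zero + ll_pos) / n
}

fit <- optim(rep(0, 5), negloglik, method = "BFGS", control = list(maxit = 500))
est <- c(gamma = fit$par[1:2], beta = fit$par[3:4], phi = exp(fit$par[5]))
print(round(est, 3))
```

The log-likelihood splits into a logistic part for the zeros and a beta part for the positive observations, which is why the α coefficients and the (µ, φ) coefficients could also be estimated separately.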
#==== load libraries ====
library(gamlss)           # gamlss package
library(gamlss.dist)      # gamlss.dist package, contains inflated beta distributions
library(RODBC)            # ODBC connection, loads xls files
source("ExtraBIc.R")      # extra functions for inflated beta distributions
source("residualsBIc.R")  # new residuals for inflated beta distributions
source("HistBIc.R")       # special histogram for inflated beta distributions
source("linksBEINF.R")    # new link functions
#===== load data =====
load("datos.Rdata")
X = as.data.frame(X)
attach(X)
#==== histogram of the data =======
histBEZI(y, xlab = "y", main = "", ylab = "Frequency")
#===== fit model (all covariates) =====
mod.0 = gamlss(y ~ ., nu.formula = ~ ., family = BEZI, data = X)
#==== model selection for alpha using AIC ====
mod.1 = stepGAIC(mod.0, what = "nu")
#==== model selection for mu using AIC ====
mod.2 = stepGAIC(mod.1)
#==== final fitted model ====
fit = gamlss(formula = y ~ clinas, nu.formula = ~ ltv, family = BEZI,
             data = X, trace = FALSE)
# summary of results
summary(fit)
# fitted values
mean.fit = meanBEZI(fit)
#==== residuals =====
a = residuals.BIc(fit)                         # standardized residual
b = residuals.BIc(fit, type = "experimental")  # weighted residual
c = residuals.BIc(fit, type = "quantil")       # quantile residual (Dunn & Smyth, 1996)
#==== normal probability plots ====
envelope.BIc(fit, main = "standardized residual")
envelope.BIc(fit, type = "experimental", main = "weighted residual")
envelope.BIc(fit, type = "quantil", main = "quantile residual")
Table: Parameter estimates with standard errors
Estimator          Estimate    Standard error
\hat{\gamma}_0     -1.91332    4.66688
\hat{\gamma}_1      0.09228    0.47763
\hat{\gamma}_2      0.00001    0.01585
\hat{\gamma}_3      0.02119    0.01852
\hat{\gamma}_4     -0.09077    0.05269
\hat{\beta}_0      -7.83031    1.58294
\hat{\beta}_1       0.64117    0.16248
\hat{\beta}_2       0.00886    0.00536
\hat{\beta}_3       0.00538    0.00722
\hat{\beta}_4       0.01673    0.01879
\hat{\phi}          6.05032    0.52647
• Model for α. This contains only the local broadcast television
signals characteristic, which is surprising.
• Model for µ. This contains only the income characteristic.
Model selection
• Using the generalized Akaike information criterion (GAIC)
as the model selection criterion, the following final model was
selected:
logit(α) = γ0 + γ4 ltv,
logit(µ) = β0 + β1 lin.
• φ is held constant.
• AIC = 121.9194.
• The likelihood ratio (LR) statistic for testing
H0 : θ = (β2, β3, β4, γ1, γ2, γ3) = 0 vs H1 : θ ≠ 0 takes the
value Λ = 4.7870, with p-value 0.5714.
• H0 is not rejected at the usual significance levels, which
supports the reduced model.
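The reported p-value can be checked by hand. The sketch below assumes the usual asymptotic chi-square reference distribution for Λ, with 6 degrees of freedom (one per coefficient set to zero under H0). The helper `chi2_sf_even` is an illustrative name; it uses the closed form of the chi-square upper tail that holds for even degrees of freedom, so no external library is needed.

```python
from math import exp


def chi2_sf_even(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    For df = 2m the survival function has the closed form
    exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k  # term now equals (x/2)^k / k!
        total += term
    return exp(-half) * total


lr_statistic = 4.7870  # value of the LR statistic reported on the slide
df = 6                 # beta2, beta3, beta4, gamma1, gamma2, gamma3
p_value = chi2_sf_even(lr_statistic, df)
print(f"p-value = {p_value:.4f}")
```

With these inputs the computed tail area agrees with the p-value 0.5714 quoted on the slide.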
Parametric model
Residual plot
Figure: Cable penetration data: residual plots.
Parametric model
Normal probability plots
Figure: Cable penetration data: normal probability plots.
Conclusion
With respect to GAMLSS
In short, why use GAMLSS?
• Unified framework for univariate regression-type models
• Allows any distribution for the response variable y
• Models all the parameters of the distribution of y
• Allows a variety of additive terms in the models for the
distribution parameters
• The fitting algorithm is modular, so different components
can be added easily
• Models can be fitted easily and fast
• Explanatory tool for finding an appropriate set of models
• Deals with overdispersion, skewness and kurtosis
Conclusion
With respect to the application
Application
The application to real data shows that inflated beta
distributions, fitted as GAMLSS models, are quite flexible for
modeling fractional data on the closed or half-open unit interval
(in this case [0,1)).
Challenges
• Improve the methods of statistical inference.
• Develop new ways to evaluate the fit of models.
• Techniques of local (global) influence.
  - Residuals.
  - Improved statistical tests.
  - Model selection, ...
  - New parametric (nonparametric, semi-parametric) models.
References
Some GAMLSS-related papers
• Rigby, R. A. and Stasinopoulos, D. M. (2001). The GAMLSS project: a flexible approach to
statistical modelling. In New Trends in Statistical Modelling: Proceedings of the 16th
International Workshop on Statistical Modelling, eds B. Klein and L. Korsholm, pp
337–345, Odense, Denmark.
• Akantziliotou, K., Rigby, R. A. and Stasinopoulos, D. M. (2002). The R implementation of
Generalized Additive Models for Location, Scale and Shape. In Statistical Modelling in
Society: Proceedings of the 17th International Workshop on Statistical Modelling, eds
M. Stasinopoulos and G. Touloumi, pp 75–83, Chania, Greece.
• Rigby, R. A. and Stasinopoulos, D. M. (2004). Smooth centile curves for skew and kurtotic
data modelled using the Box–Cox Power Exponential distribution. Statistics in Medicine,
23, pp 3053–3076.
• Rigby, R. A. and Stasinopoulos, D. M. (2005). Generalized Additive Models for Location,
Scale and Shape (with discussion). Appl. Statist., 54, pp 507–554.
• Stasinopoulos, D. M., Rigby, R. A. and Akantziliotou, C. (2005). Using GAMLSS for
univariate statistical modelling. In Statistical Solutions to Modern Problems: Proceedings
of the 20th International Workshop on Statistical Modelling, eds A. R. Francis, K. M.
Matawie, A. Oshlack and G. K. Smyth, pp 411–418, Sydney, Australia.
• Stasinopoulos, D. M., Rigby, R. A. and Akantziliotou, C. (2006). Instructions on how to use
the GAMLSS package in R. Technical Report 01/06. STORM Research Centre, London
Metropolitan University, London.
References
Some inflated beta distribution related references
• Rigby, R. A. and Stasinopoulos, D. M. (2006). Using the Box–Cox t distribution in GAMLSS
to model skewness and kurtosis. Statistical Modelling, 6, pp 209–229.
• Stasinopoulos, D. M. and Rigby, R. A. (2007). Generalized additive models for location scale and
shape (GAMLSS) in R. Journal of Statistical Software, Vol. 23, Issue 7, Dec 2007.
• Rigby, R. A., Stasinopoulos, D. M. and Akantziliotou, C. (2007). A general framework for
modelling overdispersed count data. Submitted.
• Dunn, P. K. & Smyth, G. K. (1996). Randomized quantile residuals. Journal of
Computational and Graphical Statistics, 5, 1–10.
• Federal Communications Commission (1993). FCC 93-177, Report and order and further
notice of proposed rule making, MM Docket 92-266 (3 May 1993), 6134.
• Espinheira, P. L., Ferrari, S. L. P. & Cribari-Neto, F. (2008a). On beta regression residuals.
Journal of Applied Statistics, to appear.
• Espinheira, P. L., Ferrari, S. L. P. & Cribari-Neto, F. (2008b). Influence diagnostics in beta
regression. Computational Statistics & Data Analysis, to appear.
• Ferrari, S. L. P. & Cribari-Neto, F. (2004). Beta regression for modelling rates and
proportions. Journal of Applied Statistics, 7, 799–815.
• Hoff, A. (2007). Second stage DEA: Comparison of approaches for modelling the DEA
score. European Journal of Operational Research, 181, 425–435.
References
• Kieschnick, R. & McCullough, B. D. (2003). Regression analysis of variates observed on
(0,1): percentages, proportions, and fractions. Statistical Modelling, 3, 1–21.
• McCullagh, P. & Nelder, J. A. (1989). Generalized Linear Models, 2nd ed. London:
Chapman and Hall.
• Ospina, R., Cribari-Neto, F. & Vasconcellos, K. L. P. (2006). Improved point and interval
estimation for a beta regression model. Computational Statistics & Data Analysis, 51,
960–981.
• Ospina, R. (2006). The zero-inflated beta distribution for fitting a GAMLSS. Available at
gamlss.dist: Extra distributions to be used for GAMLSS modelling.
http://cran.r-project.org/src/contrib/.
• Tu, W. (2002). Zero inflated data. Encyclopedia of Environmetrics, 4, 2387–2391.