Why is Mixture Modelling so popular?
Tony Robinson
Department of Mathematical Sciences
University of Bath
15th May, 2008
Outline
1. Heterogeneity - one model won’t do!
2. Basic mixture formulation.
3. Ways to fit.
4. Inferential difficulties.
5. Cautionary example.
Introduction - Model Heterogeneity.
Much of the data we encounter does not seem to be
appropriately fitted by those nice simple probability models
we learnt about “in school”.
[Figure: histogram "Hidalgo Stamp Thicknesses"; horizontal axis: thickness (0.06 to 0.13), vertical axis: Frequency (0 to 40).]
Finite Mixture Models
The idea is to model complex data as a finite mixture of component
models, often of the same type.
Loosely, model $M$ is a weighted mixture of component models
$\{M_i\}$ if
\[ M = \sum_{i=1}^{g} w_i M_i , \]
where $w_i$ represents the proportion of the data “explained” by $M_i$.
Example: Density estimation
\[ f(x_j) = \sum_{i=1}^{g} w_i f_i(x_j), \]
where the $f_i(x_j)$ are “standard” densities, $0 \le w_i \le 1$, and $\sum_{i=1}^{g} w_i = 1$.
For the Hidalgo stamp thickness data, one simple objective
might be density estimation.
We could model the thicknesses as a mixture of Gaussian
distributions:
\[ f(x_j) = \sum_{i=1}^{g} w_i \, \phi_i(x_j; \mu_i, \sigma_i^2). \]
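As a minimal sketch of the Gaussian mixture density, the following evaluates f(x) for a given set of weights, means and variances. The three-component parameter values are hypothetical and chosen only to sit on the thickness scale; they are not estimates from the stamp data.

```python
import math

def normal_pdf(x, mu, sigma2):
    """Density of a Gaussian with mean mu and variance sigma2, evaluated at x."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def mixture_pdf(x, weights, means, variances):
    """f(x) = sum_i w_i * phi(x; mu_i, sigma_i^2)."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))

# Hypothetical parameters (illustrative only, not fitted to any data).
weights = [0.5, 0.3, 0.2]
means = [0.072, 0.080, 0.100]
variances = [0.002 ** 2, 0.003 ** 2, 0.005 ** 2]

# Crude Riemann-sum check that the mixture density integrates to about 1.
step = 0.0001
grid = [0.04 + k * step for k in range(1001)]  # covers 0.04 to 0.14
total = sum(mixture_pdf(x, weights, means, variances) for x in grid) * step
print(round(total, 3))
```

Because each component integrates to 1 and the weights sum to 1, the mixture is itself a proper density, whatever the value of g.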
But this raises an immediate question:
Do we know how many component models there are in the
mixture? Or, do we need to find out?
Four?
[Figure: "Stamp Thickness, g=4"; Density (0 to 80) against thickness (0.06 to 0.13).]
Five?
[Figure: "Stamp Thickness, g=5"; Density (0 to 80) against thickness (0.06 to 0.13).]
Six?
[Figure: "Stamp Thickness, g=6"; Density (0 to 80) against thickness (0.06 to 0.13).]
Seven?
[Figure: "Stamp Thickness, g=7"; Density (0 to 80) against thickness (0.06 to 0.13).]
Model-Based Clustering
The paper used for the stamps in the data is likely to come from
several different origins.
Mixture modelling provides a natural framework for this
situation.
At its simplest, each component in the mixture represents a
cluster within the total population.
This assumes that the clusters are individually well fitted by the
models $\{M_i\}$.
Old Faithful Data
More Applications
Mixtures of regressions, linear models, GLMs, survival models, ...
E.g. Hurn et al. (2000)
More Applications
Mixtures of Phylogenetic Models (Evolutionary tree of species)
Characters (ACGT) evolve according
to a Markov process Θ involving
parameters such as mutation rates.
But there are many sources of heterogeneity.
We can mix any or all of:
1. different Θs
2. branch lengths
3. topologies
More Applications
Social networks: Handcock et al. (2006)
Fitting Mixture Models
Two popular methods of fitting.
1. EM Algorithm
2. Markov chain Monte Carlo.
The EM algorithm involves assessing to which component i of
the mixture each datum is expected to belong. Once this is
established for all the data, the parameters of each component
model $M_i$ are fitted by the usual maximum likelihood estimation.
McMC may also utilise this device for completing the “missing”
part of the data.
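The two alternating steps can be sketched for a univariate Gaussian mixture with g fixed. This is an illustrative sketch rather than the code behind the talk: it uses the common "soft" form of EM, in which each datum gets a membership probability for every component, and the simulated data, starting values and the function name `em_gaussian_mixture` are all assumptions.

```python
import math
import random

def normal_pdf(x, mu, sigma2):
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def log_likelihood(data, weights, means, variances):
    return sum(
        math.log(sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)))
        for x in data
    )

def em_gaussian_mixture(data, weights, means, variances, n_iter=50):
    """EM for a univariate Gaussian mixture with a fixed number of components."""
    weights, means, variances = list(weights), list(means), list(variances)
    g, n = len(means), len(data)
    for _ in range(n_iter):
        # E-step: resp[j][i] = P(observation j belongs to component i | current parameters).
        resp = []
        for x in data:
            dens = [weights[i] * normal_pdf(x, means[i], variances[i]) for i in range(g)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: weighted maximum likelihood estimates for each component.
        for i in range(g):
            n_i = sum(r[i] for r in resp)
            weights[i] = n_i / n
            means[i] = sum(r[i] * x for r, x in zip(resp, data)) / n_i
            var = sum(r[i] * (x - means[i]) ** 2 for r, x in zip(resp, data)) / n_i
            variances[i] = max(var, 1e-8)  # guard against a collapsing component
    return weights, means, variances

# Simulated data (an assumption for illustration): 300 draws near 0, 200 draws near 6.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(300)] + [rng.gauss(6.0, 1.0) for _ in range(200)]

start = ([0.5, 0.5], [1.0, 5.0], [1.0, 1.0])
ll_start = log_likelihood(data, *start)
fitted_w, fitted_mu, fitted_var = em_gaussian_mixture(data, *start)
ll_end = log_likelihood(data, fitted_w, fitted_mu, fitted_var)
```

Each EM iteration cannot decrease the likelihood, but the algorithm may converge to a local maximum, so in practice several starting values are usually tried.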
Allocation Variables
- Latent allocation variables describe which observations
  are assigned to each of the current components at each
  iteration of the EM algorithm or McMC sampler.
  In the case of McMC these provide a sample cluster
  configuration per McMC iteration.
- Set $Z_j = i$ if observation j is allocated to component i.
- Resulting allocation vector: $Z = (Z_1, \dots, Z_n)$.
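One way an McMC sampler can produce an allocation vector is to draw each Z_j from its conditional distribution given the current weights and component parameters. The sketch below assumes univariate Gaussian components; the function name `sample_allocations` and the toy data are hypothetical.

```python
import math
import random

def normal_pdf(x, mu, sigma2):
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def sample_allocations(data, weights, means, variances, rng):
    """Draw Z_j with P(Z_j = i) proportional to w_i * phi(x_j; mu_i, sigma_i^2)."""
    labels = list(range(1, len(weights) + 1))  # components labelled 1..g
    z = []
    for x in data:
        probs = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        # random.choices accepts unnormalised weights.
        z.append(rng.choices(labels, weights=probs)[0])
    return z

# Toy data with two well-separated groups (illustrative values only).
rng = random.Random(1)
toy_data = [0.0, 0.1, 10.0]
z = sample_allocations(toy_data, [0.5, 0.5], [0.0, 10.0], [1.0, 1.0], rng)
print(z)
```

With overlapping components the draws vary from iteration to iteration, which is what yields a sample of cluster configurations.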
Any Problems?
Model selection, inference and interpretation can be
complicated by several factors.
- What is g?
- g large ⇒ many parameters.
- Identifiability?
Label Switching
- In a mixture model, if all the g components belong to the
  same parametric family, then the mixture density is
  invariant under the g! permutations of the component
  labels.
- The likelihood
  \[ \mathrm{Lik} = \prod_{i=1}^{n} \{ w_1 f(x_i; \theta_1) + \dots + w_g f(x_i; \theta_g) \} \]
  is the same for all permutations of the labels (1, 2, 3, ..., g).
- The term label switching is used to describe the invariance
  of the likelihood under the relabelling of the components.
- Often handled by imposing constraints, e.g. labelling
  components in increasing order of weight $w_j$. Not always
  satisfactory.
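The invariance can be checked numerically. The sketch below assumes Gaussian components and made-up data and parameters: it evaluates the log-likelihood under all g! = 6 relabellings of a three-component mixture, then applies the ordering-by-weight constraint mentioned above.

```python
import itertools
import math

def normal_pdf(x, mu, sigma2):
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def log_likelihood(data, components):
    """components is a list of (weight, mean, variance) triples, one per label."""
    return sum(
        math.log(sum(w * normal_pdf(x, m, v) for w, m, v in components))
        for x in data
    )

def order_by_weight(components):
    """Identifiability constraint: label components in increasing order of weight."""
    return sorted(components, key=lambda c: c[0])

# Illustrative data and parameters (assumptions, not taken from the talk).
data = [0.1, 0.4, 2.9, 3.2, 6.8, 7.1]
components = [(0.5, 7.0, 1.0), (0.2, 0.0, 1.0), (0.3, 3.0, 1.0)]

values = [log_likelihood(data, list(perm)) for perm in itertools.permutations(components)]
spread = max(values) - min(values)  # zero up to floating-point rounding
relabelled = order_by_weight(components)
```

The constraint picks one labelling out of the g!, but it is ambiguous whenever two weights are close, which is one reason such constraints are not always satisfactory.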
Identifiability v Nonidentifiability
- For example, consider the allocation vectors
  (4, 4, 3, 3, 4, 2, 2, 3, 1, 3) and (2, 2, 1, 1, 2, 3, 3, 1, 4, 1).
  These are different models if each component can be
  identified from some information.
- However, these two vectors define the same partition of the
  data and so are identical models from the clustering
  viewpoint. Therefore we would like some unique
  representation of them that identifies their common
  partition, i.e. {{1, 2, 5}, {3, 4, 8, 10}, {6, 7}, {9}}.
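A short sketch confirms that the two allocation vectors induce the same partition. The helper name `partition_of` is hypothetical; observations are numbered from 1 to match the slide.

```python
def partition_of(allocation):
    """Return the partition induced by an allocation vector, ignoring the labels."""
    groups = {}
    for obs, label in enumerate(allocation, start=1):
        groups.setdefault(label, set()).add(obs)
    # A set of frozensets: neither the order of the groups nor their labels matter.
    return {frozenset(members) for members in groups.values()}

z_a = (4, 4, 3, 3, 4, 2, 2, 3, 1, 3)
z_b = (2, 2, 1, 1, 2, 3, 3, 1, 4, 1)

same_partition = partition_of(z_a) == partition_of(z_b)
print(same_partition)
```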
RGF Representation
- One simple solution is to relabel the allocations in
  increasing order as new components make an appearance,
  starting with label 0 or 1 for the first observation.
- The restricted growth function (RGF) representation of the two
  allocation vectors above is
  (1, 1, 2, 2, 1, 3, 3, 2, 4, 2).
- Given allocation vectors from a sampler possibly suffering
  from label switching, we convert them to an unambiguous
  sample from the space of possible partitions.
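The relabelling rule is short to implement. In the sketch below the function name `restricted_growth` and the tiny list of sampled vectors are assumptions for illustration; the tally at the end shows how label-switched draws collapse onto a single partition.

```python
from collections import Counter

def restricted_growth(allocation, first_label=1):
    """Relabel components in order of first appearance (RGF representation)."""
    mapping = {}
    rgf = []
    for label in allocation:
        if label not in mapping:
            mapping[label] = first_label + len(mapping)  # next unused label
        rgf.append(mapping[label])
    return tuple(rgf)

z_a = (4, 4, 3, 3, 4, 2, 2, 3, 1, 3)
z_b = (2, 2, 1, 1, 2, 3, 3, 1, 4, 1)
rgf_a = restricted_growth(z_a)
rgf_b = restricted_growth(z_b)

# Hypothetical sampler output: two relabelled copies of one partition plus one other.
samples = [z_a, z_b, (1, 1, 1, 1, 1, 2, 2, 1, 2, 1)]
partition_counts = Counter(restricted_growth(z) for z in samples)
```

Dividing the counts by the number of samples gives estimated posterior probabilities of the sampled partitions.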
Example, Old Faithful Data
- Using MVN mixtures, we obtained a sample of 2000
  allocations from a reversible jump McMC sampler with a
  burn-in of 300000 iterations and a subsequent sample of
  100000 iterations thinned every 50.
- So we are estimating g as well as the Gaussian parameters and
  weights.
- The standard approach to clustering is typically:
  1. Estimate the likely number of components.
  2. Given g, estimate the likely clustering.
  This conditional approach can easily mislead.
Old Faithful “Standard” Clustering
The figure below shows the posterior distribution of the number
of components g, which has a prominent mode at 3. Alongside
is the classification into 3 clusters obtained by standard
hierarchical clustering, using the number of times each pair of
observations appears in the same sample cluster as a distance
measure. (O’Hagan)
[Figure, two panels. Left: histogram of the sampled number of components (1 to 5); vertical axis: Frequency (0 to 1200). Right: scatter plot of waiting time until next eruption (minutes, 40 to 100) against duration time (minutes, 1.5 to 5.5), showing the 3-cluster classification.]
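The pairwise counts that feed the hierarchical clustering described on this slide could be tallied as in the sketch below. The function name `co_clustering_counts` and the three toy allocation vectors are assumptions; the clustering step itself is not shown.

```python
def co_clustering_counts(samples):
    """counts[j][k] = number of sampled allocations placing observations j and k together."""
    n = len(samples[0])
    counts = [[0] * n for _ in range(n)]
    for z in samples:
        for j in range(n):
            for k in range(n):
                if z[j] == z[k]:
                    counts[j][k] += 1
    return counts

# Three toy allocation vectors for three observations (illustrative only).
toy_samples = [[1, 1, 2], [2, 2, 2], [1, 2, 2]]
counts = co_clustering_counts(toy_samples)
```

The counts depend only on whether two observations share a label, so they are unaffected by label switching. Subtracting each count from the number of samples is one simple way to turn it into a distance.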
Old Faithful, Common Partitions
The figure below depicts the 3 most commonly occurring
configurations - all contain just 2 components.
[Figure, three panels, each a scatter plot of time till next eruption (minutes, 50 to 90) against duration time (minutes, 1.5 to 5.0). Panel titles: "Modal Configuration, Posterior Probability = 0.3"; "Second Configuration, Posterior Probability = 0.095"; "Third Configuration, Posterior Probability = 0.0165".]
Can we do more?
- Simple to explore the commonly occurring partitions.
- But what about variation in the sampled partition values?
- These live in a very complex and high-dimensional discrete
  space.
- Could we get some insight into the nature of this sample
  distribution?
- Dissimilarity between each pair of partitions: based on the
  number of pairs of observations that agree in the two partitions,
  i.e. either both in the same group or both in different groups.
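The pair-counting measure can be sketched as follows. One assumption is made loudly here: since a larger number of agreeing pairs means more similar partitions, the sketch takes the dissimilarity to be the number of pairs that disagree, i.e. the total number of pairs minus the agreements. The function names are hypothetical.

```python
from itertools import combinations

def pair_agreements(z_a, z_b):
    """Count pairs of observations treated alike by both partitions:
    together in both, or apart in both."""
    agree = 0
    for j, k in combinations(range(len(z_a)), 2):
        together_a = z_a[j] == z_a[k]
        together_b = z_b[j] == z_b[k]
        if together_a == together_b:
            agree += 1
    return agree

def pair_dissimilarity(z_a, z_b):
    """Assumed dissimilarity: number of pairs on which the two partitions disagree."""
    n = len(z_a)
    return n * (n - 1) // 2 - pair_agreements(z_a, z_b)

# The two label-switched vectors from the identifiability slide define the same partition.
d_same = pair_dissimilarity((4, 4, 3, 3, 4, 2, 2, 3, 1, 3), (2, 2, 1, 1, 2, 3, 3, 1, 4, 1))
# A small example where the partitions differ.
d_diff = pair_dissimilarity((1, 1, 2, 2), (1, 2, 1, 2))
```

Computing this for every pair of sampled partitions gives the dissimilarity matrix needed as input to a multidimensional scaling routine.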
MDS plot of Old Faithful partitions
Here is a 2D configuration obtained from a nonmetric
multidimensional scaling based on the computed dissimilarities
between the 1172 sampled cluster configurations.
[Figure: "isoMDS of unique sampled allocations"; 2D scatter plot with both axes running from about -2000 to 6000. Points are plotted as the digits 2, 3 and 4, with 3 by far the most common; one point is marked "M".]
Summary
- Mixture models are very attractive for modelling heterogeneous
  data.
- Wide applicability.
- In most applications, it matters, to some extent, which $M_i$
  each observation belongs to.
- Lesson: Care must be taken when classifying for
  inference and prediction, especially when g is unknown.
- Using allocations helps to avoid pitfalls arising from
  standard conditional model selection.
- THANK YOU.