Clustering Methods: Part 8
Gaussian Mixture Models
Ville Hautamäki
Speech and Image Processing Unit
Department of Computer Science
University of Joensuu, FINLAND
University of Joensuu
Dept. of Computer Science
P.O. Box 111
FIN- 80101 Joensuu
Tel. +358 13 251 7959
fax +358 13 251 7955
www.cs.joensuu.fi
Preliminaries
• We assume that the dataset X has been
generated by a parametric distribution
p(X).
• Estimation of the parameters of p is
known as density estimation.
• Here we consider the Gaussian distribution.
Figures taken from:
http://research.microsoft.com/~cmbishop/PRML/
Typical parameters (1)
• Mean (μ): average value of p(X), also
called expectation.
• Variance (σ²): provides a measure of
variability in p(X) around the mean.
Typical parameters (2)
• Covariance: measures how much two
variables vary together.
• Covariance matrix: collection of
covariances between all dimensions.
– Diagonal of the covariance matrix
contains the variances of each attribute.
One-dimensional Gaussian
$$\mathrm{Normal}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
• Parameters to be estimated are
the mean (μ) and the variance (σ²)
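As a sketch, the one-dimensional density translates directly into Python (the function name `normal_pdf` is illustrative, not from the slides):

```python
import math

def normal_pdf(x, mu, sigma2):
    """Density of the one-dimensional Gaussian Normal(x | mu, sigma^2)."""
    # 1 / sqrt(2*pi*sigma^2) * exp(-(x - mu)^2 / (2*sigma^2))
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)
```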
Multivariate Gaussian (1)
$$\mathrm{Normal}(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma) = \frac{1}{(2\pi)^{D/2}\,\det(\Sigma)^{1/2}}\, \exp\!\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{T}\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right)$$
• In the multivariate case we have a
covariance matrix Σ instead of a single variance
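A minimal sketch of evaluating this density with NumPy (the function name `mvn_pdf` is an assumption; `np.linalg.solve` is used rather than an explicit matrix inverse for numerical stability):

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Density of the D-dimensional Gaussian Normal(x | mu, Sigma)."""
    D = len(mu)
    diff = x - mu
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu), without forming Sigma^{-1}
    quad = diff @ np.linalg.solve(Sigma, diff)
    norm = (2 * np.pi) ** (D / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm
```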
Multivariate Gaussian (2)
Common covariance structures (shown for D = 2):

Single (spherical):
$$\Sigma = \begin{pmatrix}\sigma^2 & 0\\ 0 & \sigma^2\end{pmatrix}$$

Diagonal:
$$\Sigma = \begin{pmatrix}\sigma_1^2 & 0\\ 0 & \sigma_2^2\end{pmatrix}$$

Full covariance:
$$\Sigma = \begin{pmatrix}\sigma_1^2 & \sigma_{12}\\ \sigma_{12} & \sigma_2^2\end{pmatrix}$$

Complete data log likelihood:
$$\ln p(X) = \sum_{n=1}^{N} \ln \mathrm{Normal}(\mathbf{x}_n \mid \boldsymbol{\mu}, \Sigma)$$
Maximum Likelihood (ML)
parameter estimation
• Maximize the log likelihood formulation.
• Setting the gradient of the complete-data log
likelihood to zero yields a closed-form
solution.
– In the case of the mean, this is the sample average.
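As a sketch, the closed-form ML estimates for a single Gaussian look as follows (the name `ml_estimates` is illustrative; note that the ML covariance divides by N, not the unbiased N−1):

```python
import numpy as np

def ml_estimates(X):
    """Closed-form ML estimates for a single Gaussian fitted to the rows of X."""
    mu = X.mean(axis=0)                  # sample average
    diff = X - mu
    Sigma = diff.T @ diff / len(X)       # biased (ML) covariance, divisor N
    return mu, Sigma
```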
When one Gaussian
is not enough
• Real world datasets are rarely unimodal!
Mixtures of Gaussians
$$p(\mathbf{x}) = \sum_{k=1}^{M} \pi_k\, \mathrm{Normal}(\mathbf{x} \mid \boldsymbol{\mu}_k, \Sigma_k)$$
Mixtures of Gaussians (2)
• In addition to mean and covariance
parameters (now M times), we have
mixing coefficients πk.
The following properties hold for the mixing coefficients:

$$\sum_{k=1}^{M} \pi_k = 1, \qquad 0 \le \pi_k \le 1$$

π_k can be seen as the prior probability of component k.
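A direct sketch of the mixture density for one-dimensional components (the name `gmm_pdf` and the argument layout are illustrative):

```python
import numpy as np

def gmm_pdf(x, pis, mus, sigma2s):
    """One-dimensional GMM density: sum_k pi_k * Normal(x | mu_k, sigma_k^2)."""
    pis, mus, sigma2s = map(np.asarray, (pis, mus, sigma2s))
    # Per-component Gaussian densities evaluated at x
    comp = np.exp(-(x - mus) ** 2 / (2 * sigma2s)) / np.sqrt(2 * np.pi * sigma2s)
    return float(pis @ comp)
```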
Responsibilities (1)
[Figure: complete data (points with observed component labels), incomplete data (unlabeled points), and the corresponding responsibilities.]
• Component labels (red, green and blue)
cannot be observed.
• We have to calculate approximations
(responsibilities).
Responsibilities (2)
• Responsibility describes how probable it
is that observation vector x was generated
by component k.
• In hard clustering, responsibilities take
only the values 0 and 1, and thus define a
hard partitioning.
Responsibilities (3)
We can express the marginal density p(x) as:

$$p(\mathbf{x}) = \sum_{k=1}^{M} p(k)\, p(\mathbf{x} \mid k)$$

From this, we can find the responsibility of the
kth component for x using Bayes' theorem:

$$\gamma_k(\mathbf{x}) = p(k \mid \mathbf{x}) = \frac{p(k)\, p(\mathbf{x} \mid k)}{\sum_{l} p(l)\, p(\mathbf{x} \mid l)} = \frac{\pi_k\, \mathrm{Normal}(\mathbf{x} \mid \boldsymbol{\mu}_k, \Sigma_k)}{\sum_{l} \pi_l\, \mathrm{Normal}(\mathbf{x} \mid \boldsymbol{\mu}_l, \Sigma_l)}$$
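The Bayes-rule computation above can be sketched for one-dimensional components (the function name `responsibilities` is an assumption):

```python
import numpy as np

def responsibilities(x, pis, mus, sigma2s):
    """gamma_k(x) = pi_k N(x | mu_k, s_k^2) / sum_l pi_l N(x | mu_l, s_l^2)."""
    pis, mus, sigma2s = map(np.asarray, (pis, mus, sigma2s))
    # Numerators: prior weight times component density
    weighted = pis * np.exp(-(x - mus) ** 2 / (2 * sigma2s)) / np.sqrt(2 * np.pi * sigma2s)
    return weighted / weighted.sum()     # normalize so responsibilities sum to 1
```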
Expectation Maximization (EM)
• Goal: Maximize the log likelihood of
the whole data
$$\ln p(X \mid \boldsymbol{\pi}, \boldsymbol{\mu}, \Sigma) = \sum_{n=1}^{N} \ln\!\left( \sum_{k=1}^{M} \pi_k\, \mathrm{Normal}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \Sigma_k) \right)$$
• Once the responsibilities are calculated,
we can maximize individually with respect
to the means, covariances and the mixing
coefficients!
Exact update equations
New mean estimates:

$$\boldsymbol{\mu}_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma_k(\mathbf{x}_n)\, \mathbf{x}_n, \qquad N_k = \sum_{n=1}^{N} \gamma_k(\mathbf{x}_n)$$

Covariance estimates:

$$\Sigma_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma_k(\mathbf{x}_n)\, (\mathbf{x}_n - \boldsymbol{\mu}_k)(\mathbf{x}_n - \boldsymbol{\mu}_k)^{T}$$

Mixing coefficient estimates:

$$\pi_k = \frac{N_k}{N}$$
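Under the same notation, one M step can be sketched as follows (the function name `m_step` and the N×M layout of the responsibility matrix `gamma` are assumptions):

```python
import numpy as np

def m_step(X, gamma):
    """Re-estimate GMM parameters from data X (N x D) and responsibilities gamma (N x M)."""
    N, D = X.shape
    Nk = gamma.sum(axis=0)                  # effective component sizes N_k
    mus = (gamma.T @ X) / Nk[:, None]       # responsibility-weighted means
    Sigmas = []
    for k in range(gamma.shape[1]):
        diff = X - mus[k]
        # Responsibility-weighted outer products, divided by N_k
        Sigmas.append((gamma[:, k] * diff.T) @ diff / Nk[k])
    pis = Nk / N                            # mixing coefficients
    return pis, mus, np.array(Sigmas)
```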
EM Algorithm
• Initialize parameters.
• While not converged:
– E step: calculate responsibilities.
– M step: estimate new parameters.
– Calculate the log likelihood of the new
parameters.
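The loop above can be sketched end-to-end for one-dimensional data (assumptions: means initialized from random data points, a fixed iteration count rather than a convergence test, and the name `em_gmm_1d`):

```python
import numpy as np

def em_gmm_1d(X, M, n_iter=30, seed=0):
    """EM for a one-dimensional M-component GMM.

    Returns (pis, mus, sigma2s, log-likelihood history)."""
    rng = np.random.default_rng(seed)
    N = len(X)
    # Initialize: random data points as means, global variance, uniform weights
    mus = rng.choice(X, size=M, replace=False)
    sigma2s = np.full(M, X.var())
    pis = np.full(M, 1.0 / M)
    history = []
    for _ in range(n_iter):
        # E step: weighted component densities, shape (N, M)
        comp = pis * np.exp(-(X[:, None] - mus) ** 2 / (2 * sigma2s)) \
               / np.sqrt(2 * np.pi * sigma2s)
        history.append(np.log(comp.sum(axis=1)).sum())   # current log likelihood
        gamma = comp / comp.sum(axis=1, keepdims=True)   # responsibilities
        # M step: closed-form parameter updates
        Nk = gamma.sum(axis=0)
        mus = (gamma * X[:, None]).sum(axis=0) / Nk
        sigma2s = (gamma * (X[:, None] - mus) ** 2).sum(axis=0) / Nk
        pis = Nk / N
    return pis, mus, sigma2s, history
```

EM guarantees that the log likelihood never decreases from one iteration to the next, which the history list makes easy to check.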
Example of EM
Computational complexity
• Hard clustering with the MSE criterion is
NP-complete.
• Can we find the optimal GMM in
polynomial time?
• Finding the optimal GMM is in class NP.
Some insights
• In GMM we need to estimate the
parameters, which are all real numbers.
– Number of parameters:
M + M·D + M·D(D+1)/2
(mixing coefficients, means, and symmetric covariance matrices)
• Hard clustering has no parameters, just
set partitioning (remember the optimality
criteria!)
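This count can be sketched in one line (a symmetric D×D covariance matrix has D(D+1)/2 independent entries; the function name is illustrative):

```python
def gmm_param_count(M, D):
    """Free parameters of an M-component, D-dimensional GMM with full covariances."""
    # weights + means + symmetric covariance entries per component
    return M + M * D + M * D * (D + 1) // 2
```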
Some further insights (2)
• Both optimization functions are
mathematically well-defined.
• Solutions minimizing MSE are always
meaningful.
• Maximization of the log likelihood might
lead to a singularity: if a component collapses
onto a single data point, its variance tends to
zero and the likelihood grows without bound.
Example of singularity