Clustering Methods: Part 8
Gaussian Mixture Models

Ville Hautamäki
Speech and Image Processing Unit
Department of Computer Science
University of Joensuu, FINLAND

Preliminaries
• We assume that the dataset X has been generated by a parametric distribution p(X).
• Estimation of the parameters of p is known as density estimation.
• Here we consider the Gaussian distribution.

Figures taken from: http://research.microsoft.com/~cmbishop/PRML/

Typical parameters (1)
• Mean (μ): the average value of p(X), also called the expectation.
• Variance (σ²): a measure of the variability of p(X) around the mean.

Typical parameters (2)
• Covariance: measures how much two variables vary together.
• Covariance matrix: the collection of covariances between all pairs of dimensions.
  – The diagonal of the covariance matrix contains the variance of each attribute.

One-dimensional Gaussian
$\mathrm{Normal}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)$
• The parameters to be estimated are the mean (μ) and the variance (σ²).

Multivariate Gaussian (1)
$\mathrm{Normal}(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma) = \frac{1}{(2\pi)^{D/2} \det(\Sigma)^{1/2}} \exp\left( -\frac{1}{2} (\mathbf{x}-\boldsymbol{\mu})^{T} \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu}) \right)$
• In the multivariate case we have a covariance matrix Σ instead of a single variance.

Multivariate Gaussian (2)
• Common covariance structures (2-D example):
  – Single: $\Sigma = \begin{pmatrix} \sigma^2 & 0 \\ 0 & \sigma^2 \end{pmatrix}$
  – Diagonal: $\Sigma = \begin{pmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{pmatrix}$
  – Full covariance: $\Sigma = \begin{pmatrix} \sigma_{11}^2 & \sigma_{12}^2 \\ \sigma_{12}^2 & \sigma_{22}^2 \end{pmatrix}$
• Complete data log likelihood:
$\ln p(X \mid \mu, \Sigma) = \sum_{n=1}^{N} \ln \mathrm{Normal}(x_n \mid \mu, \Sigma)$

Maximum Likelihood (ML) parameter estimation
• Maximize the log likelihood formulation.
• Setting the gradient of the complete data log likelihood to zero gives a closed-form solution.
  – In the case of the mean, this is the sample average: $\mu_{\mathrm{ML}} = \frac{1}{N} \sum_{n=1}^{N} x_n$.

When one Gaussian is not enough
• Real-world datasets are rarely unimodal!

Mixtures of Gaussians
$p(x) = \sum_{k=1}^{M} \pi_k\, \mathrm{Normal}(x \mid \mu_k, \Sigma_k)$

Mixtures of Gaussians (2)
• In addition to the mean and covariance parameters (now M of each), we have the mixing coefficients πk. The mixing coefficients satisfy:
$\sum_{k=1}^{M} \pi_k = 1, \qquad 0 \le \pi_k \le 1$
• πk can be seen as the prior probability of component k.
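To make the mixture density concrete, here is a minimal NumPy sketch that evaluates p(x) = Σk πk Normal(x | μk, Σk) for a two-component 2-D mixture. The helper names (gaussian_pdf, gmm_pdf) and the parameter values are illustrative choices, not something taken from the slides.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Multivariate normal density Normal(x | mean, cov) for a single vector x."""
    d = x.shape[0]
    diff = x - mean
    norm_const = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return norm_const * np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff)

def gmm_pdf(x, weights, means, covs):
    """Mixture density p(x) = sum_k pi_k * Normal(x | mu_k, Sigma_k)."""
    return sum(w * gaussian_pdf(x, m, c)
               for w, m, c in zip(weights, means, covs))

# Illustrative two-component 2-D mixture.
weights = [0.6, 0.4]
means   = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
covs    = [np.eye(2), np.array([[1.0, 0.5], [0.5, 1.0]])]

print(gmm_pdf(np.array([1.0, 1.0]), weights, means, covs))
```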
Responsibilities (1)
(Figure: the same sample shown as complete data with component labels in colour, as incomplete data without labels, and with responsibilities.)
• The component labels (red, green and blue) cannot be observed.
• We have to calculate approximations of them, the responsibilities.

Responsibilities (2)
• The responsibility describes how probable it is that observation vector x comes from component k.
• In hard clustering, responsibilities take only the values 0 and 1, and thus define a hard partitioning.

Responsibilities (3)
We can express the marginal density p(x) as:
$p(x) = \sum_{k=1}^{M} p(k)\, p(x \mid k)$
From this, we can find the responsibility of the kth component for x using Bayes' theorem:
$\gamma_k(x) = p(k \mid x) = \frac{p(k)\, p(x \mid k)}{\sum_{l} p(l)\, p(x \mid l)} = \frac{\pi_k\, \mathrm{Normal}(x \mid \mu_k, \Sigma_k)}{\sum_{l} \pi_l\, \mathrm{Normal}(x \mid \mu_l, \Sigma_l)}$

Expectation Maximization (EM)
• Goal: maximize the log likelihood of the whole data:
$\ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \left( \sum_{k=1}^{M} \pi_k\, \mathrm{Normal}(x_n \mid \mu_k, \Sigma_k) \right)$
• Once the responsibilities have been calculated, we can maximize individually for the means, the covariances and the mixing coefficients!

Exact update equations
New mean estimates:
$\mu_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma_k(x_n)\, x_n, \qquad N_k = \sum_{n=1}^{N} \gamma_k(x_n)$
Covariance estimates:
$\Sigma_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma_k(x_n)\, (x_n - \mu_k)(x_n - \mu_k)^{T}$
Mixing coefficient estimates:
$\pi_k = \frac{N_k}{N}$

EM Algorithm
• Initialize the parameters.
• while not converged:
  – E step: calculate the responsibilities.
  – M step: estimate the new parameters.
  – Calculate the log likelihood of the new parameters.
(A runnable sketch of this loop is given at the end of this section.)

Example of EM
(Figure: EM iterations on the example data.)

Computational complexity
• Hard clustering with the MSE criterion is NP-complete.
• Can we find the optimal GMM in polynomial time?
• Finding the optimal GMM is in class NP.

Some insights
• In a GMM we need to estimate the parameters, which are all real numbers.
  – Number of parameters: M + M·D + M·D(D−1)/2
• Hard clustering has no parameters, just a set partitioning (remember the optimality criteria!).

Some further insights (2)
• Both optimization functions are mathematically rigorous!
• Solutions minimizing MSE are always meaningful.
• Maximization of the log likelihood might lead to a singularity!

Example of singularity
(Figure: a mixture component whose variance collapses toward zero around a single data point, so the likelihood grows without bound.)
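As a companion to the EM Algorithm slide above, the following NumPy sketch implements the E step (responsibilities), the M step (the exact update equations), and a convergence check on the log likelihood. It also adds a small variance floor to the covariance diagonals; this floor is one common safeguard against the singularity illustrated above, not something prescribed by the slides. The function name em_gmm, the initialization scheme, and the default tolerances are assumptions made for this example.

```python
import numpy as np

def em_gmm(X, M, n_iter=100, tol=1e-6, var_floor=1e-6, seed=0):
    """EM for a Gaussian mixture with full covariances.

    X: (N, D) data matrix, M: number of components.
    var_floor is added to the covariance diagonals to keep them non-singular
    (an assumption of this sketch, not part of the slides).
    """
    rng = np.random.default_rng(seed)
    N, D = X.shape

    # Initialization: M random data points as means, shared sample covariance.
    means = X[rng.choice(N, M, replace=False)]
    covs = np.array([np.cov(X.T) + var_floor * np.eye(D)] * M)
    weights = np.full(M, 1.0 / M)

    prev_ll = -np.inf
    for _ in range(n_iter):
        # E step: gamma[n, k] = pi_k Normal(x_n | mu_k, Sigma_k) / p(x_n).
        log_prob = np.empty((N, M))
        for k in range(M):
            diff = X - means[k]
            inv_cov = np.linalg.inv(covs[k])
            _, logdet = np.linalg.slogdet(covs[k])
            maha = np.einsum('nd,de,ne->n', diff, inv_cov, diff)
            log_prob[:, k] = (np.log(weights[k])
                              - 0.5 * (D * np.log(2 * np.pi) + logdet + maha))
        log_norm = np.logaddexp.reduce(log_prob, axis=1)   # log p(x_n)
        gamma = np.exp(log_prob - log_norm[:, None])

        # M step: closed-form updates from the slides.
        Nk = gamma.sum(axis=0)
        weights = Nk / N
        means = (gamma.T @ X) / Nk[:, None]
        for k in range(M):
            diff = X - means[k]
            covs[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]
            covs[k] += var_floor * np.eye(D)               # keep Sigma_k non-singular

        # Convergence check on the log likelihood.
        ll = log_norm.sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll

    return weights, means, covs, ll
```

For example, calling em_gmm(X, M=3) on an (N, D) data matrix returns the estimated mixing coefficients, means, covariances, and the final log likelihood; computing the responsibilities in log space before normalizing avoids underflow for high-dimensional data.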