
Expectation Maximization for GMM
Comp344 Tutorial
Kai Zhang
GMM
• Model the data distribution by a combination of Gaussian functions
• Given a set of sample points, how to estimate the parameters of the GMM?
EM Basic Idea
• Given data X and an initial parameter Θ^t
• Assume a hidden variable Y
• 1. Study how Y is distributed based on current knowledge (X and Θ^t), i.e., p(Y | X, Θ^t)
  – Compute the expectation of the joint data likelihood under this distribution (called the Q function):

    Q(\Theta^t, \Theta^{t+1}) = E_{Y \mid X, \Theta^t}\left[ L(X, Y \mid \Theta^{t+1}) \right]

• 2. Maximize this expectation w.r.t. the to-be-determined parameter Θ^{t+1}
• Iterate steps 1 and 2 until convergence
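As a rough illustration of this two-step loop, here is a minimal Python sketch (the function and the e_step/m_step callbacks are hypothetical placeholders, not code from the tutorial): the concrete model supplies how to compute p(Y | X, Θ^t) and how to maximize the resulting Q function.

```python
import numpy as np

# Minimal sketch of the generic EM loop described above.
# e_step and m_step are hypothetical callbacks supplied by the concrete model:
# e_step returns the distribution p(Y | X, theta^t); m_step maximizes the Q function.
def em(X, theta, e_step, m_step, max_iter=100, tol=1e-6):
    for _ in range(max_iter):
        posterior = e_step(X, theta)      # step 1: how Y is distributed given X and theta^t
        new_theta = m_step(X, posterior)  # step 2: argmax over theta^{t+1} of the Q function
        converged = all(np.allclose(a, b, atol=tol) for a, b in zip(theta, new_theta))
        theta = new_theta
        if converged:                     # stop once the parameters no longer change
            break
    return theta
```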
EM with GMM
• In the context of GMM:
  – X: data points
  – Y: which Gaussian creates which data points
  – Θ: parameters of the mixture model
p(x) = \sum_{k=1}^{c} p(y_k \mid \Theta)\, p(x \mid y_k, \Theta)

p(x \mid y_k, \Theta) = p(x \mid y_k, \Theta_k), \qquad p(y_k \mid \Theta) = p_k

p(x \mid y_k, \Theta_k) = \frac{1}{\sqrt{2\pi}\,\sigma_k} \exp\!\left( -\frac{(x - \mu_k)^2}{2\sigma_k^2} \right)

or, for multivariate x,

p(x \mid y_k, \Theta_k) = \frac{1}{|2\pi\Sigma_k|^{1/2}} \exp\!\left( -\frac{1}{2} (x - \mu_k)' \Sigma_k^{-1} (x - \mu_k) \right)

\Theta = \{ p_k, \mu_k, \Sigma_k \}_{k=1}^{c}

• Constraint: the p_k's must sum up to 1, so that p(x) is a pdf
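To make the mixture density concrete, here is a minimal NumPy sketch for 1D data (the parameter values below are illustrative, not taken from the tutorial) that evaluates p(x) = Σ_k p_k N(x; μ_k, σ_k²):

```python
import numpy as np

def gmm_density(x, pk, mu, sigma):
    """p(x) = sum_k p_k * N(x; mu_k, sigma_k^2), evaluated for 1D data."""
    x = np.atleast_1d(np.asarray(x, dtype=float))[:, None]         # shape (n, 1)
    comp = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)
    return comp @ pk                                                # weighted sum over the c components

# Illustrative parameters; the priors p_k sum to 1 so that p(x) is a pdf.
pk = np.array([0.4, 0.6])
mu = np.array([0.0, 3.0])
sigma = np.array([1.0, 0.5])
print(gmm_density([0.0, 1.5, 3.0], pk, mu, sigma))
```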
• How to write the Q function under the GMM setting
  – The likelihood of a data set is the product of all the sample likelihoods, so
Q( , 
t
t 1

n
 
)   p y | xi ,  t log p y, xi |  t 1
i 1 Y


 

t
t
p
y
|

p
x
|
y
,

i
p y | xi ,  t 

t
t
 p y |  p xi | y,



y
 






pk p xi |  k
t
p
p
x
|

 k i k
k

t

p y, xi |  t 1  p y |  t 1 p xi | y,  t 1  pkt 1 p xi |  k


t 1
• The Q function specific for GMM is

Q(\Theta^t, \Theta^{t+1}) = \sum_{i=1}^{n} \sum_{k} \frac{p_k^t\, p(x_i \mid \Theta_k^t)}{\sum_{k'} p_{k'}^t\, p(x_i \mid \Theta_{k'}^t)} \log\!\left( p_k^{t+1}\, p(x_i \mid \Theta_k^{t+1}) \right)
• Plugging in the definition of p(x | Θ_k) and computing derivatives w.r.t. the parameters, we obtain the iteration procedure:
E step

p_{ik}^t = \frac{p_k^t\, p(x_i \mid \Theta_k^t)}{\sum_{k'} p_{k'}^t\, p(x_i \mid \Theta_{k'}^t)}

M step

p_k^{t+1} = \frac{1}{n} \sum_{i=1}^{n} p_{ik}^t, \qquad
\mu_k^{t+1} = \frac{\sum_{i=1}^{n} p_{ik}^t\, x_i}{\sum_{i=1}^{n} p_{ik}^t}, \qquad
\Sigma_k^{t+1} = \frac{\sum_{i=1}^{n} p_{ik}^t\, (x_i - \mu_k^t)(x_i - \mu_k^t)^T}{\sum_{i=1}^{n} p_{ik}^t}
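These E and M steps translate almost line for line into code. Below is a minimal NumPy sketch for a 1D GMM (function and variable names are my own, and it keeps the slide's convention of using μ_k^t in the variance update); it is a sketch of the update rules above, not the tutorial's own implementation.

```python
import numpy as np

def em_gmm_1d(x, pk, mu, sigma, n_iter=50):
    """Run n_iter EM iterations for a 1D GMM.
    x: (n,) data; pk, mu, sigma: (c,) priors, means, standard deviations."""
    x = np.asarray(x, dtype=float)
    for _ in range(n_iter):
        # E step: responsibilities p_ik proportional to p_k * N(x_i; mu_k, sigma_k^2)
        comp = np.exp(-(x[:, None] - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)
        num = pk * comp                               # shape (n, c)
        p_ik = num / num.sum(axis=1, keepdims=True)   # each row sums to 1
        # M step: re-estimate priors, means, and variances from the responsibilities
        nk = p_ik.sum(axis=0)                         # effective number of points per component
        new_pk = nk / len(x)                          # p_k^{t+1} = (1/n) * sum_i p_ik
        new_mu = (p_ik * x[:, None]).sum(axis=0) / nk
        var = (p_ik * (x[:, None] - mu) ** 2).sum(axis=0) / nk   # uses mu_k^t, as in the slide
        pk, mu, sigma = new_pk, new_mu, np.sqrt(var)
    return pk, mu, sigma

# Toy usage with synthetic data and rough initial parameters (hypothetical values).
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0, 1, 200), rng.normal(4, 0.5, 300)])
print(em_gmm_1d(data, np.array([0.5, 0.5]), np.array([-1.0, 5.0]), np.array([1.0, 1.0])))
```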
Posteriors
t 1
p
 Intuitive meaning of ik

The posterior probability that xi is created by the kth
Gaussian component (soft membership)
 The meaning of pkt 1
t 1
 Note that it is the summation of all pik having the
same k
 So it means the strength of the kth Gaussian
component
Comments
• GMM can be viewed as performing
  – density estimation, in the form of a combination of a number of Gaussian functions
  – clustering, where clusters correspond to the Gaussian components, and cluster assignment can be achieved through the Bayes rule
• GMM produces exactly what is needed in the Bayes decision rule: the prior probability and the class-conditional probability
  – So GMM + Bayes rule can compute the posterior probability, hence solving the clustering problem (sketched below)
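As a minimal illustration of that last point (names and parameter values are hypothetical, not from the tutorial): the posterior of component k given a point is the normalized p_k · N(x; μ_k, σ_k²), and the cluster assignment is its argmax.

```python
import numpy as np

def assign_clusters(x, pk, mu, sigma):
    """Hard cluster labels via the Bayes rule: argmax_k p(k | x_i)."""
    x = np.asarray(x, dtype=float)[:, None]
    lik = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)
    post = pk * lik                                # prior * class-conditional
    post /= post.sum(axis=1, keepdims=True)        # posterior p(k | x_i); rows sum to 1
    return post.argmax(axis=1), post

# Hypothetical fitted parameters for two components.
labels, post = assign_clusters([0.2, 3.9, 2.0],
                               pk=np.array([0.4, 0.6]),
                               mu=np.array([0.0, 4.0]),
                               sigma=np.array([1.0, 0.8]))
print(labels)
```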
Illustration
Conditional probabilities P(x_i | k) — each row corresponds to one class (a Gaussian curve):

Class \ points      | X1 (i=1)          | X2 (i=2)          | ...
Class 1, k=1 (P1)   | P11 = P(x1|k=1)   | P21 = P(x2|k=1)   | ...
Class 2, k=2 (P2)   | P12 = P(x1|k=2)   | P22 = P(x2|k=2)   | ...

• Condition: P1 + P2 = 1
• Each row sums up to 1 (a Gaussian curve)
• Each column can be used to compute the posterior probability

Here p(x | y_k, Θ_k) is the conditional probability and p_k the prior.
Illustration
Conditional probability model (p_k is the prior):

p(x \mid y_k, \Theta_k) = \frac{1}{\sqrt{2\pi}\,\sigma_k} \exp\!\left( -\frac{(x - \mu_k)^2}{2\sigma_k^2} \right)

class | Prior probability | Conditional probability P(x_i | k) for x1 … x5
c1    | P1 = 2/5          | P1|1 = 0.35, P2|1 = 0.35, P3|1 = 0.1, P4|1 = 0.1, P5|1 = 0.1
c2    | P2 = 3/5          | P1|2 = 0.05, P2|2 = 0.05, P3|2 = 0.3, P4|2 = 0.3, P5|2 = 0.3
E step — compute the posteriors

p_{ik}^t = \frac{p_k^t\, p(x_i \mid \Theta_k^t)}{\sum_{k'} p_{k'}^t\, p(x_i \mid \Theta_{k'}^t)}

class | (updated) Prior probability | Posterior probability p(k | x_i) for x1 … x5
c1    | (28/17 + 6/11)/5            | 14/17, 14/17, 2/11, 2/11, 2/11
c2    | (6/17 + 27/11)/5            | 3/17, 3/17, 9/11, 9/11, 9/11

The updated prior is the average of the posteriors for component k:

p_k^{t+1} = \frac{1}{n} \sum_{i=1}^{n} p_{ik}^t
(Updated) Conditional Probability

Estimate the mean and covariance:

\mu_k^{t+1} = \frac{\sum_{i=1}^{n} p_{ik}^t\, x_i}{\sum_{i=1}^{n} p_{ik}^t}, \qquad
\Sigma_k^{t+1} = \frac{\sum_{i=1}^{n} p_{ik}^t\, (x_i - \mu_k^t)(x_i - \mu_k^t)^T}{\sum_{i=1}^{n} p_{ik}^t}

Weights (the posteriors p_{ik}^t) used for each class:

c1: X1 (14/17), X2 (14/17), X3 (2/11), X4 (2/11), X5 (2/11)
c2: X1 (3/17), X2 (3/17), X3 (9/11), X4 (9/11), X5 (9/11)

The updated conditional probabilities are then obtained by plugging the new parameters into

p(x \mid y_k, \Theta_k) = \frac{1}{\sqrt{2\pi}\,\sigma_k} \exp\!\left( -\frac{(x - \mu_k)^2}{2\sigma_k^2} \right)
Initialization
• Perform an initial clustering and divide the data into m clusters (e.g., simply cut one dimension into m segments); a small sketch follows below
• For the kth cluster
  – Its mean is the kth Gaussian component mean (μ_k)
  – Its covariance is the kth Gaussian component covariance (Σ_k)
  – The proportion of samples in it is the prior for the kth Gaussian component (p_k)
• Then run the EM iterations
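A minimal sketch of this initialization for 1D data, assuming the "cut one dimension into m segments" strategy (the function name and the segmenting details are illustrative):

```python
import numpy as np

def init_gmm_1d(x, m):
    """Initialize (pk, mu, sigma) by cutting the data range into m equal segments."""
    x = np.asarray(x, dtype=float)
    edges = np.linspace(x.min(), x.max(), m + 1)
    labels = np.minimum(np.digitize(x, edges[1:-1]), m - 1)           # segment index per point
    pk = np.array([(labels == k).mean() for k in range(m)])           # proportion of samples -> p_k
    mu = np.array([x[labels == k].mean() for k in range(m)])          # segment mean -> mu_k
    sigma = np.array([x[labels == k].std() + 1e-6 for k in range(m)]) # segment std -> sigma_k
    return pk, mu, sigma

# Hypothetical usage; these initial values are then refined by the EM iterations above.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0, 1, 200), rng.normal(4, 0.5, 300)])
print(init_gmm_1d(data, 2))
```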
Applications: image segmentation