EM Algorithm

Speaker: 虞台文
Tatung University, Institute of Computer Science and Engineering
Intelligent Multimedia Lab
Contents
- Introduction
- Example: Missing Data
- Example: Mixed Attributes
- Example: Mixture
- Main Body
- Mixture Model
- EM-Algorithm on GMM
Introduction

EM is typically used to compute maximum-likelihood estimates given incomplete samples.

The EM algorithm estimates the parameters of a model iteratively. Starting from some initial guess, each iteration consists of:
- an E step (Expectation step)
- an M step (Maximization step)

Applications:
- Filling in missing data in samples
- Discovering the values of latent variables
- Estimating the parameters of HMMs
- Estimating the parameters of finite mixtures
- Unsupervised learning of clusters
- …
Example: Missing Data
Univariate Normal Sample

Sampling: $x = (x_1, x_2, \dots, x_n)^T$, drawn from $X \sim N(\mu, \sigma^2)$ with density

$$f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

Goal: estimate $\hat{\mu}$ and $\hat{\sigma}^2$.
Maximum Likelihood

Sampling: $x = (x_1, x_2, \dots, x_n)^T$.

$$\mathcal{L}(\mu, \sigma^2 \mid x) = f(x \mid \mu, \sigma^2) = f(x_1 \mid \mu, \sigma^2)\cdots f(x_n \mid \mu, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\left(-\sum_{i=1}^{n}\frac{(x_i-\mu)^2}{2\sigma^2}\right)$$

Given $x$, this is a function of $\mu$ and $\sigma^2$, and we want to maximize it.
Log-Likelihood Function

Maximize this instead:

$$\ell(\mu, \sigma^2 \mid x) = \log\mathcal{L}(\mu, \sigma^2 \mid x) = -\frac{n}{2}\log\sigma^2 - \frac{n}{2}\log 2\pi - \sum_{i=1}^{n}\frac{(x_i-\mu)^2}{2\sigma^2}$$

$$= -\frac{n}{2}\log\sigma^2 - \frac{n}{2}\log 2\pi - \frac{1}{2\sigma^2}\sum_{i=1}^{n} x_i^2 + \frac{\mu}{\sigma^2}\sum_{i=1}^{n} x_i - \frac{n\mu^2}{2\sigma^2}$$

Maximize by setting $\frac{\partial}{\partial\mu}\,\ell(\mu, \sigma^2 \mid x) = 0$ and $\frac{\partial}{\partial\sigma^2}\,\ell(\mu, \sigma^2 \mid x) = 0$.
Maximizing the Log-Likelihood Function

$$\frac{\partial}{\partial\mu}\,\ell(\mu, \sigma^2 \mid x) = \frac{1}{\sigma^2}\sum_{i=1}^{n} x_i - \frac{n\mu}{\sigma^2} = 0 \quad\Longrightarrow\quad \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

$$\frac{\partial}{\partial\sigma^2}\,\ell(\mu, \sigma^2 \mid x) = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n} x_i^2 - \frac{\mu}{\sigma^4}\sum_{i=1}^{n} x_i + \frac{n\mu^2}{2\sigma^4} = 0$$

$$\Longrightarrow\quad n\sigma^2 = \sum_{i=1}^{n} x_i^2 - 2\mu\sum_{i=1}^{n} x_i + n\mu^2$$

Together, the maximum-likelihood estimates are

$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \hat{\mu}^2$$
Missing Data

Sampling: $x = (x_1, \dots, x_m, x_{m+1}, \dots, x_n)^T$, where $x_{m+1}, \dots, x_n$ are missing. If the missing values were known, the estimates would be

$$\hat{\mu} = \frac{1}{n}\left(\sum_{i=1}^{m} x_i + \sum_{j=m+1}^{n} x_j\right), \qquad \hat{\sigma}^2 = \frac{1}{n}\left(\sum_{i=1}^{m} x_i^2 + \sum_{j=m+1}^{n} x_j^2\right) - \hat{\mu}^2$$
E-Step

Let $\hat{\mu}^{(t)}$ and $\hat{\sigma}^{2(t)}$ be the estimated parameters at the start of the $t$-th iteration. The missing values enter the estimates above only through $\sum_j x_j$ and $\sum_j x_j^2$, so we replace these by their conditional expectations:

$$E_{\hat{\mu}^{(t)}, \hat{\sigma}^{2(t)}}\left[\sum_{j=m+1}^{n} x_j \;\Big|\; x\right] = (n-m)\,\hat{\mu}^{(t)}$$

$$E_{\hat{\mu}^{(t)}, \hat{\sigma}^{2(t)}}\left[\sum_{j=m+1}^{n} x_j^2 \;\Big|\; x\right] = (n-m)\left(\hat{\mu}^{(t)\,2} + \hat{\sigma}^{2(t)}\right)$$

Define the completed sufficient statistics

$$s_1^{(t)} = \sum_{i=1}^{m} x_i + (n-m)\,\hat{\mu}^{(t)}, \qquad s_2^{(t)} = \sum_{i=1}^{m} x_i^2 + (n-m)\left(\hat{\mu}^{(t)\,2} + \hat{\sigma}^{2(t)}\right)$$
M-Step

$$\hat{\mu}^{(t+1)} = \frac{s_1^{(t)}}{n}, \qquad \hat{\sigma}^{2(t+1)} = \frac{s_2^{(t)}}{n} - \hat{\mu}^{(t+1)\,2}$$
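Putting the E-step and M-step together, a minimal Python sketch of the iteration (assuming numpy; the function name and its defaults are illustrative, not from the slides):

    import numpy as np

    def em_missing_normal(x_obs, n, mu, var, iters=100):
        """EM for X ~ N(mu, var) when only m of the n values were observed.
        Implements the s1/s2 updates above."""
        m = len(x_obs)
        s_obs = x_obs.sum()
        s2_obs = (x_obs ** 2).sum()
        for _ in range(iters):
            # E-step: expected sufficient statistics of the n - m missing values
            s1 = s_obs + (n - m) * mu
            s2 = s2_obs + (n - m) * (mu ** 2 + var)
            # M-step: re-estimate the parameters from the completed statistics
            mu = s1 / n
            var = s2 / n - mu ** 2
        return mu, var

Note that at the fixed point these updates reduce to the estimates based on the observed values alone, so different initial conditions should converge to the same answer.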
Exercise

$X \sim N(\mu, \sigma^2)$, $n = 40$, with 10 data missing. Estimate $\mu$ and $\sigma^2$ using different initial conditions.

Observed data (30 values):

375.081556  362.275902  332.612068  351.383048  304.823174
386.438672  430.079689  395.317406  369.029845  365.343938
243.548664  382.789939  374.419161  337.289831  418.928822
364.086502  343.854855  371.279406  439.241736  338.281616
454.981077  479.685107  336.634962  407.030453  297.821512
311.267105  528.267783  419.841982  392.684770  301.910093
Example: Mixed Attributes
Multinomial Population

Sampling: category probabilities $(p_1, p_2, \dots, p_n)$ with $\sum_i p_i = 1$. Draw $N$ samples and let $x = (x_1, x_2, \dots, x_n)^T$, where $x_i$ is the number of samples in category $C_i$, so $x_1 + x_2 + \cdots + x_n = N$. Then

$$p(x \mid p_1, \dots, p_n) = \frac{N!}{x_1!\cdots x_n!}\; p_1^{x_1}\cdots p_n^{x_n}$$
Maximum Likelihood

Sampling with category probabilities $\left(\tfrac{1}{2}-\tfrac{\theta}{2},\; \tfrac{\theta}{4},\; \tfrac{\theta}{4},\; \tfrac{1}{2}\right)$. Draw $N$ samples and let $x = (x_1, x_2, x_3, x_4)^T$, where $x_i$ is the number of samples in $C_i$ and $x_1 + x_2 + x_3 + x_4 = N$. We want to maximize

$$\mathcal{L}(\theta \mid x) = p(x \mid \theta) = \frac{N!}{x_1!\cdots x_4!}\left(\tfrac{1}{2}-\tfrac{\theta}{2}\right)^{x_1}\left(\tfrac{\theta}{4}\right)^{x_2}\left(\tfrac{\theta}{4}\right)^{x_3}\left(\tfrac{1}{2}\right)^{x_4}$$

Log-Likelihood

$$\ell(\theta \mid x) = \log\mathcal{L}(\theta \mid x) = x_1\log\left(\tfrac{1}{2}-\tfrac{\theta}{2}\right) + x_2\log\tfrac{\theta}{4} + x_3\log\tfrac{\theta}{4} + \text{const}$$

$$\frac{\partial}{\partial\theta}\,\ell(\theta \mid x) = -\frac{x_1}{1-\theta} + \frac{x_2}{\theta} + \frac{x_3}{\theta} = 0 \quad\Longrightarrow\quad -x_1\theta + x_2(1-\theta) + x_3(1-\theta) = 0 \quad\Longrightarrow\quad \hat{\theta} = \frac{x_2+x_3}{x_1+x_2+x_3}$$
Mixed Attributes

Same sampling, but now $x = (x_1, x_2, x_3 + x_4)^T$: the individual count $x_3$ is not available, only the sum $x_3 + x_4$, so the MLE $\hat{\theta} = \frac{x_2+x_3}{x_1+x_2+x_3}$ cannot be evaluated directly.
E-Step

Given $\theta^{(t)}$, what can we say about $x_3$? Conditioned on the total $x_3 + x_4$, each of those samples falls in $C_3$ rather than $C_4$ with probability $\frac{\theta/4}{1/2 + \theta/4}$, so

$$E_{\theta^{(t)}}\left[x_3 \mid x\right] = (x_3 + x_4)\,\frac{\theta^{(t)}/4}{\tfrac{1}{2} + \theta^{(t)}/4} = \hat{x}_3^{(t)}$$
M-Step

$$\hat{\theta}^{(t+1)} = \frac{x_2 + \hat{x}_3^{(t)}}{x_1 + x_2 + \hat{x}_3^{(t)}}$$
Exercise

$x_{\mathrm{obs}} = (x_1, x_2, x_3 + x_4)^T = (38, 34, 125)^T$. Estimate $\theta$ using different initial conditions.
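A minimal Python sketch of this E-step/M-step loop, applied to the exercise (the function name is illustrative, not from the slides):

    def em_mixed_attributes(x1, x2, x34, theta, iters=50):
        """EM for the multinomial example where only x3 + x4 is observed."""
        for _ in range(iters):
            # E-step: expected x3 given the current theta
            x3 = x34 * (theta / 4) / (1 / 2 + theta / 4)
            # M-step: closed-form MLE with x3 filled in
            theta = (x2 + x3) / (x1 + x2 + x3)
        return theta

    # Different initial conditions should agree at the same MLE:
    print(em_mixed_attributes(38, 34, 125, theta=0.2))
    print(em_mixed_attributes(38, 34, 125, theta=0.9))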
Example: Mixture
Binomial/Poisson Mixture

Let $M$ = the event that an obasong (a middle-aged woman) is married, and $X$ = her number of children. The observed data are

# Children:  0    1    2    3    4    5    6
# Obasongs:  n0   n1   n2   n3   n4   n5   n6

Married obasongs: $X \mid M \sim P(\lambda)$, i.e.

$$P(x \mid M) = \frac{\lambda^x e^{-\lambda}}{x!}, \qquad P(M) = 1 - \pi$$

Unmarried obasongs have no children:

$$P(X = 0 \mid M^c) = 1, \qquad P(M^c) = \pi$$

The count $n_0$ therefore mixes the two groups: $n_0 = n_A + n_B$, with unobserved data

$n_A$ : # unmarried obasongs (all childless)
$n_B$ : # married obasongs with no children
The complete data are $n = (n_A, n_B, n_1, \dots, n_6)^T$; the observed data are $n_{\mathrm{obs}} = (n_0, n_1, \dots, n_6)^T$, with $n_0 = n_A + n_B$.

Complete data:  n_A   n_B   n1   n2   n3   n4   n5   n6
Probability:    p_A   p_B   p1   p2   p3   p4   p5   p6

where

$$p_A = \pi, \qquad p_B = e^{-\lambda}(1-\pi), \qquad p_x = \frac{\lambda^x e^{-\lambda}}{x!}(1-\pi), \quad x = 1, 2, \dots$$
Complete Data Likelihood

$$\mathcal{L}(\pi, \lambda \mid n) = p(n \mid \pi, \lambda) = \frac{(n_A + n_B + n_1 + \cdots + n_6)!}{n_A!\,n_B!\,n_1!\cdots n_6!}\; p_A^{n_A}\, p_B^{n_B}\, p_1^{n_1}\cdots p_6^{n_6}$$

$$= \frac{(n_A + n_B + n_1 + \cdots + n_6)!}{n_A!\,n_B!\,n_1!\cdots n_6!}\; \pi^{n_A}\left(e^{-\lambda}(1-\pi)\right)^{n_B} \prod_{x=1}^{6}\left(\frac{\lambda^x e^{-\lambda}}{x!}(1-\pi)\right)^{n_x}$$
Log-Likelihood

$$\ell(\pi, \lambda \mid n) = n_A\log\pi - n_B\lambda + n_B\log(1-\pi) + \sum_{x=1}^{6} n_x\left(x\log\lambda - \lambda + \log(1-\pi)\right) + k$$

where $k$ collects terms independent of $\pi$ and $\lambda$, and $N = n_0 + n_1 + \cdots + n_6$.
Maximization

$$\frac{\partial}{\partial\pi}\,\ell(\pi, \lambda \mid n) = \frac{n_A}{\pi} - \frac{n_B + n_1 + \cdots + n_6}{1-\pi} = \frac{n_A}{\pi} - \frac{N - n_A}{1-\pi} = 0$$

$$\Longrightarrow\quad n_A(1-\pi) - (N - n_A)\pi = 0 \quad\Longrightarrow\quad \hat{\pi} = \frac{n_A}{N}$$

$$\frac{\partial}{\partial\lambda}\,\ell(\pi, \lambda \mid n) = -(n_B + n_1 + \cdots + n_6) + \frac{1}{\lambda}\sum_{x=1}^{6} x\,n_x = -(N - n_A) + \frac{1}{\lambda}\sum_{x=1}^{6} x\,n_x = 0$$

$$\Longrightarrow\quad \hat{\lambda} = \frac{\sum_{x=1}^{6} x\,n_x}{N - n_A}$$
E-Step

Both $\hat{\pi} = n_A/N$ and $\hat{\lambda} = \sum_x x\,n_x/(N - n_A)$ depend on the unobserved $n_A$. Given $\pi^{(t)}, \lambda^{(t)}$, a childless obasong is unmarried with posterior probability $\pi^{(t)}/\left(\pi^{(t)} + (1-\pi^{(t)})e^{-\lambda^{(t)}}\right)$, so

$$n_A^{(t)} = E_{\pi^{(t)},\lambda^{(t)}}\left[n_A \mid n_{\mathrm{obs}}\right] = n_0\,\frac{\pi^{(t)}}{\pi^{(t)} + (1-\pi^{(t)})\,e^{-\lambda^{(t)}}}$$
M-Step

$$\pi^{(t+1)} = \frac{n_A^{(t)}}{N}, \qquad \lambda^{(t+1)} = \frac{\sum_{x=1}^{6} x\,n_x}{N - n_A^{(t)}}$$
Example

# Children:  0      1    2    3    4   5  6
# Obasongs:  3,062  587  284  103  33  4  2

t    π          λ          n_A        n_B
0    0.750000   0.400000   2502.779   559.221
1    0.614179   1.035478   2503.591   558.409
2    0.614378   1.036013   2504.219   557.781
3    0.614532   1.036427   2504.705   557.295
4    0.614652   1.036748   2505.081   556.919
5    0.614744   1.036996   2505.371   556.629
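A short Python sketch (assuming numpy) that reproduces the table above from the observed counts:

    import numpy as np

    counts = np.array([3062, 587, 284, 103, 33, 4, 2])  # children 0..6
    N = counts.sum()                                    # 4075 obasongs
    sum_xn = (np.arange(7) * counts).sum()              # sum_x x * n_x
    n0 = counts[0]

    pi, lam = 0.75, 0.40                                # initial guess, as in the table
    for t in range(6):
        # E-step: expected number of unmarried obasongs among the n0 childless ones
        nA = n0 * pi / (pi + (1 - pi) * np.exp(-lam))
        print(t, round(pi, 6), round(lam, 6), round(nA, 3), round(n0 - nA, 3))
        # M-step
        pi = nA / N
        lam = sum_xn / (N - nA)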
Main Body
Maximum Likelihood

Given data $X = \{x_1, x_2, \dots, x_N\}$:

$$\mathcal{L}(\Theta \mid X) = p(X \mid \Theta) = \prod_{i=1}^{N} p(x_i \mid \Theta), \qquad \Theta^* = \arg\max_{\Theta}\, \mathcal{L}(\Theta \mid X)$$
Latent Variables

Incomplete data: $X = \{x_1, x_2, \dots, x_N\}$
Latent variables: $Y = \{y_1, y_2, \dots, y_N\}$
Complete data: $Z = (X, Y)$
Complete Data Likelihood

$$\mathcal{L}(\Theta \mid Z) = p(Z \mid \Theta) = p(X, Y \mid \Theta) = p(Y \mid X, \Theta)\, p(X \mid \Theta)$$

This likelihood is a function of the latent variable $Y$ and the parameter $\Theta$. If we are given $\Theta$, the factor $p(Y \mid X, \Theta)$ is computable, and the result is a function of the random variable $Y$.
Expectation Step

Let $\Theta^{(i-1)}$ be the parameter vector obtained at the $(i-1)$-th step. Define

$$Q(\Theta, \Theta^{(i-1)}) = E\left[\log\mathcal{L}(\Theta \mid Z) \;\middle|\; X, \Theta^{(i-1)}\right] = \begin{cases} \displaystyle\int_{y} \log p(X, y \mid \Theta)\; p(y \mid X, \Theta^{(i-1)})\,dy & \text{continuous} \\[2ex] \displaystyle\sum_{y \in Y} \log p(X, y \mid \Theta)\; p(y \mid X, \Theta^{(i-1)}) & \text{discrete} \end{cases}$$

where $\mathcal{L}(\Theta \mid Z) = p(X, Y \mid \Theta)$.
Maximization Step

With $Q$ defined as above, choose the new parameter vector as

$$\Theta^{(i)} = \arg\max_{\Theta}\, Q(\Theta, \Theta^{(i-1)})$$
Mixture Model
Mixture Models

If there is reason to believe that a data set comprises several distinct populations, a mixture model can be used. It has the form

$$p(x \mid \Theta) = \sum_{j=1}^{M} \alpha_j\, p_j(x \mid \theta_j), \qquad \text{with}\ \sum_{j=1}^{M} \alpha_j = 1$$

where $\Theta = (\alpha_1, \dots, \alpha_M, \theta_1, \dots, \theta_M)$.
Let $y_i \in \{1, \dots, M\}$ represent the source that generates $x_i$. Then

$$p(x \mid y = j, \Theta) = p_j(x \mid \theta_j), \qquad p(y = j \mid \Theta) = \alpha_j$$

With $z_i = (x_i, y_i)$:

$$p(z_i \mid \Theta) = p(x_i, y_i \mid \Theta) = p(y_i \mid x_i, \Theta)\, p(x_i \mid \Theta)$$

Given $x_i$ and $\Theta$, the conditional density of $y_i$ can be computed by Bayes' rule:

$$p(y_i \mid x_i, \Theta) = \frac{p(x_i \mid y_i, \Theta)\, p(y_i \mid \Theta)}{p(x_i \mid \Theta)} = \frac{\alpha_{y_i}\, p_{y_i}(x_i \mid \theta_{y_i})}{\sum_{j=1}^{M} \alpha_j\, p_j(x_i \mid \theta_j)}$$
Complete-Data Likelihood Function

With $X = \{x_1, \dots, x_N\}$, $y = \{y_1, \dots, y_N\}$, $z_i = (x_i, y_i)$, and $Z = \{z_1, \dots, z_N\}$:

$$\mathcal{L}(\Theta \mid Z) = p(Z \mid \Theta) = p(X, y \mid \Theta) = p(X \mid y, \Theta)\,p(y \mid \Theta) = \prod_{i=1}^{N} p(x_i \mid y_i, \Theta)\,p(y_i \mid \Theta) = \prod_{i=1}^{N} \alpha_{y_i}\, p_{y_i}(x_i \mid \theta_{y_i})$$
Expectation

$$\log\mathcal{L}(\Theta \mid Z) = \sum_{i=1}^{N} \log\left[\alpha_{y_i}\, p_{y_i}(x_i \mid \theta_{y_i})\right]$$

With $\Theta^g$ denoting the current guess:

$$Q(\Theta, \Theta^g) = E\left[\log\mathcal{L}(\Theta \mid Z) \mid X, \Theta^g\right] = \sum_{y \in Y} \log\mathcal{L}(\Theta \mid Z)\; p(y \mid X, \Theta^g)$$

$$= \sum_{y \in Y} \sum_{i=1}^{N} \log\left[\alpha_{y_i}\, p_{y_i}(x_i \mid \theta_{y_i})\right] \prod_{j=1}^{N} p(y_j \mid x_j, \Theta^g)$$

$$= \sum_{y \in Y} \sum_{i=1}^{N} \sum_{l=1}^{M} \delta_{y_i,l}\, \log\left[\alpha_l\, p_l(x_i \mid \theta_l)\right] \prod_{j=1}^{N} p(y_j \mid x_j, \Theta^g)$$

Interchanging the sums (note $\delta_{y_i,l}$ is zero unless $y_i = l$):

$$Q(\Theta, \Theta^g) = \sum_{l=1}^{M} \sum_{i=1}^{N} \log\left[\alpha_l\, p_l(x_i \mid \theta_l)\right] \sum_{y \in Y} \delta_{y_i,l} \prod_{j=1}^{N} p(y_j \mid x_j, \Theta^g)$$

The inner sum factorizes: summing over all components of $y$ except $y_i$,

$$\sum_{y \in Y} \delta_{y_i,l} \prod_{j=1}^{N} p(y_j \mid x_j, \Theta^g) = \left[\sum_{y_1=1}^{M} \cdots \sum_{y_{i-1}=1}^{M}\, \sum_{y_{i+1}=1}^{M} \cdots \sum_{y_N=1}^{M}\; \prod_{j \ne i} p(y_j \mid x_j, \Theta^g)\right] p(l \mid x_i, \Theta^g) = p(l \mid x_i, \Theta^g)$$

since each bracketed sum equals 1. Therefore

$$Q(\Theta, \Theta^g) = \sum_{l=1}^{M} \sum_{i=1}^{N} \log\left[\alpha_l\, p_l(x_i \mid \theta_l)\right] p(l \mid x_i, \Theta^g)$$

$$= \sum_{l=1}^{M} \sum_{i=1}^{N} \log(\alpha_l)\; p(l \mid x_i, \Theta^g) \;+\; \sum_{l=1}^{M} \sum_{i=1}^{N} \log\left[p_l(x_i \mid \theta_l)\right] p(l \mid x_i, \Theta^g)$$
Θ  (1,, M ,1, ,,M )
Maximization
Given the initial guess
M
N
M
g
,
N
Q(Θ, Θ )   log(  l ) p(l | x i , Θ )   log[ pl (x i |  l )] p(l | x i , Θ g )
g
g
l 1 i 1
l 1 i 1
We want to find , to maximize the above
expectation.
In fact, iteratively.
The GMM (Gaussian Mixture Model)

Gaussian model of a $d$-dimensional source, say source $j$:

$$p_j(x \mid \mu_j, \Sigma_j) = \frac{1}{(2\pi)^{d/2}\,|\Sigma_j|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu_j)^T \Sigma_j^{-1} (x-\mu_j)\right), \qquad \theta_j = (\mu_j, \Sigma_j)$$

GMM with $M$ sources:

$$p(x \mid \mu_1, \Sigma_1, \dots, \mu_M, \Sigma_M) = \sum_{j=1}^{M} \alpha_j\, p_j(x \mid \mu_j, \Sigma_j), \qquad \alpha_j \ge 0, \quad \sum_{j=1}^{M} \alpha_j = 1$$
EM-Algorithm on GMM
Goal

Mixture model:

$$p(x \mid \Theta) = \sum_{l=1}^{M} \alpha_l\, p_l(x \mid \theta_l), \qquad \Theta = (\alpha_1, \dots, \alpha_M, \theta_1, \dots, \theta_M), \qquad \text{subject to}\ \sum_{l=1}^{M} \alpha_l = 1$$

To maximize:

$$Q(\Theta, \Theta^g) = \sum_{l=1}^{M} \sum_{i=1}^{N} \log(\alpha_l)\; p(l \mid x_i, \Theta^g) + \sum_{l=1}^{M} \sum_{i=1}^{N} \log\left[p_l(x_i \mid \theta_l)\right] p(l \mid x_i, \Theta^g)$$

The first term involves the $\alpha_l$ only and the second term the $\theta_l$ only, so they can be maximized separately.
Finding $\alpha_l$

Because of the constraint $\sum_{l=1}^{M} \alpha_l = 1$, we introduce the Lagrange multiplier $\lambda$ and solve

$$\frac{\partial}{\partial\alpha_l}\left[\sum_{l=1}^{M}\sum_{i=1}^{N} \log(\alpha_l)\; p(l \mid x_i, \Theta^g) + \lambda\left(\sum_{l=1}^{M} \alpha_l - 1\right)\right] = 0, \qquad l = 1, \dots, M$$

$$\Longrightarrow\quad \sum_{i=1}^{N} \frac{1}{\alpha_l}\, p(l \mid x_i, \Theta^g) + \lambda = 0 \quad\Longrightarrow\quad \sum_{i=1}^{N} p(l \mid x_i, \Theta^g) + \lambda\,\alpha_l = 0, \qquad l = 1, \dots, M$$

Summing over $l$:

$$\sum_{l=1}^{M}\sum_{i=1}^{N} p(l \mid x_i, \Theta^g) + \lambda\sum_{l=1}^{M}\alpha_l = 0 \quad\Longrightarrow\quad N + \lambda = 0 \quad\Longrightarrow\quad \lambda = -N$$

Therefore

$$\alpha_l = \frac{1}{N}\sum_{i=1}^{N} p(l \mid x_i, \Theta^g), \qquad \text{where}\quad p(l \mid x_i, \Theta^g) = \frac{\alpha_l^g\, p_l(x_i \mid \theta_l^g)}{\sum_{j=1}^{M} \alpha_j^g\, p_j(x_i \mid \theta_j^g)}$$
Finding $\theta_l$

Only the second term of $Q(\Theta, \Theta^g)$ depends on $\theta_l$. Consider the GMM, with $\theta_l = (\mu_l, \Sigma_l)$:

$$p_l(x \mid \mu_l, \Sigma_l) = \frac{1}{(2\pi)^{d/2}\,|\Sigma_l|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu_l)^T \Sigma_l^{-1} (x-\mu_l)\right)$$

$$\log p_l(x \mid \mu_l, \Sigma_l) = -\frac{d}{2}\log 2\pi - \frac{1}{2}\log|\Sigma_l| - \frac{1}{2}(x-\mu_l)^T \Sigma_l^{-1} (x-\mu_l)$$

The first term is unrelated to $\mu_l, \Sigma_l$. Therefore, we want to maximize

$$\sum_{l=1}^{M}\sum_{i=1}^{N}\left[-\frac{1}{2}\log|\Sigma_l| - \frac{1}{2}(x_i-\mu_l)^T \Sigma_l^{-1} (x_i-\mu_l)\right] p(l \mid x_i, \Theta^g)$$

How? Some knowledge of matrix algebra (matrix derivatives) is needed.
Setting the derivatives with respect to $\mu_l$ and $\Sigma_l$ to zero gives

$$\mu_l = \frac{\sum_{i=1}^{N} x_i\; p(l \mid x_i, \Theta^g)}{\sum_{i=1}^{N} p(l \mid x_i, \Theta^g)}, \qquad \Sigma_l = \frac{\sum_{i=1}^{N} p(l \mid x_i, \Theta^g)\,(x_i-\mu_l)(x_i-\mu_l)^T}{\sum_{i=1}^{N} p(l \mid x_i, \Theta^g)}$$
Summary: EM Algorithm for GMM

Given an initial guess $\Theta^g$, find $\Theta^{\mathrm{new}}$ as follows:

$$\alpha_l^{\mathrm{new}} = \frac{1}{N}\sum_{i=1}^{N} p(l \mid x_i, \Theta^g)$$

$$\mu_l^{\mathrm{new}} = \frac{\sum_{i=1}^{N} x_i\; p(l \mid x_i, \Theta^g)}{\sum_{i=1}^{N} p(l \mid x_i, \Theta^g)}$$

$$\Sigma_l^{\mathrm{new}} = \frac{\sum_{i=1}^{N} p(l \mid x_i, \Theta^g)\,(x_i-\mu_l^{\mathrm{new}})(x_i-\mu_l^{\mathrm{new}})^T}{\sum_{i=1}^{N} p(l \mid x_i, \Theta^g)}$$

where

$$p(l \mid x_i, \Theta^g) = \frac{\alpha_l^g\, p_l(x_i \mid \theta_l^g)}{\sum_{j=1}^{M} \alpha_j^g\, p_j(x_i \mid \theta_j^g)}$$

If not converged, set $\Theta^g \leftarrow \Theta^{\mathrm{new}}$ and repeat.
Demonstration

EM algorithm for mixture models.
Exercises
- Write a program to generate a multidimensional Gaussian distribution.
- Draw the distribution for 2-dimensional data.
- Write a program to generate GMM data (see the sampling sketch below).
- Write an EM algorithm to analyze GMM data.
- Study more EM algorithms for mixtures.
- Find applications for the EM algorithm.
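For the GMM-generation exercise, a minimal sampling sketch (assuming numpy; sample_gmm is an illustrative name):

    import numpy as np

    def sample_gmm(n, alphas, mus, Sigmas, seed=None):
        """Draw n points from a GMM: choose a source y ~ alphas,
        then draw x ~ N(mu_y, Sigma_y)."""
        rng = np.random.default_rng(seed)
        ys = rng.choice(len(alphas), size=n, p=alphas)
        X = np.array([rng.multivariate_normal(mus[y], Sigmas[y]) for y in ys])
        return X, ys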