EM Algorithm
Lecturer: Tai-Wen Yu
Graduate Institute of Computer Science and Engineering, Tatung University
Intelligent Multimedia Lab
Contents
Introduction
Example: Missing Data
Example: Mixed Attributes
Example: Mixture
Main Body
Mixture Model
EM-Algorithm on GMM
EM Algorithm
Introduction
Introduction
EM is typically used to compute maximum likelihood estimates given incomplete samples.
The EM algorithm estimates the parameters of a model iteratively.
– Starting from some initial guess, each iteration consists of
  an E step (Expectation step)
  an M step (Maximization step)
Applications
Filling in missing data in samples
Discovering the value of latent variables
Estimating the parameters of HMMs
Estimating parameters of finite mixtures
Unsupervised learning of clusters
…
EM Algorithm
Example: Missing Data
Univariate Normal Sample
Sampling: $X \sim N(\mu, \sigma^2)$, $\mathbf{x} = (x_1, x_2, \ldots, x_n)^T$
$$f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)$$
$\hat{\mu} = ?$, $\hat{\sigma}^2 = ?$
Maximum Likelihood
Sampling: $\mathbf{x} = (x_1, x_2, \ldots, x_n)^T$
$$L(\mu, \sigma^2 \mid \mathbf{x}) = f(\mathbf{x} \mid \mu, \sigma^2) = \prod_{i=1}^{n} f(x_i \mid \mu, \sigma^2) = \left( \frac{1}{2\pi\sigma^2} \right)^{n/2} \exp\left( -\sum_{i=1}^{n} \frac{(x_i - \mu)^2}{2\sigma^2} \right)$$
Given $\mathbf{x}$, this is a function of $\mu$ and $\sigma^2$. We want to maximize it.
Log-Likelihood Function
Maximize this instead:
$$\ell(\mu, \sigma^2 \mid \mathbf{x}) = \log L(\mu, \sigma^2 \mid \mathbf{x}) = -\frac{n}{2}\log\sigma^2 - \frac{n}{2}\log 2\pi - \frac{1}{2\sigma^2}\sum_{i=1}^{n} x_i^2 + \frac{\mu}{\sigma^2}\sum_{i=1}^{n} x_i - \frac{n\mu^2}{2\sigma^2}$$
Maximize by setting $\dfrac{\partial}{\partial\mu}\,\ell(\mu, \sigma^2 \mid \mathbf{x}) = 0$ and $\dfrac{\partial}{\partial\sigma^2}\,\ell(\mu, \sigma^2 \mid \mathbf{x}) = 0$.
Max. the Log-Likelihood Function
$$\frac{\partial}{\partial\mu}\,\ell(\mu, \sigma^2 \mid \mathbf{x}) = \frac{1}{\sigma^2}\left( \sum_{i=1}^{n} x_i - n\mu \right) = 0
\quad\Longrightarrow\quad \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Max. the Log-Likelihood Function
$$\frac{\partial}{\partial\sigma^2}\,\ell(\mu, \sigma^2 \mid \mathbf{x}) = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n} x_i^2 - \frac{\mu}{\sigma^4}\sum_{i=1}^{n} x_i + \frac{n\mu^2}{2\sigma^4} = 0$$
$$\Longrightarrow\quad \sigma^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \frac{2\mu}{n}\sum_{i=1}^{n} x_i + \mu^2$$
Substituting $\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i$ gives
$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \hat{\mu}^2$$
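These closed-form estimates are easy to sanity-check numerically; the sketch below (assuming NumPy and a made-up sample array) just verifies that they coincide with the sample mean and the biased sample variance.

```python
import numpy as np

# A made-up 1-D sample; any array of observations works here.
x = np.array([4.2, 5.1, 3.8, 4.9, 5.4, 4.6])
n = len(x)

# Closed-form ML estimates derived above.
mu_hat = x.sum() / n                            # (1/n) * sum(x_i)
sigma2_hat = (x ** 2).sum() / n - mu_hat ** 2   # (1/n) * sum(x_i^2) - mu_hat^2

assert np.isclose(mu_hat, x.mean())
assert np.isclose(sigma2_hat, x.var())  # np.var uses the 1/n normalization by default
print(mu_hat, sigma2_hat)
```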
Missing Data
Sampling: $\mathbf{x} = (x_1, \ldots, x_m, x_{m+1}, \ldots, x_n)^T$, where $x_{m+1}, \ldots, x_n$ are missing.
$$\hat{\mu} = \frac{1}{n}\left( \sum_{i=1}^{m} x_i + \sum_{j=m+1}^{n} x_j \right)
\qquad
\hat{\sigma}^2 = \frac{1}{n}\left( \sum_{i=1}^{m} x_i^2 + \sum_{j=m+1}^{n} x_j^2 \right) - \hat{\mu}^2$$
E-Step
Let $\hat{\mu}^{(t)}, \hat{\sigma}^{2(t)}$ be the estimated parameters at the beginning of the $t$-th iteration. Then
$$E_{\hat{\mu}^{(t)}, \hat{\sigma}^{2(t)}}\left[ \sum_{j=m+1}^{n} x_j \,\Big|\, \mathbf{x} \right] = (n-m)\,\hat{\mu}^{(t)}
\qquad
E_{\hat{\mu}^{(t)}, \hat{\sigma}^{2(t)}}\left[ \sum_{j=m+1}^{n} x_j^2 \,\Big|\, \mathbf{x} \right] = (n-m)\left[ \big(\hat{\mu}^{(t)}\big)^2 + \hat{\sigma}^{2(t)} \right]$$
Define the expected complete-data statistics
$$s_1^{(t)} = \sum_{i=1}^{m} x_i + (n-m)\,\hat{\mu}^{(t)}
\qquad
s_2^{(t)} = \sum_{i=1}^{m} x_i^2 + (n-m)\left[ \big(\hat{\mu}^{(t)}\big)^2 + \hat{\sigma}^{2(t)} \right]$$
M-Step
$$\hat{\mu}^{(t+1)} = \frac{s_1^{(t)}}{n}
\qquad
\hat{\sigma}^{2(t+1)} = \frac{s_2^{(t)}}{n} - \big(\hat{\mu}^{(t+1)}\big)^2$$
where $s_1^{(t)}$ and $s_2^{(t)}$ are the expected statistics computed in the E-step.
Exercise
$X \sim N(\mu, \sigma^2)$, $n = 40$ (10 data missing).
Estimate $\mu, \sigma^2$ using different initial conditions.
Observed data:
375.081556 362.275902 332.612068 351.383048 304.823174 386.438672
430.079689 395.317406 369.029845 365.343938 243.548664 382.789939
374.419161 337.289831 418.928822 364.086502 343.854855 371.279406
439.241736 338.281616 454.981077 479.685107 336.634962 407.030453
297.821512 311.267105 528.267783 419.841982 392.684770 301.910093
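A minimal sketch of this E/M iteration on the 30 observed values above (the remaining 10 samples are treated as missing); the initial guess is an arbitrary assumption.

```python
import numpy as np

# The 30 observed values from the exercise; 10 further samples are missing.
x_obs = np.array([
    375.081556, 362.275902, 332.612068, 351.383048, 304.823174, 386.438672,
    430.079689, 395.317406, 369.029845, 365.343938, 243.548664, 382.789939,
    374.419161, 337.289831, 418.928822, 364.086502, 343.854855, 371.279406,
    439.241736, 338.281616, 454.981077, 479.685107, 336.634962, 407.030453,
    297.821512, 311.267105, 528.267783, 419.841982, 392.684770, 301.910093,
])
n, m = 40, len(x_obs)

mu, sigma2 = 200.0, 100.0  # arbitrary initial guess
for t in range(200):
    # E-step: expected complete-data statistics s1, s2.
    s1 = x_obs.sum() + (n - m) * mu
    s2 = (x_obs ** 2).sum() + (n - m) * (mu ** 2 + sigma2)
    # M-step: re-estimate the parameters.
    mu_new, sigma2_new = s1 / n, s2 / n - (s1 / n) ** 2
    if abs(mu_new - mu) < 1e-9 and abs(sigma2_new - sigma2) < 1e-9:
        mu, sigma2 = mu_new, sigma2_new
        break
    mu, sigma2 = mu_new, sigma2_new

print(mu, sigma2)
```

Trying several initial guesses, as the exercise asks, should lead to the same fixed point.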
EM Algorithm
Example: Mixed Attributes
Multinomial Population
Sampling: $(p_1, p_2, \ldots, p_n)$ with $\sum_i p_i = 1$.
$N$ samples: $\mathbf{x} = (x_1, x_2, \ldots, x_n)^T$, where $x_i$ = # samples in $C_i$ and $x_1 + x_2 + \cdots + x_n = N$.
$$p(\mathbf{x} \mid p_1, \ldots, p_n) = \frac{N!}{x_1! \cdots x_n!}\, p_1^{x_1} \cdots p_n^{x_n}$$
Maximum Likelihood
Sampling: $(p_1, p_2, p_3, p_4) = \left( \tfrac{1}{2} - \tfrac{\theta}{2},\; \tfrac{\theta}{4},\; \tfrac{\theta}{4},\; \tfrac{1}{2} \right)$
$N$ samples: $\mathbf{x} = (x_1, x_2, x_3, x_4)^T$, where $x_i$ = # samples in $C_i$ and $x_1 + x_2 + x_3 + x_4 = N$.
We want to maximize
$$L(\theta \mid \mathbf{x}) = p(\mathbf{x} \mid \theta) = \frac{N!}{x_1!\, x_2!\, x_3!\, x_4!} \left( \tfrac{1}{2} - \tfrac{\theta}{2} \right)^{x_1} \left( \tfrac{\theta}{4} \right)^{x_2} \left( \tfrac{\theta}{4} \right)^{x_3} \left( \tfrac{1}{2} \right)^{x_4}$$
Log-Likelihood
$$\ell(\theta \mid \mathbf{x}) = \log L(\theta \mid \mathbf{x}) = x_1 \log\left( \tfrac{1}{2} - \tfrac{\theta}{2} \right) + x_2 \log\tfrac{\theta}{4} + x_3 \log\tfrac{\theta}{4} + \text{const}$$
$$\frac{\partial\,\ell(\theta \mid \mathbf{x})}{\partial\theta} = -\frac{x_1}{1-\theta} + \frac{x_2 + x_3}{\theta} = 0
\quad\Longrightarrow\quad -x_1\theta + x_2(1-\theta) + x_3(1-\theta) = 0
\quad\Longrightarrow\quad \hat{\theta} = \frac{x_2 + x_3}{x_1 + x_2 + x_3}$$
Mixed Attributes
Sampling: $(p_1, p_2, p_3, p_4) = \left( \tfrac{1}{2} - \tfrac{\theta}{2},\; \tfrac{\theta}{4},\; \tfrac{\theta}{4},\; \tfrac{1}{2} \right)$
$N$ samples: $\mathbf{x} = (x_1, x_2, x_3 + x_4)^T$; $x_3$ is not available (only the sum $x_3 + x_4$ is observed).
The complete-data MLE $\hat{\theta} = \dfrac{x_2 + x_3}{x_1 + x_2 + x_3}$ therefore cannot be evaluated directly.
E-Step
Given $\theta^{(t)}$, what can you say about $x_3$?
$$E_{\theta^{(t)}}\left[ x_3 \mid \mathbf{x} \right] = (x_3 + x_4)\,\frac{\theta^{(t)}/4}{\tfrac{1}{2} + \theta^{(t)}/4} \equiv \hat{x}_3^{(t)}$$
M-Step
$$\hat{\theta}^{(t+1)} = \frac{x_2 + \hat{x}_3^{(t)}}{x_1 + x_2 + \hat{x}_3^{(t)}}$$
Exercise
$\mathbf{x}_{\text{obs}} = (x_1, x_2, x_3 + x_4)^T = (38, 34, 125)^T$
Estimate $\theta$ using different initial conditions.
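A minimal sketch of this two-line EM iteration on the observed counts (38, 34, 125); the starting value of θ is an arbitrary assumption.

```python
# Observed counts: x1, x2, and the combined count x3 + x4.
x1, x2, x34 = 38, 34, 125

theta = 0.5  # arbitrary initial guess in (0, 1)
for t in range(100):
    # E-step: split x3 + x4 in proportion to the cell probabilities theta/4 and 1/2.
    x3_hat = x34 * (theta / 4) / (0.5 + theta / 4)
    # M-step: plug the expected x3 into the complete-data MLE.
    theta_new = (x2 + x3_hat) / (x1 + x2 + x3_hat)
    converged = abs(theta_new - theta) < 1e-12
    theta = theta_new
    if converged:
        break

print(theta)
```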
EM Algorithm
Example: Mixture
Binomial/Poisson Mixture
M: married obasong; X: # children

# Children: 0, 1, 2, 3, 4, 5, 6
# Obasongs: n0, n1, n2, n3, n4, n5, n6

Married obasongs: $X \mid M \sim P(\lambda)$, i.e. $P(x \mid M) = \dfrac{\lambda^x e^{-\lambda}}{x!}$, with $P(M) = 1 - \xi$.
Unmarried obasongs (no children): $P(X = 0 \mid M^c) = 1$, with $P(M^c) = \xi$.
Binomial/Poisson Mixture
The $n_0$ obasongs with no children are a mix of the two groups: $n_0 = n_A + n_B$.
Unobserved data:
$n_A$ : # unmarried obasongs (no children by assumption)
$n_B$ : # married obasongs who happen to have no children
Binomial/Poisson Mixture
Complete data: $(n_A, n_B, n_1, n_2, n_3, n_4, n_5, n_6)$
Probabilities: $(p_A, p_B, p_1, p_2, p_3, p_4, p_5, p_6)$
Binomial/Poisson Mixture
$$p_A = \xi, \qquad p_B = (1-\xi)\,e^{-\lambda}, \qquad p_x = (1-\xi)\,\frac{\lambda^x e^{-\lambda}}{x!}, \quad x = 1, 2, \ldots$$
Complete data: $\mathbf{n} = (n_A, n_B, n_1, \ldots, n_6)^T$
Observed data: $\mathbf{n}_{\text{obs}} = (n_0, n_1, \ldots, n_6)^T$, with $n_0 = n_A + n_B$.
Complete Data Likelihood
$$L(\xi, \lambda \mid \mathbf{n}) = p(\mathbf{n} \mid \xi, \lambda) = \frac{(n_A + n_B + n_1 + \cdots + n_6)!}{n_A!\, n_B!\, n_1! \cdots n_6!}\; p_A^{n_A}\, p_B^{n_B}\, p_1^{n_1} \cdots p_6^{n_6}$$
$$= \frac{(n_A + n_B + n_1 + \cdots + n_6)!}{n_A!\, n_B!\, n_1! \cdots n_6!}\; \xi^{n_A} \left[ (1-\xi)e^{-\lambda} \right]^{n_B} \prod_{x=1}^{6} \left[ (1-\xi)\,\frac{\lambda^x e^{-\lambda}}{x!} \right]^{n_x}$$
Log-Likelihood
$$\ell(\xi, \lambda \mid \mathbf{n}) = n_A \log\xi - \lambda\, n_B + n_B \log(1-\xi) + \sum_{x=1}^{6} n_x \left[ x\log\lambda - \lambda + \log(1-\xi) \right] + k$$
where $N = n_0 + n_1 + \cdots + n_6$.
Maximization
$$\frac{\partial\,\ell(\xi, \lambda \mid \mathbf{n})}{\partial\xi} = \frac{n_A}{\xi} - \frac{n_B + n_1 + \cdots + n_6}{1-\xi} = \frac{n_A}{\xi} - \frac{N - n_A}{1-\xi} = 0$$
$$\Longrightarrow\quad n_A(1-\xi) - \xi(N - n_A) = 0
\quad\Longrightarrow\quad \hat{\xi} = \frac{n_A}{N}$$
Maximization
$$\frac{\partial\,\ell(\xi, \lambda \mid \mathbf{n})}{\partial\lambda} = -(n_B + n_1 + \cdots + n_6) + \frac{1}{\lambda}\sum_{x=1}^{6} x\, n_x = -(N - n_A) + \frac{1}{\lambda}\sum_{x=1}^{6} x\, n_x = 0$$
$$\Longrightarrow\quad \hat{\lambda} = \frac{\sum_{x=1}^{6} x\, n_x}{N - n_A}$$
E-Step
Given $\xi^{(t)}, \lambda^{(t)}$:
$$n_A^{(t)} = E_{\xi^{(t)}, \lambda^{(t)}}\left[ n_A \mid \mathbf{n}_{\text{obs}} \right] = n_0\,\frac{\xi^{(t)}}{\xi^{(t)} + \left( 1 - \xi^{(t)} \right) e^{-\lambda^{(t)}}}$$
M-Step
$$\xi^{(t+1)} = \frac{n_A^{(t)}}{N}
\qquad
\lambda^{(t+1)} = \frac{\sum_{x=1}^{6} x\, n_x}{N - n_A^{(t)}}$$
with $n_A^{(t)} = E_{\xi^{(t)}, \lambda^{(t)}}\left[ n_A \mid \mathbf{n}_{\text{obs}} \right]$ computed in the E-step.
Example

# Children: 0, 1, 2, 3, 4, 5, 6
# Obasongs: 3,062, 587, 284, 103, 33, 4, 2

t   ξ(t)       λ(t)       nA(t)      nB(t)
0   0.750000   0.400000   2502.779   559.221
1   0.614179   1.035478   2503.591   558.409
2   0.614378   1.036013   2504.219   557.781
3   0.614532   1.036427   2504.705   557.295
4   0.614652   1.036748   2505.081   556.919
5   0.614744   1.036996   2505.371   556.629
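A minimal sketch of the E/M iteration above; starting from ξ = 0.75, λ = 0.4 (the t = 0 row) it should reproduce the table.

```python
import math

# Observed counts of obasongs with 0..6 children.
counts = [3062, 587, 284, 103, 33, 4, 2]
N = sum(counts)
n0 = counts[0]
sum_x_nx = sum(x * nx for x, nx in enumerate(counts))  # sum over x >= 1 of x * n_x

xi, lam = 0.75, 0.4  # initial guess (t = 0 row of the table)
for t in range(6):
    # E-step: expected number of "always zero" obasongs among the n0 childless ones.
    nA = n0 * xi / (xi + (1 - xi) * math.exp(-lam))
    nB = n0 - nA
    print(t, round(xi, 6), round(lam, 6), round(nA, 3), round(nB, 3))
    # M-step: re-estimate xi and lambda from the completed counts.
    xi = nA / N
    lam = sum_x_nx / (N - nA)
```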
EM Algorithm
Main Body
Maximum Likelihood
$$\mathbf{X} = \{ \mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N \}$$
$$L(\Theta \mid \mathbf{X}) = p(\mathbf{X} \mid \Theta) = \prod_{i=1}^{N} p(\mathbf{x}_i \mid \Theta)$$
$$\Theta^* = \arg\max_{\Theta}\, L(\Theta \mid \mathbf{X})$$
Latent Variables
Incomplete data: $\mathbf{X} = \{ \mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N \}$
Latent variables: $\mathbf{Y} = \{ \mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_N \}$
Complete data: $\mathbf{Z} = (\mathbf{X}, \mathbf{Y})$
Complete Data Likelihood
Complete data: $\mathbf{Z} = (\mathbf{X}, \mathbf{Y})$
$$L(\Theta \mid \mathbf{Z}) = p(\mathbf{Z} \mid \Theta) = p(\mathbf{X}, \mathbf{Y} \mid \Theta) = p(\mathbf{Y} \mid \mathbf{X}, \Theta)\, p(\mathbf{X} \mid \Theta)$$
$L(\Theta \mid \mathbf{Z})$ is a function of the latent variable $\mathbf{Y}$ and the parameter $\Theta$. If we are given $\Theta$, it is a function of the random variable $\mathbf{Y}$, so the result is still in terms of $\mathbf{Y}$; only a function of the parameter $\Theta$ alone would be directly computable.
Expectation Step
Let $\Theta^{(i-1)}$ be the parameter vector obtained at the $(i-1)$-th step. Define
$$Q\big(\Theta, \Theta^{(i-1)}\big) = E\left[ \log L(\Theta \mid \mathbf{Z}) \,\middle|\, \mathbf{X}, \Theta^{(i-1)} \right]
= \begin{cases}
\displaystyle\int_{\mathbf{y} \in \mathcal{Y}} \log p(\mathbf{X}, \mathbf{y} \mid \Theta)\; p\big(\mathbf{y} \mid \mathbf{X}, \Theta^{(i-1)}\big)\, d\mathbf{y} & \text{continuous} \\[2ex]
\displaystyle\sum_{\mathbf{y} \in \mathcal{Y}} \log p(\mathbf{X}, \mathbf{y} \mid \Theta)\; p\big(\mathbf{y} \mid \mathbf{X}, \Theta^{(i-1)}\big) & \text{discrete}
\end{cases}$$
Maximization Step
Let $\Theta^{(i-1)}$ be the parameter vector obtained at the $(i-1)$-th step and $Q\big(\Theta, \Theta^{(i-1)}\big) = E\left[ \log L(\Theta \mid \mathbf{Z}) \mid \mathbf{X}, \Theta^{(i-1)} \right]$ be defined as in the E-step. Then
$$\Theta^{(i)} = \arg\max_{\Theta}\, Q\big(\Theta, \Theta^{(i-1)}\big)$$
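The two steps alternate until the parameters stop changing. A schematic of that loop follows; the `e_step` and `m_step` callables are hypothetical placeholders for the model-specific computations, not part of the original slides.

```python
from typing import Any, Callable

def em(theta0: Any,
       e_step: Callable[[Any], Any],   # returns the expected statistics defining Q(., theta_prev)
       m_step: Callable[[Any], Any],   # returns argmax_theta Q(theta, theta_prev) from those statistics
       max_iter: int = 100) -> Any:
    """Generic EM loop: alternate the E-step and the M-step for max_iter iterations."""
    theta = theta0
    for _ in range(max_iter):
        stats = e_step(theta)   # E-step
        theta = m_step(stats)   # M-step
    return theta
```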
EM Algorithm
Mixture Model
Mixture Models
If there is a reason to believe that a data set is comprised of several distinct populations, a mixture model can be used. It has the following form:
$$p(\mathbf{x} \mid \Theta) = \sum_{j=1}^{M} \alpha_j\, p_j(\mathbf{x} \mid \theta_j),
\qquad \text{with}\quad \sum_{j=1}^{M} \alpha_j = 1,
\qquad \Theta = (\alpha_1, \ldots, \alpha_M, \theta_1, \ldots, \theta_M)$$
Mixture Models
$$\mathbf{X} = \{ \mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N \}
\qquad
\mathbf{Y} = \{ y_1, y_2, \ldots, y_N \}$$
Let $y_i \in \{1, \ldots, M\}$ represent the source that generates the data, so that
$$p(\mathbf{x} \mid y = j, \Theta) = p_j(\mathbf{x} \mid \theta_j)
\qquad
p(y = j \mid \Theta) = \alpha_j$$
For a complete datum $\mathbf{z}_i = (\mathbf{x}_i, y_i)$,
$$p(\mathbf{z}_i \mid \Theta) = p(\mathbf{x}_i, y_i \mid \Theta) = p(y_i \mid \mathbf{x}_i, \Theta)\, p(\mathbf{x}_i \mid \Theta)$$
$$p(y_i \mid \mathbf{x}_i, \Theta)
= \frac{p(\mathbf{x}_i, y_i, \Theta)}{p(\mathbf{x}_i, \Theta)}
= \frac{p(\mathbf{x}_i \mid y_i, \Theta)\, p(y_i \mid \Theta)\, p(\Theta)}{p(\mathbf{x}_i \mid \Theta)\, p(\Theta)}
= \frac{p(\mathbf{x}_i \mid y_i, \Theta)\, p(y_i \mid \Theta)}{p(\mathbf{x}_i \mid \Theta)}
= \frac{\alpha_{y_i}\, p_{y_i}(\mathbf{x}_i \mid \theta_{y_i})}{\sum_{j=1}^{M} \alpha_j\, p_j(\mathbf{x}_i \mid \theta_j)}$$
Given $\mathbf{x}$ and $\Theta$, the conditional density of $y$ can be computed.
Complete-Data Likelihood Function
$$\mathbf{X} = \{ \mathbf{x}_1, \ldots, \mathbf{x}_N \}, \quad \mathbf{y} = \{ y_1, \ldots, y_N \}, \quad \mathbf{z}_i = (\mathbf{x}_i, y_i), \quad \mathbf{Z} = \{ \mathbf{z}_1, \ldots, \mathbf{z}_N \}$$
$$L(\Theta \mid \mathbf{Z}) = p(\mathbf{Z} \mid \Theta) = p(\mathbf{X}, \mathbf{y} \mid \Theta) = p(\mathbf{X} \mid \mathbf{y}, \Theta)\, p(\mathbf{y} \mid \Theta)
= \prod_{i=1}^{N} p(\mathbf{x}_i \mid y_i, \Theta)\, p(y_i \mid \Theta)
= \prod_{i=1}^{N} \alpha_{y_i}\, p_{y_i}(\mathbf{x}_i \mid \theta_{y_i})$$
Expectation
$$\log L(\Theta \mid \mathbf{Z}) = \sum_{i=1}^{N} \log\left[ \alpha_{y_i}\, p_{y_i}(\mathbf{x}_i \mid \theta_{y_i}) \right]$$
With $\Theta^g$ denoting the current guess,
$$Q(\Theta, \Theta^g) = E\left[ \log L(\Theta \mid \mathbf{Z}) \mid \mathbf{X}, \Theta^g \right]
= \sum_{\mathbf{y} \in \mathcal{Y}} \log L(\Theta \mid \mathbf{Z})\; p(\mathbf{y} \mid \mathbf{X}, \Theta^g)
= \sum_{\mathbf{y} \in \mathcal{Y}} \sum_{i=1}^{N} \log\left[ \alpha_{y_i}\, p_{y_i}(\mathbf{x}_i \mid \theta_{y_i}) \right] \prod_{j=1}^{N} p(y_j \mid \mathbf{x}_j, \Theta^g)$$
Introducing the indicator $\delta_{y_i, l}$, which is zero unless $y_i = l$,
$$Q(\Theta, \Theta^g)
= \sum_{\mathbf{y} \in \mathcal{Y}} \sum_{i=1}^{N} \sum_{l=1}^{M} \delta_{y_i, l}\, \log\left[ \alpha_l\, p_l(\mathbf{x}_i \mid \theta_l) \right] \prod_{j=1}^{N} p(y_j \mid \mathbf{x}_j, \Theta^g)
= \sum_{l=1}^{M} \sum_{i=1}^{N} \log\left[ \alpha_l\, p_l(\mathbf{x}_i \mid \theta_l) \right] \sum_{\mathbf{y} \in \mathcal{Y}} \delta_{y_i, l} \prod_{j=1}^{N} p(y_j \mid \mathbf{x}_j, \Theta^g)$$
The inner sum factorizes over $y_1, \ldots, y_N$:
$$\sum_{y_1 = 1}^{M} \cdots \sum_{y_N = 1}^{M} \delta_{y_i, l} \prod_{j=1}^{N} p(y_j \mid \mathbf{x}_j, \Theta^g)
= \left[ \prod_{j \neq i} \sum_{y_j = 1}^{M} p(y_j \mid \mathbf{x}_j, \Theta^g) \right] p(l \mid \mathbf{x}_i, \Theta^g)
= p(l \mid \mathbf{x}_i, \Theta^g),$$
since each $\sum_{y_j = 1}^{M} p(y_j \mid \mathbf{x}_j, \Theta^g) = 1$. Therefore
$$Q(\Theta, \Theta^g)
= \sum_{l=1}^{M} \sum_{i=1}^{N} \log\left[ \alpha_l\, p_l(\mathbf{x}_i \mid \theta_l) \right] p(l \mid \mathbf{x}_i, \Theta^g)
= \sum_{l=1}^{M} \sum_{i=1}^{N} \log(\alpha_l)\; p(l \mid \mathbf{x}_i, \Theta^g)
+ \sum_{l=1}^{M} \sum_{i=1}^{N} \log\left[ p_l(\mathbf{x}_i \mid \theta_l) \right] p(l \mid \mathbf{x}_i, \Theta^g)$$
Maximization
$$\Theta = (\alpha_1, \ldots, \alpha_M, \theta_1, \ldots, \theta_M)$$
Given the initial guess $\Theta^g$, we want to find $\alpha$ and $\theta$ to maximize
$$Q(\Theta, \Theta^g) = \sum_{l=1}^{M} \sum_{i=1}^{N} \log(\alpha_l)\; p(l \mid \mathbf{x}_i, \Theta^g)
+ \sum_{l=1}^{M} \sum_{i=1}^{N} \log\left[ p_l(\mathbf{x}_i \mid \theta_l) \right] p(l \mid \mathbf{x}_i, \Theta^g).$$
In fact, this is done iteratively.
The GMM (Gaussian Mixture Model)
Gaussian model of a $d$-dimensional source, say $j$:
$$p_j(\mathbf{x} \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)
= \frac{1}{(2\pi)^{d/2}\, |\boldsymbol{\Sigma}_j|^{1/2}}
\exp\left( -\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_j)^T \boldsymbol{\Sigma}_j^{-1} (\mathbf{x} - \boldsymbol{\mu}_j) \right),
\qquad \theta_j = (\boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)$$
GMM with $M$ sources:
$$p(\mathbf{x} \mid \boldsymbol{\mu}_1, \boldsymbol{\Sigma}_1, \ldots, \boldsymbol{\mu}_M, \boldsymbol{\Sigma}_M)
= \sum_{j=1}^{M} \alpha_j\, p_j(\mathbf{x} \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j),
\qquad \alpha_j \geq 0, \quad \sum_{j=1}^{M} \alpha_j = 1$$
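A minimal sketch of evaluating this mixture density with SciPy; the two-component weights, means, and covariances below are made-up values, not parameters from the slides.

```python
import numpy as np
from scipy.stats import multivariate_normal

# A hypothetical 2-component GMM in d = 2 dimensions.
alphas = [0.3, 0.7]  # mixing weights, sum to 1
mus = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
sigmas = [np.eye(2), np.array([[2.0, 0.5], [0.5, 1.0]])]

def gmm_pdf(x):
    """p(x | Theta) = sum_j alpha_j * N(x; mu_j, Sigma_j)."""
    return sum(a * multivariate_normal.pdf(x, mean=m, cov=s)
               for a, m, s in zip(alphas, mus, sigmas))

print(gmm_pdf(np.array([1.0, 1.0])))
```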
EM Algorithm
EM-Algorithm on GMM
Goal
Mixture model:
$$p(\mathbf{x} \mid \Theta) = \sum_{l=1}^{M} \alpha_l\, p_l(\mathbf{x} \mid \theta_l),
\qquad \Theta = (\alpha_1, \ldots, \alpha_M, \theta_1, \ldots, \theta_M),
\qquad \text{subject to}\quad \sum_{l=1}^{M} \alpha_l = 1$$
To maximize:
$$Q(\Theta, \Theta^g)
= \underbrace{\sum_{l=1}^{M} \sum_{i=1}^{N} \log(\alpha_l)\; p(l \mid \mathbf{x}_i, \Theta^g)}_{\text{correlated with } \alpha_l \text{ only}}
+ \underbrace{\sum_{l=1}^{M} \sum_{i=1}^{N} \log\left[ p_l(\mathbf{x}_i \mid \theta_l) \right] p(l \mid \mathbf{x}_i, \Theta^g)}_{\text{correlated with } \theta_l \text{ only}}$$
The two terms can therefore be maximized separately.
Finding α_l
Due to the constraint $\sum_{l=1}^{M} \alpha_l = 1$, we introduce the Lagrange multiplier $\lambda$ and solve
$$\frac{\partial}{\partial \alpha_l} \left[ \sum_{l=1}^{M} \sum_{i=1}^{N} \log(\alpha_l)\; p(l \mid \mathbf{x}_i, \Theta^g) + \lambda \left( \sum_{l=1}^{M} \alpha_l - 1 \right) \right] = 0, \qquad l = 1, \ldots, M$$
$$\Longrightarrow\quad \sum_{i=1}^{N} \frac{1}{\alpha_l}\, p(l \mid \mathbf{x}_i, \Theta^g) + \lambda = 0, \qquad l = 1, \ldots, M$$
Multiplying by $\alpha_l$ and summing over $l$:
$$\sum_{l=1}^{M} \sum_{i=1}^{N} p(l \mid \mathbf{x}_i, \Theta^g) + \lambda \sum_{l=1}^{M} \alpha_l = 0
\quad\Longrightarrow\quad \sum_{i=1}^{N} \sum_{l=1}^{M} p(l \mid \mathbf{x}_i, \Theta^g) + \lambda = 0
\quad\Longrightarrow\quad \lambda = -N$$
Therefore
$$\alpha_l = \frac{1}{N} \sum_{i=1}^{N} p(l \mid \mathbf{x}_i, \Theta^g),
\qquad\text{where}\quad
p(l \mid \mathbf{x}_i, \Theta^g) = \frac{\alpha_l^g\, p_l(\mathbf{x}_i \mid \theta_l^g)}{\sum_{j=1}^{M} \alpha_j^g\, p_j(\mathbf{x}_i \mid \theta_j^g)}.$$
Finding θ_l
Only the second term of $Q(\Theta, \Theta^g)$ involves $\theta_l$, so we only need to maximize
$$\sum_{l=1}^{M} \sum_{i=1}^{N} \log\left[ p_l(\mathbf{x}_i \mid \theta_l) \right] p(l \mid \mathbf{x}_i, \Theta^g).$$
Consider the GMM, with $\theta_l = (\boldsymbol{\mu}_l, \boldsymbol{\Sigma}_l)$:
$$p_l(\mathbf{x} \mid \boldsymbol{\mu}_l, \boldsymbol{\Sigma}_l) = \frac{1}{(2\pi)^{d/2}\, |\boldsymbol{\Sigma}_l|^{1/2}} \exp\left( -\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_l)^T \boldsymbol{\Sigma}_l^{-1} (\mathbf{x} - \boldsymbol{\mu}_l) \right)$$
$$\log p_l(\mathbf{x} \mid \boldsymbol{\mu}_l, \boldsymbol{\Sigma}_l) = -\tfrac{d}{2}\log 2\pi - \tfrac{1}{2}\log|\boldsymbol{\Sigma}_l| - \tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_l)^T \boldsymbol{\Sigma}_l^{-1} (\mathbf{x} - \boldsymbol{\mu}_l)$$
The first term is unrelated to $\theta_l$. Therefore, we want to maximize
$$\sum_{l=1}^{M} \sum_{i=1}^{N} \left[ -\tfrac{1}{2}\log|\boldsymbol{\Sigma}_l| - \tfrac{1}{2} (\mathbf{x}_i - \boldsymbol{\mu}_l)^T \boldsymbol{\Sigma}_l^{-1} (\mathbf{x}_i - \boldsymbol{\mu}_l) \right] p(l \mid \mathbf{x}_i, \Theta^g).$$
How? Some knowledge of matrix algebra is needed.
Finding θ_l
Setting the derivatives of the above expression with respect to $\boldsymbol{\mu}_l$ and $\boldsymbol{\Sigma}_l$ to zero gives
$$\boldsymbol{\mu}_l = \frac{\sum_{i=1}^{N} \mathbf{x}_i\; p(l \mid \mathbf{x}_i, \Theta^g)}{\sum_{i=1}^{N} p(l \mid \mathbf{x}_i, \Theta^g)}
\qquad
\boldsymbol{\Sigma}_l = \frac{\sum_{i=1}^{N} p(l \mid \mathbf{x}_i, \Theta^g)\, (\mathbf{x}_i - \boldsymbol{\mu}_l)(\mathbf{x}_i - \boldsymbol{\mu}_l)^T}{\sum_{i=1}^{N} p(l \mid \mathbf{x}_i, \Theta^g)}$$
Summary
EM algorithm for GMM: given an initial guess $\Theta^g$, find the new estimate $\Theta^{\text{new}}$ as follows, with
$$p(l \mid \mathbf{x}_i, \Theta^g) = \frac{\alpha_l^g\, p_l(\mathbf{x}_i \mid \theta_l^g)}{\sum_{j=1}^{M} \alpha_j^g\, p_j(\mathbf{x}_i \mid \theta_j^g)}:$$
$$\alpha_l^{\text{new}} = \frac{1}{N} \sum_{i=1}^{N} p(l \mid \mathbf{x}_i, \Theta^g)$$
$$\boldsymbol{\mu}_l^{\text{new}} = \frac{\sum_{i=1}^{N} \mathbf{x}_i\; p(l \mid \mathbf{x}_i, \Theta^g)}{\sum_{i=1}^{N} p(l \mid \mathbf{x}_i, \Theta^g)}$$
$$\boldsymbol{\Sigma}_l^{\text{new}} = \frac{\sum_{i=1}^{N} p(l \mid \mathbf{x}_i, \Theta^g)\, (\mathbf{x}_i - \boldsymbol{\mu}_l^{\text{new}})(\mathbf{x}_i - \boldsymbol{\mu}_l^{\text{new}})^T}{\sum_{i=1}^{N} p(l \mid \mathbf{x}_i, \Theta^g)}$$
If not yet converged, set $\Theta^g \leftarrow \Theta^{\text{new}}$ and repeat.
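A minimal sketch of one such pass over the data, assuming NumPy/SciPy and a data matrix X of shape (N, d); the function name `em_step`, its argument layout, and the initialization and convergence test (both left out here) are our own choices rather than anything specified on the slides.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_step(X, alphas, mus, sigmas):
    """One EM pass for a GMM. X: (N, d); alphas: (M,); mus: list of (d,); sigmas: list of (d, d)."""
    N, M = X.shape[0], len(alphas)

    # E-step: responsibilities p(l | x_i, Theta^g), one column per component.
    resp = np.column_stack([
        alphas[l] * multivariate_normal.pdf(X, mean=mus[l], cov=sigmas[l])
        for l in range(M)
    ])
    resp /= resp.sum(axis=1, keepdims=True)

    # M-step: re-estimate weights, means, and covariances.
    Nl = resp.sum(axis=0)            # effective number of points per component
    new_alphas = Nl / N
    new_mus = [resp[:, l] @ X / Nl[l] for l in range(M)]
    new_sigmas = []
    for l in range(M):
        d = X - new_mus[l]
        new_sigmas.append((resp[:, l][:, None] * d).T @ d / Nl[l])
    return new_alphas, new_mus, new_sigmas
```

Iterating `em_step` until the parameters stop changing implements the loop sketched above.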
Demonstration
EM algorithm for Mixture models
Exercises
Write a program to generate samples from a multidimensional Gaussian distribution.
Draw the distribution for 2-dim data.
Write a program to generate GMM data.
Write an EM algorithm to analyze GMM data.
Study more EM algorithms for mixtures.
Find applications of the EM algorithm.