EM and AEM clustering
Advanced Numerical Computation 2008,
AM NDHU
Spin model
• Graph bisection
$$E(S) = -\sum_{i=1}^{N} \sum_{j>i} w_{ij} S_i S_j, \qquad S_i \in \{-1, +1\}$$
$$\min_{\{S\}} E(S)$$
Randomization
• Boltzmann assumption
– S is regarded as a random vector
$$\Pr(S) \propto \exp(-\beta E(S))$$
• Free energy
$$F = \langle E(S) \rangle - \frac{1}{\beta} H(S)$$
Entropy
• Entropy of the whole system
$$H(S) = -\sum_{\{S\}} \Pr(S) \log \Pr(S)$$
• Sum of individual entropies
$$H(S) \approx \sum_i H(S_i)$$
Mean field approximation
• Individual entropy
$$H(S_i) = -\sum_{S_i = \pm 1} \Pr(S_i) \log \Pr(S_i)$$
$$\Pr(S_i) \propto \exp(\beta u_i S_i), \qquad \Pr(S_i) = \frac{\exp(\beta u_i S_i)}{\exp(\beta u_i) + \exp(-\beta u_i)}$$
$$\langle S_i \rangle = \frac{\exp(\beta u_i) - \exp(-\beta u_i)}{\exp(\beta u_i) + \exp(-\beta u_i)} = \tanh(\beta u_i)$$
$$\Pr(S_i = 1) = \frac{1 + \langle S_i \rangle}{2}, \qquad \Pr(S_i = -1) = \frac{1 - \langle S_i \rangle}{2}$$
$$H(S_i) = -\Pr(S_i = 1)\log\Pr(S_i = 1) - \Pr(S_i = -1)\log\Pr(S_i = -1) = -\frac{1 + \langle S_i \rangle}{2}\log\frac{1 + \langle S_i \rangle}{2} - \frac{1 - \langle S_i \rangle}{2}\log\frac{1 - \langle S_i \rangle}{2}$$
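The Python sketch below checks the two single-spin identities above numerically: $\langle S_i \rangle = \tanh(\beta u_i)$, and the entropy written through $\langle S_i \rangle$ alone. The function names are illustrative only. The check assumes the form $\Pr(S_i) \propto \exp(\beta u_i S_i)$.

```python
import math

def spin_moments(u, beta):
    """Mean spin and entropy for Pr(S) proportional to exp(beta*u*S), S in {-1, +1}."""
    z = math.exp(beta * u) + math.exp(-beta * u)              # normaliser
    probs = {s: math.exp(beta * u * s) / z for s in (-1, 1)}
    mean = sum(s * p for s, p in probs.items())               # <S_i> by direct summation
    entropy = -sum(p * math.log(p) for p in probs.values())   # H(S_i) by definition
    return mean, entropy

def entropy_from_mean(m):
    """H(S_i) written through <S_i> only (valid for -1 < m < 1)."""
    p_up, p_down = (1 + m) / 2, (1 - m) / 2
    return -p_up * math.log(p_up) - p_down * math.log(p_down)

for u in (-1.5, 0.0, 0.3, 2.0):
    m, h = spin_moments(u, beta=0.7)
    assert abs(m - math.tanh(0.7 * u)) < 1e-12
    assert abs(h - entropy_from_mean(m)) < 1e-12
```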
Mean field approximation
$$F = \langle E(S) \rangle - \frac{1}{\beta} H(S) \approx E(\langle S \rangle) - \frac{1}{\beta}\sum_i H(S_i) = -\sum_{i=1}^{N}\sum_{j>i} w_{ij} \langle S_i \rangle \langle S_j \rangle - \frac{1}{\beta}\sum_i H(S_i)$$
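With $v_i = \langle S_i \rangle$:
$$F = -\sum_{i=1}^{N}\sum_{j>i} w_{ij} v_i v_j + \frac{1}{\beta}\sum_i \left[\frac{1 + v_i}{2}\log\frac{1 + v_i}{2} + \frac{1 - v_i}{2}\log\frac{1 - v_i}{2}\right]$$
For symmetric weights ($w_{ij} = w_{ji}$):
$$\frac{\partial F}{\partial v_i} = -\sum_{j \ne i} w_{ij} v_j + \frac{1}{\beta}\tanh^{-1}(v_i) = 0$$
$$v_i = \tanh\Big(\beta \sum_{j \ne i} w_{ij} v_j\Big)$$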
Mean field equation
$$v_i = \tanh\Big(\beta \sum_{j \ne i} w_{ij} v_j\Big)$$
1. Set each $v_i$ near zero randomly
2. Set $\beta$ to a sufficiently small value
3. Use the mean field equations to find a fixed point
4. If the halting condition holds, halt
5. Increase $\beta$ according to an annealing schedule
6. Go to step 3
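The six steps above can be sketched in Python for a small graph. Several choices here are assumptions, not taken from the slides:

- The energy $E(S)$ as written has no term that keeps the two halves balanced. The example therefore folds a balance penalty into the weights, $w_{ij} = c_{ij} - \alpha$, where $c_{ij}$ is the adjacency matrix.
- The halting condition is "every $|v_i| > 0.99$".
- The annealing schedule is $\beta \leftarrow 1.1\beta$.

The helper `mfa_anneal` is hypothetical.

```python
import math
import random

def mfa_anneal(w, beta0=0.1, rate=1.1, beta_max=50.0, tol=1e-6, seed=0):
    """Mean field annealing for E(S) = -sum_{i<j} w_ij S_i S_j (w symmetric, zero diagonal)."""
    n = len(w)
    rng = random.Random(seed)
    v = [rng.uniform(-0.01, 0.01) for _ in range(n)]   # step 1: each v_i near zero
    beta = beta0                                       # step 2: small beta
    while True:
        # step 3: sweep v_i = tanh(beta * sum_{j != i} w_ij v_j) until it stops changing
        for _ in range(1000):
            change = 0.0
            for i in range(n):
                field = sum(w[i][j] * v[j] for j in range(n) if j != i)
                new = math.tanh(beta * field)
                change = max(change, abs(new - v[i]))
                v[i] = new                             # asynchronous (in-place) update
            if change < tol:
                break
        # step 4 (halting rule is an assumption): all spins saturated, or beta large
        if all(abs(x) > 0.99 for x in v) or beta >= beta_max:
            return [1 if x > 0 else -1 for x in v]
        beta *= rate                                   # step 5: geometric schedule; step 6: loop

# Hypothetical example: two triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
# w_ij = c_ij - alpha folds a balance penalty into the weights (our assumption).
n, alpha = 6, 0.5
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
w = [[0.0 if i == j else -alpha for j in range(n)] for i in range(n)]
for i, j in edges:
    w[i][j] += 1.0
    w[j][i] += 1.0
print(mfa_anneal(w))
```

The sketch updates $v$ in place, one coordinate at a time, rather than all at once. With symmetric weights and zero diagonal, each single-coordinate update exactly minimizes $F$ in that coordinate, so a sweep cannot increase the mean field free energy.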
[Figure: methods arranged by variable type (continuous variables; discrete variables; discrete & continuous variables). Entries: Newton method, Newton-Gauss, LM method, MFA, MFA(LM), EM, MFA + Newton's (AEM), and one cell marked "?".]
Unsupervised data
Gaussian mixtures
Assumption: the unsupervised data are sampled from Gaussian mixtures.
$$X = \{x[t]\}_t$$
Gaussian pdf
Weighted sum of Gaussian pdfs
$$p(x) = \sum_k \pi_k\, p_k(x \mid A, y_k)$$
with mixing weights $\pi_k \ge 0$, $\sum_k \pi_k = 1$.
Uncorrelated data
$$p_k(x \mid \sigma_k, y_k) = \frac{1}{(\sqrt{2\pi}\,\sigma_k)^{d}} \exp\left(-\frac{(x - y_k)^T (x - y_k)}{2\sigma_k^2}\right)$$
Data creation
• Write a MATLAB function to create uncorrelated data, $X = \{x[t]\}_t$
• X is a sample from Gaussian mixtures
• $x \in \mathbb{R}^d$, $d = 2$
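The slides ask for a MATLAB function. To keep every example here in one language, the sketch below does the same job in Python with only the standard library. It first picks a component $k$ with probability $\pi_k$, then draws each coordinate independently from $N(y_k, \sigma_k^2)$, which gives uncorrelated data. The weights, means and widths are hypothetical.

```python
import random

def sample_gaussian_mixture(n, weights, means, sigmas, seed=0):
    """Draw n points x ~ sum_k pi_k N(y_k, sigma_k^2 I); returns (X, labels)."""
    rng = random.Random(seed)
    components = range(len(weights))
    X, labels = [], []
    for _ in range(n):
        k = rng.choices(components, weights=weights)[0]            # pick pdf k with prob. pi_k
        X.append([rng.gauss(mu, sigmas[k]) for mu in means[k]])    # independent coordinates
        labels.append(k)                                           # delta[t] = e_k, kept for checking
    return X, labels

# hypothetical parameters: three components in R^2
X, labels = sample_gaussian_mixture(
    300, weights=[0.3, 0.3, 0.4],
    means=[(-0.5, 0.5), (0.5, -0.5), (0.0, 0.8)], sigmas=[0.15, 0.15, 0.2])
```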
Membership
$$e_k^K = [0, 0, \ldots, 0, 1, 0, \ldots, 0, 0]^T$$
(positions $1, 2, \ldots, k-1, k, k+1, \ldots, K$; the single 1 sits at position $k$)
Standard basis
$$\{e_1^K, \ldots, e_k^K, \ldots, e_K^K\}$$
Membership vector
$$\delta[t] \in \{e_1^K, \ldots, e_k^K, \ldots, e_K^K\}$$
$$\delta[t] = e_k^K \;\Leftrightarrow\; x[t] \text{ is generated by the } k\text{th pdf}$$
$$\Pr(\xi_i = e_k^K) \propto \exp(-\beta \|x_i - y_k\|^2)$$
$$\sum_{k=1}^{K} \Pr(\xi_i = e_k^K) = 1$$
$$\Pr(\xi_i = e_k^K) = \; ?$$
$$\Pr(\xi_i = e_k^K) = C \exp(-\beta \|x_i - y_k\|^2)$$
$$\sum_{k=1}^{K} C \exp(-\beta \|x_i - y_k\|^2) = 1 \;\Rightarrow\; C = \frac{1}{\sum_{k=1}^{K} \exp(-\beta \|x_i - y_k\|^2)}$$
$$\Pr(\xi_i = e_k^K) = \frac{\exp(-\beta \|x_i - y_k\|^2)}{\sum_{h=1}^{K} \exp(-\beta \|x_i - y_h\|^2)}$$
$$\langle \xi_i \rangle = \; ?$$
$$\langle \xi_i \rangle = \left[\langle \xi_{i1} \rangle, \langle \xi_{i2} \rangle, \ldots, \langle \xi_{iK} \rangle\right]^T = \left[\frac{\exp(-\beta\|x_i - y_1\|^2)}{\sum_{k=1}^{K}\exp(-\beta\|x_i - y_k\|^2)}, \frac{\exp(-\beta\|x_i - y_2\|^2)}{\sum_{k=1}^{K}\exp(-\beta\|x_i - y_k\|^2)}, \ldots, \frac{\exp(-\beta\|x_i - y_K\|^2)}{\sum_{k=1}^{K}\exp(-\beta\|x_i - y_k\|^2)}\right]^T$$
$$\langle \xi_i \rangle = \sum_{k=1}^{K} \Pr(\xi_i = e_k^K)\, e_k^K = \sum_{k=1}^{K} \frac{\exp(-\beta\|x_i - y_k\|^2)}{\sum_{h=1}^{K}\exp(-\beta\|x_i - y_h\|^2)}\, e_k^K$$
$$v_{ik} = \langle \xi_{ik} \rangle = \Pr(\xi_i = e_k^K) = \frac{\exp(-\beta\|x_i - y_k\|^2)}{\sum_{h=1}^{K}\exp(-\beta\|x_i - y_h\|^2)} \qquad \text{(E1)}$$
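(E1) is a softmax over $k$. Below is a minimal Python sketch, assuming points and centres are given as lists of coordinates; `responsibilities` is a hypothetical name. The sketch subtracts $\min_k \|x_i - y_k\|^2$ before exponentiating. This does not change the ratio, but it keeps the largest term equal to 1, so the sum cannot underflow to zero for large $\beta$.

```python
import math

def responsibilities(X, Y, beta=1.0):
    """(E1): v_ik = exp(-beta*||x_i - y_k||^2) / sum_h exp(-beta*||x_i - y_h||^2)."""
    V = []
    for x in X:
        u = [sum((a - b) ** 2 for a, b in zip(x, y)) for y in Y]   # squared distances
        m = min(u)                                                # common shift, cancels in the ratio
        e = [math.exp(-beta * (uk - m)) for uk in u]
        s = sum(e)
        V.append([ek / s for ek in e])
    return V

V = responsibilities([[0.0, 0.0], [1.0, 1.0]], [[0.0, 0.0], [1.0, 1.0]])
print(V)
```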
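$$E(\xi, Y) = \sum_{i=1}^{N}\sum_{k=1}^{K} \xi_{ik}\,(x_i - y_k)^T (x_i - y_k)$$
$$\langle E(\xi, Y) \rangle = \sum_{i=1}^{N}\sum_{k=1}^{K} \langle \xi_{ik} \rangle\,(x_i - y_k)^T (x_i - y_k)$$
$$\frac{\partial \langle E(\xi, Y) \rangle}{\partial y_k} = -2\sum_{i=1}^{N} \langle \xi_{ik} \rangle\,(x_i - y_k) = 0$$
$$\sum_{i=1}^{N} \langle \xi_{ik} \rangle\, y_k = \sum_{i=1}^{N} \langle \xi_{ik} \rangle\, x_i \;\Rightarrow\; y_k = \frac{\sum_{i=1}^{N} \langle \xi_{ik} \rangle\, x_i}{\sum_{i=1}^{N} \langle \xi_{ik} \rangle}$$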
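$$L(\delta, y) = \frac{1}{2}\sum_t \sum_k \delta_k[t]\,(x[t] - y_k)^T (x[t] - y_k)$$
$$\Pr(\delta[t] = e_k^K) \propto \exp(-\beta u_k[t]), \qquad u_k[t] = \frac{dL(\delta \mid y)}{d\delta_k[t]} = \frac{1}{2}\|x[t] - y_k\|^2$$
$$\Pr(\delta[t] = e_k^K) \propto \exp\left(-\frac{\beta}{2}\|x[t] - y_k\|^2\right)$$
This is the form used for $\Pr(\xi_i = e_k^K)$ once the factor $\frac{1}{2}$ is absorbed into $\beta$.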
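$$v_{ik} = \frac{\exp(-\beta\|x_i - y_k\|^2)}{\sum_{j=1}^{K}\exp(-\beta\|x_i - y_j\|^2)} \qquad \text{(E1)}$$
$$u_{ik} = \|x_i - y_k\|^2, \qquad v_{ik} = \frac{\exp(-\beta u_{ik})}{\sum_{j=1}^{K}\exp(-\beta u_{ij})}$$
$$y_k = \frac{\sum_{i=1}^{N} v_{ik}\, x_i}{\sum_{i=1}^{N} v_{ik}} \qquad \text{(M1)}$$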
EM method
1. Set each $v_{ik}$ near zero randomly, and set $y_k$ near the mean of all $x_i$
2. Fix $\beta$ to one
3. E step: determine all $v_{ik}$ using eq. (E1)
4. M step: use eq. (M1) to update all $y_k$
5. If the halting condition holds, halt
6. Go to step 3
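The EM method above can be sketched in Python as follows. Two assumptions apply:

- Step 1's random $v_{ik}$ are skipped, because the first E step recomputes them from $y_k$ anyway.
- The halting condition is taken to be "no centre coordinate moved by more than `tol`".

The name `em_cluster` is hypothetical.

```python
import math
import random

def em_cluster(X, K, beta=1.0, max_iter=100, tol=1e-9, seed=0):
    """EM steps 1-6 for the centres y_k with beta fixed (one by default)."""
    rng = random.Random(seed)
    N, d = len(X), len(X[0])
    mean = [sum(x[a] for x in X) / N for a in range(d)]
    # step 1: y_k near the mean of all x_i (small random offsets break the symmetry)
    Y = [[m + rng.uniform(-0.01, 0.01) for m in mean] for _ in range(K)]
    V = []
    for _ in range(max_iter):
        # step 3, E step (E1): v_ik = softmax_k(-beta * ||x_i - y_k||^2)
        V = []
        for x in X:
            u = [sum((a - b) ** 2 for a, b in zip(x, y)) for y in Y]
            m = min(u)
            e = [math.exp(-beta * (uk - m)) for uk in u]
            s = sum(e)
            V.append([ek / s for ek in e])
        # step 4, M step (M1): y_k = sum_i v_ik x_i / sum_i v_ik
        new_Y = []
        for k in range(K):
            wk = sum(V[i][k] for i in range(N))
            new_Y.append([sum(V[i][k] * X[i][a] for i in range(N)) / wk for a in range(d)])
        shift = max(abs(p - q) for y, ny in zip(Y, new_Y) for p, q in zip(y, ny))
        Y = new_Y
        # step 5 (halting rule is our choice): stop once the centres no longer move
        if shift < tol:
            break
    return Y, V

X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
Y, V = em_cluster(X, K=2)
print(sorted(Y))
```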
Fitting Gaussian mixtures
$$L = \sum_k L_k$$
$$L_k = -\log \prod_{t:\, \delta[t] = e_k} p_k(x[t]) = -\sum_{t:\, \delta[t] = e_k} \ln p_k(x[t]) = -\sum_t \delta_k[t] \ln p_k(x[t])$$
Objective function
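$$L(\delta, y, A) = \sum_k L_k = \frac{1}{2}\sum_t \sum_k \delta_k[t]\,(x[t] - y_k)^T A\,(x[t] - y_k) - \frac{N}{2}\log|A|$$
This holds up to an additive constant. Here $A$ is the inverse covariance (precision) matrix shared by all components, and $N$ is the number of samples.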
$$\Pr(\delta[t] = e_k^K) \propto \exp(-\beta u_k[t]), \qquad u_k[t] = \frac{dL(\delta \mid y, A)}{d\delta_k[t]}$$
$$\Pr(\delta[t] = e_k^K) \propto \exp\left(-\frac{\beta}{2}(x[t] - y_k)^T A\,(x[t] - y_k)\right)$$
E step
$$u_k[t] = \frac{\partial L}{\partial \delta_k[t]} = \frac{1}{2}(x[t] - y_k)^T A\,(x[t] - y_k) \qquad \text{(E1)}$$
$$v_k[t] = \langle \delta_k[t] \rangle = \frac{\exp(-\beta u_k[t])}{\sum_l \exp(-\beta u_l[t])} \qquad \text{(E2)}$$
M step
Solve
$$\frac{dL(y \mid v, A)}{dy_k} = 0, \quad k = 1, \ldots, K \qquad \text{(M1)}$$
where $v = \langle \delta \rangle$.
M step
Solve
$$\frac{dL(A \mid v, y)}{dA_{ab}} = 0, \quad a, b = 1, \ldots, d \qquad \text{(M2)}$$
where $v = \langle \delta \rangle$.
EM method
1. Set each $v_{ik}$ near zero randomly, and set $y_k$ near the mean of all $x_i$
2. Fix $\beta$ to one
3. E step: determine all $v_{ik}$ using eqs. (E1)-(E2)
4. M step: use eq. (M1) to update all $y_k$, and eq. (M2) to update the matrix $A$
5. If the halting condition holds, halt
6. Go to step 3
Exercise
Derive updating rules for the M step
$$\frac{\partial L}{\partial y_k} = -\frac{1}{2}\sum_t v_k[t]\,(A + A^T)(x[t] - y_k) = 0$$
$$y_k = \frac{\sum_t v_k[t]\, x[t]}{\sum_t v_k[t]} \qquad \text{(M1)}$$
$$\frac{\partial L}{\partial A_{ab}} = \frac{1}{2}\sum_t\sum_k v_k[t]\,(x_a[t] - y_{ka})(x_b[t] - y_{kb}) - \frac{N}{2}\left[(A^T)^{-1}\right]_{ab} = 0$$
$$A = (W^{-1})^T \qquad \text{(M2)}$$
$$W_{ab} = \frac{1}{N}\sum_t\sum_k v_k[t]\,(x_a[t] - y_{ka})(x_b[t] - y_{kb})$$
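Below is a small Python sketch of (M2) for $d = 2$, assuming memberships `V[t][k]` $= v_k[t]$. The explicit 2x2 inverse only avoids a linear-algebra dependency, and `update_precision` is a hypothetical name. Because $W$ is symmetric, the transpose in $A = (W^{-1})^T$ changes nothing; it is kept to mirror the formula.

```python
def update_precision(X, Y, V):
    """(M2) for d = 2: W_ab = (1/N) sum_t sum_k v_k[t] r_a r_b with r = x[t] - y_k; A = (W^-1)^T."""
    N = len(X)
    W = [[0.0, 0.0], [0.0, 0.0]]
    for t, x in enumerate(X):
        for k, y in enumerate(Y):
            r = (x[0] - y[0], x[1] - y[1])            # residual x[t] - y_k
            for a in range(2):
                for b in range(2):
                    W[a][b] += V[t][k] * r[a] * r[b] / N
    det = W[0][0] * W[1][1] - W[0][1] * W[1][0]
    inv = [[W[1][1] / det, -W[0][1] / det],           # explicit 2x2 inverse of W
           [-W[1][0] / det, W[0][0] / det]]
    return [[inv[b][a] for b in range(2)] for a in range(2)]   # transpose: A = (W^-1)^T

# usage: one component at the origin with hard memberships
A = update_precision([[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0], [0.0, -2.0]], [[0.0, 0.0]], [[1.0]] * 4)
print(A)
```

In this usage example $W = \mathrm{diag}(0.5, 2)$, so the precision is $A = \mathrm{diag}(2, 0.5)$.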
Annealed EM
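1. Set $\beta$ to a sufficiently low value and initialize
$$A = 0.01\, I, \qquad y_k = \frac{1}{N}\sum_t x[t], \qquad v_k[t] = \frac{1}{K}$$
2. E step: update $v$ using (E1)-(E2)
3. M step: update $y$ using (M1); update $A$ using (M2)
4. If $\frac{1}{N}\sum_t \sum_k v_k[t]^2 > 0.98$, halt; else set $\beta \leftarrow \beta / 0.98$ and go to step 2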
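Putting (E1)-(E2), (M1) and (M2) together, the annealed EM steps above can be sketched in Python for $d = 2$. The sketch does one E step and one M step per value of $\beta$, as in steps 2-4. The following are assumptions, not on the slides:

- A tiny Gaussian jitter is added to the centres before each E step. Without it, the identical initial $y_k$ stay identical.
- The number of $\beta$ updates is capped by `max_outer`.
- The starting value `beta0` is chosen by us.

`aem` is a hypothetical name. Annealing can still end in a local optimum, and the sketch does not guard against that.

```python
import math
import random

def aem(X, K, beta0=0.1, rate=0.98, max_outer=2000, jitter=1e-4, seed=0):
    """Annealed EM with one precision matrix A shared by all components; d = 2 only."""
    rng = random.Random(seed)
    N = len(X)
    mean = [sum(x[a] for x in X) / N for a in range(2)]
    A = [[0.01, 0.0], [0.0, 0.01]]            # step 1: A = 0.01 I
    Y = [mean[:] for _ in range(K)]           # step 1: y_k = (1/N) sum_t x[t]
    V = [[1.0 / K] * K for _ in range(N)]     # step 1: v_k[t] = 1/K
    beta = beta0                              # step 1: beta sufficiently low
    for _ in range(max_outer):
        # assumption (not on the slides): tiny jitter so that identical centres can separate
        Y = [[c + rng.gauss(0.0, jitter) for c in y] for y in Y]
        # step 2, E step: u_k[t] = 1/2 (x - y_k)^T A (x - y_k) (E1), v = softmax(-beta u) (E2)
        for t, x in enumerate(X):
            u = []
            for y in Y:
                r = (x[0] - y[0], x[1] - y[1])
                u.append(0.5 * sum(r[a] * A[a][b] * r[b] for a in range(2) for b in range(2)))
            m = min(u)                        # the shift cancels in the ratio and avoids underflow
            e = [math.exp(-beta * (uk - m)) for uk in u]
            s = sum(e)
            V[t] = [ek / s for ek in e]
        # step 3, M step (M1): y_k = sum_t v_k[t] x[t] / sum_t v_k[t]
        for k in range(K):
            wk = sum(V[t][k] for t in range(N))
            Y[k] = [sum(V[t][k] * X[t][a] for t in range(N)) / wk for a in range(2)]
        # step 3, M step (M2): W_ab = (1/N) sum_t sum_k v_k[t] r_a r_b, A = (W^-1)^T
        W = [[0.0, 0.0], [0.0, 0.0]]
        for t, x in enumerate(X):
            for k, y in enumerate(Y):
                r = (x[0] - y[0], x[1] - y[1])
                for a in range(2):
                    for b in range(2):
                        W[a][b] += V[t][k] * r[a] * r[b] / N
        det = W[0][0] * W[1][1] - W[0][1] * W[1][0]
        A = [[W[1][1] / det, -W[1][0] / det], [-W[0][1] / det, W[0][0] / det]]
        # step 4: halt if (1/N) sum_t sum_k v_k[t]^2 > 0.98, else beta <- beta / 0.98
        if sum(vk * vk for row in V for vk in row) / N > 0.98:
            break
        beta /= rate
    return Y, A, V, beta

# usage on synthetic data (made up for this example): two blobs in R^2
rng = random.Random(1)
X = [[rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)] for _ in range(30)]
X += [[rng.gauss(6.0, 0.5), rng.gauss(0.0, 0.5)] for _ in range(30)]
Y, A, V, beta = aem(X, K=2)
print(beta, sorted(Y))
```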
$$\Pr(\delta[t] = e_k^K) \propto \exp(-\beta u_k[t])$$
$$\Pr(\delta[t] = e_k^K) = C \exp(-\beta u_k[t]), \qquad C = \frac{1}{\sum_{k=1}^{K} \exp(-\beta u_k[t])}$$
$$\Pr(\delta[t] = e_k^K) = \frac{\exp(-\beta u_k[t])}{\sum_{j=1}^{K} \exp(-\beta u_j[t])}$$
Expectation of δ[t]:
$$\langle \delta[t] \rangle = \sum_{k=1}^{K} e_k^K \Pr(\delta[t] = e_k^K)$$
Entropy of δ[t]:
$$H_t = -\sum_{k=1}^{K} \Pr(\delta[t] = e_k^K) \ln \Pr(\delta[t] = e_k^K)$$
$$v_k[t] = \langle \delta_k[t] \rangle = \Pr(\delta[t] = e_k^K) = \frac{\exp(-\beta u_k[t])}{\sum_{j=1}^{K} \exp(-\beta u_j[t])}$$
$$H_t = -\sum_{k=1}^{K} \Pr(\delta[t] = e_k^K) \ln \Pr(\delta[t] = e_k^K) = -\sum_{k=1}^{K} v_k[t]\Big(-\beta u_k[t] - \ln\sum_{j=1}^{K}\exp(-\beta u_j[t])\Big)$$
$$= \beta\sum_{k=1}^{K} v_k[t]\,u_k[t] + \sum_{k=1}^{K} v_k[t] \ln\sum_{j=1}^{K}\exp(-\beta u_j[t])$$
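The rewrite of $H_t$ above can be checked numerically. The Python sketch below uses made-up values of $u_k[t]$ and $\beta$ (function names are hypothetical) and compares the direct definition with the rewritten form.

```python
import math

def memberships(u, beta):
    """v_k[t] = exp(-beta u_k[t]) / sum_j exp(-beta u_j[t]) for one t."""
    z = sum(math.exp(-beta * uj) for uj in u)
    return [math.exp(-beta * uk) / z for uk in u]

def entropy_direct(v):
    """H_t = -sum_k v_k ln v_k."""
    return -sum(vk * math.log(vk) for vk in v)

def entropy_rewritten(v, u, beta):
    """H_t = beta sum_k v_k u_k + sum_k v_k ln sum_j exp(-beta u_j)."""
    lse = math.log(sum(math.exp(-beta * uj) for uj in u))
    return beta * sum(vk * uk for vk, uk in zip(v, u)) + sum(v) * lse

u, beta = [0.3, 1.2, 2.5], 1.7
v = memberships(u, beta)
print(entropy_direct(v), entropy_rewritten(v, u, beta))
```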
Free energy
• Combination of mean energy and negative
entropy
Randomization
• Boltzmann assumption
– δ is regarded as a random vector
$$\Pr(\delta) \propto \exp(-\beta E(\delta))$$
• Free energy
$$F = \langle E(\delta) \rangle - \frac{1}{\beta} H(\delta)$$
$$F = \langle E(\delta) \rangle - \frac{1}{\beta} H(\delta) \approx E(\langle \delta[t] \rangle) - \frac{1}{\beta}\sum_t H(\delta[t])$$
Derived based on the Kullback-Leibler (KL) divergence
Using $\sum_k v_k[t] = 1$:
$$F = E(\langle \delta[t] \rangle) - \frac{1}{\beta}\sum_t H_t = E(v) - \sum_t \sum_k v_k[t]\,u_k[t] - \frac{1}{\beta}\sum_t \ln \sum_j \exp(-\beta u_j[t])$$
Mean field equation
$$\frac{\partial F}{\partial v_k[t]} = 0, \qquad \frac{\partial F}{\partial u_k[t]} = 0, \qquad \forall k, t$$
$$u_k[t] = \frac{\partial E(v)}{\partial v_k[t]}, \qquad v_k[t] = \frac{\exp(-\beta u_k[t])}{\sum_j \exp(-\beta u_j[t])}$$
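As a sanity check of the second mean field condition, the Python sketch below takes $v_k[t]$ from the softmax formula and differentiates the $u$-dependent part of $F$ by central differences. $E(v)$ is dropped because it does not depend on $u$. The values and names are illustrative.

```python
import math

def u_terms(v, u, beta):
    """u-dependent part of F for one t: -sum_k v_k u_k - (1/beta) ln sum_j exp(-beta u_j)."""
    lse = math.log(sum(math.exp(-beta * uj) for uj in u))
    return -sum(vk * uk for vk, uk in zip(v, u)) - lse / beta

beta, u = 2.0, [0.5, -0.2, 1.1]
z = sum(math.exp(-beta * uj) for uj in u)
v = [math.exp(-beta * uk) / z for uk in u]            # v_k[t] from the mean field equation

h = 1e-6
grad = []
for k in range(len(u)):
    up = [uj + (h if j == k else 0.0) for j, uj in enumerate(u)]
    dn = [uj - (h if j == k else 0.0) for j, uj in enumerate(u)]
    grad.append((u_terms(v, up, beta) - u_terms(v, dn, beta)) / (2 * h))   # central difference
print(grad)   # every entry should be close to zero
```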
K=2
$$v_i = \tanh(\beta u_i) = \frac{\exp(\beta u_i) - \exp(-\beta u_i)}{\exp(\beta u_i) + \exp(-\beta u_i)} = \frac{\exp(\beta u_i)}{\exp(\beta u_i) + \exp(-\beta u_i)} - \frac{\exp(-\beta u_i)}{\exp(\beta u_i) + \exp(-\beta u_i)} = \Pr(s_i = 1) - \Pr(s_i = -1)$$
Free energy
$$L(\delta, y, A) = \sum_k L_k = \frac{1}{2}\sum_t \sum_k \delta_k[t]\,(x[t] - y_k)^T A\,(x[t] - y_k) - \frac{N}{2}\log|A|$$
$$F(v, u, y, A) = E(v, y, A) - \sum_t \sum_k v_k[t]\,u_k[t] - \frac{1}{\beta}\sum_t \ln\sum_j \exp(-\beta u_j[t])$$
Saddle point
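$$\frac{\partial F}{\partial v_k[t]} = 0, \qquad \frac{\partial F}{\partial u_k[t]} = 0, \qquad \forall k, t$$
$$\frac{\partial F}{\partial y_k} = \frac{dL(y \mid v, A)}{dy_k} = 0 \qquad \text{(M1)}$$
$$\frac{\partial F}{\partial A_{ab}} = \frac{dL(A \mid v, y)}{dA_{ab}} = 0 \qquad \text{(M2)}$$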
Updating rules
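$$u_k[t] = \frac{\partial L}{\partial \delta_k[t]} = \frac{1}{2}(x[t] - y_k)^T A\,(x[t] - y_k) \qquad \text{(E1)}$$
$$v_k[t] = \langle \delta_k[t] \rangle = \frac{\exp(-\beta u_k[t])}{\sum_l \exp(-\beta u_l[t])} \qquad \text{(E2)}$$
$$y_k = \frac{\sum_t v_k[t]\, x[t]}{\sum_t v_k[t]} \qquad \text{(M1)}$$
$$A = (W^{-1})^T, \qquad W_{ab} = \frac{1}{N}\sum_t\sum_k v_k[t]\,(x_a[t] - y_{ka})(x_b[t] - y_{kb}) \qquad \text{(M2)}$$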
Exercise
• Implement AEM clustering
Data set