Generalization Performance of
Exchange Monte Carlo Method for
Normal Mixture Models
Kenji Nagata, Sumio Watanabe
Tokyo Institute of Technology
Contents
Background
Normal Mixture Models
Bayesian Learning
MCMC method
Proposed method
Exchange Monte Carlo method
Application to Bayesian Learning
Experiment and Discussion
Conclusion
Background: Normal Mixture Models
A normal mixture model is a learning machine which estimates
a target probability density by a sum of normal distributions.
p(x \mid w) = \sum_{k=1}^{K} a_k \, g(x \mid b_k),
\quad g(x \mid b_k) = \frac{1}{(2\pi)^{M/2}} \exp\!\left( -\frac{\| x - b_k \|^2}{2} \right)

parameter: w = \{ a_k, b_k : k = 1, \dots, K \}, \quad 0 \le a_k \le 1, \ \sum_{k=1}^{K} a_k = 1

K: number of components
M: dimension of data
Normal mixture models are widely used in pattern recognition,
data clustering, and many other applications.
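For concreteness, a minimal Python sketch of the mixture density defined above; the function name and array layout are illustrative assumptions, not part of the slides.

<Python sketch>
import numpy as np

def mixture_density(x, a, b):
    """Normal mixture density p(x | w) = sum_k a_k g(x | b_k) with
    isotropic, unit-variance components g, as defined above.

    x : array of shape (M,)   -- a single data point
    a : array of shape (K,)   -- mixing weights, non-negative, summing to 1
    b : array of shape (K, M) -- component means
    """
    M = x.shape[0]
    # g(x | b_k) = (2*pi)^(-M/2) * exp(-||x - b_k||^2 / 2)
    sq_dist = np.sum((x - b) ** 2, axis=1)               # shape (K,)
    g = (2 * np.pi) ** (-M / 2) * np.exp(-sq_dist / 2)
    return np.dot(a, g)

# Example: a 2-component mixture in M = 3 dimensions
a = np.array([0.5, 0.5])
b = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
print(mixture_density(np.zeros(3), a, b))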
Background: Bayesian Learning
True distribution: q(x)
Training data: X^n = \{ X_1, X_2, \dots, X_n \}
Learning machine: p(x \mid w)
Prior distribution: \varphi(w)

Empirical Kullback information: H_n(w) = \frac{1}{n} \sum_{i=1}^{n} \log \frac{q(X_i)}{p(X_i \mid w)}

Posterior distribution: p(w \mid X^n) = \frac{1}{Z_0(X^n)} \exp(-n H_n(w)) \, \varphi(w)

Marginal likelihood: Z_0(X^n) = \int \exp(-n H_n(w)) \, \varphi(w) \, dw

Predictive distribution: p(x \mid X^n) = \int p(x \mid w) \, p(w \mid X^n) \, dw
Because these quantities are difficult to calculate analytically,
the Markov chain Monte Carlo (MCMC) method is widely used.
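In a simulation where the true density q(x) is known, the empirical Kullback information H_n(w) and the unnormalized posterior can be evaluated directly. A minimal sketch, assuming hypothetical callables q(x), p(x, w), and log_prior(w):

<Python sketch>
import numpy as np

def empirical_kullback(X, q, p, w):
    """H_n(w) = (1/n) * sum_i log( q(X_i) / p(X_i | w) ).

    X : array of shape (n, M) -- training data
    q : callable q(x)         -- true density (known in a simulation)
    p : callable p(x, w)      -- model density p(x | w)
    """
    return np.mean([np.log(q(x)) - np.log(p(x, w)) for x in X])

def log_unnormalized_posterior(X, q, p, w, log_prior):
    """log( exp(-n H_n(w)) * phi(w) ), i.e. the log posterior up to
    the additive constant -log Z_0(X^n)."""
    n = X.shape[0]
    return -n * empirical_kullback(X, q, p, w) + log_prior(w)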
Background: MCMC method
MCMC is an algorithm that generates a sample sequence
converging to a target distribution p(w) \propto \exp(-\hat{H}(w)).
<Metropolis algorithm>
1. Propose a candidate w' from a proposal distribution p(w' \mid w).
2. If \hat{H}(w') \le \hat{H}(w), set w' as the next position.
   If \hat{H}(w') > \hat{H}(w), set w' with probability r and keep w with probability 1 - r,
   where r = \exp(\hat{H}(w) - \hat{H}(w')).
The predictive distribution is approximated by the sample average

p(x \mid X^n) = \int p(x \mid w) \, p(w \mid X^n) \, dw \approx \frac{1}{J} \sum_{j=1}^{J} p(x \mid w^{(j)}),
\quad w^{(1)}, \dots, w^{(J)} \sim p(w \mid X^n).
Huge computational cost!!
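For concreteness, a minimal sketch of the Metropolis algorithm above and of the sample-average approximation of the predictive distribution; the Gaussian random-walk proposal and its step size are illustrative assumptions.

<Python sketch>
import numpy as np

def metropolis(H, w0, n_steps, step_size=0.1, rng=None):
    """Random-walk Metropolis sampler for p(w) proportional to exp(-H(w)).

    H         : callable returning the energy H_hat(w)
    w0        : initial parameter vector
    n_steps   : number of MC steps
    step_size : scale of the Gaussian proposal (tuning parameter)
    """
    rng = rng or np.random.default_rng()
    w, Hw = np.array(w0, dtype=float), H(w0)
    samples = []
    for _ in range(n_steps):
        w_new = w + step_size * rng.standard_normal(w.shape)  # proposal
        H_new = H(w_new)
        # accept with probability min(1, exp(H(w) - H(w')))
        if np.log(rng.uniform()) < Hw - H_new:
            w, Hw = w_new, H_new
        samples.append(w.copy())
    return samples

def predictive(x, samples, p):
    """p(x | X^n) ~ (1/J) * sum_j p(x | w^(j)) over posterior samples."""
    return np.mean([p(x, w) for w in samples])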
Purpose
We propose that the exchange MC
method is appropriate for Bayesian
learning in hierarchical learning machines.
We clarify its effectiveness by experimental results.
Contents
Background
Normal Mixture Models
Bayesian Learning
MCMC method
Proposed method
Exchange Monte Carlo method
Application to Bayesian Learning
Experiment and Discussion
Conclusion
Exchange Monte Carlo method
[Hukushima,96]
We consider obtaining sample sequences
from the following simultaneous (joint) distribution.
P(w_1, \dots, w_L; t_1, \dots, t_L) = \prod_{l=1}^{L} P_l(w_l),
\quad P_l(w) \propto \exp(-t_l \hat{H}(w))
<Algorithm>
The following two steps are performed alternately:
1. Each sequence w_l is updated according to its target distribution P_l(w)
   by the Metropolis algorithm, independently and simultaneously, for a few iterations.
2. An exchange of the two neighboring positions w_l and w_{l+1} is tried
   and accepted with the following probability:

P(w_l, w_{l+1}; t_l, t_{l+1}) = \min\!\left( 1, \exp(-v) \right),
\quad v = (t_{l+1} - t_l)\left( \hat{H}(w_l) - \hat{H}(w_{l+1}) \right).
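A minimal sketch of one possible implementation of the two alternating steps, assuming a hypothetical helper metropolis_step(H, t, w) that performs a few Metropolis updates targeting P_l(w) proportional to exp(-t * H(w)):

<Python sketch>
import numpy as np

def exchange_mc(H, temps, w_init, n_sweeps, metropolis_step, rng=None):
    """Exchange Monte Carlo for P_l(w) proportional to exp(-t_l * H(w)).

    H               : energy function H_hat(w)
    temps           : increasing temperatures t_1, ..., t_L
    w_init          : list of L initial positions
    metropolis_step : callable (H, t, w) -> new w (hypothetical helper)
    """
    rng = rng or np.random.default_rng()
    L = len(temps)
    ws = [np.array(w, dtype=float) for w in w_init]
    history = []
    for _ in range(n_sweeps):
        # Step 1: update every chain independently by the Metropolis algorithm
        ws = [metropolis_step(H, t, w) for t, w in zip(temps, ws)]
        # Step 2: try to exchange neighbouring positions w_l and w_{l+1}
        for l in range(L - 1):
            v = (temps[l + 1] - temps[l]) * (H(ws[l]) - H(ws[l + 1]))
            if np.log(rng.uniform()) < -v:   # accept with prob min(1, exp(-v))
                ws[l], ws[l + 1] = ws[l + 1], ws[l]
        history.append([w.copy() for w in ws])
    return history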
Exchange Monte Carlo method
[Hukushima,96]
[Figure: the Metropolis algorithm samples from a single target distribution P(w), whereas the exchange Monte Carlo method samples simultaneously from a family of distributions P_1(w), P_2(w), P_3(w), P_4(w) and exchanges positions between neighboring distributions.]
Application to Bayesian learning
For Bayesian learning, the target distribution at temperature t is the tempered posterior

p(w \mid X^n, t) = \frac{1}{Z_0(X^n, t)} \exp(-n t H_n(w)) \, \varphi(w).

The exchange probability becomes

P(w_l, w_{l+1}; t_l, t_{l+1}) = \min\!\left( 1, \exp(-v) \right),
\quad v = n (t_{l+1} - t_l)\left( H_n(w_l) - H_n(w_{l+1}) \right).
<example>
w = (w^{(1)}, w^{(2)}), \quad H_n(w) = (w^{(1)})^2 (w^{(2)})^2, \quad \varphi(w): standard normal distribution

[Figure: the target distribution p(w \mid X^n, t) for t = 0 (the prior), 0 < t < 1, and t = 1 (the posterior).]
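Because the prior φ(w) is shared by all chains, it cancels in the exchange probability, so only H_n enters. A minimal sketch of this acceptance rule (the function name is illustrative):

<Python sketch>
import numpy as np

def bayes_exchange_prob(H_n, w_l, w_l1, t_l, t_l1, n):
    """Exchange acceptance probability for the tempered posteriors
    p(w | X^n, t) proportional to exp(-n t H_n(w)) phi(w).
    The prior cancels, so v = n (t_{l+1} - t_l) (H_n(w_l) - H_n(w_{l+1}))
    and the exchange is accepted with probability min(1, exp(-v))."""
    v = n * (t_l1 - t_l) * (H_n(w_l) - H_n(w_l1))
    return min(1.0, np.exp(-v))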
Contents
Background
Normal Mixture Models
Bayesian Learning
MCMC method
Proposed method
Exchange Monte Carlo method
Application to Bayesian Learning
Experiment and Discussion
Conclusion
Experimental Settings
<Bayesian Learning in Normal Mixture Model>
dimension of data: M = 3
number of training data: n = 500

<True distribution> (2 components)
q(x) = \sum_{k=1}^{2} a_k^* \, g(x \mid b_k^*),
\quad a^* = (0.52, \ 0.48),
\quad b^* = \begin{pmatrix} 1.19 & 1.43 & 3.50 \\ 3.54 & 2.01 & 2.35 \end{pmatrix}

<Learning machine> (5 components)
p(x \mid w) = \sum_{k=1}^{5} a_k \, g(x \mid b_k)
<Prior distribution>
\varphi(a_k): uniform distribution on [0, 1]
\varphi(b_k): standard normal distribution
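A minimal sketch of this experimental setting in Python. Drawing the mixing weights from a Dirichlet(1, ..., 1) distribution (uniform on the simplex) is an assumption about how the uniform prior on [0, 1] is realized, not something stated on the slide.

<Python sketch>
import numpy as np

# Setting from the slides: M = 3, n = 500, true distribution with 2
# components, learning machine with 5 components.
M, n_train, K_true, K_model = 3, 500, 2, 5

a_true = np.array([0.52, 0.48])
b_true = np.array([[1.19, 1.43, 3.50],
                   [3.54, 2.01, 2.35]])

def sample_true(n, rng=None):
    """Draw n samples from q(x) = sum_k a*_k g(x | b*_k), unit covariance."""
    rng = rng or np.random.default_rng()
    ks = rng.choice(K_true, size=n, p=a_true)           # pick a component
    return b_true[ks] + rng.standard_normal((n, M))     # add isotropic noise

def sample_prior(rng=None):
    """Draw one parameter w = (a, b) from the prior: a uniform on the
    simplex (an assumption), b_k standard normal."""
    rng = rng or np.random.default_rng()
    a = rng.dirichlet(np.ones(K_model))
    b = rng.standard_normal((K_model, M))
    return a, b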
Experimental Settings
<Setting of MCMC method>
1.Exchange Monte Carlo method (EMC)
[Figure: L chains run in parallel at temperatures t_1 = 0, t_2, \dots, t_{L-1}, t_L = 1; the samples used for the expectation are taken from the t_L = 1 chain and are shown against the Monte Carlo (MC) step.]
Experimental Settings
<Setting of MCMC method>
2. Conventional Metropolis algorithm (CM)
[Figure: a single Metropolis chain samples from the posterior p(w \mid X^n); the samples used for the expectation are shown against the MC step.]
Experimental Settings
<Setting of MCMC method>
3. Parallel Metropolis algorithm (PM)
[Figure: L independent Metropolis chains each sample from the posterior p(w \mid X^n); their samples are used for the expectation, shown against the MC step.]
Experimental Settings
<Setting of MCMC method>
<setting of temperature>
\{ t_1, \dots, t_L \}: \quad L = 42, \qquad
t_l = \begin{cases} 0 & (l = 1) \\ 1.25^{\,l - L} & (\text{otherwise}) \end{cases}
<setting of Metropolis algorithm>
• Initial value: random sampling from the prior distributions \varphi(a_k) and \varphi(b_k).
•For calculating the expectation,
we use the last 50% of the sample sequence.
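A one-line sketch of this temperature schedule:

<Python sketch>
# L = 42 temperatures: t_1 = 0 and t_l = 1.25^(l - L) otherwise, so t_L = 1.
L = 42
temps = [0.0] + [1.25 ** (l - L) for l in range(2, L + 1)]
assert len(temps) == L and temps[-1] == 1.0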
Experimental result (histogram)
Marginal posterior distribution of the parameter a_1 (MC step: 3200).

[Figure: histograms of a_1 obtained by 1. EMC, 2. CM, and 3. PM. The true marginal distribution has two peaks, around 0 and around 0.5.]
The algorithm CM cannot approximate
the Bayesian posterior distribution.
Experimental result (generalization error)
Convergence of the generalization error
The generalization error is estimated with test data X'^{\,n'} = \{ X'_1, X'_2, \dots, X'_{n'} \}, \ n' = 2500:

G(X^n) = \frac{1}{n'} \sum_{i=1}^{n'} \log \frac{q(X'_i)}{p(X'_i \mid X^n)}

[Figure: generalization error versus MC step (from 100 to 3200) for 1. EMC, 2. CM, and 3. PM.]
EMC provides smaller generalization error than CM.
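A minimal sketch of how G(X^n) can be estimated from test data and posterior samples, assuming the true density q is available, as it is in this simulation:

<Python sketch>
import numpy as np

def generalization_error(X_test, q, p_model, posterior_samples):
    """G(X^n) ~ (1/n') * sum_i log( q(X'_i) / p(X'_i | X^n) ), where the
    predictive density p(x | X^n) is the average of p(x | w^(j)) over
    posterior samples w^(j).

    X_test            : array of shape (n', M) -- test data drawn from q
    q                 : callable, true density q(x)
    p_model           : callable, model density p(x, w)
    posterior_samples : list of parameter values w^(j)
    """
    G = 0.0
    for x in X_test:
        pred = np.mean([p_model(x, w) for w in posterior_samples])
        G += np.log(q(x)) - np.log(pred)
    return G / len(X_test)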
Contents
Background
Normal Mixture Models
Bayesian Learning
MCMC method
Proposed method
Exchange Monte Carlo method
Application to Bayesian Learning
Experiment and Discussion
Conclusion
Conclusion
We proposed that the exchange MC method is
appropriate for Bayesian learning in
hierarchical learning machines.
We clarified its effectiveness through simulations of
Bayesian learning in normal mixture models.
The experimental results showed that:
• The exchange MC method approximates the Bayesian posterior distribution more accurately than the Metropolis algorithm.
• The exchange MC method provides better generalization performance than the Metropolis algorithm.