Generalization Performance of
Exchange Monte Carlo Method for
Normal Mixture Models
Kenji Nagata, Sumio Watanabe
Tokyo Institute of Technology
Contents
- Background
  - Normal Mixture Models
  - Bayesian Learning
  - MCMC method
- Proposed method
  - Exchange Monte Carlo method
  - Application to Bayesian Learning
- Experiment and Discussion
- Conclusion
Background: Normal Mixture Models
A normal mixture model is a learning machine that estimates a target probability density by a sum of normal distributions:

p(x | w) = \sum_{k=1}^{K} a_k \, g(x | b_k), \qquad
g(x | b_k) = \frac{1}{(2\pi)^{M/2}} \exp\left( -\frac{\| x - b_k \|^2}{2} \right)

K: number of components
M: dimension of data
parameter: w = \{ a_k, b_k : k = 1, \dots, K \}, \quad 0 \le a_k \le 1, \quad \sum_{k=1}^{K} a_k = 1

Normal mixture models are widely used in pattern recognition, data clustering, and many other applications.
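As an illustration, here is a minimal sketch of this density in Python/NumPy; the component number, data dimension, and parameter values are arbitrary choices for the example, not the experimental settings used later in the talk.

import numpy as np

def normal_mixture_density(x, a, b):
    """p(x | w) = sum_k a_k * g(x | b_k), with isotropic unit-variance
    normal components g in M dimensions."""
    x = np.asarray(x, dtype=float)
    M = x.shape[-1]
    # g(x | b_k) = (2*pi)^(-M/2) * exp(-||x - b_k||^2 / 2)
    sq_dist = np.sum((x - b) ** 2, axis=-1)             # shape (K,)
    g = (2 * np.pi) ** (-M / 2) * np.exp(-sq_dist / 2)
    return np.dot(a, g)

# Example: K = 2 components in M = 2 dimensions (illustrative values only).
a = np.array([0.3, 0.7])                    # mixing weights, sum to 1
b = np.array([[0.0, 0.0], [2.0, 2.0]])      # component centers
print(normal_mixture_density([1.0, 1.0], a, b))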
Background: Bayesian Learning
q(x): true distribution generating the training data X^n = \{X_1, X_2, \dots, X_n\}
p(x | w): learning machine, \qquad \varphi(w): prior distribution

Empirical Kullback information: H_n(w) = \frac{1}{n} \sum_{i=1}^{n} \log \frac{q(X_i)}{p(X_i | w)}

Posterior distribution: p(w | X^n) = \frac{1}{Z_0(X^n)} \exp(-n H_n(w)) \varphi(w)

Marginal likelihood: Z_0(X^n) = \int \exp(-n H_n(w)) \varphi(w) \, dw

Predictive distribution: p(x | X^n) = \int p(x | w) \, p(w | X^n) \, dw

Because these quantities are difficult to calculate analytically, the Markov chain Monte Carlo (MCMC) method is widely used.
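Note that \exp(-n H_n(w)) equals the likelihood \prod_i p(X_i | w) up to the w-independent factor \prod_i q(X_i), which cancels against Z_0(X^n); MCMC therefore only needs the un-normalized log posterior, i.e. log-likelihood plus log-prior. A minimal sketch, assuming the isotropic unit-variance normal mixture defined earlier and a generic log_prior(a, b) function supplied by the caller (a hypothetical placeholder):

import numpy as np

def log_likelihood(X, a, b):
    """sum_i log p(X_i | w) for the isotropic unit-variance normal mixture."""
    M = X.shape[1]
    # (n, K) matrix of squared distances ||X_i - b_k||^2
    sq_dist = np.sum((X[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    g = (2 * np.pi) ** (-M / 2) * np.exp(-sq_dist / 2)
    return np.sum(np.log(g @ a))

def log_posterior_unnormalized(X, a, b, log_prior):
    """log p(w | X^n) up to an additive constant:
    -n*H_n(w) + log(phi(w)) + const, because the q(X_i) terms cancel."""
    return log_likelihood(X, a, b) + log_prior(a, b)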
Background: MCMC method
An algorithm that generates a sample sequence converging to the target distribution p(w) \propto \exp(-\hat{H}(w)).
<Metropolis algorithm>
From the current position w, propose a candidate w' with proposal probability p(w' | w), then:
- if \hat{H}(w') \le \hat{H}(w): set w' as the next position;
- if \hat{H}(w') > \hat{H}(w): set w' with probability r, and keep w with probability 1 - r,
  where r = \exp\{ -(\hat{H}(w') - \hat{H}(w)) \}.

Applied to the posterior, the predictive distribution is approximated by a sample average:

p(x | X^n) = \int p(x | w) \, p(w | X^n) \, dw \approx \frac{1}{J} \sum_{j=1}^{J} p(x | w^{(j)}),
\qquad w^{(1)}, \dots, w^{(J)} \sim p(w | X^n)
Huge computational cost!!
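A minimal sketch of the Metropolis update and of the sample-average approximation of the predictive distribution; the Gaussian random-walk proposal and its step size are illustrative assumptions, not part of the slide.

import numpy as np

def metropolis(H_hat, w0, n_steps, step=0.1, rng=None):
    """Sample from p(w) proportional to exp(-H_hat(w)) with a random-walk proposal."""
    rng = np.random.default_rng() if rng is None else rng
    w = np.array(w0, dtype=float)
    h = H_hat(w)
    samples = []
    for _ in range(n_steps):
        w_new = w + step * rng.standard_normal(w.shape)    # propose w'
        h_new = H_hat(w_new)
        # accept if H(w') <= H(w); otherwise with probability exp(-(H(w') - H(w)))
        if h_new <= h or rng.random() < np.exp(-(h_new - h)):
            w, h = w_new, h_new
        samples.append(w.copy())
    return np.array(samples)

def predictive(x, samples, p_x_given_w):
    """p(x | X^n) is approximated by (1/J) * sum_j p(x | w^(j)) over posterior samples."""
    return np.mean([p_x_given_w(x, w) for w in samples])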
Purpose
- We propose that the exchange MC method is appropriate for Bayesian learning in hierarchical learning machines.
- We clarify its effectiveness by experimental results.
Contents
- Background
  - Normal Mixture Models
  - Bayesian Learning
  - MCMC method
- Proposed method
  - Exchange Monte Carlo method
  - Application to Bayesian Learning
- Experiment and Discussion
- Conclusion
Exchange Monte Carlo method
[Hukushima,96]
We consider obtaining a sample sequence from the following simultaneous distribution:

P(\{w_l\}; \{t_l\}) = \prod_{l=1}^{L} P_l(w_l), \qquad P_l(w) \propto \exp(-t_l \hat{H}(w))

<Algorithm>
The following two steps are performed alternately:
1. Each sequence \{w_l\} is updated under its own target distribution P_l(w) by the Metropolis algorithm, independently and simultaneously, for a few iterations.
2. An exchange of the two positions w_l and w_{l+1} is tried, and accepted with the probability

P(w_l, w_{l+1}; t_l, t_{l+1}) = \min(1, \exp(v)), \qquad v = (t_{l+1} - t_l)\{\hat{H}(w_{l+1}) - \hat{H}(w_l)\}
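A minimal sketch of the two alternating steps, reusing the metropolis sketch above; the number of inner Metropolis iterations and the scheme of trying every neighboring pair in each round are illustrative assumptions.

import numpy as np

def exchange_mc(H_hat, temperatures, w_init, n_rounds, n_inner=5, rng=None):
    """Exchange Monte Carlo: L replicas with targets P_l(w) proportional to exp(-t_l * H_hat(w))."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.asarray(temperatures, dtype=float)
    ws = [np.array(w, dtype=float) for w in w_init]
    history = [[] for _ in t]
    for _ in range(n_rounds):
        # Step 1: update every replica independently with the Metropolis algorithm.
        for l in range(len(t)):
            chain = metropolis(lambda w, tl=t[l]: tl * H_hat(w), ws[l], n_inner, rng=rng)
            ws[l] = chain[-1]
        # Step 2: try to exchange each pair of neighboring replicas.
        for l in range(len(t) - 1):
            v = (t[l + 1] - t[l]) * (H_hat(ws[l + 1]) - H_hat(ws[l]))
            if rng.random() < np.exp(min(0.0, v)):   # accept with probability min(1, exp(v))
                ws[l], ws[l + 1] = ws[l + 1], ws[l]
        for l in range(len(t)):
            history[l].append(ws[l].copy())
    return [np.array(h) for h in history]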
Exchange Monte Carlo method
[Hukushima,96]
(Figure: the Metropolis algorithm samples from a single target distribution P(w), whereas the exchange Monte Carlo method runs replicas for P_1(w), P_2(w), P_3(w), P_4(w) in parallel and exchanges states between neighboring distributions.)
Application to Bayesian learning
The target distributions are the tempered posterior distributions

p(w | X^n, t) = \frac{1}{Z_0(X^n, t)} \exp(-n t H_n(w)) \varphi(w),

and the exchange probability becomes

P(w_l, w_{l+1}; t_l, t_{l+1}) = \min(1, \exp(v)), \qquad v = n (t_{l+1} - t_l)\{H_n(w_{l+1}) - H_n(w_l)\}

<example>
w = (w^{(1)}, w^{(2)}), \qquad H_n(w) = (w^{(1)})^2 (w^{(2)})^2, \qquad \varphi(w): standard normal distribution
(Figure: the tempered distributions p(w | X^n, t) interpolate between the prior at t = 0 and the posterior at t = 1 as t increases through 0 < t < 1.)
Contents
- Background
  - Normal Mixture Models
  - Bayesian Learning
  - MCMC method
- Proposed method
  - Exchange Monte Carlo method
  - Application to Bayesian Learning
- Experiment and Discussion
- Conclusion
Experimental Settings
<Bayesian Learning in Normal Mixture Models>
dimension of data: M = 3
number of training data: n = 500

<True distribution> (2 components)
q(x) = \sum_{k=1}^{2} a_k^* \, g(x | b_k^*), \qquad
a^* = (0.52, \; 0.48), \qquad
b^* = \begin{pmatrix} 1.19 & 1.43 & 3.50 \\ 3.54 & 2.01 & 2.35 \end{pmatrix}

<Learning machine> (5 components)
p(x | w) = \sum_{k=1}^{5} a_k \, g(x | b_k)

<Prior distribution>
\varphi(a_k): uniform distribution on [0, 1]
\varphi(b_k): standard normal distribution
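A minimal sketch of this setup: drawing the n = 500 training points from the two-component true mixture and drawing an initial parameter of the five-component learner from the prior. The signs of the true means are taken as printed above, unit-variance isotropic components are assumed as in the model definition, and the simple normalization of the mixing weights is a simplification of sampling them on the simplex.

import numpy as np

rng = np.random.default_rng(0)
M, n = 3, 500
a_true = np.array([0.52, 0.48])
b_true = np.array([[1.19, 1.43, 3.50],
                   [3.54, 2.01, 2.35]])

# Draw X^n from q(x): choose a component, then add N(0, I) noise.
z = rng.choice(2, size=n, p=a_true)
X = b_true[z] + rng.standard_normal((n, M))

# Draw an initial parameter of the K = 5 learner from the prior:
# a_k ~ Uniform[0, 1] (then normalized to sum to 1), b_k ~ N(0, I).
K = 5
a0 = rng.uniform(0.0, 1.0, size=K)
a0 /= a0.sum()
b0 = rng.standard_normal((K, M))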
Experimental Settings
<Setting of MCMC method>
1. Exchange Monte Carlo method (EMC)
(Figure: L chains run in parallel at temperatures t_1 = 0, t_2, \dots, t_{L-1}, t_L = 1; the samples used for the expectation are taken from the t_L = 1 chain along the Monte Carlo (MC) steps.)
Experimental Settings
<Setting of MCMC method>
2. Conventional Metropolis algorithm (CM)
(Figure: a single chain samples from the posterior p(w | X^n); the samples used for the expectation are taken from this chain along the MC steps.)
Experimental Settings
<Setting of MCMC method>
3. Parallel Metropolis algorithm (PM)
n
p( w | X )
p( w | X n )
L
p( w | X n )
p( w | X n )
Sample for expectation
MC Step
Experimental Settings
<Setting of MCMC method>
<Setting of temperature>
\{t_1, \dots, t_L\}: \quad L = 42, \qquad
t_l = \begin{cases} 0 & (l = 1) \\ 1.25^{\,l - L} & (\text{otherwise}) \end{cases}

<Setting of Metropolis algorithm>
- Initial value: random sampling from the prior distributions \varphi(a_k), \varphi(b_k).
- For calculating the expectation, we use the last 50% of the sample sequence.
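A minimal sketch of this temperature ladder:

import numpy as np

L = 42
# t_1 = 0, t_l = 1.25^(l - L) for l = 2, ..., L, so that t_L = 1.
t = np.array([0.0] + [1.25 ** (l - L) for l in range(2, L + 1)])
assert len(t) == L and t[0] == 0.0 and t[-1] == 1.0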
Experimental result (histogram)
Marginal posterior distribution of the parameter a_1 (MC step: 3200), for 1. EMC, 2. CM, 3. PM.
(Figure: histograms of a_1 for each method. The true marginal distribution has two peaks, around 0 and around 0.5.)
The algorithm CM cannot approximate the Bayesian posterior distribution.
Experimental result (generalization error)
Convergence of the generalization error, measured on test data X'^{n'} = \{X'_1, X'_2, \dots, X'_{n'}\} with n' = 2500:

G(X'^{n'}) = \frac{1}{n'} \sum_{i=1}^{n'} \log \frac{q(X'_i)}{p(X'_i | X^n)}

(Figure: generalization error versus MC step, from 100 to 3200, for 1. EMC, 2. CM, 3. PM.)
EMC provides a smaller generalization error than CM.
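A minimal sketch of this estimator, assuming functions q_density(x) and predictive(x) for the true density and the MCMC-approximated predictive distribution (both hypothetical placeholders here):

import numpy as np

def generalization_error(X_test, q_density, predictive):
    """G = (1/n') * sum_i log( q(X'_i) / p(X'_i | X^n) ) over the test data."""
    return float(np.mean([np.log(q_density(x) / predictive(x)) for x in X_test]))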
Contents
- Background
  - Normal Mixture Models
  - Bayesian Learning
  - MCMC method
- Proposed method
  - Exchange Monte Carlo method
  - Application to Bayesian Learning
- Experiment and Discussion
- Conclusion
Conclusion
- We proposed that the exchange MC method is appropriate for Bayesian learning in hierarchical learning machines.
- We clarified its effectiveness by a simulation of Bayesian learning in normal mixture models.
- Experimental results show that:
  - the exchange MC method approximates the Bayesian posterior distribution more accurately than the Metropolis algorithm;
  - the exchange MC method provides better generalization performance than the Metropolis algorithm.