Simulated Annealing &
Boltzmann Machines
Tai-Wen Yue (虞台文)
Graduate Institute of Computer Science and Engineering, Tatung University
Intelligent Multimedia Lab
Content
• Overview
• Simulated Annealing
• Deterministic Annealing
• Boltzmann Machines
Overview
0 E  0
P  E   
1 E  0
Hill Climbing
E > 0
E
E < 0
E: cost (energy)
The Problem with Hill Climbing
• Gets stuck at local minima
– Gradient descent approach
– Hopfield neural networks
• Possible solutions
– Try different initial states
– Increase the size of the neighborhood (e.g. in TSP try 3-opt rather than 2-opt)
Goal: escape from local minima.
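The "try different initial states" remedy can be sketched as follows. This is a minimal illustration, assuming a hypothetical one-dimensional cost with one local and one global minimum; the function `cost`, the step size, and the two start points are illustrative choices, not taken from the slides.

```python
def cost(x):
    # Hypothetical cost: local minimum near x = 1.35, global minimum near x = -1.47
    return x**4 - 4 * x**2 + x

def hill_climb(x, step=0.01, max_iters=10_000):
    """Greedy descent: move to a neighbor only if it lowers the cost."""
    for _ in range(max_iters):
        best = min((x - step, x + step), key=cost)
        if cost(best) >= cost(x):   # no improving neighbor: stuck at a minimum
            return x
        x = best
    return x

starts = [-2.0, 2.0]
results = [hill_climb(s) for s in starts]
for s, r in zip(starts, results):
    print(f"start {s:+.1f} -> x = {r:+.2f}, cost = {cost(r):.3f}")
```

The run started at 2.0 stops in the shallower minimum, while the run started at -2.0 reaches the deeper one, which is why restarts alone give no guarantee.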
Stochastic Approaches


Stochastic optimization refers to the
minimization (or maximization) of a
function in the presence of randomness in
the optimization process.
The randomness may be present as either
noise in measurements or Monte Carlo
randomness in the search procedure, or
both.
Two Important Methods
• Simulated Annealing (SA)
– Motivated by the physical annealing process
– Evolution from a single solution
• Genetic Algorithms (GA)
– Motivated by the evolution process of biology
– Evolution from multiple solutions
References for the two methods:
• SA: Kirkpatrick, S., Gelatt, C.D., Vecchi, M.P. 1983. "Optimization by Simulated Annealing." Science, vol. 220, no. 4598, pp. 671-680.
• GA: Holland, J. Adaptation in Natural and Artificial Systems, University of Michigan Press, 1975.
Simulated Annealing
Global Optimization
Statistical Mechanics in a Nutshell
Statistical mechanics is the study of the behavior of very large systems of interacting components in thermal equilibrium at a temperature, say T.
[Figure: a system of interacting components at temperature T]
$$P(E) \propto e^{-E / k_B T}$$
$$Z(T) = \sum_{E} e^{-E / k_B T}$$
$$P(E) = \frac{e^{-E / k_B T}}{Z(T)}$$
Boltzmann Factor
$$P(E) = \frac{e^{-E / k_B T}}{Z(T)}, \qquad Z(T) = \sum_{E} e^{-E / k_B T}$$
k_B: Boltzmann constant, $k_B \approx 1.380649 \times 10^{-23}$ J/K
Z(T): Boltzmann partition function
Raising temperature
• the system becomes more "active"
• the average energy becomes higher
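The effect of temperature on the Boltzmann distribution can be checked numerically. The sketch below assumes four hypothetical, equally spaced energy levels and works in units where k_B = 1; neither choice comes from the slides.

```python
import math

def boltzmann(energies, T, k_B=1.0):
    """Return P(E) = exp(-E / (k_B T)) / Z(T) for each energy level."""
    weights = [math.exp(-E / (k_B * T)) for E in energies]
    Z = sum(weights)                      # partition function Z(T)
    return [w / Z for w in weights]

def mean_energy(energies, T):
    """Average energy under the Boltzmann distribution at temperature T."""
    return sum(p * E for p, E in zip(boltzmann(energies, T), energies))

energies = [0.0, 1.0, 2.0, 3.0]           # hypothetical energy levels
for T in (0.5, 1.0, 5.0):
    probs = [round(p, 3) for p in boltzmann(energies, T)]
    print(f"T = {T}: P = {probs}, mean energy = {mean_energy(energies, T):.3f}")
```

As T grows, the probabilities flatten toward uniform and the mean energy rises, matching the two bullet points above.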
[Figure: P(E) versus E for three temperatures T1 < T2 < T3; E ranges from 0 to 50]
Simulation
Metropolis Acceptance Criterion
ΔE: change in cost (energy)
$$P(\Delta E) = \begin{cases} e^{-\Delta E / T}, & \Delta E \ge 0 \\ 1, & \Delta E < 0 \end{cases}$$
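A minimal sketch of the Metropolis acceptance criterion as a function. The names `acceptance_probability` and `accept` are illustrative, and the temperatures in the loop are arbitrary sample values.

```python
import math
import random

def acceptance_probability(delta_e, T):
    """Metropolis rule: 1 for downhill moves, exp(-dE/T) for uphill moves."""
    return 1.0 if delta_e < 0 else math.exp(-delta_e / T)

def accept(delta_e, T, rng=random):
    """Draw a uniform number and accept the move with the probability above."""
    return rng.random() < acceptance_probability(delta_e, T)

# Probability of accepting an uphill move with dE = 1 at several temperatures.
for T in (0.5, 1.0, 5.0):
    print(T, round(acceptance_probability(1.0, T), 3))
```

The same uphill move is accepted more often at higher temperature, and as T approaches 0 the rule reduces to hill climbing.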
[Figure: acceptance probability p(ΔE) versus ΔE for temperatures T1 < T2 < T3]
Simulated Annealing Algorithm
Create initial solution S
Initialize temperature T
repeat
– for k = 1 to iteration-length do
• Generate a random transition from S to S’
• Let ΔE = E(S’) - E(S)
• if ΔE < 0 then S = S’
• else if exp[-ΔE/T] > rand(0,1) then S = S’
– Reduce temperature T
until no change in E(S)
Return S
Hill climbing corresponds to keeping only the step "if ΔE < 0 then S = S’"; the probabilistic acceptance of uphill moves is what simulated annealing adds.
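The algorithm can be sketched in Python as below. Several details are assumptions made for illustration: the stop rule is a temperature floor `T_min` rather than "no change in E(S)", the best solution seen is tracked and returned, and the cost function, neighbor move, and parameter values are hypothetical.

```python
import math
import random

def simulated_annealing(energy, neighbor, s, T=10.0, alpha=0.9,
                        iteration_length=100, T_min=1e-3, seed=0):
    rng = random.Random(seed)
    e = energy(s)
    best, best_e = s, e
    while T > T_min:                      # assumed stop rule: temperature floor
        for _ in range(iteration_length):
            s_new = neighbor(s, rng)      # random transition S -> S'
            delta_e = energy(s_new) - e
            # Accept downhill moves always, uphill moves with prob. exp(-dE/T).
            if delta_e < 0 or math.exp(-delta_e / T) > rng.random():
                s, e = s_new, e + delta_e
                if e < best_e:            # bookkeeping, not in the pseudocode
                    best, best_e = s, e
        T *= alpha                        # reduce temperature
    return best, best_e

def cost(x):
    # Hypothetical cost with a local minimum near 1.35 and a global one near -1.47
    return x**4 - 4 * x**2 + x

def step(x, rng):
    return x + rng.uniform(-0.5, 0.5)

x, e = simulated_annealing(cost, step, s=2.0)
print(f"x = {x:.3f}, cost = {e:.3f}")
```

Starting from x = 2.0, plain hill climbing would stay in the nearby local minimum; the uphill acceptances let the search cross the barrier.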
Main Components of SA
• Solution representation
– Appropriate for computing energy (cost)
• Transition mechanism between solutions
– Incremental changes of solutions
• Cooling schedule
– Initial system temperature
– Temperature decrement function
– Number of iterations between temperature change
– Acceptance criteria
– Stop criteria
Example: Traveling Salesman Problem
Given the locations of n cities in a two-dimensional space, find the tour of minimum length. The salesman must visit every city exactly once and return to the starting city, forming a closed path.
Solution Representation (TSP)
Assume cities are fully connected with symmetric distances.
A solution (tour) is a sequence of cities, for example:
1 2 3 4 6 5 7 9 11 8 10
[Figure: the 11 cities and the tour above, starting from city 1]
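Energy (Cost) Computation (TSP)
Tour: 1 2 3 4 6 5 7 9 11 8 10
Edges: d12 d23 d34 d46 d65 d57 d79 d9,11 d11,8 d8,10, plus the closing edge d10,1
The energy is the total tour length:
$$E = d_{1,2} + d_{2,3} + d_{3,4} + d_{4,6} + d_{6,5} + d_{5,7} + d_{7,9} + d_{9,11} + d_{11,8} + d_{8,10} + d_{10,1}$$
[Figure: the tour drawn on the 11 cities with each edge labeled by its distance]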
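The cost computation amounts to summing consecutive distances around the closed tour. A minimal sketch, assuming Euclidean distances and a hypothetical four-city layout (the slides give no coordinates):

```python
import math

def tour_length(tour, coords):
    """Sum of distances between consecutive cities, including the closing edge."""
    n = len(tour)
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % n]])
               for i in range(n))

# Hypothetical coordinates: four cities on the corners of a unit square.
coords = {1: (0, 0), 2: (1, 0), 3: (1, 1), 4: (0, 1)}
print(tour_length([1, 2, 3, 4], coords))   # tour along the perimeter
print(tour_length([1, 3, 2, 4], coords))   # tour that uses both diagonals
```

The `% n` index wraps the last city back to the first, which supplies the closing edge (d10,1 in the tour above).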
State Transition (TSP)
Current tour: 1 2 3 4 5 6 7 8 9 10
Edges: d12 d23 d34 d45 d56 d67 d78 d89 d9,10, plus the closing edge d10,1
1. Randomly select two edges (here d34 and d89)
2. Swap the path (reverse the segment between the two edges)
New tour: 1 2 3 8 7 6 5 4 9 10
Edges: d12 d23 d38 d87 d76 d65 d54 d49 d9,10, plus the closing edge d10,1
[Figure: the 10-city tour before and after the move]
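The two-step move above is a segment reversal on the tour list. The sketch below uses illustrative function names; for simplicity the random variant never selects the closing edge, which is a simplifying assumption rather than something the slides specify.

```python
import random

def two_opt_move(tour, i, j):
    """Remove edges (tour[i], tour[i+1]) and (tour[j], tour[j+1]), i < j,
    and reconnect by reversing the segment tour[i+1 .. j]."""
    return tour[:i + 1] + tour[i + 1:j + 1][::-1] + tour[j + 1:]

def random_two_opt(tour, rng=random):
    """Pick two distinct edge positions at random and apply the move.
    The closing edge (last city back to first) is never selected here."""
    i, j = sorted(rng.sample(range(len(tour) - 1), 2))
    return two_opt_move(tour, i, j)

tour = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# Edges d34 and d89 start at list positions 2 and 7.
print(two_opt_move(tour, 2, 7))
```

Because only two edges change, the energy difference can be computed from four distances instead of re-summing the whole tour.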
Cooling Schedules
Geometric Schedule
$$T_k = \alpha \, T_{k-1}$$
Empirical evidence shows that typically 0.8 ≤ α ≤ 0.99 yields successful applications (fairly slow cooling schedules).
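To get a feel for how α controls the length of a run, the following sketch counts the temperature steps of a geometric schedule. The start and end temperatures (100 and 0.01) are arbitrary illustrative values.

```python
import math

def geometric_schedule(T0, alpha, T_final):
    """Yield T0, alpha*T0, alpha^2*T0, ... while the temperature exceeds T_final."""
    T = T0
    while T > T_final:
        yield T
        T *= alpha

def steps_needed(T0, alpha, T_final):
    """Smallest k with T0 * alpha**k <= T_final."""
    return math.ceil(math.log(T_final / T0) / math.log(alpha))

for alpha in (0.8, 0.9, 0.99):
    print(alpha, steps_needed(100.0, alpha, 0.01))
```

Cooling by four orders of magnitude takes 42 steps at α = 0.8, 88 steps at α = 0.9, and 917 steps at α = 0.99, so the upper end of the recommended range is roughly twenty times slower than the lower end.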
100-city TSP Simulation
100 cities are randomly chosen from a 10 × 10 square.
1000N iterations are made for each test.
Each temperature T is held for 100N reconfigurations or 10N successful reconfigurations, whichever comes first. T is reduced by 10% each time.

Total length: direct search versus simulated annealing (T0 is the starting temperature)

Test      Direct search   T0=1     T0=2     T0=5     T0=10    T0=25    T0=50    T0=100
Test1     89.211          82.757   82.732   79.113   81.792   82.701   79.405   79.528
Test2     89.755          81.325   81.334   80.532   83.166   80.461   79.25    82.549
Test3     81.44           82.063   84.296   80.629   81.658   80.35    79.21    79.933
Test4     85.038          82.449   82.388   80.996   79.764   82.688   82.131   84.17
Test5     87.256          80.658   80.814   82.261   80.82    81.338   80.406   79.869
Test6     88.989          82.92    81.284   82.607   80.599   83.526   81.43    80.894
Test7     85.895          84.479   80.807   80.402   80.837   79.504   81.052   82.201
Test8     85.654          80.549   81.251   80.195   82.878   80.169   80.604   80.125
Test9     88.246          80.461   79.832   79.123   80.617   81.676   80.741   81.306
Test10    84.586          82.446   82.855   81.249   82.885   79.68    80.532   81.665
Average   86.607          82.011   81.759   80.711   81.501   81.21    80.476   81.224
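The experimental setup described above can be sketched as follows. This is not the code behind the table and will not reproduce its numbers. Assumptions: N is taken to be the number of cities, the move is a segment reversal between two random positions, and the run stops at a temperature floor `T_min` (the slides do not state the stop rule). The example uses 12 cities so that it runs quickly.

```python
import math
import random

def tour_length(tour, pts):
    """Closed-tour length; index -1 supplies the closing edge."""
    return sum(math.dist(pts[tour[i]], pts[tour[i - 1]]) for i in range(len(tour)))

def anneal_tsp(pts, T0=10.0, T_min=0.01, seed=0):
    rng = random.Random(seed)
    n = len(pts)
    tour = list(range(n))
    rng.shuffle(tour)
    length = tour_length(tour, pts)
    T = T0
    while T > T_min:                       # assumed stop rule
        tried = accepted = 0
        # Hold T for 100N reconfigurations or 10N successful ones.
        while tried < 100 * n and accepted < 10 * n:
            tried += 1
            i, j = sorted(rng.sample(range(n), 2))
            cand = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
            delta = tour_length(cand, pts) - length
            if delta < 0 or math.exp(-delta / T) > rng.random():
                tour, length = cand, length + delta
                accepted += 1
        T *= 0.9                           # reduce T by 10%
    return tour, tour_length(tour, pts)

rng = random.Random(1)
pts = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(12)]
tour, length = anneal_tsp(pts)
baseline = tour_length(list(range(12)), pts)
print(f"unoptimized order: {baseline:.3f}, annealed: {length:.3f}")
```

Recomputing the full tour length for every candidate keeps the sketch short; a practical version would update the length from the four affected distances.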
Deterministic Annealing
The Problems of SA
• SA techniques are inherently slow because of their randomized local search strategy.
• SA converges to the global optimum, in the probability-one sense, only if the cooling schedule is on the order of
$$T(t_n) = \frac{T_0}{1 + \ln t_n}$$
Geman, S. & Geman, D. (1984) "Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images," IEEE Trans. on Pattern Analysis and Machine Intelligence 6, 721-741.
Geman and Geman [1984]
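How slow the logarithmic schedule is can be seen by solving T0 / (1 + ln t) = T_target for t. The sketch below compares it with a geometric schedule; the comparison value α = 0.9 and the target factors are illustrative assumptions.

```python
import math

def log_temperature(T0, t):
    """Logarithmic schedule T(t) = T0 / (1 + ln t), for t >= 1."""
    return T0 / (1.0 + math.log(t))

def log_steps_to_reach(T0, T_target):
    """Solve T0 / (1 + ln t) = T_target for t."""
    return math.exp(T0 / T_target - 1.0)

def geometric_steps_to_reach(T0, T_target, alpha=0.9):
    """Smallest k with T0 * alpha**k <= T_target."""
    return math.ceil(math.log(T_target / T0) / math.log(alpha))

for factor in (2, 5, 10, 20):
    t_log = log_steps_to_reach(1.0, 1.0 / factor)
    k_geo = geometric_steps_to_reach(1.0, 1.0 / factor)
    print(f"T0/{factor}: logarithmic ~{t_log:.3g} steps, geometric {k_geo} steps")
```

The step count of the logarithmic schedule grows exponentially in T0 / T_target, which is why practical runs use faster schedules and give up the convergence guarantee.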
Review Simulated Annealing Algorithm
Create initial solution S ∈ {0, 1}^n
Initialize temperature T
repeat
– for k = 1 to iteration-length do
• Generate a random transition from S to S’ by inverting a random bit s_i
• Let ΔE = E(S’) - E(S)
• if ΔE < 0 then S = S’
• else if exp[-ΔE/T] > rand(0,1) then S = S’
– Reduce temperature T
until no change in E(S)
Return S
Stochastic nature: whether a transition is accepted depends on a random draw.
Deterministic Annealing (DA)
Also called mean-field annealing.
Create initial solution S ∈ [0, 1]^n
Initialize temperature T
repeat
– for k = 1 to iteration-length do
• Choose a random bit s_i
• Set s_i ← 1 / (1 + exp[(∂E/∂s_i)(1/T)])
– Reduce temperature T
until convergence criterion met
Return S
Deterministic behavior: the update of s_i involves no random acceptance test.
$$s_i \leftarrow \frac{1}{1 + \exp\left(\frac{\partial E}{\partial s_i}\,\frac{1}{T}\right)}$$
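A minimal sketch of deterministic annealing on a small problem. The quadratic energy, the bias vector `b`, the temperature floor used as convergence criterion, and all parameter values are hypothetical choices for illustration. For this energy the best binary state is (0, 1, 0) with E = -1.5.

```python
import math
import random

# Hypothetical energy E(s) = sum_{i<j} 2 s_i s_j - sum_i b_i s_i
b = [1.0, 1.5, 0.5]

def energy(s):
    n = len(s)
    pair = sum(2.0 * s[i] * s[j] for i in range(n) for j in range(i + 1, n))
    return pair - sum(bi * si for bi, si in zip(b, s))

def dE_ds(s, i):
    """Partial derivative of E with respect to s_i."""
    return 2.0 * (sum(s) - s[i]) - b[i]

def deterministic_annealing(n=3, T=2.0, alpha=0.9, T_min=0.05,
                            iteration_length=50, seed=0):
    rng = random.Random(seed)
    s = [0.5] * n                               # S in [0, 1]^n
    while T > T_min:                            # assumed convergence criterion
        for _ in range(iteration_length):
            i = rng.randrange(n)                # choose a random unit
            s[i] = 1.0 / (1.0 + math.exp(dE_ds(s, i) / T))
        T *= alpha                              # reduce temperature
    return s

s = deterministic_annealing()
print([round(x, 3) for x in s], energy([round(x) for x in s]))
```

Randomness enters only through the order in which units are visited; each update itself sets s_i to a value determined by the current state and T.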
Boltzmann Machine
Boltzmann Machines
Discrete Hopfield NN + Simulated Annealing = Boltzmann Machine
Update Rules
• Discrete Hopfield NN (unipolar neuron)
$$v_i = \begin{cases} 1, & net_i > 0 \\ 0, & net_i < 0 \end{cases}$$
• Boltzmann Machine
$$P(v_i = 1) = \frac{1}{1 + e^{-net_i / T}}$$
Cooling schedule is required.
[Figure: P(v_i = 1) versus net_i (from -6 to 6) for T = 0 (discrete Hopfield NN), T = 1, T = 2, T = 3 and T = ∞]
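The Boltzmann machine update rule can be sketched as below. Two points are assumptions, since the slides do not define them: net_i is taken to be the weighted sum of the other units' states with no bias term, and at T = 0 the case net_i = 0 is mapped to 0.

```python
import math
import random

def p_on(net, T):
    """P(v_i = 1) = 1 / (1 + exp(-net_i / T)); T = 0 gives the Hopfield threshold."""
    if T == 0:
        return 1.0 if net > 0 else 0.0    # net == 0 mapped to 0 by assumption
    return 1.0 / (1.0 + math.exp(-net / T))

def update_unit(v, W, i, T, rng=random):
    """Stochastically set unit i of state v, given symmetric weights W."""
    net = sum(W[i][j] * v[j] for j in range(len(v)) if j != i)
    v[i] = 1 if rng.random() < p_on(net, T) else 0
    return v

# Firing probability at net_i = -2, 0, 2 for several temperatures.
for T in (0, 1, 2, 3):
    print(T, [round(p_on(net, T), 3) for net in (-2, 0, 2)])
```

At T = 0 the rule is the deterministic Hopfield threshold; as T grows the probabilities move toward 0.5, so a cooling schedule gradually turns a random search into a Hopfield descent.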
Exercises
• Run computer simulations on the same TSP problem demonstrated previously using
– Simulated Annealing
– Deterministic Annealing, and
– Boltzmann Machine.
• Perform some analyses on your results.