2010 2nd International Conference on Industrial and Information Systems

A Hybrid Discrete PSO-SA Algorithm to Find Optimal Elimination Orderings for Bayesian Networks

Xuchu Dong 1,2, Dantong Ouyang 1,2, Dianbo Cai 3, Yonggang Zhang 1,2, Yuxin Ye 1,2
1. Department of Computer Science and Technology, Jilin University
2. Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education
3. Network Monitor Maintenance Centre of Jilin Branch, China Telecom Corporation Limited
Changchun, Jilin, China
[email protected]
All the parent nodes of Vj are exactly the direct causes giving rise to Vj, which is denoted by Pa(Vj).
Given a Bayesian network G, a triangulation of the Bayesian network can be obtained in two steps. First, for each node Vi in the DAG, connect Pa(Vi) into a complete subgraph, and drop the directions of all the edges. The resulting undirected graph is known as the moral graph. Here, we let G^M denote the moral graph of G, and N(Vi) denote all the neighbors of node Vi in G^M. Second, according to an ordering of the nodes in the moral graph, a triangulation can be constructed by the following procedure.
ELIMINATION
Input: the moral graph G^M = (V, E), an ordering ORD = (V1, ..., Vn) of V.
Output: a triangulation H
Begin Procedure
1. For i = 1 to n
   Begin
   1.1 Connect N(Vi) into a complete subgraph. Let Fill(Vi) represent the added edges;
   1.2 Delete Vi and all the edges linking to Vi;
   End
2. Return H = (V, E ∪ Fill(V1) ∪ ... ∪ Fill(Vn));
End Procedure
In the above procedure, the operation carried out by steps 1.1 and 1.2 is also known as "eliminating node Vi from the graph". After all the nodes V1, ..., Vn are eliminated from G^M, G^M becomes an empty graph. ORD is called "an elimination ordering of G". The state space size of a triangulation H is defined as f(H) = Σ_{C∈Clq(H)} Π_{Vi∈C} w(Vi), where Clq(H) means all the cliques (i.e. maximal complete subgraphs) of H. Finally, a junction tree can be obtained by running a maximum-weight spanning tree algorithm on the junction graph, which is constructed from all the cliques in the triangulation. That construction process is irrelevant to our topic; detailed discussions can be found in [10]. Fig. 1 gives an example of a Bayesian network, its moral graph, a triangulation generated by an elimination ordering (A, B, C, D, E, F), and a junction tree obtained from the triangulation. If all the variables are binary, the state space size of the triangulation is 4×2^3 = 32.
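The elimination procedure and the state space size f(H) can be sketched in code. The sketch below is illustrative only (the function names and the toy four-node graph are ours, not from the paper); it exploits the fact that, for a triangulation produced by elimination, the maximal cliques are exactly the elimination cliques {Vi} ∪ N(Vi) not contained in another elimination clique:

```python
from itertools import combinations

def triangulate(adj, order):
    """Run the ELIMINATION procedure on a moral graph.

    adj   : dict node -> set of neighbours (the undirected moral graph G^M)
    order : an elimination ordering ORD = (V1, ..., Vn)
    Returns (fill edges added, elimination cliques {Vi} | N(Vi)).
    """
    g = {v: set(ns) for v, ns in adj.items()}
    fill, cliques = [], []
    for v in order:
        nbrs = g[v]
        cliques.append(frozenset(nbrs | {v}))
        # step 1.1: connect N(Vi) into a complete subgraph
        for a, b in combinations(sorted(nbrs), 2):
            if b not in g[a]:
                g[a].add(b); g[b].add(a)
                fill.append((a, b))
        # step 1.2: delete Vi and all edges linking to Vi
        for u in nbrs:
            g[u].discard(v)
        del g[v]
    return fill, cliques

def state_space_size(cliques, w):
    """f(H): sum over maximal cliques of the product of weights w(Vi)."""
    maximal = [c for c in cliques if not any(c < d for d in cliques)]
    return sum(
        __import__("math").prod(w[v] for v in c) for c in set(maximal)
    )
```

For a 4-cycle A-B-C-D with binary variables and ordering (A, B, C, D), eliminating A adds the fill edge B-D, and the two maximal cliques {A, B, D} and {B, C, D} give f(H) = 2^3 + 2^3 = 16.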
As regards probabilistic inference, we always expect a junction tree with minimal size. Since a node in a junction tree corresponds to a clique in the triangulation, a triangulation with minimum state space size is preferred.
Abstract-In this paper, a hybrid algorithm named DPSO-SA is proposed to find near-to-optimal elimination orderings in Bayesian networks. DPSO-SA is a discrete particle swarm optimization method enhanced by simulated annealing. Computational tests show that this hybrid method is very effective and robust for the elimination ordering problem.
Keywords-Bayesian networks; triangulation; elimination ordering; particle swarm optimization; simulated annealing
I. INTRODUCTION
As a kind of graphical model representing causality and inferring under uncertainty, Bayesian networks have been widely used in real-world applications [1]. The efficiency of most inference methods depends on the ordering of all the variables in the Bayesian network [2]. However, finding an optimal ordering that produces the best computational efficiency is an NP-hard problem [3].
Some heuristics have been proposed to solve this problem [4][5]. Wen and Kjærulff provided simulated annealing methods to find a near-to-optimal ordering [3][4]. Larrañaga et al. gave a genetic algorithm framework in which 8 crossover operators and 3 mutation operators can be used [6]. In [7], Wang et al. developed an adaptive genetic algorithm. Other swarm intelligence approaches, such as ant colony systems and estimation of distribution algorithms, have also been applied to this problem [8][9].
In section II, a brief introduction to the elimination ordering problem is given. Section III presents a discrete particle swarm optimization method, DPSO, to solve this problem. A novel algorithm hybridizing DPSO and simulated annealing is proposed in section IV. Section V displays the experimental results comparing our hybrid algorithm with other swarm intelligence methods.
II. ELIMINATION ORDERING OF BAYESIAN NETWORKS
In a Bayesian network, the causal relationship between some probabilistic events of interest is expressed by a directed acyclic graph (DAG). A probabilistic event is represented by a random variable, which is formulated as a node in the DAG. Therefore, we do not distinguish between the two terms "node" and "variable". The set of all the possible values of a random variable Vi is also called the state space of Vi, and the size of Vi's state space is denoted by w(Vi). An edge pointing from node Vi to node Vj represents the causal relationship between cause event Vi and effect event Vj. We also say that Vi is a parent of Vj.
978-1-4244-8217-7/10/$26.00 ©2010 IEEE
Pos^t_k_best and Pos^t_g_best are the best-so-far positions found by Pk and by the swarm of all particles until time step t, respectively. At time step t, the particles located in Pos^t_g_best can be called "best particles". r1 and r2 are two random real numbers. w, c1 and c2 are three parameters given in advance.
Given two position vectors Pos_i and Pos_j, the subtraction Pos_i - Pos_j is calculated by the following procedure.
Input: Pos_i = (V_i1, ..., V_in) and Pos_j = (V_j1, ..., V_jn)
Output: a sequence SS of swap operations
Begin Procedure
1. SS ← NULL;
2. For k = 1 to n
   Begin
   2.1 If V_ik ≠ V_jk then
       Begin
       2.1.1 In Pos_j, find V_ik and then swap it with V_jk;
       2.1.2 Add (V_ik, V_jk) to the swap sequence SS;
       End
   End
End Procedure

Figure 1. Example of a Bayesian network (a), its moral graph (b), a triangulation (c) and a junction tree (d).
The optimal elimination ordering is an ordering according to which a triangulation with minimum state space size can be generated by the above ELIMINATION procedure. For convenience, we also use f(ORD) to denote the state space size of the triangulation generated by the ELIMINATION procedure according to an ordering ORD. The problem of finding an optimal elimination ordering is NP-hard [3].
To find good elimination orderings in a fast way, some heuristics, such as minimum fill, minimum size and H2, have been proposed [4][5]. These heuristics can be used to generate elimination orderings. Given an undirected graph H, a node in H can be selected by a heuristic if the corresponding evaluation function gives the minimal value for that node. For example, the minimum size heuristic evaluates a node with the number of its neighbors plus 1, so the evaluation values of the six nodes in Fig. 1(b) are 3, 3, 3, 4, 4 and 3 respectively. Therefore, A, B, C or F can be selected as the node to be eliminated. After eliminating the selected node, another node is selected and eliminated in the same way. By repeating such a greedy procedure, an elimination ordering can be obtained.
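The greedy loop described above can be sketched as follows. This is a minimal illustration (the function name and the alphabetical tie-breaking rule are our own choices, not specified by the paper), shown for the minimum size heuristic:

```python
from itertools import combinations

def min_size_ordering(adj):
    """Greedy elimination ordering via the minimum size heuristic.

    A node's score is its number of neighbours plus one; we repeatedly
    eliminate a node with minimal score, connecting its neighbours
    into a complete subgraph before removing it.
    """
    g = {v: set(ns) for v, ns in adj.items()}
    order = []
    while g:
        # minimum size score = |N(v)| + 1; break ties alphabetically
        v = min(g, key=lambda u: (len(g[u]) + 1, u))
        order.append(v)
        for a, b in combinations(sorted(g[v]), 2):   # connect N(v)
            g[a].add(b); g[b].add(a)
        for u in g[v]:                               # delete v
            g[u].discard(v)
        del g[v]
    return order
```

Other heuristics (minimum fill, minimum weight, H2) follow the same loop with a different scoring function.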
III. A DISCRETE PSO ALGORITHM

In this section, we present a discrete PSO algorithm for finding good elimination orderings. In our DPSO algorithm, the position vector of each particle is defined as an elimination ordering, and the function computing state space size is used to evaluate a particle. For a particle Pk, the following equations are used to calculate its velocity and position.

Vel^(t+1)_k = w × Vel^t_k + c1 × r1 × (Pos^t_k_best - Pos^t_k) + c2 × r2 × (Pos^t_g_best - Pos^t_k)    (1)

Pos^(t+1)_k = Pos^t_k + Vel^(t+1)_k    (2)
The swap operation sequence obtained by Pos_i - Pos_j can be considered as a velocity. The multiplication of a velocity Vel by a real number r (0 ≤ r ≤ 1) means a swap operation sequence obtained by selecting the swap operations in Vel with a probability of r. If r ≥ 1 then we let r × Vel = Vel. The summation of two velocities is just the concatenation of the two swap operation sequences. The summation of a position Pos and a velocity Vel is applying all the swap operations in Vel to Pos one by one.
The above calculation of particle's position and velocity is inspired by [11]. Based on such a calculation, the DPSO algorithm can be described as follows.
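The swap-sequence algebra above can be sketched as follows. This is an illustrative sketch under one simplifying assumption of ours: swaps are recorded as index pairs rather than the paper's node pairs (V_ik, V_jk), which makes applying them mechanical:

```python
import random

def subtract(pos_i, pos_j):
    """Pos_i - Pos_j: the swap sequence that turns Pos_j into Pos_i."""
    p, ss = list(pos_j), []
    for k in range(len(p)):
        if pos_i[k] != p[k]:
            m = p.index(pos_i[k])      # find V_ik in Pos_j
            p[k], p[m] = p[m], p[k]    # swap it into place
            ss.append((k, m))          # record the swap as an index pair
    return ss

def scale(r, vel, rng=random):
    """r x Vel: keep each swap with probability r (r >= 1 keeps all)."""
    return list(vel) if r >= 1 else [s for s in vel if rng.random() < r]

def add_velocities(v1, v2):
    """Vel + Vel: concatenation of the two swap sequences."""
    return list(v1) + list(v2)

def move(pos, vel):
    """Pos + Vel: apply all swaps in Vel to Pos one by one."""
    p = list(pos)
    for k, m in vel:
        p[k], p[m] = p[m], p[k]
    return p
```

By construction, `move(pos_j, subtract(pos_i, pos_j))` recovers `pos_i`.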
Algorithm DPSO
Input: a moral graph G^M, a set RS of heuristics.
Output: a best-so-far solution.
Begin Procedure
1. S ← InitSwarm(RS);
2. For t = 0 to MAX_ITER-1
   Begin
   For each particle Pk in S
       DPSO_Move(Pk);
   End
3. Return Pos^MAX_ITER_g_best;
End Procedure
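One particle update combining equations (1) and (2) with the swap-sequence operators can be condensed as below. The helper names are ours; the parameter values w = 0.4298 and c1 = c2 = 0.69618 are taken from Table I, and this is a sketch rather than the paper's exact implementation:

```python
import random

def subtract(pi, pj):
    """Swap sequence (as index pairs) turning pj into pi."""
    p, ss = list(pj), []
    for k in range(len(p)):
        if pi[k] != p[k]:
            m = p.index(pi[k]); p[k], p[m] = p[m], p[k]; ss.append((k, m))
    return ss

def apply_swaps(pos, vel):
    p = list(pos)
    for k, m in vel:
        p[k], p[m] = p[m], p[k]
    return p

def keep(r, vel, rng):
    """r x Vel: keep each swap with probability r (r >= 1 keeps all)."""
    return list(vel) if r >= 1 else [s for s in vel if rng.random() < r]

def dpso_move(pos, vel, p_best, g_best,
              w=0.4298, c1=0.69618, c2=0.69618, rng=random):
    """One application of (1) and (2):
    Vel <- w*Vel + c1*r1*(Pos_k_best - Pos) + c2*r2*(Pos_g_best - Pos)
    Pos <- Pos + Vel."""
    r1, r2 = rng.random(), rng.random()
    new_vel = (keep(w, vel, rng)
               + keep(c1 * r1, subtract(p_best, pos), rng)
               + keep(c2 * r2, subtract(g_best, pos), rng))
    return apply_swaps(pos, new_vel), new_vel
```

Because every operator only swaps entries, the updated position is always a valid elimination ordering (a permutation of the nodes).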
In DPSO, the initial swarm of particles at time step 0 is generated by the following procedure InitSwarm(RS).

Procedure InitSwarm
Input: a set of heuristics RS.
Output: a swarm S of particles.
Begin Procedure
1. Generate particles by the heuristics in RS;
2. Calculate Pos^0_k_best for each particle Pk and Pos^0_g_best for the swarm;
3. Let m be the number of best particles located in Pos^0_g_best. Reserve one best particle's position and reset the other m-1 best particles at random positions;
4. Update Pos^0_k_best for each particle Pk and Pos^0_g_best for the swarm;
End Procedure
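The initialization steps can be sketched as follows. This is a loose illustration under assumptions of ours: heuristics are modeled as callables returning an ordering, and the diversity reset of step 3 is a plain shuffle:

```python
import random

def init_swarm(heuristics, f, n_particles, rng=random):
    """Sketch of InitSwarm: generate particles with the heuristics in RS
    (about the same number per heuristic), then keep one particle at the
    best position and re-randomize the other best particles (step 3)."""
    swarm = [heuristics[i % len(heuristics)]() for i in range(n_particles)]
    best_val = min(f(p) for p in swarm)
    best_seen = False
    for i, p in enumerate(swarm):
        if f(p) == best_val:
            if best_seen:                 # reset duplicate best particles
                q = list(p)
                rng.shuffle(q)
                swarm[i] = q
            best_seen = True
    k_best = [list(p) for p in swarm]     # per-particle best positions
    g_best = list(min(swarm, key=f))      # swarm-wide best position
    return swarm, k_best, g_best
```

The reset in step 3 matters because particles produced by the same heuristic tend to start at or near the same position.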
First, all the particles are generated greedily by the heuristics in RS. The number of particles generated by each heuristic is about the same. Although particles generated in such a heuristic way are much better than those generated randomly, the shortcoming is obvious: the diversity of the swarm may be much worse, because particles generated by the same heuristic are most likely near to each other or even at the same position. To avoid many particles aggregating on the best position, in step 3 the positions of all the best particles except one are randomized.
The DPSO_Move procedure is used to move a particle in the search space according to (1) and (2).

Procedure DPSO_Move
Input: a particle Pk at time step t.
Begin Procedure
1. Generate two random real numbers r1 and r2;
2. Calculate Vel^(t+1)_k and Pos^(t+1)_k using (1) and (2);
3. Update Pos^(t+1)_k_best and Pos^(t+1)_g_best;
End Procedure

IV. A HYBRID DISCRETE PSO-SA ALGORITHM

Due to the lack of an effective local search, standard PSO may converge to a local optimum. The following algorithm tries to improve the local search ability of DPSO using simulated annealing.

Algorithm DPSO-SA
Input: a moral graph G^M, a set RS of heuristics.
Output: a best-so-far solution.
Begin Procedure
1. S ← InitSwarm(RS);
2. For each particle Pk in S
   Status_k ← STATUS_DPSO;
3. Generate a particle P_shadow and Pos_shadow ← Pos^0_g_best;
4. α ← 0.1, σ ← e^(10·log10 2), γ ← (σ - α)/MAX_ITER;
5. T0 ← γ, β ← 1;
6. For t = 0 to MAX_ITER-1
   Begin
   6.1 SA_Move(P_shadow);
   6.2 If f(Pos_shadow) < f(Pos^t_g_best) then Pos^t_g_best ← Pos_shadow;
   6.3 For each particle Pk in S
       Begin
       6.3.1 If (f(Pos^t_k_best) - f(Pos^t_g_best)) / f(Pos^t_g_best) < 0.1 then
             Begin
             6.3.1.1 If Status_k = STATUS_DPSO then
                     Begin
                     6.3.1.1.1 Pos^t_k ← Pos^t_k_best;
                     6.3.1.1.2 Status_k ← STATUS_SA;
                     End
             6.3.1.2 SA_Move(Pk);
             6.3.1.3 Vel^(t+1)_k ← Vel^t_k;
             End
       6.3.2 Else
             Begin
             6.3.2.1 If Status_k = STATUS_SA then
                     Begin
                     6.3.2.1.1 Pos^t_k ← Pos^t_k_best;
                     6.3.2.1.2 Status_k ← STATUS_DPSO;
                     End
             6.3.2.2 DPSO_Move(Pk);
             End
       End
   6.4 T ← T0 / (1 + (t+1)×β);
   End
7. Return Pos^MAX_ITER_g_best;
End Procedure

In the beginning of DPSO-SA, the swarm is initialized (steps 1 and 2) and an additional particle P_shadow is produced, which is used to exploit the area around the global best position Pos_g_best using simulated annealing through the whole optimization process. The cooling scheme of simulated annealing can be seen from steps 4, 5 and 6.4, where T0 is the initial temperature and β is the cooling rate. This cooling scheme borrows ideas from Kjærulff's simulated annealing methods; detailed discussions can be found in [4].
Except for P_shadow, the other particles in the swarm have two moving statuses: STATUS_DPSO and STATUS_SA. The status of particle Pk is identified by Status_k. If a particle's evaluation value is far from f(Pos^t_g_best), the particle will be set to STATUS_DPSO and its movement is under the control of the DPSO_Move procedure presented in section III. Otherwise it will be set to STATUS_SA and its movement is controlled by the following SA_Move procedure.

Procedure SA_Move
Input: a particle Pk at time step t, temperature T.
Begin Procedure
1. ORD' ← Pos^t_k;
2. For i = 1 to MAX_SA_ITER
   Begin
   2.1 Generate two random integers u and w (1 ≤ u < w ≤ n);
   2.2 (V_i1, ..., V_in) ← ORD';
   2.3 ORD' ← (V_i1, ..., V_i(u-1), V_i(u+1), ..., V_iw, V_iu, V_i(w+1), ..., V_in);
   2.4 Δ ← f(ORD') - f(Pos^t_k);
   2.5 prob ← min{1, e^(-Δ/T)};
   2.6 Generate a random real number r3;
   2.7 If r3 < prob then Pos^(t+1)_k ← ORD';
   2.8 Else Pos^(t+1)_k ← Pos^t_k;
   End
End Procedure

In fact, SA_Move is just a Metropolis process which searches around the neighborhood of Pos^t_k at the given temperature T. The neighboring solution is constructed by selecting two variables V_iu and V_iw and then moving V_iu backwards to the position after V_iw (steps 2.1-2.3).
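The SA_Move neighborhood move and Metropolis acceptance can be sketched as below; an illustrative sketch (function name and default iteration count are ours, the latter matching MAX_SA_ITER = 100 from Table I), with the cooling schedule applied by the caller:

```python
import math
import random

def sa_move(order, f, T, max_sa_iter=100, rng=random):
    """One SA_Move pass: a Metropolis search around the current ordering.

    Neighbour (steps 2.1-2.3): pick positions u < w and move the variable
    at u to just after position w. Accept with probability
    min(1, e^(-delta/T)) where delta = f(candidate) - f(current).
    """
    cur = list(order)
    n = len(cur)
    for _ in range(max_sa_iter):
        u = rng.randrange(n - 1)            # 0 <= u < n-1
        w = rng.randrange(u + 1, n)         # u < w <= n-1
        cand = cur[:u] + cur[u+1:w+1] + [cur[u]] + cur[w+1:]
        delta = f(cand) - f(cur)
        # accept improving moves always; worsening moves with prob e^(-delta/T)
        if delta <= 0 or rng.random() < math.exp(-delta / T):
            cur = cand
    return cur
```

At high T nearly every candidate is accepted (broad exploration); as T decreases via T ← T0/(1 + (t+1)×β), the search concentrates on improving moves only.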
V. EXPERIMENTAL RESULTS

We test the performance of DPSO-SA on four Bayesian networks: Water, Mildew, Barley and Munin1, which are downloaded from http://bndg.cs.aau.dk/. The DPSO-SA algorithm is implemented in C++. Its parameter setting is listed in Table I, where n stands for the number of nodes in the Bayesian network. On each network, DPSO-SA is performed 50 times. The best, average and deviation of the 50 running results for each network are listed in Tables II to V and compared with other swarm intelligence methods: GA-ALL, TAGA, ACSF, ACSV, ACSCM and ACSALL presented in [7] and [8].

TABLE I. THE PARAMETER SETTING OF THE DPSO-SA ALGORITHM
Parameter    | Value
MAX_ITER     | 5n
MAX_SA_ITER  | 100
w            | 0.4298
c1           | 0.69618
c2           | 0.69618
RS           | {minimum fill, minimum weight, H2}
TABLE II. COMPARISON ON WATER
Algorithm | Best      | Average     | Deviation
GA-ALL    | 3,028,305 | 3,302,154.7 | 18,515.3
TAGA      | 3,028,305 | 3,192,906.7 | 173,655.3
ACSF      | 3,028,305 | 3,175,424.7 | 22,917.8
ACSV      | 3,028,305 | 3,360,359.6 | 7,927.7
ACSCM     | 3,362,268 | 3,438,780.4 | 8,157.7
ACSALL    | 3,028,305 | 3,226,672.2 | 23,607.3
DPSO-SA   | 3,028,305 | 3,028,796.5 | 3,475.6
TABLE III. COMPARISON ON MILDEW
Algorithm | Best      | Average     | Deviation
GA-ALL    | 3,400,464 | 3,421,822.1 | 2,844.7
TAGA      | 3,400,464 | 3,532,394.8 | 198,092.7
ACSF      | 3,400,464 | 3,473,817.4 | 13,714.7
ACSV      | 3,400,464 | 3,418,569.0 | 8,127.5
ACSCM     | 3,400,464 | 3,400,464.0 | 0.0
ACSALL    | 3,400,464 | 3,403,434.2 | 1,925.1
DPSO-SA   | 3,400,464 | 3,400,464.0 | 0.0
TABLE IV. COMPARISON ON BARLEY
Algorithm | Best       | Average      | Deviation
GA-ALL    | 17,140,796 | 17,199,695.6 | 16,167.0
TAGA      | 17,140,796 | 17,188,307.7 | 93,097.1
ACSF      | 17,140,796 | 17,147,217.7 | 2,137.2
ACSV      | 17,140,941 | 17,465,208.3 | 132,182.4
ACSCM     | 17,140,796 | 17,272,460.4 | 11,407.2
ACSALL    | 17,140,796 | 17,161,703.3 | 3,898.0
DPSO-SA   | 17,140,796 | 17,140,796.0 | 0.0
TABLE V. COMPARISON ON MUNIN1
Algorithm | Best       | Average       | Deviation
GA-ALL    | 83,735,918 | 103,269,358.0 | 3,180,680.3
TAGA      | 88,968,090 | 126,839,236.9 | 23,575,586.9
ACSF      | 85,352,183 | 101,328,156.5 | 1,709,201.5
ACSV      | 84,586,392 | 99,480,808.5  | 872,175.0
ACSCM     | 87,254,224 | 103,875,447.1 | 1,520,758.1
ACSALL    | 86,934,403 | 105,243,863.6 | 1,548,232.2
DPSO-SA   | 83,735,758 | 83,736,638.7  | 515.7
From Tables II to V, it can be seen that DPSO-SA achieves the best average results on all networks, and all the deviations obtained by DPSO-SA are very small. Therefore, we can say that DPSO-SA is a very effective and robust method for the elimination ordering problem.

VI. CONCLUSION

In this paper, a hybrid swarm intelligence algorithm named DPSO-SA is proposed to find close-to-optimal elimination orderings for Bayesian networks. DPSO-SA is a discrete particle swarm optimization method enhanced by simulated annealing. Computational experiments show that DPSO-SA is more effective and robust than other existing swarm intelligence methods.

ACKNOWLEDGMENT

This work was supported in part by the NSFC Major Research Program under Grant Nos. 60496320 and 60496321, Basic Theory and Core Techniques of Non-Canonical Knowledge; the NSFC under Grant Nos. 60873148 and 60973089; the European Commission under Grant No. TH/Asia Link/010 (111084); and the Science Foundation for Young Scholars of Jilin Province, China, under Grant Nos. 20080107, 20080607 and 20090108. We thank all of them.

REFERENCES

[1] Li Feng, Wei Wang, Lina Zhu, Yi Zhang. Predicting intrusion goal using dynamic Bayesian network with transfer probability estimation. Journal of Network and Computer Applications, 2009, 32(3): 721-732.
[2] Dechter R, Mateescu R. AND/OR search spaces for graphical models. Artif. Intell., 2007, 171(2-3): 73-106.
[3] Wen W X. Optimal decomposition of belief networks. UAI 1990: 245-256.
[4] Kjærulff U. Triangulation of graphs: algorithms giving small total state space. R90-09, Department of Mathematics and Computer Science, Aalborg University, 1990.
[5] Cano A, Moral S. Heuristic algorithms for the triangulation of graphs. IPMU 1994: 98-107.
[6] Larrañaga P, Kuijpers C M H, Poza M, Murga R H. Decomposing Bayesian networks: triangulation of the moral graph with genetic algorithms. Statistics and Computing, 1997, 7(1): 19-34.
[7] Wang H, Yu K, Wu X H, Yao H L. Triangulation of Bayesian networks using an adaptive genetic algorithm. ISMIS 2006: 127-136.
[8] Gamez J A, Puerta J M. Searching for the best elimination sequence in Bayesian networks by using ant colony optimization. Pattern Recognition Letters, 2002, 23(1-3): 261-277.
[9] Romero T, Larrañaga P. Triangulation of Bayesian networks with recursive estimation of distribution algorithms. Int. J. Approx. Reasoning, 2009, 50(3): 472-484.
[10] Jensen F V, Jensen F. Optimal junction trees. UAI 1994: 360-366.
[11] Wang K P, Huang L, Zhou C G, Pang W. Particle swarm optimization for traveling salesman problem. ICMLC 2003: 1583-1585.