The max-sum Algorithm for Factor Graph
Hauptseminar
by
Dayou Wang
born on 01.10.1981
residing at:
Berg-am-Laim-Str. 84
81673 München
Tel.: 0176 31245778
Lehrstuhl für
STEUERUNGS- und REGELUNGSTECHNIK
Technische Universität München
Univ.-Prof. Dr.-Ing./Univ. Tokio Martin Buss
Univ.-Prof. Dr.-Ing. Sandra Hirche
Supervisor: M.Sc. Indar Sugiarto
Start: 22.10.2012
Submission: 18.01.2013
Abstract
A graphical model is a natural tool for evaluating the statistical relationships between random variables by exploiting graph theory. In directed and undirected graphs, the two major classes of graphical models, a global function (joint distribution) can be expressed as a product of local functions (marginals). Based on the factor graph, the sum-product algorithm is studied, which provides an efficient way to evaluate the local marginal distributions using the idea of message passing. Finally, a variation of the sum-product algorithm, the max-sum algorithm, is introduced, which calculates the maximal value of the joint distribution and the corresponding variables.
Zusammenfassung
A graphical model is a natural tool that exploits graph theory to evaluate the statistical relationships between random variables. In directed and undirected graphs, the two main subclasses of graphical models, a global function (joint distribution) can be formulated as a product of local functions (marginal distributions). Based on the factor graph, the sum-product algorithm is studied, which, using the idea of message passing, provides an efficient algorithm for evaluating local marginal distributions. Finally, a variant of the sum-product algorithm, the max-sum algorithm, is introduced, which computes the maximum value of the joint distribution and the corresponding variables.
Contents

1 Introduction
2 Graphical Models
  2.1 Directed Graph
  2.2 Undirected Graph
3 Inference Problem
  3.1 Problem Statement
  3.2 Inference Problem On Markov Chain
  3.3 Messages And Message Propagating
4 Tree And Factor Graph
  4.1 Tree
  4.2 Factor Graph
  4.3 Convert To Factor Graph
5 The Sum-Product Algorithm
6 The Max-Sum Algorithm—Variation Of Sum-Product
7 Summary
List of Figures
Bibliography
Chapter 1
Introduction
This seminar introduces the sum-product algorithm, which operates on a factor graph. In a factor graph a global function (joint distribution) of a set of variables can be factorized into a product of local functions (marginals), each of which depends on a subset of the variables. The sum-product algorithm provides an efficient way to evaluate the local functions using a message-propagating scheme on the factor graph. Thereafter, the max-sum algorithm, a variation of the sum-product algorithm, is introduced to find the maximum value of the global function and the variables that maximize it.
Chapter 2
Graphical Models
Graphical models consist of nodes connected by links. Each node represents a random variable, and each link represents a probabilistic relationship between the variables. There are two major classes of graphical models: directed graphs and undirected graphs.
2.1 Directed Graph
In a directed graph, also known as a Bayesian network, links carry arrows from parent nodes to child nodes. The joint distribution is written as a product of conditional distributions over all of the nodes, where each conditional distribution is conditioned on the parent nodes of the corresponding node in the graph.
2.2 Undirected Graph
An undirected graph, also called a Markov network, comprises a set of nodes connected by links without arrows. The joint distribution is defined as a product of potential functions Ψ_C(X_C) over the maximal cliques C of the graph:

p(X) = (1/Z) ∏_C Ψ_C(X_C),    (2.1)

where Z is a normalization constant defined by Z = Σ_X ∏_C Ψ_C(X_C).
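Equation (2.1) can be checked numerically on a tiny Markov network. The sketch below uses three binary variables with two maximal cliques {x1, x2} and {x2, x3}; the clique potential values are illustrative assumptions, not taken from the text.

```python
import itertools

# Illustrative clique potentials for the maximal cliques {x1, x2} and {x2, x3}.
psi_12 = {(0, 0): 1.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 2.0}
psi_23 = {(0, 0): 2.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 1.0}

def unnormalized(x1, x2, x3):
    # Product of clique potentials, i.e. (2.1) before dividing by Z.
    return psi_12[(x1, x2)] * psi_23[(x2, x3)]

# Z is obtained by summing the product of potentials over all joint states.
Z = sum(unnormalized(*x) for x in itertools.product([0, 1], repeat=3))

def p(x1, x2, x3):
    return unnormalized(x1, x2, x3) / Z

# After normalization the distribution sums to one.
total = sum(p(*x) for x in itertools.product([0, 1], repeat=3))
print(Z, total)
```

For these potentials the eight joint states give Z = 7.5, and the normalized distribution sums to one as required.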
Chapter 3
Inference Problem
3.1 Problem Statement
Not all processes can be directly observed and described, but with the aid of some observable processes that have a statistical relationship to the unobservable processes, the state of the unobserved processes can be inferred. This is an inference problem. A classical example: we can evaluate the weather by analyzing the moisture of the soil in an area. The main goal of an inference problem is to evaluate the posterior distribution p(x|y) using Bayes' theorem,

p(x|y) = p(y|x) p(x) / p(y),

with the given conditional distribution p(y|x) and prior distribution p(x). The marginal distribution p(y) is derived by marginalizing the joint distribution p(x, y) over all values of x. The problem of calculating the posterior distribution is thus expressed in terms of p(x) and p(y|x), which are assumed to be given.
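As a small numerical illustration of this use of Bayes' theorem, the sketch below computes a posterior for two binary variables; the prior and likelihood values are hypothetical.

```python
# Toy check of Bayes' theorem p(x|y) = p(y|x) p(x) / p(y) for binary x, y.
p_x = {0: 0.7, 1: 0.3}                    # prior p(x), illustrative values
p_y_given_x = {0: {0: 0.9, 1: 0.1},       # likelihood p(y|x): p_y_given_x[x][y]
               1: {0: 0.2, 1: 0.8}}

# Marginalize the joint p(x, y) = p(y|x) p(x) over x to obtain p(y).
p_y = {y: sum(p_y_given_x[x][y] * p_x[x] for x in p_x) for y in (0, 1)}

def posterior(x, y):
    return p_y_given_x[x][y] * p_x[x] / p_y[y]

print(posterior(1, 1))  # p(x=1 | y=1)
```

With these numbers, p(y=1) = 0.1·0.7 + 0.8·0.3 = 0.31, so observing y = 1 raises the probability of x = 1 from the prior 0.3 to 0.24/0.31 ≈ 0.774.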
3.2 Inference Problem On Markov Chain
In a more complex case, the nodes x_1, x_2, ..., x_N form a so-called Markov chain (Figure 3.1). The joint distribution in this undirected graph is given by

p(X) = (1/Z) ψ_{1,2}(x_1, x_2) ψ_{2,3}(x_2, x_3) ⋯ ψ_{N−1,N}(x_{N−1}, x_N).    (3.1)

By definition, the marginal for the node x_n is the summation over all the variables except x_n:

p(x_n) = Σ_{x_1} ⋯ Σ_{x_{n−1}} Σ_{x_{n+1}} ⋯ Σ_{x_N} p(X).    (3.2)

Figure 3.1: A Markov chain of N nodes x_1, x_2, ..., x_N connected by undirected links.
3.3 Messages And Message Propagating
To evaluate p(x_n) we substitute the factorized expression (3.1) for p(X) into (3.2) and rearrange the summations and multiplications. We can do that because, for example, the term Σ_{x_N} ψ_{N−1,N}(x_{N−1}, x_N) is the only one in which x_N appears; the key concept we exploit is that multiplication is distributive over addition, ab + ac = a(b + c). This gives the desired marginal in the form

p(x_n) = (1/Z) [ Σ_{x_{n−1}} ψ_{n−1,n}(x_{n−1}, x_n) ⋯ [ Σ_{x_2} ψ_{2,3}(x_2, x_3) [ Σ_{x_1} ψ_{1,2}(x_1, x_2) ] ] ⋯ ]
              × [ Σ_{x_{n+1}} ψ_{n,n+1}(x_n, x_{n+1}) ⋯ [ Σ_{x_N} ψ_{N−1,N}(x_{N−1}, x_N) ] ⋯ ]
       = (1/Z) μ_α(x_n) μ_β(x_n).

The marginal p(x_n) thus splits into two parts, μ_α(x_n) and μ_β(x_n), which can be viewed as local messages passed forwards and backwards along the chain. Recursively, we have

μ_α(x_n) = Σ_{x_{n−1}} ψ_{n−1,n}(x_{n−1}, x_n) μ_α(x_{n−1}),    (3.3)

μ_β(x_n) = Σ_{x_{n+1}} ψ_{n,n+1}(x_n, x_{n+1}) μ_β(x_{n+1}).    (3.4)

By definition, the normalization constant Z is then obtained by summing μ_α(x_n) μ_β(x_n) over all states of x_n. This recursive procedure is illustrated in Figure 3.2.

Figure 3.2: The marginal distribution p(x_n) for a node x_n along the chain is obtained by multiplying the two messages μ_α(x_n) and μ_β(x_n) and then normalizing. These messages can themselves be evaluated recursively by passing messages from both ends of the chain towards node x_n.
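The recursions (3.3) and (3.4) can be sketched directly in code. The example below runs the forward and backward passes on the same kind of chain as before, assuming N = 4, K = 2 states, and the same illustrative potential.

```python
N, K = 4, 2

def psi(a, b):
    # Illustrative pairwise potential, not from the text.
    return 2.0 if a == b else 0.5

# Forward messages mu_alpha[n][x], recursion (3.3); the first node is
# initialised with a message of 1 for every state.
mu_alpha = [[1.0] * K]
for n in range(1, N):
    mu_alpha.append([sum(psi(xp, x) * mu_alpha[n - 1][xp] for xp in range(K))
                     for x in range(K)])

# Backward messages mu_beta[n][x], recursion (3.4), initialised at the end.
mu_beta = [None] * N
mu_beta[N - 1] = [1.0] * K
for n in range(N - 2, -1, -1):
    mu_beta[n] = [sum(psi(x, xn) * mu_beta[n + 1][xn] for xn in range(K))
                  for x in range(K)]

def marginal(n):
    # p(x_n) is the normalized product mu_alpha(x_n) * mu_beta(x_n);
    # Z comes from summing that product over the states of x_n.
    unnorm = [mu_alpha[n][x] * mu_beta[n][x] for x in range(K)]
    Z = sum(unnorm)
    return [u / Z for u in unnorm]

print(marginal(1))
```

The result matches the brute-force marginal of Section 3.2 (including the value Z = 31.25), but each pass costs only O(N K²) instead of O(K^N).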
Chapter 4
Tree And Factor Graph
4.1 Tree
Messages can also be passed on a more complex structure called a tree. For an undirected graph, a tree is defined as a graph in which there is exactly one path between any two nodes. In the case of directed graphs, a tree has a unique root node which has no parents, and all other nodes have exactly one parent.
4.2 Factor Graph
Both undirected and directed trees can be converted into a new graphical construction called a factor graph, which consists of variable nodes and factor nodes. A node is always connected by an undirected link to a node of the opposite type. Correspondingly, a joint distribution can be expressed as a product of factors over subsets of variables with the formula p(X) = ∏_s f_s(X_s), where f_s denotes a factor and X_s the set of variables on which that factor depends. Such a factor is the local conditional distribution p(x_k|pa_k) in a directed graph and the potential function Ψ_C(X_C) in an undirected graph. The undirected links connect each factor node to all of the variable nodes on which that factor depends. Note that the requirement that there are no loops in the factor graph must be strictly observed.
4.3 Convert To Factor Graph
To convert an undirected graph to a factor graph, we create variable nodes corresponding to those of the undirected graph and factor nodes corresponding to the potential functions of the maximal cliques, and finally draw the appropriate links. In the case of a directed graph, the variable nodes are created similarly, and factor nodes corresponding to the conditional distributions are then added.
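The directed-graph recipe above can be sketched as a small data-structure transformation: one variable node per variable, one factor node per conditional distribution p(x_k | pa_k), linked to x_k and its parents. The three-node example graph is hypothetical.

```python
# Directed graph given as parent lists: x1 -> x3 <- x2 (hypothetical example).
parents = {
    "x1": [],
    "x2": [],
    "x3": ["x1", "x2"],
}

# Factor graph: the variable nodes are the variables themselves, and each
# conditional distribution p(child | parents) becomes one factor node
# linked to the child and all of its parents.
variable_nodes = list(parents)
factor_nodes = {}
for child, pa in parents.items():
    factor_nodes[f"f_{child}"] = [child] + pa

for f, vs in factor_nodes.items():
    print(f, "--", vs)
```

The resulting bipartite structure has a factor `f_x3` linked to x3, x1, and x2, mirroring p(x3 | x1, x2); the root factors `f_x1` and `f_x2` are linked only to their own variable.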
Chapter 5

The Sum-Product Algorithm

The sum-product algorithm is an efficient inference algorithm for calculating the local marginal p(x) by exploiting a tree-structured factor graph. The tree structure of the factor graph enables us to partition the factors into groups, one group being associated with each factor node f_s that is a neighbour of the variable node x. The joint distribution can then be expressed as

p(X) = ∏_{s∈ne(x)} F_s(x, X_s),    (5.1)

where ne(x) denotes the set of factor nodes that are neighbours of x, X_s denotes the set of all variable nodes connected to factor node f_s except x, and F_s(x, X_s) is the product of all the factors in the group associated with f_s.

Figure: A fragment of a factor graph illustrating the evaluation of the marginal p(x), with the message μ_{f_s→x}(x) sent from factor node f_s to variable node x.

Substituting (5.1) into p(x) = Σ_{X\x} p(X) and rearranging summations and products, we obtain

p(x) = Σ_{X\x} p(X) = ∏_{s∈ne(x)} [ Σ_{X_s} F_s(x, X_s) ] = ∏_{s∈ne(x)} μ_{f_s→x}(x).    (5.2)

Here μ_{f_s→x}(x) is defined as the message from the factor node f_s to the variable node x. The required marginal p(x) is thus given by the product of all the incoming messages arriving at node x.

Note that F_s(x, X_s) is itself a product of factors and can be further factorized as

F_s(x, X_s) = f_s(x, x_1, ..., x_M) G_1(x_1, X_{s1}) ⋯ G_M(x_M, X_{sM}),

where the set {x, x_1, ..., x_M} is the set of variables on which the factor f_s depends. Then

μ_{f_s→x}(x) = Σ_{x_1} ⋯ Σ_{x_M} f_s(x, x_1, ..., x_M) ∏_{m∈ne(f_s)\x} μ_{x_m→f_s}(x_m),

where ne(f_s)\x denotes the set of all the neighbouring variable nodes of the factor node f_s with node x removed. Similarly to μ_{f_s→x}(x), μ_{x_m→f_s}(x_m) is the message from a variable node to a factor node, defined by

μ_{x_m→f_s}(x_m) = Σ_{X_{sm}} G_m(x_m, X_{sm}).    (5.3)

G_m(x_m, X_{sm}) in turn is the product of terms F_l(x_m, X_{ml}), each of which is associated with one factor node f_l that is a neighbour of x_m excluding node f_s, so similarly we obtain

μ_{x_m→f_s}(x_m) = ∏_{l∈ne(x_m)\f_s} [ Σ_{X_{ml}} F_l(x_m, X_{ml}) ] = ∏_{l∈ne(x_m)\f_s} μ_{f_l→x_m}(x_m).    (5.4)

The computation begins by designating a node as root and initiating the messages at each of the leaves of the tree. By definition of the initialization, we set μ_{x→f}(x) = 1 if the leaf node is a variable node, and μ_{f→x}(x) = f(x) if the leaf node is a factor node. Then we propagate the messages recursively until a message has been passed along every link and the root node has received messages from all of its neighbouring nodes. Once this message propagation is complete, we propagate messages from the root node out to the leaf nodes until every leaf node has received messages from all of its neighbouring nodes. Subsequently we can calculate any local marginal p(x).

To determine the normalization constant Z, we first run the sum-product algorithm to find the unnormalized marginals; Z is then obtained by a local computation over any one of these marginals.
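The two message recursions and the marginal formula above can be sketched compactly for a small tree-structured factor graph. In the sketch below, variables are binary, factors are tables over joint states, and the example graph (a chain x1 - f_a - x2 - f_b - x3) and its factor values are illustrative assumptions; the recursion terminates only because the graph is a tree.

```python
from itertools import product

K = 2  # number of states per variable

# Hypothetical tree factor graph: factor name -> (variables, table of values).
factors = {
    "f_a": (("x1", "x2"), {(0, 0): 2.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 2.0}),
    "f_b": (("x2", "x3"), {(0, 0): 1.0, (0, 1): 1.0, (1, 0): 0.5, (1, 1): 2.0}),
}

def neighbours(node):
    if node in factors:
        return list(factors[node][0])
    return [f for f, (vs, _) in factors.items() if node in vs]

def message(src, dst):
    # Message from src to dst, evaluated recursively from the leaves;
    # a leaf variable node implicitly sends the all-ones message.
    if src in factors:
        # Factor-to-variable, as in (5.2)/(5.4): sum the factor times all
        # incoming variable messages over every variable except dst.
        vs, table = factors[src]
        others = [v for v in vs if v != dst]
        in_msgs = {v: message(v, src) for v in others}
        msg = [0.0] * K
        for state in product(range(K), repeat=len(vs)):
            assign = dict(zip(vs, state))
            term = table[state]
            for v in others:
                term *= in_msgs[v][assign[v]]
            msg[assign[dst]] += term
        return msg
    # Variable-to-factor, as in (5.4): product of messages from other factors.
    msg = [1.0] * K
    for f in neighbours(src):
        if f != dst:
            incoming = message(f, src)
            msg = [m * i for m, i in zip(msg, incoming)]
    return msg

def marginal(x):
    # Product of all incoming factor messages, normalized locally (5.2).
    unnorm = [1.0] * K
    for f in neighbours(x):
        m = message(f, x)
        unnorm = [u * mi for u, mi in zip(unnorm, m)]
    Z = sum(unnorm)  # Z from a local computation over one unnormalized marginal
    return [u / Z for u in unnorm]

print(marginal("x2"))
```

For this graph a brute-force summation gives the same answer: the unnormalized marginal of x2 is (2 + 0.5)(1 + 1) = 5 for state 0 and (0.5 + 2)(0.5 + 2) = 6.25 for state 1, so Z = 11.25.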
Chapter 6
The Max-Sum Algorithm—Variation Of Sum-Product
The max-sum algorithm is used to find a set of variables that maximizes the joint probability, together with the value of this maximum probability. We define

X^max = argmax_X p(X),    (6.1)

p(X^max) = max_X p(X) = max_{x_1} ⋯ max_{x_M} p(X).    (6.2)

First we modify the sum-product algorithm to obtain the max-product algorithm by replacing the summation operator with the maximum operator in the messages μ_{f→x}(x) and μ_{x→f}(x). As before, we arbitrarily choose a node as root and pass the messages from the leaves to the root until they all reach the root node. At the root we obtain the maximum probability by multiplying all the incoming messages:

p^max = max_x [ ∏_{s∈ne(x)} μ_{f_s→x}(x) ].    (6.3)

Usually we prefer to work with logarithms in scientific computation to avoid potential underflow. Because the logarithm is monotonic, meaning that a > b implies ln a > ln b, we can interchange the max and the logarithm operators. Performing ln(max_x p(X)) = max_x ln p(X) has the effect of replacing the products with sums; therefore this algorithm is named the max-sum algorithm. Finally, we obtain

p^max = max_x [ Σ_{s∈ne(x)} μ_{f_s→x}(x) ].    (6.4)

Now we have to find the maximizing variable set X^max. The most probable value for the root x_N is given by

x_N^max = argmax_{x_N} [ μ_{f_N→x_N}(x_N) ].    (6.5)

To determine the states of the previous variables in the maximizing configuration we use back-tracking, which is an application of dynamic programming. Each node has K possible states; for each state of a variable there is a unique state of the previous variable that maximizes the probability, recorded by the function

Φ(x_n) = argmax_{x_{n−1}} [ ln f_{n−1,n}(x_{n−1}, x_n) + μ_{x_{n−1}→f_{n−1,n}}(x_{n−1}) ].    (6.6)

Beginning at the root node x_N we can then simply find the most probable state of the previous node x_{N−1} using x_{N−1}^max = Φ(x_N^max) and continue the operation back to the leaf node x_1 step by step. Finally we obtain the global maximizing configuration x_1^max, ..., x_N^max.
Chapter 7
Summary
Obviously, the sum-product algorithm has a great advantage in computational complexity over the naive calculation when finding a single marginal. Suppose that there are N nodes in the chain, each with K states. If we perform the summation explicitly to calculate p(x_n), we must evaluate K^N values over the N variables. This naive calculation with its exponential scaling is very wasteful and can overwhelm the computation. With the sum-product algorithm the number of computations is only twice the number of links in the tree, which saves a great deal of computation time and resources.

It is also interesting to compare the sum-product algorithm and the max-sum algorithm. To compute p(x) with the sum-product algorithm we need to store all of the messages sent from the neighbouring nodes of x towards x. If we use the max-sum algorithm, every time we perform a max operation we must store the settings of the child nodes that led to the maximum. Because of its high computational efficiency, the max-sum algorithm is widely applied; for instance, when applied to the Hidden Markov Model, the max-sum algorithm is known as the Viterbi algorithm.
Bibliography
[1] Frank R. Kschischang, Brendan J. Frey, and Hans-Andrea Loeliger. Factor Graphs and the Sum-Product Algorithm. IEEE Transactions on Information Theory, vol. 47, no. 2, pp. 498-519, February 2001.

[2] Christopher M. Bishop. Pattern Recognition and Machine Learning, Chapter 8. Springer. pp. 359-418. ISBN 0-387-31073-8.