Algebra of reversible Markov chains
Giovanni Pistone
Maria Piera Rogantin
No. 167
December 2010
www.carloalberto.org/working_papers
© 2010 by Giovanni Pistone and Maria Piera Rogantin. Any opinions expressed here are those of the
authors and not those of the Collegio Carlo Alberto.
Working paper version 1.1
History: Wogas 2, U. Torino, CREST-SBM 2, arXiv:1007.4282v1
Algebra of reversible Markov chains
Giovanni Pistone · Maria Piera Rogantin
September 2010
Abstract We prove that Kolmogorov's conditions for reversibility define a toric ideal. We derive new parameterizations for reversible Markov chains.
1 Introduction
A transition matrix P_{v→w}, v, w ∈ V, satisfies the detailed balance conditions if κ(v) > 0, v ∈ V, and

κ(v) P_{v→w} = κ(w) P_{w→v},   v, w ∈ V.

It follows that π(v) ∝ κ(v) is an invariant probability, and the Markov chain X_n, n = 0, 1, . . ., with invariant probability π and transition matrix P_{v→w}, v, w ∈ V, has reversible two-step joint distribution

P(X_n = v, X_{n+1} = w) = P(X_n = w, X_{n+1} = v),   v, w ∈ V, n ≥ 0.
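These conditions are easy to check numerically. The sketch below builds a transition matrix from a 2-reversible two-step joint distribution and verifies detailed balance and invariance; the particular 3-state symmetric joint distribution Q is an illustrative assumption, not taken from the paper.

```python
import numpy as np

# A symmetric joint distribution Q(v, w) = P(X_n = v, X_{n+1} = w) is exactly
# 2-reversibility; this particular Q is an illustrative assumption.
Q = np.array([[0.10, 0.05, 0.05],
              [0.05, 0.20, 0.15],
              [0.05, 0.15, 0.20]])
assert np.allclose(Q, Q.T) and np.isclose(Q.sum(), 1.0)

pi = Q.sum(axis=1)            # 1-stationary marginal pi(v)
P = Q / pi[:, None]           # transition matrix P_{v -> w}

# Detailed balance pi(v) P_{v->w} = pi(w) P_{w->v}, and invariance pi P = pi
assert np.allclose(pi[:, None] * P, (pi[:, None] * P).T)
assert np.allclose(pi @ P, pi)
print(np.round(pi, 3))        # -> [0.2 0.4 0.4]
```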
Reversible Markov Chains (MCs) are important in Statistical Physics, e.g. in the theory of entropy production, and in applied
probability, e.g. the simulation method Monte Carlo Markov Chain (MCMC).
In Section 2 we recall some basic notions from Dobrushin et al (1988), (Strook 2005, Ch 5), Diaconis and Rolles (2006), Hastings (1970), Peskun (1973), Liu (2008).
In Section 3 we discuss the algebraic theory that results from the detailed balance conditions. The exposition is intended for both algebraists and probabilists, because the results pertain to the area of Algebraic Statistics, see e.g. Pistone et al (2001), Drton et al (2009), Gibilisco et al (2009). This justifies the abundance of elementary examples and tutorial sections.
2 Background
2.1 Reversible process and the graph of 2-distributions
Let V be a finite state space with elements v, w, . . . and #V = N. The stochastic process (X_n)_{n≥0} with state space V is 2-reversible if

P(X_n = v, X_{n+1} = w) = P(X_n = w, X_{n+1} = v),   v, w ∈ V, n ≥ 0.
In particular, the process is 1-stationary: by summing over w ∈ V , we have:
P (Xn = v) = P (Xn+1 = v) = π(v),
v ∈ V, n ≥ 0.
Let V1 be the set of all singletons of V and V2 the set of all subsets of V of cardinality 2. The following parametrization of the 2-dimensional distributions has been used in Diaconis and Rolles (2006):

θ_{v} = P(X_n = v, X_{n+1} = v),   {v} ∈ V1,
θ_{v,w} = P(X_n = v, X_{n+1} = w) + P(X_n = w, X_{n+1} = v) = 2 P(X_n = v, X_{n+1} = w),   {v, w} ∈ V2.

G. Pistone
Collegio Carlo Alberto
Via Real Collegio, 30
10024 Moncalieri, Italy
E-mail: [email protected]

M.P. Rogantin
DIMA Università di Genova
Via Dodecaneso, 35
16146 Genova, Italy
E-mail: [email protected]
The number of parameters is N + \binom{N}{2} = \binom{N+1}{2}; moreover it holds

1 = ∑_{v,w∈V} P(X_n = v, X_{n+1} = w) = ∑_{{v}∈V1} θ_{v} + ∑_{{v,w}∈V2} θ_{v,w},

hence θ = (θ_{v} : {v} ∈ V1, θ_{v,w} : {v, w} ∈ V2) belongs to the simplex ∆(V1 ∪ V2).
We assume now we are given an undirected connected graph G = (V, E) such that P(X_n = v, X_{n+1} = w) = 0 if {v, w} ∉ E. The relevant parameter θ = (θ_{v} : {v} ∈ V1, θ_{v,w} : {v, w} ∈ E) now belongs to the simplex ∆(V1 ∪ E).
The probability π can be written using the θ parameters:

π(v) = ∑_{w∈V} P(X_n = v, X_{n+1} = w) = θ_{v} + (1/2) ∑_{w : {v,w}∈E} θ_{v,w},

or, in matrix form,

π = θ_V + (1/2) Γ θ_E,

where Γ is the incidence matrix of the graph G.
Example 1 (Running example) Consider the graph G = (V, E) with V = {1, 2, 3, 4} and E = {{1,2}, {2,3}, {3,4}, {1,4}, {2,4}}, see the left side of Figure 1. Here, with columns indexed by the edges {1,2}, {2,3}, {3,4}, {1,4}, {2,4} and rows by the vertices 1, 2, 3, 4,

         {1,2} {2,3} {3,4} {1,4} {2,4}
     1 [   1     0     0     1     0  ]
2Γ = 2 [   1     1     0     0     1  ]
     3 [   0     1     1     0     0  ]
     4 [   0     0     1     1     1  ]
Proposition 1
1. The map

γ : ∆(V1 ∪ E) ∋ θ = (θ_V, θ_E) ↦ π = [I_{V1} (1/2)Γ] (θ_V, θ_E)ᵗ ∈ ∆(V1)

is a surjective Markov map.
2. The image of (0, θ_E), θ_E ∈ ∆(E), is the convex hull of the mid-points of those edges of ∆(V) which belong to E.

Proof 1. The image of the convex set ∆(V1 ∪ E) is the convex hull of the images of the extreme points e_i, i ∈ V, and e_{x,y}, {x, y} ∈ E; hence γ(∆(V1 ∪ E)) is the convex hull of the columns of the matrix [I_{V1} (1/2)Γ].
2. By inspection of the columns of (1/2)Γ. ⊓⊔
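The map γ of Proposition 1 can be evaluated directly on the running example; the point θ below is an illustrative element of ∆(V1 ∪ E), not one taken from the paper.

```python
import numpy as np

# Running example: V = {1,2,3,4}, E = {12, 23, 34, 14, 24}.
# Columns of 2*Gamma are indexed by the edges in that order.
two_gamma = np.array([[1, 0, 0, 1, 0],
                      [1, 1, 0, 0, 1],
                      [0, 1, 1, 0, 0],
                      [0, 0, 1, 1, 1]])

# An illustrative theta in the simplex Delta(V1 ∪ E): 4 vertex + 5 edge weights
theta_V = np.array([0.10, 0.05, 0.10, 0.05])
theta_E = np.array([0.20, 0.10, 0.10, 0.20, 0.10])
assert np.isclose(theta_V.sum() + theta_E.sum(), 1.0)

pi = theta_V + 0.5 * two_gamma @ theta_E   # pi = theta_V + (1/2) Gamma theta_E
assert np.isclose(pi.sum(), 1.0)           # gamma is a Markov map: pi is a probability
```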
2.2 From the 1-margin to the 2-margin

Given π, the fiber γ⁻¹(π) is contained in an affine subspace parallel to the kernel of the map θ ↦ θ_V + (1/2)Γθ_E. Each fiber contains special solutions. One is the constant case (π, 0_E). If the graph is complete, G = (V, V2), there is the independence solution θ_{v} = π(v)², θ_{v,w} = 2π(v)π(w).
If π(v) > 0, v ∈ V, a strictly positive solution is obtained as follows. Let d(v) = #{w : {v, w} ∈ E} be the degree of the vertex v, and define a transition probability by A(v, w) = 1/(2d(v)) if {v, w} ∈ E, A(v, v) = 1/2, and A(v, w) = 0 otherwise. A is the transition matrix of a random walk on the graph G, stopped with probability 1/2. Define a probability on V × V by Q(v, w) = π(v)A(v, w). If Q(v, w) = Q(w, v) we are done: we have a 2-reversible probability with marginal π. Otherwise, we have the following Hastings-Metropolis construction.
Proposition 2 Let Q be a strictly positive probability on V × V and let π(v) = ∑_w Q(v, w). If f : ]0,1[ × ]0,1[ → ]0,1[ is a symmetric function such that f(x, y) ≤ x ∧ y, then

P(v, w) = f(Q(v, w), Q(w, v))   if v ≠ w,
P(v, v) = π(v) − ∑_{w : w≠v} P(v, w)

is a 2-reversible strictly positive probability on V × V such that π(v) = ∑_w P(v, w).
Proof For v ≠ w we have P(v, w) = P(w, v) > 0. As P(v, w) ≤ Q(v, w) for v ≠ w, it follows that

P(v, v) = π(v) − ∑_{w : w≠v} P(v, w) ≥ ∑_w Q(v, w) − ∑_{w : w≠v} Q(v, w) = Q(v, v) > 0.

We have ∑_w P(v, w) = π(v) by construction and, in particular, P is a probability on V × V. ⊓⊔
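A minimal sketch of the construction of Proposition 2 with the Hastings choice f(x, y) = x ∧ y; the strictly positive joint distribution Q is an illustrative assumption.

```python
import numpy as np

# Proposition 2 with f(x, y) = min(x, y); Q is an illustrative strictly
# positive joint probability on V x V (rows need not be symmetric).
Q = np.array([[0.10, 0.08, 0.02],
              [0.04, 0.30, 0.06],
              [0.06, 0.04, 0.30]])
pi = Q.sum(axis=1)

P = np.minimum(Q, Q.T)                    # f(Q(v,w), Q(w,v)) off the diagonal
np.fill_diagonal(P, 0.0)
np.fill_diagonal(P, pi - P.sum(axis=1))   # P(v,v) = pi(v) - sum_{w != v} P(v,w)

assert np.allclose(P, P.T)                # 2-reversible
assert (P > 0).all()                      # strictly positive, since f(x,y) <= x ∧ y
assert np.allclose(P.sum(axis=1), pi)     # marginal is pi
```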
Remark 1
1. The proposition applies to:
(a) f(x, y) = x ∧ y. This is the standard Hastings choice.
(b) f(x, y) = xy/(x + y). This was suggested by Barker.
(c) f(x, y) = xy. In fact, as x, y < 1, we have xy < x ∧ y. It is an algebraic function which is included in Hastings's proposals.
2. Given P, the corresponding parameters θ_{v,w} = 2P(v, w) and θ_{v} = P(v, v) are strictly positive. We have shown the existence of a mapping from the interior of ∆(V) to the interior of ∆(V1 ∪ V2).
3. The mapping θ ↦ (π, P_{v→w}) is a rational mapping from ∆(V1 ∪ V2) into ∆(V) × ∆(V)^V.
Example 2 Let V = {0, 1, 2} and let π be a binomial distribution on V, π(i) = \binom{2}{i} p^i (1−p)^{2−i}. Choose the proposal

q_{ij} = 1/2 if i ≠ j,   q_{ij} = 0 if i = j.

Then, q being symmetric,

a_{ij} = π(j)q_{ji} / (π(i)q_{ij}) = π(j)/π(i)   and   p_{ij} = (1/2)(a_{ij} ∧ 1),   i ≠ j.

We have:

a_{01} = 2p/(1−p),   a_{02} = (p/(1−p))²,   a_{12} = (1/2) p/(1−p).

If p < 1/3 then a_{01}, a_{02} and a_{12} are all smaller than 1 and the transition probability matrix is

P = [ 1 − p/(1−p) − (1/2)(p/(1−p))²    p/(1−p)                (1/2)(p/(1−p))²
      1/2                              1/2 − (1/4) p/(1−p)    (1/4) p/(1−p)
      1/2                              1/2                    0               ]
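The computation of Example 2 can be reproduced numerically; p = 0.25 < 1/3 is an illustrative choice.

```python
import numpy as np

# Numerical check of Example 2: V = {0, 1, 2}, binomial pi, uniform proposal
# q_ij = 1/2 for i != j, Metropolis acceptance a_ij ∧ 1; p = 0.25 < 1/3 is an
# illustrative choice.
p = 0.25
r = p / (1 - p)
pi = np.array([(1 - p) ** 2, 2 * p * (1 - p), p ** 2])

P = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        if i != j:
            P[i, j] = 0.5 * min(pi[j] / pi[i], 1.0)   # p_ij = (1/2)(a_ij ∧ 1)
    P[i, i] = 1.0 - P[i].sum()

# Closed-form entries stated in the example (valid for p < 1/3)
assert np.isclose(P[0, 1], r) and np.isclose(P[0, 2], 0.5 * r ** 2)
assert np.isclose(P[1, 2], 0.25 * r) and np.isclose(P[2, 2], 0.0)
# Detailed balance and invariance
assert np.allclose(pi[:, None] * P, (pi[:, None] * P).T)
assert np.allclose(pi @ P, pi)
```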
2.3 Reversible Markov chains

Now consider a special case. Assume the reversible process (X_n)_{n∈N} is a Markov chain and consider the undirected graph G(V, E) such that {v, w} ∈ E if, and only if, θ_{v,w} > 0. The transition probabilities are:

P_{v→w} = P(X_n = v, X_{n+1} = w) / P(X_n = v) = θ_{v,w} / ∑_w θ_{v,w},
P_{w→v} = P(X_n = w, X_{n+1} = v) / P(X_n = w) = θ_{v,w} / ∑_v θ_{v,w},

so that, denoting ∑_w θ_{v,w} by κ(v) and ∑_v θ_{v,w} by κ(w):

κ(v) P_{v→w} = κ(w) P_{w→v}   (1)

and ∑_v κ(v) = 1. Vice versa, if there exist positive constants κ(v), v ∈ V, such that (1) holds, by summing over w we obtain that κ is an unnormalized invariant probability:

κ(v) = ∑_w κ(w) P_{w→v}.

Note that if P is reversible, then the backward transition is P^b_{v→w} = κ(w)P_{w→v}/κ(v) = P_{v→w}.
Let G(V, E) be a connected graph. We denote by ω a closed path, that is, a path on the graph such that the last vertex coincides with the first one, ω = v_0 v_1 . . . v_n v_0, and by r(ω) the reversed path r(ω) = v_0 v_n . . . v_1 v_0.

Theorem 1 (Kolmogorov's theorem) Let (X_n)_{n∈N} be a Markov process on V with support of the transitions G. The process is reversible if, and only if, for all closed paths ω

P(ω | X_0 = v_1) = P(rω | X_0 = v_1).   (2)
Fig. 1 From the undirected graph to the directed graph of transitions
Proof Assume that the process is reversible. Then we have:

κ(v_1) P_{v_1→v_2} = κ(v_2) P_{v_2→v_1}
κ(v_2) P_{v_2→v_3} = κ(v_3) P_{v_3→v_2}
. . .
κ(v_n) P_{v_n→v_1} = κ(v_1) P_{v_1→v_n}.

By multiplying together all these equalities and simplifying the κ's we obtain

P_{v_1→v_2} P_{v_2→v_3} · · · P_{v_n→v_1} = P_{v_1→v_n} · · · P_{v_3→v_2} P_{v_2→v_1}.

Vice versa, assume that all the circuits have the property (2). We denote by v and w the first and the next-to-last vertices, respectively. By summing over the intermediate vertices of all circuits with boundary vertices v and w, we obtain

∑_{v_2,...,v_{n−2}} P_{v→v_2} P_{v_2→v_3} · · · P_{w→v} = ∑_{v_2,...,v_{n−2}} P_{v→w} · · · P_{v_3→v_2} P_{v_2→v},

that is,

P^{(n−2)}_{v→w} P_{w→v} = P_{v→w} P^{(n−2)}_{w→v},

where P^{(n−2)}_{v→w} denotes the transition probability in (n−2) steps. Letting n → ∞,

π(w) P_{w→v} = P_{v→w} π(v),

and the chain is reversible. ⊓⊔
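Kolmogorov's criterion can be tested numerically on all short closed paths of a small reversible chain; the weights κ and the symmetric s(v, w) below are illustrative assumptions.

```python
import numpy as np
from itertools import permutations

# Kolmogorov's criterion on all closed paths of length 3 of a small reversible
# chain; kappa and the symmetric weights s(v, w) are illustrative assumptions.
kappa = np.array([1.0, 2.0, 3.0, 4.0])
S = np.array([[0.0, 0.3, 0.2, 0.1],
              [0.3, 0.0, 0.2, 0.2],
              [0.2, 0.2, 0.0, 0.3],
              [0.1, 0.2, 0.3, 0.0]])

P = S / kappa[:, None]                    # kappa(v) P_{v->w} = s(v, w), symmetric
np.fill_diagonal(P, 1.0 - P.sum(axis=1))
assert np.allclose(P.sum(axis=1), 1.0)

def path_product(P, path):
    """Product of transition probabilities along a path v0 v1 ... vn."""
    return np.prod([P[a, b] for a, b in zip(path, path[1:])])

for trip in permutations(range(4), 3):    # closed paths v0 v1 v2 v0
    w = list(trip) + [trip[0]]
    assert np.isclose(path_product(P, w), path_product(P, w[::-1]))
```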
Suomela (1979) gives a proof that does not use the ergodic theorem. That proof is related to our algebraic discussion below.
3 Algebraic theory
The present section is devoted to the algebraic structure implied by Kolmogorov's theorem for reversible Markov chains. We refer mainly to the textbooks Berge (1985) and Bollobás (1998) for graph theory, and to the textbooks Cox et al (1997) and Kreuzer and Robbiano (2000) for computational commutative algebra. The theory of toric ideals is treated in detail in Sturmfels (1996) and Bigatti and Robbiano (2001). General references for algebraic methods in stochastics are e.g. Drton et al (2009), Gibilisco et al (2009). The relevance of Graver bases, see Sturmfels (1996), has been pointed out to us by Shmuel Onn in view of the applications discussed in De Loera et al (2008) and Onn (to appear).
3.1 Kolmogorov’s ideal
We denote by G = (V, E) an (undirected, simple) graph. We split each edge into two opposite arcs to get a connected directed graph (without loops) denoted by O = (V, A). The arc going from vertex v to vertex w is denoted (v → w). The graph O is such that (v → v) ∉ A and (v → w) ∈ A if, and only if, (w → v) ∈ A. Because of our application to Markov chains, we want two arcs on each edge, as in Figure 1. Because of that, some of our statements about graphs differ from the standard presentation, where each edge is given one single orientation. For example, the set of arcs leaving a vertex v, denoted out(v), and the set of arcs entering the same vertex, denoted in(v), correspond to the same set of edges.
The reversed arc is the image of the 1-to-1 function r : A → A defined by r(v → w) = (w → v). A path is a sequence of
vertices ω = v0 v1 · · · vn such that (vk−1 → vk ) ∈ A , k = 1, . . . , n. The reversed path is denoted by rω = vn vn−1 · · · v0 . Equivalently,
a path is a sequence of inter-connected arcs ω = a1 . . . an , ak = (vk−1 → vk ), and rω = r(an ) . . . r(a1 ).
A closed path ω = v_0 v_1 · · · v_{n−1} v_0 is any path going from a vertex v_0 to itself; rω = v_0 v_{n−1} · · · v_1 v_0 is the reversed closed path. In a closed path any vertex can be the initial and final vertex. If we do not distinguish any initial vertex, the equivalence class of closed paths is called a circuit. A closed path is elementary if it has no proper sub-closed-path, i.e. if it does not meet the same vertex twice, except for the initial one v_0. The circuit of an elementary closed path is a cycle. We denote by C the set of cycles of O.
Consider the indeterminates P = [P_{v→w}], (v → w) ∈ A, and the polynomial ring k[P_{v→w} : (v → w) ∈ A]. For each path ω = a_1 · · · a_n, a_k ∈ A, k = 1, . . . , n, we define the monomial term

ω = a_1 · · · a_n ↦ P^ω = ∏_{k=1}^n P_{a_k}.
Fig. 2 Illustration of a path, its traversal counts, its monomial term
Fig. 3 The 6 cycles of a graph: ωA = (1→2)(2→4)(4→1), ωB = (2→3)(3→4)(4→2), ωC = (1→2)(2→3)(3→4)(4→1), ωD = rωA, ωE = rωB, ωF = rωC.
For each a ∈ A, let N_a(ω) be the number of traversals of the arc a by the path ω. Hence,

P^ω = ∏_{a∈A} P_a^{N_a(ω)}.

See the illustration in Figure 2.
Note that ω ↦ P^ω is a representation of the non-commutative concatenation of arcs in the commutative product of indeterminates. Two closed paths associated to the same circuit are mapped to the same monomial term because they have the same traversal counts. The monomial term of a cycle is square-free. Figure 3 presents the 6 cycles of a square with one diagonal.
Definition 1 (K-ideal) The Kolmogorov ideal or K-ideal of the graph G is the ideal of the ring k[P_{v→w} : (v → w) ∈ A] generated by the binomials P^ω − P^{rω}, where ω is any circuit. The K-variety is the k-affine variety of the K-ideal.
Our main application concerns the real case k = R, but the combinatorial structure of the K-ideal does not depend on the choice of a specific field. An interesting choice for computations could be the Galois field k = Z₂.
For a given connected graph G, we say that a transition matrix P = [P_{v→w}], v, w ∈ V, is compatible with G if P_{v→w} = 0 whenever (v → w) ∉ A and v ≠ w. Let out(v) be the set of arcs leaving v, and define the simplex

∆(v) = { P_{v→·} ∈ R₊^{out(v)} : ∑_{(v→w)∈out(v)} P_{v→w} ≤ 1 }.

A transition matrix P compatible with G is a point in the product of simplexes ∆(O) = ×_{v∈V} ∆(v).
Proposition 3 (Examples of K-ideals) Let P be compatible with G and reversible.
1. The transition matrix P_{v→w}, (v → w) ∈ A, is a point of the intersection of the variety of the K-ideal with ∆(O).
2. Let (X_n)_{n≥0} be the stationary Markov chain with transition P. Then the joint probabilities p(v, w) = P(X_n = v, X_{n+1} = w), (v → w) ∈ A, are points in the intersection of the K-variety and the simplex ∆(A) = { p ∈ R₊^A : ∑_{a∈A} p(a) ≤ 1 }.
Proof 1. This is the first part of Kolmogorov's theorem.
2. Let ω = v_0 . . . v_n v_0 be a closed path. If π is the stationary probability, by multiplying Kolmogorov's equations by the product of the initial probabilities at each transition, we obtain

π(v_0)π(v_1) · · · π(v_n) P_{v_0→v_1} · · · P_{v_n→v_0} = π(v_0)π(v_n) · · · π(v_1) P_{v_0→v_n} · · · P_{v_1→v_0},

hence

p(v_0, v_1) p(v_1, v_2) · · · p(v_n, v_0) = p(v_0, v_n) p(v_n, v_{n−1}) · · · p(v_1, v_0). ⊓⊔
The K-ideal has a finite basis because of Hilbert's basis theorem. Precisely, a finite basis is obtained by restricting to cycles, which are finite in number.

Proposition 4 (Finite basis of the K-ideal) The K-ideal is generated by the set of binomials P^ω − P^{rω}, where ω is a cycle.
Fig. 4 ωA is the green cycle and ωC is the red cycle. In blue we have represented the common part.
Proof Let ω = v_0 v_1 · · · v_0 be a closed path which is not elementary, and consider the least k ≥ 1 such that v_k = v_{k′} for some k′ < k. Then the sub-path ω_1 between the k′-th vertex and the k-th vertex is an elementary closed path, and the residual path ω_2 = v_0 · · · v_{k′} v_{k+1} · · · v_0 is closed and shorter than the original one. The arcs of ω are in 1-to-1 correspondence with the arcs of ω_1 and ω_2. The procedure can be iterated and stops in a finite number of steps. Hence, given any closed path ω, there exists a finite sequence of cycles ω_1, . . . , ω_l such that the list of arcs in ω is partitioned into the lists of arcs of the ω_i's. From P^{ω_i} − P^{rω_i} = 0, i = 1, . . . , l, it follows that

P^ω = ∏_{i=1}^l P^{ω_i} = ∏_{i=1}^l P^{rω_i} = P^{rω}. ⊓⊔
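The splitting step of this proof can be checked mechanically by comparing traversal counts; the particular non-elementary closed path below is an illustrative choice on the running-example graph.

```python
from collections import Counter

def traversals(path):
    """Exponent vector N_a(w) of the monomial P^w: arc -> traversal count."""
    return Counter(zip(path, path[1:]))

# A non-elementary closed path on the running-example graph (an illustrative
# choice) and its split at the first repeated vertex, as in the proof.
w  = [1, 2, 3, 4, 2, 4, 1]
w1 = [2, 3, 4, 2]            # elementary closed path cut out at vertex 2
w2 = [1, 2, 4, 1]            # residual closed path

# The arcs of w are partitioned into those of w1 and w2, hence
# P^w = P^{w1} P^{w2}; the same holds for the reversed paths.
assert traversals(w) == traversals(w1) + traversals(w2)
assert traversals(w[::-1]) == traversals(w1[::-1]) + traversals(w2[::-1])
```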
The K-ideal is thus generated by a finite set of binomials, one for each pair of opposite cycles; this generating set is in fact a Gröbner basis. We refer to Cox et al (1997) and Kreuzer and Robbiano (2000) for a detailed discussion.
We review below the basic definitions of this theory, which is based on the existence of a monomial order, i.e. a total order on monomial terms which is compatible with the product. Given such an order, the leading term LT(f) of a polynomial f is defined. A generating set is a Gröbner basis if the leading terms of the ideal are generated by the leading terms of the elements of the generating set. A Gröbner basis is reduced if the coefficient of the leading term of each element of the basis is 1 and no monomial in any element of the basis is in the ideal generated by the leading terms of the other elements of the basis. The Gröbner basis property depends on the monomial order. However, a generating set is said to be a universal Gröbner basis if it is a Gröbner basis for all monomial orders.
The finite algorithm for testing the Gröbner basis property depends on the definition of syzygy. Given two polynomials f and g in the polynomial ring, their syzygy is the polynomial

S(f, g) = (LT(g)/gcd(LT(f), LT(g))) f − (LT(f)/gcd(LT(f), LT(g))) g.

A generating set of an ideal is a Gröbner basis if, and only if, it contains the syzygy S(f, g) whenever it contains f and g, see (Cox et al 1997, Ch 6) or (Kreuzer and Robbiano 2000, Th. 2.4.1 p. 111).
Proposition 5 (Universal Gröbner basis) The binomials P^ω − P^{rω}, where ω is any cycle, form a reduced universal Gröbner basis of the K-ideal.
Proof Choose any monomial order and let ω_1 and ω_2 be two cycles with ω_i ≠ rω_i, i = 1, 2. Assume first they do not have any arc in common. Then gcd(P^{ω_1}, P^{ω_2}) = 1 and the syzygy is

S(P^{ω_1} − P^{rω_1}, P^{ω_2} − P^{rω_2}) = P^{ω_2}(P^{ω_1} − P^{rω_1}) − P^{ω_1}(P^{ω_2} − P^{rω_2}) = P^{ω_1}P^{rω_2} − P^{rω_1}P^{ω_2},

which belongs to the K-ideal.
Let now α be the common part. The syzygy of P^{ω_1} − P^{rω_1} and P^{ω_2} − P^{rω_2} is

P^{ω_1−α}P^{rω_2} − P^{ω_2−α}P^{rω_1} = P^{rα}(P^{ω_1−α}P^{r(ω_2−α)} − P^{ω_2−α}P^{r(ω_1−α)}),

which again belongs to the K-ideal because (ω_1 − α) + r(ω_2 − α) is a union of cycles. In fact, ω_1 − α and ω_2 − α have in common the extreme vertices, corresponding to the extreme vertices of α. Notice that α is the common part of ω_1 and ω_2 only if it is traversed in the same direction by both cycles. The previous argument does not depend on the choice of the leading term of the binomials, therefore the Gröbner basis is universal. The Gröbner basis is reduced because no monomial of a cycle can divide a monomial of a different cycle. ⊓⊔
Example 3 (Running example, continued) An illustration of the previous proof is in Figure 4.
Remark 2 (Monomial basis) A monomial order is obtained by first introducing a total order on arcs. For example, one could give a total order on vertices, then order the arcs lexicographically. We do not see any special order with particular meaning for this problem. The issue is related to the monomial basis, which is a linear basis of the quotient ring.
3.2 Cycle and cocycle spaces

We adapt to our context some standard notions from algebraic graph theory, precisely the cycle and cocycle spaces, see e.g. (Berge 1985, Ch 2) and (Bollobás 1998, II.3).
Let C be the set of cycles. For each cycle ω ∈ C we define the cycle vector of ω to be z(ω) = (z_a(ω) : a ∈ A), where

z_a(ω) = +1 if a is an arc of ω, −1 if r(a) is an arc of ω, 0 otherwise.

Note that z_{r(a)}(ω) = −z_a(ω). If z^+, z^− are the positive and the negative parts of z, respectively, then z^+_a(ω) = N_a(ω) and z^−_a(ω) = N_a(rω). It follows that P^ω = P^{z^+(ω)} = ∏_{a∈A} P_a^{z^+_a(ω)} and

P^ω − P^{rω} = P^{z^+(ω)} − P^{z^−(ω)}.   (3)

More generally, the definition is extended to any circuit ω by defining

z_a(ω) = N_a(ω) − N_{r(a)}(ω).

Let Z(O) be the cycle space, i.e. the vector space generated in k^A by the cycle vectors.
For each proper subset S of the set of vertices, ∅ ≠ S ⊊ V, we define the cocycle vector of S to be u(S) = (u_a(S) : a ∈ A), where

u_a(S) = +1 if a exits from S, −1 if a enters into S, 0 otherwise,   a ∈ A.

Note that u_{r(a)}(S) = −u_a(S). Let U(O) be the cocycle space, i.e. the vector space generated in k^A by the cocycle vectors. Let U be the matrix whose rows are the cocycle vectors u(S), ∅ ≠ S ⊊ V. We call such a matrix U = [u_a(S)]_{∅≠S⊊V, a∈A} the cocycle matrix.
The cycle space and the cocycle space are orthogonal in k^A. In fact, for each cycle vector z(ω) and cocycle vector u(S), we have

z_{r(a)}(ω) u_{r(a)}(S) = (−z_a(ω))(−u_a(S)) = z_a(ω) u_a(S),   a ∈ A,

so that

z(ω) · u(S) = ∑_{a∈A} z_a(ω)u_a(S) = ∑_{a∈ω} z_a(ω)u_a(S) + ∑_{r(a)∈ω} z_a(ω)u_a(S)
= 2 ∑_{a∈ω} z_a(ω)u_a(S) = 2 [ ∑_{a∈ω, u_a(S)=+1} 1 − ∑_{a∈ω, u_a(S)=−1} 1 ] = 0,

the last equality because a closed path exits S as many times as it enters S.
It is shown e.g. in the previous references that, for undirected graphs, the cycle space is the orthogonal complement of the cocycle space. In our setting it is the orthogonal complement relative to the subspace of vectors x such that x_{r(a)} = −x_a. Consider the E × A matrix E whose element E_{e,a} in position (e, a) is 1 if the arc a belongs to the edge e, and zero otherwise. If we form the block matrix

A = [E; U]

then Z(O) = ker_Z A.
The matrix A has rank #E + #V − 1. In fact, E can be re-arranged as [I_{#E} | I_{#E}], with I_{#E} the identity matrix, and U has #V − 1 linearly independent rows, the dimension of the cocycle space. Recall that a basis of the cocycle space is obtained by considering a spanning tree and separating the vertices at each of its edges.
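The matrices E, U and A of the running example can be built from the definitions; the construction below is a sketch following the rules u_a(S) = ±1 and checks the rank claim and the orthogonality of a cycle vector.

```python
import numpy as np
from itertools import combinations

# Running example: build the edge-arc matrix E, the cocycle matrix U and the
# block matrix A = [E; U], then check a cycle vector lies in ker_Z(A).
V = [1, 2, 3, 4]
edges = [(1, 2), (1, 4), (2, 3), (2, 4), (3, 4)]
arcs = edges + [(w, v) for v, w in edges]         # two opposite arcs per edge

E = np.array([[int(set(a) == set(e)) for a in arcs] for e in edges])
subsets = [set(s) for r in range(1, 4) for s in combinations(V, r)]
U = np.array([[(v in S) - (w in S) for (v, w) in arcs] for S in subsets])
A = np.vstack([E, U])

# Cycle vector of the cycle 1 -> 2 -> 4 -> 1
cyc = [(1, 2), (2, 4), (4, 1)]
z = np.array([int(a in cyc) - int(a[::-1] in cyc) for a in arcs])

assert (A @ z == 0).all()                                    # z is in ker_Z(A)
assert np.linalg.matrix_rank(A) == len(edges) + len(V) - 1   # rank #E + #V - 1
```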
Example 4 (Running example, continued) Table 1 shows the matrix A and a lattice basis of ker_Z A, computed with CoCoA. Three linearly independent rows of U are highlighted.
We recall from Onn (to appear) the definition of Graver basis. Let z(ω_1) and z(ω_2) be two elements of the cycle space Z(O). We introduce a partial order and its set of minimal elements as follows.

Definition 2 (Graver basis)
1. z(ω_1) is conformal to z(ω_2), written z(ω_1) ⊑ z(ω_2), if the component-wise product is non-negative and |z(ω_1)| ≤ |z(ω_2)| component-wise, i.e. z_a(ω_1)z_a(ω_2) ≥ 0 and |z_a(ω_1)| ≤ |z_a(ω_2)| for all a ∈ A.
2. A Graver basis of Z(O) is the set of the minimal elements with respect to the conformity partial order ⊑.
Table 1 An example of cycle and cocycle spaces. The top matrix is the E matrix; the bottom matrix is the U matrix, where three linearly independent rows are highlighted. The two bottom row vectors are a lattice basis of ker_Z(A).

          1→2  1→4  2→3  2→4  3→4  2→1  4→1  3→2  4→2  4→3
{1,2}       1    0    0    0    0    1    0    0    0    0
{1,4}       0    1    0    0    0    0    1    0    0    0
{2,3}       0    0    1    0    0    0    0    1    0    0
{2,4}       0    0    0    1    0    0    0    0    1    0
{3,4}       0    0    0    0    1    0    0    0    0    1

{1}         1    1    0    0    0   -1   -1    0    0    0
{2}        -1    0    1    1    0    1    0   -1   -1    0
{3}         0    0   -1    0    1    0    0    1    0   -1
{4}         0   -1    0   -1   -1    0    1    0    1    1
{1,2}       0    1    1    1    0    0   -1   -1   -1    0
{1,3}       1    1   -1    0    1   -1   -1    1    0   -1
{1,4}       1    0    0   -1   -1   -1    0    0    1    1
{2,3}      -1    0    0    1    1    1    0    0   -1   -1
{2,4}      -1   -1    1    0   -1    1    1   -1    0    1
{3,4}       0   -1   -1   -1    0    0    1    1    1    0
{1,2,3}     0    1    0    1    1    0   -1    0   -1   -1
{1,2,4}     0    0    1    0   -1    0    0   -1    0    1
{1,3,4}     1    0   -1   -1    0   -1    0    1    1    0
{2,3,4}    -1   -1    0    0    0    1    1    0    0    0

z(ωA)       1   -1    0    1    0   -1    1    0   -1    0
z(ωB)       0    0    1   -1    1    0    0   -1    1   -1
Proposition 6
1. For each cycle vector z ∈ Z(O), z = ∑_{ω∈C} λ(ω)z(ω), there exist cycles ω_1, . . . , ω_n ∈ C and positive integers α(ω_1), . . . , α(ω_n) such that z^+ ≥ z^+(ω_i), z^− ≥ z^−(ω_i), i = 1, . . . , n, and

z = ∑_{i=1}^n α(ω_i) z(ω_i).

2. The set {z(ω) : ω ∈ C} is a Graver basis of Z(O).
Proof 1. For all ω ∈ C we have −z(ω) = z(rω), so that we can assume all the λ(ω)'s to be non-negative. Notice also that we can arrange things in such a way that at most one of the two directions of each cycle has a positive λ(ω). We define

A^+(z) = {a ∈ A : z_a > 0},   A^−(z) = {a ∈ A : z_a < 0},

and consider two subgraphs of O with a restricted set of arcs, O^+(z) = (V, A^+(z)) and O^−(z) = (V, A^−(z)). We drop from now on the dependence on z for ease of notation. We note that rA^+ = A^− and rA^− = A^+.
We show first that A^+ must contain a cycle. If O^+ were acyclic, there would exist a vertex v such that out(v) ∩ A^+ = ∅ and in(v) ∩ A^+ ≠ ∅. Let u(v) be the cocycle vector of {v}; we derive a contradiction to the assumption z · u(v) = 0. In fact,

z · u(v) = ∑_{a∈A^+} z_a u_a(v) + ∑_{a∈A^−} z_a u_a(v) = 2 ∑_{a∈A^+} z_a u_a(v) = 2 ∑_{a∈A^+∩in(v)} z_a u_a(v) ≤ −2,

contradicting z · u(v) = 0.
Let ω_1 be a cycle in A^+ and define an integer α(ω_1) ≥ 1 such that z^+ − α(ω_1)z^+(ω_1) ≥ 0, with equality at at least one arc a. The vector z_1 = z − α(ω_1)z(ω_1) belongs to the cycle space Z(O), and moreover A^+(z_1) ⊂ A^+(z).
By repeating the same step a finite number of times we obtain a new representation of z in the form z = ∑_{i=1}^n α(ω_i)z(ω_i), where the support of each α(ω_i)z^+(ω_i) is contained in A^+. It follows that

z^+ = ∑_{i=1}^n α(ω_i) z^+(ω_i)   and   z^− = ∑_{i=1}^n α(ω_i) z^−(ω_i).   (4)

2. In the previous decomposition each z(ω_i), i = 1, . . . , n, is conformal to z. In fact, from z^+ ≥ z^+(ω_i) and z^− ≥ z^−(ω_i), it follows that z_a z_a(ω_i) ≥ 0 and |z_a(ω_i)| = z^+_a(ω_i) + z^−_a(ω_i) ≤ z^+_a + z^−_a = |z_a|. Hence z(ω_i) ⊑ z. ⊓⊔
Example 5 (Running example, continued) We give an illustration of the previous proof. With the arcs ordered as 1→2, 2→1, 2→3, 3→2, 3→4, 4→3, 4→1, 1→4, 2→4, 4→2, consider the cycle vectors

z(ωA) = ( 1, −1, 0, 0, 0, 0, 1, −1, 1, −1 )
z(ωB) = ( 0, 0, 1, −1, 1, −1, 0, 0, −1, 1 )
z(ωC) = ( 1, −1, 1, −1, 1, −1, 1, −1, 0, 0 )

and the element of the cycle space z = z(ωA) + 2z(ωB) + 2z(ωC), see Figure 5. We have

z = z(ωA) + 2z(ωB) + 2z(ωC) = (3, −3, 4, −4, 4, −4, 3, −3, −1, 1),
z^+ = z^+(ωB) + 3z^+(ωC) = (3, 0, 4, 0, 4, 0, 3, 0, 0, 1),

as is illustrated in Figure 6.
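The decomposition of Example 5 can be verified directly; the arc order is the one listed in the example, and the entries 3, −3 on the arcs 4→1, 1→4 are forced by the linear combination z = z(ωA) + 2z(ωB) + 2z(ωC), since z(ωA) = z(ωC) − z(ωB).

```python
import numpy as np

# Arc order of Example 5:
# 1->2, 2->1, 2->3, 3->2, 3->4, 4->3, 4->1, 1->4, 2->4, 4->2
zA = np.array([1, -1, 0, 0, 0, 0, 1, -1, 1, -1])
zB = np.array([0, 0, 1, -1, 1, -1, 0, 0, -1, 1])
zC = np.array([1, -1, 1, -1, 1, -1, 1, -1, 0, 0])

z = zA + 2 * zB + 2 * zC
assert (z == zB + 3 * zC).all()           # conformal rewriting: z(wA) = z(wC) - z(wB)
for zi in (zB, zC):                       # z(wB) and z(wC) are conformal to z
    assert (z * zi >= 0).all() and (np.abs(zi) <= np.abs(z)).all()
# Positive parts add up as in Equation (4)
assert (np.maximum(z, 0) == np.maximum(zB, 0) + 3 * np.maximum(zC, 0)).all()
```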
Fig. 5 An element of the cycle space
Fig. 6 Computation of the conformal representation of the z of Figure 5
3.3 Toric ideal
We want to show that the K-ideal is a toric ideal, as defined in (Sturmfels 1996, Ch 4), see also Bigatti and Robbiano (2001).
Consider the ring Q[P_a : a ∈ A] and the Laurent polynomial ring Q(t_e^{±1}, t_S^{±1} : e ∈ E, ∅ ≠ S ⊊ V), together with their homomorphism h defined by

h : P_a ↦ ∏_e t_e^{E_{e,a}} ∏_S t_S^{u_a(S)} = t^{A(a)}.

If a = (v → w), the corresponding edge is e = {v, w}, so that the first factor is a generic symmetric function s(v, w) = s(w, v) = t_{v,w}:

h : P_{v→w} ↦ s(v, w) ∏_S t_S^{u_{v→w}(S)}.   (5)
The kernel I(A) of h is called the toric ideal of A,

I(A) = { f ∈ Q[P_a : a ∈ A] : f(t^{A(a)} : a ∈ A) = 0 }.

The toric ideal I(A) is a prime ideal and the binomials

P^{z^+} − P^{z^−},   z ∈ Z^A,   Az = 0,

are a generating set of I(A) as a Q-vector space. In particular, a finite generating set of the ideal is formed by selecting a finite subset of such binomials.
Proposition 7 (The K-ideal is toric) The K-ideal is the toric ideal of the matrix A.

Proof For each cycle ω the cycle vector z(ω) belongs to ker_Z A = {z ∈ Z^A : Az = 0}. From Equation (3), P^{z^+(ω)} − P^{z^−(ω)} = P^ω − P^{rω}, therefore the K-ideal is contained in the toric ideal I(A) because of Proposition 4.
To prove the equality we must show that each binomial in I(A) belongs to the K-ideal. From Proposition 6.1, it follows that

P^{z^+} − P^{z^−} = ∏_{i=1}^n (P^{z^+(ω_i)})^{α(ω_i)} − ∏_{i=1}^n (P^{z^−(ω_i)})^{α(ω_i)}

belongs to the K-ideal. ⊓⊔
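The toric property can be illustrated numerically: every cycle binomial P^ω − P^{rω} vanishes on points produced by the homomorphism h of Equation (5). The random positive parameter values below are an illustrative assumption.

```python
import numpy as np
from itertools import combinations

# The homomorphism h of Equation (5) on the running example: every cycle
# binomial P^w - P^{rw} vanishes on the parametrized points.
rng = np.random.default_rng(0)
V = [1, 2, 3, 4]
edges = [(1, 2), (1, 4), (2, 3), (2, 4), (3, 4)]
arcs = edges + [(w, v) for v, w in edges]
subsets = [set(s) for r in range(1, 4) for s in combinations(V, r)]

t_e = {frozenset(e): rng.uniform(0.5, 1.5) for e in edges}   # symmetric part
t_S = rng.uniform(0.5, 1.5, size=len(subsets))               # cocycle part

def h(a):
    """h: P_a -> prod_e t_e^{E_{e,a}} prod_S t_S^{u_a(S)}."""
    v, w = a
    val = t_e[frozenset(a)]
    for S, t in zip(subsets, t_S):
        val *= t ** ((v in S) - (w in S))
    return val

P = {a: h(a) for a in arcs}
for cyc in ([(1, 2), (2, 4), (4, 1)],
            [(2, 3), (3, 4), (4, 2)],
            [(1, 2), (2, 3), (3, 4), (4, 1)]):
    fwd = np.prod([P[a] for a in cyc])
    rev = np.prod([P[(b, a)] for (a, b) in cyc])
    assert np.isclose(fwd, rev)          # P^w = P^{rw} on the toric variety
```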
The Graver basis of a toric ideal is the set of binomials whose exponents are the positive and negative parts of a Graver basis of ker_Z(A). From Propositions 6 and 7 it follows:

Proposition 8 The binomials of the cycles form a Graver basis of the K-ideal.

Remark 3 We mention important consequences of the properties of the K-ideal which we do not discuss further here.
1. It follows from general properties of toric ideals that a Graver basis is a universal Gröbner basis and that a universal Gröbner basis is a Markov basis, Sturmfels (1996). The Markov basis property is related to the connectedness of random walks on the fibers of A, see Diaconis and Sturmfels (1998) and subsequent literature.
2. The knowledge of a Graver basis for the K-ideal provides efficient algorithms for discrete optimization, see De Loera et al (2008) and Onn (to appear).
3.4 Positive K-ideal

The knowledge that the K-ideal is toric is relevant, because the homomorphism definition in Equation (5) provides a parametric representation of the variety. In particular, the strictly positive P_a, a ∈ A, are given by:

P_{v→w} = s(v, w) ∏_S t_S^{u_{v→w}(S)} = s(v, w) ∏_{S : v∈S, w∉S} t_S ∏_{S : w∈S, v∉S} t_S^{−1},   s(v, w) > 0, t_S > 0.   (6)

We observe that the first set of parameters, s(v, w), is a function of the edge. As the rows of E are linearly independent, such parameters carry #E degrees of freedom to represent symmetric transition matrices. The second set of parameters, t_S, represents the deviation from symmetry. The second set of parameters is not identifiable because the rows of the U matrix are not linearly independent. The parametrization (6) can be used to derive an explicit form of the invariant probability, in particular a parametric form of the theorem in Suomela (1979). All properties of the parametrization are collected in the following proposition.
Proposition 9 Consider the strictly non-zero points on the K-variety.
1. The symmetric parameters s(e), e ∈ E, are uniquely determined in Equation (6). The parameters t_S, S ⊊ V, are confounded by ker Uᵗ = {t : Uᵗ t = 0}.
2. An identifiable parametrization is obtained by taking a subset of parameters corresponding to linearly independent rows, denoted by t_S, S ∈ S:

P_{v→w} = s(v, w) ∏_{S∈S : v∈S, w∉S} t_S ∏_{S∈S : w∈S, v∉S} t_S^{−1}.   (7)

3. The detailed balance equations, κ(v)P_{v→w} = κ(w)P_{w→v}, are verified if, and only if,

κ(v) ∝ ∏_{S : v∈S} t_S^{−2}.   (8)
Proof 1. We have

log P = Eᵗ s + Uᵗ t,

with P = (P_{v→w} : (v → w) ∈ A), s = (s(e) : e ∈ E), t = (t_S : ∅ ≠ S ⊊ V), where the logarithm is applied component-wise and, with a slight abuse of notation, s and t denote here the vectors of the logarithms of the parameters. If Eᵗs_1 + Uᵗt_1 = Eᵗs_2 + Uᵗt_2, then Eᵗ(s_1 − s_2) = 0 because the rows of E are orthogonal to the rows of U. Hence, s_1 = s_2 because E has full rank. Finally, Uᵗt_1 = Uᵗt_2.
2. The sub-matrix of A formed by E and by the rows of U in S has full rank.
3. Using Equation (6), we have:

κ(v) s(v, w) ∏_{S : v∈S, w∉S} t_S ∏_{S : w∈S, v∉S} t_S^{−1} = κ(w) s(v, w) ∏_{S : w∈S, v∉S} t_S ∏_{S : v∈S, w∉S} t_S^{−1},

which is equivalent to

κ(v) ∏_{S : v∈S, w∉S} t_S² = κ(w) ∏_{S : w∈S, v∉S} t_S².

By multiplying both sides of the equality by ∏_{S : v∈S, w∈S} t_S², we obtain

κ(v) ∏_{S : v∈S} t_S² = κ(w) ∏_{S : w∈S} t_S²,

so that κ(v) ∝ ∏_{S : v∈S} t_S^{−2} depends only on v and satisfies the detailed balance condition. ⊓⊔
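Item 3 of Proposition 9 can be checked numerically: under the parametrization (6), the weights κ(v) ∝ ∏_{S : v∈S} t_S^{−2} satisfy detailed balance. The random positive parameter values are an illustrative assumption.

```python
import numpy as np
from itertools import combinations

# Check of Proposition 9.3 on the running example.
rng = np.random.default_rng(1)
V = [1, 2, 3, 4]
edges = [(1, 2), (1, 4), (2, 3), (2, 4), (3, 4)]
subsets = [set(s) for r in range(1, 4) for s in combinations(V, r)]
t_e = {frozenset(e): rng.uniform(0.5, 1.5) for e in edges}
t_S = rng.uniform(0.5, 1.5, size=len(subsets))

def P(v, w):
    # Equation (6): P_{v->w} = s(v,w) * prod_S t_S^{u_{v->w}(S)}
    val = t_e[frozenset((v, w))]
    for S, t in zip(subsets, t_S):
        val *= t ** ((v in S) - (w in S))
    return val

def kappa(v):
    # Equation (8): kappa(v) proportional to prod_{S: v in S} t_S^{-2}
    return np.prod([t ** -2 for S, t in zip(subsets, t_S) if v in S])

for v, w in edges:
    assert np.isclose(kappa(v) * P(v, w), kappa(w) * P(w, v))   # detailed balance
```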
We are now in the position of stating an algebraic version of Kolmogorov's theorem.

Definition 3 The detailed balance ideal is the ideal of Q[κ(v) : v ∈ V; P_{v→w} : (v → w) ∈ A]

⟨ ∏_{v∈V} κ(v) − 1, κ(v)P_{v→w} − κ(w)P_{w→v} : (v → w) ∈ A ⟩.

The first polynomial forces the κ parameters to be non-zero.
Proposition 10
1. The matrix P = [P_{v→w}]_{(v→w)∈A} is a point of the variety of the K-ideal if, and only if, there exists κ = (κ(v) : v ∈ V) such that (κ, P) belongs to the variety of the detailed balance ideal.
2. The detailed balance ideal is a toric ideal.
3. The K-ideal is the κ-elimination ideal of the detailed balance ideal.

Proof 1. One direction is Kolmogorov's theorem. The other direction is a rephrasing of Item 3 of Proposition 9.
2. This ideal is the kernel of the homomorphism defined by (5), i.e. P_{v→w} ↦ s(v, w) ∏_S t_S^{u_{v→w}(S)}, together with κ(v) ↦ ∏_{S : v∈S} t_S^{−2}.
3. The elimination ideal is generated by dropping the parametric equations of the indeterminates to be eliminated. ⊓⊔
3.5 Parametrization of reversible transitions

Another parametrization follows from Proposition 9. It is not rational, but it is derived from the natural toric parametrization and, in the case of transition probabilities, it involves explicitly the unnormalized invariant probability.
Proposition 11
1. There exists a (non-algebraic) parametrization of the non-zero K-variety of the form

P_{v→w} = s(v, w) κ(w)^{1/2} κ(v)^{−1/2}.   (9)

2. Given P_a, a ∈ A, as in (9), then

P_{vw} = P_{v→w} if (v → w) ∈ A,
P_{vv} = 1 − ∑_{w : (v→w)∈A} P_{v→w},
P_{vw} = 0 otherwise,

is a transition probability if, and only if,

κ(v)^{1/2} ≥ ∑_{w≠v} s(v, w) κ(w)^{1/2}.

In such a case the invariant probability is proportional to κ.
3. Given P_a, a ∈ A, as in (9), the transition matrix

Q_{vw} = P_{v→w} / ∑_{w′ : (v→w′)∈A} P_{v→w′} if (v → w) ∈ A, and Q_{vw} = 0 otherwise,

is reversible with invariant probability proportional to κ(v) (∑_{w : (v→w)∈A} P_{v→w}), v ∈ V.

Proof 1. Follows from (7) and (8):

κ(w)^{1/2} κ(v)^{−1/2} = ∏_{S∈S : v∈S} t_S / ∏_{S∈S : w∈S} t_S = ∏_{S∈S : v∈S, w∉S} t_S ∏_{S∈S : w∈S, v∉S} t_S^{−1}.

2. If P is a reversible positive transition probability, the condition ∑_{w : (v→w)∈A} P_{vw} ≤ 1 reads

1 ≥ ∑_{w : (v→w)∈A} s(v, w) κ(w)^{1/2} κ(v)^{−1/2},

which is the stated inequality. ⊓⊔
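A numerical sketch of the parametrization (9); the symmetric weights s(v, w) (supported on the running-example edge set) and the values of κ are illustrative assumptions chosen so that the row-sum condition of Item 2 holds.

```python
import numpy as np

# Proposition 11: P_{v->w} = s(v,w) kappa(w)^{1/2} kappa(v)^{-1/2} on the arcs,
# completed on the diagonal; s and kappa are illustrative assumptions.
kappa = np.array([4.0, 2.0, 1.0, 2.0])
S = np.array([[0.0, 0.4, 0.0, 0.3],     # s(v,w) = 0 off the edge set
              [0.4, 0.0, 0.2, 0.2],
              [0.0, 0.2, 0.0, 0.2],
              [0.3, 0.2, 0.2, 0.0]])

P = S * np.sqrt(kappa[None, :] / kappa[:, None])
np.fill_diagonal(P, 1.0 - P.sum(axis=1))
assert (np.diag(P) >= 0).all()          # kappa(v)^{1/2} >= sum_w s(v,w) kappa(w)^{1/2}
# Detailed balance and invariance of kappa (up to normalization)
assert np.allclose(kappa[:, None] * P, (kappa[:, None] * P).T)
pi = kappa / kappa.sum()
assert np.allclose(pi @ P, pi)
```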
The monomial parametrization of the positive K-ideal leads to an alternative presentation of the statistical model, cf. Diaconis and Rolles (2006), and possibly to a variation of the methods used in that paper.
Example 6 (Running example, continued) An over-parametrization of two transition probabilities of the K-variety is:

P_{3→4} = s(3,4) t_{{3}} t_{{1,3}} t_{{2,3}} t_{{1,2,3}} t_{{4}}^{-1} t_{{1,4}}^{-1} t_{{2,4}}^{-1} t_{{1,2,4}}^{-1}
P_{4→3} = s(3,4) t_{{4}} t_{{1,4}} t_{{2,4}} t_{{1,2,4}} t_{{3}}^{-1} t_{{1,3}}^{-1} t_{{2,3}}^{-1} t_{{1,2,3}}^{-1}
Choosing S = {{1}, {3}, {1,2}}, we have:

κ(1) = t_{{1}}^{-2} t_{{1,2}}^{-2},   κ(2) = t_{{1,2}}^{-2},   κ(3) = t_{{3}}^{-2},   κ(4) = 1,

and

t_{{1}} = κ(1)^{-1/2} κ(2)^{1/2},   t_{{3}} = κ(3)^{-1/2},   t_{{1,2}} = κ(2)^{-1/2}.
The transition matrix parameterized by s(e), e ∈ E, and t_S, S ∈ S, is (rows and columns indexed by 1, 2, 3, 4; ⋆ denotes the diagonal entry, one minus the off-diagonal row sum):

Row 1: [ ⋆ , s(1,2) t_{{1}} , 0 , s(1,4) t_{{1}} t_{{1,2}} ]
Row 2: [ s(1,2) t_{{1}}^{-1} , ⋆ , s(2,3) t_{{1,2}} t_{{3}}^{-1} , s(2,4) t_{{1,2}} ]
Row 3: [ 0 , s(2,3) t_{{1,2}}^{-1} t_{{3}} , ⋆ , s(3,4) t_{{3}} ]
Row 4: [ s(1,4) t_{{1}}^{-1} t_{{1,2}}^{-1} , s(2,4) t_{{1,2}}^{-1} , s(3,4) t_{{3}}^{-1} , ⋆ ]
and parameterized by s(e), e ∈ E, and κ(v), v ∈ V (here κ(4) = 1), it is:

Row 1: [ ⋆ , s(1,2) κ(1)^{-1/2} κ(2)^{1/2} , 0 , s(1,4) κ(1)^{-1/2} ]
Row 2: [ s(1,2) κ(1)^{1/2} κ(2)^{-1/2} , ⋆ , s(2,3) κ(2)^{-1/2} κ(3)^{1/2} , s(2,4) κ(2)^{-1/2} ]
Row 3: [ 0 , s(2,3) κ(2)^{1/2} κ(3)^{-1/2} , ⋆ , s(3,4) κ(3)^{-1/2} ]
Row 4: [ s(1,4) κ(1)^{1/2} , s(2,4) κ(2)^{1/2} , s(3,4) κ(3)^{1/2} , ⋆ ]
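The change of parameters in the running example can be verified numerically. The following sketch (the t_S values are drawn at random and the variable names are ours, not the paper's) checks the stated expressions for κ, their inverses, and one matrix entry in both parameterizations:

```python
import random

random.seed(0)
# Running example: S = {{1}, {3}, {1,2}} and kappa(v) = prod_{S : v in S} t_S^(-2).
t1, t3, t12 = (random.uniform(0.5, 2.0) for _ in range(3))
kappa = {1: t1 ** -2 * t12 ** -2, 2: t12 ** -2, 3: t3 ** -2, 4: 1.0}

# Inverse relations stated in the example.
assert abs(t1 - kappa[1] ** -0.5 * kappa[2] ** 0.5) < 1e-9
assert abs(t3 - kappa[3] ** -0.5) < 1e-9
assert abs(t12 - kappa[2] ** -0.5) < 1e-9

# One off-diagonal entry in both parameterizations, e.g. P(3 -> 4):
# s(3,4) t_{3} versus s(3,4) kappa(4)^(1/2) kappa(3)^(-1/2).
s34 = 0.2
assert abs(s34 * t3 - s34 * kappa[4] ** 0.5 * kappa[3] ** -0.5) < 1e-9
```

The same check goes through entry by entry for both matrices above.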
Remark 4 Given a reversible P with P_a ≠ 0 for every a ∈ A, there exist unique s(e) and unnormalized κ(v) representing P_a. In the Hastings-Metropolis algorithm, we are given an unnormalized positive probability κ and a transition Q with Q_{v→w} > 0 if, and only if, (v → w) ∈ A. We are required to produce a new transition P_{v→w} = Q_{v→w} α(v,w) such that P is reversible with invariant probability κ and 0 < α(v,w) ≤ 1. This problem was discussed in Proposition 2. We derive the Hastings solution again via our parametrization. We have

Q_{v→w} α(v,w) = s(v,w) κ(w)^{1/2} κ(v)^{-1/2},

and moreover we want

α(v,w) = s(v,w) κ(w)^{1/2} / (Q_{v→w} κ(v)^{1/2}) ≤ 1,

that is, the symmetric s(v,w) must satisfy

s(v,w) ≤ Q_{v→w} κ(v)^{1/2} κ(w)^{-1/2}.

The Hastings choice corresponds to the largest possible value of s(v,w), see also the discussion in Peskun (1973). In fact, the largest choice is

s(v,w) = Q_{v→w} κ(v)^{1/2} κ(w)^{-1/2} ∧ Q_{w→v} κ(w)^{1/2} κ(v)^{-1/2},

which, in turn, leads to

α(v,w) = 1 ∧ (Q_{w→v} κ(w)) / (Q_{v→w} κ(v)).
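The acceptance derived above can be sketched directly; the toy proposal Q and target κ below are our own choices, and the point of the assertions is that P_{v→w} = Q_{v→w} α(v,w) is reversible for κ:

```python
# Hastings acceptance from the largest admissible symmetric s(v,w):
# alpha(v,w) = 1 ∧ Q(w,v) kappa(w) / (Q(v,w) kappa(v)).
def hastings_alpha(Q, kappa, v, w):
    return min(1.0, Q[w][v] * kappa[w] / (Q[v][w] * kappa[v]))

# Toy proposal kernel (off-diagonal entries only) and unnormalized target.
Q = {0: {1: 0.7, 2: 0.3},
     1: {0: 0.5, 2: 0.5},
     2: {0: 0.4, 1: 0.6}}
kappa = {0: 3.0, 1: 1.0, 2: 2.0}

# P_{v->w} = Q_{v->w} alpha(v,w) satisfies detailed balance for kappa.
P = {v: {w: Q[v][w] * hastings_alpha(Q, kappa, v, w) for w in Q[v]} for v in Q}
for v in P:
    for w in P[v]:
        assert abs(kappa[v] * P[v][w] - kappa[w] * P[w][v]) < 1e-12
```

Both sides of the balance equation reduce to κ(v)Q_{v→w} ∧ κ(w)Q_{w→v}, which is symmetric in v and w, so the check holds for any Q and κ.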
3.6 Parametrization of reversible Markov chains
Assume the parametrization of Equation (9) in Proposition 11 and
κ(v)^{1/2} ≥ Σ_{w≠v : (v,w)∈E} s(v,w) κ(w)^{1/2}.

Let L be the subset of V such that P_{v→v} > 0:

L = { v ∈ V : κ(v)^{1/2} > Σ_{w≠v : (v,w)∈E} s(v,w) κ(w)^{1/2} }.

This provides a parametrization of reversible Markov chains with a given invariant probability. The parametrization depends on an arbitrary symmetric function. Further analysis of this special parametrization is work in progress.
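The holding set L is straightforward to compute once s and κ are fixed. A small sketch, in which the path graph and the numerical values are our own toy choices:

```python
# Toy computation of the holding set L: the path 1 - 2 - 3 with
# s = 1/2 on both edges and kappa = 1 at every vertex.
neighbors = {1: [2], 2: [1, 3], 3: [2]}
kappa = {1: 1.0, 2: 1.0, 3: 1.0}
s = {frozenset((1, 2)): 0.5, frozenset((2, 3)): 0.5}

def row_mass(v):
    # sum over w != v with (v,w) in E of s(v,w) kappa(w)^(1/2)
    return sum(s[frozenset((v, w))] * kappa[w] ** 0.5 for w in neighbors[v])

# The weak inequality holds at every v, so (9) defines a transition matrix;
# the strict inequality fails at v = 2, hence P(2 -> 2) = 0 and 2 is not in L.
L = [v for v in kappa if kappa[v] ** 0.5 > row_mass(v)]
```

Here L = {1, 3}: the endpoints of the path hold with positive probability, while the middle vertex sits exactly on the boundary of the constraint.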
Acknowledgment
During this research we discussed various items with S. Sullivan, B. Sturmfels, and G. Casnati, and we want to thank them for their encouragement and advice. Preliminary versions of this paper were presented at WOGAS2, Warwick U., 2010, and at CREST-SBM 2, Osaka, 2010. The authors wish to thank G. Letac for bringing reference Suomela (1979) to their attention and F. Rigat for the reference Peskun (1973). They also thank S. Onn for pointing out the relevance of his own work on Graver bases.
References
Berge C (1985) Graphs, North-Holland Mathematical Library, vol 6. North-Holland Publishing Co., Amsterdam, second revised edition of part 1 of the 1973
English version
Bigatti A, Robbiano L (2001) Toric ideals. Matemática Contemporânea 21:1–25
Bollobás B (1998) Modern graph theory, Graduate Texts in Mathematics, vol 184. Springer-Verlag, New York
Cox D, Little J, O’Shea D (1997) Ideals, varieties, and algorithms: An introduction to computational algebraic geometry and commutative algebra, 2nd edn.
Undergraduate Texts in Mathematics, Springer-Verlag, New York
De Loera JA, Hemmecke R, Onn S, Weismantel R (2008) n-fold integer programming. Discrete Optim 5(2):231–241, DOI 10.1016/j.disopt.2006.06.006, URL
http://dx.doi.org/10.1016/j.disopt.2006.06.006
Diaconis P, Rolles SWW (2006) Bayesian analysis for reversible Markov chains. Ann Statist 34(3):1270–1292, DOI 10.1214/009053606000000290, URL
http://dx.doi.org/10.1214/009053606000000290
Diaconis P, Sturmfels B (1998) Algebraic algorithms for sampling from conditional distributions. Ann Statist 26(1):363–397
Dobrushin RL, Sukhov YM, Fritts Ĭ (1988) A. N. Kolmogorov—founder of the theory of reversible Markov processes. Uspekhi Mat Nauk 43(6(264)):167–188,
DOI 10.1070/RM1988v043n06ABEH001985, URL http://dx.doi.org/10.1070/RM1988v043n06ABEH001985
Drton M, Sturmfels B, Sullivant S (2009) Lectures on Algebraic Statistics. No. 39 in Oberwolfach Seminars, Birkhäuser
Gibilisco P, Riccomagno E, Rogantin M, Wynn HP (eds) (2009) Algebraic and Geometric Methods in Statistics. Cambridge University Press
Hastings WK (1970) Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57(1):97–109, URL
http://dx.doi.org/10.1093/biomet/57.1.97
Kreuzer M, Robbiano L (2000) Computational commutative algebra. 1. Springer-Verlag, Berlin
Liu JS (2008) Monte Carlo strategies in scientific computing. Springer Series in Statistics, Springer, New York
Onn S (to appear) Theory and applications of n-fold integer programming. In: Mixed integer non-linear programming, Frontier Science, IMA, pp 1–35
Peskun PH (1973) Optimum Monte-Carlo sampling using Markov chains. Biometrika 60:607–612
Pistone G, Riccomagno E, Wynn HP (2001) Algebraic Statistics. Computational Commutative Algebra in Statistics, Monographs on Statistics and Applied
Probability, vol 89. Chapman & Hall/CRC, Boca Raton, FL
Strook DW (2005) An Introduction to Markov Processes. No. 230 in Graduate Texts in Mathematics, Springer-Verlag, Berlin
Sturmfels B (1996) Gröbner bases and convex polytopes. American Mathematical Society, Providence, RI
Suomela P (1979) Invariant measures of time-reversible Markov chains. J Appl Probab 16(1):226–229