Parameter identifiability in a class of
random graph mixture models
by
Elizabeth S. Allman and Catherine Matias and John A.
Rhodes
Research Report No. 29
June, 2010
Statistics for Systems Biology Group
Jouy-en-Josas/Paris/Évry, France
http://genome.jouy.inra.fr/ssb/
Elizabeth S. Allman^1, Catherine Matias^2, John A. Rhodes^1
^1 Department of Mathematics and Statistics, University of Alaska Fairbanks, PO Box 756660, Fairbanks, AK 99775, U.S.A.
^2 CNRS UMR 8071, Laboratoire Statistique et Génome, 523, place des Terrasses de l’Agora, 91 000 Évry, FRANCE
Abstract: We prove identifiability of parameters for a broad class of random graph mixture models. These models are characterized by a partition
of the set of graph nodes into latent (unobservable) groups. The connectivities between nodes are independent random variables when conditioned
on the groups of the nodes being connected. In the binary random graph
case, in which edges are either present or absent, these models are known
as stochastic blockmodels and have been widely used in the social sciences
and, more recently, in biology. Their generalizations to weighted random
graphs, either in parametric or non-parametric form, are also of interest
in many areas. Despite a broad range of applications, the parameter identifiability issue for such models is involved, and previously has only been
touched upon in the literature. We give here a thorough investigation of
this problem. Our work also has consequences for parameter estimation.
In particular, the estimation procedure proposed by Frank and Harary for
binary affiliation models is revisited in this article.
Keywords and phrases: Identifiability, mixture model, random graph,
stochastic blockmodel.
1. Introduction
In modern statistical analyses, data is often structured using networks. Complex networks appear across many fields of science, including biology (metabolic
networks, transcriptional regulatory networks, protein-protein interaction networks), sociology (social networks of acquaintance, or other connections between
individuals), communications (the Internet), and others.
The literature contains many random graph models which incorporate a variety of characteristics of real-world graphs (such as scale-free or small-world
properties). We refer to Newman (2003) and the references therein for an interesting introduction to networks.
One of the earliest and most studied random graph models was formulated
by Erdős and Rényi (1959). In this setup, binary random graphs are modeled
as a set of independent and identically distributed Bernoulli edge variables over
a fixed set of nodes. The homogeneity of this model led to the introduction of
SSB - RR No. 29
Allman, E.S., Matias, C. and Rhodes, J.A.
mixture versions to better capture heterogeneity in data. Stochastic blockmodels
(Daudin et al., 2008; Frank and Harary, 1982; Holland et al., 1983; Snijders
and Nowicki, 1997) were introduced in various forms, primarily in the social
sciences (White et al., 1976) to study relational data, and more recently in
biology (Picard et al., 2009). In this context, the nodes are partitioned into latent
groups (blocks) characterizing the relations between nodes. Blockmodelling thus
refers to the particular structure of the adjacency matrix of the graph (i.e., the
matrix containing edge indicators). By ordering the nodes by the groups to which
they belong, this matrix exhibits a block pattern. Diagonal and off-diagonal
blocks, respectively, represent intra-group and inter-group connections. In the
special case where blocks exhibit the same behavior within their type (diagonal
or off-diagonal), we obtain a model with an affiliation structure (Frank and
Harary, 1982).
Although the literature from the social sciences has focused mostly on binary
relations, there is a growing interest in weighted graphs (Barrat et al., 2004;
Newman, 2004). Mixture models have also been considered in the case of a finite
number of possible relations (Nowicki and Snijders, 2001), and more recently
with continuous edge variables (Ambroise and Matias, 2010; Mariadassou and
Robin, 2010). Some variations that we shall not discuss here include models with
covariates (Tallberg, 2005), mixed membership models (Airoldi et al., 2008;
Latouche et al., 2009), and models with continuous latent variables (Daudin
et al., 2010; Handcock et al., 2007). We also note that Newman and Leicht
(2007) proposed another version of a binary mixture model, slightly different
from the stochastic blockmodel considered here.
Many different parameter estimation procedures have been proposed for these
models, such as Bayesian methods (Nowicki and Snijders, 2001; Snijders and
Nowicki, 1997), variational Expectation-Maximization (EM) procedures (Daudin
et al., 2008; Picard et al., 2009), online classification EM methods (Zanghi et al.,
2008, 2010) and more recently, direct mixture model based approaches (Ambroise and Matias, 2010). Consistency of all these procedures relies strongly
on the identifiability of the model parameters. However, the literature on these
models has not addressed this question in any depth. The trivial label-swapping
problem is often mentioned: it is well known that the parameters may be recovered only up to permutations on the latent class labels. Whether this is the
only issue preventing unique identification of parameters from the distribution,
however, has never been investigated. Given the complex form of the model
parameterization, this is not surprising, as any such analysis seems likely to be
very involved.
In earlier work (Allman et al., 2009, Theorem 7), the authors made a first
step towards an understanding of the parameter identifiability issue in binary
random graph mixture models. While that article addressed a variety of models
with latent variables, the present one focuses more specifically on random graph
mixtures, giving parameter identifiability results for a broad range of such models. Moreover, part of our work sheds some new light on parameter estimation
procedures.
Allman et al. (2009) emphasized the usefulness of an algebraic theorem due to
Kruskal (1976, 1977) (see also Rhodes, 2010) to establish identifiability results
in various models whose common feature is the presence of latent groups and
at least three conditionally independent variables. Here, we rather focus on
the family of random graph mixture models and explore various techniques
to establish their parameters’ identifiability. Thus while the method developed
by Allman et al. (2009) is presented in Section 5.1 and finds further use in
several arguments, it is only one of several techniques we use. The issue at
the core of Kruskal’s result is the decomposition of a 3-way array as a sum of
rank one tensors. While there exist approximate methods of performing this
decomposition (see, e.g., Tomasi and Bro, 2006), we note that this approach
seems poorly suited to recovering the parameters explicitly from the distribution,
and thus to constructing estimation procedures.
Some of our results focus on moment equations, as did those of Frank and
Harary (1982), in one of the earliest works on binary affiliation models. In particular, we revisit some of their claims. The method consists in looking at the
distribution of Kn , a complete set of edge variables over a set of n nodes. A
natural question is then: What is the minimal value of n such that the complete
distribution over all edge variables (a potentially infinite set) is characterized by
the distribution of Kn ? Despite this question’s simplicity, we are far from having a complete answer to it. When looking at finite state distributions (e.g., for
binary random graphs), the knowledge of the distribution of Kn is equivalent
to the knowledge of a certain set of moments of the distribution. Expressing
the moments in terms of parameters gives a nonlinear polynomial system of
equations, which one uses to identify parameters. The uniqueness of solutions
to those systems, up to label swapping on parameters, is the issue at stake for
identifiability.
For random graphs with continuous edge weights given by a parametric family of distributions we shall see that the information contained in the model
might be recovered from the distribution of Kn for very small values of n. In
this case, we rely on classical results on the identifiability of the parameters
of a multivariate mixture due to Teicher (1967). Note that the main difference
between classical mixtures and random graph mixtures is the non-independence
of the variates.
In contrast to the approach based on Kruskal’s Theorem, both the method
utilizing moment equations and the one relying on multivariate mixtures lead to
practical estimation procedures. These are further developed by Ambroise and
Matias (2010).
In Allman et al. (2009), a large role was played by the notion of generic
identifiability, by which all parameters, except those lying on a proper algebraic subvariety, are identifiable. In other words, in a parametric setting, the
non-identifiable parameters are included in a subset whose dimension is strictly
smaller than the dimension of the full parameter space. Thus, with respect to
the Lebesgue measure on the parameter space, almost every parameter is identifiable. This
notion of generic identifiability is important for finite mixtures of multivariate Bernoulli distributions (Allman et al., 2009; Carreira-Perpiñán and Renals,
2000; Gyllenberg et al., 1994) and also for hidden Markov models (Allman et al.,
2009; Petrie, 1969). Here, we stress that some of our identifiability results are
generic, while others are strict.
Finally, we note that our focus throughout will be on undirected graph models. While many of our results may be generalized to directed graphs, one must
pay careful attention to the models’ parametrization in doing so. For instance,
some of the results would become simpler if the connectivities from group q to
group l differed from those from group l to group q, as symmetry in a model
can have a strong impact on identifiability questions. However, such asymmetric
models require an increase in the number of parameters which may be excessive
for data analysis.
This paper is organized as follows. Section 2 presents the various random
graph mixture models: with either binary or, more generally, finite-state edges;
both parametric and non-parametric models for edges with continuous weights;
and the particular affiliation variant of these models. Section 3 gives parameter
identifiability results for binary random graphs. Note that the affiliation model
has to be handled separately. Section 4 takes up weighted random graphs, in
both parametric and non-parametric variants. All the proofs are postponed to
Section 5. In particular, Section 5.1 is devoted to a brief presentation of Kruskal’s
result and our use of it in the proofs of Theorems 2 and 14.
2. Notation and models
We consider a probabilistic model on undirected and possibly weighted graphs
as follows. Let n be a fixed number of nodes, with Z1 , . . . , Zn independent
identically distributed (i.i.d.) random variables, taking values in Z = {1, . . . , Q}
for some Q ≥ 2. These random variables represent the Q groups the nodes are
partitioned among, and are used to introduce heterogeneity in the model. With
$\pi_q = P(Z_i = q) \in (0, 1)$, so that $\sum_q \pi_q = 1$, the vector $\pi = (\pi_q)$ thus gives the priors
on the groups. Let {Xij }1≤i<j≤n be random edge variables taking values in a
state space X . Conditional on Z1 , . . . , Zn , we assume that the edge variables
{Xij }1≤i<j≤n are independent, and that the conditional distribution of Xij
depends only on Zi and Zj , the groups containing its endpoints.
We are interested in random graphs of various types: For binary random
graphs, where X = {0, 1}, an absent edge is represented by 0 and a present one
by 1. Random graphs whose edges may be of finitely many types are modeled
with X = {1, . . . , κ}, or equivalently, {0, . . . , κ − 1}. More general weighted random graphs are obtained when $\mathcal{X} = \mathbb{N}$ or $\mathbb{R}^s$, $s \ge 1$.
In the binary state case, the distribution of Xij conditional on Zi , Zj follows a
Bernoulli distribution with parameter pZi Zj = P(Xij = 1|Zi , Zj ). As we consider
only undirected graphs, we implicitly assume equality of the parameters pql =
plq , for all 1 ≤ q, l ≤ Q.
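The generative description above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sample_binary_sbm`, the 0-based group labels, and the example values of `pi` and `p` are hypothetical choices, not taken from the paper.

```python
import random

def sample_binary_sbm(n, pi, p, seed=0):
    """Draw latent groups Z and a symmetric 0/1 adjacency matrix X."""
    rng = random.Random(seed)
    Q = len(pi)
    # Z_1, ..., Z_n are i.i.d. with P(Z_i = q) = pi[q] (groups indexed from 0 here).
    z = rng.choices(range(Q), weights=pi, k=n)
    x = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Conditional on Z, edges are independent Bernoulli(p[Z_i][Z_j]).
            edge = 1 if rng.random() < p[z[i]][z[j]] else 0
            x[i][j] = x[j][i] = edge  # undirected graph: symmetric matrix
    return z, x

# Illustrative (hypothetical) parameters with Q = 2 and p symmetric.
pi = [0.3, 0.7]
p = [[0.8, 0.2], [0.2, 0.5]]
z, x = sample_binary_sbm(6, pi, p)
```

Only the matrix `x` would be observed in practice; the vector `z` is latent, which is what makes the identifiability question non-trivial.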
More generally, in the finite state case, with X = {1, . . . , κ}, the vector
$p_{Z_i Z_j} = (p_{Z_i Z_j}(1), \ldots, p_{Z_i Z_j}(\kappa))$ contains the values $p_{Z_i Z_j}(k) = P(X_{ij} = k \mid Z_i, Z_j)$,
for $1 \le k \le \kappa$, with $\sum_k p_{Z_i Z_j}(k) = 1$. We also implicitly assume equality of the
vectors $p_{ql} = p_{lq}$, for all $1 \le q, l \le Q$. We introduce this model primarily as
a tool in the study of continuously weighted random graphs, though it might
be useful for studying relationships between nodes of different types (colors), or
of varying but discrete strengths (viewing the states as ordered). Note that a
related model is described by Nowicki and Snijders (2001), where the authors
consider more general relation types (not necessarily edges, whether directed or
not) occurring between a pair of nodes.
In the weighted random graph case, edges may be viewed as either absent
($X_{ij} = 0$) or present ($X_{ij} \neq 0$), with those present having a weight, namely
a non-zero value in $\mathcal{X} = \mathbb{N}$, $\mathbb{R}$, or $\mathbb{R}^s$. The distribution of $X_{ij}$ conditional on
$Z_i, Z_j$ may be assumed to have either a parametric or non-parametric form.
More precisely, we assume that the distribution of $X_{ij}$ conditional on $Z_i, Z_j$ is
the probability measure $\mu_{Z_i, Z_j}$ on $\mathcal{X}$ given by
$$\mu_{ql} = (1 - p_{ql})\,\delta_0 + p_{ql} F_{ql}, \qquad 1 \le q, l \le Q,$$
where $p_{ql} \in (0, 1]$ is a sparsity parameter, $\delta_0$ is the Dirac mass at zero and $F_{ql}$ is
a probability measure on $\mathcal{X}$ with density $f_{ql}$ with respect to either the counting
measure on $\mathbb{N}$ or the Lebesgue measure on $\mathbb{R}$ or $\mathbb{R}^s$. We also implicitly assume
$\mu_{ql} = \mu_{lq}$, for all $1 \le q, l \le Q$. In the parametric case, we assume moreover that
$F_{ql} = F(\cdot, \theta_{ql})$ and $f_{ql} = f(\cdot, \theta_{ql})$ where the parameter $\theta_{ql}$ belongs to $\Theta \subset \mathbb{R}^p$.
In the non-parametric case we assume $F_{ql}$ is absolutely continuous.
We shall always assume that $F_{ql}$ has no point mass at zero, otherwise the
sparsity parameter $p_{ql}$ cannot be identified from the mixture $\mu_{ql}$. For instance,
when considering Poisson weights, $f_{ql}$ is the Poisson density truncated at zero,
$$f_{ql}(k) = \frac{\theta_{ql}^{k}}{k!}\,\bigl(e^{\theta_{ql}} - 1\bigr)^{-1}, \qquad k \ge 1.$$
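As a quick sanity check of the Poisson example, the following sketch evaluates the zero-truncated Poisson density and the resulting edge distribution, a point mass at zero mixed with the truncated Poisson. The function names and the numerical values of the sparsity parameter and of θ are hypothetical, chosen only for illustration.

```python
import math

def truncated_poisson_pmf(k, theta):
    """Zero-truncated Poisson: theta^k / k! / (exp(theta) - 1), for k >= 1."""
    if k < 1:
        return 0.0
    return theta ** k / math.factorial(k) / math.expm1(theta)

def edge_weight_pmf(k, p, theta):
    """Mass at k of mu = (1 - p) * delta_0 + p * F, with F the truncated Poisson."""
    return (1.0 - p) if k == 0 else p * truncated_poisson_pmf(k, theta)

# The masses should sum to 1 (the tail beyond k = 60 is negligible for theta = 2.5).
total = sum(edge_weight_pmf(k, 0.3, 2.5) for k in range(0, 60))
```

Because F puts no mass at zero, the mass of the edge distribution at 0 equals one minus the sparsity parameter exactly, which is why that parameter can be read off from the mixture.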
A particular instance of these models is the affiliation one, which assumes
additionally only two distributions of connections between the nodes, one for
intra-group connections and another for inter-group connections. Thus the binary state case of the affiliation model assumes
$$p_{ql} = \begin{cases} \alpha & \text{if } q = l, \\ \beta & \text{if } q \neq l, \end{cases} \qquad \text{for all } q, l \in \{1, \ldots, Q\}.$$
The affiliation model in the continuous observations case is described similarly
with $\mu_{ql} = \mu_{in} 1_{q=l} + \mu_{out} 1_{q \neq l}$, for all $1 \le q, l \le Q$. More precisely, in the
continuous parametric case, for all $q, l \in \{1, \ldots, Q\}$ we set
$$p_{ql} = \begin{cases} \alpha & \text{if } q = l, \\ \beta & \text{if } q \neq l, \end{cases} \qquad \text{and} \qquad \theta_{ql} = \begin{cases} \theta_{in} & \text{if } q = l, \\ \theta_{out} & \text{if } q \neq l. \end{cases}$$
For all these models, we consider restrictions of the model distribution by
focusing on a subset of the nodes. We denote by $K_n$ the complete set of $\binom{n}{2}$
edge variables associated to a subset of n nodes. Note that the distribution of
these variables is independent of the choice of which n nodes one considers.
Also, while this notation is motivated by that used in graph theory, where Kn
denotes the complete graph on n nodes, we emphasize that here Kn is a set of
random variables, and we are making no statement as to whether these edges
are present or absent in any realization of our model.
3. Binary random graphs
We first focus on models with binary edge states, considering the more general
case with arbitrary connectivity parameters, followed by affiliation models.
3.1. The non-affiliation case
When X = {0, 1}, a first result on identifiability of parameters was obtained by
Allman et al. (2009) for the special case of Q = 2 groups. For completeness, we
recall the statement here.
Theorem 1. (Allman et al., 2009, Theorem 7). The parameters π1 , π2 =
1 − π1 , p11 , p12 , p22 of the random graph mixture model with binary edge state
variables and Q = 2 groups are identifiable, up to label swapping, from the
distribution of K16 provided that the connectivity parameters {p11 , p12 , p22 } are
distinct.
In particular, the result remains valid when the group proportions πq are fixed.
Note the assumption that $p_{11} \neq p_{22}$ limits this theorem to the strict non-affiliation case.
The proof of this theorem is based on a clever application of an algebraic
result, due to Kruskal (1976, 1977) (see also Rhodes, 2010), that deals with
decompositions of 3-way arrays. While generalizing the proof to more than two
groups requires substantially more effort, the basic method still applies. Here
we prove the following theorem.
Theorem 2. The parameters $\pi_q$, $1 \le q \le Q$, and $p_{ql} = P(X_{ij} = 1 \mid Z_i = q, Z_j = l)$,
$1 \le q \le l \le Q$, of the random graph mixture model with binary edge state
variables and Q ≥ 3 groups are generically identifiable, up to label swapping,
from the distribution of $K_{m^2}$, when
$$m \ge \begin{cases} Q - 1 + (Q + 2)^2/4 & \text{if $Q$ is even}, \\ Q - 1 + (Q + 1)(Q + 3)/4 & \text{if $Q$ is odd}. \end{cases}$$
Moreover, the result remains valid when the group proportions $\pi_q$ are fixed.
Note that the stated number of nodes ensuring that parameters are generically identifiable from the distribution of the edges may not be optimal. In
particular, when Q = 2, the proof of this theorem is still valid, yet it gives a
minimal number of $m^2 = 25$ nodes. This is larger than the bound 16 obtained
in Theorem 1, and that number may itself not be optimal.
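To make the size of this bound concrete, the following sketch evaluates the smallest admissible m and the corresponding number of nodes, m squared, for a few values of Q. The helper name `theorem2_min_m` is hypothetical, and applying the formula at Q = 2 follows the remark above rather than the statement of Theorem 2 itself.

```python
def theorem2_min_m(Q):
    """Smallest integer m satisfying the bound of Theorem 2 (formally also for Q = 2)."""
    if Q % 2 == 0:
        # (Q + 2)^2 is divisible by 4 when Q is even.
        return Q - 1 + (Q + 2) ** 2 // 4
    # (Q + 1)(Q + 3) is a product of two even numbers when Q is odd.
    return Q - 1 + (Q + 1) * (Q + 3) // 4

for Q in range(2, 7):
    m = theorem2_min_m(Q)
    print(f"Q = {Q}: m = {m}, nodes = {m * m}")
```

For Q = 2 this reproduces the 25 nodes mentioned in the text; the required number of nodes grows on the order of the fourth power of Q.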
Also, while Theorem 1 gives exact restrictions on parameters producing identifiability, Theorem 2 is not explicit about the generic conditions. However, for
any fixed Q the argument in our proof does yield a straightforward, though
perhaps lengthy, means of checking whether a particular choice of parameters
meets the conditions. Among these is a requirement that the pql be distinct, so
the theorem does not apply to the affiliation model.
Moreover, a careful reading of the proof of the theorem shows that its generic
aspect concerns only the part of the parameter space with the connectivities
pql . This enables us to conclude that even when considering subsets defined
by restriction of the group proportions πq (for instance assuming the group
proportions are fixed, or equal), the result remains valid.
3.2. The affiliation model
In the particular case of the affiliation model, we can obtain results from arguments based on moments of the distribution. For a small number of nodes, one
may obtain explicit formulas for the moments in terms of model parameters.
By analyzing the solutions to this nonlinear multivariate polynomial system of
equations, one can address the question of parameter identifiability, as well as
develop estimation procedures.
3.2.1. Relying on the distribution of K3 .
Frank and Harary (1982) presented a method for estimation of the parameters
of the binary affiliation model based only on the distribution of triplet cycles
(Xij , Xjk , Xki ), 1 ≤ i < j < k ≤ n, of edge variables. From an identifiability
perspective, this corresponds to identifying the parameters from the distribution of K3 . They suggest estimation of the parameters by solving the empirical
moment equations. However, they omit discussing uniqueness of the solutions to
these equations, even though this issue is a delicate one for nonlinear equations.
In the following, we first explore the use of the distribution of only K3 to
identify model parameters. As a consequence, we exhibit a new estimation procedure for the parameters.
The distribution of a triplet $(X_{ij}, X_{jk}, X_{ki})$ is expressible in terms of the
indeterminates α, β and the $\pi_q$. Let us denote by $s_2$ and $s_3$ the sums of the squares
and cubes of the $\pi_q$ and, more generally, let
$$s_k = \sum_{q=1}^{Q} \pi_q^k.$$
Then one easily computes (see also Frank and Harary, 1982) the moment formulas
\begin{align}
m_1 &= E(X_{ij}) = s_2 \alpha + (1 - s_2)\beta, \tag{1} \\
m_2 &= E(X_{ij} X_{ik}) = s_3 \alpha^2 + 2(s_2 - s_3)\alpha\beta + (1 - 2s_2 + s_3)\beta^2, \tag{2} \\
m_3 &= E(X_{ij} X_{ik} X_{jk}) = s_3 \alpha^3 + 3(s_2 - s_3)\alpha\beta^2 + (1 - 3s_2 + 2s_3)\beta^3, \tag{3}
\end{align}
which completely characterize the distribution of $(X_{ij}, X_{jk}, X_{ki})$.
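The moment formulas (1), (2) and (3) can be checked numerically by summing over all assignments of three nodes to latent groups. The sketch below does this for one illustrative parameter choice; the function names and the values of `pi`, `alpha` and `beta` are hypothetical.

```python
from itertools import product

def moments_formula(pi, alpha, beta):
    """Closed-form moments (1)-(3) of the binary affiliation model."""
    s2 = sum(q ** 2 for q in pi)
    s3 = sum(q ** 3 for q in pi)
    m1 = s2 * alpha + (1 - s2) * beta
    m2 = s3 * alpha**2 + 2 * (s2 - s3) * alpha * beta + (1 - 2 * s2 + s3) * beta**2
    m3 = s3 * alpha**3 + 3 * (s2 - s3) * alpha * beta**2 + (1 - 3 * s2 + 2 * s3) * beta**3
    return m1, m2, m3

def moments_enumerated(pi, alpha, beta):
    """Same moments, by conditioning on the groups (Z_i, Z_j, Z_k) of three nodes."""
    def p(q, l):
        return alpha if q == l else beta
    m1 = m2 = m3 = 0.0
    for zi, zj, zk in product(range(len(pi)), repeat=3):
        w = pi[zi] * pi[zj] * pi[zk]
        # Given the groups, the edge variables are independent Bernoulli.
        m1 += w * p(zi, zj)
        m2 += w * p(zi, zj) * p(zi, zk)
        m3 += w * p(zi, zj) * p(zi, zk) * p(zj, zk)
    return m1, m2, m3

closed = moments_formula([0.2, 0.3, 0.5], 0.6, 0.1)
brute = moments_enumerated([0.2, 0.3, 0.5], 0.6, 0.1)
```

The two triples agree up to floating-point rounding, which is a useful guard against sign or coefficient slips when transcribing the formulas.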
Note that in the important case of a uniform node distribution, where $\pi_q = 1/Q$
for all q, we have $s_k = Q^{1-k}$. This implies $s_3 = s_2^2$, and hence $m_2 = m_1^2$, so
these equations reduce to two independent ones. As a consequence, the claim by
Frank and Harary (1982) that it is then possible to estimate the three unknowns
Q, α, β relying only on these moment equations is not correct.
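A concrete instance makes this failure visible. Under uniform priors one can check that $m_3 - m_1^3 = (Q-1)\bigl((\alpha-\beta)/Q\bigr)^3$, so for any target moments a matching pair (α, β) can be constructed for a different Q. The sketch below builds such a pair; the function names and the starting values (Q = 2, α = 0.7, β = 0.3) are hypothetical.

```python
import math

def uniform_moments(Q, alpha, beta):
    """Moments (1)-(3) with pi_q = 1/Q, so that s2 = 1/Q and s3 = 1/Q**2."""
    s2, s3 = 1.0 / Q, 1.0 / Q**2
    m1 = s2 * alpha + (1 - s2) * beta
    m2 = s3 * alpha**2 + 2 * (s2 - s3) * alpha * beta + (1 - 2 * s2 + s3) * beta**2
    m3 = s3 * alpha**3 + 3 * (s2 - s3) * alpha * beta**2 + (1 - 3 * s2 + 2 * s3) * beta**3
    return m1, m2, m3

def matching_parameters(m1, m3, Q):
    """Find (alpha, beta) that reproduce m1 and m3 with uniform priors on Q groups."""
    c = (m3 - m1**3) / (Q - 1)                    # equals ((alpha - beta) / Q) ** 3
    u = math.copysign(abs(c) ** (1.0 / 3.0), c)   # signed real cube root
    return m1 + (Q - 1) * u, m1 - u

target = uniform_moments(2, 0.7, 0.3)
alpha3, beta3 = matching_parameters(target[0], target[2], 3)
other = uniform_moments(3, alpha3, beta3)
```

Here `target` and `other` coincide up to rounding although the first model has two groups and the second has three, with different connectivities, so the distribution of K3 alone cannot separate them.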
Still, there are indeed several situations in which parameters are identifiable
from these moments, as we next discuss.
With Q = 2 latent groups and a possibly non-uniform group distribution,
there are 3 independent parameters in the affiliation model. In this case, the
three moments above are enough to identify parameters. To show this, we first
construct certain polynomials with roots at the connectivity parameters. Since
the construction easily extends to larger Q, we give it more generally.
Proposition 3. Consider the random graph affiliation mixture model with Q ≥ 2
groups and binary edge state variables, on Q + 1 nodes. Then the parameter
α is a real root of the degree $\binom{Q+1}{2}$ univariate polynomial
$$U_Q(X) = E\Bigl[\, \prod_{1 \le i < j \le Q+1} (X - X_{ij}) \Bigr].$$
The polynomial
$$V_Q(X, Y) = E\Bigl[\, \Bigl( X + (Q - 1)Y - \sum_{1 \le i \le Q} X_{i(Q+1)} \Bigr) \prod_{1 \le i < j \le Q} (X - X_{ij}) \Bigr]$$
of degree $\binom{Q}{2} + 1$ in X, and degree 1 in Y, vanishes at (X, Y) = (α, β). Moreover,
the coefficient of Y in $V_Q(\alpha, Y)$ is non-zero precisely when $\alpha \neq \beta$.
The utility of these polynomials is that, from the distribution of $K_{Q+1}$, the
polynomial $U_Q$ allows one to recover at most $\binom{Q+1}{2}$ candidate values for α, and
then for each such value $V_Q$ allows one to recover a unique candidate for β. While
some of these candidates could be ruled out as not lying in (0, 1), we do not
know when this leaves a unique α and β for Q ≥ 3. In the case of Q = 2 groups,
however, we prove that these polynomials uniquely identify the parameters.
Theorem 4. In the random graph affiliation mixture model with Q = 2 groups
and binary edge state variables, the parameter α is the unique real root of the
polynomial
$$U_2(X) = X^3 - 3m_1 X^2 + 3m_2 X - m_3.$$
Moreover, as soon as $\alpha \neq \beta$, the parameter β is the unique real root of the
polynomial $V_2(\alpha, Y)$ where
$$V_2(X, Y) = X^2 + XY - 3m_1 X - m_1 Y + 2m_2.$$
Once α and β are uniquely identified, we may determine from equation (1)
the value of $s_2$ (again using that $\alpha \neq \beta$), and hence $\pi_1, \pi_2$, up to permutation.
This proves the following corollary.
Corollary 5. The parameters $\{\pi_1, \pi_2 = 1 - \pi_1\}$, up to label swapping, and α, β
of the random graph affiliation mixture model with Q = 2 groups and binary
edge state variables are strictly identifiable from the distribution of $K_3$ provided
$\alpha \neq \beta$.
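The recovery procedure behind Theorem 4 and Corollary 5 can be sketched numerically for Q = 2. The sketch assumes exact moments as input and uses plain bisection on [0, 1] to locate the real root of the cubic, which is one convenient choice and not a method prescribed by the text; all function names and test values are hypothetical.

```python
import math

def moments(pi, alpha, beta):
    """Moments (1)-(3) of the binary affiliation model."""
    s2 = sum(q ** 2 for q in pi)
    s3 = sum(q ** 3 for q in pi)
    m1 = s2 * alpha + (1 - s2) * beta
    m2 = s3 * alpha**2 + 2 * (s2 - s3) * alpha * beta + (1 - 2 * s2 + s3) * beta**2
    m3 = s3 * alpha**3 + 3 * (s2 - s3) * alpha * beta**2 + (1 - 3 * s2 + 2 * s3) * beta**3
    return m1, m2, m3

def recover_q2(m1, m2, m3):
    """Recover alpha, beta and (pi_1, pi_2) from the K_3 moments when Q = 2."""
    def U2(x):
        return x**3 - 3 * m1 * x**2 + 3 * m2 * x - m3
    # U2(0) = -m3 <= 0 and U2(1) = E[prod(1 - X_ij)] >= 0: bisect on [0, 1].
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if U2(mid) < 0:
            lo = mid
        else:
            hi = mid
    alpha = (lo + hi) / 2
    # V2(alpha, Y) is linear in Y; its Y-coefficient alpha - m1 is non-zero if alpha != beta.
    beta = -(alpha**2 - 3 * m1 * alpha + 2 * m2) / (alpha - m1)
    s2 = (m1 - beta) / (alpha - beta)          # from equation (1)
    prod = (1 - s2) / 2                        # pi_1 * pi_2, since s2 = 1 - 2 * pi_1 * pi_2
    pi1 = (1 - math.sqrt(max(0.0, 1 - 4 * prod))) / 2
    return alpha, beta, (pi1, 1 - pi1)

alpha_hat, beta_hat, pi_hat = recover_q2(*moments([0.3, 0.7], 0.8, 0.2))
```

The group proportions are returned with the smaller one first, reflecting that they are determined only up to label swapping.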
Identifiability of α and β when Q and the πq s are known. When the
πq s are known, Frank and Harary (1982) suggested solving any two of the three
empirical counterparts of equations (1), (2) and (3), leading to three different
methods of estimating α and β. However, numerical experiments convinced us
that two equations are in general not sufficient to uniquely determine the parameters. In fact, it is not immediately clear that even with the three moment
equations (either the theoretical ones for the question of identification, or their
empirical counterparts for estimation) a unique solution is determined. Below
we give explicit formulas for the solution to the system, which in most cases are
even rational, involving no extraction of roots. These can thus be easily used to
construct estimators.
Theorem 6. If $m_2 \neq m_1^2$, then π is non-uniform and we can recover the
parameters β and α via the rational formulas
$$\beta = \frac{(s_3 - s_2 s_3)\, m_1^3 + (s_2^3 - s_3)\, m_2 m_1 + (s_3 s_2 - s_2^3)\, m_3}{(m_1^2 - m_2)(2 s_2^3 - 3 s_3 s_2 + s_3)}, \qquad \alpha = \frac{m_1 + (s_2 - 1)\beta}{s_2}.$$
If $m_2 = m_1^2$, then π is uniform and we have
$$\beta = m_1 + \left( \frac{m_1^3 - m_3}{Q - 1} \right)^{1/3} \qquad \text{and} \qquad \alpha = Q m_1 + (1 - Q)\beta.$$
Implicit in this statement is the fact that denominators in the above formulas are non-zero. Note that the uniform group prior case formula is used for
estimation by Ambroise and Matias (2010).
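The formulas of Theorem 6 translate directly into code. The sketch below applies them to exact moments computed from known parameters; the function names, the numerical tolerance used to decide whether $m_2 = m_1^2$, and the test values are assumptions made for illustration only. With empirical moments, the choice between the two branches would need more care than a fixed tolerance.

```python
import math

def moments(pi, alpha, beta):
    """Moments (1)-(3) of the binary affiliation model."""
    s2 = sum(q ** 2 for q in pi)
    s3 = sum(q ** 3 for q in pi)
    m1 = s2 * alpha + (1 - s2) * beta
    m2 = s3 * alpha**2 + 2 * (s2 - s3) * alpha * beta + (1 - 2 * s2 + s3) * beta**2
    m3 = s3 * alpha**3 + 3 * (s2 - s3) * alpha * beta**2 + (1 - 3 * s2 + 2 * s3) * beta**3
    return m1, m2, m3

def signed_cbrt(x):
    """Real cube root, valid for negative arguments too."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)

def recover_alpha_beta(m1, m2, m3, pi, tol=1e-12):
    """Apply the formulas of Theorem 6 for known group proportions pi."""
    Q = len(pi)
    s2 = sum(q ** 2 for q in pi)
    s3 = sum(q ** 3 for q in pi)
    gap = m1**2 - m2
    if abs(gap) > tol:
        # Non-uniform case: rational formulas, no root extraction.
        num = (s3 - s2 * s3) * m1**3 + (s2**3 - s3) * m2 * m1 + (s3 * s2 - s2**3) * m3
        beta = num / (gap * (2 * s2**3 - 3 * s3 * s2 + s3))
        alpha = (m1 + (s2 - 1) * beta) / s2
    else:
        # Uniform case: one real cube root is needed.
        beta = m1 + signed_cbrt((m1**3 - m3) / (Q - 1))
        alpha = Q * m1 + (1 - Q) * beta
    return alpha, beta

pi_a = [0.2, 0.3, 0.5]
est_a = recover_alpha_beta(*moments(pi_a, 0.6, 0.1), pi_a)
pi_b = [1 / 3, 1 / 3, 1 / 3]
est_b = recover_alpha_beta(*moments(pi_b, 0.6, 0.1), pi_b)
```

Both calls return values close to the pair (0.6, 0.1) used to generate the moments, one through the rational branch and one through the cube-root branch.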
We immediately obtain the following corollary.
Corollary 7. For any fixed and known values of πq ∈ (0, 1), 1 ≤ q ≤ Q, both
parameters α, β of the random graph affiliation model with binary edge state
variables are identifiable from the distribution of K3 .
The proofs of the previous statements lead to an interesting polynomial in
the moments, whose vanishing detects the Erdős-Rényi model, corresponding to
a single node group.
Proposition 8. The moments of a random graph affiliation model with binary
edge state variables, Q node states, and $\alpha \neq \beta$ satisfy
$$2m_1^3 - 3m_1 m_2 + m_3 = 0$$
if, and only if, Q = 1.
This proposition follows from expressing the moments in terms of parameters
to see that
$$2m_1^3 - 3m_1 m_2 + m_3 = (\alpha - \beta)^3 (2s_2^3 - 3s_2 s_3 + s_3),$$
together with the determination in the proof of Lemma 19 in Section 5.3 that
$2s_2^3 - 3s_2 s_3 + s_3 \neq 0$ when $\pi_q > 0$ for more than one group q.
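The factorization used here can be verified in exact rational arithmetic. The following sketch compares both sides of the identity for an illustrative parameter choice and checks that the expression vanishes for a single group; the function name and the values are hypothetical.

```python
from fractions import Fraction as F

def detector_sides(pi, alpha, beta):
    """Return (2 m1^3 - 3 m1 m2 + m3, (alpha - beta)^3 (2 s2^3 - 3 s2 s3 + s3))."""
    s2 = sum(q ** 2 for q in pi)
    s3 = sum(q ** 3 for q in pi)
    m1 = s2 * alpha + (1 - s2) * beta
    m2 = s3 * alpha**2 + 2 * (s2 - s3) * alpha * beta + (1 - 2 * s2 + s3) * beta**2
    m3 = s3 * alpha**3 + 3 * (s2 - s3) * alpha * beta**2 + (1 - 3 * s2 + 2 * s3) * beta**3
    lhs = 2 * m1**3 - 3 * m1 * m2 + m3
    rhs = (alpha - beta) ** 3 * (2 * s2**3 - 3 * s2 * s3 + s3)
    return lhs, rhs

# Three groups: both sides agree exactly and are non-zero.
lhs3, rhs3 = detector_sides([F(1, 5), F(3, 10), F(1, 2)], F(3, 5), F(1, 10))
# A single group (Erdős-Rényi): s2 = s3 = 1 and the expression vanishes.
lhs1, rhs1 = detector_sides([F(1)], F(3, 5), F(1, 10))
```

Using `Fraction` avoids any rounding, so the equality of the two sides is checked exactly rather than up to a tolerance.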
3.2.2. Relying on the distribution of K4
We next investigate parameter identifiability from the distribution of the edge
variables over more than 3 nodes, paying particular attention to the case of
n = 4 nodes.
Necessary conditions for identifiability of the πq s, when Q is known
First, we establish that for an affiliation model, if the πq s are unknown and
are to be recovered from the distribution of Kn , then one must look at at least
n = Q nodes. Note that this applies not only to the binary edge state model,
but to more general weighted edge models as well.
Proposition 9. In order to identify, up to label swapping, the parameters
{πq }1≤q≤Q from an affiliation random graph mixture distribution on Kn (either binary or weighted), it is necessary that n ≥ Q.
The condition in this proposition is in general not sufficient to identify the πq .
Indeed, the binary edge state affiliation model with Q = 3 has 4 parameters.
However, the set of distributions over K3 has dimension at most 3 (according to
equations (1),(2) and (3)), which is not sufficient to identify the 4 parameters.
Distribution on K4. The moment formulas describing the distribution of the
affiliation random graph mixture model on $K_4$ are given in Table 1. Note that
$m_{31}$ is the same as $m_3$ in the last subsection, and that we omit $E(X_{12} X_{34}) = (E(X_{12}))^2$
since edge variables with no endpoints in common are independent.
To facilitate understanding of the moments in the table, their corresponding
induced motifs are shown in Figure 1.
With Q arbitrary, but a uniform prior on the nodes ($\pi_q = 1/Q$, so $s_i = Q^{1-i}$),
there are algebraic relationships between the moments on $K_4$, including
$$m_2 = m_1^2, \qquad m_{32} = m_{33} = m_1^3, \qquad m_{42} = m_1 m_{31},$$
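Table 1. Moment formulas describing the distribution of $K_4$, the complete graph on 4 nodes, for the binary affiliation model.

Moment     Expectation                                   Formula
$m_1$      $E(X_{12})$                                   $s_2\alpha + (1 - s_2)\beta$
$m_2$      $E(X_{12}X_{13})$                             $s_3\alpha^2 + 2\alpha\beta(s_2 - s_3) + (1 - 2s_2 + s_3)\beta^2$
$m_{31}$   $E(X_{12}X_{13}X_{23})$                       $s_3\alpha^3 + 3(s_2 - s_3)\alpha\beta^2 + (1 - 3s_2 + 2s_3)\beta^3$
$m_{32}$   $E(X_{12}X_{13}X_{14})$                       $s_4\alpha^3 + 3(s_3 - s_4)\alpha^2\beta + 3(s_2 - 2s_3 + s_4)\alpha\beta^2 + (1 - 3s_2 + 3s_3 - s_4)\beta^3$
$m_{33}$   $E(X_{12}X_{23}X_{34})$                       $s_4\alpha^3 + (s_2^2 + 2s_3 - 3s_4)\alpha^2\beta + (3s_2 - 2s_2^2 - 4s_3 + 3s_4)\alpha\beta^2 + (1 - 3s_2 + s_2^2 + 2s_3 - s_4)\beta^3$
$m_{41}$   $E(X_{12}X_{23}X_{34}X_{41})$                 $s_4\alpha^4 + 2(s_2^2 + 2s_3 - 3s_4)\alpha^2\beta^2 + 4(s_2 - s_2^2 - 2s_3 + 2s_4)\alpha\beta^3 + (1 - 4s_2 + 2s_2^2 + 4s_3 - 3s_4)\beta^4$
$m_{42}$   $E(X_{12}X_{13}X_{14}X_{23})$                 $s_4\alpha^4 + (s_3 - s_4)\alpha^3\beta + (s_2^2 + 2s_3 - 3s_4)\alpha^2\beta^2 + (4s_2 - 2s_2^2 - 7s_3 + 5s_4)\alpha\beta^3 + (1 - 4s_2 + s_2^2 + 4s_3 - 2s_4)\beta^4$
$m_5$      $E(X_{12}X_{23}X_{34}X_{41}X_{13})$           $s_4\alpha^5 + 2(s_3 - s_4)\alpha^3\beta^2 + (2s_3 - 4s_4 + 2s_2^2)\alpha^2\beta^3 + (5s_2 - 4s_2^2 - 10s_3 + 9s_4)\alpha\beta^4 + (1 - 5s_2 + 2s_2^2 + 6s_3 - 4s_4)\beta^5$
$m_6$      $E(X_{12}X_{23}X_{34}X_{41}X_{13}X_{24})$     $s_4\alpha^6 + 4(s_3 - s_4)\alpha^3\beta^3 + 3(s_2^2 - s_4)\alpha^2\beta^4 + 6(s_2 - s_2^2 - 2s_3 + 2s_4)\alpha\beta^5 + (1 - 6s_2 + 8s_3 - 6s_4 + 3s_2^2)\beta^6$

[Figure 1: the motifs on four nodes labeled $m_1$, $m_2$, $m_{31}$, $m_{32}$, $m_{33}$, $m_{41}$, $m_{42}$, $m_5$, $m_6$.]
Fig 1. Correspondence between moments and motifs for $K_4$.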
and more complicated ones that can be computed using Gröbner basis methods
to eliminate α, β, and 1/Q from the equations. (Cox et al., 1997, provide an
excellent grounding on this computational algebra.) However, the 3 parameters
α, β, Q of this affiliation model are, in fact, identifiable. Indeed such calculations
show that the formulas for m1 , m31 , and m41 alone imply the following.
Proposition 10. The number of node groups, Q, in a random graph affiliation
model with binary edge state variables and uniform group priors can be identified
from the moments $m_1$, $m_{31}$, and $m_{41}$ by
$$Q = \frac{-m_{31}^4 - m_{41}^3 - 3 m_{41} m_1^8 + 3 m_{41}^2 m_1^4 - 6 m_1^6 m_{31}^2 + 4 m_1^9 m_{31} + 4 m_1^3 m_{31}^3}{(m_1^4 - m_{41})^3}.$$
Note that, replacing the moments with empirical estimators, this formula
could be used for estimation of Q.
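The formula of Proposition 10 can be checked in exact arithmetic by computing the triangle and 4-cycle moments through enumeration of the latent groups. In the sketch below, the function names and the values of α and β are hypothetical, and the moments are exact rather than empirical.

```python
from fractions import Fraction
from itertools import product

def cycle_moment(k, Q, alpha, beta):
    """E[product of edge variables around a k-cycle], uniform priors, k >= 3."""
    total = Fraction(0)
    for z in product(range(Q), repeat=k):
        term = Fraction(1, Q ** k)                 # probability of this group assignment
        for i in range(k):
            # Edge between consecutive nodes of the cycle: alpha within a group, beta across.
            term *= alpha if z[i] == z[(i + 1) % k] else beta
        total += term
    return total

def identify_Q(m1, m31, m41):
    """Formula of Proposition 10."""
    num = (-m31**4 - m41**3 - 3 * m41 * m1**8 + 3 * m41**2 * m1**4
           - 6 * m1**6 * m31**2 + 4 * m1**9 * m31 + 4 * m1**3 * m31**3)
    return num / (m1**4 - m41) ** 3

alpha, beta = Fraction(7, 10), Fraction(1, 5)
recovered = {}
for Q in range(2, 6):
    m1 = (alpha + (Q - 1) * beta) / Q              # single-edge moment, uniform priors
    m31 = cycle_moment(3, Q, alpha, beta)          # triangle
    m41 = cycle_moment(4, Q, alpha, beta)          # 4-cycle
    recovered[Q] = identify_Q(m1, m31, m41)
```

Each entry of `recovered` equals its key exactly. One way to see why is to note that the numerator is $-\bigl[(m_{41} - m_1^4)^3 + (m_{31} - m_1^3)^4\bigr]$, which can be confirmed by expanding both powers.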
Of course once the formula in Proposition 10 is given, it can be most easily
verified by expressing the moments in terms of parameters, and simplifying. Note
that the denominator here does not vanish, as may be seen in two different ways:
either by Lemma 20 in Section 5.3, or by checking that
$$m_{41} - m_1^4 = (\alpha - \beta)^4\, \frac{Q - 1}{Q^4} \neq 0.$$
Once Q is identified by this formula, since we are assuming πq = 1/Q, Corollary 7 applies so that α and β are identifiable as well. Thus we have shown the
following.
Corollary 11. The parameters α, β, and Q of the random graph affiliation
mixture model with binary edge state variables and uniform group priors ($\pi_q = 1/Q$)
are identifiable from the distribution of $K_4$.
4. Weighted random graphs
4.1. The parametric case
In the parametric case, where $F_{ql}$ has parametric form $F(\cdot, \theta_{ql})$, we can uniquely
identify the connectivity parameters under very general conditions by considering the distribution of $K_3$ only. Indeed, each triplet $(X_{ij}, X_{ik}, X_{jk})$ follows a
mixture of $Q^3$ distributions, each with three variates, comprising
• Q terms of the form $\mu_{qq}(X_{ij})\mu_{qq}(X_{ik})\mu_{qq}(X_{jk})$, each with prior $\pi_q^3$, where
$1 \le q \le Q$,
• 3Q(Q−1) terms of the form $\mu_{qq}(X_{ij})\mu_{ql}(X_{ik})\mu_{ql}(X_{jk})$ (permuting i, j and
k), each with prior $\pi_q^2 \pi_l$, with distinct $q, l \in \{1, 2, \ldots, Q\}$,
• Q(Q − 1)(Q − 2) terms of the form $\mu_{ql}(X_{ij})\mu_{qm}(X_{ik})\mu_{lm}(X_{jk})$, each with
prior $\pi_q \pi_l \pi_m$, with distinct $q, l, m \in \{1, 2, \ldots, Q\}$.
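The count of mixture components in the three families above can be confirmed by enumerating all assignments of three nodes to groups and classifying them by the number of distinct groups involved. The function name and the example priors in this sketch are hypothetical.

```python
from collections import Counter
from itertools import product

def component_counts(pi):
    """Count latent configurations (and their total prior mass) by number of distinct groups."""
    counts, mass = Counter(), Counter()
    for q, l, m in product(range(len(pi)), repeat=3):
        kind = len({q, l, m})          # 1: all equal, 2: exactly two equal, 3: all distinct
        counts[kind] += 1
        mass[kind] += pi[q] * pi[l] * pi[m]
    return counts, mass

counts, mass = component_counts([0.1, 0.2, 0.3, 0.4])
```

For Q = 4 this gives 4, 36 and 24 configurations, matching Q, 3Q(Q−1) and Q(Q−1)(Q−2), for a total of 64 components; the prior masses of the three families sum to one.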
By an old result due to Teicher (1967), the identifiability of finite mixtures of
some family of distributions is equivalent to identifiability of finite mixtures of
(multivariate) product distributions from this same family. In addition, identifiability of continuous univariate parametric mixtures is generally well understood
(Teicher, 1961, 1963). Thus, we introduce the following assumptions.
Assumption 1. The Q(Q + 1)/2 parameter values θql , 1 ≤ q ≤ l ≤ Q are
distinct.
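Assumption 2. The family of measures $\mathcal{M} = \{F(\cdot, \theta) \mid \theta \in \Theta\}$ satisfies
i) all elements $F(\cdot, \theta)$ have no point mass at 0,
ii) the parameters of finite mixtures of measures in $\mathcal{M}$ are identifiable, up to
label swapping. In other words, for any integer m ≥ 1,
$$\text{if} \quad \sum_{i=1}^{m} \alpha_i F(\cdot, \theta_i) = \sum_{i=1}^{m} \alpha_i' F(\cdot, \theta_i') \quad \text{then} \quad \sum_{i=1}^{m} \alpha_i \delta_{\theta_i} = \sum_{i=1}^{m} \alpha_i' \delta_{\theta_i'},$$
where $\delta_\theta$ denotes the Dirac mass at θ.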
Remark. Note that most of the classical parametric families satisfy this assumption. In particular, the truncated Poisson, Gaussian and Laplace families
$\{f(\cdot, \theta), \theta \in \mathbb{R}^p\}$ satisfy Assumption 2 (see e.g., McLachlan and Peel, 2000;
Teicher, 1961, 1963).
Theorem 12. Under Assumptions 1 and 2, the parameters π, θql , pql , 1 ≤
q ≤ l ≤ Q of the parametric random graph mixture model with weighted edge
variables are identifiable, up to label swapping, from the distribution of K3 .
The previous result is not applicable to the parametric affiliation model, for
which the set {θql , 1 ≤ q ≤ l ≤ Q} reduces to {θin , θout }, so Assumption 1 is
violated. However, in this case a similar argument again yields a full identifiability result. As suggested by Proposition 9, we use Q nodes to identify the
group priors.
Theorem 13. Under Assumption 2, the parameters α, β, θin , θout of the parametric affiliation random graph mixture model with weighted edge variables are
strictly identifiable from the distribution of $K_3$ provided $\theta_{in} \ne \theta_{out}$. Once these
have been identified, the group priors π can further be identified, up to label
swapping, from the distribution of KQ .
A similar approach to that of this theorem has been successfully used by Ambroise and Matias (2010) to estimate the parameters of these models. They first
estimated the sparsity parameters from the induced binary edge state model,
but a procedure based on the preceding theorems would not require that these
be distinct.
We turn next to models with a finite number, κ, of edge weights. Our primary
reason for investigating such models is the role they play in our analysis of
models with non-parametric conditional distributions of edge weights, in Section
4.2. Thus we limit our investigation to the single result we need there.
Theorem 14. The parameters of the random graph mixture model, with κ-state edge variables and Q ≥ 2 latent groups, are identifiable,
up to label swapping, from the distribution of $K_9$, provided $\kappa \ge \binom{Q+1}{2}$ and the κ-entry vectors
$\{p_{ql}\}_{1 \le q \le l \le Q}$ are linearly independent.
Note that the condition given here on the number of edge states is likely far
from optimal. In case Q = 2 the condition requires at least κ = 3 edge states
whereas we know from Theorem 1 that the parameters are identifiable for this
Q with only κ = 2 edge states.
4.2. The non-parametric case
In the most general case of non-parametric distributions, our arguments for
identifiability depend on binning the values of the edge variables into a finite
set. We then apply Theorem 14 to this discretization, to obtain the following.
Theorem 15. The parameters {πq , µql = (1 − pql )δ0 + pql Fql : 1 ≤ q, l ≤ Q} of
the random graph weighted non-parametric mixture model are identifiable, up to
label swapping, from the distribution of K9 provided the measures µql , 1 ≤ q ≤
l ≤ Q are linearly independent.
5. Proofs
5.1. Method of proofs based on Kruskal’s theorem
In this section we review Kruskal’s theorem and describe our technique for
employing it in the proofs of Theorems 2 and 14.
Kruskal’s result We first present Kruskal’s result in a statistical context.
Consider a latent random variable V with state space {1, . . . , r} and distribution given by the column vector v = (v1 , . . . , vr ). Assume that there are
three observable random variables Uj for j = 1, 2, 3, each with finite state space
{1, . . . , κj }. The Uj s are moreover assumed to be independent conditional on
V. Let $M_j$, j = 1, 2, 3, be the stochastic matrix of size $r \times \kappa_j$ whose ith row is
$m^j_i = P(U_j = \cdot \mid V = i)$. Then consider the $\kappa_1 \times \kappa_2 \times \kappa_3$ tensor $[v; M_1, M_2, M_3]$
defined by
\[
[v; M_1, M_2, M_3] = \sum_{i=1}^{r} v_i \, m^1_i \otimes m^2_i \otimes m^3_i .
\]
Thus $[v; M_1, M_2, M_3]$ is a 3-dimensional array whose (s, t, u) element is
\[
[v; M_1, M_2, M_3]_{s,t,u} = \sum_{i=1}^{r} v_i \, m^1_i(s) \, m^2_i(t) \, m^3_i(u) = P(U_1 = s, U_2 = t, U_3 = u),
\]
for any 1 ≤ s ≤ κ1 , 1 ≤ t ≤ κ2 , 1 ≤ u ≤ κ3 . Note that [v; M1 , M2 , M3 ] is left
unchanged by simultaneously permuting the rows of all the Mj and the entries
of v, as this corresponds to permuting the labels of the latent classes. Knowledge of the distribution of (U1 , U2 , U3 ) is equivalent to knowledge of the tensor
[v; M1 , M2 , M3 ].
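As an illustration of this definition, here is a minimal numerical sketch (assuming NumPy; the sizes and the random stochastic matrices are arbitrary choices, not taken from the text). It builds the tensor and checks that a simultaneous permutation of the latent classes leaves it unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_stochastic(rows, cols):
    """Random matrix with nonnegative rows summing to 1."""
    M = rng.random((rows, cols))
    return M / M.sum(axis=1, keepdims=True)

def kruskal_tensor(v, M1, M2, M3):
    """[v; M1, M2, M3]_{s,t,u} = sum_i v_i M1[i,s] M2[i,t] M3[i,u]."""
    return np.einsum("i,is,it,iu->stu", v, M1, M2, M3)

r, k1, k2, k3 = 3, 4, 3, 5           # illustrative sizes only
v = random_stochastic(1, r)[0]       # distribution of the latent variable V
M1, M2, M3 = (random_stochastic(r, k) for k in (k1, k2, k3))
T = kruskal_tensor(v, M1, M2, M3)

perm = np.array([2, 0, 1])           # relabel the latent classes
T_perm = kruskal_tensor(v[perm], M1[perm], M2[perm], M3[perm])
print(np.allclose(T, T_perm))
```

The entries of `T` sum to 1, since `T` is the joint distribution of the three observed variables.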
To state Kruskal’s result, we need some algebraic terminology. For a matrix
M , the Kruskal rank of M will mean the largest number I such that every set
of I rows of M are independent. Note that this concept would change if we
replaced “row” by “column,” but we only use the row version in this article.
With the Kruskal rank of M denoted by rankK M , we have
rankK M ≤ rank M,
and equality of rank and Kruskal rank does not hold in general. However, in the
particular case when a matrix M of size p × q has rank p, it also has Kruskal
rank p.
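To make the distinction between rank and Kruskal rank concrete, the following brute-force sketch computes the (row) Kruskal rank of a small matrix. It assumes NumPy, and the helper name `kruskal_rank` is hypothetical. The search is exponential in the number of rows, so it is meant only for toy examples.

```python
from itertools import combinations

import numpy as np

def kruskal_rank(M, tol=1e-10):
    """Largest I such that every set of I rows of M is linearly independent."""
    M = np.asarray(M, dtype=float)
    n_rows = M.shape[0]
    best = 0
    for size in range(1, n_rows + 1):
        if all(np.linalg.matrix_rank(M[list(rows)], tol=tol) == size
               for rows in combinations(range(n_rows), size)):
            best = size
        else:
            break  # a dependent subset of this size rules out all larger sizes
    return best

# Rows 0 and 1 are proportional, so the Kruskal rank is 1 although the rank is 2.
M = np.array([[1.0, 0.0, 0.0],
              [2.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
print(kruskal_rank(M), np.linalg.matrix_rank(M))
```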
The fundamental algebraic result of Kruskal is the following.
Theorem 16. (Kruskal, 1976, 1977), (see also Rhodes, 2010) Let $I_j = \mathrm{rank}_K M_j$.
If
\[
I_1 + I_2 + I_3 \ge 2r + 2, \tag{4}
\]
then $[v; M_1, M_2, M_3]$ uniquely determines v and the $M_j$, up to simultaneous
permutation of the rows. In other words, the set of parameters $\{(v, P(U_j = \cdot \mid V))\}$ is uniquely identified, up to label swapping, from the distribution of the
random variables $(U_1, U_2, U_3)$.
Now, it will be useful to note that condition (4) holds for generic choices of
the Mj , provided the κj are large enough to allow it. More precisely, Kruskal’s
condition on the sum of Kruskal ranks can be expressed through a Boolean combination of polynomial inequalities (≠) involving matrix minors in the parameters. If we show there is even a single choice of parameters for which Kruskal’s
condition is satisfied, then the algebraic variety of parameters for which it does
not hold is a proper subvariety (defined by negating the polynomial condition
above, and so by a Boolean combination of equalities) of parameter space. As
proper subvarieties are necessarily of Lebesgue measure zero, it follows that the
Kruskal condition holds generically.
Our proof strategy for showing identifiability of certain random graph mixture models is to embed them in the model we just described. Applying Kruskal’s
result to the embedded model, we derive partial identifiability results on the
embedded model, and then, using details of the embedding, relate these to the
original model.
Embedding the random graph mixture model into Kruskal’s context
Let κ denote the cardinality of X , in either the binary state case or the general
finite state case.
To place the random graph mixture model in the context of Theorem 16, we
define a composite hidden variable and three composite observed variables that
reflect the conditional independence structure integral to Kruskal’s theorem.
For some n (to be determined), let $V = (Z_1, Z_2, \ldots, Z_n)$ be the latent random variable, with state space $\{1, \ldots, Q\}^n$, which describes the state of all n
nodes collectively, and denote by v the corresponding vector of its probability
distribution. Note that the entries of v are of the form $\pi_1^{n_1} \cdots \pi_Q^{n_Q}$ with $n_q \ge 0$
and $\sum_q n_q = n$.
The observed variables will correspond to three pairwise disjoint subsets
G1 , G2 , G3 of the complete set of edges Kn . By choosing the Gi to have no
edges in common, we ensure their conditional independence.
The construction of the set of edges $G_i$ proceeds in two steps. We begin by
considering a small complete graph, and an associated matrix: For a subset of
m nodes, we define a $Q^m \times \kappa^{\binom{m}{2}}$ matrix A, with rows indexed by assignments
$I \in \{1, \ldots, Q\}^m$ of states to these m nodes, columns indexed by the state
space of the complete set of $\binom{m}{2}$ edges between them, and entries giving the
probability of observing the specified states on all edges, conditioned on the
specified node states. In the case κ = 2, it is helpful to note that each column
index corresponds to a different graph on the m nodes, composed of those edges
assigned state 1. For larger κ one may similarly associate to a column index a
κ-coloring of the edges of the complete graph. We therefore refer to a column
index as a configuration.
In the step we call the base case, we exhibit a value of m such that this matrix
A generically has full row rank.
Then, an extension step builds on the base case, in order to construct a
larger set of n nodes which will be used in the application of Theorem 16. This
is accomplished by means of (Allman et al., 2009, Lemma 16, and subsequent
remark) which we paraphrase as follows.
Lemma 17. Suppose for the Q-node-state model, the number of nodes m is
such that the $Q^m \times \kappa^{\binom{m}{2}}$ matrix A of probabilities of observing configurations
of $K_m$ conditioned on node state assignments has rank $Q^m$. Then with $n = m^2$
there exist pairwise disjoint subsets $G_1, G_2, G_3$ of the complete set of edges $K_n$
such that for each $G_i$ the $Q^n \times \kappa^{|G_i|}$ matrix $M_i$ of probabilities of observing
configurations of $G_i$ conditioned on node state assignments has rank $Q^n$.
In our applications here, we only determine that A has full row rank generically. Hence the Lemma only allows us to conclude that the Mi have full row
rank generically, and hence have Kruskal rank $Q^n$ generically.
We also note (for use in the proof of Theorems 2 and 14) that in the construction of the Lemma, each subset $G_i$ is the union of
m complete sets of edges each
over m different nodes, and thus contains $m\binom{m}{2}$ edges. In particular, if m ≥ 3,
then $G_i$ contains a complete graph on 3 nodes.
Application of Kruskal’s theorem to the embedded model and conclusion Next, with v, M1 , M2 , M3 defined by the embedding given in the
previous paragraphs, we apply Kruskal’s Theorem (Theorem 16) to the table
[v; M1 , M2 , M3 ]. Knowledge of the distribution of the random graph mixture
model over n nodes implies knowledge of this 3-dimensional table. By our construction of the $M_i$, condition (4) is satisfied since $3Q^n \ge 2Q^n + 2$. Thus the
vector v and the matrices M1 , M2 , M3 are uniquely determined, up to simultaneous permutation of the rows.
With these embedded parameters in hand, it is still necessary to recover the
initial parameters of the random graph mixture model: the group proportions
and the connectivity vectors. As this requires a rather detailed argument, we
leave its exposition for a specific application.
Finally, we note that by discretizing continuous variables, this approach to establishing identifiability may also be used in the case of continuous connectivity
distributions.
5.2. Proof of Theorem 2
This proof follows the strategy described in the previous section. We use the
notation pql = P(Xij = 1 | Zi = q, Zj = l) = 1 − p̄ql .
Base case The initial step consists in finding a value of m such that the
matrix A of size $Q^m \times 2^{\binom{m}{2}}$ containing the probabilities of the configurations
over these m nodes, conditional on the hidden node states, generically has full
row rank.
The condition of having full row rank can be expressed as the non-vanishing
of at least one $Q^m \times Q^m$ minor of A. Composing the map sending $\{p_{ql}\} \to A$
with this collection of minors gives polynomials in the parameters of the model.
To see that these polynomials are not identically zero, and thus are non-zero for
generic parameters, it is enough to exhibit a single choice of the {pql } for which
the corresponding matrix A has full row rank.
With this in mind, we choose to consider $\{p_{ql}\}$ of the form $p_{ql} = s_q s_l/(s_q s_l + t_q t_l)$,
so $\bar p_{ql} = t_q t_l/(s_q s_l + t_q t_l)$, with $s_i, t_j > 0$ to be chosen later. However, since
the property of having full row rank is unchanged under non-zero rescaling of
the rows of the matrix A, and all entries of A are monomials with total degree
$\binom{m}{2}$ in $\{p_{ql}, \bar p_{ql}\}$, we may simplify the entries of A by removing denominators,
and consider the matrix (also called A) with entries in terms of $p_{ql} = s_q s_l$ and
$\bar p_{ql} = t_q t_l$.
The rows of A are indexed by the composite node states $I \in \{1, \ldots, Q\}^m$,
while its columns are indexed by the edge configurations $\{0, 1\}^{\binom{m}{2}}$. For any
composite hidden state $I \in \{1, \ldots, Q\}^m$ and any vertex $v \in \{1, \ldots, m\}$, let
$I(v) \in \{1, \ldots, Q\}$ denote the state of vertex v in the composite state I. With
our particular choice of the parameters $p_{ql}$, the $(I, (x_{ij})_{1 \le i < j \le m})$-entry of A is
given by
\[
\prod_{1 \le v \le m} s_{I(v)}^{d_v} \, t_{I(v)}^{m-1-d_v},
\]
where $d_v = \sum_{w \ne v} x_{vw}$ is the degree of node v in the graph associated to the
configuration $(x_{ij})_{1 \le i < j \le m}$. Note that the entries in a column of A are now determined by the degree sequence $d = (d_v)_{1 \le v \le m}$ associated to the configuration.
In general, there is a many-to-one correspondence of configurations to their
degree sequences. (E.g., for m = 4 nodes, the configuration with edges (1, 2) and
(3, 4) in state 1, and that with edges (1, 3) and (2, 4) in state 1, both have degree
sequence (1, 1, 1, 1).) Thus if m > 3, there will be several identical columns in
A. For any degree sequence d = (dv )1≤v≤m arising from an m-node graph, let
Ad denote a corresponding column of A.
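The reduction of each entry of A to a function of the degree sequence can be checked numerically. The sketch below assumes NumPy; the values of `s` and `t` are arbitrary hypothetical choices. It compares the edge-by-edge product with the node-by-node product for every row and column when m = 4 and Q = 2.

```python
from itertools import combinations, product

import numpy as np

m, Q = 4, 2
s = np.array([0.5, 2.0])   # hypothetical positive values s_q
t = np.array([1.5, 0.7])   # hypothetical positive values t_q
edges = list(combinations(range(m), 2))

def entry_from_edges(I, x):
    """Product over edges of s_q s_l (edge present) or t_q t_l (edge absent)."""
    val = 1.0
    for (i, j), xe in zip(edges, x):
        val *= s[I[i]] * s[I[j]] if xe else t[I[i]] * t[I[j]]
    return val

def entry_from_degrees(I, x):
    """Product over nodes of s^{d_v} t^{m-1-d_v}, with d_v the degree of node v."""
    deg = [0] * m
    for (i, j), xe in zip(edges, x):
        deg[i] += xe
        deg[j] += xe
    return float(np.prod([s[I[v]] ** deg[v] * t[I[v]] ** (m - 1 - deg[v])
                          for v in range(m)]))

max_gap = max(abs(entry_from_edges(I, x) - entry_from_degrees(I, x))
              for I in product(range(Q), repeat=m)
              for x in product((0, 1), repeat=len(edges)))
print(max_gap)
```

Two configurations with the same degree sequence therefore give identical columns, as noted in the text.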
Now, for each vertex $v \in \{1, \ldots, m\}$ and each $q \in \{1, \ldots, Q\}$, introduce an indeterminate $U_{v,q}$ and a $Q^m$-entry row vector $U = \big(\prod_{1 \le v \le m} U_{v,I(v)}\big)_{I \in \{1,\ldots,Q\}^m}$.
For each degree sequence d, we have
\[
U A_d = \sum_{I \in \{1,\ldots,Q\}^m} \ \prod_{1 \le v \le m} s_{I(v)}^{d_v} \, t_{I(v)}^{m-1-d_v} \, U_{v,I(v)}
= \prod_{1 \le v \le m} \Big( s_1^{d_v} t_1^{m-1-d_v} U_{v,1} + \cdots + s_Q^{d_v} t_Q^{m-1-d_v} U_{v,Q} \Big).
\]
To verify this, notice that each monomial $(s_{i_1}^{d_1} t_{i_1}^{m-1-d_1} U_{1,i_1}) \cdots (s_{i_m}^{d_m} t_{i_m}^{m-1-d_m} U_{m,i_m})$
obtained from multiplying out the product on the right corresponds to a choice
of node states $i_v$ for nodes v, and hence a vector $I = (i_1, \ldots, i_m)$. Moreover, we
obtain one such summand for each I.
In order to prove that the matrix A has full row rank, it is enough to exhibit
$Q^m$ independent columns of A. Note, however, that independence of a set of
columns {Ad } is equivalent to the independence of the corresponding set of
polynomial functions {UAd } in the indeterminates {Uv,q }.
Now for a set D of degree sequences, to prove that the polynomials $\{U A_d\}_{d \in D}$
are independent, we assume that there exist scalars $a_d$ such that
\[
\sum_{d \in D} a_d \, U A_d \equiv 0, \tag{5}
\]
and show that necessarily all $a_d = 0$. To this aim, we prove the following lemma.
Lemma 18. Suppose Q ≤ m. Let D be a set of degree sequences such that for
each node $v \in \{1, \ldots, m\}$, the set of degrees $\{d_v \mid d \in D\}$ has cardinality at
most Q. Then for generic values of $s_i, t_j$, for each v and each $d^\star \in \{d_v \mid d \in D\}$
there exist values of the indeterminates $\{U_{v,q}\}_{1 \le q \le Q}$ that annihilate all the
polynomials $U A_d$ for $d \in D$ except those for which $d_v = d^\star$.
Proof. Fix a node v and let $\{d^1, \ldots, d^Q\}$ be any set of Q distinct integers with
\[
\{d_v \mid d \in D\} \subseteq \{d^1, \ldots, d^Q\} \subseteq \{0, 1, \ldots, m-1\}.
\]
Let M be the Q × Q matrix with ith row $(s_1^{d^i} t_1^{m-1-d^i}, \ldots, s_Q^{d^i} t_Q^{m-1-d^i})$. Since
all the integers $d^i$ are different, the matrix M has full row rank for generic
choices of $s_i, t_j$. (One way to see this is to consider an m × m Vandermonde
matrix, with (k, l)-entry $(u_l)^k$. Choosing distinct values of $u_l$ this has full rank,
and thus the Q × m submatrix composed of rows with indices $\{d^i\}$ has rank Q.
But then Q of the columns can be chosen so that the Q × Q submatrix has full
rank. Letting the $s_i$ be the values of $u_l$ in these columns, and $t_j = 1$, gives one
choice for which the matrix M has full rank.)
Note $d^\star = d^k$ for some k, and let $e_k$ be the Q-entry vector of all zeros except
for a 1 in the kth position. Then for generic $s_i, t_j$, the equation
\[
M (U_{v,1}, \ldots, U_{v,Q})^T = e_k
\]
admits a unique solution, one that corresponds to the above-mentioned choice
of indeterminates $\{U_{v,q}\}_{1 \le q \le Q}$.
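The linear solve in this proof can be illustrated numerically. In the minimal sketch below (assuming NumPy), the values of `s`, `t` and the two candidate degrees are hypothetical. Solving for the indeterminates at one node makes that node's factor in the product form of UA_d equal to 1 for the retained degree and 0 for the other.

```python
import numpy as np

m, Q = 4, 2
s = np.array([0.5, 2.0])       # hypothetical generic values s_q
t = np.array([1.5, 0.7])       # hypothetical generic values t_q
degrees = [1, 3]               # the Q distinct candidate degrees at node v

# M[i, q] = s_q^{d^i} t_q^{m-1-d^i}
M = np.array([[s[q] ** d * t[q] ** (m - 1 - d) for q in range(Q)]
              for d in degrees])

k = 0                          # keep the first degree, annihilate the other
e_k = np.zeros(Q)
e_k[k] = 1.0
U_v = np.linalg.solve(M, e_k)  # values assigned to U_{v,1}, ..., U_{v,Q}

# Factor contributed by node v to U A_d, for each candidate degree at v:
factors = M @ U_v
print(factors)
```

Since UA_d is a product over nodes, a vanishing factor at one node annihilates the whole polynomial.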
Now consider the following collection
\[
D = \Big\{ (d_1, \ldots, d_m) \ \Big|\ d_v \in \{1, 2, \ldots, Q\} \text{ for } v \le m-1, \text{ and if } \sum_{v=1}^{m-1} d_v \text{ is even}
\]
\[
\text{then } d_m \in \{0, 2, 4, \ldots, 2Q-2\}, \text{ otherwise } d_m \in \{1, 3, 5, \ldots, 2Q-1\} \Big\}.
\]
Note that D has $Q^m$ elements and satisfies the assumption of Lemma 18 on
the number of different values per coordinate. Moreover, if we establish, as we
do below, that its elements are realizable as degree sequences of graphs over m
nodes, then by choosing one column of A associated to each degree sequence in
D, we obtain a collection of $Q^m$ different columns of A. These columns are independent since for each sequence $d^\star \in D$ by Lemma 18 we can choose values of
the indeterminates $\{U_{v,q}\}_{1 \le v \le m, 1 \le q \le Q}$ such that all polynomials $U A_d$ vanish,
except $U A_{d^\star}$, leading to $a_{d^\star} = 0$ in equation (5).
That each sequence d ∈ D is realizable as a degree sequence of a graph over
m nodes follows from a result of Erdős and Gallai (1961) (see also Berge, 1976,
Chapter 6, Theorem 6). Reordering the entries of d so that $d_1 \ge d_2 \ge \ldots \ge d_m$,
a necessary and sufficient condition for a sequence with even sum (as every element of D has, by construction) to be realizable by such a
graph is that for 1 ≤ k ≤ m − 1,
\[
\sum_{v=1}^{k} d_v \le k(k-1) + \sum_{v=k+1}^{m} \min\{k, d_v\}. \tag{6}
\]
From the definition of d ∈ D, with coordinates reordered, it is easy to see that
for any 1 ≤ k ≤ m − 1, we have
\[
\sum_{v=1}^{k} d_v \le (k-1)Q + (2Q-1)
\quad \text{and} \quad
\sum_{v=k+1}^{m} \min\{k, d_v\} \ge m - k.
\]
Thus, for (6) to be satisfied, it is enough that for any 1 ≤ k ≤ m − 1, we have
\[
-k^2 + (Q+2)k + Q - 1 \le m.
\]
But for m sufficiently large
\[
\max_{1 \le k \le m-1} \{-k^2 + (Q+2)k\} =
\begin{cases}
\left(\frac{Q+2}{2}\right)^2 & \text{if Q is even,} \\
\frac{(Q+1)(Q+3)}{4} & \text{if Q is odd.}
\end{cases}
\]
Thus, inequality (6) is satisfied as soon as
\[
\begin{cases}
m \ge Q - 1 + \left(\frac{Q+2}{2}\right)^2 & \text{if Q is even,} \\
m \ge Q - 1 + \frac{(Q+1)(Q+3)}{4} & \text{if Q is odd.}
\end{cases}
\]
This concludes the proof of the base case.
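For small Q the realizability claim can be checked exhaustively. The sketch below is an illustration rather than part of the proof, and the helper names are hypothetical. It builds the collection D for Q = 2 with the smallest m allowed by the bound above, and applies the Erdős-Gallai test (even degree sum together with inequality (6)) to every element.

```python
from itertools import product

def is_graphical(seq):
    """Erdős-Gallai test: even sum plus inequality (6) for every k."""
    d = sorted(seq, reverse=True)
    n = len(d)
    if sum(d) % 2 or d[0] > n - 1 or d[-1] < 0:
        return False
    return all(sum(d[:k]) <= k * (k - 1) + sum(min(k, x) for x in d[k:])
               for k in range(1, n + 1))

def build_D(Q, m):
    """The collection D of the text, as a list of m-tuples."""
    D = []
    for head in product(range(1, Q + 1), repeat=m - 1):
        parity = sum(head) % 2
        for last in range(parity, 2 * Q, 2):   # even, resp. odd, values below 2Q
            D.append(head + (last,))
    return D

def min_nodes(Q):
    """Sufficient number of nodes m from the end of the base case."""
    extra = (Q + 2) ** 2 // 4 if Q % 2 == 0 else (Q + 1) * (Q + 3) // 4
    return Q - 1 + extra

Q = 2
m = min_nodes(Q)
D = build_D(Q, m)
all_graphical = all(is_graphical(d) for d in D)
print(m, len(D), all_graphical)
```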
The extension step explained in Section 5.1 then applies, so that with $n = m^2$,
Kruskal’s Theorem may be applied to identify, up to simultaneous row
permutation, v, $M_1$, $M_2$, and $M_3$ as defined in that section.
Conclusion The entries of v obtained via Kruskal’s theorem applied to the
embedded model are of the form $\pi_1^{n_1} \cdots \pi_Q^{n_Q}$ with $\sum_q n_q = n$, while the entries
of the $M_i$ contain information on the $p_{ql}$. Although the ordering of the rows of
the Mi is arbitrary, crucially we do know how the rows of Mi are paired with
the entries of v.
By focusing on one of the matrices, say M1 , and adding appropriate columns
to marginalize to a single edge variable (e.g., all columns for configurations with
x12 = 1), we recover the set of values {pql }1≤q≤l≤Q , but without order. However,
if row k of M1 corresponds to the unknown node states I, then performing such
marginalizations for each of the 3 edges of a complete graph C on 3 nodes
contained in G1 recovers the set
Rk = {pql | for some edge (v, w) ∈ C, {I(v), I(w)} = {q, l} }.
By considering the cardinalities of the sets Rk in the generic case of all pql
distinct, we can now determine individual parameters.
Consider first those k for which Rk has one element. There are exactly Q
of these, arising from all 3 nodes being in the same group. Thus for such k,
$R_k = \{p_{qq}\}$ and $v_k = \pi_q^n$. Choosing an arbitrary labeling, we have determined
all πq and pqq .
Next consider those k for which the $R_k$ has two elements. These arise from
2 nodes being in the same group, with the other node in a different group, so
$R_k = \{p_{qq}, p_{ql}\}$ for some $l \ne q$. However, having already determined the $p_{qq}$ and
since generically the $p_{ql}$ are distinct, we can find exactly two such $k_1$ and $k_2$ of
the form $R_{k_1} = \{p_{qq}, p_{ql}\}$ and $R_{k_2} = \{p_{ll}, p_{ql}\}$. Thus, we can also determine $p_{ql}$
for $q \ne l$.
Finally, note that all generic aspects of this argument, in the base case and
the requirement that the parameters pql be distinct, concern only the pql . Thus
if the group proportions πq are fixed to any specific values, the theorem remains
valid.
5.3. Proofs relying on moment equations
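Proof of Proposition 3. Focusing on Q+1 nodes, let $Z = (Z_1, \ldots, Z_{Q+1})$ denote
the composite node random variable, and $z = (z_1, \ldots, z_{Q+1})$ any realization of
Z. Note that
\[
U_Q(X) = \sum_{z \in \{1,\ldots,Q\}^{Q+1}} \Big( \prod_{1 \le k \le Q+1} \pi_{z_k} \Big) \,
E\Big[ \prod_{1 \le i < j \le Q+1} (X - X_{ij}) \ \Big|\ Z = z \Big]
\]
\[
= \sum_{z \in \{1,\ldots,Q\}^{Q+1}} \Big( \prod_{1 \le k \le Q+1} \pi_{z_k} \Big)
\prod_{1 \le i < j \le Q+1} \big( X - E(X_{ij} \mid Z_i = z_i, Z_j = z_j) \big),
\]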
since conditioned on Z = z, the edge variables Xij are independent. Now since
there are Q + 1 nodes and only Q groups, for each term in the sum there is some
zi = zj . Since
X − E(Xij |Zi = zi = zj = Zj ) = X − α,
each term in the sum vanishes at X = α, so UQ (α) = 0.
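Likewise,
\[
V_Q(X, Y) = \sum_{z \in \{1,\ldots,Q\}^{Q+1}} \Big( \prod_{1 \le k \le Q+1} \pi_{z_k} \Big) \times
E\Big[ \Big( X + (Q-1)Y - \sum_{1 \le i \le Q} X_{i(Q+1)} \Big) \prod_{1 \le i < j \le Q} (X - X_{ij}) \ \Big|\ Z = z \Big].
\]
But
\[
E\Big[ \Big( X + (Q-1)Y - \sum_{1 \le i \le Q} X_{i(Q+1)} \Big) \prod_{1 \le i < j \le Q} (X - X_{ij}) \ \Big|\ Z = z \Big]
\]
\[
= \Big( X + (Q-1)Y - \sum_{1 \le i \le Q} E\big( X_{i(Q+1)} \mid Z_i = z_i, Z_{Q+1} = z_{Q+1} \big) \Big) \times
\prod_{1 \le i < j \le Q} \big( X - E(X_{ij} \mid Z_i = z_i, Z_j = z_j) \big).
\]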
Letting X = α, one of the factors X − E(Xij | Zi = zi , Zj = zj ) will vanish
for any z except possibly those with the zi , 1 ≤ i ≤ Q, distinct. But in that
case, zQ+1 = zi for exactly one value of i ∈ {1, . . . , Q}, so that the first factor
becomes
α + (Q − 1)Y − (Q − 1)β − α.
Thus in addition setting Y = β ensures each summand is zero, so VQ (α, β) = 0.
Finally, the coefficient of Y in $V_Q(\alpha, Y)$ is the product of Q − 1 and
\[
E\Big[ \prod_{1 \le i < j \le Q} (\alpha - X_{ij}) \Big]
= \sum_{z \in \{1,\ldots,Q\}^{Q}} \Big( \prod_{1 \le k \le Q} \pi_{z_k} \Big)
\prod_{1 \le i < j \le Q} E(\alpha - X_{ij} \mid Z_i = z_i, Z_j = z_j).
\]
But $\prod_{1 \le i < j \le Q} E(\alpha - X_{ij} \mid Z_i = z_i, Z_j = z_j)$ vanishes for all z except possibly
for those in which all $z_i$, 1 ≤ i ≤ Q, are distinct, in which case it takes the value
$(\alpha - \beta)^{\binom{Q}{2}}$. So the coefficient becomes
\[
(Q-1)(Q!) \Big( \prod_{1 \le k \le Q} \pi_k \Big) (\alpha - \beta)^{\binom{Q}{2}}.
\]
This is zero if, and only if, α = β.
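The two vanishing statements of this proof can be checked by direct enumeration. The sketch below assumes an affiliation model with hypothetical parameter values, and uses the conditional independence of the edge variables to replace each conditional expectation of a product by a product of conditional means, exactly as in the displays above. The function name `U_and_V` is hypothetical.

```python
from itertools import combinations, product
from math import prod

def U_and_V(pi, alpha, beta, X, Y):
    """Evaluate U_Q(X) and V_Q(X, Y) by summing over all node states z."""
    Q = len(pi)

    def mean(a, b):
        # E(X_ij | Z_i = a, Z_j = b) in the affiliation model
        return alpha if a == b else beta

    U = V = 0.0
    for z in product(range(Q), repeat=Q + 1):
        w = prod(pi[q] for q in z)
        U += w * prod(X - mean(z[i], z[j])
                      for i, j in combinations(range(Q + 1), 2))
        first = X + (Q - 1) * Y - sum(mean(z[i], z[Q]) for i in range(Q))
        V += w * first * prod(X - mean(z[i], z[j])
                              for i, j in combinations(range(Q), 2))
    return U, V

pi, alpha, beta = (0.2, 0.3, 0.5), 0.8, 0.2   # hypothetical parameter values
U_at_alpha, V_at_alpha_beta = U_and_V(pi, alpha, beta, alpha, beta)
print(U_at_alpha, V_at_alpha_beta)
```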
Proof of Theorem 4. Since α is a real root of the cubic polynomial $U_2(X)$, to
show α is uniquely identifiable it is enough to show that $\frac{d}{dX} U_2(X) \ge 0$. But
\[
\frac{d}{dX} U_2(X) = 3X^2 - 6m_1 X + 3m_2 = 3\big( (X - m_1)^2 + (m_2 - m_1^2) \big).
\]
Here $m_2 - m_1^2 \ge 0$ because, using the Cauchy-Schwarz inequality,
\[
m_2 = E(X_{ij} X_{ik}) = E[E(X_{ij} \mid Z_i) E(X_{ik} \mid Z_i)]
= E[E(X_{ij} \mid Z_i)^2] \ge [E(E(X_{ij} \mid Z_i))]^2 = m_1^2.
\]
With α identified, since α ≠ β, we may uniquely recover β as the root of the
linear polynomial $V_2(\alpha, Y)$ with nonzero leading coefficient.
Proof of Theorem 6. Using equation (1) to eliminate α from equations (3) and
(2) respectively, gives two equations
\[
R(\beta) = a\beta^3 + b\beta^2 + c\beta + d = 0, \qquad
S(\beta) = A\beta^2 + B\beta + C = 0,
\]
where
\[
\begin{aligned}
a &= -2s_2^3 + 3s_2 s_3 - s_3, & A &= s_3 - s_2^2, \\
b &= 3m_1 (s_2^3 - 2s_2 s_3 + s_3), & B &= -2m_1 (s_3 - s_2^2), \\
c &= 3m_1^2 s_3 (s_2 - 1), & C &= m_1^2 s_3 - m_2 s_2^2, \\
d &= m_1^3 s_3 - m_3 s_2^3.
\end{aligned}
\]
To understand the degrees of these polynomials we need the following.
Lemma 19. Suppose $\pi \in [0, 1]^Q$ with $\sum_{q=1}^{Q} \pi_q = 1$.
i) If $\pi_q > 0$ for at least two values of q, then $a \ne 0$.
ii) A = 0 if, and only if, π is uniform on its support.
Proof. To establish claim i), first observe that $0 < s_2 < 1$. Moreover, since
$s_3^2 \le s_2 s_4$ by the Cauchy-Schwarz inequality, and $s_4 < s_2^2$ by comparing terms
(since at least two $\pi_q > 0$), we have $s_3 < s_2^{3/2}$. If $-2s_2^3 + 3s_2 s_3 - s_3 = 0$, then
\[
s_2^{3/2} > s_3 = \frac{2s_2^3}{3s_2 - 1},
\]
where the denominator must be positive. Thus
\[
1 > \frac{2s_2^{3/2}}{3s_2 - 1},
\quad \text{so} \quad
0 > 2s_2^{3/2} - 3s_2 + 1.
\]
However, the function $x \mapsto 2x^{3/2} - 3x + 1$ is positive on (0, 1), so this is a
contradiction.
Turning to claim ii), we have $A = s_3 - s_2^2$ and by the Cauchy-Schwarz inequality,
$s_2^2 = \big( \sum_q \pi_q^{3/2} \pi_q^{1/2} \big)^2 \le s_3$, with equality if, and only if,
$(\pi_1^{3/2}, \ldots, \pi_Q^{3/2}) = \lambda (\pi_1^{1/2}, \ldots, \pi_Q^{1/2})$ for some value $\lambda \in \mathbb{R}$. This can only occur if on its support π
is uniform.
Returning to the proof of Theorem 6, if π is not uniform, we thus have $A \ne 0$
and dividing the polynomial R(β) by S(β) produces a linear remainder T(β),
which is calculated to be
\[
T(\beta) = \frac{s_2^2}{s_2^2 - s_3} \Big[ (m_2 - m_1^2)(s_3 - 3s_3 s_2 + 2s_2^3)\beta
+ (s_3 - s_2 s_3) m_1^3 + (s_2^3 - s_3) m_2 m_1 + (s_3 s_2 - s_2^3) m_3 \Big].
\]
Since any common zero of R(β) and S(β) must also be a zero of T(β), we can
recover the parameters β and α via the rational formulas
\[
\beta = \frac{(s_3 - s_2 s_3) m_1^3 + (s_2^3 - s_3) m_2 m_1 + (s_3 s_2 - s_2^3) m_3}{(m_1^2 - m_2)(2s_2^3 - 3s_3 s_2 + s_3)}, \tag{7}
\]
\[
\alpha = \frac{m_1 + (s_2 - 1)\beta}{s_2}. \tag{8}
\]
Note that a calculation shows
\[
m_1^2 - m_2 = (\alpha - \beta)^2 (s_2^2 - s_3), \tag{9}
\]
which, since $A \ne 0$, is only zero in the trivial case of α = β. Otherwise, since
$2s_2^3 - 3s_3 s_2 + s_3 = -a \ne 0$ by part i) of Lemma 19, the formulas (7) and (8) are
valid.
Equation (9), together with part ii) of Lemma 19 further shows that if $m_2 \ne m_1^2$,
then π is not uniform.
If $m_2 = m_1^2$, then π is uniform, and S(β) is identically zero. However, in this
case the coefficients of
\[
\tilde R(\beta) = \frac{Q^3}{1 - Q} R(\beta) = \beta^3 + \tilde b \beta^2 + \tilde c \beta + \tilde d
\]
simplify to
\[
\tilde b = -3m_1, \qquad \tilde c = 3m_1^2, \qquad
\tilde d = \frac{Q m_1^3 - m_3}{1 - Q} = -m_1^3 + \frac{m_1^3 - m_3}{1 - Q}.
\]
Thus
\[
\tilde R(\beta) = (\beta - m_1)^3 + \frac{m_1^3 - m_3}{1 - Q},
\]
which has a unique real root
\[
\beta = m_1 + \left( \frac{m_1^3 - m_3}{Q - 1} \right)^{1/3}.
\]
The parameter α can then be found by formula (8).
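Both recovery routes of this proof can be tried on simulated moments. The sketch below assumes NumPy and a binary affiliation model. The moment expressions in `moments` are written out directly from the model (edge, two-edge star, triangle) and are assumed to agree with equations (1) to (3), which are not reproduced in this section. All function names and parameter values are hypothetical.

```python
import numpy as np

def moments(pi, alpha, beta):
    """m1, m2, m3 of the binary affiliation model, plus s2 and s3."""
    pi = np.asarray(pi, dtype=float)
    s2, s3 = np.sum(pi**2), np.sum(pi**3)
    m1 = s2 * alpha + (1 - s2) * beta
    m2 = s3 * alpha**2 + 2 * (s2 - s3) * alpha * beta + (1 - 2*s2 + s3) * beta**2
    m3 = (s3 * alpha**3 + 3 * (s2 - s3) * alpha * beta**2
          + (1 - 3*s2 + 2*s3) * beta**3)
    return m1, m2, m3, s2, s3

def recover_nonuniform(m1, m2, m3, s2, s3):
    """Formulas (7) and (8); requires m2 != m1**2."""
    num = (s3 - s2*s3) * m1**3 + (s2**3 - s3) * m2 * m1 + (s3*s2 - s2**3) * m3
    den = (m1**2 - m2) * (2*s2**3 - 3*s3*s2 + s3)
    beta = num / den
    return (m1 + (s2 - 1) * beta) / s2, beta

def recover_uniform(m1, m3, Q):
    """Uniform priors: real cube root, then formula (8) with s2 = 1/Q."""
    beta = m1 + np.cbrt((m1**3 - m3) / (Q - 1))
    return Q * (m1 + (1 / Q - 1) * beta), beta

a1, b1 = recover_nonuniform(*moments((0.3, 0.7), 0.8, 0.2))
m1, m2, m3, _, _ = moments((1/3, 1/3, 1/3), 0.8, 0.2)
a2, b2 = recover_uniform(m1, m3, 3)
print(a1, b1, a2, b2)
```

In the uniform example the star moment satisfies m2 = m1², which is the condition separating the two cases in the proof.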
Proof of Proposition 9. First, note that the distribution of $K_n$ may be parameterized using the elementary symmetric polynomials $\sigma_i$ evaluated at the $\{\pi_q\}_{1 \le q \le Q}$,
instead of the values $\{\pi_q\}_{1 \le q \le Q}$. Indeed, the affiliation model distribution only
involves the $\pi_q$ through the symmetric expressions
\[
\sum_{\substack{q_1, \ldots, q_s \\ q_i \ne q_j}} \pi_{q_1}^{i_1} \cdots \pi_{q_s}^{i_s},
\]
with s ≤ Q and $\sum_{k \le s} i_k = n$, and these sums may be expressed as polynomials in the $\{\sigma_i(\pi_1, \ldots, \pi_Q)\}_{1 \le i \le n}$. Thus for identifiability of the $\{\pi_q\}$ from
the distribution of $K_n$, it is necessary that the $\{\pi_q\}$ be identifiable from the
$\{\sigma_i(\pi_1, \ldots, \pi_Q)\}_{1 \le i \le n}$. Note also that $\sigma_1(\pi_1, \ldots, \pi_Q) = \sum_{q=1}^{Q} \pi_q = 1$ carries no
information on the $\pi_q$ that is not already known.
Now if n < Q, identifying Q − 1 independent choices of the $\pi_q$ from the values
of n − 1 continuous functions of those $\pi_q$ is impossible.
Lemma 20. For the random graph affiliation model on Q nodes, with binary
edge state variables, uniform group priors, and connectivities α ≠ β, the moment
inequality $m_{41} > m_1^4$ holds.
Proof. Note
\[
m_{41} = E[E(X_{12} X_{23} \mid Z_1, Z_3) E(X_{34} X_{41} \mid Z_1, Z_3)] = E[E(X_{12} X_{23} \mid Z_1, Z_3)^2]
\ge (E[E(X_{12} X_{23} \mid Z_1, Z_3)])^2 = m_2^2.
\]
However, equality occurs above only if $E(X_{12} X_{23} \mid Z_1, Z_3)$ is constant. But
\[
E(X_{12} X_{23} \mid Z_1 = i = Z_3) = \frac{1}{Q} \alpha^2 + \frac{Q-1}{Q} \beta^2,
\]
\[
E(X_{12} X_{23} \mid Z_1 = i \ne j = Z_3) = \frac{2}{Q} \alpha\beta + \frac{Q-2}{Q} \beta^2,
\]
so the difference of these expectations is $(\alpha - \beta)^2/Q \ne 0$. Thus $m_{41} > m_2^2$.
A similar argument that $m_2 \ge m_1^2$ was given in the proof of Theorem 4, so
the claim is established.
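The chain of inequalities in this proof can be checked by enumeration for a small example. The sketch below assumes uniform group priors and hypothetical values of Q, α and β. It computes the single-edge moment, the two-edge path moment and the 4-cycle moment E[X12 X23 X34 X41], using conditional independence of distinct edges given the node states. The function name `edge_product_moment` is hypothetical.

```python
from itertools import product
from math import prod

def edge_product_moment(Q, alpha, beta, n_nodes, edges):
    """E[product of X_e over the given distinct edges], uniform group priors."""
    total = 0.0
    for z in product(range(Q), repeat=n_nodes):
        total += prod(alpha if z[i] == z[j] else beta for i, j in edges)
    return total / Q ** n_nodes

Q, alpha, beta = 3, 0.8, 0.2   # hypothetical values with alpha != beta
m1 = edge_product_moment(Q, alpha, beta, 2, [(0, 1)])
m2 = edge_product_moment(Q, alpha, beta, 3, [(0, 1), (1, 2)])
m41 = edge_product_moment(Q, alpha, beta, 4, [(0, 1), (1, 2), (2, 3), (3, 0)])
print(m1, m2, m41)
```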
5.4. Proofs for the continuous parametric model
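Proof of Theorem 12. With $\bar p_{q\ell} = 1 - p_{q\ell}$, the distribution of $(X_{ij}, X_{ik}, X_{jk})$ is
given by the mixture
\[
\sum_{1 \le q, \ell, m \le Q} \pi_q \pi_\ell \pi_m \,
[\bar p_{q\ell} \delta_0(X_{ij}) + p_{q\ell} F(X_{ij}, \theta_{q\ell})] \times
[\bar p_{qm} \delta_0(X_{ik}) + p_{qm} F(X_{ik}, \theta_{qm})] \times
[\bar p_{\ell m} \delta_0(X_{jk}) + p_{\ell m} F(X_{jk}, \theta_{\ell m})]. \tag{10}
\]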
Since the distributions $F(\cdot, \theta)$ have no point masses at 0 by Assumption 2,
the family $M \cup \{\delta_0\}$ has identifiable parameters for finite mixtures, so Theorem
1 of Teicher (1967) applies to it. Thus multiplying out the terms of the mixture
in (10) to view it as a mixture of products from $M \cup \{\delta_0\}$, and noting that by
Assumption 1 certain of the components arise from unique choices of $q, \ell, m$, we
can identify the terms of the form
\[
\pi_q \pi_\ell \pi_m \, p_{q\ell} p_{qm} p_{\ell m} \, F(X_{ij}, \theta_{q\ell}) F(X_{ik}, \theta_{qm}) F(X_{jk}, \theta_{\ell m}),
\]
and the vectors in
\[
C = \{ (\pi_q \pi_\ell \pi_m \, p_{q\ell} p_{qm} p_{\ell m}; \ \theta_{q\ell}, \theta_{qm}, \theta_{\ell m}) \mid 1 \le q, \ell, m \le Q \},
\]
but only as an unordered set. But by Assumption 1, there are only Q vectors
in this set for which the last entries $(\theta_{q\ell}, \theta_{qm}, \theta_{\ell m})$ are all equal. Indeed, these
entries are of the form $(\theta_{qq}, \theta_{qq}, \theta_{qq})$ for some 1 ≤ q ≤ Q, since the case where
these entries would be of the form $(\theta_{q\ell}, \theta_{q\ell}, \theta_{q\ell})$ for some $q \ne \ell$ is not possible.
Thus the $\theta_{qq}$ for 1 ≤ q ≤ Q may be identified as well as the corresponding
weights $(\pi_q p_{qq})^3$, or equivalently the values $\pi_q p_{qq}$.
Now, among the vectors in C, exactly 3Q(Q − 1) of them have two of the last
three entries equal. These entries are, up to order, of the form $(\theta_{qq}, \theta_{q\ell}, \theta_{q\ell})$, for
any $q \ne \ell$. Thus we obtain the set $\{(\pi_q^2 \pi_\ell \, p_{q\ell}^2 p_{qq}; \ \theta_{qq}, \theta_{q\ell}, \theta_{q\ell})\}_{1 \le q < \ell \le Q}$, without
regard to order. Since we already identified the pairs $(\pi_q p_{qq}, \theta_{qq})$, we may take
the ratio between the weights $\pi_q^2 \pi_\ell \, p_{q\ell}^2 p_{qq}$ and $\pi_q p_{qq}$ to recover the values $\pi_q \pi_\ell \, p_{q\ell}^2$.
Thus we identify the set $\{(\pi_q \pi_\ell \, p_{q\ell}^2; \ \theta_{qq}, \theta_{q\ell}, \theta_{q\ell})\}_{1 \le q < \ell \le Q}$.
Among these vectors, we can match the ones whose two last entries are equal,
namely those of the form $(\pi_q \pi_\ell \, p_{q\ell}^2; \ \theta_{qq}, \theta_{q\ell}, \theta_{q\ell})$ with $(\pi_q \pi_\ell \, p_{q\ell}^2; \ \theta_{\ell\ell}, \theta_{q\ell}, \theta_{q\ell})$. This
enables us to recover the values $\theta_{q\ell}$, for $1 \le q, \ell \le Q$.
By marginalizing the distribution of $(X_{ij}, X_{ik}, X_{jk})$, we also have the distribution of a single edge variable $X_{ij}$,
\[
\sum_{1 \le q, \ell \le Q} \pi_q \pi_\ell \, [\bar p_{q\ell} \delta_0(X_{ij}) + p_{q\ell} F(X_{ij}, \theta_{q\ell})], \tag{11}
\]
and thus by our hypotheses can also identify $\{(\pi_q \pi_\ell \, p_{q\ell}, \theta_{q\ell})\}_{1 \le q \le \ell \le Q}$, without
order. But as the $\theta_{q\ell}$ have already been identified, we may use this to match
$\pi_q \pi_\ell \, p_{q\ell}$ with $\pi_q \pi_\ell \, p_{q\ell}^2$ and thus recover $p_{q\ell}$ from the ratio. From $\pi_q p_{qq}$ and $p_{qq}$
we can then recover $\pi_q$.
Thus, all parameters of the model are identified, up to permutation on the
group labels.
Proof of Theorem 13. From the distribution of K3 , we can distinguish (α, θin )
from (β, θout ) as follows: The distribution of K3 is the mixture of either 4 (when
Q = 2) or 5 (when Q ≥ 3) different 3-dimensional components. Since the distributions F (·, θ) do not have point masses at 0 by Assumption 2, we can identify
from this mixture that part with no such Dirac masses in it, which is the mixture
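\[
\begin{aligned}
& \alpha^3 \sum_{q=1}^{Q} \pi_q^3 \, F(\cdot, \theta_{in}) \otimes F(\cdot, \theta_{in}) \otimes F(\cdot, \theta_{in}) \\
& + \alpha\beta^2 \sum_{1 \le q \ne \ell \le Q} \pi_q^2 \pi_\ell \, F(\cdot, \theta_{in}) \otimes F(\cdot, \theta_{out}) \otimes F(\cdot, \theta_{out}) \\
& + \alpha\beta^2 \sum_{1 \le q \ne \ell \le Q} \pi_q^2 \pi_\ell \, F(\cdot, \theta_{out}) \otimes F(\cdot, \theta_{in}) \otimes F(\cdot, \theta_{out}) \\
& + \alpha\beta^2 \sum_{1 \le q \ne \ell \le Q} \pi_q^2 \pi_\ell \, F(\cdot, \theta_{out}) \otimes F(\cdot, \theta_{out}) \otimes F(\cdot, \theta_{in}) \\
& + \beta^3 \sum_{q, \ell, m \text{ distinct}} \pi_q \pi_\ell \pi_m \, F(\cdot, \theta_{out}) \otimes F(\cdot, \theta_{out}) \otimes F(\cdot, \theta_{out}),
\end{aligned}
\]
where the last term appears only when Q ≥ 3.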
By Theorem 1 of Teicher (1967) and Assumption 2, this 3-dimensional mixture has identifiable parameters, up to label swapping issues. At most two terms
in this mixture have the same measure F in each coordinate. The three remaining terms have two coordinates which are equal, involving $\theta_{out}$, and one different,
involving $\theta_{in}$. Thus we can distinguish between $\theta_{in}$ and $\theta_{out}$.
We may also determine $\alpha^3 (\sum_q \pi_q^3)$ as the weight of $F(\cdot, \theta_{in}) \otimes F(\cdot, \theta_{in}) \otimes F(\cdot, \theta_{in})$.
Similarly from the $\delta_0 \otimes F(\cdot, \theta_{in}) \otimes F(\cdot, \theta_{in})$ term in the full mixture, we
may recover the weight $(1 - \alpha)\alpha^2 (\sum_q \pi_q^3)$. Summing these two weights yields
$\alpha^2 (\sum_q \pi_q^3)$, and then dividing the first by this, we recover α.
The parameter β is similarly recovered from the weights of $F(\cdot, \theta_{out}) \otimes F(\cdot, \theta_{out}) \otimes F(\cdot, \theta_{in})$
and $\delta_0 \otimes F(\cdot, \theta_{out}) \otimes F(\cdot, \theta_{in})$.
Next we consider the distribution of $K_n$ for various n. This is a mixture of
many different $\binom{n}{2}$-dimensional components. As above, we can identify up to
label swapping the components with no $\delta_0$ factors in this mixture. But as we
already know the value of $\theta_{in}$, we can identify the term $\bigotimes_{1 \le i < j \le n} F(X_{ij}, \theta_{in})$
in this mixture, and thus its corresponding prior $\alpha^{\binom{n}{2}} \sum_q \pi_q^n$. Since α has been
previously identified, this uniquely determines $\sum_q \pi_q^n$. Note that using the distribution of $K_Q$,
we can obtain the distribution of each $K_n$ with n ≤ Q and
thus the values $\{\sum_q \pi_q^n\}_{n \le Q}$.
By the Newton identities, these values determine the values of elementary
symmetric polynomials {σn (π1 , . . . , πQ )}n≤Q . These, in turn, are (up to sign)
the coefficients of the monic polynomial whose roots (with multiplicities) are
precisely {πq }1≤q≤Q . Thus the node priors are determined, up to order.
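The final step, from power sums to group priors, can be sketched as follows (assuming NumPy; the function name and the example priors are hypothetical). Newton's identities convert the power sums into elementary symmetric polynomials, and the priors are then read off as the roots of the corresponding monic polynomial.

```python
import numpy as np

def priors_from_power_sums(p):
    """Recover {pi_q}, up to order, from p[k-1] = sum_q pi_q**k for k = 1..Q."""
    Q = len(p)
    e = [1.0]                                   # e_0 = 1
    for k in range(1, Q + 1):
        # Newton's identity: k e_k = sum_{i=1}^{k} (-1)^(i-1) e_{k-i} p_i
        acc = sum((-1) ** (i - 1) * e[k - i] * p[i - 1] for i in range(1, k + 1))
        e.append(acc / k)
    # prod_q (x - pi_q) = sum_k (-1)^k e_k x^(Q-k)
    coeffs = [(-1) ** k * e[k] for k in range(Q + 1)]
    return np.sort(np.roots(coeffs).real)

pi = np.array([0.2, 0.3, 0.5])                  # hypothetical group priors
power_sums = [np.sum(pi ** k) for k in range(1, len(pi) + 1)]
recovered = priors_from_power_sums(power_sums)
print(recovered)
```

Root finding returns the priors only as an unordered set, which matches identifiability up to label swapping.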
5.5. Proof of Theorem 14
The proof follows the strategy described in Section 5.1. We thus proceed with
a base case, an extension step, and a conclusion.
Base case We consider a subset E of the set of all edges over m vertices,
with m and E to be chosen later. Let A be the Qm × κ|E| matrix containing
the probabilities of the clumped random variable Y = (Xe )e∈E with state space
{1, . . . , κ}|E| , conditional on the hidden states of the m vertices.
Let I ∈ {1, . . . , Q}m be a vector specifying particular states of all the node
variables. For each edge e ∈ E, the endpoints are in some set of hidden states
{q, l}, which we denote by I(e). The (I, (xe )e∈E )-entry of the matrix A is then
given by
κ
YY
(pI(e) (k))1xe =k ,
e∈E k=1
where 1A is the indicator function for a set A.
For each edge $e$ in the graph, we introduce $\kappa$ indeterminates, $t_{e,1}, \dots, t_{e,\kappa}$. We create a $\kappa^{|E|}$-element column vector $t$ indexed by the states of the clumped variable $Y$, whose $(x_e)_{e \in E}$-th entry is given by
$$\prod_{e \in E} \prod_{k=1}^{\kappa} t_{e,k}^{1_{x_e = k}}.$$
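Then the $I$th entry of the $Q^m$-entry vector $At$ is the polynomial function
$$f_I = \sum_{(x_e)_{e \in E}} \prod_{e \in E} \prod_{k=1}^{\kappa} \bigl\{p_{I(e)}(k)\, t_{e,k}\bigr\}^{1_{x_e = k}} = \prod_{e \in E} \bigl(p_{I(e)}(1)\, t_{e,1} + \dots + p_{I(e)}(\kappa)\, t_{e,\kappa}\bigr).$$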
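The factorization of the entries of $At$ can be checked numerically on a small instance. The sketch below is an illustration only: the sizes Q = 2 and κ = 3, the random edge distributions, and all variable names are assumptions, and vertices and states are labeled from 0 for convenience.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
Q, kappa, m = 2, 3, 3                  # hypothetical small sizes
edges = [(0, 1), (0, 2), (1, 2)]       # complete edge set on m = 3 vertices

# p[(q, l)] with q <= l: distribution of an edge variable on {0, ..., kappa - 1}
p = {(q, l): rng.dirichlet(np.ones(kappa))
     for q in range(Q) for l in range(q, Q)}

def pair(I, e):
    """Unordered pair I(e) of hidden states at the endpoints of edge e."""
    return tuple(sorted((I[e[0]], I[e[1]])))

node_states = list(itertools.product(range(Q), repeat=m))
edge_states = list(itertools.product(range(kappa), repeat=len(edges)))

# A[I, x] = prod_e p_{I(e)}(x_e)
A = np.array([[np.prod([p[pair(I, e)][x_e] for e, x_e in zip(edges, x)])
               for x in edge_states] for I in node_states])

# t[x] = prod_e t_{e, x_e} for arbitrary numerical values of the indeterminates
t_vals = rng.normal(size=(len(edges), kappa))
t = np.array([np.prod([t_vals[i, x_e] for i, x_e in enumerate(x)])
              for x in edge_states])

# Factored form: f_I = prod_e (p_{I(e)} . t_e)
f_factored = np.array([np.prod([p[pair(I, e)] @ t_vals[i]
                                for i, e in enumerate(edges)])
                       for I in node_states])
```

Comparing `A @ t` with `f_factored` confirms, for this instance, that the sum over edge states collapses to a product of linear forms, one per edge.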
Independence of the rows of $A$ is equivalent to the independence of the polynomials $\{f_I\}_{I \in \{1, \dots, Q\}^m}$. Thus, suppose that we have
$$\sum_I a_I f_I \equiv 0, \tag{12}$$
and let us show then that every $a_I$ must be 0.
For a specific $e \in E$, and any choice $\{q, l\}$ with $1 \le q \le l \le Q$, one can choose a point $t_{e,\{q,l\}} = (t_{e,1}, \dots, t_{e,\kappa}) \in \mathbb{R}^\kappa$ in the zero set of all the polynomial functions $f_I$ in (12), except those with $I(e) = \{q, l\}$. To see this, let $M$ be the $\binom{Q+1}{2} \times \kappa$ matrix whose $\{q, l\}$th row is given by the vector $p_{ql} = (p_{ql}(1), \dots, p_{ql}(\kappa))$. $M$ has full row rank since its rows are independent by assumption. Thus there is a solution $t_{e,\{q,l\}}$ to
$$M t_{e,\{q,l\}} = e_{\{q,l\}},$$
where $e_{\{q,l\}}$ is the vector of size $\binom{Q+1}{2}$ with zero entries, except the $\{q, l\}$th which is equal to 1. The independence assumption also implies $\kappa \ge \binom{Q+1}{2}$.
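The existence of such a point can be illustrated with a small linear solve. In the sketch below, the matrix `M`, the sizes, and the helper name `annihilating_point` are hypothetical; the minimum-norm least-squares solution is just one convenient choice among the solutions of the underdetermined system.

```python
import numpy as np

Q, kappa = 2, 4                                   # hypothetical sizes, kappa >= Q(Q+1)/2
pairs = [(q, l) for q in range(Q) for l in range(q, Q)]

# Rows of M are hypothetical, linearly independent vectors p_ql (each sums to 1)
M = np.array([[0.7, 0.1, 0.1, 0.1],
              [0.1, 0.7, 0.1, 0.1],
              [0.1, 0.1, 0.7, 0.1]])
assert np.linalg.matrix_rank(M) == len(pairs)     # full row rank

def annihilating_point(target):
    """Return t with p_ql . t = 1 if {q, l} == target and 0 otherwise."""
    rhs = np.zeros(len(pairs))
    rhs[pairs.index(target)] = 1.0
    # Minimum-norm solution of the underdetermined system M t = rhs
    t, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    return t

t_01 = annihilating_point((0, 1))
```

Substituting `t_01` for the indeterminates of an edge makes every linear factor for that edge vanish unless the edge's endpoints are in states {0, 1}.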
Note that in this construction we have only specified group assignments to
two nodes up to node permutation. Thus if the {q, l} row of M is related to an
edge e = (i, j) because I(e) = {q, l}, we may have that either i is in state q and
j is in state l, or i is in state l and j is in state q.
By evaluating the $f_I$ at $t_{e,\{q,l\}}$ for many edges $e$ and choices of node states $\{q, l\}$, we can annihilate all the polynomials $f_I$ except those satisfying specific constraints on the node states. More precisely, we can make all the $f_I$ vanish except those for which $I$ satisfies the condition that for some subset of edges $E' \subseteq E$ and some sequence of unordered node assignments $(\{q_e, l_e\})_{e \in E'}$ we have
$$I \in \bigcap_{e \in E'} S(e; \{q_e, l_e\}), \tag{13}$$
where $S(e; \{q_e, l_e\}) = \{I \in \{1, \dots, Q\}^m \mid I(e) = \{q_e, l_e\}\}$.
To conclude that each $a_I = 0$ in equation (12), it is enough to construct for every $I \in \{1, \dots, Q\}^m$ a set as in (13) containing only $I$.
In fact, this can be achieved with only $m = 3$ vertices and the full set of edges $E = \{(1, 2), (1, 3), (2, 3)\}$. Indeed, up to permutation of the nodes and of the labels of the groups, $I$ can take only three different values, namely $(1, 1, 1)$, $(1, 1, 2)$ and $(1, 2, 3)$. Using a node assignment on the edges in $E' = \{(1, 2), (2, 3)\}$, we get
$$\begin{aligned}
\{(1, 1, 1)\} &= S((1, 2); \{1, 1\}) \cap S((2, 3); \{1, 1\}), \\
\{(1, 1, 2)\} &= S((1, 2); \{1, 1\}) \cap S((2, 3); \{1, 2\}), \\
\{(1, 2, 3)\} &= S((1, 2); \{1, 2\}) \cap S((2, 3); \{2, 3\}).
\end{aligned}$$
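These set identities, and their analogues after permuting nodes and group labels, can be confirmed by exhaustive enumeration. The following sketch assumes Q = 4 groups and labels vertices and groups from 0; the helper names are hypothetical.

```python
import itertools

Q, m = 4, 3                                   # hypothetical number of groups; m = 3 as in the text
edges = [(0, 1), (0, 2), (1, 2)]              # 0-based vertex labels
states = list(itertools.product(range(Q), repeat=m))

def pair(I, e):
    """Unordered pair I(e) of hidden states at the endpoints of edge e."""
    return tuple(sorted((I[e[0]], I[e[1]])))

def S(e, ql):
    """All node-state vectors I with I(e) equal to the unordered pair ql."""
    return {I for I in states if pair(I, e) == ql}

def isolating_edges(I):
    """Find two edges E' whose constraints S(e; I(e)) intersect in {I} only."""
    for E_prime in itertools.combinations(edges, 2):
        common = set.intersection(*(S(e, pair(I, e)) for e in E_prime))
        if common == {I}:
            return E_prime
    return None

isolated = {I: isolating_edges(I) for I in states}
```

For every node-state vector the search returns a pair of edges, so each $I$ can be singled out by constraints on two edges of the triangle.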
Thus, we proved the following lemma.
Lemma 21. With $E$ the complete set of edges over $m = 3$ vertices, the $Q^3 \times \kappa^3$ matrix $A$ containing the probabilities of the clumped variable $Y = (X_e)_{e \in E}$, conditional on the hidden states $Z = (Z_1, Z_2, Z_3) \in \{1, \dots, Q\}^3$, has full row rank $Q^3$, provided the $\kappa$-entry vectors $\{p_{ql}\}_{1 \le q \le l \le Q}$ are linearly independent.
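The lemma can be sanity-checked numerically on a small case. The sketch below assumes Q = 2 and κ = 3 with three hand-picked, linearly independent edge distributions; these values and the helper name `entry` are hypothetical and serve only as an example.

```python
import itertools
import numpy as np

Q = 2
kappa = 3                                  # equals Q(Q+1)/2, the smallest admissible value
edges = [(0, 1), (0, 2), (1, 2)]           # complete edge set on 3 vertices (0-based)

# Hypothetical, linearly independent edge distributions p_ql on {0, 1, 2}
p = {(0, 0): np.array([0.7, 0.2, 0.1]),
     (0, 1): np.array([0.1, 0.7, 0.2]),
     (1, 1): np.array([0.2, 0.1, 0.7])}

node_states = list(itertools.product(range(Q), repeat=3))
edge_states = list(itertools.product(range(kappa), repeat=len(edges)))

def entry(I, x):
    """Probability of edge states x given node states I."""
    return np.prod([p[tuple(sorted((I[i], I[j])))][x_e]
                    for (i, j), x_e in zip(edges, x)])

A = np.array([[entry(I, x) for x in edge_states] for I in node_states])
rank_A = np.linalg.matrix_rank(A)
```

Here `A` has 8 rows and 27 columns, and its numerical rank agrees with the full row rank asserted by the lemma.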
Conclusion of the proof Lemma 21 provides the base case, and the extension step of Section 5.1 then applies. Thus with $n = m^2 = 9$ nodes, Kruskal’s Theorem may be applied to identify, up to simultaneous row permutation, $v$, $M_1$, $M_2$, and $M_3$ as defined in that section.
The rest of the proof follows the same lines as the conclusion in the proof of
Theorem 2, replacing the numbers pql by the vectors pql and noting that these
vectors are assumed to be linearly independent.
5.6. Proof of Theorem 15
For convenience, we present the argument assuming the state space of the µql
is a subset of R. The more general situation of a multidimensional state space
can be handled similarly, along the lines of the proof of Theorem 9 of Allman
et al. (2009).
Let Mql denote the c.d.f. of µql = (1 − pql )δ0 + pql Fql . Since the measures
{µql | 1 ≤ q ≤ l ≤ Q} are assumed to be linearly independent, so are the
functions {Mql | 1 ≤ q ≤ l ≤ Q}. Applying Lemma 17 of Allman et al. (2009) to
this set of functions, there exists some κ ∈ N and cutpoints u1 < u2 < · · · < uκ−1
such that the vectors
{(Mql (u1 ), Mql (u2 ), . . . , Mql (uκ−1 ), 1) | 1 ≤ q ≤ l ≤ Q}
are independent. Note $\kappa \ge \binom{Q+1}{2}$. Also, by adding additional cutpoints if necessary, and thereby increasing $\kappa$, we may assume that among the $u_i$ are any specific real numbers we like.
The independence of the above vectors is equivalent to the independence of
the vectors {M̄ql | 1 ≤ q ≤ l ≤ Q}, where
M̄ql = (Mql (u1 ), Mql (u2 ) − Mql (u1 ), . . . , Mql (uκ−1 ) − Mql (uκ−2 ), 1 − Mql (uκ−1 )) .
Note that the $k$th entry of $\bar{M}_{ql}$ is simply the probability that a variable with distribution $\mu_{ql}$ takes values in the interval $I_k = (u_{k-1}, u_k]$ (with the convention that $u_0 = -\infty$, $u_\kappa = \infty$). To formalize this, let
$$Y_{ij} = \sum_{k=1}^{\kappa} k \, 1_{I_k}(X_{ij})$$
be the random variable with state space $\{1, 2, \dots, \kappa\}$ indicating the interval in which the value of $X_{ij}$ lies. Thus, conditional on $Z_i = q$, $Z_j = l$, the random variable $X_{ij}$ has c.d.f. $M_{ql}$ and $Y_{ij}$ has probability vector $\bar{M}_{ql}$.
Now from the distribution of the continuous random graph mixture model on $K_9$, with edge variables $(X_{ij})_{1 \le i < j \le 9}$, by binning the values of the 36 edge variables into sets of the form $\prod_{1 \le i < j \le 9} I_{k_{ij}}$ with $1 \le k_{ij} \le \kappa$, we obtain the distribution for the discrete edge variables $(Y_{ij})_{1 \le i < j \le 9}$ of a random graph mixture model with the same group priors on the nodes, and with mixture components built from the distributions $\bar{M}_{ql}$ associated to $\mu_{ql}$. By Theorem 14, the parameters of the discrete model are identifiable, up to label swapping.
Imposing an arbitrary labeling, we have identified the node group priors πq ,
1 ≤ q ≤ Q, and for each pair of groups q ≤ l the vector M̄ql . By summing
entries of M̄ql , we obtain values of Mql (uk ) for k = 1, 2, . . . , κ − 1. Since we
may additionally determine Mql (t) for any real number t by including it as a
cutpoint, Mql , and hence µql , is uniquely determined.
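The passage between a c.d.f. and its binned probability vector, used in both directions in this proof, can be sketched numerically. In the example below the continuous part of the mixture is taken to be an exponential distribution, which is an assumption made purely for illustration; the cutpoints, parameter values, and function names are likewise hypothetical.

```python
import math
import numpy as np

def mixture_cdf(x, p, rate):
    """c.d.f. of (1 - p) * delta_0 + p * Exponential(rate); F is an assumed choice."""
    if x < 0:
        return 0.0
    return (1.0 - p) + p * (1.0 - math.exp(-rate * x))

def binned_probabilities(cdf, cutpoints):
    """Vector of P(X in (u_{k-1}, u_k]) with u_0 = -inf and u_kappa = +inf."""
    values = [cdf(u) for u in cutpoints] + [1.0]
    return np.diff(values, prepend=0.0)

cutpoints = [0.0, 0.5, 1.0, 2.0]           # hypothetical u_1 < ... < u_{kappa-1}
cdf = lambda x: mixture_cdf(x, p=0.6, rate=1.5)
M_bar = binned_probabilities(cdf, cutpoints)

# Partial sums of M_bar give back the c.d.f. at the cutpoints
M_at_cutpoints = np.cumsum(M_bar)[:-1]
```

Adding a further cutpoint at any real number $t$ yields the value of the c.d.f. at $t$ in the same way, which is the mechanism the proof relies on.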
6. Acknowledgements
The authors thank the Statistical and Applied Mathematical Sciences Institute
for their support during residencies in which some of this work was undertaken.
ESA and JAR also thank the Laboratoire Statistique et Génome for its hospitality. JAR additionally thanks Université d’Évry Val d’Essonne for a Visiting
Professorship during which this work was completed. ESA and JAR received
support from the National Science Foundation, grant DMS 0714830, while CM
has been supported by the French Agence Nationale de la Recherche under grant
NeMo ANR-08-BLAN-0304-01.
References
Airoldi, E., Blei, D., Fienberg, S., Xing, E., 2008. Mixed-membership stochastic
blockmodels. Journal of Machine Learning Research 9, 1981–2014.
Allman, E., Matias, C., Rhodes, J., 2009. Identifiability of parameters in latent
structure models with many observed variables. Ann. Statist. 37 (6A), 3099–
3132.
Ambroise, C., Matias, C., 2010. New consistent and asymptotically normal estimators for random graph mixture models. Tech. rep., arXiv:1003.5165.
Barrat, A., Barthélemy, M., Pastor-Satorras, R., Vespignani, A., 2004. The architecture of complex weighted networks. PNAS 101 (11), 3747–3752.
Berge, C., 1976. Graphs and hypergraphs. Translated by Edward Minieka. 2nd
rev. ed. North-Holland Mathematical Library. Vol. 6. Amsterdam - Oxford:
North-Holland Publishing Company; New York: American Elsevier Publishing.
Carreira-Perpiñán, M., Renals, S., 2000. Practical identifiability of finite mixtures of multivariate Bernoulli distributions. Neural Comp. 12 (1), 141–152.
Cox, D., Little, J., O’Shea, D., 1997. Ideals, varieties, and algorithms, 2nd Edition. Springer-Verlag, New York.
Daudin, J.-J., Picard, F., Robin, S., 2008. A mixture model for random graphs.
Statist. Comput. 18 (2), 173–183.
Daudin, J.-J., Pierre, L., Vacher, C., 2010. Model for heterogeneous random
networks using continuous latent variables and an application to a tree-fungus
network. Biometrics, to appear.
Erdős, P., Gallai, T., 1961. Graphs with points of prescribed degree. (Graphen
mit Punkten vorgeschriebenen Grades.). Mat. Lapok 11, 264–274.
Erdős, P., Rényi, A., 1959. On random graphs. I. Publ. Math. Debrecen 6, 290–
297.
Frank, O., Harary, F., 1982. Cluster inference by using transitivity indices in
empirical graphs. J. Amer. Statist. Assoc. 77 (380), 835–840.
Gyllenberg, M., Koski, T., Reilink, E., Verlaan, M., 1994. Nonuniqueness in
probabilistic numerical identification of bacteria. J. Appl. Probab. 31 (2),
542–548.
Handcock, M., Raftery, A., Tantrum, J., 2007. Model-based clustering for social
networks. J. Roy. Statist. Soc. Ser. A 170 (2), 301–354.
Holland, P., Laskey, K., Leinhardt, S., 1983. Stochastic blockmodels: some first
steps. Social networks 5, 109–137.
Kruskal, J., 1976. More factors than subjects, tests and treatments: an indeterminacy theorem for canonical decomposition and individual differences scaling. Psychometrika 41 (3), 281–293.
Kruskal, J., 1977. Three-way arrays: rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics. Linear Algebra
and Appl. 18 (2), 95–138.
Latouche, P., Birmelé, E., Ambroise, C., 2009. Overlapping stochastic block
models. Tech. rep., arXiv:0910.2098.
Mariadassou, M., Robin, S., 2010. Uncovering latent structure in valued graphs:
a variational approach. Annals of Applied Statistics, to appear.
McLachlan, G., Peel, D., 2000. Finite mixture models. Wiley Series in Probability and Statistics: Applied Probability and Statistics. Wiley-Interscience,
New York.
Newman, M. E. J., 2003. The structure and function of complex networks. SIAM
Rev. 45 (2), 167–256 (electronic).
Newman, M. E. J., 2004. Analysis of weighted networks. Phys. Rev. E 70,
056131.
Newman, M. E. J., Leicht, E. A., 2007. Mixture models and exploratory analysis
in networks. PNAS 104 (23), 9564–9569.
Nowicki, K., Snijders, T., 2001. Estimation and prediction for stochastic blockstructures. J. Amer. Statist. Assoc. 96 (455), 1077–1087.
Petrie, T., 1969. Probabilistic functions of finite state Markov chains. Ann.
Math. Statist. 40, 97–115.
Picard, F., Miele, V., Daudin, J.-J., Cottret, L., Robin, S., 2009. Deciphering
the connectivity structure of biological networks using MixNet. BMC Bioinformatics 10, 1–11.
Rhodes, J., 2010. A concise proof of Kruskal’s theorem on tensor decomposition.
Linear Algebra and its Applications 432 (7), 1818–1824.
Snijders, T., Nowicki, K., 1997. Estimation and prediction for stochastic blockmodels for graphs with latent block structure. J. Classification 14 (1), 75–100.
Tallberg, C., 2005. A Bayesian approach to modeling stochastic blockstructures
with covariates. Journal of Mathematical Sociology 29 (1), 1–23.
Teicher, H., 1961. Identifiability of mixtures. Ann. Math. Statist. 32, 244–248.
Teicher, H., 1963. Identifiability of finite mixtures. Ann. Math. Statist. 34, 1265–
1269.
Teicher, H., 1967. Identifiability of mixtures of product measures. Ann. Math.
Statist. 38, 1300–1302.
Tomasi, G., Bro, R., 2006. A comparison of algorithms for fitting the PARAFAC
model. Comput. Statist. Data Anal. 50 (7), 1700–1734.
White, H., Boorman, S., Breiger, R., 1976. Social structure from multiple networks. I. Blockmodels of roles and positions. American Journal of Sociology
81, 730–779.
Zanghi, H., Ambroise, C., Miele, V., 2008. Fast online graph clustering via Erdős–Rényi
mixture. Pattern Recognition 41 (12), 3592–3599.
Zanghi, H., Picard, F., Miele, V., Ambroise, C., 2010. Strategies for online inference of network mixture. Annals of Applied Statistics, to appear.