The order-chaos phase transition for a general class of complex

THE ORDER-CHAOS PHASE TRANSITION FOR A GENERAL CLASS OF
COMPLEX BOOLEAN NETWORKS
By Shirshendu Chatterjee∗
New York University∗
We consider a model for heterogeneous gene regulatory networks
that is a generalization of the model proposed by Chatterjee and
Durrett [6] as an “annealed approximation” of Kauffmann’s [16] random Boolean networks. In this model, genes are represented by the
vertices of a random directed graph Gn on n vertices having specified
in-degree distribution pin (resp. joint distribution pin,out of in-degree
and out-degree), and the expression bias (the expected fraction of 1’s
in the Boolean functions) p is same for all vertices. Following [6] and
a standard practice in the physics literature, we use a discrete-time
threshold contact process with parameter q = 2p(1 − p) (in which vertices are either occupied or vacant, a vertex with at least one (resp. no)
occupied input at time t will be occupied at time t + 1 with probability q (resp. 0)) on Gn to approximate the dynamics of the Boolean
network. We find a parameter r, which has an explicit expression in
terms of certain moments of pin (resp. pin,out ), such that, with probability tending to 1 as n goes to infinity, if r·2p(1−p) > 1, then starting
from all occupied sites the threshold contact process on Gn maintains
a positive (quasi-stationary) density π(pin ) (resp. π(pin,out )) of occupied sites till time which is exponential in n, whereas if r·2p(1−p) < 1,
then the persistence time of the threshold contact process on Gn is at
most logarithmic in n. These two phases correspond to the chaotic
and ordered behavior of the gene networks respectively.
1. Introduction. Experimental evidence [2, 25, 29] suggests that in various biological systems, the complex kinetics of genetic control is reasonably well approximated by Boolean network
models. This kind of models were first formulated by Kauffman [16]. Over the last few years, these
models have received significant attention, both at the level of model formulations and numerical
simulations (see e.g. the surveys [14, 15, 17, 23, 27] and the references therein) and at the rigorous
level (see e.g. [6, 11]). The class of models can be described as follows. Genes are represented by
the vertices of a directed graph Gn = Gn ([n], En ) on n vertices, where [n] := {1, 2, . . . , n} denotes
the vertex set and En denotes the edge set of Gn . The state ηt (x) of a vertex x ∈ [n] at time
t = 0, 1, 2, . . . is either 1 (‘on’) or 0 (‘off’), and each vertex x receives input from the vertices
which point to it in Gn , namely
Y x := {y ∈ [n] : hy, xi ∈ En }.
Y x is called the input set and its members are called the input vertices for x. The states
{ηt (x)}t>0,x∈[n] evolve according to the update rule
(1.1)
ηt+1 (x) = fx ((ηt (y), y ∈ Y x )),
x ∈ [n],
AMS 2000 subject classifications: Primary 60K35, 05C80; secondary 06E30, 82B26.
Keywords and phrases: random graphs, inhomogeneous graphs, directed networks, Boolean networks, threshold
contact process, phase transition, gene regulatory networks.
1
2
S. CHATTERJEE
x
where each fx : {0, 1}|Y | → {0, 1} is some time-independent Boolean function defined on the
set of states of the input vertices for x. Here and later we use |A| to denote the size of a set A.
In order to understand general properties of such dynamical systems, various random Boolean
network models have been formulated, which form an important subfamily of these models.
The simplest such model with parameters r (number of inputs per vertex) and p (expression
bias), which we denote by RBN0r,p , consists of the following specification of the constructs in
the above general model. The ground graph Gn is constructed by letting the input set Y x
consist of r distinct vertices Y1 (x), . . . , Yr (x), which are uniformly from [n] \ {x}. The values
fx (v), x ∈ [n], v ∈ {0, 1}r , are assigned independently and each equals 1 with probability p.
The dynamics then proceeds as in (1.1) from a specified starting configuration {η0 (x) : x ∈ [n]}
at time t = 0. Note that the functions fx and the graph Gn are fixed at time 0 and then the
dynamics of this system is deterministic.
Kauffman introduced RBN0r,1/2 in [16], and that model has been analyzed in detail for r = 1
[11]. The general model RBN0r,p has been studied extensively via simulations (see e.g. [15, 24])
and using heuristics from Statistical Physics (see e.g., [8, 9]). It has been argued in [7] that the
behavior of RBN0r,p undergo a phase transition for r > 3, and
(1.2)
the order-chaos phase transition curve for RBN0r,p is given by 2p(1 − p) · r = 1
in the sense that RBN0r,p is “ordered” (the configuration of zeros and ones rapidly converges to
a fixed point or attractor) for 2p(1 − p) · r < 1, whereas RBN0r,p is “chaotic” (the configuration
keeps on changing for an exponentially long time) when 2p(1 − p) · r > 1. This phase transition
picture has recently been proved for an “annealed approximation” of the deterministic dynamical
system [6, 20], which we shall describe soon.
Although RBN0r,p has been studied extensively, the model deals with an idealized setting of
homogeneous graphs, where every vertex has the same in-degree. This assumption constrains the
model in the context of biological applications where such networks have quite heterogeneous
degrees [1]. So, the obvious natural questions are that if the update rule is similar to that for
RBN0r,p (namely, given the input sets {Y x : x ∈ [n]}, (1.1) holds for the values {fx (v) : v ∈
{0, 1}|Yx | , x ∈ [n]} chosen via independent coin flips such that each value is 1 with probability
p), then whether similar phase transition occurs in case of heterogeneous and more complex
ground graphs, and if it occurs, how do the corresponding phase transition curves depend on
the underlying parameters describing the properties of the ground graphs.
In the context of gene networks, heterogeneous graphs have been considered in the physics
and biology literature. Mainly two classes of such graph models have been formulated.
(i) Graphs with prescribed in-degree: A number of authors (see e.g., [3, 12, 19, 26]) have
considered directed graph models with prescribed
in-degree distribution. Here, one starts
in in
with a probability mass function p := pk k≥1 and chooses the ground graph uniformly
at random from the collection of all simple directed graphs for which the proportion of
vertices having in-degree k is asymptotically pin
k for all k ≥ 1. The precise method of
construction is described in Section 1.2. We denote the associated random Boolean network
model, where the ground graph has in-degree distribution pin and the update rule is similar
to that of RBN0 with expression bias p, by RBN1 (pin , p). It has been argued in the above
3
COMPLEX BOOLEAN NETWORKS
papers that
(1.3)
the order-chaos phase transition curve for RBN1 (pin , p) is 2p(1 − p) · rin = 1,
X
where rin :=
k · pin
k is the average in-degree.
k
Remark 1.1. An analogous model can be built, where one starts with the out-degree
distribution pout = {pout
k }k>0 and the ground graph is chosen uniformly at random from
the collection of all directed graphs for which the proportion of vertices having out-degree
k is asymptotically pout
for all k ≥ 0. In that case,Pthe associated in-degree distribution
k
turns out to be asymptotically Poisson with mean k kpout
k , which means that a positive
fraction of vertices will have no input vertex for them. For these vertices the update rule
cannot be defined. So we avoid this particular model. We didn’t find occurrence of this
model in the physics and biology model as well.
(ii) Graphs with prescribed joint distribution of in-degree and out-degree: Another
class of models have been considered in order to incorporate correlation between the indegree and out-degree via prescribing their joint distribution [18]. These models are more
general and complex in nature. Here, one starts with a bivariate probability mass function
pin,out = {pin,out
}k>1,l>0 and chooses a graph uniformly at random from the collection of
k,l
all graphs for which the asymptotic density of vertices having in-degree k and out-degree l
. We defer a complete description of the construction to Section 1.2. We denote the
is pin,out
k,l
analogue of RBN1 with joint distribution pin,out and expression bias p by RBN2 (pin,out , p).
It is argued non-rigorously in [18] that for this model
the order-chaos phase transition curve for RBN2 (pin,out , p) is 2p(1 − p) ·
X
X
in
(1.4) where rin,out :=
.
,
r
:=
k · pin,out
k · l · pin,out
k,l
k,l
k,l
rin,out
= 1,
rin
k,l
One of our aims is to prove (1.3) and (1.4) for a certain approximation of the dynamics.
1.1. Annealed approximations to Boolean networks. Proving rigorous results about the formulated discrete dynamical systems turns out to be quite hard. In order to understand the
conjectured phase transitions in (1.3) and (1.4) rigorously, we consider a different process called
the threshold contact process. To motivate this process and explain its connection to the Boolean
network model {ηt (x) : x ∈ Gn , t ≥ 0}, first consider the process {ζt (x), x ∈ [n], t > 1}, where
ζt (x) = 1 if ηt (x) 6= ηt−1 (x) and ζt (x) = 0 otherwise. Fix vertex x ∈ [n]. Suppose at least one
of the inputs y ∈ Y x changes its state between time epochs t − 1 and t so that ηt (y) 6= ηt−1 (y).
Then the state of vertex x at time t+1 is computed by looking at a different entry of fx . Ignoring
the fact that we may have used this entry before, one approximately has
P(ζt+1 (x) = 1|ζt (y) = 1 for at least one y ∈ Y x ) = 2p(1 − p).
If ζt (y) = 0 for all y ∈ Y x , then obviously ζt+1 (x) = 0. This dynamics motivates to consider the
following process, which will be the main subject of this study.
4
S. CHATTERJEE
Definition 1.2. The discrete-time threshold contact process with parameter q ∈ (0, 1) on a
directed graph having vertex set [n], where Y x denotes the set of input vertices for x ∈ [n], is the
Markov process {ξt (x) : x ∈ [n], t ≥ 0} on {0, 1}n for which the evolution dynamics is
P ( ξt+1 (x) = 1| ξt (y) = 1 for at least one y ∈ Y x ) = q, and
(1.5)
P ( ξt+1 (x) = 0| ξt (y) = 0 for all y ∈ Y x ) = 1.
Conditional on the state at time t, the decisions on the values of ξt+1 (x), x ∈ [n], are independent.
This process has been called the annealed approximation to the random Boolean network
model [7]. For the remainder of the paper we will write q = 2p(1 − p). For the threshold contact
process, we shall prove that the conjectures in (1.3) and (1.4) do represent the order-chaos phase
transition for the dynamics.
1.2. Construction of random directed ground graphs. In this section, we provide precise mathematical formulation of the ground graph models which are used for defining RBN1 and RBN2 .
1
1.2.1. Construction of the graph for RBN1 . For RBN
start with a prescribed in-degree
P , we
in
in
in
distribution p = {pk }k>1 having finite mean r := k k · pin
k < ∞. Following [6], we construct
the (directed) ground graph Gn = Gn ([n], En ) having in-degree distribution pin as follows. First,
we choose the in-degrees I1 , I2 , . . . , In independently with common distribution pin . Then, for
each vertex x ∈ [n] we construct the corresponding input set Y x := {Y1 (x), Y2 (x), . . . , YIx (x)}
by choosing |Ix | many distinct vertices uniformly from [n] \ {x}. Finally we place oriented edges
from these chosen vertices to x to obtain the edge set
En := {hYi (x), xi : x ∈ [n], 1 6 i 6 Ix }
of the graph Gn . In other words, if we write P1,I for the conditional (“quenched”) distribution of
Gn given the in-degree sequence I = (I1 , I2 , . . . , In ) and P1,n for the unconditional (“annealed”)
distribution of Gn , and if Ezx denotes the number of directed edges from vertex z to x, then
X
(1.6)
P1,n (·) :=
P1,I (·) · pin
⊗n (I),
I∈Nn
n
in
where pin
⊗n is the product measure on N with marginal p , and
, n n
X
Y n − 1
(1.7) P1,I (Ezx = ezx , z, x ∈ [n]) = 1
if ez,x ∈ {0, 1}, ex,x = 0,
ez,x = Ix ,
Ii
i=1
z=1
for all x ∈ [n] and P1,I (Ez,x = ez,x , 1 6 z, x 6 n) = 0 otherwise. Thus, under P1,I Gn is
distributed uniformly over all simple directed graphs having in-degree sequence I.
1.2.2. Construction of the graph for RBN2 . We follow the construction procedure of Newman, Strogatz and Watts [21, 22] to obtain the configuration model. We start with a prescribed joint distribution for in-degree and out-degree pin,out = {pin,out
}k>1,l>0 such that
k,l
P
P
P
in,out
in,out
in,out
rin,out := k,l k · l · pk,l < ∞ and rin := k,l k · pk,l = k,l l · pk,l =: rout . It is easy to see
that the later condition is necessary for pin,out to be eligible for our purpose. Let {(Ii , Oi )}ni=1
5
COMPLEX BOOLEAN NETWORKS
be i.i.d. with common distribution pin,out ; here Ix and Ox denote the in-degree and out-degree
of vertex x respectively. We need to condition on the event
( n
)
n
X
X
(1.8)
En :=
Ii =
Oi
i=1
i=1
in order to have a valid degree sequence. Having chosen the degree sequence (I, O) =
((I1 , O1 ), . . . , (In , On )), we allocate Ix many “inward arrows” and Ox many “outward arrows”
for vertex x. Then we pick a uniform random matching between the sets of inward and outward
arrows. If one of the inward arrows of x is matched with one of the outward arrows of z, then
we let hz, xi ∈ En . We will use P2,I,O to denote the conditional (“quenched”) distribution of Gn
given the in-degree and out-degree sequence (I, O). We also condition on the event
(1.9)
Fn := {Gn is simple},
i.e., it neither contains any self-loop at some vertex, nor contains multiple edges between two
vertices. So if P2,n denotes the unconditional (“annealed”) distribution of Gn , then
X
(1.10)
P2,n (·) =
P2,I,O (·|Fn ) · pin,out
⊗n ((I, O)|En ),
I,O∈Nn
where pin,out
is the product measure on (N2 )n with marginal pin,out .
⊗n
Remark 1.3. Given the prescribed in-degree (resp. out-degree) distribution pin (resp. pout )
having finite mean, one can construct a graph Gn by following the construction of Gn for RBN2
(see Section 1.2.2) corresponding to any pin,out with marginal pin (resp. pout ). But the distribution of Gn will no longer be uniform over all candidates.
1.3. Dynamics. Once the ground graph has been fixed via one of the above construction
procedures, we will be interested in properties of the threshold contact process {ξt (x), x ∈ [n]}t≥0
as defined in Definition 1.2 with parameter q = 2p(1 − p) ∈ [0, 1/2]. We will often view this as
a set valued process {ξt }t>0 , where ξt := {x ∈ [n] : ξt (x) = 1}. We will sometimes refer to ξt as
the set of occupied sites and [n] \ ξt as the se of vacant sites at time t. We will write PGn ,q for
the distribution of the threshold contact process {ξt }t>0 with
parameter q, conditioned on the
ground graph Gn . For a fixed set A ⊆ [n], we will write ξtA t≥0 for the process started with
ξ0A = A.
1.4. Main results. As described in Section 1.2 and 1.3, there are two layers of randomness:
one corresponds to the distribution of the ground graph Gn , and the other corresponds to the
distribution of the threshold contact process conditioned on the ground graph Gn .
We need another ingredient to present our main results.For a probability distribution µ on
{0, 1, . . .} and q ∈ [0, 1] let π(µ, q) ∈ [0, 1] denote the survival probability for the branching
process with offspring distribution (1−q)δ 0 +qµ starting from one individual. From the branching
process theory,
(1.11)
π(µ, q) = 1 − θ,
where θ ∈ [0, 1] is the minimum value satisfying θ = 1 − q +
ingredients we now present our main result.
P
k>0 q
· µ{k} · θk . Using the above
6
S. CHATTERJEE
Theorem 1.4. For any probability distribution pin = {pin
k }k≥1 on N, let P1,n be the probability distribution (as defined in (1.6)) on the set of random directed simple graphs on n vertices
having in-degree distribution pin . Suppose pin has finite second moment, pin
1 = 0 and mean
rin > 2. Let 1/rin < q 6 1/2 and π := π(pin , q) > 0 be the branching process survival probability as defined in (1.11). For any ε > 0 there is a constant ∆(ε) > 0 and a ‘good’ set of
graphs Gn satisfying P1,n (Gn ) = 1 − o(1) such that if Gn ∈ Gn and {tn } is any sequence with
limn→∞ tn = ∞, then
!
1 [n] 1 [n] PGn ,q
inf
sup
ξt > π − ε,
ξt 6 π + ε = 1 − o(1).
t6exp(∆n) n
tn 6t6exp(∆n) n
[n]
Moreover, if q satisfies q ·rin < 1, then there is a constant C(q, rin ) > 0 such that PGn ,q (ξC log n 6=
∅) = o(1) for all Gn ∈ Gn .
The above theorem proves (1.3) for the threshold-contact process. It also proves exponential
persistence in the supercritical regime. Next we move to RBN2 .
}k≥1,l≥0 on N2 , let
Theorem 1.5. For any bivariate probability distribution pin,out = {pin,out
k,l
P2,n be the probability distribution (as defined in (1.10)) on the set of random directed simple
graphs on n vertices for which the joint distribution of in-degree and out-degree is pin,out . Suppose
= 0 whenever k ≤ 1, and the marginal distributions corresponding to pin,out have equal
pin,out
k,l
P
in,out
in −1
}k>2 be the size-biased
mean rin and finite second moment. Let p̃in = {p̃in
l l · pk,l
k := (r )
in
in
in
in-degree distribution with mean r̃ , 1/r̃ < q 6 1/2 and π := π(p̃ , q) > 0 be the branching
process survival probability as defined in (1.11). For any ε > 0 there is a constant ∆(ε) > 0
and a ‘good’ set of graphs Gn satisfying P2,n (Gn ) = 1 − o(1) such that if Gn ∈ Gn and tn is any
sequence satisfying tn 6 exp(∆n) and limn→∞ tn = ∞, then
!
1 [n] 1 [n] PGn ,q
inf
sup
ξt > π − ε,
ξt 6 π + ε = 1 − o(1).
t6exp(∆n) n
tn 6t6exp(∆n) n
[n]
Moreover, if q satisfies qr̃in < 1, then there is a constant C(q, r̃in ) > 0 such that PGn ,q (ξC log n 6=
∅) = o(1) for all Gn ∈ Gn .
The above theorem proves (1.4) for the threshold-contact process. It also proves exponential
persistence for the supercritical regime in RBN2 .
1.5. Proof ideas. The subcritical case is relatively easier, as we can couple with a subcritical
branching process. Although there are technical challenges because of heterogeneity of the ground
graph, these have been tackled using a suitable second moment argument.
In the supercritical regime, there are two main aspects: estimate the probability of survival
of the dual starting from (i) a single vertex (ii) large initial sets. To tackle the first, we have
approximated the small neighborhoods of vertices within distance O(log log n) by appropriate
trees, and hence the survival probability becomes approximately that of a suitable branching
process (see Proposition 4.2).
The remaining part is the most challenging. Here we have shown that there is a ‘good’ set of
graphs (see Proposition 4.1) such that whenever the size of the occupied set in the dual process
(irrespective of its location) drops below εn · n, where εn · n ∈ [(log n)a , εn], then comparing
COMPLEX BOOLEAN NETWORKS
7
with the behavior of the dual process on a suitably truncated forest (see Lemma 5.1) the size
reaches above εn · n within time O(log log(1/εn ) with exponentially small error probability (see
Proposition 5.2). This result enables us to prove our main results.
1.6. Discussion. It is needless to say that one of the main challenges to prove prolonged
persistence in the supercritical regime for the random Boolean network models that we consider
is the heterogeneity of the ground graphs. The techniques used in [6], which are mostly based
on “isoperimetric inequalities” for uniformly chosen directed graphs with constant in-degree, is
not helpful for us. The approach adopted in [20] are also not applicable, as they need the same
constraint too. So, a new approach would be required to deal with the heterogeneity.
On the other hand, not much is understood about the roles of different initial subsets of
occupied sites in the prognosis of the dynamics beyond the level of the “first moment method”.
It is also not understood whether the dynamics avoids certain ‘bad’ configurations of occupied
sites from which survival of the dynamics is difficult when q is near the critical value, but stays
in the supercritical regime. So, we need to have a very good control (more than exponential)
over the probabilities of unlikely events, as the number of possible initial subsets of size m is
super-exponentially large in m.
Keeping these in mind, the technique used in this paper, namely coupling the dual process
with a suitably truncated branching process upto a certain time, seems to be the only effective
way to obtain exponential persistence in the supercritical regime.
However, this technique does not work that well in the critical regime where the parameters
p, rin and rin,out satisfy the equality in (1.3) and (1.4). Based on the behavior of critical contact
process on d-dimensional torus, one expects to have persistence of activity till some polynomial
time in this case.
1.7. Organization of the paper. The remainder of the paper is organized as follows. In Section
2, we describe the dual process for the threshold contact process, which will play a crucial role
in our argument, and give quantitative estimates for the error involved in approximating the
local neighborhoods of certain small subsets of vertices in the dual (edge-reversed) graph. Then,
in Section 3 we mention some more ingredient lemmas, which are used later. Section 4 contains
description of the set of ‘good’ graphs that appears in the theorems and proof ofs the facts that
their probabilities are 1 − o(1). Finally, in Section 5 we put all the ingredients together to have
the proofs of the main theorems.
2. Preliminaries. Before jumping into the core of the proof we need some preliminary
facts. We begin this section with the definition of the dual process for the threshold contact
process, which will play a major role in proving the main results.
We also collect
asymptotic
properties of local neighborhoods of the random graph models in RBNi : i = 1, 2 .
←
−
2.1. Dual coalescing branching process. For a given directed graph Gn = ([n], En ), let G n =
←
−
←
−
([n], E n ) be the dual (directed) graph obtained by reversing the edges, i.e., E n := {hx, yi :
←
−
hy, xi ∈ En }. We write x → z if hx, zi ∈ E n , and in that case we will occasionally refer to
←
−
z as a child of x in G n . Now for the threshold contact process {ξt : t ≥ 0} on the graph Gn ,
←
−
the dual process { ξ t : t > 0} is the following coalescing branching process on the dual graph
←
−
←
−
G n . For any t ≥ 0, each site of ξ t gives birth at time t independently with probability q. If
←
−
←
−
←
−
x ∈ ξ t gives birth, all of its children are included in ξ t+1 . In other words, every vertex ξ t
8
S. CHATTERJEE
←
−
gives birth with probability q independently across vertices, and z ∈ [n]nis includedoin ξ t+1
←
−
←
−
if there exists x ∈ ξ t which gives birth at time t and x → z. We write ξ B
t : t ≥ 0 for the
←
−B
coalescing branching process starting from ξ 0 = B.
It is easy to check [13] that the following duality relation holds. For any t ≥ 0 and for any
two sets A, B ⊆ [n] we have
←
−
(2.1)
P ξtA ∩ B 6= ∅ = P ξ B
t ∩ A 6= ∅ .
This will enable us to do suitable analysis for the dual process, and then carry the implications
forward to prove our main results about {ξt }.
←
−
We will need the following notation in the sequel. For the graph G n and U ⊂ [n] define
U ∗1 := {z ∈ [n] : x → z for some x ∈ U }.
2.2. Local neighborhoods. Next, we need to understand the structure of the neighborhood
←
−
of a small set of vertices of G n . The goal is to see whether the oriented neighborhood of a
typical small vertex set contains an oriented forest having offspring distribution is close to the
←
−
out-degree distribution for G n .
←
−
For A ⊂ [n] we let Z A
0 = A and for l > 1 let
←
−
←
−
ZA
l := {z ∈ [n] : there is an oriented path in G n of length l from some x ∈ A to z}.
Let {Km }nm=1 be a sequence of numbers, which will be specified later (see (4.4)). Using t to
←
−
denote disjoint union, we introduce the following coupling between the directed subgraph of G n
−
K|A| ←
A
A
A
A
A
induced by ∪i=0
ZA
i and a forest {Zt , 0 6 t 6 K|A| } along with partitions Zt = Ct t Ot t Rt ,
where CtA , OtA and RtA represent ‘closed’, ‘open’ and ‘removed’ sites at level t respectively. Let
A = {u1 , . . . , u|A| }. For the root level of the forest we choose Z0A = A and C0A = R0A = ∅, so
O0A = A. The sites in Z0A are labeled u1 , . . . , u|A| . For each t > 0, every site of OtA mimics the
←
−
A
corresponding vertex in Z A
t with same label, so a site of Ot having label u gives birth to Iu
many children at level t + 1. The new born sites at level t + 1 are assigned the same labels
←
−
1
2
in
in
following those of Z A
t+1 . Writing r = r (resp. r̃ ) in case of RBN (resp. RBN ), and letting
Π(l, A) denote the subset of A consisting of l ∧ |A| elements with minimum indices,
A in an increasing order of labels, and define
we scan the sites of Zt+1
A
A
A
Rt+1
:= Zt+1
\ Π(2r|OtA |, Zt+1
).
A \ RA , we say that a “collision” has occurred if its label either matches with
For a site in Zt+1
t+1
that of a site in ∪ts=0 OsA , or has already been found while scanning the sites of level t + 1. We
A . If collision does not occur at a site, we include that in O A .
include all of these sites in Ct+1
t+1
←
−
A
A
For u ∈ Zt , let u t ∈ A denote the label of the unique ancestor of u having level 0. For any
subset B ⊂ A and t > 1 let Z0A,B = O0A,B = B and
−A ∈ B}
ZtA,B := {u ∈ ZtA : ←
u
t
A,B A,B
CtA,B := Π 2r Ot−1
∩ CtA ,
, Zt
A,B A,B
OtA,B := Π 2r Ot−1
∩ OtA .
, Zt
9
COMPLEX BOOLEAN NETWORKS
←
−
SO, each site in OtA corresponds to a unique vertex in Z A
t with the same label. Note that this
←
−A
A
map from Ot to Z t may not be onto because of collisions and removal of sites.
←
−
The law of G n induces the law of {ZtA } along with its partitions. We identify these two
laws. Now our aim is to estimate the probability of collision, and then understand the offspring
distribution in the above forest. We write
(2.2) ϑ(m) := m + m
Km
X
(2r)l , and
l=1
n
n
1X
1X 2
Iz , In2 :=
Iz ,
I n :=
n
n
z=1
n
On :=
z=1
n
1X
1X 2
Oz , On2 :=
Oz ,
n
n
z=1
n
IOn :=
z=1
1X
Iz O z .
n
z=1
For η ∈ (0, 1) and any probability distribution µ, we define
Z η
µ← (1 − t) dt, where µ← (t) := inf{y ∈ R : µ((−∞, y]) > t} for t ∈ [0, 1].
(2.3) Γ(η, µ) :=
0
Recall from Section 1.2 that Ix denotes the in-degree of x and {Y1x , . . . , YIxx } denotes the set of
input vertices for x in Gn .
Lemma 2.1 (For RBN1 ). If pin has finite second moment, then there is a constant C2.1 > 0
and a set of in-degree sequences In ⊂ Nn such that pin
⊗n (In ) = 1 − o(1) and I ∈ In implies
(1)
(2)
P1,I (x ∈ CtA ) 6 2ϑ(|A|)/n for any A ⊂ [n], x ∈ ZtA \ RtA and 1 6 t 6 K|A|
h
i
P1,I [(Yix , x ∈ [n], 1 6 i 6 Ix ) ∈ ·] 6 C2.1 P1,I (Ỹix , x ∈ [n], 1 6 i 6 Ix ) ∈ · ,
where {Ỹix }x∈[n],i6Ix are i.i.d. with common distribution U nif orm([n]).
Proof. We take
n
o
In := max Iz 6 n3/4 , I n < c1 , In2 < c2 ,
z
where ci = 2
in
i in
k k pk . Obviously p⊗n (In ) =
PK| A| A A
t=0 |Zt \Rt | 6 ϑ(|A|). So if
P
1 − o(1).
I ∈ In , then it is easy to see from the construction
(1). Note that
of Gn under the law P1,I that for any 1 6 t 6 K|A| and x ∈ ZtA \ RtA ,
P1,I (x ∈ CtA ) 6
ϑ(|A|)
6 2ϑ(|A|)/n.
n − maxz Iz
(2). It is easy to see that
d
(Yix , 1 6 i 6 Ix , x ∈ [n]) =
Ỹix , 1 6 i 6 Ix , x ∈ [n] Ỹiz 6= Ỹjz 6= z∀z ∈ [n], i 6= j ,
so it suffices to show that I ∈ In implies P1,I (Ỹiz 6= Ỹjz 6= z∀z ∈ [n], 1 6 i 6= j 6 Iz ) > c for
some constant. Using the inequality 1 − x > e−2x for x > 0 small enough, if I ∈ In , then
!
Iz
Iz
n Y
n X
Y
X
P1,I (Ỹiz 6= Ỹjz 6= z∀z ∈ [n], 1 6 i 6= j 6 Iz ) =
(1−i/n) > exp −2
(i/n) = exp(−I n −In2 ) > e−c1 −c2
z=1 i=1
for large enough n.
z=1 i=1
10
S. CHATTERJEE
Lemma 2.2 (For RBN2 ).
If the marginal distributions pin and pout corresponding to
in,out
p
have finite second moment, then there is a constant C2.2 > 0 and a set of degree
sequences A ⊂ (N2 )n such that pin,out
⊗n (A |En ) = 1 − o(1) and for all (I, O) ∈ A ,
(1)
P2,I,O (x ∈ CtA ) 6 4Γ(2ε, pout )/(rin − 2ε) whenever A ⊂ [n] satisfies ϑ(|A|) 6 εn
for some ε > 0 small, x ∈ ZtA \ RtA and 1 6 t 6 K|A|
(2)
P2,I,O [ (Yix , 1 6 i 6 Ix , x ∈ [n]) ∈ ·| Fn ]
"
#
Ix
n X
X
x
6 C2.2 P2,I,O
1{Ỹ x =z} = Oz ∀z ∈ [n] ,
Ỹi , 1 6 i 6 Ix , x ∈ [n] ∈ ·
i
x=1 i=1
P
P
where {Ỹix } are i.i.d. with common distribution
z∈[n] Oz δ z /
x∈[n] Ox . Moreover, if
out
−α
out
(α−2)/(α−1)
pk ∼ ck
for some α > 2, then Γ(η, p ) ∼ cη
for some constant c > 0
which depends on α only.
If N = rn for some constant r and {Ỹi }N
i=1 are i.i.d. with common distribution
M ultinomial(1; α1 , α2 , . . . , αn ), then for any small ε > 0,
N
!
X
Ỹ1 , . . . , ỸεN ∈ · 1{Ỹi =z} = N αz ∀z ∈ [n]
(3)
P
i=1
6 (1 + o(1)) exp(−ε log(1 − ε)N )P Ỹ1 , . . . , Ỹεn ∈ · ,
!
N
X
Ỹ1 , . . . , Ỹεn ∈ · Ỹ1 , . . . , Ỹεn ∈ · (4)
1{Ỹi =z} = N αz ∀z ∈ [n] − P
P
i=1
TV
2
6 O(ε n) + o(1).
Proof. For ϑ(·) as in (2.2) and Γ(·, ·) as in (2.3) we take
( Pϑ(|A|)
)
O
n,n−i+1
An := En ∩ Pi=1
6 4Γ(2ε, pout )/(rin − 2ε) ,
z Oz − ϑ(|A|)
where On,1 6 On,2 6 · · · 6 On,n are the order statistics for O1 , . . . , On . In order to prove
in and pout have the same mean r in . So using
pin,out
⊗n (An |En ) = 1 − o(1), first recall that p
in,out
Chebyshev inequality, p⊗n (On 6 rin /2) = O(1/n), and hence
(2.4)
pin,out
On 6 rin /2 En = o(1),
⊗n
√
because pin,out
⊗n (En ) = O(1/ n) by the local central limit theorem.
Now we apply Theorem 1 of [28] for the function


for 0 6 t 6 1 − 2ε
0
J(t) = 1
for 1 − ε 6 t 6 1 .


(t − 1)/ε + 2 for 1 − 2ε 6 t 6 1 − ε
and the i.i.d. random variables O1 , . . . , On . Since pout has finite second moment, it can be
checked easily that the quantity given in (10) of [28], σ 2 (J, pout ) is finite. This together with
11
COMPLEX BOOLEAN NETWORKS
Theorem 4 of [28] implies
n
in,out
E⊗n
(Sn − µ)2 = O(1/n), where Sn :=
1X
J(i/(n + 1))On,i and µ :=
n
i=1
Z
1
J(t)(pout )← (t) dt
0
is as in (11) of [28]. Note that
Z
1
out ←
(p
1−ε
Z
1
) (t) dt 6 µ 6
(pout )← (t) dt, which means Γ(ε, pout ) 6 µ 6 Γ(2ε, pout ).
1−2ε
Consequently, using Chebyshev inequality
pin,out
⊗n (Sn > 2µ) = O(1/n), which implies


ϑ(|A|)
X
1

On,n−i+1 > 2Γ(2ε, pout ) = O(1/n), and consequently
pin,out
⊗n
n
i=1


ϑ(|A|)
X
in,out  1
out p⊗n
On,n−i+1 > 2Γ(2ε, p ) En  = o(1),
n
i=1
√
as pin,out
⊗n (En ) = O(1/ n). Combining the last display with (2.4) and noting that ϑ(|A|) 6 εn,
we see that pin,out
⊗n (A ) = 1 − o(1).
(1). It is easy to see from the construction of Gn that if we write the labels of the sites in
K|A| A
Zt \ RtA in an increasing order, then a collision can occur at the k-th site (in this ordering)
∪t=1
P|A|+k−1
P
with probability 6 i=1
On,n−i+1 /[ z Oz − k]. Therefore, for any 1 6 t 6 K|A| and x ∈
ZtA \ RtA
Pϑ(|A|)
On,n−i+1
A
P2,I,O (x ∈ Ct ) 6 Pi=1
z Oz − ϑ(|A|)
K
|A|
|ZtA \ RtA | 6 ϑ(|A|) − |A|. So the assertion follows from the definition of A .
as | ∪t=1
(2). We can imitate the argument of Theorem 3.1.2 of [10] to see that under P2,I,O the number of
self-loops and multiple edges are asymptotically independent, and P
both of them have asymptotic
, i, j ∈ {0, 1, 2}.
Poisson distribution whose means are functions of the moments k,l k i lj pin,out
k,l
So P2,I,O (Fn ) has a positive limit. This together with the fact that
(Yix , 1
d
6 i 6 Ix , x ∈ [n]) =
Ỹix , 1
!
Ix
n X
X
6 i 6 Ix , x ∈ [n]
1{Ỹ x =z} = Oz ∀z ∈ [n] .
i
x=1 i=1
gives the desired inequality.
(3). For a vector of positive integers y, we write Xi (y) for the number of components of y which
are i. We also write Ỹ = (Ỹ1 , . . . , ỸN ) and Ỹa:b = (Ỹa , Ỹa+1 , . . . , Ỹb ).
For any y ∈ [n]εN ,
P Ỹ1:εN = y Xz (Ỹ) = N αz ∀z ∈ [n]
P Xz (Ỹ(εN +1):N ) = N αz − Xz (y)∀z ∈ [n]
(2.5)
=
.
P Ỹ1:εN = y
P Xz (Ỹ ) = N αz ∀z ∈ [n]
12
S. CHATTERJEE
In order to bound the fraction in (2.5) recall that
(2.6)
n
!
X
Z1 , . . . , Z n Zi = N ,
d
M ultinomial(N ; α1 , . . . , αn ) =
i=1
n
where {Z
Pin}i=1 are independent and√Zi ∼ P oisson(N αi ). In that case, it is not hard to check
that P ( i=1 Zi = N ) = (1 + o(1))/ 2πN by Stirling’s formula. So, the ratio in (2.5) is
(2.7)
Y P (Yz = N αz − Xz (y))
√
(1 + o(1)) 1 − ε
,
P (Zz = N αz )
z∈[n]
where Yi ∼ P oison(N (1 − ε)αi ), Zi ∼ P oison(N αi ), i ∈ [n], and they are independent. The
expression in the last display equals
n
Y
√
(1 + o(1)) 1 − ε
z=1
(2.8)
e−(N αz (1−ε)) (N αz (1 − ε))(N αz −Xz (y))
(N αz )!
·
e−N αz (N αz )N αz
(N αz − Xz (y))!
z (y) n XY
Y
√
i−1
ε
N
.
= (1 + o(1)) 1 − ε[e (1 − ε)] exp (−log(1 − ε)εN )
1−
N αz
z=1 i=1
Using the inequality 1 − ε 6 e−ε we get the desired bound.
(4). Using the bound of part (3), the total variation distance between these two measures is
P Ỹ
=
y
X
(
Ỹ
)
=
N
α
∀z
∈
[n]
X
z
z
1:εN
1:N
1
P Ỹ1:εN = y − 1
2
P Ỹ1:εN = y
y∈[n]εN
6[1 + exp(−ε log(1 − ε)N )]P Ỹ1:εN ∈ Ac
P Ỹ
=
y
X
(
Ỹ
)
=
N
α
∀z
∈
[n]
z
z
1:εN
1:N
+ sup − 1
(2.9)
y∈A P Ỹ1:εN = y
for any set A. Now recall from (2.8) that
P Ỹ1:εN = y |Xz (ỹ) = N αz ∀z ∈ [n]
P Ỹ1:εN = y
(2.10)
z (y) n XY
Y
√
i−1
= (1 + o(1)) 1 − ε[eε (1 − ε)]N exp (−log(1 − ε)εN )
1−
N αz
z=1 i=1
Since eε > 1 − ε, the first term in the right hand side of (2.10) lies between 1 − ε2 N and 1,
whereas the second term lies between exp(ε2 N ) and exp(2ε2 N ) when ε > 0 is small. Also the
product term in (2.10) lies between 1 and
1−
z (y)
n XX
X
i−1
z=1 i=1
N αz
=1−
n
X
Xz (y)(Xz (y) − 1)
z=1
2N αz
.
13
COMPLEX BOOLEAN NETWORKS
Consequently, if we take
(
Aη :=
y ∈ [n]εN :
n
X
Xz (y)(Xz (y) − 1)
z=1
then
2N αz
)
< ηε2 N
,
P Ỹ
=
y
X
(
Ỹ
)
=
N
α
∀z
∈
[n]
z 1:N
z
1:εN
− 1 = O(ε2 N )
P Ỹ1:εN = y
whenever y ∈ Aη . So, in view of (2.9), it suffices to show that P (Ỹ1:εN ∈ Acη ) = O(exp(−CεN ))
for some constant C > 0 and for some suitable choice of η.
Note that the joint distribution of {Xz (Ỹ1:εN )}z∈[n] is M ultunomial(εN ; α1 , . . . , αn ), so using
(2.6) and local central limit theorem
!
n
X
√
Ŷi (Ŷi − 1)
c
2
P Ỹ1:εN ∈ Aη = (1 + o(1)) 2πN P
> ηε N ,
N αi
i=1
where {Ŷi }ni=1 are independent and Ŷi ∼ P oisson(εN αi ). Hence using standard large deviation
argument, the above probability is at most exp(−C(η)εN ) for some constant C(η) such that
C(η) > 0 when η is large enough. This completes the argument.
Remark 2.3. The assertions (3) and (4) of Lemma 2.2 is true if we replace the index set
{1, 2, . . . , εN } by (possibly random) {i1 , i2 , . . . , iεN }.
3. Ingredients. In this section, we will state and prove some of the basic lemmas which
will be required in proving our main results.
Lemma 3.1. Let X be any nonnegative random variable such that 2(EX)2 6 EX 2 < ∞.
Then log Ee−tX 6 var(X)t2 /2 − E(X)t for any t > 0.
√
Proof. Let µ = EX and µ2 = EX 2 so that σ 2 = var(X) = µ22 − µ2 . We choose p = µ2 /µ22
and α = µ22 /µ so that Y := (1 − p)δ 0 + pδ α satisfies EY = µ and EY 2 = µ22 . By Benette’s
inequality [5],
(3.1)
for any t > 0, log Ee−tX 6 log Ee−tY = log[(1 − p) + pe−αt ] =: ϕ(t).
Differentiating the function ϕ and noting that pα = µ and µα = µ22 we get
ϕ0 (t) =
−pαe−αt
−µ
=
,
−αt
(1 − p) + pe
(1 − p)eαt + p
ϕ00 (t) = σ 2
eαt
.
[(1 − p)eαt + p]2
Also note that the quadratic function f (x) = [(1 − p)x + p]2 − x has nonnegative slope at x = 1
if 2(1 − p) > 1, which is true by our hypothesis. So ϕ00 (t) 6 σ 2 for any t > 0. Finally using
Taylor series expansion for the function ϕ we see that for any t > 0,
ϕ(t) = ϕ(0) + ϕ0 (0)t + ϕ00 (u)t2 /2 for some u ∈ [0, t]
6 −µt + σ 2 t2 /2.
This inequality together with (3.1) gives the desired result.
14
S. CHATTERJEE
Lemma 3.2. For any κ > 0 and ∆ > 1 the function φκ,∆ (γ) := γ[log(∆/γ)]κ is increasing
for γ 6 ∆e−κ and decreasing for γ > ∆e−κ . Hence φκ,∆ (γ) 6 ∆(κ/e)κ .
Proof. We get the conclusion using elementary method.
Lemma 3.3. For δ > 0, ϑ(m) := β1 m[log(n/m)]β2 , 0 < γ 6 1 and any integer Λ > 1 there is
an 3.3 > 0 depending on Λ, β1 , β2 , γ such that m 6 3.3 n and M ∈ N imply
P (Binomial(ΛM, (ϑ(m)/n)γ ) >
1
(1 + δ)M ) 6 exp(−(1 + δ/2)M log(n/m)).
γ
Proof. A standard large deviations result for the Binomial distribution, see e.g., Lemma
2.8.4 in [10] implies P (Binomial(ΛM, q) > ΛM r) 6 exp(−ΛM Hq (r)) for any r > q, where
r
1−r
(3.2)
Hq (r) := r log
+ (1 − r) log
.
q
1−q
When r = (1 + δ)/(γΛ), the first term in the large deviation bound (3.2) is
1
n γ
Λγβ1γ
n
exp(−ΛM r log(r/q)) 6 exp − (1 + δ)M log
− log
− β2 γ log log
γ
m
1+δ
m
For the second term in the large deviation bound in (3.2) we note that 1/(1 − q) > 1 and
(1 − r) log(1 − r) > −1/e by Lemma 3.2 (with κ = ∆ = 1), and conclude
1−r
exp −ΛM (1 − r) log
6 exp (−ΛM (1 − r) log(1 − r)) 6 exp(ΛM/e).
1−q
Combining the last two estimates
1
P (Binomial(ΛM, (ϑ(m)/n)γ ) > (1 + δ)M )
γ
n
6 exp −(1 + δ)M log(n/m) + β4 M + β5 M log log
,
m
for constants β4 and β5 .
Now we choose
(
−2β5 /δ
3.3 := max ∈ (0, e
1
) : log
2β5 /δ
)
6 exp(−2β4 /δ) .
Clearly 3.3 > 0 and, in view of Lemma 3.2 with κ = 2β5 /δ and ∆ = 1, m 6 3.3 n implies
n i2β5 /δ
1 2β5 /δ
(m/n) log
6 3.3 log
6 exp(−2β4 /δ),
m
3.3
h
which in turn implies β4 + β5 log log[n/m] 6 (δ/2) log(n/m). This completes the proof.
15
COMPLEX BOOLEAN NETWORKS
4. Choice of ‘good’ graphs Gn . For B ⊂ A ⊂ [n], recall the definition of the forest
K|A|
{ZtA }t=0
(as described in Section 2.2) with associated subsets {OtA,B } of ‘open’ sites. Let p be
←
−
the limiting out-degree distribution for G n , namely

pin
for RBN1
n
o∞
P
(4.1)
p=
p̃in = (rin )−1 l lpin,out
for RBN2
k,l
k=2
with p0 = p1 = 0 and mean r =
P
k
kpk > 2.
Proposition 4.1. There are constants c1 , c2 , δ > 0 such that for q ∈ (1/r, 1/2), Km =
c1 log2 (c2 log(n/m)) and for A ⊂ [n] if
n
o
A,B
−K|A|
EA := ∩B∈{B⊂A:|B|>(1−δ)|A|} |OK
|
>
(4/δ)q
|B|
,
|A|
then there is an 4.1 > 0 such that for any a > 0 the probability of
Gn1 := ∩A∈{A⊂[n]:(log n)a 6|A|64.1 n} EA
under Pi,n , i = 1, 2, is 1 − o(1).
Proof. Let pn = {pn,k }k>2 be the distribution
( P
1
z∈[n] δ Iz
n
(4.2)
pn := P
P
z∈[n] Oz
z∈[n] Oz δ Iz /
for RBN1
for RBN2
.
←
−
In view of Lemma 2.1 andP
2.2, pn approximates out-degree distribution for the graph G n with
pn,0 = pn,1 = 0. Let rn := k kpn,k ∈ (2, ∞) be the mean of pn . It is easy to see that rn → r.
In this proof, we write P for the probability distribution on the forests {ZtA }t>0,A⊂[n] , when
pn is used as its offspring distribution. We also use P̃ as a dummy replacement for P1,I and
P2,I,O (·|Fn ). Lemma 2.1 and 2.2 suggest that for any event F involving the structure of the
graph which depends on at most εn many vertices of the graph,

n
o

I
:
P̃(F
)
6
C
P(F
)
pin

2.1
⊗n

(4.3)
n
o  = 1 − o(1).

in,out
p
(I, O) : P̃(F ) 6 C2.2 exp(−rout ε log(1 − ε)n) P(F ) En 
⊗n
Now fix η ∈ (0, 1 − 1/qr), and
(
1
for RBN1
γ := α−2
.
−α and α > 3
for RBN2 when pout
k ∼ ck
α−1
When the tail of pout is lighter than polynomial, then γ is taken to be 1. Clearly 1/γ < 2 <
r(1 − η), so we can choose δ ∈ (0, 1/10) such that (1 + 5δ)/(2γ) < 1. We need to introduce some
16
S. CHATTERJEE
more notations, let
r̃ := rn (1 − η) so that qr̃ > 1 for large enough n,
1 + 5δ
1 + 5δ
ρ−1
ρ
1−
ρ > 0 be such that 2
> 1 and (qr̃) 1 −
> 1,
2γ
γ r̃
1 + 5δ σ
ρ
σ > 1 be such that (qr̃) 1 −
> rnρ ,
γ r̃
!!
X
In (η) := sup θr(1 − −η) − log
eθk pn,k
> 0 be the large deviation rate function for pn
θ
k
(4.4)
km := log2
ϑ(m) := m +
n
1 + 3δ
log
(r̃ − γ −1 (1 + 5δ))In (η)
m
and Km = ρσkm for m 6 n, so that
Km
X
(2r)l 6 β1 m[log(n/m)]β2
l=1
for β1 = 1 +
2r
2r − 1
1 + 3δ
−1
(r̃ − γ (1 + 5δ))In (η)
ρσ log2 (2r)
and β2 := ρσ log2 (2r).
Suppose B ⊂ A ⊂ [n] are subsets such that |A| = m and |B| > (1 − δ)m. For k > 1, define
the events
( ρ
)
X
1
A,B
A,B
HkA,B :=
|Cρ(k−1)+i
| 6 (1 + 5δ)|Oρ(k−1)
| ,
γ
i=1
n
o
A,B
A,B
LA,B
:=
|Z
|
>
r̃|O
|
and LA,B
:= ∩ρj=1 LA,B
k,j
k
k,j .
ρ(k−1)+j
ρ(k−1)+j−1
Note that on the event HkA,B ,
A,B
A,B
A,B
|Oρk
| > 2|Oρk−1
| − |Cρk
|
A,B
A,B
A,B
> 22 |Oρk−2
| − 2|Cρk−1
| − |Cρk
|
> ···
> 2
ρ
A,B
|Oρ(k−1)
|
−
ρ
X
A,B
2ρ−i |Cρ(k−1)+i
|
i=1
A,B
> 2ρ |Oρ(k−1)
| − 2ρ−1
ρ
X
A,B
|Cρ(k−1)+i
|
i=1
(4.5)
ρ
> (2 − 2
ρ−1 −1
γ
A,B
A,B
(1 + 5δ))|Oρ(k−1)
| > 2|Oρ(k−1)
|,
A,B
by the choice of ρ. Since |OtA,B | 6 (2r)|Ot−1
| for any t > 1, a similar argument which leads to the
COMPLEX BOOLEAN NETWORKS
17
previous display suggests that the following inequalities are true on the event HkA,B ∩ ∩ij=1 LA,B
k,j .
A,B
A,B
A,B
|Oρ(k−1)+i
| > r̃|Oρ(k−1)+i−1
| − |Cρ(k−1)+i
|
A,B
A,B
A,B
> r̃2 |Oρ(k−1)+i−2
| − r̃|Cρ(k−1)+i−1
| − |Cρ(k−1)+i
| > ···
> r̃
i
A,B
|Oρ(k−1)
|
−
i
X
A,B
r̃i−j |Cρ(k−1)+j
|
j=1
> r̃
i
A,B
|Oρ(k−1)
|
− r̃
i−1
i
X
A,B
|Cρ(k−1)+j
|
j=1
A,B
> (r̃ − r̃ γ (1 + 5δ))|Oρ(k−1)
|
1 + 5δ
A,B
> r̃ 1 −
|Oρ(k−1)
|,
γ r̃
i
(4.6)
(4.7)
i−1 −1
Taking i = ρ in (4.6),
A,B
|Oρk
|
(4.8)
1 + 5δ
A,B
|Oρ(k−1)
| on the event HkA,B ∩ LA,B
> r̃ 1 −
k .
γ r̃
ρ
Recalling Km = ρσkm and using (4.5) and (4.8) repeatedly,
1 + 5δ (σ−1)km
A,B
A,B
A,B
km
ρ
m
m
|OK
|
>
|B|2
r̃
1
−
∩ ∩σk
on the event ∩σk
k=km +1 Lk .
k=1 Hk
m
γ r̃
Now note that
q
ρσ
r̃
ρ
1 + 5δ
1−
γ r̃
σ−1
1 + 5δ σ −ρ
ρ
> (qr̃) 1 −
r >1
γ r̃
by the choices of ρ and σ. So if
4(r̃ − γ −1 (1 + 5δ))In (η)
(4.9)
(m/n) 6 exp −
so that 2km > (4/δ),
δ(1 + 3δ)
then
(4.10)
A,B
A,B
A,B
m
m
|OK
| > |B|(4/δ)q −Km on the event ∩σk
∩ ∩σk
k=1 Hk
k=km +1 Lk .
m
A,B
A,B
To estimate the probability of the event in (4.10) recall that |Ct+1
|, |Ot+1
| 6 (2r)|OtA,B | for
any t > 0 and by Lemma 2.1 and 2.2 each site is included in CtA,B ⊂ CtA with probability at
P
A,B
A,B
| conditionally on |Oρ(k−1)
| is stochastically
most c(ϑ(m)/n)γ . So for any k > 1, ρi=1 |Cρ(k−1)+i
γ
dominated by the Binomial(ΛM, c(ϑ(m)/n) ) distribution, where Λ = (2r) + (2r)2 + · · · + (2r)ρ
A,B
and M = |Oρ(k−1)
|. Hence, applying Lemma 3.3 with the above choices of Λ and M if
(4.11)
then
m 6 3.3 (Λ, 5δ, η, γ)n,
A,B
A,B
P (HkA,B )c |Oρ(k−1)
| 6 exp −(1 + 5δ/2)|Oρ(k−1)
| log(n/m) .
18
S. CHATTERJEE
A,B
k−1 A,B
Since |Oρ(k−1)
| > |B| on the event ∩j=1
Hj
by (4.5), the above inequality reduces to
k−1 A,B
P (HkA,B )c ∩ ∩j=1
Hj
6 exp (−(1 + 5δ/2)|B| log(n/m))
(4.12)
6 exp (−(1 + δ)m log(n/m)) .
The last inequality follows from the fact that |B| > (1 − δ)m and δ ∈ (0, 1/10), which makes
(1 + 5δ/2)(1 − δ) > 1 + δ.
By the choice of In (η), a standard large deviation argument for the sum of i.i.d. random
variables yields
A,B
A,B
c
|O
|
6
exp
−|O
|I
(η)
P (LA,B
)
k,i
ρ(k−1)+i−1
ρ(k−1)+i−1 n
for any k > 1 and 1 6 i 6 ρ. Now repeated applications of the inequality in (4.5) suggest
A,B
m
that |Oρk
| > 2km |B| on the event ∩kj=1
HjA,B . In view of (4.5) and (4.7), for any k > km and
m
1 6 i 6 ρ,
A,B
A,B
A,B
|
|Oρ(k−1)+i−1
| > (r̃ − γ −1 (1 + 5δ))|Oρ(k−1)
| > (r̃ − γ −1 (1 + 5δ))|Oρk
m
A,B
on the event ∩kj=km +1 HjA,B ∩i−1
j=1 Lk,j . So the inequality in the last display reduces to
A,B
−1
km
c i−1 A,B k
|B|I
(η)
∩
H
6
exp
−(r̃
−
γ
(1
+
5δ))2
)
∩
L
P (LA,B
n
j=1
j
j=1 k,j
k,i
(4.13)
6 exp(−(1 + 3δ)|B| log(n/m)) 6 exp(−(1 + δ)m log(n/m)).
The last two inequalities follow from the definition of km and the facts that |B| > (1 − δ)m,
which implies (1 + 3δ)|B| > (1 + δ)m for δ ∈ (0, 1/10). Applying Lemma 3.2 with κ = ∆ = 1,
(4.14)
m log(n/m) = nφ1,1 (m/n) > nφ1,1 (1/n) = log n for m 6 n/e.
Combining (4.10), (4.12) and (4.13) if m/n is small satisfying (4.9), (4.11) and (4.14), then
c A,B
A,B
σkm
σkm A,B
−Km
L
H
∩
∩
6
P
∩
P |OK
|
<
|B|(4/δ)q
k=km +1 k
k=1 k
m
6
+
σk
m
X
σk
m
X
A,B
P (HkA,B )c ∩k−1
H
j=1
j
k=1
ρ
X A,B
c i−1 A,B k
P (LA,B
k,i ) ∩j=1 Lk,j ∩j=1 Hj
k=km +1 i=1
6 [σ + ρ(σ − 1)]km exp(−(1 + δ)m log(n/m))
6 [σ + ρ(σ − 1)] log2 [C log n]
exp(−(1 + 3δ/4)m log(n/m))n−δ/4
6 exp(−(1 + 3δ/4)m log(n/m))
for large enough n. Since the event considered in the last display involves at most ϑ(m) vertices
of the graph, the above estimate together with (4.3) implies
A,B
−Km
P̃ |OK
|
<
|B|(4/δ)q
6 exp(−(1 + 3δ/8)m log(n/m)),
m
19
COMPLEX BOOLEAN NETWORKS
in,out
with (pin
⊗n /p⊗n ) probability 1 − o(1) provided m/n 6 ε is small.
Using this estimate and union bound we see that if m/n is small, then
c
P̃ ∪A∈{A⊂[n]:|A|=m} EA
o
n
A,B
−Km
6 P̃ ∪m0 ∈[(1−δ)m,m] ∪{(A,B):B⊂A⊂[n],|A|=m,|B|=m0 } |OK
|
<
(4/δ)q
|B|
m
X
n
m
n
exp −(1 + 3δ/8)m log
6
(4.15)
.
0
m
m
m
0
m ∈[(1−δ)m,m]
l
It is easy to check that Ll 6 Ll! 6 (Le/l)l for any positive integers l 6 L and the function
φ1,e (·) defined in Lemma 3.2 is increasing on (0, 1). So for m0 > (1 − δ)m,
m
n
ne m
m
and
=
6
0
m−m
m
m0
m
m−m0
m − m0
me
6 m−m0
= exp mφ1,e
m
6 exp(mφ1,e (δ)) 6 (e/δ)δm .
Also there are at most m 6 em choices for m0 . Using these bounds the right hand side of (4.15)
6 exp[m + m log(ne/m) + mδ log(e/δ) − (1 + 3δ/8)m log(n/m)]
6 exp[−(3δ/8)m log(n/m) + ∆1 m]
for some constant ∆1 . If m/n 6 exp(−8∆1 /δ), then the right hand side of the last display is
6 exp[−(δ/4)m log(n/m)]. Therefore, if 4.1 is chosen small enough, then for any m 6 4.1 n,
c
P̃ ∪A∈{A⊂[n]:|A|=m} EA
6 exp[−(δ/4)m log(n/m)].
Combining this with the fact that m 7→ m log(n/m) is increasing for m 6 n/e (by Lemma 3.2),
X
X
c
P̃((Gn1 )c ) 6
P̃ ∪A∈{A⊂[n]:|A|=m} EA
6
exp[−(δ/4)(log n)a log(n/(log n)a )]
m6[(log n)a ,4.1 n]
m∈[(log n)a ,4.1 n]
1+a
6 n exp[−(δ/4)(log n)
√
(1 + o(1))] = o(1/ n).
This together with (4.3) completes the proof.
←
−{x}
Recall the definition of π(·, ·) from (1.11) and let CLxk := ∪kl=0 Z l be the oriented cluster
←
−
of depth k starting from x ∈ [n] in the graph G n . For any sequence {tn } as in the statement of
our theorems, define the events
n←
o
n←
o
−{x}
−{x}
Ax :=
ξ 2a log log n/ log(qr̃) > (log n)a , Ãx := ξ tn ∧2a log log n/ log(qr̃) 6= ∅ ,
n
o
Ax,y :=
CLx2a log log n/ log(qr̃) ∩ CLy2a log log n/ log(qr̃) = ∅ .
Proposition 4.2. For π(·, ·) as in (1.11), p as in (4.1), q > 1/r and ε > 0 let π := π(p, q)
and

 


  X

X
X
Gn2 := Gn : n(π − ε) 6
PGn ,q (Ax ),
PGn ,q (Ãx ) 6 n(π + ε) ∩
1Acx,y 6 n9/5 .

 

x∈[n]
Then Pi,n (Gn2 ) = 1 − o(1) for i = 1, 2.
x∈[n]
x,y∈[n],x6=y
20
S. CHATTERJEE
Proof. In this proof also the notations P and P̃ serve the same purpose as they did in
the proof of Proposition 4.1. E and Ẽ denote the corresponding expectations. Also let sn =
2a log log n/ log(qr̃), tn ∧ 2a log log n/ log(qr̃).
First we note that if
n
o
x
(4.16)
Cx := CL2a log log n/ log(qr̃) 6 n1/4 , then P(Cx ), P̃(Cx ) > 1 − n−1/8 ,
using Markov inequality. This bound together with (4) of Lemma 2.2 with ε = n−3/4 implies
(4.17)
E PGn ,q (Ax ) − Ẽ PGn ,q (Ax ) = o(1) + E PGn ,q (Ax ) 1Cx − Ẽ PGn ,q (Ax ) 1Cx = o(1).
Now if Bx denotes the event that collision does not occur in the cluster CLx2a log log n/ log(qr̃) , then
combining (4.16) with Lemma 2.1 and 2.2,
P(Bxc ) = o(1) + P(Bxc ∩ Cx ) = o(1) + P̃(Bxc ∩ Cx ) = o(1).
←
−{x}
On the event Bx , the law of | ξ t |, 0 6 t 6 2a log log n/ log(qr̃) under the annealed measure
P × PGn ,q is same as that of a branching process with offspring distribution (1 − q)δ 0 + qpn ,
where pn is as in (4.2). So if {Zt }t>0 is such a branching process with Z0 = 1, then its survival
probability is π(pn , q) and using (4.18),
E[PGn ,q (Ax )] = P Zsn > (qr̃)sn /2 + o(1).
(4.18)
Imitating the proof of Proposition 2.1 in [20], the right hand side is π(pn , q)+o(1). This together
with (4.17) and the fact that π(pn , q) → π as n → ∞ implies
(4.19)
Ẽ[PGn ,q (Ax )] = π + o(1), using similar argument Ẽ[PGn ,q (Ãx )] = π + o(1).
Also using (4.16) and Lemma 2.1 and 2.2
(4.20)
P̃(Acx,y ) 6 P̃(Cxc ) + P̃(Cyc ) + P̃(Acx,y ∩ Cx ∩ Cy )
!(α−2)/(α−1)
n1/4
−1/8
1/4
6 2n
+ cn
6 cn−1/8
n
for some constant c > 0. Using the last inequality and following the argument which leads to
(2.13) in [6], if x1 , x2 ∈ [n] are such that x1 6= x2 , then
Ẽ[PGn ,q (Ax1 ) PGn ,q (Ax2 )] − Ẽ[PGn ,q (Ax1 )]Ẽ[PGn ,q (Ax2 )]
6 P̃(Acx1 ,x2 )[1 + 1/P̃(Ax1 ,x2 )] 6 cn−1/8 .
So using a standard second moment argument and then combining with (4.19)


X
X
P̃ n(π − ε) 6
PGn ,q (Ax ),
PGn ,q (Ãx ) 6 n(π + ε) = 1 − o(1).
x∈[n]
x∈[n]
By a similar argument if x1 , x2 , x3 , x4 ∈ [n] are such that x1 6= x2 , x3 6= x4 and {x1 , x2 } ∩
{x3 , x4 } = ∅, then
Ẽ[Ax1 ,x2 ∩ Ax3 ,x4 ] − Ẽ[Ax1 ,x2 ]Ẽ[Ax3 ,x4 ]
6 P̃(∪i∈{1,2},j∈{3,4} Acxi ,xj )[1 + 1/P̃(∩i∈{1,2},j∈{3,4} Axi ,xj )] 6 cn−1/8 6 cn−1/8 ,
21
COMPLEX BOOLEAN NETWORKS
and hence combining with (4.20) and using the standard second moment argument,


X
n 
= o(1).
1Acx,y > n−1/10
P̃ 
2
x,y∈[n],x6=y
This completes the proof.
5. Proofs of the Theorems. Let F be a forest consisting of m rooted directed trees and
let Fk,i denote the set of vertices of the i-th tree which are at oriented distance k from the root
level and Fk = ∪m
i=1 Fk,i .
←
−
Lemma 5.1. If PF ,q denotes the law of { ξ A
t , A ⊂ F0 , t > 0} on the directed forest F and
if |Fk | > 2q −k |F0 |, then
!
m
←
X
− F0 PF ,q ξ k 6 |F0 | 6 exp −cq k |Fk |2 /
|Fk,i |2 .
i=1
←
− F0
P Proof. Let m = |F0 |. For x ∈ Fk let Yx := 1{x ∈ ξ k } and for 1 6 i 6 m let Ni :=
x Yx 1{x ∈ Fk,i }. It is easy to see that if l(x, y) equals half of the distance between x and y in
the forest ignoring the orientation of the edges, then

k

if x = z
q
k
k+l(x,y)−1
EF ,q Yx = q , EF ,q (Yx Yz ) = q
if 1 6 l(x, y) 6 k, so that

 2k
q
otherwise
i
h
X
EF ,q Ni = q k |Fk,i |, EF ,q Ni2 =
EF ,q (Yx Yz ) ∈ q 2k−1 |Fk,i |2 , q k |Fk,i |2 .
x,z∈Fk,i
Pm
> 2m. So applying Lemma 3.1
!
!
m
m
←
X
X
− F0 −tNi
PF ,q ξ k 6 m
= PF ,q
Ni 6 m 6 exp tm +
log EF ,q e
By our hypothesis,
i=1 EF ,q Ni
i=1
6 exp tm +
6 exp −
i=1
m
X
i=1
m
Xh
−tEF ,q Ni + EF ,q Ni2 − [EF ,q Ni ]
(t/2)EF ,q Ni − EF ,q Ni2 − EF ,q Ni
2
2 t2 /2
!
t2 /2
!
i
i=1
for any t > 0. Optimizing the last expression with respect to t and noting that at−bt2 /2 6 a2 /2b
for any a, b > 0 we have
!
2
Pm
←
− F0 i=1 EF ,q Ni
6 exp − Pm
PF ,q ξ k 6 m
(5.1)
8 i=1 EF ,q Ni2 − (EF ,q Ni )2


!2 m
m
2k /8
X
X
q
(5.2)
6 exp − k
|Fk,i | /
|Fk,i |2  .
q − q 2k
i=1
i=1
22
S. CHATTERJEE
Proposition 5.2. Let 4.1 and Gn1 be as in Proposition 4.1 and Km be as in (4.4). There
are constants C5.2 , b > 0 such that if Gn ∈ Gn1 and A ⊂ [n] has size m 6 4.1 n, then
←
− −b
6
m
6
exp
−C
m(log(n/m))
.
PGn ,q ξ A
5.2
Km
Proof. For Gn ∈ Gn1 , any A ⊂ [n] with |A| = m 6 4.1 n and δ as in Proposition 4.1, define
n
o
A,{x}
τA := x ∈ A : |OKm | > (4/δ)q −Km .
Clearly |τA | > δ|A|, because otherwise B = A \ τA will have |B| > (1 − δ)|A| and
X A,{x}
A,B
|
6
|OKm | < (4/δ)q −Km |B|
|OK
m
x∈B
by the definition of τA and this contradicts the fact that Gn ∈ Gn1 .
m
Let F be the subgraph of {ZtA,τA }K
t=0 induced by the vertex set
A,{x}
Km −1 A,{x}
∪x∈τA ∪i=0
Ot
∪ Π(d(4/δ)q −Km e, OKm ) .
So F is a labeled directed forest with depth Km such that |F0 | > δm and |FKm ,i | = d(4/δ)q −Km e
for all i. Applying Lemma 5.1 with k replaced by Km and m replaced by δm, and noting that
←
−
←
−F0
|ξA
Km | stochastically dominates | ξ Km |,
←
1 Km
−A PGn ,q ξ Km 6 m 6 exp − q δm .
8
This proves the result.
Proof of Theorem 1.4 and 1.5. We take Gn := Gn1 ∩ Gn2 , where Gn1 and Gn2 are as in
Proposition 4.1 and 4.2 respectively, and we will see that
1
∆ := C5.2 4.1 [log(1/4.1 )]−b
2
will suffice, where C5.2 , b are as in Proposition 5.2. Clearly Pi,n (Gn ) = 1 − o(1). Define
n
o
←
−{x}
Tx := inf t > 1 : | ξ t | > 4.1 n .
←
−{x}
We take Tx = ∞ if ξ t never reaches 4.1 n. Recalling the definition of the event Ax from (4.19)
and then applying Proposition 4.1, Gn ∈ Gn implies



4.1
n−1


X
Km 
PGn ,q Ax ∩ Tx > 2a log log n/ log(qr̃) +


a
m=(log n)
6
X
4.1 n
exp −C5.2 m(log(n/m))−b
m=(log n)a
(5.3)
6 n exp −C5.2 (log n)a (log(n/(log n)a ))−b = o(1/n)
COMPLEX BOOLEAN NETWORKS
23
←
−{x}
if a is large enough. For i > 1 if | ξ Tx +(i−1)Km | > 4.1 n, then we can again apply Proposition 4.1
←
−{x}
with A replaced by any subset of ξ Tx +(i−1)Km consisting of 4.1 n many vertices to have
PGn ,q
o n←
o
n ←
−{x}
−{x}
6 exp −C5.2 4.1 [log(1/4.1 )]−b n ,
| ξ Tx +iKm | < 4.1 n ∩ | ξ Tx +(i−1)Km | > 4.1 n
which in turn implies
n←
o
−{x}
(5.4)
PGn ,q
ξ Tx +e∆n Km < 4.1 n ∩ {Tx < ∞} = e∆n e−2∆n = o(1/n).
Combining (5.3) and (5.4) and using union bound,
h
n←
oi
−{x}
PGn ,q ∪x∈[n] Ax ∩ ξ exp(∆n) = ∅
6 no(1/n) = o(1).
←
−
This together with the duality relationship between ξt and ξ t suggests
(5.5)
←
−{x}
[n]
PGn ,q ξexp(∆n) ⊃ {x ∈ [n] : Ax occurs} = PGn ,q ξ exp(∆n) 6= ∅ if Ax occurs = 1 − o(1).
NowPin order to estimate the size of {x ∈ [n] : Ax occurs}, we will use a second moment argument
for x∈[n] 1Ax . Note that
2

EGn ,q 
X
1Ax −
x∈[n]
X
PGn ,q (Ax ) =
x∈[n]
X
[PGn ,q (Ax ∩ Ay ) − PGn ,q (Ax ) PGn ,q (Ay )].
x,y∈[n]
Recalling the definition of the event Ax,y from (4.20) if Ax,y occurs, then the corresponding
summand in the above sum is 0, otherwise the summands are at most 1. Keeping this observation
in mind and using the fact that Gn ∈ Gn2 ,

2
X
X
X
n


EGn ,q
1A x −
PGn ,q (Ax ) 6 n +
1Acx,y 6 n +
o(1).
2
x∈[n]
x∈[n]
x,y∈[n],x6=y
The same estimate
Proposition
P is true if we replace
PAx by Ãx in the above display. Also from
2
4.2 n(π − ε) 6 x∈[n] PGn ,q (Ax ) and x∈[n] PGn ,q (Ãx ) 6 n(π + ε) for Gn ∈ Gn . Therefore, by
Chebyshev inequality


X
X
PGn ,q (π − 2ε)n 6
1Ax ,
1Ãx 6 (π + 2ε)n = 1 − o(1).
x∈[n]
x∈[n]
Combining this with (5.5)
[n]
PGn ,q ξexp(∆n) > n(π − 2ε) = 1 − o(1).
←
−
Also, using duality between ξt and ξ t again, for any t > tn


X
[n] PGn ,q ξt > n(π + 2ε) = PGn ,q 
1Ãx > (π + 2ε)n = o(1).
x∈[n]
24
S. CHATTERJEE
So the required result follows from attractiveness of the threshold contact process.
To prove the last assertion suppose q < 1/r. Then we take
←
n
o
−{x} Gn := Gn : EGn ,q ξ C log n 6 nC log(qr)/2 for all x ∈ [n]
for some constant C > 0. In order to see that Pi,n (Gn ) = 1 − o(1), recall the definition of P and
P̃ from the proof of Proposition 4.1. Using union bound and Markkov inequality,
←
X
−{x} n−C log(qr)/2 E EGn ,q ξ C log n 6 n · n−C log(qr)/2 · n−C log(qrn ) .
P((Gn )c ) 6
x∈[n]
Since a branching process starting from x with offspring distribution pn stochastically dominates
←
−{x}
| ξ t |, P̃((Gn )c ) has the same upper bound. This together with the fact that rn → r implies
Pi,n (Gn ) = 1 − o(1). Finally for Gn ∈ Gn , again using union bound and Markov inequality
←
←
X
−[n]
−{x} [n]
PGn ,q (ξC log n 6= ∅) = PGn ,q ξ C log n 6= ∅ 6
EGn ,q ξ C log n = o(1)
x∈[n]
if C is chosen large enough.
Acknowledgments. The author thanks Shankar Bhamidi for various helpful comments and
discussions while writing this article.
References.
[1] R. Albert, Scale-free graphs in cell biology, Journal of cell science 118 (2005), no. 21, 4947–4957.
[2] R. Albert and H. G. Othmer, The topology of the regulatory interactions predicts the expression pattern of the
segment polarity genes in drosophila melanogaster, Journal of Theoretical Biology 223 (2003), no. 1, 1–18.
[3] M. Aldana and P. Cluzel, A natural class of robust graphs, Proceedings of the National Academy of Sciences
100 (2003), no. 15, 8710.
[4] K. B. Athreya, Large deviation rates for branching processes–i. single type case, The Annals of Applied
Probability 4 (1994), no. 3, 779–790.
[5] G. Bennett, Probability inequalities for the sum of independent random variables, Journal of the American
Statistical Association (1962), 33–45.
[6] S. Chatterjee and R. Durrett, Persistence of activity in threshold contact processes, an annealed approximation
of random boolean graphs, Random Structures & Algorithms (2011).
[7] B. Derrida and Y. Pomeau, Random graphs of automata: a simple annealed approximation, EPL (Europhysics
Letters) 1 (1986), 45.
[8] B. Drossel, Number of attractors in random boolean graphs, Physical Review E 72 (2005), no. 1, 016110.
[9]
, 3 random boolean graphs, Reviews of nonlinear dynamics and complexity 1 (2008), 69.
[10] R. Durrett, Random graph dynamics, Vol. 20, Cambridge Univ Pr, 2007.
[11] H. Flyvbjerg and N. Kjr, Exact solution of kauffman’s model with connectivity one, Journal of Physics A:
Mathematical and General 21 (1988), 1695.
[12] J. J. Fox and C. C. Hill, From topology to dynamics in biochemical graphs, Chaos: An Interdisciplinary
Journal of Nonlinear Science 11 (2001), no. 4, 809–815.
[13] D. Griffeath and D. Griffeath, Additive and cancellative interacting particle systems, Springer-Verlag Berlin,
1979.
[14] S. Huang, Gene expression profiling, genetic graphs, and cellular states: an integrating concept for tumorigenesis and drug discovery, Journal of Molecular Medicine 77 (1999), no. 6, 469–480.
COMPLEX BOOLEAN NETWORKS
25
[15] L. Kadanoff, S. Coppersmith, and M. Aldana, Boolean dynamics with random couplings, Arxiv preprint
nlin/0204062 (2002).
[16] S. A. Kauffman, Metabolic stability and epigenesis in randomly constructed genetic nets, Journal of theoretical
biology 22 (1969), no. 3, 437–467.
[17] S. A. Kauffman, The origins of order: Self organization and selection in evolution, Oxford University Press,
USA, 1993.
[18] D. S. Lee and H. Rieger, Broad edge of chaos in strongly heterogeneous boolean graphs, Journal of Physics A:
Mathematical and Theoretical 41 (2008), 415001.
[19] B. Luque and R. V. Solé, Phase transitions in random graphs: simple analytic determination of critical points,
Physical Review E 55 (1997), no. 1, 257–260.
[20] T. Mountford and D. Valesin, Supercriticality for annealed approximations of boolean graphs, Arxiv preprint
arXiv:1007.0862 (2010).
[21] M. E. J. Newman, S. H. Strogatz, and D. J. Watts, Random graphs with arbitrary degree distributions and
their applications, Phys. Rev. E. 64 (2001), 026118.
[22] M. E. J. Newman, D. J. Watts, and S. H. Strogatz, Random graph models of social graphs, Proceedings of
the National Academy of Sciences of the United States of America 99 (2002), no. Suppl 1, 2566–2572.
[23] A. Pomerance, E. Ott, M. Girvan, and W. Losert, The effect of graph topology on the stability of discrete
state models of genetic control, Proceedings of the National Academy of Sciences 106 (2009), no. 20, 8209.
[24]
, The effect of graph topology on the stability of discrete state models of genetic control, Proceedings
of the National Academy of Sciences 106 (2009), no. 20, 8209–8214.
[25] I. Shmulevich and W. Zhang, Binary analysis and optimization-based normalization of gene expression data,
Bioinformatics 18 (2002), no. 4, 555–565.
[26] R. V. Solé and B. Luque, Phase transitions and antichaos in generalized kauffman graphs, Physics Letters A
196 (1994), no. 1-2, 331–334.
[27] R. Somogyi and C. Sniegoski, Modeling the complexity of genetic graphs: understanding multigenic and
pleiotropic regulation, Complexity 1 (1996), 45–63.
[28] S. M Stigler, Linear functions of order statistics with smooth weight functions, The Annals of Statistics
(1974), 676–693.
[29] I. Tabus, R. Jorma, and J. Astola, Normalized maximum likelihood models for boolean regression with application to prediction and classification in genomics, Computational and Statistical Approaches to Genomics
(2006), 235–258.
[30] J. A Wellner, A glivenko-cantelli theorem and strong laws of large numbers for functions of order statistics,
The Annals of Statistics (1977), 473–480.
Courant Institute of Mathematical Sciences
New York University
251 Mercer Street
New York, NY 10012, USA
E-mail: [email protected]