Maximum Entropy Distributions on Graphs
by
Andre Yohannes Wibisono
A thesis submitted in partial satisfaction of the
requirements for the degree of
Master of Arts
in
Statistics
in the
Graduate Division
of the
University of California, Berkeley
Committee in charge:
Professor Michael I. Jordan, Chair
Professor Peter L. Bartlett
Professor Martin J. Wainwright
Spring 2013
Maximum Entropy Distributions on Graphs
Copyright 2013
by
Andre Yohannes Wibisono
Abstract
Maximum Entropy Distributions on Graphs
by
Andre Yohannes Wibisono
Master of Arts in Statistics
University of California, Berkeley
Professor Michael I. Jordan, Chair
We study the maximum entropy distribution on weighted graphs with a given expected degree sequence. This distribution on graphs is characterized by independent edge weights
parameterized by a potential at each vertex. Using the general theory of exponential
family distributions, we prove the existence and uniqueness of the maximum likelihood estimator (MLE) of the vertex parameters. We also prove the consistency of the MLE from a
single graph sample, extending the results of Chatterjee, Diaconis, and Sly for unweighted
(binary) graphs. Interestingly, our extensions require an intricate study of the inverses of diagonally dominant positive matrices. Along the way, we derive analogues of the Erdős-Gallai
criterion of graphical sequences for weighted graphs.
Contents

1 Introduction

2 Preliminaries and general theory
   2.1 Notation
   2.2 General theory via exponential family distributions

3 Analysis for specific edge weights
   3.1 Finite discrete weighted graphs
   3.2 Continuous weighted graphs
   3.3 Infinite discrete weighted graphs

4 Inverses of diagonally dominant matrices
   4.1 Statement of the result
   4.2 Reduction to exact limiting cases
   4.3 Boundary combinatorics and exact formulae
   4.4 Analysis of $\|J^{-1}\|_\infty$ in a neighborhood of $S$
   4.5 Proof of the bound

5 Proofs of main results
   5.1 Preliminaries
   5.2 Proofs for the finite discrete weighted graphs
   5.3 Proofs for the continuous weighted graphs
   5.4 Proofs for the infinite discrete weighted graphs

6 Discussion and future work

Bibliography
Chapter 1
Introduction
Maximum entropy models are an important class of statistical models for biology. For instance, they have been found to be a good model for protein folding [35, 42], antibody
diversity [30], neural population activity [37, 40, 47, 46, 4, 54, 41], and flock behavior [5]. In
this thesis we develop a general framework for studying maximum entropy distributions on
weighted graphs, extending recent work of Chatterjee, Diaconis, and Sly [8]. The development of this theory is partly motivated by the problem of sensory coding in neuroscience.
In the brain, information is represented by discrete electrical pulses, called action potentials or spikes [34]. This includes neural representations of sensory stimuli which can take
on a continuum of values. For instance, large photoreceptor arrays in the retina respond to a
range of light intensities in a visual environment, but the brain does not receive information
from these photoreceptors directly. Instead, retinal ganglion cells must convey this detailed
input to the visual cortex using only a series of binary electrical signals. Continuous stimuli
are therefore converted by networks of neurons to sequences of spike times.
An unresolved controversy in neuroscience is whether information is contained in the
precise timings of these spikes or only in their “rates” (i.e., counts of spikes in a window
of time). Early theoretical studies [28] suggest that information capacities of timing-based
codes are superior to those that are rate-based (also see [18] for an implementation in a
simple model). Moreover, a number of scientific articles have appeared suggesting that
precise spike timing [1, 3, 32, 52, 27, 6, 29, 31, 12, 23] and synchrony [48] are important for
various computations in the brain.¹ Here, we briefly explain a possible scheme for encoding
continuous vectors with spiking neurons that takes advantage of precise spike timing and
the mathematics of maximum entropy distributions.

¹ It is well known that precise spike timing is used for time-disparity computation in animals [7], for instance when owls track prey by binaural hearing or when electric fish use the electric fields around their bodies to locate objects.
Consider a network of n neurons in one region of the brain which transmits a continuous
vector θ ∈ Rn using sequences of spikes to a second receiver region. We assume that this
second region contains a number of coincidence detectors that measure the absolute difference in spike times between pairs of neurons projecting from the first region. We imagine
three scenarios for how information can be obtained by these detectors. In the first, the
detector is only measuring for synchrony between spikes; that is, either the detector assigns
a 0 to a nonzero timing difference or a 1 to a coincidence of spikes. In another scenario,
timing differences between projecting neurons can assume an infinite but countable number
of possible values. Finally, in the third scenario, we allow these differences to take on any
nonnegative real values. We further assume that neuronal output and thus spike times are
stochastic variables. A basic question now arises: How can the first region encode θ so that
it can be recovered robustly by the second?
We answer this question by first asking the one symmetric to this: How can the second
region recover a real vector transmitted by an unknown sender region from spike timing
measurements? We propose the following possible solution to this problem. Fix one of the
detector mechanics as described above, and set $a_{ij}$ to be the measurement of the absolute
timing difference between spikes from projecting neurons $i$ and $j$. We assume that the receiver
population can compute the (local) sums $d_i = \sum_{j \neq i} a_{ij}$ efficiently. The values $a = (a_{ij})$
represent a weighted graph $G$ on $n$ vertices, and we assume that $a_{ij}$ is randomly drawn from
a distribution on timing measurements $(A_{ij})$. Making no further assumptions, a principle of
Jaynes [21] suggests that the second region propose that the timing differences are drawn
from the (unique) distribution over weighted graphs with the highest entropy [38, 10] having
the vector $d = (d_1, \dots, d_n)$ for the expectations of the degree sums $\sum_{j \neq i} A_{ij}$. Depending
on which of the three scenarios described above is true for the coincidence detector, this
prescription produces one of three different maximum entropy distributions.
Consider the third scenario above (the other cases are also subsumed by our results).
As we shall see in Section 3.2, the distribution determined in this case is parameterized by
a real vector θ = (θ1 , . . . , θn ), and finding the maximum likelihood estimator (MLE) for
these parameters using d as sufficient statistics boils down to solving the following set of n
algebraic equations in the n unknowns θ̂1 , . . . , θ̂n :
\[
d_i = \sum_{j \neq i} \frac{1}{\hat{\theta}_i + \hat{\theta}_j} \quad \text{for } i = 1, \dots, n. \tag{1.1}
\]
Given our motivation, we call the system of equations (1.1) the retina equations for theoretical neuroscience, and note that they have been studied in a more general context by
Sanyal, Sturmfels, and Vinzant [36] using matroid theory and algebraic geometry. Remarkably, a solution θ̂ to (1.1) has the property that with high probability, it is arbitrarily close
to the original parameters θ for sufficiently large network sizes n (in the scenario of binary
measurements, this is a result of [8]). In particular, it is possible for the receiver region to
recover reliably a continuous vector θ from a single cycle of neuronal firing emanating from
the sender region.
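To make this recovery step concrete, the following is a small numerical sketch that solves the retina equations (1.1) for the expected degree sums of a toy network. The network size, the parameter values, and the use of SciPy's root finder are illustrative choices, not part of the model.

\begin{verbatim}
# Sketch: solving the retina equations (1.1) numerically.
# For simplicity we feed in the *expected* degree sums, so the solver recovers
# theta essentially exactly; with observed (random) degrees it is only close
# with high probability (see Theorem 3.9).
import numpy as np
from scipy.optimize import root

rng = np.random.default_rng(0)
n = 20
theta = rng.uniform(0.5, 1.5, size=n)        # true vertex potentials, theta_i + theta_j > 0

pair_means = 1.0 / (theta[:, None] + theta[None, :])
np.fill_diagonal(pair_means, 0.0)
d = pair_means.sum(axis=1)                   # d_i = sum_{j != i} 1/(theta_i + theta_j)

def residual(t):
    m = 1.0 / (t[:, None] + t[None, :])
    np.fill_diagonal(m, 0.0)
    return m.sum(axis=1) - d

sol = root(residual, x0=np.ones(n))
print(sol.success, np.abs(sol.x - theta).max())   # expect True and a small error
\end{verbatim}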
We now know how to answer our first question: The sender region should arrange spike
timing differences to come from a maximum entropy distribution. We remark that this
conclusion is consistent with modern paradigms in theoretical neuroscience and artificial
intelligence, such as the concept of the Boltzmann machine [2], a stochastic version of its
(zero-temperature) deterministic limit, the Little-Hopfield network [26, 17].
Organization
The organization of this thesis is as follows. In Chapter 2, we lay out the general theory
of maximum entropy distributions on weighted graphs. In Chapter 3, we specialize the
general theory to three classes of weighted graphs. For each class, we provide an explicit
characterization of the maximum entropy distributions and prove a generalization of the
Erdős-Gallai criterion for weighted graphical sequences. Furthermore, we also present the
consistency property of the MLE of the vertex parameters from one graph sample. A key step
in proving the consistency result is a new bound on the inverses of symmetric, diagonally
dominant positive matrices, and Chapter 4 is devoted to the development of this bound.
Chapter 5 provides the proofs of the main technical results presented in Chapter 3. Finally,
in Chapter 6 we discuss the results in this thesis and some future research directions.
Acknowledgments. This thesis represents in part joint work with Christopher J. Hillar
and Shaowei Lin, based on the manuscripts [14, 15].
Chapter 2
Preliminaries and general theory
In this chapter we develop the general machinery of maximum entropy distributions on
graphs via the theory of exponential family distributions [53], and in subsequent chapters
we specialize our analysis to some particular cases of weighted graphs.
2.1 Notation
We first introduce some notation that we use in this thesis. Let $\mathbb{R}_+ = (0, \infty)$, $\mathbb{R}_0 = [0, \infty)$, $\mathbb{N} = \{1, 2, \dots\}$, and $\mathbb{N}_0 = \{0, 1, 2, \dots\}$. We write $\sum_{\{i,j\}}$ and $\prod_{\{i,j\}}$ for the summation and product, respectively, over all $\binom{n}{2}$ pairs $\{i,j\}$ with $i \neq j$. For a subset $C \subseteq \mathbb{R}^n$, $C^\circ$ and $\overline{C}$ denote the interior and closure of $C$ in $\mathbb{R}^n$, respectively. For a vector $x = (x_1, \dots, x_n) \in \mathbb{R}^n$, $\|x\|_1 = \sum_{i=1}^n |x_i|$ and $\|x\|_\infty = \max_{1 \le i \le n} |x_i|$ denote the $\ell_1$ and $\ell_\infty$ norms of $x$. For an $n \times n$ matrix $J = (J_{ij})$, $\|J\|_1$ denotes the matrix norm induced by the $\|\cdot\|_1$-norm on vectors in $\mathbb{R}^n$, that is,
\[
\|J\|_1 = \max_{x \neq 0} \frac{\|Jx\|_1}{\|x\|_1} = \max_{1 \le j \le n} \sum_{i=1}^n |J_{ij}|.
\]
Similarly, $\|J\|_\infty$ denotes the matrix norm induced by the $\|\cdot\|_\infty$-norm on vectors in $\mathbb{R}^n$, so
\[
\|J\|_\infty = \max_{x \neq 0} \frac{\|Jx\|_\infty}{\|x\|_\infty} = \max_{1 \le i \le n} \sum_{j=1}^n |J_{ij}|.
\]

2.2 General theory via exponential family distributions
Consider an undirected graph G on n ≥ 3 vertices with edge (i, j) having weight aij ∈ S,
where S ⊆ R is the set of possible weight values. We will later consider the following specific
cases:
1. Finite discrete weighted graphs, with edge weights in S = {0, 1, . . . , r − 1}, r ≥ 2.
2. Infinite discrete weighted graphs, with edge weights in S = N0 .
3. Continuous weighted graphs, with edge weights in S = R0 .
A graph $G$ is fully specified by its adjacency matrix $a = (a_{ij})_{i,j=1}^n$, which is an $n \times n$ symmetric matrix with zeros along its diagonal. For fixed $n$, a probability distribution over graphs $G$ corresponds to a distribution over adjacency matrices $a = (a_{ij}) \in S^{\binom{n}{2}}$. Given a graph with adjacency matrix $a = (a_{ij})$, let $\deg_i(a) = \sum_{j \neq i} a_{ij}$ be the degree of vertex $i$, and let $\deg(a) = (\deg_1(a), \dots, \deg_n(a))$ be the degree sequence of $a$.
Characterization of maximum entropy distribution
Let $\mathcal{S}$ be a σ-algebra over the set of weight values $S$, and assume there is a canonical σ-finite measure $\nu$ on $(S, \mathcal{S})$. Let $\nu^{\binom{n}{2}}$ be the product measure on $S^{\binom{n}{2}}$, and let $\mathcal{P}$ be the set of all probability distributions on $S^{\binom{n}{2}}$ that are absolutely continuous with respect to $\nu^{\binom{n}{2}}$. Since $\nu^{\binom{n}{2}}$ is σ-finite, these probability distributions can be characterized by their density functions, i.e., the Radon–Nikodym derivatives with respect to $\nu^{\binom{n}{2}}$. Given a sequence $d = (d_1, \dots, d_n) \in \mathbb{R}^n$, let $\mathcal{P}_d$ be the set of distributions in $\mathcal{P}$ whose expected degree sequence is equal to $d$,
\[
\mathcal{P}_d = \{\mathbb{P} \in \mathcal{P} : \mathbb{E}_{\mathbb{P}}[\deg(A)] = d\},
\]
where in the definition above, the random variable $A = (A_{ij}) \in S^{\binom{n}{2}}$ is drawn from the distribution $\mathbb{P}$. Then the distribution $\mathbb{P}^*$ in $\mathcal{P}_d$ with maximum entropy is precisely the exponential family distribution with the degree sequence as sufficient statistics [53, Chapter 3]. Specifically, the density of $\mathbb{P}^*$ at $a = (a_{ij}) \in S^{\binom{n}{2}}$ is given by¹
\[
p^*(a) = \exp\big(-\theta^\top \deg(a) - Z(\theta)\big), \tag{2.1}
\]
where $Z(\theta)$ is the log-partition function,
\[
Z(\theta) = \log \int_{S^{\binom{n}{2}}} \exp\big(-\theta^\top \deg(a)\big)\, \nu^{\binom{n}{2}}(da),
\]
and $\theta = (\theta_1, \dots, \theta_n)$ is a parameter that belongs to the natural parameter space
\[
\Theta = \{\theta \in \mathbb{R}^n : Z(\theta) < \infty\}.
\]
We will also write P∗θ if we need to emphasize the dependence of P∗ on the parameter θ.
¹ We choose to use $-\theta$ in the parameterization (2.1), instead of the canonical parameterization $p^*(a) \propto \exp(\theta^\top \deg(a))$, because it simplifies the notation in our later presentation.
Using the definition $\deg_i(a) = \sum_{j \neq i} a_{ij}$, we can write
\[
\exp\big(-\theta^\top \deg(a)\big) = \exp\Big(-\sum_{\{i,j\}} (\theta_i + \theta_j) a_{ij}\Big) = \prod_{\{i,j\}} \exp\big(-(\theta_i + \theta_j) a_{ij}\big).
\]
Hence, we can express the log-partition function as
\[
Z(\theta) = \log \prod_{\{i,j\}} \int_S \exp\big(-(\theta_i + \theta_j) a_{ij}\big)\, \nu(da_{ij}) = \sum_{\{i,j\}} Z_1(\theta_i + \theta_j), \tag{2.2}
\]
in which $Z_1(t)$ is the marginal log-partition function
\[
Z_1(t) = \log \int_S \exp(-ta)\, \nu(da). \tag{2.3}
\]
Consequently, the density in (2.1) can be written as
\[
p^*(a) = \prod_{\{i,j\}} \exp\big(-(\theta_i + \theta_j) a_{ij} - Z_1(\theta_i + \theta_j)\big).
\]
This means the edge weights $A_{ij}$ are independent random variables, with $A_{ij} \in S$ having distribution $\mathbb{P}^*_{ij}$ with density
\[
p^*_{ij}(a) = \exp\big(-(\theta_i + \theta_j) a - Z_1(\theta_i + \theta_j)\big).
\]
In particular, the edge weights $A_{ij}$ belong to the same exponential family distribution but with different parameters that depend on $\theta_i$ and $\theta_j$ (or rather, on their sum $\theta_i + \theta_j$). The parameters $\theta_1, \dots, \theta_n$ can be interpreted as the potential at each vertex that determines how strongly the vertices are connected to each other. Furthermore, we can write the natural parameter space $\Theta$ as
\[
\Theta = \{\theta \in \mathbb{R}^n : Z_1(\theta_i + \theta_j) < \infty \text{ for all } i \neq j\}.
\]
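Since the edge weights are independent given $\theta$, sampling a graph from $\mathbb{P}^*_\theta$ reduces to drawing each $A_{ij}$ from the one-dimensional exponential family with parameter $\theta_i + \theta_j$. The sketch below illustrates this for the continuous case of Section 3.2, where the edge weights are exponential; the function name and the use of NumPy are our own choices.

\begin{verbatim}
# Sketch: sampling A ~ P*_theta when the edge weights are independent given theta.
# Shown for the continuous case S = [0, infinity), where A_ij ~ Exponential(theta_i + theta_j);
# other weight sets only change the one-dimensional edge sampler.
import numpy as np

def sample_graph(theta, rng):
    n = len(theta)
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            rate = theta[i] + theta[j]            # natural parameter of the edge distribution
            A[i, j] = A[j, i] = rng.exponential(1.0 / rate)
    return A

rng = np.random.default_rng(1)
theta = np.array([0.8, 1.0, 1.2, 1.5])
A = sample_graph(theta, rng)
print(A.sum(axis=1))                              # the degree sequence deg(A)
\end{verbatim}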
Maximum likelihood estimator and moment-matching equation
Using the characterization of $\mathbb{P}^*$ as the maximum entropy distribution in $\mathcal{P}_d$, the condition $\mathbb{P}^* \in \mathcal{P}_d$ means we need to choose the parameter $\theta$ for $\mathbb{P}^*_\theta$ such that $\mathbb{E}_\theta[\deg(A)] = d$.² This is an instance of the moment-matching equation, which, in the case of exponential family distributions, is well known to be equivalent to finding the maximum likelihood estimator (MLE) of $\theta$ given an empirical degree sequence $d \in \mathbb{R}^n$.

² Here we write $\mathbb{E}_\theta$ in place of $\mathbb{E}_{\mathbb{P}^*}$ to emphasize the dependence of the expectation on the parameter $\theta$.

Specifically, suppose we draw graph samples $G_1, \dots, G_m$ i.i.d. from the distribution $\mathbb{P}^*$ with parameter $\theta^*$, and we want to find the MLE $\hat{\theta}$ of $\theta^*$ based on the observations
$G_1, \dots, G_m$. Using the parametric form of the density (2.1), this is equivalent to solving the maximization problem
\[
\max_{\theta \in \Theta} \; \mathcal{F}(\theta) \equiv -\theta^\top d - Z(\theta),
\]
where $d$ is the average of the degree sequences of $G_1, \dots, G_m$. Setting the gradient of $\mathcal{F}(\theta)$ to zero reveals that the MLE $\hat{\theta}$ satisfies
\[
-\nabla Z(\hat{\theta}) = d. \tag{2.4}
\]
Recall that the gradient of the log-partition function in an exponential family distribution is equal to the expected sufficient statistics. In our case, we have $-\nabla Z(\hat{\theta}) = \mathbb{E}_{\hat{\theta}}[\deg(A)]$, so the MLE equation (2.4) recovers the moment-matching equation $\mathbb{E}_{\hat{\theta}}[\deg(A)] = d$.
In Chapter 3 we study the properties of the MLE of θ from a single sample G ∼ P∗θ . In
the remainder of this chapter, we address the question of the existence and uniqueness of
the MLE with a given empirical degree sequence d.
Define the mean parameter space $\mathcal{M}$ to be the set of expected degree sequences from all distributions on $S^{\binom{n}{2}}$ that are absolutely continuous with respect to $\nu^{\binom{n}{2}}$,
\[
\mathcal{M} = \{\mathbb{E}_{\mathbb{P}}[\deg(A)] : \mathbb{P} \in \mathcal{P}\}.
\]
The set M is necessarily convex, since a convex combination of probability distributions in
P is also a probability distribution in P. Recall that an exponential family distribution is
minimal if there is no linear combination of the sufficient statistics that is constant almost
surely with respect to the base distribution. This minimality property clearly holds for
P∗ , for which the sufficient statistics are the degree sequence. We say that P∗ is regular
if the natural parameter space Θ is open. By the general theory of exponential family
distributions [53, Theorem 3.3], in a regular and minimal exponential family distribution,
the gradient of the log-partition function maps the natural parameter space Θ to the interior
of the mean parameter space $\mathcal{M}$, and this mapping³
\[
-\nabla Z : \Theta \to \mathcal{M}^\circ
\]
is bijective. We summarize the preceding discussion in the following result.
Proposition 2.1. Assume Θ is open. Then there exists a solution θ ∈ Θ to the MLE
equation Eθ [deg(A)] = d if and only if d ∈ M◦ , and if such a solution exists then it is
unique.
We now characterize the mean parameter space M. We say that a sequence d =
(d1 , . . . , dn ) is graphic (or a graphical sequence) if d is the degree sequence of a graph G
with edge weights in S, and in this case we say that G realizes d. It is important to note
that whether a sequence d is graphic depends on the weight set S, which we consider fixed
for now.
³ The mapping is $-\nabla Z$, instead of $\nabla Z$, because of our choice of the parameterization in (2.1) using $-\theta$.
Proposition 2.2. Let W be the set of all graphical sequences, and let conv(W) be the convex
hull of W. Then M ⊆ conv(W). Furthermore, if P contains the Dirac delta measures, then
M = conv(W).
Proof. The inclusion $\mathcal{M} \subseteq \mathrm{conv}(\mathcal{W})$ is clear, since any element of $\mathcal{M}$ is of the form $\mathbb{E}_{\mathbb{P}}[\deg(A)]$ for some distribution $\mathbb{P}$, and $\deg(A) \in \mathcal{W}$ for every realization of the random variable $A$. Now suppose $\mathcal{P}$ contains the Dirac delta measures $\delta_B$ for each $B \in S^{\binom{n}{2}}$. Given $d \in \mathcal{W}$, let $B$ be the adjacency matrix of a graph that realizes $d$. Then $d = \mathbb{E}_{\delta_B}[\deg(A)] \in \mathcal{M}$, which means $\mathcal{W} \subseteq \mathcal{M}$, and hence $\mathrm{conv}(\mathcal{W}) \subseteq \mathcal{M}$ since $\mathcal{M}$ is convex.
As we shall see in Chapter 3, the result above allows us to conclude that M = conv(W)
for the case of discrete weighted graphs. On the other hand, for the case of continuous
weighted graphs we need to prove M = conv(W) directly since P in this case does not
contain the Dirac measures.
Remark 2.3. We emphasize the distinction between a valid solution θ ∈ Θ and a general
solution θ ∈ Rn to the MLE equation Eθ [deg(A)] = d. As we saw from Proposition 2.1, we
have a precise characterization of the existence and uniqueness of the valid solution θ ∈ Θ,
but in general, there are multiple solutions θ to the MLE equation. In this thesis we shall be
concerned only with the valid solution; Sanyal, Sturmfels, and Vinzant study some algebraic
properties of general solutions [36].
We close this chapter by discussing the symmetry of the valid solution to the MLE equation. Recall the decomposition (2.2) of the log-partition function $Z(\theta)$ into the marginal log-partition functions $Z_1(\theta_i + \theta_j)$. Let $\mathrm{Dom}(Z_1) = \{t \in \mathbb{R} : Z_1(t) < \infty\}$, and let $\mu \colon \mathrm{Dom}(Z_1) \to \mathbb{R}$ denote the (marginal) mean function
\[
\mu(t) = \int_S a \exp\big(-ta - Z_1(t)\big)\, \nu(da).
\]
Observing that we can write
\[
\mathbb{E}_\theta[A_{ij}] = \int_S a \exp\big(-(\theta_i + \theta_j)a - Z_1(\theta_i + \theta_j)\big)\, \nu(da) = \mu(\theta_i + \theta_j),
\]
the MLE equation $\mathbb{E}_\theta[\deg(A)] = d$ then becomes
\[
d_i = \sum_{j \neq i} \mu(\theta_i + \theta_j) \quad \text{for } i = 1, \dots, n. \tag{2.5}
\]
In the statement below, $\mathrm{sgn}$ denotes the sign function: $\mathrm{sgn}(t) = t/|t|$ if $t \neq 0$, and $\mathrm{sgn}(0) = 0$.
Proposition 2.4. Let $d \in \mathcal{M}^\circ$, and let $\theta \in \Theta$ be the unique solution to the system of equations (2.5). If $\mu$ is strictly increasing, then
\[
\mathrm{sgn}(d_i - d_j) = \mathrm{sgn}(\theta_i - \theta_j) \quad \text{for all } i \neq j,
\]
and similarly, if $\mu$ is strictly decreasing, then
\[
\mathrm{sgn}(d_i - d_j) = \mathrm{sgn}(\theta_j - \theta_i) \quad \text{for all } i \neq j.
\]
Proof. Given $i \neq j$,
\[
d_i - d_j = \Big(\mu(\theta_i + \theta_j) + \sum_{k \neq i,j} \mu(\theta_i + \theta_k)\Big) - \Big(\mu(\theta_j + \theta_i) + \sum_{k \neq i,j} \mu(\theta_j + \theta_k)\Big) = \sum_{k \neq i,j} \big(\mu(\theta_i + \theta_k) - \mu(\theta_j + \theta_k)\big).
\]
If $\mu$ is strictly increasing, then $\mu(\theta_i + \theta_k) - \mu(\theta_j + \theta_k)$ has the same sign as $\theta_i - \theta_j$ for each $k \neq i, j$, and thus $d_i - d_j$ also has the same sign as $\theta_i - \theta_j$. Similarly, if $\mu$ is strictly decreasing, then $\mu(\theta_i + \theta_k) - \mu(\theta_j + \theta_k)$ has the opposite sign of $\theta_i - \theta_j$, and thus $d_i - d_j$ also has the opposite sign of $\theta_i - \theta_j$.
Chapter 3
Analysis for specific edge weights
In this chapter we analyze the maximum entropy random graph distributions for several
specific choices of the weight set S. For each case, we specify the distribution of the edge
weights Aij , the mean function µ, the natural parameter space Θ, and characterize the mean
parameter space M. We also study the problem of finding the MLE θ̂ of θ from one graph
sample G ∼ P∗θ and prove the existence, uniqueness, and consistency of the MLE. Along the
way, we derive analogues of the Erdős-Gallai criterion of graphical sequences for weighted
graphs. We defer the proofs of the results presented here to Chapter 5.
3.1 Finite discrete weighted graphs
We first study weighted graphs with edge weights in the finite discrete set S = {0, 1, . . . , r −
1}, where r ≥ 2. The case r = 2 corresponds to unweighted graphs, and our analysis in this
section recovers the results of Chatterjee, Diaconis, and Sly [8]. The proofs of the results in
this section are provided in Section 5.2.
Characterization of the distribution
We take $\nu$ to be the counting measure on $S$. Following the development in Chapter 2, the edge weights $A_{ij} \in S$ are independent random variables with density
\[
p^*_{ij}(a) = \exp\big(-(\theta_i + \theta_j)a - Z_1(\theta_i + \theta_j)\big), \quad 0 \le a \le r - 1,
\]
where the marginal log-partition function $Z_1$ is given by
\[
Z_1(t) = \log \sum_{a=0}^{r-1} \exp(-at) = \begin{cases} \log \dfrac{1 - \exp(-rt)}{1 - \exp(-t)} & \text{if } t \neq 0, \\ \log r & \text{if } t = 0. \end{cases}
\]
Since $Z_1(t) < \infty$ for all $t \in \mathbb{R}$, the natural parameter space $\Theta = \{\theta \in \mathbb{R}^n : Z_1(\theta_i + \theta_j) < \infty,\ i \neq j\}$ is given by $\Theta = \mathbb{R}^n$. The mean function is given by
\[
\mu(t) = \sum_{a=0}^{r-1} a \exp\big(-at - Z_1(t)\big) = \frac{\sum_{a=0}^{r-1} a \exp(-at)}{\sum_{a=0}^{r-1} \exp(-at)}. \tag{3.1}
\]
At $t = 0$ the mean function takes the value
\[
\mu(0) = \frac{\sum_{a=0}^{r-1} a}{r} = \frac{r-1}{2},
\]
while for $t \neq 0$, the mean function simplifies to
\[
\mu(t) = -\frac{1 - \exp(-t)}{1 - \exp(-rt)} \cdot \frac{d}{dt} \sum_{a=0}^{r-1} \exp(-at) = \frac{1}{\exp(t) - 1} - \frac{r}{\exp(rt) - 1}. \tag{3.2}
\]
Figure 3.1 shows the behavior of the mean function $\mu(t)$ and its derivative $\mu'(t)$ as $r$ varies.

[Figure 3.1: Plot of the mean function $\mu(t)$ (left) and its derivative $\mu'(t)$ (right) as $r$ varies, for $r = 2, 3, \dots, 10$.]
Remark 3.1. For $r = 2$, the edge weights $A_{ij}$ are independent Bernoulli random variables with
\[
\mathbb{P}^*(A_{ij} = 1) = \mu(\theta_i + \theta_j) = \frac{\exp(-\theta_i - \theta_j)}{1 + \exp(-\theta_i - \theta_j)} = \frac{1}{1 + \exp(\theta_i + \theta_j)}.
\]
As noted above, this is the model recently studied by Chatterjee, Diaconis, and Sly [8] in the context of graph limits. When $\theta_1 = \theta_2 = \cdots = \theta_n = t$, we recover the classical Erdős–Rényi model with edge emission probability $p = 1/(1 + \exp(2t))$.
Existence, uniqueness, and consistency of the MLE
Consider the problem of finding the MLE of $\theta$ from one graph sample. Specifically, let $\theta \in \Theta$ and suppose we draw a sample $G \sim \mathbb{P}^*_\theta$. Then, as we saw in Chapter 2, the MLE $\hat{\theta}$ of $\theta$ is a solution to the moment-matching equation $\mathbb{E}_{\hat{\theta}}[\deg(A)] = d$, where $d$ is the degree sequence of the sample graph $G$. As in (2.5), the moment-matching equation is equivalent to the following system of equations:
\[
d_i = \sum_{j \neq i} \mu(\hat{\theta}_i + \hat{\theta}_j), \quad i = 1, \dots, n. \tag{3.3}
\]
Since the natural parameter space Θ = Rn is open, Proposition 2.1 tells us that the MLE
θ̂ exists and is unique if and only if the empirical degree sequence d belongs to the interior
M◦ of the mean parameter space M.
We also note that since $\nu^{\binom{n}{2}}$ is the counting measure on $S^{\binom{n}{2}}$, all distributions on $S^{\binom{n}{2}}$ are absolutely continuous with respect to $\nu^{\binom{n}{2}}$, so $\mathcal{P}$ contains all probability distributions on $S^{\binom{n}{2}}$. In particular, $\mathcal{P}$ contains the Dirac measures, and by Proposition 2.2, this implies $\mathcal{M} = \mathrm{conv}(\mathcal{W})$, where $\mathcal{W}$ is the set of all graphical sequences.
The following result characterizes when d is a degree sequence of a weighted graph with
edge weights in S; we also refer to such d as a (finite discrete) graphical sequence. The case
r = 2 recovers the classical Erdős-Gallai criterion [13].
Theorem 3.2. A sequence $(d_1, \dots, d_n) \in \mathbb{N}_0^n$ with $d_1 \ge d_2 \ge \cdots \ge d_n$ is the degree sequence of a graph $G$ with edge weights in the set $S = \{0, 1, \dots, r-1\}$ if and only if $\sum_{i=1}^n d_i$ is even and
\[
\sum_{i=1}^k d_i \le (r-1)k(k-1) + \sum_{j=k+1}^n \min\{d_j, (r-1)k\} \quad \text{for } k = 1, \dots, n. \tag{3.4}
\]
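The conditions of Theorem 3.2 are straightforward to check directly; the following sketch (with a function name of our own choosing) tests whether an integer sequence is graphical for a given $r$.

\begin{verbatim}
# Sketch: checking the generalized Erdos-Gallai conditions (3.4).
def is_graphical(d, r):
    """d: nonnegative integers (any order); r >= 2 weight levels {0, ..., r-1}."""
    d = sorted(d, reverse=True)
    n = len(d)
    if sum(d) % 2 != 0:
        return False
    for k in range(1, n + 1):
        lhs = sum(d[:k])
        rhs = (r - 1) * k * (k - 1) + sum(min(dj, (r - 1) * k) for dj in d[k:])
        if lhs > rhs:
            return False
    return True

print(is_graphical([2, 2, 2], r=2))   # True: realized by a triangle
print(is_graphical([4, 1, 1], r=3))   # False: vertex 1 needs more weight than the others can absorb
\end{verbatim}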
j=k+1
Although the result above provides a precise characterization of the set of graphical
sequences W, it is not immediately clear how to characterize the convex hull conv(W), or
how to decide whether a given d belongs to M◦ = conv(W)◦ . Fortunately, in practice we
can circumvent this issue by employing the following algorithm to compute the MLE. The
case r = 2 recovers the iterative algorithm proposed by Chatterjee et al. [8] in the case of
unweighted graphs.
Theorem 3.3. Given $d = (d_1, \dots, d_n) \in \mathbb{R}_+^n$, define the function $\varphi \colon \mathbb{R}^n \to \mathbb{R}^n$ by $\varphi(x) = (\varphi_1(x), \dots, \varphi_n(x))$, where
\[
\varphi_i(x) = x_i + \frac{1}{r-1}\Big(\log \sum_{j \neq i} \mu(x_i + x_j) - \log d_i\Big). \tag{3.5}
\]
Starting from any $\theta^{(0)} \in \mathbb{R}^n$, define
\[
\theta^{(k+1)} = \varphi(\theta^{(k)}), \quad k \in \mathbb{N}_0. \tag{3.6}
\]
Suppose $d \in \mathrm{conv}(\mathcal{W})^\circ$, so the MLE equation (3.3) has a unique solution $\hat{\theta}$. Then $\hat{\theta}$ is a fixed point of the function $\varphi$, and the iterates (3.6) converge to $\hat{\theta}$ geometrically fast: there exists a constant $\beta \in (0,1)$ that only depends on $(\|\hat{\theta}\|_\infty, \|\theta^{(0)}\|_\infty)$, such that
\[
\|\theta^{(k)} - \hat{\theta}\|_\infty \le \beta^{k-1} \|\theta^{(0)} - \hat{\theta}\|_\infty, \quad k \in \mathbb{N}_0. \tag{3.7}
\]
Conversely, if $d \notin \mathrm{conv}(\mathcal{W})^\circ$, then the sequence $\{\theta^{(k)}\}$ has a divergent subsequence.
[Figure 3.2: (a) Plot of $\log \|\theta^{(t)} - \hat{\theta}\|_\infty$ for various values of $r$ ($2 \le r \le 10$, $n = 200$), where $\hat{\theta}$ is the final value of $\theta^{(t)}$ when the algorithm converges; (b) scatter plot of the estimate $\hat{\theta}$ vs. the true parameter $\theta$ for $r = 2$; (c) scatter plot for $r = 5$.]
Figure 3.2 demonstrates the performance of the algorithm presented above. We set
n = 200 and sample θ ∈ [−1, 1]n uniformly at random. Then for each 2 ≤ r ≤ 10, we sample
a graph from the distribution P∗θ , compute the empirical degree sequence d, and run the
iterative algorithm starting with θ(0) = 0 until convergence. The left panel (Figure 3.2(a))
shows the rate of convergence (on a logarithmic scale) of the algorithm for various values of
r. We observe that the iterates {θ(t) } indeed converge geometrically fast to the MLE θ̂, but
the rate of convergence decreases as r increases. By examining the proof of Theorem 3.3 in
Section 5.2, we see that the term β has the expression
\[
\beta = 1 - \frac{1}{(r-1)^2} \min\left\{ \left(\frac{\exp(2K)-1}{\exp(2rK)-1}\right)^{2},\ \left(-\frac{\mu'(2K)}{\mu(-2K)}\right)^{2} \right\},
\]
where $K = 2\|\hat{\theta}\|_\infty + \|\theta^{(0)}\|_\infty$. This shows that $\beta$ is an increasing function of $r$, which explains the empirical decrease in the rate of convergence as $r$ increases.
Figures 3.2(b) and (c) show the plots of the estimate θ̂ versus the true θ. Notice that the
points lie close to the diagonal line, which suggests that the MLE θ̂ is very close to the true
parameter θ. Indeed, the following result shows that θ̂ is a consistent estimator of θ. Recall
that θ̂ is consistent if θ̂ converges in probability to θ as n → ∞.
Theorem 3.4. Let $M > 0$ and $k > 1$ be fixed. Given $\theta \in \mathbb{R}^n$ with $\|\theta\|_\infty \le M$, consider the problem of finding the MLE $\hat{\theta}$ of $\theta$ based on one graph sample $G \sim \mathbb{P}^*_\theta$. Then for sufficiently large $n$, with probability at least $1 - 2n^{-(k-1)}$ the MLE $\hat{\theta}$ exists and satisfies
\[
\|\hat{\theta} - \theta\|_\infty \le C \sqrt{\frac{k \log n}{n}},
\]
where $C$ is a constant that only depends on $M$.
3.2 Continuous weighted graphs
In this section we study weighted graphs with edge weights in R0 . The proofs of the results
presented here are provided in Section 5.3.
Characterization of the distribution
We take $\nu$ to be the Lebesgue measure on $\mathbb{R}_0$. The marginal log-partition function is
\[
Z_1(t) = \log \int_{\mathbb{R}_0} \exp(-ta)\, da = \begin{cases} \log(1/t) & \text{if } t > 0, \\ \infty & \text{if } t \le 0. \end{cases}
\]
Thus $\mathrm{Dom}(Z_1) = \mathbb{R}_+$, and the natural parameter space is
\[
\Theta = \{(\theta_1, \dots, \theta_n) \in \mathbb{R}^n : \theta_i + \theta_j > 0 \text{ for } i \neq j\}.
\]
For $\theta \in \Theta$, the edge weights $A_{ij}$ are independent exponential random variables with density
\[
p^*_{ij}(a) = (\theta_i + \theta_j) \exp\big(-(\theta_i + \theta_j)a\big) \quad \text{for } a \in \mathbb{R}_0,
\]
and mean parameter $\mathbb{E}_\theta[A_{ij}] = 1/(\theta_i + \theta_j)$. The corresponding mean function is given by
\[
\mu(t) = \frac{1}{t}, \quad t > 0.
\]
Existence, uniqueness, and consistency of the MLE
We now consider the problem of finding the MLE of $\theta$ from one graph sample $G \sim \mathbb{P}^*_\theta$. As we saw previously, the MLE $\hat{\theta} \in \Theta$ satisfies the moment-matching equation $\mathbb{E}_{\hat{\theta}}[\deg(A)] = d$, where $d$ is the degree sequence of the sample graph $G$. Equivalently, $\hat{\theta} \in \Theta$ is a solution to the system of equations
\[
d_i = \sum_{j \neq i} \frac{1}{\hat{\theta}_i + \hat{\theta}_j}, \quad i = 1, \dots, n. \tag{3.8}
\]
Remark 3.5. The system (3.8) is a special case of a general class that Sanyal, Sturmfels,
and Vinzant [36] study using algebraic geometry and matroid theory (extending the work of
Proudfoot and Speyer [33]). Define
\[
\chi(t) = \sum_{k=0}^{n} \left\{ {n \atop k} \right\} (n-1)^{(2)}_{k+n}\, (t-1)^k,
\]
in which $\left\{{n \atop k}\right\}$ is the Stirling number of the second kind and $(x)^{(2)}_{k+1} = x(x-2)\cdots(x-2k)$ is a generalized falling factorial. Then, there is a polynomial $H(d)$ in the $d_i$ such that for $d \in \mathbb{R}^n$ with $H(d) \neq 0$, the number of solutions $\theta \in \mathbb{R}^n$ to (3.8) is $(-1)^n \chi(0)$. Moreover, the polynomial $H(d)$ has degree $2(-1)^n (n\chi(0) + \chi'(0))$ and characterizes those $d$ for which the equations above have multiple roots. We refer to [36] for more details.
Since the natural parameter space Θ is open, Proposition 2.1 tells us that the MLE θ̂
exists and is unique if and only if the empirical degree sequence d belongs to the interior
M◦ of the mean parameter space M. We characterize the set of graphical sequences W and
determine its relation to the mean parameter space M.
We say d = (d1 , . . . , dn ) is a (continuous) graphical sequence if there is a graph G with
edge weights in R0 that realizes d. The finite discrete graphical sequences from Section 3.1
have combinatorial constraints because there are only finitely many possible edge weights
between any pair of vertices, and these constraints translate into a set of inequalities in the
generalized Erdős-Gallai criterion in Theorem 3.2. In the case of continuous weighted graphs,
however, we do not have these constraints because every edge can carry an arbitrarily large
weight. Therefore, the criterion for a continuous graphical sequence should be simpler than
in Theorem 3.2, as the following result shows.
Theorem 3.6. A sequence $(d_1, \dots, d_n) \in \mathbb{R}_0^n$ is graphic if and only if
\[
\max_{1 \le i \le n} d_i \le \frac{1}{2} \sum_{i=1}^n d_i. \tag{3.9}
\]
We note that condition (3.9) is implied by the case k = 1 in the conditions (3.4). This is
to be expected, since any finite discrete weighted graph is also a continuous weighted graph,
so finite discrete graphical sequences are also continuous graphical sequences.
Given the criterion in Theorem 3.6, we can write the set $\mathcal{W}$ of graphical sequences as
\[
\mathcal{W} = \Big\{(d_1, \dots, d_n) \in \mathbb{R}_0^n : \max_{1 \le i \le n} d_i \le \frac{1}{2} \sum_{i=1}^n d_i\Big\}.
\]
Moreover, we can also show that the set of graphical sequences coincides with the mean parameter space.
Lemma 3.7. The set W is convex, and M = W.
The result above, together with the result of Proposition 2.1, implies that the MLE $\hat{\theta}$ exists and is unique if and only if the empirical degree sequence $d$ belongs to the interior of the mean parameter space, which can be written explicitly as
\[
\mathcal{M}^\circ = \Big\{(d'_1, \dots, d'_n) \in \mathbb{R}_+^n : \max_{1 \le i \le n} d'_i < \frac{1}{2} \sum_{i=1}^n d'_i\Big\}.
\]
Example 3.8. Let $n = 3$ and $d = (d_1, d_2, d_3) \in \mathbb{R}^n$ with $d_1 \ge d_2 \ge d_3$. It is easy to see that the system of equations (3.8) gives us
\[
\frac{1}{\hat{\theta}_1 + \hat{\theta}_2} = \frac{1}{2}(d_1 + d_2 - d_3), \qquad
\frac{1}{\hat{\theta}_1 + \hat{\theta}_3} = \frac{1}{2}(d_1 - d_2 + d_3), \qquad
\frac{1}{\hat{\theta}_2 + \hat{\theta}_3} = \frac{1}{2}(-d_1 + d_2 + d_3),
\]
from which we obtain a unique solution $\hat{\theta} = (\hat{\theta}_1, \hat{\theta}_2, \hat{\theta}_3)$. Recall that $\hat{\theta} \in \Theta$ means $\hat{\theta}_1 + \hat{\theta}_2 > 0$, $\hat{\theta}_1 + \hat{\theta}_3 > 0$, and $\hat{\theta}_2 + \hat{\theta}_3 > 0$, so the equations above tell us that $\hat{\theta} \in \Theta$ if and only if $d_1 < d_2 + d_3$. In particular, this also implies $d_3 > d_1 - d_2 \ge 0$, so $d \in \mathbb{R}_+^3$. Hence, there is a unique solution $\hat{\theta} \in \Theta$ to the system of equations (3.8) if and only if $d \in \mathcal{M}^\circ$, as claimed above.
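A quick numerical check of the closed form in Example 3.8 (the degree sequence below is an arbitrary illustration satisfying $d_1 < d_2 + d_3$):

\begin{verbatim}
# Sketch: verifying Example 3.8 against the system (3.8).
d1, d2, d3 = 5.0, 4.0, 3.0
x = 0.5 * (d1 + d2 - d3)        # 1/(theta1 + theta2)
y = 0.5 * (d1 - d2 + d3)        # 1/(theta1 + theta3)
z = 0.5 * (-d1 + d2 + d3)       # 1/(theta2 + theta3)
theta1 = 0.5 * (1/x + 1/y - 1/z)
theta2 = 1/x - theta1
theta3 = 1/y - theta1
print(1/(theta1 + theta2) + 1/(theta1 + theta3), d1)   # both 5.0
print(1/(theta1 + theta2) + 1/(theta2 + theta3), d2)   # both 4.0
print(1/(theta1 + theta3) + 1/(theta2 + theta3), d3)   # both 3.0
\end{verbatim}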
Finally, we prove that the MLE θ̂ is a consistent estimator of θ.
Theorem 3.9. Let $M \ge L > 0$ and $k \ge 1$ be fixed. Given $\theta \in \Theta$ with $L \le \theta_i + \theta_j \le M$, $i \neq j$, consider the problem of finding the MLE $\hat{\theta} \in \Theta$ of $\theta$ from one graph sample $G \sim \mathbb{P}^*_\theta$. Then for sufficiently large $n$, with probability at least $1 - 2n^{-(k-1)}$ the MLE $\hat{\theta} \in \Theta$ exists and satisfies
\[
\|\hat{\theta} - \theta\|_\infty \le \frac{100 M^2}{L} \sqrt{\frac{k \log n}{\gamma n}},
\]
where $\gamma > 0$ is a universal constant.
3.3 Infinite discrete weighted graphs
We now turn our focus to weighted graphs with edge weights in N0 . The proofs of the results
presented here can be found in Section 5.4.
Characterization of the distribution
We take $\nu$ to be the counting measure on $\mathbb{N}_0$. In this case the marginal log-partition function is given by
\[
Z_1(t) = \log \sum_{a=0}^{\infty} \exp(-at) = \begin{cases} -\log\big(1 - \exp(-t)\big) & \text{if } t > 0, \\ \infty & \text{if } t \le 0. \end{cases}
\]
Thus, the domain of $Z_1$ is $\mathrm{Dom}(Z_1) = (0, \infty)$, and the natural parameter space is
\[
\Theta = \{(\theta_1, \dots, \theta_n) \in \mathbb{R}^n : \theta_i + \theta_j > 0 \text{ for } i \neq j\},
\]
which is the same natural parameter space as in the case of continuous weighted graphs in the preceding section. Given $\theta \in \Theta$, the edge weights $A_{ij}$ are independent geometric random variables with probability mass function
\[
\mathbb{P}^*(A_{ij} = a) = \big(1 - \exp(-\theta_i - \theta_j)\big) \exp\big(-(\theta_i + \theta_j)a\big), \quad a \in \mathbb{N}_0.
\]
The mean parameters are
\[
\mathbb{E}_{\mathbb{P}^*}[A_{ij}] = \frac{\exp(-\theta_i - \theta_j)}{1 - \exp(-\theta_i - \theta_j)} = \frac{1}{\exp(\theta_i + \theta_j) - 1},
\]
induced by the mean function
\[
\mu(t) = \frac{1}{\exp(t) - 1}, \quad t > 0.
\]
Existence, uniqueness, and consistency of the MLE
Consider the problem of finding the MLE of $\theta$ from one graph sample $G \sim \mathbb{P}^*_\theta$. Let $d$ denote the degree sequence of $G$. Then the MLE $\hat{\theta} \in \Theta$, which satisfies the moment-matching equation $\mathbb{E}_{\hat{\theta}}[\deg(A)] = d$, is a solution to the system of equations
\[
d_i = \sum_{j \neq i} \frac{1}{\exp(\hat{\theta}_i + \hat{\theta}_j) - 1}, \quad i = 1, \dots, n. \tag{3.10}
\]
We note that the natural parameter space $\Theta$ is open, so by Proposition 2.1, the MLE $\hat{\theta}$ exists and is unique if and only if $d \in \mathcal{M}^\circ$, where $\mathcal{M}$ is the mean parameter space. Since $\nu^{\binom{n}{2}}$ is the counting measure on $\mathbb{N}_0^{\binom{n}{2}}$, the set $\mathcal{P}$ contains all the Dirac measures, so we know $\mathcal{M} = \mathrm{conv}(\mathcal{W})$ from Proposition 2.2. Here $\mathcal{W}$ is the set of all (infinite discrete) graphical sequences, namely, the set of degree sequences of weighted graphs with edge weights in $\mathbb{N}_0$. The following result provides a precise criterion for such graphical sequences. Note that condition (3.11) below is implied by the limit $r \to \infty$ in Theorem 3.2.
Theorem 3.10. A sequence $(d_1, \dots, d_n) \in \mathbb{N}_0^n$ is graphic if and only if $\sum_{i=1}^n d_i$ is even and
\[
\max_{1 \le i \le n} d_i \le \frac{1}{2} \sum_{i=1}^n d_i. \tag{3.11}
\]
The criterion in Theorem 3.10 allows us to write an explicit form for the set of graphical sequences $\mathcal{W}$,
\[
\mathcal{W} = \Big\{(d_1, \dots, d_n) \in \mathbb{N}_0^n : \sum_{i=1}^n d_i \text{ is even and } \max_{1 \le i \le n} d_i \le \frac{1}{2} \sum_{i=1}^n d_i\Big\}.
\]
Now we need to characterize $\mathrm{conv}(\mathcal{W})$. Let $\mathcal{W}_1$ denote the set of all continuous graphical sequences from Theorem 3.6, when the edge weights are in $\mathbb{R}_0$,
\[
\mathcal{W}_1 = \Big\{(d_1, \dots, d_n) \in \mathbb{R}_0^n : \max_{1 \le i \le n} d_i \le \frac{1}{2} \sum_{i=1}^n d_i\Big\}.
\]
It turns out that when we take the convex hull of $\mathcal{W}$, we essentially recover $\mathcal{W}_1$.
Lemma 3.11. $\overline{\mathrm{conv}(\mathcal{W})} = \mathcal{W}_1$.
Recalling that a convex set and its closure have the same interior points, the result above gives us
\[
\mathcal{M}^\circ = \mathrm{conv}(\mathcal{W})^\circ = \overline{\mathrm{conv}(\mathcal{W})}^{\,\circ} = \mathcal{W}_1^\circ = \Big\{(d_1, \dots, d_n) \in \mathbb{R}_+^n : \max_{1 \le i \le n} d_i < \frac{1}{2} \sum_{i=1}^n d_i\Big\}.
\]
Example 3.12. Let $n = 3$ and $d = (d_1, d_2, d_3) \in \mathbb{R}^n$ with $d_1 \ge d_2 \ge d_3$. It can be easily verified that the system of equations (3.10) gives us
\[
\hat{\theta}_1 + \hat{\theta}_2 = \log\Big(1 + \frac{2}{d_1 + d_2 - d_3}\Big), \qquad
\hat{\theta}_1 + \hat{\theta}_3 = \log\Big(1 + \frac{2}{d_1 - d_2 + d_3}\Big), \qquad
\hat{\theta}_2 + \hat{\theta}_3 = \log\Big(1 + \frac{2}{-d_1 + d_2 + d_3}\Big),
\]
from which we can obtain a unique solution $\hat{\theta} = (\hat{\theta}_1, \hat{\theta}_2, \hat{\theta}_3)$. Recall that $\hat{\theta} \in \Theta$ means $\hat{\theta}_1 + \hat{\theta}_2 > 0$, $\hat{\theta}_1 + \hat{\theta}_3 > 0$, and $\hat{\theta}_2 + \hat{\theta}_3 > 0$, so the equations above tell us that $\hat{\theta} \in \Theta$ if and only if $2/(-d_1 + d_2 + d_3) > 0$, or equivalently, $d_1 < d_2 + d_3$. This also implies $d_3 > d_1 - d_2 \ge 0$, so $d \in \mathbb{R}_+^3$. Thus, the system of equations (3.10) has a unique solution $\hat{\theta} \in \Theta$ if and only if $d \in \mathcal{M}^\circ$, as claimed above.
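The analogous check for Example 3.12, with the same illustrative degree sequence:

\begin{verbatim}
# Sketch: verifying Example 3.12 against the system (3.10).
from math import exp, log

d1, d2, d3 = 5.0, 4.0, 3.0
s12 = log(1 + 2 / (d1 + d2 - d3))      # theta1 + theta2
s13 = log(1 + 2 / (d1 - d2 + d3))      # theta1 + theta3
s23 = log(1 + 2 / (-d1 + d2 + d3))     # theta2 + theta3
g = lambda s: 1 / (exp(s) - 1)         # mean of a geometric edge with parameter s
print(g(s12) + g(s13), d1)             # both 5.0
print(g(s12) + g(s23), d2)             # both 4.0
print(g(s13) + g(s23), d3)             # both 3.0
\end{verbatim}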
Finally, we prove that with high probability the MLE θ̂ exists and converges to θ.
Theorem 3.13. Let $M \ge L > 0$ and $k \ge 1$ be fixed. Given $\theta \in \Theta$ with $L \le \theta_i + \theta_j \le M$, $i \neq j$, consider the problem of finding the MLE $\hat{\theta} \in \Theta$ of $\theta$ from one graph sample $G \sim \mathbb{P}^*_\theta$. Then for sufficiently large $n$, with probability at least $1 - 3n^{-(k-1)}$ the MLE $\hat{\theta} \in \Theta$ exists and satisfies
\[
\|\hat{\theta} - \theta\|_\infty \le \frac{8\exp(5M)}{L} \sqrt{\frac{k \log n}{\gamma n}},
\]
where $\gamma > 0$ is a universal constant.
Chapter 4
Inverses of diagonally dominant matrices
In this chapter we prove a bound on the inverses of symmetric, diagonally dominant positive matrices. This bound plays a crucial role in proving the consistency of the MLE in
Sections 3.2 and 3.3. The result that we prove in this chapter also applies to a large class of
matrices to which the classical bounds in the literature are not applicable. Our analysis requires
an intricate study of the structure of the problem space and the combinatorics of the limit
of matrix inverses. We refer the reader to the manuscript [14] for other applications and
additional results, including a refinement of Hadamard’s inequality for diagonally dominant
matrices.
4.1 Statement of the result
An $n \times n$ real matrix $J$ is diagonally dominant if
\[
\Delta_i(J) := |J_{ii}| - \sum_{j \neq i} |J_{ij}| \ge 0, \quad \text{for } i = 1, \dots, n.
\]
Irreducibly diagonally dominant matrices are always invertible, and such matrices arise often
in theory and applications. As an example, the work of Spielman and Teng [44, 45] gives
algorithms to solve symmetric, diagonally dominant linear systems in nearly-linear time
in the input size, a fundamental advance in algorithmic complexity theory and numerical
computation. These systems are important since they arise naturally in many practical
applications of linear algebra to graph theory [43].
In this chapter, we study the inverse of diagonally dominant matrices that have only positive entries. A classical result of Varah [49] states that if $J$ is strictly diagonally dominant, i.e., if $\Delta_i(J) > 0$ for $1 \le i \le n$, then the inverse of $J$ satisfies the bound
\[
\|J^{-1}\|_\infty \le \max_{1 \le i \le n} \frac{1}{\Delta_i(J)}.
\]
Recall that $\|\cdot\|_\infty$ is the maximum absolute row sum of a matrix, which is the matrix norm induced by the $\ell_\infty$-norm on vectors in $\mathbb{R}^n$. Generalizations of this basic estimate can be found in [50], [39], and [25], but all involve the quantity $\max_{1 \le i \le n} 1/\Delta_i(J)$. In practice, however, one sometimes requires bounds when $\Delta_i(J) = 0$ for some $i$, in which case the estimates appearing in [49, 50, 39, 25] do not apply. A particularly interesting case is when $\Delta_i(J) = 0$ for all $i$; we call such matrices diagonally balanced. As we shall see in Chapter 5, this is indeed the scenario that we need to analyze in proving Theorems 3.9 and 3.13.
We prove a tight bound on $\|J^{-1}\|_\infty$ for symmetric diagonally dominant $J$ with positive entries that is independent of the quantities $\Delta_i(J)$, and thus also of the maximum entry of $J$. Let $S = (n-2)I_n + 1_n 1_n^\top$ be the diagonally balanced matrix whose off-diagonal entries are all equal to 1, and recall the Loewner partial ordering on symmetric matrices: $A \succeq B$ means that $A - B$ is positive semidefinite. We shall also write $A \ge B$ if $A - B$ is a nonnegative matrix. It can be immediately seen that if $\ell > 0$ and $J$ is a symmetric diagonally dominant matrix satisfying $J \ge \ell S$, then $J \succeq \ell S \succ 0$; in particular, $J$ is invertible. Throughout this chapter, $I_n$ and $1_n$ denote the $n \times n$ identity matrix and the $n$-dimensional column vector consisting of all ones, respectively. We also write $I$ and $1$ if the dimension $n$ is understood. The following is the main result in this chapter.
Theorem 4.1. Let $n \ge 3$. For any symmetric diagonally dominant matrix $J$ with $J_{ij} \ge \ell > 0$, we have
\[
\|J^{-1}\|_\infty \le \frac{1}{\ell}\|S^{-1}\|_\infty = \frac{3n-4}{2\ell(n-2)(n-1)}.
\]
Moreover, equality is achieved if and only if $J = \ell S$.
Remark 4.2. Theorem 4.1 fails to hold if we relax the assumption that $J$ be symmetric. For $t \ge 0$, consider the following diagonally balanced matrices and their inverses:
\[
J_t = \begin{pmatrix} 2+t & 1 & 1+t \\ 1 & 2+t & 1+t \\ 1 & 1 & 2 \end{pmatrix}; \qquad
J_t^{-1} = \frac{1}{4} \begin{pmatrix} \frac{t+3}{t+1} & \frac{t-1}{t+1} & -t-1 \\[3pt] \frac{t-1}{t+1} & \frac{t+3}{t+1} & -t-1 \\[3pt] -1 & -1 & t+3 \end{pmatrix}.
\]
We have $\|J_t^{-1}\|_\infty \to \infty$ as $t \to \infty$.
In other words, the map $J \mapsto \|J^{-1}\|_\infty$ over the (translated) cone of symmetric diagonally dominant matrices $J \ge \ell S$ is maximized uniquely at the point of the cone; i.e., when $J = \ell S$. If the off-diagonal entries of $J$ are bounded above by $m$ and the largest of the diagonal dominances $\Delta_i(J)$ is $\delta$, we also have the trivial lower bound
\[
\frac{1}{2m(n-1) + \delta} \le \|J^{-1}\|_\infty,
\]
which follows from submultiplicativity of the matrix norm $\|\cdot\|_\infty$. Therefore, the value of $\|J^{-1}\|_\infty = \Theta(1/n)$ is tightly constrained for bounded $\ell$, $m$, and $\delta$.
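Both Theorem 4.1 and Remark 4.2 are easy to probe numerically. The sketch below compares $\|J^{-1}\|_\infty$ with the bound for a few randomly generated symmetric diagonally dominant matrices (the sampling scheme is an arbitrary illustrative choice) and reproduces the blow-up of the nonsymmetric family $J_t$.

\begin{verbatim}
# Sketch: probing Theorem 4.1 and the counterexample of Remark 4.2.
import numpy as np

def norm_inf(A):
    return np.abs(A).sum(axis=1).max()           # maximum absolute row sum

rng = np.random.default_rng(3)
n, ell = 8, 1.0
bound = (3 * n - 4) / (2 * ell * (n - 2) * (n - 1))

for _ in range(5):
    J = rng.uniform(ell, 3.0, size=(n, n))
    J = (J + J.T) / 2                            # symmetric, off-diagonal entries >= ell
    np.fill_diagonal(J, 0.0)
    np.fill_diagonal(J, J.sum(axis=1) + rng.uniform(0.0, 2.0, n))   # Delta_i(J) >= 0
    assert norm_inf(np.linalg.inv(J)) <= bound

for t in [1.0, 10.0, 100.0, 1000.0]:
    Jt = np.array([[2 + t, 1.0, 1 + t],
                   [1.0, 2 + t, 1 + t],
                   [1.0, 1.0, 2.0]])
    print(t, norm_inf(np.linalg.inv(Jt)))        # grows without bound as t increases
\end{verbatim}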
We now probe some of the difficulty of Theorem 4.1 by first deriving an estimate using standard bounds in matrix analysis. The relation $J \succeq \ell S$ is equivalent to $J^{-1} \preceq (\ell S)^{-1} = \frac{1}{\ell} S^{-1}$ [19, Corollary 7.7.4], and therefore by a basic inequality [20, p. 214, Ex. 14], we have $\|J^{-1}\| \le \frac{1}{\ell}\|S^{-1}\|$ for any unitarily invariant matrix norm $\|\cdot\|$, such as the spectral $\|\cdot\|_2$, Frobenius, or Ky–Fan norms. It follows, for instance, that
\[
\|J^{-1}\|_\infty \le \sqrt{n}\, \|J^{-1}\|_2 \le \frac{\sqrt{n}}{\ell}\|S^{-1}\|_2 = \frac{\sqrt{n}}{(n-2)\ell}. \tag{4.1}
\]
However, this bound is $O(1/\sqrt{n})$, whereas the bound given in Theorem 4.1 is $O(1/n)$. This difference turns out to be important in the proofs of Theorems 3.9 and 3.13.
Another standard approach to proving norm estimates such as the one in Theorem 4.1 is a perturbation analysis. More specifically, given a symmetric diagonally dominant $J$ with entries bounded below by $\ell$, one tries to replace each entry $J_{ij}$ by $\ell$ and prove that the norm of the inverse of the resulting matrix is larger. However, such a method will not succeed, even in the balanced case, as the following examples demonstrate.
Example 4.3. Consider the following two balanced matrices:
\[
J = \begin{pmatrix} 12 & 4 & 1 & 7 \\ 4 & 9 & 3 & 2 \\ 1 & 3 & 7 & 3 \\ 7 & 2 & 3 & 12 \end{pmatrix}
\quad \text{and} \quad
H = \begin{pmatrix} 9 & 1 & 1 & 7 \\ 1 & 6 & 3 & 2 \\ 1 & 3 & 7 & 3 \\ 7 & 2 & 3 & 12 \end{pmatrix}.
\]
Here, $H$ is obtained from $J$ by changing its $(1,2)$-entry to 1 and then keeping the resulting matrix balanced. A direct computation shows that $\|H^{-1}\|_\infty < \|J^{-1}\|_\infty$. As another example, the following two matrices:
\[
J = \begin{pmatrix} 3 & 2 & 1 \\ 2 & 3 & 1 \\ 1 & 1 & 2 \end{pmatrix}
\quad \text{and} \quad
H = \begin{pmatrix} 3 & 1 & 1 \\ 1 & 3 & 1 \\ 1 & 1 & 2 \end{pmatrix}
\]
also have $\|H^{-1}\|_\infty < \|J^{-1}\|_\infty$. Here, the $(1,2)$-entry was changed without keeping the matrix balanced.
We next describe an interesting special case revealing some surprising combinatorics
underlying Theorem 4.1. Let P be a symmetric diagonally dominant matrix with Pij ∈ {0, 1}
and ∆i (P ) ∈ {0, 2}. Each such matrix P is a signless Laplacian of an undirected, unweighted
graph G, possibly with self-loops. The limits
\[
N = \lim_{t \to \infty} (S + tP)^{-1} \tag{4.2}
\]
form special cases of Theorem 4.1, and we compute them explicitly in Section 4.3. As we
shall see, they are an essential calculation for our proof. The matrices N are determined by
the bipartition structure of the connected components of G. For instance, if G is connected
and not bipartite, the limit (4.2) is the zero matrix (see Corollary 4.11); see also the example
below. For some recent work on the general eigenstructure of signless Laplacians, we refer
the reader to [11] and the references therein.
Example 4.4. Consider the chain graph $G$ with edges $\{1, n\}$ and $\{i, i+1\}$ for $i = 1, \dots, n-1$. If $n$ is odd then $N = 0$ since $G$ is not bipartite, while if $n$ is even, the limit $N$ has alternating entries:
\[
N_{ij} = \frac{(-1)^{i+j}}{n(n-2)}.
\]
As another example, consider the star graph $G$, which has edges $\{1, i\}$ for $i = 2, \dots, n$. In this case,
\[
N_{1i} = N_{i1} = -\frac{1}{2(n-1)(n-2)}, \quad i = 2, \dots, n,
\]
and
\[
N_{ij} = \frac{1}{2(n-1)(n-2)} \quad \text{otherwise}.
\]
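These limits are easy to check numerically by taking $t$ large in (4.2); the sketch below does this for the chain graph with $n$ even (the graph size and the value of $t$ are illustrative choices).

\begin{verbatim}
# Sketch: checking the chain-graph formula of Example 4.4 numerically.
import numpy as np

n = 6
S = (n - 2) * np.eye(n) + np.ones((n, n))
A = np.zeros((n, n))
for i in range(n):                                # edges {i, i+1} and {1, n}
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
P = np.diag(A.sum(axis=1)) + A                    # signless Laplacian
N_approx = np.linalg.inv(S + 1e8 * P)
i, j = np.indices((n, n))
N_formula = (-1.0) ** (i + j) / (n * (n - 2))
print(np.abs(N_approx - N_formula).max())         # of order 1/t
\end{verbatim}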
We finish this introduction with an organization of the rest of this chapter. Theorem 4.1 will be generalized in Theorem 4.23, where we consider diagonally dominant matrices $J \ge S(\alpha, \ell) := \alpha I_n + \ell 1_n 1_n^\top$ with $\alpha \ge (n-2)\ell > 0$. We break up the proof of this general theorem into three main steps in Sections 4.2 to 4.5, where we write $S(\alpha, \ell)$ as $S$ for simplicity. The first step considers the problem of maximizing $\|J^{-1}\|_\infty$ over symmetric diagonally dominant $J$ with $J_{ij}$ and $\Delta_i(J)$ in some finite intervals (Section 4.2). In this case, the maximum is achieved when $J$ is on the corners of the space; namely, when $J_{ij}$ and $\Delta_i(J)$ are one of the endpoints of the finite intervals. In the second step (Section 4.3), we analyze the behavior of corner matrices at infinity and show that the limit of $\|(S + tP)^{-1}\|_\infty$ as $t \to \infty$, when $P$ is a signless Laplacian ($P_{ij} \in \{0,1\}$ and $\Delta_i(P) \in \{0,2\}$), is at most $\|S^{-1}\|_\infty$. Combined with the first step, this verifies that the matrix $S$ maximizes $\|J^{-1}\|_\infty$ over the space of symmetric diagonally dominant matrices $J \ge S$. The remainder of the argument deals with the behavior of $\|J^{-1}\|_\infty$ near $S$ to show that $S$ is indeed the unique maximizer (Section 4.4). All three steps are combined in Section 4.5 to give the proof of the main result.
4.2 Reduction to exact limiting cases
To prove Theorem 4.1, we need to show that the maximum of $\|J^{-1}\|_\infty$ over the space of symmetric diagonally dominant matrices $J \ge S := \alpha I_n + \ell 1_n 1_n^\top$ is achieved at $J = S$. A priori, it is not even clear that a maximum exists, since this space is not compact. In this section we consider maximizing $\|J^{-1}\|_\infty$ over compact sets of symmetric diagonally dominant matrices $J$, and we show that the maxima occur at the corners of the space. In subsequent sections we analyze these corner matrices in more detail.
Fix $m \ge 1$, and let $\mathcal{D} = \mathcal{D}_m$ denote the set of $n \times n$ matrices of the form $J = S + (m - \ell)P$ where $P$ is some symmetric diagonally dominant matrix satisfying
\[
0 \le P_{ij} \le 1 \;\text{ for } i \neq j \qquad \text{and} \qquad 0 \le \Delta_i(P) \le 2 \;\text{ for } i = 1, \dots, n.
\]
We say that $J \in \mathcal{D}$ is a corner matrix if
\[
P_{ij} \in \{0, 1\} \;\text{ for } i \neq j \qquad \text{and} \qquad \Delta_i(P) \in \{0, 2\} \;\text{ for } i = 1, \dots, n.
\]
Equivalently, $J$ is a corner matrix if $P$ is a signless Laplacian matrix. Let $\mathcal{T}$ denote the set of matrices $J \in \mathcal{D}$ that maximize $\|J^{-1}\|_\infty$. This set is closed and nonempty since the function $J \mapsto \|J^{-1}\|_\infty$ is continuous and $\mathcal{D}$ is compact in the usual topology. Let $e_1, \dots, e_n$ be the standard column basis for $\mathbb{R}^n$, and set $e_{ii} = e_i e_i^\top$ and $e_{ij} = (e_i + e_j)(e_i + e_j)^\top$ for all $i \neq j$.
Our main result in this section is the following.
Proposition 4.5. Every J ∈ T is path-connected to a corner matrix.
Proof. Let $J \in \mathcal{T}$. We will show that if $\ell < J_{ij} < m$ for some $i \neq j$, there is a path in $\mathcal{T}$ from $J$ to a matrix $J'$ that differs from $J$ only in the $(i,j)$-entry, with $J'_{ij} \in \{\ell, m\}$. Similarly, if $0 < \Delta_i(J) < 2(m - \ell)$ for some $1 \le i \le n$, then we find a suitable $J'$ with $\Delta_i(J') \in \{0, 2(m-\ell)\}$ that differs from $J$ in the $(i,i)$-entry. Repeatedly applying these steps, it follows that there is a path in $\mathcal{T}$ from any $J \in \mathcal{T}$ to a corner matrix.

For the first part, suppose that $\ell < J_{ij} < m$ for some $i \neq j$. Consider the nonempty, closed set
\[
W = \{J + te_{ij} : \ell \le J_{ij} + t \le m\} \cap \mathcal{T}.
\]
We claim that $W$ contains a matrix $J'$ with $J'_{ij} \in \{\ell, m\}$. Suppose not, and let $J' \in W$ be a matrix with minimum $(i,j)$-entry. By Lemma 4.6(a) below, $J' + te_{ij} \in \mathcal{T}$ for all $t$ in a small neighborhood of the origin. Thus, there is another matrix in $\mathcal{T}$ that has a smaller $(i,j)$-entry than $J'$, a contradiction. The proof for the other part is similar (using Lemma 4.6(b)).
To complete the proof of Proposition 4.5, it remains to show the following.
Lemma 4.6. Let $J \in \mathcal{T}$ and let $i \neq j$ be distinct indices in $\{1, \dots, n\}$.
(a) If $1 < J_{ij} < m$, then $J + te_{ij} \in \mathcal{T}$ for all $t \in \mathbb{R}$ in some neighborhood of the origin.
(b) If $0 < \Delta_i(J) < 2(m-1)$, then $J + te_{ii} \in \mathcal{T}$ for all $t \in \mathbb{R}$ in some neighborhood of the origin.
Proof. We only prove (a), as (b) is analogous. Suppose that $1 < J_{ij} < m$ for some $i \neq j$. Let $K = J^{-1}$ and set $K_1, \dots, K_n$ to be its columns. Also, let $q \in \arg\max_{1 \le p \le n} \|K_p\|_1$ and $I = \{1 \le p \le n : K_{pq} \neq 0\}$, so that
\[
\sum_{p \in I} |K_{pq}| = \|K_q\|_1 = \|K\|_\infty = \|J^{-1}\|_\infty.
\]
By the Sherman–Morrison–Woodbury formula, we have
\[
(J + te_{ij})^{-1} = K - \frac{t}{1 + t(e_i + e_j)^\top K (e_i + e_j)} K e_{ij} K = K - \frac{t}{1 + \eta_{ij} t} (K_i + K_j)(K_i + K_j)^\top,
\]
where $\eta_{ij} := (e_i + e_j)^\top K (e_i + e_j) > 0$ since $K \succ 0$. The formula implies that for sufficiently small $\varepsilon > 0$ and all $|t| \le \varepsilon$, the $(p,q)$-entry of $(J + te_{ij})^{-1}$ has the same sign as $K_{pq} \neq 0$ for all $p \in I$. Let us further suppose that $\varepsilon$ is small enough so that $J + te_{ij} \in \mathcal{D}$ and $1 + \eta_{ij} t > 0$ for all $|t| \le \varepsilon$. The 1-norm of the $q$-th column $(J + te_{ij})^{-1}_q$ of the matrix $(J + te_{ij})^{-1}$ now satisfies
\[
\|(J + te_{ij})^{-1}_q\|_1 = \|J^{-1}\|_\infty - \frac{\varphi_{ij}\, t}{1 + \eta_{ij} t} + \frac{\psi_{ij}\, |t|}{1 + \eta_{ij} t}, \quad \text{for all } |t| \le \varepsilon,
\]
where $\varphi_{ij} := (K_{iq} + K_{jq}) \sum_{p \in I} \mathrm{sgn}(K_{pq})(K_{ip} + K_{jp})$ and $\psi_{ij} := |K_{iq} + K_{jq}| \sum_{p \notin I} |K_{ip} + K_{jp}|$. Recalling that $J \in \mathcal{T}$ achieves the maximum $\|J^{-1}\|_\infty$, it follows that
\[
\varphi_{ij}\, t \ge \psi_{ij}\, |t|, \quad \text{for all } |t| < \varepsilon.
\]
This inequality implies that $\psi_{ij} = \varphi_{ij} = 0$, so $\|(J + te_{ij})^{-1}\|_\infty = \|J^{-1}\|_\infty$ for all $|t| < \varepsilon$, as required.
4.3 Boundary combinatorics and exact formulae
In the previous section, we saw that corner matrices played an important role in the optimization of $\|J^{-1}\|_\infty$ over all matrices $J \ge S := \alpha I_n + \ell 1_n 1_n^\top$. A corner matrix may be expressed as
\[
J = S + (m - \ell) P,
\]
in which $P$ is a symmetric diagonally dominant matrix with $P_{ij} \in \{0, 1\}$ for all $i \neq j$ and $\Delta_i(P) \in \{0, 2\}$ for $1 \le i \le n$. Every such matrix $P$ is a signless Laplacian of an undirected unweighted graph (possibly with self-loops) $G = (V, E)$ on vertices $V = \{1, \dots, n\}$ and edges $E$. That is, we can write $P = D + A$, where $D$ is the diagonal degree matrix of $G$ and $A$ is its adjacency matrix. We study the limit (4.2) using the combinatorics of the graphs associated to the matrix $P$.
Example 4.7. Let $S = (n-2)I_n + 1_n 1_n^\top$ and let $G$ be the chain graph from Example 4.4. For $n = 4$,
\[
S + tP = \begin{pmatrix} 3+2t & 1+t & 1 & 1+t \\ 1+t & 3+2t & 1+t & 1 \\ 1 & 1+t & 3+2t & 1+t \\ 1+t & 1 & 1+t & 3+2t \end{pmatrix},
\]
with inverse
\[
(S + tP)^{-1} = \frac{1}{4(2t+3)(t+1)} \begin{pmatrix} t^2+5t+5 & -(t+1)^2 & t^2+t-1 & -(t+1)^2 \\ -(t+1)^2 & t^2+5t+5 & -(t+1)^2 & t^2+t-1 \\ t^2+t-1 & -(t+1)^2 & t^2+5t+5 & -(t+1)^2 \\ -(t+1)^2 & t^2+t-1 & -(t+1)^2 & t^2+5t+5 \end{pmatrix}.
\]
Each entry of $(S + tP)^{-1}$ is a rational function of $t$, with numerator and denominator both quadratic. Thus, each entry converges to a constant as $t \to \infty$, and from the expression above, we see that the limit matrix $N$ has entries $N_{ij} = (-1)^{i+j}/8$, as predicted by the formula in Example 4.4.

If one adds the edge $\{1, 3\}$, the corresponding matrix $(S + tP)^{-1}$ is
\[
(S + tP)^{-1} = \frac{1}{4(t+3)(t+1)} \begin{pmatrix} 2t+5 & -t-1 & -1 & -t-1 \\ -t-1 & 3t+5 & -t-1 & t-1 \\ -1 & -t-1 & 2t+5 & -t-1 \\ -t-1 & t-1 & -t-1 & 3t+5 \end{pmatrix}.
\]
Thus $N = 0$, as the graph is no longer bipartite (see Corollary 4.11 below).
We begin with the following simple fact that allows us to invert certain classes of matrices
explicitly.
Lemma 4.8. The following identity holds for any $\alpha \neq 0$ and $\ell \neq -\alpha/n$:
\[
(\alpha I_n + \ell 1_n 1_n^\top)^{-1} = \frac{1}{\alpha} I_n - \frac{\ell}{\alpha(\alpha + \ell n)} 1_n 1_n^\top.
\]
Given a signless Laplacian $P$, let $L \in \mathbb{R}^{n \times |E|}$ be the incidence matrix of the graph $G$ associated to $P$; that is, for every vertex $v \in V$ and edge $e \in E$, we have
\[
L_{v,e} = \begin{cases} 1 & \text{if } v \text{ is in } e \text{ and } e \text{ is not a self-loop}, \\ \sqrt{2} & \text{if } v \text{ is in } e \text{ and } e \text{ is the self-loop } (v,v), \\ 0 & \text{otherwise}. \end{cases}
\]
Consequently, $P = LL^\top$. Using this decomposition of $P$, we derive the following formula for $N$.
Proposition 4.9. The limit $N$ in (4.2) satisfies
\[
N = S^{-1} - S^{-1/2} (X^\top)^\dagger X^\top S^{-1/2},
\]
where $X = S^{-1/2} L$ and $(X^\top)^\dagger$ is the Moore–Penrose pseudoinverse of $X^\top$. Furthermore, $NL = 0$.
Proof. Using the Sherman–Morrison–Woodbury matrix formula to expand $(S + tLL^\top)^{-1}$, we calculate
\[
\begin{aligned}
N &= \lim_{t\to\infty} (S + tLL^\top)^{-1} \\
&= \lim_{t\to\infty} \big[ S^{-1} - S^{-1} L (t^{-1} I + L^\top S^{-1} L)^{-1} L^\top S^{-1} \big] \\
&= S^{-1} - S^{-1/2} \big[ \lim_{t\to\infty} X (t^{-1} I + X^\top X)^{-1} \big] X^\top S^{-1/2} \\
&= S^{-1} - S^{-1/2} (X^\top)^\dagger X^\top S^{-1/2},
\end{aligned}
\]
where in the last step we used an elementary pseudoinverse identity in matrix analysis [19, p. 422, Ex. 9]. To show that $NL = 0$, we note that $N$ is symmetric and that
\[
(NL)^\top = L^\top N = X^\top S^{-1/2} - X^\top (X^\top)^\dagger X^\top S^{-1/2} = 0.
\]
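As a numerical sanity check of Proposition 4.9, the following sketch compares the pseudoinverse formula with the limit (4.2) evaluated at a large $t$, using the star graph as an arbitrary test case.

\begin{verbatim}
# Sketch: Proposition 4.9 versus a direct evaluation of the limit (4.2), for the star graph.
import numpy as np

n = 5
S = (n - 2) * np.eye(n) + np.ones((n, n))
L = np.zeros((n, n - 1))
for e, i in enumerate(range(1, n)):
    L[0, e] = L[i, e] = 1.0                       # incidence matrix of the edges {1, i}
P = L @ L.T                                       # signless Laplacian of the star graph

w, V = np.linalg.eigh(S)
S_inv_half = V @ np.diag(1.0 / np.sqrt(w)) @ V.T  # S^{-1/2}
X = S_inv_half @ L
N_formula = np.linalg.inv(S) - S_inv_half @ np.linalg.pinv(X.T) @ X.T @ S_inv_half
N_limit = np.linalg.inv(S + 1e8 * P)
print(np.abs(N_formula - N_limit).max())          # of order 1/t
\end{verbatim}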
Let N1 , . . . , Nn denote the columns of the matrix N . The following is immediate from
the fact that N L = 0.
Corollary 4.10. For each edge {i, j} of G, we have Ni = −Nj . In particular, Ni = 0 for
each self-loop {i, i}.
Suppose that the graph G has connected components G1 , . . . , Gk . After relabeling the
vertices, both P and L are block-diagonal with matrices P1 , . . . , Pk and L1 , . . . , Lk along the
diagonal. Furthermore, $P_i = L_i L_i^\top$ for each $1 \le i \le k$. The components of $G$ also induce a
block-structure on the limit N , and we denote these blocks by N [i, j] for i, j ∈ {1, . . . , k}.
These blocks display many interesting symmetries. Firstly, the entries in each block are all
equal up to sign. Secondly, the signs in a block N [i, j] depend on the bipartite structures of
the components Gi and Gj . We say that a bipartite graph is (p, q)-bipartite if the partitions
are of sizes p and q, respectively. Note that bipartite graphs cannot have self-loops.
Corollary 4.11. In each block $N[i,j]$, the entries are equal up to sign. If the $i$-th component $G_i$ of $G$ is not bipartite, then $N[i,j] = N[j,i] = 0$ for all $j = 1, \dots, k$. In particular, $N = 0$ if and only if all components $G_i$ are not bipartite. Suppose $G_i$ is $(p_i, q_i)$-bipartite and $G_j$ is $(p_j, q_j)$-bipartite. Then, after relabeling the vertices, the matrix $N[i,j]$ has the block structure
\[
N[i,j] = c_{ij} \begin{pmatrix} 1_{p_i} 1_{p_j}^\top & -1_{p_i} 1_{q_j}^\top \\ -1_{q_i} 1_{p_j}^\top & 1_{q_i} 1_{q_j}^\top \end{pmatrix}, \quad \text{for some constant } c_{ij} \in \mathbb{R}.
\]
Now that we understand the block structure of N , we want to compute the constants
cij . Our approach is to simplify the formula in Proposition 4.9 by expressing the incidence
matrix L in a more suitable form.
Proposition 4.12. Let G be a connected graph with n vertices. If G is not bipartite, then
rank L = n.
Proof. It suffices to prove that the positive semidefinite matrix $P = LL^\top$ has rank $n$; i.e., if $x \in \mathbb{R}^n$ satisfies $x^\top P x = 0$, then $x = 0$. Write
\[
x^\top P x = \sum_{i=1}^n \sum_{j=1}^n P_{ij} x_i x_j = \sum_{\substack{i \neq j \\ \{i,j\} \in E}} (x_i + x_j)^2 + \sum_{i : \{i,i\} \in E} 2x_i^2,
\]
so that $x^\top P x = 0$ implies
\[
x_i = 0 \;\text{ if } \{i,i\} \in E \qquad \text{and} \qquad x_i + x_j = 0 \;\text{ if } \{i,j\} \in E. \tag{4.3}
\]
If G has a self-loop {i, i}, then xi = 0 by (4.3). If G has no self-loops, there is an odd cycle
(i1 , i2 , . . . , i2m+1 ) for some m ≥ 1 (because G is not bipartite). Applying condition (4.3) to
each successive edge in this cycle shows xi1 = · · · = xi2m+1 = 0. In either case xi = 0 for
some vertex i ∈ V . A repeated application of (4.3) then reveals that xj = 0 for all vertices
j connected to i. Since G is connected, we have x = 0 as desired.
Let ei ∈ Rn denote the i-th standard basis column vector.
Proposition 4.13. Let G be a connected bipartite graph on n vertices. Then rank L = n − 1.
Let U be the n × (n − 1) matrix with columns e1 + σi ei for 2 ≤ i ≤ n, where σi = −1 if
vertex i is in the same partition as vertex 1 and σi = 1 otherwise. Then, L = U B for some
(n − 1) × |E| matrix B of rank n − 1.
Proof. Recall that the columns of $L$ are of the form $e_i + e_j$ where $\{i,j\} \in E$ (there are no self-loops because $G$ is bipartite). There is a path $(1, i_1, i_2, \dots, i_m, j)$ from vertex 1 to vertex $j$ for each $2 \le j \le n$, so
\[
e_1 + (-1)^m e_j = (e_1 + e_{i_1}) - (e_{i_1} + e_{i_2}) + \cdots + (-1)^m (e_{i_m} + e_j).
\]
Thus, $\mathrm{rank}\, L \ge n - 1$. Conversely, for each edge $\{i,j\}$ where vertex $i$ is in the same partition as vertex 1,
\[
e_i + e_j = -(e_1 - e_i) + (e_1 + e_j),
\]
so $\mathrm{rank}\, L \le n - 1$. This equation also allows us to write $L = UB$ for some matrix $B$, and the rank condition on $B$ follows from that of $L$.
Recall that G has k components G1 , . . . , Gk and L is block-diagonal with matrices
L1 , . . . , Lk . If Gi is bipartite, we write Li = Ui Bi as in Proposition 4.13. If Gi is not
bipartite, we write Li = Ui Bi where Ui = I is the identity matrix and Bi = Li . Let r be
the number of components of G which are bipartite. If U and B are block-diagonal matrices constructed from U1 , . . . , Uk and B1 , . . . , Bk , then L = U B where U ∈ Rn×(n−r) and
B ∈ R(n−r)×|E| both have rank (n − r). Note that U contains information about the sizes
of the bipartitions of each component whereas B contains information about the edges. Let
U(p,q) denote the matrix
>
1p−1 1>
q
U(p,q) := −Ip−1 0 .
0
After relabeling the vertices, we have
U(p1 ,q1 )
0
0
U(p2 ,q2 )
.
..
.
U =
.
.
0
0
0
0
···
···
Iq
0
0
..
.
· · · U(pr ,qr )
···
0
0
0
..
.
e 0
U
,
=
0 Is
0
Is
(4.4)
CHAPTER 4. INVERSES OF DIAGONALLY DOMINANT MATRICES
29
P
where s = n − ri=1 (pi + qi ) is the total number of vertices in the non-bipartite components
of G. Our next result shows that dependence on the matrix B can be removed in Proposition
4.9. This new formula for N also gives us a method to compute its entries explicitly.
Proposition 4.14. Let U be as in (4.4). The limit N in (4.2) satisfies:
N = S −1 − S −1 U (U > S −1 U )−1 U > S −1 .
Therefore, N depends only on the sizes of the bipartitions of each component Gi .
Proof. Write L = U B and P = U (BB > U > ). First, note that BB > is positive definite since
B ∈ R(n−r)×|E| has rank (n − r). Now, U > S −1 U is positive semidefinite, being a congruence
of a positive definite matrix S −1 (Sylvester’s Law of Inertia). Moreover, since U is injective
as a linear map and S −1 0, for any x ∈ Rn−r :
x> U > S −1 U x = 0
⇒
Ux = 0
⇒
x = 0.
Thus, U > S −1 U is positive definite. Then the eigenvalues of BB > U > S −1 U are all positive,
being a product of two positive definite matrices (see [22, Lemma 2]). Therefore, the matrix
t−1 I + BB > U > S −1 U is invertible for all t > 0, and so by the Sherman-Morrison-Woodbury
formula, we have
(S + tP )−1 = (S + tU (BB > U > ))−1
= S −1 − S −1 U (t−1 I + BB > U > S −1 U )−1 BB > U > S −1 .
Taking limits of this equation as t → ∞, the result follows.
To state our ultimate formula for N , let us first define:
γ=
r
X
(pi − qi )2
i=1
>
y =
1
p1 + q1
Y =
pi + q i
,
pr − qr >
p1 − q 1 >
>
>
>
(1 , −1q1 ), . . . ,
(1 , −1qr ), 0 · 1s ,
p1 + q1 p1
pr + qr pr
1p 1 1>
p1
−1p1 1>
q1
−1q1 1>
p1
..
.
1q1 1>
q1
!
···
0
···
0
···
0
1
pr + qr
..
.
1pr 1>
pr
−1qr 1>
pr
0
0
..
.
−1pr 1>
qr
1qr 1>
qr
!
0
0 · Is
.
CHAPTER 4. INVERSES OF DIAGONALLY DOMINANT MATRICES
30
Proposition 4.15. Set γ, y, and Y as above, which depend only on the bipartite structures
of the components of the underlying graph G. We have the following formula for the limit
in (4.2):
1
`
N = lim (S + tP )−1 = Y −
yy > .
t→∞
α
α(α + `γ)
Proof. We outline the computation of N . For simplicity, let us write S −1 = aIn − b1n 1>
n.
Then,
−1 >
U (U > S −1 U )−1 U > = U (U > (aIn − b1n 1>
n )U ) U
= U (aW − bvv > )−1 U > ,
(4.5)
where W = U > U and v = U > 1n . By the Sherman-Morrison-Woodbury identity, we have
that
> −1
(aW − bvv )
a−1 W −1 (−bv)v > a−1 W −1
=a W −
1 + v > a−1 W −1 (−bv)
1
b
= W −1 +
W −1 vv > W −1 ,
a
a(a − bς)
−1
−1
where ς = v > W −1 v (it is easy to show that a − bς > 0). Substituting this back into (4.5)
gives us
1
b
U (U > S −1 U )−1 U > = Z +
zz > ,
a
a(a − bς)
(4.6)
where Z = U (U > U )−1 U > , z = Z1n , and ς = 1>
n Z1n . From the block-diagonal structure of
U , the matrix Z is also block-diagonal with blocks
!
>
>
−1
1
1
1
1
p
p
i qi
i pi
, for i = 1, . . . , r,
Ui (Ui> Ui )−1 Ui> = Ipi +qi −
>
pi + qi −1qi 1>
1
1
qi qi
pi
where we have computed (Ui> Ui )−1 using Lemma 4.8. Next, one shows that Z = In − Y ,
z = 1n − y, and ς = n − γ. Finally, substituting
S −1 = aIn − b1n 1>
n,
a=
1
,
α
b=
`
,
α(α + `n)
(4.7)
and equation (4.6) into Proposition 4.14 gives the desired result.
Corollary 4.16. For 1 ≤ i, j ≤ r, the constants cij in the blocks N [i, j] in Corollary 4.11
are:
`
α
(pi − qi )2
,
cii =
+γ−
α(α + `γ)(pi + qi ) `
pi + q i
−`
pi − qi
pj − qj
cij =
, j 6= i.
α(α + `γ) pi + qi
pj + qj
CHAPTER 4. INVERSES OF DIAGONALLY DOMINANT MATRICES
31
Finally, we write down an explicit formula for kN k∞ and verify that kN k∞ ≤ kS −1 k∞ .
P
Corollary 4.17. If r = 0, then kN k∞ = 0. If r ≥ 1, let d = ri=1 |pi − qi |. Then
kN k∞ =
1
`
|pi − qi |(d − 2|pi − qi |)
+
max
.
1≤i≤r
α α(α + `γ)
pi + qi
Proof. Recall that if r = 0 (i.e. no component of G is bipartite), then N = 0 by Corollary 4.11. Now if r ≥ 1, we may assume that pi ≥ qi for all i after a relabeling of vertices.
Observe that cii > 0 and cij ≤ 0 for all i and j 6= i. Indeed, this follows from α > 0, ` > 0,
and
r
X
(pi − qi )2 X (pj − qj )2
(pj − qj )2
≥ 0, γ −
=
≥ 0.
γ=
pj + qj
p i + qi
p
j + qj
i=1
j6=i
Consequently, the 1-norm of rows in the i-th block of N is
X
1 `(pi − qi )(d − 2pi + 2qi )
(pi + qi )cii −
(pj + qj )cij =
+
,
α
α(α
+
`γ)(p
i + qi )
j6=i
(4.8)
and kN k∞ is the maximum of these 1-norms.
Corollary 4.18. For all signless Laplacians P , we have
kN k∞ = k lim (S + tP )−1 k∞ ≤ kS −1 k∞ =
t→∞
1
`(n − 2)
+
,
α α(α + `n)
with equality if and only if P is the zero matrix.
Proof. Let N = limt→∞ (S + tP )−1 . If r = 0 then kN k∞ = 0, so suppose that r ≥ 1. As
before, assume that pi ≥ qi for all i = 1, . . . , r. It suffices to show that the 1-norms of the
rows of N , as computed in (4.8), are at most kS −1 k∞ , with equality achieved only at J = S.
The inequality is trivial if pi = qi so we may assume pi − qi ≥ 1. We outline the proof and
leave the details to the reader. The key is to show that
pi − qi
≤ (α + `n)(d − 2) ≤ (α + `γ)(n − 2).
(α + `n)(d − 2pi + 2qi )
pi + qi
The latter inequality is equivalent to
0 ≤ (n − d)(α + 2` − `n) + 2`(d − γ) + n`(n − 2d + γ).
The first summand is nonnegative because S is diagonally dominant, while the last summand
satisfies
n − 2d + γ = s +
r
X
j=1
4qj2
≥ 0.
pj + q j
Finally, if equality kN k∞ = kS −1 k∞ is achieved, then s = 0 and qj = 0 for all j, so P = 0.
CHAPTER 4. INVERSES OF DIAGONALLY DOMINANT MATRICES
4.4
32
Analysis of kJ −1k∞ in a neighborhood of S
The arguments in Sections 4.2 and 4.3 show that for J ≥ S := αIn + `1n 1>
n , the maximum
−1
of kJ k∞ is attained at J = S. To prove that S is the unique maximizer, we will show
that kJ −1 k∞ is strictly decreasing near S. Let P ≥ 0 be a nonzero symmetric diagonally
dominant matrix, and consider the function
f (t) = k(S + tP )−1 k∞ ,
t ≥ 0.
In our proof, we study the linear part S −1 − tS −1 P S −1 of the Neumann series for (S + tP )−1 .
Let us define g(t) = kS −1 − tS −1 P S −1 k∞ and h(t) = f (t) − g(t). Our main result in this
section is the following.
Proposition 4.19. The function f (t) is differentiable at t = 0 and f 0 (0) < 0.
Proof. Since f (t) = g(t) + h(t), the result follows from Propositions 4.20 and 4.21.
Proposition 4.20. The function h(t) is differentiable at t = 0 and h0 (0) = 0 .
Proof. For sufficiently small t > 0, by the Neumann series for (I + tS −1/2 P S −1/2 )−1 we can
write
(S + tP )−1 = S −1/2 (I + tS −1/2 P S −1/2 )−1 S −1/2
!
∞
X
−1/2
k
−1/2
−1/2 k
=S
(−t) S
PS
S −1/2 .
k=0
By the reverse triangle inequality and submultiplicativity of k · k∞ , we have
|h(t)| ≤ (S + tP )−1 − S −1 + tS −1 P S −1 ∞
!
∞
k
−1/2 X
k
−1/2
−1/2
−1/2 = S
(−t) S
PS
S
k=2
∞
2 −1/2
−1/2
−1/2 2 1/2
−1 = t S
S
PS
S (S + tP ) ∞
2
≤ 2t kS
−1
PS
−1
P k∞ kS
−1
k∞ ,
where the last inequality holds for sufficiently small t > 0 since by continuity k(S+tP )−1 k∞ ≤
2kS −1 k∞ for small t. It follows that h0 (0) = limt→0 (h(t) − h(0))/t = 0.
Proposition 4.21. The function g(t) is differentiable at t = 0 and g 0 (0) < 0.
Proof. Set Q = S −1 P S −1 . Note that for sufficiently small t > 0, the entries of S −1 − tQ
have the same sign as the corresponding entries of S −1 . Since (S −1 )ii > 0 and (S −1 )ij < 0
CHAPTER 4. INVERSES OF DIAGONALLY DOMINANT MATRICES
33
for i 6= j, we can write
g(t) = kS −1 − tQk∞ = maxi
P −1
(S )ij − tQij j
P
= maxi (S −1 )ii − tQii + j6=i tQij − (S −1 )ij
P
= maxi − (Qii − j6=i Qij ) t + kS −1 k∞
= −ξt + kS −1 k∞ .
where ξ = mini (Qii −
required.
P
j6=i
Qij ) > 0 by Proposition 4.22 below. Thus, g 0 (0) = −ξ < 0 as
Proposition 4.22. Let Q = S −1 P S −1 . Then Qii −
P
j6=i
Qij > 0 for all i.
Proof. For simplicity, let us write S −1 = aIn − b1n 1>
n . Then,
Q = (aI − b11> ) P (aI − b11> ) = a2 P − abp1> − ab1p> + b2 π11>
where p = P 1 = (p1 , . . . , pn ) and π = 1> P 1. It is straightforward to check that
P
Qii − j6=i Qij = a2 (2Pii − pi ) + abpi (n − 4) + bπ(a + 2b − bn).
From equation (4.7), we get a/b = α/` + n. Substituting this relation and rearranging gives
us
h α
α
i
P
+n−2
+ 4 + 4 ∆i (P ) +
Qii − j6=i Qij = b2
`
αi
α
X
h ` α
2
+n−1 n−3 +
Pii + b2
+2
Pjk .
b 2
`
`
`
j,k6=i
Because S is diagonally dominant, we have α/` ≥ n − 2 > 0. It is not difficult to deduce
that if P 6= 0, then the above expression is always positive, as required.
4.5
Proof of the bound
Theorem 4.1 is a special case of the following theorem when α = (n − 2)`.
Theorem 4.23. Let n ≥ 3 and suppose S = αIn + `1n 1>
n is diagonally dominant with
α, ` > 0. For all n × n symmetric diagonally dominant matrices J ≥ S, we have
kJ −1 k∞ ≤ kS −1 k∞ =
α + 2`(n − 1)
.
α(α + `n)
Furthermore, equality is achieved if and only if J = S.
CHAPTER 4. INVERSES OF DIAGONALLY DOMINANT MATRICES
34
Proof. Recall from Section 4.2 that Dm is the set of symmetric diagonally dominant matrices
J with ` ≤ Jij ≤ m and ∆i (S) ≤ ∆i (J) ≤ ∆i (S) + 2(m − `) for all i 6= j. Recall also that
S + (m − `)P ∈ Dm is a corner matrix if P is a signless Laplacian. Let Tm be the set of
matrices J ∈ Dm maximizing kJ −1 k∞ . We claim that for sufficiently large m > `, we have
Tm = {S}. Indeed, from Corollary 4.18 and for large m:
kJ −1 k∞ < kS −1 k∞ , for all corner matrices J ∈ Dm \ {S}.
Thus, by Proposition 4.5, every J ∈ Tm must be path-connected to the corner matrix S.
Since Proposition 4.19 implies that S is an isolated point in Tm , we must have Tm = {S} as
claimed.
Finally, suppose J ∗ ≥ S is a symmetric diagonally dominant matrix with k(J ∗ )−1 k∞ ≥
−1
kS k∞ . We will show that J ∗ = S, which proves Theorem 4.1. We assume that m is
sufficiently large with J ∗ ∈ Dm and Tm = {S}. Then S is the unique maximizer of kJ −1 k∞
for J ∈ Dm so that J ∗ = S, as desired.
CHAPTER 5. PROOFS OF MAIN RESULTS
35
Chapter 5
Proofs of main results
In this chapter we provide proofs for the technical results presented in Chapter 3. We use
Theorem 4.1 in the proofs of Theorems 3.9 and 3.13. The proofs of the characterization of
weighted graphical sequences (Theorems 3.2, 3.6, and 3.10) are inspired by the constructive
proof of the classical Erdős-Gallai criterion by Choudum [9].
5.1
Preliminaries
We begin by presenting several results on sub-exponential random variables, which will be
useful in the proofs of Theorems 3.9 and 3.13. We use the definition and concentration
inequality presented in [51].
We say that a real-valued random variable X is sub-exponential with parameter κ > 0 if
E[|X|p ]1/p ≤ κp
for all p ≥ 1.
Note that if X is a κ-sub-exponential random variable with finite first moment, then the
centered random variable X − E[X] is also sub-exponential with parameter 2κ. This follows
from the triangle inequality applied to the p-norm, followed by Jensen’s inequality for p ≥ 1:
p 1/p
E X − E[X]
≤ E[|X|p ]1/p + E[X] ≤ 2E[|X|p ]1/p .
Sub-exponential random variables satisfy the following concentration inequality.
Theorem 5.1 ([51, Corollary 5.17]). Let X1 , . . . , Xn be independent centered random variables, and suppose each Xi is sub-exponential with parameter κi . Let κ = max1≤i≤n κi . Then
for every ≥ 0,
n
!
1 X 2 Xi ≥ ≤ 2 exp −γ n · min 2 ,
,
P n
κ κ
i=1
where γ > 0 is an absolute constant.
CHAPTER 5. PROOFS OF MAIN RESULTS
36
We will apply the concentration inequality above to exponential and geometric random
variables, which are the distributions of the edge weights of continuous weighted graphs
(from Section 3.2) and infinite discrete weighted graphs (from Section 3.3).
Lemma 5.2. Let X be an exponential random variable with E[X] = 1/λ. Then X is subexponential with parameter 1/λ, and the centered random variable X −1/λ is sub-exponential
with parameter 2/λ.
Proof. For any p ≥ 1, we can evaluate the moment of X directly:
Z ∞
Z ∞
1
Γ(p + 1)
p
p
x · λ exp(−λx) dx = p
E[|X| ] =
y p exp(−y) dy =
,
λ 0
λp
0
where Γ is the gamma function, and in the computation above we have used the substitution
y = λx. It can be easily verified that Γ(p + 1) ≤ pp for p ≥ 1, so
p 1/p
E[|X| ]
Γ(p + 1)
=
λ
1/p
p
.
λ
≤
This shows that X is sub-exponential with parameter 1/λ.
Lemma 5.3. Let X be a geometric random variable with parameter q ∈ (0, 1), so
P(X = a) = (1 − q)a q,
a ∈ N0 .
Then X is sub-exponential with parameter −2/ log(1 − q), and the centered random variable
X − (1 − q)/q is sub-exponential with parameter −4/ log(1 − q).
Proof. Fix p ≥ 1, and consider the function f : R0 → R0 , f (x) = xp (1 − q)x . One can easily
verify that f is increasing for 0 ≤ x ≤ λ and decreasing on x ≥ λ, where λ = −p/ log(1 − q).
In particular, for all x ∈ R0 we have f (x) ≤ f (λ), and
p p
p
p
p
λ
−1/ log(1−q)
f (λ) = λ (1 − q) =
· (1 − q)
=
.
− log(1 − q)
−e · log(1 − q)
R a+1
Now noteRthat for 0 ≤ a ≤ bλc − 1 we have f (a) ≤ a f (x) dx, and for a ≥ dλe + 1 we
a
have f (a) ≤ a−1 f (x) dx. Thus, we can bound
∞
X
bλc−1
f (a) =
a=0
X
f (a) +
a=0
Z
dλe
X
f (a) +
a=bλc
bλc
≤
∞
X
Z
∞
f (x) dx + 2f (λ) +
Z
≤
∞
f (x) dx + 2f (λ).
0
f (x) dx
dλe
0
f (a)
a=dλe+1
CHAPTER 5. PROOFS OF MAIN RESULTS
37
Using the substitution y = −x log(1 − q), we can evaluate the integral to be
Z ∞
Z ∞
xp exp (x · log(1 − q)) dx
f (x) dx =
0
0
Z ∞
1
=
y p exp(−y) dy
p+1
(− log(1 − q))
0
Γ(p + 1)
=
(− log(1 − q))p+1
pp
≤
,
(− log(1 − q))p+1
where in the last step we have again used the relation Γ(p + 1) ≤ pp . We use the result
above, along with the expression of f (λ), to bound the moment of X:
p
E[|X| ] =
∞
X
p
a · (1 − q) q = q
a=0
Z
a
∞
X
f (a)
a=0
∞
≤q
f (x) dx + 2q f (λ)
0
p
p (2q)1/p p
q 1/p p
+
≤
(− log(1 − q))1+1/p
−e · log(1 − q)
p
1/p
q p
(2q)1/p p
≤
+
,
(− log(1 − q))1+1/p −e · log(1 − q)
where in the last step we have used the fact that xp + y p ≤ (x + y)p for x, y ≥ 0 and p ≥ 1.
This gives us
q 1/p
21/p q 1/p
1
E[|X|p ]1/p ≤
+
p
(− log(1 − q))1+1/p −e · log(1 − q)
!
1/p
1
q
21/p q 1/p
.
=
+
− log(1 − q)
− log(1 − q)
e
Now note that q ≤ − log(1 − q) for 0 < q < 1, so (−q/ log(1 − q))1/p ≤ 1. Moreover,
(2q)1/p ≤ 21/p ≤ 2. Therefore, for any p ≥ 1, we have
1
2
2
1
p 1/p
E[|X| ] ≤
1+
<
.
p
− log(1 − q)
e
− log(1 − q)
Thus, we conclude that X is sub-exponential with parameter −2/ log(1 − q).
5.2
Proofs for the finite discrete weighted graphs
In this section we present the proofs of the results presented in Section 3.1.
CHAPTER 5. PROOFS OF MAIN RESULTS
38
Proof of Theorem 3.2
We first prove the necessity of (3.4). Suppose
, . . . , dn ) is the degree sequence of a
P d = (d1P
graph G with edge weights aij ∈ S. Then ni=1 di = 2 (i,j) aij is even. Moreover, for each
P
1 ≤ k ≤ n, ki=1 di counts the total edge weights coming out from the vertices 1, . . . , k. The
total edge weights from these k vertices to themselves is at most (r − 1)k(k − 1), and for
each vertex j ∈
/ {1, . . . , k}, the total edge weights from these k vertices to vertex j is at most
min{dj , (r − 1)k}, so by summing over j ∈
/ {1, . . . , k} we get (3.4).
P
To prove the sufficiency of (3.4) we use induction on s := ni=1 di . The base case s = 0
is trivial. Assume the statement holds forPs − 2, and suppose we have a sequence d with
d1 ≥ d2 ≥ · · · ≥ dn satisfying (3.4) with ni=1 di = s. Without loss of generality we may
assume dn ≥ 1, for otherwise we can proceed with only the nonzero elements of d. Let
1 ≤ t ≤ n − 1 be the smallest index such that dt > dt+1 , with t = n − 1 if d1 = · · · = dn .
Define d0 = (d1 , . . . , dt−1 , dt P
− 1, dt+1 , . . . , dn−1 , dn − 1), so we have d01 = · · · = d0t−1 > d0t ≥
d0t+1 ≥ · · · ≥ d0n−1 > d0n and ni=1 d0i = s − 2.
We will show that d0 satisfies (3.4). By the inductive hypothesis, this means d0 is the
degree sequence of a graph G0 with edge weights a0ij ∈ {0, 1, . . . , r − 1}. We now attempt to
modify G0 to obtain a graph G whose degree sequence is equal to d. If the weight a0tn of the
edge (t, n) is less than r − 1, then we can obtain G by increasing a0tn by 1, since the degree
of vertex t is now d0t + 1 = dt , and the degree of vertex n is now d0n + 1 = dn . Otherwise,
suppose a0tn = r − 1. Since d0t = d01 − 1, there exists a vertex u 6= n such that a0tu < r − 1.
Since d0u > d0n , there exists another vertex v such that a0uv > a0vn . Then we can obtain the
graph G by increasing a0tu and a0vn by 1 and reducing a0uv by 1, so that now the degrees of
vertices t and n are each increased by 1, and the degrees of vertices u and v are preserved.
It now remains to show that d0 satisfies (3.4). We divide the proof into several cases for
different values of k. We will repeatedly use the fact that d satisfies (3.4), as well as the
inequality min{a, b} − 1 ≤ min{a − 1, b}.
1. For k = n:
n
X
d0i
i=1
=
n
X
di − 2 ≤ (r − 1)n(n − 1) − 2 < (r − 1)n(n − 1).
i=1
2. For t ≤ k ≤ n − 1:
k
X
i=1
d0i
=
k
X
di − 1 ≤ (r − 1)k(k − 1) +
i=1
≤ (r − 1)k(k − 1) +
= (r − 1)k(k − 1) +
n
X
min{dj , (r − 1)k} − 1
j=k+1
n−1
X
j=k+1
n
X
j=k+1
min{dj , (r − 1)k} + min{dn − 1, (r − 1)k}
min{d0j , (r − 1)k}.
CHAPTER 5. PROOFS OF MAIN RESULTS
39
3. For 1 ≤ k ≤ t − 1: first suppose dn ≥ 1 + (r − 1)k. Then for all j we have
min{d0j , (r − 1)k} = min{dj , (r − 1)k} = (r − 1)k,
so
k
X
d0i
=
i=1
k
X
di ≤ (r − 1)k(k − 1) +
i=1
n
X
min{dj , (r − 1)k}
j=k+1
n
X
= (r − 1)k(k − 1) +
min{d0j , (r − 1)k}.
j=k+1
4. For 1 ≤ k ≤ t − 1: suppose d1 ≥ 1 + (r − 1)k, and dn ≤ (r − 1)k. We claim that d
satisfies (3.4) at k with a strict inequality. If this claim is true, then, since dt = d1 and
min{d0t , (r − 1)k} = min{dt , (r − 1)k} = (r − 1)k,
k
X
i=1
d0i
=
k
X
di ≤ (r − 1)k(k − 1) +
i=1
n
X
min{dj , (r − 1)k} − 1
j=k+1
= (r − 1)k(k − 1) +
n−1
X
min{d0j , (r − 1)k} + min{dn , (r − 1)k} − 1
j=k+1
≤ (r − 1)k(k − 1) +
= (r − 1)k(k − 1) +
n−1
X
j=k+1
n
X
min{d0j , (r − 1)k} + min{dn − 1, (r − 1)k}
min{d0j , (r − 1)k}.
j=k+1
Now to prove the claim, suppose the contrary that d satisfies (3.4) at k with equality.
Let t + 1 ≤ u ≤ n be the smallest integer such that du ≤ (r − 1)k. Then, from our
assumption,
kdk =
k
X
di = (r − 1)k(k − 1) +
i=1
n
X
min{dj , (r − 1)k}
j=k+1
≥ (r − 1)k(k − 1) + (u − k − 1)(r − 1)k +
n
X
j=u
= (r − 1)k(u − 2) +
n
X
j=u
dj .
dj
CHAPTER 5. PROOFS OF MAIN RESULTS
40
Therefore, since dk+1 = dk = d1 ,
k+1
X
i=1
n
k+1X
di = (k + 1)dk ≥ (r − 1)(k + 1)(u − 2) +
dj
k j=u
> (r − 1)(k + 1)k + (r − 1)(k + 1)(u − k − 2) +
n
X
dj
j=u
≥ (r − 1)(k + 1)k +
n
X
min{dj , (r − 1)(k + 1)},
j=k+2
which contradicts the fact that d satisfies (3.4) at k + 1. Thus, we have proved that d
satisfies (3.4) at k with a strict inequality.
5. For 1 ≤ k ≤ t − 1: suppose d1 ≤ (r − 1)k. In particular, we have min{dj , (r − 1)k} = dj
and min{d0j , (r − 1)k} = d0j for all j. First, if we have
dk+2 + · · · + dn ≥ 2,
(5.1)
then we are done, since
k
X
i=1
d0i
=
k
X
di = (k − 1)d1 + dk+1
i=1
≤ (r − 1)k(k − 1) + dk+1 + dk+2 + · · · + dn − 2
n
X
d0j
= (r − 1)k(k − 1) +
= (r − 1)k(k − 1) +
j=k+1
n
X
min{d0j , (r − 1)k}.
j=k+1
Condition (5.1) is obvious if dn ≥ 2 or k + 2 ≤ n − 1 (since there are n − k − 1 terms
in the summation and each term is at least 1). Otherwise, assume k + 2 ≥ n and
dn = 1, so in particular, we have k = n − 2 (since k ≤ t − 1 ≤ n − 2), t = n − 1,
and
Pn d1 ≤ (r − 1)(n − 2). Note that we cannot have d1 = (r − 1)(n − 2), for then
i=1 di = (n − 1)d1 + dn = (r − 1)(n − 1)(n − 2) + 1 would be
Podd, so we must have
d1 < (r − 1)(n − 2). Similarly, n must be even, for otherwise ni=1 di = (n − 1)d1 + 1
CHAPTER 5. PROOFS OF MAIN RESULTS
41
would be odd. Thus, since 1 ≤ d1 < (r − 1)(n − 2) we must have n ≥ 4. Therefore,
k
X
d0i = (n − 2)d1 = (n − 3)d1 + dn−1
i=1
≤ (r − 1)(n − 2)(n − 3) − (n − 3) + dn−1
≤ (r − 1)(n − 2)(n − 3) + (dn−1 − 1) + (dn − 1)
n
X
= (r − 1)(n − 2)(n − 3) +
min{d0j , (r − 1)k}.
j=k+1
This shows that d0 satisfies (3.4) and finishes the proof of Theorem 3.2.
Proof of Theorem 3.3
We follow the outline of the proof of [8, Theorem 1.5]. We first present the following properties of the mean function µ(t) and the Jacobian matrix of the function ϕ (3.5). We then
combine these results at the end of this section into a proof of Theorem 3.3.
Lemma 5.4. The mean function µ(t) is positive and strictly decreasing, with µ(−t) + µ(t) =
r − 1 for all t ∈ R, and µ(t) → 0 as t → ∞. Its derivative µ0 (t) is increasing for t ≥ 0, with
the properties that µ0 (t) < 0, µ0 (t) = µ0 (−t) for all t ∈ R, and µ0 (0) = −(r2 − 1)/12.
Proof. It is clear from (3.1) that µ(t) is positive. From the alternative representation (3.2)
it is easy to see that µ(−t) + µ(t) = r − 1, and µ(t) → 0 as t → ∞. Differentiating
expression (3.1) yields the formula
Pr−1
Pr−1
P
2
2
−( r−1
0
a=0 a exp(−at))
a=0 exp(−at)) + (
a=0 a exp(−at))(
µ (t) =
,
(5.2)
Pr−1
( a=0 exp(−at))2
and substituting t = 0 gives us µ0 (0) = −(r2 − 1)/12. The Cauchy-Schwarz inequality tells
us that µ0 (t) < 0, where the inequality is strict because the vectors (a2 exp(−at))r−1
a=0 and
r−1
(exp(−at))a=0 are not linearly dependent. Thus, µ(t) is strictly decreasing for all t ∈ R.
The relation µ(−t) + µ(t) = r − 1 gives us µ0 (−t) = µ0 (t). Furthermore, by differentiating
the expression (3.2) twice, one can verify that µ00 (t) ≥ 0 for t ≥ 0, which means µ0 (t) is
increasing for t ≥ 0. See also Figure 3.1 for the behavior of µ(t) and µ0 (t) for different values
of r.
Lemma 5.5. For all t ∈ R, we have
µ0 (t)
1
≥ −r + 1 + Pr−1
> −r + 1.
µ(t)
a=0 exp(at)
CHAPTER 5. PROOFS OF MAIN RESULTS
42
Proof. Multiplying the numerator and denominator of (3.1) by exp((r − 1)t), we can write
Pr−1
Pr−1
(r − 1 − a) exp(at)
a=0 a exp((r − 1 − a)t)
µ(t) = Pr−1
= a=0Pr−1
.
a=0 exp((r − 1 − a)t)
a=0 exp(at)
Therefore,
!
r−1
r−1
X
X
µ0 (t)
d
d
=
log µ(t) =
log
(r − 1 − a) exp(at) − log
exp(at)
µ(t)
dt
dt
a=0
a=0
Pr−1
Pr−1
a exp(at)
a=0 a(r − 1 − a) exp(at)
= Pr−1
− Pa=0
r−1
(r − 1 − a) exp(at)
a=0 exp(at)
Pa=0
r−1
a exp(at)
≥ − Pa=0
r−1
a=0 exp(at)
Pr−1
Pr−1
(r − 1 − a) exp(at)
a=0 (r − 1) exp(at) −
=−
Pr−1 a=0
a=0 exp(at)
Pr−1
(r − 1 − a) exp(at)
= −r + 1 + a=0Pr−1
a=0 exp(at)
1
≥ −r + 1 + Pr−1
.
a=0 exp(at)
We recall the following definition and result from [8]. Given δ > 0, let Ln (δ) denote the
set of n × n matrices A = (aij ) with kAk∞ ≤ 1, aii ≥ δ, and aij ≤ −δ/(n − 1), for each
1 ≤ i 6= j ≤ n.
Lemma 5.6 ([8, Lemma 2.1]). If A, B ∈ Ln (δ), then
kABk∞ ≤ 1 −
2(n − 2)δ 2
.
(n − 1)
In particular, for n ≥ 3,
kABk∞ ≤ 1 − δ 2 .
Given θ, θ0 ∈ Rn , let J(θ, θ0 ) denote the n × n matrix whose (i, j)-entry is
Z 1
∂ϕi
0
Jij (θ, θ ) =
(tθ + (1 − t)θ0 ) dt.
∂θ
j
0
Lemma 5.7. For all θ, θ0 ∈ Rn , we have kJ(θ, θ0 )k∞ = 1.
(5.3)
CHAPTER 5. PROOFS OF MAIN RESULTS
43
Proof. The partial derivatives of ϕ (3.5) are
and for i 6= j,
P
0
∂ϕi (x)
1
j6=i µ (xi + xj )
P
=1+
,
∂xi
(r − 1) j6=i µ(xi + xj )
(5.4)
1
µ0 (xi + xj )
∂ϕi (x)
P
=
< 0,
∂xj
(r − 1) k6=i µ(xi + xk )
(5.5)
where the last inequality follows from µ0 (xi + xj ) < 0. Using the result of Lemma 5.5 and
the fact that µ is positive, we also see that
P
P
0
1
1
∂ϕi (x)
j6=i µ (xi + xj )
j6=i (−r + 1)µ(xi + xj )
P
P
=1+
>1+
= 0.
∂xi
(r − 1) j6=i µ(xi + xj )
(r − 1)
j6=i µ(xi + xj )
Setting x = tθ + (1 − t)θ0 P
and integrating over 0 ≤ t ≤ 1, we also get that Jij (θ, θ0 ) < 0 for
i 6= j, and Jii (θ, θ0 ) = 1 + j6=i Jij (θ, θ0 ) > 0. This implies kJ(θ, θ0 )k∞ = 1, as desired.
Lemma 5.8. Let θ, θ0 ∈ Rn with kθk∞ ≤ K and kθ0 k∞ ≤ K for some K > 0. Then
J(θ, θ0 ) ∈ Ln (δ), where
exp(2K) − 1
µ0 (2K)
1
min
, −
.
(5.6)
δ=
(r − 1)
exp(2rK) − 1
µ(−2K)
Proof. From Lemma 5.7 we already know that J ≡ J(θ, θ0 ) satisfies kJk∞ = 1, so to show
that J ∈ Ln (δ) it remains to show that Jii ≥ δ and Jij ≤ −δ/(n − 1) for i 6= j. In particular,
it suffices to show that for each 0 ≤ t ≤ 1 we have ∂ϕi (x)/∂xi ≥ δ and ∂ϕi (x)/∂xj ≤
−δ/(n − 1), where x ≡ x(t) = tθ + (1 − t)θ0 .
Fix 0 ≤ t ≤ 1. Since kθk∞ ≤ K and kθ0 k∞ ≤ K, we also know that kxk∞ ≤ K, so
−2K ≤ xi + xj ≤ 2K for all 1 ≤ i, j ≤ n. Using the properties of µ and µ0 from Lemma 5.4,
we have
0 < µ(2K) ≤ µ(xi + xj ) ≤ µ(−2K)
and
µ0 (0) ≤ µ0 (xi + xj ) ≤ µ0 (2K) < 0.
Then from (5.5) and using the definition of δ,
∂ϕi (x)
µ0 (2K)
δ
≤
≤−
.
∂xj
(n − 1)(r − 1)µ(−2K)
n−1
Furthermore, by Lemma 5.5 we have
µ0 (xi + xj )
exp(xi + xj ) − 1
exp(2K) − 1
≥ −r + 1 +
≥ −r + 1 +
.
µ(xi + xj )
exp(r(xi + xj )) − 1
exp(2rK) − 1
CHAPTER 5. PROOFS OF MAIN RESULTS
44
So from (5.4), we also get
P
exp(2K)−1
∂ϕi (x)
1
j6=i (−r + 1 + exp(2rK)−1 )µ(xi + xj )
P
≥1+
∂xi
(r − 1)
j6=i µ(xi + xj )
1
exp(2K) − 1
=
(r − 1) exp(2rK) − 1
≥ δ,
as required.
We are now ready to prove Theorem 3.3.
Proof of Theorem 3.3: By the mean-value theorem for vector-valued functions [24,
p. 341], for any θ, θ0 ∈ Rn we can write
ϕ(θ) − ϕ(θ0 ) = J(θ, θ0 )(θ − θ0 ),
(5.7)
where J(θ, θ0 ) is the Jacobian matrix defined in (5.3). Since kJ(θ, θ0 )k∞ = 1 (Lemma 5.7),
this gives us
kϕ(θ) − ϕ(θ0 )k∞ ≤ kθ − θ0 k∞ .
(5.8)
First suppose there is a solution θ̂ to the system of equations (3.3), so θ̂ is a fixed point
of ϕ. Then by setting θ = θ(k) and θ0 = θ̂ to the inequality above, we obtain
kθ(k+1) − θ̂k∞ ≤ kθ(k) − θ̂k∞ .
(5.9)
In particular, this shows that kθ(k) k∞ ≤ K for all k ∈ N0 , where K := 2kθ̂k∞ + kθ(0) k∞ . By
Lemma 5.8, this implies J(θ(k) , θ̂) ∈ Ln (δ) for all k ∈ N0 , where δ is given by (5.6). Another
application of the mean-value theorem gives us
θ(k+2) − θ̂ = J(θ(k+1) , θ̂) J(θ(k) , θ̂) (θ(k) − θ̂),
so by Lemma 5.7,
kθ(k+2) − θ̂k∞ ≤ kJ(θ(k+1) , θ̂) J(θ(k) , θ̂)k∞ kθ(k) − θ̂k∞ ≤ 1 − δ 2 kθ(k) − θ̂k∞ .
Unrolling the recursive bound above and using (5.9) gives us
kθ(k) − θ̂k∞ ≤ (1 − δ 2 )bk/2c kθ(0) − θ̂k∞ ≤ (1 − δ 2 )(k−1)/2 kθ(0) − θ̂k∞ ,
√
which proves (3.7) with τ = 1 − δ 2 .
Now suppose the system of equations (3.3) does not have a solution, and suppose the
contrary that the sequence {θ(k) } does not have a divergent subsequence. This means {θ(k) }
is a bounded sequence, so there exists K > 0 such that kθ(k) k∞ ≤ K for all k ∈ N0 . Then
CHAPTER 5. PROOFS OF MAIN RESULTS
45
by Lemma 5.8, J(θ(k) , θ(k+1) ) ∈ Ln (δ) for all k ∈ N0 , where δ is given by (5.6). In particular,
by the mean value theorem and Lemma 5.7, we get for all k ∈ N0 ,
kθ(k+3) − θ(k+2) k∞ ≤ (1 − δ 2 )kθ(k+1) − θ(k) k∞ .
P
(k+1)
This implies ∞
− θ(k) k∞ < ∞, which means {θ(k) } is a Cauchy sequence. Thus,
k=0 kθ
the sequence {θ(k) } converges to a limit, say θ̂, as k → ∞. This limit θ̂ is necessarily a
fixed point of ϕ, as well as a solution to the system of equations (3.3), contradicting our
assumption. Hence we conclude that {θ(k) } must have a divergent subsequence.
A little computation based on the proof above gives us the following result, which will
be useful in the proof of Theorem 3.4.
Proposition 5.9. Assume the same setting as in Theorem 3.3, and assume the MLE equation 3.3 has a unique solution θ̂. Then
kθ(0) − θ̂k∞ ≤
2 (0)
kθ − θ(1) k∞ ,
δ2
where δ is given by (5.6) with K = 2kθ̂k∞ + kθ(0) k∞ .
Proof. With the same notation as in the proof of Theorem 3.3, by applying the mean-value
theorem twice and using the bound in Lemma 5.7, for each k ≥ 0 we have
kθ(k+3) − θ(k+2) k∞ ≤ (1 − δ 2 )kθ(k+1) − θ(k) k∞ .
Therefore, since {θ(k) } converges to θ̂,
kθ
(0)
− θ̂k∞ ≤
∞
X
kθ(k) − θ(k+1) k∞
k=0
1
(0)
(1)
(1)
(2)
kθ
−
θ
k
+
kθ
−
θ
k
∞
∞
δ2
2
≤ 2 kθ(0) − θ(1) k∞ ,
δ
≤
where the last inequality follows from (5.8).
Proof of Theorem 3.4
Our proof of Theorem 3.4 follows the outline of the proof of Theorem 1.3 in [8]. Recall
that W is the set of graphical sequences, and the MLE equation (3.3) has a unique solution
θ̂ ∈ Rn if and only if d ∈ conv(W)◦ . We first present a few preliminary results. We will also
use the properties of the mean function µ as described in Lemma 5.4.
The following property is based on [8, Lemma 4.1].
CHAPTER 5. PROOFS OF MAIN RESULTS
46
Lemma 5.10. Let d ∈ conv(W) with the properties that
c2 (r − 1)(n − 1) ≤ di ≤ c1 (r − 1)(n − 1),
i = 1, . . . , n,
(5.10)
and
min
X
B⊆{1,...,n},
j ∈B
/
|B|≥c22 (n−1)
X
min{dj , (r − 1)|B|} + (r − 1)|B|(|B| − 1) −
di ≥ c3 n2 ,
(5.11)
i∈B
where c1 , c2 ∈ (0, 1) and c3 > 0 are constants. Then the MLE equation (3.3) has a solution
θ̂ with the property that kθ̂k∞ ≤ C, where C ≡ C(c1 , c2 , c3 ) is a constant that only depends
on c1 , c2 , c3 .
Proof. First assume θ̂ exists, so θ̂ and d satisfy
X
µ(θ̂i + θ̂j ),
di =
i = 1, . . . , n.
j6=i
Let
dmax = max di ,
1≤i≤n
dmin = min di ,
1≤i≤n
θ̂max = max θ̂i ,
1≤i≤n
θ̂min = min θ̂i ,
1≤i≤n
and let i∗ , j ∗ ∈ {1, . . . , n} be such that θ̂i∗ = θ̂max and θ̂j ∗ = θ̂min .
We begin by observing that since µ is a decreasing function and we have the assumption (5.10),
X
dmin
di∗
1
c2 (r − 1) ≤
≤
=
µ(θ̂max + θ̂j ) ≤ µ(θ̂max + θ̂min ),
n−1
n−1
(n − 1) j6=i∗
so
θ̂max + θ̂min ≤ µ−1 c2 (r − 1) .
Thus, if we have a lower bound on θ̂min by a constant, then we also get a constant upper
bound on θ̂max and we are done.
We now proceed to prove the lower bound θ̂min ≥ −C. If θmin ≥ 0, then there is nothing
to prove, so let us assume that θ̂min < 0. We claim the following property.
Claim. If θ̂min satisfies µ(θ̂min /2) ≥ c1 (r − 1) and µ(θ̂min /4) ≥ (r − 1)/(1 + c2 ), then the
set A = {i : θ̂i ≤ θ̂min /4} has |A| ≥ c22 (n − 1).
Proof of claim: Let S = {i : θ̂i < −θ̂min /2} and m = |S|. Note that j ∗ ∈ S since
θ̂j ∗ = θ̂min < 0, so |m| ≥ 1. Then using the property that µ is a decreasing function and the
assumption on µ(θ̂min /2), we obtain
X
c1 (r − 1)(n − 1) ≥ dmax ≥ dj ∗ =
µ(θ̂min + θ̂i )
i6=j ∗
≥
X
i∈S\{j ∗ }
µ(θ̂min + θ̂i ) > (m − 1) µ
θ̂min
2
!
≥ c1 (r − 1)(m − 1).
CHAPTER 5. PROOFS OF MAIN RESULTS
47
This implies m < n, which means there exists i ∈
/ S, so θ̂i ≥ −θ̂min /2 > 0.
Let Si = {j : j 6= i, θ̂j > −θ̂i /2}, and let mi = |Si |. Then, using the properties that µ is
decreasing and bounded above by r − 1, and using the assumption on µ(θ̂min /4), we get
X
X
c2 (r − 1)(n − 1) ≤ dmin ≤ di =
µ(θ̂i + θ̂j ) +
µ(θ̂i + θ̂j )
j∈Si
< mi µ
θ̂i
2
j ∈S
/ i ,j6=i
!
+ (n − 1 − mi )(r − 1)
= (n − 1)(r − 1) − mi
r−1−µ
θ̂i
= (n − 1)(r − 1) − mi µ −
2
≤ (n − 1)(r − 1) − mi µ
≤ (n − 1)(r − 1) −
θ̂i
2
!!
!
θ̂min
4
!
mi (r − 1)
.
1 + c2
Rearranging the last inequality above gives us mi ≤ (1 − c22 )(n − 1).
Note that for every j 6= Si , j 6= i, we have θ̂j ≤ −θ̂i /2 ≤ θ̂min /4. Therefore, if A =
{j : θ̂j ≤ θ̂min /4}, then we see that Sic \ {i} ⊆ A, so
|A| ≥ |Sic \ {i}| = n − mi − 1 ≥ c22 (n − 1),
as desired.
Now assume
θ̂min
r−1
−1
−1
≤ min 2µ (c1 (r − 1)), 4µ
, −16 ,
1 + c2
for otherwise we are done. Then µ(θ̂min /2) ≥ c1 (r − 1) and µ(θ̂min /4) ≥ (r − 1)/(1 + c2 ), so
by the claim above, the size of the set A = {i : θ̂i ≤ θ̂min /4} is at least c22 (n − 1). Let
q
h = −θ̂min > 0,
and for integers 0 ≤ k ≤ dh/16e − 1, define the set
1
1
Dk = i : − θ̂min + kh ≤ θ̂i < − θ̂min + (k + 1)h .
8
8
CHAPTER 5. PROOFS OF MAIN RESULTS
48
Since the sets {Dk } are disjoint, by the pigeonhole principle we can find an index 0 ≤ k ∗ ≤
dh/16e − 1 such that
n
16n
|Dk∗ | ≤
≤
.
dh/16e
h
Fix k ∗ , and consider the set
1
1
∗
h .
B = i : θ̂i ≤ θ̂min − k +
8
2
Note that θ̂min /4 ≤ θ̂min /8 − (k ∗ + 1/2)h, which implies A ⊆ B, so |B| ≥ |A| ≥ c22 (n − 1).
For 1 ≤ i ≤ n, define
X
dB
=
µ(θ̂i + θ̂j ),
i
j∈B\{i}
and observe that
X
dB
j =
XX
µ(θ̂i + θ̂j ) =
(5.12)
i∈B
j ∈B
/ i∈B
j ∈B
/
X
(di − dB
i ).
We note that for i ∈ B we have θ̂i ≤ θ̂min /8, so
X X X
B
r − 1 − µ(θ̂i + θ̂j )
di =
(r − 1)|B|(|B| − 1) −
i∈B
i∈B j∈B\{i}
≤ |B|(|B| − 1) r − 1 − µ
θ̂min
= |B|(|B| − 1) µ −
4
θ̂min
4
!!
!
2
≤n µ
(5.13)
h2
4
,
where in the last inequality we have used the definition h2 = −θ̂min > 0. Now take j ∈
/ B,
∗
so θ̂j > θ̂min /8 − (k + 1/2)h. We consider three cases:
1. If θ̂j ≥ −θ̂min /8 + (k ∗ + 1)h, then for every i ∈
/ B, we have θ̂i + θ̂j ≥ h/2, so
X
h
B
B
µ(θ̂j + θ̂i ) ≤ nµ
.
min{dj , (r − 1)|B|} − dj ≤ dj − dj =
2
i∈B,i6
/
=j
2. If θ̂j ≤ −θ̂min /8 + k ∗ h, then for every i ∈ B, we have θ̂i + θ̂j ≤ −h/2, so
X
min{dj , (r − 1)|B|} − dB
≤
(r
−
1)|B|
−
µ(θ̂j + θ̂i )
j
i∈B
h
h
h
≤ (r − 1)|B| − |B| µ −
= |B| µ
≤ nµ
.
2
2
2
CHAPTER 5. PROOFS OF MAIN RESULTS
49
3. If −θ̂min /8 + k ∗ h ≤ θ̂j ≤ −θ̂min /8 + (k ∗ + 1)h, then j ∈ Dk∗ , and in this case
min{dj , (r − 1)|B|} − dB
j ≤ (r − 1)|B| ≤ n(r − 1).
There are at most n such indices j in both the first and second cases above, and there are
at most |Dk∗ | ≤ 16n/h such indices j in the third case. Therefore,
X
16n2 (r − 1)
h
B
2
min{dj , (r − 1)|B|} − dj ≤ n µ
+
.
2
h
j ∈B
/
Combining this bound with (5.13) and using (5.12) give us
2
X
X
h
16n2 (r − 1)
h
2
2
di ≤ n µ
min{dj , (r − 1)|B|} + (r − 1)|B|(|B| − 1) −
+n µ
+
.
2
h
4
i∈B
j ∈B
/
Assumption (5.11) tells us that the left hand side of the inequality above is bounded below
by c3 n2 , so we obtain
2
h
h
16(r − 1)
+µ
µ
+
≥ c3 .
2
h
4
The left hand side is a decreasing function of h > 0, so the bound above tells us that h ≤
C(c3 ) for a constant C(c3 ) that only depends on c3 (and r), and so θ̂min = −h2 ≥ −C(c3 )2 ,
as desired.
Showing existence of θ̂. Now let d ∈ conv(W) satisfy (5.10) and (5.11). Let {d(k) }k≥0
be a sequence of points in conv(W)◦ converging to d, so by Proposition 2.1, for each k ≥ 0
there exists a solution θ̂(k) ∈ Rn to the MLE equation (3.3) with d(k) in place of d. Since
d satisfy (5.10), (5.11), and d(k) → d, for all sufficiently large k, d(k) also satisfy (5.10)
and (5.11) with some constants c01 , c02 , c03 depending on c1 , c2 , c3 . The preceding analysis then
shows that kθ̂(k) k∞ ≤ C for all sufficiently large k, where C ≡ C(c01 , c02 , c03 ) = C(c1 , c2 , c3 ) is a
constant depending on c1 , c2 , c3 . This means {θ̂(k) }k≥0 is a bounded sequence, so it contains
a convergent subsequence {θ̂(ki ) }ki ≥0 , say θ̂(ki ) → θ̂. Then kθ̂k∞ ≤ C, and since θ̂(ki ) is a
solution to the MLE equation (3.3) for d(ki ) , θ̂ is necessarily a solution to (3.3) for d, and
we are done.
We are now ready to prove Theorem 3.4.
Proof of Theorem
3.4: Let d∗ = (d∗1 , . . . , d∗n ) denote the expected degree sequence under
P
P∗θ , so d∗i = j6=i µ(θi + θj ). Since −2M ≤ θi + θj ≤ 2M and µ is a decreasing function, we
see that
(n − 1) µ(2M ) ≤ d∗i ≤ (n − 1) µ(−2M ), i = 1, . . . , n.
(5.14)
For B ⊆ {1, . . . , n}, let
g(d∗ , B) =
X
j ∈B
/
min{d∗j , (r − 1)|B|} + (r − 1)|B|(|B| − 1) −
X
i∈B
d∗i ,
CHAPTER 5. PROOFS OF MAIN RESULTS
50
and similarly for g(d, B). Using the notation (d∗j )B as introduced in the proof of Lemma 5.10,
we notice that for j ∈
/ B,
X
X
d∗j =
µ(θj + θi ) ≥
µ(θj + θi ) = (d∗j )B ,
i∈B
i6=j
and similarly,
(r − 1)|B| ≥
X
µ(θj + θi ) = (d∗j )B .
i∈B
Therefore, using the relation (5.12), we see that
X
X
g(d∗ , B) ≥
(d∗j )B + (r − 1)|B|(|B| − 1) −
d∗i
i∈B
j ∈B
/
= (r − 1)|B|(|B| − 1) −
X
(d∗i )B
i∈B
=
X X
r − 1 − µ(θi + θj ))
i∈B j∈B\{i}
≥ |B|(|B| − 1) r − 1 − µ(−2M )
= |B|(|B| − 1) µ(2M ).
We now recall that the edge weights (Aij ) are independent random variables taking
values in {0, 1, . . . , r − 1}, with Eθ [Aij ] = µ(θi + θj ). By Hoeffding’s inequality [16], for each
i = 1, . . . , n we have
!
!
r
r
kn
log
n
k(n
−
1)
log
n
P |di − d∗i | ≥ (r − 1)
≤ P |di − d∗i | ≥ (r − 1)
2
2
s
!
1 X
k log n
(Aij − µ(θi + θj )) ≥ (r − 1)
=P n − 1
2(n − 1)
j6=i
2(n − 1) (r − 1)2 k log n
≤ 2 exp −
·
(r − 1)2
2(n − 1)
2
= k.
n
k−1
Therefore,
we have kd − d∗ k∞ ≤
p by union bound, with probability at least 1 − 2/n
(r − 1) kn log n/2. Assume we are in this situation. Then from (5.14) we see that for all
i = 1, . . . , n,
r
r
kn log n
kn log n
(n − 1) µ(2M ) − (r − 1)
≤ di ≤ (n − 1) µ(−2M ) + (r − 1)
.
2
2
Thus, for sufficiently large n, we have
c2 (r − 1)(n − 1) ≤ di ≤ c1 (r − 1)(n − 1),
i = 1, . . . , n.
CHAPTER 5. PROOFS OF MAIN RESULTS
with
c1 =
3µ(−2M )
,
2(r − 1)
51
c2 =
µ(2M )
.
2(r − 1)
Moreover,
it is easy to see that for every B ⊆ {1, . . . , n} we have |g(d, B) − g(d∗ , B)| ≤
Pn
∗
∗
∗
i=1 |di − di | ≤ nkd − d k∞ . Since we already know that g(d , B) ≥ |B|(|B| − 1) µ(2M ),
this gives us
r
kn3 log n
∗
∗
.
g(d, B) ≥ g(d , B) − nkd − d k∞ ≥ |B|(|B| − 1)µ(2M ) − (r − 1)
2
Thus, for |B| ≥ c22 (n − 1) and for sufficiently large n, we have g(d, B) ≥ c3 n2 with c3 =
1 4
c µ(2M ).
2 2
We have shown that d satisfies the properties (5.10) and (5.11), so by Lemma 5.10, the
MLE θ̂ exists and satisfies kθ̂k∞ ≤ C, where the constant C only depends on M (and r).
Assume further that C ≥ M , so kθk∞ ≤ C as well.
To bound the deviation of θ̂ from θ, we use the convergence rate in the iterative algorithm
to compute θ̂. Set θ̂(0) = θ in the algorithm in Theorem 3.3, so by Proposition 5.9, we have
kθ̂ − θk∞ ≤
2
kθ − ϕ(θ)k∞ ,
δ2
(5.15)
where δ is given by (5.6) with K = 2kθ̂k∞ + kθk∞ ≤ 3C. From the definition of ϕ (3.5), we
see that for each 1 ≤ i ≤ n,
!
X
1
1
di
θi − ϕi (θ) =
log di − log
µ(θi + θj ) =
log ∗ .
r−1
r−1
di
j6=i
Noting that (y − 1)/y ≤ log y ≤ y − 1 for y > 0, we have | log(di /d∗i )| ≤ |di − d∗i |/ min{di , d∗i }.
Using the bounds on kd − d∗ k∞ and di , d∗i that we have developed above, we get
kd − d∗ k∞
min{mini di , mini d∗i }
r
kn log n
2
≤ (r − 1)
·
2
µ(2M ) (n − 1)
r
2(r − 1) k log n
≤
.
µ(2M )
n
kθ − ϕ(θ)k∞ ≤
Plugging this bound to (5.15) gives us the desired result.
5.3
Proofs for the continuous weighted graphs
In this section we present the proofs of the results presented in Section 3.2.
CHAPTER 5. PROOFS OF MAIN RESULTS
52
Proof of Theorem 3.6
Clearly if (d1 , . . . , dn ) ∈ Rn0 is a graphical sequence, then so is (dπ(1) , . . . , dπ(n) ), for any
permutation π of {1, . . . , n}. Thus, without loss of generality we can assume d1 ≥ d2 ≥
· · · ≥ dn , and in this case condition (3.9) reduces to
d1 ≤
n
X
di .
(5.16)
i=2
First suppose (d1 , . . . , dn ) ∈ Rn0 is graphic, so it is the degree sequence of a graph with
adjacency matrix a = (aij ). Then condition (5.16) is satisfied since
d1 =
n
X
i=2
a1i ≤
n X
X
i=2 j6=i
aij =
n
X
di .
i=2
For the converse direction, we first note the following easy properties of weighted graphical
sequences:
(i) The sequence (c, c, . . . , c) ∈ Rn0 is graphic for any c ∈ R0 , realized by the “cycle graph”
with weights ai,i+1 = c/2 for 1 ≤ i ≤ n − 1, a1n = c/2, and aij = 0 otherwise.
(ii) A sequence d = (d1 , . . . , dn ) ∈ Rn0 satisfying (5.16) with an equality is graphic, realized
by the “star graph” with weights a1i = di for 2 ≤ i ≤ n and aij = 0 otherwise.
0
(iii) If d = (d1 , . . . , dn ) ∈ Rn0 is graphic, then so is d = (d1 , . . . , dn , 0, . . . , 0) ∈ Rn0 for any
n0 ≥ n, realized by inserting n0 − n isolated vertices to the graph that realizes d.
(iv) If d(1) , d(2) ∈ Rn0 are graphic, then so is d(1) + d(2) , realized by the graph whose edge
weights are the sum of the edge weights of the graphs realizing d(1) and d(2) .
We now prove the converse direction by induction on n. For the base case n = 3, it is
easy to verify that (d1 , d2 , d3 ) with d1 ≥ d2 ≥ d3 ≥ 0 and d1 ≤ d2 + d3 is the degree sequence
of the graph G with edge weights
1
a12 = (d1 + d2 − d3 ) ≥ 0,
2
1
a13 = (d1 − d2 + d3 ) ≥ 0,
2
1
a23 = (−d1 + d2 + d3 ) ≥ 0.
2
Assume that the claim holds for n − 1; we will prove it also holds for n. So suppose we have
a sequence d = (d1 , . . . , dn ) with d1 ≥ d2 ≥ · · · ≥ dn ≥ 0 satisfying (5.16), and let
!
n
X
1
K=
di − d1 ≥ 0
n − 2 i=2
If K = 0 then (5.16) is satisfied with an equality, and by property (ii) we know that d is
graphic. Now assume K > 0. We consider two possibilities.
CHAPTER 5. PROOFS OF MAIN RESULTS
53
1. Suppose K ≥ dn . Then we can write d = d(1) + d(2) , where
d(1) = (d1 − dn , d2 − dn , . . . , dn−1 − dn , 0) ∈ Rn0
and
d(2) = (dn , dn , . . . , dn ) ∈ Rn0 .
Pn−1
The assumption K ≥ dn implies d1 −dn ≤ i=2
(di −dn ), so (d1 −dn , d2 −dn , . . . , dn−1 −
n−1
dn ) ∈ R0 is a graphical sequence by induction hypothesis. Thus, d(1) is also graphic
by property (iii). Furthermore, d(2) is graphic by property (i), so d = d(1) + d(2) is
also a graphical sequence by property (iv).
2. Suppose K < dn . Then write d = d(3) + d(4) , where
d(3) = (d1 − K, d2 − K, . . . , dn − K) ∈ Rn0
and
d(4) = (K, K, . . . , K) ∈ Rn0 .
P
By construction, d(3) satisfies d1 −K = ni=2 (di −K), so d(3) is a graphical sequence by
property (ii). Since d(4) is also graphic by property (i), we conclude that d = d(3) +d(4)
is graphic by property (iv).
This completes the induction step and finishes the proof of Theorem 3.6.
Proof of Lemma 3.7
We first prove that W is convex. Given d = (d1 , . . . , dn ) and d0 = (d01 , . . . , d0n ) in W, and
given 0 ≤ t ≤ 1, we note that
max tdi + (1 − t)d0i ≤ t max di + (1 − t) max d0i
1≤i≤n
1≤i≤n
n
X
1
≤ t
2
=
i=1
n
1X
2
1≤i≤n
n
X
1
di + (1 − t)
2
d0i
i=1
tdi + (1 − t)d0i ,
i=1
which means td + (1 − t)d0 ∈ W.
Next, recall that we already have M ⊆ conv(W) = W from Proposition 2.2, so to
conclude M = W it remains to show that W ⊆ M. Given d ∈ W, letPG be a graph
that realizes d and let w = (wij ) be the edge weights of G, so that di = j6=i wij for all
(n)
i = 1, . . . , n. Consider a distribution P on R0 2 that assigns each edge weight Aij to be an
independent exponential random variable with mean parameter wij , so P has density
Y 1
aij
(n)
p(a) =
exp −
, a = (aij ) ∈ R0 2 .
wij
wij
{i,j}
CHAPTER 5. PROOFS OF MAIN RESULTS
54
Then by construction, we have EP [Aij ] = wij and
X
X
EP [degi (A)] =
EP [Aij ] =
wij = di ,
j6=i
i = 1, . . . , n.
j6=i
This shows that d ∈ M, as desired.
Proof of Theorem 3.9
We first prove that the MLE θ̂ exists almost surely. Recall from the discussion in Section 3.2
that θ̂ exists if and only if d ∈ M◦ . Clearly d ∈ W since d is the degree sequence of the
sampled graph G. Since M = W (Lemma 3.7), we see that the MLE θ̂ does not exist if and
only if d ∈ ∂M = M \ M◦ , where
)
(
n
X
1
d0 .
∂M = d0 ∈ Rn0 : min d0i = 0 or max d0i =
1≤i≤n
1≤i≤n
2 i=1 i
In particular, note that ∂M has Lebesgue measure 0. Since the distribution P∗ on the edge
weights A = (Aij ) is continuous (being a product of exponential distributions) and d is a
continuous function of A, we conclude that P∗ (d ∈ ∂M) = 0, as desired.
We now prove the consistency of θ̂. Recall that θ is the true parameter that we wish to
estimate, and that the MLE θ̂ satisfies −Z(θ̂) = d. Let d∗ = −∇Z(θ) denote the expected
degree sequence of the maximum entropy distribution P∗θ . By the mean value theorem for
vector-valued functions [24, p. 341], we can write
d − d∗ = ∇Z(θ) − ∇Z(θ̂) = J(θ − θ̂).
(5.17)
Here J is a matrix obtained by integrating (element-wise) the Hessian ∇2 Z of the logpartition function on intermediate points between θ and θ̂:
Z 1
J=
∇2 Z(tθ + (1 − t)θ̂) dt.
0
Recalling that −∇Z(θ) = Eθ [deg(A)], at any intermediate point ξ ≡ ξ(t) = tθ + (1 − t)θ̂,
we have
X
X 1
.
∇Z(ξ) i = −
µ(ξi + ξj ) = −
ξi + ξj
j6=i
j6=i
Therefore, the Hessian ∇2 Z is given by
∇2 Z(ξ) ij =
and
X
∇2 Z(ξ) ii =
j6=i
1
(ξi + ξj )2
i 6= j,
X
1
2
=
∇
Z(ξ)
.
ij
(ξi + ξj )2
j6=i
CHAPTER 5. PROOFS OF MAIN RESULTS
55
Since θ, θ0 ∈ Θ and we assume θi + θj ≤ M , it follows that for i 6= j,
0 < ξi + ξj ≤ max{θi + θj , θ̂i + θ̂j } ≤ max{M, 2kθ̂k∞ } ≤ M + 2kθ̂k∞ .
Therefore, the Hessian ∇2 Z is a diagonally balanced matrix with off-diagonal entries bounded
below by 1/(M + 2kθ̂k∞ )2 . In particular, J is also a symmetric, diagonally balanced matrix with off-diagonal entries bounded below by 1/(M + 2kθ̂k∞ )2 , being an average of such
matrices. By Theorem 4.1, J is invertible and its inverse satisfies the bound
kJ −1 k∞ ≤
(M + 2kθ̂k∞ )2 (3n − 4)
2
≤ (M + 2kθ̂k∞ )2 ,
2(n − 1)(n − 2)
n
where the last inequality holds for n ≥ 7. Inverting J in (5.17) and applying the bound on
kJ −1 k∞ gives
2
(M + 2kθ̂k∞ )2 kd − d∗ k∞ .
(5.18)
n
P
Let A = (Aij ) denote the edge weights of the sampled graph G ∼ P∗θ , so di = j6=i Aij for
i = 1, . . . , n. Moreover,
since d∗ is the expected degree sequence from the distribution P∗θ ,
P
∗
we also have di = j6=i 1/(θi + θj ). Recall that Aij is an exponential random variable with
rate λ = θi + θj ≥ L, so by Lemma 5.2, Aij − 1/(θi + θj ) is sub-exponential with parameter
2/(θi + θj ) ≤ 2/L. For each i = 1, . . . , n, the random variables (Aij − 1/(θi + θj ), j 6= i) are
independent sub-exponential random variables, so we can apply the concentration inequality
in Theorem 5.1 with κ = 2/L and
1/2
4k log n
=
.
γ(n − 1)L2
p
Assume n is sufficiently large such that /κ = k log n/γ(n − 1) ≤ 1. Then by Theorem 5.1,
for each i = 1, . . . , n we have
s
s
!
!
4kn
log
n
4k(n
−
1)
log
n
≤ P |di − d∗i | ≥
P |di − d∗i | ≥
γL2
γL2
!
s
1 X
1
4k log n
=P Aij −
≥
n − 1
θ
γ(n − 1)L2
i + θj
j6=i
L2
4k log n
≤ 2 exp −γ (n − 1) ·
·
4 γ(n − 1)L2
2
= k.
n
By the union bound,
s
s
!
!
n
X
4kn
log
n
4kn
log
n
2
P kd − d∗ k∞ ≥
≤
P |di − d∗i | ≥
≤ k−1 .
2
2
γL
γL
n
i=1
kθ − θ̂k∞ ≤ kJ −1 k∞ kd − d∗ k∞ ≤
CHAPTER 5. PROOFS OF MAIN RESULTS
56
p
Assume for the rest of this proof that kd − d∗ k∞ ≤ 4kn log n/(γL2 ), which happens
with probability at least 1 − 2/nk−1 . From (5.18) and using the triangle inequality, we get
s
4
k log n
kθ̂k∞ ≤ kθ − θ̂k∞ + kθk∞ ≤
(M + 2kθ̂k∞ )2 + M.
(5.19)
L
γn
What we have shown is that for sufficiently large n, kθ̂k∞ satisfies the inequality Gn (kθ̂k∞ ) ≥
0, where Gn (x) is the quadratic function
s
k log n
4
Gn (x) =
(M + 2x)2 − x + M.
L
γn
It is easy to see that for sufficiently large n we have Gn (2M ) < 0 and Gn (log n) < 0. Thus,
Gn (kθ̂k∞ ) ≥ 0 means either kθ̂k∞ < 2M or kθ̂k∞ > log n. We claim that for sufficiently
large n we always have kθ̂k∞ < 2M . Suppose the contrary that there are infinitely many n
for which kθ̂k∞ > log n, and consider one such n. Since θ̂ ∈ Θ we know that θ̂i + θ̂j > 0 for
each i 6= j, so there can be at most one index i with θ̂i < 0. We consider the following two
cases:
1. Case 1: suppose θ̂i ≥ 0 for all i = 1, . . . , n. Let i∗ be an index with θ̂i∗ = kθ̂k∞ > log n.
Then, using the fact that θ̂ satisfies the system of equations (3.8) and θ̂i∗ + θ̂j ≥ θ̂i∗ for
j 6= i∗ , we see that
1 X
1
1
≤
M
n − 1 j6=i∗ θi∗ + θj
X
1
1 1 X
1
1 X
≤
−
+
n − 1 j6=i∗ θi∗ + θj j6=i∗ θ̂i∗ + θ̂j n − 1 j6=i∗ θ̂i∗ + θ̂j
1
1
1 X
=
|d∗i − di | +
n−1
n − 1 j6=i∗ θ̂i∗ + θ̂j
1
1
kd∗ − dk∞ +
n−1
kθ̂k∞
s
1
4kn log n
1
≤
+
,
n−1
γL2
log n
≤
which cannot hold for sufficiently large n, as the last expression tends to 0 as n → ∞.
2. Case 2: suppose θ̂i < 0 for some i = 1, . . . , n, so θ̂j > 0 for j 6= i since θ̂ ∈ Θ. Without
loss of generality assume θ̂1 < 0 < θ̂2 ≤ · · · ≤ θ̂n , so θ̂n = kθ̂k∞ > log n. Following the
CHAPTER 5. PROOFS OF MAIN RESULTS
57
same chain of inequalities as in the previous case (with i∗ = n), we obtain
!
n−1
X
1
1
1
1
1
∗
≤
kd − dk∞ +
+
M
n−1
n − 1 θ̂n + θ̂1 j=2 θ̂j + θ̂n
s
1
4kn log n
1
n−2
≤
+
+
n−1
γL2
(n − 1)(θ̂n + θ̂1 ) (n − 1)kθ̂k∞
s
4kn log n
1
1
1
+
.
≤
+
n−1
γL2
(n − 1)(θ̂n + θ̂1 ) log n
So for sufficiently large n,
1
θ̂1 + θ̂n
≥ (n − 1)
1
1
−
M
n−1
s
4kn log n
1
−
2
γL
log n
!
≥
n
,
2M
and thus θ̂1 + θ̂i ≤ θ̂1 + θ̂n ≤ 2M/n for each i = 2, . . . , n. However, then
s
4kn log n
≥ kd∗ − dk∞ ≥ |d∗1 − d1 |
γL2
n
n
X
X
1
1
+
≥−
θ + θj j=2 θ̂1 + θ̂j
j=2 1
≥−
(n − 1) n(n − 1)
+
,
L
2M
which cannot hold for sufficiently large n, as the right hand side of the last expression
tends to ∞ faster than the left hand side.
The analysis above shows that kθ̂k∞ < 2M for all sufficiently large n. Plugging in this result
to (5.18), we conclude that for sufficiently large n, with probability at least 1 − 2n−(k−1) we
have the bound
s
s
2
4kn
log
n
100M
k log n
2
=
,
kθ − θ̂k∞ ≤ (5M )2
2
n
γL
L
γn
as desired.
5.4
Proofs for the infinite discrete weighted graphs
In this section we prove the results presented in Section 3.3.
CHAPTER 5. PROOFS OF MAIN RESULTS
58
Proof of Theorem 3.10
Without
P loss of generality we may assume d1 ≥ d2 ≥ · · · ≥ dn , so condition (3.11) becomes
d1 ≤ ni=2 di . The necessary partPis easy: if (d
P1 , . . . , dn ) is a degree sequence of a graph G
n
with edge weights aij ∈ N0 , then i=1 di = 2 {i,j} aij is even, and the total weight coming
Pn
out of
Pnvertex 1 is at most i=2 di . For the converse direction, we proceed by induction on
s = i=1 di . The statement is clearly true for s = 0 and s = 2. Assume the statement is true
n
for
Pn some even s ∈ N, and suppose
Pn we are given d = (d1 , . . . , dn ) ∈ N0 with d1 ≥ · · · ≥ dn ,
i=1 di = s + 2, and d1 ≤
i=2 di . Without loss of generality we may assume dn ≥ 1,
for otherwise we can proceed with only the nonzero elements of d. Let 1 ≤ t ≤ n − 1
be the smallest index such that dt > dt+1 , with t = n − 1 if d1 = · · · = dn , and let
d0 = (d1 , . . . , dt−1 , dt − 1, dt+1 , . . . , dn − 1). We will show that d0 is graphic. This will imply
that d is graphic, because if d0 is realized by the graph G0 with edge weights a0ij , then d is
0
realized by the graph G with edge weights atn = a0tn + 1 and aij = aP
ij otherwise.
Pn
n
0
0
0
0
0
0
d
=
Now for d = (d1 , . . . , dn ) given above, we have
d
≥
·
·
·
≥
d
and
i
1
n
i=1 di −2 =
i=1
P
s is even. So it suffices to show that d01 ≤ ni=2 d0i , for then we canPapply the induction
Pn 0
hypothesis to conclude that d0 isP
graphic. If t = 1, then d01 = d1 − 1 ≤ ni=2 dP
i −1 =
i=2 di .
n
n
d
is
even,
d
since
d
≥
1.
In
particular,
since
If
t
>
1
then
d
=
d
,
so
d
<
n P
1P
2
1
i=1 i
i=2 i
Pn
n
n
0
di − d1 =P i=1 di − 2d1 is also even, hence i=2 di − d1 ≥ 2. Therefore, d1 = d1 ≤
Pi=2
n
n
0
i=2 di . This finishes the proof of Theorem 3.10.
i=2 di − 2 =
Proof of Lemma 3.11
Clearly W ⊆ W1 , so conv(W) ⊆ W1 since W1 is closed and convex, by Lemma 3.7. Conversely, let Q denote the set of rational numbers. We will first show that W1 ∩Qn ⊆ conv(W)
and then proceed by a limit argument.
Let d ∈ W1 ∩ Qn , so d = (d1 , . . . , dn ) ∈ Qn with
Pn
1
di ≥ 0 and max1≤i≤n di ≤ 2 i=1 di . Choose K ∈ N large enough such that Kdi ∈ N0
for
. , 2Kdn ) ∈ Nn0 has the property that
Pn all i = 1, . . . , n. Observe that 2Kd = (2Kd1 , .1.P
n
i=1 2Kdi ∈ N0 is even and max1≤i≤n 2Kdi ≤ 2
i=1 2Kdi , so 2Kd ∈ W by definition. Since 0 = (0, . . . , 0) ∈ W as well, all elements along the segment joining 0 and
2Kd lie in conv(W), so in particular, d = (2Kd)/(2K) ∈ conv(W). This shows that
W1 ∩ Qn ⊆ conv(W), and hence W1 ∩ Qn ⊆ conv(W).
To finish the proof it remains to show that W1 ∩ Qn = W1 . On the one hand we have
W1 ∩ Qn ⊆ W1 ∩ Qn = W1 ∩ Rn0 = W1 .
For the other direction, given d ∈ W1 , choose d1 , . . . , dn ∈ W1 such that d, d1 , . . . , dn are in
general position, so that the convex hull C of {d, d1 , . . . , dn } is full dimensional. This can
be done, for instance, by noting that the following n + 1 points in W1 are in general position:
{0, e1 + e2 , e1 + e3 , · · · , e1 + en , e1 + e2 + · · · + en },
(m)
where e1 , . . . , en are the standard basis of Rn . For each m ∈ N and i = 1, . . . , n, choose di
(m)
(m)
on the line segment between d and di such that the convex hull Cm of {d, d1 , . . . , dn } is
CHAPTER 5. PROOFS OF MAIN RESULTS
59
full dimensional and has diameter at most 1/m. Since Cm is full dimensional we can choose
a rational point rm ∈ Cm ⊆ C ⊆ W1 . Thus we have constructed a sequence of rational
points (rm ) in W1 converging to d, which shows that W1 ⊆ W1 ∩ Qn .
Proof of Theorem 3.13
We first address the issue of the existence of θ̂. Recall from the discussion in Section 3.3
that the MLE θ̂ ∈ Θ exists if and only if d ∈ M◦ . Clearly d ∈ W since d is the degree
sequence of the sampled graph G, and W ⊆ conv(W) = M from Proposition 2.2. Therefore,
the MLE θ̂ does not exist if and only if d ∈ ∂M = M \ M◦ , where the boundary ∂M is
explicitly given by
)
(
n
X
1
d0 .
∂M = d0 ∈ Rn0 : min d0i = 0 or max d0i =
1≤i≤n
1≤i≤n
2 i=1 i
Using union bound and the fact that the edge weights Aij are independent geometric random
variables, we have
P(di = 0 for some i) ≤
=
n
X
i=1
n
X
P(di = 0)
P(Aij = 0 for all j 6= i)
i=1
=
n Y
X
(1 − exp(−θi − θj ))
i=1 j6=i
≤ n (1 − exp(−M ))n−1 .
Furthermore, again by union bound,
!
!
!
n
n
X
X
X
1X
di = P di =
dj for some i ≤
P di =
dj .
P max di =
1≤i≤n
2 i=1
i=1
j6=i
j6=i
P
Note that we have di = j6=i dj for some i if and only if the edge weights Ajk = 0 for all
j, k 6= i. This occurs with probability
Y
n−1
P (Ajk = 0 for j, k 6= i) =
(1 − exp(−θj − θk )) ≤ (1 − exp(−M ))( 2 ) .
j,k6=i
j6=k
CHAPTER 5. PROOFS OF MAIN RESULTS
60
Therefore,
n
1X
P(d ∈ ∂M) ≤ P(di = 0 for some i) + P max di =
di
1≤i≤n
2 i=1
!
n−1
≤ n (1 − exp(−M ))n−1 + n (1 − exp(−M ))( 2 )
1
≤ k−1 ,
n
where the last inequality holds for sufficiently large n. This shows that for sufficiently large
n, the MLE θ̂ exists with probability at least 1 − 1/nk−1 .
We now turn to proving the consistency of θ̂. For the rest of this proof, assume that
the MLE θ̂ ∈ Θ exists, which occurs with probability at least 1 − 1/nk−1 . The proof of the
consistency of θ̂ follows the same outline as in the proof of Theorem 3.9. Let d∗ = −∇Z(θ)
denote the expected degree sequence of the distribution P∗θ , and recall that the MLE θ̂
satisfies d = −∇Z(θ̂). By the mean value theorem [24, p. 341], we can write
d − d∗ = ∇Z(θ) − ∇Z(θ̂) = J(θ − θ̂),
(5.20)
where J is the matrix obtained by integrating the Hessian of Z between θ and θ̂,
Z 1
J=
∇2 Z(tθ + (1 − t)θ̂) dt.
0
Let 0 ≤ t ≤ 1, and note that at the point ξ = tθ + (1 − t)θ̂ the gradient ∇Z is given by
X
1
.
∇Z(ξ) i = −
exp(ξ
i + ξj ) − 1
j6=i
Thus, the Hessian ∇2 Z is
∇2 Z(ξ) ij =
and
X
∇2 Z(ξ) ii =
j6=i
exp(ξi + ξj )
(exp(ξi + ξj ) − 1)2
i 6= j,
X
exp(ξi + ξj )
2
=
∇
Z(ξ)
.
ij
(exp(ξi + ξj ) − 1)2
j6=i
Since θ, θ̂ ∈ Θ and we assume θi + θj ≤ M , for i 6= j we have
0 < ξi + ξj ≤ max{θi + θj , θ̂i + θ̂j } ≤ max{M, 2kθ̂k∞ } ≤ M + 2kθ̂k∞ .
This means J is a symmetric, diagonally dominant matrix with off-diagonal entries bounded
below by exp(M + 2kθ̂k∞ )/(exp(M + 2kθ̂k∞ ) − 1)2 , being an average of such matrices. Then
by Theorem 4.1, we have the bound
kJ −1 k∞ ≤
(3n − 4)
(exp(M + 2kθ̂k∞ ) − 1)2
2 (exp(M + 2kθ̂k∞ ) − 1)2
≤
,
2(n − 2)(n − 1)
n
exp(M + 2kθ̂k∞ )
exp(M + 2kθ̂k∞ )
CHAPTER 5. PROOFS OF MAIN RESULTS
61
where the second inequality holds for n ≥ 7. By inverting J in (5.20) and applying the
bound on J −1 above, we obtain
kθ − θ̂k∞ ≤ kJ −1 k∞ kd − d∗ k∞ ≤
2 (exp(M + 2kθ̂k∞ ) − 1)2
kd − d∗ k∞ .
n
exp(M + 2kθ̂k∞ )
(5.21)
P
Let A = (Aij ) denote the edge weights of the sampled graph G ∼ P∗θ , so di = j6=i Aij
for i = 1, . .P
. , n. Since d∗ is the expected degree sequence from the distribution P∗θ , we also
have d∗i = j6=i 1/(exp(θi + θj ) − 1). Recall that Aij is a geometric random variable with
emission probability
q = 1 − exp(−θi − θj ) ≥ 1 − exp(−L),
so by Lemma 5.3, Aij −1/(exp(θi +θj )−1) is sub-exponential with parameter −4/ log(1−q) ≤
4/L. For each i = 1, . . . , n, the random variables (Aij − 1/(exp(θi + θj ) − 1), j 6= i) are
independent sub-exponential random variables, so we can apply the concentration inequality
in Theorem 5.1 with κ = 4/L and
1/2
16k log n
=
.
γ(n − 1)L2
p
Assume n is sufficiently large such that /κ = k log n/γ(n − 1) ≤ 1. Then by Theorem 5.1,
for each i = 1, . . . , n we have
s
s
!
!
16kn log n
16k(n − 1) log n
∗
∗
P |di − di | ≥
≤ P |di − di | ≥
γL2
γL2
!
s
1 X
1
16k log n
=P Aij −
≥
n − 1
exp(θi + θj ) − 1 γ(n − 1)L2
j6=i
L2 16k log n
·
≤ 2 exp −γ (n − 1) ·
16 γ(n − 1)L2
2
= k.
n
The union bound then gives us
s
s
!
!
n
X
2
16kn
log
n
16kn
log
n
P kd − d∗ k∞ ≥
≤
P |di − d∗i | ≥
≤ k−1 .
2
2
γL
γL
n
i=1
p
Assume now that kd − d̂k∞ ≤ 16kn log n/(γL2 ), which happens with probability at
least 1 − 2/nk−1 . From (5.21) and using the triangle inequality, we get
s
8
k log n (exp(M + 2kθ̂k∞ ) − 1)2
kθ̂k∞ ≤ kθ − θ̂k∞ + kθk∞ ≤
+ M.
(5.22)
L
γn
exp(M + 2kθ̂k∞ )
CHAPTER 5. PROOFS OF MAIN RESULTS
62
This means kθ̂k∞ satisfies the inequality Hn (kθ̂k∞ ) ≥ 0, where Hn (x) is the function
s
8
k log n (exp(M + 2x) − 1)2
Hn (x) =
− x + M.
L
γn
exp(M + 2x)
One can easily verify that Hn is a convex function, so Hn assumes the value 0 at most
twice, and moreover, Hn (x) → ∞ as x → ∞. It is also easy to see that for all sufficiently
large n, we have Hn (2M ) < 0 and Hn ( 41 log n) < 0. Therefore, Hn (kθ̂k∞ ) ≥ 0 implies
either kθ̂k∞ < 2M or kθ̂k∞ > 14 log n. We claim that for sufficiently large n we always have
kθ̂k∞ < 2M . Suppose the contrary that there are infinitely many n for which kθ̂k∞ > 14 log n,
and consider one such n. Since θ̂i + θ̂j > 0 for each i 6= j, there can be at most one index i
with θ̂i < 0. We consider the following two cases:
1. Case 1: suppose θ̂i ≥ 0 for all i = 1, . . . , n. Let i∗ be an index with θ̂i∗ = kθ̂k∞ >
1
log n. Then, since θ̂i∗ + θ̂j ≥ θ̂i∗ for j 6= i∗ ,
4
1 X
1
1
≤
exp(M ) − 1
n − 1 j6=i∗ exp(θi∗ + θj ) − 1
X
X
1
1 1
≤
−
n − 1 j6=i∗ exp(θi∗ + θj ) − 1 j6=i∗ exp(θ̂i∗ + θ̂j ) − 1 1 X
1
+
n − 1 j6=i∗ exp(θ̂i∗ + θ̂j ) − 1
1
1
kd − d∗ k∞ +
n−1
exp(kθ̂k∞ ) − 1
s
16kn log n
1
1
≤
+ 1/4
,
2
n−1
γL
n −1
≤
which cannot hold for sufficiently large n, as the last expression tends to 0 as n → ∞.
2. Case 2: suppose θ̂i < 0 for some i = 1, . . . , n, so θ̂j > 0 for j ≠ i. Without loss of generality assume θ̂1 < 0 < θ̂2 ≤ · · · ≤ θ̂n, so θ̂n = ‖θ̂‖∞ > (1/4) log n. Following the same chain of inequalities as in the previous case (with i* = n), we obtain
\[
\begin{aligned}
\frac{1}{\exp(M) - 1}
&\le \frac{1}{n-1} \|d - d^*\|_\infty
 + \frac{1}{n-1} \left( \frac{1}{\exp(\hat\theta_n + \hat\theta_1) - 1} + \sum_{j=2}^{n-1} \frac{1}{\exp(\hat\theta_j + \hat\theta_n) - 1} \right) \\
&\le \frac{1}{n-1} \sqrt{\frac{16 k n \log n}{\gamma L^2}}
 + \frac{1}{(n-1)(\exp(\hat\theta_n + \hat\theta_1) - 1)}
 + \frac{n-2}{(n-1)(\exp(\|\hat\theta\|_\infty) - 1)} \\
&\le \frac{1}{n-1} \sqrt{\frac{16 k n \log n}{\gamma L^2}}
 + \frac{1}{(n-1)(\exp(\hat\theta_n + \hat\theta_1) - 1)}
 + \frac{1}{n^{1/4} - 1}.
\end{aligned}
\]
This implies
\[
\frac{1}{\exp(\hat\theta_1 + \hat\theta_n) - 1}
\;\ge\; (n-1) \left( \frac{1}{\exp(M) - 1} - \frac{1}{n-1} \sqrt{\frac{16 k n \log n}{\gamma L^2}} - \frac{1}{n^{1/4} - 1} \right)
\;\ge\; \frac{n}{2(\exp(M) - 1)},
\]
where the last inequality assumes n is sufficiently large. Therefore, for i = 2, . . . , n,
\[
\frac{1}{\exp(\hat\theta_1 + \hat\theta_i) - 1} \;\ge\; \frac{1}{\exp(\hat\theta_1 + \hat\theta_n) - 1} \;\ge\; \frac{n}{2(\exp(M) - 1)}.
\]
However, this implies
\[
\begin{aligned}
\sqrt{\frac{16 k n \log n}{\gamma L^2}}
&\ge \|d - d^*\|_\infty \;\ge\; |d_1 - d_1^*| \\
&\ge -\sum_{j=2}^{n} \frac{1}{\exp(\theta_1 + \theta_j) - 1} + \sum_{j=2}^{n} \frac{1}{\exp(\hat\theta_1 + \hat\theta_n) - 1} \\
&\ge -\frac{n-1}{\exp(L) - 1} + \frac{n(n-1)}{2(\exp(M) - 1)},
\end{aligned}
\]
which cannot hold for sufficiently large n, as the right-hand side of the last expression grows faster than the left-hand side on the first line.
The analysis above shows that we have ‖θ̂‖∞ < 2M for all sufficiently large n. Plugging this result into (5.21) gives us
\[
\|\theta - \hat\theta\|_\infty
\;\le\; \frac{2}{n} \cdot \frac{(\exp(5M) - 1)^2}{\exp(5M)} \sqrt{\frac{16 k n \log n}{\gamma L^2}}
\;\le\; \frac{8 \exp(5M)}{L} \sqrt{\frac{k \log n}{\gamma n}}.
\]
Finally, taking into account the issue of the existence of the MLE, we conclude that for sufficiently large n, with probability at least
\[
\left(1 - \frac{1}{n^{k-1}}\right)\left(1 - \frac{2}{n^{k-1}}\right) \;\ge\; 1 - \frac{3}{n^{k-1}},
\]
the MLE θ̂ ∈ Θ exists and satisfies
\[
\|\theta - \hat\theta\|_\infty \;\le\; \frac{8 \exp(5M)}{L} \sqrt{\frac{k \log n}{\gamma n}},
\]
as desired. This finishes the proof of Theorem 3.13.
Chapter 6
Discussion and future work
In this thesis, we have studied the maximum entropy distribution on weighted graphs
with a given expected degree sequence. In particular, we focused our study on three
classes of weighted graphs: the finite discrete weighted graphs (with edge weights in the
set {0, 1, . . . , r − 1}, r ≥ 2), the infinite discrete weighted graphs (with edge weights in the
set N0 ), and the continuous weighted graphs (with edge weights in the set R0 ). We have
shown that the maximum entropy distributions are characterized by the edge weights being
independent random variables having exponential family distributions parameterized by the
vertex potentials. We also studied the problem of finding the MLE of the vertex potentials,
and we proved the remarkable consistency property of the MLE from only one graph sample.
In the case of finite discrete weighted graphs, we also provided a fast, iterative algorithm
for finding the MLE with a geometric rate of convergence. Finding the MLE in the case of
continuous or infinite discrete weighted graphs can be performed via standard gradient-based
methods, and the bounds that we proved on the inverse Hessian of the log-partition function
can also be used to provide a rate of convergence for these methods. However, it would be
interesting to develop an efficient iterative algorithm for computing the MLE, similar
to the one for the finite discrete weighted graphs.
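As a concrete illustration of such a gradient-based approach, the following Python sketch (a minimal example of our own, not the algorithm of Chapter 3; the function names, fixed step size, iteration count, and starting point are arbitrary choices) fits the vertex potentials for infinite discrete (geometric) edge weights by gradient descent on the convex negative log-likelihood, whose gradient at θ is the observed degree sequence minus the expected degree sequence d*(θ).

import numpy as np

def expected_degrees(theta):
    # d*_i(theta) = sum_{j != i} 1 / (exp(theta_i + theta_j) - 1)
    S = theta[:, None] + theta[None, :]
    mean_weight = 1.0 / (np.exp(S) - 1.0)
    np.fill_diagonal(mean_weight, 0.0)
    return mean_weight.sum(axis=1)

def fit_mle(d, step=1e-3, iters=50000):
    # Gradient descent on the negative log-likelihood; its gradient in theta_i is d_i - d*_i(theta).
    # This sketch assumes the iterates remain in the region {theta_i + theta_j > 0}; in general a
    # line search or projection step is needed to guarantee this.
    theta = np.ones(len(d))                   # start in the interior of the parameter space
    for _ in range(iters):
        theta = theta - step * (d - expected_degrees(theta))
    return theta

In practice one would first check that the observed degree sequence admits an MLE, and the bounds on the inverse Hessian of the log-partition function mentioned above translate into an explicit step size and rate of convergence for this kind of iteration.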
Another interesting research direction is to explore the theory of maximum entropy distributions when we impose additional structure on the underlying graph. We can start
with an arbitrary graph G0 on n vertices, for instance a lattice graph or a sparse graph,
and consider the maximum entropy distributions on the subgraphs G of G0 . By choosing
different types of the underlying graphs G0 , we can incorporate additional prior information
from the specific applications we are considering.
Finally, given our initial motivation for this project, we would also like to apply the
theory that we developed in this thesis to applications in neuroscience, in particular, in
modeling the early-stage computations that occur in the retina. There are also other problem
domains where our theory is potentially useful, including applications in clustering, image
segmentation, and modularity analysis.