
A short course on random matrices
Preliminary draft
Charles Bordenave
Preprint – Compiled June 17, 2014
Chapter 1
Why random matrices ?
1.1 Some motivation (beyond curiosity)

1.1.1 In theoretical physics
Random matrix theory was born with the seminal work of Eugene Wigner in nuclear physics in the 1950s [Wig55, Wig58]. It still attracts a lot of attention in many branches of physics, including quantum chaos, quantum gravity in two dimensions, etc.
1.1.2 In multivariate statistics
Initiated by Wishart [Wis28] in 1928. Consider $X \in \mathbb{R}^p$ a centered Gaussian vector with covariance matrix $\Sigma = \mathbb{E} XX^*$, and let $(X_i)_{i\ge1}$ be iid samples. The sample covariance matrix is
\[
\Sigma_n = \frac{1}{n}\sum_{i=1}^n X_i X_i^*.
\]
For $p$ fixed and $n$ large, this is the usual setting of multivariate statistics. The fundamental theorem of statistics asserts that, as $n \to \infty$, a.s.
\[
\|\Sigma_n - \Sigma\| \to 0,
\]
where $\|\cdot\|$ is a matrix norm, for example the operator norm
\[
\|A\|_{2\to2} = \max_{x \in \mathbb{R}^p,\ x \ne 0} \frac{\|Ax\|_2}{\|x\|_2} = \max_{1\le i\le n} \sqrt{\lambda_i(AA^*)}.
\]
Random matrix regime: what happens if $p = p(n)$ grows to infinity with $n$ (high-dimensional data)? For example, assume for simplicity that $\Sigma = I_p$ and $p(n) \sim cn$, with $0 \le c \le 1$. Then, a.s.
\[
\|\Sigma_n - I_p\|_{2\to2} \to (1 + \sqrt{c})^2 - 1.
\]
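As a quick sanity check of this limit, the following numerical sketch (numpy; the sizes $p$, $n$ and the seed are arbitrary illustrative choices, not from the text) compares $\|\Sigma_n - I_p\|_{2\to2}$ with $(1+\sqrt{c})^2 - 1$ when $\Sigma = I_p$:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 200, 400              # illustrative sizes with c = p/n = 1/2
c = p / n

# n iid centered Gaussian samples in R^p with covariance Sigma = I_p
X = rng.standard_normal((p, n))
Sigma_n = X @ X.T / n        # sample covariance matrix

err = np.linalg.norm(Sigma_n - np.eye(p), ord=2)   # operator norm
limit = (1 + np.sqrt(c)) ** 2 - 1
print(err, limit)            # close for large p, n
```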
Also, as $p \to \infty$, a.s.
\[
\log|\det(\Sigma_n - I_p)| = \sum_{i=1}^{p} \log|\lambda_i(\Sigma_n) - 1| \to -\infty.
\]
In the general case: is it possible to estimate the covariance $\Sigma$ from $\Sigma_n$? See notably the monograph by Bai and Silverstein [BS10]. There are applications in finance.
1.1.3 Numerical analysis

John von Neumann and Herman Goldstine [GvN47, GvN48a, GvN48b]: tests of the usual matrix operations and algorithms on typical matrices: Gram-Schmidt decomposition, SVD, simplex algorithm, etc.
Spielman and Teng [ST04], Edelman and Rao [ER05]: analysis of $X$ or $A + \sigma^2 X$ with $X$ random.
The condition number, for $A \in M_{n,p}(\mathbb{R})$, $n \le p$,
\[
\kappa(A) = \sqrt{\frac{\lambda_{\max}(AA^*)}{\lambda_{\min}(AA^*)}},
\]
plays a central role in the analysis of matrix algorithms.
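A minimal numerical illustration (numpy; sizes and seed are arbitrary choices): the definition of $\kappa(A)$ through the eigenvalues of $AA^*$ agrees with the ratio of the extreme singular values of $A$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 300, 300
A = rng.standard_normal((n, p))

# kappa(A) = sqrt(lambda_max(AA^*) / lambda_min(AA^*))
evals = np.linalg.eigvalsh(A @ A.T)      # sorted increasingly
kappa = np.sqrt(evals[-1] / evals[0])

# equivalently, the ratio of the extreme singular values of A
s = np.linalg.svd(A, compute_uv=False)   # sorted decreasingly
print(kappa, s[0] / s[-1])               # the two values agree
```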
1.1.4 In signal processing
Compressed sensing: $x_0 \in \mathbb{C}^n$, $A \in M_{p\times n}(\mathbb{C})$, $p \ll n$,
\[
y = A x_0.
\]
The system is underdetermined. We suppose however that the signal $x_0$ is sparse: it contains many zeros. If the entries of $A$ are iid Gaussian variables with mean $0$ and variance $1/n$, then there exists $c > 0$ (Candès-Romberg-Tao [CRT06]) such that, with high probability, for any vector $x_0$ with $|\mathrm{supp}(x_0)| \le cp/\log(n/p)$,
\[
x_0 = \arg\min_{x : Ax = y} |\mathrm{supp}(x)| = \arg\min_{x : Ax = y} \|x\|_1
\]
($\ell^1$-recovery: linear programming works on the right-hand side). There is no chance that it could work with
\[
\arg\min_{x : Ax = y} \|x\|_2.
\]
There is also a version with noise,
\[
y = Ax + b.
\]
Which other matrices $A$ satisfy such an amazing recovery property?
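The contrast between $\ell^1$ and $\ell^2$ recovery can be seen numerically. The sketch below (numpy only; the dimensions, sparsity level and the ISTA solver with its step size and threshold are our illustrative choices, not from the text) approximates the $\ell^1$ minimizer by iterative soft-thresholding and compares it with the minimum-$\ell^2$-norm solution:

```python
import numpy as np

rng = np.random.default_rng(2)
p, n, k = 60, 128, 6                       # p measurements, dimension n, sparsity k

A = rng.standard_normal((p, n)) / np.sqrt(n)   # iid N(0, 1/n) entries
x0 = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
x0[support] = rng.choice([-1.0, 1.0], size=k)
y = A @ x0

# minimum l2-norm solution of Ax = y: dense, does not recover x0
x_l2 = np.linalg.lstsq(A, y, rcond=None)[0]

# l1 minimization, approximated by ISTA on 0.5*||Ax - y||^2 + lam*||x||_1
lam = 1e-4
L = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of the gradient
x = np.zeros(n)
for _ in range(20000):
    z = x - A.T @ (A @ x - y) / L          # gradient step
    x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # soft threshold

err_l1 = np.max(np.abs(x - x0))
err_l2 = np.max(np.abs(x_l2 - x0))
print(err_l1, err_l2)    # err_l1 is small, err_l2 is not
```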
1.1.5 In wireless communication
Wireless channel: $b \in \mathbb{C}^n$, $s \in \mathbb{C}^p$, $A \in M_{n\times p}(\mathbb{C})$,
\[
r = As + b,
\]
where $b$ is a Gaussian noise vector with covariance $\mathbb{E} bb^* = \sigma^2 I_n$ and $s$ is the sent signal. What is the Shannon capacity of such a communication channel?
\[
\max_{Q : Q \ge 0,\ \frac{1}{p}\mathrm{Tr}\,Q = 1} \log\det(\sigma^2 I_n + AQA^*).
\]
In relevant applications, $p$ and $n$ are large and the channel itself is random: for example, $A_{ij}$ iid (fading).
Another problem: how can one effectively reconstruct the signal $s_1$ from the received vector $r$? Linear minimum mean square error: $\hat{s}_1 = g^* r$ with, if $A = [x, X]$, $g = (XX^* + \sigma^2 I_n)^{-1} x$. Assume the $s_i$ have mean $0$, variance $1$ and covariance $0$. Then the performance of the reconstruction is estimated by the signal-to-noise ratio:
\[
\frac{\mathbb{E}|g^*(s_1 x)|^2}{\mathbb{E}|g^*(r - s_1 x)|^2} = \mathbb{E}\, x^*(XX^* + \sigma^2 I_n)^{-1} x.
\]
See the monograph by Tulino and Verdú [TV04].
1.1.6 In other branches of mathematics

Connections (or conjectured connections) with analytic number theory, group theory, combinatorics, non-commutative geometry, high-dimensional geometry, etc.
1.2 Main matrix models

1.2.1 Wigner matrix
For $1 \le i < j$, let $X_{ij} = \bar{X}_{ji}$ be iid with law $P$ on $\mathbb{C}$, independent of the $X_{ii}$, $i \ge 1$, which are iid with common law $Q$ on $\mathbb{R}$. Then $X_n = (X_{ij})_{1\le i,j\le n}$ is a random Hermitian matrix.
Important cases:
- $\sqrt{2}\,\mathrm{Re}(X_{ij})$, $\sqrt{2}\,\mathrm{Im}(X_{ij})$ and $X_{ii}$ are independent $N(0,1)$: Gaussian Unitary Ensemble (GUE).
- $P = N(0,1)$, $Q = N(0,2)$ ($\mathrm{Var}(X_{11}) = 2$): Gaussian Orthogonal Ensemble (GOE).
- $P = Q$ is the Bernoulli law.
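The real models above can be sampled in a few lines (numpy; the sampler names and sizes are our illustrative choices):

```python
import numpy as np

def wigner_goe(n, rng):
    """GOE: off-diagonal entries N(0, 1), diagonal entries N(0, 2)."""
    G = rng.standard_normal((n, n))
    return (G + G.T) / np.sqrt(2)

def wigner_bernoulli(n, rng):
    """P = Q = symmetric Bernoulli on {-1, +1}."""
    B = rng.choice([-1.0, 1.0], size=(n, n))
    return np.triu(B) + np.triu(B, 1).T

rng = np.random.default_rng(3)
X, Y = wigner_goe(5, rng), wigner_bernoulli(5, rng)
print(np.allclose(X, X.T), np.allclose(Y, Y.T))   # both symmetric
```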
1.2.2 Wishart matrix

Also very important (especially for applications): let $(Y_{ij})_{i,j\ge1}$ be iid. We define the random matrix
\[
Y = (Y_{ij})_{1\le i\le n,\ 1\le j\le p}.
\]
Wishart matrix:
\[
W = Y^* Y.
\]
Important case: $\mathrm{Re}(Y_{ij})$ and $\mathrm{Im}(Y_{ij})$ independent $N(0,1/2)$: the law of $W$ is called the Laguerre ensemble.
1.2.3 Random square matrix

$Y$ is as above with $n = p$. Important case: $\mathrm{Re}(Y_{ij})$ and $\mathrm{Im}(Y_{ij})$ independent $N(0,1/2)$: the law of $Y$ is then called the Ginibre ensemble.
1.2.4 Many other matrix models

Random unitary/orthogonal matrices, $Y_i = (Y_{ij}) \in \mathbb{R}^p$ sampled from a more general (non-product) law, random distance matrices, etc.
1.3 Some results in random matrix theory

1.3.1 Global regime

Convergence of the ESD
If $A \in M_n(\mathbb{C})$ has eigenvalues $\lambda_1(A), \cdots, \lambda_n(A)$, counting multiplicities, we consider the empirical spectral distribution (ESD)
\[
\mu_A = \frac{1}{n}\sum_{i=1}^{n} \delta_{\lambda_i(A)}.
\]
This is a global function of the spectrum. We consider the Wigner matrix Xn . The first
fundamental result is Wigner’s semicircular law. It gives the convergence of the empirical
spectral distribution.
To be more precise, we denote by $\leadsto$ the weak convergence of probability measures on a metric space $\mathcal{X}$: $\mu_n \leadsto \mu$ if for all bounded continuous functions $f : \mathcal{X} \to \mathbb{R}$,
\[
\int f\,d\mu_n \to \int f\,d\mu.
\]
If $Z_n$ has law $\mu_n$ and $Z$ has law $\mu$, we will then write $Z_n \xrightarrow{d} Z$ or $Z_n \xrightarrow{d} \mu$.
For example, if $(Z_i)_{i\ge1}$ are iid with common law $\mu$ then, a.s.
\[
\mu_n = \frac{1}{n}\sum_{i=1}^{n} \delta_{Z_i} \leadsto \mu.
\]
Theorem 1.1 (Semicircular law). Assume that $\mathrm{Var}(X_{12}) = 1$. Then, a.s.
\[
\mu_{X/\sqrt{n}} \leadsto \mu_{sc},
\]
where $\mu_{sc}(dx) = \frac{1}{2\pi}\sqrt{4 - x^2}\,1_{|x|\le2}\,dx$.
We will give a few different proofs of this theorem. Note that only the second moment counts: neither the first moment nor the higher ones play a role. This theorem is the analog of a law of large numbers.
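As a numerical check (numpy; size and seed are arbitrary choices), the even empirical moments of $\mu_{X/\sqrt{n}}$ for a GOE-type matrix approach the semicircle moments $m_2 = 1$, $m_4 = 2$:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
G = rng.standard_normal((n, n))
X = (G + G.T) / np.sqrt(2)             # real Wigner matrix with Var(X_12) = 1
lam = np.linalg.eigvalsh(X / np.sqrt(n))

print(np.mean(lam ** 2), np.mean(lam ** 4))   # approximately 1 and 2
```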
For Wishart matrices, we have the Marchenko-Pastur theorem [MP67].
Theorem 1.2 (Marchenko-Pastur law). Assume that $\mathrm{Var}(Y_{11}) = 1$ and $p/n \to c \in (0, 1]$. Then, a.s.
\[
\mu_{YY^*/n} \leadsto \mu_{MP},
\]
where
\[
\mu_{MP}(dx) = (1 - c)\,\delta_0 + \frac{1}{2\pi x}\sqrt{(x - b_-)(b_+ - x)}\,1_{b_-\le x\le b_+}\,dx
\]
with $b_- = (1 - \sqrt{c})^2$ and $b_+ = (1 + \sqrt{c})^2$ (the atom at $0$ accounts for the $n - p$ zero eigenvalues of $YY^*$ when $p < n$).
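Numerically (numpy; sizes and seed are arbitrary choices), the nonzero eigenvalues of $YY^*/n$, i.e. those of $Y^*Y/n$, indeed fill the interval $[b_-, b_+]$:

```python
import numpy as np

rng = np.random.default_rng(5)
p, n = 300, 600                         # c = p/n = 1/2
c = p / n
Y = rng.standard_normal((n, p))         # real entries with Var(Y_11) = 1
lam = np.linalg.eigvalsh(Y.T @ Y / n)   # the p nonzero eigenvalues of YY*/n

b_minus, b_plus = (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2
print(lam.min(), b_minus)   # close
print(lam.max(), b_plus)    # close
```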
The circular law deals with the spectrum of random non-Hermitian square matrices. It was first found by Mehta in 1967 [Meh67], then studied by Girko [Gir90], Bai [Bai97], ..., and Tao and Vu [TV10a].
Theorem 1.3 (Circular law). Assume that $\mathrm{Var}(Y_{11}) = 1$ and $p = n$. Then, a.s.
\[
\mu_{Y/\sqrt{n}} \leadsto \mu_c,
\]
where $\mu_c(dx\,dy) = \frac{1}{\pi}\,1_{|x+iy|\le1}\,dx\,dy$.
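A quick simulation (numpy; size and seed are arbitrary choices): the spectrum of $Y/\sqrt{n}$ fills the unit disk, and the empirical mean of $|\lambda|^2$ approaches $\int |z|^2\,\mu_c(dz) = 1/2$:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 500
Y = rng.standard_normal((n, n))
lam = np.linalg.eigvals(Y / np.sqrt(n))   # complex eigenvalues

print(np.max(np.abs(lam)))          # spectral radius, close to 1
print(np.mean(np.abs(lam) ** 2))    # close to 1/2
```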
Central limit theorem

If $(Z_i)$ is iid with law $\mu$, the central limit theorem asserts that for any function $f \in L^2(\mu)$,
\[
\sqrt{n}\Big(\int f\,d\mu_n - \int f\,d\mu\Big) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\big(f(Z_i) - \mathbb{E}f(Z_i)\big) \xrightarrow{d} N(0, \mathrm{Var}_\mu(f)).
\]
In other words, weakly, $\sqrt{n}(\mu_n - \mu) \xrightarrow{d} W$, where $W$ is the Gaussian process (field) indexed by functions $f \in L^2(\mu)$ with covariance $\mathbb{E}[\langle W, f\rangle\langle W, g\rangle] = \int fg\,d\mu - \int f\,d\mu \int g\,d\mu$ (this is called the Brownian bridge).
For Wigner matrices, there is also an analog of this CLT, but with a different rate! Let $f \in C^1_0(\mathbb{R})$. For a GUE matrix, we have
\[
n\Big(\int f\,d\mu_{X/\sqrt{n}} - \int f\,d\mu_{sc}\Big) = \sum_{i=1}^{n} f(\lambda_i(X/\sqrt{n})) - n\int f\,d\mu_{sc} \xrightarrow{d} N(0, \sigma^2(f)),
\]
where
\[
\sigma^2(f) = \frac{1}{4\pi^2}\int_{[-2,2]^2} \frac{(f(\lambda_1) - f(\lambda_2))^2}{(\lambda_1 - \lambda_2)^2}\,\frac{4 - \lambda_1\lambda_2}{\sqrt{4 - \lambda_1^2}\sqrt{4 - \lambda_2^2}}\,d\lambda_1\,d\lambda_2.
\]
This result is also known for other Wigner matrices: the variance $\sigma^2(f)$ then depends on the fourth cumulant of the law of $X_{12}$ (recall that $\kappa_4(\mu) = \mathbb{E}[(X - \mathbb{E}X)^4] - 3\big(\mathbb{E}[(X - \mathbb{E}X)^2]\big)^2$ where $X$ has law $\mu$).
The CLT also depends on the regularity of the function $f$. For example, if $f = 1_I$ then $\mathrm{Var}\big(n\int f\,d\mu_{X/\sqrt{n}}\big) \sim c\log n$ (this is known for matrices of the GUE and for Wigner matrices with $\kappa_4 = 0$ and high enough moments, see Gustavsson [Gus05], Dallaporta and Vu [DV11]).
1.3.2 Local regime

We now consider a single or a few eigenvalues. If $A \in H_n$, we order $\lambda_n(A) \le \cdots \le \lambda_1(A)$.
Edge eigenvalues

Füredi and Komlós [FK81], Yin and Bai [BY88].
Theorem 1.4 (Support convergence). Assume that $\mathbb{E}X_{11} = \mathbb{E}X_{12} = 0$, $\mathrm{Var}(X_{12}) = 1$ and $\mathbb{E}|X_{12}|^4, \mathbb{E}|X_{11}|^2 < \infty$. Then, a.s.
\[
\lim_n \lambda_1(X/\sqrt{n}) = -\lim_n \lambda_n(X/\sqrt{n}) = 2.
\]
The conditions $\mathbb{E}|X_{12}|^4 < \infty$ and $\mathbb{E}|X_{11}|^2 < \infty$ are necessary for the convergence to hold.
There are also fluctuation results. For the GUE, it is known that
\[
n^{2/3}\big(\lambda_1(X/\sqrt{n}) - 2\big) \xrightarrow{d} TW_2.
\]
The distribution $TW_2$ is not a usual probability measure from extreme value theory: it is the Tracy-Widom distribution, found in 1994, see [TW94]. Recent results due to Tao-Vu and Erdős-Schlein-Yau have extended this fluctuation result to other Wigner matrices, see the surveys [Erd11], [TV12].
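The convergence of the edge is easy to observe numerically (numpy; GOE-type matrix, size and seed are arbitrary choices); the deviation of $\lambda_1$ from $2$ is of order $n^{-2/3}$:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1000
G = rng.standard_normal((n, n))
X = (G + G.T) / np.sqrt(2)                 # real Wigner matrix, Var(X_12) = 1
lam = np.linalg.eigvalsh(X / np.sqrt(n))   # sorted increasingly

print(lam[-1], lam[0])   # close to 2 and -2
```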
Bulk eigenvalues

Inside the spectrum, define the $i$-th $n$-quantile of $\mu_{sc}$ as the element $\bar\lambda_i(n)$ of $[-2, 2]$ defined by the formula
\[
\int_{-2}^{\bar\lambda_i(n)} f_{sc}(y)\,dy = \frac{i}{n}.
\]
Wigner's semicircle theorem asserts that if $-2 < u < 2$ and $\bar\lambda_i(n) = u(1 + o(1))$ then, a.s.
\[
\lambda_i(X/\sqrt{n}) - \bar\lambda_i(n) \to 0.
\]
In the GUE case we have, if $-2 < u < 2$ and $\bar\lambda_i(n) = u(1 + o(1))$,
\[
\frac{n\,f_{sc}(u)}{\sqrt{\log n/(2\pi^2)}}\,\big(\lambda_i(X/\sqrt{n}) - \bar\lambda_i(n)\big) \xrightarrow{d} N(0, 1)
\]
(Costin and Lebowitz [CL95], Soshnikov [Sos02]). This is also known in some cases beyond the GUE. Random eigenvalues are extremely rigid compared to independent variables.
Limit point process

More generally, we could consider the local regime and look at the point process, for $u \in [-2, 2]$,
\[
\big\{\lambda_i(\sqrt{n}\,X) - un,\ 1 \le i \le n\big\},
\]
and the statistics of this process. For $-2 < u < 2$, there is a limit point process, which is described by a stationary determinantal point process with sine kernel.
1.3.3 Invertibility of random matrices and related results
Take $n = p$ and consider the random square matrix $Y$. One looks for non-asymptotic estimates on
\[
\|Y\|_{2\to2}, \qquad \|Y^{-1}\|_{2\to2}.
\]
For example, assume that $Y_{12}$ is a $\pm1$ Bernoulli variable with parameter $1/2$. There exists $0 < c < 1$ such that
\[
P(Y \text{ is singular}) \le c^n.
\]
Conjecture of Komlós, Kahn and Szemerédi:
\[
P(Y \text{ is singular}) = \Big(\frac{1}{2} + o(1)\Big)^n.
\]
Best result to date: Bourgain, Vu and Wood [BVW10] with $c = 1/\sqrt{2} + o(1)$.
Spielman and Teng conjectured that
\[
P\big(\|Y^{-1}\|_{2\to2} \ge \sqrt{n}/t\big) \le t + c^n,
\]
with a constant in front of $t$: Rudelson and Vershynin [RV08], Rudelson [Rud13]; with a limit in $n$ on both sides: Tao and Vu [TV10b].
For $n \ge p$ and $\mathbb{E}Y_{11} = 0$, $\mathbb{E}|Y_{11}|^2 = 1$, one looks for explicit control on
\[
\Big\|\frac{1}{n}\,Y^* Y - I_p\Big\|_{2\to2}.
\]
Some mathematical issues of compressed sensing fall into this category, see Vershynin [Ver12].
Chapter 2

First approach to Wigner's Theorem: the method of moments

We will give a first proof of Wigner's Theorem.
2.1 Method of moments
Let $Z$ be a real random variable with all its moments finite: for all integers $k \ge 1$, $\mathbb{E}[Z^k] = m_k < \infty$. Assume that there exists a unique probability measure $P$ on $\mathbb{R}$ such that for all integers $k \ge 1$, $\int x^k\,dP = m_k$. From Carleman's theorem, this is indeed the case if
\[
\sum_{k\ge1} m_{2k}^{-\frac{1}{2k}} = \infty.
\]
If the random variable has bounded support, the Carleman condition is satisfied. Note
that the Weierstrass theorem implies that a random variable with bounded support is uniquely
determined by its moments.
Then, a commonly used method to prove that a sequence of real random variables $(Z_n)_{n\ge1}$ converges weakly to a random variable $Z$ is to show that for all integers $k \ge 1$, $\lim_n \mathbb{E}[Z_n^k] = \mathbb{E}[Z^k] = m_k$.
Lemma 2.1. Assume that the law $P$ is uniquely determined by its moments. If for all $k \ge 1$, $\lim_n \int x^k\,dP_n(x) = \int x^k\,dP$, then $P_n \leadsto P$.
Proof. We have $\mathbb{E}Z_n^2 = m_2 + o(1)$. Hence $(Z_n)$ is tight and, from Prohorov's theorem, $\{\mathcal{L}(Z_n), n \ge 1\}$ is relatively compact. Let $W$ be an accumulation point; since the law of $Z$ is uniquely determined by its moments, it is sufficient to check that for any $k \ge 1$, $\mathbb{E}W^k = m_k$. This amounts to proving that $x \mapsto x^k$ is uniformly integrable for $(P_n)_{n\ge1}$.
Let us check this by hand. Since $\mathbb{E}Z_n^{2k}$ is uniformly bounded, we have $P(|Z_n| > t) \le Ct^{-2k}$ and, from the Portmanteau theorem, $P(|W| > t) \le Ct^{-2k}$. It follows that for any $\varepsilon > 0$ there is $T \ge 1$ such that $\mathbb{E}|W|^k 1_{|W|\ge T} < \varepsilon$ and $\mathbb{E}|Z_n|^k 1_{|Z_n|\ge T} < \varepsilon$. Consider $f$ continuous with $f(x) = x^k$ on $[-T, T]$ and $f(x) = \pm T^k$ on $\mathbb{R}\setminus[-T, T]$. Then
\[
\mathbb{E}W^k = \mathbb{E}f(W) + \mathbb{E}\big[W^k 1_{|W|\ge T}\big] - \mathbb{E}\big[f(W) 1_{|W|\ge T}\big] = \mathbb{E}Z_n^k + \big(\mathbb{E}f(W) - \mathbb{E}f(Z_n)\big) + R,
\]
with $|R| \le 4\varepsilon$. Letting $n \to \infty$ along the subsequence and then $\varepsilon \to 0$ gives $\mathbb{E}W^k = m_k$.
There are many drawbacks to this method. First, the random variable Zn needs to have
finite moments of any order for all n large enough. Secondly, the computation of moments can
be tedious. This method is however very robust.
2.2 Graphs, rooted plane trees and Catalan numbers
Let $G = (V, E)$ be a connected graph. A tree is a connected graph without cycles. Define the excess of $G$ as
\[
\mathrm{Exc}(G) = |E| - |V| + 1.
\]
Lemma 2.2. If $G$ is connected then $\mathrm{Exc}(G) \ge 0$, and $\mathrm{Exc}(G) = 0$ if and only if $G$ is a tree.
Proof. A proof by induction is possible. Alternatively, let $u \in V$ be a distinguished vertex and consider, for all $v \in V\setminus\{u\}$, a shortest path $(u_0(v), u_1(v), \cdots, u_{k_v}(v))$ from $v$ to $u$: $u_0(v) = v$, $u_{k_v}(v) = u$. Define the mapping $\sigma$ from $V\setminus\{u\}$ to $E$ by setting $\sigma(v) = \{v, u_1(v)\}$. Since the paths are the shortest possible, $\sigma$ is an injection, and it follows that $|V\setminus\{u\}| \le |E|$.
In the case of equality $|V\setminus\{u\}| = |E|$, $\sigma$ is a bijection. Note that $\sigma$ gives an orientation of the edges of $G$ and, by construction, any vertex in $V\setminus\{u\}$ has exactly one outgoing oriented edge and $u$ has no outgoing oriented edge. Observe also that any path along oriented edges is a shortest path to $u$. In particular, there cannot be an oriented cycle and $G$ must be a tree.
A rooted graph is a graph with a distinguished vertex. In a tree $T = (V, E)$ rooted at $u$, the offspring of $v \in V\setminus\{u\}$ are the vertices $w \in V\setminus\{u, v\}$ such that the shortest path from $w$ to $u$ is $(w, v, \cdots, u)$. A rooted plane tree is a rooted tree where the offspring of each vertex are ordered.
Definition 2.3 (Depth-first search). Let $T$ be a rooted plane tree. The depth-first search path is the unique connected closed path starting from the root which visits each edge twice and which preserves the orientation.
Note that the depth-first search gives a bijection $\varphi$ from the vertices $\{v_1, \cdots, v_\ell\}$ of a rooted plane tree onto $\{1, \cdots, \ell\}$: the root is sent to $1$ and the vertices are ordered by order of appearance in the depth-first search. We say that two rooted plane trees $T$ and $T'$ are isomorphic if $\varphi(T) = \varphi(T')$ (where $\varphi$ acts transitively on the edges). We may then define $c_k$ as the number of isomorphism classes of rooted plane trees with $k$ edges. An isomorphism class of rooted plane trees is called an unlabeled rooted plane tree.
We have the recursion $c_0 = 1$ (by convention) and, for $k \ge 1$,
\[
c_k = \sum_{\ell=0}^{k-1} c_\ell\,c_{k-\ell-1} \tag{2.2.1}
\]
(and, thanks to the bijection with Dyck paths, $c_k \le 2^{2k}$). Now define the generating function
\[
S(z) = \sum_{k=0}^{\infty} c_k z^k;
\]
its radius of convergence is $\ge 1/4$, and from (2.2.1) we find
\[
S(z) = 1 + zS(z)^2. \tag{2.2.2}
\]
Hence
\[
S(z) = \frac{1 \pm \sqrt{1 - 4z}}{2z}.
\]
Since $S(0) = 1$, we find
\[
S(z) = \frac{1 - \sqrt{1 - 4z}}{2z}.
\]
By Taylor expansion,
\[
\sqrt{1 - 4z} = 1 - 2z - \sum_{k=1}^{\infty} \frac{2^{-(k+1)}(2k-1)(2k-3)\cdots 1}{(k+1)!}\,(4z)^{k+1},
\]
yielding
\[
S(z) = 1 + 2\sum_{k=1}^{\infty} \frac{2^{-(k+1)}(2k-1)(2k-3)\cdots 1}{(k+1)!}\,(4z)^{k}.
\]
We find
\[
c_k = \frac{(2k)!}{(k+1)!\,k!} = \frac{1}{k+1}\binom{2k}{k}.
\]
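The recursion (2.2.1) and the closed form can be cross-checked mechanically (Python, standard library only; the function name is ours):

```python
from math import comb

def catalan_recursive(kmax):
    """c_0, ..., c_kmax via the recursion c_k = sum_l c_l * c_{k-l-1}."""
    c = [1]
    for k in range(1, kmax + 1):
        c.append(sum(c[l] * c[k - l - 1] for l in range(k)))
    return c

rec = catalan_recursive(12)
closed = [comb(2 * k, k) // (k + 1) for k in range(13)]
print(rec == closed)   # True: both give 1, 1, 2, 5, 14, 42, ...
```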
2.3 Moments of the semi-circular law

The moments of the semi-circular law are
\[
m_{2k+1} = 0 \qquad\text{and}\qquad m_{2k} = c_k = \frac{(2k)!}{(k+1)!\,k!} = \frac{1}{k+1}\binom{2k}{k}.
\]
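One can also verify numerically that the even moments of the density $\mu_{sc}$ are the Catalan numbers, by simple trapezoidal quadrature (numpy; the grid size is an arbitrary choice):

```python
import numpy as np
from math import comb

x = np.linspace(-2.0, 2.0, 200001)
f = np.sqrt(np.maximum(4.0 - x ** 2, 0.0)) / (2 * np.pi)   # semicircle density
dx = x[1] - x[0]

for k in range(5):
    g = x ** (2 * k) * f
    m2k = np.sum(g[1:] + g[:-1]) * dx / 2      # trapezoidal rule
    print(k, m2k, comb(2 * k, k) / (k + 1))    # numerically equal
```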
2.4 Computation of moments
We will follow the classical argument, as in [Gui09]. Recall that, for a random probability measure $\mu \in \mathcal{P}(\mathcal{X})$, we may define its expectation $\mathbb{E}\mu \in \mathcal{P}(\mathcal{X})$ by setting, for all Borel sets $B$,
\[
(\mathbb{E}\mu)(B) = \mathbb{E}[\mu(B)].
\]
We will prove the following statement.
Proposition 2.4. Assume $\mathbb{E}X_{11} = \mathbb{E}X_{12} = 0$, $\mathbb{E}X_{12}^2 = 1$, the $X_{ij}$ are real, and $X_{11}$, $X_{12}$ have finite moments of any order. Then
\[
\mathbb{E}\mu_{X/\sqrt{n}} \leadsto \mu_{sc}.
\]
We set
\[
A = \frac{X}{\sqrt{n}}.
\]
From the spectral theorem,
\[
\int x^k\,d\mu_A = \frac{1}{n}\sum_{i=1}^{n} \lambda_i(A)^k = \frac{1}{n}\,\mathrm{Tr}\,A^k.
\]
Hence, in view of Lemma 2.1, it is sufficient to prove:
Lemma 2.5. For each integer $k$,
\[
\frac{1}{n}\,\mathbb{E}\,\mathrm{Tr}\,A^k = m_k + O(1/\sqrt{n}).
\]
Proof. We have
\[
\frac{1}{n}\,\mathbb{E}\,\mathrm{Tr}\,A^k = \frac{1}{n^{k/2+1}}\,\mathbb{E}\sum_{(i_1,\cdots,i_k)} \prod_{\ell=1}^{k} X_{i_\ell i_{\ell+1}} = \frac{1}{n^{k/2+1}} \sum_{(i_1,\cdots,i_k)} P(i),
\]
where $i_{k+1} = i_1$ and
\[
P(i) = \mathbb{E}\prod_{\ell=1}^{k} X_{i_\ell i_{\ell+1}}.
\]
Define $G(i) = (V(i), E(i))$ as the graph (with loops) obtained by setting $V(i) = \{i_1, \cdots, i_k\}$ and $E(i) = \{\{i_1, i_2\}, \cdots, \{i_k, i_1\}\}$ (by convention, $\{x_j, 1 \le j \le J\}$ is a set: if $x_1 = x_2$, then $\{x_1, x_2\} = \{x_1\}$). For $e \in E(i)$ we define the multiplicity of the edge $e$ as
\[
m_e(i) = \sum_{\ell=1}^{k} 1\big(\{i_\ell, i_{\ell+1}\} = e\big).
\]
We fix $i$ and note that $G = G(i)$ is connected. Note also that, since $\mathbb{E}X_{12} = 0$, we may restrict ourselves to graphs such that for all $e \in E$, $m_e \ge 2$ (otherwise $P(i) = 0$). Therefore, the identity
\[
k = \sum_{e\in E} m_e
\]
yields
\[
|E| \le \Big[\frac{k}{2}\Big],
\]
where $[x]$ is the integer part of $x$. By Lemma 2.2, we find
\[
|V| \le |E| + 1 \le \Big[\frac{k}{2}\Big] + 1.
\]
Also, by assumption and the Hölder inequality,
\[
P(i) = \prod_{e\in E} \mathbb{E}X_e^{m_e} \qquad\text{and}\qquad |P(i)| \le \prod_{e\in E} \big(\mathbb{E}|X_e|^{2k}\big)^{\frac{m_e}{2k}} \le \beta_k,
\]
for a constant $\beta_k$ depending only on $k$ and on the laws of the entries.
If $\alpha_k$ is the number of such graphs with vertex set $\{1, \cdots, k\}$, we get that
\[
\Big|\frac{1}{n}\,\mathbb{E}\,\mathrm{Tr}\,A^k\Big| \le \alpha_k \beta_k\,\frac{n^{[\frac{k}{2}]+1}}{n^{\frac{k}{2}+1}}.
\]
In particular, if $k$ is odd the above expression is $O(1/\sqrt{n})$. If $k = 2\ell$ is even, then
\[
\frac{1}{n}\,\mathbb{E}\,\mathrm{Tr}\,A^{2\ell} = \frac{1}{n^{\ell+1}} \sum_{i\in I(n,\ell)} 1 + O\Big(\frac{1}{n}\Big),
\]
where $I(n, \ell)$ is the set of $i = (i_1, \cdots, i_{2\ell})$ such that $G(i)$ is a tree with $\ell$ edges and each edge has multiplicity exactly $2$. From Definition 2.3, the path $i = (i_1, \cdots, i_{2\ell})$ is then a depth-first search of a rooted plane tree.
For $i \in I(n, \ell)$, we may map bijectively $\{1, \cdots, \ell+1\}$ onto $\{i_1, \cdots, i_{2\ell}\}$ by setting $\sigma(1) = i_1$, $\sigma(2) = i_2$ and $\sigma(u)$ equal to the $u$-th distinct element of $(i_1, \cdots, i_{2\ell})$. Then $T(i) = \sigma^{-1}(G(i))$ is a rooted plane tree. Let $\mathcal{T}(\ell)$ be the set of unlabelled rooted plane trees on $\ell + 1$ vertices. We consider the map $\varphi : I(n, \ell) \to \mathcal{T}(\ell)$ which maps $i$ to $T(i)$. This map is surjective and, for each element $T \in \mathcal{T}(\ell)$, $|\varphi^{-1}(T)| = n(n-1)\cdots(n-\ell) = n^{\ell+1}(1 + O(\frac{1}{n}))$. We finally find
\[
\frac{1}{n}\,\mathbb{E}\,\mathrm{Tr}\,A^{2\ell} = \frac{1}{n^{\ell+1}} \sum_{T\in\mathcal{T}(\ell)} n^{\ell+1}\Big(1 + O\Big(\frac{1}{n}\Big)\Big) + O\Big(\frac{1}{n}\Big) = |\mathcal{T}(\ell)| + O\Big(\frac{1}{n}\Big).
\]
The statement follows since $|\mathcal{T}(\ell)| = c_\ell = m_{2\ell}$.
2.4.1 Computation of joint moments
How can we upgrade the above convergence of $\mathbb{E}\mu_A$ to a.s. convergence of $\mu_A$?
Proposition 2.6. Assume $\mathbb{E}X_{11} = \mathbb{E}X_{12} = 0$, the $X_{ij}$ real, $\mathbb{E}X_{12}^2 = 1$ and $X_{12}$ has finite moments of any order. Then, a.s.
\[
\mu_A \leadsto \mu_{sc}.
\]
It suffices to prove the following lemma.
Lemma 2.7. For each integer $k$,
\[
\mathrm{Var}\Big(\frac{1}{n}\,\mathrm{Tr}\,A^k\Big) = O(n^{-2}).
\]
Indeed, given Lemma 2.7, it is the classical argument of the proof of the strong LLN with four moments: from the monotone convergence theorem,
\[
\mathbb{E}\sum_{n\ge0} \Big(\int x^k\,d\mu_A - \mathbb{E}\int x^k\,d\mu_A\Big)^2 < \infty.
\]
Hence a.s. $\sum_{n\ge0} \big(\int x^k\,d\mu_A - \mathbb{E}\int x^k\,d\mu_A\big)^2 < \infty$ and, a.s., $\int x^k\,d\mu_A - \mathbb{E}\int x^k\,d\mu_A \to 0$. The proposition then follows from Lemma 2.1.
Proof of Lemma 2.7. We start with
\[
\mathrm{Var}\Big(\frac{1}{n}\,\mathrm{Tr}\,A^k\Big) = \mathbb{E}\Big(\frac{1}{n^{k/2+1}}\sum_{(i_1,\cdots,i_k)} \Big(\prod_{\ell=1}^{k} X_{i_\ell i_{\ell+1}} - P(i)\Big)\Big)^2 = \frac{1}{n^{k+2}} \sum_{(i_1,\cdots,i_k),(j_1,\cdots,j_k)} \big(P(i,j) - P(i)P(j)\big),
\]
where
\[
P(i,j) = \mathbb{E}\prod_{\ell=1}^{k} X_{i_\ell i_{\ell+1}} X_{j_\ell j_{\ell+1}}.
\]
We define $G(i,j) = (V(i,j), E(i,j), m(i,j))$ as the corresponding weighted graph. Note that $P(i,j) - P(i)P(j) = 0$ if $i$ and $j$ do not have two successive indices in common. Hence we may restrict to $G = G(i,j)$ connected. We have $\sum_{e\in E} m_e = 2k$ and $m_e \ge 2$. Hence $|E| \le k$ and $|V| \le |E| + 1 \le k + 1$, and we get
\[
\mathrm{Var}\Big(\frac{1}{n}\,\mathrm{Tr}\,A^k\Big) = O\Big(\frac{n^{k+1}}{n^{k+2}}\Big) = O\Big(\frac{1}{n}\Big).
\]
This is not yet enough: we must rule out the indices such that
\[
|V| = |E| + 1 = k + 1 \quad\text{and}\quad m_e = 2 \text{ for all } e,
\]
i.e. such that $G$ is a tree. Up to changing the root, we can assume without loss of generality that $(i_1, i_2) = (j_1, j_2)$. Consider the path $\pi = (i_1, \cdots, i_k, i_1)$. Since $m_e = 2$, we have $i_k \ne i_2$. Hence $\pi$ is a closed path in $G$ which contains a cycle. This contradicts the assumption that $G$ is a tree. Therefore, since this case does not occur, we have $|E| \le k - 1$ and $|V| \le k$. It follows that
\[
\mathrm{Var}\Big(\frac{1}{n}\,\mathrm{Tr}\,A^k\Big) = O\Big(\frac{n^{k}}{n^{k+2}}\Big) = O\Big(\frac{1}{n^{2}}\Big).
\]
This concludes the proof.
Remark 2.8. From here, we could also prove a CLT for the moments. See [AGZ10, theorem
2.1.31].
Remark 2.9. Considering oriented graphs instead of graphs, the same proof can be adapted for
the Hermitian case.
Chapter 3

Perturbation of matrices and application to concentration inequalities

We will need more tools to develop the theory.
3.1 Fundamental matrix inequalities

3.1.1 Variational formula for the eigenvalues
We order the eigenvalues of $A \in H_n$ non-increasingly:
\[
\lambda_n(A) \le \cdots \le \lambda_1(A).
\]
Lemma 3.1 (Courant-Fischer min-max theorem). Let $A \in H_n(\mathbb{C})$. Then
\[
\lambda_k(A) = \max_{H : \dim(H) = k}\ \min_{x\in H,\ \|x\|_2 = 1} \langle Ax, x\rangle.
\]
Proof. Let $(u_i)$ be an orthonormal basis of eigenvectors of $A$ associated to $\lambda_1, \cdots, \lambda_n$. Choosing $H = \mathrm{span}(u_1, \cdots, u_k)$, we find
\[
\max_{H : \dim(H) = k}\ \min_{x\in H,\ \|x\|_2 = 1} \langle Ax, x\rangle \ge \lambda_k.
\]
On the other hand, let $H$ be a vector space such that $\dim(H) = k$. Define $S = \mathrm{span}(u_n, \cdots, u_k)$, so that $\dim(S) = n - k + 1$. Since
\[
n \ge \dim(H + S) = \dim(H) + \dim(S) - \dim(S\cap H),
\]
we find $S \cap H \ne \{0\}$. In particular,
\[
\min_{x\in H,\ \|x\|_2 = 1} \langle Ax, x\rangle \le \lambda_k.
\]
3.1.2
Interlacing of eigenvalues
An important corollary of the Courant-Fischer min-max theorem is the interlacing of eigenvalues. By convention if A ∈ Hn (C), we set for integer i ≥ 1,
λn+i (A) = −∞
and
15
λ1−i (A) = +∞
(3.1.1)
Lemma 3.2 (Weak interlacing). Let $A, B \in H_n(\mathbb{C})$ and assume that $\mathrm{rank}(A - B) = r$. Then, for any $1 \le k \le n$,
\[
\lambda_{k+r}(A) \le \lambda_k(B) \le \lambda_{k-r}(A).
\]
Proof. We prove $\lambda_{k+r}(A) \le \lambda_k(B)$; we may assume that $k + r \le n$. By Lemma 3.1, for some vector space $H$ of dimension $k + r$,
\[
\lambda_{k+r}(A) = \min_{x\in H,\ \|x\|_2 = 1} \langle Ax, x\rangle.
\]
Take $H' = H \cap \ker(E)$, where $E = A - B$. By construction,
\[
\lambda_{k+r}(A) \le \min_{x\in H',\ \|x\|_2 = 1} \langle Ax, x\rangle = \min_{x\in H',\ \|x\|_2 = 1} \langle Bx, x\rangle \le \lambda_{k'}(B),
\]
where $k' = \dim(H')$. Now, the inequality
\[
n - \dim(H') \le (n - \dim(H)) + \dim(\mathrm{im}(E))
\]
yields $k' \ge k$. This concludes the proof of the inequality $\lambda_{k+r}(A) \le \lambda_k(B)$. For the proof of $\lambda_k(B) \le \lambda_{k-r}(A)$, we may assume that $k - r \ge 1$; then simply exchange the roles of $A$ and $B$ in the above argument.
There are variants of the above interlacing inequality. The following can be proved as the above lemma.
Lemma 3.3 (Strong interlacing). Let $A, B \in H_n$ and assume that $A = B + \rho xx^*$ with $\|x\|_2 = 1$ and $\rho \ge 0$. Then
\[
\lambda_n(B) \le \lambda_n(A) \le \lambda_{n-1}(B) \le \cdots \le \lambda_2(A) \le \lambda_1(B) \le \lambda_1(A).
\]
Lemma 3.4 (Cauchy interlacing for principal minors). Let $A \in H_n$ and let $B \in H_{n-1}$ be a principal minor of $A$. Then
\[
\lambda_n(A) \le \lambda_{n-1}(B) \le \cdots \le \lambda_2(A) \le \lambda_1(B) \le \lambda_1(A).
\]
We now give a perturbation inequality which is a consequence of interlacing. For $\mu, \mu'$ two real probability measures, we introduce the Kolmogorov-Smirnov distance
\[
d_{KS}(\mu, \mu') = \sup_{t\in\mathbb{R}} |\mu(-\infty, t) - \mu'(-\infty, t)| = \|F_\mu - F_{\mu'}\|_\infty.
\]
The Kolmogorov-Smirnov distance is closely related to functions with bounded variation. More precisely, for $f : \mathbb{R} \to \mathbb{R}$, the bounded variation norm is defined as
\[
\|f\|_{BV} = \sup \sum_{k\in\mathbb{Z}} |f(x_{k+1}) - f(x_k)|,
\]
where the supremum is over all sequences $(x_k)_{k\in\mathbb{Z}}$ with $x_k \le x_{k+1}$. If $f = 1_{(-\infty, t)}$ then $\|f\|_{BV} = 1$, while if the derivative of $f$ is in $L^1(\mathbb{R})$, we have
\[
\|f\|_{BV} = \int |f'|.
\]
We have a variational formula for the Kolmogorov-Smirnov distance:
\[
d_{KS}(\mu, \mu') = \sup\Big\{\int f\,d\mu - \int f\,d\mu' : \|f\|_{BV} \le 1\Big\}.
\]
Choosing $f = 1_{(-\infty, t)}$ gives the inequality $\sup \ge d_{KS}$. The other way around, if $f \in C^1_0(\mathbb{R})$ and $\tau$ is a continuity point of $F_\mu$, we have the integration by parts formula
\[
\int_{-\infty}^{\tau} f(t)\,d\mu(t) = f(\tau)F_\mu(\tau) - \int_{-\infty}^{\tau} f'(t)F_\mu(t)\,dt.
\]
This yields, letting $\tau$ tend to infinity,
\[
\int f(t)\,d\mu(t) - \int f(t)\,d\mu'(t) = -\int f'(t)\big(F_\mu(t) - F_{\mu'}(t)\big)\,dt.
\]
In particular,
\[
\Big|\int f\,d\mu - \int f\,d\mu'\Big| \le \int |f'(t)|\,\big|F_\mu(t) - F_{\mu'}(t)\big|\,dt \le \|f\|_{BV}\,\|F_\mu - F_{\mu'}\|_\infty. \tag{3.1.2}
\]
Using the density of C01 for k · kBV norm, this inequality can also be extended to any f with
kf kBV < ∞. We will apply it when µ and µ0 are empirical probability measures where it is
easy to check from the definition. We have the following consequence of interlacing.
Lemma 3.5 (Rank inequality for the ESD). Let $A, B \in H_n(\mathbb{C})$ and assume that $\mathrm{rank}(A - B) = r$. Then
\[
d_{KS}(\mu_A, \mu_B) \le \frac{r}{n},
\]
and for any $f$ with $\|f\|_{BV} < \infty$,
\[
\Big|\int f(t)\,d\mu_A(t) - \int f(t)\,d\mu_B(t)\Big| \le \frac{r}{n}\,\|f\|_{BV}.
\]
Proof. Fix $t \in \mathbb{R}$. Let $k$ and $k'$ be the smallest indices such that $\lambda_k(A) \le t$ and $\lambda_{k'}(B) < t$ (recall our convention (3.1.1)). By Lemma 3.2, we find $|k - k'| \le r$. This yields
\[
\big|F_{\mu_A}(t) - F_{\mu_B}(t)\big| = \frac{\big|(n + 1 - k) - (n + 1 - k')\big|}{n} \le \frac{r}{n}.
\]
This gives the first statement. The second statement follows from (3.1.2).
3.1.3 Hoffman-Wielandt inequality

We now present another matrix inequality which is particularly useful.
Lemma 3.6 (Hoffman-Wielandt inequality). Let $A, B \in H_n(\mathbb{C})$. Then
\[
\sum_{i=1}^{n} |\lambda_i(A) - \lambda_i(B)|^2 \le \mathrm{Tr}(A - B)^2 = \|A - B\|_F^2.
\]
Proof. See [AGZ10].
For $p \ge 1$ and $\mu, \mu'$ two real probability measures such that $\int |x|^p\,d\mu$ and $\int |x|^p\,d\mu'$ are finite, we define the $L^p$-Wasserstein distance as
\[
W_p(\mu, \mu') = \inf_{\pi} \Big(\int_{\mathbb{R}\times\mathbb{R}} |x - y|^p\,d\pi\Big)^{\frac{1}{p}},
\]
where the infimum is over all couplings $\pi$ of $\mu$ and $\mu'$ (i.e. $\pi$ is a probability measure on $\mathbb{R}\times\mathbb{R}$ whose first marginal is equal to $\mu$ and second marginal is equal to $\mu'$). Note that the Hölder inequality gives, for $1 \le p \le p'$,
\[
W_p \le W_{p'}.
\]
For any $p \ge 1$, if $W_p(\mu_n, \mu)$ converges to $0$ then $\mu_n \leadsto \mu$. This follows for example from the Kantorovich-Rubinstein duality
\[
W_1(\mu, \mu') = \sup\Big\{\int f\,d\mu - \int f\,d\mu' : \|f\|_L \le 1\Big\},
\]
where $\|f\|_L$ denotes the Lipschitz constant of $f$.
Corollary 3.7 (Hoffman-Wielandt inequality for the ESD). Let $A, B \in H_n(\mathbb{C})$. Then
\[
W_2(\mu_A, \mu_B) \le \sqrt{\frac{1}{n}\,\mathrm{Tr}(A - B)^2}.
\]
Proof. Consider the coupling $\pi$ of $(\mu_A, \mu_B)$ defined as
\[
\pi = \frac{1}{n}\sum_{i=1}^{n} \delta_{(\lambda_i(A), \lambda_i(B))}.
\]
By Lemma 3.6, we find
\[
\int_{\mathbb{R}\times\mathbb{R}} |x - y|^2\,d\pi \le \frac{1}{n}\,\mathrm{Tr}(A - B)^2.
\]
The left-hand side is an upper bound for $W_2^2(\mu_A, \mu_B)$ by construction (in fact they are even equal).
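The Hoffman-Wielandt inequality is easy to test numerically (numpy; sizes, perturbation scale and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 50
G1, G2 = rng.standard_normal((n, n)), rng.standard_normal((n, n))
A = (G1 + G1.T) / 2
B = A + 0.1 * (G2 + G2.T) / 2

lamA = np.linalg.eigvalsh(A)    # sorted eigenvalues
lamB = np.linalg.eigvalsh(B)

lhs = np.sum((lamA - lamB) ** 2)
rhs = np.sum((A - B) ** 2)      # Tr(A-B)^2 = ||A - B||_F^2 for symmetric A - B
print(lhs, rhs)                 # lhs <= rhs
```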
In the next corollary, we identify $\mathbb{C}$ and $\mathbb{R}^2$.
Corollary 3.8 (Continuity of the spectrum). Let $f : \mathbb{R}^n \to \mathbb{R}$ be a Lipschitz function. Then the map $F : \mathbb{R}^{n^2} \to \mathbb{R}$,
\[
F : (X_{ij})_{1\le i\le j\le n} \mapsto f(\lambda_1(X), \cdots, \lambda_n(X)),
\]
is Lipschitz with constant $\sqrt{2}\,\|f\|_L$. In particular, if $g : \mathbb{R} \to \mathbb{R}$ is Lipschitz,
\[
G : (X_{ij})_{1\le i\le j\le n} \mapsto \frac{1}{n}\,\mathrm{Tr}\,g(X/\sqrt{n}) = \int g\,d\mu_{X/\sqrt{n}}
\]
is Lipschitz with constant $\sqrt{2}\,\|g\|_L/n$.
Proof. From the Hoffman-Wielandt inequality, we get
\[
|F(X) - F(Y)|^2 \le \|f\|_L^2 \sum_{k=1}^{n} |\lambda_k(X) - \lambda_k(Y)|^2 \le \|f\|_L^2\,\mathrm{Tr}(X - Y)^2.
\]
Now, by construction,
\[
\mathrm{Tr}(X - Y)^2 = \sum_{i=1}^{n} |X_{ii} - Y_{ii}|^2 + 2\sum_{i>j} |X_{ij} - Y_{ij}|^2 \le 2\|X - Y\|_2^2.
\]
The second statement follows from the first, since the map
\[
(x_1, \cdots, x_n) \mapsto \sum_{i=1}^{n} x_i
\]
is Lipschitz with constant $\sqrt{n}$ (thanks to the Cauchy-Schwarz inequality).
3.2 General Wigner's semi-circular law
Proposition 3.9. Let $X$ be a real Wigner matrix such that $X_{12}$ has law $P$ and $X_{11}$ has law $Q$. If $\mathrm{Var}(X_{12}) = 1$, then, a.s.
\[
\mu_{X/\sqrt{n}} \leadsto \mu_{sc}.
\]
For the complex case, see Bai and Silverstein, "Spectral Analysis of Large Dimensional Random Matrices", §2.1.3.
Proof. We define the distance
\[
d(\mu, \mu') = \sup\Big\{\int f\,d\mu - \int f\,d\mu' : \|f\|_L \le 1,\ \|f\|_{BV} \le 1\Big\},
\]
so that we can apply both Corollary 3.7 and Lemma 3.5 to this distance:
\[
d(\mu_A, \mu_B) \le \min\Big(\frac{\mathrm{rank}(A - B)}{n},\ \sqrt{\frac{1}{n}\,\mathrm{Tr}(A - B)^2}\Big).
\]
The distance $d$ is a metric for weak convergence. Hence, in order to prove the proposition, it is sufficient to prove that, for any $\varepsilon > 0$, a.s.
\[
\limsup_n d(\mu_{X/\sqrt{n}}, \mu_n) \le 3\varepsilon,
\]
for some sequence of probability measures $(\mu_n)$ such that a.s. $\mu_n \leadsto \mu_{sc}$. We first truncate the off-diagonal terms: let $\varepsilon > 0$ and, for some $\tau > 0$, set
\[
X^{(1)}_{ij} = \begin{cases} X_{ij}\,1(|X_{ij}| \le \tau) & \text{if } i \ne j, \\ X_{ii} & \text{otherwise.} \end{cases}
\]
Then, from Corollary 3.7,
\[
d(\mu_{X/\sqrt{n}}, \mu_{X^{(1)}/\sqrt{n}}) \le \sqrt{\frac{1}{n^2}\sum_{i\ne j} |X_{ij}|^2\,1(|X_{ij}| > \tau)}.
\]
By taking $\tau$ large enough, we have
\[
\mathbb{E}|X_{12}|^2\,1(|X_{12}| > \tau) = \int_{|x| > \tau} x^2\,dP < \varepsilon^2.
\]
From the strong law of large numbers, we obtain, a.s.
\[
\limsup_n d(\mu_{X/\sqrt{n}}, \mu_{X^{(1)}/\sqrt{n}}) \le \varepsilon.
\]
We may also have chosen $\tau$ large enough so that
\[
\sigma^2 = \mathrm{Var}\big(X_{12}\,1(|X_{12}| \le \tau)\big)
\]
satisfies
\[
\Big|\frac{1}{\sigma} - 1\Big| \le \varepsilon.
\]
Then we center the off-diagonal terms and consider the matrix, with $m = \mathbb{E}X_{12}\,1(|X_{12}| \le \tau)$ and $J$ the all-ones matrix,
\[
X^{(2)} = X^{(1)} - mJ.
\]
Since $J$ has rank $1$, from Lemma 3.5 we find
\[
d(\mu_{X^{(2)}/\sqrt{n}}, \mu_{X^{(1)}/\sqrt{n}}) \le \frac{1}{n}.
\]
We now treat the diagonal terms: set
\[
X^{(3)}_{ij} = \begin{cases} X^{(2)}_{ij} & \text{if } i \ne j, \\ X_{ii}\,1(|X_{ii}| \le \tau) & \text{otherwise.} \end{cases}
\]
Then, from Lemma 3.5, we find
\[
d(\mu_{X^{(2)}/\sqrt{n}}, \mu_{X^{(3)}/\sqrt{n}}) \le \frac{1}{n}\sum_{i=1}^{n} 1(|X_{ii}| > \tau).
\]
By taking $\tau$ large enough, we have
\[
P(|X_{11}| > \tau) = \int_{|x| > \tau} dQ < \varepsilon.
\]
From the strong law of large numbers, we obtain, a.s.
\[
\limsup_n d(\mu_{X^{(2)}/\sqrt{n}}, \mu_{X^{(3)}/\sqrt{n}}) \le \varepsilon.
\]
Finally, introduce the matrix
\[
X^{(4)}_{ij} = \begin{cases} X^{(3)}_{ij} & \text{if } i \ne j, \\ 0 & \text{otherwise.} \end{cases}
\]
Then, from Corollary 3.7, we find
\[
d(\mu_{X^{(4)}/\sqrt{n}}, \mu_{X^{(3)}/\sqrt{n}}) \le \sqrt{\frac{1}{n^2}\sum_{i=1}^{n} |X_{ii}|^2\,1(|X_{ii}| \le \tau)} \le \frac{\tau}{\sqrt{n}},
\]
which goes to $0$. We notice that the matrix
\[
Y = X^{(4)}/\sigma
\]
is a Wigner matrix which satisfies the hypotheses of Proposition 2.6. In particular, a.s.
\[
\mu_{Y/\sqrt{n}} \leadsto \mu_{sc},
\]
while we certainly have, a.s.
\[
\limsup_n d(\mu_{X^{(4)}/\sqrt{n}}, \mu_{Y/\sqrt{n}}) \le \Big|1 - \frac{1}{\sigma}\Big| \le \varepsilon.
\]
This concludes the proof.
3.3 Concentration of measure for random matrices

3.3.1 With the bounded martingale difference inequality
Theorem 3.10 (Concentration of the ESD with independent half-rows). Let $M \in H_n(\mathbb{C})$ be a Hermitian random matrix and, for $1 \le k \le n$, define the variables $M_k = (M_{kj})_{1\le j\le k} \in \mathbb{C}^k$. If the variables $(M_k)_{1\le k\le n}$ are independent, then for any $f : \mathbb{R} \to \mathbb{R}$ such that $\|f\|_{BV} \le 1$ and every $t \ge 0$,
\[
P\Big(\Big|\int f\,d\mu_M - \mathbb{E}\int f\,d\mu_M\Big| \ge t\Big) \le 2\exp\Big(-\frac{nt^2}{8}\Big).
\]
The proof is based on the Azuma-Hoeffding inequality. Let $\mathcal{X}_1, \cdots, \mathcal{X}_n$ be metric spaces, let $F$ be a measurable function on $\mathcal{X} = \mathcal{X}_1 \times \cdots \times \mathcal{X}_n$ and let $P$ be a product measure on $\mathcal{X}$. There is a very powerful tool to bound the deviation of $F$ from its mean when $F$ is Lipschitz for a weighted Hamming pseudo-distance, i.e. for every $x$ and $y$ in $\mathcal{X}$,
\[
|F(x) - F(y)| \le \sum_{k=1}^{n} c_k\,1_{x_k\ne y_k} \tag{3.3.1}
\]
for some $c = (c_1, \cdots, c_n) \in \mathbb{R}_+^n$. We denote by $\|y\|_2 = \sqrt{\sum_i y_i^2}$ the usual Euclidean norm.
Theorem 3.11 (Azuma-Hoeffding inequality). Let $F$ be as above. Then
\[
P\Big(F - \int F\,dP \ge t\Big) \le \exp\Big(\frac{-t^2}{2\|c\|_2^2}\Big).
\]
This type of result is called a concentration inequality. It has found numerous applications in
mathematics over the last decades. For more on concentration inequalities, we refer to [Led01].
As a corollary, we deduce the Hoeffding’s inequality.
Corollary 3.12 (Hoeffding's inequality). Let $(X_k)_{1\le k\le n}$ be an independent sequence of real random variables such that for all $k$, $X_k \in [a_k, b_k]$. Then
\[
P\Big(\sum_{k=1}^{n} (X_k - \mathbb{E}X_k) \ge t\Big) \le \exp\Big(\frac{-t^2}{2\sum_{k=1}^{n}(b_k - a_k)^2}\Big). \tag{3.3.2}
\]
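A short Monte Carlo experiment (numpy; the parameters $n$, $t$ and the number of trials are arbitrary choices) confirms that the empirical tail probability stays below the bound (3.3.2):

```python
import numpy as np

rng = np.random.default_rng(9)
n, t, trials = 100, 10.0, 20000

# X_k uniform on [0, 1], so a_k = 0, b_k = 1 and sum (b_k - a_k)^2 = n
S = rng.random((trials, n)).sum(axis=1) - n * 0.5   # sum of X_k - E[X_k]
empirical = np.mean(S >= t)
bound = np.exp(-t ** 2 / (2 * n))

print(empirical, bound)   # empirical tail is (much) smaller than the bound
```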
The proof of Theorem 3.11 will be based on a lemma due to Hoeffding.
Lemma 3.13. Let $X$ be a real random variable in $[a, b]$ such that $\mathbb{E}X = 0$. Then, for all $\lambda \ge 0$,
\[
\mathbb{E}e^{\lambda X} \le e^{\frac{\lambda^2(b-a)^2}{8}}.
\]
Proof. By the convexity of the exponential,
\[
e^{\lambda X} \le \frac{b - X}{b - a}\,e^{\lambda a} + \frac{X - a}{b - a}\,e^{\lambda b}.
\]
Taking expectation, we obtain, with $p = -a/(b - a)$,
\[
\mathbb{E}e^{\lambda X} \le \frac{b}{b - a}\,e^{\lambda a} - \frac{a}{b - a}\,e^{\lambda b} = \big(1 - p + p e^{\lambda(b-a)}\big)e^{-p\lambda(b-a)} = e^{\varphi(\lambda(b-a))},
\]
where $\varphi(x) = -px + \ln(1 - p + pe^x)$. The derivatives of $\varphi$ are
\[
\varphi'(x) = -p + \frac{p}{(1 - p)e^{-x} + p} \qquad\text{and}\qquad \varphi''(x) = \frac{p(1 - p)e^{-x}}{\big((1 - p)e^{-x} + p\big)^2} \le \frac{1}{4}.
\]
Since $\varphi(0) = \varphi'(0) = 0$, we deduce from the Taylor expansion that
\[
\varphi(x) \le \varphi(0) + x\varphi'(0) + \frac{x^2}{2}\,\|\varphi''\|_\infty \le \frac{x^2}{8}.
\]
Proof of Theorem 3.11. Let $(X_1, \cdots, X_n)$ be a random variable on $\mathcal{X}$ with distribution $P$. We shall prove that
\[
P\big(F(X_1, \cdots, X_n) - \mathbb{E}F(X_1, \cdots, X_n) \ge t\big) \le \exp\Big(\frac{-t^2}{2\|c\|_2^2}\Big).
\]
For $1 \le k \le n$, let $\mathcal{F}_k = \sigma(X_1, \cdots, X_k)$, $Z_0 = \mathbb{E}F(X_1, \cdots, X_n)$ and $Z_k = \mathbb{E}[F(X_1, \cdots, X_n)\,|\,\mathcal{F}_k]$, so that $Z_n = F(X_1, \cdots, X_n)$. We also define $Y_k = Z_k - Z_{k-1}$, so that $\mathbb{E}[Y_k\,|\,\mathcal{F}_{k-1}] = 0$. Finally, let $(X'_1, \cdots, X'_n)$ be an independent copy of $(X_1, \cdots, X_n)$. If $\mathbb{E}'$ denotes the expectation over $(X'_1, \cdots, X'_n)$, we have
\[
Z_k = \mathbb{E}'F(X_1, \cdots, X_k, X'_{k+1}, \cdots, X'_n).
\]
It follows by (3.3.1) that
\[
Y_k = \mathbb{E}'F(X_1, \cdots, X_k, X'_{k+1}, \cdots, X'_n) - \mathbb{E}'F(X_1, \cdots, X_{k-1}, X'_k, \cdots, X'_n) \in [-c_k, c_k].
\]
Since $\mathbb{E}[Y_k\,|\,\mathcal{F}_{k-1}] = 0$, we may apply Lemma 3.13: for every $\lambda \ge 0$,
\[
\mathbb{E}\big[e^{\lambda Y_k}\,\big|\,\mathcal{F}_{k-1}\big] \le e^{\frac{\lambda^2 c_k^2}{2}}.
\]
This estimate does not depend on $\mathcal{F}_{k-1}$; it follows that
\[
\mathbb{E}e^{\lambda(Z_n - Z_0)} = \mathbb{E}\Big[e^{\lambda\sum_{k=1}^{n} Y_k}\Big] \le e^{\frac{\lambda^2\|c\|_2^2}{2}}.
\]
From the Chernoff bound, for every $\lambda \ge 0$,
\[
P\big(F(X_1, \cdots, X_n) - \mathbb{E}F(X_1, \cdots, X_n) \ge t\big) \le \exp\Big(-\lambda t + \frac{\lambda^2\|c\|_2^2}{2}\Big).
\]
Optimizing over the choice of $\lambda$, we choose $\lambda = t/\|c\|_2^2$.
Proof of Theorem 3.10. For any $x = (x_1, \ldots, x_n) \in \mathcal{X} := \{(x_i)_{1\le i\le n} : x_i \in \mathbb{C}^{i-1}\times\mathbb{R}\}$, let $H(x)$ be the $n\times n$ Hermitian matrix given by $H(x)_{ij} = x_{ij}$ for $1 \le j \le i \le n$. We thus have $M = H(M_1, \ldots, M_n)$. For all $x \in \mathcal{X}$ and $x'_i \in \mathbb{C}^{i-1}\times\mathbb{R}$, the matrix
\[
H(x_1, \ldots, x_{i-1}, x_i, x_{i+1}, \ldots, x_n) - H(x_1, \ldots, x_{i-1}, x'_i, x_{i+1}, \ldots, x_n)
\]
has only the $i$-th row and column possibly different from $0$, and thus
\[
\mathrm{rank}\big(H(x_1, \ldots, x_i, \ldots, x_n) - H(x_1, \ldots, x'_i, \ldots, x_n)\big) \le 2.
\]
Therefore, from Lemma 3.5, we obtain
\[
\Big|\int f(\lambda)\,d\mu_{H(x_1, \ldots, x_i, \ldots, x_n)} - \int f(\lambda)\,d\mu_{H(x_1, \ldots, x'_i, \ldots, x_n)}\Big| \le 2n^{-1}.
\]
The desired result now follows from the Azuma-Hoeffding inequality.
3.3.2 With Logarithmic Sobolev inequality
We are now going to derive optimal concentration inequalities. We follow Section 2.3.2 in
[AGZ10].
Definition 3.14 (Logarithmic Sobolev inequality (LSI)). A probability measure P on Rn satisfies LSI with constant c if for any differentiable function f ∈ L2 (P )
Ent_P(f²) = ∫ f² log( f² / ∫ f² dP ) dP ≤ c ∫ ‖∇f‖₂² dP = c E_P(f),

where ‖∇f‖₂² = Σ_{i=1}^n (∂ᵢf)².
3.3. CONCENTRATION OF MEASURE FOR RANDOM MATRICES
23
Recall that the entropy is naturally related to transport inequalities. For example, Pinsker's inequality bounds the total variation distance between two probability measures by the entropy:

d_TV(µ, µ′) = sup_A |µ(A) − µ′(A)| ≤ √( 2 Ent_{µ′}( dµ/dµ′ ) ).
The definition of LSI is due to Leonard Gross (1975). It bounds the entropy by an energy, and it is closely related to hypercontractivity in semi-group theory. We refer to Ané et al. [ABC+00] and to Ledoux [Led01]. For techniques to prove LSI, see also Villani, "Optimal transport, old and new", Chapters 21-22, and Guionnet and Zegarlinski [GZ03]. With the proper notion of energy E_P(f), the definition of LSI extends to contexts more general than Rⁿ.

For us, it will be important that the standard Gaussian on R satisfies LSI(2). More generally, let V : Rⁿ → R ∪ {∞} be such that V(x) − ‖x‖₂²/c is convex. Then the probability measure P(dx) = Z⁻¹ e^{−V(x)} dx satisfies LSI(c); this is a consequence of the Bakry-Émery criterion, see Bobkov and Ledoux (2000). For the record, on the two-point space {0, 1}, setting E_P(f) = P(0)P(1)|f(0) − f(1)|², the Bernoulli law on {0, 1} with parameter p satisfies LSI(c(p)), where, with q = 1 − p,

c(p) = ( log q − log p ) / ( q − p )   for p ≠ 1/2,   and   c(1/2) = 2.

Lemma 3.15 (Properties of LSI).
- Homogeneity: if P satisfies LSI(c) and X has law P, then the law of σX + x satisfies LSI(σ²c).
- Tensorization: for i ∈ {1, 2}, if Pᵢ is a probability measure on R^{mᵢ} which satisfies LSI(cᵢ), then P₁ ⊗ P₂ satisfies LSI(max(c₁, c₂)).
Proof. Only the tensorization property deserves a proof. We write P(f) in place of E_P(f). We decompose the entropy:

Ent_{P₁⊗P₂}(f²) = P₁ ⊗ P₂( f² log( f² / P₁⊗P₂(f²) ) )
 = P₂[ P₁( f² log( f² / P₁f² ) ) ] + P₂[ P₁(f²) log( P₁f² / P₁⊗P₂(f²) ) ]
 = P₂ Ent_{P₁}(f²) + Ent_{P₂}( P₁f² ).

Recall the variational formula for the entropy:

Ent_P(h) = sup{ P(hg) : P(e^g) = 1 }

(which follows from the inequality xy ≤ x log x − x + e^y, valid for any x > 0, y ∈ R, applied to x = h/Ph and y = g). This yields, for any function g : R^{m₂} → R with E_{P₂} e^g = 1,

P₂( (P₁f²) g ) = P₁( P₂( f² g ) ) ≤ P₁ Ent_{P₂}(f²).

Taking the supremum over all g yields the tensorization inequality for the entropy:

Ent_{P₁⊗P₂}(f²) ≤ P₂ Ent_{P₁}(f²) + P₁ Ent_{P₂}(f²).

We may now apply LSI(cᵢ), for i ∈ {1, 2}, and the statement of the lemma is straightforward.
Lemma 3.16 (Herbst's argument). Assume that P satisfies LSI(c). Let F : Rⁿ → R be Lipschitz with constant 1. Then for all λ ∈ R,

E_P e^{λ(F−E_P F)} ≤ e^{cλ²},

and so for any t ≥ 0,

P( |F − E_P F| ≥ t ) ≤ 2 e^{−t²/(4c)}.
Proof. We use the Chernoff bound: for λ ≥ 0,

P( F − E_P F ≥ t ) ≤ e^{−λt} E_P e^{λ(F−E_P F)}.

Applied with λ = t/(2c) to F and to −F, this shows that the second statement is a consequence of the first. We start with the case of F continuously differentiable. By homogeneity, we can assume E_P F = 0 and λ > 0. Consider the log-Laplace function

Λ(λ) = log E_P e^{2λF}.

Apply the definition of LSI to the function f = e^{λF}. We find

2λ E_P[ F e^{2λF} ] − E_P e^{2λF} log E_P e^{2λF} ≤ c E_P[ Σ_{i=1}^n (∂ᵢF)² λ² e^{2λF} ] ≤ c λ² L² E_P e^{2λF},

where L² = max_{x∈Rⁿ} ‖∇F(x)‖₂² = max_{x∈Rⁿ} Σ_{i=1}^n |∂ᵢF(x)|² ≤ 1. Since

( Λ(λ)/λ )′ = 2 E_P[ F e^{2λF} ] / ( λ E_P e^{2λF} ) − Λ(λ)/λ²,

the above inequality gives ( Λ(λ)/λ )′ ≤ cL² ≤ c. Since E_P F = 0, Λ(λ) = o(λ) as λ ↓ 0, and integrating we find, for all λ > 0,

Λ(λ) ≤ cλ².

Hence E_P e^{2λF} ≤ e^{cλ²}, and replacing λ by λ/2, E_P e^{λF} ≤ e^{cλ²/4} ≤ e^{cλ²}. This proves the first statement for F continuously differentiable. For the general case, refer to [AGZ10] or [ABC+00, Lemma 7.3.3].
We can now derive powerful concentration inequalities for random matrices with independent entries. From lemma 3.16 and corollary 3.8, we find the following.
Theorem 3.17 (Concentration of ESD with LSI). Let M ∈ Hₙ(C) be a Hermitian random matrix and assume that the vector (M_{ij})_{1≤i≤j≤n} satisfies LSI(c) in R^{n²}. Then for any f : R → R such that ‖f‖_L ≤ 1 and every t ≥ 0,

P( | ∫ f dµ_{M/√n} − E ∫ f dµ_{M/√n} | ≥ t ) ≤ 2 exp( −n²t²/(4c) ).
As a consequence, we find

Var( n ∫ f dµ_{M/√n} ) = O( ‖f‖_L² ).

As we will see, this is the right order of magnitude. The fluctuations of the eigenvalues are thus much smaller than the fluctuations of independent variables.
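This O(1/n) decay of the standard deviation of a linear statistic can be observed in simulation. The following sketch (the matrix sizes, the ±1 entries and the test function tanh are arbitrary choices) estimates the standard deviation of ∫ f dµ_{M/√n} at two sizes:

```python
import numpy as np

rng = np.random.default_rng(1)

def linstat_sd(n, reps=200):
    """Std of int f dmu_{M/sqrt(n)} over reps symmetric +/-1 Wigner matrices."""
    vals = []
    for _ in range(reps):
        A = rng.choice([-1.0, 1.0], size=(n, n))
        M = np.triu(A) + np.triu(A, 1).T          # symmetric, iid upper triangle
        vals.append(np.tanh(np.linalg.eigvalsh(M / np.sqrt(n))).mean())
    return float(np.std(vals))

sd_40, sd_80 = linstat_sd(40), linstat_sd(80)
ratio = sd_40 / sd_80   # Theorem 3.17 predicts roughly a factor 2
```

Doubling n roughly halves the standard deviation, in contrast with the 1/√n decay one would get for an average of independent variables.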
We can also derive a result on the concentration of a single eigenvalue.

Theorem 3.18 (Concentration of single eigenvalue with LSI). Let M ∈ Hₙ(C) be a Hermitian random matrix and assume that the vector (M_{ij})_{1≤i≤j≤n} satisfies LSI(c) in R^{n²}. Then for any 1 ≤ k ≤ n, any f : R → R such that ‖f‖_L ≤ 1 and every t ≥ 0,

P( | f(λ_k(M/√n)) − Ef(λ_k(M/√n)) | ≥ t ) ≤ 2 exp( −nt²/(4c) ).

This result is less appealing: it gives the bound

Var f(λ_k(M/√n)) = O(1/n).

However, this time, it is not the right order of magnitude. For GUE matrices and k = ⌈αn⌉ with 0 < α < 1, the variance is of order O( log n / n² ).
3.3.3 With Talagrand's concentration inequality
We start by recalling a corollary of Talagrand’s concentration inequality.
Theorem 3.19 (Talagrand's concentration inequality). Let K be a convex compact subset of C ≃ R² with diameter D = sup_{x,y∈K} |x − y|. Consider a convex Lipschitz real-valued function f defined on Kⁿ. Let P = P₁ ⊗ · · · ⊗ Pₙ be a product measure on Kⁿ and let M_P(f) be the median of f under P. Then for any t > 0,

P( |f − M_P(f)| ≥ t ) ≤ 4 e^{ −t² / (8D²‖f‖_L²) }.
For a proof in the case K ⊂ R, see Ledoux [Led01] (with constant 4 instead of 8). If E_P(f) is the mean of f under P, with the assumptions of the theorem, we find

|M_P(f) − E_P(f)| ≤ ∫₀^∞ P( |f − M_P(f)| ≥ t ) dt ≤ 2 ∫_{−∞}^{∞} e^{ −t² / (8D²‖f‖_L²) } dt = 4√(2π) D ‖f‖_L.

We may then deduce from Talagrand's Theorem 3.19 an inequality with E_P(f) in place of M_P(f). In comparison with Lemma 3.16, note the extra requirement that f is convex. Hence, in order to apply this result to empirical spectral distributions, the following lemma is useful.
Lemma 3.20 (Klein’s lemma). Suppose that f is a real-valued convex function on R. Then
the function F : X 7→ Trf (X) on Hn (C) is convex.
Proof. We refer to the proof in [AGZ10, p. 286].
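Klein's lemma can at least be sanity-checked numerically. In this sketch (the dimension and the convex test function f(x) = x⁴ are arbitrary), we test midpoint convexity of X ↦ Tr f(X) on random symmetric matrices:

```python
import numpy as np

rng = np.random.default_rng(2)

def trace_f(X):
    # F(X) = Tr f(X) with f(x) = x^4 (a convex function), applied spectrally
    return float(np.sum(np.linalg.eigvalsh(X) ** 4))

gaps = []
for _ in range(200):
    A = rng.standard_normal((6, 6)); A = A + A.T
    B = rng.standard_normal((6, 6)); B = B + B.T
    # midpoint convexity: F((A + B)/2) <= (F(A) + F(B))/2
    gaps.append(0.5 * (trace_f(A) + trace_f(B)) - trace_f(0.5 * (A + B)))

min_gap = min(gaps)   # should be >= 0 up to rounding
```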
From theorem 3.19, corollary 3.8 and lemma 3.20, we find the following.
Theorem 3.21 (Concentration of ESD with independent bounded entries). Let M ∈ Hₙ(C) be a Hermitian random matrix and assume that the variables (M_{ij})_{1≤i≤j≤n} are independent, with support in a convex set K ⊂ C ≃ R² with diameter D. Then for any convex Lipschitz f : R → R and every t ≥ 0,

P( | ∫ f dµ_{M/√n} − M_P(f) | ≥ t ) ≤ 4 exp( −n²t² / (16D²‖f‖_L²) ),

where M_P(f) is the median under P of ∫ f dµ_{M/√n}.
The restriction that f is convex can be partially lifted if f has a finite number of inflection
points.
Corollary 3.22. Let M ∈ Hₙ(C) be a Hermitian random matrix and assume that the variables (M_{ij})_{1≤i≤j≤n} are independent, with support in a convex set K ⊂ C ≃ R² with diameter D. Then for any C² function f : R → R with k inflection points and every t ≥ 0,

P( | ∫ f dµ_{M/√n} − M_P(f) | ≥ t ) ≤ 4(k + 1) exp( −n²t² / (16(k + 1)²D²‖f‖_L²) ),

where M_P(f) is the median under P of ∫ f dµ_{M/√n}.
Proof. The corollary follows from Theorem 3.21 together with two facts: first, the union bound

{ x₁ + · · · + x_ℓ ≥ t } ⊂ ⋃_{1≤i≤ℓ} { xᵢ ≥ t/ℓ };

second, a function f with k inflection points can be written as

f = Σ_{i=1}^{k+1} εᵢ fᵢ,

where εᵢ ∈ {−1, 1}, fᵢ is convex and ‖fᵢ‖_L ≤ ‖f‖_L. Indeed, this last property can be checked by induction on k. Let x₁ < · · · < x_k be the inflection points of f. Up to considering ±(f(· − x_k) − f(x_k)), we may assume without loss of generality that x_k = 0, f(0) = 0 and f″(x) > 0 for x > 0 (where it exists). We decompose f as

f(x) = f(x) − f′(0)x + f′(0)x
 = [ (f(x) − f′(0)x)1(x < 0) + f′(0)x ] + (f(x) − f′(0)x)1(x ≥ 0)
 = g₁(x) + g₂(x).

Notice that g₁ and g₂ are C¹ functions. Moreover, g₂ is convex and, for x ≥ 0, g₂′(x) = f′(x) − f′(0) ∈ [0, f′(x)], so that ‖g₂‖_L ≤ ‖f‖_L. Similarly, g₁ has k − 1 inflection points and, since g₁(x) = f(x) on (−∞, 0] and g₁(x) = f′(0)x on [0, ∞), we find ‖g₁‖_L ≤ ‖f‖_L.
Chapter 4
A second approach to Wigner
Theorem : the resolvent method
In Section 2.3, we have seen that the even moments of Wigner's semi-circular law are given by the Catalan numbers. The generating function of the Catalan numbers satisfies a very simple fixed point equation (2.2.2). This hints that the generating function of the moments of the ESD of random matrices could be easier to compute than the actual moments. The resolvent method formalizes this idea.
4.1 Cauchy-Stieltjes transform
4.1.1 Definition and properties
Let µ be a finite measure on R. Define its Cauchy-Stieltjes transform, for all z ∈ C⁺ = {z ∈ C : Im(z) > 0}, as

g_µ(z) = ∫ 1/(λ − z) dµ(λ).

Note that if µ has bounded support, we have, for |z| large enough,

g_µ(z) = −Σ_{k≥0} z^{−k−1} ∫ λ^k dµ(λ).

The Cauchy-Stieltjes transform is thus essentially the generating function of the moments of the measure µ.
Lemma 4.1 (Properties of Cauchy-Stieltjes transform). Let µ be a finite measure on R with
mass µ(R) ≤ 1.
(i) Analytic : the function gµ is an analytic function from C+ → C+ .
(ii) Bounded : for any z ∈ C+ , |gµ (z)| ≤ (Im(z))−1 .
The Cauchy-Stieltjes transform characterizes the measure. More precisely, the following
holds.
Lemma 4.2 (Inversion of Cauchy-Stieltjes transform). Let µ be a finite measure on R.
(i) For any f ∈ C₀(R),

∫ f dµ = lim_{t↓0} (1/π) ∫ f(x) Im g_µ(x + it) dx.

(ii) If f = 1_I with I an interval and µ(∂I) = 0, the above formula holds.

(iii) For any x ∈ R,

µ({x}) = lim_{t↓0} t Im g_µ(x + it).

(iv) If µ admits a density at x ∈ R, then this density is equal to

lim_{t↓0} (1/π) Im g_µ(x + it).
Proof. By linearity, we can assume that µ is a probability measure. We have the identity

Im g_µ(x + it) = ∫ t / ((λ − x)² + t²) dµ(λ).

Hence (1/π) Im g_µ(x + it) is equal to the density at x of the distribution µ ∗ P_t, where P_t is the Cauchy distribution with density

P_t(x) = t / ( π(x² + t²) ).

In other words,

(1/π) ∫ f(x) Im g_µ(x + it) dx = E f(X + tY),

where X has law µ and is independent of Y with distribution P₁. The statements follow easily.
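Statement (iv) can be illustrated numerically for the semi-circular law (the grid and the value of t below are arbitrary choices): we evaluate g_µ by a Riemann sum and recover the density at a point via (1/π) Im g_µ(x + it).

```python
import numpy as np

# semi-circular density on [-2, 2] and its transform by Riemann sum
lam = np.linspace(-2.0, 2.0, 200001)
dlam = lam[1] - lam[0]
rho = np.sqrt(np.maximum(4.0 - lam ** 2, 0.0)) / (2.0 * np.pi)

def g_num(z):
    # Cauchy-Stieltjes transform int rho(l)/(l - z) dl, evaluated numerically
    return np.sum(rho / (lam - z)) * dlam

x, t = 0.5, 1e-3
recovered = float(g_num(x + 1j * t).imag / np.pi)
exact = float(np.sqrt(4.0 - x ** 2) / (2.0 * np.pi))
err = abs(recovered - exact)
```

The recovered value matches the density up to the smoothing error of the Cauchy kernel at scale t.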
4.1.2 Cauchy-Stieltjes transform and weak convergence
Convergence of Cauchy-Stieltjes transforms is equivalent to weak convergence.
Corollary 4.3. Let µ and (µₙ)_{n≥1} be real probability measures. The following are equivalent:
(i) As n → ∞, µₙ converges weakly to µ.
(ii) For all z ∈ C⁺, as n → ∞, g_{µₙ}(z) → g_µ(z).
(iii) There exists a set D ⊂ C⁺ with an accumulation point such that for all z ∈ D, as n → ∞, g_{µₙ}(z) → g_µ(z).
Proof. Statement ”(i) implies (ii)” follows from the definition of weak convergence applied to
the real and imaginary part of f (λ) = (λ − z)−1 . Statement ”(ii) implies (iii)” is trivial.
For the statement "(iii) implies (i)", by the Helly selection theorem, the sequence (µₙ) is relatively compact for the vague convergence. Let ν be such a vague limit; it is a finite measure with mass at most 1. By assumption, for any z ∈ D, g_µ(z) = g_ν(z). Two analytic functions equal on a set with an accumulation point are equal on their domain (principle of analytic continuation). Hence, by Lemma 4.1, for any z ∈ C⁺, g_µ(z) = g_ν(z). By Lemma 4.2, we deduce that ν is a probability measure and ν = µ.
4.1.3 Cauchy-Stieltjes transform of the semi-circular law
The Cauchy-Stieltjes transform of the semi-circular distribution µ_sc satisfies the fixed point equation: for all z ∈ C⁺,

g_{µsc}(z) = −1 / ( z + g_{µsc}(z) ),   or   g_{µsc}(z)² + z g_{µsc}(z) + 1 = 0.   (4.1.1)

Let z ↦ √z be the analytic continuation of x ↦ √x on C∖R₋ with a positive imaginary part. We find

g_{µsc}(z) = ( −z + √(z² − 4) ) / 2.
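As a consistency check with Section 2.3, the expansion of this formula at infinity reproduces the Catalan numbers as even moments. A quick numerical sketch (the evaluation point z = 5i is arbitrary):

```python
import numpy as np

def g_sc(z):
    # g_sc(z) = (-z + sqrt(z^2 - 4)) / 2, branch chosen so that Im g_sc(z) > 0
    s = np.sqrt(complex(z) ** 2 - 4.0)
    if s.imag < 0:
        s = -s
    return (-z + s) / 2.0

# Catalan numbers C_0, ..., C_6: the even moments of the semi-circular law
cat = [1]
for k in range(6):
    cat.append(cat[-1] * 2 * (2 * k + 1) // (k + 2))

z = 5j
series = -sum(c * z ** (-2 * k - 1) for k, c in enumerate(cat))
err = abs(g_sc(z) - series)
residual = abs(g_sc(z) ** 2 + z * g_sc(z) + 1)   # fixed point equation (4.1.1)
```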
4.2 Resolvent
4.2.1 Spectral measure at a vector
Let A ∈ Mn (C) and φ ∈ Cn be a vector with unit `2 -norm, kφk2 = 1.
The spectral theorem guarantees the existence of (v1 , · · · , vn ), an orthonormal basis of Cn
of eigenvectors of A, that is, for any 1 ≤ i ≤ n, Avi = λi (A)vi . The spectral measure with
vector φ is the real probability measure defined by
µ_A^φ = Σ_{k=1}^n |⟨v_k, φ⟩|² δ_{λ_k(A)}.   (4.2.1)
It may also be defined as the unique probability measure µ_A^φ such that

∫ λ^k dµ_A^φ(λ) = ⟨φ, A^k φ⟩   for any integer k ≥ 1.   (4.2.2)

Also, if (e₁, · · · , eₙ) is the canonical basis of Cⁿ, averaging (4.2.1) over φ = e₁, · · · , eₙ, we find

µ_A = (1/n) Σ_{i=1}^n µ_A^{eᵢ}.   (4.2.3)

4.2.2 Resolvent matrix
If A ∈ Hn (C) and z ∈ C+ = {z ∈ C : Im(z) > 0}, then A − zI is invertible. We define the
resolvent of A as the function R : C+ 7→ Mn (C),
R(z) = (A − zI)−1 .
We have the identity
⟨φ, R(z)φ⟩ = ∫ 1/(λ − z) dµ_A^φ(λ) = g_{µ_A^φ}(z),   (4.2.4)

where µ_A^φ is the spectral measure with vector φ. Also,

g_{µ_A}(z) = (1/n) Tr( R(z) ).
Lemma 4.4 (Properties of the resolvent matrix). Let A ∈ Hn (C) and R(z) = (A − zI)−1 be
its resolvent. For any z ∈ C+ , 1 ≤ i, j ≤ n,
(i) Analytic : z 7→ R(z)ij is an analytic function on C+ → C.
(ii) Bounded : kR(z)k2→2 ≤ Im(z)−1 .
(iii) Normal : R(z)R(z)∗ = R(z)∗ R(z).
Proof. All properties come from the spectral decomposition

R(z) = Σ_{k=1}^n ( 1 / (λ_k(A) − z) ) v_k v_k*,

where (v_k) is an orthonormal basis of eigenvectors of A.
4.2.3 Perturbation inequalities of the resolvent
Rank inequalities
Lemma 4.5 (Rank inequality for resolvent). Let A, B ∈ Hₙ(C) and z ∈ C⁺. Then, with R_A(z) = (A − zIₙ)⁻¹ and R_B(z) = (B − zIₙ)⁻¹,

Σ_{k=1}^n | R_A(z)_{kk} − R_B(z)_{kk} | ≤ 2 rank(A − B) / Im z.
Proof.
There are variants of the above interlacing inequality. The following can be proved as the
above lemma.
Corollary 4.6 (Rank inequality for resolvent of minor). Let A ∈ Hₙ(C) and let B be a minor of A where a subset Λ of rows and columns has been removed. Then, if z ∈ C⁺, R_A(z) = (A − zIₙ)⁻¹ and R_B(z) = (B − zI_{n−|Λ|})⁻¹,

Σ_{k∈{1,··· ,n}∖Λ} | R_A(z)_{kk} − R_B(z)_{kk} | ≤ 4|Λ| / Im z.

(B is seen as a linear map on H = span(e_k : k ∈ {1, · · · , n}∖Λ).)

Proof. Consider the matrix B′ ∈ Mₙ(C) obtained from B by setting B′e_k = 0 for k ∈ Λ and B′e_k = Be_k otherwise. We obviously have (B′ − zIₙ)⁻¹_{kk} = (B − zI_{n−|Λ|})⁻¹_{kk} for k ∉ Λ. Moreover, B′ − A vanishes outside the rows and columns indexed by Λ, so that by construction rank(B′ − A) ≤ 2|Λ|. We may then apply Lemma 4.5.
Norm inequalities
The reader may not be surprised that in the above arguments we have used rank inequalities. Using the Hoffman-Wielandt inequality, similar bounds can be derived in terms of norm inequalities.
4.2.4 Resolvent complement formula
The Schur complement formula is simply a blockwise inversion of an invertible matrix.
Lemma 4.7 (Schur's complement formula). Let A ∈ Mₙ(C) be an invertible matrix. Set

A = ( A₁₁ A₁₂ ; A₂₁ A₂₂ )   and   A⁻¹ = ( B₁₁ B₁₂ ; B₂₁ B₂₂ ),

where A₁₁, B₁₁ ∈ M_p(C). Then, if A₂₂ and B₁₁ are invertible, we have

B₁₁ = ( A₁₁ − A₁₂ A₂₂⁻¹ A₂₁ )⁻¹.
Corollary 4.8 (Resolvent complement formula). Let n ≥ 2, A = (a_{ij})_{1≤i,j≤n} ∈ Hₙ(C), z ∈ C⁺ and R = (A − zIₙ)⁻¹. For any 1 ≤ i ≤ n,

R_{ii} = −( z − a_{ii} + ⟨aᵢ, R^{(i)} aᵢ⟩ )⁻¹,

where aᵢ = (a_{ij})_{j≠i} ∈ C^{n−1}, R^{(i)} = (A^{(i)} − zI_{n−1})⁻¹ and A^{(i)} ∈ H_{n−1}(C) is the principal minor of A where the i-th row and column have been removed.
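A short numerical check of Corollary 4.8 (the 6×6 random real symmetric matrix, z and the index i are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n, z, i = 6, 0.4 + 1.0j, 2

A = rng.standard_normal((n, n))
A = (A + A.T) / 2.0                                  # real symmetric, hence Hermitian
R = np.linalg.inv(A - z * np.eye(n))
idx = [j for j in range(n) if j != i]
a_i = A[i, idx]                                      # i-th row, diagonal entry removed
R_i = np.linalg.inv(A[np.ix_(idx, idx)] - z * np.eye(n - 1))   # resolvent of the minor
schur = -1.0 / (z - A[i, i] + a_i @ R_i @ a_i)
err = abs(R[i, i] - schur)
```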
4.3 Resolvent method for random matrices
In this section, we present the resolvent method for random matrices on an example. The method is based on two components: a probabilistic one, the concentration of bilinear forms, and a linear algebra one, the Schur complement formula.
4.3.1 Concentration for bilinear forms
Lemma 4.9 (Variance of bilinear forms of independent vectors). Let A ∈ Mₙ(C) and let X = (X₁, · · · , Xₙ) ∈ Cⁿ be a vector of centered and independent variables with E|Xᵢ|² = 1 and Var(|Xᵢ|²) ≤ κ for 1 ≤ i ≤ n. Then

E⟨X, AX⟩ = Tr A   and   Var⟨X, AX⟩ ≤ 2 Tr(AA*) + κ Σ_{i=1}^n |A_{ii}|².
Proof. We have

⟨X, AX⟩ = Σ_{1≤i,j≤n} X̄ᵢ A_{ij} Xⱼ.

This yields

E⟨X, AX⟩ = Σ_{i=1}^n E|Xᵢ|² A_{ii} = Tr A

and

Var⟨X, AX⟩ = Σ_{i₁,j₁,i₂,j₂} A_{i₁j₁} Ā_{i₂j₂} E[ X̄_{i₁} X_{j₁} X_{i₂} X̄_{j₂} ] − Σ_{i,j} E|Xᵢ|² E|Xⱼ|² A_{ii} Ā_{jj}.

By independence and centering, a summand of the first sum is non-zero only if (i₁, i₂) = (j₁, j₂), (i₁, j₁) = (i₂, j₂) or (i₁, j₁) = (j₂, i₂). We get

Var⟨X, AX⟩ ≤ κ Σᵢ |A_{ii}|² + Σ_{i≠j} |A_{ij}|² E|Xᵢ|² E|Xⱼ|² + Σ_{i≠j} |A_{ij}||A_{ji}| |EX̄ᵢ²||EXⱼ²|
 ≤ κ Σ_{i=1}^n |A_{ii}|² + Σ_{i≠j} |A_{ij}|² + Σ_{i≠j} |A_{ij}||A_{ji}|.

The second term is at most Tr(AA*), while the third term is also upper bounded by Tr(AA*) by the Cauchy-Schwarz inequality.
Under stronger moment assumptions, it is of course possible to strengthen Lemma 4.9. For example, for entries with sub-Gaussian tails, this is the topic of the Hanson-Wright theorem.
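A Monte-Carlo sanity check of Lemma 4.9 (Rademacher entries, for which |Xᵢ|² = 1 so κ = 0; the sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 30, 40000

A = rng.standard_normal((n, n))
X = rng.choice([-1.0, 1.0], size=(reps, n))          # |X_i|^2 = 1, so kappa = 0
vals = np.einsum('ri,ij,rj->r', X, A, X)             # <X, A X> for each sample

mean_err = abs(float(vals.mean()) - float(np.trace(A)))
emp_var = float(vals.var())
var_bound = 2.0 * float(np.trace(A @ A.T))           # lemma 4.9 with kappa = 0
```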
4.4 Limit empirical spectral distribution of band matrices
In this section, we show the resolvent method at work on band matrices. For each integer n ≥ 1, we assume that (X_{ij})_{1≤i≤j≤n} are independent real centered variables with variance 1. We set, for 1 ≤ i ≤ j ≤ n, Y_{ji} = Y_{ij} and

Y_{ij} = X_{ij} (1/|Q_{ij}|) ∫_{Q_{ij}} σ(x, y) dxdy,

where Q_{ij} = [(i − 1)/n, i/n] × [(j − 1)/n, j/n] and σ : [0, 1]² → [0, 1] is a measurable function such that

σ(x, y) = σ(y, x).

We consider the Hermitian matrix

Yₙ = (Y_{ij})_{1≤i,j≤n}.

We assume that all these matrices are defined on a common probability space.
Theorem 4.10 (ESD of matrices with variance profile). There exists a probability measure µ_σ depending on σ such that a.s. µ_{Y/√n} converges weakly to µ_σ. The Cauchy-Stieltjes transform g_{µσ} of µ_σ is given by the formula

g_{µσ}(z) = ∫₀¹ g(x, z) dx,

where the map g : [0, 1] × C⁺ → C⁺, (x, z) ↦ g(x, z), satisfies: for a.a. x ∈ [0, 1], z ↦ g(x, z) is analytic on C⁺, and for each z ∈ C⁺ with Im(z) > 1, x ↦ g(x, z) is the unique function in L¹([0, 1]; C⁺) solving the equation, for a.a. x ∈ [0, 1],

g(x, z) = −( z + ∫₀¹ σ²(x, y) g(y, z) dy )⁻¹.   (4.4.1)

Note that by analyticity, for 0 < Im(z) ≤ 1, x ↦ g(x, z) is also a solution of (4.4.1) (which may however a priori not be unique in L¹([0, 1]; C⁺)). A typical application of Theorem 4.10 is the following.
Corollary 4.11 (ESD of band matrices). Assume that for a.a. x ∈ [0, 1], ∫₀¹ σ²(x, y) dy = b for some b ∈ (0, 1]. Then a.s. µ_{Y/√(bn)} converges weakly to µ_sc.

The corollary follows by noticing that the semi-circular law satisfies the fixed point equation (4.1.1), and by the uniqueness statement in Theorem 4.10. Using the Hoffman-Wielandt inequality, Lemma 3.6, it may be used to prove that a.s. µ_{Z/√(bn)} converges weakly to µ_sc, where Z_{ij} = X_{ij} 1( min(|i − j|, n − |i − j|) ≤ bn/2 ). In fact, from the proof of Theorem 4.10, it can be seen that this last result holds with b = b(n) → 0, as soon as b(n)n → ∞ (see Exercise 4.13).
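For a constant row profile, equation (4.4.1) collapses to a scalar fixed point which can be iterated numerically. The sketch below (b and z are arbitrary choices) recovers the rescaled semi-circular transform, in line with Corollary 4.11:

```python
import numpy as np

def g_sc(w):
    # semi-circular transform, branch with Im g_sc > 0 on C+
    s = np.sqrt(complex(w) ** 2 - 4.0)
    if s.imag < 0:
        s = -s
    return (-w + s) / 2.0

b, z = 0.3, 0.7 + 1.5j

# if int sigma^2(x, y) dy = b for all x, (4.4.1) reads g = -1/(z + b*g)
g = 0.0 + 0.0j
for _ in range(200):
    g = -1.0 / (z + b * g)

target = g_sc(z / np.sqrt(b)) / np.sqrt(b)   # semi-circular law rescaled by sqrt(b)
err = abs(g - target)
```

The iteration is a contraction for Im(z) large enough, mirroring the uniqueness argument in the proof below.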
Let 1 ≤ p ≤ n and define the matrix in M_{p,n}(R),

Xₙ = (X_{ij})_{1≤i≤p,1≤j≤n}.

Corollary 4.12 (Marcenko-Pastur law). Assume p(n)/n → c ∈ (0, 1]. Then a.s. µ_{XX*/n} converges weakly to µ_MP, where b₋ = (1 − √c)², b₊ = (1 + √c)² and

µ_MP(dx) = ( 1/(2πcx) ) √( (x − b₋)(b₊ − x) ) 1_{b₋≤x≤b₊} dx.

Note that from the above corollary, it is also possible to deal with the case c > 1. It suffices to reverse the roles of p and n and to notice that, for 1 ≤ p ≤ n,

n µ_{X*X} = p µ_{XX*} + (n − p) δ₀.
Proof of Corollary 4.12. Consider the block matrix in H_{n+p}(C),

Z = ( 0ₙ X* ; X 0_p ).   (4.4.2)

If 0 ≤ λ₁ ≤ · · · ≤ λ_p are the eigenvalues of XX* with λ₁ = · · · = λ_m = 0, λ_{m+1} > 0, then the non-zero eigenvalues of Z are

±√λ_{m+1}, · · · , ±√λ_p.
In particular, if µ_{Z/√(n+p)} converges weakly to a limit measure ν with

ν = ( (1 − c)/(1 + c) ) δ₀ + ( 2c/(1 + c) ) ν̂,

where ν̂ is a symmetric probability measure on R with density f, then µ_{XX*/n} converges weakly to µ with density on (0, ∞) given by

dµ(x) = ( f(√x)/√x ) dx

(up to the deterministic rescaling (n + p)/n → 1 + c of the eigenvalues). Now, coming back to (4.4.2), we introduce the function σ : [0, 1]² → [0, 1],

σ(x, y) = 1( 0 < x < 1/(1+c) ) 1( 1/(1+c) < y < 1 ) + 1( 1/(1+c) < x < 1 ) 1( 0 < y < 1/(1+c) ).

Note that σ(x, y) = σ(y, x), and we may consider the associated matrix Y as in Theorem 4.10. It is an exercise to compute explicitly µ_σ in this case. We may then use the Hoffman-Wielandt inequality, Lemma 3.6, to check that µ_{Z/√(n+p)} and µ_{Y/√(n+p)} are asymptotically close.
4.4.1 Proof of theorem 4.10
The proof of Theorem 4.10 is a typical instance of the resolvent method. In the first step of the proof, we check the tightness of µ_{Y/√n} and that for each z ∈ C⁺, a.s.

(1/n) Σ_{i=1}^n R_{ii}(z) → ∫₀¹ g(x, z) dx,   (4.4.3)

where, for each z ∈ C⁺, g : x ↦ g(x, z) is a fixed point of the map F_{z,σ} : L¹([0, 1]; C⁺) → L¹([0, 1]; C⁺),

F_{z,σ}(g)(x) = −( z + ∫ σ²(x, y) g(y) dy )⁻¹.

Observe that (4.4.3) implies that, if D is a countable dense set in C⁺, a.s. for all z ∈ D, (4.4.3) holds. Then, in a second step, we prove the uniqueness of the solution of F_{z,σ}(g) = g for Im(z) large enough. Now, consider a subsequence of (µ_{Y/√n})_{n≥1} converging to µ. Invoking (4.2.4) and Corollary 4.3, we conclude that a.s. for z ∈ C⁺, g_µ(z) = ∫₀¹ g(x, z) dx. By uniqueness of the limit and a new application of Corollary 4.3, this will conclude the proof of Theorem 4.10. In the sequel, the parameter z ∈ C⁺ is fixed and, for ease of notation, we will often omit it.
Tightness
We write

∫ λ² dµ_{Y/√n} = (1/n²) Tr(YY*) ≤ (1/n²) Σ_{i,j} |X_{ij}|².

From the law of large numbers, we deduce that a.s.

lim sup_{n→∞} ∫ λ² dµ_{Y/√n} ≤ 1.

Hence, a.s., the sequence of probability measures (µ_{Y/√n})_{n≥1} is tight.
Concentration and truncation
Arguing as in Proposition 3.9, it is sufficient to prove Theorem 4.10 when the X_{ij} are bounded, say by K. Moreover, by the concentration Theorem 3.17, it is sufficient to prove that

E (1/n) Σ_{i=1}^n R_{ii} → ∫₀¹ g(x, z) dx,

and (4.4.3) will follow.
Approximation of σ
Let (P_{kℓ})_{1≤k,ℓ≤L} be the usual partition of [0, 1]² into squares of area 1/L². Define

ρ = Σ_{1≤k,ℓ≤L} ρ_{kℓ} 1_{P_{kℓ}},

where ρ_{kℓ} = L² ∫_{P_{kℓ}} σ(x, y) dxdy. We define the matrix Z by

Z_{ij} = X_{ij} Σ_{1≤k,ℓ≤L} ρ_{kℓ} 1_{P_{kℓ}}( i/n, j/n ).   (4.4.4)
We have

(1/n²) Tr(Z − Y)² ≤ K² ∫_{[0,1]²} |σ²(x, y) − ρ²(x, y)| dxdy + K² ∫_{[0,1]²} |σ²(x, y) − σ̂²(x, y)| dxdy + O(1/n²),

where

σ̂ = Σ_{1≤i,j≤n} σ_{ij} 1_{Q_{ij}},

and σ_{ij} = n² ∫_{Q_{ij}} σ(x, y) dxdy. From the Lebesgue differentiation theorem, for a.a. (x, y), σ(x, y) − σ̂(x, y) → 0 as n → ∞, and σ(x, y) − ρ(x, y) → 0 as L → ∞. Hence, by dominated convergence, we deduce that

‖σ² − σ̂²‖₁ → 0 as n → ∞   and   ‖σ² − ρ²‖₁ → 0 as L → ∞.

In particular,

lim sup_{n→∞} (1/n²) Tr(Z − Y)² ≤ ε(L),

for some function ε going to 0 as L goes to infinity.
Approximate fixed point equation
Consider the matrix Z given by (4.4.4). The objective is to prove that the resolvent of Z satisfies nearly a fixed point equation. To this end, we use the Schur complement formula, Corollary 4.8:

R_{ii} = −( z − Z_{ii}/√n + (1/n) ⟨Zᵢ, R^{(i)} Zᵢ⟩ )⁻¹,

where Zᵢ = (Z_{ij})_{j≠i} and R^{(i)} = (Z^{(i)}/√n − zI)⁻¹ is the resolvent of the minor matrix Z^{(i)} obtained from Z by removing the i-th row and column.

Notice that if z, w, w′ ∈ C⁺, then

| 1/(z + w) − 1/(z + w′) | ≤ |w − w′| / Im(z)².   (4.4.5)
Since (1/n)⟨Zᵢ, R^{(i)}Zᵢ⟩ ∈ C⁺, R^{(i)}_{jj} ∈ C⁺ and E|Z_{ij}|² = ρ²( i/n, j/n ), we find from (4.4.5)

| R_{ii} + ( z + (1/n) Σ_{j≠i} ρ²(i/n, j/n) R^{(i)}_{jj} )⁻¹ | ≤ (1/Im(z)²) ( |Z_{ii}|/√n + | (1/n)⟨Zᵢ, R^{(i)}Zᵢ⟩ − (1/n) Σ_{j≠i} (E|Z_{ij}|²) R^{(i)}_{jj} | ).

Now, by construction, the vector (Z_{ij})_j is independent of R^{(i)}. Conditioning on R^{(i)} and using Lemma 4.9, we deduce that, in L²(P),

R_{ii} + ( z + (1/n) Σ_{j≠i} ρ²(i/n, j/n) R^{(i)}_{jj} )⁻¹ → 0.

We define N_k = {1 ≤ i ≤ n : (k − 1)/L < i/n ≤ k/L} and N_k^{(i)} = N_k∖{i}, so that |N_k|/n → 1/L. If i ∈ N_k, it yields, in L²(P),

R_{ii} + ( z + (1/L) Σ_{ℓ=1}^L ρ²_{kℓ} G_ℓ^{(i)} )⁻¹ → 0,

where we have defined

G_k^{(i)} = ( 1/|N_k^{(i)}| ) Σ_{j∈N_k^{(i)}} R^{(i)}_{jj}.
Now, with G_k = (1/|N_k|) Σ_{j∈N_k} R_{jj}, the finite rank inequality, Corollary 4.6, gives

| G_k^{(i)} − G_k | = O( ( Im(z) |N_k| )⁻¹ ),

and the proof of the concentration Theorem 3.17 gives that

E| G_k − EG_k |² = O( n / ( Im(z)² |N_k|² ) ) = O( L² / ( Im(z)² n ) ).
In conclusion, using again (4.4.5), for any 1 ≤ k ≤ L,

EG_k + ( z + (1/L) Σ_{ℓ=1}^L ρ²_{kℓ} EG_ℓ )⁻¹ → 0,   (4.4.6)

and

E (1/L) Σ_{k=1}^L G_k = E (1/n) Σ_{i=1}^n R_{ii}.
Uniqueness of the fixed point equation
Consider the function Ḡ : [0, 1] → C⁺ given by

Ḡ(x) = Σ_{k=1}^L 1( (k−1)/L < x ≤ k/L ) EG_k.

Consider an accumulation point of the vector (EG₁, · · · , EG_L), say (g₁, · · · , g_L). Then, along a subsequence, Ḡ converges in L^∞-norm to

g_ρ(x) = Σ_{k=1}^L 1( (k−1)/L < x ≤ k/L ) g_k.
By (4.4.6), g_ρ satisfies the fixed point equation, for all x ∈ [0, 1],

g = F_{z,ρ}(g),   with   F_{z,ρ}(g)(x) = −( z + ∫ ρ²(x, y) g(y) dy )⁻¹.

If g, h ∈ L¹([0, 1]; C⁺), we find, using again (4.4.5) and ρ(x, y) ≤ 1,

| F_{z,ρ}(g)(x) − F_{z,ρ}(h)(x) | ≤ ∫ ρ²(x, y) |g(y) − h(y)| dy / Im(z)² ≤ ‖g − h‖₁ / Im(z)².

In particular,

‖ F_{z,ρ}(g) − F_{z,ρ}(h) ‖₁ ≤ ‖g − h‖₁ / Im(z)².

Hence, for Im(z) > 1, F_{z,ρ} is a contraction on the complete metric space L¹([0, 1]; C⁺), and there is a unique solution of the fixed point equation g = F_{z,ρ}(g).
The same argument works for the function σ and its associated map F_{z,σ}. Similarly, if g ∈ L^∞([0, 1]; C⁺), we have

‖ F_{z,σ}(g) − F_{z,ρ}(g) ‖₁ ≤ ∫∫ |ρ²(x, y) − σ²(x, y)| |g(y)| dxdy / Im(z)² ≤ ‖g‖_∞ ‖σ² − ρ²‖₁ / Im(z)².

In particular, if g_σ is the unique fixed point of g = F_{z,σ}(g), since ‖g_ρ‖_∞ ≤ 1/Im(z), we deduce

‖g_σ − g_ρ‖₁ = ‖F_{z,σ}(g_σ) − F_{z,ρ}(g_ρ)‖₁ ≤ ‖F_{z,σ}(g_σ) − F_{z,σ}(g_ρ)‖₁ + ‖F_{z,σ}(g_ρ) − F_{z,ρ}(g_ρ)‖₁
 ≤ ‖g_σ − g_ρ‖₁ / Im(z)² + ‖σ² − ρ²‖₁ / Im(z)³.

This gives

‖g_σ − g_ρ‖₁ ≤ ‖σ² − ρ²‖₁ / ( Im(z)² (Im(z) − 1) ).

As already pointed out, ‖ρ² − σ²‖₁ → 0 as L → ∞.
End of proof
In summary, we have proved the following. Fix ε > 0 and z ∈ C⁺ with Im(z) > 1. We have, a.s., for all n large enough,

| g_{µ_{Y/√n}}(z) − E g_{µ_{Y/√n}}(z) | ≤ ε.

We may fix L large enough so that

| g_{µρ}(z) − g_{µσ}(z) | ≤ ε.

Then, for all n large enough,

| E g_{µ_{Z/√n}}(z) − E g_{µ_{Y/√n}}(z) | ≤ ε   and   | E g_{µ_{Z/√n}}(z) − g_{µρ}(z) | ≤ ε.

By the triangle inequality, a.s. | g_{µ_{Y/√n}}(z) − g_{µσ}(z) | ≤ 4ε for all n large enough. Since ε > 0 is arbitrary, this concludes the proof of Theorem 4.10.
Exercise 4.13 (Band matrices). Consider the Hermitian matrix Z = (Z_{ij})_{1≤i,j≤n} with Z_{ij} = X_{ij} 1(|i − j| ≤ bn), with b(n) → 0 and nb(n) → ∞. Using the resolvent method, prove that a.s. µ_{Z/√(nb)} converges weakly to µ_sc.

Exercise 4.14 (Adjacency matrices of Erdős-Rényi graphs). Consider the adjacency matrix A of G(n, c/n) with 0 ≤ c(n) ≤ n and c(n) → ∞. Namely, (A_{ij})_{1≤i<j≤n} are i.i.d. {0, 1} Bernoulli random variables with mean c/n. Using the resolvent method, prove that a.s. µ_{A/√c} converges weakly to µ_sc.
Chapter 5
Resolvent method with matrix
differentiation formulas
5.1 Matrix differentiation formulas
We identify Hₙ(C) with R^{n²}. Then, if Φ : Hₙ(C) → C is a continuously differentiable function, we define ∂_{jk}Φ(X) as the derivative with respect to Re(X_{jk}), and, for 1 ≤ j ≠ k ≤ n, ∂′_{jk}Φ(X) as the derivative with respect to Im(X_{jk}).
Define the resolvent R_A = (A − z)⁻¹, z ∈ C⁺. The resolvent formula

R_A − R_B = R_A (B − A) R_B,   (5.1.1)

valid for any matrices A, B ∈ Hₙ(C), and a simple computation show that if 1 ≤ j, k ≤ n and 1 ≤ a ≠ b ≤ n, then

∂_{ab} R_{jk} = −( R_{ja} R_{bk} + R_{jb} R_{ak} )   and   ∂′_{ab} R_{jk} = −i( R_{ja} R_{bk} − R_{jb} R_{ak} ),

while if 1 ≤ a ≤ n, then

∂_{aa} R_{jk} = −R_{ja} R_{ak}.
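These identities are easy to verify against finite differences (a sketch with an arbitrary 5×5 Hermitian matrix and arbitrary indices):

```python
import numpy as np

rng = np.random.default_rng(5)
n, z, h = 5, 0.2 + 1.0j, 1e-6
a, b, j, k = 1, 3, 0, 4

X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
X = (X + X.conj().T) / 2.0                           # Hermitian

def resolvent(M):
    return np.linalg.inv(M - z * np.eye(n))

R = resolvent(X)
E = np.zeros((n, n), dtype=complex)
E[a, b] = E[b, a] = 1.0                              # bump Re(X_ab), keeping X Hermitian
num = (resolvent(X + h * E)[j, k] - resolvent(X - h * E)[j, k]) / (2 * h)
formula = -(R[j, a] * R[b, k] + R[j, b] * R[a, k])
err = abs(num - formula)
```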
5.2 Gaussian differentiation formulas
We consider a Wigner matrix X = (Xij )1≤i,j≤n . We assume that
(A1) (Re(X12 ), Im(X12 )) is a centered Gaussian vector in R2 with covariance K ∈ H2 (R),
Tr(K) = 1.
(A2) X11 is a centered Gaussian in R with variance σ 2 .
We recall the Gaussian integration by parts formula.

Lemma 5.1. Let G be a centered Gaussian vector in R^p with covariance matrix Σ = EGG*. For any continuously differentiable function F : R^p → R with E‖grad F(G)‖₂ < ∞,

E[ F(G) G ] = Σ E[ grad F(G) ].   (5.2.1)
The use of Gaussian integration by part in random matrix theory was initiated by Khorunzhy, Khoruzhenko and Pastur [KKP96].
Set M = X/√n, so that R = (X/√n − z)⁻¹.
Using (5.2.1) we get, for 1 ≤ a ≠ b ≤ n and all j, k:

E[ R_{jk} X_{ab} ] = (1/√n) E[ K₁₁ ∂_{ab}R_{jk} + K₁₂ ∂′_{ab}R_{jk} + iK₂₁ ∂_{ab}R_{jk} + iK₂₂ ∂′_{ab}R_{jk} ]
 = −(1/√n) E[ (K₁₁ − K₂₂ + iK₁₂ + iK₂₁) R_{ja}R_{bk} + (K₁₁ + K₂₂ − iK₁₂ + iK₂₁) R_{jb}R_{ak} ]
 = −(1/√n) E( γ R_{ja}R_{bk} + R_{jb}R_{ak} ),   (5.2.2)

where at the last line we have used the symmetry of K and Tr(K) = 1, together with the notation

γ = K₁₁ − K₂₂ + 2iK₁₂ = EX₁₂².

Notice that |γ| ≤ 1. Similarly, for a = b, one has

E[ R_{jk} X_{aa} ] = −(σ²/√n) E[ R_{ja} R_{ak} ].   (5.2.3)

For further use, we also set

κ = σ² − 1 − γ.

In the GUE and GOE cases, κ = 0 while γ is equal respectively to 0 and 1.
In the sequel, the dependency on z is kept implicit in our notation. We have the identity

−zR = R( X/√n − zI ) − R X/√n = I − (1/√n) RX.
Hence, for 1 ≤ j, k ≤ n, using (5.2.2)-(5.2.3),

−z ER_{jk} = δ_{jk} − (1/√n) Σ_{1≤a≤n} E[ R_{ja} X_{ak} ]
 = δ_{jk} + (1/n) Σ_{1≤a≤n} E[ R_{jk} R_{aa} ] + (γ/n) Σ_{1≤a≠k≤n} E[ R_{ja} R_{ka} ] + ((σ² − 1)/n) E[ R_{jk} R_{kk} ].
We set

g = g_{µ_{X/√n}}(z) = (1/n) Tr(R),   ḡ = Eg,   ĝ = g − Eg,

and consider the diagonal matrix D with D_{jk} = 1_{j=k} R_{jk}. We find

−z ER = I + E[gR] + (1/n) E[ R( κD + γRᵀ ) ].

Writing E[gR] = ḡ ER + E[ĝR] and subtracting, one has

−ḡ ER − z ER = I + E[ĝR] + (1/n) E[ R( κD + γRᵀ ) ].

Finally, multiplying by −1/n and taking the trace,

ḡ² + zḡ + 1 = −E[ĝ g] − (1/n²) E Tr[ R( κD + γRᵀ ) ] = −E[ĝ²] − (1/n²) E Tr[ R( κD + γRᵀ ) ].   (5.2.4)
(5.2.4)
As a function of the entries of X, g has Lipschitz constant O(n−1 Im(z)−2 ). This fact follows
from Corollary 3.8 applied to f (x) = 1/(x − z) . Since the entries of X satisfy a Poincaré
inequality, a standard concentration bound implies
E|g|2 = O(n−2 Im(z)−4 ).
(5.2.5)
Also, since |Tr(AB)| ≤ n‖A‖‖B‖, we find

Tr[ R( κD + γRᵀ ) ] = O( n Im(z)⁻² ).

We deduce

E[ĝ²] = O( n⁻² Im(z)⁻⁴ )   and   (1/n²) E Tr[ R( κD + γRᵀ ) ] = O( n⁻¹ Im(z)⁻² ).

We thus have proved that

ḡ² + zḡ + 1 = O_z( n⁻¹ ).   (5.2.6)
Lemma 5.2. Let δ ∈ C and z ∈ C⁺. If x ∈ C⁺ satisfies x² + zx + 1 = δ, then

|x − g_sc(z)| ≤ |δ| / Im(z).

Proof. Recall that g_sc(z)² + z g_sc(z) + 1 = 0. It follows that

( g_sc(z) + z/2 )² = −1 + z²/4   and   ( x + z/2 )² = −1 + z²/4 + δ.

Hence

δ = ( x + z/2 )² − ( g_sc(z) + z/2 )² = ( x − g_sc(z) )( x + g_sc(z) + z ).

It yields

|x − g_sc(z)| = |δ| / |x + g_sc(z) + z|.

Since x, g_sc(z) ∈ C⁺, |x + g_sc(z) + z| ≥ Im( x + g_sc(z) + z ) ≥ Im(z).
From (5.2.6) and lemma 5.2, we deduce a new proof of Wigner’s semi-circular law for
Gaussian Wigner matrices.
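Concretely, for simulated GUE-type matrices, the averaged trace of the resolvent is already close to g_sc at moderate n (a sketch; n, z and the number of samples are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
n, reps, z = 150, 40, 0.5 + 1.0j

def g_sc(w):
    s = np.sqrt(complex(w) ** 2 - 4.0)
    if s.imag < 0:
        s = -s
    return (-w + s) / 2.0

vals = []
for _ in range(reps):
    H = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    H = (H + H.conj().T) / 2.0        # GUE-type: E|H_12|^2 = 1, E H_12^2 = 0 (gamma = 0)
    R = np.linalg.inv(H / np.sqrt(n) - z * np.eye(n))
    vals.append(np.trace(R) / n)

gbar = complex(np.mean(vals))
err = abs(gbar - g_sc(z))
```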
5.2.1 Convergence of edge eigenvalues for Gaussian matrices
We pursue the analysis of Gaussian Wigner matrices with the study of the extremal eigenvalues of X/√n.
Our aim is to prove
Theorem 5.3. Let X = X(n) be a Gaussian Wigner matrix satisfying assumptions (A1)-(A2). We have a.s.

lim_{n→∞} λ₁(X/√n) = − lim_{n→∞} λₙ(X/√n) = 2.

This result can also be proved using the moment method, see Füredi and Komlós [FK81]. Here, we will instead use the resolvent differentiation formulas. In the GUE case, this approach was initiated by Haagerup and Thorbjørnsen [HT05] and developed notably in Schultz [Sch05] and Capitaine and Donati-Martin [CDM07].

For simplicity, we set λ_k = λ_k(X/√n). Note that X and −X have the same law. Hence, by symmetry, we may restrict to λ₁. Note also that Wigner's semi-circular law implies that a.s.

lim inf_{n→∞} λ₁ ≥ 2.
We start with a weak bound on the norm of random matrices.
Lemma 5.4. Let X = (X_{ij})_{1≤i,j≤n} ∈ Hₙ(C) be a Wigner matrix with EX_{ij} = 0, E|X_{ij}|² = 1 and E|X_{ij}|⁴ ≤ κ. Then there exists c > 0, depending only on κ, such that for any n ≥ 1,

E‖X‖ ≤ c √(n log n).

Proof. For i < j, define Y_{ij} = X_{ij}( eᵢeⱼ* + eⱼeᵢ* ) and Y_{ii} = X_{ii} eᵢeᵢ*. The matrices (Y_{ij})_{i≤j} are independent and centered, and X = Σ_{i≤j} Y_{ij}. By symmetrization, if (ε_{ij})_{i≤j} are i.i.d. symmetric Bernoulli variables independent of X,

E‖X‖ ≤ 2 E‖ Σ_{i≤j} ε_{ij} Y_{ij} ‖.

From Rudelson's inequality (see Oliveira [Oli10] for a short proof), for some constant C > 0,

E‖ Σ_{i≤j} ε_{ij} Y_{ij} ‖ ≤ C √(log n) E[ ‖ Σ_{i≤j} Y_{ij}² ‖^{1/2} ].

Observe that Σ_{i≤j} Y_{ij}² is a diagonal matrix whose entry (i, i) is equal to Sᵢ = Σ_{j=1}^n |X_{ij}|². From Jensen's inequality, we get

E‖ Σ_{i≤j} ε_{ij} Y_{ij} ‖ ≤ C √(log n) ( E maxᵢ Sᵢ )^{1/2}.

By assumption, E( Σⱼ (|X_{ij}|² − 1) )² ≤ κn. Hence, from the Markov inequality and the union bound, P( maxᵢ Sᵢ ≥ n + t ) ≤ κn² t⁻². So finally,

E maxᵢ Sᵢ ≤ n + ∫₀^∞ 1 ∧ (κn²t⁻²) dt = (1 + 2√κ) n.

This concludes the proof.
By Lemma 3.18 and Borel-Cantelli Lemma, a.s.
lim |λ1 − Eλ1 | = 0.
n→∞
Hence, in view of Lemma 5.4, it is sufficient to prove that, for any $\varepsilon>0$, a.s. for all $n\gg1$,
$$\mathbf 1\big(\lambda_1\in[2(1+\varepsilon),\,2c\sqrt{\log n}]\big) = 0.$$
To this end, we fix $\varepsilon>0$ and set
$$K = 2c\sqrt{\log n} \quad\text{and}\quad \Delta = [2+2\varepsilon, K]. \qquad(5.2.7)$$
Consider a smooth function $\varphi:\mathbb R\to[0,1]$ with support $[2+\varepsilon, 2K]$ such that $\varphi(x) = 1$ on $[2+2\varepsilon, K]$. By Lemma 3.18 and the Borel-Cantelli lemma, a.s.
$$\lim_{n\to\infty}|\varphi(\lambda_1)-\mathbb E\varphi(\lambda_1)| = 0.$$
Assume that we manage to prove that
$$\mathbb E\,\frac1n\sum_{k=1}^n\varphi(\lambda_k) = \mathbb E\int\varphi\,d\mu_{X/\sqrt n} \le \frac1{2n}. \qquad(5.2.8)$$
Then, using $\mathbf 1(\lambda\in\Delta)\le\varphi(\lambda)$, we would deduce that
$$\mathbb P(\lambda_1\in\Delta) \le \mathbb E\varphi(\lambda_1) \le n\,\mathbb E\int\varphi\,d\mu_{X/\sqrt n} \le \frac12. \qquad(5.2.9)$$
5.2. GAUSSIAN DIFFERENTIATION FORMULAS
This would yield that a.s. for all $n\gg1$,
$$\mathbf 1(\lambda_1\in\Delta) \le 1/3.$$
Hence, the indicator function is equal to $0$, that is $\lambda_1\notin\Delta$. It follows that if (5.2.8) holds for any $\varepsilon>0$, our Theorem 5.3 is proved.
The first step of the proof is to relate $\int\varphi\,d\mu$ to an integral of $g_\mu(z)$. For $C^1$ functions $f:\mathbb C\to\mathbb C$, we set $\bar\partial f(z) = \partial_xf(z) + i\partial_yf(z)$, where $z = x+iy$. In the next lemma, $\chi:\mathbb R\to\mathbb R$ is a compactly supported smooth function such that $\chi(y) = 1$ in a neighbourhood of $0$.
Lemma 5.5. Let $k\ge1$ be an integer and $\varphi:\mathbb R\to\mathbb R$ a compactly supported $C^{k+1}$-function. Then, for any $\mu\in\mathcal P(\mathbb R)$,
$$\int\varphi\,d\mu = \frac1\pi\,\mathrm{Re}\int_{\mathbb C_+}\bar\partial\Phi(z)\,g_\mu(z)\,dz,$$
where $\Phi(x+iy) = \sum_{\ell=0}^k\frac{(iy)^\ell}{\ell!}\varphi^{(\ell)}(x)\chi(y)$.
Proof. Observe that in a neighbourhood of $\mathbb R$,
$$\bar\partial\Phi(x+iy) = \varphi^{(k+1)}(x)(iy)^k/k!. \qquad(5.2.10)$$
It follows that for any $\lambda\in\mathbb R$, $\bar\partial\Phi(z)/(z-\lambda)$ is integrable. Now, from Fubini's theorem, it suffices to check the statement for $\mu = \delta_0$:
$$\varphi(0) = \Phi(0) = -\frac1\pi\,\mathrm{Re}\int_{\mathbb C_+}\frac{\bar\partial\Phi(z)}z\,dz.$$
If $z = x+iy$, we have
$$\mathrm{Re}\,\frac{\bar\partial\Phi(z)}z = \frac{x\,\partial_x\Phi(z)}{x^2+y^2} + \frac{y\,\partial_y\Phi(z)}{x^2+y^2} = \frac1r\,\partial_r\Phi(z),$$
where $r = |z|$ and $\partial_r$ is the radial derivative in polar coordinates. We thus find
$$\mathrm{Re}\int_{\mathbb C_+}\frac{\bar\partial\Phi(z)}z\,dz = \int_0^\infty\!\!\int_0^\pi\partial_r\Phi(re^{i\theta})\,dr\,d\theta = -\pi\Phi(0).$$
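The formula of Lemma 5.5 can be tested numerically. The sketch below is our own check, with our choice of test function, cutoff and grids: it takes $k=1$, $\mu$ the semicircle law, and $\varphi(x) = e^{-x^2}$; the Gaussian is not compactly supported, but it decays fast enough for a numerical experiment.

```python
import numpy as np

def g_sc(z):
    # Stieltjes transform of the semicircle law: g = (-z + sqrt(z^2 - 4))/2,
    # with the square-root branch chosen so that Im(g_sc) > 0 on C+.
    s = np.sqrt(z * z - 4)
    s = np.where(s.imag > 0, s, -s)
    return (-z + s) / 2

phi = lambda x: np.exp(-x * x)                       # test function
dphi = lambda x: -2 * x * np.exp(-x * x)             # phi'
d2phi = lambda x: (4 * x * x - 2) * np.exp(-x * x)   # phi''

def chi(y):
    # C^1 cutoff in y: equal to 1 on [0,1], to 0 on [2, infinity)
    return np.where(y <= 1, 1.0, np.where(y >= 2, 0.0, np.cos(np.pi * (y - 1) / 2) ** 2))

def dchi(y):
    return np.where((y > 1) & (y < 2), -(np.pi / 2) * np.sin(np.pi * (y - 1)), 0.0)

# midpoint grid on [-6, 6] x (0, 2]
nx, ny = 600, 400
dx, dy = 12.0 / nx, 2.0 / ny
x = -6 + dx * (np.arange(nx) + 0.5)
y = dy * (np.arange(ny) + 0.5)
X, Y = np.meshgrid(x, y, indexing="ij")

# dbar Phi for k = 1, with Phi(x+iy) = (phi(x) + iy*phi'(x)) * chi(y)
dbar = 1j * Y * d2phi(X) * chi(Y) + 1j * (phi(X) + 1j * Y * dphi(X)) * dchi(Y)
I = (dbar * g_sc(X + 1j * Y)).real.sum() * dx * dy / np.pi

# direct computation of int phi d mu_sc (trapezoid rule)
t = np.linspace(-2, 2, 4001)
ft = np.exp(-t * t) * np.sqrt(4 - t * t) / (2 * np.pi)
exact = ((ft[1:] + ft[:-1]) / 2).sum() * (t[1] - t[0])
print(I, exact)   # the two values agree up to quadrature error
```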
GUE case: we can now prove Theorem 5.3 for GUE matrices. Indeed, in this case, $\gamma = 0$ and $\kappa = 0$. In particular, from Equations (5.2.4) and (5.2.5), we find
$$g^2 + zg + 1 = O\big(n^{-2}\mathrm{Im}(z)^{-4}\big).$$
From Lemma 5.2 we find
$$g - g_{sc} = \delta(z) = O\big(n^{-2}\mathrm{Im}(z)^{-5}\big).$$
We may apply Lemma 5.5 with $k = 6$ to our smooth function $\varphi$ with support $[2+\varepsilon, 2K]$. We find
$$\mathbb E\int\varphi\,d\mu_{X/\sqrt n} = \frac1\pi\,\mathrm{Re}\,\mathbb E\int_{2+\varepsilon}^\infty\!\!\int_0^\infty\bar\partial\Phi(x+iy)\,g(x+iy)\,dy\,dx$$
$$= \frac1\pi\,\mathrm{Re}\,\mathbb E\int_{2+\varepsilon}^\infty\!\!\int_0^\infty\bar\partial\Phi(x+iy)\big(g_{sc}(x+iy)+\delta(x,y)\big)\,dy\,dx$$
$$= \int\varphi\,d\mu_{sc} + \frac1\pi\,\mathrm{Re}\,\mathbb E\int_{2+\varepsilon}^{2K}\!\!\int_0^\infty\bar\partial\Phi(x+iy)\,\delta(x,y)\,dy\,dx.$$
Now, the support of $\mu_{sc}$ is $[-2,2]$. Hence $\int\varphi\,d\mu_{sc} = 0$. Also, from (5.2.10),
$$\bar\partial\Phi(x+iy)\,\delta(x,y) = O(n^{-2}),$$
and this quantity is compactly supported from (5.2.7) and the definition of $\varphi$. We thus have proved that
$$\mathbb E\int\varphi\,d\mu_{X/\sqrt n} = O\big(n^{-2}\sqrt{\log n}\big).$$
This concludes the proof of (5.2.8) in the GUE case.
General Gaussian case: the above argument cannot work directly if $\gamma\ne0$ or $\kappa\ne0$. Indeed, Equation (5.2.4) gives only
$$g^2 + zg + 1 = O\big(n^{-1}\mathrm{Im}(z)^{-2} + n^{-2}\mathrm{Im}(z)^{-4}\big),$$
and from Lemma 5.2,
$$g = g_{sc} + O\big(n^{-1}(\mathrm{Im}(z)^{-5}\wedge1)\big). \qquad(5.2.11)$$
We thus have to study more precisely
$$\frac1{n^2}\,\mathrm{Tr}\big(R(\kappa D+\gamma R^\top)\big).$$
We first need a lemma.

Lemma 5.6. For any $\varepsilon>0$, we have
$$\sup\big\{|g_{sc}(z)| : z\in\mathbb C,\ d(z,[-2,2])\ge\varepsilon\big\} < 1,$$
where $d(z,A) = \inf\{|w-z| : w\in A\}$.
Proof. Let $r(z) = |g_{sc}(z)|$ and $t = \varepsilon/\sqrt2$. If $d(z,[-2,2])\ge\varepsilon$ then either $|\mathrm{Im}(z)|\ge t$ or $|\mathrm{Re}(z)|\ge2+t$. We first assume that $|\mathrm{Im}(z)|\ge t$. Note that if $z = E+i\eta$ and $\xi$ has distribution $\mu_{sc}$, then $r(z) = |\mathbb E((\xi-E)-i\eta)^{-1}|$. By symmetry and monotonicity, $r(z)\le r(i\eta)\le r(it)$. We find
$$r(it) = \frac1{2\pi}\int_{-2}^2\frac{t\sqrt{4-x^2}}{t^2+x^2}\,dx = \frac1{2\pi}\int_{-2/t}^{2/t}\frac{\sqrt{4-(tx)^2}}{1+x^2}\,dx < \frac1{2\pi}\int_{-\infty}^\infty\frac2{1+x^2}\,dx = 1.$$
It remains to deal with $z = E+i\eta$ and $|E|\ge2+t$. We have $r(z)\le r(E) = r(|E|)\le r(2+t)$ and
$$r(2+t) = \frac1{2\pi}\int_{-2}^2\frac{\sqrt{4-x^2}}{2+t-x}\,dx < \frac1{2\pi}\int_{-2}^2\frac{\sqrt{4-x^2}}{2-x}\,dx = 1.$$
Lemma 5.7. Let $f(z) = \gamma g_{sc}^2(z)/(\gamma g_{sc}^2(z)+1) + \kappa g_{sc}^2(z)$. We have
$$\frac1n\,\mathbb E\,\mathrm{Tr}\big(R(\kappa D+\gamma R^\top)\big) = f(z) + O\big(n^{-1}(1-|g_{sc}(z)|^2)^{-1}(1\wedge\mathrm{Im}(z)^{-5})\big).$$
Proof. We may again use the Gaussian integration by parts formula. Using (5.2.1)-(5.2.2)-(5.2.3), we get, for $1\le a\ne b\le n$ and all $j,k,\ell,m$:
$$\mathbb ER_{jk}R_{\ell m}X_{ab} = -\frac1{\sqrt n}\Big(\mathbb E\big(\gamma R_{ja}R_{bk}+R_{jb}R_{ak}\big)R_{\ell m} + \mathbb ER_{jk}\big(\gamma R_{\ell a}R_{bm}+R_{\ell b}R_{am}\big)\Big), \qquad(5.2.12)$$
and
$$\mathbb ER_{jk}R_{\ell m}X_{aa} = -\frac{\sigma^2}{\sqrt n}\Big(\mathbb ER_{ja}R_{ak}R_{\ell m} + \mathbb ER_{jk}R_{\ell a}R_{am}\Big). \qquad(5.2.13)$$
We use again the identity $-zR = I - \frac1{\sqrt n}RX$. Taking the transpose and multiplying by $R$ yields
$$-zRR^\top = R - \frac1{\sqrt n}RX^\top R^\top,$$
so that
$$-z(RR^\top)_{kk} = R_{kk} - \frac1{\sqrt n}\sum_{a,b}R_{kb}R_{ka}X_{ab}.$$
We now take expectation and use (5.2.12)-(5.2.13):
$$-z\,\mathbb E(RR^\top)_{kk} = \mathbb ER_{kk} + \frac\gamma n\sum_{a\ne b}\mathbb ER_{ka}^2R_{bb} + \frac1n\sum_{a\ne b}\mathbb ER_{kb}R_{ab}R_{ka} + \frac\gamma n\sum_{a\ne b}\mathbb ER_{kb}R_{ka}R_{ba} + \frac1n\sum_{a\ne b}\mathbb ER_{kb}^2R_{aa} + \frac{2\sigma^2}n\sum_a\mathbb ER_{ka}^2R_{aa}.$$
We set
$$m = \frac1n\,\mathrm{Tr}(RR^\top), \qquad \bar m = \mathbb Em \qquad\text{and}\qquad \hat m = m - \mathbb Em,$$
and, as before, $g = \mathbb E\frac1n\mathrm{Tr}R$, $\hat g = \frac1n\mathrm{Tr}R - g$. Recall that $D_{kk} = R_{kk}$. Taking the normalized trace in the above expression, we obtain
$$-z\bar m = g + (\gamma+1)\,\mathbb E\Big[\frac1n\mathrm{Tr}R\cdot m\Big] + \frac1{n^2}\,\mathbb E\,\mathrm{Tr}\big(R(R^\top)^2 + \gamma R^2R^\top + 2\kappa RR^\top D\big).$$
We deduce that
$$-\big(z+(\gamma+1)g\big)\bar m = g + (\gamma+1)\,\mathbb E\hat g\hat m + \frac1{n^2}\,\mathbb E\,\mathrm{Tr}\big(RR^\top((1+\gamma)R^\top+2\kappa D)\big). \qquad(5.2.14)$$
Using (5.2.5), $|m|\le\mathrm{Im}(z)^{-2}$ and $|\mathrm{Tr}(A)|\le n\|A\|$, we find
$$-\big(z+(\gamma+1)g\big)\bar m = g + O\big(n^{-1}(\mathrm{Im}(z)^{-4}\wedge1)\big).$$
We deduce from (5.2.11)
$$-\big(z+(\gamma+1)g_{sc}\big)\bar m = g_{sc} + O\big(n^{-1}(\mathrm{Im}(z)^{-5}\wedge1)\big).$$
We multiply by $g_{sc} = O(1)$ and use that $g_{sc}^2 + zg_{sc} + 1 = 0$. From Lemma 5.6 and $|\gamma|\le1$, $|\gamma g_{sc}^2+1| \ge 1-|g_{sc}|^2 > 0$. We find
$$\bar m = \frac{g_{sc}^2}{\gamma g_{sc}^2+1} + O\big(n^{-1}(1-|g_{sc}(z)|^2)^{-1}(\mathrm{Im}(z)^{-5}\wedge1)\big).$$
We set similarly
$$m_0 = \frac1n\sum_{k=1}^n(R_{kk})^2, \qquad \bar m_0 = \mathbb Em_0 \qquad\text{and}\qquad \hat m_0 = m_0 - \mathbb Em_0,$$
so that $\frac1n\mathrm{Tr}(RD) = m_0$. From $-zR = I - \frac1{\sqrt n}RX$, multiplying by $R_{kk}$, we obtain
$$-z(R_{kk})^2 = R_{kk} - \frac1{\sqrt n}\sum_aR_{ka}R_{kk}X_{ak}.$$
We find analogously
$$-(z+g)\bar m_0 = g + O\big(n^{-1}\mathrm{Im}(z)^{-3}\big),$$
and, from (5.2.11), $g_{sc}^2+zg_{sc}+1 = 0$ and $|g_{sc}| = O(1)$,
$$\bar m_0 = g_{sc}^2 + O\big(n^{-1}(\mathrm{Im}(z)^{-5}\wedge1)\big). \qquad(5.2.15)$$
This concludes the proof.
We may now conclude the proof of Theorem 5.3. Let $K = \{z\in\mathbb C : \mathrm{Re}(z)\ge2+\varepsilon\}$ and $K_+ = K\cap\mathbb C_+$. We also set
$$L = \frac1{zn^2}\,\mathbb E\,\mathrm{Tr}\big[R(\kappa D+\gamma R^\top)\big].$$
On $K_+$, we have $L = O(n^{-1}\mathrm{Im}(z)^{-2})$. From Equations (5.2.4) and (5.2.5), we find
$$(g+L)^2 + z(g+L) + 1 = O\big(n^{-2}\mathrm{Im}(z)^{-4}\big).$$
For $n$ large enough, $g+L\in\mathbb C_+$. Hence, from Lemma 5.2, we find
$$g + L - g_{sc} = \delta(z) = O\big(n^{-2}\mathrm{Im}(z)^{-5}\big).$$
So finally, from (5.2.11) and Lemma 5.7, for all $z\in K_+$,
$$g = g_{sc} - \frac{f(z)}n + O\big(n^{-2}(\mathrm{Im}(z)^{-5}\wedge1)\big), \qquad(5.2.16)$$
where the $O(\cdot)$ depends on $\varepsilon$.
As above, for our smooth function $\varphi$ with support $[2+\varepsilon, 2K]$, we may apply Lemma 5.5 with $k = 5$. We find
$$\mathbb E\int\varphi\,d\mu_{X/\sqrt n} = \frac1\pi\,\mathrm{Re}\int_{K_+}\bar\partial\Phi(x+iy)\,g(x+iy)\,dx\,dy$$
$$= \frac1\pi\,\mathrm{Re}\int_{K_+}\bar\partial\Phi(x+iy)\Big(g_{sc}(x+iy)-\frac{f(x+iy)}n+\delta(x,y)\Big)dx\,dy$$
$$= \frac1\pi\,\mathrm{Re}\int_{K_+}\bar\partial\Phi(x+iy)\Big(g_{sc}(x+iy)-\frac{f(x+iy)}n\Big)dx\,dy + \frac1\pi\,\mathrm{Re}\int_{K_+}\bar\partial\Phi(x+iy)\,\delta(x,y)\,dx\,dy.$$
Now, notice that $g_{sc}(z) - f(z)/n$ is analytic on an open neighbourhood of $K_+$. In particular, $\bar\partial(g_{sc}-f/n) = 0$ on this neighbourhood. Hence, by integration by parts, the first integral of the above expression is $0$. Also, from (5.2.10),
$$\bar\partial\Phi(x+iy)\,\delta(x,y) = O(n^{-2}),$$
and this quantity is compactly supported. We thus have proved that
$$\mathbb E\int\varphi\,d\mu_{X/\sqrt n} = O\big(n^{-2}\sqrt{\log n}\big).$$
This concludes the proof of Theorem 5.3.
5.2.2 Convergence of edge eigenvalues under a 4 + ε moment condition
In this paragraph, we consider a new set of assumptions on the Wigner matrix X = (Xij )1≤i,j≤n .
We assume that the probability distributions P and Q of X12 and X11 satisfy
(B1) EX12 = 0, E|X12 |2 = 1 and E|X12 |p < ∞ for some p > 4.
(B2) EX11 = 0 and E|X11 |2 < ∞.
Theorem 5.8. Let $X = X(n)$ be a Wigner matrix satisfying assumptions (B). We have a.s.
$$\lim_{n\to\infty}\lambda_1\Big(\frac X{\sqrt n}\Big) = -\lim_{n\to\infty}\lambda_n\Big(\frac X{\sqrt n}\Big) = 2.$$
The proof of Theorem 5.8 will follow the same strategy as the proof for Gaussian Wigner matrices. There will however be two new difficulties: for matrices satisfying assumptions (B), (i) the concentration inequalities available are not as strong, and (ii) the Gaussian integration by parts formula is no longer exactly available. We will solve issue (i) by a truncation step and the use of Talagrand's concentration inequality. Issue (ii) will be addressed with the following lemma due to Khorunzhy, Khoruzhenko and Pastur [KKP96].
Lemma 5.9. Let $\xi$ be a real-valued random variable such that $\mathbb E|\xi|^{k+2}<\infty$. Let $f:\mathbb R\to\mathbb C$ be a $C^{k+1}$ function whose $(k+1)$-th derivative is uniformly bounded. Then,
$$\mathbb E\,\xi f(\xi) = \sum_{\ell=0}^k\frac{\kappa_{\ell+1}}{\ell!}\,\mathbb Ef^{(\ell)}(\xi) + O\big(\|f^{(k+1)}\|_\infty\,\mathbb E|\xi|^{k+2}\big),$$
where $\kappa_\ell$ is the $\ell$-th cumulant of $\xi$ and the $O(\cdot)$ depends only on $k$.
Proof. · · ·
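A quick Monte Carlo check of Lemma 5.9 (ours, not in the text): for a standard Gaussian $\xi$, the cumulants are $\kappa_1 = 0$, $\kappa_2 = 1$ and $\kappa_\ell = 0$ for $\ell\ge3$, so the expansion reduces to Stein's identity $\mathbb E\,\xi f(\xi) = \mathbb Ef'(\xi)$. We take $f = \sin$, whose derivatives are all bounded.

```python
import numpy as np

# Monte Carlo check (ours): Stein's identity E[xi*f(xi)] = E[f'(xi)] for xi ~ N(0,1),
# which is the k = 1 (Gaussian) case of the cumulant expansion; f = sin, f' = cos.
rng = np.random.default_rng(1)
xi = rng.standard_normal(2_000_000)
lhs = np.mean(xi * np.sin(xi))
rhs = np.mean(np.cos(xi))
print(lhs, rhs)   # both close to exp(-1/2)
```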
Truncation: We start the proof of Theorem 5.8 with a truncation step. Consider the new set of assumptions: $X = X(n)$ is a Wigner matrix such that there exist a constant $C>0$ and $q>2$ such that

(C1) $\mathbb EX_{12} = 0$, $\mathbb E|X_{12}|^2 = 1$, $\mathbb E|X_{12}|^4 < C$ and $\mathrm{supp}(P)\subset B(0, n^{1/q})$.

(C2) $Q(\{0\}) = 1$.

Recall that $P$ is the common distribution of $(X_{ij})_{1\le i<j\le n}$ and $Q$ is the common distribution of $(X_{ii})_{1\le i\le n}$. Note here that $P$ depends on $n$; we shall write the variables as $X_{ij}(n)$ and assume that they are defined on a common probability space.
The next lemma is due to Bai and Yin [BY88].
Lemma 5.10 (Truncation). If the conclusion of Theorem 5.8 holds under assumptions (C)
then it also holds under assumptions (B).
Proof. We first get rid of the diagonal terms. Recall that, for a non-negative random variable $Y$ and $a>0$,
$$\mathbb EY^a = a\int_0^\infty t^{a-1}\,\mathbb P(Y>t)\,dt \ge a\sum_{\ell=1}^\infty u_\ell^{a-1}(u_{\ell+1}-u_\ell)\,\mathbb P(Y>u_{\ell+1}), \qquad(5.2.17)$$
where $(u_\ell)$ is any non-negative increasing sequence. Hence, assumption (B2) implies that for any $\varepsilon>0$,
$$\sum_{\ell=1}^\infty2^\ell\,\mathbb P\big(|X_{11}|>\varepsilon2^{\ell/2}\big) < \infty.$$
In particular, the above sum is finite for some sequence $\varepsilon = \varepsilon(n)$ going to $0$. Let $E_n$ be the event that $\max_{1\le i\le n}|X_{ii}|\le\varepsilon\sqrt{2n}$. We claim that a.s. for all $n\gg1$, $E_n$ holds. Indeed,
$$\sum_{k\ge1}\mathbb P\Big(\bigcup_{2^{k-1}\le n<2^k}E_n^c\Big) = \sum_{k\ge1}\mathbb P\Big(\bigcup_{2^{k-1}\le n<2^k}\ \bigcup_{1\le i\le n}\big\{|X_{ii}|>\varepsilon\sqrt{2n}\big\}\Big) \le \sum_{k\ge1}\mathbb P\Big(\bigcup_{1\le i<2^k}\big\{|X_{ii}|>\varepsilon2^{k/2}\big\}\Big) \le \sum_{k\ge1}2^k\,\mathbb P\big(|X_{11}|\ge\varepsilon2^{k/2}\big).$$
It follows from the Borel-Cantelli lemma that a.s., for $n\gg1$, $E_n$ holds. If $\hat X = X - \mathrm{diag}(X)$ is the matrix obtained by removing the diagonal entries, we have, on $E_n$,
$$\big|\|X\| - \|\hat X\|\big| \le \|\mathrm{diag}(X)\| \le \varepsilon\sqrt{2n} = o(\sqrt n).$$
We have thus checked that we may assume without loss of generality that (C2) holds.
Similarly, if $F_n$ denotes the event that $\max_{1\le i<j\le n}|X_{ij}|\le\varepsilon(2n)^{2/p}$, we have
$$\sum_{k\ge1}\mathbb P\Big(\bigcup_{2^{k-1}\le n<2^k}F_n^c\Big) \le \sum_{k\ge1}\mathbb P\Big(\bigcup_{1\le i<j\le2^k}\big\{|X_{ij}|>\varepsilon2^{2k/p}\big\}\Big) \le \sum_{k\ge1}2^{2k}\,\mathbb P\big(|X_{12}|>\varepsilon2^{2k/p}\big),$$
which is finite from (5.2.17) applied to $a = p$ and $u_\ell = 2^{2\ell/p}$. We have checked so far that it is sufficient to prove Theorem 5.8 for the matrix $Y = (Y_{ij})_{1\le i,j\le n}$ with $Y_{ij} = X_{ij}\mathbf 1(|X_{ij}|\le\varepsilon n^{1/q})$ for $i\ne j$, $Y_{ii} = 0$, and $q = p/2$. The variable $Y_{12}$ is however a priori no longer centered with unit variance. If $\rho(n) = \mathbb EX_{12}\mathbf 1_{|X_{12}|\le\varepsilon n^{1/q}} = -\mathbb EX_{12}\mathbf 1_{|X_{12}|>\varepsilon n^{1/q}}$, we find, from Hölder's inequality,
$$|\rho(n)| \le \big(\mathbb E|X_{12}|^p\big)^{1/p}\,\mathbb P\big(|X_{12}|\ge\varepsilon n^{1/q}\big)^{1-1/p} = O\big(\varepsilon^{1-p}n^{-3/2}\big).$$
If $\hat Y$ has entries $\hat Y_{ij} = \mathbf 1(i\ne j)(Y_{ij}-\rho(n))$, then, with $J$ the all-ones matrix,
$$\big|\|Y\| - \|\hat Y\|\big| \le \|Y-\hat Y\| = \|(J-I)\rho(n)\| \le n|\rho(n)| = o(1).$$
Similarly, if $\sigma^2(n) = \mathrm{Var}(\hat Y_{12}) = \mathbb E\big|X_{12}\mathbf 1(|X_{12}|\le\varepsilon n^{1/q})-\rho(n)\big|^2$, by dominated convergence, we find $\lim_{n\to\infty}\sigma^2(n) = 1$. It follows that
$$\Big\|\frac{\hat Y}{\sigma\sqrt n}-\frac{\hat Y}{\sqrt n}\Big\| = \frac{\|\hat Y\|}{\sigma\sqrt n}\,|1-\sigma| = o\Big(\frac{\|\hat Y\|}{\sqrt n}\Big).$$
This concludes the proof, since the matrix $\hat Y/\sigma$ has all the required properties.
Convergence in probability: Assume that assumptions (C) hold. The aim of this paragraph is to prove that for any $\varepsilon>0$,
$$\mathbb P\Big(\lambda_1\Big(\frac X{\sqrt n}\Big)\in\Delta\Big) = o(1), \qquad(5.2.18)$$
(in words: the largest eigenvalue of $X/\sqrt n$ converges in probability to $2$). The central part of the proof is the next lemma.
Lemma 5.11. Under assumptions (C), if $\delta = (1/2-1/q)\wedge1/2 > 0$,
$$g(z) = g_{sc}(z) - \frac{f(z)}n + O\big((1-|g_{sc}(z)|^2)^{-1}\,n^{-1-\delta}\,(1\wedge\mathrm{Im}(z)^{-6})\big),$$
where $f$ is given by Lemma 5.7 with $\gamma = \mathbb EX_{12}^2$ and $\sigma^2 = 0$.
Notice that $\gamma$ and $f$ depend on $n$. Before proving Lemma 5.11, let us check that (5.2.18) holds. We observe that $|\gamma|\le\mathbb E|X_{12}|^2 = 1$ implies that $f$ is analytic on $[2+\varepsilon,\infty)\times[0,\infty)$. Then, the argument below (5.2.16) with $k = 6$ and $\delta(x,y) = O\big((1-|g_{sc}(x+iy)|^2)^{-1}n^{-1-\delta}y^{-c}\big)$ gives
$$\mathbb E\int\varphi\,d\mu_{X/\sqrt n} = O\big(n^{-1-\delta}\sqrt{\log n}\big). \qquad(5.2.19)$$
Then (5.2.18) follows directly from (5.2.9).
We now turn to the proof of Lemma 5.11.
Proof of Lemma 5.11. The computation leading to (5.2.4) and Lemma 5.9 with $k = 3$ give
$$g^2 + zg + 1 = -\mathbb E\hat g^2 - \frac1{n^2}\,\mathbb E\,\mathrm{Tr}\big[R(\kappa D+\gamma R^\top)\big] + \frac{\kappa_3}{n^{5/2}}\Omega_2 + \frac{\kappa_4}{n^3}\Omega_3 + \frac{\mathbb E|X_{12}|^5}{n^{7/2}}\Omega_4, \qquad(5.2.20)$$
where $\hat g = \frac1n\mathrm{Tr}R - g$, $\Omega_2$ and $\Omega_3$ are double sums of second and third derivatives of resolvent entries, and $\Omega_4$ is bounded, up to a multiplicative constant, by a double sum of suprema of fourth derivatives.

Let us start with the last term. If $G = (X-z)^{-1}$, notice that $\partial_{ab}^4G_{k\ell}$ is a finite sum of terms of the form
$$G_{k\varepsilon_1}G_{\bar\varepsilon_1\varepsilon_2}G_{\bar\varepsilon_2\varepsilon_3}G_{\bar\varepsilon_3\varepsilon_4}G_{\bar\varepsilon_4\ell},$$
where $\varepsilon_i\in\{a,b\}$, $\bar a = b$ and $\bar b = a$. It follows that $\Omega_4$ is bounded, up to a multiplicative constant, by
$$\sum_{\varepsilon_i\in\{a,k\}}\ \sum_{1\le k,a\le n}\big|R_{k\varepsilon_1}R_{\bar\varepsilon_1\varepsilon_2}R_{\bar\varepsilon_2\varepsilon_3}R_{\bar\varepsilon_3\varepsilon_4}R_{\bar\varepsilon_4a}\big|.$$
Since $\mathbb E|X_{12}|^5 \le n^{1/q}\,\mathbb E|X_{12}|^4 = O(n^{1/2-\delta})$ and $|R_{jk}|\le\mathrm{Im}(z)^{-1}$, we deduce that
$$\frac{\mathbb E|X_{12}|^5}{n^{7/2}}\Omega_4 = O\Big(\frac{n^{1/2-\delta}\,n^2\,\mathrm{Im}(z)^{-5}}{n^{7/2}}\Big) = O\big(n^{-1-\delta}\mathrm{Im}(z)^{-5}\big). \qquad(5.2.21)$$
Let us now deal with $\Omega_2$. For some constants $c_\varepsilon$ which are not necessary to compute, we have
$$\Omega_2 = \sum_{\varepsilon=(\varepsilon_1,\varepsilon_2)\in\{a,k\}^2}c_\varepsilon\sum_{1\le k,a\le n}R_{k\varepsilon_1}R_{\bar\varepsilon_1\varepsilon_2}R_{\bar\varepsilon_2a}.$$
If $\varepsilon = (a,a)$ we have
$$\Big|\sum_{k,a}(R_{ka})^3\Big| \le \mathrm{Im}(z)^{-1}\sum_{k,a}|R_{ka}|^2 = \mathrm{Im}(z)^{-1}\,\mathrm{Tr}(RR^*) \le n\,\mathrm{Im}(z)^{-3}.$$
To deal with the case $\varepsilon = (a,k)$, we introduce the vector $x = (x_k)$ with $x_k = R_{kk}$ and the matrix $M_{k\ell} = R_{kk}R_{k\ell}$. We find
$$\Big|\sum_{k,a}R_{ka}R_{kk}R_{aa}\Big| = \Big|\sum_k(Mx)_k\Big| \le \sqrt n\,\|Mx\| \le \sqrt n\,\|M\|\,\|x\|.$$
Now, notice that $M = DR$. Since $\|R\|\le\mathrm{Im}(z)^{-1}$ and $|R_{kk}|\le\mathrm{Im}(z)^{-1}$, it implies that $\|M\|\le\mathrm{Im}(z)^{-2}$ and $\|x\|\le\sqrt n\,\mathrm{Im}(z)^{-1}$, so finally
$$\Big|\sum_{k,a}R_{ka}R_{kk}R_{aa}\Big| \le n\,\mathrm{Im}(z)^{-3}.$$
To deal with the cases $\varepsilon = (k,k)$ and $\varepsilon = (k,a)$, we observe that
$$\sum_{k,a}R_{kk}R_{ak}R_{aa} = \sum_{k,a}R_{kk}R_{aa}R_{ka} = \sum_{k,a}R_{ka}R_{kk}R_{aa}.$$
So finally,
$$\frac{\kappa_3}{n^{5/2}}\Omega_2 = O\big(n^{-3/2}\mathrm{Im}(z)^{-3}\big). \qquad(5.2.22)$$
We may repeat the same argument for $\Omega_3$. For some new constants $c_\varepsilon$,
$$\Omega_3 = \sum_{\varepsilon=(\varepsilon_1,\varepsilon_2,\varepsilon_3)\in\{a,k\}^3}c_\varepsilon\sum_{1\le k,a\le n}R_{k\varepsilon_1}R_{\bar\varepsilon_1\varepsilon_2}R_{\bar\varepsilon_2\varepsilon_3}R_{\bar\varepsilon_3a}.$$
The expression
$$S(\varepsilon) = \sum_{1\le k,a\le n}R_{k\varepsilon_1}R_{\bar\varepsilon_1\varepsilon_2}R_{\bar\varepsilon_2\varepsilon_3}R_{\bar\varepsilon_3a}$$
is invariant under cyclic permutations of $(\varepsilon_1,\varepsilon_2,\varepsilon_3)$. We also find that $S(a,a,k) = S(k,k,k)$. There are thus three cases to consider. We get
$$S(a,a,a) = \sum_{1\le k,a\le n}(R_{ka})^4 \le \mathrm{Im}(z)^{-2}\,\mathrm{Tr}(RR^*) \le n\,\mathrm{Im}(z)^{-4},$$
and
$$S(a,a,k) = \sum_{1\le k,a\le n}R_{kk}R_{ka}^2R_{aa} = \mathrm{Tr}(DRDR^\top).$$
In particular $|S(a,a,k)|\le n\,\mathrm{Im}(z)^{-4}$. Similarly,
$$S(k,k,a) = \sum_{1\le k,a\le n}R_{kk}R_{ak}R_{aa}R_{ka} = \mathrm{Tr}(DRDR).$$
It follows that
$$\frac{\kappa_4}{n^3}\Omega_3 = O\big(n^{-2}\mathrm{Im}(z)^{-4}\big). \qquad(5.2.23)$$
We now turn to the first term on the right-hand side of (5.2.20). If $z = E+i\eta$, we have
$$\frac1{x-z} = \frac{x-E}{(x-E)^2+\eta^2} + i\,\frac\eta{(x-E)^2+\eta^2} = \eta^{-1}h_1\Big(\frac{x-E}\eta\Big) + i\,\eta^{-1}h_2\Big(\frac{x-E}\eta\Big),$$
where $h_1(x) = x/(1+x^2)$ and $h_2(x) = 1/(1+x^2)$. The function $h_2$ has two inflection points and, the second derivative of $h_1$ being
$$h_1''(x) = \frac{2x(x^2-3)}{(1+x^2)^3},$$
$h_1$ has $3$ inflection points. From Corollary 3.22, we deduce that
$$\mathbb E\hat g^2 = O\big(\mathrm{Im}(z)^{-4}n^{-2}(n^{1/2-\delta})^2\big) = O\big(\mathrm{Im}(z)^{-4}n^{-1-2\delta}\big). \qquad(5.2.24)$$
It finally remains to take care of the second term in (5.2.20). We have the rough bound $\mathrm{Tr}(R(\kappa D+\gamma R^\top)) = O(n\,\mathrm{Im}(z)^{-2})$. Using Lemma 5.2, we have proved so far that
$$g = g_{sc} + O\big(n^{-1}(1\wedge\mathrm{Im}(z)^{-5})\big).$$
The computation leading to (5.2.14) shows that
$$-\big(z+(\gamma+1)g\big)\bar m = g + (\gamma+1)\,\mathbb E\hat g\hat m + \frac1{n^2}\,\mathbb E\,\mathrm{Tr}\big(RR^\top((1+\gamma)R^\top+2\kappa D)\big) + \frac{\kappa_3}{n^{5/2}}\Omega_2^{(2)} + \frac{\kappa_4}{n^3}\Omega_3^{(2)} + \frac{\mathbb E|X_{12}|^5}{n^{7/2}}\Omega_4^{(2)}, \qquad(5.2.25)$$
where $\Omega_k^{(2)}$ is a double sum of $k$-th derivatives of entries of $RR^\top$. Before estimating these terms, we use (5.2.25), (5.2.24) and $|m|\le\mathrm{Im}(z)^{-2}$. We find
$$-\big(z+(\gamma+1)g_{sc}\big)\bar m = g_{sc} + O\big(n^{-1/2}(1\wedge\mathrm{Im}(z)^{-5})\big) + \frac{\kappa_3}{n^{5/2}}\Omega_2^{(2)} + \frac{\kappa_4}{n^3}\Omega_3^{(2)} + \frac{\mathbb E|X_{12}|^5}{n^{7/2}}\Omega_4^{(2)}. \qquad(5.2.26)$$
Similarly, the computation leading to (5.2.15) gives
$$\bar m_0 = g_{sc}^2 + O\big(n^{-1/2}(1\wedge\mathrm{Im}(z)^{-5})\big) + \frac{\kappa_3}{n^{5/2}}\Omega_2^{(3)} + \frac{\kappa_4}{n^3}\Omega_3^{(3)} + \frac{\mathbb E|X_{12}|^5}{n^{7/2}}\Omega_4^{(3)}, \qquad(5.2.27)$$
where $\Omega_k^{(3)}$ is a double sum of $k$-th derivatives of entries of $RD$. The analysis of $\Omega_k^{(2)}$, $\Omega_k^{(3)}$ parallels that of $\Omega_k$. Let $G^{(i)}$ be equal to $RR^\top$ if $i = 2$ and to $RD$ if $i = 3$. We have
$$\Omega_2^{(i)} = \sum_{\varepsilon=(\varepsilon_1,\varepsilon_2)\in\{a,k\}^2}c_\varepsilon^{(i)}\sum_{1\le k,a\le n}G^{(i)}_{k\varepsilon_1}R_{\bar\varepsilon_1\varepsilon_2}R_{\bar\varepsilon_2a},$$
and
$$\Omega_3^{(i)} = \sum_{\varepsilon=(\varepsilon_1,\varepsilon_2,\varepsilon_3)\in\{a,k\}^3}c_\varepsilon^{(i)}\sum_{1\le k,a\le n}G^{(i)}_{k\varepsilon_1}R_{\bar\varepsilon_1\varepsilon_2}R_{\bar\varepsilon_2\varepsilon_3}R_{\bar\varepsilon_3a}.$$
Since $\|G^{(i)}\|\le\mathrm{Im}(z)^{-2}$, all terms can be bounded as we did previously for $\Omega_k$: it suffices to multiply all previous bounds by $\mathrm{Im}(z)^{-1}$. For example, in $\Omega_2^{(i)}$, for $\varepsilon = (a,a)$, we write
$$\Big|\sum_{1\le k,a\le n}G^{(i)}_{ka}R_{ka}^2\Big| \le \mathrm{Im}(z)^{-2}\sum_{1\le k,a\le n}|R_{ka}|^2 = \mathrm{Im}(z)^{-2}\,\mathrm{Tr}(RR^*) \le n\,\mathrm{Im}(z)^{-4}.$$
The remainder $\Omega_4^{(2)}$ is now a triple sum of terms of the form $\prod_{\ell=1}^6R_{i_\ell i_{\ell+1}}$, while $\Omega_4^{(3)}$ remains a double sum of such terms. In any case, we find
$$\big|\Omega_4^{(i)}\big| \le n^3\,\mathrm{Im}(z)^{-6}.$$
Recall that $\mathbb E|X_{12}|^5 = O(n^{1/2-\delta})$. Hence, plugging these bounds into (5.2.26)-(5.2.27) yields
$$\frac1n\,\mathbb E\,\mathrm{Tr}\big[R(\kappa D+\gamma R^\top)\big] = f(z) + O\big((1-|g_{sc}|^2)^{-1}n^{-\delta}\mathrm{Im}(z)^{-6}\big).$$
The lemma is proved.
Almost sure convergence: It follows from Theorem 3.19, Corollary 3.8 and the convexity of $M\mapsto\lambda_1(M)$ on $H_n(\mathbb C)$ (which follows from the Courant-Fischer min-max Theorem 3.1) that for some constants $c,\rho>0$,
$$\mathbb P\Big(\Big|\lambda_1\Big(\frac X{\sqrt n}\Big)-\rho\Big|\ge t\Big) \le c^{-1}\exp\big(-cn^{1-4/p}t^2\big).$$
In particular, from the Borel-Cantelli lemma, it suffices to prove that $\rho\le2(1+\varepsilon)$. First, from Lemma 5.4, we find $\rho\le K$. We then need to check that $\rho\notin\Delta$; the latter follows from (5.2.19).
Chapter 6
Covariance matrices
6.1 Model description
Let $Y$ be a random vector in $\mathbb R^n$. Consider $(X_i)_{1\le i\le N}$ independent and identically distributed copies of $Y$. We define
$$S = \frac nN\sum_{i=1}^NX_iX_i^t. \qquad(6.1.1)$$
We are interested in the spectrum of $S$ as $n,N\to\infty$. We will assume that $(n,N)$ are such that
$$c = \frac nN \in (C^{-1}, C),$$
for some $C>1$ (the dependence of $c$ on $(n,N)$ is implicit). Our constants will depend on $C$. We consider the following set of statistical assumptions.

(A) $Y$ is centered and isotropic:
$$\mathbb E[Y] = 0 \quad\text{and}\quad \mathbb E\big[YY^t\big] = \frac{I_n}n.$$
(B) For some $\varepsilon_n\to0$, for any $A\in M_n(\mathbb R)$,
$$\mathbb E\Big|Y^tAY - \frac1n\mathrm{Tr}A\Big|^2 \le \|A\|^2\varepsilon_n^2 = o(\|A\|^2).$$

(C) $Y$ has i.i.d. coordinates and, for some $p>4$, $\mathbb E|\sqrt nY_1|^p < C$.

From Lemma 4.9, if $Y = (y_1,\cdots,y_n)/\sqrt n$ has i.i.d. coordinates and the law of $y_i$ has finite fourth moment and is independent of $n$, then (B) holds with $\varepsilon_n = O(1/\sqrt n)$.
We denote by $0\le\lambda_n\le\cdots\le\lambda_1$ the eigenvalues of $S$. Assumption (A) guarantees that $S$ is properly normalized:
$$\frac1n\,\mathbb E\,\mathrm{Tr}S = \mathbb E\,\frac1n\sum_{i=1}^n\lambda_i = 1.$$
In this chapter, our aim is to estimate
$$\|S-I\| = \max(1-\lambda_n,\,\lambda_1-1),$$
as $n,N\to\infty$. As usual, the empirical spectral distribution (ESD) of $S$ is
$$\mu_S = \frac1n\sum_{i=1}^n\delta_{\lambda_i}.$$
CHAPTER 6. COVARIANCE MATRICES
It has to be compared with the Marchenko-Pastur distribution with parameter $c$,
$$\mu_c = \frac1{2\pi cx}\sqrt{(b-x)(x-a)}\,\mathbf 1_{x\in[a,b]}\,dx + \big(1-c^{-1}\big)_+\delta_0,$$
where $a = ((1-\sqrt c)_+)^2$ and $b = (1+\sqrt c)^2$.
We denote by $d(\mu,\nu)$ the Lévy distance between probability measures. Our first statement is the following:

Theorem 6.1. If Assumptions (A)-(B) hold, then a.s., as $(n,N)\to\infty$,
$$d(\mu_S,\mu_c) \to 0.$$
With stronger assumptions, there is also convergence of the extremal eigenvalues of $S$.

Theorem 6.2. If Assumptions (A)-(C) hold, then for any $\varepsilon>0$, a.s. for all $n\gg1$,
$$\big|\lambda_n - ((1-\sqrt c)_+)^2\big| \le \varepsilon \quad\text{and}\quad \big|\lambda_1 - (1+\sqrt c)^2\big| \le \varepsilon.$$
In the sequel, we will use the notation $b_n = O_z(a_n)$ if there exist integers $k,\ell\ge1$ such that $b_n = O\big((|z|+1)^\ell(1\wedge\mathrm{Im}(z))^{-k}a_n\big)$. To prove Theorem 6.2, we will use a variant of the method developed in the previous chapter which avoids the Gaussian integration by parts.
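Before starting, here is a small simulation (ours, with arbitrary sizes and seed) illustrating Theorems 6.1 and 6.2 in the Gaussian i.i.d. case: the eigenvalues of $S$ fill the Marchenko-Pastur support and the extremal ones stick to its edges.

```python
import numpy as np

# Simulation (ours): eigenvalues of S for Gaussian data, against the
# Marchenko-Pastur edges a = ((1-sqrt(c))_+)^2 and b = (1+sqrt(c))^2.
rng = np.random.default_rng(2)
n, N = 500, 2000                                  # c = n/N = 1/4
c = n / N
Yrows = rng.standard_normal((N, n)) / np.sqrt(n)  # N i.i.d. copies of Y, E[YY^t] = I/n
S = (n / N) * Yrows.T @ Yrows                     # S = (n/N) * sum_i X_i X_i^t
lam = np.linalg.eigvalsh(S)
a, b = (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2
print(lam.min(), a)    # smallest eigenvalue, near 0.25
print(lam.max(), b)    # largest eigenvalue, near 2.25
```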
6.2 Convergence of traces of resolvent matrices

6.2.1 Resolvent identities
Let $A\in H_n(\mathbb R)$. For all $z\in\mathbb C_+ = \{z\in\mathbb C : \mathrm{Im}(z)>0\}$, we define
$$G_A(z) = (A-z)^{-1}.$$
Since $A$ is symmetric, $G_A$ is also symmetric. We recall the resolvent identity
$$G_B = G_A - G_A(B-A)G_B. \qquad(6.2.1)$$
Moreover, for any $u,v\in\mathbb R^n$, the Sherman-Morrison formula implies that
$$G_{A+uv^t} = G_A - \frac{G_Auv^tG_A}{1+v^tG_Au}. \qquad(6.2.2)$$
The following lemma is an easy consequence of the Sherman-Morrison formula.

Lemma 6.3. We have
$$G_{A+XX^t}XX^t = \frac{G_AXX^t}{1+X^tG_AX}$$
and
$$\mathrm{Tr}\big(G_{A+XX^t}^2XX^t\big) = \frac{X^tG_A^2X}{(1+X^tG_AX)^2}.$$
Proof. The first statement follows directly from (6.2.2):
$$G_{A+XX^t}XX^t = G_AXX^t - \frac{G_AXX^tG_AXX^t}{1+X^tG_AX} = G_AXX^t - \frac{(X^tG_AX)\,G_AXX^t}{1+X^tG_AX} = \frac{G_AXX^t}{1+X^tG_AX}.$$
For the second, we write
$$G_{A+XX^t}^2XX^t = \Big(G_A - \frac{G_AXX^tG_A}{1+X^tG_AX}\Big)\Big(G_A - \frac{G_AXX^tG_A}{1+X^tG_AX}\Big)XX^t$$
$$= G_A^2XX^t - \frac{G_A^2XX^tG_AXX^t}{1+X^tG_AX} - \frac{G_AXX^tG_A^2XX^t}{1+X^tG_AX} + \frac{G_AXX^tG_A^2XX^tG_AXX^t}{(1+X^tG_AX)^2}.$$
It remains to group terms as above and take the trace.
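Both identities of Lemma 6.3 are exact algebraic identities and can be checked numerically. The sketch below is our own check, for a random symmetric $A$, a random vector $X$ and one arbitrary value of $z$.

```python
import numpy as np

# Check (ours) of the two identities of Lemma 6.3 on random data.
rng = np.random.default_rng(3)
n = 50
M = rng.standard_normal((n, n))
A = (M + M.T) / 2                                      # random symmetric A
X = rng.standard_normal(n)
z = 0.3 + 1j
GA = np.linalg.inv(A - z * np.eye(n))                  # G_A(z)
G = np.linalg.inv(A + np.outer(X, X) - z * np.eye(n))  # G_{A+XX^t}(z)

err1 = np.abs(G @ np.outer(X, X)
              - GA @ np.outer(X, X) / (1 + X @ GA @ X)).max()
err2 = abs(np.trace(G @ G @ np.outer(X, X))
           - (X @ GA @ GA @ X) / (1 + X @ GA @ X) ** 2)
print(err1, err2)   # both at round-off level
```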
6.2. CONVERGENCE OF TRACES OF RESOLVENT MATRICES
6.2.2 Concentration inequalities
We now set $G = G_S$, where $S$ is defined by (6.1.1), and
$$D(z) = \mathrm{diag}\big(G_{11}(z),\cdots,G_{nn}(z)\big).$$
For an integer $k\ge1$, we now consider an arbitrary product of $G$ or $D$:
$$P(z) = \prod_{\ell=1}^kA_\ell(z),$$
where $A_\ell\in\{G,G^*,D,D^*\}$.

Lemma 6.4. Let $P$ be as above. For any $t>0$, we have
$$\mathbb P\Big(\Big|\frac1n\mathrm{Tr}P(z) - \mathbb E\,\frac1n\mathrm{Tr}P(z)\Big|\ge t\Big) \le 2\exp\Big(-\frac{n^2t^2\,\mathrm{Im}(z)^{2k}}{8k^2N}\Big).$$
Proof. For any $X\in H_n(\mathbb C)$, we denote by
$$P(X) = \prod_{\ell=1}^kA_\ell(X),$$
where $A_\ell(X)\in\{G_X,G_X^*,D_X,D_X^*\}$ (the dependency on $z$ is implicit). We write the telescopic sum
$$P(X)-P(Y) = \sum_{i=1}^k\Big(\prod_{\ell=1}^{i-1}A_\ell(X)\Big)\big(A_i(X)-A_i(Y)\big)\Big(\prod_{\ell=i+1}^kA_\ell(Y)\Big) = \sum_{i=1}^kP_i\big(A_i(X)-A_i(Y)\big)Q_i.$$
By cyclicity of the trace, we find
$$|\mathrm{Tr}P(X)-\mathrm{Tr}P(Y)| \le \sum_{i=1}^k\big|\mathrm{Tr}\big(Q_iP_i(A_i(X)-A_i(Y))\big)\big|.$$
We claim that for any $1\le i\le k$,
$$\big|\mathrm{Tr}\big(Q_iP_i(A_i(X)-A_i(Y))\big)\big| \le 2\,\mathrm{Im}(z)^{-k}\,\mathrm{rank}(X-Y). \qquad(6.2.3)$$
Consider first the case where $A_i(X)\in\{G,G^*\}$. The resolvent identity (6.2.1) implies that
$$\mathrm{rank}\big(P_i(A_i(X)-A_i(Y))Q_i\big) \le \mathrm{rank}\big(A_i(X)-A_i(Y)\big) \le \mathrm{rank}(X-Y).$$
Moreover, $\|Q_iP_i(A_i(X)-A_i(Y))\| \le 2\,\mathrm{Im}(z)^{-k}$, and (6.2.3) follows.

We now deal with the case $A_i(X)\in\{D_X,D_X^*\}$. We set $B = G_X-G_Y$ if $A_i(X) = D_X$ and $B = G_X^*-G_Y^*$ if $A_i(X) = D_X^*$. Then, from (6.2.1), the matrix $B$ has rank $r\le\mathrm{rank}(X-Y)$ and norm bounded by $2\,\mathrm{Im}(z)^{-1}$. We perform the singular value decomposition of $B$:
$$B = \sum_{\alpha=1}^rs_\alpha u_\alpha v_\alpha^t,$$
with $\|u_\alpha\|_2 = \|v_\alpha\|_2 = 1$ and $0\le s_\alpha\le2\,\mathrm{Im}(z)^{-1}$. Also, since $\|Q_iP_i\|\le\mathrm{Im}(z)^{-k+1}$, we find that for any $1\le j\le n$, $|(Q_iP_i)_{jj}|\le\mathrm{Im}(z)^{-k+1}$ and
$$\big|\big(Q_iP_i(A_i(X)-A_i(Y))\big)_{jj}\big| = |(Q_iP_i)_{jj}B_{jj}| \le 2\,\mathrm{Im}(z)^{-k}\sum_{\alpha=1}^r|e_j^tu_\alpha|\,|e_j^tv_\alpha|.$$
So finally, using the Cauchy-Schwarz inequality,
$$\big|\mathrm{Tr}\big(Q_iP_i(A_i(X)-A_i(Y))\big)\big| \le 2\,\mathrm{Im}(z)^{-k}\sum_{j=1}^n\sum_{\alpha=1}^r|e_j^tu_\alpha|\,|e_j^tv_\alpha| \le 2\,\mathrm{Im}(z)^{-k}\sum_{\alpha=1}^r\sqrt{\sum_{j=1}^n|e_j^tu_\alpha|^2}\sqrt{\sum_{j=1}^n|e_j^tv_\alpha|^2} \le 2\,\mathrm{Im}(z)^{-k}\,\mathrm{rank}(X-Y).$$
This proves (6.2.3). We get
$$|\mathrm{Tr}P(X)-\mathrm{Tr}P(Y)| \le 2k\,\mathrm{Im}(z)^{-k}\,\mathrm{rank}(X-Y).$$
The conclusion follows by applying the Azuma-Hoeffding inequality as in the proof of Theorem 3.17.
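The key step in the proof, the bound $|\mathrm{Tr}P(X)-\mathrm{Tr}P(Y)|\le 2k\,\mathrm{Im}(z)^{-k}\,\mathrm{rank}(X-Y)$, can be probed numerically. The sketch below is our own check of the simplest case $P = G$, $k = 1$, for a rank-one perturbation.

```python
import numpy as np

# Check (ours) of |Tr G_X - Tr G_Y| <= 2 * Im(z)^{-1} * rank(X - Y)
# (the case P = G, k = 1) for a rank-one perturbation Y = X + u u^t.
rng = np.random.default_rng(5)
n, z = 40, 0.2 + 0.7j
M = rng.standard_normal((n, n))
X = (M + M.T) / 2
u = rng.standard_normal(n)
GX = np.linalg.inv(X - z * np.eye(n))
GY = np.linalg.inv(X + np.outer(u, u) - z * np.eye(n))
gap = abs(np.trace(GX) - np.trace(GY))
bound = 2 / z.imag          # rank(X - Y) = 1 here
print(gap, bound)           # the gap stays below the bound
```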
6.2.3 Variance bounds

The next result gives a better variance estimate in $n$ of $\frac1n\mathrm{Tr}G(z)$ than Lemma 6.4 when Assumption (B) holds.
Proposition 6.5. Assume that Assumptions (A)-(B) hold. Then,
$$\mathrm{Var}\Big(\frac1n\mathrm{Tr}G(z)\Big) = O_z\Big(\frac{\varepsilon_n^2}n\Big).$$
Lemma 6.6. For any reals $(x_j)_{1\le j\le n}$ and $(y_j)_{1\le j\le n}$ with $y_j\ge0$, and any $v>0$,
$$\Big|1+\sum_{j=1}^n\frac{y_j}{x_j-iv}\Big| \ge \frac v{v+\sum_{j=1}^ny_j}.$$
Proof. Write $y_j = y\,p_j$ with $y = \sum_jy_j$ and $\sum_jp_j = 1$ (the case $y = 0$ being trivial). The image of $x\mapsto\frac1{x-iv}$, $x\in\mathbb R$, is a circle $\mathcal C$ of center $i/(2v)$ and radius $1/(2v)$. It follows that every point of the form $w = \sum_{j=1}^n\frac{p_j}{x_j-iv}$, for some $(p_j)$, $(x_j)$ as above, belongs to the closed disc $D$ of center $i/(2v)$ and radius $1/(2v)$. In particular, $|1+\sum_jy_j/(x_j-iv)| = y\,|w-(-1/y)|$ is bounded from below by $y$ times the distance from $-1/y$ to $D$, so that
$$\Big|1+\sum_{j=1}^n\frac{y_j}{x_j-iv}\Big| \ge y\Big(\sqrt{\frac1{y^2}+\frac1{4v^2}}-\frac1{2v}\Big) = \frac{2v}{\sqrt{4v^2+y^2}+y} \ge \frac v{v+y}.$$
The conclusion follows.
Lemma 6.7. Assume that Assumption (A)-(B) holds. Let ` ∈ N, A ∈ Hn (C), B ∈ Mn (C)
and R = (A − z)−1 , z ∈ C+ . We have
2
1
1
= Oz,` (ε2n ),
E
−
c
t
`
`
(1 + cY RY )
(1 + n TrR) and
2
1
Y t BY
n TrB
E
−
≤ Oz,` (kBk2 ε2n ),
(1 + cY t RY )` (1 + nc TrR)` 6.3. CONVERGENCE OF TRACES OF RESOLVENT MATRICES
Proof. We set f = n1 TrR and g = n1 TrB. We have
1
`|x − y|
− 1≤
x` y ` (|x| ∧ |y|)`−1 (|x| ∨ |y|) .
57
(6.2.4)
By Lemma 6.6, we deduce that
2
1
1
≤ Oz,` |Y t RY − f |2 .
E
−
t
`
`
(1 + cY RY )
(1 + cf )
The first statement follows. Similarly, we write
Y t BY
g
Y t BY − g
1
1
.
−
=
+g
−
(1 + cY t RY )` (1 + cf )`
(1 + cY t RY )`
(1 + cY t RY )` (1 + cf )`
We deduce that
2
2
1
TrB
Y t BY
|Y t BY − g|2
1
1
2
n
≤
2
−
+
2|g|
−
c
(1 + cY t RY )` (1 + cf )` .
(1 + cY t RY )` (1 + n TrR)` |1 + cY t RY |2`
Using Lemma 6.6 and the previous bound, it yields to the claimed expression.
Proof of Proposition 6.5. We have F (X1 , · · · , XN ) =
From Efron-Stein inequality,
1
n TrG
for some function F : RnN → R.
N
1
1X
N
Var
TrR ≤
E|F (X) − F (X−i )|2 = E|F (X) − F (X−1 )|2
n
2
2
i=1
where X = (X1 , · · · , XN ) and X−i = (X1 , · · · , Xi−1 , Xi0 , Xi+1 , · · · , XN ) and Xi0 is an independent copy of Xi .
We set
X
Sj = c
Xi Xit and Gj = (Sj − zI)−1 .
i6=j
Observe that Gj and Xj are independent. We denote by Ej the expectation with respect to
Xj .
By (6.2.2), we get
1
Var
TrR
≤
n
=
=
!2
N 1
cG1 X1 X1t G1
cG1 X10 X10 t G1 1
E Tr
− Tr
2 n
1 + cX1t G1 X1
n
1 + cX10 t G1 X10 2
c2 N X1t G21 X1
X10 t G21 X10 E
−
2n2 1 + cX1t G1 X1 1 + cX10 t G1 X10 c2 N
X1t G21 X1
E Var1
,
n2
1 + cX1t G1 X1
where Var1 denotes the variance under E1 . It remains to apply Lemma 6.7 and recall that
kG1 k ≤ Im(z)−1 .
6.3
Convergence of traces of resolvent matrices
We introduce the analytic functions
f (z) =
1
1
TrG(z) = TrD(z),
n
n
58
CHAPTER 6. COVARIANCE MATRICES
g(z) =
1
1
TrG(z)2 = TrG(z)G(z)t
n
n
and
1
TrD(z)2 .
n
In this section, we compute an asymptotic equivalent for Ef (z), Eg(z) and Eh(z).
h(z) =
Proposition 6.8. Under assumption (A)-(B), we have
Ef (z) = fc (z) + Oz (εn + n−1 ).
where fc (z) =
R
1
x−z dµc (x).
R
Lemma 6.9. The function fc (z) =
satisfies
1
x−z dµc (x)
is analytic on C\[(1 −
czfc (z) = 1 − c −
√
2
c)+ , (1 +
√
c)2 ] and
1
.
1 + cfc (z)
Proof. · · ·
We then establish an approximated fixed point equation for Ef (z).
Lemma 6.10. For any z ∈ C+ ,
czEf (z) = 1 − c −
1
+ Oz (εn + n−1 ).
1 + cEf (z)
Proof. We write
zG = −I + GS = −I + c
N
X
GXi Xit .
i=1
We set
Sj = c
X
Xi Xit
and
Gj = (Sj − zI)−1 .
i6=j
We use Lemma 6.3 :
zG = −I + c
N
X
i=1
Gi Xi Xit
.
1 + cXit Gi Xi
Taking trace and expectation,
zEf (z) = −1 + E
X1t G1 X1
TrG1 X1t X1t
= −1 + E
.
t
1 + cX1 G1 X1
1 + cX1t G1 X1
So finally
czEf (z) = 1 − c − E(1 + cX1t G1 X1 )−1 .
(6.3.1)
Lemma 6.7 and the independence of X1 and G1 yield to
E1
1
1
=
+ Oz (εn ).
c
t
1 + cX1 G1 X1
1 + n TrG1
Using (6.2.4) and Lemma 6.6, we also get
1
1
c
1
−1
−1 E(1 + TrG1 ) − (1 + cEf (z)) = Oz E TrG1 − TrG + E TrG − f (z) .
n
n
n
n
By assumption (B) and Cauchy-Schwartz inequality, the first term is O(Im(z)−2 εn ). By Lemma
??, the second term is O(n−1 Im(z)−2 ). Finally, the last term is Oz (εn n−1/2 ) by Proposition
6.5. It yields to the lemma.
6.3. CONVERGENCE OF TRACES OF RESOLVENT MATRICES
59
The next lemma asserts that the quadratic equation satisfied by x = cfc (z) is stable.
Lemma 6.11. Let x, y, z ∈ C+ such that zx2 + x(z + c − 1) + c = 0 and zy 2 + y(z + c − 1) + c = δ
for some δ ∈ C. Then
|x − y| = Oz (|δ|).
Proof. We have y 2 + y(1 + (c − 1)z −1 ) + cz −1 = δz −1 and
1
y − (1 + (c − 1)z −1 )
2
2
= δz
−1
− cz
−1
+
2
1
−1
(1 + (c − 1)z ) .
2
Subtracting the same expression with δ = 0, we deduce that
δz −1 = (y − x)(y + x + 1 + (c − 1)z −1 ).
Since, x + 1 + (c − 1)z −1 = −c/(zx),
|x − y| =
|δ|
|z|y −
c .
zx
It is sufficient to prove that for some ε > 0,
c Im(z)
y
−
.
≥ε
zx
1 + |z|2
First, by assumption x = c
R∞
0
1
t−z dµc .
In particular, if z = u + iv,
∞
(u + iv)((t − u) + iv)
dµc (t).
(t − u)2 + v 2
Z
zx = c
0
(6.3.2)
Hence, zx ∈ C+ ,
Z
Im(zx) = c
0
∞
vt
dµc (t) ≥ c
(t − u)2 + v 2
Z
√
(1+ c)2
0
2u2
vt
cv
√ .
dµc (t) ≥
2
2
2
+ 2t + v
2|z| + 2(1 + c)4
So finally, −1/(zx) ∈ C+ and for some ε > 0,
c Im(zx)
≥ εIm(zx),
y −
≥ cIm(−(zx)−1 ) = c
zx
|zx|2
where we have used that Im(y) ≥ 0 and |zx| = O(1) (which follows easily from (6.3.2)).
We can now complete the proof of Proposition 6.8.
Proof of Proposition 6.8. Let y = cEf (z). By Lemma 6.10, we have zy 2 + y(z + c − 1) + c =
Oz (n−γ/2 ). It remains to use Lemma 6.11 and Lemma 6.9.
We now deal with Eg(z).
Proposition 6.12. Let
fc (z)
=
gc (z) = −
cz + (zcfc (z) + c − 1)2
√
2
√
Z
1
dµc (x).
(x − z)2
c)2 ]. Under assumption (A)-(B) we have
Eg(z) = gc (z) + Oz εn + n−1/2 .
Then gc is analytic on C\[(1 −
c)+ , (1 +
60
CHAPTER 6. COVARIANCE MATRICES
Proof. We write
G2 = G(−z −1 I + cz −1
N
X
GXi Xit ) = −z −1 G + cz −1
i=1
N
X
G2 Xi Xit .
i=1
We take the trace and use Lemma 6.3, we find
N
Xit G2i Xi
1 X
.
zg(z) = −f (z) +
N
(1 + cXit Gi Xi )2
i=1
Taking expectation, we find
zEg(z) = −Ef (z) + E
X1t G21 X1
.
(1 + cX1t G1 X1 )2
By Lemma 6.7, we find
E1
By Lemma 6.4, Var
1
2
X1t G21 X1
n TrG1
=
+ Oz (εn ).
(1 + cX1t G1 X1 )2
(1 + nc TrG1 )2
1
2
n TrG
= Oz ( n1 ). Hence, arguing as in Lemma 6.10, we find
czEg(z) = −fc (z) −
Eg(z)
−1/2
.
+
O
ε
+
n
z
n
(1 + cfc (z))2
By Lemma 6.9, we may rewrite this expression as
Eg(z) cz + (zcfc (z) + c − 1)2 = −fc (z) + Oz εn + n−1/2 .
We finally deal with Eh(z).
Proposition 6.13. Under assumption (A)-(B) we have
Eh(z) = fc2 (z) + Oz εn + n−1 .
Proof. We have Eh(z) = EG211 By Proposition 6.8, EG11 = f (z) = fc (z) + Oz (εn ), it is thus
sufficient to check that
VarG11 = Oz (ε2n ).
We start with the identity
zG11 = −1 + c
N
X
et Gi Xi X t e1
1
i=1
i
1 + Xit Gi Xi
.
So that
Var(G11 ) = (c/z)2 Var
N
X
et Gi Xi X t e1
1
i=1
· · · FIXME : always the same computations...
i
1 + Xit Gi Xi
!
.
6.4. BOUND ON MATRIX NORM
6.4
61
Bound on matrix norm
√
Proposition 6.14. Under Assumptions (A)-(B) with εn = O(1/ n), we have
p
EkSk = O( log n).
Proof. Since EXi Xi∗ = In /n, By symmetrization, we have
N
X n
EkS − Ik ≤ 2E
εi Xi Xi∗ ,
N
i=1
with εi i.i.d. symmetric Bernoulli independent of X1 , · · · , XN . From Rudelson’s inequality,
v
u N
uX
p
EkS − Ik ≤ C log nEt (n/N )2 kXi k22 Xi Xi∗ .
i=1
If Λ = max1≤i≤N kXi k22 , we find from Cauchy-Schwartz inequality and the triangle inequality,
EkS − Ik ≤ C 0
p
p
√ p
√ p
log n EΛ EkSk ≤ C 0 log n EΛ
EkS − Ik + 1 .
√
Assumption (B) with εn = O(1/ n) and A = In give that P(kXi k22 ≥ 1 + t) = O(1/(nt2 )). In
particular, from the union bound,
P(Λ ≥ 1 + t) = O(1/t2 ).
Consequently, EΛ = O(1) and for some new constant C 0 ,
p
p
EkS − Ik ≤ C 0 log n
EkS − Ik + 1 .
√
It is easy to√check that if x2 ≤ a(x + 1) then x ≤ ( 2a) ∧ (2a2 ). We thus have proved that
EkSk ≤ C 00 log n.
6.5
Convergence of edge eigenvalues
We introduce a new assumption on Y ,
√
√
(C’) Y has i.i.d. coordinates, E| nYi |4 < C and for some q > 2, | nYi | ≤ n1/q with probability
1.
We now establish the convergence in probability of extremal eigenvalues.
Proposition 6.15. Under Assumptions (A)-(C’), for any ε > 0, we have,
√
√ P λn − ((1 − c)+ )2 ≥ ε = o(1) and P λ1 − (1 + c)2 ≥ ε = o(1).
The passage from Propostion 6.15 to Theorem 6.2 is performed as in the previous chapter: we truncate and renormalize the entries of the vector Y which satisfies (C) (see Lemma
5.10). The almost sure convergence is obtained by an application of Talagrand’s concentration
inequality.
Note that Proposition 6.8 and Lemma 6.4 imply that a.s.
lim sup λn ≤ ((1 −
n→∞
√
c)+ )2
and
lim sup λ1 ≥ (1 +
n→∞
√ 2
c) .
62
CHAPTER 6. COVARIANCE MATRICES
We should thus proved the converse statement. We will follow the approach used in the previous
chapter. We fix κn = log n. From Proposition 6.14
P(kSk ≥ κn ) = o(1).
(6.5.1)
For any ε > 0, consider a smooth function ϕ : R+ → [0, 1] with support
√
√
Iε = [0, ((1 − c)+ )2 − ε)+ ] ∪ [(1 + c)2 + ε, κn ]
such that ϕ(x) = 1 on I2ε .
Assume that we manage to prove that
n
1X
E
ϕ(λk ) = E
n
Z
ϕdµM
k=1
1
.
=o
n
(6.5.2)
Then, using 1(λ ∈ I2ε ) ≤ ϕ(λ), we would deduce, that
P(∃λk ∈ I2ε ) ≤
n
X
Z
Eϕ(λk ) ≤ nE
ϕdµM = o(1).
k=1
Using (6.5.1), it would conclude the proof of Proposition 6.15.
The main technical lemma is the following.
Lemma 6.16. Under Assumptions (A)-(C’), there exists ρ > 0 and an analytic function τ on
√
√
C\[(1 − c)+ )2 , (1 + c)2 ] such that for any z ∈ C+ ,
τ (z)
1
Ef (z) = fc (z) +
+ Oz
.
n
n1+ρ
Before proving Lemma 6.16, let us give the proof of Proposition 6.15.

Proof of Proposition 6.15. We consider the above smooth function ϕ with support I_ε. The term δ(z)/n^{1+ρ} = O_z(1/n^{1+ρ}) in Lemma 6.16 is of the form δ(z) = O((1 + |z|)^ℓ (1 ∧ Im(z))^{−k}) for some integers k, ℓ ≥ 1.

Let K_ε = {z ∈ C_+ : Re(z) ∈ I_ε}. We may apply Lemma 5.5 with k as above. Since ∂̄Φ(x + iy) has support in K_ε, we get

E ∫ ϕ dμ_M = (1/π) Re ∫_{K_ε} ∂̄Φ(x + iy) E f(x + iy) dx dy
  = (1/π) Re ∫_{K_ε} ∂̄Φ(x + iy) ( f_c(x + iy) + τ(x + iy)/n + δ(x + iy)/n^{1+ρ} ) dx dy
  = (1/π) Re ∫_{K_ε} ∂̄Φ(x + iy) ( f_c(x + iy) + τ(x + iy)/n ) dx dy
   + (1/π) Re ∫_{K_ε} ∂̄Φ(x + iy) δ(x + iy)/n^{1+ρ} dx dy.

Now, notice that f_c(z) + τ(z)/n is analytic on an open neighborhood of K_ε. In particular, ∂̄(f_c + τ/n) = 0 on this neighborhood. Hence, by integration by parts, the first integral of the above expression is 0. Also, from (5.2.10), for some C > 0,

∂̄Φ(x + iy) δ(x + iy) = O( 1(y ≤ C)(1 + |x|^ℓ) ).

We integrate over K_ε and recall that κ_n = n^{o(1)}. We have proved that

E ∫ ϕ dμ_M = o(n^{−1}).

This concludes the proof of Proposition 6.15.
6.5. CONVERGENCE OF EDGE EIGENVALUES
Lemma 6.17. Set f_1(z) = (1/n) Tr G_1(z). Under Assumptions (A)-(B),

E f(z) − E f_1(z) = −(1/n) · c g_c/(1 + c f_c) + O_z( ε_n/n + 1/n^{3/2} ).
Proof. By (6.2.2),

G = G_1 − c (G_1 X_1 X_1^t G_1)/(1 + c X_1^t G_1 X_1).

Hence

f(z) − f_1(z) = −(c/n) · (Tr G_1 X_1 X_1^t G_1)/(1 + c X_1^t G_1 X_1) = −(c/n) · (X_1^t G_1² X_1)/(1 + c X_1^t G_1 X_1).

By Lemma 6.7, we find

E_1 [ (X_1^t G_1² X_1)/(1 + c X_1^t G_1 X_1) ] = g_1/(1 + c f_1) + O_z(ε_n),

where g_1 = (1/n) Tr G_1². We may then write

E [ g_1/(1 + c f_1) ] = E g_1/(1 + c E f_1) + E [ (g_1 − E g_1)/(1 + c f_1) ] − c E g_1 · E [ (f_1 − E f_1)/((1 + c f_1)(1 + c E f_1)) ].

It remains to apply Propositions 6.8-6.12 and Lemmas 3.5-6.6.
The next lemma establishes a convergence rate for random quadratic forms.

Lemma 6.18. Under Assumptions (A)-(C'), for any A ∈ M_n(C),

E | Y^t A Y − (1/n) Tr A |⁴ = O( ‖A‖⁴ n^{4/(q∧4)−3} ).
Proof. We can assume without loss of generality that A ∈ M_n(R). We set Z_i = Y_i² − 1/n and write

E | Y^t A Y − (1/n) Tr A |⁴ ≤ 8 E ( Σ_i A_ii Z_i )⁴ + 8 E ( Σ_{i≠j} A_ij Y_i Y_j )⁴.  (6.5.3)

We first treat the first term. We have

E ( Σ_i A_ii Z_i )⁴ = Σ_i A_ii⁴ E Z_i⁴ + 3 Σ_{i≠j} A_ii² A_jj² E Z_i² E Z_j².

Since |A_ii| ≤ ‖A‖, E Z_i² = O(n^{−2}) and |Z_i| = O(n^{2/q−1}), we have E Z_i⁴ = O(n^{4/q−4}) and

Σ_i A_ii⁴ E Z_i⁴ = O( ‖A‖⁴ n^{4/q−3} ).

Similarly,

Σ_{i≠j} A_ii² A_jj² E Z_i² E Z_j² = O( ‖A‖⁴ n^{−2} ).

Hence, the first term in (6.5.3) is bounded by O(‖A‖⁴ n^{4/(q∧4)−3}).
For the second term of (6.5.3), up to replacing A by A′ = A − diag(A_11, …, A_nn), we may simply assume that A_ii = 0 (indeed ‖A′‖ ≤ ‖A‖ + max_i |A_ii| ≤ 2‖A‖). We next expand

E ( Σ_{i,j} A_ij Y_i Y_j )⁴ = Σ_{i_ℓ, j_ℓ, 1≤ℓ≤4} Π_{ℓ=1}^4 A_{i_ℓ j_ℓ} · E Π_{ℓ=1}^4 Y_{i_ℓ} Y_{j_ℓ} = Σ_{i_ℓ, j_ℓ, 1≤ℓ≤4} Π_{ℓ=1}^4 A_{i_ℓ j_ℓ} Π_{a=1}^k E Y_{s_a}^{m_a},  (6.5.4)
where (s_a), 1 ≤ a ≤ k, is the set of distinct indices in (i_ℓ, j_ℓ), 1 ≤ ℓ ≤ 4, and m_a is the number of occurrences of s_a in (i_ℓ, j_ℓ), 1 ≤ ℓ ≤ 4. Since E Y_{s_a} = 0, the above product is 0 unless m_a ≥ 2 for any 1 ≤ a ≤ k. It implies that 1 ≤ k ≤ 4 and that the ordered vector (m_a)_{1≤a≤k} is one of the following: (2,2,2,2), (3,3,2), (4,2,2), (4,4), (5,3), (6,2) (the case k = 1 and m_1 = 8 does not contribute since A_ii = 0). By assumption, we have the bound

E |Y_i|^p = O( n^{(p−4)_+/q − p/2} ).  (6.5.5)
We deduce that

Π_{a=1}^k E Y_{s_a}^{m_a} = O( n^{−4 + (2/q) 1(k=2)} ).

We now deal with the control of Π_{ℓ=1}^4 A_{i_ℓ j_ℓ}. Let us start with the case k = 2. We use twice the bound |A_ij| ≤ ‖A‖. We get a contribution of order

Σ_{i_ℓ, j_ℓ : k=2} Π_{ℓ=1}^4 A_{i_ℓ j_ℓ} Π_{a=1}^k E Y_{s_a}^{m_a} = O( ‖A‖² n^{−4+2/q} Σ_{i,j} ( |A_ij|² + |A_ij A_ji| ) ).

From the Cauchy-Schwarz inequality,

Σ_{i,j} |A_ij A_ji| ≤ Σ_{i,j} |A_ij|² = Tr(AA*) ≤ n ‖A‖².

So finally,

Σ_{i_ℓ, j_ℓ : k=2} Π_{ℓ=1}^4 A_{i_ℓ j_ℓ} Π_{a=1}^k E Y_{s_a}^{m_a} = O( ‖A‖⁴ n^{−3+2/q} ).
For the terms such that k = 3, we use once the bound |A_ij| ≤ ‖A‖. It remains an expression of the form A_ij B_ji C_ik or A_ij B_jk C_ki, where B and C are equal to A or A*. In the first case, we use the bound

Σ_{i,j,k} |A_ij B_ji C_ik| ≤ Σ_{i,j} |A_ij| |B_ji| · √n · ( Σ_k |C_ik|² )^{1/2} ≤ n^{3/2} ‖A‖³.

In the second case,

Σ_{i,j,k} |A_ij B_jk C_ki| ≤ Σ_{i,j} |A_ij| ( Σ_k |B_jk|² )^{1/2} ( Σ_k |C_ki|² )^{1/2} ≤ ‖A‖² Σ_{i,j} |A_ij| ≤ n^{3/2} ‖A‖³.

So finally,

Σ_{i_ℓ, j_ℓ : k=3} Π_{ℓ=1}^4 A_{i_ℓ j_ℓ} Π_{a=1}^k E Y_{s_a}^{m_a} = O( ‖A‖⁴ n^{−4+3/2} ).
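The two elementary norm bounds used above, Σ_{i,j} |A_ij A_ji| ≤ Tr(AA*) ≤ n‖A‖² and Σ_{i,j} |A_ij| ≤ n^{3/2}‖A‖, are easy to sanity-check numerically; the following NumPy sketch (matrix size and seed arbitrary) does so.

```python
import numpy as np

# Sanity check of the norm bounds from the proof of Lemma 6.18:
#   sum_{i,j} |A_ij A_ji| <= Tr(A A^t) <= n ||A||^2,
#   sum_{i,j} |A_ij|      <= n^{3/2} ||A||.
rng = np.random.default_rng(1)
n = 50
A = rng.standard_normal((n, n))
op = np.linalg.norm(A, 2)            # operator norm ||A||_{2->2}
s_cross = np.abs(A * A.T).sum()      # sum_{i,j} |A_ij A_ji|
s_sq = (A * A).sum()                 # Tr(A A^t) = Frobenius norm squared
s_abs = np.abs(A).sum()              # sum_{i,j} |A_ij|
print(s_cross <= s_sq <= n * op ** 2)   # True
print(s_abs <= n ** 1.5 * op)           # True
```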
It remains to deal with k = 4 in (6.5.4). By assumption, E Y_i² = 1/n. Each index appears exactly twice: there are expressions of the form A_ij B_ji C_kℓ D_ℓk and A_ij B_jk C_kℓ D_ℓi, where B, C, D are equal to A or A*. In the first case,

Σ_{i,j,k,ℓ} A_ij B_ji C_kℓ D_ℓk = ( Σ_{i,j} A_ij B_ji ) ( Σ_{k,ℓ} C_kℓ D_ℓk ) = Tr(AB) Tr(CD),

whose absolute value is bounded by n² ‖A‖⁴. In the second case,

Σ_{i,j,k,ℓ} A_ij B_jk C_kℓ D_ℓi = Tr(ABCD),

whose absolute value is bounded by n ‖A‖⁴. We thus have checked that (6.5.4) is at most of order ‖A‖⁴ n^{−2}.
The next lemma computes the first moments of a random quadratic form.

Lemma 6.19. Under Assumptions (A)-(C'), we set β = n² E Y_1⁴ = O(1). If A ∈ M_n(C) and D = diag(A_11, …, A_nn), we have

E ( Y^t A Y − (1/n) Tr A )² = (1/n²) ( Tr A² + Tr A Aᵗ + (β − 3) Tr D² ),

and

E ( Y^t A Y − (1/n) Tr A )³ = O( ‖A‖³ n^{−2+2/q} ).
Proof. We expand brutally:

E ( Y^t A Y − (1/n) Tr A )² = E ( Σ_{i,j} ( Y_i A_ij Y_j − E Y_i A_ij Y_j ) )²
  = Σ_{i_1,j_1,i_2,j_2} A_{i_1 j_1} A_{i_2 j_2} ( E Y_{i_1} Y_{j_1} Y_{i_2} Y_{j_2} − E Y_{i_1} Y_{j_1} E Y_{i_2} Y_{j_2} )
  = (i) + (ii),

where in the sum, the summand is zero unless one of the two disjoint cases holds:

(i) i_1 = i_2 and j_1 = j_2;

(ii) i_1 = j_2, i_2 = j_1 and i_1 ≠ j_1.

Let us start with case (i). We have

(i) = (1/n²) Σ_{i≠j} A_ij² + ( E Y_1⁴ − 1/n² ) Σ_i A_ii² = (1/n²) Tr A Aᵗ + ((β − 2)/n²) Tr D².

Similarly,

(ii) = E Y_1² Y_2² Σ_{i≠j} A_ij A_ji = (1/n²) ( Tr A² − Tr D² ).

Summing the two contributions gives the claimed formula with coefficient (β − 3) Tr D².
We now compute the third moment. We have

E ( Y^t A Y − (1/n) Tr A )³ = Σ_{i_ℓ, j_ℓ, 1≤ℓ≤3} Π_{ℓ=1}^3 A_{i_ℓ j_ℓ} · E Π_{ℓ=1}^3 ( Y_{i_ℓ} Y_{j_ℓ} − E Y_{i_ℓ} Y_{j_ℓ} ) = Σ_{k=1}^3 S_k,

where in S_k, the sum is over k distinct values of the i_ℓ, j_ℓ, each appearing at least twice. If k = 1, we have from (6.5.5)

S_1 = Σ_i A_ii³ E ( Y_i² − 1/n )³ = O( ‖A‖³ n^{2/q−2} ).

If k = 2, each distinct index appears 2, 3 or 4 times. The terms of the form A_ℓℓ A_ij A_{i′j′} with i′, j′, i, j different from ℓ are equal to 0 because E ( Y_ℓ² − E Y_ℓ² ) = 0. Hence, there only remain terms of the form A_ij B_jk C_ki, where B, C are equal to A, Aᵗ or D. However,

Σ_{i,j,k} A_ij B_jk C_ki = Tr(ABC) ≤ n ‖A‖³.

We deduce from (6.5.5) that

S_2 = O( ‖A‖³ n^{−2} ).

It remains to deal with S_3. Each distinct index appears exactly twice. Moreover, we can restrict to sums such that i_ℓ ≠ j_ℓ for any 1 ≤ ℓ ≤ 3 (if i_ℓ = j_ℓ, both occurrences of this index sit in the same factor, which is centered: E ( Y_ℓ² − E Y_ℓ² ) = 0). Then E Y_{i_ℓ} Y_{j_ℓ} = 0 for every ℓ, so the centered product reduces to E Π_ℓ Y_{i_ℓ} Y_{j_ℓ}, which is nonzero only when the pairs (i_ℓ, j_ℓ) form a cycle on the three distinct indices. These terms are again of the form Σ_{i,j,k} A_ij B_jk C_ki · E Y_i² Y_j² Y_k² with B, C equal to A or Aᵗ, so, as above,

S_3 = O( ‖A‖³ n^{−2} ).
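As a quick independent check of the second-moment identity in Lemma 6.19, one can enumerate a small two-point law exactly; the enumeration pins down the coefficient of Tr D² as β − 3 (already visible in the scalar case n = 1, where the left-hand side equals (β − 1)A_11²). The matrix A and the two-point law below are arbitrary illustrative choices, with ξ = √n · Y so that E ξ = 0, E ξ² = 1 and β = E ξ⁴.

```python
from fractions import Fraction as F
from itertools import product

# Exact check of the variance identity behind Lemma 6.19, rescaled to
# xi = sqrt(n) * Y (iid, E xi = 0, E xi^2 = 1, beta = E xi^4):
#   E (xi^t A xi - Tr A)^2 = Tr A^2 + Tr A A^t + (beta - 3) Tr D^2.
vals = [F(2), F(-1, 2)]            # two-point law with mean 0, variance 1
probs = [F(1, 5), F(4, 5)]
beta = sum(p * v ** 4 for v, p in zip(vals, probs))   # = 13/4, so beta != 3
n = 3
A = [[F(1), F(2), F(-1)],
     [F(0), F(3), F(1)],
     [F(2), F(-2), F(1)]]
trA = sum(A[i][i] for i in range(n))
trA2 = sum(A[i][j] * A[j][i] for i in range(n) for j in range(n))
trAAt = sum(A[i][j] ** 2 for i in range(n) for j in range(n))
trD2 = sum(A[i][i] ** 2 for i in range(n))

lhs = F(0)
for choice in product(range(2), repeat=n):  # enumerate all 2^n outcomes
    p = F(1)
    for k in choice:
        p *= probs[k]
    x = [vals[k] for k in choice]
    q = sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))
    lhs += p * (q - trA) ** 2

rhs = trA2 + trAAt + (beta - 3) * trD2
print(lhs == rhs)   # True
```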
Proof of Lemma 6.16. We set f_1 = (1/n) Tr G_1. By Lemma 6.19, Assumption (C') implies that (B) holds with ε_n = O(n^{−1/2}). We start from (6.3.1):

z c E f(z) = 1 − c − E ( 1 + c X_1^t G_1 X_1 )^{−1}.

We perform a Taylor expansion of (1 + c X_1^t G_1 X_1)^{−1}:

1/(1 + c X_1^t G_1 X_1) = Σ_{ℓ=0}^3 (−c)^ℓ ( X_1^t G_1 X_1 − f_1 )^ℓ / (1 + c f_1)^{ℓ+1} + c⁴ ( X_1^t G_1 X_1 − f_1 )⁴ / ( (1 + c f_1)⁴ (1 + c X_1^t G_1 X_1) ).  (6.5.6)
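The identity behind (6.5.6) is a finite geometric expansion of 1/(1 + u) at u = c (X_1^t G_1 X_1 − f_1)/(1 + c f_1); it can be checked exactly on rational inputs (the scalars below are arbitrary stand-ins for c, X_1^t G_1 X_1 and f_1).

```python
from fractions import Fraction as F

# Exact check of the order-4 expansion used in (6.5.6):
#   1/(1+cx) = sum_{l=0}^{3} (-c)^l (x-f)^l / (1+cf)^{l+1}
#              + c^4 (x-f)^4 / ((1+cf)^4 (1+cx)),
# valid whenever 1 + cx and 1 + cf are nonzero.
c, x, f = F(1, 2), F(2, 5), F(1, 3)
lhs = 1 / (1 + c * x)
rhs = sum((-c) ** l * (x - f) ** l / (1 + c * f) ** (l + 1) for l in range(4))
rhs += c ** 4 * (x - f) ** 4 / ((1 + c * f) ** 4 * (1 + c * x))
print(lhs == rhs)   # True
```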
By Lemma 6.6 and Lemma 6.18, we have

E_1 | ( X_1^t G_1 X_1 − f_1 )⁴ / ( (1 + c f_1)⁴ (1 + c X_1^t G_1 X_1) ) | = O_z( E_1 | X_1^t G_1 X_1 − f_1 |⁴ ) = O_z( n^{4/(q∧4)−3} ).

We finally write
E_1 [ 1/(1 + c X_1^t G_1 X_1) ] = 1/(1 + c f_1) + c² E_1 ( X_1^t G_1 X_1 − f_1 )² / (1 + c f_1)³
  − c³ E_1 ( X_1^t G_1 X_1 − f_1 )³ / (1 + c f_1)⁴ + c⁴ E_1 [ ( X_1^t G_1 X_1 − f_1 )⁴ / ( (1 + c f_1)⁴ (1 + c X_1^t G_1 X_1) ) ]
  = 1/(1 + c f_1) + c² E_1 ( X_1^t G_1 X_1 − f_1 )² / (1 + c f_1)³ + O_z( n^{4/(q∧4)−3} ),

where we have applied Lemma 6.19 for the third term. It remains to apply Lemma 6.19 for the second term:

E_1 [ 1/(1 + c X_1^t G_1 X_1) ] = 1/(1 + c f_1) + (c²/n) ( 2 g_1 + (β − 3) h_1(z) ) / (1 + c f_1)³ + O_z( n^{4/(q∧4)−3} ),

where h_1 = (1/n) Tr(D_1²) and g_1 = (1/n) Tr(G_1²) = (1/n) Tr(G_1 G_1ᵗ). From (6.2.3), we find that

|h_1(z) − h(z)| = O_z(1/n)   and   |g_1(z) − g(z)| = O_z(1/n).
So finally, we may apply Propositions 6.12 and 6.13 and use Lemma 6.17; in (6.3.1) it leads to

z c E f(z) = 1 − c − 1/(1 + c E f(z)) + (1/n) ( − c g_c/(1 + c f_c)³ − c² ( 2 g_c + (β − 3) h_c(z) )/(1 + c f_c)³ ) + O_z( n^{−3/2} + n^{4/(q∧4)−3} )
  = 1 − c − 1/(1 + c E f(z)) + ϕ(z)/n + O_z( n^{−1−ρ} ),

where ϕ(z) is analytic outside [((1 − √c)_+)², (1 + √c)²] and where ρ = 2 − 4/(q ∧ (8/3)) > 0. We may rewrite this last equality as

z c² (E f(z))² + c E f(z) ( z + c − 1 − ϕ(z)/n ) + c − ϕ(z)/n = O_z( n^{−1−ρ} ).  (6.5.7)

We may solve this quadratic equation. By Lemma 6.11, we have E f(z) = f_c(z) + δ(z)/n with δ(z) = O_z(1). Putting this into (6.5.7) and keeping terms of order O_z(n^{−1−ρ}) gives a linear equation satisfied by δ(z) in terms of f_c, ϕ and O_z(n^{−1−ρ}). This concludes the proof of the lemma.
Bibliography
[ABC+ 00] Cécile Ané, Sébastien Blachère, Djalil Chafaı̈, Pierre Fougères, Ivan Gentil, Florent
Malrieu, Cyril Roberto, and Grégory Scheffer. Sur les inégalités de Sobolev logarithmiques, volume 10 of Panoramas et Synthèses [Panoramas and Syntheses]. Société
Mathématique de France, Paris, 2000. With a preface by Dominique Bakry and
Michel Ledoux. 23, 24
[AGZ10]
Greg W. Anderson, Alice Guionnet, and Ofer Zeitouni. An introduction to random
matrices, volume 118 of Cambridge Studies in Advanced Mathematics. Cambridge
University Press, Cambridge, 2010. 13, 17, 22, 24, 25
[Bai97]
Z. D. Bai. Circular law. Ann. Probab., 25(1):494–529, 1997. 6
[BS10]
Zhidong Bai and Jack W. Silverstein. Spectral analysis of large dimensional random
matrices. Springer Series in Statistics. Springer, New York, second edition, 2010. 3
[BVW10] Jean Bourgain, Van H. Vu, and Philip Matchett Wood. On the singularity probability of discrete random matrices. J. Funct. Anal., 258(2):559–603, 2010. 8
[BY88]
Z. D. Bai and Y. Q. Yin. Necessary and sufficient conditions for almost sure convergence of the largest eigenvalue of a Wigner matrix. Ann. Probab., 16(4):1729–1741,
1988. 7, 47
[CDM07] M. Capitaine and C. Donati-Martin. Strong asymptotic freeness for Wigner and
Wishart matrices. Indiana Univ. Math. J., 56(2):767–803, 2007. 41
[CL95]
Ovidiu Costin and Joel L. Lebowitz. Gaussian fluctuation in random matrices. Phys.
Rev. Lett., 75:69–72, Jul 1995. 7
[CRT06]
Emmanuel J. Candès, Justin Romberg, and Terence Tao. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information.
IEEE Trans. Inform. Theory, 52(2):489–509, 2006. 4
[DV11]
Sandrine Dallaporta and Van Vu. A note on the central limit theorem for the eigenvalue counting function of Wigner matrices. Electron. Commun. Probab., 16:314–
322, 2011. 7
[ER05]
Alan Edelman and N. Raj Rao. Random matrix theory. Acta Numer., 14:233–297,
2005. 4
[Erd11]
L. Erdös. Universality of Wigner random matrices: a survey of recent results.
Uspekhi Mat. Nauk, 66(3(399)):67–198, 2011. 7
[FK81]
Z. Füredi and J. Komlós. The eigenvalues of random symmetric matrices. Combinatorica, 1(3):233–241, 1981. 7, 41
[Gir90]
V. L. Girko. Theory of random determinants, volume 45 of Mathematics and its
Applications (Soviet Series). Kluwer Academic Publishers Group, Dordrecht, 1990.
Translated from the Russian. 6
[Gui09]
Alice Guionnet. Large random matrices: lectures on macroscopic asymptotics, volume 1957 of Lecture Notes in Mathematics. Springer-Verlag, Berlin, 2009. Lectures
from the 36th Probability Summer School held in Saint-Flour, 2006. 11
[Gus05]
Jonas Gustavsson. Gaussian fluctuations of eigenvalues in the GUE. Ann. Inst. H.
Poincaré Probab. Statist., 41(2):151–178, 2005. 7
[GvN47]
Herman H. Goldstine and John von Neumann. Planning and Coding of Problems
for an Electronic Computing Instrument. Institute for Advanced Study, Princeton,
N. J., 1947. 4
[GvN48a] Herman H. Goldstine and John von Neumann. Planning and Coding of Problems
for an Electronic Computing Instrument. Report on the Mathematical and Logical
Aspects of an Electronic Computing Instrument, Part II, Volume II. The Institute
for Advanced Study, Princeton, N. J., 1948. 4
[GvN48b] Herman H. Goldstine and John von Neumann. Planning and Coding of Problems
for an Electronic Computing Instrument. Report on the Mathematical and Logical
Aspects of an Electronic Computing Instrument, Part II, Volume III. The Institute
for Advanced Study, Princeton, N. J., 1948. 4
[GZ03]
A. Guionnet and B. Zegarlinski. Lectures on logarithmic Sobolev inequalities. In
Séminaire de Probabilités, XXXVI, volume 1801 of Lecture Notes in Math., pages
1–134. Springer, Berlin, 2003. 23
[HT05]    Uffe Haagerup and Steen Thorbjørnsen. A new application of random matrices: Ext(C*_red(F_2)) is not a group. Ann. of Math. (2), 162(2):711–775, 2005. 41
[KKP96]
Alexei M. Khorunzhy, Boris A. Khoruzhenko, and Leonid A. Pastur. Asymptotic
properties of large random matrices with independent entries. J. Math. Phys.,
37(10):5033–5060, 1996. 39, 47
[Led01]
Michel Ledoux. The concentration of measure phenomenon, volume 89 of Mathematical Surveys and Monographs. American Mathematical Society, Providence, RI,
2001. 21, 23, 25
[Meh67]
M. L. Mehta. Random matrices and the statistical theory of energy levels. Academic
Press, New York, 1967. 6
[MP67]
V. A. Marčenko and L. A. Pastur. Distribution of eigenvalues in certain sets of
random matrices. Mat. Sb. (N.S.), 72 (114):507–536, 1967. 6
[Oli10]
Roberto Imbuzeiro Oliveira. Sums of random Hermitian matrices and an inequality
by Rudelson. Electron. Commun. Probab., 15:203–212, 2010. 42
[Rud13]
Mark Rudelson. Lecture notes on non-asymptotic random matrix theory. arXiv:1301.2382, 2013. 8
[RV08]
Mark Rudelson and Roman Vershynin. The Littlewood-Offord problem and invertibility of random matrices. Adv. Math., 218(2):600–633, 2008. 8
[Sch05]
Hanne Schultz. Non-commutative polynomials of independent Gaussian random
matrices. The real and symplectic cases. Probab. Theory Related Fields, 131(2):261–
309, 2005. 41
[Sos02]
Alexander Soshnikov. Gaussian limit for determinantal random point fields. Ann.
Probab., 30(1):171–187, 2002. 7
[ST04]
Daniel A. Spielman and Shang-Hua Teng. Smoothed analysis of algorithms: why
the simplex algorithm usually takes polynomial time. J. ACM, 51(3):385–463 (electronic), 2004. 4
[TV04]
Antonia M. Tulino and Sergio Verdú. Random matrix theory and wireless communications. Foundations and Trends in Communications and Information Theory. Now, 2004. 4
[TV10a]
T. Tao and V. Vu. Random matrices: universality of ESDs and the circular law.
Ann. Probab., 38(5):2023–2065, 2010. With an appendix by Manjunath Krishnapur.
6
[TV10b]
Terence Tao and Van Vu. Random matrices: the distribution of the smallest singular
values. Geom. Funct. Anal., 20(1):260–297, 2010. 8
[TV12]
Terence Tao and Van Vu. Random matrices: The universality phenomenon for Wigner ensembles. arXiv:1202.0068, 2012. 7
[TW94]
Craig A. Tracy and Harold Widom. Fredholm determinants, differential equations
and matrix models. Comm. Math. Phys., 163(1):33–72, 1994. 7
[Ver12]
Roman Vershynin. Introduction to the non-asymptotic analysis of random matrices.
In Compressed sensing, pages 210–268. Cambridge Univ. Press, Cambridge, 2012. 8
[Wig55]
Eugene P. Wigner. Characteristic vectors of bordered matrices with infinite dimensions. Ann. of Math. (2), 62:548–564, 1955. 3
[Wig58]
Eugene P. Wigner. On the distribution of the roots of certain symmetric matrices.
Ann. of Math. (2), 67:325–327, 1958. 3
[Wis28]
John Wishart. The generalized product moment distribution in samples from a
normal multivariate population. Biometrika, 20(A):32–52, 1928. 3