Non-Asymptotic Theory of Random Matrices
Lecture 6: Norm of a Random Matrix
Lecturer: Roman Vershynin
Scribe: Yuting Yang
Tuesday, January 23, 2007
1 Introduction
Let A be an m × n random subgaussian matrix. That is, its entries $X_{ij}$ are i.i.d. mean zero subgaussian random variables; equivalently, they are independent copies of a mean zero subgaussian random variable X.
We know that the operator norm satisfies $\|A\| = s_1(A) = \max_{x \in S^{n-1}} \|Ax\|_2$, and we want to bound this quantity with high probability.
• lower estimates (trivial): $\|A\| \ge \max\big(\max_j \|\text{column}_j\|_2,\ \max_i \|\text{row}_i\|_2\big)$.
In particular, if A is a Bernoulli matrix (i.i.d. ±1 entries), then $\|A\| \ge \max(\sqrt{n}, \sqrt{m})$.
• matching upper estimates: norm of a random matrix ≈ norms of its rows or columns.
– Asymptotic: Consider the model where n → ∞ and m/n → y. Let X be a random variable with $EX^2 = 1$. (X need not be subgaussian in this case.) The best asymptotic result is:
$$\text{if } EX^4 < \infty, \text{ then } \frac{\|A\|}{\sqrt{n}} \to 1 + \sqrt{y} \quad \text{a.s.};$$
$$\text{if } EX^4 = \infty, \text{ then } \frac{\|A\|}{\sqrt{n}} \to \infty \quad \text{a.s.}$$
For more details, see [5], [2], [1], [8].
– Non-asymptotic: We have the following main theorem in this
lecture.
Theorem 1 (Upper Bound). Let A be an m × n subgaussian random matrix. Then $\|A\| \le C(\sqrt{n} + \sqrt{m})$ with probability at least $1 - \exp(-c(m + n))$, where C, c > 0 are constants depending only on the subgaussian moment of the entries.
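As a quick numerical sanity check (my own sketch, not part of the original lecture; it assumes NumPy), we can sample a Bernoulli matrix and compare its operator norm with the trivial lower bound $\max(\sqrt{n}, \sqrt{m})$ and the scale $\sqrt{n} + \sqrt{m}$ of Theorem 1:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 500, 1000

# Bernoulli matrix: i.i.d. +/-1 entries (mean zero, variance 1, subgaussian)
A = rng.choice([-1.0, 1.0], size=(m, n))

s1 = np.linalg.norm(A, 2)  # operator norm = largest singular value s_1(A)
print("||A||               =", s1)
print("max(sqrt n, sqrt m) =", max(np.sqrt(n), np.sqrt(m)))  # trivial lower bound
# Note: the Bai-Yin limit (1 + sqrt(y)) sqrt(n) with y = m/n equals sqrt(n) + sqrt(m).
print("sqrt n + sqrt m     =", np.sqrt(n) + np.sqrt(m))      # scale of Theorem 1
```

Empirically $\|A\|$ lands just below $\sqrt{n} + \sqrt{m}$. Indeed, the Bai-Yin limit $(1 + \sqrt{y})\sqrt{n}$ is exactly $\sqrt{n} + \sqrt{m}$, so the non-asymptotic bound of Theorem 1 captures the correct scale with a constant close to 1 in this case.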
Before we get to its proof, let us first pick up some useful tools.
2 Discretization of the Sphere
In the investigation of $\|A\|$, the difficulty is that the maximum $\max_{x \in S^{n-1}}$ is taken over infinitely many random variables; such a collection is called a random process. There are various tools to bound random processes, from the ε-net method (the simplest) to the majorizing measure method (the hardest). We will work through the ε-net method in this lecture. For the majorizing measure method, see [3].
Idea: In order to reduce to a finite situation, we want first to discretize $S^{n-1}$, or in other words, to replace the sphere with a finite subset. This is possible because the sphere $S^{n-1}$ is compact, and every compact set has a finite ε-net.
Definition 2. An ε-net of a subset K of a Banach space X is a set $\mathcal{N}$ such that $\forall z \in K$, $\exists x \in \mathcal{N}$: $\|z - x\| \le \varepsilon$.
Note: If $\mathcal{N}$ is an ε-net, then every point of K is within distance ε from $\mathcal{N}$, and K is covered by balls of radius ε centered at the points $x \in \mathcal{N}$ (these balls are simply translates of $D := \varepsilon B_X$, where $B_X$ is the unit ball of X).
Definition 3. The minimal cardinality of an ε-net of K is called the covering number of K by ε-balls, denoted by N(K, D). In other words,
$$N(K, D) := \text{minimal number of translates of } D \text{ needed to cover } K.$$
Remark: The covering number gives a quantitative notion of compactness. However, the covering number is hard to compute precisely even in the simplest cases, such as covering the unit disk in $\mathbb{R}^2$ by smaller disks. So we settle for estimates; the good news is that relatively sharp estimates exist.
• lower estimate: $\mathrm{Vol}(K) \le N(K, D) \cdot \mathrm{Vol}(D)$, so $N(K, D) \ge \dfrac{\mathrm{Vol}(K)}{\mathrm{Vol}(D)}$.
• matching upper estimate:
Proposition 4 (Covering Number). $N(K, D) \le \dfrac{\mathrm{Vol}(K + \frac{1}{2}D)}{\mathrm{Vol}(\frac{1}{2}D)}$.
Proof. (Constructive proof: a greedy algorithm that finds an ε-net of K.) We want to locate the centers of the covering balls.
Start with an arbitrary point $x_1 \in K$.
Then choose $x_2 \in K$ with $\|x_2 - x_1\| > \varepsilon$.
Then choose $x_3 \in K$ with $\|x_3 - x_k\| > \varepsilon$ for $k = 1, 2$.
...
Then choose $x_N \in K$ with $\|x_N - x_k\| > \varepsilon$ for $k = 1, \ldots, N - 1$.
Stop when no admissible point is left.
Claim: $\mathcal{N} = \{x_1, \ldots, x_N\}$ is an ε-net.
Suppose not. Then $\exists z \in K$ such that $\|z - x_k\| > \varepsilon$ for all k. But then the algorithm could have chosen z as a further center, which contradicts its stopping criterion.
Now, if we shrink the radius of these balls to ε/2, they become disjoint (recall that each pair $(x_i, x_j)$ is at least ε apart). Thus the ε/2-balls (i.e. translates of $\frac{1}{2}D$) centered at the points of $\mathcal{N}$ are disjoint and contained in $K + \frac{1}{2}D$. Therefore,
$$\mathrm{Vol}\big(K + \tfrac{1}{2}D\big) \ge N \cdot \mathrm{Vol}\big(\tfrac{1}{2}D\big),$$
and thus
$$N \le \frac{\mathrm{Vol}(K + \frac{1}{2}D)}{\mathrm{Vol}(\frac{1}{2}D)}.$$
Remark: An upper bound for $\mathrm{Vol}(K + \frac{1}{2}D)$ was found by Bourgain and Milman [7].
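The greedy construction above translates directly into code. Here is a minimal sketch (function and parameter names are mine; it assumes NumPy): since a computer can only examine finitely many points, we run the algorithm on a dense random sample of $S^{n-1}$, which produces an ε-net of that sample and, for a dense enough sample, approximately one of the sphere:

```python
import numpy as np

def greedy_eps_net(points, eps):
    """Greedy algorithm from the proof of Proposition 4: keep any point
    that is farther than eps from all previously chosen centers."""
    centers = []
    for p in points:
        if all(np.linalg.norm(p - c) > eps for c in centers):
            centers.append(p)
    return np.array(centers)

rng = np.random.default_rng(0)
n, eps = 3, 0.5
sample = rng.standard_normal((2000, n))
sample /= np.linalg.norm(sample, axis=1, keepdims=True)  # project onto S^{n-1}

net = greedy_eps_net(sample, eps)
print(len(net), "centers; volumetric bound (3/eps)^n =", (3 / eps) ** n)
```

The printed net size stays well below the $(3/\varepsilon)^n$ bound of Corollary 6 below.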
Example 5. Let K be the unit ball $B_X$ in $\mathbb{R}^n$. We want to cover the unit ball with smaller balls $D = \varepsilon B_X$. Note that
$$K + \tfrac{1}{2}D = B_X + \tfrac{\varepsilon}{2} B_X = \big(1 + \tfrac{\varepsilon}{2}\big) B_X.$$
Then the covering number satisfies
$$N(K, D) \le \frac{(1 + \frac{\varepsilon}{2})^n}{(\frac{\varepsilon}{2})^n}.$$
Corollary 6. Let ε ∈ (0, 1). In any n-dimensional Banach space, the unit ball has an ε-net of cardinality at most $(\frac{3}{\varepsilon})^n$.
Remark: We can follow the same procedure to formulate the analogous statement for the unit sphere $S^{n-1}$.
Proposition 7 (Discretization of $\|A\|$). Let $\mathcal{N}$ be a $\frac{1}{2}$-net of $S^{n-1}$. Then $\|A\| \le 2 \max_{x \in \mathcal{N}} \|Ax\|_2$. (For simplicity, here we let $\varepsilon = \frac{1}{2}$.)
Proof. Note that every $z \in S^{n-1}$ can be written as $z = x + u$, where $x \in \mathcal{N}$ and $\|u\|_2 \le \varepsilon = \frac{1}{2}$. Then, by the triangle inequality,
$$\|A\| \le \max_{x \in \mathcal{N}} \|Ax\|_2 + \max_{u : \|u\|_2 \le \frac{1}{2}} \|Au\|_2.$$
But $\max_{u : \|u\|_2 \le \frac{1}{2}} \|Au\|_2 = \frac{1}{2}\|A\|$ by definition. So we have
$$\|A\| \le \max_{x \in \mathcal{N}} \|Ax\|_2 + \frac{1}{2}\|A\|,$$
and therefore,
$$\|A\| \le 2 \max_{x \in \mathcal{N}} \|Ax\|_2.$$
Note (Exercise): Further, we can discretize the Euclidean norm as well:
$$\|Ax\|_2 = \max_{y \in S^{m-1}} \langle Ax, y \rangle \le 2 \max_{y \in \mathcal{M}} \langle Ax, y \rangle,$$
where $\mathcal{M}$ is a $\frac{1}{2}$-net of $S^{m-1}$.
Corollary 8. Let $\mathcal{N}$, $\mathcal{M}$ be $\frac{1}{2}$-nets of $S^{n-1}$, $S^{m-1}$, respectively. Then
$$\|A\| \le 4 \max_{x \in \mathcal{N},\, y \in \mathcal{M}} \langle Ax, y \rangle.$$
Note: By Corollary 6 with $\varepsilon = \frac{1}{2}$, we can take $|\mathcal{N}| \le 6^n$ and $|\mathcal{M}| \le 6^m$.
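As a numerical illustration of Proposition 7 (again my own sketch, assuming NumPy), we can build an approximate $\frac{1}{2}$-net of $S^{n-1}$ from a dense sample, as in the greedy sketch above, and check that $2 \max_{x \in \mathcal{N}} \|Ax\|_2$ indeed dominates $\|A\|$:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, eps = 5, 3, 0.5

# Greedily extract an (approximate) 1/2-net of S^{n-1} from a dense sample.
pts = rng.standard_normal((5000, n))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)
net = []
for p in pts:
    if all(np.linalg.norm(p - c) > eps for c in net):
        net.append(p)

A = rng.standard_normal((m, n))    # Gaussian entries: mean zero, subgaussian
true_norm = np.linalg.norm(A, 2)   # s_1(A), computed via SVD
net_bound = 2 * max(np.linalg.norm(A @ x) for x in net)
print(true_norm, "<=", net_bound)  # Proposition 7: ||A|| <= 2 max_{x in N} ||Ax||_2
```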
3 Proof of the Main Theorem
Now we are ready to prove the main theorem, which gives an upper estimate of $\|A\|$.
Proof. Let t > 0. By Corollary 8, we have
$$P(\|A\| > t) \le P\Big( \max_{x \in \mathcal{N},\, y \in \mathcal{M}} \langle Ax, y \rangle > \frac{t}{4} \Big).$$
Note that if $\max_{x \in \mathcal{N}, y \in \mathcal{M}} \langle Ax, y \rangle > \frac{t}{4}$, then $\exists x \in \mathcal{N}$ and $y \in \mathcal{M}$ such that $\langle Ax, y \rangle > \frac{t}{4}$. In other words,
$$\Big\{ \max_{x \in \mathcal{N},\, y \in \mathcal{M}} \langle Ax, y \rangle > \frac{t}{4} \Big\} \subseteq \bigcup_{x \in \mathcal{N},\, y \in \mathcal{M}} \Big\{ \langle Ax, y \rangle > \frac{t}{4} \Big\}.$$
Hence,
$$P\Big( \max_{x \in \mathcal{N},\, y \in \mathcal{M}} \langle Ax, y \rangle > \frac{t}{4} \Big) \le \sum_{x \in \mathcal{N},\, y \in \mathcal{M}} P\Big( \langle Ax, y \rangle > \frac{t}{4} \Big).$$
Now fix any $x \in \mathcal{N}$, $y \in \mathcal{M}$, and note that $\langle Ax, y \rangle$ is a random variable.
Claim: $\langle Ax, y \rangle$ is a subgaussian random variable.
Let $x = (x_1, \ldots, x_n) \in \mathbb{R}^n$, and let $X_{ij}$ denote the ij-th entry of the random matrix A. Then the coordinates of Ax are
$$(Ax)_i = \sum_{j=1}^{n} X_{ij} x_j, \qquad i = 1, \ldots, m.$$
Notice that $x \in S^{n-1}$, so $\sum_{j=1}^{n} x_j^2 = 1$. Also, the $X_{ij}$ are independent mean zero subgaussian random variables. Then, by Theorem 5 from the last lecture, the coordinates of Ax are independent mean zero subgaussian random variables. Next,
$$\langle Ax, y \rangle = \sum_{i=1}^{m} (Ax)_i\, y_i,$$
where $\sum_{i=1}^{m} y_i^2 = 1$. Using the same theorem again, we see that $\langle Ax, y \rangle$ is subgaussian.
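A hedged empirical check of the claim (my own sketch, assuming NumPy): fixing unit vectors x and y and resampling a Bernoulli matrix A many times, the tails of $\langle Ax, y \rangle$ decay at a gaussian rate, here compared against $e^{-t^2/2}$ (note $\langle Ax, y \rangle$ has variance 1):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, trials = 20, 30, 10_000

# Fixed unit vectors x in S^{n-1} and y in S^{m-1}
x = rng.standard_normal(n); x /= np.linalg.norm(x)
y = rng.standard_normal(m); y /= np.linalg.norm(y)

# Resample Z = <Ax, y> over independent Bernoulli matrices A;
# Var(Z) = sum_{ij} x_j^2 y_i^2 = 1
Z = np.array([y @ (rng.choice([-1.0, 1.0], size=(m, n)) @ x) for _ in range(trials)])

for t in (1.0, 2.0, 3.0):
    print(t, (Z > t).mean(), np.exp(-t**2 / 2))  # empirical tail vs e^{-t^2/2}
```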
Now, knowing that $\langle Ax, y \rangle$ is subgaussian, we have, by definition,
$$P\Big( \langle Ax, y \rangle > \frac{t}{4} \Big) \le C \exp(-c t^2).$$
Then
$$\sum_{x \in \mathcal{N},\, y \in \mathcal{M}} P\Big( \langle Ax, y \rangle > \frac{t}{4} \Big) \le |\mathcal{N}| \cdot |\mathcal{M}| \cdot C \exp(-c t^2) \le 6^{m+n} \cdot C \exp(-c t^2).$$
Thus, with $t = C'(\sqrt{n} + \sqrt{m})$ for a sufficiently large constant C', we have $t^2 \ge C'^2 (m + n)$, so the right-hand side is at most $C \exp\big((m + n)(\ln 6 - c C'^2)\big) \le \exp(-c'(m + n))$, and we obtain the desired result.
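To see the high-probability statement in action, here is a small Monte Carlo sketch (my own construction, assuming NumPy; the choice C = 1 is empirical, not from the proof): over many independent Bernoulli matrices, the event $\|A\| > \sqrt{n} + \sqrt{m}$ essentially never occurs, consistent with a failure probability that is exponentially small in m + n:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, trials = 100, 50, 2000
threshold = np.sqrt(n) + np.sqrt(m)  # Theorem 1 scale, with C = 1 (empirical choice)

exceed = sum(
    np.linalg.norm(rng.choice([-1.0, 1.0], size=(m, n)), 2) > threshold
    for _ in range(trials)
)
print("empirical P(||A|| > sqrt(n) + sqrt(m)) ~", exceed / trials)  # expect ~0
```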
Remarks:
1. This proof is due to Litvak, Pajor, Rudelson, and Tomczak-Jaegermann [6].
2. The entries of A must essentially be subgaussian for this result to hold (slight fluctuations are allowed).
3. Open problem (fluctuations): bound $P\big( \big| \|A\| - E\|A\| \big| > t \big) \le\ ?$
• The best known result is due to Meckes [4].
• If A is gaussian, then $E\|A\| \le \sqrt{n} + \sqrt{m}$.
References
[1] A. Soshnikov. Gaussian limit for determinantal random point fields. Annals of Probability, 30:171, 2002.
[2] J. W. Silverstein. On the weak limit of the largest eigenvalue of a large dimensional sample covariance matrix. Journal of Multivariate Analysis, 30:1, 1989.
[3] M. Talagrand. Majorizing measures: the generic chaining. Annals of Probability, 24:1049, 1996.
[4] M. W. Meckes. Concentration of norms and eigenvalues of random matrices. Journal of Functional Analysis, 211:508, 2004.
[5] S. Geman. A limit theorem for the norm of random matrices. Annals of Probability, 8:252, 1980.
[6] A. Litvak, A. Pajor, M. Rudelson, N. Tomczak-Jaegermann. Smallest singular values of random matrices and geometry of random polytopes. Advances in Mathematics, 195:491, 2005.
[7] J. Bourgain, V. D. Milman. New volume ratio properties for convex symmetric bodies in $\mathbb{R}^n$. Inventiones Mathematicae, 88:319, 1987.
[8] Z. D. Bai, J. W. Silverstein, Y. Q. Yin. A note on the largest eigenvalue of a large dimensional sample covariance matrix. Journal of Multivariate Analysis, 26:166, 1988.