Random matrices with independent rows or columns
Nicole Tomczak-Jaegermann
Phenomena in High Dimensions
in geometric analysis, random matrices,
and computational geometry
Roscoff, June 25–29, 2012
Project on random matrices with independent rows or columns, and on norms and condition numbers of their submatrices.
Involved in this project:
Alain Pajor and various subsets of:
Radosław Adamczak,
Olivier Guédon,
Rafał Latała,
Alexander Litvak,
Krzysztof Oleszkiewicz,
Nicole Tomczak-Jaegermann
Basic definitions and Notation
Let $X = (X(1), \dots, X(N))$ be a random vector in $\mathbb{R}^N$ with full-dimensional support.
We say that the distribution of X is
logarithmically concave, if X has density of the form $e^{-h(x)}$ with $h : \mathbb{R}^N \to (-\infty, \infty]$ convex (one of several equivalent definitions, by C. Borell);
isotropic, if $\mathbb{E}X(i) = 0$ and $\mathbb{E}X(i)X(j) = \delta_{i,j}$.
For $x \in \mathbb{R}^N$ we put
$$|x| = \|x\|_2 = \Big(\sum_{i=1}^{N} x_i^2\Big)^{1/2}.$$
$P_I x$ – the canonical projection of $x$ onto $\{y \in \mathbb{R}^N : \operatorname{supp}(y) \subset I\}$, for $I \subset \{1, \dots, N\}$.
For integers $k \le \ell$ we use the shorthand notation $[k, \ell] = \{k, \dots, \ell\}$.
Examples
1. Let $K \subset \mathbb{R}^n$ be a convex body (= compact convex set with non-empty interior; symmetric means $-K = K$).
Let X be a random vector uniformly distributed in K. Then the corresponding probability measure on $\mathbb{R}^n$,
$$\mu_K(A) = \frac{|K \cap A|}{|K|},$$
is log-concave (by Brunn–Minkowski).
Moreover, for every convex body K there exists an affine map T such that µT K is
isotropic.
2. The Gaussian vector $G = (g_1, \dots, g_n)$, where the $g_i$'s have $N(0,1)$ distribution, is isotropic and log-concave.
3. Similarly, the vector $X = (\xi_1, \dots, \xi_n)$, where the $\xi_i$'s have a symmetric exponential distribution (i.e., with density $f(t) = \frac{1}{\sqrt{2}} \exp(-\sqrt{2}\,|t|)$ for $t \in \mathbb{R}$), is isotropic and log-concave.
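A quick numerical check of Examples 2 and 3 (a minimal sketch, not part of the talk; it assumes NumPy, and uses the Laplace distribution with scale $1/\sqrt{2}$, which matches the density above):

```python
import numpy as np

rng = np.random.default_rng(0)
n, samples = 5, 200_000

# Example 2: standard Gaussian vector -- isotropic and log-concave.
G = rng.standard_normal((samples, n))

# Example 3: coordinates with the symmetric exponential density
# f(t) = (1/sqrt(2)) exp(-sqrt(2)|t|); mean 0, variance 1.
X = rng.laplace(scale=1 / np.sqrt(2), size=(samples, n))

for name, V in [("Gaussian", G), ("exponential", X)]:
    cov = V.T @ V / samples                     # empirical E X(i)X(j)
    print(name, np.abs(cov - np.eye(n)).max())  # ~0 up to sampling error
```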
Random Matrices
Let $n, N \ge 1$ be integers (a priori, no relation between them); fixed throughout.
Our interest is in the behaviour of invariants as functions of n and N.
Random matrix: A is an $n \times N$ matrix, defined either by a sequence of rows or by a sequence of columns, which will be independent random vectors:
$$A = \begin{pmatrix} \text{---} & \text{row}_1 & \text{---} \\ & \vdots & \\ \text{---} & \text{row}_n & \text{---} \end{pmatrix} \qquad\text{or}\qquad A = \big(\,\mathrm{col}_1 \;\big|\; \mathrm{col}_2 \;\big|\; \cdots \;\big|\; \mathrm{col}_N\,\big).$$
This differs from classical RMT, where the entries are independent and one studies the limiting behaviour of invariants as the size tends to infinity.
Random Matrices, Norms of submatrices: $A_{k,m}$
Let $k \le n$ and $m \le N$ be integers.
$A_{k,m}$ = the maximal operator norm over submatrices of A with k rows and m columns;
"operator norm" means the norm of $A : \mathbb{R}^m \to \mathbb{R}^k$ with the Euclidean norms.
Example: Let $X_1, \dots, X_n \in \mathbb{R}^N$ be independent random vectors.
Let A be the $n \times N$ random matrix with rows $X_1, \dots, X_n$. It acts as an operator $A : \mathbb{R}^N \to \mathbb{R}^n$,
$$Ax = \big(\langle X_j, x\rangle\big)_{j=1}^{n} \in \mathbb{R}^n, \qquad \text{for } x \in \mathbb{R}^N.$$
$$A_{k,m} = \sup_{\substack{J \subset [1,n] \\ |J|=k}} \; \sup_{x \in U_m} \Big(\sum_{j \in J} |\langle X_j, x\rangle|^2\Big)^{1/2},$$
where $U_m = \{x \in S^{N-1} : |\operatorname{supp} x| \le m\}$.
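For small sizes, $A_{k,m}$ can be computed by brute force: for a fixed pair of index sets $(J, I)$, the supremum over $x \in U_m$ with support in I is exactly the spectral norm of the $(J, I)$ submatrix. A minimal sketch (not from the talk; assumes NumPy):

```python
import numpy as np
from itertools import combinations

def A_km(A, k, m):
    """Brute-force A_{k,m}: maximal spectral norm over all
    k-row, m-column submatrices of A (exponential cost -- toy sizes only)."""
    n, N = A.shape
    return max(
        np.linalg.norm(A[np.ix_(J, I)], 2)      # largest singular value
        for J in combinations(range(n), k)
        for I in combinations(range(N), m)
    )

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 10))   # Gaussian rows: isotropic log-concave
print(A_km(A, k=3, m=2))
```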
$A_{k,m}$ – examples: independent columns
I: A is an $n \times N$ matrix defined by independent isotropic log-concave columns:
$$A = \big(\,X_1 \;\big|\; X_2 \;\big|\; \cdots \;\big|\; X_N\,\big), \qquad X_i \in \mathbb{R}^n \ \text{independent columns}.$$
Main application: approximation of a covariance matrix by empirical covariance matrices.
$A_{n,m}$ (i.e., $k = n$) was sufficient. It corresponds to submatrices made of full columns, thus preserving the structure of the matrix.
Approximation of a covariance matrix
Let $X \in \mathbb{R}^n$ be isotropic and log-concave, and let $(X_i)_{i \le N}$ be independent copies of X.
By isotropy, $\mathbb{E}\,X \otimes X = \mathrm{Id}$.
By the law of large numbers, the empirical covariance matrix converges to the identity:
$$\frac{1}{N}\sum_{i=1}^{N} X_i \otimes X_i \longrightarrow \mathrm{Id} \qquad \text{as } N \to \infty.$$
Kannan–Lovász–Simonovits asked (around 1995), motivated by a problem of complexity in computing volume in high dimension:
Under the above assumptions, estimate the size N for which, given $\varepsilon \in (0,1)$,
$$\Big\|\frac{1}{N}\sum_{i=1}^{N} X_i \otimes X_i - \mathrm{Id}\Big\| \le \varepsilon$$
holds with high probability.
Typical “translation” of a limit law into a quantitative statement in the non-limit
theory.
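A quick simulation of this quantitative question (a minimal sketch, not from the talk; assumes NumPy, with Gaussian vectors standing in for general isotropic log-concave ones). The operator-norm deviation decays like $\sqrt{n/N}$, in line with the ALPT result quoted below:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
for N in [200, 800, 3200]:
    X = rng.standard_normal((N, n))            # N independent isotropic vectors
    emp = X.T @ X / N                          # empirical covariance (1/N) sum X_i (x) X_i
    dev = np.linalg.norm(emp - np.eye(n), 2)   # operator-norm deviation from Id
    print(f"N={N}: deviation={dev:.3f}, sqrt(n/N)={np.sqrt(n / N):.3f}")
```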
KLS question
KLS showed that for any $\varepsilon, \delta \in (0,1)$ (under a finite third moment assumption), $N \ge (C/\varepsilon\delta)\,n^2$ gives the required approximation, with probability $1 - \delta$.
Bourgain (1996): for any $\varepsilon, \delta \in (0,1)$, there exists $C(\varepsilon, \delta) > 0$ such that $N = C(\varepsilon, \delta)\,n\log^3 n$ gives the approximation with probability $1 - \delta$.
Rudelson:
using non-commutative Khinchine inequalities of Pisier and Lust-Piquard/Pisier;
by the majorizing measure approach of Talagrand.
Several other authors improved the powers of the logarithm, from the late 1990s to 2010.
ALPT: N proportional to n is sufficient (JAMS 2010); improved in CRAS 2011.
Let $X \in \mathbb{R}^n$ be isotropic log-concave, and let $X_1, \dots, X_N$ be independent copies of X. Then
$$\mathbb{P}\Big(\Big\|\frac{1}{N}\sum_{i=1}^{N} X_i \otimes X_i - \mathrm{Id}\Big\| \le C\sqrt{n/N}\Big) \ge 1 - e^{-c\sqrt{n}}.$$
So letting $\varepsilon = C\sqrt{n/N}$ we get $N = Cn/\varepsilon^2$.
Extremal s-numbers of matrices with independent rows
As a corollary of ALPT we get a quantitative version of the Bai–Yin theorem for matrices of a fixed size: with probability at least $1 - e^{-c\sqrt{n}}$, all singular values s of the $n \times N$ matrix A with independent isotropic log-concave rows satisfy $\sqrt{N} - C\sqrt{n} \le s \le \sqrt{N} + C\sqrt{n}$.
$A_{k,m}$ – examples: independent rows
II: A is an $n \times N$ matrix, defined by independent (isotropic log-concave) rows:
$$A = \begin{pmatrix} \text{---} & X_1 & \text{---} \\ & \vdots & \\ \text{---} & X_n & \text{---} \end{pmatrix}$$
$A_{k,m}$ is harder to tackle if $m < N$, because passing to fewer than N columns destroys the row structure of A.
$$A_{k,m} = \sup_{\substack{J \subset [1,n] \\ |J|=k}} \; \sup_{x \in U_m} \Big(\sum_{j \in J} |\langle X_j, x\rangle|^2\Big)^{1/2}.$$
Applicable to the study of reconstruction problems, in particular RIP, and to uniform versions of some geometric questions on large deviation estimates.
Large Deviation for $A_{k,m}$
Intuition: let A be a random matrix with independent isotropic log-concave rows. Then for a submatrix $A_{J,I}$ with k rows and m columns,
$$\big(\mathbb{E}\|A_{J,I}\|^2\big)^{1/2} \ge \sqrt{\max\{k, m\}}.$$
One of the main results of ALLPT is a large deviation theorem for $A_{k,m}$.
Let $n, N$ be given, $k \le n$ and $m \le N$, and let A be an $n \times N$ matrix with independent isotropic log-concave rows. For $t \ge 1$ we have
$$\mathbb{P}\big(A_{k,m} > C\,t\,\lambda\big) \le \exp\big(-t\lambda/\sqrt{\log(3m)}\big),$$
where
$$\lambda = \sqrt{\log\log(3m)}\,\sqrt{m\,\log\frac{e\max\{N,n\}}{m}} + \sqrt{k\,\log\frac{en}{k}}$$
and C is a universal constant.
The bound is essentially optimal, up to the $\sqrt{\log\log m}$ factor.
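Since λ is explicit, the tail bound can be tabulated directly (a minimal sketch, not from the talk; assumes NumPy and ignores the universal constants C, c):

```python
import numpy as np

def lam(n, N, k, m):
    """lambda from the large deviation theorem above (constants omitted)."""
    return (np.sqrt(np.log(np.log(3 * m))) * np.sqrt(m * np.log(np.e * max(N, n) / m))
            + np.sqrt(k * np.log(np.e * n / k)))

n, N, k, m, t = 100, 1000, 20, 50, 2.0
l = lam(n, N, k, m)
print(f"threshold ~ t*lambda = {t * l:.1f}, "
      f"tail ~ exp(-t*lambda/sqrt(log 3m)) = {np.exp(-t * l / np.sqrt(np.log(3 * m))):.2e}")
```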
Paouris’ large deviation theorem
Paouris' large deviation (2005):
There exists $c > 0$ such that if X is an isotropic log-concave random vector in $\mathbb{R}^N$, then for all $t \ge 1$,
$$\mathbb{P}\big(|X| \ge c\,t\sqrt{N}\big) \le \exp\big(-t\sqrt{N}\big).$$
There are equivalent formulations, for p-th moments, etc.
Weak parameter
For a vector X in $\mathbb{R}^N$ we define
$$\sigma_X(p) := \sup_{t \in S^{N-1}} \big(\mathbb{E}|\langle t, X\rangle|^p\big)^{1/p}, \qquad p \ge 1.$$
Examples
For isotropic log-concave vectors X, $\sigma_X(p) \le p/\sqrt{2}$.
For subgaussian vectors X, $\sigma_X(p) \le C\sqrt{p}$.
Paouris’ theorem with weak parameter
For any log-concave random vector X,
$$\big(\mathbb{E}|X|^p\big)^{1/p} \le C\big(\mathbb{E}|X| + \sigma_X(p)\big) \qquad \text{for } p \ge 2,$$
and if X is isotropic,
$$\mathbb{P}\big(|X| \ge t\big) \le \exp\Big(-\sigma_X^{-1}\Big(\frac{t}{C}\Big)\Big) \qquad \text{for } t \ge C\big(\mathbb{E}|X|^2\big)^{1/2}.$$
Uniform large deviation theorem
Uniform Paouris-type theorem [ALLPT]:
For $1 \le m \le N$ and an isotropic log-concave vector X in $\mathbb{R}^N$ we have, for $t \ge 1$,
$$\mathbb{P}\Big(\sup_{\substack{I \subset [1,N] \\ |I|=m}} |P_I X| \ge c\,t\sqrt{m\,\log\frac{eN}{m}}\Big) \le \exp\Big(-\sigma_X^{-1}\Big(\frac{t\sqrt{m\,\log(eN/m)}}{\sqrt{\log(em)}}\Big)\Big).$$
If X is isotropic log-concave in $\mathbb{R}^N$, then so is $P_I X$, for every $I \subset [1,N]$. However, the probability is too high to beat the complexity of the family of subsets (which is $\binom{N}{m}$). So a direct union bound argument cannot be used.
The trade-off is an extra logarithm in the threshold; the proof is based on new non-trivial estimates for order statistics.
Order Statistics
For an N-dimensional random vector X, by $X_1^* \ge X_2^* \ge \dots \ge X_N^*$ we denote the nonincreasing rearrangement of $|X(1)|, \dots, |X(N)|$.
In particular, $X_1^* = \max\{|X(1)|, \dots, |X(N)|\}$ and $X_N^* = \min\{|X(1)|, \dots, |X(N)|\}$.
The random variables $X_k^*$, $1 \le k \le N$, are called the order statistics of X.
Problem: Find an upper bound for $\mathbb{P}(X_k^* \ge t)$.
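Order statistics are easy to simulate, so tail bounds of this kind can be sanity-checked by Monte Carlo (a minimal sketch, not from the talk; assumes NumPy, with a Gaussian vector as the test case):

```python
import numpy as np

def order_stats(x):
    """Nonincreasing rearrangement of |x(1)|, ..., |x(N)|."""
    return np.sort(np.abs(x))[::-1]

rng = np.random.default_rng(3)
N, k, t, trials = 1000, 10, 2.5, 20_000
hits = sum(order_stats(rng.standard_normal(N))[k - 1] >= t
           for _ in range(trials))
print(hits / trials)   # Monte Carlo estimate of P(X*_k >= t)
```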
Order Statistics for isotropic log-concave vectors
Let X be an N-dimensional isotropic log-concave vector. Then
$$\mathbb{P}\big(X_k^* \ge t\big) \le \exp\Big(-\frac{1}{C}\,\sigma_X^{-1}\big(t\sqrt{k}\big)\Big) \qquad \text{for } t \ge C\log\frac{eN}{k}.$$
The weak parameter is needed for better control of probabilities for random vectors which are sums of independent random vectors, in terms of the sequences of coefficients in these sums.
Latała (2010) proved a version without the weak parameter.
ALLPT (2012) proved the present version.
The approach is based on a suitable estimate of the moments of the process $N_X(t)$,
$$N_X(t) := \sum_{i=1}^{N} \mathbf{1}_{\{X(i) \ge t\}}, \qquad t > 0.$$
That is, $N_X(t)$ is equal to the number of coordinates of X larger than or equal to t.
Estimate for $N_X$
For any isotropic log-concave vector X and $p \ge 1$ we have
$$\mathbb{E}\big(t^2 N_X(t)\big)^p \le \big(C\,\sigma_X(p)\big)^{2p} \qquad \text{for } t \ge C\log\frac{Nt^2}{\sigma_X^2(p)}.$$
To get the estimate for order statistics we observe that $X_k^* \ge t$ implies that $N_X(t) \ge k/2$ or $N_{-X}(t) \ge k/2$, and the vector $-X$ is also isotropic and log-concave.
The estimate for $N_X$ and Chebyshev's inequality give
$$\mathbb{P}\big(X_k^* \ge t\big) \le \Big(\frac{2}{k}\Big)^{p}\big(\mathbb{E}N_X(t)^p + \mathbb{E}N_{-X}(t)^p\big) \le 2\Big(\frac{Cp}{t\sqrt{k}}\Big)^{2p},$$
provided that $t \ge C\log(Nt^2/p^2)$. We take $p = \frac{1}{eC}\,t\sqrt{k}$ and notice that the restriction on t follows from the assumption that $t \ge C\log(eN/k)$.
Estimate for $N_X(t)$
The proof of the estimate for $N_X(t)$ is based on two ideas:
the restriction of a log-concave vector X to a convex set is log-concave;
Paouris' large deviation theorem.
Uniform Paouris-type estimate
For any $m \le N$ and any isotropic log-concave vector X in $\mathbb{R}^N$ we have, for $t \ge 1$,
$$\mathbb{P}\Big(\sup_{\substack{I \subset [1,N] \\ |I|=m}} |P_I X| \ge c\,t\sqrt{m\,\log\frac{eN}{m}}\Big) \le \exp\Big(-\sigma_X^{-1}\Big(\frac{t\sqrt{m\,\log(eN/m)}}{\sqrt{\log(em)}}\Big)\Big).$$
Idea of the proof (it is easy):
$$\sup_{\substack{I \subset [1,N] \\ |I|=m}} |P_I X| = \Big(\sum_{k=1}^{m} |X_k^*|^2\Big)^{1/2} \le 2\,\Big(\sum_{i=0}^{s-1} 2^i\,|X_{2^i}^*|^2\Big)^{1/2},$$
where $s = \lceil \log_2 m \rceil$.
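The first equality is easily verified numerically, since the best support I of size m simply picks the m largest coordinates in absolute value (a minimal check, not from the talk; assumes NumPy):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(4)
N, m = 8, 3
X = rng.standard_normal(N)

# brute force over all supports I of cardinality m
brute = max(np.linalg.norm(X[list(I)]) for I in combinations(range(N), m))
# l2 norm of the top-m order statistics
top_m = np.sort(np.abs(X))[::-1][:m]
print(brute, np.linalg.norm(top_m))   # the two values coincide
```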
Applications – reconstruction, compressed sensing
Let $n, N \ge 1$. Let $T \subset \mathbb{R}^N$ and let Γ be an $n \times N$ matrix.
Consider any vector $x \in T$. Assuming that $\Gamma x$ is known, the problem is to reconstruct x with a fast algorithm.
This requires hypotheses on T and on Γ. The common hypothesis on T is that $T = U_m$.
The Restricted Isometry Property (RIP) of order m: for all m-sparse vectors x,
$$(1 - \delta)\,|x| \le |\Gamma x| \le (1 + \delta)\,|x|.$$
The RIP parameter:
$$\delta_m = \delta_m(\Gamma) = \sup_{x \in U_m} \big|\,|\Gamma x|^2 - \mathbb{E}|\Gamma x|^2\,\big|.$$
Introduced by E. Candès, J. Romberg and T. Tao around 2006.
If $\delta_{2m}$ is appropriately small, then every m-sparse vector x can be reconstructed from $\Gamma x$ by the $\ell_1$-minimization method.
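The $\ell_1$-minimization (basis pursuit) step is a linear program. A minimal sketch of the reconstruction (not from the talk; assumes NumPy and SciPy, with a Gaussian measurement matrix as a stand-in):

```python
import numpy as np
from scipy.optimize import linprog

def l1_reconstruct(Gamma, y):
    """Basis pursuit: min ||x||_1 s.t. Gamma x = y,
    via the standard LP split x = u - v with u, v >= 0."""
    n, N = Gamma.shape
    c = np.ones(2 * N)                    # objective: sum(u) + sum(v) = ||x||_1
    A_eq = np.hstack([Gamma, -Gamma])     # Gamma (u - v) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * N))
    return res.x[:N] - res.x[N:]

rng = np.random.default_rng(5)
n, N, m = 40, 100, 5
Gamma = rng.standard_normal((n, N)) / np.sqrt(n)              # normalized rows
x = np.zeros(N)
x[rng.choice(N, m, replace=False)] = rng.standard_normal(m)   # m-sparse signal
x_hat = l1_reconstruct(Gamma, Gamma @ x)
print(np.linalg.norm(x - x_hat))   # ~0: exact recovery when RIP holds
```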
More notation
We want upper estimates for
$$\delta_m = \delta_m(\Gamma) = \sup_{x \in U_m} \big|\,|\Gamma x|^2 - \mathbb{E}|\Gamma x|^2\,\big|.$$
More generally, for any $T \subset S^{N-1}$,
$$\delta_T(\Gamma) = \sup_{x \in T} \big|\,|\Gamma x|^2 - \mathbb{E}|\Gamma x|^2\,\big|.$$
Let $X_1, \dots, X_n \in \mathbb{R}^N$ be independent, and let Γ be the $n \times N$ matrix with rows $X_i$. (In reconstruction problems we look for vectors given by their measurements.)
Let $1 \le k \le n$ and define the parameter $\Gamma_k(T)$ by
$$\Gamma_k(T)^2 = \sup_{y \in T}\; \sup_{\substack{I \subset \{1,\dots,n\} \\ |I|=k}} \sum_{i \in I} |\langle X_i, y\rangle|^2.$$
We write $\Gamma_{k,m} = \Gamma_k(U_m)$.
This agrees with the definition of $A_{k,m}$ introduced earlier.
Fundamental Lemma:
[ALPT, CRAS], [ALLPT]:
Let $X_1, \dots, X_n \in \mathbb{R}^N$ be independent isotropic, and let $T \subset S^{N-1}$ be finite. Let $0 < \theta < 1$ and $B \ge 1$. Then with probability at least $1 - |T|\exp\big(-3\theta^2 n/8B^2\big)$,
$$\delta_T\Big(\frac{\Gamma}{\sqrt{n}}\Big) = \sup_{y \in T}\Big|\frac{1}{n}\sum_{i=1}^{n}\big(|\langle X_i, y\rangle|^2 - \mathbb{E}|\langle X_i, y\rangle|^2\big)\Big|$$
$$\le \theta + \frac{1}{n}\Big(\sup_{y \in T}\sum_{i=1}^{n}|\langle X_i, y\rangle|^2\,\mathbf{1}_{\{|\langle X_i, y\rangle| > B\}} + \sup_{y \in T}\mathbb{E}\sum_{i=1}^{n}|\langle X_i, y\rangle|^2\,\mathbf{1}_{\{|\langle X_i, y\rangle| > B\}}\Big)$$
$$\le \theta + \frac{1}{n}\big(\Gamma_k(T)^2 + \mathbb{E}\Gamma_k(T)^2\big),$$
where $k \le n$ is the largest integer satisfying $k \le (\Gamma_k(T)/B)^2$.
Corollary for RIP:
Let $X_i$, Γ, $0 < \theta < 1$ and $B \ge 1$ be as before. Assume that $m \le N$ satisfies
$$m\,\log\frac{11eN}{m} \le \frac{3\theta^2 n}{16B^2}.$$
Then with probability at least $1 - \exp\big(-3\theta^2 n/16B^2\big)$ one has
$$\delta_m\Big(\frac{\Gamma}{\sqrt{n}}\Big) = \sup_{y \in U_m}\Big|\frac{1}{n}\sum_{i=1}^{n}\big(|\langle X_i, y\rangle|^2 - \mathbb{E}|\langle X_i, y\rangle|^2\big)\Big| \le 2\theta + \frac{2}{n}\big(\Gamma_{k,m}^2 + \mathbb{E}\Gamma_{k,m}^2\big),$$
where $k \le n$ is the largest integer satisfying $k \le (\Gamma_{k,m}/B)^2$.
RIP Theorem for matrices with independent rows:
Let $n, N \ge 1$ and $0 < \theta < 1$. Let Γ be an $n \times N$ matrix whose rows are independent isotropic log-concave random vectors $X_i$, $i \le n$.
There exists an absolute constant $c > 0$ such that if $m \le N$ satisfies
$$m\,\log\log(3m)\,\Big(\log\frac{3\max\{N, n\}}{m}\Big)^2 \le c\,\frac{\theta^2}{\log(3/\theta)}\,n,$$
then
$$\delta_m(\Gamma/\sqrt{n}) \le \theta$$
with high probability.
Optimal up to a log log factor.
For unconditional distributions we know that this factor can be removed; we conjecture that in general it can be removed as well.
Return to Large Deviation for $A_{k,m}$
Recall the result for $A_{k,m}$. For $n \le N$, $k \le n$, $m \le N$,
$$A_{k,m} = \sup_{\substack{J \subset [1,n] \\ |J|=k}} \; \sup_{x \in U_m} \Big(\sum_{j \in J} |\langle X_j, x\rangle|^2\Big)^{1/2}.$$
Then for $t \ge 1$ we have
$$\mathbb{P}\big(A_{k,m} > C\,t\,\lambda\big) \le \exp\big(-t\lambda/\sqrt{\log(3m)}\big),$$
where
$$\lambda = \sqrt{\log\log(3m)}\,\sqrt{m\,\log(eN/m)} + \sqrt{k\,\log(en/k)}.$$
$A_{k,m}$, idea of proof
To bound $A_{k,m}$ one then has to prove uniformity with respect to two families of different character:
one being $\{J \subset [1,n] : |J| = k\}$; the other equal to $U_m(\mathbb{R}^N)$.
Let $X_1, \dots, X_n$ be independent isotropic N-dimensional log-concave vectors, and let $x = (x_i) \in \mathbb{R}^n$ satisfy some structural assumptions, like sparsity. We consider
$$Y = \sum_{i=1}^{n} x_i X_i.$$
By duality we need to estimate the probability
$$\mathbb{P}\Big(\sup_{\substack{J \subset [1,N] \\ |J|=m}} \Big|P_J \sum_{i=1}^{n} x_i X_i\Big| > t\Big) = \mathbb{P}\Big(\sup_{\substack{J \subset [1,N] \\ |J|=m}} |P_J Y| > t\Big)$$
for every $t > 0$, in terms of the norms $|x|$ and $\|x\|_\infty$.
The complexity of these families is too high for a union bound argument, so we need to come up with some chaining.
$A_{k,m}$, continuation
This leads us to distinguish two cases, depending on the relation between k and k′, where
$$k' = \inf\{\ell \ge 1 : m\,\log(eN/m) \le \ell\,\log(en/\ell)\}.$$
Step 1: when $k > k'$, we reduce to the case $k \le k'$.
Step 2: the case $k \le k'$.
To build intuition we may take $k' \sim k$.
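The quantity k′ is easy to compute numerically (a minimal sketch, not from the talk; assumes NumPy):

```python
import numpy as np

def k_prime(n, N, m):
    """Smallest integer l >= 1 with m log(eN/m) <= l log(en/l)."""
    target = m * np.log(np.e * N / m)
    for l in range(1, n + 1):
        if l * np.log(np.e * n / l) >= target:
            return l
    return n

print(k_prime(n=1000, N=10_000, m=50))
```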
$A_{k,m}$, Step 1
Step 1. We take only the family of k-sparse vectors, but do not need projections:
$$\mathbb{P}\Big(\sup_{x \in U_k}\Big|\sum_{i=1}^{n} x_i X_i\Big| > t\Big).$$
Assume first that x is a "flat" vector: $x_i = \pm a$ or 0, with $a = \tilde{k}^{-1/2}$, where $\tilde{k} = |\operatorname{supp}(x)|$. That is, $|x| = 1$ and $\|x\|_\infty = \tilde{k}^{-1/2}$.
A direct argument shows that the estimate is right for such vectors.
In general we may have $0 < |x_1| \le |x_2| \le \dots \le |x_{\tilde{k}}|$ and $x_j = 0$ for $j > \tilde{k}$; such an x may consist of some number of "flat" pieces.
The first natural try is to consider separately each flat vector and then add the
results together. This works but may produce an extra logarithmic factor.
$A_{k,m}$, Step 1, chaining
Chaining:
let $k_1 \sim k/2$, $k_2 \sim k/4$, ..., $k_s \sim k/2^s \sim k'$, so that $\sum_{j=1}^{s} k_j \sim k$.
Given $x \in U_k$, let $x^1$ be the restriction of x to the $k_1$ smallest coordinates; $x^2$ the restriction of x to the next $k_2$ smallest coordinates, etc.
This way,
$$x = \sum_{i=1}^{s} x^i,$$
where the $x^i$'s have mutually disjoint supports, each of cardinality $\le k_i$, and the coordinates of $x^j$ are larger than the coordinates of $x^i$ if $i < j$.
We use Paouris-type estimates for each $x^i$ (a toy version of the block decomposition is sketched below).
This is similar to ALPT (JAMS).
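A toy version of this block decomposition (a minimal sketch, not from the talk; assumes NumPy; block sizes halve until they reach ~k′):

```python
import numpy as np

def flat_blocks(x, k_prime):
    """Split a sparse vector into pieces with disjoint supports of sizes
    ~k/2, ~k/4, ..., down to ~k_prime, ordered by coordinate magnitude."""
    idx = np.argsort(np.abs(x))          # coordinates by increasing magnitude
    idx = idx[np.abs(x[idx]) > 0]        # keep the support only
    k = len(idx)
    blocks, start, size = [], 0, (k + 1) // 2
    while start < k:
        part = np.zeros_like(x)
        part[idx[start:start + size]] = x[idx[start:start + size]]
        blocks.append(part)
        start += size
        size = max(k_prime, (size + 1) // 2)   # halve, but stop near k_prime
    return blocks

x = np.array([0.0, 3.0, 0.1, -0.5, 2.0, 0.0, 1.0, -0.2])
parts = flat_blocks(x, k_prime=1)
print(len(parts), np.allclose(sum(parts), x))   # pieces sum back to x
```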
$A_{k,m}$, Step 2
Step 2. Another chaining argument, more delicate in the definitions of the ε-nets. We use the uniform estimate for projections of sums, which in general is weaker than in Case 1.
At this step we lose the log log m factor.
Congratulations Alain!