236620(Group 10): Advanced Topics in Algorithms
Fall 2010
Lecture 3 — November 4th, 2010
Lecturer: Nir Ailon
Scribe: Tomer Koren

1 A construction of 4-wise independent random variables
We have seen in lecture 2 that random matrices used for dimensionality reduction (as in the JL
lemma) need not be “completely” random. Specifically, it suffices that the entries of each row
of such a random matrix be 4-wise independent, with any two rows independent of each other
(a considerably weaker requirement than “complete” independence). We shall describe
an explicit construction of 4-wise independent random bits, which will allow us to generate such
random matrices using only a logarithmic amount (in the size of the matrix) of “randomness”.
For the construction, we shall need some basic results from the theory of finite fields and from the
theory of BCH codes.
1.1 Basic definitions of finite field theory
It is well known that for any prime number $p$, the set $\{0, 1, \ldots, p - 1\}$ with arithmetic performed
modulo $p$ forms a field of order $p$, which is denoted by $GF(p)$. In fact, for every prime number
$p$ and natural number $k$ there exists a finite field of order $p^k$. Moreover, this field is essentially
unique (that is, unique up to isomorphism). The field of order $p^k$ is denoted by $GF(p^k)$, and $p$ is called
the characteristic of the field.
We can construct the field $GF(p^k)$ by taking an irreducible polynomial of degree $k$ over $GF(p)$ and
performing arithmetic modulo this polynomial. The elements of this field are the polynomials
over $GF(p)$ of degree $< k$ (there are indeed $p^k$ of them).
Example. Consider the irreducible polynomial $f(x) = x^3 + x + 1$ over the field $GF(2)$. Noting
that $x^3 = x + 1$ modulo $f$, we see that in $GF(2^3)$,
\[
\begin{aligned}
(1 + x) \cdot x^2 &= x^2 + x^3 = x^2 + x + 1, \\
(1 + x) \cdot (x^2 + x) &= x^2 + x + x^3 + x^2 = x + 1 + x = 1.
\end{aligned}
\]
In particular, we obtain $(1 + x)^{-1} = x^2 + x$.
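The arithmetic in this example can be checked mechanically. The following is a minimal sketch, assuming elements of $GF(2^3)$ are stored as 3-bit integers whose bit $i$ is the coefficient of $x^i$; the helper name `gf_mul` and the bit encoding are illustrative choices, not part of the lecture.

```python
# Elements of GF(2^3) as 3-bit integers: bit i is the coefficient of x^i.
K = 3
MODULUS = 0b1011  # the irreducible polynomial x^3 + x + 1


def gf_mul(a: int, b: int) -> int:
    """Multiply two elements of GF(2^K), reducing modulo MODULUS."""
    result = 0
    while b:
        if b & 1:
            result ^= a        # addition of polynomials over GF(2) is XOR
        b >>= 1
        a <<= 1                # multiply a by x
        if a & (1 << K):       # degree reached K: reduce using x^3 = x + 1
            a ^= MODULUS
    return result


ONE_PLUS_X = 0b011   # 1 + x
X_SQUARED = 0b100    # x^2
X2_PLUS_X = 0b110    # x^2 + x

print(bin(gf_mul(ONE_PLUS_X, X_SQUARED)))  # 0b111, i.e. x^2 + x + 1
print(gf_mul(ONE_PLUS_X, X2_PLUS_X))       # 1, so x^2 + x is the inverse of 1 + x
```

The same routine works for other fields $GF(2^k)$ once `K` and `MODULUS` are replaced by the degree and bit pattern of another irreducible polynomial.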
We will need the following nice property of a field of characteristic p.
Lemma 1. If $F$ is a field of characteristic $p$ and $m$ is a natural number, then
\[
(x + y)^{p^m} = x^{p^m} + y^{p^m} \qquad \text{for all } x, y \in F.
\]
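The lecture states Lemma 1 without proof. For convenience, here is a sketch of the standard textbook argument (not taken from the lecture). For $m = 1$, the binomial theorem gives
\[
(x + y)^p = \sum_{j=0}^{p} \binom{p}{j} x^{j} y^{p-j} = x^p + y^p,
\]
because $p$ divides $\binom{p}{j}$ for every $0 < j < p$, so the middle terms vanish in characteristic $p$. The general case follows by induction on $m$:
\[
(x + y)^{p^m} = \left( (x + y)^{p^{m-1}} \right)^{p} = \left( x^{p^{m-1}} + y^{p^{m-1}} \right)^{p} = x^{p^m} + y^{p^m}.
\]
In the case used below, $p = 2$ and $m = 1$, this is simply $(x + y)^2 = x^2 + 2xy + y^2 = x^2 + y^2$.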
1.2 A (dual) BCH Code matrix
Our goal here is to construct a matrix over $GF(2)$ of size $O(d \log n)$ by $n$, with the property that
any set of $d = 2t + 1$ of its columns is linearly independent.
Take $n = 2^k - 1$ and let $x_1, x_2, \ldots, x_n$ be the non-zero elements of $GF(2^k)$, represented as column
vectors of length $k$ of coefficients (of the corresponding polynomials over $GF(2)$). Consider the
following $kt + 1$ by $n$ matrix over $GF(2)$:
\[
H =
\begin{pmatrix}
1 & 1 & \cdots & 1 \\
x_1 & x_2 & \cdots & x_n \\
x_1^3 & x_2^3 & \cdots & x_n^3 \\
\vdots & \vdots & \ddots & \vdots \\
x_1^{2t-1} & x_2^{2t-1} & \cdots & x_n^{2t-1}
\end{pmatrix}
\]
Lemma 2. Any set of d = 2t + 1 columns of H is linearly independent over GF (2).
Proof. Assume that we have $\sum_{i=1}^{d} z_i H_{j_i} = 0$ for some $j_1 < \cdots < j_d$ and $z_i \in GF(2)$, where $H_j$
denotes the $j$th column of $H$. By the construction of $H$, this means that
\[
\sum_{i=1}^{d} z_i x_{j_i}^{s} = 0 \tag{1}
\]
for all $s = 0, 1, 3, \ldots, 2t - 1$. Using Lemma 1, together with the fact that $z_i^{2^a} = z_i$ for $z_i \in GF(2)$, we have
\[
\left( \sum_{i=1}^{d} z_i x_{j_i}^{s} \right)^{2^a} = \sum_{i=1}^{d} z_i x_{j_i}^{2^a s} = 0
\]
for all $s = 0, 1, 3, \ldots, 2t - 1$ and $a \ge 1$. But every positive integer can be written as a product of
an odd number and a power of 2, so (1) in fact holds for all $0 \le s \le 2t$. We arrive at the linear
system $Az = 0$ of $d$ equations in $d$ variables, with $z = (z_1, z_2, \ldots, z_d)^T$ and $A_{si} = x_{j_i}^{s}$ for $0 \le s \le 2t$. The matrix
$A$ is a $d \times d$ Vandermonde matrix, which is nonsingular because the elements $x_{j_1}, \ldots, x_{j_d}$ are distinct. Thus the only solution is the trivial one,
i.e. $z_i = 0$ for all $i = 1, \ldots, d$, completing the proof.
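Lemma 2 can be checked by brute force on a small instance. The sketch below assumes $k = 4$ and $t = 2$ (so $n = 15$ and $d = 5$) and uses $x^4 + x + 1$ as the irreducible polynomial; these parameters, the packing of each column into an integer, and all function names are illustrative assumptions. Over $GF(2)$, a set of columns is linearly dependent exactly when some non-empty subset of it XORs to zero, so it is enough to look for a vanishing XOR among at most $d$ columns.

```python
from functools import reduce
from itertools import combinations

K, T = 4, 2            # field GF(2^K) and t, so d = 2T + 1 = 5
MODULUS = 0b10011      # x^4 + x + 1, irreducible over GF(2) (illustrative choice)
N = 2**K - 1           # number of columns, n = 2^k - 1


def gf_mul(a, b):
    """Multiply in GF(2^K); bit i of an element is the coefficient of x^i."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << K):
            a ^= MODULUS
    return result


def gf_pow(a, e):
    """Compute a^e in GF(2^K) by repeated multiplication."""
    result = 1
    for _ in range(e):
        result = gf_mul(result, a)
    return result


def column(x):
    """Column (1, x, x^3, ..., x^(2T-1)) of H, packed into K*T + 1 bits."""
    col = 1                                          # entry of the all-ones row
    for j in range(T):
        col |= gf_pow(x, 2 * j + 1) << (1 + K * j)   # K-bit block for x^(2j+1)
    return col


# The integers 1..N are exactly the non-zero field elements in this encoding.
H_COLUMNS = [column(x) for x in range(1, N + 1)]


def min_dependency_size(columns, max_size):
    """Smallest r <= max_size such that some r columns XOR to zero, else None."""
    for r in range(1, max_size + 1):
        for subset in combinations(columns, r):
            if reduce(lambda u, v: u ^ v, subset) == 0:
                return r
    return None


print(min_dependency_size(H_COLUMNS, 2 * T + 1))  # None: no dependency among 5 columns
```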
1.3 The construction
We are now ready to describe the construction of $n$ 4-wise independent random variables using
only $O(\log n)$ random bits. It is based on the following lemma.
Lemma 3. Assume that the $m \times n$ matrix $H$ over $GF(2)$ has the property that any set of $d$ of its columns is
linearly independent. Let $z_1, \ldots, z_m$ be i.i.d. random variables over $GF(2)$ with $\Pr[z_i = 0] = \Pr[z_i = 1] = \frac{1}{2}$, and let
\[
(y_1, \ldots, y_n) = (z_1, \ldots, z_m) H. \tag{2}
\]
Then the random variables $y_1, \ldots, y_n$ are $d$-wise independent with $\Pr[y_i = 0] = \Pr[y_i = 1] = \frac{1}{2}$.
Proof. Take any $d$ columns $H_{i_1}, \ldots, H_{i_d}$ of $H$ (which are linearly independent) and note that the linear
system
\[
\begin{pmatrix}
\dashleftarrow & H_{i_1}^T & \dashrightarrow \\
 & \vdots & \\
\dashleftarrow & H_{i_d}^T & \dashrightarrow
\end{pmatrix}
\begin{pmatrix} x_1 \\ \vdots \\ x_m \end{pmatrix}
= u
\]
over $GF(2)$ has $2^{m-d}$ different solutions for any vector $u \in GF(2)^d$. This is because the rank of the coefficient
matrix is $d$, so its kernel is of dimension $m - d$ (over $GF(2)$). Therefore, each assignment of the
variables $y_{i_1}, \ldots, y_{i_d}$ given by (2) arises from exactly $2^{m-d}$ of the $2^m$ equally likely values of $(z_1, \ldots, z_m)$, that is, it occurs with probability $2^{m-d} / 2^m = 2^{-d}$. This implies
that the random variables $y_{i_1}, \ldots, y_{i_d}$ are independent with $\Pr[y_{i_j} = 0] = \Pr[y_{i_j} = 1] = \frac{1}{2}$, and
completes the proof.
In Section 1.2 we saw that we can construct a (deterministic) matrix $H$ with $m = O(\log n)$ rows that
satisfies the hypothesis of Lemma 3 with $d = 5$ (take $t = 2$, so that $m = 2k + 1 = 2 \log_2(n + 1) + 1$). So, by generating $O(\log n)$ random bits $z_i$ and calculating
$y_1, \ldots, y_n$ using (2), we obtain $n$ random variables that are 4-wise independent (in fact, they are
5-wise independent).
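The whole construction can be exercised end to end on a small instance. The sketch below is an illustration under assumed parameters ($k = 4$, $t = 2$, irreducible polynomial $x^4 + x + 1$), with hypothetical function names; it is not the lecture's implementation. It expands each of the $2^9$ possible seeds $(z_1, \ldots, z_9)$ into $n = 15$ bits via (2) and checks exhaustively that every choice of $d$ output bits takes each of the $2^d$ patterns equally often.

```python
from collections import Counter
from itertools import combinations
from operator import itemgetter

K, T = 4, 2                    # GF(2^4) and t = 2, so d = 2t + 1 = 5
MODULUS = 0b10011              # x^4 + x + 1 (illustrative irreducible polynomial)
N, M = 2**K - 1, K * T + 1     # n = 15 output bits from m = 9 seed bits


def gf_mul(a, b):
    """Multiply in GF(2^K); bit i of an element is the coefficient of x^i."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << K):
            a ^= MODULUS
    return result


def column(x):
    """Column (1, x, x^3) of H for t = 2, packed into an M-bit integer."""
    cube = gf_mul(x, gf_mul(x, x))
    return 1 | (x << 1) | (cube << (1 + K))


H_COLUMNS = [column(x) for x in range(1, N + 1)]


def expand(seed):
    """Equation (2): y_j is the GF(2) inner product of the seed bits with column j."""
    return tuple(bin(seed & col).count("1") % 2 for col in H_COLUMNS)


def is_d_wise_uniform(d):
    """Check that every d outputs hit each of the 2^d patterns equally often."""
    samples = [expand(seed) for seed in range(2**M)]  # all seeds, equally likely
    for idx in combinations(range(N), d):
        counts = Counter(map(itemgetter(*idx), samples))
        if len(counts) != 2**d or set(counts.values()) != {2**(M - d)}:
            return False
    return True


print(is_d_wise_uniform(5))  # True, as Lemmas 2 and 3 predict
```

Since there are only $2^9$ seeds, the 15 output bits cannot be fully independent. The check confirms only the limited independence that the lemmas promise.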
2 Talagrand Inequality
Talagrand's inequality bounds the deviation of a function $f$ defined on the binary cube $\{0, 1\}^n$ from
its mean in terms of the Lipschitz constant of a convex extension $\tilde{f}$ of $f$ to $[0, 1]^n$. Let us first recall
the notion of a Lipschitz constant of a real function.
Definition 4. Let $f : X \to \mathbb{R}$ be a real function. A number $\sigma$ is a Lipschitz constant of $f$ w.r.t. a
metric $d_X$ on $X$ if for all $x_1, x_2 \in X$,
\[
|f(x_1) - f(x_2)| \le \sigma \, d_X(x_1, x_2).
\]
Next, denote by $\Omega$ the probability space over the binary cube $\{0, 1\}^n$ with the uniform distribution.
The statement of Talagrand's inequality is as follows.
Theorem 5 (Talagrand Inequality). Let $f$ be a function $f : \Omega \to \mathbb{R}$, and let $\tilde{f}$ be a convex$^1$
extension of $f$ to $[0, 1]^n$ with Lipschitz constant $\sigma$ w.r.t. the $\ell_2$ metric. Then for any $t > 0$,
\[
\Pr[f \ge \mathbb{E} f + t] \le K \exp(-C t^2 / \sigma^2) \tag{3}
\]
where $K, C > 0$ are absolute constants.
This inequality has a form similar to that of McDiarmid's inequality, which we have encountered in
previous lectures.
Theorem 6 (McDiarmid Inequality). Let $f$ be a function $f : \Omega \to \mathbb{R}$ with Lipschitz constant $c$
w.r.t. the Hamming metric. Then for any $t > 0$,
\[
\Pr[f \ge \mathbb{E} f + t] \le \exp\left( -2 t^2 / (n c^2) \right). \tag{4}
\]
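However, note that Talagrand's result is stronger: according to McDiarmid, the upper tail of $f$ “looks like” that of a Gaussian with variance $O(n c^2)$, while according to Talagrand this Gaussian has variance $O(\sigma^2)$, with no dependence on $n$. When $c$ and $\sigma$ are both constants, this is a variance of $O(n)$ versus $O(1)$.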
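To make the comparison concrete, consider the Euclidean norm $f(x) = \|x\|_2$, an illustrative choice that is not taken from the lecture. Its extension to $[0, 1]^n$ by the same formula is convex, and the triangle inequality gives $|\|x\|_2 - \|y\|_2| \le \|x - y\|_2$, so $\sigma = 1$. On the cube, flipping a single coordinate changes $f$ by at most 1, with equality between the origin and a unit vector, so the best Hamming Lipschitz constant is $c = 1$. The two theorems then read
\[
\text{McDiarmid (4):}\quad \Pr[f \ge \mathbb{E} f + t] \le \exp(-2 t^2 / n),
\qquad
\text{Talagrand (3):}\quad \Pr[f \ge \mathbb{E} f + t] \le K \exp(-C t^2).
\]
The first bound becomes meaningful only when $t$ is of order $\sqrt{n}$, whereas the second already decays for $t$ larger than a constant.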
$^1$ A real function $f$ defined on some convex subset $D$ of a vector space is called convex if $f(\alpha x_1 + (1 - \alpha) x_2) \le \alpha f(x_1) + (1 - \alpha) f(x_2)$ for every $x_1, x_2 \in D$ and $0 \le \alpha \le 1$.