Pseudorandomness for permutation and regular branching programs

Anindya De*

March 6, 2013

Abstract

In this paper, we prove the following two results about the INW pseudorandom generator:

• It fools constant width permutation branching programs with a seed of length O(log n · log(1/ε)).
• It fools constant width regular branching programs with a seed of length O(log n · (log log n + log(1/ε))).

These results match the recent results of Koucký et al. (STOC 2011) and of Braverman et al. and Brody and Verbin (FOCS 2010). We improve the dependence of the seed on the width for permutation branching programs. Far more importantly, perhaps, our work proceeds by analyzing the singular values of the stochastic matrices that arise in the transitions of the branching program, which we hope broadens its applicability. As a corollary of our techniques, we present new results on the "small biased spaces for group products" problem [MZ09]. We get a pseudorandom generator with seed length O(log n · (log |G| + log(1/ε))). Previously, using the result of Koucký et al., it was possible to get a seed length of O(log n · (|G|^{O(1)} + log(1/ε))) for this problem.

Keywords: Pseudorandom generators, Permutation branching programs, expander products

* Computer Science Division, University of California, Berkeley, CA, USA. [email protected].

1 Introduction

One of the most fundamental questions in complexity theory is whether one can save on computational resources like space and time by using randomness. While it is known that randomness is indispensable in settings like cryptography and distributed computation, a long line of research [Yao82, BM84, NW94, IW97] has shown that, assuming appropriate lower bounds on the circuit complexity of some functions, one can derandomize every randomized polynomial time algorithm, i.e., show P = BPP. Unfortunately, it has also been shown that any non-trivial derandomization of BPP implies circuit lower bounds [KI04] which seem out of reach of the present state of the art.
Thus, getting unconditional derandomization of complexity classes like BPP and MA looks out of reach for current techniques. This has led to a shift of focus towards derandomization of "low-level" complexity classes where one can hope to get unconditional results. One of the most important problems in this line of research is to derandomize bounded space computation. The ultimate aim of this line of research is to prove RL = L, i.e., to show that any problem that can be solved in randomized logspace can be solved in deterministic logspace. Savitch [Sav70] showed that RL ⊆ NL ⊆ L², i.e., randomized logspace computation, and in fact non-deterministic logspace computation, can be simulated deterministically in O(log² n) space. Nisan [Nis92] also showed that RL ⊆ L² by constructing a pseudorandom generator (PRG) which stretches a seed of length O(log² n) into n bits that fool logspace machines. In fact, Nisan's PRG fools read-once branching programs (which we define next) of polynomial length and width.

Definition 1.1 A read-once branching program (BP) of width w and length n is a directed multilayer graph with n + 1 layers such that each layer has w nodes, with edges going from the ith layer to the (i + 1)th layer (0 ≤ i ≤ n − 1). For every node (except those in the last layer), there are exactly two edges leaving that node, one marked 0 and the other marked 1. There is a designated start state in the first layer and a set of "accepting" states in the (n + 1)th layer.

Remark 1.2 In this paper, whenever we refer to branching programs, we mean read-once branching programs. We note that if the read-once restriction is not imposed, then in fact width 5 branching programs capture NC¹ [Bar89].

A BP is said to be a permutation branching program (PBP) if in every layer the transitions corresponding to 0 (resp. 1) form a matching. A BP is said to be a regular branching program (RBP) if the number of edges coming into every node is either 0 or 2.
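As a concrete illustration of Definition 1.1, here is a minimal sketch (our own, not from the paper) of a width-w read-once branching program: each layer stores, for each input bit b, a map from the w states of layer i to the w states of layer i + 1, and the permutation property of a PBP says each such map is a bijection.

```python
def evaluate_bp(transitions, start, accepting, x):
    """transitions[i][b][s] = next state when reading bit b in state s."""
    state = start
    for i, bit in enumerate(x):
        state = transitions[i][bit][state]
    return state in accepting

def is_permutation_bp(transitions, w):
    """Permutation BP: each layer's 0- and 1-transitions are bijections
    (matchings) on the w states."""
    return all(sorted(layer[b]) == list(range(w))
               for layer in transitions for b in (0, 1))

# Example: a width-2 program computing the parity of 3 bits.
xor_layer = [[0, 1], [1, 0]]        # bit 0: identity map; bit 1: swap
prog = [xor_layer] * 3
assert evaluate_bp(prog, 0, {1}, [1, 0, 1]) is False  # parity 0 -> reject
assert evaluate_bp(prog, 0, {1}, [1, 1, 1]) is True   # parity 1 -> accept
assert is_permutation_bp(prog, 2)
```

The parity program above is the canonical width-2 PBP; a regular but non-permutation example would merge two states, keeping in-degrees 0 or 2.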
A given BP accepts an input x ∈ {0,1}^n if, starting from the start state and following the path specified by the input, it ends in one of the accepting states. It is not hard to see that randomized logspace computation is a uniform version of BPs with w = n^{O(1)}.

Coming back to PRGs for branching programs, after [Nis92], several papers [INW94, RR99] improved on some parameters of the construction in [Nis92]. However, improving on the O(log² n) seed remained (and continues to remain) open even in the following minimal sense: it is not known how to construct a PRG with seed length o(log² n) with constant error for width 3 BPs. Faced with this difficulty, research focused on solving special cases of this problem with better seed lengths ([Lu02], [LRTV09], [GMRZ10]). Attention has also been directed towards getting better seed length when there is some structural restriction on the BPs. In particular, when the branching program is regular, Braverman et al. [BRRY10] and Brody and Verbin [BV10] constructed pseudorandom generators with seed length O(log n · (log log n + log(1/ε))) which fool constant width branching programs with error ε. The dependence of the seed length on the width was better for the latter paper, which obtained O(log n · (log w + log log n + log(1/ε))). Koucký, Nimbhorkar and Pudlák [KNP10] improved on this further for the case of permutation branching programs. In particular, for fooling constant width permutation branching programs with error ε, they got a seed length of O(log n · log(1/ε)). In their analysis, they transform this problem into the language of group products (we discuss this later) and then analyze their construction using basic properties of groups. In this vein, we should also mention that Šíma and Žák recently achieved a breakthrough by constructing a hitting set with seed length O(log n) for width-3 branching programs [ŠŽ10]. However, their techniques seem totally disjoint from other works in this line of research.
1.1 Our results

We present a pseudorandom generator which ε-fools permutation branching programs of length n and width w using a seed of length O(log n · (w^8 + log(1/ε))). The PRG we use is the INW generator [INW94] (as in the previous works [BV10, BRRY10, KNP10]). We remark that Koucký et al. obtained a seed length of O(log n · (w! + log(1/ε))) for fooling permutation branching programs using the same generator. What we consider interesting is that our analysis is based on analyzing spectra of the stochastic matrices that arise in the transitions of the branching program. Thus, we see it as a more combinatorial approach which might be helpful in other contexts as well. This is in contrast to the result of Koucký et al., which is based on a group theoretic approach and is thus difficult to adapt to more combinatorial settings. Our techniques also show that the INW generator ε-fools regular branching programs of length n and width w using a seed of length O(log n · (w log log n + log(1/ε))). Our analysis is based on linear algebra, in contrast to the analysis of Braverman et al. (which was based on information theoretic ideas) and Brody and Verbin (which was based on combinatorial ideas).

We also consider the small-bias spaces problem for group products, first considered by Meka and Zuckerman in [MZ09]. Both the problem and our results on it are stated in Section 4. We next discuss the INW generator in detail and the main idea behind our improved analysis.

2 Technical overview

2.1 Impagliazzo-Nisan-Wigderson generator

The PRG used in this paper is the construction of Impagliazzo, Nisan and Wigderson [INW94] (hereafter referred to as the INW generator). We now describe their construction. First, let us recall the following important fact about the construction of expander graphs [RVW00].
Fact 2.1 For any n and λ > 0, there exist graphs on {0,1}^n with degree d = (1/λ)^{Θ(1)} and second eigenvalue λ such that given any vertex x ∈ {0,1}^n and an edge label i ∈ [d], the ith neighbor of x is computable in n^{Θ(1)} time.

The INW generator is defined recursively as follows. Let Γ_0 : {0,1}^t → {0,1} be simply the function which maps a t bit string to its first bit. Assume Γ_{i−1} : {0,1}^m → {0,1}^ℓ. Then Γ_i : {0,1}^{m+log d} → {0,1}^{2ℓ} is defined as follows. Let x = y ∘ z ∈ {0,1}^{m+log d} such that y is m bits long and z is log d bits long. Let H be a graph on 2^m vertices constructed using Fact 2.1. Let y′ be the zth neighbor of y in H. Then Γ_i(x) = Γ_{i−1}(y) ∘ Γ_{i−1}(y′). Here and elsewhere, ∘ is used to denote concatenation.

From the above, one can easily see that Γ_i : {0,1}^{t+i log d} → {0,1}^{2^i}. As we can take t to be anything, we get that Γ_{log n} : {0,1}^{t+log n·log d} → {0,1}^n. As d = (1/λ)^{Θ(1)}, the INW generator stretches a seed of length O(log n · log(1/λ)) to n bits.

Remark 2.2 It is possible, and will in fact be necessary for us (in Section 4), to define the INW generator so that it produces elements from a bigger alphabet. The construction in this case is as follows: we assume that we want to produce elements from some set G. For some t ≥ log |G|, let Γ_0 : {0,1}^t → G be simply the function which takes the first log |G| bits of its input and interprets them as an element of G. Assume Γ_{i−1} : {0,1}^m → G^ℓ. Then Γ_i : {0,1}^{m+log d} → G^{2ℓ} is defined as follows. Let x = y ∘ z ∈ {0,1}^{m+log d} such that y is m bits long and z is log d bits long. Let H be a graph on 2^m vertices constructed using Fact 2.1. Let y′ be the zth neighbor of y in H. Then Γ_i(x) = Γ_{i−1}(y) ∘ Γ_{i−1}(y′).

From the above, one can easily see that Γ_i : {0,1}^{t+i log d} → G^{2^i}. As we can take t to be anything as long as it is at least log |G|, we get that Γ_{log n} : {0,1}^{log |G|+log n·log d} → G^n.
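The recursion above can be sketched in a few lines. The sketch below treats seeds as integers; `toy_neighbor` is a hypothetical stand-in of our own for the explicit expander neighbor function of Fact 2.1 (any map from a vertex and an edge label to a neighbor has the right shape, but the pseudorandomness of course rests on H actually being an expander).

```python
LOG_D = 3  # d = 8 edge labels per expander vertex (illustrative choice)

def toy_neighbor(y, z, m):
    """Hypothetical placeholder for the z-th neighbor of y on 2^m vertices.
    NOT a real expander construction."""
    return (y ^ (z * 0x9E3779B1)) % (1 << m)

def inw(i, seed, t):
    """Gamma_i : {0,1}^(t + i*LOG_D) -> {0,1}^(2^i), seeds as integers."""
    if i == 0:
        return [(seed >> (t - 1)) & 1]     # Gamma_0: first bit of the seed
    y = seed >> LOG_D                      # high bits: seed y for Gamma_{i-1}
    z = seed & ((1 << LOG_D) - 1)          # low LOG_D bits: edge label z
    m = t + (i - 1) * LOG_D                # seed length of Gamma_{i-1}
    return inw(i - 1, y, t) + inw(i - 1, toy_neighbor(y, z, m), t)

out = inw(4, 5000, 4)    # a 4 + 4*LOG_D = 16 bit seed -> 2^4 = 16 output bits
assert len(out) == 2 ** 4 and set(out) <= {0, 1}
```

Note how the seed grows by only log d bits per doubling of the output, which is the source of the O(log n · log(1/λ)) seed length.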
As d = (1/λ)^{Θ(1)}, the INW generator stretches a seed of length O(log |G| + log n · log(1/λ)) to a sequence of n elements of G. We now get back to the analysis of the INW generator. For the purposes of this discussion, we assume that the INW generator is producing bit strings as opposed to elements in G.

2.2 Analysis of INW generator in terms of stochastic matrices

To understand the analysis of the INW generator from [INW94] as well as the improvements in this paper, it is helpful to look at branching programs from the following viewpoint. Assume that the branching program is of width w. Then the states in every layer can be numbered from 1 to w. Also, for x, y ∈ [w] and b ∈ {0,1}, we write (x, i, b) → (y, i + 1) if there is an edge labelled b going from vertex x in layer i to vertex y in layer i + 1. Now, for every layer 0 ≤ i < n, we can introduce two stochastic matrices M_0^i and M_1^i (we interchangeably call them walk matrices as well), defined as

M_b^i(y, x) = 1 if (x, i, b) → (y, i + 1), and 0 otherwise.

Now, assume that we start with a probability distribution x ∈ R^w over the states in the 0th layer. If the string chosen is y, then the probability distribution on the states in the final layer is given by ∏_{i=0}^{n−1} M_{y_i}^i · x (the product taken so that M_{y_0}^0 is applied first). Since any string is chosen with probability 1/2^n, the final distribution is given by

Σ_{y∈{0,1}^n} (1/2^n) ∏_{i=0}^{n−1} M_{y_i}^i · x = ∏_{i=0}^{n−1} ((M_0^i + M_1^i)/2) · x

If instead the y's are drawn from a distribution D, then the distribution on the final layer will be

Σ_{y∈{0,1}^n} D(y) ∏_{i=0}^{n−1} M_{y_i}^i · x

Thus, our aim is to find a distribution D which can be sampled with a few bits of randomness such that

‖ Σ_{y∈{0,1}^n} D(y) ∏_{i=0}^{n−1} M_{y_i}^i − ∏_{i=0}^{n−1} ((M_0^i + M_1^i)/2) ‖ ≤ ε

In the above, we do not specify the norm we use, but actually any norm works for us (like the Frobenius norm). This is because for a constant sized matrix, all these norms are within a constant factor of each other. We now define and study the concept of expander-product of distributions of matrices.
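The identity between the average of the products and the product of the averages can be checked directly on a small example. The sketch below (our own, plain-list linear algebra in the text's column convention M(y, x)) compares the matrix-product computation against brute-force enumeration of all inputs.

```python
from itertools import product

def matvec(M, x):
    return [sum(M[r][c] * x[c] for c in range(len(x))) for r in range(len(M))]

def walk_matrix(layer, b, w):
    """M_b^i in the text's convention: M(y, x) = 1 iff reading bit b moves
    state x to state y."""
    return [[1.0 if layer[b][x] == y else 0.0 for x in range(w)]
            for y in range(w)]

w, layers = 2, [[[0, 1], [1, 0]], [[1, 0], [0, 1]]]   # two width-2 layers
dist = [1.0, 0.0]                                     # start in state 0
for layer in layers:
    M0, M1 = walk_matrix(layer, 0, w), walk_matrix(layer, 1, w)
    avg = [[(M0[r][c] + M1[r][c]) / 2 for c in range(w)] for r in range(w)]
    dist = matvec(avg, dist)

# Brute force over all 2^n inputs gives the same final distribution.
brute = [0.0] * w
for bits in product([0, 1], repeat=len(layers)):
    s = 0
    for layer, b in zip(layers, bits):
        s = layer[b][s]
    brute[s] += 1 / 2 ** len(layers)
assert all(abs(p - q) < 1e-12 for p, q in zip(dist, brute))
```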
2.3 Comparison of the product and the expander product of matrices

We first recall that the operator norm of a matrix M ∈ C^{n×n}, denoted by ‖M‖₂, is defined as

‖M‖₂ = sup_{x∈C^n, x≠0} ‖x · M‖ / ‖x‖

An important property of the operator norm is that it is submultiplicative, i.e., ‖X · Y‖₂ ≤ ‖X‖₂ ‖Y‖₂. Another property which shall be useful to us is the following fact.

Fact 2.3 For any doubly stochastic matrix M ∈ R^{n×n}, ‖M‖₂ ≤ 1. (The walk matrices of permutation branching programs, as well as the averaged walk matrices of regular branching programs, are doubly stochastic.)

Proof: Note that for every i, j ∈ [n], M_ij ≥ 0, and every row and every column of M sums to 1. Note that (x · M)_j = Σ_i x_i · M_ij. Hence, for a unit vector x, we have

‖x · M‖₂² = Σ_{j∈[n]} (Σ_{i∈[n]} x_i · M_ij)² ≤ Σ_{j∈[n]} Σ_{i∈[n]} M_ij · x_i² = Σ_{i∈[n]} x_i² = 1

The inequality in the above is an application of the Cauchy-Schwarz inequality (using that every column sums to 1); the last equality uses that every row sums to 1.

Let us assume that Γ_1, Γ_2 : {0,1}^r → {0,1}^n and ρ_1, ρ_2 : {0,1}^n → C^{m×m}. Assume that ‖ρ_1(x)‖₂, ‖ρ_2(x)‖₂ ≤ 1 for all x ∈ {0,1}^n (this will be the case throughout this paper). Consider the following two sums:

A = Σ_{x∈{0,1}^r} (1/2^r) ρ_1(Γ_1(x))    B = Σ_{x∈{0,1}^r} (1/2^r) ρ_2(Γ_2(x))    (1)

Then the product of A and B is given by

A · B = Σ_{x,y∈{0,1}^r} (1/2^{2r}) ρ_1(Γ_1(x)) · ρ_2(Γ_2(y))

We now consider a 2^d-regular graph H on {0,1}^r with second eigenvalue bounded by λ. We define the expander product as

A ·_H B = Σ_{x∈{0,1}^r, (x,y)∈E(H)} (1/2^{r+d}) ρ_1(Γ_1(x)) · ρ_2(Γ_2(y))

We note that without specifying the functions ρ_i, Γ_i, it is not possible to concretely define A ·_H B. However, specifying all the parameters makes the definitions and applications cumbersome. So, we sacrifice some accuracy for the sake of clarity. The ρ_i's and Γ_i's will be clear from the context.

The relation between the above definitions and the INW generator and branching programs is obvious. Let us consider a branching program of length 2^{m+1}. Now, let Γ_m : {0,1}^t → {0,1}^{2^m} be the instantiation of the INW generator which stretches t bits to 2^m bits. Let us define ρ_1(x) = ∏_{i<2^m} M_{x_i}^i and ρ_2(x) = ∏_{i<2^m} M_{x_i}^{i+2^m}.
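Fact 2.3 can be sanity-checked numerically. The sketch below (our own; the helper name `rand_doubly_stochastic` is an assumption) generates doubly stochastic matrices as convex combinations of permutation matrices and verifies the contraction ‖x · M‖₂ ≤ ‖x‖₂ on random vectors.

```python
import random, math

def rand_doubly_stochastic(n, rng, k=6):
    """Convex combination of k random permutation matrices (a point of the
    Birkhoff polytope), hence doubly stochastic."""
    M = [[0.0] * n for _ in range(n)]
    weights = [rng.random() for _ in range(k)]
    total = sum(weights)
    for wgt in weights:
        p = rng.sample(range(n), n)            # a random permutation
        for x in range(n):
            M[p[x]][x] += wgt / total
    return M

rng = random.Random(0)
n = 5
M = rand_doubly_stochastic(n, rng)
# Rows and columns each sum to 1.
assert all(abs(sum(row) - 1) < 1e-12 for row in M)
assert all(abs(sum(M[i][j] for i in range(n)) - 1) < 1e-12 for j in range(n))
# Contraction: ||x . M||_2 <= ||x||_2 for row vectors x.
for _ in range(100):
    x = [rng.gauss(0, 1) for _ in range(n)]
    xM = [sum(x[i] * M[i][j] for i in range(n)) for j in range(n)]
    assert math.hypot(*xM) <= math.hypot(*x) + 1e-12
```

A merely column-stochastic matrix need not contract the Euclidean norm (e.g., a matrix sending every state to a single state has norm √n), which is why the proof uses both row and column sums.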
Hence, A and B correspond to the walk matrices for the first and the second half of the branching program, provided the input to both halves is sampled from the output of Γ_m. Now, if we use independent seeds for the first and the second applications of Γ_m, then the transition matrix for the entire branching program is A · B. On the other hand, we can have another application of the INW generator and hence define Γ_{m+1} : {0,1}^{t+d} → {0,1}^{2^{m+1}} as Γ_{m+1}(x, j) = Γ_m(x) ∘ Γ_m(H(x, j)), where H(x, j) denotes the jth neighbor of x in H. In this case, it is easy to see that the transition matrix corresponding to the entire branching program when the input is sampled from the output of Γ_{m+1} is A ·_H B.

Lemma 2.4 Let ρ_1, ρ_2 : {0,1}^m → C^{w×w} be such that ∀x ∈ {0,1}^m, j ∈ {1, 2}, ‖ρ_j(x)‖₂ ≤ 1. Let Γ_1, Γ_2 : {0,1}^t → {0,1}^m. Let H be a 2^d-regular graph on {0,1}^t with second eigenvalue λ. Then, for A and B as defined in (1),

‖A · B − A ·_H B‖₂ ≤ λ

Also, we have the following observation:

• If for all x, ρ_1(x) and ρ_2(x) are identity on a subspace W, then both A · B and A ·_H B are also identity on the subspace W. By a matrix M being identity on a subspace W, we mean M : W^⊥ → W^⊥ and for every x ∈ W, x · M = M · x = x (W^⊥ is the orthogonal complement of W).

Proof: Let us define X, Y ∈ C^{w×w2^t} as follows. Both X and Y are divided into 2^t blocks such that the ith block in X is ρ_1(Γ_1(i)) and the ith block in Y is ρ_2(Γ_2(i)). Also, let us define matrices Λ_1, Λ_2 ∈ C^{w2^t×w2^t} as follows: Λ_1 = Λ_H ⊗ Id and Λ_2 = Λ_K ⊗ Id, where Id is the w × w identity matrix, Λ_H is the random walk matrix for the graph H and Λ_K is the random walk matrix for a clique (with self-loops) on 2^t vertices. Here ⊗ denotes the tensor product of matrices. Then,

A ·_H B − A · B = X · (Λ_1 − Λ_2) · Y^T / 2^t

However, we also note that by the definition of the second eigenvalue of Λ_H (and since the graph H is regular), the largest singular value of C = Λ_H − Λ_K is bounded by λ.
Since the largest singular value of a tensor product of two matrices is the product of the largest singular values of the two matrices, we get that the largest singular value of Λ_1 − Λ_2 is at most λ. This implies that ‖Λ_1 − Λ_2‖₂ ≤ λ. Now, observe that

X · (Λ_1 − Λ_2) · Y^T / 2^t = (X/√(2^t)) · (C ⊗ Id) · (Y^T/√(2^t))

Also, as each block of X is a matrix whose norm is at most 1, we have ‖X/√(2^t)‖₂ ≤ 1. Similarly, ‖Y^T/√(2^t)‖₂ ≤ 1. Putting everything together, we prove the claim. The observation trivially follows from the assumptions about H and the ρ_i's.

2.4 Basic error analysis of the INW generator

We would now like to put an upper bound on the probability with which the branching program can distinguish between the output of the INW generator and the uniform distribution. Equivalently, let M be the average walk matrix between the 0th and the nth layer when the input is uniformly random. Let M̃ be the average walk matrix when the input is chosen from the output of the INW generator. We would like to put an upper bound on ‖M − M̃‖₂.

In order to do this, we will consider two trees. Both of these will be full binary trees. The leaf nodes are numbered 1 to n from left to right. The first tree represents the average walk matrices (at various points in the branching program) when the input is sampled from the output of the INW generator. The second tree represents the average walk matrices (at various points in the branching program) when the input is uniform. We call the first tree the pseudo tree and the second one the true tree. Without loss of generality, we consider n to be a power of 2.

Consider any node x (in either of the trees) at height m. Assume that the leaves in the subtree rooted at x are numbered {i, . . . , j = i + 2^m − 1}. Also, let us define Γ_m : {0,1}^t → {0,1}^{2^m} to be the INW generator which stretches t bits into 2^m bits. Let Γ′_m : {0,1}^{2^m} → {0,1}^{2^m} be the identity function. (For a bit string w and a position t, w_t denotes the tth bit of w.) Further, we define ρ_x(w) = ∏_{i≤t≤j} M_{w_t}^t.
The labeling L(x) is then as follows:

L(x) = Σ_{w∈{0,1}^{2^m}} (1/2^{2^m}) ρ_x(Γ′_m(w))   if x is in the true tree
L(x) = Σ_{w∈{0,1}^t} (1/2^t) ρ_x(Γ_m(w))   if x is in the pseudo tree

Thus, with the above labeling, L(x) is simply the average walk matrix to go from layer i to layer i + 2^m when the input is sampled from the uniform distribution (in case of the true tree) or the output of the INW generator (in case of the pseudo tree). We adopt the following convention: whenever we talk about a node x in the tree, it refers to the corresponding nodes in both the true and the pseudo tree. To refer to the corresponding node in the true tree, we call it x_t, and for the pseudo tree, we call it x_p. We now observe the following: let x be a node and let y and z be its left and right children. Then

• L(x_t) = L(y_t) · L(z_t)
• L(x_p) = L(y_p) ·_H L(z_p)

Claim 2.5 Let x be a node at height t. Then ‖L(x_t) − L(x_p)‖₂ ≤ 2 · 2^t λ.

Proof: We prove the slightly stronger bound ‖L(x_t) − L(x_p)‖₂ ≤ (2^{t+1} − 1)λ, which implies the claim. Clearly, it holds when t = 0. We assume it holds for t ≤ t_0 and prove it for t = t_0 + 1. Let x be at height t_0 + 1 and its children be y and z at height t_0. Then

‖L(x_p) − L(x_t)‖₂ = ‖L(x_p) − L(y_p)L(z_p) + L(y_p)L(z_p) − L(y_t)L(z_t)‖₂
≤ ‖L(x_p) − L(y_p)L(z_p)‖₂ + ‖L(y_p)L(z_p) − L(y_t)L(z_t)‖₂
≤ ‖L(y_p) ·_H L(z_p) − L(y_p)L(z_p)‖₂ + ‖L(y_p)‖₂ ‖L(z_p) − L(z_t)‖₂ + ‖L(z_t)‖₂ ‖L(y_p) − L(y_t)‖₂
≤ λ + 2(2^{t_0+1} − 1)λ = (2^{t_0+2} − 1)λ < 2 · 2^{t_0+1} λ

In the above analysis, we use Fact 2.3. The above claim clearly shows that to have a total error of ε, it suffices to have λ = ε/(2n), which means that the INW generator has a seed of length O(log n · log(n/ε)). The more important aspect of the above analysis is that while we pessimistically assume that ‖L(y_p)‖₂, ‖L(z_t)‖₂ are as large as 1, in general they can be much smaller. In particular, if they are both bounded by say 1/3, then the error will not increase with height.
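The error recurrence behind Claim 2.5 can be unrolled explicitly: with e(0) = 0 and e(t + 1) = λ + 2e(t), the closed form is e(t) = (2^t − 1)λ, which sits below the claimed bound 2 · 2^t λ. A quick integer-arithmetic check:

```python
# Unrolling the recurrence of Claim 2.5: e(0) = 0, e(t+1) = lam + 2*e(t).
lam = 1  # work over integers so the recurrence is exact
e = 0
for t in range(1, 31):
    e = lam + 2 * e
    assert e == (2 ** t - 1) * lam       # closed form
    assert e < 2 * 2 ** t * lam          # the bound stated in the claim
```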
Of course, in general, this is not true, and it can be the case that ‖L(y_p)‖₂ = ‖L(z_t)‖₂ = 1; but then we show that in fact there is no error incurred when one takes the expander product instead of the true product! (This is not exactly true, but we make it precise later.) It is interesting to note that the results in [BRRY10, BV10, KNP10], as well as our result, beat the above naive analysis for regular (or permutation) branching programs by a clever analysis of the INW generator. In contrast, the results in [Lu02, LRTV09, GMRZ10], which are directed towards "symmetric functions", use a combination of hash functions and the INW generator. It seems hard to use hash functions for general branching programs because the main purpose of hashing in these constructions is to "rearrange weights so that they are evenly spread out". However, unless one is guaranteed that the function being computed by the branching program is invariant under permutations of the input, it seems impossible to use hash functions.

2.5 Organization of the paper

Section 3 considers the problem of fooling group products over an abelian group. While technically much simpler than the subsequent sections on fooling permutation branching programs, the analysis gives the intuition on how to improve the analysis of the INW generator for general permutation branching programs. Also, the seed length we achieve for this problem is incomparable to the previous best known seed length for the same problem. Section 4 considers the small-biased spaces problem for group products; we improve on the previous best result for this problem [MZ09]. Section 5 presents a PRG with seed length O(log n · (w^8 + log(1/ε))) for permutation branching programs of width w and length n. Section 6 presents a PRG with seed length O(log n · (w log log n + log(1/ε))) for regular branching programs of width w and length n.

We would like to highlight that while the results in the following section about fooling abelian group products are particularly easy to prove, they are nevertheless important for two reasons.
One is that they highlight some of the important ideas which will later be used to analyze general permutation branching programs. The second is that the complexity of the analysis in [KNP10] seems to stem from the fact (as they themselves remark) that a group may possess non-trivial subgroups. Our analysis shows that in fact most of the complexity in analyzing general permutation branching programs comes from the non-commutativity of the group rather than the existence of proper subgroups.

3 Fooling abelian group products using the INW generator

Assume that we have been given an abelian group G and g_0, . . . , g_{n−1} ∈ G. Further, for a, b ∈ G, a · b represents the group operation applied to a and b. Let us also define

g^x = 1 (the identity) if x = 0, and g^x = g if x = 1.

Consider the distribution over the group G obtained by sampling x_0, . . . , x_{n−1} uniformly at random and considering the product g_0^{x_0} · g_1^{x_1} · · · g_{n−1}^{x_{n−1}}. The following is the main theorem of this section.

Theorem 3.1 Let Γ : {0,1}^t → {0,1}^n be the INW generator with λ = ε/|G|^7. Consider the distributions

D = {g_0^{x_0} · g_1^{x_1} · · · g_{n−1}^{x_{n−1}} : (x_0, . . . , x_{n−1}) ∼ Γ(U_t)}
D′ = {g_0^{x_0} · g_1^{x_1} · · · g_{n−1}^{x_{n−1}} : (x_0, . . . , x_{n−1}) ∼ U_n}

Then D and D′ are ε-close in statistical distance.

As the seed length required for the INW generator is O(log n · log(1/λ)), we get the following corollary.

Corollary 3.2 There exists a polynomial time computable function Γ : {0,1}^t → {0,1}^n with t = O(log n · (log m + log(1/ε))) such that for every abelian group G of size m and g_0, . . . , g_{n−1} ∈ G, the distributions D and D′ defined as

D = {g_0^{x_0} · g_1^{x_1} · · · g_{n−1}^{x_{n−1}} : (x_0, . . . , x_{n−1}) ∼ Γ(U_t)}
D′ = {g_0^{x_0} · g_1^{x_1} · · · g_{n−1}^{x_{n−1}} : (x_0, . . . , x_{n−1}) ∼ U_n}

are ε-close in statistical distance.

In order to prove Theorem 3.1, we first need to go over some basic Fourier analysis.
Definition 3.3 A character χ : G → C* is a group homomorphism, i.e., for x, y ∈ G, χ(x · y) = χ(x)χ(y). Any abelian group G has |G| distinct characters, including the trivial character which maps every element to 1.

Definition 3.4 For a distribution D : G → [0, 1] and a character χ : G → C*, we define D̂(χ) = Σ_{x∈G} χ(x)D(x). Note that this differs from the standard definition of the Fourier coefficient of a function by a normalization factor.

For any element g ∈ G, consider the matrix R_g which is defined as follows:

R_g(x, y) = 1 if x · y^{−1} = g, and 0 otherwise.

First of all, we observe that all the matrices of the form R_g commute with each other. This is because the underlying group is commutative. This implies that they are simultaneously diagonalizable in some basis. In fact, R_g = Γ · ρ(g) · Γ^{−1} where Γ is a unitary matrix and ρ(g) is a diagonal matrix:

R_g = Γ · diag[χ_1(g), . . . , χ_{|G|}(g)] · Γ^{−1}

where χ_1, . . . , χ_{|G|} are the distinct characters.

Now, note that we can phrase the group products problem as a branching program. More precisely, we have |G| states at every level, corresponding to the group elements. From level i to level i + 1, if the input is 0, then ∀g, g ↦ g; if the input is 1, then ∀g, g ↦ g · g_i. Therefore, the walk matrices are M_0^i = Id (Id is the |G| × |G| identity matrix) and M_1^i = R_{g_i}. We now make the following observation.

Observation 3.5 Let x_p be the root node of the pseudo tree and x_t be the root node of the true tree, with the parameters of the pseudo tree the same as in Theorem 3.1, and the walk matrices at the ith step being M_0^i and M_1^i as defined above. Let L(x_p) and L(x_t) be the labels of x_p and x_t respectively. If ‖L(x_p) − L(x_t)‖₂ ≤ ε/√|G|, then Theorem 3.1 follows.

Proof: Note that the distribution obtained in case of the pseudo tree is given by e_1 · L(x_p), where e_1 is the standard unit vector with 1 at the position of the group identity and 0 everywhere else.
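A small worked example may help here. For concreteness (an assumption of ours) take G = Z_m, so that R_g(x, y) = 1 iff x − y ≡ g (mod m) and the characters are χ_a(g) = e^{2πiag/m}. The sketch checks that the R_g's multiply like group elements and commute, and exhibits the Fourier vectors as their shared eigenvectors.

```python
import cmath

m = 5

def R(g):
    """R_g(x, y) = 1 iff x * y^{-1} = g, i.e. x - y = g (mod m)."""
    return [[1 if (x - y) % m == g else 0 for y in range(m)] for x in range(m)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(m)]
            for i in range(m)]

# R_g R_h = R_{g+h}, and the R's commute since the group is abelian.
assert matmul(R(2), R(4)) == R((2 + 4) % m)
assert matmul(R(2), R(4)) == matmul(R(4), R(2))

# Shared eigenvectors: for v_a(y) = chi_a(-y), we have R_g v_a = chi_a(g) v_a.
chi = lambda a, g: cmath.exp(2j * cmath.pi * a * g / m)
a, g = 2, 3
v = [chi(a, -y) for y in range(m)]
Rv = [sum(R(g)[x][y] * v[y] for y in range(m)) for x in range(m)]
assert all(abs(Rv[x] - chi(a, g) * v[x]) < 1e-9 for x in range(m))
```

Stacking the vectors v_a as columns gives the unitary Γ (up to normalization) that diagonalizes every R_g simultaneously, as claimed in the text.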
Similarly, the distribution in case of the true tree is e_1 · L(x_t). Note that the statistical distance between the two distributions is bounded by

‖e_1 · (L(x_p) − L(x_t))‖₁ ≤ √|G| · ‖e_1 · (L(x_p) − L(x_t))‖₂ ≤ √|G| · ‖L(x_p) − L(x_t)‖₂

which proves our claim.

In order to prove that ‖L(x_p) − L(x_t)‖₂ is small, we make the following very important observation.

Observation 3.6 Corresponding to the walk matrix M_b^i, let us assume the diagonalized matrix is ρ_b^i. Since all the matrices are simultaneously diagonalizable, we can assume that the walk matrices at the leaf nodes of the pseudo and the true trees are ρ_b^i instead of M_b^i. The labeling for the non-leaf nodes is generated in the same way as before. More precisely, in the true tree, the label of a non-leaf node is the product of the labels of its two children, while in the pseudo tree, the label of a non-leaf node is the expander product of the labels of its two children (the expander being the underlying expander of the INW generator). Also, since all the matrices at the leaf nodes are diagonal, each of the intermediate products is also diagonal (in both trees); therefore, to bound ‖L(x_p) − L(x_t)‖₂, it suffices to put an upper bound on every diagonal entry of L(x_p) − L(x_t).

From the above observation, it suffices to show that for any i ∈ [|G|], |L(x_p)[i] − L(x_t)[i]| ≤ ε/√|G|. Here, for a matrix A, A[i] represents its ith diagonal entry. Let us fix a particular i ∈ [|G|].

Claim 3.7 Consider any node x in the true tree. Let L(x) be its labeling, and consider the ith diagonal entry of L(x). Then either the entry is 1, or it is at most 1 − 1/|G|² in magnitude. Further, it is 1 if and only if the corresponding diagonal entry is 1 for each of the walk matrices (now diagonalized) at all the leaf nodes.

Proof: We first prove the claim for the leaf nodes. Note that the diagonal entries correspond to the characters. Say the ith diagonal entry corresponds to the character χ_i.
Then the ith diagonal entry of the tth leaf is 1/2(χ_i(e) + χ_i(g_t)) = 1/2(1 + χ_i(g_t)). However, note that because a character is a homomorphism, χ_i(g_t)^{|G|} = 1. Hence, χ_i(g_t) = e^{2πia/|G|} for some 0 ≤ a < |G|. If a = 0, the entry is 1. Otherwise, writing ω = χ_i(g_t),

|(1 + ω)/2| = |cos(πa/|G|)| ≤ cos(π/|G|) ≤ 1 − 1/|G|²

This gives us the result for the leaf nodes. Now, assume the hypothesis to be true for nodes at height h < t_0 and consider a node x_t at height t_0. For non-leaf nodes, we observe that if x_t is a node and y_t and z_t are its children, then the ith diagonal entry of x_t is the product of the corresponding entries in y_t and z_t. By the induction hypothesis, unless the entries of both y_t and z_t are 1 in magnitude, at least one of them is at most 1 − 1/|G|² in magnitude; hence, so is the entry of x_t. Also, if the entries of both y_t and z_t are 1, so is the entry of x_t, and by the induction hypothesis all the leaves in the trees rooted at y_t and z_t have walk matrices whose ith diagonal entry is 1. This proves the claim for x_t as well.

For the rest of the discussion, we fix i and let L′(x) denote L(x)[i] for any node x. The next claim shows that for any node x_t in the true tree, if the ith diagonal entry is at least 1/10 in magnitude, then the corresponding entry for x_p is within λ|G|^4 log(1/|ℓ_i|) of it. All the calculations in this section use that λ|G|^6 < 1/10 (eventually we set λ = ε/|G|^7). More precisely, we have the following claim.

Claim 3.8 Let x_t be a node in the true tree such that its labeling L′(x_t) = ℓ_i satisfies |ℓ_i| ≥ 1/10. Then |L′(x_t) − L′(x_p)| ≤ λ|G|^4 log(1/|ℓ_i|).

Proof: We prove it by induction on the height of the node x_t. Note that it is trivially true for the leaves, as the marginal of the INW generator on any coordinate is uniformly random. Let us assume it is true for nodes at height < h. Let x be a node at height h, with children y and z. We break the analysis into two situations. First, assume that at least one of L′(y_t) or L′(z_t) is 1; without loss of generality, say it is the former.
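The trigonometric bound used above can be checked numerically over every nontrivial root of unity for a range of group sizes:

```python
import cmath

# For omega = e^{2*pi*i*a/|G|} with 0 < a < |G|, the proof of Claim 3.7
# asserts |1 + omega| / 2 = |cos(pi*a/|G|)| <= 1 - 1/|G|^2.
for size in range(2, 50):
    for a in range(1, size):
        omega = cmath.exp(2j * cmath.pi * a / size)
        assert abs(1 + omega) / 2 <= 1 - 1 / size ** 2 + 1e-12
```

The extreme case is a = 1 (or a = |G| − 1), where the left side is cos(π/|G|), so the slack shrinks roughly like π²/(2|G|²) for large |G|.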
Then L′(x_t) = L′(z_t). Also, we note that L′(x_p) = L′(z_p) (we do not incur an error of λ here, because of the observation in Lemma 2.4). By the induction hypothesis, we have |L′(z_p) − L′(z_t)| ≤ λ|G|^4 log(1/|L′(z_t)|). Hence,

|L′(x_p) − L′(x_t)| = |L′(z_p) − L′(z_t)| ≤ λ|G|^4 log(1/|L′(z_t)|) = λ|G|^4 log(1/|ℓ_i|)

Next, we consider the case when both |L′(y_t)| ≤ 1 − 1/|G|² and |L′(z_t)| ≤ 1 − 1/|G|². Let

L′(y_p) = L′(y_t) + ε_y    L′(z_p) = L′(z_t) + ε_z

Let us assume without loss of generality that |L′(y_t)| ≤ |L′(z_t)|. Then,

|L′(x_p) − L′(x_t)| = |ε_y L′(z_t) + ε_z (L′(y_t) + ε_y) + δ|    where |δ| ≤ λ
≤ |ε_z| + |ε_y| |L′(z_t)| + λ
≤ λ|G|^4 log(1/|L′(z_t)|) + (1 − 1/|G|²) λ|G|^4 log(1/|L′(y_t)|) + λ
≤ λ|G|^4 log(1/(|L′(z_t)| |L′(y_t)|)) = λ|G|^4 log(1/|L′(x_t)|)

The last inequality uses the fact that |L′(y_t)| ≤ 1 − 1/|G|², so that log(1/|L′(y_t)|) ≥ 1/|G|².

We next show that for any node x_t in the true tree, if the ith diagonal entry is at most 1/10 in magnitude, then the corresponding entry for x_p is within an error of λ|G|^6 of it.

Claim 3.9 Let x_t be a node in the true tree such that L′(x_t) = ℓ_i satisfies |ℓ_i| < 1/10. Then |L′(x_t) − L′(x_p)| ≤ λ|G|^6.

Proof: Let the children of x be y and z. Let us assume by the induction hypothesis that the claim holds for all nodes below x. We consider the following four cases:

• At least one of L′(y_t) = 1 or L′(z_t) = 1. Assume L′(y_t) = 1.
• Both |L′(y_t)| ≥ 1/10 and |L′(z_t)| ≥ 1/10.
• Both |L′(y_t)| < 1/10 and |L′(z_t)| < 1/10.
• Exactly one of |L′(y_t)| and |L′(z_t)| is less than 1/10.

In the first case, note that by the induction hypothesis, the claim holds for y and z. Now, as one of the entries is 1, by Lemma 2.4 we see that

L′(x_p) = L′(y_p) ·_H L′(z_p) = L′(y_p) · L′(z_p) = L′(z_p)

As |L′(z_p) − L′(z_t)| ≤ λ|G|^6, we get |L′(x_p) − L′(x_t)| ≤ λ|G|^6. In the second case, by a basic "union bound", we get that the error is at most 2 log 10 · λ|G|^4 + λ ≤ λ|G|^6. For the next two cases, let us write L′(z_p) = L′(z_t) + ε_z and L′(y_p) = L′(y_t) + ε_y.
For the third case, |ε_y|, |ε_z| ≤ λ|G|^6 and |L′(y_t)|, |L′(z_t)| < 1/10. Hence,

|L′(x_p) − L′(x_t)| = |ε_y L′(z_t) + ε_z (L′(y_t) + ε_y) + δ|    where |δ| ≤ λ
≤ |ε_y| |L′(z_t)| + |ε_z| |L′(y_t)| + |ε_z| |ε_y| + λ
≤ λ|G|^6/10 + λ|G|^6/10 + λ²|G|^{12} + λ ≤ λ|G|^6

For the last case, assume that |L′(y_t)| < 1/10 and |L′(z_t)| ≥ 1/10. Hence, for this case, we have |ε_y| ≤ λ|G|^6 and |ε_z| ≤ (log 10) · λ|G|^4. Again, we have

|L′(x_p) − L′(x_t)| ≤ |ε_y| |L′(z_t)| + |ε_z| |L′(y_t)| + |ε_z| |ε_y| + λ
≤ (1 − 1/|G|²) λ|G|^6 + (λ|G|^4 log 10)/10 + λ²|G|^{10} + λ ≤ λ|G|^6

Combining Claims 3.8 and 3.9, we get the following lemma, which combined with Observation 3.6 implies Theorem 3.1.

Lemma 3.10 Let x_t be a node in the true tree and x_p be the corresponding node in the pseudo tree. Then |L′(x_t) − L′(x_p)| ≤ λ|G|^6. This implies that ‖L(x_t) − L(x_p)‖₂ ≤ λ|G|^6 ≤ ε/√|G| for λ = ε/|G|^7.

4 Small biased spaces for group products

Next, we introduce the problem of small biased spaces for group products. Let G be an arbitrary group and let x_1, . . . , x_n ∈ {0,1}. Again, for a, b ∈ G, we let a · b denote the group operation applied to a and b. We also remind ourselves that for g ∈ G and x ∈ {0,1}, g^x = 1 (the identity) if x = 0, and g^x = g if x = 1.

Consider the distribution D = g_1^{x_1} · . . . · g_n^{x_n} where g_1, . . . , g_n ∈ G are chosen independently and uniformly at random. We seek to come up with an efficiently computable function Γ : {0,1}^t → G^n such that when (g_1, . . . , g_n) is sampled from Γ(U_t), the distribution D′ = g_1^{x_1} · . . . · g_n^{x_n} is ε-close to D in statistical distance. The aim is to keep t as small as possible, and a probabilistic argument shows that it is possible to get t = O(log |G| + log n + log(1/ε)). The task of getting an explicit function Γ was first considered by Meka and Zuckerman [MZ09]. They obtained the following result.

Theorem 4.1 There exists some fixed constant c = c(G) < 1/|G| and Γ : {0,1}^t → G^n with t = O(log n) such that for any h ∈ G,

| Pr_{(g_1,...,g_n)∼Γ(U_t)}[g_1^{x_1} · . . .
· gnxn = h] − (g1 ,...,gn )∼Γ(Ut ) P (g1 ,...,gn )∼Gn [g1x1 · . . . · gnxn = h]| ≤ c Also one of their theorems coupled with the pseudorandom generator from [KNP10], gives the following result. For every > 0, there exists Γ : {0, 1}t → Gn with t = O(log n · (log(1/) + |G|Θ(1) )) such that if X is a distribution defined as the output of Γ(Ut ), then | P [g1x1 · . . . · gnxn = h] − (g1 ,...,gn )∼Γ(Ut ) P (g1 ,...,gn )∼Gn [g1x1 · . . . · gnxn = h]| ≤ We improve on their result in terms of dependency on the seed length on the size of the group. Also, as we shall see, to get this improvement, we do not require the whole machinery of [KNP10] and our proof is rather short and simple. In particular, we prove the following theorem. Theorem 4.2 Let Γ : {0, 1}t → Gn denote the INW generator with λ = /|G|. If X denotes the output distribution of Γ(Ut ), | P (g1 ,...,gn )∼X [g1x1 · . . . · gnxn = h] − P (g1 ,...,gn )∼Gn [g1x1 · . . . · gnxn = h]| ≤ As a corollary, using the INW generator describe in Remark 2.2, we get that there is a polynomial time computable function Γ : {0, 1}t → Gn with t = O(log n · (log |G| + log(1/)) such that if X denotes the output distribution of Γ(Ut ), | P (g1 ,...,gn )∼X [g1x1 · . . . · gnxn = h] − P (g1 ,...,gn )∼Gn [g1x1 · . . . · gnxn = h]| ≤ Before we discuss the proof of the above theorem, we will need to review some basic representation theory. While it is possible to talk about the entire proof without using the language of representations, we believe its the right way to look at the proof and might be useful in getting improvements in the future. Also, it will be helpful for us in Section 5. An excellent source for reviewing the required material are lecture notes by Telerman [Tel05]. Below we review some basic representation theory which will be helpful. For the reader familiar with representation theory, we remark that our definitions are sometimes restrictive because the most general definition is not always helpful for us. 
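Before turning to representations, here is a minimal computational sanity check (not part of any proof) of the target distribution D defined above: for any fixed x ≠ 0^n, under truly uniform g_1, ..., g_n the product g_1^{x_1} · ... · g_n^{x_n} is exactly uniform over G. The group S_3 and the parameters below are illustrative choices, not from the paper.

```python
from itertools import product

# The symmetric group S_3, with elements as permutation tuples and
# composition as the group operation.  The check works for any finite G.
def compose(p, q):          # (p*q)(i) = p(q(i))
    return tuple(p[q[i]] for i in range(len(q)))

G = [(0, 1, 2), (0, 2, 1), (1, 0, 2), (1, 2, 0), (2, 0, 1), (2, 1, 0)]
identity = (0, 1, 2)

def group_power(g, x):      # g^x for x in {0,1}, as defined in the text
    return g if x == 1 else identity

# For a fixed x != 0^n (here n = 2, x = (1, 0)), tally the product
# g_1^{x_1} * g_2^{x_2} over all |G|^2 choices of (g_1, g_2).
x = (1, 0)
counts = {h: 0 for h in G}
for gs in product(G, repeat=2):
    prod = identity
    for g, xi in zip(gs, x):
        prod = compose(prod, group_power(g, xi))
    counts[prod] += 1

# |G|^2 = 36 tuples, and each of the 6 outcomes occurs 36/6 = 6 times:
# the product distribution is exactly uniform over G.
assert all(c == 6 for c in counts.values())
```

This is exactly why a small-bias generator Γ is the interesting object: the uniform distribution over seeds trivially achieves D, and the question is how few seed bits suffice to approximate it.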
Definition 4.3 Let V be any vector space over C. Then GL(V) is the group of all invertible linear transformations from V to V, with the group operation being function composition.

Definition 4.4 For a group G and a vector space V, a map ρ : G → GL(V) is said to be a representation of G if ρ is a group homomorphism. In this paper, we only consider vector spaces V over C.

Definition 4.5 A group representation ρ : G → GL(V) is said to be irreducible if it does not have a non-trivial invariant subspace. In other words, if W ⊊ V satisfies ρ(g)W ⊆ W for all g ∈ G, then W = {0}.

Irreducible representations are fundamental building blocks in the sense that any representation of the group G can be written as a direct sum of irreducible representations of G. Moreover, any finite group G has only finitely many irreducible representations (up to isomorphism). We say two representations ρ_1 and ρ_2 are isomorphic if there exists an invertible matrix τ such that for every g, τ · ρ_1(g) · τ^{−1} = ρ_2(g).

Theorem 4.6 (Maschke) Let V be any finite dimensional vector space over C, G be a finite group and ρ : G → GL(V) be a representation. Then ρ can be written as a direct sum of irreducible representations of the group G.

We list the following important properties of irreducible representations (some of which will be helpful later).

Lemma 4.7 Let S = {ρ_1, ρ_2, ...} be the set of irreducible representations of a finite group G. Let d_i be the dimension of ρ_i, i.e., ρ_i : G → C^{d_i × d_i}. Then Σ_{i∈S} d_i^2 = |G|.

We next state Schur's lemma.

Lemma 4.8 Let ρ and τ be two non-isomorphic irreducible representations of a group G. Then,

• ⟨τ_{i,j}, ρ_{k,ℓ}⟩ = E_g[τ_{i,j}(g) · conj(ρ_{k,ℓ}(g))] = 0
• ⟨τ_{i,j}, τ_{k,ℓ}⟩ = δ_{i,k} δ_{j,ℓ} / d_τ, where d_τ is the dimension of τ.

We remark that every group has a trivial irreducible representation ρ_t : G → GL(C) given by ρ_t(x) = 1 for all x ∈ G. We now state the following simple corollary of Schur's lemma and the earlier observation.
Corollary 4.9 Let ρ : G → GL(V) be a non-trivial irreducible representation of G. Then

Σ_{x∈G} ρ(x) = 0

Proof: By Schur's lemma, if τ is the trivial representation, then for any k, ℓ, ⟨τ_{1,1}, ρ_{k,ℓ}⟩ = 0, which implies the claim.

We now return to the problem of constructing small bias spaces over group products. We use the INW generator described in Remark 2.2, and now state the analogue of Lemma 2.4 in this setting. To do this, let Γ_1, Γ_2 : {0,1}^r → G^m and ρ : G^m → C^{w×w}, and define A and B as follows:

A = (1/2^r) Σ_{x∈{0,1}^r} ρ(Γ_1(x))    B = (1/2^r) Σ_{x∈{0,1}^r} ρ(Γ_2(x))    (2)

Lemma 4.10 Let ρ : G^m → C^{w×w} be such that ||ρ(x)||_2 ≤ 1 for all x ∈ G^m. Let Γ_1, Γ_2 : {0,1}^r → G^m, and let H be a 2^d-regular graph on {0,1}^r with second eigenvalue λ. Then, for A and B as defined in (2),

||A · B − A ·_H B||_2 ≤ λ

We also have the following observation.

• If A and B are identity on some subspace W, then A · B as well as A ·_H B are identity on W.

We recall that a matrix A is said to be identity on a subspace W if and only if for all x ∈ W, x · A = A · x = x.

We now formulate the problem of constructing small biased spaces for group products in terms of getting pseudorandomness for a certain permutation branching program (which we fool using the INW generator). The branching program has n + 1 layers. Each layer consists of |G| vertices, each vertex labeled by an element of G. The branching program starts at the identity element of G in the 0th layer. Now, if x_i = 1, the branching program moves from x in the (i−1)th layer to x · g_i in the ith layer. On the other hand, if x_i = 0, the branching program moves from x in the (i−1)th layer to x in the ith layer. Thus, we have |G| walk matrices for the transition from the (i−1)th layer to the ith layer; we call them M_h^i for h ∈ G. Further, if x_i = 0, they are all the identity matrix.
In case x_i = 1, M_h^i is defined by

M_h^i(x, y) = 1 if x^{−1} y = h, and 0 otherwise

We observe that if we take a random walk starting at the vertex corresponding to the identity element of the group G in the zeroth layer, choosing a uniformly random walk matrix among the M_h^i to go from layer i − 1 to layer i, then after j steps the distribution on the jth layer is exactly the distribution of g_1^{x_1} · ... · g_j^{x_j} where the g_i's are chosen uniformly at random. Hence, to prove Theorem 4.2, it suffices to analyze the error for this branching program when the g_i's are chosen from the output of the INW generator.

We next observe that the walk matrices can be block diagonalized such that the blocks have some nice properties.

Observation 4.11 If x_i = 1, the map h ↦ M_h^i is a group representation. This implies that there is a basis transformation in which all the walk matrices can be simultaneously block diagonalized. Note that this is because if x_i = 0, then all the walk matrices are the identity, and hence after any change of basis they remain the identity in every block. If x_i = 1, then each of the blocks of the walk matrices M_h^i corresponds to some irreducible representation. The corresponding block of M_h^i when x_i = 0 is always the identity. In particular, the following is true.

• If the block corresponds to the trivial representation, then that block is the 1 × 1 identity matrix in all the walk matrices.
• If the block corresponds to a non-trivial representation, then the following is true.
  – If x_i = 0, then all the walk matrices are the identity in that block.
  – If x_i = 1, then the sum of the walk matrices in that block is identically zero.

Also, all the blocks are unitary matrices because they correspond to irreducible representations.
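Observation 4.11 can be checked by hand on the smallest interesting example. The following sketch (an illustration, not the paper's construction; the choice G = Z_3 is an assumption for concreteness) builds the walk matrices M_h for the cyclic group Z_3, conjugates them by the unitary DFT matrix, and verifies that they become simultaneously diagonal, that every diagonal block has unit modulus, and that each non-trivial block sums to zero over h ∈ G, as in Corollary 4.9.

```python
import cmath

# Walk matrices for G = Z_3: M_h(x, y) = 1 iff x^{-1} y = h, i.e.
# y = x + h (mod 3).  These are the circulant shift matrices.
n = 3
def walk_matrix(h):
    return [[1.0 if (y - x) % n == h else 0.0 for y in range(n)] for x in range(n)]

# The irreducible representations of Z_3 are the characters
# rho_k(h) = omega^{-hk}, omega = e^{2*pi*i/3}.  The unitary DFT matrix F
# simultaneously diagonalizes every M_h.
omega = cmath.exp(2j * cmath.pi / n)
F = [[omega ** (j * k) / cmath.sqrt(n) for k in range(n)] for j in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def conj_T(A):
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]

for k in range(n):
    total = 0
    for h in range(n):
        D = matmul(matmul(F, walk_matrix(h)), conj_T(F))
        off = max(abs(D[i][j]) for i in range(n) for j in range(n) if i != j)
        assert off < 1e-9                      # simultaneously diagonalized
        assert abs(abs(D[k][k]) - 1) < 1e-9    # unitary (unit-modulus) blocks
        total += D[k][k]
    # trivial block (k = 0) is always 1; non-trivial blocks sum to 0 over h
    expected = n if k == 0 else 0
    assert abs(total - expected) < 1e-9
```

For abelian groups all blocks are 1 × 1, which is exactly the "luxury of diagonalizing" referred to later in Section 5; for non-abelian G the same computation produces higher-dimensional unitary blocks.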
The next observation says that since all the walk matrices can be simultaneously block diagonalized, we might as well treat the blocks individually and analyze the error incurred by using the INW generator vis-à-vis the uniform distribution in each of the blocks.

Observation 4.12 Since all the walk matrices are simultaneously block diagonalizable, we can assume that the leaves of the pseudo tree for the INW generator as well as the true tree are marked by these block diagonalized matrices as opposed to the original walk matrices.

Also, consider a particular block corresponding to the representation ρ. Let us instead label the leaf nodes (in both trees) by the identity matrix of dimension d_ρ if x_i = 0, and by ρ(h) for h ∈ G if x_i = 1. The labels of the non-leaf nodes in the true and the pseudo tree are generated by taking the true and the expander products of the children respectively. For a node x, we denote this labeling by L′(x). If we prove that for any node x_p in the pseudo tree, the corresponding node x_t in the true tree and any representation ρ, the labelings satisfy ||L′(x_p) − L′(x_t)||_2 ≤ ε/√|G|, then it follows that the INW generator fools the branching program with error ε.

So, we now treat each block individually. Let us fix a representation ρ and analyze the difference between the labelings in the true tree and the pseudo tree. If the representation corresponding to the block is trivial, then the following claim says that there is no error between the pseudo tree and the true tree.

Claim 4.13 Let x_p and x_t be corresponding nodes in the pseudo tree and the true tree respectively. Let L′(x_p) and L′(x_t) be the labelings of x_p and x_t with respect to the trivial representation. Then, L′(x_p) = L′(x_t) = 1.

Proof: Note that for the trivial representation, all the leaf nodes in both the true and the pseudo tree are labelled by the 1 × 1 identity matrix.
Thus the labeling of the leaf nodes in both the pseudo and the true tree is identical, namely 1. We now use induction on the height of a node. Let x_p be a node at height t in the pseudo tree and x_t be the corresponding node in the true tree. By the induction hypothesis, the labelings of the children of x_t, namely y_t and z_t, are 1; similarly, the labelings of y_p and z_p are also 1. Now, L′(x_t) = L′(y_t) L′(z_t) = 1. Further, L′(x_p) = L′(y_p) ·_H L′(z_p) = 1 by the observation following Lemma 4.10, as the labeling is 1 on all the leaf nodes under x_p. This proves the claim.

Next, we consider the case when the representation ρ is non-trivial.

Claim 4.14 Let x_p and x_t be corresponding nodes in the pseudo tree and the true tree respectively. Let L′(x_p) and L′(x_t) be the labelings of x_p and x_t corresponding to the representation ρ. Then, ||L′(x_p) − L′(x_t)||_2 ≤ 2λ.

Proof: We prove this by induction. The claim clearly holds for the leaf nodes. We first observe that for any node x_t in the true tree, L′(x_t) is either the identity matrix or L′(x_t) = 0. This is because if x_i = 0, then the ith leaf node is labeled by the identity matrix; on the other hand, if x_i = 1, then the leaf node is labeled by 0 (as Σ_{h∈G} ρ(h) = 0). Clearly, for a node x_t, L′(x_t) = Id if and only if all the leaf nodes below it are labeled by the identity; else it is labeled 0.

Now, consider any node x_t in the true tree and its corresponding node x_p. Let the children of x_t be y_t and z_t, and suppose one of them, say y_t, satisfies L′(y_t) = Id. Then, by Lemma 2.4, L′(x_t) = L′(z_t) and L′(x_p) = L′(z_p). However, by induction on the height of the tree, ||L′(z_p) − L′(z_t)||_2 ≤ 2λ, which implies ||L′(x_p) − L′(x_t)||_2 ≤ 2λ.

So, we may assume that both L′(y_t) and L′(z_t) are 0. In that case, by the induction hypothesis, ||L′(y_p)||_2 ≤ 2λ and similarly ||L′(z_p)||_2 ≤ 2λ. By Lemma 4.10, ||L′(x_p) − L′(y_p) · L′(z_p)||_2 ≤ λ. This implies that ||L′(x_p)||_2 ≤ λ + 4λ^2 ≤ 2λ (provided λ < 1/10).
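The inductive step in the proof of Claim 4.14 is, at its core, the scalar recursion e ↦ e² + λ: if both children of a node carry error at most e in the pseudo tree, the expander product gives error at most e² + λ at the parent. The following small sketch (an illustration only; the function name and parameters are ours) iterates this worst-case recursion up a tree and confirms that, starting from exact leaves, the error never exceeds 2λ at any height once λ is small (λ ≤ 1/4 already suffices for the recursion itself, matching the λ < 1/10 proviso above).

```python
# Worst-case error recursion behind Claim 4.14: leaves of the pseudo tree
# are exact (e = 0); one level of expander products maps e -> e*e + lam.
def worst_error(lam, height):
    e = 0.0
    for _ in range(height):
        e = e * e + lam
    return e

# The error stays bounded by 2*lam regardless of the tree height.
for lam in [1e-1, 1e-3, 1e-6]:
    for height in [1, 10, 60]:
        assert worst_error(lam, height) <= 2 * lam
```

The fixed point of e = e² + λ is (1 − √(1 − 4λ))/2 = λ + O(λ²), which is why the final bound loses only a constant factor over λ rather than a factor depending on the tree depth.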
The above two claims together with Observation 4.12 imply that it suffices to take λ = ε/(2√|G|) to get an overall error of ε, and hence we get Theorem 4.2.

5 Pseudorandomness for permutation branching programs

In this section, we discuss the most important result of this paper. Namely, we get a PRG for read-once permutation branching programs of constant width with seed length O(log n · log(1/ε)). More precisely, we have the following theorem.

Theorem 5.1 Let Γ : {0,1}^t → {0,1}^n be the INW generator with λ = ε · 2^{−w^8}. Then the output of Γ(U_t) is ε-indistinguishable from U_n for read-once width w permutation branching programs of length n.

Using the standard INW generator, we get the following corollary.

Corollary 5.2 There is a polynomial time computable function Γ : {0,1}^t → {0,1}^n with t = O(log n · (w^8 + log(1/ε))) such that Γ(U_t) is ε-indistinguishable from U_n for read-once permutation branching programs of width w.

We now describe the overall strategy for proving Theorem 5.1. As described in Section 2, we label the leaf nodes in both the pseudo tree as well as the true tree with the walk matrices corresponding to the branching program. In particular, the label of the ith leaf node will simply be the average of the walk matrix for the transition corresponding to 0 and that corresponding to 1 (recall that we call them M_0^i and M_1^i). Subsequently, the label of a non-leaf node is the product of the labels of its children in the true tree, and the expander product in the case of the pseudo tree.

Much like the case of abelian groups, for a node x_t in the true tree and the corresponding node x_p in the pseudo tree, we would like to say that ||L(x_p) − L(x_t)|| is a function of λ and ||L(x_t)|| alone. However, unlike the abelian case, we no longer have the luxury of diagonalizing and treating each coordinate individually. We instead adopt the following strategy.
To discuss the intuition further, it will be helpful to introduce the following concepts.

Definition 5.3 For a matrix M ∈ C^{w×w}, we say W is the fixed point subspace of M if W is the maximal subspace such that for all x ∈ W, x · M = M · x = x. In other words, W is the maximal subspace on which M is the identity. We note that for a given matrix M, the fixed point subspace is uniquely defined. Further, for the matrix M, we define its non-trivial subspace to be W^⊥.

Definition 5.4 For a matrix A ∈ C^{m×m} and its non-trivial subspace W ⊆ C^m, we define

||A||_W = max_{v ∈ W, v ≠ 0} ||A · v|| / ||v||

Coming back to the structure of the proof, we show that for any node x_t in the true tree and the corresponding node x_p, ||L(x_p) − L(x_t)|| can be bounded as a function of the dimension of the non-trivial subspace of L(x_t) (call it W) and ||L(x_t)||_W. In particular, we will allow the error to grow as the dimension of W increases or as ||L(x_t)||_W decreases. In fact, the dependence of the error on the dimension of the non-trivial subspace W shall dominate the dependence of the error on the norm of the label on its non-trivial subspace.

The proof shall proceed by induction on the height of the nodes. To convey the intuition, suppose we claim that for any pair of corresponding nodes x_t and x_p,

||L(x_p) − L(x_t)|| ≤ f(α) g(β) λ

where, if W is the non-trivial subspace of L(x_t), then α = ||L(x_t)||_W and β = dim(W). Further, let our choice be such that

lim_{λ→0} λ · f(α) · g(β) = 0

This ensures that we can indeed choose λ such that if we just want constant error, then it suffices to choose some constant λ depending only on w. Now, assume that this holds by the induction hypothesis up to some height, and consider the inductive step. Let x_t be a node in the true tree and y_t and z_t be its children. Let x_p, y_p and z_p be the corresponding nodes in the pseudo tree. There are exactly three situations which can arise:

• The non-trivial subspaces of y_t and z_t (and hence of x_t) are the same.
In this case, the allowed dependence of the error on β plays no role. The only relevant factor is α, and the analysis is similar to the analysis in the case of abelian groups.

• The non-trivial subspaces of y_t and z_t are such that neither is contained in the other. In this case, the non-trivial subspace of x_t has strictly bigger dimension than those of y_t and z_t. It is here that the dependence of the error on β plays a role. In fact, because we allow the dependence on β to supersede any dependence on α, we can bound the error easily. The only thing we need to show is that the norm of L(x_t) on its non-trivial subspace is not very close to 1, which we manage to show easily.

• The non-trivial subspace of y_t is properly contained in that of z_t (or vice versa). In this situation, it is possible that the labeling of x_t has the same norm on its non-trivial subspace as z_t, and yet ||L(x_p) − L(x_t)||_W > ||L(z_p) − L(z_t)||_W, where W is the (common) non-trivial subspace of z_t and x_t. What is more concerning is that one can have a series of nodes in the true tree, call them x_0, ..., x_m and y_1, ..., y_m, such that x_i has the two children x_{i−1} and y_i, and for all i the non-trivial subspace of L(y_i) is properly contained in the non-trivial subspace of L(x_0). The way around this is to do a global analysis of the error incurred by the chain as a whole, rather than trying to do it on a per-node basis. Here we use an induction-based proof of the "Key Convergence Lemma" of [KNP10].

We will now state the claims we prove and show how the main Theorem 5.1 follows from them. Below, when we say that an operator acts trivially on a subspace X, we simply mean that it is the identity on X. The first claim we prove is the following.

Claim 5.5 Let α″ ≥ 1/10. Also, let x_t be a node in the true tree and x_p be the corresponding node in the pseudo tree. Also, assume that y_t and z_t are the children of x_t and y_p and z_p are the children of x_p.
Further, let the non-trivial subspace of x_t, y_t and z_t be W, with dimension β. Let ||L(y_t)||_W = α, ||L(z_t)||_W = α′ and ||L(x_t)||_W = α″. Then, provided

||L(y_p) − L(y_t)||_W ≤ λ · w^{w^5} w^{w^5 β} log(1/α) + λ w^{w^5 β} 2^β w^{6w}

and

||L(z_p) − L(z_t)||_W ≤ λ · w^{w^5} w^{w^5 β} log(1/α′) + λ w^{w^5 β} 2^β w^{6w},

we have ||L(x_p) − L(x_t)||_W ≤ λ · w^{w^5} w^{w^5 β} log(1/α″). If L(z_p) and L(y_p) act trivially on W^⊥, so does L(x_p).

Claim 5.6 Let x_t be a node in the true tree and x_p be the corresponding node in the pseudo tree. Also, assume that y_t and z_t are the children of x_t and y_p and z_p are the children of x_p. Further, let the non-trivial subspace of x_t, y_t and z_t be W, with dim(W) = β. Let ||L(x_t)||_W < 1/10. Assume that for j ∈ {y, z}, the following holds:

||L(j_p) − L(j_t)||_W ≤ λ · w^{w^5} w^{w^5 β} log(1/||L(j_t)||_W) + λ w^{w^5 β} 2^β w^{6w} if ||L(j_t)||_W ≥ 1/10, and
||L(j_p) − L(j_t)||_W ≤ λ · w^{w^5} w^{w^5 β} w^{6w} + λ w^{w^5 β} 2^β w^{6w} otherwise.    (3)

Then, ||L(x_p) − L(x_t)||_W ≤ λ · w^{w^5} w^{w^5 β} w^{6w}. If L(z_p) and L(y_p) act trivially on W^⊥, so does L(x_p).

Claim 5.7 Let x_t be a node in the true tree and x_p be the corresponding node in the pseudo tree. Also, assume that y_t and z_t are the children of x_t and y_p and z_p are the children of x_p. Further, let W_x, W_y and W_z be the non-trivial subspaces of L(x_t), L(y_t) and L(z_t), with dim(W_y) = β_1, dim(W_z) = β_2 and dim(W_x) = β. Also, W_y ≠ W_y ∩ W_z ≠ W_z. Then,

• β > β_1 and β > β_2.

Also, if ||L(y_p) − L(y_t)||_{W_y} ≤ λ · w^{w^5} w^{w^5 β_1} w^{6w} + λ w^{w^5 β_1} 2^{β_1} w^{6w} and ||L(z_p) − L(z_t)||_{W_z} ≤ λ · w^{w^5} w^{w^5 β_2} w^{6w} + λ w^{w^5 β_2} 2^{β_2} w^{6w}, then

||L(x_p) − L(x_t)||_{W_x} ≤ λ · w^{w^5} w^{w^5 β} log(1/||L(x_t)||_{W_x}) if ||L(x_t)||_{W_x} ≥ 1/10, and
||L(x_p) − L(x_t)||_{W_x} ≤ λ · w^{w^5} w^{w^5 β} w^{6w} otherwise.    (4)

If L(y_p) acts trivially on W_y^⊥ and L(z_p) acts trivially on W_z^⊥, then L(x_p) acts trivially on W_x^⊥.

Claim 5.8 Let x_0, x_1, ..., x_m, y_1, y_2, ..., y_m be nodes in the true tree. Let x′_0, x′_1, ..., x′_m, y′_1, y′_2, ..., y′_m be the corresponding nodes in the pseudo tree. Further, for i ≥ 1, x_i is the parent of x_{i−1} and y_i.
Also, the non-trivial subspace of L(y_i) (for 0 < i ≤ m) is strictly contained in the non-trivial subspace of L(x_0). Suppose that all the L(x′_i)'s and L(y′_i)'s have the same non-trivial subspaces as their corresponding counterparts in the true tree, and that for every z ∈ {x_0, y_1, ..., y_m}, with W_z the non-trivial subspace of L(z) and β_z = dim(W_z),

||L(z) − L(z′)||_{W_z} ≤ λ · w^{w^5} w^{w^5 β_z} log(1/||L(z)||_{W_z}) if ||L(z)||_{W_z} ≥ 1/10, and
||L(z) − L(z′)||_{W_z} ≤ λ · w^{w^5} w^{w^5 β_z} w^{6w} otherwise.

Then the non-trivial subspaces of L(x_m) and L(x′_m) are the same as that of L(x_0). Also, if V is the said subspace and β = dim(V), then

||L(x_m) − L(x′_m)||_V ≤ ||L(x_0) − L(x′_0)||_V + λ w^{w^5 β} 2^β w^{6w}

We first see how the above four claims can be used to prove the main Theorem 5.1.

Proof: [of Theorem 5.1] First, we claim that for a node x_t, if L(x_t) acts trivially on a subspace (i.e., it is the identity on it), then L(x_p) is also the identity on the same subspace. This holds trivially for all the leaf nodes, and we prove it by induction. Assume it holds for all nodes up to height t. Let x_t be a node at height t + 1 and y_t and z_t be its children. Now, by induction, the fixed point subspaces of L(y_t) and L(z_t) are the same as those of L(y_p) and L(z_p) respectively.

• If the fixed point subspaces of L(y_t) and L(z_t) are the same, then by Claim 5.5 and Claim 5.6, the fixed point subspace of L(x_p) is the same as that of L(x_t).
• If the fixed point subspace of L(y_t) is not contained in that of L(z_t), and vice versa, then Claim 5.7 says that the fixed point subspace of L(x_p) is the same as that of L(x_t).
• If the fixed point subspace of L(y_t) is contained in that of L(z_t), or vice versa, then Claim 5.8 says that the fixed point subspace of L(x_p) is the same as that of L(x_t).

The above means that in order to bound ||L(x_p) − L(x_t)||, it suffices to bound ||L(x_p) − L(x_t)||_{W_x}, where W_x is the non-trivial subspace of L(x_t).
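To make Definitions 5.3 and 5.4 concrete, here is a small numeric sketch (numpy-free by choice; the permutation and width are illustrative assumptions, not from the paper). For A = (I + P)/2 with P the cyclic shift on w = 3 points, the fixed point subspace is spanned by the all-ones vector, and on its complement W (the sum-zero vectors) power iteration recovers ||A||_W, which drops strictly below 1 in line with the spectral gap that Lemma 5.11 below establishes.

```python
# A = (I + P)/2 for P the cyclic shift on w = 3 points.
w = 3
P = [[1.0 if (j - i) % w == 1 else 0.0 for j in range(w)] for i in range(w)]
A = [[(float(i == j) + P[i][j]) / 2 for j in range(w)] for i in range(w)]

def apply(v, M):                       # row-vector action v -> v * M
    return [sum(v[i] * M[i][j] for i in range(w)) for j in range(w)]

def project_W(v):                      # remove the all-ones component
    m = sum(v) / w
    return [x - m for x in v]

def norm(v):
    return sum(x * x for x in v) ** 0.5

# Power iteration on W estimates ||A||_W; for this A the restriction to W
# has both eigenvalues of modulus |1 + omega|/2 = 1/2 (omega a primitive
# cube root of unity), so the contraction factor is exactly 1/2.
v = project_W([1.0, 0.0, 0.0])
for _ in range(200):
    u = project_W(apply(v, A))
    growth = norm(u) / norm(v)
    v = [x / norm(u) for x in u]

assert growth < 1 - w ** (-3 * w)      # consistent with Lemma 5.11's bound
assert abs(growth - 0.5) < 1e-6
```

The (very crude) bound 1 − w^{−3w} of Lemma 5.11 is far from tight here, but the point of the lemma is only that the contraction is bounded away from 1 by a quantity depending on w alone.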
We claim that for the root nodes of the pseudo tree and the true tree (call them a_p and a_t), ||L(a_p) − L(a_t)|| ≤ 2λ · w^{w^5} w^{w^6} w^{6w}. This proves the main theorem (allowing w to be a sufficiently large constant, which we can assume without loss of generality). In order to prove this, we maintain inductively that for any node x_p in the pseudo tree and the corresponding node x_t in the true tree,

||L(x_p) − L(x_t)||_{W_x} ≤ λ · w^{w^5} w^{w^5 β} log(1/||L(x_t)||_{W_x}) if ||L(x_t)||_{W_x} ≥ 1/10, and
||L(x_p) − L(x_t)||_{W_x} ≤ λ · w^{w^5} w^{w^5 β} w^{6w} otherwise,    (5)

where W_x is the non-trivial subspace of L(x_t) and β = dim(W_x). Also, the non-trivial subspaces of L(x_t) and L(x_p) are exactly the same. This holds trivially at the leaves.

Now consider any node a_t in the true tree and the corresponding node a_p in the pseudo tree, with respective children y_t, z_t and y_p, z_p. Assume (5) holds with x replaced by y and by z (the induction hypothesis). Suppose first that one of two things is true: either W_{y_t} and W_{z_t} are the same, in which case W_{a_t} = W_{y_t} = W_{z_t}; or else W_{y_t} and W_{z_t} are both properly contained in W_{a_t}. In the former case, we can apply either of Claim 5.5 or Claim 5.6 and see that (5) holds with x replaced by a. In the latter case, Claim 5.7 applies and we again see that (5) holds with x replaced by a.

The only remaining case is when W_{z_t} ⊊ W_{y_t} (or vice versa). In this case, W_{a_t} = W_{y_t}. To handle this case, consider the nearest ancestor of a_t (call it b_t, and let the parent of b_t be c_t and its sibling be d_t) such that one of the following holds:

• W_{b_t} = W_{a_t} ⊊ W_{c_t}
• W_{b_t} = W_{d_t}

We remark that if no such b_t exists, then we let b_t be the root node. Now, by Claim 5.8, we can say that

||L(b_t) − L(b_p)||_{W_{y_t}} ≤ ||L(y_t) − L(y_p)||_{W_{y_t}} + λ w^{w^5 β} 2^β w^{6w}

where β is the dimension of W_{y_t}. In case b_t is the root node, we are done because ||L(y_t) − L(y_p)||_{W_{y_t}} ≤ λ w^{w^5} w^{w^5 β} w^{6w} + λ w^{w^5 β} 2^β w^{6w}, which is clearly bounded by 2λ w^{w^5} w^{w^6} w^{6w}.
If b_t is not the root node, then we can apply one of Claim 5.5, Claim 5.6 or Claim 5.7 to get that

||L(c_t) − L(c_p)||_{W_{c_t}} ≤ λ w^{w^5} w^{w^5 dim(W_{c_t})} w^{6w}

and from there we can proceed inductively.

5.1 Basic spectral properties of the labelings

We start by stating the following lemma from [KNP10] (Lemma 30 in their paper).

Lemma 5.9 Let P_1, ..., P_m ∈ R^{w×w} be m permutation matrices, and let G be the group generated by these matrices. Consider the matrix

A = (I + P_1)/2 · (I + P_2)/2 · ... · (I + P_m)/2

Note that A = Σ_g λ_g P_g where λ_g ≥ 0, Σ_g λ_g = 1 and P_g ∈ G. Then, there is a set K ⊆ G and a ∈ G such that a^{−1}K generates G and for every k ∈ K, λ_k ≥ 1/(2|G|). Also, λ_a ≥ 1/(2|G|).

Claim 5.10 Let ρ : G → C^{m×m} be a non-trivial irreducible representation of G. Let K ⊆ G and a ∈ G be such that a^{−1}K generates G. Then, for any v ∈ C^m with ||v||_2 = 1, there exists k ∈ K such that ⟨ρ(a)v, ρ(k)v⟩ ≤ 1 − 1/|G|^3.

Proof: Assume for contradiction that for all k ∈ K, ⟨ρ(a)v, ρ(k)v⟩ = ⟨v, ρ(a^{−1}k)v⟩ > 1 − 1/|G|^3. Because a^{−1}K generates G and any element g ∈ G can be written as a product of at most |G| elements of a^{−1}K, we get that for every g ∈ G, ⟨v, ρ(g)v⟩ > 1 − 1/|G|^2. However, by Corollary 4.9, Σ_g ρ(g) = 0, which implies that Σ_g ⟨v, ρ(g)v⟩ = 0. This leads to a contradiction.

We now prove the following important lemma about the spectra of the labelings in the true tree.

Lemma 5.11 Let P_1, ..., P_m ∈ R^{w×w} be m permutation matrices. Consider the matrix

A = (I + P_1)/2 · (I + P_2)/2 · ... · (I + P_m)/2

Let V be the eigenspace of A such that for all v ∈ V, v · A = v, and let W be the space orthogonal to V. Then W is invariant under A, i.e., for all v ∈ W, v · A ∈ W, and for any v ∈ W,

||v · A||_2 ≤ (1 − w^{−3w}) ||v||_2

Proof: Consider the group G generated by the matrices P_1, ..., P_m. Clearly, G ≤ S_w. Consider the irreducible representations ρ of G. Note that we can find a unitary matrix U such that U A U^† is block diagonal. Further, the blocks correspond to the irreducible representations of G.
Now, it is obvious that, corresponding to the trivial representation, all the blocks are 1 × 1 identity blocks, and together they account for the trivial eigenspace V. Consider some non-trivial representation ρ of G. Let us interpret P_i as g_i ∈ G. If A = Σ_g λ_g P_g, then after the block diagonalization, the block of A corresponding to ρ is Σ_g λ_g ρ(g). Consider any v ∈ W. It has non-zero values only in the coordinates corresponding to the non-trivial representations of G. Let S_ρ be the set of coordinates corresponding to a particular non-trivial representation ρ, and consider the vector v_{S_ρ}, which is simply the projection of v onto the coordinates in S_ρ. Then

(Av)_{S_ρ} = (Σ_g λ_g ρ(g)) v_{S_ρ}

Now, by Lemma 5.9 and Claim 5.10, we can say that there exist a, k such that λ_a, λ_k ≥ 1/(2|G|) and ⟨ρ(a)v_{S_ρ}, ρ(k)v_{S_ρ}⟩ ≤ (1 − 1/|G|^3)||v_{S_ρ}||_2^2. This implies that ||(Av)_{S_ρ}||_2 ≤ (1 − 1/|G|^3)||v_{S_ρ}||_2. As the size of the group G is at most w! ≤ w^w, we get the claim.

We now note a property of the labelings with regard to their fixed point subspaces.

Claim 5.12 Let x_t be a node in the true tree and L(x_t) be its labeling. Further, let y ∈ C^w be such that y · L(x_t) = y. Then, if x_p is the node corresponding to x_t in the pseudo tree and L(x_p) is its labeling, y · L(x_p) = y.

Proof: We prove this claim by induction on the height of the node x_t. Clearly, the assertion is true for the leaves of the true tree. Now, observe that the labeling of the node x_t is simply the product of L(y_t) and L(z_t), where y_t and z_t are the children of x_t. It is easy to observe that ||L(y_t)||_2, ||L(z_t)||_2 ≤ 1. This implies that if y · L(x_t) = y, then y · L(y_t) = y · L(z_t) = y. By the induction hypothesis, we can say that y · L(y_p) = y · L(z_p) = y. As L(x_p) = L(y_p) ·_H L(z_p), by Lemma 2.4 we can say that y · L(x_p) = y.

Corollary 5.13 Let x_t be a node in the true tree and L(x_t) be its labeling. Further, let y ∈ C^w be such that L(x_t) · y = y.
Then, if x_p is the node corresponding to x_t in the pseudo tree and L(x_p) is its labeling, y · L(x_p) = L(x_p) · y = y.

Proof: Note that L(x_t) can be expressed as Σ_i λ_i P_i where the P_i are permutation matrices and the λ_i ∈ R^+ satisfy Σ_i λ_i = 1. This implies that for each i, P_i · y = y. That means that for all i, y · P_i = y, which in turn implies that y · L(x_t) = y. Now, we can simply use Claim 5.12.

We now list some more useful facts about matrices which shall be helpful in the course of our proof.

Fact 5.14 Let A, B ∈ C^{n×n} and let v_1, ..., v_n be an orthonormal basis of C^n. Then, the following are true:

• Tr[AB] = Tr[BA] (cyclic property of the trace).
• ||A||_F^2 = Σ_{i=1}^n λ_i^2, where the λ_i are the singular values of A.
• ||A||_F^2 = Σ_{i=1}^n v_i^† · A^† · A · v_i.
• ||A · B||_F ≤ ||A||_2 · ||B||_F and ||A · B||_F ≤ ||A||_F · ||B||_2.

5.2 Proofs of Claim 5.5 and Claim 5.6

In this subsection, we prove Claims 5.5 and 5.6.

Claim 5.5 (restated) Let α″ ≥ 1/10. Also, let x_t be a node in the true tree and x_p be the corresponding node in the pseudo tree. Also, assume that y_t and z_t are the children of x_t and y_p and z_p are the children of x_p. Further, let the non-trivial subspace of x_t, y_t and z_t be W, with dimension β. Let ||L(y_t)||_W = α, ||L(z_t)||_W = α′ and ||L(x_t)||_W = α″. Then, provided

||L(y_p) − L(y_t)||_W ≤ λ · w^{w^5} w^{w^5 β} log(1/α) + λ w^{w^5 β} 2^β w^{6w}

and

||L(z_p) − L(z_t)||_W ≤ λ · w^{w^5} w^{w^5 β} log(1/α′) + λ w^{w^5 β} 2^β w^{6w},

we have ||L(x_p) − L(x_t)||_W ≤ λ · w^{w^5} w^{w^5 β} log(1/α″). If L(z_p) and L(y_p) act trivially on W^⊥, so does L(x_p).

Proof: The fact that if L(z_p) and L(y_p) act trivially on W^⊥ then so does L(x_p) follows trivially from Lemma 2.4. We let L(y_p) = L(y_t) + E_y and L(z_p) = L(z_t) + E_z, and observe that by definition ||E_y||_{W^⊥} = ||E_z||_{W^⊥} = 0. By Lemma 2.4, since L(x_p) = L(y_p) ·_H L(z_p), L(x_p) restricted to W^⊥ is the identity and hence ||L(x_p) − L(x_t)||_{W^⊥} = 0.
Now, for E_x = L(x_t) − L(x_p),

||E_x||_W ≤ ||E_y||_W · ||L(z_t)||_W + ||E_z||_W · ||L(y_t)||_W + λ    (6)

Note that ||L(x_t)||_W ≤ ||L(y_t)||_W · ||L(z_t)||_W. Also, let us put ||L(y_t)||_W = γ_1 and ||L(z_t)||_W = γ_2. Hence, we get that

||L(x_p) − L(x_t)||_W ≤ ||E_y||_W · ||L(z_t)||_W + ||E_z||_W · ||L(y_t)||_W + λ
≤ (λ · w^{w^5} w^{w^5 β} (log(1/γ_1) + log(1/γ_2)) + 2λ w^{w^5 β} 2^β w^{6w}) (1 − w^{−3w}) + λ
≤ λ · w^{w^5} w^{w^5 β} log(1/(γ_1 γ_2))

The above uses the fact that γ_1, γ_2 ≤ 1 − w^{−3w}.

Claim 5.6 (restated) Let x_t be a node in the true tree and x_p be the corresponding node in the pseudo tree. Also, assume that y_t and z_t are the children of x_t and y_p and z_p are the children of x_p. Further, let the non-trivial subspace of x_t, y_t and z_t be W, with dim(W) = β. Let ||L(x_t)||_W < 1/10. Assume that for j ∈ {y, z}, the following holds:

||L(j_p) − L(j_t)||_W ≤ λ · w^{w^5} w^{w^5 β} log(1/||L(j_t)||_W) + λ w^{w^5 β} 2^β w^{6w} if ||L(j_t)||_W ≥ 1/10, and
||L(j_p) − L(j_t)||_W ≤ λ · w^{w^5} w^{w^5 β} w^{6w} + λ w^{w^5 β} 2^β w^{6w} otherwise.    (7)

Then, ||L(x_p) − L(x_t)||_W ≤ λ · w^{w^5} w^{w^5 β} w^{6w}. If L(z_p) and L(y_p) act trivially on W^⊥, so does L(x_p).

Proof: The fact that if L(z_p) and L(y_p) act trivially on W^⊥ then so does L(x_p) follows trivially from Lemma 2.4. We begin by observing that if for j ∈ {y, z} we have ||L(j_t)||_W ≥ 1/10, then

||L(j_p) − L(j_t)||_W ≤ 5λ · w^{w^5} w^{w^5 β}

As was the case with abelian groups, we divide the analysis into three cases. We also let L(y_p) = L(y_t) + E_y and L(z_p) = L(z_t) + E_z, and observe that by definition ||E_y||_{W^⊥} = ||E_z||_{W^⊥} = 0.

• Both ||L(y_t)||_W, ||L(z_t)||_W ≥ 1/10.
• Exactly one of ||L(y_t)||_W and ||L(z_t)||_W is at least 1/10; we assume without loss of generality that it is ||L(z_t)||_W.
• Both ||L(y_t)||_W, ||L(z_t)||_W < 1/10.

We note that the first case was handled by Claim 5.5. Now, we come to the second case. We recall that ||L(z_t)||_W ≥ 1/10.
In this case,

||L(x_p) − L(x_t)||_W ≤ ||E_y||_W · ||L(z_t)||_W + ||E_z||_W · ||L(y_t)||_W + λ
≤ (1 − w^{−3w}) (λ · w^{w^5} w^{w^5 β} w^{6w} + λ w^{w^5 β} 2^β w^{6w}) + (5λ · w^{w^5} w^{w^5 β})/10 + λ
≤ λ · w^{w^5} w^{w^5 β} w^{6w}

Next, we deal with the third case:

||L(x_p) − L(x_t)||_W ≤ ||E_y||_W · ||L(z_t)||_W + ||E_z||_W · ||L(y_t)||_W + λ
≤ (1/10)(λ · w^{w^5} w^{w^5 β} w^{6w} + λ w^{w^5 β} 2^β w^{6w}) + (1/10)(λ · w^{w^5} w^{w^5 β} w^{6w} + λ w^{w^5 β} 2^β w^{6w}) + λ
≤ λ · w^{w^5} w^{w^5 β} w^{6w}

5.3 Proof of Claim 5.7

Claim 5.7 (restated) Let x_t be a node in the true tree and x_p be the corresponding node in the pseudo tree. Also, assume that y_t and z_t are the children of x_t and y_p and z_p are the children of x_p. Further, let W_x, W_y and W_z be the non-trivial subspaces of L(x_t), L(y_t) and L(z_t), with dim(W_y) = β_1, dim(W_z) = β_2 and dim(W_x) = β. Also, W_y ≠ W_y ∩ W_z ≠ W_z. Then,

• β > β_1 and β > β_2.

Also, if ||L(y_p) − L(y_t)||_{W_y} ≤ λ · w^{w^5} w^{w^5 β_1} w^{6w} + λ w^{w^5 β_1} 2^{β_1} w^{6w} and ||L(z_p) − L(z_t)||_{W_z} ≤ λ · w^{w^5} w^{w^5 β_2} w^{6w} + λ w^{w^5 β_2} 2^{β_2} w^{6w}, then

||L(x_p) − L(x_t)||_{W_x} ≤ λ · w^{w^5} w^{w^5 β} log(1/||L(x_t)||_{W_x}) if ||L(x_t)||_{W_x} ≥ 1/10, and
||L(x_p) − L(x_t)||_{W_x} ≤ λ · w^{w^5} w^{w^5 β} w^{6w} otherwise.    (8)

If L(y_p) acts trivially on W_y^⊥ and L(z_p) acts trivially on W_z^⊥, then L(x_p) acts trivially on W_x^⊥.

Proof: If L(y_p) acts trivially on W_y^⊥, then it acts trivially on W_x^⊥ as well. Similarly, if L(z_p) acts trivially on W_z^⊥, then it does so on W_x^⊥ as well. Now, by Lemma 2.4, L(x_p) acts trivially on W_x^⊥ as well. Note that because L(x_t) acts non-trivially on W_x, we get that ||L(x_t)||_{W_x} ≤ 1 − w^{−3w}. Now,

||L(x_p) − L(x_t)|| ≤ ||L(y_p) − L(y_t)|| + ||L(z_p) − L(z_t)|| + λ
≤ λ · w^{w^5} w^{w^5 β_1} w^{6w} + λ w^{w^5 β_1} 2^{β_1} w^{6w} + λ · w^{w^5} w^{w^5 β_2} w^{6w} + λ w^{w^5 β_2} 2^{β_2} w^{6w} + λ
≤ λ · w^{w^5} w^{w^5 β} log(1/(1 − w^{−3w}))

which, together with ||L(x_t)||_{W_x} ≤ 1 − w^{−3w}, gives (8).

5.4 Convergence lemma for chains

In this subsection, we prove Claim 5.8. We prove the claim in two parts. In order to understand the two parts, let us recall the situation we are in. There are nodes x_0, x_1, ..., x_m and y_1, ..., y_m in the true tree and the corresponding nodes x′_0, x′_1, ..., x′_m and y′_1, ..., y′_m in the pseudo tree.
Here, x_i is the parent of x_{i−1} and y_i, and likewise x'_i is the parent of x'_{i−1} and y'_i. Note that this means that the label L(x_i) is the product of the labels L(x_{i−1}) and L(y_i). In the pseudo tree, L(x'_i) is the expander product of the labels L(x'_{i−1}) and L(y'_i). In order to bound the difference between L(x_t) and L(x'_t), we define an intermediate process with nodes x''_i such that L(x''_i) is the product of L(x''_{i−1}) and L(y'_i); also, L(x''_0) is simply L(x'_0). We first make the following claim.

Claim 5.15 Consider the nodes x_t and x''_t as defined above. Let W be the non-trivial subspace of L(x_t) and let dim(W) = β. Then ‖L(x_t) − L(x''_t)‖_W ≤ λ·2^{β−1}·w^{w^5·β}·w^{6w}.

Proof: We note that L(x''_t) is the product of L(x''_0) and the L(y'_i)'s in some order, and L(x_t) is the product of L(x_0) and the L(y_i)'s in the same order. Here we will take advantage of the fact that matrix multiplication is associative (as opposed to the expander product, which is not necessarily associative). Let us construct two trees, called the rearranged tree and the intermediate tree, for which we first specify only the leaves. The leaves of the rearranged tree are the L(y_i)'s and L(x_0), arranged in the correct permutation (i.e., the order in which they need to be multiplied). The leaves of the intermediate tree are arranged in the same order, except that the corresponding leaves are labeled by the L(y'_i)'s instead of the L(y_i)'s; also, in place of L(x_0), we have L(x''_0). We now describe the non-leaf nodes of the trees (note that these will not be balanced binary trees in general). First, a definition: as we construct the trees, any node x in the rearranged tree has an obvious label, namely the product of the labels of its children; similarly, the label of any node x'' in the intermediate tree is the product of the labels of its children.
We call a node x in the rearranged tree "good" if, for the corresponding node x'' in the intermediate tree, the following is true:

‖L(x) − L(x'')‖_{W_x} ≤ λ·w^{w^5}·w^{w^5·β}·log(1/‖L(x)‖_{W_x})  if ‖L(x)‖_{W_x} ≥ 1/10,
‖L(x) − L(x'')‖_{W_x} ≤ λ·w^{w^5}·w^{w^5·β}·w^{6w}  otherwise.  (9)

Here W_x is the non-trivial subspace of L(x) and β = dim(W_x). Now, it does not matter how we construct the trees: if we can show that for the root z of the rearranged tree and the root z'' of the intermediate tree we have ‖L(z) − L(z'')‖_{W_z} ≤ λ·2^β·w^{w^5·β}·w^{6w}, then we are done. This implies that ‖L(x_t) − L(x''_t)‖_W ≤ λ·2^β·w^{w^5·β}·w^{6w}, where W is the non-trivial subspace of L(z), which is the same as that of L(x_t).

We start by observing that in the beginning every node of the rearranged tree is good. Now, assume we can find two adjacent nodes in the rearranged tree (call them a and b) such that one of the following is true:

• The non-trivial subspaces of a and b are identical.
• The non-trivial subspaces of L(a) and L(b) are such that neither is contained in the other.

In case either of these happens, we construct a node c in the rearranged tree and make a and b its children; similarly, we construct c'' in the intermediate tree and make a'' and b'' its children. Using Claims 5.5, 5.6 and 5.7, it is easy to check that since a and b were "good", c remains good. Let us call the nodes which have no parent active nodes; this process reduces the number of active nodes by 1. We can continue this process until we reach a situation where any two adjacent active nodes (adjacency among active nodes is defined in the obvious way, i.e., by the permutation in which the matrices are to be multiplied) are such that the non-trivial subspace of one of them is strictly contained in the non-trivial subspace of the other.
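In toy form, this merging pass can be modelled by representing each active node's non-trivial subspace as a set of basis directions and repeatedly merging adjacent nodes whose sets are equal or incomparable. The following sketch is ours (the set model, the name merge_pass, and the use of union to model the product's subspace are all illustrative assumptions, not the paper's construction):

```python
def merge_pass(spaces):
    # spaces: list of frozensets modelling the non-trivial subspaces of the
    # active nodes, in the order in which the matrices are to be multiplied.
    # Merging two nodes is modelled by taking the union of their sets, since
    # the product's non-trivial subspace contains those of both children.
    i = 0
    while i + 1 < len(spaces):
        a, b = spaces[i], spaces[i + 1]
        if a == b or not (a < b or b < a):   # identical or incomparable
            spaces[i : i + 2] = [a | b]      # merge the adjacent active nodes
            i = max(i - 1, 0)                # a merge may enable earlier merges
        else:
            i += 1
    return spaces  # now every adjacent pair is strictly nested

out = merge_pass([frozenset({1, 2}), frozenset({2, 3}), frozenset({1, 2, 3, 4})])
assert out == [frozenset({1, 2, 3}), frozenset({1, 2, 3, 4})]
```

The loop terminates exactly in the "stuck" situation described above: every pair of adjacent active nodes has one subspace strictly contained in the other.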
Let a be an active node whose non-trivial subspace has the smallest dimension, say γ (there may be many such nodes). We find all such active nodes; clearly, no two of them are adjacent. Now, for every such node, we pair it with the node to its left (this choice is arbitrary). For any such pair (a, b), we create a node c in the rearranged tree and c'' in the intermediate tree such that L(c) = L(a)·L(b) and L(c'') = L(a'')·L(b''). Two things happen after this process:

• The non-trivial subspace of every active node now has dimension at least γ + 1.
• While the active nodes may no longer all be good, they are all λ·w^{w^5}·w^{w^5·γ}·w^{6w}-close to being good.

We remark that by saying a node x is δ-far from being good, we simply mean that ‖L(x) − L(x'')‖_W is at most δ more than what it would have been had x been a good node. We now make the following inductive claim.

Claim 5.16 At any point, if the active node with the smallest-dimensional non-trivial subspace has dimension γ + 1, then every active node is at most λ·2^γ·w^{w^5}·w^{w^5·γ}·w^{6w} far from being good.

Clearly, this is true at the beginning. If we find two adjacent active nodes a and b such that

• their non-trivial subspaces are identical, or
• neither of their non-trivial subspaces is contained in the other,

then we create an active node c with children a and b. We notice that Claims 5.5, 5.6 and 5.7 imply that c is good even though a and b are only λ·2^γ·w^{w^5}·w^{w^5·γ}·w^{6w}-close to being good (note that we are using that the dimension of every active node's non-trivial subspace is at least γ + 1). In case we are again stuck, we simply observe that we can again pair all active nodes of dimension γ + 1 with active nodes of dimension at least γ + 2. Note that after this, all remaining nodes are λ·2^γ·w^{w^5}·w^{w^5·γ}·w^{6w} + λ·w^{w^5}·w^{w^5·(γ+1)}·w^{6w} far from being good.
However,

λ·2^γ·w^{w^5}·w^{w^5·γ}·w^{6w} + λ·w^{w^5}·w^{w^5·(γ+1)}·w^{6w} ≤ λ·2^{γ+1}·w^{w^5}·w^{w^5·(γ+1)}·w^{6w}.

This concludes our proof.

At this point, having proven that ‖L(x_t) − L(x''_t)‖_W is small, i.e., bounded by a constant depending only on the dimension of the non-trivial subspace, it is very easy to get a seed of length O(log n·(log log n + log(1/ε))). This is because we need to bound ‖L(x'_t) − L(x''_t)‖_W, and an obvious bound on this quantity is λ·t. We formalize the claim below.

Claim 5.17 Let us have nodes x'_0, y'_1, ..., y'_t, with labelings L(x'_0), L(y'_1), ..., L(y'_t). Let us define nodes x'_1, ..., x'_t and x''_1, ..., x''_t, with labellings L(x'_i) = L(x'_{i−1}) ·_H L(y'_i) and L(x''_i) = L(x''_{i−1}) · L(y'_i). Then ‖L(x''_t) − L(x'_t)‖ ≤ tλ.

Proof: Follows by a simple hybrid argument.

Now, as the length of the chain is bounded by O(log n), we get a bound of O(λ·log n). This fact can be used to get an overall error of O(λ·c(w)·log n), where c(w) is a constant depending solely on w. This implies one can choose λ = ε/(c(w)·log n), which gives a seed length of O(log n·(log log n + log(1/ε))) for constant-width branching programs. This reproduces the results of [BRRY10, BV10]. However, we want an upper bound which is a constant rather than something dependent on n. The following subsection achieves this.

5.4.1 Showing that ‖L(x'_t) − L(x''_t)‖_W is small

Now, to prove Claim 5.8, what remains to be shown is that ‖L(x'_t) − L(x''_t)‖_W is bounded by λ·2^β·w^{w^5·β}·w^{6w}, where W is the non-trivial subspace of the labeling L(x_t) and dim(W) = β. For this part of the proof, we need a slightly different notion of norm, which we define next.

Definition 5.18 Consider a matrix M ∈ C^{w×w}. Let V be a subspace of C^{[w]×[w]}. Then define ‖M‖_{F,V} as the length of the projection of M on V. Note that when V = C^{[w]×[w]}, ‖M‖_{F,V} is simply the Frobenius norm of M.

We now introduce a bit more notation.
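The norm of Definition 5.18 is simply the Euclidean length of the projection of vec(M) onto the subspace V of the w²-dimensional space of matrices. A small numerical sketch (the helper name norm_F_V is ours, and we work over the reals for simplicity):

```python
import numpy as np

rng = np.random.default_rng(2)
w = 3

def norm_F_V(M, Q):
    # Length of the projection of vec(M) onto col(Q); Q has orthonormal
    # columns spanning the subspace V of the w*w-dimensional matrix space.
    return np.linalg.norm(Q.T @ M.flatten())

# V: span of a few fixed matrices, orthonormalized via QR on their vectorizations.
basis_mats = [rng.standard_normal((w, w)) for _ in range(4)]
Q, _ = np.linalg.qr(np.column_stack([B.flatten() for B in basis_mats]))

M = rng.standard_normal((w, w))
# A projection is never longer than the vector itself ...
assert norm_F_V(M, Q) <= np.linalg.norm(M, 'fro') + 1e-12
# ... and when V is the whole matrix space, the norm is exactly the Frobenius norm.
Q_full, _ = np.linalg.qr(rng.standard_normal((w * w, w * w)))
assert np.isclose(norm_F_V(M, Q_full), np.linalg.norm(M, 'fro'))
```

The block norms ‖·‖_{F,V₁,V₂} used below are the special case where V is spanned by the matrix units with rows indexed by a basis of V₁ and columns by a basis of V₂.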
Let V, V', W be subspaces such that W ⊆ V and V' is the orthogonal complement of W with respect to V. Let dim(V) = n and let A ∈ C^{n×n} be a matrix. Note that we can choose an orthonormal basis for V such that the first dim(V') elements form an orthonormal basis for V' and the next dim(W) elements form an orthonormal basis for W. Hence, we can use the elements of this basis to label the rows and columns of A in the natural way. Now, we let A_{V',V'} denote the block of A in which both the rows and the columns come from the indices labelled by the orthonormal basis of V', and likewise for A_{W,W}. We also let A_{V',W} denote the block in which the rows come from the indices labelled by the basis of V' and the columns from the indices labelled by the basis of W. With the notation in place, we make the following important claim.

Claim 5.19 Let ρ_1, ρ_2 : {0,1}^n → C^{w×w}. Also, let the non-trivial subspace of all matrices in the range of ρ_1 be V and the non-trivial subspace of all matrices in the range of ρ_2 be W, with W ⊆ V. Let Γ_1 : {0,1}^t → {0,1}^n and Γ_2 : {0,1}^t → {0,1}^n. Now, consider the two averages

A = 2^{−t}·Σ_{x∈{0,1}^t} ρ_1(Γ_1(x)),  B = 2^{−t}·Σ_{x∈{0,1}^t} ρ_2(Γ_2(x)).

Let V = W ⊕ V', i.e., V' is the orthogonal complement of W when the ambient space is V. Consider the matrix C = A·B − A·_H B. Then C_{V',V'} = 0 and C_{W,V'} = 0.

Proof: To see this, note that for every z, ρ_2(z) is of the form

ρ_2(z) = [ I 0 ; 0 A_z ],

where the first block of rows corresponds to V' and the second block of rows corresponds to W. Now, for any matrix M, let M_{V,W} ∈ C^{dim(V)×dim(W)} denote the block of M with rows from V and columns from W (in particular, this makes sense whenever each subscript is one of V, W or V'). Hence, a matrix M can be written in block form as

M = [ M_{V',V'} M_{V',W} ; M_{W,V'} M_{W,W} ].

In particular, the product of any matrix M with ρ_2(z) is

M·ρ_2(z) = [ M_{V',V'} M_{V',W}·A_z ; M_{W,V'} M_{W,W}·A_z ].

Now, it is immediate that (A·B)_{V',V'} = A_{V',V'} and (A·B)_{W,V'} = A_{W,V'}.
However, using that the graph H is regular, we also get that (A·_H B)_{V',V'} = A_{V',V'} and (A·_H B)_{W,V'} = A_{W,V'}. This implies that C_{W,V'} = 0 and C_{V',V'} = 0.

Analogous to the above claim, we also have the following claim.

Claim 5.20 Let ρ_1, ρ_2 : {0,1}^n → C^{w×w}. Also, let the non-trivial subspace of all matrices in the range of ρ_1 be V and the non-trivial subspace of all matrices in the range of ρ_2 be W, with W ⊆ V. Let Γ_1 : {0,1}^t → {0,1}^n and Γ_2 : {0,1}^t → {0,1}^n. Now, consider the two averages

A = 2^{−t}·Σ_{x∈{0,1}^t} ρ_1(Γ_1(x)),  B = 2^{−t}·Σ_{x∈{0,1}^t} ρ_2(Γ_2(x)).

Let V = W ⊕ V', i.e., V' is the orthogonal complement of W when the ambient space is V. Consider the matrix C = B·A − B·_H A. Then C_{V',V'} = 0 and C_{V',W} = 0.

As we have said before, we will be dealing with subspaces of matrices. If M ∈ C^{w×w}, note that we can view M as an element of C^{w^2}. Also, observe that if C^w = V_1 ⊕ V_2 where V_1 and V_2 are orthogonal, then M_{V_1,V_2} can be viewed as the projection of M along the subspace with rows indexed by V_1 and columns indexed by V_2. We also use ‖M‖_{F,V_1,V_2} to denote the length of this projection. Next, we prove the following claim.

Claim 5.21 Let A : W → W be an operator whose largest singular value is bounded by 1 − ε, and consider the operator B = A ⊕ Id_{W^⊥}. Then, for any X,

‖X·B‖_{F,W,W} ≤ (1 − ε)·‖X‖_{F,W,W},  (X·B)_{W,W^⊥} = X_{W,W^⊥},
‖X·B‖_{F,W^⊥,W} ≤ (1 − ε)·‖X‖_{F,W^⊥,W},  (X·B)_{W^⊥,W^⊥} = X_{W^⊥,W^⊥}.

Proof: Let us write X, B and X·B in block form, with the first block corresponding to W^⊥ and the second to W:

X = [ X_1 X_2 ; X_3 X_4 ],  B = [ I 0 ; 0 A ],  X·B = [ X_1 X_2·A ; X_3 X_4·A ].

From this, the two equalities in the claim are obvious. Also, (X·B)_{W,W} = X_{W,W}·A; using the bound on the singular values of A, we get ‖X·B‖_{F,W,W} ≤ (1 − ε)·‖X‖_{F,W,W}, and likewise the other inequality.

The next claim is analogous to the last one.

Claim 5.22 Let A : W → W be an operator whose largest singular value is bounded by 1 − ε, and consider the operator B = A ⊕ Id_{W^⊥}.
Then, for any X,

‖B·X‖_{F,W,W} ≤ (1 − ε)·‖X‖_{F,W,W},  (B·X)_{W^⊥,W} = X_{W^⊥,W},
‖B·X‖_{F,W,W^⊥} ≤ (1 − ε)·‖X‖_{F,W,W^⊥},  (B·X)_{W^⊥,W^⊥} = X_{W^⊥,W^⊥}.

Now, as we have said, we will treat a matrix X ∈ C^{w×w} as an element of C^{w^2}. Further, suppose B ∈ C^{w×w} is another matrix. Then we have a linear transformation B_r (defined by the matrix B) given by B_r : C^{w×w} → C^{w×w}, B_r : X ↦ X·B. Likewise, we can define B_ℓ : X ↦ B·X. For a map B_ℓ or B_r as defined above, we can define an invariant subspace, namely a subspace which B_ℓ (respectively B_r) maps to itself; the space orthogonal to the invariant subspace will be called the non-trivial subspace. In particular, for the kind of B defined in Claims 5.21 and 5.22, on the space orthogonal to their invariant subspace, B_ℓ and B_r are "contractive", i.e., they shrink every vector by a factor of 1 − ε. For a map A as above, we use Inv(A) to denote its invariant space and Inv^⊥(A) to denote the space orthogonal to Inv(A). We call such maps "label maps". Note that for any node x_t in the true tree, L(x_t) defines such maps, namely L(x_t)_ℓ : X ↦ L(x_t)·X and L(x_t)_r : X ↦ X·L(x_t). The maps of interest to us (the reason will become clear shortly) are defined as follows: let A_{ℓ,i} : X ↦ L_i·X and A_{r,i} : X ↦ X·L_i, where L_i is some label in the true tree. Now, consider the map A_n : X ↦ A_{ℓ,n} ∘ A_{r,n} ∘ ... ∘ A_{ℓ,1} ∘ A_{r,1}(X). We first note that by associativity of matrix multiplication, this map is the same as X ↦ (L_n···L_1)·X·(L_1···L_n). Note that the map A_n is simply a composition of label maps. However, it also satisfies the following properties (given in the next claim).

Claim 5.23 Let the map A_n be defined as above, let L be a label in the true tree, and define A'_n : X ↦ L·(A_n(X)) (here A_n(X) denotes the output of the map A_n applied to X). Let W = Inv(A_n) and W' = Inv(A'_n), with W' ⊊ W.
Let W^⊥ and W'^⊥ denote the orthogonal complements of W and W'. If Y is such that the length of its projection along W'^⊥ is α and the length of its projection along W^⊥ is δ, then the length of its projection along the space orthogonal to Inv(L) is at least w^{−6w^2}·√(α² − δ²) − δ.

Proof: We start by showing that for any v ∈ W'^⊥, ‖A'_n(v)‖_2 ≤ (1 − w^{−6w^2})·‖v‖_2. To see this, note that the map A'_n can be realized in the following way. First, write X as an element of C^{w^2} in row-major order and multiply it by the matrix realizing right multiplication by (L_1···L_n); this has the same effect as multiplying X (as a matrix) by (L_1···L_n) on the right. Then, permute the entries to change from row-major to column-major order. Subsequently, multiply the resulting vector by the matrix realizing left multiplication by (L_n···L_1). This gives A_n(X); multiplying once more on the left by L achieves A'_n(X). Note that we can realize all these steps by a permutation branching program of size w². In particular, we can apply Lemma 5.11 to conclude that ‖A'_n(v)‖_2 ≤ (1 − w^{−6w^2})·‖v‖_2. Now, this holds even for a v which lies in W, and hence in W ∩ W'^⊥; for such a v, A_n(v) = v. This means that such a v has a projection of length at least w^{−6w^2}·‖v‖_2 on the space orthogonal to Inv(L). Finally, Y has a projection of length at least √(α² − δ²) on W ∩ W'^⊥, from which we derive the result.

We now present the main convergence lemma of this section. We recall the setting: in the pseudo tree, we have nodes x'_0, x'_1, ..., x'_m and y'_1, ..., y'_m, with labelings as follows. For i > 0, there are two possible cases: either L(x'_i) = L(x'_{i−1}) ·_H L(y'_i) (corresponding to right multiplication) or L(x'_i) = L(y'_i) ·_H L(x'_{i−1}) (corresponding to left multiplication). Correspondingly, we also define the intermediate process where L(x''_i) = L(x''_{i−1})·L(y'_i) or L(x''_i) = L(y'_i)·L(x''_{i−1}), with L(x''_0) = L(x'_0).
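The row-major realization used in the proof of Claim 5.23 is an instance of the standard vec/Kronecker identity: in row-major order, vec(L·X·R) = (L ⊗ Rᵀ)·vec(X), so the label maps X ↦ L·X and X ↦ X·R act on C^{w²} as the matrices L ⊗ I and I ⊗ Rᵀ. A quick numerical check (not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
w = 4
L, R, X = (rng.standard_normal((w, w)) for _ in range(3))

# Row-major vectorization: vec(X) = X.flatten(order='C').
lhs = (L @ X @ R).flatten()
rhs = np.kron(L, R.T) @ X.flatten()
assert np.allclose(lhs, rhs)

# Special cases: left multiplication is L ⊗ I, right multiplication is I ⊗ R^T.
assert np.allclose((L @ X).flatten(), np.kron(L, np.eye(w)) @ X.flatten())
assert np.allclose((X @ R).flatten(), np.kron(np.eye(w), R.T) @ X.flatten())
```

The permutation between row-major and column-major order in the proof is exactly the change of basis that swaps the two Kronecker factors.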
We will also define label maps A_i, A''_i : C^{w×w} → C^{w×w} in the following way: for the case of left multiplication, A_i = A_{ℓ,i} ∘ A_{i−1} and A''_i = A''_{ℓ,i} ∘ A''_{i−1}; likewise, for the case of right multiplication, A_i = A_{i−1} ∘ A_{r,i} and A''_i = A''_{i−1} ∘ A''_{r,i}. The proof of the next lemma essentially follows the proof of the "Key Convergence Lemma" from [KNP10], by induction on the dimension of the non-trivial subspace of these label maps.

Lemma 5.24 Let x'_0, x'_1, ..., x'_m and y'_1, ..., y'_m be nodes in the pseudo tree as defined earlier, and let x''_0, x''_1, ..., x''_m be the nodes of the intermediate process as defined earlier. Let the non-trivial subspace of A'_i (for all i) be strictly contained in the non-trivial subspace of A'_0. Let the constants h_i and d_i be defined recursively as follows:

d_0 = w^{−9w},  h_0 = λ,  d_m = w^{−20w}·d_{m−1}/600,  h_m = 1200·w^{10w}·h_{m−1}/d_{m−1}.

Let the non-trivial subspace of ∏_{i=1}^{ℓ} A'_i be V_1 and let dim(V_1) = β_1. Then

‖L(x''_ℓ) − L(x'_ℓ)‖_{V_1} ≤ max{h_{β_1}, ‖L(x''_0) − L(x'_0)‖_{F,V_1}·(1 − d_{β_1})}.

Proof: We prove this by induction on the dimension of V_1. First of all, if dim(V_1) = 0, there is nothing to prove. So assume dim(V_1) = m ≥ 1. Let V be the non-trivial subspace of ∏_{i=1}^{ℓ−1} A'_i. We first prove the assertion for the special case V ⊊ V_1. Assume that A'_ℓ = A'_{ℓ−1}·A'_{r,ℓ} (the case of left multiplication is exactly the same), and let the non-trivial subspace of A'_{r,ℓ} be V''. Having fixed the notation, define γ_1 = ‖A'_0 − A''_0‖_{F,V_1} and α = w^{−10w}·γ_1.

We first consider the case α ≥ 2h_{m−1}/d_{m−1}. We claim that in this case ‖A'_ℓ − A''_ℓ‖_{F,V_1} ≤ (1 − d_m)·‖A'_0 − A''_0‖_{F,V_1}. To analyze this, we further divide into subcases.

The first subcase is ‖A'_0 − A''_0‖_{F,V} ≥ α. Note that in this case α ≥ 2h_{m−1}, which means that ‖A'_0 − A''_0‖_{F,V} ≥ 2h_{m−1}. By the induction hypothesis, we can say that

‖A'_{ℓ−1} − A''_{ℓ−1}‖_{F,V} ≤ (1 − d_{m−1})·‖A'_0 − A''_0‖_{F,V}.

Note that by Claim 5.21, we can also say that ‖A'_{ℓ−1} − A''_{ℓ−1}‖_{F,V^⊥} = ‖A'_0 − A''_0‖_{F,V^⊥}, where V^⊥ denotes the orthogonal complement of V when the ambient space is V_1. This implies that

‖A'_{ℓ−1} − A''_{ℓ−1}‖²_{F,V_1} ≤ ‖A'_0 − A''_0‖²_{F,V_1} − (2d_{m−1} − d²_{m−1})·‖A'_0 − A''_0‖²_{F,V},

which in turn implies that

‖A'_{ℓ−1} − A''_{ℓ−1}‖_{F,V_1} ≤ ‖A'_0 − A''_0‖_{F,V_1}·(1 − w^{−10w}·d_{m−1}).

Further, from here we get that

‖A'_ℓ − A''_ℓ‖_{F,V_1} ≤ ‖A'_0 − A''_0‖_{F,V_1}·(1 − w^{−10w}·d_{m−1}) + λ ≤ ‖A'_0 − A''_0‖_{F,V_1}·(1 − w^{−20w}·d_{m−1}) ≤ ‖A'_0 − A''_0‖_{F,V_1}·(1 − d_m).

The second inequality uses the fact that (w^{−10w}·d_{m−1}·γ_1)/2 = α·d_{m−1}/2 ≥ h_{m−1} ≥ λ.

The second subcase is ‖A'_0 − A''_0‖_{F,V} < α. This implies the following two things. By the induction hypothesis,

‖A'_{ℓ−1} − A''_{ℓ−1}‖_{F,V} < α + h_{m−1}.

Also, by Claim 5.23, we get that

‖A'_0 − A''_0‖_{F,V''} ≥ w^{−4w}·√(γ_1² − α²),

and hence

‖A'_{ℓ−1} − A''_{ℓ−1}‖_{F,V''} ≥ w^{−4w}·√(γ_1² − α²) − α − h_{m−1}.

Now, using Claim 5.21, the component along V'' contracts by a factor (1 − w^{−3w}) in the last step, and we get

‖A'_ℓ − A''_ℓ‖_{F,V_1} ≤ γ_1 + h_{m−1} − w^{−3w}·(w^{−4w}·√(γ_1² − α²) − α − h_{m−1})
 ≤ γ_1·(1 − w^{−7w}) + h_{m−1}·(1 + w^{−3w}) + w^{−3w}·α.

Plugging in the values, the above expression is bounded by γ_1·(1 − d_m).

We now consider the case α ≤ 2h_{m−1}/d_{m−1}. In this case, by the induction hypothesis, ‖A'_{ℓ−1} − A''_{ℓ−1}‖_{F,V} < α + h_{m−1}. From the fact that γ_1 = w^{10w}·α ≤ 2w^{10w}·h_{m−1}/d_{m−1}, we get

‖A'_{ℓ−1} − A''_{ℓ−1}‖_{F,V_1} < 3w^{10w}·h_{m−1}/d_{m−1},

and from this,

‖A'_ℓ − A''_ℓ‖_{F,V_1} < 3w^{10w}·h_{m−1}/d_{m−1} + λ ≤ 4w^{10w}·h_{m−1}/d_{m−1}.

So, we see that one of two things happens:

• If γ_1 ≥ 2w^{10w}·h_{m−1}/d_{m−1}, then ‖A'_ℓ − A''_ℓ‖_{F,V_1} ≤ (1 − d_m)·γ_1 = (1 − d_m)·‖A'_0 − A''_0‖_{F,V_1}.
• If γ_1 ≤ 2w^{10w}·h_{m−1}/d_{m−1}, then ‖A'_ℓ − A''_ℓ‖_{F,V_1} ≤ ‖A'_0 − A''_0‖_{F,V_1} + 4w^{10w}·h_{m−1}/d_{m−1}.

We had assumed that the non-trivial subspace of ∏_{i=1}^{ℓ−1} A'_{r,i} (we are assuming right multiplications here, but left multiplications are handled in the same way) is strictly contained in that of ∏_{i=1}^{ℓ} A'_{r,i}.
This need not be true in general, but we can do the following. If the non-trivial subspace of ∏_{i=1}^{ℓ} A'_{r,i} is W, then we break the sequence into blocks, i.e., we break [ℓ] into contiguous subsets {1, ..., s_1}, {s_1+1, ..., s_2}, ..., {s_k+1, ..., s_{k+1} = ℓ}. This block structure has the property that, with the possible exception of the last block, the non-trivial subspace of ∏_{i=s_j+1}^{s_{j+1}} A'_{r,i} is W, but the non-trivial subspace of ∏_{i=s_j+1}^{s_{j+1}−1} A'_{r,i} is strictly contained in W.

Now consider any block {a, ..., b}. Over this block, either the error ‖A'_0 − A''_0‖_{F,V_1} goes down by a factor (1 − d_m), or, in case it is less than 2w^{10w}·h_{m−1}/d_{m−1}, it can increase by at most a constant, namely 4w^{10w}·h_{m−1}/d_{m−1}. Therefore, if the penultimate block ends at j, then ‖A'_j − A''_j‖_{F,V_1} ≤ 5w^{10w}·h_{m−1}/d_{m−1}. The last block can now be dealt with recursively, since the dimension of the non-trivial subspace of the labels has gone down by at least 1.

We now notice that h_m ≤ 10^w·w^{w^3}·λ (we use that m ≤ w²). Putting everything together, we get that ‖A'_t − A''_t‖ − ‖A'_0 − A''_0‖ ≤ 10^w·w^{w^3}·λ, which proves our claim.

6 Pseudorandomness for regular branching programs

In this section, we show that constant-width regular branching programs can be fooled with a seed of length O(log n·(log log n + log(1/ε))). As before, the PRG we use is the INW pseudorandom generator. Our analysis has some similarities with the analysis for permutation branching programs but differs significantly in parts. We now state the formal theorem that we will prove.

Theorem 6.1 Let Γ : {0,1}^t → {0,1}^n be the INW generator with λ = ε·2^{−7w^3}·log^{−3w} n. Then the output of Γ(U_t) is ε-indistinguishable from U_n for read-once width-w regular branching programs of length n.

Using the standard INW generator, we get the following corollary.
Corollary 6.2 There is a polynomial-time computable function Γ : {0,1}^t → {0,1}^n with t = O(log n·(w³ + w·log log n + log(1/ε))) such that Γ(U_t) is ε-indistinguishable from U_n for read-once regular branching programs of width w.

To prove this theorem, as before we construct a true tree and a pseudo tree and the corresponding labelings on the trees. Our analysis, however, will be somewhat different. The starting point of the analysis is the following observation (using Hall's theorem).

Observation 6.3 Let M_0^i and M_1^i be the walk matrices corresponding to the transition from layer i to layer i+1 in a regular branching program. Then, up to renaming of nodes, there is a permutation matrix π such that Id + π = M_0^i + M_1^i.

Proof: Look at all the edges from layer i to layer i+1. The total number of edges leaving every node in the ith layer is 2, and so is the total number of edges entering any vertex in the (i+1)th layer. Therefore, by Hall's theorem, we can partition these edges into two sets of equal size such that each corresponds to a perfect matching between layer i and layer i+1. Once we have the matchings, we can rename the vertices so that one of them corresponds to the identity permutation while the other corresponds to some permutation π. This proves our claim.

Thus the labels of the true tree correspond to a certain permutation branching program. In particular, for a node x_t in the true tree, we can define the fixed-point subspace and the non-trivial subspace as before. The important difference between regular and permutation branching programs, of course, is the following: for permutation branching programs, the labelings of the pseudo tree had the same fixed-point and non-trivial subspaces as the labelings of the true tree; for regular branching programs, this is no longer the case. We begin by making several observations.
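The edge partition in Observation 6.3 can also be computed explicitly: in a bipartite multigraph where every vertex has degree exactly 2, the edges split into alternating cycles of even length, and 2-coloring each cycle yields the two perfect matchings. A sketch (the function two_matchings and the edge-list representation are ours, not from the paper):

```python
from collections import defaultdict

def two_matchings(edges):
    """Split a 2-regular bipartite multigraph into two perfect matchings.

    edges: list of (u, v) pairs in which every left vertex u and every right
    vertex v occurs exactly twice. We walk each cycle, alternating between the
    two edges sharing a right endpoint and the two sharing a left endpoint;
    cycles in a bipartite multigraph have even length, so the alternating
    2-coloring is consistent.
    """
    at_left, at_right = defaultdict(list), defaultdict(list)
    for i, (u, v) in enumerate(edges):
        at_left[u].append(i)
        at_right[v].append(i)
    color = {}
    for start in range(len(edges)):
        if start in color:
            continue
        i, c, pivot_right = start, 0, True
        while i not in color:
            color[i] = c
            u, v = edges[i]
            twins = at_right[v] if pivot_right else at_left[u]
            i = twins[0] if twins[1] == i else twins[1]
            c, pivot_right = 1 - c, not pivot_right
    return ([e for i, e in enumerate(edges) if color[i] == 0],
            [e for i, e in enumerate(edges) if color[i] == 1])

m0, m1 = two_matchings([(0, 0), (0, 1), (1, 0), (1, 1)])
# Each half is a perfect matching: every left and right vertex covered once.
for m in (m0, m1):
    assert sorted(u for u, _ in m) == [0, 1] and sorted(v for _, v in m) == [0, 1]
```

After renaming the right-hand vertices so that the first matching becomes the identity, the second matching is the permutation π of the observation.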
Claim 6.4 Let L(x_t) be the labeling of a node of the true tree and L(x_p) the label of the corresponding node in the pseudo tree. If W is the fixed-point subspace of L(x_t), then for every x ∈ W, x^T·L(x_p)·x = ‖x‖²_2.

Proof: Note that as L(x_t) is a convex combination of permutation matrices, the fixed-point subspace of L(x_t) has the following structure: it is a direct sum of one-dimensional vector spaces, where each one-dimensional space is spanned by the vector uniform over a subset H of the states. Further, the subsets corresponding to different one-dimensional spaces are disjoint; call these subsets H_i. Now, note that L(x_p) is a convex combination of products of walk matrices, all of which have the property that vertices in H_i are mapped to vertices in H_i. Let y_i be the vector uniform on H_i, and consider any product X of these walk matrices. Then it is easy to see that y_i^T·X·y_i = ‖y_i‖²_2. Thus, if y is any vector in the fixed-point subspace, then y^T·X·y = ‖y‖²_2. Since this holds for every product of walk matrices, it also holds for their convex combination, and hence y^T·L(x_p)·y = ‖y‖²_2.

From the above observation, we conclude the following: at any stage of the computation, L(x_t) and L(x_p) can be block-diagonalized to have the structure

L(x_t) = [ I 0 ; 0 A ],  L(x_p) = [ I E″ ; E A + E′ ].

In the above, A is the operator on the subspace W which is the non-trivial subspace of L(x_t). Thus, the error matrix is

L(x_p) − L(x_t) = [ 0 E″ ; E E′ ].

With this observation, we bound the norm of the error L(x_p) − L(x_t) as in the case of permutation branching programs. Consider a node x_t in the true tree and x_p in the pseudo tree, and let y_t and z_t be the children of x_t in the true tree and y_p and z_p the corresponding children in the pseudo tree. We consider three cases:

• The non-trivial subspaces of L(y_t) and L(z_t) are exactly the same.
• The non-trivial subspaces of L(y_t) and L(z_t) are incomparable.
• The non-trivial subspace of L(y_t) is strictly contained in that of L(z_t), or the other way round.

More precisely, assume that for any x_t such that the dimension of its non-trivial subspace is d−1, we can bound the error ‖L(x_t) − L(x_p)‖_2 by a quantity E(d−1, n, w). We then bound E(d, n, w) in terms of E(d−1, n, w). To do this, every node x_t in the true tree whose non-trivial subspace has dimension d is assigned an integer Γ_d(x_t), as follows. If both children of x_t have non-trivial subspaces of dimension less than d, then Γ_d(x_t) = 1. If only one child of x_t has a non-trivial subspace of dimension d (say the child y_t), then Γ_d(x_t) = Γ_d(y_t). In all other cases, Γ_d(x_t) = Γ_d(y_t) + Γ_d(z_t).

We now consider all nodes x_t such that L(x_t) has a non-trivial subspace of dimension d and the non-trivial subspaces of L(y_t) and L(z_t) are incomparable. Let the set of these nodes be called A_{d,1}. Note that the non-trivial subspaces of L(y_t) and L(z_t) then have dimension at most d−1. The following claim bounds the error for all such nodes x_t.

Claim 6.5 Let x_t ∈ A_{d,1}. Then ‖L(x_t) − L(x_p)‖ ≤ 3(E(d−1, n, w) + λ).

Proof:

‖L(x_t) − L(x_p)‖ ≤ ‖L(y_t)‖·‖L(z_t) − L(z_p)‖ + ‖L(z_t)‖·‖L(y_t) − L(y_p)‖ + ‖L(z_t) − L(z_p)‖·‖L(y_t) − L(y_p)‖ + λ.

Now, as we have said, L(x_t), L(y_t) and L(z_t) are doubly stochastic matrices (using Observation 6.3), and hence their 2-norms are bounded by 1. In particular, we get

‖L(x_t) − L(x_p)‖ ≤ ‖L(z_t) − L(z_p)‖ + ‖L(y_t) − L(y_p)‖ + ‖L(z_t) − L(z_p)‖·‖L(y_t) − L(y_p)‖ + λ.  (10)

Now, we know that ‖L(z_t) − L(z_p)‖, ‖L(y_t) − L(y_p)‖ ≤ E(d−1, n, w). Using E(d−1, n, w) ≤ 1, we get ‖L(x_t) − L(x_p)‖ ≤ 3(E(d−1, n, w) + λ), which proves the claim.

We now consider nodes x_t whose non-trivial subspace has dimension d and such that the non-trivial subspace of L(y_t) is strictly contained in that of L(z_t), or vice versa.
We call this set of nodes A_{d,2}. To state the next claim, we use h(x_t) to denote the length of the longest path starting at x_t and going down the tree composed entirely of nodes from A_{d,2}.

Claim 6.6 For any node x_t ∈ A_{d,2} with height h(x_t), ‖L(x_t) − L(x_p)‖ ≤ 3(E(d−1, n, w) + λ)·h(x_t).

Proof: We prove this by induction. In the base case h(x_t) = 1, the children of x_t lie outside A_{d,2}, and the same calculation as in Claim 6.5 gives the bound 3(E(d−1, n, w) + λ). Now, we do the inductive case. Assume that the non-trivial subspace of L(y_t) is strictly contained in that of L(z_t) (the other case is symmetric). By induction,

‖L(z_t) − L(z_p)‖ ≤ 3(E(d−1, n, w) + λ)·(h(x_t) − 1),  ‖L(y_t) − L(y_p)‖ ≤ E(d−1, n, w).

Using (10), we get the claimed bound.

This means that for any node x_t ∈ A_{d,1} ∪ A_{d,2}, ‖L(x_t) − L(x_p)‖ ≤ 3(E(d−1, n, w) + λ)·log n, as h(x_t) is always bounded by log n. We now consider nodes of the third kind, namely nodes x_t whose non-trivial subspace has dimension d and whose children y_t and z_t also have (the same) non-trivial subspace of dimension d. Call the set of these nodes A_{d,3}. We make the following observation.

Observation 6.7 For a particular subspace W of dimension d, consider all nodes x_t such that the non-trivial subspace of x_t is W. Then these nodes form a disjoint union of trees. Also, the leaf nodes of these trees have children from A_{d,1} ∪ A_{d,2}.

With this observation, our aim essentially becomes to bound ‖L(x_t) − L(x_p)‖ for any node x_t which is the root of such a tree.
To do this, we start by observing that for any such node whose non-trivial subspace is W, the structure of L(y_t), L(z_t) and their pseudo counterparts is as follows:

L(y_t) = [ I 0 ; 0 A_y ],  L(y_p) = [ I E″_y ; E_y A_y + E′_y ],  L(y_p) − L(y_t) = [ 0 E″_y ; E_y E′_y ].

Likewise, we can say that

L(z_t) = [ I 0 ; 0 A_z ],  L(z_p) = [ I E″_z ; E_z A_z + E′_z ],  L(z_p) − L(z_t) = [ 0 E″_z ; E_z E′_z ].

Now, note that

L(y_t)·(L(z_p) − L(z_t)) = [ 0 E″_z ; A_y·E_z A_y·E′_z ].

Similarly, we have that

(L(y_p) − L(y_t))·L(z_t) = [ 0 E″_y·A_z ; E_y E′_y·A_z ].

Finally,

(L(y_p) − L(y_t))·(L(z_p) − L(z_t)) = [ 0 E″_y·E′_z ; E′_y·E_z E_y·E″_z + E′_y·E′_z ].

It is important to note that the last equality uses Claim 6.4 (namely, in proving that the top-left entry is zero).

To control the error for nodes in A_{d,3}, we switch from the 2-norm to the Frobenius norm. Let L be an orthonormal basis for W and L′ an orthonormal basis for W^⊥. Since the (W^⊥, W^⊥) block of L(x_p) − L(x_t) is zero, we have

‖L(x_p) − L(x_t)‖²_F = Σ_{v∈L} Σ_{w∈L} ⟨w, (L(x_p) − L(x_t))·v⟩² + Σ_{v∈L′} Σ_{w∈L} ⟨w, (L(x_p) − L(x_t))·v⟩² + Σ_{v∈L} Σ_{w∈L′} ⟨w, (L(x_p) − L(x_t))·v⟩².

We now define ‖L(x_p) − L(x_t)‖²_{F,W} = Σ_{v∈L} Σ_{w∈L} ⟨w, (L(x_p) − L(x_t))·v⟩². In particular, we have the following inequality:

‖L(x_p) − L(x_t)‖_{F,W} ≤ ‖A_y‖_2·‖L(z_p) − L(z_t)‖_{F,W} + ‖A_z‖_2·‖L(y_p) − L(y_t)‖_{F,W} + λ.  (11)

We now observe that

L(x_t) = [ I 0 ; 0 A_x ] = [ I 0 ; 0 A_y·A_z ].

Note that because for any node x_t, L(x_t) corresponds to labelings from a permutation branching program, (11) is exactly like (6) from Claim B.6. In particular, using the exact same proof, one can get the following claim.

Claim 6.8 For any node x_t whose non-trivial subspace is W,

‖L(x_p) − L(x_t)‖_{F,W} ≤ 3·w^{18w}·(E(d−1, n, w) + λ)·log n.

The only terms that remain to be bounded are Σ_{v∈L′} Σ_{w∈L} ⟨w, (L(x_p) − L(x_t))·v⟩² (denoted henceforth ‖L(x_p) − L(x_t)‖²_{F,W^⊥,W}) and its symmetric counterpart ‖L(x_p) − L(x_t)‖²_{F,W,W^⊥}, which is handled likewise. In other words, we need to bound the norm of the lower-left block of L(x_p) − L(x_t). This analysis is again quite easy.
As a starting point, we make the following claim.

Claim 6.9 For any node x_t in the true tree, ‖A_x‖_2 ≤ (1 − w^{−3w})^{Γ_d(x_t)}.

Proof: For Γ_d(x_t) = 1, this is clearly true (because, as we have remarked before, L(x_t) corresponds to labels of a permutation branching program). If Γ_d(x_t) > 1, then clearly x_t ∈ A_{d,3}. Consider its two children y_t and z_t. By the induction hypothesis, ‖A_y‖_2 ≤ (1 − w^{−3w})^{Γ_d(y_t)} and ‖A_z‖_2 ≤ (1 − w^{−3w})^{Γ_d(z_t)}. Since Γ_d(x_t) = Γ_d(y_t) + Γ_d(z_t) and A_x = A_y·A_z (and the 2-norm is submultiplicative), the claim follows.

Before making the next claim, we define the quantity γ = 3·w^{18w}·(E(d−1, n, w) + λ)·log n (which is simply the error term appearing in Claim 6.8).

Claim 6.10 For any node x_t, ‖L(x_p) − L(x_t)‖_{F,W^⊥,W} ≤ (2·Γ_d(x_t) − 1)·γ.

Proof: Again, we prove this by induction. In the base case, i.e., for Γ_d(x_t) = 1, the assertion is trivially true. We first observe that

‖L(x_p) − L(x_t)‖_{F,W^⊥,W} ≤ ‖A_y‖·‖L(z_p) − L(z_t)‖_{F,W^⊥,W} + ‖L(y_p) − L(y_t)‖_{F,W^⊥,W} + ‖L(z_p) − L(z_t)‖_{F,W}·‖L(y_p) − L(y_t)‖_{F,W^⊥,W} + λ.

Next, we use Claim 6.8 to bound ‖L(z_p) − L(z_t)‖_{F,W}. Using this, ‖A_y‖ ≤ 1, and the fact that ‖L(y_p) − L(y_t)‖_{F,W^⊥,W} ≤ 1, we get

‖L(x_p) − L(x_t)‖_{F,W^⊥,W} ≤ ‖L(z_p) − L(z_t)‖_{F,W^⊥,W} + ‖L(y_p) − L(y_t)‖_{F,W^⊥,W} + γ.

Now, using Γ_d(x_t) = Γ_d(y_t) + Γ_d(z_t), we have the proof.

Now, let M = w^{3w}·log n and consider all nodes x_t in A_{d,3} such that Γ_d(x_t) ≤ M. Clearly, for any such node,

‖L(x_p) − L(x_t)‖_{F,W^⊥,W} ≤ 6·w^{21w}·log³ n·(E(d−1, n, w) + λ).

Again, for notational convenience, we define δ_0 = 6·w^{21w}·log³ n·(E(d−1, n, w) + λ). Now, we consider nodes x_t in A_{d,3} with Γ_d(x_t) > M. We define a quantity E(x_t) for every such node in the true tree, inductively: for every node x_t with Γ_d(x_t) ≤ M, define E(x_t) = δ_0, and for every other node x_t, define E(x_t) = (1 + 2/log n)·max{E(y_t), E(z_t)}.
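The factor (1 + 2/log n) lost per level of this recursion compounds, over the at most log n levels of the INW tree, to (1 + 2/log n)^{log n} ≤ e² < 10, since (1 + x/L)^L increases monotonically to e^x. A quick numerical check (not from the paper):

```python
import math

# (1 + 2/L)^L increases monotonically to e^2 ~ 7.39 as L grows,
# so it never exceeds 10, regardless of n.
for n in (2 ** 10, 2 ** 20, 2 ** 60):
    L = math.log(n)
    assert (1 + 2 / L) ** L <= math.e ** 2 < 10
```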
The following is the main claim.

Claim 6.11 For any node $x_t$ with $\Gamma_d(x_t) > M$, $\|L(x_p) - L(x_t)\|_{F,W^{\perp},W} \leq E(x_t)$.

Proof: Note that
\[
\|L(x_p) - L(x_t)\|_{F,W^{\perp},W} \leq \|A_y\| \cdot \|L(z_p) - L(z_t)\|_{F,W^{\perp},W} + \|L(y_p) - L(y_t)\|_{F,W^{\perp},W} + \|L(z_p) - L(z_t)\|_{F,W} \cdot \|L(y_p) - L(y_t)\|_{F,W^{\perp},W} .
\]
First, we use Claim 6.8 to bound $\|L(z_p) - L(z_t)\|_{F,W}$ by $\gamma$. With that, we get
\[
\|L(x_p) - L(x_t)\|_{F,W^{\perp},W} \leq \|A_y\| \cdot \|L(z_p) - L(z_t)\|_{F,W^{\perp},W} + \|L(y_p) - L(y_t)\|_{F,W^{\perp},W} + \gamma .
\]
Next, we consider two cases: either $\Gamma_d(y_t) \leq w^{3w} \log\log n$ or $\Gamma_d(y_t) > w^{3w} \log\log n$.

In the first case, we note that because $\Gamma_d(z_t) = \Gamma_d(x_t) - \Gamma_d(y_t)$, we have $E(z_t) \geq \delta_0$. This means that
\[
\|L(x_p) - L(x_t)\|_{F,W^{\perp},W} \leq E(z_t) + \gamma + \|L(y_p) - L(y_t)\|_{F,W^{\perp},W} .
\]
Using Claim 6.10, we get that
\[
\|L(x_p) - L(x_t)\|_{F,W^{\perp},W} \leq E(z_t) + 2 w^{3w} \log\log n \cdot \gamma .
\]
Using the fact that $E(z_t) \geq \delta_0 = 2 w^{3w} \log^2 n \cdot \gamma$, we get the claim.

The second possibility is that $\Gamma_d(y_t) > w^{3w} \log\log n$. In this case, using Claim 6.9, we get that $\|A_y\| \leq 1/\log n$. Then, we get that
\[
\|L(x_p) - L(x_t)\|_{F,W^{\perp},W} \leq \frac{1}{\log n} \cdot \|L(z_p) - L(z_t)\|_{F,W^{\perp},W} + \|L(y_p) - L(y_t)\|_{F,W^{\perp},W} + \gamma .
\]
Again using the fact that $\max\{E(z_t), E(y_t)\} \geq \delta_0$, we get the claimed bound.

Since the height of the INW tree is at most $\log n$, using Claim 6.11, it is clear that for every $x_t$ with $\Gamma_d(x_t) > M$, we have
\[
\|L(x_p) - L(x_t)\|_{F,W^{\perp},W} \leq \left(1 + \frac{2}{\log n}\right)^{\log n} \cdot \delta_0 \leq 10 \delta_0 .
\]
Likewise, we also get that
\[
\|L(x_p) - L(x_t)\|_{F,W,W^{\perp}} \leq \left(1 + \frac{2}{\log n}\right)^{\log n} \cdot \delta_0 \leq 10 \delta_0 .
\]
Thus, combining the last equation with Claim 6.8, we get that for any node $x_t \in A_{d,3}$, $\|L(x_p) - L(x_t)\| \leq 22 \delta_0$. Since this is already true for nodes $x_t \in A_{d,1} \cup A_{d,2}$ (from Claim 6.5 and Claim 6.6), we have that for any node $x_t$ whose non-trivial subspace is of dimension $d$,
\[
\|L(x_t) - L(x_p)\| \leq 22 \delta_0 = 132 \cdot w^{21w} \log^3 n \cdot (E(d-1, n, w) + \lambda) .
\]
This implies that $E(d, n, w) \leq 132 \cdot w^{21w} \log^3 n \cdot (E(d-1, n, w) + \lambda)$.
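The recursion for $E(d, n, w)$ derived above can be sanity-checked numerically before unrolling it. A minimal sketch, using hypothetical small values of $C$ and $\lambda$ (in the paper the factor $C$ grows with $w$ and $\log n$), verifying that $d$ levels of the recursion $E(d) \leq C \cdot (E(d-1) + \lambda)$ inflate $\lambda$ by at most $(2C)^d$:

```python
# Toy numeric check of unrolling the recursion E(d) <= C * (E(d-1) + lam).
# C and lam are hypothetical values, not the paper's actual parameters.
# The unrolled value stays below (2C)^d * lam, so w levels of recursion
# cost at most a factor exponential in w (as in the final seed-length bound).
C, lam = 5.0, 0.01

E = lam  # E(0) = lam
for d in range(1, 11):
    E = C * (E + lam)
    assert E <= (2 * C) ** d * lam * (1 + 1e-12)
print("E(d) <= (2C)^d * lam for d = 1..10")
```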
Setting $E(0, n, w) = \lambda$, we get that $E(w, n, w) \leq 132^w \cdot w^{21w^2} \cdot \log^{3w} n \cdot \lambda$. As the non-trivial subspace of any node $x_t$ in the true tree has dimension at most $w$, we get Theorem 6.1.

Acknowledgements

We would like to thank Omer Reingold, Thomas Steinke, Luca Trevisan, Salil Vadhan, and Avi Wigderson.

References

[Bar89] David A. Mix Barrington. Bounded-width polynomial-size branching programs recognize exactly those languages in NC1. Journal of Computer and System Sciences, 38(1):150–164, 1989.

[BM84] Manuel Blum and Silvio Micali. How to generate cryptographically strong sequences of pseudorandom bits. SIAM Journal on Computing, 13(4):850–864, 1984. Preliminary version in Proc. of FOCS'82.

[BRRY10] Mark Braverman, Anup Rao, Ran Raz, and Amir Yehudayoff. Pseudorandom generators for regular branching programs. In Proceedings of the 51st IEEE Symposium on Foundations of Computer Science, pages 40–47, 2010.

[BV10] Joshua Brody and Elad Verbin. The coin problem and pseudorandomness for branching programs. In Proceedings of the 51st IEEE Symposium on Foundations of Computer Science, pages 30–39, 2010.

[GMRZ10] Parikshit Gopalan, Raghu Meka, Omer Reingold, and David Zuckerman. Pseudorandom generators for combinatorial shapes. Technical Report TR10-176, Electronic Colloquium on Computational Complexity, 2010.

[INW94] Russell Impagliazzo, Noam Nisan, and Avi Wigderson. Pseudorandomness for network algorithms. In Proceedings of the 26th ACM Symposium on Theory of Computing, pages 356–364, 1994.

[IW97] Russell Impagliazzo and Avi Wigderson. P = BPP unless E has sub-exponential circuits. In Proceedings of the 29th ACM Symposium on Theory of Computing, pages 220–229, 1997.

[KI04] Valentine Kabanets and Russell Impagliazzo. Derandomizing polynomial identity tests means proving circuit lower bounds. Computational Complexity, 13(1-2):1–46, 2004.

[KNP10] Michal Koucký, Prajakta Nimbhorkar, and Pavel Pudlák. Pseudorandomness for group products. Technical Report TR10-113, Electronic Colloquium on Computational Complexity, 2010.

[LRTV09] Shachar Lovett, Omer Reingold, Luca Trevisan, and Salil P. Vadhan. Pseudorandom bit generators that fool modular sums. In Proceedings of APPROX-RANDOM, pages 615–630, 2009.

[Lu02] Chi-Jen Lu. Improved pseudorandom generators for combinatorial rectangles. Combinatorica, 22(3):417–434, 2002.

[MZ09] Raghu Meka and David Zuckerman. Small-bias spaces for group products. In Proceedings of APPROX-RANDOM, pages 658–672, 2009.

[Nis92] Noam Nisan. Pseudorandom generators for space bounded computation. Combinatorica, 12(4):449–461, 1992. Preliminary version in Proc. of STOC'90.

[NW94] Noam Nisan and Avi Wigderson. Hardness vs. randomness. Journal of Computer and System Sciences, 49:149–167, 1994. Preliminary version in Proc. of FOCS'88.

[RR99] Ran Raz and Omer Reingold. On recycling randomness in space bounded computation. In Proceedings of the 31st ACM Symposium on Theory of Computing, pages 159–168, 1999.

[RVW00] Omer Reingold, Salil P. Vadhan, and Avi Wigderson. Entropy waves, the zig-zag graph product, and new constant-degree expanders and extractors. In Proceedings of the 41st IEEE Symposium on Foundations of Computer Science, 2000.

[Sav70] Walter J. Savitch. Relationships between nondeterministic and deterministic tape complexities. Journal of Computer and System Sciences, 4(2):177–192, 1970.

[Tel05] Constantin Teleman. Lecture notes on representation theory. http://www.math.berkeley.edu/~teleman, 2005.

[vv10] Jiří Šíma and Stanislav Žák. A polynomial time construction of a hitting set for read-once branching programs of width 3. Technical Report TR10-088, Electronic Colloquium on Computational Complexity, 2010.

[Yao82] Andrew C. Yao. Theory and applications of trapdoor functions. In Proceedings of the 23rd IEEE Symposium on Foundations of Computer Science, pages 80–91, 1982.