Pseudorandomness for permutation and regular branching programs

Anindya De*

March 6, 2013

Abstract

In this paper, we prove the following two results about the INW pseudorandom generator:

• It fools constant width permutation branching programs with a seed of length O(log n · log(1/ε)).
• It fools constant width regular branching programs with a seed of length O(log n · (log log n + log(1/ε))).

These results match the recent results of Koucký et al. (STOC 2011) and of Braverman et al. and Brody and Verbin (FOCS 2010). We improve the dependence of the seed on the width for permutation branching programs. Far more importantly, perhaps, our work proceeds by analyzing the singular values of the stochastic matrices that arise in the transitions of the branching program, which we hope broadens its applicability. As a corollary of our techniques, we present new results on the "small biased spaces for group products" problem [MZ09]. We get a pseudorandom generator with seed length O(log n · (log |G| + log(1/ε))). Previously, using the result of Koucký et al., it was possible to get a seed length of O(log n · (|G|^{O(1)} + log(1/ε))) for this problem.

Keywords: Pseudorandom generators, Permutation branching programs, expander products

* Computer Science Division, University of California, Berkeley, CA, USA. [email protected].

1 Introduction

One of the most fundamental questions in complexity theory is whether one can save on computational resources like space and time by using randomness. While it is known that randomness is indispensable in settings like cryptography and distributed computation, a long line of research [Yao82, BM84, NW94, IW97] has shown that, assuming appropriate lower bounds on the circuit complexity of some functions, one can derandomize every randomized polynomial time algorithm, i.e., show P = BPP. Unfortunately, it has also been shown that any non-trivial derandomization of BPP implies circuit lower bounds [KI04] which seem out of reach of the present state of the art.
Thus, getting unconditional derandomization of complexity classes like BPP and MA looks out of reach for current techniques. This has led to a shift of focus towards derandomization of "low-level" complexity classes where one can hope to get unconditional results. One of the most important problems in this line of research is to derandomize bounded space computation. The ultimate aim of this line of research is to prove RL = L, i.e., to show that any problem that can be solved in randomized logspace can be solved in deterministic logspace. Savitch [Sav70] showed that RL ⊆ NL ⊆ L², i.e., randomized logspace computation, and in fact non-deterministic logspace computation, can be simulated deterministically in O(log² n) space. Nisan [Nis92] also showed that RL ⊆ L² by constructing a pseudorandom generator (PRG) which stretches a seed of length O(log² n) into n bits that fool logspace machines. In fact, Nisan's PRG fools read-once branching programs (which we define next) of polynomial length and width.

Definition 1.1 A read-once branching program (BP) of width w and length n is a directed multilayer graph with n + 1 layers such that each layer has w nodes, with edges going from the ith layer to the (i + 1)th layer (0 ≤ i ≤ n − 1). For every node (except those in the last layer), there are exactly two edges leaving that node, one marked 0 and the other marked 1. There is a designated start state in the first layer and a set of "accepting" states in the (n + 1)th layer.

Remark 1.2 In this paper, whenever we refer to branching programs, we mean read-once branching programs. We note that if the read-once restriction is not imposed, then in fact width 5 branching programs capture NC¹ [Bar89].

A BP is said to be a permutation branching program (PBP) if in every layer the transitions corresponding to 0 (resp. 1) form a matching. A BP is said to be a regular branching program (RBP) if the number of edges coming into every node is either 0 or 2.
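As a concrete illustration of Definition 1.1, here is a minimal sketch (our own, not from the paper) of a width-w read-once branching program: each layer stores, for each input bit b, a map from the w states of layer i to the w states of layer i + 1, and the permutation property of a PBP says each such map is a bijection.

```python
def evaluate_bp(transitions, start, accepting, x):
    """transitions[i][b][s] = next state when reading bit b in state s."""
    state = start
    for i, bit in enumerate(x):
        state = transitions[i][bit][state]
    return state in accepting

def is_permutation_bp(transitions, w):
    """Permutation BP: each layer's 0- and 1-transitions are bijections
    (matchings) on the w states."""
    return all(sorted(layer[b]) == list(range(w))
               for layer in transitions for b in (0, 1))

# Example: a width-2 program computing the parity of 3 bits.
xor_layer = [[0, 1], [1, 0]]        # bit 0: identity map; bit 1: swap
prog = [xor_layer] * 3
assert evaluate_bp(prog, 0, {1}, [1, 0, 1]) is False  # parity 0 -> reject
assert evaluate_bp(prog, 0, {1}, [1, 1, 1]) is True   # parity 1 -> accept
assert is_permutation_bp(prog, 2)
```

The parity program above is the canonical width-2 PBP; a regular but non-permutation example would merge two states, keeping in-degrees 0 or 2.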
A given BP accepts an input x ∈ {0,1}^n if, starting from the start state and following the path specified by the input, it ends in one of the accepting states. It is not hard to see that randomized logspace computation is a uniform version of BPs with w = n^{O(1)}.

Coming back to PRGs for branching programs, after [Nis92], several papers [INW94, RR99] improved on some parameters of the construction in [Nis92]. However, improving on the O(log² n) seed remained (and continues to remain) open even in the following minimal sense: it is not known how to construct a PRG with seed length o(log² n) with constant error for width 3 BPs. Faced with this difficulty, research focused on solving special cases of this problem with better seed lengths ([Lu02], [LRTV09], [GMRZ10]). Attention has also been directed towards getting better seed length when there is some structural restriction on the BPs. In particular, when the branching program is regular, Braverman et al. [BRRY10] and Brody and Verbin [BV10] constructed pseudorandom generators with seed length O(log n · (log log n + log(1/ε))) which fool constant width branching programs with error ε. The dependence of the seed length on the width was better for the latter paper, which obtained O(log n · (log w + log log n + log(1/ε))). Koucký, Nimbhorkar and Pudlák [KNP10] improved on this further for the case of permutation branching programs. In particular, for fooling constant width permutation branching programs with error ε, they got a seed length of O(log n · log(1/ε)). In their analysis, they transform this problem into the language of group products (we discuss this later) and then analyze their construction using basic properties of groups. In this vein, we should also mention that Šíma and Žák recently achieved a breakthrough by constructing a hitting set with seed length O(log n) for width-3 branching programs [ŠŽ10]. However, their techniques seem totally disjoint from other works in this line of research.
1.1 Our results

We present a pseudorandom generator which ε-fools permutation branching programs of length n and width w using a seed of length O(log n · (w^8 + log(1/ε))). The PRG we use is the INW generator [INW94] (as in the previous works [BV10, BRRY10, KNP10]). We remark that Koucký et al. obtained a seed length of O(log n · (w! + log(1/ε))) for fooling permutation branching programs using the same generator. What we consider interesting is that our analysis is based on analyzing spectra of the stochastic matrices that arise in the transitions of the branching program. Thus, we see it as a more combinatorial approach which might be helpful in other contexts as well. This is in contrast to the result of Koucký et al., which is based on a group theoretic approach and is thus difficult to adapt to more combinatorial settings. Our techniques also show that the INW generator ε-fools regular branching programs of length n and width w using a seed of length O(log n · (w log log n + log(1/ε))). Our analysis is based on linear algebra, in contrast to the analysis of Braverman et al. (which was based on information theoretic ideas) and Brody and Verbin (which was based on combinatorial ideas).

We also consider the small-bias spaces problem for group products, first considered by Meka and Zuckerman in [MZ09]. Both the problem and our results on it are stated in Section 4. We next discuss the INW generator in detail and the main idea behind our improved analysis.

2 Technical overview

2.1 Impagliazzo-Nisan-Wigderson generator

The PRG used in this paper is the construction of Impagliazzo, Nisan and Wigderson [INW94] (hereafter referred to as the INW generator). We now describe their construction. First, let us recall the following important fact about the construction of expander graphs [RVW00].
Fact 2.1 For any n and λ > 0, there exist graphs on {0,1}^n with degree d = (1/λ)^{Θ(1)} and second eigenvalue λ such that given any vertex x ∈ {0,1}^n and an edge label i ∈ [d], the ith neighbor of x is computable in n^{Θ(1)} time.

The INW generator is defined recursively as follows. Let Γ_0 : {0,1}^t → {0,1} be simply the function which maps a t bit string to its first bit. Assume Γ_{i−1} : {0,1}^m → {0,1}^ℓ. Then Γ_i : {0,1}^{m+log d} → {0,1}^{2ℓ} is defined as follows. Let x = y ∘ z ∈ {0,1}^{m+log d} such that y is m bits long and z is log d bits long. Let H be a graph on 2^m vertices constructed using Fact 2.1. Let y′ be the zth neighbor of y in H. Then Γ_i(x) = Γ_{i−1}(y) ∘ Γ_{i−1}(y′). Here and elsewhere, ∘ is used to denote concatenation.

From the above, one can easily see that Γ_i : {0,1}^{t+i log d} → {0,1}^{2^i}. As we can take t to be anything, we get that Γ_{log n} : {0,1}^{t+log n·log d} → {0,1}^n. As d = (1/λ)^{Θ(1)}, the INW generator stretches a seed of length O(log n · log(1/λ)) to n bits.

Remark 2.2 It is possible, and will in fact be necessary for us (in Section 4), to define the INW generator so that it produces elements from a bigger alphabet. The construction in this case is as follows: we assume that we want to produce elements from some set G. For some t ≥ log |G|, let Γ_0 : {0,1}^t → G be simply the function which takes the first log |G| bits of its input and interprets them as an element of G. Assume Γ_{i−1} : {0,1}^m → G^ℓ. Then Γ_i : {0,1}^{m+log d} → G^{2ℓ} is defined as follows. Let x = y ∘ z ∈ {0,1}^{m+log d} such that y is m bits long and z is log d bits long. Let H be a graph on 2^m vertices constructed using Fact 2.1. Let y′ be the zth neighbor of y in H. Then Γ_i(x) = Γ_{i−1}(y) ∘ Γ_{i−1}(y′).

From the above, one can easily see that Γ_i : {0,1}^{t+i log d} → G^{2^i}. As we can take t to be anything as long as it is at least log |G|, we get that Γ_{log n} : {0,1}^{log |G|+log n·log d} → G^n.
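The recursion above can be sketched in a few lines. The sketch below treats seeds as integers; `toy_neighbor` is a hypothetical stand-in of our own for the explicit expander neighbor function of Fact 2.1 (any map from a vertex and an edge label to a neighbor has the right shape, but the pseudorandomness of course rests on H actually being an expander).

```python
LOG_D = 3  # d = 8 edge labels per expander vertex (illustrative choice)

def toy_neighbor(y, z, m):
    """Hypothetical placeholder for the z-th neighbor of y on 2^m vertices.
    NOT a real expander construction."""
    return (y ^ (z * 0x9E3779B1)) % (1 << m)

def inw(i, seed, t):
    """Gamma_i : {0,1}^(t + i*LOG_D) -> {0,1}^(2^i), seeds as integers."""
    if i == 0:
        return [(seed >> (t - 1)) & 1]     # Gamma_0: first bit of the seed
    y = seed >> LOG_D                      # high bits: seed y for Gamma_{i-1}
    z = seed & ((1 << LOG_D) - 1)          # low LOG_D bits: edge label z
    m = t + (i - 1) * LOG_D                # seed length of Gamma_{i-1}
    return inw(i - 1, y, t) + inw(i - 1, toy_neighbor(y, z, m), t)

out = inw(4, 5000, 4)    # a 4 + 4*LOG_D = 16 bit seed -> 2^4 = 16 output bits
assert len(out) == 2 ** 4 and set(out) <= {0, 1}
```

Note how the seed grows by only log d bits per doubling of the output, which is the source of the O(log n · log(1/λ)) seed length.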
As d = (1/λ)^{Θ(1)}, the INW generator stretches a seed of length O(log |G| + log n · log(1/λ)) to a sequence of n elements of G. We now get back to the analysis of the INW generator. For the purposes of this discussion, we assume that the INW generator is producing bit strings as opposed to elements in G.

2.2 Analysis of INW generator in terms of stochastic matrices

To understand the analysis of the INW generator from [INW94] as well as the improvements in this paper, it is helpful to look at branching programs from the following viewpoint. Assume that the branching program is of width w. Then the states in every layer can be numbered from 1 to w. Also, for x, y ∈ [w] and b ∈ {0,1}, we write (x, i, b) → (y, i + 1) if there is an edge labelled b going from vertex x in layer i to vertex y in layer i + 1. Now, for every layer 0 ≤ i < n, we can introduce two stochastic matrices M_0^i and M_1^i (we interchangeably call them walk matrices as well), defined as

M_b^i(y, x) = 1 if (x, i, b) → (y, i + 1), and 0 otherwise.

Now, assume that we start with a probability distribution x ∈ R^w over the states in the 0th layer. If the string chosen is y, then the probability distribution on the states in the final layer is given by ∏_{i=0}^{n−1} M_{y_i}^i · x (the product taken so that M_{y_0}^0 is applied first). Since any string is chosen with probability 1/2^n, the final distribution is given by

Σ_{y∈{0,1}^n} (1/2^n) ∏_{i=0}^{n−1} M_{y_i}^i · x = ∏_{i=0}^{n−1} ((M_0^i + M_1^i)/2) · x

If instead the y's are drawn from a distribution D, then the distribution on the final layer will be

Σ_{y∈{0,1}^n} D(y) ∏_{i=0}^{n−1} M_{y_i}^i · x

Thus, our aim is to find a distribution D which can be sampled with a few bits of randomness such that

‖ Σ_{y∈{0,1}^n} D(y) ∏_{i=0}^{n−1} M_{y_i}^i − ∏_{i=0}^{n−1} ((M_0^i + M_1^i)/2) ‖ ≤ ε

In the above, we do not specify the norm we use, but actually any norm works for us (like the Frobenius norm). This is because for a constant sized matrix, all these norms are within a constant factor of each other. We now define and study the concept of expander-product of distributions of matrices.
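The identity between the average of the products and the product of the averages can be checked directly on a small example. The sketch below (our own, plain-list linear algebra in the text's column convention M(y, x)) compares the matrix-product computation against brute-force enumeration of all inputs.

```python
from itertools import product

def matvec(M, x):
    return [sum(M[r][c] * x[c] for c in range(len(x))) for r in range(len(M))]

def walk_matrix(layer, b, w):
    """M_b^i in the text's convention: M(y, x) = 1 iff reading bit b moves
    state x to state y."""
    return [[1.0 if layer[b][x] == y else 0.0 for x in range(w)]
            for y in range(w)]

w, layers = 2, [[[0, 1], [1, 0]], [[1, 0], [0, 1]]]   # two width-2 layers
dist = [1.0, 0.0]                                     # start in state 0
for layer in layers:
    M0, M1 = walk_matrix(layer, 0, w), walk_matrix(layer, 1, w)
    avg = [[(M0[r][c] + M1[r][c]) / 2 for c in range(w)] for r in range(w)]
    dist = matvec(avg, dist)

# Brute force over all 2^n inputs gives the same final distribution.
brute = [0.0] * w
for bits in product([0, 1], repeat=len(layers)):
    s = 0
    for layer, b in zip(layers, bits):
        s = layer[b][s]
    brute[s] += 1 / 2 ** len(layers)
assert all(abs(p - q) < 1e-12 for p, q in zip(dist, brute))
```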
2.3 Comparison of the product and the expander product of matrices

We first recall that the operator norm of a matrix M ∈ C^{n×n}, denoted by ‖M‖₂, is defined as

‖M‖₂ = sup_{x∈C^n, x≠0} ‖x · M‖ / ‖x‖

An important property of the operator norm is that it is submultiplicative, i.e., ‖X · Y‖₂ ≤ ‖X‖₂ ‖Y‖₂. Another property which shall be useful to us is the following fact.

Fact 2.3 For any doubly stochastic matrix M ∈ R^{n×n}, ‖M‖₂ ≤ 1. (The walk matrices of permutation branching programs, as well as the averaged walk matrices of regular branching programs, are doubly stochastic.)

Proof: Note that for every i, j ∈ [n], M_ij ≥ 0, and every row and every column of M sums to 1. Note that (x · M)_j = Σ_i x_i · M_ij. Hence, for a unit vector x, we have

‖x · M‖₂² = Σ_{j∈[n]} (Σ_{i∈[n]} x_i · M_ij)² ≤ Σ_{j∈[n]} Σ_{i∈[n]} M_ij · x_i² = Σ_{i∈[n]} x_i² = 1

The inequality in the above is an application of the Cauchy-Schwarz inequality (using that every column sums to 1); the last equality uses that every row sums to 1.

Let us assume that Γ_1, Γ_2 : {0,1}^r → {0,1}^n and ρ_1, ρ_2 : {0,1}^n → C^{m×m}. Assume that ‖ρ_1(x)‖₂, ‖ρ_2(x)‖₂ ≤ 1 for all x ∈ {0,1}^n (this will be the case throughout this paper). Consider the following two sums:

A = Σ_{x∈{0,1}^r} (1/2^r) ρ_1(Γ_1(x))    B = Σ_{x∈{0,1}^r} (1/2^r) ρ_2(Γ_2(x))    (1)

Then the product of A and B is given by

A · B = Σ_{x,y∈{0,1}^r} (1/2^{2r}) ρ_1(Γ_1(x)) · ρ_2(Γ_2(y))

We now consider a 2^d-regular graph H on {0,1}^r with second eigenvalue bounded by λ. We define the expander product as

A ·_H B = Σ_{x∈{0,1}^r, (x,y)∈E(H)} (1/2^{r+d}) ρ_1(Γ_1(x)) · ρ_2(Γ_2(y))

We note that without specifying the functions ρ_i, Γ_i, it is not possible to concretely define A ·_H B. However, specifying all the parameters makes the definitions and applications cumbersome. So, we sacrifice some accuracy for the sake of clarity. The ρ_i's and Γ_i's will be clear from the context.

The relation between the above definitions and the INW generator and branching programs is obvious. Let us consider a branching program of length 2^{m+1}. Now, let Γ_m : {0,1}^t → {0,1}^{2^m} be the instantiation of the INW generator which stretches t bits to 2^m bits. Let us define ρ_1(x) = ∏_{i<2^m} M_{x_i}^i and ρ_2(x) = ∏_{i<2^m} M_{x_i}^{i+2^m}.
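Fact 2.3 can be sanity-checked numerically. The sketch below (our own; the helper name `rand_doubly_stochastic` is an assumption) generates doubly stochastic matrices as convex combinations of permutation matrices and verifies the contraction ‖x · M‖₂ ≤ ‖x‖₂ on random vectors.

```python
import random, math

def rand_doubly_stochastic(n, rng, k=6):
    """Convex combination of k random permutation matrices (a point of the
    Birkhoff polytope), hence doubly stochastic."""
    M = [[0.0] * n for _ in range(n)]
    weights = [rng.random() for _ in range(k)]
    total = sum(weights)
    for wgt in weights:
        p = rng.sample(range(n), n)            # a random permutation
        for x in range(n):
            M[p[x]][x] += wgt / total
    return M

rng = random.Random(0)
n = 5
M = rand_doubly_stochastic(n, rng)
# Rows and columns each sum to 1.
assert all(abs(sum(row) - 1) < 1e-12 for row in M)
assert all(abs(sum(M[i][j] for i in range(n)) - 1) < 1e-12 for j in range(n))
# Contraction: ||x . M||_2 <= ||x||_2 for row vectors x.
for _ in range(100):
    x = [rng.gauss(0, 1) for _ in range(n)]
    xM = [sum(x[i] * M[i][j] for i in range(n)) for j in range(n)]
    assert math.hypot(*xM) <= math.hypot(*x) + 1e-12
```

A merely column-stochastic matrix need not contract the Euclidean norm (e.g., a matrix sending every state to a single state has norm √n), which is why the proof uses both row and column sums.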
Hence, A and B correspond to the walk matrices for the first and the second half of the branching program, provided the input to both halves is sampled from the output of Γ_m. Now, if we use independent seeds for the first and the second applications of Γ_m, then the transition matrix for the entire branching program is A · B. On the other hand, we can have another application of the INW generator and hence define Γ_{m+1} : {0,1}^{t+d} → {0,1}^{2^{m+1}} as Γ_{m+1}(x, j) = Γ_m(x) ∘ Γ_m(H(x, j)), where H(x, j) denotes the jth neighbor of x in H. In this case, it is easy to see that the transition matrix corresponding to the entire branching program when the input is sampled from the output of Γ_{m+1} is A ·_H B.

Lemma 2.4 Let ρ_1, ρ_2 : {0,1}^m → C^{w×w} be such that ∀x ∈ {0,1}^m, j ∈ {1, 2}, ‖ρ_j(x)‖₂ ≤ 1. Let Γ_1, Γ_2 : {0,1}^t → {0,1}^m. Let H be a 2^d-regular graph on {0,1}^t with second eigenvalue λ. Then, for A and B as defined in (1),

‖A · B − A ·_H B‖₂ ≤ λ

Also, we have the following observation:

• If for all x, ρ_1(x) and ρ_2(x) are identity on a subspace W, then both A · B and A ·_H B are also identity on the subspace W. By a matrix M being identity on a subspace W, we mean M : W^⊥ → W^⊥ and for every x ∈ W, x · M = M · x = x (W^⊥ is the orthogonal complement of W).

Proof: Let us define X, Y ∈ C^{w×w2^t} as follows. Both X and Y are divided into 2^t blocks such that the ith block in X is ρ_1(Γ_1(i)) and the ith block in Y is ρ_2(Γ_2(i)). Also, let us define matrices Λ_1, Λ_2 ∈ C^{w2^t×w2^t} as follows: Λ_1 = Λ_H ⊗ Id and Λ_2 = Λ_K ⊗ Id, where Id is the w × w identity matrix, Λ_H is the random walk matrix for the graph H and Λ_K is the random walk matrix for a clique (with self-loops) on 2^t vertices. Here ⊗ denotes the tensor product of matrices. Then,

A ·_H B − A · B = X · (Λ_1 − Λ_2) · Y^T / 2^t

However, we also note that by the definition of the second eigenvalue of Λ_H (and since the graph H is regular), the largest singular value of C = Λ_H − Λ_K is bounded by λ.
Since the largest singular value of a tensor product of two matrices is the product of the largest singular values of the two matrices, we get that the largest singular value of Λ_1 − Λ_2 is at most λ. This implies that ‖Λ_1 − Λ_2‖₂ ≤ λ. Now, observe that

X · (Λ_1 − Λ_2) · Y^T / 2^t = (X/√(2^t)) · (C ⊗ Id) · (Y^T/√(2^t))

Also, as each block of X is a matrix whose norm is at most 1, we have ‖X/√(2^t)‖₂ ≤ 1. Similarly, ‖Y^T/√(2^t)‖₂ ≤ 1. Putting everything together, we prove the claim. The observation trivially follows from the assumptions about H and the ρ_i's.

2.4 Basic error analysis of the INW generator

We would now like to put an upper bound on the probability with which the branching program can distinguish between the output of the INW generator and the uniform distribution. Equivalently, let M be the average walk matrix between the 0th and the nth layer when the input is uniformly random. Let M̃ be the average walk matrix when the input is chosen from the output of the INW generator. We would like to put an upper bound on ‖M − M̃‖₂.

In order to do this, we will consider two trees. Both of these will be full binary trees. The leaf nodes are numbered 1 to n from left to right. The first tree represents the average walk matrices (at various points in the branching program) when the input is sampled from the output of the INW generator. The second tree represents the average walk matrices (at various points in the branching program) when the input is uniform. We call the first tree the pseudo tree and the second one the true tree. Without loss of generality, we consider n to be a power of 2.

Consider any node x (in either of the trees) at height m. Assume that the leaves in the subtree rooted at x are numbered {i, . . . , j = i + 2^m − 1}. Also, let us define Γ_m : {0,1}^t → {0,1}^{2^m} to be the INW generator which stretches t bits into 2^m bits. Let Γ′_m : {0,1}^{2^m} → {0,1}^{2^m} be the identity function. (For a bit string w and a position t, w_t denotes the tth bit of w.) Further, we define ρ_x(w) = ∏_{i≤t≤j} M_{w_t}^t.
The labeling L(x) is then as follows:

L(x) = Σ_{w∈{0,1}^{2^m}} (1/2^{2^m}) ρ_x(Γ′_m(w))   if x is in the true tree
L(x) = Σ_{w∈{0,1}^t} (1/2^t) ρ_x(Γ_m(w))   if x is in the pseudo tree

Thus, with the above labeling, L(x) is simply the average walk matrix to go from layer i to layer i + 2^m when the input is sampled from the uniform distribution (in case of the true tree) or the output of the INW generator (in case of the pseudo tree). We adopt the following convention: whenever we talk about a node x in the tree, it refers to the corresponding nodes in both the true and the pseudo tree. To refer to the corresponding node in the true tree, we call it x_t, and for the pseudo tree, we call it x_p. We now observe the following: let x be a node and let y and z be its left and right children. Then

• L(x_t) = L(y_t) · L(z_t)
• L(x_p) = L(y_p) ·_H L(z_p)

Claim 2.5 Let x be a node at height t. Then ‖L(x_t) − L(x_p)‖₂ ≤ 2 · 2^t λ.

Proof: We prove the slightly stronger bound ‖L(x_t) − L(x_p)‖₂ ≤ (2^{t+1} − 1)λ, which implies the claim. Clearly, it holds when t = 0. We assume it holds for t ≤ t_0 and prove it for t = t_0 + 1. Let x be at height t_0 + 1 and its children be y and z at height t_0. Then

‖L(x_p) − L(x_t)‖₂ = ‖L(x_p) − L(y_p)L(z_p) + L(y_p)L(z_p) − L(y_t)L(z_t)‖₂
≤ ‖L(x_p) − L(y_p)L(z_p)‖₂ + ‖L(y_p)L(z_p) − L(y_t)L(z_t)‖₂
≤ ‖L(y_p) ·_H L(z_p) − L(y_p)L(z_p)‖₂ + ‖L(y_p)‖₂ ‖L(z_p) − L(z_t)‖₂ + ‖L(z_t)‖₂ ‖L(y_p) − L(y_t)‖₂
≤ λ + 2(2^{t_0+1} − 1)λ = (2^{t_0+2} − 1)λ < 2 · 2^{t_0+1} λ

In the above analysis, we use Fact 2.3. The above claim clearly shows that to have a total error of ε, it suffices to have λ = ε/(2n), which means that the INW generator has a seed of length O(log n · log(n/ε)). The more important aspect of the above analysis is that while we pessimistically assume that ‖L(y_p)‖₂, ‖L(z_t)‖₂ are as large as 1, in general they can be much smaller. In particular, if they are both bounded by say 1/3, then the error will not increase with height.
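The error recurrence behind Claim 2.5 can be unrolled explicitly: with e(0) = 0 and e(t + 1) = λ + 2e(t), the closed form is e(t) = (2^t − 1)λ, which sits below the claimed bound 2 · 2^t λ. A quick integer-arithmetic check:

```python
# Unrolling the recurrence of Claim 2.5: e(0) = 0, e(t+1) = lam + 2*e(t).
lam = 1  # work over integers so the recurrence is exact
e = 0
for t in range(1, 31):
    e = lam + 2 * e
    assert e == (2 ** t - 1) * lam       # closed form
    assert e < 2 * 2 ** t * lam          # the bound stated in the claim
```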
Of course, in general, this is not true, and it can be the case that ‖L(y_p)‖₂ = ‖L(z_t)‖₂ = 1; but then we show that in fact there is no error incurred when one takes the expander product instead of the true product! (This is not exactly true, but we make it precise later.) It is interesting to note that the results in [BRRY10, BV10, KNP10], as well as our result, beat the above naive analysis for regular (or permutation) branching programs by a clever analysis of the INW generator. In contrast, the results in [Lu02, LRTV09, GMRZ10], which are directed towards "symmetric functions", use a combination of hash functions and the INW generator. It seems hard to use hash functions for general branching programs because the main purpose of hashing in these constructions is to "rearrange weights so that they are evenly spread out". However, unless one is guaranteed that the function being computed by the branching program is invariant under permutations of the input, it seems impossible to use hash functions.

2.5 Organization of the paper

Section 3 considers the problem of fooling group products over an abelian group. While technically much simpler than the subsequent sections on fooling permutation branching programs, the analysis gives the intuition on how to improve the analysis of the INW generator for general permutation branching programs. Also, the seed length we achieve for this problem is incomparable to the previous best known seed length for the same problem. Section 4 considers the small-biased spaces problem for group products; we improve on the previous best result for this problem [MZ09]. Section 5 presents a PRG with seed length O(log n · (w^8 + log(1/ε))) for permutation branching programs of width w and length n. Section 6 presents a PRG with seed length O(log n · (w log log n + log(1/ε))) for regular branching programs of width w and length n.

We would like to highlight that while the results in the following section about fooling abelian group products are particularly easy to prove, they are nevertheless important for two reasons.
One is that they highlight some of the important ideas which will later be used to analyze general permutation branching programs. The second is that the complexity of the analysis in [KNP10] seems to stem from the fact (as they themselves remark) that a group may possess non-trivial subgroups. Our analysis shows that in fact most of the complexity in analyzing general permutation branching programs comes from the non-commutativity of the group rather than the existence of proper subgroups.

3 Fooling abelian group products using the INW generator

Assume that we have been given an abelian group G and g_0, . . . , g_{n−1} ∈ G. Further, for a, b ∈ G, a · b represents the group operation applied to a and b. Let us also define

g^x = 1 (the identity) if x = 0, and g^x = g if x = 1.

Consider the distribution over the group G obtained by sampling x_0, . . . , x_{n−1} uniformly at random and considering the product g_0^{x_0} · g_1^{x_1} · · · g_{n−1}^{x_{n−1}}. The following is the main theorem of this section.

Theorem 3.1 Let Γ : {0,1}^t → {0,1}^n be the INW generator with λ = ε/|G|^7. Consider the distributions

D = {g_0^{x_0} · g_1^{x_1} · · · g_{n−1}^{x_{n−1}} : (x_0, . . . , x_{n−1}) ∼ Γ(U_t)}
D′ = {g_0^{x_0} · g_1^{x_1} · · · g_{n−1}^{x_{n−1}} : (x_0, . . . , x_{n−1}) ∼ U_n}

Then D and D′ are ε-close in statistical distance.

As the seed length required for the INW generator is O(log n · log(1/λ)), we get the following corollary.

Corollary 3.2 There exists a polynomial time computable function Γ : {0,1}^t → {0,1}^n with t = O(log n · (log m + log(1/ε))) such that for every abelian group G of size m and g_0, . . . , g_{n−1} ∈ G, the distributions D and D′ defined as

D = {g_0^{x_0} · g_1^{x_1} · · · g_{n−1}^{x_{n−1}} : (x_0, . . . , x_{n−1}) ∼ Γ(U_t)}
D′ = {g_0^{x_0} · g_1^{x_1} · · · g_{n−1}^{x_{n−1}} : (x_0, . . . , x_{n−1}) ∼ U_n}

are ε-close in statistical distance.

In order to prove Theorem 3.1, we first need to go over some basic Fourier analysis.
Definition 3.3 A character χ : G → C* is a group homomorphism, i.e., for x, y ∈ G, χ(x · y) = χ(x)χ(y). Any abelian group G has |G| distinct characters, including the trivial character which maps every element to 1.

Definition 3.4 For a distribution D : G → [0, 1] and a character χ : G → C*, we define D̂(χ) = Σ_{x∈G} χ(x)D(x). Note that this differs from the standard definition of the Fourier coefficient of a function by a normalization factor.

For any element g ∈ G, consider the matrix R_g which is defined as follows:

R_g(x, y) = 1 if x · y^{−1} = g, and 0 otherwise.

First of all, we observe that all the matrices of the form R_g commute with each other. This is because the underlying group is commutative. This implies that they are simultaneously diagonalizable in some basis. In fact, R_g = Γ · ρ(g) · Γ^{−1} where Γ is a unitary matrix and ρ(g) is a diagonal matrix:

R_g = Γ · diag[χ_1(g), . . . , χ_{|G|}(g)] · Γ^{−1}

where χ_1, . . . , χ_{|G|} are the distinct characters.

Now, note that we can phrase the group products problem as a branching program. More precisely, we have |G| states at every level, corresponding to the group elements. From level i to level i + 1, if the input is 0, then ∀g, g ↦ g; if the input is 1, then ∀g, g ↦ g · g_i. Therefore, the walk matrices are M_0^i = Id (Id is the |G| × |G| identity matrix) and M_1^i = R_{g_i}. We now make the following observation.

Observation 3.5 Let x_p be the root node of the pseudo tree and x_t be the root node of the true tree, with the parameters of the pseudo tree the same as in Theorem 3.1, and the walk matrices at the ith step being M_0^i and M_1^i as defined above. Let L(x_p) and L(x_t) be the labels of x_p and x_t respectively. If ‖L(x_p) − L(x_t)‖₂ ≤ ε/√|G|, then Theorem 3.1 follows.

Proof: Note that the distribution obtained in case of the pseudo tree is given by e_1 · L(x_p), where e_1 is the standard unit vector with 1 at the position of the group identity and 0 everywhere else.
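A small worked example may help here. For concreteness (an assumption of ours) take G = Z_m, so that R_g(x, y) = 1 iff x − y ≡ g (mod m) and the characters are χ_a(g) = e^{2πiag/m}. The sketch checks that the R_g's multiply like group elements and commute, and exhibits the Fourier vectors as their shared eigenvectors.

```python
import cmath

m = 5

def R(g):
    """R_g(x, y) = 1 iff x * y^{-1} = g, i.e. x - y = g (mod m)."""
    return [[1 if (x - y) % m == g else 0 for y in range(m)] for x in range(m)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(m)]
            for i in range(m)]

# R_g R_h = R_{g+h}, and the R's commute since the group is abelian.
assert matmul(R(2), R(4)) == R((2 + 4) % m)
assert matmul(R(2), R(4)) == matmul(R(4), R(2))

# Shared eigenvectors: for v_a(y) = chi_a(-y), we have R_g v_a = chi_a(g) v_a.
chi = lambda a, g: cmath.exp(2j * cmath.pi * a * g / m)
a, g = 2, 3
v = [chi(a, -y) for y in range(m)]
Rv = [sum(R(g)[x][y] * v[y] for y in range(m)) for x in range(m)]
assert all(abs(Rv[x] - chi(a, g) * v[x]) < 1e-9 for x in range(m))
```

Stacking the vectors v_a as columns gives the unitary Γ (up to normalization) that diagonalizes every R_g simultaneously, as claimed in the text.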
Similarly, the distribution in case of the true tree is e_1 · L(x_t). Note that the statistical distance between the two distributions is bounded by

‖e_1 · (L(x_p) − L(x_t))‖₁ ≤ √|G| · ‖e_1 · (L(x_p) − L(x_t))‖₂ ≤ √|G| · ‖L(x_p) − L(x_t)‖₂

which proves our claim.

In order to prove that ‖L(x_p) − L(x_t)‖₂ is small, we make the following very important observation.

Observation 3.6 Corresponding to the walk matrix M_b^i, let us assume the diagonalized matrix is ρ_b^i. Since all the matrices are simultaneously diagonalizable, we can assume that the walk matrices at the leaf nodes of the pseudo and the true trees are ρ_b^i instead of M_b^i. The labeling for the non-leaf nodes is generated in the same way as before. More precisely, in the true tree, the label of a non-leaf node is the product of the labels of its two children, while in the pseudo tree, the label of a non-leaf node is the expander product of the labels of its two children (the expander being the underlying expander of the INW generator). Also, since all the matrices at the leaf nodes are diagonal, each of the intermediate products is also diagonal (in both trees); therefore, to bound ‖L(x_p) − L(x_t)‖₂, it suffices to put an upper bound on every diagonal entry of L(x_p) − L(x_t).

From the above observation, it suffices to show that for any i ∈ [|G|], |L(x_p)[i] − L(x_t)[i]| ≤ ε/√|G|. Here, for a matrix A, A[i] represents its ith diagonal entry. Let us fix a particular i ∈ [|G|].

Claim 3.7 Consider any node x in the true tree. Let L(x) be its labeling, and consider the ith diagonal entry of L(x). Then either the entry is 1, or it is at most 1 − 1/|G|² in magnitude. Further, it is 1 if and only if the corresponding diagonal entry is 1 for each of the walk matrices (now diagonalized) at all the leaf nodes.

Proof: We first prove the claim for the leaf nodes. Note that the diagonal entries correspond to the characters. Say the ith diagonal entry corresponds to the character χ_i.
Then the ith diagonal entry of the tth leaf is 1/2(χ_i(e) + χ_i(g_t)) = 1/2(1 + χ_i(g_t)). However, note that because a character is a homomorphism, χ_i(g_t)^{|G|} = 1. Hence, χ_i(g_t) = e^{2πia/|G|} for some 0 ≤ a < |G|. If a = 0, the entry is 1. Otherwise, writing ω = χ_i(g_t),

|(1 + ω)/2| = |cos(πa/|G|)| ≤ cos(π/|G|) ≤ 1 − 1/|G|²

This gives us the result for the leaf nodes. Now, assume the hypothesis to be true for nodes at height h < t_0 and consider a node x_t at height t_0. For non-leaf nodes, we observe that if x_t is a node and y_t and z_t are its children, then the ith diagonal entry of x_t is the product of the corresponding entries in y_t and z_t. By the induction hypothesis, unless the entries of both y_t and z_t are 1 in magnitude, at least one of them is at most 1 − 1/|G|² in magnitude; hence, so is the entry of x_t. Also, if the entries of both y_t and z_t are 1, so is the entry of x_t, and by the induction hypothesis all the leaves in the trees rooted at y_t and z_t have walk matrices whose ith diagonal entry is 1. This proves the claim for x_t as well.

For the rest of the discussion, we fix i and let L′(x) denote L(x)[i] for any node x. The next claim shows that for any node x_t in the true tree, if the ith diagonal entry is at least 1/10 in magnitude, then the corresponding entry for x_p is within λ|G|^4 log(1/|ℓ_i|) of it. All the calculations in this section use that λ|G|^6 < 1/10 (eventually we set λ = ε/|G|^7). More precisely, we have the following claim.

Claim 3.8 Let x_t be a node in the true tree such that its labeling L′(x_t) = ℓ_i satisfies |ℓ_i| ≥ 1/10. Then |L′(x_t) − L′(x_p)| ≤ λ|G|^4 log(1/|ℓ_i|).

Proof: We prove it by induction on the height of the node x_t. Note that it is trivially true for the leaves, as the marginal of the INW generator on any coordinate is uniformly random. Let us assume it is true for nodes at height < h. Let x be a node at height h, with children y and z. We break the analysis into two situations. First, assume that at least one of L′(y_t) or L′(z_t) is 1; without loss of generality, say it is the former.
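The trigonometric bound used above can be checked numerically over every nontrivial root of unity for a range of group sizes:

```python
import cmath

# For omega = e^{2*pi*i*a/|G|} with 0 < a < |G|, the proof of Claim 3.7
# asserts |1 + omega| / 2 = |cos(pi*a/|G|)| <= 1 - 1/|G|^2.
for size in range(2, 50):
    for a in range(1, size):
        omega = cmath.exp(2j * cmath.pi * a / size)
        assert abs(1 + omega) / 2 <= 1 - 1 / size ** 2 + 1e-12
```

The extreme case is a = 1 (or a = |G| − 1), where the left side is cos(π/|G|), so the slack shrinks roughly like π²/(2|G|²) for large |G|.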
Then L′(x_t) = L′(z_t). Also, we note that L′(x_p) = L′(z_p) (we do not incur an error of λ here, because of the observation in Lemma 2.4). By the induction hypothesis, we have |L′(z_p) − L′(z_t)| ≤ λ|G|^4 log(1/|L′(z_t)|). Hence,

|L′(x_p) − L′(x_t)| = |L′(z_p) − L′(z_t)| ≤ λ|G|^4 log(1/|L′(z_t)|) = λ|G|^4 log(1/|ℓ_i|)

Next, we consider the case when both |L′(y_t)| ≤ 1 − 1/|G|² and |L′(z_t)| ≤ 1 − 1/|G|². Let

L′(y_p) = L′(y_t) + ε_y    L′(z_p) = L′(z_t) + ε_z

Let us assume without loss of generality that |L′(y_t)| ≤ |L′(z_t)|. Then,

|L′(x_p) − L′(x_t)| = |ε_y L′(z_t) + ε_z (L′(y_t) + ε_y) + δ|    where |δ| ≤ λ
≤ |ε_z| + |ε_y| |L′(z_t)| + λ
≤ λ|G|^4 log(1/|L′(z_t)|) + (1 − 1/|G|²) λ|G|^4 log(1/|L′(y_t)|) + λ
≤ λ|G|^4 log(1/(|L′(z_t)| |L′(y_t)|)) = λ|G|^4 log(1/|L′(x_t)|)

The last inequality uses the fact that |L′(y_t)| ≤ 1 − 1/|G|², so that log(1/|L′(y_t)|) ≥ 1/|G|².

We next show that for any node x_t in the true tree, if the ith diagonal entry is at most 1/10 in magnitude, then the corresponding entry for x_p is within an error of λ|G|^6 of it.

Claim 3.9 Let x_t be a node in the true tree such that L′(x_t) = ℓ_i satisfies |ℓ_i| < 1/10. Then |L′(x_t) − L′(x_p)| ≤ λ|G|^6.

Proof: Let the children of x be y and z. Let us assume by the induction hypothesis that the claim holds for all nodes below x. We consider the following four cases:

• At least one of L′(y_t) = 1 or L′(z_t) = 1. Assume L′(y_t) = 1.
• Both |L′(y_t)| ≥ 1/10 and |L′(z_t)| ≥ 1/10.
• Both |L′(y_t)| < 1/10 and |L′(z_t)| < 1/10.
• Exactly one of |L′(y_t)| and |L′(z_t)| is less than 1/10.

In the first case, note that by the induction hypothesis, the claim holds for y and z. Now, as one of the entries is 1, by Lemma 2.4 we see that

L′(x_p) = L′(y_p) ·_H L′(z_p) = L′(y_p) · L′(z_p) = L′(z_p)

As |L′(z_p) − L′(z_t)| ≤ λ|G|^6, we get |L′(x_p) − L′(x_t)| ≤ λ|G|^6. In the second case, by a basic "union bound", we get that the error is at most 2 log 10 · λ|G|^4 + λ ≤ λ|G|^6. For the next two cases, let us write L′(z_p) = L′(z_t) + ε_z and L′(y_p) = L′(y_t) + ε_y.
For the third case, |ε_y|, |ε_z| ≤ λ|G|^6 and |L′(y_t)|, |L′(z_t)| < 1/10. Hence,

|L′(x_p) − L′(x_t)| = |ε_y L′(z_t) + ε_z (L′(y_t) + ε_y) + δ|    where |δ| ≤ λ
≤ |ε_y| |L′(z_t)| + |ε_z| |L′(y_t)| + |ε_z| |ε_y| + λ
≤ λ|G|^6/10 + λ|G|^6/10 + λ²|G|^{12} + λ ≤ λ|G|^6

For the last case, assume that |L′(y_t)| < 1/10 and |L′(z_t)| ≥ 1/10. Hence, for this case, we have |ε_y| ≤ λ|G|^6 and |ε_z| ≤ (log 10) · λ|G|^4. Again, we have

|L′(x_p) − L′(x_t)| ≤ |ε_y| |L′(z_t)| + |ε_z| |L′(y_t)| + |ε_z| |ε_y| + λ
≤ (1 − 1/|G|²) λ|G|^6 + (λ|G|^4 log 10)/10 + λ²|G|^{10} + λ ≤ λ|G|^6

Combining Claims 3.8 and 3.9, we get the following lemma, which combined with Observation 3.6 implies Theorem 3.1.

Lemma 3.10 Let x_t be a node in the true tree and x_p be the corresponding node in the pseudo tree. Then |L′(x_t) − L′(x_p)| ≤ λ|G|^6. This implies that ‖L(x_t) − L(x_p)‖₂ ≤ λ|G|^6 ≤ ε/√|G| for λ = ε/|G|^7.

4 Small biased spaces for group products

Next, we introduce the problem of small biased spaces for group products. Let G be an arbitrary group and let x_1, . . . , x_n ∈ {0,1}. Again, for a, b ∈ G, we let a · b denote the group operation applied to a and b. We also remind ourselves that for g ∈ G and x ∈ {0,1}, g^x = 1 (the identity) if x = 0, and g^x = g if x = 1.

Consider the distribution D = g_1^{x_1} · . . . · g_n^{x_n} where g_1, . . . , g_n ∈ G are chosen independently and uniformly at random. We seek to come up with an efficiently computable function Γ : {0,1}^t → G^n such that when (g_1, . . . , g_n) is sampled from Γ(U_t), the distribution D′ = g_1^{x_1} · . . . · g_n^{x_n} is ε-close to D in statistical distance. The aim is to keep t as small as possible, and a probabilistic argument shows that it is possible to get t = O(log |G| + log n + log(1/ε)). The task of getting an explicit function Γ was first considered by Meka and Zuckerman [MZ09]. They obtained the following result.

Theorem 4.1 There exists some fixed constant c = c(G) < 1/|G| and Γ : {0,1}^t → G^n with t = O(log n) such that for any h ∈ G,

| Pr_{(g_1,...,g_n)∼Γ(U_t)}[g_1^{x_1} · . . .
· gnxn = h] − (g1 ,...,gn )∼Γ(Ut ) P (g1 ,...,gn )∼Gn [g1x1 · . . . · gnxn = h]| ≤ c Also one of their theorems coupled with the pseudorandom generator from [KNP10], gives the following result. For every > 0, there exists Γ : {0, 1}t → Gn with t = O(log n · (log(1/) + |G|Θ(1) )) such that if X is a distribution defined as the output of Γ(Ut ), then | P [g1x1 · . . . · gnxn = h] − (g1 ,...,gn )∼Γ(Ut ) P (g1 ,...,gn )∼Gn [g1x1 · . . . · gnxn = h]| ≤ We improve on their result in terms of dependency on the seed length on the size of the group. Also, as we shall see, to get this improvement, we do not require the whole machinery of [KNP10] and our proof is rather short and simple. In particular, we prove the following theorem. Theorem 4.2 Let Γ : {0, 1}t → Gn denote the INW generator with λ = /|G|. If X denotes the output distribution of Γ(Ut ), | P (g1 ,...,gn )∼X [g1x1 · . . . · gnxn = h] − P (g1 ,...,gn )∼Gn [g1x1 · . . . · gnxn = h]| ≤ As a corollary, using the INW generator describe in Remark 2.2, we get that there is a polynomial time computable function Γ : {0, 1}t → Gn with t = O(log n · (log |G| + log(1/)) such that if X denotes the output distribution of Γ(Ut ), | P (g1 ,...,gn )∼X [g1x1 · . . . · gnxn = h] − P (g1 ,...,gn )∼Gn [g1x1 · . . . · gnxn = h]| ≤ Before we discuss the proof of the above theorem, we will need to review some basic representation theory. While it is possible to talk about the entire proof without using the language of representations, we believe its the right way to look at the proof and might be useful in getting improvements in the future. Also, it will be helpful for us in Section 5. An excellent source for reviewing the required material are lecture notes by Telerman [Tel05]. Below we review some basic representation theory which will be helpful. For the reader familiar with representation theory, we remark that our definitions are sometimes restrictive because the most general definition is not always helpful for us. 
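Before turning to representations, here is a minimal computational sanity check (not part of any proof) of the target distribution D defined above: for any fixed x ≠ 0^n, under truly uniform g_1, ..., g_n the product g_1^{x_1} · ... · g_n^{x_n} is exactly uniform over G. The group S_3 and the parameters below are illustrative choices, not from the paper.

```python
from itertools import product

# The symmetric group S_3, with elements as permutation tuples and
# composition as the group operation.  The check works for any finite G.
def compose(p, q):          # (p*q)(i) = p(q(i))
    return tuple(p[q[i]] for i in range(len(q)))

G = [(0, 1, 2), (0, 2, 1), (1, 0, 2), (1, 2, 0), (2, 0, 1), (2, 1, 0)]
identity = (0, 1, 2)

def group_power(g, x):      # g^x for x in {0,1}, as defined in the text
    return g if x == 1 else identity

# For a fixed x != 0^n (here n = 2, x = (1, 0)), tally the product
# g_1^{x_1} * g_2^{x_2} over all |G|^2 choices of (g_1, g_2).
x = (1, 0)
counts = {h: 0 for h in G}
for gs in product(G, repeat=2):
    prod = identity
    for g, xi in zip(gs, x):
        prod = compose(prod, group_power(g, xi))
    counts[prod] += 1

# |G|^2 = 36 tuples, and each of the 6 outcomes occurs 36/6 = 6 times:
# the product distribution is exactly uniform over G.
assert all(c == 6 for c in counts.values())
```

This is exactly why a small-bias generator Γ is the interesting object: the uniform distribution over seeds trivially achieves D, and the question is how few seed bits suffice to approximate it.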
Definition 4.3 Let V be any vector space over C. Then GL(V) is the group of all invertible linear transformations from V to V, with the group operation being function composition.

Definition 4.4 For a group G and a vector space V, a map ρ : G → GL(V) is said to be a representation of G if ρ is a group homomorphism. In this paper, we only consider vector spaces V over C.

Definition 4.5 A group representation ρ : G → GL(V) is said to be irreducible if it does not have a non-trivial invariant subspace. In other words, if W ⊊ V satisfies ρ(g)W ⊆ W for all g ∈ G, then W = {0}.

Irreducible representations are fundamental building blocks in the sense that any representation of the group G can be written as a direct sum of irreducible representations of G. Moreover, any finite group G has only finitely many irreducible representations (up to isomorphism). We say two representations ρ_1 and ρ_2 are isomorphic if there exists an invertible matrix τ such that for every g, τ · ρ_1(g) · τ^{−1} = ρ_2(g).

Theorem 4.6 (Maschke) Let V be any finite dimensional vector space over C, G be a finite group and ρ : G → GL(V) be a representation. Then ρ can be written as a direct sum of irreducible representations of the group G.

We list the following important properties of irreducible representations (some of which will be helpful later).

Lemma 4.7 Let S = {ρ_1, ρ_2, ...} be the set of irreducible representations of a finite group G. Let d_i be the dimension of ρ_i, i.e., ρ_i : G → C^{d_i × d_i}. Then Σ_{i∈S} d_i^2 = |G|.

We next state Schur's lemma.

Lemma 4.8 Let ρ and τ be two non-isomorphic irreducible representations of a group G. Then,

• ⟨τ_{i,j}, ρ_{k,ℓ}⟩ = E_g[τ_{i,j}(g) · conj(ρ_{k,ℓ}(g))] = 0
• ⟨τ_{i,j}, τ_{k,ℓ}⟩ = δ_{i,k} δ_{j,ℓ} / d_τ, where d_τ is the dimension of τ.

We remark that every group has a trivial irreducible representation ρ_t : G → GL(C) given by ρ_t(x) = 1 for all x ∈ G. We now state the following simple corollary of Schur's lemma and the earlier observation.
Corollary 4.9 Let ρ : G → GL(V) be a non-trivial irreducible representation of G. Then

Σ_{x∈G} ρ(x) = 0

Proof: By Schur's lemma, if τ is the trivial representation, then for any k, ℓ, ⟨τ_{1,1}, ρ_{k,ℓ}⟩ = 0, which implies the claim.

We now return to the problem of constructing small bias spaces over group products. We use the INW generator described in Remark 2.2, and now state the analogue of Lemma 2.4 in this setting. To do this, let Γ_1, Γ_2 : {0,1}^r → G^m and ρ : G^m → C^{w×w}, and define A and B as follows:

A = (1/2^r) Σ_{x∈{0,1}^r} ρ(Γ_1(x))    B = (1/2^r) Σ_{x∈{0,1}^r} ρ(Γ_2(x))    (2)

Lemma 4.10 Let ρ : G^m → C^{w×w} be such that ||ρ(x)||_2 ≤ 1 for all x ∈ G^m. Let Γ_1, Γ_2 : {0,1}^r → G^m, and let H be a 2^d-regular graph on {0,1}^r with second eigenvalue λ. Then, for A and B as defined in (2),

||A · B − A ·_H B||_2 ≤ λ

We also have the following observation.

• If A and B are identity on some subspace W, then A · B as well as A ·_H B are identity on W.

We recall that a matrix A is said to be identity on a subspace W if and only if for all x ∈ W, x · A = A · x = x.

We now formulate the problem of constructing small biased spaces for group products in terms of getting pseudorandomness for a certain permutation branching program (which we fool using the INW generator). The branching program has n + 1 layers. Each layer consists of |G| vertices, each vertex labeled by an element of G. The branching program starts at the identity element of G in the 0th layer. Now, if x_i = 1, the branching program moves from x in the (i−1)th layer to x · g_i in the ith layer. On the other hand, if x_i = 0, the branching program moves from x in the (i−1)th layer to x in the ith layer. Thus, we have |G| walk matrices for the transition from the (i−1)th layer to the ith layer; we call them M_h^i for h ∈ G. Further, if x_i = 0, they are all the identity matrix.
In case x_i = 1, M_h^i is defined by

M_h^i(x, y) = 1 if x^{−1} y = h, and 0 otherwise

We observe that if we take a random walk starting at the vertex corresponding to the identity element of the group G in the zeroth layer, choosing a uniformly random walk matrix among the M_h^i to go from layer i − 1 to layer i, then after j steps the distribution on the jth layer is exactly the distribution of g_1^{x_1} · ... · g_j^{x_j} where the g_i's are chosen uniformly at random. Hence, to prove Theorem 4.2, it suffices to analyze the error for this branching program when the g_i's are chosen from the output of the INW generator.

We next observe that the walk matrices can be block diagonalized such that the blocks have some nice properties.

Observation 4.11 If x_i = 1, the map h ↦ M_h^i is a group representation. This implies that there is a basis transformation in which all the walk matrices can be simultaneously block diagonalized. Note that this is because if x_i = 0, then all the walk matrices are the identity, and hence after any change of basis they remain the identity in every block. If x_i = 1, then each of the blocks of the walk matrices M_h^i corresponds to some irreducible representation. The corresponding block of M_h^i when x_i = 0 is always the identity. In particular, the following is true.

• If the block corresponds to the trivial representation, then that block is the 1 × 1 identity matrix in all the walk matrices.
• If the block corresponds to a non-trivial representation, then the following is true.
  – If x_i = 0, then all the walk matrices are the identity in that block.
  – If x_i = 1, then the sum of the walk matrices in that block is identically zero.

Also, all the blocks are unitary matrices because they correspond to irreducible representations.
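Observation 4.11 can be checked by hand on the smallest interesting example. The following sketch (an illustration, not the paper's construction; the choice G = Z_3 is an assumption for concreteness) builds the walk matrices M_h for the cyclic group Z_3, conjugates them by the unitary DFT matrix, and verifies that they become simultaneously diagonal, that every diagonal block has unit modulus, and that each non-trivial block sums to zero over h ∈ G, as in Corollary 4.9.

```python
import cmath

# Walk matrices for G = Z_3: M_h(x, y) = 1 iff x^{-1} y = h, i.e.
# y = x + h (mod 3).  These are the circulant shift matrices.
n = 3
def walk_matrix(h):
    return [[1.0 if (y - x) % n == h else 0.0 for y in range(n)] for x in range(n)]

# The irreducible representations of Z_3 are the characters
# rho_k(h) = omega^{-hk}, omega = e^{2*pi*i/3}.  The unitary DFT matrix F
# simultaneously diagonalizes every M_h.
omega = cmath.exp(2j * cmath.pi / n)
F = [[omega ** (j * k) / cmath.sqrt(n) for k in range(n)] for j in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def conj_T(A):
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]

for k in range(n):
    total = 0
    for h in range(n):
        D = matmul(matmul(F, walk_matrix(h)), conj_T(F))
        off = max(abs(D[i][j]) for i in range(n) for j in range(n) if i != j)
        assert off < 1e-9                      # simultaneously diagonalized
        assert abs(abs(D[k][k]) - 1) < 1e-9    # unitary (unit-modulus) blocks
        total += D[k][k]
    # trivial block (k = 0) is always 1; non-trivial blocks sum to 0 over h
    expected = n if k == 0 else 0
    assert abs(total - expected) < 1e-9
```

For abelian groups all blocks are 1 × 1, which is exactly the "luxury of diagonalizing" referred to later in Section 5; for non-abelian G the same computation produces higher-dimensional unitary blocks.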
The next observation says that since all the walk matrices can be simultaneously block diagonalized, we might as well treat the blocks individually and analyze the error incurred by using the INW generator vis-à-vis the uniform distribution in each of the blocks.

Observation 4.12 Since all the walk matrices are simultaneously block diagonalizable, we can assume that the leaves of the pseudo tree for the INW generator as well as the true tree are marked by these block diagonalized matrices as opposed to the original walk matrices.

Also, consider a particular block corresponding to the representation ρ. Let us instead label the leaf nodes (in both trees) by the identity matrix of dimension d_ρ if x_i = 0, and by ρ(h) for h ∈ G if x_i = 1. The labels of the non-leaf nodes in the true and the pseudo tree are generated by taking the true and the expander products of the children respectively. For a node x, we denote this labeling by L′(x). If we prove that for any node x_p in the pseudo tree, the corresponding node x_t in the true tree and any representation ρ, the labelings satisfy ||L′(x_p) − L′(x_t)||_2 ≤ ε/√|G|, then it follows that the INW generator fools the branching program with error ε.

So, we now treat each block individually. Let us fix a representation ρ and analyze the difference between the labelings in the true tree and the pseudo tree. If the representation corresponding to the block is trivial, then the following claim says that there is no error between the pseudo tree and the true tree.

Claim 4.13 Let x_p and x_t be corresponding nodes in the pseudo tree and the true tree respectively. Let L′(x_p) and L′(x_t) be the labelings of x_p and x_t with respect to the trivial representation. Then, L′(x_p) = L′(x_t) = 1.

Proof: Note that for the trivial representation, all the leaf nodes in both the true and the pseudo tree are labelled by the 1 × 1 identity matrix.
Thus the labeling of the leaf nodes in both the pseudo and the true tree is identical, namely 1. We now use induction on the height of a node. Let x_p be a node at height t in the pseudo tree and x_t be the corresponding node in the true tree. By the induction hypothesis, the labelings of the children of x_t, namely y_t and z_t, are 1; similarly, the labelings of y_p and z_p are also 1. Now, L′(x_t) = L′(y_t) L′(z_t) = 1. Further, L′(x_p) = L′(y_p) ·_H L′(z_p) = 1 by the observation following Lemma 4.10, as the labeling is 1 on all the leaf nodes under x_p. This proves the claim.

Next, we consider the case when the representation ρ is non-trivial.

Claim 4.14 Let x_p and x_t be corresponding nodes in the pseudo tree and the true tree respectively. Let L′(x_p) and L′(x_t) be the labelings of x_p and x_t corresponding to the representation ρ. Then, ||L′(x_p) − L′(x_t)||_2 ≤ 2λ.

Proof: We prove this by induction. The claim clearly holds for the leaf nodes. We first observe that for any node x_t in the true tree, L′(x_t) is either the identity matrix or L′(x_t) = 0. This is because if x_i = 0, then the ith leaf node is labeled by the identity matrix; on the other hand, if x_i = 1, then the leaf node is labeled by 0 (as Σ_{h∈G} ρ(h) = 0). Clearly, for a node x_t, L′(x_t) = Id if and only if all the leaf nodes below it are labeled by the identity; else it is labeled 0.

Now, consider any node x_t in the true tree and its corresponding node x_p. Let the children of x_t be y_t and z_t, and suppose one of them, say y_t, satisfies L′(y_t) = Id. Then, by Lemma 2.4, L′(x_t) = L′(z_t) and L′(x_p) = L′(z_p). However, by induction on the height of the tree, ||L′(z_p) − L′(z_t)||_2 ≤ 2λ, which implies ||L′(x_p) − L′(x_t)||_2 ≤ 2λ.

So, we may assume that both L′(y_t) and L′(z_t) are 0. In that case, by the induction hypothesis, ||L′(y_p)||_2 ≤ 2λ and similarly ||L′(z_p)||_2 ≤ 2λ. By Lemma 4.10, ||L′(x_p) − L′(y_p) · L′(z_p)||_2 ≤ λ. This implies that ||L′(x_p)||_2 ≤ λ + 4λ^2 ≤ 2λ (provided λ < 1/10).
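The inductive step in the proof of Claim 4.14 is, at its core, the scalar recursion e ↦ e² + λ: if both children of a node carry error at most e in the pseudo tree, the expander product gives error at most e² + λ at the parent. The following small sketch (an illustration only; the function name and parameters are ours) iterates this worst-case recursion up a tree and confirms that, starting from exact leaves, the error never exceeds 2λ at any height once λ is small (λ ≤ 1/4 already suffices for the recursion itself, matching the λ < 1/10 proviso above).

```python
# Worst-case error recursion behind Claim 4.14: leaves of the pseudo tree
# are exact (e = 0); one level of expander products maps e -> e*e + lam.
def worst_error(lam, height):
    e = 0.0
    for _ in range(height):
        e = e * e + lam
    return e

# The error stays bounded by 2*lam regardless of the tree height.
for lam in [1e-1, 1e-3, 1e-6]:
    for height in [1, 10, 60]:
        assert worst_error(lam, height) <= 2 * lam
```

The fixed point of e = e² + λ is (1 − √(1 − 4λ))/2 = λ + O(λ²), which is why the final bound loses only a constant factor over λ rather than a factor depending on the tree depth.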
The above two claims together with Observation 4.12 imply that it suffices to take λ = ε/(2√|G|) to get an overall error of ε, and hence we get Theorem 4.2.

5 Pseudorandomness for permutation branching programs

In this section, we discuss the most important result of this paper. Namely, we get a PRG for read-once permutation branching programs of constant width with seed length O(log n · log(1/ε)). More precisely, we have the following theorem.

Theorem 5.1 Let Γ : {0,1}^t → {0,1}^n be the INW generator with λ = ε · 2^{−w^8}. Then the output of Γ(U_t) is ε-indistinguishable from U_n for read-once width w permutation branching programs of length n.

Using the standard INW generator, we get the following corollary.

Corollary 5.2 There is a polynomial time computable function Γ : {0,1}^t → {0,1}^n with t = O(log n · (w^8 + log(1/ε))) such that Γ(U_t) is ε-indistinguishable from U_n for read-once permutation branching programs of width w.

We now describe the overall strategy for proving Theorem 5.1. As described in Section 2, we label the leaf nodes in both the pseudo tree as well as the true tree with the walk matrices corresponding to the branching program. In particular, the label of the ith leaf node will simply be the average of the walk matrix for the transition corresponding to 0 and that corresponding to 1 (recall that we call them M_0^i and M_1^i). Subsequently, the label of a non-leaf node is the product of the labels of its children in the true tree, and the expander product in the case of the pseudo tree.

Much like the case of abelian groups, for a node x_t in the true tree and the corresponding node x_p in the pseudo tree, we would like to say that ||L(x_p) − L(x_t)|| is a function of λ and ||L(x_t)|| alone. However, unlike the abelian case, we no longer have the luxury of diagonalizing and treating each coordinate individually. We instead adopt the following strategy.
To discuss the intuition further, it will be helpful to introduce the following concepts.

Definition 5.3 For a matrix M ∈ C^{w×w}, we say W is the fixed point subspace of M if W is the maximal subspace such that for all x ∈ W, x · M = M · x = x. In other words, W is the maximal subspace on which M is the identity. We note that for a given matrix M, the fixed point subspace is uniquely defined. Further, for the matrix M, we define its non-trivial subspace to be W^⊥.

Definition 5.4 For a matrix A ∈ C^{m×m} and its non-trivial subspace W ⊆ C^m, we define

||A||_W = max_{v ∈ W, v ≠ 0} ||A · v|| / ||v||

Coming back to the structure of the proof, we show that for any node x_t in the true tree and the corresponding node x_p, ||L(x_p) − L(x_t)|| can be bounded as a function of the dimension of the non-trivial subspace of L(x_t) (call it W) and ||L(x_t)||_W. In particular, we will allow the error to grow as the dimension of W increases or as ||L(x_t)||_W decreases. In fact, the dependence of the error on the dimension of the non-trivial subspace W shall dominate the dependence of the error on the norm of the label on its non-trivial subspace.

The proof shall proceed by induction on the height of the nodes. To convey the intuition, suppose we claim that for any pair of corresponding nodes x_t and x_p,

||L(x_p) − L(x_t)|| ≤ f(α) g(β) λ

where, if W is the non-trivial subspace of L(x_t), then α = ||L(x_t)||_W and β = dim(W). Further, let our choice be such that

lim_{λ→0} λ · f(α) · g(β) = 0

This ensures that we can indeed choose λ such that if we just want constant error, then it suffices to choose some constant λ depending only on w. Now, assume that this holds by the induction hypothesis up to some height, and consider the inductive step. Let x_t be a node in the true tree and y_t and z_t be its children. Let x_p, y_p and z_p be the corresponding nodes in the pseudo tree. There are exactly three situations which can arise:

• The non-trivial subspaces of y_t and z_t (and hence of x_t) are the same.
In this case, the allowed dependence of the error on β plays no role. The only relevant factor is α, and the analysis is similar to the analysis in the case of abelian groups.

• The non-trivial subspaces of y_t and z_t are such that neither is contained in the other. In this case, the non-trivial subspace of x_t has strictly bigger dimension than those of y_t and z_t. It is here that the dependence of the error on β plays a role. In fact, because we allow the dependence on β to supersede any dependence on α, we can bound the error easily. The only thing we need to show is that the norm of L(x_t) on its non-trivial subspace is not very close to 1, which we manage to show easily.

• The non-trivial subspace of y_t is properly contained in that of z_t (or vice versa). In this situation, it is possible that the labeling of x_t has the same norm on its non-trivial subspace as z_t, and yet ||L(x_p) − L(x_t)||_W > ||L(z_p) − L(z_t)||_W, where W is the (common) non-trivial subspace of z_t and x_t. What is more concerning is that one can have a series of nodes in the true tree, call them x_0, ..., x_m and y_1, ..., y_m, such that x_i has the two children x_{i−1} and y_i, and for all i the non-trivial subspace of L(y_i) is properly contained in the non-trivial subspace of L(x_0). The way around this is to do a global analysis of the error incurred by the chain as a whole, rather than trying to do it on a per-node basis. Here we use an induction-based proof of the "Key Convergence Lemma" of [KNP10].

We will now state the claims we prove and show how the main Theorem 5.1 follows from them. Below, when we say that an operator acts trivially on a subspace X, we simply mean that it is the identity on X. The first claim we prove is the following.

Claim 5.5 Let α″ ≥ 1/10. Also, let x_t be a node in the true tree and x_p be the corresponding node in the pseudo tree. Also, assume that y_t and z_t are the children of x_t and y_p and z_p are the children of x_p.
Further, let the non-trivial subspace of x_t, y_t and z_t be W, with dimension β. Let ||L(y_t)||_W = α, ||L(z_t)||_W = α′ and ||L(x_t)||_W = α″. Then, provided

||L(y_p) − L(y_t)||_W ≤ λ · w^{w^5} w^{w^5 β} log(1/α) + λ w^{w^5 β} 2^β w^{6w}

and

||L(z_p) − L(z_t)||_W ≤ λ · w^{w^5} w^{w^5 β} log(1/α′) + λ w^{w^5 β} 2^β w^{6w},

we have ||L(x_p) − L(x_t)||_W ≤ λ · w^{w^5} w^{w^5 β} log(1/α″). If L(z_p) and L(y_p) act trivially on W^⊥, so does L(x_p).

Claim 5.6 Let x_t be a node in the true tree and x_p be the corresponding node in the pseudo tree. Also, assume that y_t and z_t are the children of x_t and y_p and z_p are the children of x_p. Further, let the non-trivial subspace of x_t, y_t and z_t be W, with dim(W) = β. Let ||L(x_t)||_W < 1/10. Assume that for j ∈ {y, z}, the following holds:

||L(j_p) − L(j_t)||_W ≤ λ · w^{w^5} w^{w^5 β} log(1/||L(j_t)||_W) + λ w^{w^5 β} 2^β w^{6w} if ||L(j_t)||_W ≥ 1/10, and
||L(j_p) − L(j_t)||_W ≤ λ · w^{w^5} w^{w^5 β} w^{6w} + λ w^{w^5 β} 2^β w^{6w} otherwise.    (3)

Then, ||L(x_p) − L(x_t)||_W ≤ λ · w^{w^5} w^{w^5 β} w^{6w}. If L(z_p) and L(y_p) act trivially on W^⊥, so does L(x_p).

Claim 5.7 Let x_t be a node in the true tree and x_p be the corresponding node in the pseudo tree. Also, assume that y_t and z_t are the children of x_t and y_p and z_p are the children of x_p. Further, let W_x, W_y and W_z be the non-trivial subspaces of L(x_t), L(y_t) and L(z_t), with dim(W_y) = β_1, dim(W_z) = β_2 and dim(W_x) = β. Also, W_y ≠ W_y ∩ W_z ≠ W_z. Then,

• β > β_1 and β > β_2.

Also, if ||L(y_p) − L(y_t)||_{W_y} ≤ λ · w^{w^5} w^{w^5 β_1} w^{6w} + λ w^{w^5 β_1} 2^{β_1} w^{6w} and ||L(z_p) − L(z_t)||_{W_z} ≤ λ · w^{w^5} w^{w^5 β_2} w^{6w} + λ w^{w^5 β_2} 2^{β_2} w^{6w}, then

||L(x_p) − L(x_t)||_{W_x} ≤ λ · w^{w^5} w^{w^5 β} log(1/||L(x_t)||_{W_x}) if ||L(x_t)||_{W_x} ≥ 1/10, and
||L(x_p) − L(x_t)||_{W_x} ≤ λ · w^{w^5} w^{w^5 β} w^{6w} otherwise.    (4)

If L(y_p) acts trivially on W_y^⊥ and L(z_p) acts trivially on W_z^⊥, then L(x_p) acts trivially on W_x^⊥.

Claim 5.8 Let x_0, x_1, ..., x_m, y_1, y_2, ..., y_m be nodes in the true tree. Let x′_0, x′_1, ..., x′_m, y′_1, y′_2, ..., y′_m be the corresponding nodes in the pseudo tree. Further, for i ≥ 1, x_i is the parent of x_{i−1} and y_i.
Also, the non-trivial subspace of L(y_i) (for 0 < i ≤ m) is strictly contained in the non-trivial subspace of L(x_0). Suppose that all the L(x′_i)'s and L(y′_i)'s have the same non-trivial subspaces as their corresponding counterparts in the true tree, and that for every z ∈ {x_0, y_1, ..., y_m}, with W_z the non-trivial subspace of L(z) and β_z = dim(W_z),

||L(z) − L(z′)||_{W_z} ≤ λ · w^{w^5} w^{w^5 β_z} log(1/||L(z)||_{W_z}) if ||L(z)||_{W_z} ≥ 1/10, and
||L(z) − L(z′)||_{W_z} ≤ λ · w^{w^5} w^{w^5 β_z} w^{6w} otherwise.

Then the non-trivial subspaces of L(x_m) and L(x′_m) are the same as that of L(x_0). Also, if V is the said subspace and β = dim(V), then

||L(x_m) − L(x′_m)||_V ≤ ||L(x_0) − L(x′_0)||_V + λ w^{w^5 β} 2^β w^{6w}

We first see how the above four claims can be used to prove the main Theorem 5.1.

Proof: [of Theorem 5.1] First, we claim that for a node x_t, if L(x_t) acts trivially on a subspace (i.e., it is the identity on it), then L(x_p) is also the identity on the same subspace. This holds trivially for all the leaf nodes, and we prove it by induction. Assume it holds for all nodes up to height t. Let x_t be a node at height t + 1 and y_t and z_t be its children. Now, by induction, the fixed point subspaces of L(y_t) and L(z_t) are the same as those of L(y_p) and L(z_p) respectively.

• If the fixed point subspaces of L(y_t) and L(z_t) are the same, then by Claim 5.5 and Claim 5.6, the fixed point subspace of L(x_p) is the same as that of L(x_t).
• If the fixed point subspace of L(y_t) is not contained in that of L(z_t), and vice versa, then Claim 5.7 says that the fixed point subspace of L(x_p) is the same as that of L(x_t).
• If the fixed point subspace of L(y_t) is contained in that of L(z_t), or vice versa, then Claim 5.8 says that the fixed point subspace of L(x_p) is the same as that of L(x_t).

The above means that in order to bound ||L(x_p) − L(x_t)||, it suffices to bound ||L(x_p) − L(x_t)||_{W_x}, where W_x is the non-trivial subspace of L(x_t).
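To make Definitions 5.3 and 5.4 concrete, here is a small numeric sketch (numpy-free by choice; the permutation and width are illustrative assumptions, not from the paper). For A = (I + P)/2 with P the cyclic shift on w = 3 points, the fixed point subspace is spanned by the all-ones vector, and on its complement W (the sum-zero vectors) power iteration recovers ||A||_W, which drops strictly below 1 in line with the spectral gap that Lemma 5.11 below establishes.

```python
# A = (I + P)/2 for P the cyclic shift on w = 3 points.
w = 3
P = [[1.0 if (j - i) % w == 1 else 0.0 for j in range(w)] for i in range(w)]
A = [[(float(i == j) + P[i][j]) / 2 for j in range(w)] for i in range(w)]

def apply(v, M):                       # row-vector action v -> v * M
    return [sum(v[i] * M[i][j] for i in range(w)) for j in range(w)]

def project_W(v):                      # remove the all-ones component
    m = sum(v) / w
    return [x - m for x in v]

def norm(v):
    return sum(x * x for x in v) ** 0.5

# Power iteration on W estimates ||A||_W; for this A the restriction to W
# has both eigenvalues of modulus |1 + omega|/2 = 1/2 (omega a primitive
# cube root of unity), so the contraction factor is exactly 1/2.
v = project_W([1.0, 0.0, 0.0])
for _ in range(200):
    u = project_W(apply(v, A))
    growth = norm(u) / norm(v)
    v = [x / norm(u) for x in u]

assert growth < 1 - w ** (-3 * w)      # consistent with Lemma 5.11's bound
assert abs(growth - 0.5) < 1e-6
```

The (very crude) bound 1 − w^{−3w} of Lemma 5.11 is far from tight here, but the point of the lemma is only that the contraction is bounded away from 1 by a quantity depending on w alone.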
We claim that for the root nodes of the pseudo tree and the true tree (call them a_p and a_t), ||L(a_p) − L(a_t)|| ≤ 2λ · w^{w^5} w^{w^6} w^{6w}. This proves the main theorem (allowing w to be a sufficiently large constant, which we can assume without loss of generality). In order to prove this, we maintain inductively that for any node x_p in the pseudo tree and the corresponding node x_t in the true tree,

||L(x_p) − L(x_t)||_{W_x} ≤ λ · w^{w^5} w^{w^5 β} log(1/||L(x_t)||_{W_x}) if ||L(x_t)||_{W_x} ≥ 1/10, and
||L(x_p) − L(x_t)||_{W_x} ≤ λ · w^{w^5} w^{w^5 β} w^{6w} otherwise,    (5)

where W_x is the non-trivial subspace of L(x_t) and β = dim(W_x). Also, the non-trivial subspaces of L(x_t) and L(x_p) are exactly the same. This holds trivially at the leaves.

Now consider any node a_t in the true tree and the corresponding node a_p in the pseudo tree, with respective children y_t, z_t and y_p, z_p. Assume (5) holds with x replaced by y and by z (the induction hypothesis). Suppose first that one of two things is true: either W_{y_t} and W_{z_t} are the same, in which case W_{a_t} = W_{y_t} = W_{z_t}; or else W_{y_t} and W_{z_t} are both properly contained in W_{a_t}. In the former case, we can apply either of Claim 5.5 or Claim 5.6 and see that (5) holds with x replaced by a. In the latter case, Claim 5.7 applies and we again see that (5) holds with x replaced by a.

The only remaining case is when W_{z_t} ⊊ W_{y_t} (or vice versa). In this case, W_{a_t} = W_{y_t}. To handle this case, consider the nearest ancestor of a_t (call it b_t, and let the parent of b_t be c_t and its sibling be d_t) such that one of the following holds:

• W_{b_t} = W_{a_t} ⊊ W_{c_t}
• W_{b_t} = W_{d_t}

We remark that if no such b_t exists, then we let b_t be the root node. Now, by Claim 5.8, we can say that

||L(b_t) − L(b_p)||_{W_{y_t}} ≤ ||L(y_t) − L(y_p)||_{W_{y_t}} + λ w^{w^5 β} 2^β w^{6w}

where β is the dimension of W_{y_t}. In case b_t is the root node, we are done because ||L(y_t) − L(y_p)||_{W_{y_t}} ≤ λ w^{w^5} w^{w^5 β} w^{6w} + λ w^{w^5 β} 2^β w^{6w}, which is clearly bounded by 2λ w^{w^5} w^{w^6} w^{6w}.
If b_t is not the root node, then we can apply one of Claim 5.5, Claim 5.6 or Claim 5.7 to get that

||L(c_t) − L(c_p)||_{W_{c_t}} ≤ λ w^{w^5} w^{w^5 dim(W_{c_t})} w^{6w}

and from there we can proceed inductively.

5.1 Basic spectral properties of the labelings

We start by stating the following lemma from [KNP10] (Lemma 30 in their paper).

Lemma 5.9 Let P_1, ..., P_m ∈ R^{w×w} be m permutation matrices, and let G be the group generated by these matrices. Consider the matrix

A = (I + P_1)/2 · (I + P_2)/2 · ... · (I + P_m)/2

Note that A = Σ_g λ_g P_g where λ_g ≥ 0, Σ_g λ_g = 1 and P_g ∈ G. Then, there is a set K ⊆ G and a ∈ G such that a^{−1}K generates G and for every k ∈ K, λ_k ≥ 1/(2|G|). Also, λ_a ≥ 1/(2|G|).

Claim 5.10 Let ρ : G → C^{m×m} be a non-trivial irreducible representation of G. Let K ⊆ G and a ∈ G be such that a^{−1}K generates G. Then, for any v ∈ C^m with ||v||_2 = 1, there exists k ∈ K such that ⟨ρ(a)v, ρ(k)v⟩ ≤ 1 − 1/|G|^3.

Proof: Assume for contradiction that for all k ∈ K, ⟨ρ(a)v, ρ(k)v⟩ = ⟨v, ρ(a^{−1}k)v⟩ > 1 − 1/|G|^3. Because a^{−1}K generates G and any element g ∈ G can be written as a product of at most |G| elements of a^{−1}K, we get that for every g ∈ G, ⟨v, ρ(g)v⟩ > 1 − 1/|G|^2. However, by Corollary 4.9, Σ_g ρ(g) = 0, which implies that Σ_g ⟨v, ρ(g)v⟩ = 0. This leads to a contradiction.

We now prove the following important lemma about the spectra of the labelings in the true tree.

Lemma 5.11 Let P_1, ..., P_m ∈ R^{w×w} be m permutation matrices. Consider the matrix

A = (I + P_1)/2 · (I + P_2)/2 · ... · (I + P_m)/2

Let V be the eigenspace of A such that for all v ∈ V, v · A = v, and let W be the space orthogonal to V. Then W is invariant under A, i.e., for all v ∈ W, v · A ∈ W, and for any v ∈ W,

||v · A||_2 ≤ (1 − w^{−3w}) ||v||_2

Proof: Consider the group G generated by the matrices P_1, ..., P_m. Clearly, G ≤ S_w. Consider the irreducible representations ρ of G. Note that we can find a unitary matrix U such that U A U^† is block diagonal. Further, the blocks correspond to the irreducible representations of G.
Now, it is obvious that, corresponding to the trivial representation, all the blocks are 1 × 1 identity blocks, and together they account for the trivial eigenspace V. Consider some non-trivial representation ρ of G. Let us interpret P_i as g_i ∈ G. If A = Σ_g λ_g P_g, then after the block diagonalization, the block of A corresponding to ρ is Σ_g λ_g ρ(g). Consider any v ∈ W. It has non-zero values only in the coordinates corresponding to the non-trivial representations of G. Let S_ρ be the set of coordinates corresponding to a particular non-trivial representation ρ, and consider the vector v_{S_ρ}, which is simply the projection of v onto the coordinates in S_ρ. Then

(Av)_{S_ρ} = (Σ_g λ_g ρ(g)) v_{S_ρ}

Now, by Lemma 5.9 and Claim 5.10, we can say that there exist a, k such that λ_a, λ_k ≥ 1/(2|G|) and ⟨ρ(a)v_{S_ρ}, ρ(k)v_{S_ρ}⟩ ≤ (1 − 1/|G|^3)||v_{S_ρ}||_2^2. This implies that ||(Av)_{S_ρ}||_2 ≤ (1 − 1/|G|^3)||v_{S_ρ}||_2. As the size of the group G is at most w! ≤ w^w, we get the claim.

We now note a property of the labelings with regard to their fixed point subspaces.

Claim 5.12 Let x_t be a node in the true tree and L(x_t) be its labeling. Further, let y ∈ C^w be such that y · L(x_t) = y. Then, if x_p is the node corresponding to x_t in the pseudo tree and L(x_p) is its labeling, y · L(x_p) = y.

Proof: We prove this claim by induction on the height of the node x_t. Clearly, the assertion is true for the leaves of the true tree. Now, observe that the labeling of the node x_t is simply the product of L(y_t) and L(z_t), where y_t and z_t are the children of x_t. It is easy to observe that ||L(y_t)||_2, ||L(z_t)||_2 ≤ 1. This implies that if y · L(x_t) = y, then y · L(y_t) = y · L(z_t) = y. By the induction hypothesis, we can say that y · L(y_p) = y · L(z_p) = y. As L(x_p) = L(y_p) ·_H L(z_p), by Lemma 2.4 we can say that y · L(x_p) = y.

Corollary 5.13 Let x_t be a node in the true tree and L(x_t) be its labeling. Further, let y ∈ C^w be such that L(x_t) · y = y.
Then, if x_p is the node corresponding to x_t in the pseudo tree and L(x_p) is its labeling, y · L(x_p) = L(x_p) · y = y.

Proof: Note that L(x_t) can be expressed as Σ_i λ_i P_i where the P_i are permutation matrices and the λ_i ∈ R^+ satisfy Σ_i λ_i = 1. This implies that for each i, P_i · y = y. That means that for all i, y · P_i = y, which in turn implies that y · L(x_t) = y. Now, we can simply use Claim 5.12.

We now list some more useful facts about matrices which shall be helpful in the course of our proof.

Fact 5.14 Let A, B ∈ C^{n×n} and let v_1, ..., v_n be an orthonormal basis of C^n. Then, the following are true:

• Tr[AB] = Tr[BA] (cyclic property of the trace).
• ||A||_F^2 = Σ_{i=1}^n λ_i^2, where the λ_i are the singular values of A.
• ||A||_F^2 = Σ_{i=1}^n v_i^† · A^† · A · v_i.
• ||A · B||_F ≤ ||A||_2 · ||B||_F and ||A · B||_F ≤ ||A||_F · ||B||_2.

5.2 Proofs of Claim 5.5 and Claim 5.6

In this subsection, we prove Claims 5.5 and 5.6.

Claim 5.5 (restated) Let α″ ≥ 1/10. Also, let x_t be a node in the true tree and x_p be the corresponding node in the pseudo tree. Also, assume that y_t and z_t are the children of x_t and y_p and z_p are the children of x_p. Further, let the non-trivial subspace of x_t, y_t and z_t be W, with dimension β. Let ||L(y_t)||_W = α, ||L(z_t)||_W = α′ and ||L(x_t)||_W = α″. Then, provided

||L(y_p) − L(y_t)||_W ≤ λ · w^{w^5} w^{w^5 β} log(1/α) + λ w^{w^5 β} 2^β w^{6w}

and

||L(z_p) − L(z_t)||_W ≤ λ · w^{w^5} w^{w^5 β} log(1/α′) + λ w^{w^5 β} 2^β w^{6w},

we have ||L(x_p) − L(x_t)||_W ≤ λ · w^{w^5} w^{w^5 β} log(1/α″). If L(z_p) and L(y_p) act trivially on W^⊥, so does L(x_p).

Proof: The fact that if L(z_p) and L(y_p) act trivially on W^⊥ then so does L(x_p) follows trivially from Lemma 2.4. We let L(y_p) = L(y_t) + E_y and L(z_p) = L(z_t) + E_z, and observe that by definition ||E_y||_{W^⊥} = ||E_z||_{W^⊥} = 0. By Lemma 2.4, since L(x_p) = L(y_p) ·_H L(z_p), L(x_p) restricted to W^⊥ is the identity and hence ||L(x_p) − L(x_t)||_{W^⊥} = 0.
Now, for E_x = L(x_t) − L(x_p),

||E_x||_W ≤ ||E_y||_W · ||L(z_t)||_W + ||E_z||_W · ||L(y_t)||_W + λ    (6)

Note that ||L(x_t)||_W ≤ ||L(y_t)||_W · ||L(z_t)||_W. Also, let us put ||L(y_t)||_W = γ_1 and ||L(z_t)||_W = γ_2. Hence, we get that

||L(x_p) − L(x_t)||_W ≤ ||E_y||_W · ||L(z_t)||_W + ||E_z||_W · ||L(y_t)||_W + λ
≤ (λ · w^{w^5} w^{w^5 β} (log(1/γ_1) + log(1/γ_2)) + 2λ w^{w^5 β} 2^β w^{6w}) (1 − w^{−3w}) + λ
≤ λ · w^{w^5} w^{w^5 β} log(1/(γ_1 γ_2))

The above uses the fact that γ_1, γ_2 ≤ 1 − w^{−3w}.

Claim 5.6 (restated) Let x_t be a node in the true tree and x_p be the corresponding node in the pseudo tree. Also, assume that y_t and z_t are the children of x_t and y_p and z_p are the children of x_p. Further, let the non-trivial subspace of x_t, y_t and z_t be W, with dim(W) = β. Let ||L(x_t)||_W < 1/10. Assume that for j ∈ {y, z}, the following holds:

||L(j_p) − L(j_t)||_W ≤ λ · w^{w^5} w^{w^5 β} log(1/||L(j_t)||_W) + λ w^{w^5 β} 2^β w^{6w} if ||L(j_t)||_W ≥ 1/10, and
||L(j_p) − L(j_t)||_W ≤ λ · w^{w^5} w^{w^5 β} w^{6w} + λ w^{w^5 β} 2^β w^{6w} otherwise.    (7)

Then, ||L(x_p) − L(x_t)||_W ≤ λ · w^{w^5} w^{w^5 β} w^{6w}. If L(z_p) and L(y_p) act trivially on W^⊥, so does L(x_p).

Proof: The fact that if L(z_p) and L(y_p) act trivially on W^⊥ then so does L(x_p) follows trivially from Lemma 2.4. We begin by observing that if for j ∈ {y, z} we have ||L(j_t)||_W ≥ 1/10, then

||L(j_p) − L(j_t)||_W ≤ 5λ · w^{w^5} w^{w^5 β}

As was the case with abelian groups, we divide the analysis into three cases. We also let L(y_p) = L(y_t) + E_y and L(z_p) = L(z_t) + E_z, and observe that by definition ||E_y||_{W^⊥} = ||E_z||_{W^⊥} = 0.

• Both ||L(y_t)||_W, ||L(z_t)||_W ≥ 1/10.
• Exactly one of ||L(y_t)||_W and ||L(z_t)||_W is at least 1/10; we assume without loss of generality that it is ||L(z_t)||_W.
• Both ||L(y_t)||_W, ||L(z_t)||_W < 1/10.

We note that the first case was handled by Claim 5.5. Now, we come to the second case. We recall that ||L(z_t)||_W ≥ 1/10.
In this case,

||L(x_p) − L(x_t)||_W ≤ ||E_y||_W · ||L(z_t)||_W + ||E_z||_W · ||L(y_t)||_W + λ
≤ (1 − w^{−3w}) (λ · w^{w^5} w^{w^5 β} w^{6w} + λ w^{w^5 β} 2^β w^{6w}) + (5λ · w^{w^5} w^{w^5 β})/10 + λ
≤ λ · w^{w^5} w^{w^5 β} w^{6w}

Next, we deal with the third case:

||L(x_p) − L(x_t)||_W ≤ ||E_y||_W · ||L(z_t)||_W + ||E_z||_W · ||L(y_t)||_W + λ
≤ (1/10)(λ · w^{w^5} w^{w^5 β} w^{6w} + λ w^{w^5 β} 2^β w^{6w}) + (1/10)(λ · w^{w^5} w^{w^5 β} w^{6w} + λ w^{w^5 β} 2^β w^{6w}) + λ
≤ λ · w^{w^5} w^{w^5 β} w^{6w}

5.3 Proof of Claim 5.7

Claim 5.7 (restated) Let x_t be a node in the true tree and x_p be the corresponding node in the pseudo tree. Also, assume that y_t and z_t are the children of x_t and y_p and z_p are the children of x_p. Further, let W_x, W_y and W_z be the non-trivial subspaces of L(x_t), L(y_t) and L(z_t), with dim(W_y) = β_1, dim(W_z) = β_2 and dim(W_x) = β. Also, W_y ≠ W_y ∩ W_z ≠ W_z. Then,

• β > β_1 and β > β_2.

Also, if ||L(y_p) − L(y_t)||_{W_y} ≤ λ · w^{w^5} w^{w^5 β_1} w^{6w} + λ w^{w^5 β_1} 2^{β_1} w^{6w} and ||L(z_p) − L(z_t)||_{W_z} ≤ λ · w^{w^5} w^{w^5 β_2} w^{6w} + λ w^{w^5 β_2} 2^{β_2} w^{6w}, then

||L(x_p) − L(x_t)||_{W_x} ≤ λ · w^{w^5} w^{w^5 β} log(1/||L(x_t)||_{W_x}) if ||L(x_t)||_{W_x} ≥ 1/10, and
||L(x_p) − L(x_t)||_{W_x} ≤ λ · w^{w^5} w^{w^5 β} w^{6w} otherwise.    (8)

If L(y_p) acts trivially on W_y^⊥ and L(z_p) acts trivially on W_z^⊥, then L(x_p) acts trivially on W_x^⊥.

Proof: If L(y_p) acts trivially on W_y^⊥, then it acts trivially on W_x^⊥ as well. Similarly, if L(z_p) acts trivially on W_z^⊥, then it does so on W_x^⊥ as well. Now, by Lemma 2.4, L(x_p) acts trivially on W_x^⊥ as well. Note that because L(x_t) acts non-trivially on W_x, we get that ||L(x_t)||_{W_x} ≤ 1 − w^{−3w}. Now,

||L(x_p) − L(x_t)|| ≤ ||L(y_p) − L(y_t)|| + ||L(z_p) − L(z_t)|| + λ
≤ λ · w^{w^5} w^{w^5 β_1} w^{6w} + λ w^{w^5 β_1} 2^{β_1} w^{6w} + λ · w^{w^5} w^{w^5 β_2} w^{6w} + λ w^{w^5 β_2} 2^{β_2} w^{6w} + λ
≤ λ · w^{w^5} w^{w^5 β} log(1/(1 − w^{−3w}))

which, together with ||L(x_t)||_{W_x} ≤ 1 − w^{−3w}, gives (8).

5.4 Convergence lemma for chains

In this subsection, we prove Claim 5.8. We prove the claim in two parts. In order to understand the two parts, let us recall the situation we are in. There are nodes x_0, x_1, ..., x_m and y_1, ..., y_m in the true tree and the corresponding nodes x′_0, x′_1, ..., x′_m and y′_1, ..., y′_m in the pseudo tree.
Here, x_i is the parent of x_{i−1} and y_i, and likewise x'_i is the parent of x'_{i−1} and y'_i. Note that this means that the label L(x_i) is the product of the labels L(x_{i−1}) and L(y_i). In the pseudo tree, L(x'_i) is the expander product of the labels L(x'_{i−1}) and L(y'_i). In order to bound the difference between L(x_t) and L(x'_t), we define an intermediate process with nodes x''_i such that L(x''_i) is the product of L(x''_{i−1}) and L(y'_i); also, L(x''_0) is simply L(x'_0). We first make the following claim.

Claim 5.15 Consider the nodes x_t and x''_t as defined above. Let W be the non-trivial subspace of L(x_t) and let dim(W) = β. Then ‖L(x_t) − L(x''_t)‖_W ≤ λ·2^{β−1}·w^{w^5·β}·w^{6w}.

Proof: We note that L(x''_t) is the product of L(x''_0) and the L(y'_i)'s in some order, and L(x_t) is the product of L(x_0) and the L(y_i)'s in the same order. Here we will take advantage of the fact that matrix multiplication is associative (as opposed to the expander product, which is not necessarily associative). Let us construct two trees, called the rearranged tree and the intermediate tree, for which we first specify only the leaves. The leaves of the rearranged tree are the L(y_i)'s and L(x_0), arranged in the correct permutation (i.e., the order in which they need to be multiplied). The leaves of the intermediate tree are arranged in the same order, except that the corresponding leaves are labeled by the L(y'_i)'s instead of the L(y_i)'s; also, in place of L(x_0), we have L(x''_0). We now describe the non-leaf nodes of the trees (note that these will not be balanced binary trees in general). First, a definition: as we construct the trees, any node x in the rearranged tree has an obvious label, namely the product of the labels of its children; similarly, the label of any node x'' in the intermediate tree is the product of the labels of its children.
We call a node x in the rearranged tree "good" if, for the corresponding node x'' in the intermediate tree, the following is true:

‖L(x) − L(x'')‖_{W_x} ≤ λ·w^{w^5}·w^{w^5·β}·log(1/‖L(x)‖_{W_x})  if ‖L(x)‖_{W_x} ≥ 1/10,
‖L(x) − L(x'')‖_{W_x} ≤ λ·w^{w^5}·w^{w^5·β}·w^{6w}  otherwise.  (9)

Here W_x is the non-trivial subspace of L(x) and β = dim(W_x). Now, it does not matter how we construct the trees: if we can show that for the root z of the rearranged tree and the root z'' of the intermediate tree we have ‖L(z) − L(z'')‖_{W_z} ≤ λ·2^β·w^{w^5·β}·w^{6w}, then we are done. This implies that ‖L(x_t) − L(x''_t)‖_W ≤ λ·2^β·w^{w^5·β}·w^{6w}, where W is the non-trivial subspace of L(z), which is the same as that of L(x_t).

We start by observing that in the beginning every node of the rearranged tree is good. Now, assume we can find two adjacent nodes in the rearranged tree (call them a and b) such that one of the following is true:

• The non-trivial subspaces of a and b are identical.
• The non-trivial subspaces of L(a) and L(b) are such that neither is contained in the other.

In case either of these happens, we construct a node c in the rearranged tree and make a and b its children; similarly, we construct c'' in the intermediate tree and make a'' and b'' its children. Using Claims 5.5, 5.6 and 5.7, it is easy to check that since a and b were "good", c remains good. Let us call the nodes which have no parent active nodes; this process reduces the number of active nodes by 1. We can continue this process until we reach a situation where any two adjacent active nodes (adjacency among active nodes is defined in the obvious way, i.e., by the permutation in which the matrices are to be multiplied) are such that the non-trivial subspace of one of them is strictly contained in the non-trivial subspace of the other.
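In toy form, this merging pass can be modelled by representing each active node's non-trivial subspace as a set of basis directions and repeatedly merging adjacent nodes whose sets are equal or incomparable. The following sketch is ours (the set model, the name merge_pass, and the use of union to model the product's subspace are all illustrative assumptions, not the paper's construction):

```python
def merge_pass(spaces):
    # spaces: list of frozensets modelling the non-trivial subspaces of the
    # active nodes, in the order in which the matrices are to be multiplied.
    # Merging two nodes is modelled by taking the union of their sets, since
    # the product's non-trivial subspace contains those of both children.
    i = 0
    while i + 1 < len(spaces):
        a, b = spaces[i], spaces[i + 1]
        if a == b or not (a < b or b < a):   # identical or incomparable
            spaces[i : i + 2] = [a | b]      # merge the adjacent active nodes
            i = max(i - 1, 0)                # a merge may enable earlier merges
        else:
            i += 1
    return spaces  # now every adjacent pair is strictly nested

out = merge_pass([frozenset({1, 2}), frozenset({2, 3}), frozenset({1, 2, 3, 4})])
assert out == [frozenset({1, 2, 3}), frozenset({1, 2, 3, 4})]
```

The loop terminates exactly in the "stuck" situation described above: every pair of adjacent active nodes has one subspace strictly contained in the other.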
Let a be an active node whose non-trivial subspace has the smallest dimension, say γ (there may be many such nodes). We find all such active nodes; clearly, no two of them are adjacent. Now, for every such node, we pair it with the node to its left (this choice is arbitrary). For any such pair (a, b), we create a node c in the rearranged tree and c'' in the intermediate tree such that L(c) = L(a)·L(b) and L(c'') = L(a'')·L(b''). Two things happen after this process:

• The non-trivial subspace of every active node now has dimension at least γ + 1.
• While the active nodes may no longer all be good, they are all λ·w^{w^5}·w^{w^5·γ}·w^{6w}-close to being good.

We remark that by saying a node x is δ-far from being good, we simply mean that ‖L(x) − L(x'')‖_W is at most δ more than what it would have been had x been a good node. We now make the following inductive claim.

Claim 5.16 At any point, if the active node with the smallest-dimensional non-trivial subspace has dimension γ + 1, then every active node is at most λ·2^γ·w^{w^5}·w^{w^5·γ}·w^{6w} far from being good.

Clearly, this is true at the beginning. If we find two adjacent active nodes a and b such that

• their non-trivial subspaces are identical, or
• neither of their non-trivial subspaces is contained in the other,

then we create an active node c with children a and b. We notice that Claims 5.5, 5.6 and 5.7 imply that c is good even though a and b are only λ·2^γ·w^{w^5}·w^{w^5·γ}·w^{6w}-close to being good (note that we are using that the dimension of every active node's non-trivial subspace is at least γ + 1). In case we are again stuck, we simply observe that we can again pair all active nodes of dimension γ + 1 with active nodes of dimension at least γ + 2. Note that after this, all remaining nodes are λ·2^γ·w^{w^5}·w^{w^5·γ}·w^{6w} + λ·w^{w^5}·w^{w^5·(γ+1)}·w^{6w} far from being good.
However,

λ·2^γ·w^{w^5}·w^{w^5·γ}·w^{6w} + λ·w^{w^5}·w^{w^5·(γ+1)}·w^{6w} ≤ λ·2^{γ+1}·w^{w^5}·w^{w^5·(γ+1)}·w^{6w}.

This concludes our proof.

At this point, having proven that ‖L(x_t) − L(x''_t)‖_W is small, i.e., bounded by a constant depending only on the dimension of the non-trivial subspace, it is very easy to get a seed of length O(log n·(log log n + log(1/ε))). This is because we need to bound ‖L(x'_t) − L(x''_t)‖_W, and an obvious bound on this quantity is λ·t. We formalize the claim below.

Claim 5.17 Let us have nodes x'_0, y'_1, ..., y'_t, with labelings L(x'_0), L(y'_1), ..., L(y'_t). Let us define nodes x'_1, ..., x'_t and x''_1, ..., x''_t, with labellings L(x'_i) = L(x'_{i−1}) ·_H L(y'_i) and L(x''_i) = L(x''_{i−1}) · L(y'_i). Then ‖L(x''_t) − L(x'_t)‖ ≤ tλ.

Proof: Follows by a simple hybrid argument.

Now, as the length of the chain is bounded by O(log n), we get a bound of O(λ·log n). This fact can be used to get an overall error of O(λ·c(w)·log n), where c(w) is a constant depending solely on w. This implies one can choose λ = ε/(c(w)·log n), which gives a seed length of O(log n·(log log n + log(1/ε))) for constant-width branching programs. This reproduces the results of [BRRY10, BV10]. However, we want an upper bound which is a constant rather than something dependent on n. The following subsection achieves this.

5.4.1 Showing that ‖L(x'_t) − L(x''_t)‖_W is small

Now, to prove Claim 5.8, what remains to be shown is that ‖L(x'_t) − L(x''_t)‖_W is bounded by λ·2^β·w^{w^5·β}·w^{6w}, where W is the non-trivial subspace of the labeling L(x_t) and dim(W) = β. For this part of the proof, we need a slightly different notion of norm, which we define next.

Definition 5.18 Consider a matrix M ∈ C^{w×w}. Let V be a subspace of C^{[w]×[w]}. Then define ‖M‖_{F,V} as the length of the projection of M on V. Note that when V = C^{[w]×[w]}, ‖M‖_{F,V} is simply the Frobenius norm of M.

We now introduce a bit more notation.
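The norm of Definition 5.18 is simply the Euclidean length of the projection of vec(M) onto the subspace V of the w²-dimensional space of matrices. A small numerical sketch (the helper name norm_F_V is ours, and we work over the reals for simplicity):

```python
import numpy as np

rng = np.random.default_rng(2)
w = 3

def norm_F_V(M, Q):
    # Length of the projection of vec(M) onto col(Q); Q has orthonormal
    # columns spanning the subspace V of the w*w-dimensional matrix space.
    return np.linalg.norm(Q.T @ M.flatten())

# V: span of a few fixed matrices, orthonormalized via QR on their vectorizations.
basis_mats = [rng.standard_normal((w, w)) for _ in range(4)]
Q, _ = np.linalg.qr(np.column_stack([B.flatten() for B in basis_mats]))

M = rng.standard_normal((w, w))
# A projection is never longer than the vector itself ...
assert norm_F_V(M, Q) <= np.linalg.norm(M, 'fro') + 1e-12
# ... and when V is the whole matrix space, the norm is exactly the Frobenius norm.
Q_full, _ = np.linalg.qr(rng.standard_normal((w * w, w * w)))
assert np.isclose(norm_F_V(M, Q_full), np.linalg.norm(M, 'fro'))
```

The block norms ‖·‖_{F,V₁,V₂} used below are the special case where V is spanned by the matrix units with rows indexed by a basis of V₁ and columns by a basis of V₂.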
Let V, V', W be subspaces such that W ⊆ V and V' is the orthogonal complement of W with respect to V. Let dim(V) = n and let A ∈ C^{n×n} be a matrix. Note that we can choose an orthonormal basis for V such that the first dim(V') elements form an orthonormal basis for V' and the next dim(W) elements form an orthonormal basis for W. Hence, we can use the elements of this basis to label the rows and columns of A in the natural way. Now, we let A_{V',V'} denote the block of A in which both the rows and the columns come from the indices labelled by the orthonormal basis of V', and likewise for A_{W,W}. We also let A_{V',W} denote the block in which the rows come from the indices labelled by the basis of V' and the columns from the indices labelled by the basis of W. With the notation in place, we make the following important claim.

Claim 5.19 Let ρ_1, ρ_2 : {0,1}^n → C^{w×w}. Also, let the non-trivial subspace of all matrices in the range of ρ_1 be V and the non-trivial subspace of all matrices in the range of ρ_2 be W, with W ⊆ V. Let Γ_1 : {0,1}^t → {0,1}^n and Γ_2 : {0,1}^t → {0,1}^n. Now, consider the two averages

A = 2^{−t}·Σ_{x∈{0,1}^t} ρ_1(Γ_1(x)),  B = 2^{−t}·Σ_{x∈{0,1}^t} ρ_2(Γ_2(x)).

Let V = W ⊕ V', i.e., V' is the orthogonal complement of W when the ambient space is V. Consider the matrix C = A·B − A·_H B. Then C_{V',V'} = 0 and C_{W,V'} = 0.

Proof: To see this, note that for every z, ρ_2(z) is of the form

ρ_2(z) = [ I 0 ; 0 A_z ],

where the first block of rows corresponds to V' and the second block of rows corresponds to W. Now, for any matrix M, let M_{V,W} ∈ C^{dim(V)×dim(W)} denote the block of M with rows from V and columns from W (in particular, this makes sense whenever each subscript is one of V, W or V'). Hence, a matrix M can be written in block form as

M = [ M_{V',V'} M_{V',W} ; M_{W,V'} M_{W,W} ].

In particular, the product of any matrix M with ρ_2(z) is

M·ρ_2(z) = [ M_{V',V'} M_{V',W}·A_z ; M_{W,V'} M_{W,W}·A_z ].

Now, it is immediate that (A·B)_{V',V'} = A_{V',V'} and (A·B)_{W,V'} = A_{W,V'}.
However, using that the graph H is regular, we also get that (A·_H B)_{V',V'} = A_{V',V'} and (A·_H B)_{W,V'} = A_{W,V'}. This implies that C_{W,V'} = 0 and C_{V',V'} = 0.

Analogous to the above claim, we also have the following claim.

Claim 5.20 Let ρ_1, ρ_2 : {0,1}^n → C^{w×w}. Also, let the non-trivial subspace of all matrices in the range of ρ_1 be V and the non-trivial subspace of all matrices in the range of ρ_2 be W, with W ⊆ V. Let Γ_1 : {0,1}^t → {0,1}^n and Γ_2 : {0,1}^t → {0,1}^n. Now, consider the two averages

A = 2^{−t}·Σ_{x∈{0,1}^t} ρ_1(Γ_1(x)),  B = 2^{−t}·Σ_{x∈{0,1}^t} ρ_2(Γ_2(x)).

Let V = W ⊕ V', i.e., V' is the orthogonal complement of W when the ambient space is V. Consider the matrix C = B·A − B·_H A. Then C_{V',V'} = 0 and C_{V',W} = 0.

As we have said before, we will be dealing with subspaces of matrices. If M ∈ C^{w×w}, note that we can view M as an element of C^{w^2}. Also, observe that if C^w = V_1 ⊕ V_2 where V_1 and V_2 are orthogonal, then M_{V_1,V_2} can be viewed as the projection of M along the subspace with rows indexed by V_1 and columns indexed by V_2. We also use ‖M‖_{F,V_1,V_2} to denote the length of this projection. Next, we prove the following claim.

Claim 5.21 Let A : W → W be an operator whose largest singular value is bounded by 1 − ε, and consider the operator B = A ⊕ Id_{W^⊥}. Then, for any X,

‖X·B‖_{F,W,W} ≤ (1 − ε)·‖X‖_{F,W,W},  (X·B)_{W,W^⊥} = X_{W,W^⊥},
‖X·B‖_{F,W^⊥,W} ≤ (1 − ε)·‖X‖_{F,W^⊥,W},  (X·B)_{W^⊥,W^⊥} = X_{W^⊥,W^⊥}.

Proof: Let us write X, B and X·B in block form, with the first block corresponding to W^⊥ and the second to W:

X = [ X_1 X_2 ; X_3 X_4 ],  B = [ I 0 ; 0 A ],  X·B = [ X_1 X_2·A ; X_3 X_4·A ].

From this, the two equalities in the claim are obvious. Also, (X·B)_{W,W} = X_{W,W}·A; using the bound on the singular values of A, we get ‖X·B‖_{F,W,W} ≤ (1 − ε)·‖X‖_{F,W,W}, and likewise the other inequality.

The next claim is analogous to the last one.

Claim 5.22 Let A : W → W be an operator whose largest singular value is bounded by 1 − ε, and consider the operator B = A ⊕ Id_{W^⊥}.
Then, for any X,

‖B·X‖_{F,W,W} ≤ (1 − ε)·‖X‖_{F,W,W},  (B·X)_{W^⊥,W} = X_{W^⊥,W},
‖B·X‖_{F,W,W^⊥} ≤ (1 − ε)·‖X‖_{F,W,W^⊥},  (B·X)_{W^⊥,W^⊥} = X_{W^⊥,W^⊥}.

Now, as we have said, we will treat a matrix X ∈ C^{w×w} as an element of C^{w^2}. Further, suppose B ∈ C^{w×w} is another matrix. Then we have a linear transformation B_r (defined by the matrix B) given by B_r : C^{w×w} → C^{w×w}, B_r : X ↦ X·B. Likewise, we can define B_ℓ : X ↦ B·X. For a map B_ℓ or B_r as defined above, we can define an invariant subspace, namely a subspace which B_ℓ (respectively B_r) maps to itself; the space orthogonal to the invariant subspace will be called the non-trivial subspace. In particular, for the kind of B defined in Claims 5.21 and 5.22, on the space orthogonal to their invariant subspace, B_ℓ and B_r are "contractive", i.e., they shrink every vector by a factor of 1 − ε. For a map A as above, we use Inv(A) to denote its invariant space and Inv^⊥(A) to denote the space orthogonal to Inv(A). We call such maps "label maps". Note that for any node x_t in the true tree, L(x_t) defines such maps, namely L(x_t)_ℓ : X ↦ L(x_t)·X and L(x_t)_r : X ↦ X·L(x_t). The maps of interest to us (the reason will become clear shortly) are defined as follows: let A_{ℓ,i} : X ↦ L_i·X and A_{r,i} : X ↦ X·L_i, where L_i is some label in the true tree. Now, consider the map A_n : X ↦ A_{ℓ,n} ∘ A_{r,n} ∘ ... ∘ A_{ℓ,1} ∘ A_{r,1}(X). We first note that by associativity of matrix multiplication, this map is the same as X ↦ (L_n···L_1)·X·(L_1···L_n). Note that the map A_n is simply a composition of label maps. However, it also satisfies the following properties (given in the next claim).

Claim 5.23 Let the map A_n be defined as above, let L be a label in the true tree, and define A'_n : X ↦ L·(A_n(X)) (here A_n(X) denotes the output of the map A_n applied to X). Let W = Inv(A_n) and W' = Inv(A'_n), with W' ⊊ W.
Let W^⊥ and W'^⊥ denote the orthogonal complements of W and W'. If Y is such that the length of its projection along W'^⊥ is α and the length of its projection along W^⊥ is δ, then the length of its projection along the space orthogonal to Inv(L) is at least w^{−6w^2}·√(α² − δ²) − δ.

Proof: We start by showing that for any v ∈ W'^⊥, ‖A'_n(v)‖_2 ≤ (1 − w^{−6w^2})·‖v‖_2. To see this, note that the map A'_n can be realized in the following way. First, write X as an element of C^{w^2} in row-major order and multiply it by the matrix realizing right multiplication by (L_1···L_n); this has the same effect as multiplying X (as a matrix) by (L_1···L_n) on the right. Then, permute the entries to change from row-major to column-major order. Subsequently, multiply the resulting vector by the matrix realizing left multiplication by (L_n···L_1). This gives A_n(X); multiplying once more on the left by L achieves A'_n(X). Note that we can realize all these steps by a permutation branching program of size w². In particular, we can apply Lemma 5.11 to conclude that ‖A'_n(v)‖_2 ≤ (1 − w^{−6w^2})·‖v‖_2. Now, this holds even for a v which lies in W, and hence in W ∩ W'^⊥; for such a v, A_n(v) = v. This means that such a v has a projection of length at least w^{−6w^2}·‖v‖_2 on the space orthogonal to Inv(L). Finally, Y has a projection of length at least √(α² − δ²) on W ∩ W'^⊥, from which we derive the result.

We now present the main convergence lemma of this section. We recall the setting: in the pseudo tree, we have nodes x'_0, x'_1, ..., x'_m and y'_1, ..., y'_m, with labelings as follows. For i > 0, there are two possible cases: either L(x'_i) = L(x'_{i−1}) ·_H L(y'_i) (corresponding to right multiplication) or L(x'_i) = L(y'_i) ·_H L(x'_{i−1}) (corresponding to left multiplication). Correspondingly, we also define the intermediate process where L(x''_i) = L(x''_{i−1})·L(y'_i) or L(x''_i) = L(y'_i)·L(x''_{i−1}), with L(x''_0) = L(x'_0).
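The row-major realization used in the proof of Claim 5.23 is an instance of the standard vec/Kronecker identity: in row-major order, vec(L·X·R) = (L ⊗ Rᵀ)·vec(X), so the label maps X ↦ L·X and X ↦ X·R act on C^{w²} as the matrices L ⊗ I and I ⊗ Rᵀ. A quick numerical check (not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
w = 4
L, R, X = (rng.standard_normal((w, w)) for _ in range(3))

# Row-major vectorization: vec(X) = X.flatten(order='C').
lhs = (L @ X @ R).flatten()
rhs = np.kron(L, R.T) @ X.flatten()
assert np.allclose(lhs, rhs)

# Special cases: left multiplication is L ⊗ I, right multiplication is I ⊗ R^T.
assert np.allclose((L @ X).flatten(), np.kron(L, np.eye(w)) @ X.flatten())
assert np.allclose((X @ R).flatten(), np.kron(np.eye(w), R.T) @ X.flatten())
```

The permutation between row-major and column-major order in the proof is exactly the change of basis that swaps the two Kronecker factors.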
We will also define label maps A_i, A''_i : C^{w×w} → C^{w×w} in the following way: for the case of left multiplication, A_i = A_{ℓ,i} ∘ A_{i−1} and A''_i = A''_{ℓ,i} ∘ A''_{i−1}; likewise, for the case of right multiplication, A_i = A_{i−1} ∘ A_{r,i} and A''_i = A''_{i−1} ∘ A''_{r,i}. The proof of the next lemma essentially follows the proof of the "Key Convergence Lemma" from [KNP10], by induction on the dimension of the non-trivial subspace of these label maps.

Lemma 5.24 Let x'_0, x'_1, ..., x'_m and y'_1, ..., y'_m be nodes in the pseudo tree as defined earlier, and let x''_0, x''_1, ..., x''_m be the nodes of the intermediate process as defined earlier. Let the non-trivial subspace of A'_i (for all i) be strictly contained in the non-trivial subspace of A'_0. Let the constants h_i and d_i be defined recursively as follows:

d_0 = w^{−9w},  h_0 = λ,  d_m = w^{−20w}·d_{m−1}/600,  h_m = 1200·w^{10w}·h_{m−1}/d_{m−1}.

Let the non-trivial subspace of ∏_{i=1}^{ℓ} A'_i be V_1 and let dim(V_1) = β_1. Then

‖L(x''_ℓ) − L(x'_ℓ)‖_{V_1} ≤ max{h_{β_1}, ‖L(x''_0) − L(x'_0)‖_{F,V_1}·(1 − d_{β_1})}.

Proof: We prove this by induction on the dimension of V_1. First of all, if dim(V_1) = 0, there is nothing to prove. So assume dim(V_1) = m ≥ 1. Let V be the non-trivial subspace of ∏_{i=1}^{ℓ−1} A'_i. We first prove the assertion for the special case V ⊊ V_1. Assume that A'_ℓ = A'_{ℓ−1}·A'_{r,ℓ} (the case of left multiplication is exactly the same), and let the non-trivial subspace of A'_{r,ℓ} be V''. Having fixed the notation, define γ_1 = ‖A'_0 − A''_0‖_{F,V_1} and α = w^{−10w}·γ_1.

We first consider the case α ≥ 2h_{m−1}/d_{m−1}. We claim that in this case ‖A'_ℓ − A''_ℓ‖_{F,V_1} ≤ (1 − d_m)·‖A'_0 − A''_0‖_{F,V_1}. To analyze this, we further divide into subcases.

The first subcase is ‖A'_0 − A''_0‖_{F,V} ≥ α. Note that in this case α ≥ 2h_{m−1}, which means that ‖A'_0 − A''_0‖_{F,V} ≥ 2h_{m−1}. By the induction hypothesis, we can say that

‖A'_{ℓ−1} − A''_{ℓ−1}‖_{F,V} ≤ (1 − d_{m−1})·‖A'_0 − A''_0‖_{F,V}.

Note that by Claim 5.21, we can also say that ‖A'_{ℓ−1} − A''_{ℓ−1}‖_{F,V^⊥} = ‖A'_0 − A''_0‖_{F,V^⊥}, where V^⊥ denotes the orthogonal complement of V when the ambient space is V_1. This implies that

‖A'_{ℓ−1} − A''_{ℓ−1}‖²_{F,V_1} ≤ ‖A'_0 − A''_0‖²_{F,V_1} − (2d_{m−1} − d²_{m−1})·‖A'_0 − A''_0‖²_{F,V},

which in turn implies that

‖A'_{ℓ−1} − A''_{ℓ−1}‖_{F,V_1} ≤ ‖A'_0 − A''_0‖_{F,V_1}·(1 − w^{−10w}·d_{m−1}).

Further, from here we get that

‖A'_ℓ − A''_ℓ‖_{F,V_1} ≤ ‖A'_0 − A''_0‖_{F,V_1}·(1 − w^{−10w}·d_{m−1}) + λ ≤ ‖A'_0 − A''_0‖_{F,V_1}·(1 − w^{−20w}·d_{m−1}) ≤ ‖A'_0 − A''_0‖_{F,V_1}·(1 − d_m).

The second inequality uses the fact that (w^{−10w}·d_{m−1}·γ_1)/2 = α·d_{m−1}/2 ≥ h_{m−1} ≥ λ.

The second subcase is ‖A'_0 − A''_0‖_{F,V} < α. This implies the following two things. By the induction hypothesis,

‖A'_{ℓ−1} − A''_{ℓ−1}‖_{F,V} < α + h_{m−1}.

Also, by Claim 5.23, we get that

‖A'_0 − A''_0‖_{F,V''} ≥ w^{−4w}·√(γ_1² − α²),

and hence

‖A'_{ℓ−1} − A''_{ℓ−1}‖_{F,V''} ≥ w^{−4w}·√(γ_1² − α²) − α − h_{m−1}.

Now, using Claim 5.21, the component along V'' contracts by a factor (1 − w^{−3w}) in the last step, and we get

‖A'_ℓ − A''_ℓ‖_{F,V_1} ≤ γ_1 + h_{m−1} − w^{−3w}·(w^{−4w}·√(γ_1² − α²) − α − h_{m−1})
 ≤ γ_1·(1 − w^{−7w}) + h_{m−1}·(1 + w^{−3w}) + w^{−3w}·α.

Plugging in the values, the above expression is bounded by γ_1·(1 − d_m).

We now consider the case α ≤ 2h_{m−1}/d_{m−1}. In this case, by the induction hypothesis, ‖A'_{ℓ−1} − A''_{ℓ−1}‖_{F,V} < α + h_{m−1}. From the fact that γ_1 = w^{10w}·α ≤ 2w^{10w}·h_{m−1}/d_{m−1}, we get

‖A'_{ℓ−1} − A''_{ℓ−1}‖_{F,V_1} < 3w^{10w}·h_{m−1}/d_{m−1},

and from this,

‖A'_ℓ − A''_ℓ‖_{F,V_1} < 3w^{10w}·h_{m−1}/d_{m−1} + λ ≤ 4w^{10w}·h_{m−1}/d_{m−1}.

So, we see that one of two things happens:

• If γ_1 ≥ 2w^{10w}·h_{m−1}/d_{m−1}, then ‖A'_ℓ − A''_ℓ‖_{F,V_1} ≤ (1 − d_m)·γ_1 = (1 − d_m)·‖A'_0 − A''_0‖_{F,V_1}.
• If γ_1 ≤ 2w^{10w}·h_{m−1}/d_{m−1}, then ‖A'_ℓ − A''_ℓ‖_{F,V_1} ≤ ‖A'_0 − A''_0‖_{F,V_1} + 4w^{10w}·h_{m−1}/d_{m−1}.

We had assumed that the non-trivial subspace of ∏_{i=1}^{ℓ−1} A'_{r,i} (we are assuming right multiplications here, but left multiplications are handled in the same way) is strictly contained in that of ∏_{i=1}^{ℓ} A'_{r,i}.
This need not be true in general, but we can do the following. If the non-trivial subspace of ∏_{i=1}^{ℓ} A'_{r,i} is W, then we break the sequence into blocks, i.e., we break [ℓ] into contiguous subsets {1, ..., s_1}, {s_1+1, ..., s_2}, ..., {s_k+1, ..., s_{k+1} = ℓ}. This block structure has the property that, with the possible exception of the last block, the non-trivial subspace of ∏_{i=s_j+1}^{s_{j+1}} A'_{r,i} is W, but the non-trivial subspace of ∏_{i=s_j+1}^{s_{j+1}−1} A'_{r,i} is strictly contained in W.

Now consider any block {a, ..., b}. Over this block, either the error ‖A'_0 − A''_0‖_{F,V_1} goes down by a factor (1 − d_m), or, in case it is less than 2w^{10w}·h_{m−1}/d_{m−1}, it can increase by at most a constant, namely 4w^{10w}·h_{m−1}/d_{m−1}. Therefore, if the penultimate block ends at j, then ‖A'_j − A''_j‖_{F,V_1} ≤ 5w^{10w}·h_{m−1}/d_{m−1}. The last block can now be dealt with recursively, since the dimension of the non-trivial subspace of the labels has gone down by at least 1.

We now notice that h_m ≤ 10^w·w^{w^3}·λ (we use that m ≤ w²). Putting everything together, we get that ‖A'_t − A''_t‖ − ‖A'_0 − A''_0‖ ≤ 10^w·w^{w^3}·λ, which proves our claim.

6 Pseudorandomness for regular branching programs

In this section, we show that constant-width regular branching programs can be fooled with a seed of length O(log n·(log log n + log(1/ε))). As before, the PRG we use is the INW pseudorandom generator. Our analysis has some similarities with the analysis for permutation branching programs but differs significantly in parts. We now state the formal theorem that we will prove.

Theorem 6.1 Let Γ : {0,1}^t → {0,1}^n be the INW generator with λ = ε·2^{−7w^3}·log^{−3w} n. Then the output of Γ(U_t) is ε-indistinguishable from U_n for read-once width-w regular branching programs of length n.

Using the standard INW generator, we get the following corollary.
Corollary 6.2 There is a polynomial-time computable function Γ : {0,1}^t → {0,1}^n with t = O(log n·(w³ + w·log log n + log(1/ε))) such that Γ(U_t) is ε-indistinguishable from U_n for read-once regular branching programs of width w.

To prove this theorem, as before we construct a true tree and a pseudo tree and the corresponding labelings on the trees. Our analysis, however, will be somewhat different. The starting point of the analysis is the following observation (using Hall's theorem).

Observation 6.3 Let M_0^i and M_1^i be the walk matrices corresponding to the transition from layer i to layer i+1 in a regular branching program. Then, up to renaming of nodes, there is a permutation matrix π such that Id + π = M_0^i + M_1^i.

Proof: Look at all the edges from layer i to layer i+1. The total number of edges leaving every node in the ith layer is 2, and so is the total number of edges entering any vertex in the (i+1)th layer. Therefore, by Hall's theorem, we can partition these edges into two sets of equal size such that each corresponds to a perfect matching between layer i and layer i+1. Once we have the matchings, we can rename the vertices so that one of them corresponds to the identity permutation while the other corresponds to some permutation π. This proves our claim.

Thus the labels of the true tree correspond to a certain permutation branching program. In particular, for a node x_t in the true tree, we can define the fixed-point subspace and the non-trivial subspace as before. The important difference between regular and permutation branching programs, of course, is the following: for permutation branching programs, the labelings of the pseudo tree had the same fixed-point and non-trivial subspaces as the labelings of the true tree; for regular branching programs, this is no longer the case. We begin by making several observations.
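The edge partition in Observation 6.3 can also be computed explicitly: in a bipartite multigraph where every vertex has degree exactly 2, the edges split into alternating cycles of even length, and 2-coloring each cycle yields the two perfect matchings. A sketch (the function two_matchings and the edge-list representation are ours, not from the paper):

```python
from collections import defaultdict

def two_matchings(edges):
    """Split a 2-regular bipartite multigraph into two perfect matchings.

    edges: list of (u, v) pairs in which every left vertex u and every right
    vertex v occurs exactly twice. We walk each cycle, alternating between the
    two edges sharing a right endpoint and the two sharing a left endpoint;
    cycles in a bipartite multigraph have even length, so the alternating
    2-coloring is consistent.
    """
    at_left, at_right = defaultdict(list), defaultdict(list)
    for i, (u, v) in enumerate(edges):
        at_left[u].append(i)
        at_right[v].append(i)
    color = {}
    for start in range(len(edges)):
        if start in color:
            continue
        i, c, pivot_right = start, 0, True
        while i not in color:
            color[i] = c
            u, v = edges[i]
            twins = at_right[v] if pivot_right else at_left[u]
            i = twins[0] if twins[1] == i else twins[1]
            c, pivot_right = 1 - c, not pivot_right
    return ([e for i, e in enumerate(edges) if color[i] == 0],
            [e for i, e in enumerate(edges) if color[i] == 1])

m0, m1 = two_matchings([(0, 0), (0, 1), (1, 0), (1, 1)])
# Each half is a perfect matching: every left and right vertex covered once.
for m in (m0, m1):
    assert sorted(u for u, _ in m) == [0, 1] and sorted(v for _, v in m) == [0, 1]
```

After renaming the right-hand vertices so that the first matching becomes the identity, the second matching is the permutation π of the observation.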
Claim 6.4 Let L(x_t) be the labeling of a node of the true tree and L(x_p) the label of the corresponding node in the pseudo tree. If W is the fixed-point subspace of L(x_t), then for every x ∈ W, x^T·L(x_p)·x = ‖x‖²_2.

Proof: Note that as L(x_t) is a convex combination of permutation matrices, the fixed-point subspace of L(x_t) has the following structure: it is a direct sum of one-dimensional vector spaces, where each one-dimensional space is spanned by the vector uniform over a subset H of the states. Further, the subsets corresponding to different one-dimensional spaces are disjoint; call these subsets H_i. Now, note that L(x_p) is a convex combination of products of walk matrices, all of which have the property that vertices in H_i are mapped to vertices in H_i. Let y_i be the vector uniform on H_i, and consider any product X of these walk matrices. Then it is easy to see that y_i^T·X·y_i = ‖y_i‖²_2. Thus, if y is any vector in the fixed-point subspace, then y^T·X·y = ‖y‖²_2. Since this holds for every product of walk matrices, it also holds for their convex combination, and hence y^T·L(x_p)·y = ‖y‖²_2.

From the above observation, we conclude the following: at any stage of the computation, L(x_t) and L(x_p) can be block-diagonalized to have the structure

L(x_t) = [ I 0 ; 0 A ],  L(x_p) = [ I E″ ; E A + E′ ].

In the above, A is the operator on the subspace W which is the non-trivial subspace of L(x_t). Thus, the error matrix is

L(x_p) − L(x_t) = [ 0 E″ ; E E′ ].

With this observation, we bound the norm of the error L(x_p) − L(x_t) as in the case of permutation branching programs. Consider a node x_t in the true tree and x_p in the pseudo tree, and let y_t and z_t be the children of x_t in the true tree and y_p and z_p the corresponding children in the pseudo tree. We consider three cases:

• The non-trivial subspaces of L(y_t) and L(z_t) are exactly the same.
• The non-trivial subspaces of L(y_t) and L(z_t) are incomparable.
• The non-trivial subspace of L(y_t) is strictly contained in that of L(z_t), or the other way round.

More precisely, assume that for any x_t such that the dimension of its non-trivial subspace is d−1, we can bound the error ‖L(x_t) − L(x_p)‖_2 by a quantity E(d−1, n, w). We then bound E(d, n, w) in terms of E(d−1, n, w). To do this, every node x_t in the true tree whose non-trivial subspace has dimension d is assigned an integer Γ_d(x_t), as follows. If both children of x_t have non-trivial subspaces of dimension less than d, then Γ_d(x_t) = 1. If only one child of x_t has a non-trivial subspace of dimension d (say the child y_t), then Γ_d(x_t) = Γ_d(y_t). In all other cases, Γ_d(x_t) = Γ_d(y_t) + Γ_d(z_t).

We now consider all nodes x_t such that L(x_t) has a non-trivial subspace of dimension d and the non-trivial subspaces of L(y_t) and L(z_t) are incomparable. Let the set of these nodes be called A_{d,1}. Note that the non-trivial subspaces of L(y_t) and L(z_t) then have dimension at most d−1. The following claim bounds the error for all such nodes x_t.

Claim 6.5 Let x_t ∈ A_{d,1}. Then ‖L(x_t) − L(x_p)‖ ≤ 3(E(d−1, n, w) + λ).

Proof:

‖L(x_t) − L(x_p)‖ ≤ ‖L(y_t)‖·‖L(z_t) − L(z_p)‖ + ‖L(z_t)‖·‖L(y_t) − L(y_p)‖ + ‖L(z_t) − L(z_p)‖·‖L(y_t) − L(y_p)‖ + λ.

Now, as we have said, L(x_t), L(y_t) and L(z_t) are doubly stochastic matrices (using Observation 6.3), and hence their 2-norms are bounded by 1. In particular, we get

‖L(x_t) − L(x_p)‖ ≤ ‖L(z_t) − L(z_p)‖ + ‖L(y_t) − L(y_p)‖ + ‖L(z_t) − L(z_p)‖·‖L(y_t) − L(y_p)‖ + λ.  (10)

Now, we know that ‖L(z_t) − L(z_p)‖, ‖L(y_t) − L(y_p)‖ ≤ E(d−1, n, w). Using E(d−1, n, w) ≤ 1, we get ‖L(x_t) − L(x_p)‖ ≤ 3(E(d−1, n, w) + λ), which proves the claim.

We now consider nodes x_t whose non-trivial subspace has dimension d and such that the non-trivial subspace of L(y_t) is strictly contained in that of L(z_t), or vice versa.
We call this set of nodes A_{d,2}. To state the next claim, we use h(x_t) to denote the length of the longest path starting at x_t and going down the tree composed entirely of nodes from A_{d,2}.

Claim 6.6 For any node x_t ∈ A_{d,2} with height h(x_t), ‖L(x_t) − L(x_p)‖ ≤ 3(E(d−1, n, w) + λ)·h(x_t).

Proof: We prove this by induction. In the base case h(x_t) = 1, the children of x_t lie outside A_{d,2}, and the same calculation as in Claim 6.5 gives the bound 3(E(d−1, n, w) + λ). Now, we do the inductive case. Assume that the non-trivial subspace of L(y_t) is strictly contained in that of L(z_t) (the other case is symmetric). By induction,

‖L(z_t) − L(z_p)‖ ≤ 3(E(d−1, n, w) + λ)·(h(x_t) − 1),  ‖L(y_t) − L(y_p)‖ ≤ E(d−1, n, w).

Using (10), we get the claimed bound.

This means that for any node x_t ∈ A_{d,1} ∪ A_{d,2}, ‖L(x_t) − L(x_p)‖ ≤ 3(E(d−1, n, w) + λ)·log n, as h(x_t) is always bounded by log n. We now consider nodes of the third kind, namely nodes x_t whose non-trivial subspace has dimension d and whose children y_t and z_t also have (the same) non-trivial subspace of dimension d. Call the set of these nodes A_{d,3}. We make the following observation.

Observation 6.7 For a particular subspace W of dimension d, consider all nodes x_t such that the non-trivial subspace of x_t is W. Then these nodes form a disjoint union of trees. Also, the leaf nodes of these trees have children from A_{d,1} ∪ A_{d,2}.

With this observation, our aim essentially becomes to bound ‖L(x_t) − L(x_p)‖ for any node x_t which is the root of such a tree.
To do this, we start by observing that for any such node whose non-trivial subspace is W, the structure of L(y_t), L(z_t) and their pseudo counterparts is as follows:

L(y_t) = [ I 0 ; 0 A_y ],  L(y_p) = [ I E″_y ; E_y A_y + E′_y ],  L(y_p) − L(y_t) = [ 0 E″_y ; E_y E′_y ].

Likewise, we can say that

L(z_t) = [ I 0 ; 0 A_z ],  L(z_p) = [ I E″_z ; E_z A_z + E′_z ],  L(z_p) − L(z_t) = [ 0 E″_z ; E_z E′_z ].

Now, note that

L(y_t)·(L(z_p) − L(z_t)) = [ 0 E″_z ; A_y·E_z A_y·E′_z ].

Similarly, we have that

(L(y_p) − L(y_t))·L(z_t) = [ 0 E″_y·A_z ; E_y E′_y·A_z ].

Finally,

(L(y_p) − L(y_t))·(L(z_p) − L(z_t)) = [ 0 E″_y·E′_z ; E′_y·E_z E_y·E″_z + E′_y·E′_z ].

It is important to note that the last equality uses Claim 6.4 (namely, in proving that the top-left entry is zero).

To control the error for nodes in A_{d,3}, we switch from the 2-norm to the Frobenius norm. Let L be an orthonormal basis for W and L′ an orthonormal basis for W^⊥. Since the (W^⊥, W^⊥) block of L(x_p) − L(x_t) is zero, we have

‖L(x_p) − L(x_t)‖²_F = Σ_{v∈L} Σ_{w∈L} ⟨w, (L(x_p) − L(x_t))·v⟩² + Σ_{v∈L′} Σ_{w∈L} ⟨w, (L(x_p) − L(x_t))·v⟩² + Σ_{v∈L} Σ_{w∈L′} ⟨w, (L(x_p) − L(x_t))·v⟩².

We now define ‖L(x_p) − L(x_t)‖²_{F,W} = Σ_{v∈L} Σ_{w∈L} ⟨w, (L(x_p) − L(x_t))·v⟩². In particular, we have the following inequality:

‖L(x_p) − L(x_t)‖_{F,W} ≤ ‖A_y‖_2·‖L(z_p) − L(z_t)‖_{F,W} + ‖A_z‖_2·‖L(y_p) − L(y_t)‖_{F,W} + λ.  (11)

We now observe that

L(x_t) = [ I 0 ; 0 A_x ] = [ I 0 ; 0 A_y·A_z ].

Note that because for any node x_t, L(x_t) corresponds to labelings from a permutation branching program, (11) is exactly like (6) from Claim B.6. In particular, using the exact same proof, one can get the following claim.

Claim 6.8 For any node x_t whose non-trivial subspace is W,

‖L(x_p) − L(x_t)‖_{F,W} ≤ 3·w^{18w}·(E(d−1, n, w) + λ)·log n.

The only terms that remain to be bounded are Σ_{v∈L′} Σ_{w∈L} ⟨w, (L(x_p) − L(x_t))·v⟩² (denoted henceforth ‖L(x_p) − L(x_t)‖²_{F,W^⊥,W}) and its symmetric counterpart ‖L(x_p) − L(x_t)‖²_{F,W,W^⊥}, which is handled likewise. In other words, we need to bound the norm of the lower-left block of L(x_p) − L(x_t). This analysis is again quite easy.
As a starting point, we make the following claim.

Claim 6.9 For any node x_t in the true tree, ‖A_x‖_2 ≤ (1 − w^{−3w})^{Γ_d(x_t)}.

Proof: For Γ_d(x_t) = 1, this is clearly true (because, as we have remarked before, L(x_t) corresponds to labels of a permutation branching program). If Γ_d(x_t) > 1, then clearly x_t ∈ A_{d,3}. Consider its two children y_t and z_t. By the induction hypothesis, ‖A_y‖_2 ≤ (1 − w^{−3w})^{Γ_d(y_t)} and ‖A_z‖_2 ≤ (1 − w^{−3w})^{Γ_d(z_t)}. Since Γ_d(x_t) = Γ_d(y_t) + Γ_d(z_t) and A_x = A_y·A_z (and the 2-norm is submultiplicative), the claim follows.

Before making the next claim, we define the quantity γ = 3·w^{18w}·(E(d−1, n, w) + λ)·log n (which is simply the error term appearing in Claim 6.8).

Claim 6.10 For any node x_t, ‖L(x_p) − L(x_t)‖_{F,W^⊥,W} ≤ (2·Γ_d(x_t) − 1)·γ.

Proof: Again, we prove this by induction. In the base case, i.e., for Γ_d(x_t) = 1, the assertion is trivially true. We first observe that

‖L(x_p) − L(x_t)‖_{F,W^⊥,W} ≤ ‖A_y‖·‖L(z_p) − L(z_t)‖_{F,W^⊥,W} + ‖L(y_p) − L(y_t)‖_{F,W^⊥,W} + ‖L(z_p) − L(z_t)‖_{F,W}·‖L(y_p) − L(y_t)‖_{F,W^⊥,W} + λ.

Next, we use Claim 6.8 to bound ‖L(z_p) − L(z_t)‖_{F,W}. Using this, ‖A_y‖ ≤ 1, and the fact that ‖L(y_p) − L(y_t)‖_{F,W^⊥,W} ≤ 1, we get

‖L(x_p) − L(x_t)‖_{F,W^⊥,W} ≤ ‖L(z_p) − L(z_t)‖_{F,W^⊥,W} + ‖L(y_p) − L(y_t)‖_{F,W^⊥,W} + γ.

Now, using Γ_d(x_t) = Γ_d(y_t) + Γ_d(z_t), we have the proof.

Now, let M = w^{3w}·log n and consider all nodes x_t in A_{d,3} such that Γ_d(x_t) ≤ M. Clearly, for any such node,

‖L(x_p) − L(x_t)‖_{F,W^⊥,W} ≤ 6·w^{21w}·log³ n·(E(d−1, n, w) + λ).

Again, for notational convenience, we define δ_0 = 6·w^{21w}·log³ n·(E(d−1, n, w) + λ). Now, we consider nodes x_t in A_{d,3} with Γ_d(x_t) > M. We define a quantity E(x_t) for every such node in the true tree, inductively: for every node x_t with Γ_d(x_t) ≤ M, define E(x_t) = δ_0, and for every other node x_t, define E(x_t) = (1 + 2/log n)·max{E(y_t), E(z_t)}.
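The factor (1 + 2/log n) lost per level of this recursion compounds, over the at most log n levels of the INW tree, to (1 + 2/log n)^{log n} ≤ e² < 10, since (1 + x/L)^L increases monotonically to e^x. A quick numerical check (not from the paper):

```python
import math

# (1 + 2/L)^L increases monotonically to e^2 ~ 7.39 as L grows,
# so it never exceeds 10, regardless of n.
for n in (2 ** 10, 2 ** 20, 2 ** 60):
    L = math.log(n)
    assert (1 + 2 / L) ** L <= math.e ** 2 < 10
```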
The following is the main claim.

Claim 6.11 For any node $x_t$ with $\Gamma_d(x_t) > M$, $\|L(x_p) - L(x_t)\|_{F,W^{\perp},W} \leq E(x_t)$.

Proof: Note that
\[
\|L(x_p) - L(x_t)\|_{F,W^{\perp},W} \leq \|A_y\| \cdot \|L(z_p) - L(z_t)\|_{F,W^{\perp},W} + \|L(y_p) - L(y_t)\|_{F,W^{\perp},W} + \|L(z_p) - L(z_t)\|_{F,W} \cdot \|L(y_p) - L(y_t)\|_{F,W^{\perp},W} .
\]
First, we use Claim 6.8 to bound $\|L(z_p) - L(z_t)\|_{F,W}$ by $\gamma$. With that, we get
\[
\|L(x_p) - L(x_t)\|_{F,W^{\perp},W} \leq \|A_y\| \cdot \|L(z_p) - L(z_t)\|_{F,W^{\perp},W} + \|L(y_p) - L(y_t)\|_{F,W^{\perp},W} + \gamma .
\]
Next, we consider two cases: either $\Gamma_d(y_t) \leq w^{3w} \log\log n$ or $\Gamma_d(y_t) > w^{3w} \log\log n$.

In the first case, we note that because $\Gamma_d(z_t) = \Gamma_d(x_t) - \Gamma_d(y_t)$, we have $E(z_t) \geq \delta_0$. This means that
\[
\|L(x_p) - L(x_t)\|_{F,W^{\perp},W} \leq E(z_t) + \gamma + \|L(y_p) - L(y_t)\|_{F,W^{\perp},W} .
\]
Using Claim 6.10, we get that
\[
\|L(x_p) - L(x_t)\|_{F,W^{\perp},W} \leq E(z_t) + 2 w^{3w} \log\log n \cdot \gamma .
\]
Using the fact that $E(z_t) \geq \delta_0 = 2 w^{3w} \log^2 n \cdot \gamma$, we get the claim.

The second possibility is that $\Gamma_d(y_t) > w^{3w} \log\log n$. In this case, using Claim 6.9, we get that $\|A_y\| \leq 1/\log n$. Then, we get that
\[
\|L(x_p) - L(x_t)\|_{F,W^{\perp},W} \leq \frac{1}{\log n} \cdot \|L(z_p) - L(z_t)\|_{F,W^{\perp},W} + \|L(y_p) - L(y_t)\|_{F,W^{\perp},W} + \gamma .
\]
Again using the fact that $\max\{E(z_t), E(y_t)\} \geq \delta_0$, we get the claimed bound.

Since the height of the INW tree is at most $\log n$, using Claim 6.11, it is clear that for every $x_t$ with $\Gamma_d(x_t) > M$, we have
\[
\|L(x_p) - L(x_t)\|_{F,W^{\perp},W} \leq \left(1 + \frac{2}{\log n}\right)^{\log n} \cdot \delta_0 \leq 10 \delta_0 .
\]
Likewise, we also get that
\[
\|L(x_p) - L(x_t)\|_{F,W,W^{\perp}} \leq \left(1 + \frac{2}{\log n}\right)^{\log n} \cdot \delta_0 \leq 10 \delta_0 .
\]
Thus, combining the last equation with Claim 6.8, we get that for any node $x_t \in A_{d,3}$, $\|L(x_p) - L(x_t)\| \leq 22 \delta_0$. Since this is already true for nodes $x_t \in A_{d,1} \cup A_{d,2}$ (from Claim 6.5 and Claim 6.6), we have that for any node $x_t$ whose non-trivial subspace is of dimension $d$,
\[
\|L(x_t) - L(x_p)\| \leq 22 \delta_0 = 132 \cdot w^{21w} \log^3 n \cdot (E(d-1, n, w) + \lambda) .
\]
This implies that $E(d, n, w) \leq 132 \cdot w^{21w} \log^3 n \cdot (E(d-1, n, w) + \lambda)$.
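The recursion for $E(d, n, w)$ derived above can be sanity-checked numerically before unrolling it. A minimal sketch, using hypothetical small values of $C$ and $\lambda$ (in the paper the factor $C$ grows with $w$ and $\log n$), verifying that $d$ levels of the recursion $E(d) \leq C \cdot (E(d-1) + \lambda)$ inflate $\lambda$ by at most $(2C)^d$:

```python
# Toy numeric check of unrolling the recursion E(d) <= C * (E(d-1) + lam).
# C and lam are hypothetical values, not the paper's actual parameters.
# The unrolled value stays below (2C)^d * lam, so w levels of recursion
# cost at most a factor exponential in w (as in the final seed-length bound).
C, lam = 5.0, 0.01

E = lam  # E(0) = lam
for d in range(1, 11):
    E = C * (E + lam)
    assert E <= (2 * C) ** d * lam * (1 + 1e-12)
print("E(d) <= (2C)^d * lam for d = 1..10")
```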
Setting $E(0, n, w) = \lambda$, we get that $E(w, n, w) \leq 132^w \cdot w^{21w^2} \cdot \log^{3w} n \cdot \lambda$. As the non-trivial subspace of any node $x_t$ in the true tree has dimension at most $w$, we get Theorem 6.1.

Acknowledgements

We would like to thank Omer Reingold, Thomas Steinke, Luca Trevisan, Salil Vadhan, and Avi Wigderson.

References

[Bar89] David A. Mix Barrington. Bounded-width polynomial-size branching programs recognize exactly those languages in NC1. Journal of Computer and System Sciences, 38(1):150–164, 1989.

[BM84] Manuel Blum and Silvio Micali. How to generate cryptographically strong sequences of pseudorandom bits. SIAM Journal on Computing, 13(4):850–864, 1984. Preliminary version in Proc. of FOCS'82.

[BRRY10] Mark Braverman, Anup Rao, Ran Raz, and Amir Yehudayoff. Pseudorandom generators for regular branching programs. In Proceedings of the 51st IEEE Symposium on Foundations of Computer Science, pages 40–47, 2010.

[BV10] Joshua Brody and Elad Verbin. The coin problem and pseudorandomness for branching programs. In Proceedings of the 51st IEEE Symposium on Foundations of Computer Science, pages 30–39, 2010.

[GMRZ10] Parikshit Gopalan, Raghu Meka, Omer Reingold, and David Zuckerman. Pseudorandom generators for combinatorial shapes. Technical Report TR10-176, Electronic Colloquium on Computational Complexity, 2010.

[INW94] Russell Impagliazzo, Noam Nisan, and Avi Wigderson. Pseudorandomness for network algorithms. In Proceedings of the 26th ACM Symposium on Theory of Computing, pages 356–364, 1994.

[IW97] Russell Impagliazzo and Avi Wigderson. P = BPP unless E has sub-exponential circuits. In Proceedings of the 29th ACM Symposium on Theory of Computing, pages 220–229, 1997.

[KI04] Valentine Kabanets and Russell Impagliazzo. Derandomizing polynomial identity tests means proving circuit lower bounds. Computational Complexity, 13(1-2):1–46, 2004.

[KNP10] Michal Koucký, Prajakta Nimbhorkar, and Pavel Pudlák. Pseudorandomness for group products. Technical Report TR10-113, Electronic Colloquium on Computational Complexity, 2010.

[LRTV09] Shachar Lovett, Omer Reingold, Luca Trevisan, and Salil P. Vadhan. Pseudorandom bit generators that fool modular sums. In Proceedings of APPROX-RANDOM, pages 615–630, 2009.

[Lu02] Chi-Jen Lu. Improved pseudorandom generators for combinatorial rectangles. Combinatorica, 22(3):417–434, 2002.

[MZ09] Raghu Meka and David Zuckerman. Small-bias spaces for group products. In Proceedings of APPROX-RANDOM, pages 658–672, 2009.

[Nis92] Noam Nisan. Pseudorandom generators for space bounded computation. Combinatorica, 12(4):449–461, 1992. Preliminary version in Proc. of STOC'90.

[NW94] Noam Nisan and Avi Wigderson. Hardness vs. randomness. Journal of Computer and System Sciences, 49:149–167, 1994. Preliminary version in Proc. of FOCS'88.

[RR99] Ran Raz and Omer Reingold. On recycling randomness in space bounded computation. In Proceedings of the 31st ACM Symposium on Theory of Computing, pages 159–168, 1999.

[RVW00] Omer Reingold, Salil P. Vadhan, and Avi Wigderson. Entropy waves, the zig-zag graph product, and new constant-degree expanders and extractors. In Proceedings of the 41st IEEE Symposium on Foundations of Computer Science, 2000.

[Sav70] Walter J. Savitch. Relationships between nondeterministic and deterministic tape complexities. Journal of Computer and System Sciences, 4(2):177–192, 1970.

[Tel05] Constantin Teleman. Lecture notes on representation theory. http://www.math.berkeley.edu/~teleman, 2005.

[vv10] Jiří Šíma and Stanislav Žák. A polynomial time construction of a hitting set for read-once branching programs of width 3. Technical Report TR10-088, Electronic Colloquium on Computational Complexity, 2010.

[Yao82] Andrew C. Yao. Theory and applications of trapdoor functions. In Proceedings of the 23rd IEEE Symposium on Foundations of Computer Science, pages 80–91, 1982.