Worst case to Average case
Reductions for Polynomials
Shachar Lovett
Weizmann Institute / Microsoft Research
Joint work with Tali Kaufman
A motivating example
• Let p(x1,…,xn)=f(x1,…,xn)g(x1,…,xn)
– f,g generic polynomials of degree d over F2
• p(x) is a biased degree 2d polynomial
– Pr[p(x)=0] ~ 3/4
• Reason: p is a biased function of f,g
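A quick numerical sanity check of this example (a minimal sketch, not part of the original slides; the parameters n, d, samples and the helper functions random_poly/evaluate are my own choices):

# Minimal sketch: estimate Pr[f(x)g(x)=0] for random degree-d polynomials f, g over F_2.
import itertools, random

n, d, samples = 10, 3, 5000

def random_poly(n, d):
    # A random polynomial: each monomial of degree <= d is included with probability 1/2.
    monomials = [m for k in range(d + 1) for m in itertools.combinations(range(n), k)]
    return [m for m in monomials if random.random() < 0.5]

def evaluate(poly, x):
    # Evaluate over F_2: count the monomials whose variables are all 1, mod 2.
    return sum(all(x[i] for i in m) for m in poly) % 2

f, g = random_poly(n, d), random_poly(n, d)
hits = 0
for _ in range(samples):
    x = [random.randint(0, 1) for _ in range(n)]
    if evaluate(f, x) * evaluate(g, x) == 0:
        hits += 1
print("estimated Pr[f(x)g(x)=0] =", hits / samples)   # typically close to 3/4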
Another motivating example
• p(x)=MAJ(f(x),g(x),h(x))
– f,g,h generic polynomials of degree d
• p(x) is unbiased of degree 2d
• p(x) can be approximated by a lower degree
polynomial
– Pr[p(x)=f(x)] ~ 3/4
• Reason: p is a (unbiased) function of f,g,h
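The 3/4 agreement can be read off a truth table, assuming the values f(x), g(x), h(x) behave like independent unbiased bits (a minimal sketch, not from the original slides):

# MAJ(f,g,h) agrees with f on exactly 6 of the 8 assignments of (f,g,h).
from itertools import product

def maj(a, b, c):
    return 1 if a + b + c >= 2 else 0

agree = sum(maj(f, g, h) == f for f, g, h in product((0, 1), repeat=3))
print(agree / 8)   # 0.75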
Is this generally true?
• Let p(x1,…,xn) be a degree d polynomial
– d is constant
• Assume that:
– p is biased: Pr[p(x)=0] = ½ + ε
or
– p can be approximated by a lower degree polynomial:
Pr[p(x)=f(x)] = ½ + ε
• Can we deduce a structure theorem for p?
Warm Up – Quadratics
• Assume p(x) is quadratic
• Dixon’s Theorem:
– p(x) = l1(x)·l2(x) + l3(x)·l4(x) + … + l_{r-1}(x)·l_r(x)  ( + l_{r+1}(x) )
– l1,…,l_{r+1} linear and independent
– All the non-zero Fourier coefficients of p(x) have absolute value 2^{-r/2}
• Assume that:
– Pr[p(x)=0] = ½ + ε (p is biased)
or
– Pr[p(x)=l(x)] = ½ + ε for some linear l(x) (p is approximated by a linear function)
• Then p(x) has a Fourier coefficient of absolute value ≥ 2ε
⇒ r = O(log(1/ε))
• ⇒ p(x) is a function of O(log(1/ε)) linear functions
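A concrete instance (not from the original slides): for p(x) = x1·x2 + x3·x4 we have r = 4, and a brute-force Fourier transform confirms that every nonzero coefficient has absolute value 2^{-r/2} = 1/4. The helper names below are my own.

# Brute-force Fourier coefficients of the quadratic p(x) = x1*x2 + x3*x4 over F_2^4.
from itertools import product

n = 4

def p(x):
    return (x[0] * x[1] + x[2] * x[3]) % 2

def fourier_coefficient(alpha):
    # \hat{p}(alpha) = E_x[(-1)^(p(x) + <alpha,x>)]
    total = 0
    for x in product((0, 1), repeat=n):
        chi = sum(a * b for a, b in zip(alpha, x)) % 2
        total += (-1) ** ((p(x) + chi) % 2)
    return total / 2 ** n

coeffs = [fourier_coefficient(alpha) for alpha in product((0, 1), repeat=n)]
print(sorted(set(abs(c) for c in coeffs)))   # [0.25]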
Main theorem: general degrees
• p(x1,…,xn) of degree d
• Assume that:
– Pr[p(x)=0] = ½ + ε
or
– Pr[p(x)=f(x)] = ½ + ε for some f with deg(f) ≤ d-1
• Then: p(x) = C(f1(x),…,fk(x))
– f1,…,fk: polynomials of degree at most d-1
– C: any combiner function
– k depends only on d, ε – independent of n !
Functions of polynomials:
Computation vs. Approximation
• pn(x1,...,xn) - family of degree d polynomials
• The following models are equivalent:
– pn can be computed by a constant number of lower degree
polynomials:
pn(x) = Cn(fn(1)(x),…,fn(k)(x)),
deg(fn(1)),…,deg(fn(k)) ≤ d-1, k = const
– pn can be approximated by a constant number of lower degree
polynomials:
Pr[pn(x) = Cn(fn(1)(x),…,fn(k)(x))] ≥ ½ + ε,
deg(fn(1)),…,deg(fn(k)) ≤ d-1, k, ε const
Example for applications
• S4(x1,…,xn) – symmetric polynomial of degree 4
S4(x1,…,xn) = Σ_{i<j<k<l} xi·xj·xk·xl
• To refute the Inverse Conjecture for the Gowers
Norm, needed to prove:
– For any cubic f(x), Pr[S4(x)=f(x)] ≤ ½ + o(1)
• Given our theorem, enough to prove:
– S4 cannot be computed by a constant number of cubics
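As a side remark (not from the original slides): over F2, S4(x) equals C(|x|,4) mod 2, which by Lucas' theorem is just bit 2 of the Hamming weight |x|. A small brute-force check (helper names are my own):

# S4 as the elementary symmetric polynomial of degree 4 vs. bit 2 of the Hamming weight.
from itertools import combinations, product

def s4_naive(x):
    return sum(x[i] * x[j] * x[k] * x[l]
               for i, j, k, l in combinations(range(len(x)), 4)) % 2

def s4_weight(x):
    return (sum(x) >> 2) & 1   # bit 2 of |x|, by Lucas' theorem

assert all(s4_naive(x) == s4_weight(x) for x in product((0, 1), repeat=8))
print("S4 check passed on all inputs of length 8")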
Bias implies low rank
• Let p(x1,…,xn) be a degree d polynomial
• Bias(p) = E[(-1)^p(x)] = Pr[p(x)=0] - Pr[p(x)=1]
– Bias – a measure for the distance of p(x) from uniformity
– Pr[p(x)=0] = ½ + ε ⇒ bias(p) = 2ε
• Rankd-1(p) = min k s.t.
p(x)=C(f(1)(x),…,f(k)(x))
– f(1)(x),…,f(k)(x) of degree ≤ d-1
– C: F2^k → F2 any combiner function
• Theorem: Bias implies low rank
|Bias(p)| ≥ ε ⇒ rank_{d-1}(p) ≤ k(d,ε)
Bias ⇒ Approximation
• “Bias implies low rank” is enough for the general theorem
• Assume: Pr[p(x)=f(x)] ≥ ½ + ε, deg(p)=d, deg(f) ≤ d-1
• Then: Bias(p-f) ≥ 2ε
⇒ p(x)-f(x) = C(f(1)(x),…,f(k)(x)) with deg(f(1)),…,deg(f(k)) ≤ d-1
⇒ p(x) = f(x) + C(f(1)(x),…,f(k)(x))
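(Spelling out the first step: Bias(p-f) = E[(-1)^{p(x)-f(x)}] = Pr[p(x)=f(x)] - Pr[p(x)≠f(x)] = 2·Pr[p(x)=f(x)] - 1 ≥ 2ε.)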
• Assume: Pr[p(x)= C(g(1)(x),…,g(k)(x))] ≥ ½ + ε,
deg(g(i)) ≤ d-1
• Then: there are a1,…,ak ∈ F2 s.t.
bias(p(x) - (a1g(1)(x)+…+akg(k)(x))) ≥ ε·2^{-O(k)}
⇒ p(x) - (a1g(1)(x)+…+akg(k)(x)) = C(f(1)(x),…,f(k')(x))
Bias implies low rank
• Recall:
– p(x1,…,xn) is a degree d polynomial
– Bias(p) = E[(-1)^p(x)]
– Rank_{d-1}(p) = min k : p(x)=C(f(1)(x),…,f(k)(x)), deg(f(i)) ≤ d-1
– We want: |Bias(p)| ≥ ε ⇒ rank_{d-1}(p) ≤ k(d,ε)
• Green & Tao prove this when d < |F|
– Used to prove the Inverse Conjecture for the Gowers Norm in this case
– However, the ICGN is false when d >> |F|
– They conjectured that “bias implies low rank” holds even if d >> |F|
• We prove “bias implies low rank” for all constant degrees
– Following the Green & Tao proof, with one major change
• Proof by induction on d
– d=1: Trivial – any biased linear function is in fact constant
First step: bias amplification
• Assume: bias(p(x)) ≥ ε
• We will generate degree d-1 polynomials f(1)(x),…,f(k)(x) s.t.
Pr_x[p(x)=C(f(1)(x),…,f(k)(x))] ≥ 1 - δ
– k = k(δ, ε)
– Will use with δ = 2^{-O(d)}
• Derivatives: pa(x) = p(x+a) – p(x), a ∈ F2^n
– pa(x) of degree d-1
• Proof: Fix x, and consider Ea[(-1)^{pa(x)+p(x)}]
(1) Ea[(-1)^{pa(x)+p(x)}] = Ea[(-1)^{p(x+a)}] = bias(p)
(2) Ea[(-1)^{pa(x)+p(x)}] = (-1)^{p(x)}·Ea[(-1)^{pa(x)}]
First step: bias amplification
(-1)^{p(x)}·Ea[(-1)^{pa(x)}] = Ea[(-1)^{pa(x)+p(x)}] = bias(p)
⇒ (-1)^{p(x)} = (1/bias(p))·Ea[(-1)^{pa(x)}]
• Sampling:
Pr_{x,a1,…,ak}[(-1)^{p(x)} = sign(Σ_{i=1}^{k} (-1)^{p_{ai}(x)})] ≥ 1 - δ
• ⇒ There exist a1,…,ak s.t.
Pr_x[p(x) = MAJ(p_{a1}(x),…,p_{ak}(x))] ≥ 1 - δ
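A minimal numerical sketch of this majority-of-derivatives step (not from the original slides; the toy biased polynomial p(x)=x1·x2, the parameters n, k, samples and the helper functions are my own choices):

# For a biased p over F_2, the majority vote of a few derivatives p_a(x) predicts p(x)
# on most inputs.
import random

n, k, samples = 12, 15, 5000

def p(x):
    # Toy biased polynomial: Pr[p(x)=0] = 3/4, so bias(p) = 1/2.
    return (x[0] * x[1]) % 2

def derivative(a, x):
    # p_a(x) = p(x+a) + p(x) over F_2 (a lower-degree polynomial in x).
    return (p([(xi + ai) % 2 for xi, ai in zip(x, a)]) + p(x)) % 2

shifts = [[random.randint(0, 1) for _ in range(n)] for _ in range(k)]
agree = 0
for _ in range(samples):
    x = [random.randint(0, 1) for _ in range(n)]
    votes = sum(derivative(a, x) for a in shifts)   # number of derivatives equal to 1
    maj = 1 if 2 * votes > k else 0
    if maj == p(x):
        agree += 1
print("Pr[p(x) = MAJ of derivatives] ~", agree / samples)   # typically close to 1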
First step: bias amplification
Pr_x[p(x) = MAJ(f(1)(x),…,f(k)(x))] ≥ 1 - 2^{-O(d)}
• Originally – Lemma of Bogdanov & Viola
– Used to build PRG for low degree polynomials
• We will prove:
– If f(1),…,f(k) are “random enough”, then in fact
p(x) = MAJ(f(1)(x),…,f(k)(x)) for all x ∈ F2^n
– Otherwise we “make them random enough”
– The derivatives f(1),…,f(k) have degree ≤ d-1
⇒ use “bias implies low rank” inductively
• You can think of this as a generalization of:
– q(x) of degree d-1, Pr[p(x)=q(x)] > 1 - 2^{-d} ⇒ p=q
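The fact behind this bullet is that a nonzero polynomial of degree ≤ d over F2 is nonzero on at least a 2^{-d} fraction of the points (apply it to p-q). A brute-force check of this fact for small parameters (not from the original slides; n, d and the helper functions are my own choices):

# Every nonzero polynomial of degree <= d over F_2^n is nonzero on >= 2^(-d) of the points.
from itertools import combinations, product

n, d = 4, 2
monomials = [m for k in range(d + 1) for m in combinations(range(n), k)]
points = list(product((0, 1), repeat=n))

def evaluate(coeffs, x):
    return sum(c * all(x[i] for i in m) for c, m in zip(coeffs, monomials)) % 2

min_fraction = min(
    sum(evaluate(coeffs, x) for x in points) / len(points)
    for coeffs in product((0, 1), repeat=len(monomials))
    if any(coeffs)
)
print(min_fraction, ">=", 2 ** -d)   # 0.25 >= 0.25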
Partitioning the space
Pr_x[p(x) = MAJ(f(1)(x),…,f(k)(x))] ≥ 1 - 2^{-O(d)}
• f(1),…,f(k) partition the space F2^n into 2^k “equal” regions
[Figure: F2^n drawn as a square and split into 2^k regions according to the values of f(1)(x), f(2)(x), …, f(k)(x)]
Partitioning the space
Pr_x[p(x) = MAJ(f(1)(x),…,f(k)(x))] ≥ 1 - 2^{-O(d)}
• F assigns a value to each region
• p is equal to F almost everywhere
⇒ On most regions, p is almost constant (and equal to F)
[Figure legend: good areas – p(x)=F(x); bad areas – p(x)≠F(x)]
Good & bad regions
• Good regions: the bad areas are very small (a 2^{-O(d)} fraction)
• Almost all regions are good (a 1 – 2^{-O(d)} fraction)
Proof strategy
The proof has two steps:
1. Good regions are excellent: p=F on all points in region
(i.e. p(x) is constant on good regions)
2. Assuming almost all regions are excellent,
we will prove all regions are excellent
(i.e. p(x) is constant on all regions)
Proof: Step 1
• We will prove: p(x) is constant on good regions
• Let R be a good region
– p(x) = F(R) (= const) for almost all x ∈ R
• Let x0 ∈ R be arbitrary
– We will prove: p(x0) = F(R)
• We will use:
– p is a low degree polynomial
– Regions defined by lower degree polynomials
– Induction on “bias implies low rank”
The derivatives identity
• p(x) of degree d
• Derivatives reduce degree:
– py(x) = p(x+y)-p(x) of degree d-1
– p_{y1,…,yd+1}(x) ≡ 0
• Thus we have the identity:
p(x) = Σ_{∅≠S⊆[d+1]} (-1)^{|S|+1} p(x + Σ_{i∈S} yi)
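A minimal sketch verifying this identity numerically over F2, where the signs disappear (not from the original slides; the parameters and helper functions are my own choices):

# Check: for a degree-d polynomial over F_2, p(x) = sum over nonempty S in [d+1] of p(x + sum_{i in S} y_i).
import itertools, random

n, d, trials = 8, 3, 200

def random_poly(n, d):
    monomials = [m for k in range(d + 1) for m in itertools.combinations(range(n), k)]
    return [m for m in monomials if random.random() < 0.5]

def evaluate(poly, x):
    return sum(all(x[i] for i in m) for m in poly) % 2

p = random_poly(n, d)
for _ in range(trials):
    x = [random.randint(0, 1) for _ in range(n)]
    ys = [[random.randint(0, 1) for _ in range(n)] for _ in range(d + 1)]
    total = 0
    for r in range(1, d + 2):
        for S in itertools.combinations(range(d + 1), r):
            shifted = [(xi + sum(ys[i][j] for i in S)) % 2 for j, xi in enumerate(x)]
            total += evaluate(p, shifted)
    assert total % 2 == evaluate(p, x)
print("derivatives identity verified on", trials, "random inputs")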
Using the derivatives identity
p(x) = Σ_{∅≠S⊆[d+1]} (-1)^{|S|+1} p(x + Σ_{i∈S} yi)
• Let R be a good region
• Take an arbitrary x0 ∈ R
• Assume there are y1,…,yd+1 s.t. x0 + Σ_{i∈S} yi is in the “good part” of R, for all non-empty S
• Then p(x0 + Σ_{i∈S} yi) = F(R) for all non-empty S
• Then also: p(x0) = F(R) !
Using the derivatives identity
p(x0) = p(x0+y1+y2) + p(x0+y1) + p(x0+y2)   (over F2)
[Figure: the points x0, x0+y1, x0+y2, x0+y1+y2 in the partitioned space, all landing in good areas]
Getting all the points in R
• We need to show: ∃ y1,…,yd+1 such that x0 + Σ_{i∈S} yi ∈ R
– In fact, we need them to be in the “good part” of R
– We will handle this later
• Solution:
– choose y1,…,yd+1 randomly and uniformly
– show the required condition occurs with positive probability
• Each separate variable x0 + Σ_{i∈S} yi is uniform in F2^n
• Problem: handling the dependencies
Structure of regions
• We need: x0 + Σ_{i∈S} yi ∈ R (for all non-empty S)
• Recall: regions are defined by the polynomials f(1),f(2),…,f(k)
– R = {x ∈ F2^n: f(1)(x)=c1, f(2)(x)=c2,…}   (c1,c2,… ∈ F2)
• So, we actually need:
{ f(j)(x0 + Σ_{i∈S} yi) = cj }  for j=1..k and all non-empty S ⊆ [d+1]
• We need to find “randomness” conditions on the f(j) s.t. all the events are “almost independent”
– Thus all occur simultaneously with positive probability
Randomness conditions (1st attempt)
• We need: for any x0, if y1,…,yd+1 are uniform, then the set of random variables
{ f(j)(x0 + Σ_{i∈S} yi) : j=1..k, S ⊆ [d+1], S non-empty }
is almost independent
• Actually, this can never be true
• Reason: the f(j) are polynomials of degree ≤ d-1
– Taking d derivatives zeroes out f(j) ⇒ the random variables are linearly dependent
– These dependencies can be handled
Randomness conditions (2nd attempt)
• We need: for any x0, if y1,…,yd+1 are uniform, then the set of random variables
{ f(j)(x0 + Σ_{i∈S} yi) : j=1..k, S ⊆ [d+1], S non-empty, |S| ≤ deg(f(j)) }
is almost independent
• It turns out it is enough to prove this for random x
– Proof: Cauchy-Schwarz
• Even for random x, this can still be false
– Reason: non-linear dependencies
Non-linear dependencies
• Example 1: f(1) decomposes: f(1)(x) = g(x)·h(x)
– ⇒ f(1) is biased ⇒ f(1)(x) is not uniform
• Example 2: a derivative of f(1) decomposes:
f(1)_{y1,y2}(x) = f(1)(x) - f(1)(x+y1) - f(1)(x+y2) + f(1)(x+y1+y2) = g(x,y1,y2)·h(x,y1,y2)
– ⇒ f(1)(x) - f(1)(x+y1) - f(1)(x+y2) + f(1)(x+y1+y2) is biased
– ⇒ {f(1)(x), f(1)(x+y1), f(1)(x+y2), f(1)(x+y1+y2)} is not uniform
Solving non-linear dependencies
Example 1: f(1)(x) is ε-far from uniform
• ⇒ Bias(f(1)(x)) ≥ ε
• Degree of f(1) ≤ d-1
– We can use induction on “bias implies low rank”
• ⇒ Decompose f(1) into a constant number of lower-degree polynomials
Solving non-linear dependencies
• f(1)(x) = G(g(1)(x),…,g(t)(x))
– deg(g(i)) ≤ deg(f(1)) – 1
• f(1),…,f(k) were used to approximate p(x):
– Pr[MAJ(f(1)(x),…,f(k)(x)) = p(x)] ≥ 1 – 2^{-O(d)}
• Replace f(1) by g(1),…,g(t):
– Pr[MAJ(G(g(1)(x),…,g(t)(x)), f(2)(x),…,f(k)(x)) = p(x)] ≥ 1 – 2^{-O(d)}
– Replace MAJ and G by a single combiner function C1:
Pr[C1(g(1)(x),…,g(t)(x), f(2)(x),…,f(k)(x)) = p(x)] ≥ 1 – 2^{-O(d)}
• ⇒ We got a set of “smaller degree” polynomials approximating p
Solving non-linear dependencies
Example 2: f(1)(x) - f(1)(x+y1) - f(1)(x+y2) + f(1)(x+y1+y2) is ε-far from uniform
• ⇒ Bias(f(1)(x) + f(1)(x+y1) + f(1)(x+y2) + f(1)(x+y1+y2)) ≥ ε
• f(1)(x) - f(1)(x+y1) - f(1)(x+y2) + f(1)(x+y1+y2) is a polynomial of degree ≤ d-1 (in the variables x, y1, y2)
– Again, we can use induction
• Here we deviate from the original Green & Tao proof
– In large fields, it is enough to consider just Example 1
Solving non-linear dependencies
• General solution: if {f(j)(x + Σ_{i∈S} yi)} is non-uniform:
– Find a biased linear combination
– Decompose it
• In each step, we replace one polynomial with a constant number of smaller-degree polynomials
• We choose what counts as “non-uniform” adaptively:
– If we currently have T polynomials, we require all linear combinations to have bias ≤ 2^{-O(T)} in order to call them close to uniform
⇒ The required bias is a function of T
• Still, the process stops after finitely many steps
– We end with a constant number of polynomials
Getting back to the big picture
• The proof has two steps:
1. Good regions are excellent: p=F on all points in region
(i.e. p(x) is constant on good regions)
2. Assuming almost all regions are excellent,
we will prove all regions are excellent
(i.e. p(x) is constant on all regions)
Proof of step 1
• Let x0 be in a good region R
• Using the “randomness” of f(1),…,f(k)
– {x0 + Σ_{i∈S} yi ∈ R} are almost independent events
– The joint event occurs with positive probability
• In fact, the points x0 + Σ_{i∈S} yi are almost pairwise-independent, even given that they all lie in R
• Since R is good, with positive probability, they all lie in
the “good part” of R
– Proof: union bound
Proof of step 2
• Assume almost all regions are excellent
– i.e. p is constant on these regions
• Let x0,x1 be in a (bad) region R
– We need to show: p(x0)=p(x1)
• Consider the points x0 + Σ_{i∈S} yi and x1 + Σ_{i∈S} yi
• Assume that for any non-empty S:
– Region(x0 + Σ_{i∈S} yi) = Region(x1 + Σ_{i∈S} yi)
– Assume also this is an excellent region
• Then p(x0 + Σ_{i∈S} yi) = p(x1 + Σ_{i∈S} yi) for all non-empty S
⇒ p(x0) = p(x1)
• For random y1,…,yd+1, this happens with positive probability
Proof of step 2
[Figure: the pairs of points x0 + Σ_{i∈S} yi and x1 + Σ_{i∈S} yi land in the same (excellent) regions]
Summary of results
Bias implies low rank
• If:
– p(x1,…,xn) degree d polynomial over F
– The distribution of p(x) is ε-far from uniform
• Then:
p(x)=C(f(1)(x),…,f(k)(x))
deg(f(i)) ≤ d-1, k=k(F,d,ε)
• In fact:
f(j) are derivatives of p:
f(j)(x) = p_{aj}(x) = p(x+aj) - p(x)
Summary of results
Approximation and computation are equivalent
• If:
Pr[p(x) = C(g(1)(x),…,g(k)(x))] ≥ 1/|F| + ε
deg(g(i)) ≤ d-1
• Then:
p(x) = C’(f(1)(x),…,f(k’)(x))
deg(f(i)) ≤ d-1, k’=k’(F,d,ε,k)
• In fact:
f(j)(x) = p_{aj}(x)  or  f(j)(x) = g(j)(x+aj)
Open problems
• Give a good bound on the constants
– Even in the case of d<<|F|
• Given p, can we compute its rank?
• Assume p(x) = f1(x)·f2(x) + f3(x)·f4(x)
– Can we find f1,…,f4 efficiently?
– Or is it NP-hard?
• Generalization to other “constant-depth” models
– E.g. constant-depth circuits