Complexity Theory
Lecture 12
Lecturer: Moni Naor
Recap
Last week:
Hardness and Randomness
Semi-random sources
Extractors
This Week:
Finish Hardness and Randomness
Circuit Complexity
The class NC
Formulas = NC^1
Lower bound for Andreev’s function
Communication characterization of depth
Derandomization
A major research question:
• How to make the construction of
– small sample spaces `resembling' large ones
– hitting sets
efficient.
Successful approach: randomness from hardness
– (cryptographic) pseudo-random generators
– complexity-oriented pseudo-random generators
Extending the result
Theorem: If E contains 2^{Ω(n)}-unapproximable functions, then BPP = P.
• The assumption is an average-case one
• Based on non-uniformity
Improvement:
Theorem: If E contains functions that require circuits of size 2^{Ω(n)} (for the worst case), then E contains 2^{Ω(n)}-unapproximable functions.
Corollary: If E requires exponential-size circuits, then BPP = P.
How to extend the result
• Recall the worst-case to average-case reduction for the permanent
• The idea: encode the function in a form that allows translating a few worst-case errors into random errors
Properties of a code
We want a code
C : {0,1}^{2^n} → {0,1}^{2^ℓ}
where:
• 2^ℓ is polynomial in 2^n
• C is polynomial-time computable
– efficient encoding
– certain local decoding properties
Codes and Hardness
• Use for the worst-case to average-case reduction:
the truth table of f : {0,1}^n → {0,1} (worst-case hard) is the message, e.g.
m_f: 0 1 1 0 0 0 1 0
and the truth table of f′ : {0,1}^ℓ → {0,1} (average-case hard) is its encoding, e.g.
C(m_f): 0 1 1 0 0 0 1 0 0 0 0 1 0
Codes and Hardness
• If 2^ℓ is polynomial in 2^n, then f ∈ E implies f′ ∈ E
• Want to be able to prove:
if f′ is s′-approximable, then f is computable by a circuit of size s = poly(s′)
Codes and Hardness
Key point: a circuit C that approximates f′ implicitly defines a received word R_C that is not far from C(m_f):
R_C:    0 0 1 0 1 0 1 0 0 0 1 0 0
C(m_f): 0 1 1 0 0 0 1 0 0 0 0 1 0
• Want the decoding procedure D, given oracle access to C, to compute f exactly.
This requires a special notion of efficient decoding.
Decoding requirements
• Want that
– for any received word R that is not far from C(m),
– for any input bit 1 ≤ i ≤ 2^n,
we can reconstruct m(i) with probability 2/3 by accessing only poly(n) locations in R.
Example of a code with good local decoding properties: the Hadamard code (but it has exponential length; a decoding sketch follows below).
This gives a probabilistic circuit for f of size
poly(n) · size(C) + the size of the decoding circuit.
Since probabilistic circuits have deterministic versions of similar size: contradiction.
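For concreteness, here is a minimal sketch of local decoding for the Hadamard code just mentioned, where the codeword is C(m)[x] = ⟨m, x⟩ mod 2; the decoder below and its `trials` parameter are illustrative assumptions, not part of the lecture's construction.

    import random

    def hadamard_decode_bit(R, n, i, trials=9):
        # Recover m_i = <m, e_i> by self-correction: for random x,
        # C(m)[x] xor C(m)[x xor e_i] = m_i, and if R agrees with C(m)
        # on a 1 - delta fraction, each trial errs w.p. at most 2*delta.
        votes = 0
        for _ in range(trials):
            x = random.getrandbits(n)        # random position in {0,1}^n
            votes += R[x] ^ R[x ^ (1 << i)]  # two queries to the received word
        return int(2 * votes > trials)       # majority vote

Each call reads only 2·trials positions of R, matching the "few locations" requirement above.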
Extractor
• Extractor: a universal procedure for "purifying" an imperfect source:
[Diagram: a source string, drawn from a set of 2^k strings in {0,1}^n, and a truly random t-bit seed enter Ext, which outputs m near-uniform bits.]
– The function Ext(x, y) should be efficiently computable
– the truly random seed acts as a "catalyst"
– Parameters: (n, k, m, t, ε)
Extractor: Definition
(k, ε)-extractor: for all random variables X with min-entropy k:
– the output fools all tests T:
|Pr_z[T(z) = 1] − Pr_{x←X, y∈_R{0,1}^t}[T(Ext(x, y)) = 1]| ≤ ε
– equivalently, the distributions Ext(X, U_t) and U_m are ε-close (L1 distance ≤ 2ε),
where U_m is the uniform distribution on {0,1}^m
• Comparison to Pseudo-Random Generators
– output of PRG fools all efficient tests
– output of extractor fools all tests
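To make ε-closeness concrete: the best distinguishing advantage of any test T equals half the L1 distance, which can be checked brute-force over a small explicit domain. A toy illustration only, with hypothetical dictionary inputs:

    def statistical_distance(p, q):
        # p, q: dicts mapping outcomes to probabilities over the same finite domain.
        # Half the L1 distance = the best distinguishing advantage of any test T.
        domain = set(p) | set(q)
        return sum(abs(p.get(z, 0.0) - q.get(z, 0.0)) for z in domain) / 2

In these terms, Ext is a (k, ε)-extractor iff statistical_distance(Ext(X, U_t), U_m) ≤ ε for every X with min-entropy k.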
Extractors: Applications
• Using extractors
– use output in place of randomness in any application
– alters probability of any outcome by at most ε
• Main motivation:
– use output in place of randomness in algorithm
– how to get truly random seed?
– enumerate all seeds, take majority
Extractor as a Graph
[Diagram: a bipartite graph with left side {0,1}^n, right side {0,1}^m, and left-degree 2^t.
Want: every subset of size 2^k on the left sees almost all of the right-hand side with nearly equal probability.]
Extractors: desired parameters
[Diagram as above: a source string from a set of 2^k strings in {0,1}^n and a t-bit seed enter Ext, which outputs m near-uniform bits.]
• Goals (good / optimal):
– short seed: O(log n) / log n + O(1)
– long output: m = k^{Ω(1)} / m = k + t − O(1)
– many k's: k = n^{Ω(1)} / any k = k(n)
"Good" already allows going over all seeds.
Extractors
• A random construction for Ext achieves the optimal parameters!
– but we need explicit constructions
• otherwise we cannot derandomize BPP
– an optimal explicit construction of extractors is still open
• Trevisan Extractor:
– idea: any string defines a function
• a string C of length ℓ defines a function f_C : {1,…,ℓ} → {0,1} by f_C(i) = C[i]
– use the NW generator with the source string in place of the hard function
From complexity to combinatorics!
Trevisan Extractor
• Tools:
– An error-correcting code
C : {0,1}^n → {0,1}^ℓ
• distance between codewords: (½ − ¼m^{−4})ℓ
– important: in any ball of radius ½ − δ there are at most 1/δ² codewords, for δ = ½m^{−2}
• blocklength ℓ = poly(n)
• polynomial-time encoding
– decoding time does not matter
– An (a, h)-design S_1, S_2, …, S_m ⊆ {1,…,t} where
• h = log ℓ
• a = δ·log n/3
• t = O(log ℓ)
• Construction:
Ext(x, y) = C(x)[y|_{S_1}] ◦ C(x)[y|_{S_2}] ◦ … ◦ C(x)[y|_{S_m}]
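A minimal sketch of the output map just defined, assuming the encoded string c = C(x) and the design sets are handed to us (building the code and the design is the substantial part not shown here):

    def restrict(y_bits, S):
        # Project the seed onto the positions in S; the |S| = log ℓ
        # projected bits are read as an index into the codeword.
        idx = 0
        for pos in S:
            idx = (idx << 1) | y_bits[pos]
        return idx

    def trevisan_ext(c, design, y_bits):
        # Output bit i is the codeword bit selected by the seed restricted to S_i.
        return [c[restrict(y_bits, S)] for S in design]

For example, with c of length 8 and design = [[0, 1, 2], [2, 3, 4]] over a 5-bit seed, each output bit reads a single position of c; the overlaps between the sets S_i are what the design keeps small.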
Trevisan Extractor
Ext(x, y) = C(x)[y|_{S_1}] ◦ C(x)[y|_{S_2}] ◦ … ◦ C(x)[y|_{S_m}]
[Diagram: the seed y, restricted to each S_i, selects one position of the codeword C(x).]
Theorem: Ext is an extractor for min-entropy k = n^δ, with
– output length m = k^{1/3}
– seed length t = O(log ℓ) = O(log n)
– error ε ≤ 1/m
Proof of Trevisan Extractor
Assume X, a random variable over {0,1}^n with min-entropy k, fails to ε-pass a statistical test T:
|Pr_z[T(z) = 1] − Pr_{x←X, y∈_R{0,1}^t}[T(Ext(x, y)) = 1]| > ε
By the usual hybrid argument, there is a predictor A and an index 1 ≤ i ≤ m:
Pr_{x←X, y∈_R{0,1}^t}[A(Ext(x, y)_{1…i−1}) = Ext(x, y)_i] > ½ + ε/m
The set for which A predicts well:
consider the set B of x's such that
Pr_{y∈_R{0,1}^t}[A(Ext(x, y)_{1…i−1}) = Ext(x, y)_i] > ½ + ε/2m
By averaging, Pr_x[x ∈ B] ≥ ε/2m.
Since X has min-entropy k:
there are at least (ε/2m)·2^k different x ∈ B.
The contradiction will be by showing a succinct encoding for each x ∈ B.
…Proof of Trevisan Extractor
i, A and B are fixed.
If we fix the bits outside of S_i to constants and let y′ vary over all possible assignments to the bits in S_i, then
Ext(x, y′)_i = C(x)[y′|_{S_i}] = C(x)[y′]
goes over all the bits of C(x).
For every x ∈ B there is a short description of a string z close to C(x):
– fix the bits outside of S_i to constants preserving the advantage:
Pr_{y′}[A(Ext(x, y′)_{1…i−1}) = C(x)[y′]] > ½ + ε/(2m)
where the constants assigned to {1,…,t}\S_i are chosen to maximize the advantage of A
– for j ≠ i, as y′ varies, y′|_{S_j} varies over only 2^a values!
– so we can provide (i−1) tables of 2^a values each to supply Ext(x, y′)_{1…i−1}
Trevisan Extractor
[Diagram: the predictor A, given y′ ∈ {0,1}^{log ℓ} and the (i−1) tables, outputs C(x)[y′] with probability ½ + ε/(2m) over y′. This is the short description of a string z agreeing with C(x).]
…Proof of Trevisan Extractor
Up to (m−1) tables of size 2^a describe a string z that has ½ + ε/(2m) agreement with C(x).
• Number of codewords of C agreeing with z on a ½ + ε/(2m) fraction of the places: O(1/δ²) = O(m^4), by the Johnson bound below.
Given z, there are at most O(m^4) corresponding x's.
• Number of strings z with such a description:
2^{(m−1)·2^a} = 2^{n^{2δ/3}} = 2^{k^{2/3}}
• So the total number of x ∈ B is at most
O(m^4) · 2^{k^{2/3}} << (ε/2m)·2^k,
which is a contradiction.
Johnson Bound:
a binary code with distance (½ − δ²)n has at most O(1/δ²) codewords in any ball of radius (½ − δ)n.
• C has minimum distance (½ − ¼m^{−4})ℓ = (½ − δ²)ℓ.
Conclusion
• Given a source of n random bits with min-entropy k = n^{Ω(1)},
it is possible to run any BPP algorithm with it and obtain the correct answer with high probability.
Application: strong error reduction
• L ∈ BPP if there is a p.p.t. TM M:
x ∈ L ⇒ Pr_y[M(x, y) accepts] ≥ 2/3
x ∉ L ⇒ Pr_y[M(x, y) rejects] ≥ 2/3
• Want:
x ∈ L ⇒ Pr_y[M(x, y) accepts] ≥ 1 − 2^{−k}
x ∉ L ⇒ Pr_y[M(x, y) rejects] ≥ 1 − 2^{−k}
• Already know how: repeat O(k) times and take the majority
– uses n = O(k)·|y| random bits;
of the 2^n strings, 2^{n−k} can be bad
Strong error reduction
Better: let Ext be an extractor for k = |y|³ = n^δ, ε < 1/6
– pick a random w ∈_R {0,1}^n
– run M(x, Ext(w, z)) for all z ∈ {0,1}^t
• take the majority of the answers
– call w "bad" if maj_z M(x, Ext(w, z)) is incorrect, i.e. if
|Pr_z[M(x, Ext(w, z)) = b] − Pr_y[M(x, y) = b]| ≥ 1/6
– extractor property: at most 2^k bad w's
– n random bits; only 2^{n^δ} bad strings
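A minimal sketch of this recipe, treating the machine M and the extractor ext as given black boxes (both interfaces are assumptions for illustration):

    def amplified_M(M, ext, x, w, t):
        # w: one n-bit sample from the weak source; z ranges over all 2^t seeds.
        votes = sum(M(x, ext(w, z)) for z in range(2 ** t))
        return int(2 * votes > 2 ** t)  # majority of the 2^t runs

Since t = O(log n), the 2^t runs are only polynomially many, and the answer is wrong only if w landed in the small set of bad strings.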
Strong error reduction
[Diagram: the extractor graph again. Left side {0,1}^n: all strings w for running the original randomized algorithm; right side {0,1}^m; left-degree 2^t. The bad strings for an input are those w for which the majority of the 2^t neighbors make M err. The bad right-hand strings are at most a 1/4 fraction, and every subset of size 2^k on the left sees almost all of the right-hand side with nearly equal probability, so the set of bad w's has size less than 2^k.]
Two Surveys on Extractors
• Nisan and Ta-Shma, Extracting Randomness: A Survey and New Constructions, 1999 (predates Trevisan)
• Shaltiel, Recent developments in Extractors, 2002,
www.wisdom.weizmann.ac.il/~ronens/papers/survey.ps
Some of the slides based on C. Umans course:
www.cs.caltech.edu/~umans/cs151-sp04/index.html
Circuit Complexity
• We will consider several issues regarding circuit
complexity
Parallelism
• Refinement of polynomial time via (uniform) circuits:
– depth ↔ parallel time
– size ↔ parallel work
The depth of a circuit is the length of the longest path from input to output;
it represents the circuit's latency.
Parallelism
• the NC hierarchy (of logspace-uniform circuits):
NC^k = O(log^k n)-depth, poly(n)-size circuits with bounded fan-in (2)
NC = ∪_k NC^k
What is NC^0?
• Aim: to capture efficiently parallelizable problems
• Not realistic?
– overly generous in size
– Does not capture all aspects of parallelism
– But does capture latency
• Sufficient for proving (presumed) lower bounds on best latency
Matrix Multiplication
n×n matrix A · n×n matrix B = n×n matrix AB
• Parallel complexity of this problem?
– work = poly(n)
– time = log^k(n)?
• which k?
Matrix Multiplication
arithmetic matrix multiplication:
A = (a_{i,k}), B = (b_{k,j}), (AB)_{i,j} = Σ_k (a_{i,k} × b_{k,j})
… vs. Boolean matrix multiplication:
A = (a_{i,k}), B = (b_{k,j}), (AB)_{i,j} = ∨_k (a_{i,k} ∧ b_{k,j})
– single output bit, to make matrix multiplication a language: on input A, B, (i, j), output (AB)_{i,j}
Matrix Multiplication
• Boolean Matrix Multiplication is in NC^1
– level 1: compute n ANDs a_{i,k} ∧ b_{k,j}
– next log n levels: a tree of ORs
– n² subtrees, one for each pair (i, j)
– select the correct one and output (a sequential sketch follows)
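Written sequentially, the NC^1 structure of one output entry is just an OR of n ANDs; the balanced OR tree of depth log n is what `any` below flattens. A toy sketch:

    def bool_mm_entry(A, B, i, j):
        # (AB)_{i,j} = OR over k of (a_{i,k} AND b_{k,j}): one AND level,
        # then a balanced OR tree of depth log n in the circuit.
        n = len(A)
        return int(any(A[i][k] and B[k][j] for k in range(n)))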
Boolean formulas and NC1
• The circuit for Boolean Matrix Multiplication is actually a formula.
– Formula: fan-out 1; the circuit looks like a tree.
This is no accident:
Theorem: L ∈ NC^1 iff L is decidable by a polynomial-size uniform family of Boolean formulas.
Boolean formulas and NC1
from small-depth circuits to formulas
• Proof:
– convert the NC^1 circuit into a formula
• recursively, duplicating shared subcircuits
– note: a logspace transformation
• stack depth log n, stack record 1 bit: "left" or "right"
Boolean formulas and NC1
from formulas to small-depth circuits
– convert a formula of size n into a formula of depth O(log n)
• note: size ≤ 2^{depth}, so the new formula has poly(n) size
key transformation: pick a subtree D of the formula C and write
C ≡ (D ∧ C_{D=1}) ∨ (¬D ∧ C_{D=0})
where C_b denotes C with the subtree D replaced by the constant b.
Boolean formulas and NC1
– take D to be any minimal subtree with size at least n/3
• minimality implies size(D) ≤ 2n/3
– define T(n) = the maximum depth required for any size-n formula
– C_1, C_0, D are all of size ≤ 2n/3, so
T(n) ≤ T(2n/3) + 3
which implies T(n) ≤ O(log n)
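Unrolling the recurrence makes the bound explicit (a one-line derivation in LaTeX):

    T(n) \le T\!\left(\tfrac{2n}{3}\right) + 3
         \le 3k + T\!\left(n\left(\tfrac{2}{3}\right)^{k}\right)
    \;\Longrightarrow\;
    T(n) \le 3\log_{3/2} n + O(1) = O(\log n).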
Relation to other classes
• Clearly NC ⊆ P
– P = uniform poly-size circuits
• NC^1 ⊆ Logspace:
on input x, compose logspace algorithms for:
• generating C_{|x|}
• converting it to a formula
• FVAL(C_{|x|}, x)
– FVAL: given a formula and an assignment, what is the value of the output?
logspace composes!
Relation to other classes
• NL ⊆ NC²:
Claim: Directed S-T-CONN ∈ NC²
– given a directed graph G = (V, E) and vertices s, t
– let A = the adjacency matrix (with self-loops)
– (A²)_{i,j} = 1 iff there is a path of length at most 2 from node i to node j (Boolean MM)
– (A^n)_{i,j} = 1 iff there is a path of length at most n from node i to node j
– compute, with a depth-log n tree of Boolean matrix multiplications, the matrix A^n, and output entry (s, t)
• repeated squaring!
– log² n depth in total
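A sequential sketch of the same idea; each squaring below corresponds to one Boolean matrix multiplication layer (log n of them, each of depth log n in the circuit):

    def st_connected(adj, s, t):
        n = len(adj)
        # self-loops make A^n capture all paths of length <= n
        A = [[bool(adj[i][j]) or i == j for j in range(n)] for i in range(n)]
        length = 1
        while length < n:  # ceil(log2 n) Boolean squarings
            A = [[any(A[i][k] and A[k][j] for k in range(n))
                  for j in range(n)] for i in range(n)]
            length *= 2
        return A[s][t]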
NC vs. P
Can every efficient algorithm be efficiently parallelized?
That is, does NC = P?
• Common belief: NC ⊊ P
P-Completeness
A language L is P-complete if:
• L ∈ P
• any other language in P is reducible to L via a logspace reduction
P-complete problems are the least likely to be parallelizable:
if a P-complete problem is in NC, then P = NC
– we use logspace reductions to show a problem P-complete, and we have seen that Logspace ⊆ NC
Some P-Complete Problems
• CVAL – the Circuit Value Problem
– given a circuit and an assignment, what is the value of the output of the circuit?
– the canonical P-complete problem
• Lexicographically first maximal independent set
• Linear Programming
• Finding a happy coloring of a graph
NC vs. P
Can every uniform, poly-size Boolean circuit family be converted into a uniform, poly-size Boolean formula family?
That is, does NC^1 = P?
Is the NC hierarchy proper:
is it true that NC^i ⊊ NC^{i+1} for all i?
Define AC^k = O(log^k n)-depth, poly(n)-size circuits with unbounded fan-in ∧ and ∨ gates.
Is the following true:
AC^i ⊊ NC^{i+1} ⊊ AC^{i+1}?
Lower bounds
• Recall:
"NP does not have polynomial-size circuits" (NP ⊄ P/poly) implies P ≠ NP
• Major goal: prove lower bounds on (non-uniform) circuit size for problems in NP
– belief: an exponential lower bound holds
• a super-polynomial lower bound is enough for P ≠ NP
– best bound known: 4.5n
• we don't even have super-polynomial bounds for problems in NEXP!
Lower bounds
• Lots of work on lower bounds for restricted classes of circuits:
• Formulas
– the out-degree of each gate is 1
• Monotone circuits
– no NOT gates (not even at the input level)
• Constant-depth circuits
– polynomial size but unbounded fan-in
Counting argument for formulas
• Frustrating fact: almost all functions require huge formulas.
Theorem [Shannon]: with probability at least 1 − o(1), a random function
f : {0,1}^n → {0,1}
requires a formula of size Ω(2^n/log n).
Shannon’s counting argument
• Proof (counting):
– B(n) = 2^{2^n} = # functions f : {0,1}^n → {0,1}
– # formulas with n inputs and size s is at most
F(n, s) ≤ 4^s · 2^s · (n+2)^s
(4^s binary trees with s internal nodes, 2 gate choices per internal node, n+2 choices per leaf)
Shannon’s counting argument
– F(n, c·2^n/log n) < (16n)^{c·2^n/log n}
= 2^{(c·2^n/log n)·(log n + 4)}
= 2^{c·2^n·(1 + o(1))}
< o(1)·2^{2^n}  (if c ≤ ½)
Probability that a random function has a formula of size s = (½)·2^n/log n is at most
F(n, s)/B(n) < o(1).
Andreev’s function
• Best lower bound for formulas:
Theorem (Andreev, Håstad '93): the Andreev function requires (∧,∨,¬)-formulas of size at least
Ω(n^{3−o(1)}).
Andreev’s function
The Andreev function A(x, y), A : {0,1}^{2n} → {0,1}:
[Diagram: the n bits of x are split into log n blocks of n/log n bits each; each block feeds an XOR; the log n XOR outputs form an index at which a selector reads the bit y_i of the n-bit string y.]
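A direct implementation of the function as just described (assuming, for simplicity, that n is a power of two and log n divides n); a toy sketch:

    import math

    def andreev(x, y):
        # x, y: lists of n bits each. XOR each of the log n blocks of x
        # (n/log n bits per block) to form an index; output that bit of y.
        n = len(y)
        b = int(math.log2(n))   # number of blocks = number of index bits
        block = n // b          # bits per block
        idx = 0
        for j in range(b):
            parity = 0
            for bit in x[j * block:(j + 1) * block]:
                parity ^= bit
            idx = (idx << 1) | parity
        return y[idx]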
Andreev’s function
Theorem: the Andreev function requires (∧,∨,¬)-formulas of size at least Ω(n^{3−o(1)}).
First we show Ω(n^{2−o(1)}).
Two important ideas:
• random restrictions
• using the existential counting lower bound on a smaller domain
General Strategy
Restrict the function and show:
• this must simplify the formula (a lot),
but
• the remaining function is still quite complex, so it needs a relatively large formula.
Conclude: we must have started with a large formula.
Definition: L(f) = the smallest (∧,∨,¬)-formula computing f
– measured as leaf-size
– directly related to formula size
Random restrictions
Key idea: given a function
f : {0,1}^n → {0,1},
restrict it by ρ to get f_ρ
– ρ sets some variables to 0/1; the others remain free
• R(n, m) = the set of restrictions that leave m variables free (a sampling sketch appears after the claim below)
Random restrictions
Claim: Let m = εn. Then
E_{ρ∈R(n, εn)}[L(f_ρ)] ≤ ε·L(f)
– each leaf survives with probability ε
• the formula may shrink even more:
– constants propagate
What happens to the XOR of a subset of variables under a random restriction:
– if at least one member of the XOR survives, the XOR is not fixed and can obtain both values
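Sampling from R(n, m) is straightforward; a minimal sketch (the dict representation is an illustrative choice):

    import random

    def sample_restriction(n, m):
        # Choose which m variables stay free; fix each of the rest to a
        # uniformly random constant. Returns {fixed variable index: 0/1}.
        free = set(random.sample(range(n), m))
        return {i: random.randint(0, 1) for i in range(n) if i not in free}

Applying ρ to a formula replaces the fixed leaves by constants, which then propagate upward and may erase further gates.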
Using the shrinkage result:
from the counting argument, there exists a function
h : {0,1}^{log n} → {0,1}
for which L(h) > n/(2 log log n).
– hardwire the truth table of that function into y to get the function A*(x); the selector-over-XORs structure of A (as in the diagram above) remains
– apply a random restriction from the set
R(n, m = 2(log n)(ln log n))
to A*(x).
The lower bound
– probability that a particular XOR is killed by the restriction:
• probability that we "miss it" all m times:
(1 − (n/log n)/n)^m ≤ (1 − 1/log n)^m ≤ (1/e)^{2 ln log n} ≤ 1/log² n
– probability that even one of the log n XORs is killed by the restriction is at most:
log n · (1/log² n) = 1/log n < ½.
The lower bound
– probability that even one of the XORs is killed by the restriction is at most:
log n · (1/log² n) = 1/log n < ½
– by Markov's inequality:
Pr[L(A*_ρ) > 2·E_{ρ∈R(n, m)}[L(A*_ρ)]] < ½
Conclusion: for some restriction ρ′ both events happen:
• all XORs survive, and
• L(A*_{ρ′}) ≤ 2·E_{ρ∈R(n, m)}[L(A*_ρ)]
The lower bound
– if all XORs survive, we can restrict the formula further so that it computes the hard function h
• may need additional ¬'s (these are free)
L(h) = n/(2 log log n) ≤ L(A*_{ρ′})
≤ 2·E_{ρ∈R(n, m)}[L(A*_ρ)] ≤ O((m/n)·L(A*))
≤ O(((log n)(ln log n)/n)^1 · L(A*))
– Conclude: Ω(n^{2−o(1)}) ≤ L(A*) ≤ L(A).
Shrinkage factor
Random restrictions and shrinkage factors
• Recall:
E_{ρ∈R(n, εn)}[L(f_ρ)] ≤ ε·L(f)
– each leaf survives with probability ε
But the formula may shrink even more by propagating constants.
Lemma [Håstad 93]: for all f,
E_{ρ∈R(n, εn)}[L(f_ρ)] ≤ O(ε^{2−o(1)}·L(f))
The lower bound with new shrinkage factor
– if all XORs survive, we can restrict the formula further so that it computes the hard function h
• may need to add ¬'s
L(h) = n/(2 log log n) ≤ L(A*_{ρ′})
≤ 2·E_{ρ∈R(n, m)}[L(A*_ρ)] ≤ O((m/n)^{2−o(1)}·L(A*))
≤ O(((log n)(ln log n)/n)^{2−o(1)} · L(A*))
– Conclude: Ω(n^{3−o(1)}) ≤ L(A*) ≤ L(A).
What can be done in NC^1
• Addition of two numbers, each of n bits
– in fact, this can be done in AC^0 (see the carry-lookahead sketch below)
• Adding n bits
– can compute majority or threshold
– something that cannot be done in AC^0
• Multiplication
– reduces to adding n numbers
• Division
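A sketch of the carry-lookahead idea behind the AC^0 claim for addition: every carry bit is one unbounded fan-in OR of unbounded fan-in ANDs over "generate" and "propagate" signals. The bit encoding and helper names below are illustrative assumptions:

    def ac0_add(a, b):
        # a, b: little-endian bit lists of equal length n.
        n = len(a)
        g = [a[i] & b[i] for i in range(n)]  # carry generated at position i
        p = [a[i] | b[i] for i in range(n)]  # carry propagated across position i
        carry = [0] * (n + 1)
        for i in range(1, n + 1):
            # carry into i: some j < i generates, and all of j+1..i-1 propagate
            carry[i] = int(any(g[j] and all(p[k] for k in range(j + 1, i))
                               for j in range(i)))
        return [a[i] ^ b[i] ^ carry[i] for i in range(n)] + [carry[n]]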
Two different characterizations of NC^1
• Through communication complexity
• Through branching programs
More on Depth: a communication
complexity characterization of depth
• For a Boolean function f : {0,1}^n → {0,1} let
– X = f^{−1}(1)
– Y = f^{−1}(0)
Consider the relation R_f ⊆ X × Y × {1,…,n} where
(x, y, i) ∈ R_f iff x_i ≠ y_i
• For monotone Boolean functions define
M_f ⊆ X × Y × {1,…,n}
where
(x, y, i) ∈ M_f iff x_i = 1 and y_i = 0
A communication complexity
characterization of depth
• What is D(R_f), the communication complexity of R_f, assuming
– Alice has x ∈ X = f^{−1}(1)
– Bob has y ∈ Y = f^{−1}(0)?
Lemma: Let C be a circuit for f. Then
D(R_f) ≤ depth(C).
Lemma: Let C be a monotone circuit for f. Then
D(M_f) ≤ depth(C).
From circuits to protocols
both monotone and non-monotone case
• At each ∨ gate, Alice says which of the two input wires to the gate is '1' under x
– if both are '1', she picks one
– this wire must be '0' under y
• At each ∧ gate, Bob says which of the two input wires to the gate is '0' under y
– if both are '0', he picks one
– this wire must be '1' under x
At the leaf reached, they find an i such that x_i ≠ y_i;
if the circuit is monotone, then we know that x_i = 1 and y_i = 0.
Invariant maintained for the subformula considered: Alice's assignment yields '1' and Bob's assignment yields '0'.
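A minimal sketch of this walk down the formula, with the invariant tracked implicitly; the tuple encoding of formulas is an illustrative assumption, with negations pushed to the leaves:

    def evaluate(node, z):
        kind = node[0]
        if kind == 'VAR':
            return z[node[1]]
        if kind == 'NVAR':                    # negated leaf
            return 1 - z[node[1]]
        l, r = evaluate(node[1], z), evaluate(node[2], z)
        return (l and r) if kind == 'AND' else (l or r)

    def kw_protocol(node, x, y):
        # Invariant: the current subformula is 1 under x and 0 under y.
        kind = node[0]
        if kind in ('VAR', 'NVAR'):
            return node[1]                    # leaf: here x and y must differ
        if kind == 'OR':                      # Alice names a child that is 1 under x
            child = node[1] if evaluate(node[1], x) else node[2]
        else:                                 # AND: Bob names a child that is 0 under y
            child = node[1] if not evaluate(node[1], y) else node[2]
        return kw_protocol(child, x, y)

One bit of communication per level names the chosen child, so the cost is exactly the formula depth.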
From protocols to circuits
both monotone and non-monotone case
Lemma: Let P be a protocol for R_f. Then there is a formula of depth C(P).
Label:
• Alice's moves with ∨
• Bob's moves with ∧
• a leaf with rectangle A × B and output i with either z_i or ¬z_i
[Diagram: the protocol tree, with internal nodes labeled ∨/∧ and leaves labeled z_0, z_1, …, z_7, read as a formula.]
A communication complexity
characterization of depth
Theorem: D(R_f) = depth(f)
Theorem: for any monotone function f,
D(M_f) = depth_monotone(f)
Applications:
• depth_monotone(STCON) = Θ(log² n)
• depth_monotone(matching) = Ω(n)
Example: Majority
• Input to Alice: x_1, x_2, …, x_n such that the majority are 1
• Input to Bob: y_1, y_2, …, y_n such that the majority are 0
Partition the input into two halves, x_1, …, x_{n/2} and x_{n/2+1}, …, x_n;
exchange the number of 1's on each half and recurse on a half where Alice's count exceeds Bob's.
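A sketch of this recursion as a single computation (in the real protocol the counts are exchanged as messages, and the binary search maintains the invariant that Alice's count of 1's exceeds Bob's on the current interval):

    def majority_game(x, y):
        # Returns i with x_i = 1 and y_i = 0, i.e. solves the monotone relation M_f.
        lo, hi = 0, len(x)
        while hi - lo > 1:
            mid = (lo + hi) // 2
            # if the invariant fails on the left half, it must hold on the right
            if sum(x[lo:mid]) > sum(y[lo:mid]):
                hi = mid
            else:
                lo = mid
        return lo  # single position left: x[lo] = 1, y[lo] = 0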