TEL-AVIV UNIVERSITY
RAYMOND AND BEVERLY SACKLER
FACULTY OF EXACT SCIENCES
BLAVATNIK SCHOOL OF COMPUTER SCIENCE
Boolean functions whose Fourier
transform is concentrated on pair-wise
disjoint subsets of the inputs
Thesis submitted in partial fulfillment of the requirements for the M.Sc.
degree in the School of Computer Science, Tel-Aviv University
by
Aviad Rubinstein
The research for this thesis has been carried out at Tel-Aviv University
under the supervision of Prof. Muli Safra
October 2012
Acknowledgements
I would like to thank:
Muli for teaching me so much about math and life and for giving more than I could have
asked for;
Jarett Schwartz for comments on an earlier version of this thesis;
Shai Vardi for help with editing;
and my family and friends for everything else.
Abstract
We consider Boolean functions f : {±1}m → {±1} that are close to a sum of independent
functions {fj } on mutually exclusive subsets of the variables {Ij } ⊆ P ([m]). We prove that
any such function is close to just a single function fk on a single subset Ik .
We also consider Boolean functions f : Rn → {±1} that are close, with respect to any
product distribution over Rn , to a sum of their variables. We prove that any such function is
close to one of the variables.
Both our results are independent of the number of variables, but depend on the variance of
f. I.e., if f is (ε · Var f)-close to a sum of independent functions or random variables, then it
is O(ε)-close to one of the independent functions or random variables, respectively. We prove
that this dependence on Var f is tight.
Our results are a generalization of [16], who proved a similar statement for functions
f : {±1}ⁿ → {±1} that are close to a linear combination of uniformly distributed Boolean
variables.
Contents

1 Introduction
  1.1 The long code and related works
  1.2 Our results
  1.3 Organization
2 Preliminaries
  2.1 Fourier transform over {±1}ⁿ
  2.2 L2-squared semi-metric
  2.3 Variance
3 Related Works
  3.1 Related Works on Low Influence Functions
  3.2 Related Works on Almost Linear Functions
4 Our Main Results
  4.1 From our results to the FKN Theorem
5 High-Level Outline of the Proof
6 Proofs
  6.1 From variance of absolute value to variance of absolute value of sum: proof of lemma 13
  6.2 From variance of absolute value of sum to variance: proof of lemma 12
  6.3 Proof of the main theorem
  6.4 Proof of the extension to FKN Theorem
7 Proofs of technical claims
  7.1 Convexity of the variance function: proof of claim 16
  7.2 Expected squared distance: proof of claim 17
  7.3 Constant absolute value: proof of claim 18
8 Tightness of results
  8.1 Tightness of the main result
  8.2 Tightness of lemma 13
9 Discussion
  9.1 Hypercontractivity
  9.2 Open problems
    9.2.1 Interesting Conjectures
Chapter 1
Introduction
We consider n-variate Boolean functions f : {±1}ⁿ → {±1}. The study of Boolean functions
has prospered in recent decades thanks to its importance in many areas of mathematics and
theoretical computer science.
For example, Boolean functions can be used to model voting mechanisms. Consider a
population of n voters, where each can vote in favor (+1) or against (−1) some action, party,
candidate, etc. A voting mechanism f takes these voters' preferences (a binary string of length
n) and aggregates them into a single communal decision. Examples of interesting voting mechanisms include:
• dictatorship - the outcome is determined by one voter's vote (corresponding to f(X) = xᵢ);
• majority - the outcome is decided in accordance with the majority of the population (corresponding to f(X) = sign(∑ᵢ xᵢ));
• and tribes [5] - the population is divided into smaller "tribes"; the decision is accepted if at least one tribe is unanimously in favor of accepting. (A small code sketch of these three mechanisms follows below.)
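For concreteness, here is a minimal Python sketch of the three mechanisms above (the tribe size and the uniform random votes are illustrative choices, not prescribed by the thesis):

```python
import numpy as np

def dictatorship(x, i=0):
    # The outcome is the i-th voter's vote.
    return x[i]

def majority(x):
    # Sign of the sum of the votes (assumes an odd number of voters).
    return np.sign(np.sum(x))

def tribes(x, tribe_size=3):
    # Partition the voters into tribes; accept (+1) iff some tribe
    # votes +1 unanimously.
    x = np.asarray(x).reshape(-1, tribe_size)
    return 1 if np.any(np.all(x == 1, axis=1)) else -1

votes = np.random.choice([-1, 1], size=9)
print(dictatorship(votes), majority(votes), tribes(votes))
```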
How do we compare mechanisms? Which of these mechanisms is the best? Can we come up
with a better mechanism? In attempts to improve our understanding of these questions, several
properties of voting mechanisms have been studied, such as:
• balance - we say that a mechanism is balanced if, given that the votes are chosen uniformly at random, each outcome (+1 or −1) is equally likely;
• stability - we say that a mechanism is stable if independent random changes in voters’
votes are unlikely to change the outcome of the vote;
• the influence of the ith voter is the likelihood that a change in her vote will change the
final outcome of the mechanism;
• and junta - we say that a mechanism is a junta if its outcome depends on only a small number of votes. (Each of these properties can be estimated by simple sampling; see the sketch below.)
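The following sketch (illustrative; the sample sizes, the noise rate delta, and the choice of majority as the test function are assumptions of the example, not the thesis's definitions) estimates balance, the influence of voter 0, and stability by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(0)
n, samples = 9, 100_000
maj = lambda x: np.sign(x.sum(axis=-1))

X = rng.choice([-1, 1], size=(samples, n))
balance = maj(X).mean()                      # ~0 for a balanced mechanism

# Influence of voter 0: probability that flipping vote 0 flips the outcome.
Xi = X.copy(); Xi[:, 0] *= -1
influence_0 = np.mean(maj(X) != maj(Xi))

# Stability: rerandomize each vote independently with probability delta
# and check how often the outcome survives.
delta = 0.1
mask = rng.random((samples, n)) < delta
Y = np.where(mask, rng.choice([-1, 1], size=(samples, n)), X)
stability = np.mean(maj(X) == maj(Y))
print(balance, influence_0, stability)
```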
Using deep insights from Fourier analysis, interesting conclusions can be drawn about these
voting mechanisms. For example, it can be mathematically proven that dictatorship is the balanced voting mechanism that maximizes stability. On the other hand, tribes is the balanced
function that minimizes the maximal individual influence [22]. Finally, majority achieves both
properties, as it maximizes stability amongst all balanced functions with low maximal individual influence [32].
Results on Boolean functions also find many applications in learning theory, where one
often wishes to learn the behavior of an unknown Boolean function. A practical example may
be training a Boolean classifier that decides whether a given n-bit string (e-mail) should be
classified as positive (an important email) or negative (spam).
Boolean functions come up again in random graph theory: the adjacency matrix of a
graph can be viewed as a |V|²-bit string, and one can ask a variety of Boolean-valued questions about the graph's properties, such as: is the graph connected? Is it k-colorable? Is it
λ-expanding? Etc.
1.1 The long code and related works
One of the most important historical driving forces in the study of Boolean functions has been
their applications to testing of error correcting codes [7]. In particular, the long code [4] can be
viewed as evaluations of dictatorship functions. Each codeword in the long code corresponds
to the evaluation of a dictatorship f(X) = xᵢ on all the points of the n-dimensional Boolean
hypercube. Indeed, the long code is highly inefficient - since there are only n possible
dictatorships, the long code encodes log n bits of information in a 2ⁿ-bit codeword. Despite its
low rate, the long code is an important tool in many results on hardness of approximation and
probabilistically checkable proofs (such as [4, 19, 18, 27, 11, 26, 9, 8, 25, 2]).
The great virtue of the long code is that it is a locally testable code: It is possible to distinguish, with high probability, between a legal codeword and a string that is far from any legal
codeword, by querying just a few random bits of the string. Naturally, this is a highly desirable
property when constructing probabilistically checkable proofs, which are proofs that must be
verified by reading a few random bits of the proof.
To see that the long code is indeed locally testable, recall that every word is encoded by
the evaluations of a dictatorship. A naive approach to testing dictatorship would first find the
dictator variable and then go through all the other variables to make sure that the function does
not depend on them. Each query to a Boolean function can give at most one bit of information
about the function. Since there are n candidate variables, finding the dictator variable requires
at least log n queries. Clearly, this naive strategy could not be implemented using just a few
queries.
A function is called anti-symmetric (sometimes also “odd” or “folded”) if it satisfies f (X) =
−f (−X) for every input X. For example, the dictatorship f (X) = xi is anti-symmetric.
Observe that any anti-symmetric function is in particular balanced: For uniformly distributed
Boolean random inputs, X and −X are equally likely, so the probability that f (X) = 1 is the
same as the probability that f (−X) = 1, which by anti-symmetry is equal to the probability
that f (X) = −1.
Instead of using naive testing for dictatorship, recall that dictatorship is the stablest balanced
Boolean function; thus in particular it is the stablest anti-symmetric Boolean function. Stability
and anti-symmetry are local properties that one can test by querying the function at only two
points. A simple test for anti-symmetry and stability takes a random point X and a close
point Y , and verifies that f (X) = −f (−Y ). If f is stable, then f (X) = f (Y ) with high
probability; if f is anti-symmetric then f(Y) = −f(−Y) for all Y. Thus a stable anti-symmetric function would pass the test with high probability, whereas any function that is far
from stable and anti-symmetric would not.
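A minimal sketch of this two-query test (the noise rate and the number of trials are illustrative assumptions; the acceptance condition f(X) = −f(−Y) is the one described above):

```python
import numpy as np

def two_query_test(f, n, noise=0.05, rng=np.random.default_rng(1)):
    # Pick a random X and a nearby Y, and check f(X) == -f(-Y):
    # a stable anti-symmetric function passes with high probability.
    X = rng.choice([-1, 1], size=n)
    flip = rng.random(n) < noise
    Y = np.where(flip, -X, X)
    return f(X) == -f(-Y)

dictator = lambda x: x[0]
passes = np.mean([two_query_test(dictator, 20) for _ in range(10_000)])
print(passes)  # close to 1: dictatorship is stable and anti-symmetric
```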
Since our test only probabilistically approximates the stability, we cannot guarantee that
the function tested is an exact dictatorship. However, a theorem of Bourgain says that it is
sufficient to know that an anti-symmetric function has a relatively high stability to conclude
that it depends almost completely on a small number of variables.
Theorem. (Bourgain’s Junta Theorem, Informal [7]) Every balanced Boolean function with
high stability is close to a junta.
In many applications of the long code, knowing that the codeword is close to a junta suffices,
because it allows us to list-decode the codeword. In general, list-decoding means computing a
short list of candidate codewords which are close to a given string; in the context of the long
code, list-decoding means producing a short list of candidate variables with high influence that
may be close to dictators.
Another property, closely related to stability, is the total influence of a function, i.e. the
sum of the influences of its variables. Intuitively the total influence answers the question “on
average, how many different bits / voters can I change, such that each change (separately)
would cause a change in the output of the function / mechanism?” Hence the total influence is
sometimes also called average sensitivity.
Unlike stability, the total influence is not a local property of a function in the same sense
of property testing. In fact even for monotone functions, approximating the total influence
requires more than a few queries [34]. Rather the total influence is local in the sense that it can
be forced to be low using local constraints (e.g. see the construction in [11]).
Among balanced functions, dictatorships minimize the total influence. For some applications of the long code (e.g. [11, 8, 25, 2]) the total influence rather than the stability is bounded
in order to show that a function is close to dictatorship. The following theorem by Friedgut can
be used:
Theorem. (Friedgut’s Lemma, Informal [13]) Every balanced Boolean function with low total
influence is close to a junta.
Every function over the n-dimensional Boolean hypercube can also be represented as a
polynomial in the variables {x₁, x₂, . . . , xₙ}. The dictator function f(X) = xᵢ is the only
balanced Boolean function that is linear¹. Using local queries, it is also possible to estimate
whether a Boolean function is close to a linear one. These properties also can be used by
long-code testers [9] together with the following theorem by Friedgut, Kalai, and Naor:
Theorem. (FKN Theorem, Informal [16]) Every balanced Boolean function that is almost
linear is almost a dictatorship.
Intuitively, one may expect such results to be true, because a linear combination that is
"well-spread" among many independent variables (i.e. far from a dictatorship of one variable)
should be distributed similarly to a "bell-curved" Gaussian; in particular, it should be far from
the ±1 distribution of a Boolean function, which is bimodal, i.e. has two distinct modes or
"peaks" at −1 and +1.

¹ Throughout the paper we allow "linear" functions to have a nonzero constant term. In other contexts these
functions are called "affine" to distinguish them from functions that do not have a constant term.
Often, it is easy to show that a function behaves "nicely" in a local sense. The three theorems mentioned above, as well as many others in the field, share a common intuitive theme:
if a Boolean function satisfies a local property (high stability, low total influence, closeness to
linear, etc.), then it must be simple in a global sense. This ability to deduce the global properties
of a function from its local behaviour is useful in many applications in social choice theory,
learning theory, graph theory, complexity theory, and more.
1.2 Our results
In this work we extend the intuition from the FKN Theorem that a well-spread sum of independent
variables must be far from Boolean. In particular we ask the following questions:
1. What happens when the variables are not uniformly distributed over {±1}? In particular,
we consider variables which are not even Boolean or symmetric.
In a social choice setting, it may be intuitive to consider a mechanism that takes into
account how strong each voter's preference is. For example, in some countries the elections are known to be highly influenced by the donations the candidates manage to collect
("argentocracy").
In the context of computational complexity, Boolean analysis theorems that consider
non-uniform distributions have proven very useful. In particular, Dinur and Safra use the
p-biased long code in their proof of NP-hardness of approximation of the Vertex Cover
problem [11]. In the p-biased long code each codeword corresponds to a dictatorship, in
a population where each voter independently chooses −1 with probability 0 < p < ½
and +1 with probability 1 − p. Friedgut's extension of his Lemma to such non-uniform
product distributions was key to Dinur and Safra's proof.
In this work we prove that even when the variables are not uniformly distributed over
{±1}, every Boolean function that is close to their sum must be close to one of them:
Theorem. (Theorem 9 for balanced functions, Informal) Every balanced Boolean function that is almost a linear combination of independent variables (not necessarily Boolean
or symmetric) is almost a dictatorship.
2. What happens when rather than a sum of variables, we have a sum of functions over
independent subsets of Boolean variables?
In a social choice setting, it may be intuitive to consider a situation where the population
is divided into tribes; each tribe has an arbitrarily complex internal mechanism, but the
outcomes of all the tribes must be aggregated into one communal decision by a simple
(i.e. almost linear) mechanism.
Our main hope is that this setting will also find interesting applications in computational
theory settings where such special structures arise.
Observe that this question is tightly related to the previous question, because each arbitrary function over a subset of Boolean variables can be viewed as an arbitrarily-distributed random variable.
In this work we prove that any balanced function that is close to a sum of functions over
independent subsets of its variables is almost completely determined by a function on a
single subset:
Theorem. (Corollary 10 for balanced functions, Informal) Every balanced Boolean function that is close to a sum of functions on mutually exclusive subsets of the population is
close to a dictatorship by one subset of the population.
As we will see later, the precise statement of the FKN Theorem does not require the Boolean
function to be balanced. If we do not require the function to be balanced, there is an obvious
exception to the theorem - the constant functions f(X) = 1 and f(X) = −1 are not dictatorships but are considered linear. The general statement of the FKN Theorem says that a Boolean
function that is almost linear is either almost a dictatorship or almost a constant function. More
precisely, it says that the distance² of any Boolean function from the nearest linear (not necessarily Boolean) function is smaller by at most a constant multiplicative factor than the distance
from either a dictatorship or a constant function.
One may hope to extend this relaxation to non-Boolean random variables or subsets of
Boolean random variables. E.g. we would like to claim that the distance of any Boolean
function from the nearest function on a single subset or constant function is at most a constant
factor times its distance from a sum of functions on mutually exclusive subsets of the variables.
However, it turns out that this is not the case - in Lemma 11 we show that this naive extension
of the FKN Theorem is false!
The variance of a Boolean function measures how far it is from a constant (either −1 or
1). For example, the variance of any balanced Boolean function is 1, whereas any constant
function has a variance of 0. In order to extend our results to non-balanced Boolean functions,
we have to correct for the low variance. In Theorem 9 and Corollary 10 we prove that the above
two theorems extend to non-balanced functions relative to the variance:
Theorem. (Theorem 9, Informal) Every Boolean function that is (ε × variance)-close to a sum
of independent variables (not necessarily Boolean or symmetric) is ε-close to a dictatorship.

Theorem. (Corollary 10, Informal) Every Boolean function that is (ε × variance)-close
to a sum of functions on mutually exclusive subsets of the population is ε-close to a
dictatorship by one subset of the population.
Intuitively these amendments to our main theorems mean that in order to prove that a
Boolean function is close to a dictatorship, we must show that it is very close to linear.
Finally, in Lemma 11 we show that this dependence on the variance is necessary and tight.
1.3 Organization
We begin with some preliminaries in section 2. In section 3 we give a brief survey of related
works. In section 4 we formally state our results. In section 5 we give an intuitive sketch of
the proof strategy. The interesting ingredients of the proof appear in section 6, whereas some
of the more tedious case analyses are postponed to section 7. Tightness for some of the results
is shown in section 8. Finally, in section 9 we make some concluding comments and discuss
possible extensions.

² For ease of introduction, we use the word "distance" in an intuitive manner throughout this section.
However, formally we will use the squared-L2 semidistance. See Section 2.2 for more details.
Chapter 2
Preliminaries
2.1 Fourier transform over {±1}ⁿ
Every function over the n-dimensional Boolean hypercube can also be represented as a polynomial in the n-tuple of variables X = (x₁, x₂, . . . , xₙ). Since each variable can take only two
possible values, it suffices to consider only the zeroth and first powers of each variable. In particular, notice that for xᵢ ∈ {±1} any even power satisfies xᵢ^{2k} = 1, while any odd power
satisfies xᵢ^{2k+1} = xᵢ. Thus, we can represent any function f : {±1}ⁿ → ℝ as a multilinear
polynomial in {x₁, x₂, . . . , xₙ}, i.e. a linear combination of monomials, called characters, of the
form χ_S(X) = ∏_{i∈S} xᵢ for every subset S of [n] = {1, 2, . . . , n}. We denote the coefficient of
χ_S by f̂(S); this is called the Fourier coefficient of the subset S. Thus we have the Fourier
representation of f:

f(X) = ∑_{S⊆[n]} f̂(S) χ_S(X) = ∑_{S⊆[n]} f̂(S) ∏_{i∈S} xᵢ
The main tool in analysis of Boolean functions is the study of their Fourier coefficients. All
the terms mentioned in the introduction with respect to voting mechanisms can be redefined in
terms of the Fourier coefficients; for example, a function is balanced iff its empty character has
a zero coefficient (f̂(φ) = 0).
The Fourier representation is particularly useful because the Fourier characters form an
orthonormal basis for the real- (or complex-) valued functions over {±1}ⁿ:

⟨χ_S, χ_T⟩ = E_X[χ_S(X) χ_T(X)] = E_X[χ_{SΔT}(X)] = 1 if S = T, and 0 if S ≠ T
The Fourier coefficients are given by the projection of the function onto this basis:

f̂(S) = E_{X∈{±1}ⁿ}[f(X) · χ_S(X)]
Note that by Parseval's Identity, for every Boolean function, the squares of the Fourier
coefficients sum to one.

Fact 1. For f : {±1}ⁿ → {±1},

∑_{S⊆[n]} f̂(S)² = 1
This fact is particularly useful because it means that the Fourier weights, {f̂(S)²}_{S⊆[n]},
define a probability distribution over P([n]).
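For concreteness, the projection formula and Fact 1 can be checked by brute force; the following sketch (illustrative; exponential in n, so only for small functions) computes all Fourier coefficients of the 3-bit majority and verifies Parseval's identity:

```python
import itertools
import numpy as np

def fourier_coefficients(f, n):
    # f maps a tuple in {-1,+1}^n to a real number.
    points = list(itertools.product([-1, 1], repeat=n))
    coeffs = {}
    for S in itertools.chain.from_iterable(
            itertools.combinations(range(n), k) for k in range(n + 1)):
        chi = lambda x: np.prod([x[i] for i in S]) if S else 1
        coeffs[S] = np.mean([f(x) * chi(x) for x in points])
    return coeffs

maj3 = lambda x: int(np.sign(sum(x)))
coeffs = fourier_coefficients(maj3, 3)
print(coeffs)                                  # weight on the singletons and {0,1,2}
print(sum(c ** 2 for c in coeffs.values()))    # Parseval: sums to 1
```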
2.2 L2-squared semi-metric
Throughout the paper, we define "closeness" of random variables using the squared L2-norm:

‖X − Y‖₂² = E[(X − Y)²]
It is important to note that this is a semi-metric as it does not satisfy the triangle inequality.
Instead, we will use the 2-relaxed triangle inequality:

Fact 2.

‖X − Y‖₂² + ‖Y − Z‖₂² ≥ ½ ‖X − Z‖₂²

Proof.

‖X − Y‖₂² + ‖Y − Z‖₂² ≥ ½ (‖X − Y‖₂ + ‖Y − Z‖₂)² ≥ ½ ‖X − Z‖₂²
Although it is not a metric, the squared L2-norm has some advantages when analyzing
Boolean functions. In particular, when comparing two Boolean functions, the squared L2-norm
does satisfy the triangle inequality because it is simply four times the Hamming distance,
and also twice the L1-norm ("Manhattan distance"): ‖f − g‖₂² = 4 · Pr[f ≠ g] = 2 ‖f − g‖₁.
Additionally, the squared L2-norm behaves "nicely" with respect to the Fourier transform:

Fact 3.

‖f − g‖₂² = ∑_S (f̂(S) − ĝ(S))²

Proof. Writing h = f − g and using Parseval,

‖f − g‖₂² = ∑_S ĥ(S)² = ∑_S (f̂(S) − ĝ(S))²

2.3 Variance
The variance of a random variable X is defined as

VarX = E[X²] − (EX)²
Observe that for a function f the variance can also be defined in terms of its Fourier coefficients:

Fact 4.

Var f = ∑_{S≠φ} f̂(S)²

Proof.

Var f = E[f²] − (Ef)²
     = E[(∑_S f̂(S) χ_S)²] − (E[∑_S f̂(S) χ_S])²
     = ∑_{S,T} f̂(S) f̂(T) E[χ_S χ_T] − (∑_S f̂(S) E[χ_S])²
     = ∑_S f̂(S)² − f̂(φ)²
Another useful way to define the variance is as the expected squared distance between two
random evaluations:

Fact 5. For any random variable X,

VarX = ½ · E_{x₁,x₂∼X×X}[(x₁ − x₂)²]

Proof.

E_{x₁,x₂∼X×X}[(x₁ − x₂)²] = E[x₁²] + E[x₂²] − 2E[x₁x₂] = 2(E[X²] − (EX)²) = 2VarX
We can also view the variance as the L2-squared semidistance from the expectation:

Fact 6.

VarX = ‖X − EX‖₂²

Proof.

E[X²] − (EX)² = E[X²] − 2E[X · EX] + E[(EX)²] = E[(X − EX)²]
Recall also that the expectation EX minimizes this semi-distance ‖X − E‖₂²:

Fact 7.

VarX = min_{E∈ℝ} ‖X − E‖₂²

Proof. Differentiate twice with respect to E:

d/dE ‖X − E‖₂² = −2(EX − E)
d²/dE² ‖X − E‖₂² = 2

so the unique minimum is attained at E = EX.
Finally, using the 2-relaxed triangle inequality (fact 2), we can bound the difference in
variances in terms of the L2-squared semimetric:

Fact 8.

Var f ≥ ½ Var g − ‖f − g‖₂²

Proof.

Var f + ‖f − g‖₂² = ‖f − Ef‖₂² + ‖f − g‖₂² ≥ ½ ‖g − Ef‖₂² ≥ ½ ‖g − Eg‖₂²
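A quick numerical sanity check of Facts 5 and 7 on an arbitrary empirical distribution (illustrative sketch; the normal sample is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(2)
vals = rng.normal(size=1000)          # an arbitrary empirical distribution
var = vals.var()

# Fact 5: Var X = (1/2) E (x1 - x2)^2 over independent copies.
x1, x2 = np.meshgrid(vals, vals)
print(var, 0.5 * np.mean((x1 - x2) ** 2))

# Fact 7: the expectation minimizes E (X - E)^2.
Es = np.linspace(vals.mean() - 1, vals.mean() + 1, 201)
dists = [np.mean((vals - E) ** 2) for E in Es]
print(Es[int(np.argmin(dists))], vals.mean())
```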
Chapter 3
Related Works
3.1 Related Works on Low Influence Functions
The influence of a variable xᵢ on a function f is defined as the expected variance of f restricted
to a fixed assignment of all the other variables:

Inf_i[f] = E_{x_{[n]\{i}}}[ Var_{x_i}[ f(X) | x_{[n]\{i}} ] ]

In other words, one may think of the influence of the i-th coordinate on f as the likelihood that a
flip in the assignment to xᵢ causes a flip in the value of f.
The total influence of the function f (sometimes also called average sensitivity) is the sum
of the individual influences of all its variables:

Inf[f] = ∑_{i=1}^n Inf_i[f]
When f is a Boolean function, the squared coefficients of its Fourier transform sum to 1, and
thus define a distribution. An alternative and equivalent definition of the total influence of a
Boolean f is the expected degree of a character of f, with respect to the distribution defined by
the squared coefficients:

Inf[f] = E_{S∼f̂(S)²}[|S|]

Intuitively, a Boolean function with a low total influence has a low degree "on average".
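The two definitions can be checked against each other by brute force on small functions; a sketch (illustrative) computing the total influence of the 3-bit majority from the flip-probability definition, which matches the Fourier-side value E_{S∼f̂(S)²}[|S|] = 3/2:

```python
import itertools
import numpy as np

def total_influence(f, n):
    # Sum over i of Pr[flipping bit i flips f] (valid for Boolean f).
    points = list(itertools.product([-1, 1], repeat=n))
    total = 0.0
    for i in range(n):
        flipped = [x[:i] + (-x[i],) + x[i + 1:] for x in points]
        total += np.mean([f(x) != f(y) for x, y in zip(points, flipped)])
    return total

maj3 = lambda x: int(np.sign(sum(x)))
print(total_influence(maj3, 3))   # 1.5, the expected character degree of majority
```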
A sequence of works shows the intuitive statement that functions with low total influence are
(almost) only dependent on a small number of coordinates. A function that depends on only k
variables is called a k-junta.
The seminal theorem of Friedgut [13] states that a Boolean function with total influence k
must be ε-close to an e^{O(k/ε)}-junta, i.e. a function that depends on e^{O(k/ε)} variables:

Theorem. (Friedgut's Lemma [13]) For any Boolean function f : {±1}ⁿ → {±1}, any integer
k > 0, and real number ε > 0, if Inf[f] ≤ k, then f is ε-close to an e^{O(k/ε)}-junta.
Bourgain [7] has a strong variant of this theorem, where the premise assumes that the function is O(k^{−1/2−γ})-close to a degree-k function:

Theorem. (Bourgain's Junta Theorem [7]) For any Boolean function f : {±1}ⁿ → {±1}, any
integer k > 0, and real numbers γ, ε > 0, there exists a constant c_{γ,ε} s.t. if

∑_{|S|>k} f̂(S)² < c_{γ,ε} · k^{−1/2−γ}

then f is ε-close to an e^{O(k²)}-junta.
(The exact parameters of Bourgain’s Junta Theorem have been slightly improved in the
works of Khot and Naor, Hatami, and Kindler and O’Donnell [24, 20, 28].)
The theorems of Friedgut and Bourgain have found many applications over the last decade, including
NP-hardness [11] and UG-hardness [26, 8, 25, 2] of approximation, embeddability theory [26,
30], and learning theory [33].
There are also several results proving different variants of these statements. Kindler and
Safra, and later also Hatami [29, 27, 20], generalize Bourgain’s theorem to general p-biased
product distributions, with somewhat weaker parameters. Dinur, Kindler, Friedgut, and O’Donnell
[10] prove a statement similar to Bourgain’s theorem for functions with a continuous bounded
range [−1, 1]. Sachdeva and Tulsiani [35] prove a generalization of Friedgut’s theorem, where
the domain is a product of an arbitrary graph, Gⁿ, rather than the Boolean hypercube {±1}ⁿ.
In a recent result, Kindler and O’Donnell [28] prove a variant of Bourgain’s theorem for functions of variables sampled from a Gaussian distribution; they also provide a new proof of Bourgain’s theorem using “elementary” methods.
3.2 Related Works on Almost Linear Functions
A particularly interesting case of low degree functions are those that are close to linear. In their
seminal paper [16], Friedgut, Kalai, and Naor prove that if a Boolean function is ε-close to
linear, then it must be (K · ε)-close to a dictatorship or a constant function.
Theorem. (FKN Theorem [16]) Let f : {±1}ⁿ → {±1} be a Boolean function, and suppose
that f's Fourier transform is concentrated on the first two levels:

∑_{|S|≤1} f̂²(S) ≥ 1 − ε

Then for some universal constant K:

1. either f is (K · ε)-close to a constant function; i.e. for some σ ∈ {±1}:

‖f − σ‖₂² ≤ K · ε

2. or f is (K · ε)-close to a dictatorship; i.e. there exist k ∈ [n] and σ ∈ {±1} s.t.:

‖f − σ · x_k‖₂² ≤ K · ε
While the original motivation for proving the FKN Theorem was applications in social
choice theory [23], it has since found important applications in other fields, such as in Dinur’s
combinatorial proof of the PCP theorem [9].
There are also many works on generalizations of the FKN Theorem. Alon et al. [1] and
Ghandehari and Hatami [17] prove generalizations for functions with domain Z_rⁿ for r ≥ 2.
Friedgut [14] proves a similar theorem that also holds for Boolean functions of higher degrees
and over non-uniform distributions; however, this theorem requires bounds on the expectation of the Boolean function. In [31], Montanaro and Osborne prove quantum variants of the
FKN Theorem for "quantum Boolean functions", i.e. any unitary operator f s.t. f² is the
identity operator. Falik and Friedgut [12] prove a representation-theory version of the FKN
Theorem, for functions which are close to a linear combination of an irreducible representation
of elements of the symmetric group.
The FKN Theorem is an easy corollary once the following proposition is proven: if the
absolute value of a linear combination of Boolean variables has a small variance, then it must
be concentrated on a single variable. Formally,

Proposition. (FKN Proposition [16]) Let (Xᵢ)_{i=1}^n be a sequence of independent variables with
supports {±aᵢ} s.t. ∑ aᵢ² = 1. For some universal constant K, if

Var|∑ᵢ Xᵢ| ≤ ε

then for some k ∈ [n]

a_k > 1 − K · ε

Intuitively, this proposition says that if the variance were spread among many of the variables, i.e. the "weights" (aᵢ) were somewhat evenly distributed, then one would expect the
sum of such independent variables to be closer to a Gaussian rather than a bimodal distribution
around ±1.
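This intuition is easy to observe numerically. A sketch (illustrative; the weight vectors are arbitrary choices) comparing Var|∑ aᵢXᵢ| for well-spread versus concentrated weights over uniform ±1 variables:

```python
import numpy as np

rng = np.random.default_rng(3)
n, samples = 100, 200_000
X = rng.choice([-1, 1], size=(samples, n))

spread = np.full(n, n ** -0.5)                       # a_i = 1/sqrt(n): near-Gaussian sum
concentrated = np.zeros(n); concentrated[0] = 1.0    # a dictator

for a in (spread, concentrated):
    S = X @ a
    print(np.var(np.abs(S)))   # large (~1 - 2/pi) when spread, 0 when concentrated
```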
This proposition has been generalized in several ways in a sequence of fairly recent works
by Wojtaszczyk and Jendrej et al. ([36], [21]), which are of particular interest to us. Jendrej et
al. prove extensions of the FKN Proposition to the following cases:

1. The case where the Xᵢ's are independent and symmetric:

Theorem. ([21]) Let (Xᵢ)_{i=1}^n be a sequence of independent symmetric variables. Then
there exists a universal constant K, s.t. for some k ∈ [n]

inf_{E∈ℝ} Var|∑ᵢ Xᵢ + E| ≥ Var(∑_{i≠k} Xᵢ) / K
2. The case where all the Xᵢ's are identically distributed:

Theorem. ([21]) Let (Xᵢ)_{i=1}^n be a sequence of i.i.d. variables which are not constant
a.s. Then there exists a K_X, which depends only on the distribution from which the Xᵢ's
are drawn, s.t. for any sequence of real numbers (aᵢ)_{i=1}^n, for some k ∈ [n]

inf_{E∈ℝ} Var|E + ∑ᵢ aᵢXᵢ| ≥ (∑_{i≠k} aᵢ²) / K_X
Remark. Although it may be intuitive to think of "almost linear" as a stricter condition than
"low total influence", observe that these conditions are incomparable. In fact there are functions
which are ε-close to linear, yet have total influence of Ω(n). For example, consider a function
f that is determined by a xor of Ω(n) variables with probability ε, and by a single variable with
probability 1 − ε:

f = (⋀_{i=1}^{log(1/ε)} xᵢ ∧ ⨁_{j=1}^{n/2} y_j) ∨ (¬⋀_{i=1}^{log(1/ε)} xᵢ ∧ z)

In the other direction, one can of course consider the xor of two variables, which has a total
influence of 2, yet is Ω(1)-far from linear (a brute-force check appears below). Nonetheless, results such as Friedgut's Lemma
[13] extend trivially to Boolean functions which are close to low-influence functions, because
they only claim that the Boolean function is close to a junta.
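The second direction is immediate to verify by brute force; a sketch (illustrative) for the xor of two variables:

```python
import itertools
import numpy as np

points = list(itertools.product([-1, 1], repeat=2))
xor = lambda x: x[0] * x[1]   # parity of two bits

# Total influence: flipping either bit always flips the parity.
print(sum(np.mean([xor(x) != xor(x[:i] + (-x[i],) + x[i + 1:])
                   for x in points]) for i in range(2)))   # 2.0

# Fourier weight on levels <= 1 is 0: all the weight sits on {0,1},
# so the distance from the nearest linear function is Omega(1).
print(np.mean([xor(x) * x[0] for x in points]),
      np.mean([xor(x) * x[1] for x in points]),
      np.mean([xor(x) for x in points]))      # all 0
```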
Chapter 4
Our Main Results
In this work we consider the following relaxation of linearity in the premise of the FKN Theorem: given a partition {I_j} of the variables and a function f_j (not necessarily Boolean or
symmetric) on the variables in each subset, we look at the premise that the Boolean function f
is close to a linear combination of the f_j's. Our main result (Corollary 10) states, loosely, that
any such f must be close to being completely dictated by its restriction f_k to the variables in a
single subset I_k of the partition.
While making a natural generalization of the well-known FKN Theorem, our work also has
a surprising side: in the FKN Theorem and similar results, if a function is ε-close to linear then
it is (K · ε)-close to a dictatorship, for some constant K. We prove that while this is true in the
partition case for balanced functions, it does not hold in general. In particular, we require f to
be (ε · Var f)-close to linear in the f_j's in order to prove that it is only (K · ε)-close to being
dictated by some f_k. We show (Lemma 11) that this dependence on Var f is tight.
Our first result is a somewhat technical theorem, generalizing the FKN Proposition. We
consider the sum ∑_{i=1}^n Xᵢ of a sequence of independent random variables. In particular, we do
not assume that the variables are Boolean, symmetric, balanced, or identically distributed. Our
main technical theorem, which generalizes the FKN Proposition, states that if this sum does
not "behave like" any single variable X_k, then it is also far from Boolean. In other words, if a
sum of independent random variables is close to a Boolean function, then most of its variance
comes from only one variable.
We show that ∑ Xᵢ is far from Boolean by proving a lower bound on the variance of its
absolute value, Var|∑ Xᵢ|. Note that for any Boolean function f, |f| = 1 everywhere, and
thus Var|f| = 0. Therefore the lower bound on Var|∑ Xᵢ| is in fact also a lower bound on the
(semi-)distance from the nearest Boolean function:

Var|∑ Xᵢ| ≤ min_{f is Boolean} ‖f − ∑ Xᵢ‖₂²
By saying that the sum ∑_{i=1}^n Xᵢ "behaves like" a single variable X_k, we mean that their
difference is almost a constant function; i.e. that

min_{c∈ℝ} ‖X_k − ∑ᵢ Xᵢ − c‖₂² = ‖∑_{i≠k} Xᵢ − E[∑_{i≠k} Xᵢ]‖₂² = Var(∑_{i≠k} Xᵢ)

is small.
Furthermore, the definition of "small" depends on the expectation and variance of the sum
of the sequence, which we denote by E and V:

E = E[∑ Xᵢ] = ∑ E[Xᵢ]
V = Var(∑ Xᵢ) = ∑ VarXᵢ
Formally, our main technical theorem states:

Theorem 9. Let (Xᵢ)_{i=1}^n be a sequence of independent (not necessarily symmetric) random
variables, and let E and V be the expectation and variance of their sum, respectively. Then for
some universal constant K₂ ≤ 13104 and some k ∈ [n] we have

Var|∑ᵢ Xᵢ| ≥ (V · Var(∑_{i≠k} Xᵢ)) / (K₂(V + E²))
The main motivation for proving this theorem is that it implies the following generalization
of the FKN Theorem:
Intuitively, while the FKN Theorem holds for Boolean functions that are almost linear in
individual variables, we generalize to functions that are almost linear with respect to a partition
of the variables.
Formally, let f : {±1}^m → {±1} and let I₁, . . . , I_n be a partition of [m]; denote by f_j the
restriction of f to each subset of variables:

f_j = ∑_{φ≠S⊆I_j} f̂(S) χ_S
Our main corollary states that if f behaves like the sum of the f_j's, then it behaves like some
single f_k:

Corollary 10. Let f, the I_j's, and the f_j's be as defined above. Suppose that f is concentrated on
coefficients that do not cross the partition, i.e.:

∑_{S : ∃j, S⊆I_j} f̂²(S) ≥ 1 − (ε · Var f)

Then for some k ∈ [n], f is close to f_k + f̂(φ):

‖f − f_k − f̂(φ)‖₂² ≤ K₂ · ε

In particular, notice that this implies that f is concentrated on the variables in a single subset
I_k.
Unlike the FKN Theorem and many similar statements, it does not suffice to assume that f
is ε-close to linear. Our main results require a dependence on the variance of f. We prove in
Section 8.1 that this dependence is tight up to a constant factor, by constructing an example for
which Var f = o(1) and f is (ε · Var f)-close to linear with respect to a partition, but f is still
Ω(ε)-far from being dictated by any subset.
Lemma 11. Corollary 10 is tight up to a constant factor. In particular, the division by Var f is
necessary.
More precisely, there exists a series of functions f^(m) : {±1}^{2m} → {±1} and partitions
(I₁^(m), I₂^(m)) s.t. the restrictions f₁^(m), f₂^(m) of f^(m) to the variables in I_j^(m) satisfy

∑_{S : ∃j, S⊆I_j^(m)} f̂²(S) = 1 − O(2^{−m} · Var f)

but for every j ∈ {1, 2}

‖f^(m) − f_j^(m) − f̂_j^(m)(φ)‖₂² = Θ(2^{−m}) = ω(2^{−m} · Var f)
4.1 From our results to the FKN Theorem
We claim that our results generalize the FKN Theorem. For a constant variance, the FKN
Theorem indeed follows immediately from Corollary 10 (for some worse constant K_FKN ≥
K₂/Var f). However, because the premise of Corollary 10 depends on the variance, it may not be
obvious how to obtain the FKN Theorem for the general case, where the variance may go to
zero. Nonetheless, we note that thanks to an observation by Guy Kindler [15, 36], the FKN
Theorem follows easily once the special case of balanced functions is proven:
Given a Boolean function f : {±1}ⁿ → {±1}, we define a balanced Boolean function
g : {±1}^{n+1} → {±1} that is as close to linear as f:

g(x₁, x₂, . . . , xₙ, x_{n+1}) = x_{n+1} · f(x_{n+1}·x₁, x_{n+1}·x₂, . . . , x_{n+1}·xₙ)
First, notice that g is indeed balanced because

2E[g(X; x_{n+1})] = E[f(X)] − E[f(−X)] = E[f(X)] − E[f(X)] = 0

where the second equality holds because under the uniform distribution taking the expectation
over X is the same as taking the expectation over −X.
Observe also that every monomial f̂(S) χ_S(X) in the Fourier representation of f(X) is
multiplied by x_{n+1}^{|S|+1} in the Fourier transform of g(X; x_{n+1}). (The |S|+1 in the exponent comes
from |S| for all the variables that appear in the monomial, and another 1 for the x_{n+1} outside
the function.) Since x_{n+1} ∈ {±1}, for odd |S| we have x_{n+1}^{|S|+1} = 1, and the monomial does
not change, i.e. ĝ(S) = f̂(S); for even |S|, x_{n+1}^{|S|+1} = x_{n+1}, so ĝ(S ∪ {n+1}) = f̂(S). In
particular, the total weight on the zeroth and first levels of the Fourier representation is preserved
because

ĝ({i}) = f̂({i});  ĝ({n+1}) = f̂(φ)    (4.1)
If f satisfies the premise for the FKN Theorem, i.e. if ∑_{|S|≤1} f̂²(S) ≥ 1 − ε, then from
(4.1) it is clear that the same also holds for g. From the FKN Theorem for the balanced special
case we deduce that g is (K · ε)-close to a dictatorship, i.e. there exists k ∈ [n+1] such that
ĝ({k})² ≥ 1 − (K · ε). Therefore by (4.1) f is also (K · ε)-close to either a dictatorship (when
k ∈ [n]) or a constant function (when k = n+1). The FKN Theorem for balanced functions
follows as a special case of our main results, and therefore this work also provides an alternative
proof of the FKN Theorem.
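The folding trick is easy to verify numerically. A sketch (illustrative; the unbalanced OR is an arbitrary test function) constructing g from f and checking that g is balanced and that (4.1) holds:

```python
import itertools
import numpy as np

def fold(f):
    # g(x_1..x_n, x_{n+1}) = x_{n+1} * f(x_{n+1}*x_1, ..., x_{n+1}*x_n)
    return lambda x: x[-1] * f(tuple(x[-1] * xi for xi in x[:-1]))

f = lambda x: 1 if (x[0] == 1 or x[1] == 1) else -1   # an unbalanced OR
g, n = fold(f), 2

pts = list(itertools.product([-1, 1], repeat=n + 1))
print(np.mean([g(x) for x in pts]))                    # 0: g is balanced

fhat = lambda h, S, m: np.mean(
    [h(x) * np.prod([x[i] for i in S]) for x in
     itertools.product([-1, 1], repeat=m)])
# g-hat({i}) = f-hat({i}) for i in [n], and g-hat({n+1}) = f-hat(empty):
print(fhat(f, (0,), n), fhat(g, (0,), n + 1))
print(fhat(f, (), n), fhat(g, (n,), n + 1))
```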
Chapter 5
High-Level Outline of the Proof
Before proving Theorem 9 for a sequence of n variables, we begin with the special case of only
two random variables:
Lemma 12. Let X, Y be any two independent random variables, and let E and V be the
expectation and variance of their sum, respectively. Then for some universal constant K₁ ≤
4368,

Var|X + Y| ≥ (V · min{VarX, VarY}) / (K₁(V + E²))    (5.1)
Intuitively, on the left-hand side of (5.1) we consider the sum of two independent variables,
which we may expect to vary more than each variable separately. Per contra, the same side of
(5.1) also takes the variance of the absolute value, which is in general smaller than just the
variance (without absolute value). Lemma 12 bounds this loss of variance.
On a high level, the main idea of the proof of lemma 12 is a separation into two cases, depending
on the variance of the absolute value of the random variables, relative to the original variance
of the variables (without absolute value):

1. If both |X + EY| and |Y + EX| have relatively small variance, then X + EY and Y + EX
can both be approximated by random variables with constant absolute values. In this case
we prove the result by a case analysis.

2. If either |X + EY| or |Y + EX| has a relatively large variance, we prove an auxiliary
lemma which states that the variance of the absolute value of the sum, Var|X + Y|, is not
much smaller than the variance of the absolute value of either variable (Var|X + EY|, Var|Y + EX|):
Lemma 13. Let X, Y be any two independent random variables, and let E be the expectation
of their sum. Then for some universal constant K₀ ≤ 4,

Var|X + Y| ≥ max{Var|X + EY|, Var|Y + EX|} / K₀
Note that in this lemma, unlike the former statements discussed so far, the terms on the right-hand side also appear in absolute value. In particular, this makes the inequality hold with
respect to the maximum of the two variances.
We find it of separate interest to note that this lemma is tight in the sense that it is necessary
to take a non-trivial constant K0 > 1:
Claim 14. A non-trivial constant is necessary for Lemma 13. More precisely, there exist two
independent balanced random variables X̄, Ȳ, such that the following inequality does not hold
for any value K₀ < 4/3:

Var|X̄ + Ȳ| ≥ max{Var|X̄|, Var|Ȳ|} / K₀

(In particular, it is interesting to note that K₀ > 1.)
Discussion and proof appear in section 8.2.
Chapter 6
Proofs
6.1 From variance of absolute value to variance of absolute value of sum: proof of lemma 13
We begin with the proof of a slightly more general form of lemma 13:

Lemma 15. Let X̄, Ȳ be any two independent balanced random variables, and let E be any
real number. Then for some universal constant K₀ ≤ 4,

Var|X̄ + Ȳ + E| ≥ max{Var|X̄ + E|, Var|Ȳ + E|} / K₀

(Lemma 13 follows easily by taking E = E[X + Y].)
Proof. This lemma is relatively easy to prove, partly because the right-hand side contains the
maximum of the two variances. Thus, it suffices to prove separately that the left-hand side is
greater or equal to Var|X̄ + E|/K₀ and to Var|Ȳ + E|/K₀. Wlog we will prove:

Var|X̄ + Ȳ + E| ≥ Var|X̄ + E| / K₀    (6.1)

Separating into two inequalities is particularly helpful, because now Ȳ no longer appears on
the right-hand side. Using the convexity of the variance, we can reduce the proof of the above
inequality to proving it for the special case where Ȳ is a balanced variable with only two values
in its support.
Claim 16. For any balanced random variable Ȳ there exists a sequence of balanced random
variables (Ȳ_k), each with a support of size at most two, and a probability distribution {λ_k}
over the sequence, s.t.

Var|X̄ + Ȳ + E| ≥ E_{k∼λ_k}[Var|X̄ + Ȳ_k + E|]

The proof appears in section 7.1.
It follows from claim 16 that Var|X̄ + Ȳ + E| is in particular greater or equal to Var|X̄ + Ȳ_k + E| for some k. Therefore, in order to prove lemma 15, it suffices to prove the lower bound on
Var|X̄ + Ȳ + E| (with respect to Var|X̄ + E|) for every balanced Ȳ with only two possible
values.
Recall (fact 5) that we can express the variances of |X̄ + Ȳ + E| and |X̄ + E| in terms of
the expected squared distance between two random evaluations. We use a simple case analysis
to prove that adding any balanced Ȳ with support of size two preserves (up to a factor of ¼) the
expected squared distance between any two possible evaluations of |X̄ + E|.

Claim 17. For every two possible evaluations x₁, x₂ in the support of X̄ + E,

E_{(y₁,y₂)∼Ȳ×Ȳ}[(|x₁ + y₁| − |x₂ + y₂|)²] ≥ ¼ (|x₁| − |x₂|)²

The proof appears in section 7.2.
Finally, in order to achieve the bound on the variances (inequality (6.1)), take the expectation over all choices of (x₁, x₂) ∼ (X̄ + E) × (X̄ + E):

Var|X̄ + Ȳ + E| = ½ E_{x₁,x₂,y₁,y₂}[(|x₁ + y₁| − |x₂ + y₂|)²]
             ≥ ⅛ E_{x₁,x₂}[(|x₁| − |x₂|)²]
             = ¼ · Var|X̄ + E|

(Where the two equalities follow by Fact 5, and the inequality by claim 17.)
6.2 From variance of absolute value of sum to variance: proof of lemma 12
We advance to the more interesting lemma 12, where we bound the variance of |X + Y|
with respect to the minimum of VarX and VarY. Intuitively, on the left-hand side of (6.2)
we consider the sum of two independent variables, which we may expect to vary more than
each variable separately. Per contra, the same side of (6.2) also takes the variance of the
absolute value, which is in general smaller than just the variance (without absolute value).
We will now bound this loss of variance.

Lemma. (Lemma 12) Let X, Y be any two independent random variables, and let E and V
be the expectation and variance of their sum, respectively. Then for some universal constant
K₁ ≤ 4368,

Var|X + Y| ≥ (V · min{VarX, VarY}) / (K₁(V + E²))    (6.2)
Proof. We change variables by subtracting the expectations of X and Y:

X̄ = X − EX
Ȳ = Y − EY

Note that the new variables X̄, Ȳ are balanced. Also observe that we are now interested in
showing a lower bound for

Var|X + Y| = Var|X̄ + Ȳ + E|
On a high level, the main idea of the proof is a separation into two cases, depending on the
variance of the absolute value of the random variables, relative to the original variance of the
variables (without absolute value):

1. If either Var|X̄ + E| or Var|Ȳ + E| is relatively large, we can simply apply lemma 15,
which states that the variance of the absolute value of the sum, Var|X̄ + Ȳ + E|, is not much
smaller than the variance of the absolute value of either variable (Var|X̄ + E| and Var|Ȳ + E|).

2. If both Var|X̄ + E| and Var|Ȳ + E| are relatively small, then X̄ + E and
Ȳ + E can both be approximated by random variables with constant absolute values
(i.e. random variables with supports {±d_X} and {±d_Y} for some reals d_X and d_Y,
respectively). For this case we prove the result by a case analysis.

Formally, let 0 < a < 1 be some parameter to be determined later, and denote

M_XY = min{VarX̄, VarȲ} = min{VarX, VarY}
1. If either of the variances of the absolute values is large, i.e.

max{Var|X̄ + E|, Var|Ȳ + E|} ≥ a · M_XY

then we can simply apply lemma 15 to obtain:

Var|X̄ + Ȳ + E| ≥ max{Var|X̄ + E|, Var|Ȳ + E|} / K₀ ≥ a · M_XY / K₀
2. On the other hand, if both the variances of the absolute values are small, i.e.

max{Var|X̄ + E|, Var|Ȳ + E|} < a · M_XY    (6.3)

then X̄ + E and Ȳ + E are almost constant in absolute value.
In particular, let the variables X′ and Y′ be the constant-absolute-value approximations
to X̄ + E and Ȳ + E, respectively:

X′ = sign(X̄ + E) · E|X̄ + E|
Y′ = sign(Ȳ + E) · E|Ȳ + E|

From the precondition (6.3) it follows that X̄ + E and Ȳ + E are close to X′ and
Y′, respectively:

‖X̄ − (X′ − E)‖₂² < a · M_XY
‖Ȳ − (Y′ − E)‖₂² < a · M_XY

In particular, by the 2-relaxed triangle inequality (facts 2 and 8) we have that the variances
are similar:

Var(X′ − E) > ½ VarX̄ − a · VarX̄    (6.4)
Var(Y′ − E) > ½ VarȲ − a · VarȲ    (6.5)

Var|X̄ + Ȳ + E| ≥ ½ Var|X′ + Y′ − E| − ‖|X̄ + Ȳ + E| − |X′ + Y′ − E|‖₂²
             > ½ Var|X′ + Y′ − E| − 4a · M_XY    (6.6)
Hence, it will be useful to obtain a bound equivalent to (6.2), but in terms of the approximating variables X′, Y′. We will then use the similarity of the variances to extend the
bound to X̄, Ȳ and complete the proof of the lemma.
We use a case analysis over the possible evaluations of X′ and Y′ to prove the following
claim:

Claim 18. Let X̄, Ȳ be balanced random variables and let X′, Y′ be the constant-absolute-value approximations of X̄, Ȳ, respectively:

X′ = sign(X̄ + E) · E|X̄ + E|
Y′ = sign(Ȳ + E) · E|Ȳ + E|

Then the variance of the absolute value of X′ + Y′ − E is relatively large:

Var|X′ + Y′ − E| ≥ Var(X′ − E) · Var(Y′ − E) / (16(VarX̄ + E²))

The proof appears in section 7.3.
Now we use the closeness of the approximating variables X′, Y′ to return to a bound for
the balanced variables X̄, Ȳ:

Var|X̄ + Ȳ + E| ≥ ½ Var|X′ + Y′ − E| − 4a · M_XY
             ≥ Var(X′ − E) · Var(Y′ − E) / (32(V + E²)) − 4a · M_XY
             ≥ ((1 − 2a)² / 128) · (VarX̄ · VarȲ) / (V + E²) − 4a · M_XY
             ≥ ((1 − 4a) / 128) · (VarX̄ · VarȲ) / (V + E²) − 4a · M_XY

(Where the first line follows by equation (6.6); the second line from claim 18; the third
from (6.4) and (6.5); and the fourth is true because (1 − 2a)² ≥ 1 − 4a.)
Combining the two cases, we have that

Var|X̄ + Ȳ + E| ≥ min{ a · M_XY / K₀ ,  ((1 − 4a)/128) · (VarX̄ · VarȲ)/(V + E²) − 4a · M_XY }
Finally, to optimize the choice of the parameter a, set

a/K₀ = ((1 − 4a)/128) · max{VarX̄, VarȲ}/(V + E²) − 4a
     ≥ ((1 − 4a)/256) · V/(V + E²) − 4a

(using VarX̄ · VarȲ / M_XY = max{VarX̄, VarȲ} ≥ V/2). Rearranging, and using
V/(V + E²) ≤ 1, this is satisfied whenever

a ≤ (K₀ / (1028K₀ + 256)) · V/(V + E²)

Therefore, taking a = (K₀ / (1028K₀ + 256)) · V/(V + E²),

Var|X̄ + Ȳ + E| ≥ a · M_XY / K₀ ≥ (1/(1028K₀ + 256)) · (V/(V + E²)) · M_XY

and thus (5.1) holds for K₁ = 1028K₀ + 256 ≤ 4368.
6.3 Proof of the main theorem
Lemma 12 bounds the variance of the absolute value of the sum of two independent variables,
Var |X + Y |, in terms of the variance of each variable. In the following theorem we generalize
this claim to a sequence of n independent variables.
Theorem. (Theorem 9) Let (Xᵢ)_{i=1}^n be a sequence of independent (not necessarily symmetric)
random variables, and let E and V be the expectation and variance of their sum, respectively.
Then for some universal constant K₂ ≤ 13104 and some k ∈ [n] we have

Var|∑ᵢ Xᵢ| ≥ (V · Var(∑_{i≠k} Xᵢ)) / (K₂(V + E²))
Proof. In order to generalize lemma 12 to n variables, consider the two possible cases:

1. If for every i, VarXᵢ ≤ V/3, then we can partition [n] into two sets A, B s.t.

V/3 ≤ ∑_{i∈A} VarXᵢ, ∑_{i∈B} VarXᵢ ≤ 2V/3

Thus, substituting X = ∑_{i∈A} Xᵢ and Y = ∑_{i∈B} Xᵢ in lemma 12, and noting that
Var(∑_{i≠k} Xᵢ) ≤ V for every k, we have

Var|∑ᵢ Xᵢ| = Var|∑_{i∈A} Xᵢ + ∑_{i∈B} Xᵢ| ≥ (V · (V/3)) / (K₁(V + E²)) ≥ (V · Var(∑_{i≠k} Xᵢ)) / (3K₁(V + E²))

2. Otherwise, if VarX_k > V/3 for some k, apply lemma 12 with X = X_k and Y = ∑_{i≠k} Xᵢ to get:

Var|∑ᵢ Xᵢ| = Var|X_k + ∑_{i≠k} Xᵢ| ≥ (V · min{V/3, Var(∑_{i≠k} Xᵢ)}) / (K₁(V + E²)) ≥ (V · Var(∑_{i≠k} Xᵢ)) / (3K₁(V + E²))

In both cases the theorem holds with K₂ = 3K₁ ≤ 13104.
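A numerical sanity check of the theorem (illustrative; the two-point variables and their parameters are arbitrary choices, and the check enumerates all outcomes exactly):

```python
import itertools
import numpy as np

rng = np.random.default_rng(4)
n = 4
p = rng.uniform(0.2, 0.8, n)              # Pr[X_i = v_i]; else X_i = w_i
v, w = rng.normal(size=n), rng.normal(size=n)

outcomes = list(itertools.product([0, 1], repeat=n))
probs = np.array([np.prod([p[i] if b else 1 - p[i]
                           for i, b in enumerate(o)]) for o in outcomes])
vals = np.array([[v[i] if b else w[i] for i, b in enumerate(o)]
                 for o in outcomes])

def var(x):   # variance of a discrete variable given its per-outcome values
    m = probs @ x
    return probs @ (x - m) ** 2

s = vals.sum(axis=1)
E, V = probs @ s, var(s)
lhs = var(np.abs(s))
rhs = V * min(var(vals[:, [i for i in range(n) if i != k]].sum(axis=1))
              for k in range(n)) / (13104 * (V + E ** 2))
print(lhs >= rhs)    # Theorem 9 predicts True (for the best k)
```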
6.4 Proof of the extension to FKN Theorem
Corollary 10, the generalization of the FKN Theorem, follows easily from Theorem 9.
Corollary. (Corollary 10) Let f : {±1}^m → {±1} be a Boolean function and (I_j)_{j=1}^n a partition
of [m]. Also, for each I_j let f_j be the restriction of f to the variables with indices in I_j. Suppose
that f is concentrated on coefficients that do not cross the partition, i.e.:

∑_{S : ∃j, S⊆I_j} f̂²(S) ≥ 1 − (ε · Var f)

Then for some k ∈ [n], f is close to f_k + f̂(φ):

‖f − f_k − f̂(φ)‖₂² ≤ K₂ · ε
Proof. From the premise it follows that f is (ε · Var f)-close to the sum of the f_j's and the empty
character:

‖f − ∑_j f_j − f̂(φ)‖₂² ≤ ε · Var f

Since f is Boolean, this implies in particular that

Var|∑_j f_j + f̂(φ)| ≤ ε · Var f

Thus by the main theorem, for some k ∈ [n]

(Var(∑_j f_j) · Var(∑_{j≠k} f_j)) / (K₂ · (Var(∑_j f_j) + f̂(φ)²)) ≤ ε · Var f

and therefore

Var(∑_{j≠k} f_j) ≤ K₂ · ε · Var f · (Var(∑_j f_j) + f̂(φ)²) / Var(∑_j f_j)

Since Var(∑_j f_j) + f̂(φ)² ≤ E[f²] = 1 and, by the premise, Var(∑_j f_j) ≥ (1 − ε) · Var f, the
right-hand side is O(K₂ · ε), and the corollary follows.
Chapter 7
Proofs of technical claims
7.1 Convexity of the variance function: proof of claim 16
Recall that we are trying to bound the variance of |X̄ + Ȳ + E| for balanced X̄ and Ȳ from
below. In order for Ȳ to be balanced, its support must be of size at least two¹. We claim that
because of the convexity property of the variance function, considering a random variable Ȳ of
this minimal support size suffices for proving a lower bound on Var|X̄ + Ȳ + E|.

Claim. (Claim 16) For any balanced random variable Ȳ there exists a set of balanced random
variables (Ȳ_α), each with a support of size at most two, and a probability distribution g_Ȳ over
it, s.t.

Var|X̄ + Ȳ + E| ≥ E_{α∼g_Ȳ}[Var|X̄ + Ȳ_α + E|]

Proof. Let f_Ȳ be the probability density function of Ȳ. For every {a, b} in the support of Ȳ
s.t. sign(a) ≠ sign(b), let Ȳ_{a,b} be the unique balanced variable with support {a, b}. We show
that the distribution of Ȳ is equal to a mixture of the distributions of the Ȳ_{a,b}.
Define the random variable Z to be sampled from the distribution of Ȳ_{a,b} with probability
defined by the density function

g_Ȳ({a, b}) = ((|a| + |b|) · f_Ȳ(a) f_Ȳ(b)) / ∫_{c≥0} f_Ȳ(c) |c| dc

Because Ȳ is balanced, it follows that also

g_Ȳ({a, b}) = ((|a| + |b|) · f_Ȳ(a) f_Ȳ(b)) / ∫_{c<0} f_Ȳ(c) |c| dc

¹ We assume for simplicity of notation that 0 is not in the support of Ȳ; the result for the general case follows
trivially.
Observe that Ȳ and Z have the same distribution:

f_Z(a) = ∫_{b : sign(b)≠sign(a)} g_Ȳ({a, b}) · Pr[Ȳ_{a,b} = a] db
       = ∫_{b : sign(b)≠sign(a)} ((|a| + |b|) · f_Ȳ(a) f_Ȳ(b)) / (∫_{c : sign(c)≠sign(a)} f_Ȳ(c) |c| dc) · (|b| / (|a| + |b|)) db
       = f_Ȳ(a) · (∫_{b : sign(b)≠sign(a)} f_Ȳ(b) |b| db) / (∫_{c : sign(c)≠sign(a)} f_Ȳ(c) |c| dc)
       = f_Ȳ(a)

Therefore,

Var|X̄ + Ȳ + E| = Var|X̄ + Z + E|
              = E[|X̄ + Z + E|²] − (E|X̄ + Z + E|)²
              = E_{{a,b}∼g_Ȳ}[E[|X̄ + Ȳ_{a,b} + E|²]] − (E|X̄ + Z + E|)²
              ≥ E_{{a,b}∼g_Ȳ}[E[|X̄ + Ȳ_{a,b} + E|²] − (E|X̄ + Ȳ_{a,b} + E|)²]
              = E_{{a,b}∼g_Ȳ}[Var|X̄ + Ȳ_{a,b} + E|]

(where the inequality is the convexity step: by Jensen's inequality,
(E_{{a,b}}[E|X̄ + Ȳ_{a,b} + E|])² ≤ E_{{a,b}}[(E|X̄ + Ȳ_{a,b} + E|)²].)

7.2 Expected squared distance: proof of claim 17
We use a case analysis to prove that adding any balanced Ȳ with support of size two preserves
(up to a factor of ¼) the expected squared distance between any two possible evaluations of
|X̄ + E|.

Claim. (Claim 17) For every two possible evaluations x₁, x₂ in the support of X̄ + E,

E_{(y₁,y₂)∼Ȳ×Ȳ}[(|x₁ + y₁| − |x₂ + y₂|)²] ≥ ¼ (|x₁| − |x₂|)²
Proof. Denote

p_Y = Pr[Ȳ ≥ 0]

Because Ȳ is balanced, its two possible values must be of the form {d/p_Y, −d/(1−p_Y)} for some
d ≥ 0. Assume wlog that x₁ ≥ 0 and |x₁| ≥ |x₂|.
We divide into cases based on the value of p_Y (see also figure 7.1):
1. If p_Y ≥ ½ then with probability at least ¼ both evaluations of Ȳ are non-negative, in
which case the distance between |x₁| and |x₂| can only increase:

E_{y₁,y₂}[(|x₁ + y₁| − |x₂ + y₂|)²] ≥ Pr[y₁, y₂ ≥ 0] · (|x₁ + d/p_Y| − |x₂ + d/p_Y|)²
                                 ≥ Pr[y₁, y₂ ≥ 0] · (x₁ + d/p_Y − |x₂| − d/p_Y)²
                                 ≥ ¼ (|x₁| − |x₂|)²
Figure 7.1: Case analysis in the proof of claim 17.
The top figure corresponds to the case where both y₁ and y₂ are non-negative, i.e. y₁ = y₂ =
d/p_Y ≥ 0. Notice that the distance between |x₁ + y₁| and |x₂ + y₂| is at least the distance between
|x₁| and |x₂|. This case occurs with probability p_Y².
The bottom figure corresponds to the case where p_Y < ½ and y₁ = d/p_Y > d/(1−p_Y) = |y₂|.
Notice that in this case the distance also cannot decrease. In particular, when p_Y < ¼, we
have y₁ = d/p_Y > 3d/(1−p_Y) = 3|y₂|, and therefore the distance actually increases by a significant
amount. This case occurs with probability p_Y(1 − p_Y).
2. If ¼ ≤ p_Y < ½ then with probability at least ¼, y₁ is non-negative; we also use that
d/p_Y ≥ d/(1−p_Y) implies y₁ ≥ |y₂|:

E_{y₁,y₂}[(|x₁ + y₁| − |x₂ + y₂|)²] ≥ Pr[y₁ ≥ 0] · (|x₁| − |x₂| + y₁ − |y₂|)²
                                 ≥ Pr[y₁ ≥ 0] · (|x₁| − |x₂|)²
                                 ≥ ¼ (|x₁| − |x₂|)²
3. Else, if p_Y < ¼, we need to sum over the possible signs of y₁, y₂, and use the fact that
d/p_Y is much greater than d/(1−p_Y):

E_{y₁,y₂}[(|x₁ + y₁| − |x₂ + y₂|)²] ≥ Pr[y₁ ≥ 0, y₂ ≥ 0] · (|x₁| − |x₂|)²
   + Pr[y₁ ≥ 0, y₂ < 0] · (|x₁| − |x₂| + d/p_Y − d/(1−p_Y))²
   + Pr[y₁ < 0, y₂ < 0] · (|x₁| − |x₂| − 2d/(1−p_Y))²
 = p_Y² (|x₁| − |x₂|)² + p_Y(1−p_Y) (|x₁| − |x₂| + d/p_Y − d/(1−p_Y))² + (1−p_Y)² (|x₁| − |x₂| − 2d/(1−p_Y))²
 ≥ ¼ (|x₁| − |x₂|)²

where the last inequality follows by expanding the squares and using p_Y < ¼ (so that
d/p_Y > 3d/(1−p_Y)): the terms quadratic in d dominate the terms linear in d, and the total
coefficient of (|x₁| − |x₂|)² is at least ¼.
7.3 Constant absolute value: proof of claim 18
We use a brute-force case analysis to prove a relative lower bound on the variance of the absolute
value of a sum of two variables with constant absolute values:

Claim. (Claim 18) Let X̄, Ȳ be balanced random variables and let X′, Y′ be the constant-absolute-value approximations of X̄, Ȳ, respectively:

X′ = sign(X̄ + E) · E|X̄ + E|
Y′ = sign(Ȳ + E) · E|Ȳ + E|

Then the variance of the absolute value of X′ + Y′ − E is relatively large:

Var|X′ + Y′ − E| ≥ Var(X′ − E) · Var(Y′ − E) / (16(VarX̄ + E²))
Proof. Denote p_X = Pr[X̄ + E ≥ 0] and d_X = E|X̄ + E| (and analogously for p_Y, d_Y).
Observe that

d_X ≤ √(E[(X̄ + E)²]) = √(Var(X̄ + E) + (E[X̄ + E])²) = √(VarX̄ + E²)

Thus we can bound (1 − p_X)p_X from below:

VarX′ = (2d_X)² · Pr[X′ ≤ 0] · Pr[X′ > 0] ≤ 4(VarX̄ + E²)(1 − p_X)p_X

(1 − p_X)p_X ≥ VarX′ / (4(VarX̄ + E²))    (7.1)
Figure 7.2: Case analysis in the proof of claim 18.
• With probability 2(1 − p_X)² · (1 − p_Y)p_Y both x's are negative, and the y's are of opposite
signs. Notice that since we assume wlog E ≥ 0 and d_X ≥ d_Y, the distance between
|−d_X − d_Y − E| and |−d_X + d_Y − E| is the same as the distance between −d_X − d_Y − E
and −d_X + d_Y − E (marked by dashed lines in both figures); it is therefore always 2d_Y.
• With probability 2p_X² · (1 − p_Y)p_Y both x's are positive, and the y's are of opposite signs.
Notice that the distance between |d_X + d_Y − E| and |d_X − d_Y − E| (marked by the solid
lines) is either 2d_Y (as in the top figure) or |2d_X − 2E| when d_X − d_Y − E ≤ 0 (as in
the bottom figure).
• With probability 2(1 − p_X)p_X · (1 − p_Y)p_Y the x's and y's are of correlated signs. Notice that the distance between |d_X + d_Y − E| and |−d_X − d_Y − E| (marked by the dotted
lines) is either 2E (as in both figures) or 2d_X + 2d_Y when d_X + d_Y − E ≤ 0 (not shown).
Also, for Y′ we have

VarY′ = p_Y(1 − p_Y)(2d_Y)²

Assume wlog that d_Y < d_X and E > 0. Recall (fact 5) that we can write the variance in
terms of the expected squared distance between evaluations. Then, summing over the different
possible signs of X′ and Y′, we have

Var|X′ + Y′ − E| = ½ E_{x₁,x₂,y₁,y₂∼X′×X′×Y′×Y′}[(|x₁ + y₁ − E| − |x₂ + y₂ − E|)²]
 ≥ (1 − p_X)² · (1 − p_Y)p_Y · (|−d_X − d_Y − E| − |−d_X + d_Y − E|)²
   + p_X² · (1 − p_Y)p_Y · (|d_X − d_Y − E| − |d_X + d_Y − E|)²
   + (1 − p_X)p_X · (1 − p_Y)p_Y · (|−d_X − d_Y − E| − |d_X + d_Y − E|)²
 ≥ ((1 − p_X)² + p_X²) · (1 − p_Y)p_Y · min{|2d_X − 2E|, 2d_Y}²
   + (1 − p_X)p_X · p_Y(1 − p_Y) · min{2E, 2d_X + 2d_Y}²
 ≥ (1 − p_X)p_X · p_Y(1 − p_Y) · d_Y²
 ≥ (VarX′ / (4(VarX̄ + E²))) · p_Y(1 − p_Y) · (2d_Y)² / 4
 = VarX′ · VarY′ / (16(VarX̄ + E²))
 = Var(X′ − E) · Var(Y′ − E) / (16(VarX̄ + E²))

(Where the first inequality follows by taking the expectation over the different possible signs of
X′, Y′ (see also figure 7.2); the second follows by taking the minimum over the possible signs
of the quantities in absolute values; the third by simple algebra; and the fourth by (7.1).)
Chapter 8
Tightness of results
8.1 Tightness of the main result

The premise of the main result, corollary 10, requires f to be (ε · Var f)-close to a sum of independent functions. One may hope to avoid this factor of Var f and achieve a constant ratio between
ε in the premise and (K · ε) in the conclusion, as in the FKN Theorem. However, we show that the
dependence on Var f is necessary.
Lemma. (Lemma 11) Corollary 10 is tight up to a constant factor. In particular, the division
by Var f is necessary.
More precisely, there exists a series of functions f^(m) : {±1}^{2m} → {±1} and partitions
(I₁^(m), I₂^(m)) s.t. the restrictions f₁^(m), f₂^(m) of f^(m) to the variables in I_j^(m) satisfy

∑_{S : ∃j, S⊆I_j^(m)} f̂²(S) = 1 − O(2^{−m} · Var f)

but for every j ∈ {1, 2}

‖f^(m) − f_j^(m) − f̂_j^(m)(φ)‖₂² = Θ(2^{−m}) = ω(2^{−m} · Var f)
Proof. By example. Let
$$X = \bigwedge_{i=1}^{m} x_i \qquad Y = \bigwedge_{i=1}^{m} y_i \qquad f = X \vee Y$$
Now the variance of $f$ is $\Theta(2^{-m})$:
$$\operatorname{Var} f = 4 \Pr[f = 1] \Pr[f = -1] = 4\,\big(1 - \Theta(2^{-m})\big)\,\Theta(2^{-m}) = \Theta(2^{-m})$$
Also, $f$ is $O(2^{-2m})$-close to a sum of independent functions: since
$$f = \frac{X + Y + X \cdot Y - 1}{2}$$
we have
$$\big\| f - (X + Y - 1) \big\|_2^2 = \Big\| \frac{(X - 1)(Y - 1)}{2} \Big\|_2^2 = 4 \cdot 2^{-2m}$$
Yet, $f$ is $\Omega(2^{-m})$-far from any function that depends on either only the $x_i$'s or only the $y_i$'s.
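The quantities in this example are easy to verify by direct enumeration for small m. The following Python sketch (ours, for illustration) encodes −1 as the rare "true" output of each AND and reproduces Var f = Θ(2^{−m}), ‖f − (X + Y − 1)‖₂² = 4 · 2^{−2m}, and the Ω(2^{−m}) distance to the best function of the x-variables alone:

```python
from itertools import product

def AND(bits):   # -1 encodes the rare "true" output: all inputs must be -1
    return -1 if all(b == -1 for b in bits) else 1

def OR(a, b):
    return -1 if (a == -1 or b == -1) else 1

for m in range(2, 6):
    pts = list(product([1, -1], repeat=m))
    n2 = len(pts) ** 2
    Ef = sum(OR(AND(xs), AND(ys)) for xs in pts for ys in pts) / n2
    varf = 1 - Ef ** 2                       # f takes values in {-1, 1}
    dist_lin = sum((OR(AND(xs), AND(ys)) - (AND(xs) + AND(ys) - 1)) ** 2
                   for xs in pts for ys in pts) / n2
    # distance to the best Boolean function of the x-variables alone:
    # conditioned on x, the best {-1,1} value is sign(E_y f), at cost 2 - 2|E_y f|
    dist_x = sum(2 - 2 * abs(sum(OR(AND(xs), AND(ys)) for ys in pts)) / len(pts)
                 for xs in pts) / len(pts)
    print(m, varf, dist_lin, dist_x)         # dist_lin == 4 * 2**(-2*m); dist_x ~ varf
```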
8.2 Tightness of lemma 13
Lemma 13 compares the variance of the absolute value of a sum of independent variables to the variance of the absolute value of each variable. Since both sides of the inequality consider absolute values, it may seem as if summing independent variables should only add variation on the left side. In particular, one may hope that the inequality should hold trivially, with K₀ = 1. We show that this is not the case.
Claim. (Claim 14) A non-trivial constant is necessary for lemma 13. More precisely, there exist two independent balanced random variables $\bar X, \bar Y$ such that the following inequality does not hold for any value $K_0 < 4/3$:
$$\operatorname{Var}\big|\bar X + \bar Y\big| \;\ge\; \frac{\max\big\{\operatorname{Var}|\bar X|,\ \operatorname{Var}|\bar Y|\big\}}{K_0}$$
(In particular, it is interesting to note that $K_0 > 1$.)
Proof. By example. Let
$$\Pr[X = 0] = \frac{1}{2} \qquad \Pr[X = \pm 2] = \frac{1}{4} \qquad \Pr[Y = \pm 1] = \frac{1}{2}$$
Then we have that $\mathbb{E} X = \mathbb{E} Y = 0$,
$$\Pr\big[|X + Y| = 1\big] = \frac{3}{4} \qquad \Pr\big[|X + Y| = 3\big] = \frac{1}{4}$$
and therefore
$$\operatorname{Var}|X + Y| = \frac{3}{4} = \frac{3}{4} \operatorname{Var}|X|$$
Since $|Y|$ is constant, $\operatorname{Var}|Y| = 0$ and $\max\{\operatorname{Var}|X|, \operatorname{Var}|Y|\} = \operatorname{Var}|X| = 1$, so the inequality above can only hold with $K_0 \ge 4/3$.
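The arithmetic can be double-checked mechanically; here is a small Python sketch (ours) that recomputes Var |X| and Var |X + Y| from the two distributions:

```python
from itertools import product

X = [(0, 0.5), (2, 0.25), (-2, 0.25)]   # balanced: EX = 0
Y = [(1, 0.5), (-1, 0.5)]               # |Y| = 1 always, so Var|Y| = 0

def var(dist):
    mean = sum(v * p for v, p in dist)
    return sum(p * (v - mean) ** 2 for v, p in dist)

print(var([(abs(v), p) for v, p in X]))                                    # 1.0
print(var([(abs(x + y), px * py) for (x, px), (y, py) in product(X, Y)]))  # 0.75
```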
Chapter 9
Discussion
In this work, we extend the FKN Theorem to Boolean functions of variables that are not uniformly distributed over {±1}. In particular, we consider variables which are neither Boolean nor symmetric. More importantly, we extend the FKN Theorem to functions which are almost linear with respect to a partition of the variables, i.e. Boolean functions that are close to a sum of independent functions. Both are interesting generalizations of an important theorem and improve our understanding of the behaviour of Boolean functions.
9.1 Hypercontractivity
Many theorems about Boolean functions rely on hypercontractivity theorems such as the Bonami-Beckner Inequality ([6, 3]). Writing a real-valued function over {±1}^n as a polynomial yields a distribution over the monomials' degrees {0, . . . , n}, where the weight of k is the sum of relative weights of monomials of degree k. Hypercontractivity inequalities bound the ratios between norms of real-valued functions over {±1}^n in terms of this distribution of weights over their monomials' degrees. In this work it is not clear how to use such inequalities, because the functions in question may have an arbitrary weight on high degrees within each subset.
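To make the degree-weight distribution concrete, here is a small illustrative Python sketch (ours, not from the thesis) that computes every Fourier coefficient of a function over {±1}^n by its defining average and sums the squared coefficients at each degree; for majority on three bits the weight is 3/4 on degree 1 and 1/4 on degree 3:

```python
from itertools import combinations, product
from math import prod

def weights_by_degree(f, n):
    # hat f(S) = E_x[f(x) * prod_{i in S} x_i]; weight at degree k sums hat f(S)^2 over |S| = k
    pts = list(product([1, -1], repeat=n))
    weights = []
    for k in range(n + 1):
        coeffs = [sum(f(x) * prod(x[i] for i in S) for x in pts) / len(pts)
                  for S in combinations(range(n), k)]
        weights.append(sum(c * c for c in coeffs))
    return weights

maj3 = lambda x: 1 if sum(x) > 0 else -1
print(weights_by_degree(maj3, 3))   # [0.0, 0.75, 0.0, 0.25]
```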
All of the proofs presented in this work are completely self-contained and based on elementary methods. In particular, we do not use any hypercontractivity theorem. This simplicity
makes our work more accessible and intuitive. This trend is exhibited by some recent related
works, e.g. [36, 21, 28], that also present proofs that do not use hypercontractivity.
9.2 Open problems
It would be interesting to extend the notion of analysis of Boolean functions over partitions of
the variables to some of the related works mentioned in the introduction. For example, one may
define an appropriate measure of “low degree” over partitions and extend Friedgut’s Lemma or
Bourgain’s Junta Theorem and ask whether functions that are close to low degree with respect
to a partition of the variables are close to juntas with respect to the same partition. Alternatively,
one may look at extending Corollary 10 to biased or other non-uniform distributions over the
Boolean hypercube.
9.2.1 Interesting Conjectures
While the dependence on the variance is tight, it seems counter-intuitive. We believe that it is possible to come up with a structural characterization instead.
Observe that the function used for the counterexample in lemma 11 is essentially the (non-balanced) tribes function, i.e. an OR of two ANDs. All the extreme examples we discovered so far have a similar structure: an independent Boolean function on each subset of the variables (e.g. an AND on a subset of the variables), and then a "central" Boolean function that takes as inputs the outputs of the independent functions (e.g. an OR of all the ANDs).
We conjecture that such a composition of Boolean functions is essentially the only way to construct counterexamples to the "naive extension" of the FKN Theorem. In other words, if a Boolean function is close to linear with respect to a partition of the variables, then it is close to the application of a central Boolean function g to the outputs of independent Boolean functions h_j, one over each subset I_j. Formally,
Conjecture. Let $f : \{\pm 1\}^m \to \{\pm 1\}$ be a Boolean function and $(I_j)_{j=1}^n$ a partition of $[m]$. Suppose that $f$ is concentrated on coefficients that do not cross the partition, i.e.:
$$\sum_{S :\, \exists j,\ S \subseteq I_j} \hat f^2(S) \ge 1 - \epsilon$$
Then there exist a "central" Boolean function $g : \{\pm 1\}^n \to \{\pm 1\}$ and a Boolean function on each subset, $h_j : \{\pm 1\}^{|I_j|} \to \{\pm 1\}$, s.t. the composition of $g$ with the $h_j$'s is a good approximation of $f$. I.e. for some universal constant $K$,
$$\Big\| f(X) - g\Big( h_1\big((x_i)_{i \in I_1}\big), h_2\big((x_i)_{i \in I_2}\big), \ldots, h_n\big((x_i)_{i \in I_n}\big) \Big) \Big\|_2^2 \le K \cdot \epsilon$$
Intuitively, this conjecture claims that the central function only needs to know one bit of information about each subset in order to approximate f.
We believe that such a conjecture could have useful applications, because one can often deduce properties of the composition of independent functions $f = g\big(h(x_{I_1}), h(x_{I_2}), \ldots, h(x_{I_n})\big)$ from the properties of the composed functions g and h. For example, if f, g, and h are as above, then the total influence of f is the product of the total influences of g and h.
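As a concrete check of this product rule (an illustrative Python sketch of ours; we take parity as the inner function, whose output is uniform on {±1}, so the influences compose exactly), the snippet below computes total influence as the sum over variables of the probability that flipping the variable flips the function:

```python
from itertools import product
from math import prod

def total_influence(f, n):
    pts = list(product([1, -1], repeat=n))
    return sum(sum(f(x) != f(x[:i] + (-x[i],) + x[i + 1:]) for x in pts) / len(pts)
               for i in range(n))

maj3 = lambda z: 1 if sum(z) > 0 else -1
parity = lambda bits: prod(bits)      # balanced inner function on each block
m = 2                                 # variables per block; three blocks feed maj3
f = lambda x: maj3([parity(x[j * m:(j + 1) * m]) for j in range(3)])

print(total_influence(f, 3 * m))                               # 3.0
print(total_influence(maj3, 3) * total_influence(parity, m))   # 1.5 * 2.0 = 3.0
```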
In fact, we believe that an even stronger claim holds. It seems that for all the Boolean functions that are almost linear with respect to a partition of the variables, the "central" function $g$ is either an OR or an AND of some of the functions on the subsets $h_j$. Formally,

Conjecture. (Stronger variant) Let $f : \{\pm 1\}^m \to \{\pm 1\}$ be a Boolean function and $(I_j)_{j=1}^n$ a partition of $[m]$. Suppose that $f$ is concentrated on coefficients that do not cross the partition, i.e.:
$$\sum_{S :\, \exists j,\ S \subseteq I_j} \hat f^2(S) \ge 1 - \epsilon$$
Then there exists a Boolean function $h_j : \{\pm 1\}^{|I_j|} \to \{\pm 1\}$ for each $j \in [n]$ s.t. either the OR or the AND of those $h_j$'s is a good approximation of $f$. I.e. for some universal constant $K$, either
$$\Big\| f(X) - \mathrm{OR}_{j \in [n]}\, h_j\big((x_i)_{i \in I_j}\big) \Big\|_2^2 \le K \cdot \epsilon$$
or
$$\Big\| f(X) - \mathrm{AND}_{j \in [n]}\, h_j\big((x_i)_{i \in I_j}\big) \Big\|_2^2 \le K \cdot \epsilon$$
Bibliography
[1] Noga Alon, Irit Dinur, Ehud Friedgut, and Benny Sudakov. Graph products, Fourier analysis and spectral techniques. GAFA, 14:913–940, 2004.
[2] Nikhil Bansal and Subhash Khot. Optimal long code test with one free bit. In FOCS, pages 453–462, 2009.
[3] William Beckner. Inequalities in Fourier analysis. Annals of Mathematics, 102:159–182, 1975.
[4] Mihir Bellare, Oded Goldreich, and Madhu Sudan. Free bits, PCPs, and nonapproximability - towards tight results. SIAM J. Comput., 27(3):804–915, 1998.
[5] Michael Ben-Or and Nathan Linial. Collective coin flipping, robust voting schemes and minima of Banzhaf values. In Proceedings of the 26th Annual Symposium on Foundations of Computer Science, pages 408–416, October 1985.
[6] Aline Bonami. Étude des coefficients de Fourier des fonctions de Lp(G). Annales de l'institut Fourier, 20:335–402, 1970.
[7] J. Bourgain. On the distribution of the Fourier spectrum of Boolean functions. Israel Journal of Mathematics, 131:269–276, 2002.
[8] Shuchi Chawla, Robert Krauthgamer, Ravi Kumar, Yuval Rabani, and D. Sivakumar. On the hardness of approximating multicut and sparsest-cut. Computational Complexity, 15(2):94–114, 2006.
[9] Irit Dinur. The PCP theorem by gap amplification. In STOC, pages 241–250, 2006.
[10] Irit Dinur, Ehud Friedgut, Guy Kindler, and Ryan O'Donnell. On the Fourier tails of bounded functions over the discrete cube. In Proceedings of the Thirty-Eighth Annual ACM Symposium on Theory of Computing, STOC '06, pages 437–446, New York, NY, USA, 2006. ACM.
[11] Irit Dinur and Samuel Safra. On the hardness of approximating minimum vertex cover. Annals of Mathematics, 162(1):439–485, 2005.
[12] Dvir Falik and Ehud Friedgut. An algebraic proof of a robust social choice impossibility theorem. In FOCS, pages 413–422, 2011.
[13] Ehud Friedgut. Boolean functions with low average sensitivity depend on few coordinates. Combinatorica, 18(1):27–35, 1998.
[14] Ehud Friedgut. On the measure of intersecting families, uniqueness and stability. Combinatorica, 28(5):503–528, 2008.
[15] Ehud Friedgut, Gil Kalai, and Assaf Naor. Boolean functions whose Fourier transform is concentrated on the first two levels. Online version.
[16] Ehud Friedgut, Gil Kalai, and Assaf Naor. Boolean functions whose Fourier transform is concentrated on the first two levels. Advances in Applied Mathematics, 29(3):427–437, 2002.
[17] Mahya Ghandehari and Hamed Hatami. Fourier analysis and large independent sets in powers of complete graphs. J. Comb. Theory, Ser. B, 98(1):164–172, 2008.
[18] Johan Håstad. Clique is hard to approximate within n^(1−o(1)). In Proceedings of the 37th Annual Symposium on Foundations of Computer Science, FOCS '96, pages 627–, Washington, DC, USA, 1996. IEEE Computer Society.
[19] Johan Håstad. Some optimal inapproximability results. J. ACM, 48(4):798–859, July 2001.
[20] Hamed Hatami. A remark on Bourgain's distributional inequality on the Fourier spectrum of Boolean functions. Online Journal of Analytic Combinatorics, 1, 2006.
[21] Jacek Jendrej, Krzysztof Oleszkiewicz, and Jakub O. Wojtaszczyk. On some extensions to the FKN theorem. In Phenomena in High Dimensions in Geometric Analysis, Random Matrices, and Computational Geometry, 2012.
[22] J. Kahn, G. Kalai, and N. Linial. The influence of variables on Boolean functions. In Proceedings of the 29th Annual Symposium on Foundations of Computer Science, SFCS '88, pages 68–80, Washington, DC, USA, 1988. IEEE Computer Society.
[23] Gil Kalai. A Fourier-theoretic perspective for the Condorcet paradox and Arrow's theorem. Discussion Paper Series dp280, The Center for the Study of Rationality, Hebrew University, Jerusalem, November 2001.
[24] Subhash Khot and Assaf Naor. Nonembeddability theorems via Fourier analysis. In FOCS, pages 101–112, 2005.
[25] Subhash Khot and Oded Regev. Vertex cover might be hard to approximate to within 2 − ε. J. Comput. Syst. Sci., 74(3):335–349, 2008.
[26] Subhash Khot and Nisheeth K. Vishnoi. The unique games conjecture, integrality gap for cut problems and embeddability of negative type metrics into ℓ1. In FOCS, pages 53–62, 2005.
[27] Guy Kindler. Property Testing, PCP, and Juntas. PhD thesis, Tel-Aviv University, 2002.
[28] Guy Kindler and Ryan O'Donnell. Gaussian noise sensitivity and Fourier tails. In IEEE Conference on Computational Complexity, pages 137–147, 2012.
[29] Guy Kindler and Shmuel Safra. Noise-resistant Boolean functions are juntas. Manuscript, 2003.
[30] Robert Krauthgamer and Yuval Rabani. Improved lower bounds for embeddings into ℓ1. SIAM J. Comput., 38(6):2487–2498, 2009.
[31] Ashley Montanaro and Tobias J. Osborne. Quantum Boolean functions, 2009.
[32] Elchanan Mossel, Ryan O'Donnell, and Krzysztof Oleszkiewicz. Noise stability of functions with low influences: invariance and optimality. CoRR, abs/math/0503503, 2005.
[33] Ryan O'Donnell and Rocco A. Servedio. Learning monotone decision trees in polynomial time. SIAM J. Comput., 37(3):827–844, 2007.
[34] Dana Ron, Ronitt Rubinfeld, Muli Safra, and Omri Weinstein. Approximating the influence of monotone Boolean functions in O(√n) query complexity. In APPROX-RANDOM, pages 664–675, 2011.
[35] Sushant Sachdeva and Madhur Tulsiani. Cuts in Cartesian products of graphs. CoRR, abs/1105.3383, 2011.
[36] Jakub Onufry Wojtaszczyk. Sums of independent variables approximating a Boolean function. Submitted, 2010.