The Green-Tao Theorem on arithmetic progressions within the primes
Thomas Bloom
November 7, 2010
Contents
1 Introduction
1.1 Heuristics
1.2 Arithmetic Progressions
1.3 Structure of the Proof
2 Arithmetic Progressions
2.1 How to count arithmetic progressions
2.2 Szemerédi Revisited
2.3 Pseudorandomness
3 Uniformity Norms and the Generalised von Neumann Theorem
3.1 The Gowers Uniformity Norm
3.2 The Generalised von Neumann Theorem
3.3 Dual Norms
4 Decomposition Theorem
4.1 The Green-Tao Proof
4.2 The Gowers-Hahn-Banach Proof
4.3 The Relative Szemerédi Theorem
5 Progressions in the Primes
5.1 Counting the Primes and the W-trick
5.2 Pseudorandom Majorant
5.3 The Green-Tao Theorem
6 Further Results
6.1 Extensions of the Green-Tao Theorem
6.2 Asymptotics for P(k, N)
6.3 Explicit Bounds
A Proof of the Decomposition Theorem
B Estimates for Λ_R
B.1 Euler Product for independent linear forms
B.2 Euler product for simple linear forms
B.3 Pseudorandomness of ν
C Fourier transform
D The GI and MN Conjectures
Standard Notation
The following notation and definitions are standard, and will be used without comment
throughout this dissertation.
A k-term arithmetic progression (or just a k-progression) is a set of the form {a + nb :
0 ≤ n ≤ k − 1} for some a, b ∈ N. We exclude the degenerate case where b = 0.
We say that f = O(g) if there exists a constant C such that for all sufficiently large x,
|f (x)| ≤ Cg(x).
$f = o(1)$ if $\lim_{x \to \infty} f(x) = 0$.
f ∼ g if f = (1 + o(1))g.
[n] := {1, 2, . . . , n − 1, n}.
φ(n) := #{m ∈ [n] : (m, n) = 1}.
\[ \mu(n) := \begin{cases} 1 & \text{if } n = 1; \\ (-1)^k & \text{if } n \text{ is square-free and has } k \text{ prime divisors;} \\ 0 & \text{otherwise.} \end{cases} \]
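As a concrete reading of the two arithmetic functions just defined, the following minimal Python sketch evaluates φ and µ directly from their definitions. The function names `phi` and `mobius` are our own choices for illustration, and the trial-division approach is only intended for small n.

```python
from math import gcd


def phi(n: int) -> int:
    """Euler's totient: the number of m in [n] = {1, ..., n} with (m, n) = 1."""
    return sum(1 for m in range(1, n + 1) if gcd(m, n) == 1)


def mobius(n: int) -> int:
    """Moebius function, computed by trial division."""
    if n == 1:
        return 1
    k = 0  # number of distinct prime divisors found so far
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:  # p^2 divided the original n, so it is not square-free
                return 0
            k += 1
        p += 1
    if n > 1:  # whatever remains is a single prime factor
        k += 1
    return (-1) ** k
```

For instance, `phi(12)` counts the residues 1, 5, 7, 11, and `mobius(30)` reflects the three prime divisors 2, 3, 5.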
Specific notation and conventions
We shall often be looking at the arithmetic mean of a function over a given set. For convenience, we denote this using the expectation notation,
\[ \mathbb{E}(f(x) : x \in X) = \mathbb{E}_{x \in X} f(x) := \frac{1}{|X|} \sum_{x \in X} f(x). \]
Also, since we shall be trying to count k-term progressions of primes, it is convenient to
introduce the following function:
P (k, N ) := # of k-term arithmetic progressions of primes in [N ].
In many cases, we shall be taking an average over a hypercube; that is, points of the form $(\omega_1, \dots, \omega_n)$ where $\omega_i = 0$ or $1$ for all $1 \le i \le n$. We denote the n-dimensional hypercube by $C_n := \{0, 1\}^n$. We denote the hypercube with the origin removed by $C_n^0 := \{0, 1\}^n \setminus \{(0, \dots, 0)\}$. Where we are dealing with such points, of the form $\omega = (\omega_1, \dots, \omega_n)$ and $h = (h_1, \dots, h_n)$, we define the scalar product to be
\[ \omega \cdot h := \omega_1 h_1 + \cdots + \omega_n h_n. \]
We will be working mainly over the ring of integers modulo N, denoted by $\mathbb{Z}_N := \mathbb{Z}/N\mathbb{Z}$, and since we are letting N → ∞ we may always assume that N is a prime, and so $\mathbb{Z}_N$ is a field. We shall often be considering the space of functions $f : \mathbb{Z}_N \to \mathbb{R}$, which we denote by $\mathbb{R}^N$. We give this the inner product
\[ \langle f, g \rangle := \mathbb{E}_{x \in \mathbb{Z}_N} (f(x) g(x)), \]
and where convenient the $L^p$ norms,
\[ \|f\|_p := \left( \mathbb{E}_{x \in \mathbb{Z}_N} |f(x)|^p \right)^{1/p} \]
for $1 \le p < \infty$, and
\[ \|f\|_\infty := \sup_{x \in \mathbb{Z}_N} |f(x)|. \]
We shall denote dependence on constants by subscripts. For example, $O_{k,\delta}$ means that the constant implicit in the O notation depends on the constants k and δ. Similarly, $o_k$ means that the rate of decay depends on k. Since in most cases there is only one variable, I shall omit these subscripts for clarity, only making clear which variables the constants depend on when this is important or not clear from context.
In almost all of this dissertation, the only variable is N, and hence, for instance, f = o(1) means that f tends to zero as N → ∞. The only other variable parameter is $R := \sqrt{N}$, and hence we may still take the o(1) errors to be decaying as N → ∞.
Whenever we use the variable p, we are ranging over primes. For example, $\sum_{p \le x} p$ denotes the sum of all primes less than or equal to x.
Acknowledgements
I would like to thank my supervisor for his encouragement, advice and a careful reading of
the first draft. I would also like to thank several of my fellow undergraduates for careful
proofreading and comments on clarity and structure.
Chapter 1
Introduction
Small arithmetic progressions within the primes are easy to come by. It is trivial to find
progressions with one or two terms, and 3, 5, 7 gives a 3-term arithmetic progression. A moment’s thought yields the 5-term arithmetic progression 5, 11, 17, 23, 29. The problem quickly
becomes a lot more difficult – the first 6-term arithmetic progression is 7, 37, 67, 97, 127, 157
and the current record holder has 26 terms:
43142746595714191 + 5283234035979900n for n = 0, . . . , 25.¹
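The small progressions quoted above are easy to check by machine. The sketch below is our own illustration rather than anything used in the proof: `is_prime` is a deterministic Miller-Rabin test with the first twelve primes as bases (known to be valid for all n below roughly 3.3 · 10^24, which comfortably covers the numbers in this section), and `is_prime_progression` is a hypothetical helper name.

```python
def is_prime(n: int) -> bool:
    """Deterministic Miller-Rabin test, valid for n < 3.3 * 10**24."""
    if n < 2:
        return False
    bases = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    for p in bases:
        if n % p == 0:
            return n == p
    # Write n - 1 = d * 2^s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False  # a witnesses that n is composite
    return True


def is_prime_progression(a: int, b: int, k: int) -> bool:
    """True if a, a + b, ..., a + (k-1)b are all prime and b is non-zero."""
    return b != 0 and all(is_prime(a + n * b) for n in range(k))
```

Calling `is_prime_progression(5, 6, 5)` checks the 5-term example, and `is_prime_progression(43142746595714191, 5283234035979900, 26)` applies the same test to the 26-term progression quoted above.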
A natural problem to pose is whether we can find such progressions within the primes for
any given length. It is easy to prove that there can be no infinite arithmetic progression
within the primes, so the best we can hope for is the following recent theorem of Ben Green
and Terence Tao:
Theorem 1.1 (Green-Tao, 2008 [10]). There are arbitrarily long arithmetic progressions
within the primes.
In fact, they prove something much stronger, and give an increasing function of N as
a lower bound for how many such progressions are in the first N integers. It follows that
there are in fact infinitely many arithmetic progressions of primes of any finite length. In
this dissertation we describe a proof of this theorem, motivating and making clear the key
steps and insights needed as we go along. It is a synthesis of methods and ideas from [10],
[7] and [3], including some simplifications and new expository remarks. It is the first explicit
description of the entire proof which includes the simplifications made since [10].
In this introductory section we present the background to the problem, including heuristics, related conjectures, and previous partial results which the Green-Tao theorem builds
upon. We conclude by giving an overview of the structure of the proof. Chapters 2 to 5
focus on the different components of the proof, which are then brought together to prove (a
stronger form of) Theorem 1.1 at the end of Chapter 5. Chapter 6 gives some extensions
and related results which have since been obtained.
1 Found by Benoît Périchon using the PrimeGrid software by Geoff Reynolds and Jarosław Wróblewski, April 2010.
1.1 Heuristics
The prime number theorem states that if π(x) is the number of primes less than or equal to x, then
\[ \pi(x) \sim \frac{x}{\log x}. \]
In probabilistic terms, this suggests that if we select an integer from [1, x] uniformly at
random, then it is prime with probability roughly 1/ log x. This model fails as a method
of proving statements about the primes: if the primes were truly ‘random’ then we would
expect roughly the same number of even and odd primes. The failure is because the
model only considers the density of the primes, and not their arithmetical properties. This
model is surprisingly useful, however, in formulating conjectures about how we expect the
primes to behave. By including some information about their arithmetic properties (which
often only changes the original conjecture by a constant) these heuristics can be converted
into proven theorems.
For example, consider the following problem: how many primes less than or equal to x
are congruent to a modulo b? If a and b have a common divisor greater than 1, the answer is
trivially either 0 or 1, so suppose a and b are coprime. Let us select an integer below x uniformly at random. The probability that it is prime should be roughly 1/ log x. A prime not dividing b must lie in one of the φ(b) congruence classes modulo b which are coprime to b, and if it is equally likely to lie in each of them then it is congruent to a modulo b with probability 1/φ(b). Thus the probability that it meets both these conditions should be 1/(φ(b) log x), and this leads us to conjecture that
\[ \pi_{a,b}(x) \sim \frac{x}{\phi(b) \log x}, \]
where πa,b (x) is the number of primes less than or equal to x which are congruent to a modulo
b. This conjecture turns out to be correct, and is known as the Prime Number Theorem for
Arithmetic Progressions.
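The heuristic can be compared with actual counts for small x. The following sketch is a numerical illustration only: it sieves the primes up to x, counts those congruent to a modulo b, and evaluates x/(φ(b) log x). The names `primes_up_to`, `pi_ab` and `heuristic` are ours, and agreement at small x is only rough, since the asymptotic converges slowly.

```python
from math import gcd, log


def primes_up_to(x: int) -> list[int]:
    """Sieve of Eratosthenes; assumes x >= 2."""
    sieve = bytearray([1]) * (x + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(x ** 0.5) + 1):
        if sieve[p]:
            # Cross out p^2, p^2 + p, ... up to x.
            sieve[p * p :: p] = bytearray(len(range(p * p, x + 1, p)))
    return [n for n in range(x + 1) if sieve[n]]


def pi_ab(x: int, a: int, b: int) -> int:
    """Number of primes p <= x with p congruent to a modulo b."""
    return sum(1 for p in primes_up_to(x) if p % b == a % b)


def phi(b: int) -> int:
    return sum(1 for m in range(1, b + 1) if gcd(m, b) == 1)


def heuristic(x: int, a: int, b: int) -> float:
    """The prediction x / (phi(b) log x) for coprime a and b."""
    return x / (phi(b) * log(x))
```

For example, one can print `pi_ab(10**6, 1, 4)` next to `heuristic(10**6, 1, 4)` and repeat for other coprime pairs (a, b).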
Encouraged by the success of the probabilistic model in counting primes within a given
arithmetic progression, let us try it with our problem of counting arithmetic progressions
within the primes. We now use the expectation notation, together with the fact that if A is
an event then E(# of times A occurs) = P(A):
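\[ \begin{aligned} \mathbb{E}(\# a, b \text{ such that } a, a + b, \dots, a + (k-1)b \le N \text{ are all prime}) &= \mathbb{P}(a, a + b, \dots, a + (k-1)b \text{ are all prime}) \\ &\approx \mathbb{P}(a \text{ is prime}) \cdots \mathbb{P}(a + (k-1)b \text{ is prime}) \\ &\approx \frac{1}{\log^k N} \end{aligned} \]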
where we have made the further assumption that the events of being prime are roughly independent. Furthermore, since there are (within a constant factor of) $N^2$ arithmetic progressions of length k in [1, N], this leads us to conjecture that
\[ P(k, N) \sim \frac{N^2}{\log^k N}. \]
In particular, since the right hand side is unbounded as N tends to infinity, there are infinitely
many k-term progressions within the primes. In fact, we shall prove something very similar
to this asymptotic, giving a lower bound which only differs from the above heuristic by a
constant factor. That is, we prove the following theorem.
Theorem 1.2 (Green-Tao). For any k ≥ 3 there exists a constant $c_k > 0$ such that, for all sufficiently large N,
\[ P(k, N) \ge c_k \frac{N^2}{\log^k N}. \]
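For small N the quantity P(k, N) can be computed by brute force and set beside the heuristic N²/log^k N. The sketch below is our own illustration: it counts progressions {a + nb : 0 ≤ n ≤ k − 1} with b ≥ 1 lying in [N], following the convention of the Standard Notation section, and the function names are hypothetical. It says nothing about the constant c_k, which only becomes meaningful for large N.

```python
from math import log


def prime_flags(N: int) -> list[bool]:
    """flags[n] is True exactly when n is prime, for 0 <= n <= N."""
    flags = [False, False] + [True] * (N - 1)
    for p in range(2, int(N ** 0.5) + 1):
        if flags[p]:
            for multiple in range(p * p, N + 1, p):
                flags[multiple] = False
    return flags


def count_prime_progressions(k: int, N: int) -> int:
    """P(k, N): k-term progressions of primes in [N] with common difference b >= 1."""
    is_p = prime_flags(N)
    count = 0
    for b in range(1, (N - 1) // (k - 1) + 1):
        # The last term a + (k-1)b must not exceed N.
        for a in range(2, N - (k - 1) * b + 1):
            if all(is_p[a + n * b] for n in range(k)):
                count += 1
    return count


def heuristic_count(k: int, N: int) -> float:
    return N ** 2 / log(N) ** k
```

For instance, `count_prime_progressions(3, 20)` finds the five progressions (3, 5, 7), (3, 7, 11), (3, 11, 19), (5, 11, 17) and (7, 13, 19).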
It is also possible to give an upper bound of this form, so P (k, N ) is within a constant
factor of the heuristic answer. The correct asymptotic seems to be similar: the heuristic
above multiplied by some absolute constant, which reflects the arithmetical information
about the primes which the probabilistic model does not include.
Conjecture 1.1. For any k ≥ 3 there exists a constant $C_k > 0$ depending only on k such that
\[ P(k, N) \sim C_k \frac{N^2}{\log^k N}. \]
This has been verified for 3 ≤ k ≤ 6 – see Chapter 6 for details. In fact, this is only a
special case of a more general conjecture obtained by Hardy and Littlewood using similar
heuristics. A linear form is a function in d variables of the shape
\[ a_0 + a_1 n_1 + a_2 n_2 + \cdots + a_d n_d \]
where $a_i \in \mathbb{Q}$.
Conjecture 1.2 (Hardy-Littlewood Prime Tuples Conjecture [14]). Let $\psi_1, \dots, \psi_k$ be linear forms in d variables. Then for some constant C dependent only on the linear forms,
\[ \#\{n = (n_1, \dots, n_d) \in [0, N]^d : \psi_1(n), \dots, \psi_k(n) \text{ prime}\} \sim C \frac{N^d}{\log^k N}. \]
An affirmative answer to this conjecture would not only give an asymptotic for P (k, N ), but
also settle the long-standing Twin Primes Conjecture and Goldbach Conjecture. See [7] for
more details. The Hardy-Littlewood conjecture is still unproven, although by extending the
methods used in [10], Green and Tao have proven it for a significant class of linear forms in
[7].
1.2 Arithmetic Progressions
The advances which led to the Green-Tao theorem are more about the structure of arithmetic
progressions than about the nature of the primes themselves. As Ben Green puts it in [5],
they “lie not in our understanding of the primes but rather in what we can say about
arithmetic progressions.”
The problem of finding arithmetic progressions within sets began with a 1936 paper [1]
of Erdős and Turán, in which they introduce the function rk (n), defined to be the size of the
largest subset of {1, . . . , n} with no k-term arithmetic progression. The problem of locating
k-term arithmetic progressions within sufficiently large sets is equivalent to showing that
$r_k(n)$ grows sufficiently slowly. In particular, they conjectured that $r_k(n) = o(n)$ for any k. If we define the density of a set of integers A to be
\[ \liminf_{N \to \infty} \frac{|A \cap \{1, \dots, N\}|}{N}, \]
then this is equivalent to the statement that any set of integers with positive density contains a k-term progression for any k. In fact, Erdős later made the following stronger conjecture.
Conjecture 1.3 (Erdős). If
\[ \sum_{n \in A} \frac{1}{n} = \infty \]
then A contains arbitrarily long arithmetic progressions.
This is essentially equivalent to showing that $r_k(n) = O(n / \log n)$ for any k. Note that this is stronger than $r_k(n) = o(n)$, since it gives an explicit upper bound on the rate of decay. The Green-Tao theorem would be an immediate corollary of this conjecture, since $\sum_p \frac{1}{p}$ diverges.
The weaker conjecture that $r_k(n) = o(n)$ was proven for the first non-trivial case k = 3 by
Roth in 1953, giving the following theorem.
Theorem 1.3 (Roth, 1953 [18]). Any subset of N with positive density contains a 3-term
arithmetic progression.
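To get a feel for the function r_k(n) of Erdős and Turán, it can be computed by exhaustive search for very small n. The sketch below is purely illustrative: it tries subsets of {1, . . . , n} from largest to smallest and returns the first size at which a progression-free subset exists. The names `has_progression` and `r` are ours, and the search is exponential in n, so it is only usable for n up to about 20.

```python
from itertools import combinations


def has_progression(members: set[int], k: int) -> bool:
    """True if the non-empty set contains a k-term arithmetic progression (b >= 1)."""
    top = max(members)
    for a in members:
        b = 1
        while a + (k - 1) * b <= top:
            if all(a + n * b in members for n in range(k)):
                return True
            b += 1
    return False


def r(k: int, n: int) -> int:
    """Size of the largest subset of {1, ..., n} with no k-term progression (k >= 2)."""
    universe = range(1, n + 1)
    for size in range(n, 0, -1):
        for subset in combinations(universe, size):
            if not has_progression(set(subset), k):
                return size
    return 0
```

For example, `r(3, 5)` is attained by the set {1, 2, 4, 5}, and the ratio `r(3, n) / n` can be tabulated for small n to see it drift downwards.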
The case k = 4 was proven by Szemerédi in 1969, and in 1975 he managed to extend this
to arbitrary k using complicated combinatorial arguments.
Theorem 1.4 (Szemerédi, 1975 [19]). Any subset of N with positive density contains arbitrarily long arithmetic progressions.
In fact, we can strengthen Szemerédi’s theorem to show that we can find not just one,
but infinitely many progressions of any finite length using a simple combinatorial argument
first noted by Varnavides.
Corollary 1.1 (Varnavides, 1959 [24]). Let A ⊆ N and k ≥ 2. If there exists δ > 0 such that $|A \cap \{1, \dots, N\}| \ge \delta N$ for all sufficiently large N, then there exists a constant $c_{k,\delta} > 0$ such that, for sufficiently large N, there are at least $c_{k,\delta} N^2$ k-term arithmetic progressions in A ∩ {1, . . . , N }.
The prime number theorem implies the primes have density zero, and hence Szemerédi’s
theorem cannot be applied directly. It will, however, be invoked at a crucial step in proving
the Green-Tao theorem.
Szemerédi’s theorem has been reproven several times since 1975 using methods from a
surprisingly diverse array of mathematical fields – first by Furstenberg in 1977 using ergodic
theory and then twice by Gowers, using Fourier analysis and hypergraph regularity. More
recently, in 2009 an alternative combinatorial proof was found by the internet polymath
project. For a detailed survey of the different proofs see [21]. It was insights gained from
studying the common features of these different proofs which led to the methods used by
Green and Tao.
The first progress on the problem of progressions within the primes came from van der
Corput in 1939, who settled the result for k = 3 using the circle method.
Theorem 1.5 (van der Corput, 1939 [23]). There are infinitely many arithmetic progressions
consisting of three primes.
After this theorem, although significant results were obtained for sets of positive density
as outlined above, no further results were obtained for the primes until 1981 when Heath-Brown showed, using methods similar to van der Corput’s, the following partial result for
k = 4.
Theorem 1.6 (Heath-Brown, 1981 [15]). There are infinitely many arithmetic progressions
consisting of three primes and one almost-prime (that is, a number with at most two prime
factors, counted with multiplicity).
1.3 Structure of the Proof
The outline of the proof given here is new, although of course all the ideas are implicit in the
original approach of Green and Tao. In [10], however, they did not make the transference
principle explicit; it is discussed in more detail in, for example, [3].
The fundamental insight behind Green and Tao’s work was that, heuristically, a large
random subset of the integers is very similar to the integers themselves, so conclusions which
hold for the latter should hold for the former. In other words, there should be a kind
of transference principle which would allow results which hold for the integers to hold for
sufficiently random subsets.
Let us call such subsets pseudorandom sets. Applying the transference principle to Szemerédi’s theorem, we may hope the following to hold.
Conjecture 1.4 (Relative Szemerédi Theorem). Let X ⊂ N be sufficiently pseudorandom.
Then any subset of X with positive density inside X has arbitrarily long arithmetic progressions.
In particular, although the primes have zero density within N, we may hope to find some
pseudorandom set X ⊂ N in which the primes have positive density, and deduce that the
primes contain arbitrarily long arithmetic progressions. In [10], Green and Tao do exactly
this, and their proof can be divided into two distinct parts.
The first is proving the Relative Szemerédi theorem – that is, showing that the kind of
structure reflected in Szemerédi’s theorem is amenable to the transference principle mentioned above. This is accomplished using the machinery of Gowers uniformity norms, first
introduced by Gowers to prove Szemerédi’s theorem. In particular, these norms induce a
notion of distance between subsets of N with the following properties.
The first is that this distance preserves the structure of containing arithmetic progressions,
in that sets which are close have roughly the same number of arithmetic progressions. This
is formalised as the Generalised von Neumann theorem. The mathematics needed here relies
on the Gowers uniformity norms, and is similar to the kind of regularity arguments used in
the hypergraph proof of Szemerédi’s theorem by Gowers. This, along with a discussion of
the uniformity norms, is the subject of Chapter 3.
The second is the transference principle mentioned above: a set which is dense inside some
pseudorandom subset of the integers is close to a set which is dense within the integers. This
is formalised as the Decomposition theorem. In the original proof, Green and Tao used
finitary ergodic theory to prove this, inspired by Furstenberg’s proof of Szemerédi’s theorem
using ergodic theory. In this dissertation, we present a simpler proof discovered by Gowers
using functional analysis. This is the subject of Chapter 4.
The third component is Szemerédi’s theorem, which shows that the larger set obtained
from the decomposition contains many arithmetic progressions. We then invoke the Generalised von Neumann theorem to show that the original set also contains many arithmetic
progressions. This finishes the proof of the Relative Szemerédi theorem.
With this in place, the second part of the proof of the Green-Tao theorem is showing that
the hypothesis of the Relative Szemerédi theorem is met: the primes sit inside a pseudorandom set with positive density. For this, we will use a weighted version of the almost-primes
– numbers with few prime factors. We will discuss this part of the proof in Chapter 5.
Showing that the almost-primes are sufficiently pseudorandom uses techniques from traditional
analytic number theory, and Green and Tao were able to use arguments and results already
established by Goldston and Yıldırım in their work on small gaps between the primes [2].
This part of the argument has since been simplified; we have incorporated these simplifications into Chapter 5.
Chapter 2
Arithmetic Progressions
In this chapter we discuss arithmetic progressions and Szemerédi’s theorem in more detail,
and formulate the precise definition of pseudorandomness which we shall require. The exposition in this chapter is new, but the ideas it discusses are all present in [10] and earlier
work.
2.1 How to count arithmetic progressions
In many problems dealing with the existence of certain structures in the natural numbers,
it is easier to try to solve the apparently more difficult problem of counting how many such
structures we may expect to find in any finite subset of the natural numbers. Existence then
follows once we show that this count is not zero.
Another simplification that can be made is to consider functions instead of sets. We may
pass from considering a subset A ⊆ N to its characteristic function 1A : N → {0, 1} using
the following important observation:
\[ 1_A(x) 1_A(x + r) \cdots 1_A(x + (k-1)r) = \begin{cases} 1 & \text{if } x, x + r, \dots, x + (k-1)r \in A; \\ 0 & \text{otherwise.} \end{cases} \]
Hence we may count all arithmetic progressions within $A \cap \mathbb{Z}_N$ by the sum
\[ \sum_{x, r \in \mathbb{Z}_N} 1_A(x) 1_A(x + r) \cdots 1_A(x + (k-1)r). \]
Note that we have switched from considering arithmetic progressions within {1, . . . , N } to
those within ZN . This is to avoid the problem that, for instance, x, r ∈ {1, . . . , N } does not
guarantee that x + (k − 1)r ∈ {1, . . . , N }. We discuss this issue further below.
Statements about the existence of arithmetic progressions then reduce to the above sum
being non-zero. In fact, whenever the sum is bounded below by a constant, it is also bounded
below by a constant multiple of N 2 for sufficiently large N , thanks to simple arguments such
as were used by Varnavides to prove Corollary 1.1. Hence we in fact consider the above sum
weighted by $1/N^2$. This leads us to the expectation notation, and motivates us to make the
following definition:
Definition 2.1. Let $f_0, \dots, f_{k-1} : \mathbb{Z}_N \to \mathbb{R}$. The normalised count of k-term arithmetic progressions in $f_0, \dots, f_{k-1}$ is defined by
\[ \Upsilon_k(f_0, \dots, f_{k-1}) := \mathbb{E}(f_0(x) f_1(x + r) \cdots f_{k-1}(x + (k-1)r) : x, r \in \mathbb{Z}_N). \]
The normalised count of k-term arithmetic progressions in f is
\[ \Upsilon_k(f) := \Upsilon_k(f, \dots, f). \]
Remark 2.1. The use of Υk was not present in [10], although they repeatedly use the
expectation it is shorthand for. Conventionally this expectation is denoted by Λ, but in this
context this would create confusion with the von Mangoldt function for the primes.
The discussion above shows that, for $A \subseteq \mathbb{Z}_N$, there are $N^2 \Upsilon_k(1_A)$ many k-term arithmetic
progressions in A.
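The normalised count is straightforward to evaluate for small N. The following minimal sketch implements the definition of Υ_k literally, averaging over all x, r in Z_N (including r = 0 and wraparound progressions), and uses exact rational arithmetic. The names `upsilon`, `indicator` and `upsilon_set` are our own and are not notation from [10].

```python
from fractions import Fraction
from math import prod


def upsilon(fs, N: int) -> Fraction:
    """Normalised count of len(fs)-term progressions in f_0, ..., f_{k-1} over Z_N.

    Each f in fs is a function on {0, ..., N-1} with integer or Fraction values.
    """
    k = len(fs)
    total = sum(
        prod(fs[i]((x + i * r) % N) for i in range(k))
        for x in range(N)
        for r in range(N)
    )
    return Fraction(total) / N ** 2


def indicator(A):
    """Characteristic function 1_A of a set A of residues."""
    A = frozenset(A)
    return lambda x: 1 if x in A else 0


def upsilon_set(A, k: int, N: int) -> Fraction:
    """Upsilon_k(1_A) for a subset A of Z_N."""
    f = indicator(a % N for a in A)
    return upsilon([f] * k, N)
```

For A = {1, 2, 4} in Z_5 and k = 3, the pairs (x, r) counted are the three with r = 0 together with (1, 3) and (2, 2), so `N**2 * upsilon_set({1, 2, 4}, 3, 5)` equals 5.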
There are two important things to observe about the way we are counting arithmetic
progressions. The first is that we include the degenerate case when r = 0. This will not be
a problem, as such degenerate cases will contribute at most 1/N to Υk , which we shall only
be estimating up to o(1) errors.
The second potential problem is the wraparound issue noted above – we are counting
arithmetic progressions in ZN , rather than {1, . . . , N }. For instance, when N = 5, we would
include {1, 4, 2} as a 3-term arithmetic progression. This will happen if and only if, for some
1 ≤ i ≤ k − 1, the term a + ir is larger than N , for then it would have to wrap around in ZN .
One crude way to avoid this, which will be sufficient, is to restrict a and r to being less than
N/k. In the case of the primes, we will ensure this by incorporating some small factor in the
definition of our counting function f to ensure such wraparound arithmetic progressions are
not counted.
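The wraparound issue and the crude remedy can be checked directly. The sketch below, with helper names of our own choosing, reduces a progression modulo N, tests whether the underlying integer progression leaves {1, . . . , N}, and confirms that no wraparound occurs once a and r are both below N/k, since then a + (k − 1)r < N/k + (k − 1)N/k = N.

```python
def progression_mod(a: int, r: int, k: int, N: int) -> list[int]:
    """Terms a, a + r, ..., a + (k-1)r reduced modulo N."""
    return [(a + i * r) % N for i in range(k)]


def wraps_around(a: int, r: int, k: int, N: int) -> bool:
    """True if the integer progression starting at a in {1, ..., N} exceeds N."""
    return a + (k - 1) * r > N


def safe_pairs(k: int, N: int) -> list[tuple[int, int]]:
    """All pairs (a, r) of positive integers with a < N/k and r < N/k."""
    bound = range(1, -(-N // k))  # 1 <= value < N/k, using ceiling division
    return [(a, r) for a in bound for r in bound]
```

With N = 5 and k = 3, `progression_mod(1, 3, 3, 5)` returns the wraparound progression 1, 4, 2 mentioned above, and `wraps_around(1, 3, 3, 5)` flags it.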
2.2 Szemerédi Revisited
By considering characteristic functions instead of sets, using a Varnavides argument, and
using the wraparound trick mentioned above to pass from considering {1, . . . , N } to ZN , we
may rewrite Theorem 1.4 as follows:
Theorem 2.1. For any k ≥ 1 and δ > 0 there exists a constant $c_{k,\delta} > 0$ such that the following holds. Let $f : \mathbb{Z}_N \to \{0, 1\}$ be such that, for sufficiently large N, the density $\mathbb{E} f \ge \delta$. Then for sufficiently large N,
\[ \Upsilon_k(f) \ge c_{k,\delta}. \]
An important consequence of the approach of Gowers was the realisation that the function
f need not be discrete, but it is sufficient that it be bounded above by 1. This leads us to
the following formulation of Szemerédi’s theorem:
Theorem 2.2. For any k ≥ 1 and δ > 0 there exists a constant $c_{k,\delta} > 0$ such that the following holds. Let $f : \mathbb{Z}_N \to \mathbb{R}$ be such that $0 \le f(x) \le 1$ for all $x \in \mathbb{Z}_N$ and, for sufficiently large N, the density $\mathbb{E} f \ge \delta$. Then for sufficiently large N,
\[ \Upsilon_k(f) \ge c_{k,\delta}. \]
As explained in the introduction, this theorem solves the problem adequately for sufficiently dense sets of integers, but cannot be applied to the primes, since they have zero
density. In particular, if we let $1_P$ be the characteristic function of the primes, then the
prime number theorem implies that $\mathbb{E} 1_P \sim \frac{1}{\log N} \to 0$ as N → ∞. Thus the $\mathbb{E} f \ge \delta > 0$
hypothesis of Theorem 2.2 is not satisfied.
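The decay of the density of the primes, and the effect of reweighting by a factor of log N, can be seen numerically. The sketch below is only an illustration with hypothetical function names: it computes the mean of the indicator of the primes over [N] and the same mean multiplied by log N. By the prime number theorem the second quantity tends to 1, though slowly.

```python
from math import log


def prime_flags(N: int) -> list[bool]:
    """flags[n] is True exactly when n is prime, for 0 <= n <= N."""
    flags = [False, False] + [True] * (N - 1)
    for p in range(2, int(N ** 0.5) + 1):
        if flags[p]:
            for multiple in range(p * p, N + 1, p):
                flags[multiple] = False
    return flags


def prime_density(N: int) -> float:
    """Mean of the indicator of the primes over [N] = {1, ..., N}."""
    return sum(prime_flags(N)[1:]) / N


def weighted_density(N: int) -> float:
    """Mean over [N] of the function log N times the indicator of the primes."""
    return log(N) * prime_density(N)
```

Printing `prime_density(N)` and `weighted_density(N)` for N = 10², 10³, . . . shows the first tending to zero while the second stays bounded away from zero.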
We can avoid this and increase the density of our prime counting function by weighting
it with a log N factor – that is, we instead consider the function f := log N · 1P . This now
satisfies the density hypothesis, but is no longer bounded above by 1. The Relative Szemerédi
theorem allows us to weaken this restriction, requiring only that it is bounded above by some
sufficiently pseudorandom function. In particular, as an analogue to Theorem 2.2, we have
the following precise version of Conjecture 1.4.
Conjecture 2.1 (Relative Szemerédi Theorem). Let $\nu : \mathbb{Z}_N \to \mathbb{R}$ be k-pseudorandom, and let $f : \mathbb{Z}_N \to \mathbb{R}$ be such that $0 \le f \le \nu$. If there exists a constant δ > 0 such that, for sufficiently large N, the density $\mathbb{E} f \ge \delta$, then there exists a constant $c'_{k,\delta} > 0$ such that, for sufficiently large N,
\[ \Upsilon_k(f) \ge c'_{k,\delta}. \]
To prove this theorem, we use the strategy outlined in the introduction. It will be proven
in Chapter 4 as Theorem 4.4.
2.3 Pseudorandomness
This section gives some original remarks on the notion of pseudorandomness from [10], and
presents a clearer classification of the pseudorandomness condition into two components.
We first explain what kind of pseudorandomness we will need. Recall that ν is a function
from ZN to R, and should be thought of as a weighted indicator function for a subset of the
integers which is sufficiently pseudorandom for a transference principle to hold. The precise
conditions given below are determined by what we require in the technical theorems to come
later, but they reflect the following principles:
1. ν should behave like the constant function 1, since the pseudorandom set we transfer
to should behave like the integers.
2. The events ν(a) and ν(b) should be independent, for distinct a and b, since the probability that distinct elements belong to the pseudorandom set is independent.
We divide the definition of pseudorandom below into two parts, which correspond to the
two components of the Relative Szemerédi theorem: the Generalised von Neumann theorem
and the Decomposition theorem. In [10], Green and Tao also divide up the definition into
two parts, though they do so differently, into a linear forms condition and a correlation
condition. In that form, however, it is not clear that there is a distinction between the types
of pseudorandomness which the two components require.
The first is the pseudorandomness required to prove the Generalised von Neumann theorem.
Definition 2.2 (Linear Pseudorandomness). We say that ν is linearly k-pseudorandom if whenever we have a system of $m \le k 2^{k-1}$ linear forms in $t \le 3k - 4$ variables,
\[ \psi_i(x) := \sum_{j=1}^{t} L_{ij} x_j \quad \text{where } 1 \le i \le m, \]
such that none of the t-tuples $(L_{ij})_{1 \le j \le t} \in \mathbb{Q}^t$ are zero, none is a rational multiple of another, and moreover for each i, j the coefficient $L_{ij} \in \mathbb{Q}$ has numerator and denominator bounded by k in absolute value, then
\[ \mathbb{E}_{x \in \mathbb{Z}_N^t} (\nu(\psi_1(x)) \cdots \nu(\psi_m(x))) = 1 + o_k(1). \]
Remark 2.2. All the linear forms in this condition are assumed to be homogeneous, that is,
having zero constant term so ψ(0) = 0. In particular, we have the measure condition,
E(ν) = 1 + o(1),
which agrees with our first principle. Linear pseudorandomness should be viewed as a kind
of independence between ν(ψ1 ), . . . , ν(ψm ), in accordance with our second principle. This
is a very strong condition, since it gives us a great deal of control over a large class of
linear forms, in particular the k linear forms in 2 variables which give a k-term arithmetic
progression:
ψ1 (x1 , x2 ) := x1 , ψ2 (x1 , x2 ) := x1 + x2 , . . . , ψk (x1 , x2 ) := x1 + (k − 1)x2 .
From the linear pseudorandomness condition with these linear forms we get
Υk (ν) = 1 + o(1),
and a lot of arithmetic progressions counted by our pseudorandom function. The power of
the transference principle is that we don’t lose too many of these when passing to suitable
f ≤ ν.
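As a toy illustration of the identity Υ_k(ν) = 1 + o(1), one can take ν to be a normalised indicator of a set that is expected to behave randomly. The sketch below uses the quadratic residues modulo a prime N ≡ 3 (mod 4); this choice is ours, made only because it is deterministic and easy to compute, and it is not the majorant used for the primes. The code checks a single system of linear forms (the 3-term progression), which is far weaker than the full linear pseudorandomness condition.

```python
def quadratic_residues(N: int) -> set[int]:
    """Non-zero squares modulo an odd prime N."""
    return {(x * x) % N for x in range(1, N)}


def nu_from_set(S: set[int], N: int) -> list[float]:
    """Weighted indicator of S, scaled so that its mean over Z_N is 1."""
    weight = N / len(S)
    return [weight if x in S else 0.0 for x in range(N)]


def upsilon(nu: list[float], k: int) -> float:
    """Normalised count of k-term progressions in nu over Z_N, N = len(nu)."""
    N = len(nu)
    total = 0.0
    for x in range(N):
        for r in range(N):
            term = 1.0
            for i in range(k):
                term *= nu[(x + i * r) % N]
                if term == 0.0:
                    break
            total += term
    return total / N ** 2
```

Taking N = 103, the value `upsilon(nu_from_set(quadratic_residues(103), 103), 3)` can be compared with the value 1 obtained for the constant function.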
The next condition is required for the Decomposition theorem to hold.
Definition 2.3 (Simple Pseudorandomness). We say that ν is simply k-pseudorandom if whenever we have $m \le 2^{k-1}$ simple linear forms $\psi_i$ in $t \le k$ variables, that is, ones of the shape
\[ \psi_i(x) := \sum_{j=1}^{t} \omega_{ij} x_j + b_i \]
where $\omega_{ij} \in \{0, 1\}$, such that the affine parts $\omega_i = (\omega_{i1}, \dots, \omega_{it})$ are not zero or rational multiples of each other, then
\[ \mathbb{E}(\nu(\psi_1(x)) \cdots \nu(\psi_m(x))) = 1 + o(1). \]
Furthermore, there exists a weight function $\tau = \tau_m : \mathbb{Z}_N \to \mathbb{R}^+$ such that $\mathbb{E}(\tau^q) = O_{m,q}(1)$ for all $1 \le q < \infty$ and for all $h_1, \dots, h_m \in \mathbb{Z}_N$ we have the upper bound
\[ \mathbb{E}_{x \in \mathbb{Z}_N} (\nu(x + h_1) \cdots \nu(x + h_m)) \le \sum_{1 \le i < j \le m} \tau(h_i - h_j). \]
Remark 2.3. We cannot apply the first part to give an asymptotic for the second part, since
the affine parts of the linear forms are all the same. We also observe that we cannot control
these expressions using linear pseudorandomness, since the forms are non-homogeneous.
We now make the following umbrella definition, which is required for the Relative Szemerédi theorem.
Definition 2.4 (Pseudorandomness). We say that ν is k-pseudorandom if it is both linearly
k-pseudorandom and simply k-pseudorandom.
The constant function 1 is the easiest example of a pseudorandom function. In fact, it is
also an important one, since the space of pseudorandom functions is star-shaped around 1,
as the following easily verified lemma shows.
Lemma 2.1. If ν is linearly pseudorandom, then so is λν + (1 − λ) for any λ ∈ (0, 1);
similarly for simple pseudorandomness.
This lemma will be important in several places, since it will allow us to pass from bounds
of the form f ≤ ν + 1 to ones of the form f ≤ ν, losing only a constant factor, but preserving
pseudorandomness.
Remark 2.4. It is believed that these conditions are stronger than necessary. Weakening
the strength of the pseudorandomness necessary (in particular the simple pseudorandomness
required for the Decomposition theorem) is one goal of current research in this area.
Chapter 3
Uniformity Norms and the
Generalised von Neumann Theorem
In this chapter we introduce the Gowers uniformity norm, which will play a central role
in the proof. We also prove the first component of the Relative Szemerédi theorem, the
Generalised von Neumann theorem. Once again, the substantial ideas in this chapter are all
present in [10], though the exposition in the first section contains some new ideas.
3.1 The Gowers Uniformity Norm
Recall that our strategy for proving the Relative Szemerédi theorem is to show that a set
dense in a pseudorandom set is ‘close’ in some sense to one dense in the natural numbers.
In terms of the functional approach in the previous chapter, we seek some metric d on RN
such that:
1. If d(f, g) is small then f and g count a similar number of k-term arithmetic progressions,
and
2. If f ≤ ν for some pseudorandom ν then there exists a bounded g such that d(f, g) is
small.
The easiest way to obtain a metric is to induce it from some norm on the space. We now
give the definition of the required norm as in [10]. First, however, we give some original
remarks to help motivate the definition.
We seek to decompose a function f ≤ ν, where ν is a pseudorandom function, as f = g +h
where g is bounded and h is ‘small’, in the sense that Υk (g + h) is well approximated by
Υk (g). We now need to specify what is meant by ‘small’.
Our initial approach might be to use Υk , the normalised count of arithmetic progressions,
directly – that is, we hope to achieve a decomposition where Υk (h) is small. There are two
problems with this approach.
The first is that we hope to approximate $\Upsilon_k(g + h)$ by $\Upsilon_k(g)$, and so we need that
$$\Upsilon_k(g + h) = \Upsilon_k(g) + \sum_{\emptyset \neq I \subseteq [k]} \Upsilon_k(f_1, \ldots, f_k) = \Upsilon_k(g) + \text{negligible terms},$$
where $f_i = h$ if $i \in I$ and $f_i = g$ otherwise. In other words, we need not only $\Upsilon_k(h)$ small, but
also $\Upsilon_k(f_1, \ldots, f_k)$ small whenever some $f_i = h$. It would be sufficient to prove an inequality
such as
$$\Upsilon_k(f_1, \ldots, f_k) = O_k\left( \inf_{1 \leq i \leq k} \Upsilon_k(f_i) \right).$$
This cannot hold, however, as shown by the following counterexample. Define $f_1(n) = 1$ if
$n = 0$, and $f_1(n) = 0$ otherwise, and let $f_i \equiv 1$ for all $i \geq 2$. Then $\inf \Upsilon_k(f_i) = \Upsilon_k(f_1) = 1/N^2$,
whereas $\Upsilon_k(f_1, \ldots, f_k) = 1/N$ for all N.
The second problem with using $\Upsilon_k$ directly is that it is not a norm on $\mathbb{R}^N$, for the trivial
reason that it can be negative. We may avoid this by taking the absolute value, but although
it is easily verified that $|\Upsilon_k|$ is a seminorm, it is not a norm. For example, take N = 3, and
$f : \mathbb{Z}_3 \to \mathbb{R}$ defined by f(0) = 0, f(1) = 1 and f(2) = −1. A simple calculation shows that
$\Upsilon_3(f) = 0$, although $f \neq 0$. This is a problem for the existence of a decomposition, since
the analytic machinery we hope to use to find such a decomposition relies on h being small
in some norm on $\mathbb{R}^N$.
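Both obstructions can be checked by brute force for small moduli. The following is a minimal sketch rather than part of the original argument; the helper name `upsilon` and the choices N = 7, k = 3 are illustrative assumptions.

```python
from itertools import product

def upsilon(fs, N):
    """E(f_1(x) f_2(x+r) ... f_k(x+(k-1)r) : x, r in Z_N) for a list fs of k functions."""
    total = 0.0
    for x, r in product(range(N), repeat=2):
        term = 1.0
        for i, f in enumerate(fs):
            term *= f[(x + i * r) % N]
        total += term
    return total / N ** 2

# First problem: f_1 is the indicator of 0 and the other functions are identically 1.
N, k = 7, 3  # illustrative choices, not taken from the text
f1 = [1.0 if n == 0 else 0.0 for n in range(N)]
ones = [1.0] * N
diagonal = upsilon([f1] * k, N)               # Upsilon_k(f_1)
mixed = upsilon([f1] + [ones] * (k - 1), N)   # Upsilon_k(f_1, 1, ..., 1)
print(diagonal, 1 / N ** 2)
print(mixed, 1 / N)

# Second problem: a nonzero function on Z_3 whose progression count vanishes.
f = [0.0, 1.0, -1.0]
vanishing = upsilon([f] * 3, 3)
print(vanishing)
```

The mixed count exceeds the diagonal count by a factor of N, so no bound of the shape $O_k(\inf \Upsilon_k(f_i))$ can hold uniformly in N.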
Hence we should not demand that h be small in terms of $\Upsilon_k$, but rather in some other
norm on $\mathbb{R}^N$. To discover what this should look like, focus on the first of the problems
above – bounding $\Upsilon_k(f_1, \ldots, f_k)$. The most common tool in bounding expectations is the
Cauchy-Schwarz inequality,
$$E(XY)^2 \leq E(X^2) E(Y^2).$$
We need to bound Υk in terms of something involving only h, and so we must remove k − 1
functions. For concreteness, let us temporarily fix k = 3. After applying the Cauchy-Schwarz
inequality twice, we may bound the $\Upsilon_3(f_1, f_2, f_3)$ term,
$$E(f_1(x) f_2(x + r) f_3(x + r + r) : x, r \in \mathbb{Z}_N),$$
with a product where each term has the shape
$$E(f(x) f(x + h_1) f(x + h_2) f(x + h_1 + h_2) : x, h_1, h_2 \in \mathbb{Z}_N).$$
This is similar to the shape of Υ3 , but with the sum r+r replaced with h1 +h2 for independent
variables h1 , h2 . This suggests that if h is small with respect to this expectation, then we
can use the Cauchy-Schwarz inequality to show that Υ3 (f1 , f2 , f3 ) is small whenever some
fi = h, and hence that Υ3 (g + h) ≈ Υ3 (g).
Motivated by this, we make the following definition.
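Definition 3.1. For any d ≥ 1, the Gowers d-uniformity norm of a function $f : \mathbb{Z}_N \to \mathbb{R}$ is
defined as
$$\|f\|_{U^d} := \left( E\left( \prod_{\omega \in C_d} f(x + \omega \cdot h) : x \in \mathbb{Z}_N,\ h \in \mathbb{Z}_N^d \right) \right)^{1/2^d}.$$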
Note that, in the k = 3 case, $\|f\|_{U^2}^4$ is exactly the expectation we obtained above. In
general, we will use the $U^{k-1}$ norm to deal with progressions of length k.
It is easy to verify that $\|\cdot\|_{U^d}$ is a seminorm for d ≥ 1. That it is also a genuine norm
when d ≥ 2 follows from the easily verified fact that
$$\|f\|_{U^2} = \|\hat{f}\|_4,$$
where $\hat{f}$ is the Fourier transform of f, and the less obvious monotonicity property
$$\|f\|_{U^{d-1}} \leq \|f\|_{U^d}.$$
Recalling that we are seeking an inequality of the shape
$$\#\text{ of } k\text{-progressions counted by } f \leq \|f\|_{U^{k-1}},$$
the monotonicity property agrees with the trivial observation that any (k + 1)-progression
truncated gives a k-progression (and hence if the count of (k + 1)-progressions is small, then
so is the count of k-progressions).
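The Fourier identity and the monotonicity property can be checked numerically on a small example. This is a sketch under stated assumptions: the cube $C_d$ is taken to be $\{0,1\}^d$, the Fourier transform is normalised as $\hat{f}(\xi) = E(f(x) e^{-2\pi i x \xi / N})$ with the $\ell^4$ norm taken as a plain sum over frequencies, and the function names and test vector are hypothetical.

```python
import cmath
from itertools import product

def gowers_norm(f, d):
    """U^d norm of a real-valued f on Z_N, by direct expansion of the definition."""
    N = len(f)
    total = 0.0
    for x in range(N):
        for h in product(range(N), repeat=d):
            term = 1.0
            for omega in product((0, 1), repeat=d):
                shift = sum(w * hi for w, hi in zip(omega, h))
                term *= f[(x + shift) % N]
            total += term
    mean = total / N ** (d + 1)
    # The mean is non-negative in exact arithmetic; max() only guards against rounding.
    return max(mean, 0.0) ** (1.0 / 2 ** d)

def fourier_l4(f):
    """l^4 norm of the Fourier coefficients, with the normalisation assumed above."""
    N = len(f)
    coeffs = [
        sum(f[x] * cmath.exp(-2j * cmath.pi * x * xi / N) for x in range(N)) / N
        for xi in range(N)
    ]
    return sum(abs(c) ** 4 for c in coeffs) ** 0.25

f = [0.0, 1.0, -1.0, 2.0, 0.5]  # arbitrary test function on Z_5
u1, u2, u3 = (gowers_norm(f, d) for d in (1, 2, 3))
print(u1, u2, u3, fourier_l4(f))
```

The direct expansion costs on the order of $N^{d+1} 2^d$ operations, so it is only practical for very small N and d.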
With this definition in place, we can restate our strategy for proving the Relative Szemerédi theorem. We need to show that the $U^{k-1}$ norm has the following properties.
1. If $\|h\|_{U^{k-1}}$ is small then (for suitable g) $\Upsilon_k(g) \approx \Upsilon_k(g + h)$.
2. If f ≤ ν then f = g + h where g is bounded and $\|h\|_{U^{k-1}}$ is small.
The first is the Generalised von Neumann Theorem, which occupies the next section. The
second is the crucial Decomposition Theorem, which we discuss in the next chapter.
We need to control the count of arithmetic progressions over functions with small Gowers
uniformity norm. Since we shall often be referring to functions with small Gowers uniformity
norms, it is convenient to make the following definition.
Definition 3.2. We say that f is η-uniform if $\|f\|_{U^d} \leq \eta$, and more generally say that f is
uniform if $\|f\|_{U^d}$ is small.
For a more in-depth discussion of the Gowers uniformity norms, including a proof of the
monotonicity property mentioned above, see (for example) Appendix B of [7].
3.2 The Generalised von Neumann Theorem
We come now to the first component needed to prove the Relative Szemerédi theorem. A
specialised form of this theorem, when ν ≡ 1, was first used by Gowers in his proof of
Szemerédi’s theorem. The fact that it could be generalised to linearly pseudorandom ν was
first noticed by Green and Tao in [10] – indeed, the linearly pseudorandom condition which
we require ν to satisfy was chosen with the proof of this theorem in mind.
The proof is long and technical, and can be found in [10]. The idea is to repeatedly apply
the Cauchy-Schwarz inequality as outlined above until we are at a stage where we can apply
the pseudorandom condition.
Theorem 3.1 (Generalised von Neumann Theorem). Let ν be linearly pseudorandom, and
let $f_0, \ldots, f_{k-1}$ obey the bounds $|f_i(x)| \leq \nu(x)$ for all x. Then
$$\Upsilon_k(f_0, \ldots, f_{k-1}) = O_k\left( \inf_{0 \leq i \leq k-1} \|f_i\|_{U^{k-1}} \right) + o_k(1).$$
Remark 3.1. Using Theorem 2.1, and rescaling the fi where necessary, we may in fact
weaken the conditions to |fi | ≤ ν + 2. This fact will be needed when we apply this to prove
the Relative Szemerédi theorem, since we will need to apply it to h = f − g where 0 ≤ f ≤ ν
and 0 ≤ g ≤ 2.
Proof. Omitted. See [10], Section 3.
In particular, if h is uniform, then Υk (g + h) is approximately Υk (g). This is the key step
in the proof of the Relative Szemerédi theorem, so we formulate it precisely as follows.
Corollary 3.1. If f = g + h where |g|, |h| ≤ ν for some linearly pseudorandom ν, and h is
η-uniform, then
Υk (f ) = Υk (g) + Ok (η) + o(1).
Proof. Expanding out the expectation notation, we see that
$$\Upsilon_k(g + h) = \Upsilon_k(g) + \sum_{\emptyset \neq I \subseteq [k]} \Upsilon_k(f_1, \ldots, f_k),$$
where $f_i = h$ if $i \in I$ and $f_i = g$ otherwise. We then apply Theorem 3.1 and the condition
that h is η-uniform to show that each of the terms in the sum is bounded by $O_k(\eta) + o_k(1)$.
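In the simplest case ν ≡ 1 and k = 3, with N odd, a standard Fourier argument gives the bound with constant 1 and no error term: $|\Upsilon_3(f_0, f_1, f_2)| \leq \min_i \|f_i\|_{U^2}$ whenever $|f_i| \leq 1$. The sketch below checks this on random bounded functions; the function names, the modulus N = 11 and the seed are illustrative assumptions, and this special case is no substitute for the proof in [10].

```python
import random
from itertools import product

def upsilon3(f0, f1, f2):
    """E(f0(x) f1(x+r) f2(x+2r) : x, r in Z_N)."""
    N = len(f0)
    return sum(
        f0[x] * f1[(x + r) % N] * f2[(x + 2 * r) % N]
        for x, r in product(range(N), repeat=2)
    ) / N ** 2

def u2_norm(f):
    """Gowers U^2 norm by direct expansion."""
    N = len(f)
    mean = sum(
        f[x] * f[(x + a) % N] * f[(x + b) % N] * f[(x + a + b) % N]
        for x, a, b in product(range(N), repeat=3)
    ) / N ** 3
    return max(mean, 0.0) ** 0.25  # max() only guards against rounding

random.seed(0)
N = 11  # odd modulus, so that x -> 2x is a bijection of Z_N
fs = [[random.uniform(-1.0, 1.0) for _ in range(N)] for _ in range(3)]
lhs = abs(upsilon3(*fs))
rhs = min(u2_norm(f) for f in fs)
print(lhs, rhs)
```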
3.3 Dual Norms
This section closely follows the first part of section 6 in [10], although due to the usefulness
of dual norms in the new approach to the Decomposition theorem in the next chapter, we
define the dual norm in generality.
In general, whenever we have a norm $\|\cdot\|$ on $\mathbb{R}^N$ we may define the dual norm as follows:
$$\|f\|^* := \sup\{ |\langle f, g \rangle| : \|g\| \leq 1 \}.$$
It is easy to check that this defines a seminorm on $\mathbb{R}^N$, and for the norms we shall be dealing
with it will also be a norm. The use of this definition lies in the inequality
$$\langle f, g \rangle \leq \|f\| \|g\|^*.$$
In particular, whenever $\|g\|^*$ is small, and g correlates with f to a large degree, then $\|f\|$
must be large. That is, smallness of the dual norm prevents the norm of related functions
from being small. In the case of the $U^d$ norms, we say that g is anti-uniform if it has small
dual $U^d$ norm, and so anti-uniformity is an obstruction to uniformity.
Closely linked to the introduction of dual norms, we also introduce the concept of dual
functions – at least, with respect to the $U^d$ norms. In the following definition, and throughout
the rest of this dissertation, we shall fix d = k − 1, recalling that k is to be taken as a fixed
quantity. The dual function of f is defined as
$$\mathcal{D}f(x) := E\left( \prod_{\omega \in C_{k-1} \setminus \{0\}} f(x + \omega \cdot h) : h \in \mathbb{Z}_N^{k-1} \right).$$
The use of this lies in the following lemma. This will be useful later, when we shall apply it
to deduce that sufficiently uniform functions do not correlate much with their dual functions.
Lemma 3.1.
$$\langle f, \mathcal{D}f \rangle = \|f\|_{U^{k-1}}^{2^{k-1}}.$$
Proof. Expand out both sides using their definitions.
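The expansion in Lemma 3.1 can be confirmed numerically for k = 3. This sketch assumes that $C_{k-1} = \{0,1\}^{k-1}$ and that the inner product is normalised as $\langle f, g \rangle = E(f(x) g(x) : x \in \mathbb{Z}_N)$; the function names and the test vector are hypothetical.

```python
from itertools import product

def cube_shift(omega, h):
    """The dot product omega . h for omega in {0,1}^d and h in Z_N^d."""
    return sum(w * hi for w, hi in zip(omega, h))

def dual_function(f, d):
    """Df(x): average over h of the product of f over the nonzero cube vertices."""
    N = len(f)
    nonzero = [w for w in product((0, 1), repeat=d) if any(w)]
    values = []
    for x in range(N):
        total = 0.0
        for h in product(range(N), repeat=d):
            term = 1.0
            for w in nonzero:
                term *= f[(x + cube_shift(w, h)) % N]
            total += term
        values.append(total / N ** d)
    return values

def gowers_power(f, d):
    """The expectation inside the definition of the U^d norm, i.e. the 2^d-th power of the norm."""
    N = len(f)
    total = 0.0
    for x in range(N):
        for h in product(range(N), repeat=d):
            term = 1.0
            for w in product((0, 1), repeat=d):
                term *= f[(x + cube_shift(w, h)) % N]
            total += term
    return total / N ** (d + 1)

k = 3
f = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5, 1.0]  # arbitrary function on Z_7
Df = dual_function(f, k - 1)
inner = sum(a * b for a, b in zip(f, Df)) / len(f)  # <f, Df> as an expectation
power = gowers_power(f, k - 1)
print(inner, power)
```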
Chapter 4
Decomposition Theorem
The goal of this chapter is to prove the following.
Theorem 4.1 (Decomposition Theorem). Let ν be simply pseudorandom, and η some parameter such that 1 > η > 0.
Suppose N is sufficiently large, depending on η. Then for every function 0 ≤ f ≤ ν we
can decompose it as f = g + h where 0 ≤ g ≤ 2 and h is η-uniform.
This is the final, and most crucial part of the proof of the Relative Szemerédi theorem,
and hence of the entire Green-Tao theorem. It is presented as a decomposition, which allows
us to decompose f into a bounded part (to which we may apply Szemerédi’s theorem) and
a uniform part, whose contribution is negligible by the Generalised von Neumann theorem.
It is, however, better viewed as a transference theorem: it allows us to transfer properties
of the integers to pseudorandom subsets of the integers. In this case, the desired property is
that dense subsets contain arbitrarily long arithmetic progressions. The relationship between
decomposition theorems and transference theorems holds in quite general terms, and is
discussed in detail in [3].
4.1 The Green-Tao Proof
The original proof used by Green and Tao in [10] is quite different to the one presented below,
and relies on a finitary ergodic theory inspired by Furstenberg’s proof of Szemerédi’s theorem.
We briefly sketch their approach here before presenting the simpler proof by Gowers.
Their proof constructs the decomposition in stages. They begin by looking at the decomposition f = E(f ) + (f − E(f )). It follows from the pseudorandomness of ν that E(f ) is
bounded, so the remaining problem is to show that f − E(f ) is sufficiently uniform.
Of course, there is no guarantee that it will be. Instead, they use the machinery of
conditional expectations over σ-algebras to increase the uniformity as follows. By using dual
functions as obstructions to uniformity, if f − E(f ) is not sufficiently uniform sets can be
added to create an expanded σ-algebra B. These new sets are chosen so that the conditional
expectation E(f | B) absorbs the impact of the dual functions which were obstructing the
uniformity. In particular, the difference f − E(f | B) lacks these obstructions, and is more
uniform. Furthermore, it follows from the pseudorandomness of ν and the fact that f ≤ ν
that E(f | B) remains bounded.
They continue in this fashion, keeping E(f | B) bounded at each stage, while increasing
the uniformity of f − E(f | B). There is no guarantee, however, that this process will
terminate – that is, while the approximations are becoming more uniform at each stage,
they may never become sufficiently uniform.
Green and Tao show that this process must terminate using an energy increment argument
used in several approaches to Szemerédi’s theorem. This argument uses the fact that at
each stage in their construction, the pseudorandomness of ν ensures that E(f | B) remains
bounded. The energy, that is, the L2 -norm, of E(f | B) increases at each stage, but since it
is bounded above, there must be a stage at which the energy may not increase, and hence
no further approximations can be made and the process must terminate.
If the process has terminated, however, it must mean that the approximation at that stage
was sufficiently uniform, and so the decomposition at this stage meets our requirements.
4.2 The Gowers-Hahn-Banach Proof
The simpler proof outlined in this section takes a very different approach. Rather than
constructing a decomposition explicitly, it uses the Hahn-Banach theorem to derive a contradiction if no decomposition exists. This approach was independently discovered by Gowers
[3] and Reingold, Trevisan, Tulsiani and Vadhan [17].
The proof we give here follows the outline given in [3]. Some parts of the argument have
been simplified, since we do not require the generality given by Gowers, and the presentation
of the argument given here is new.
We begin by stating the version of the Hahn-Banach theorem1 that we will use.
Theorem 4.2 (Hahn-Banach theorem). Let $K_1$ and $K_2$ be closed convex subsets of $\mathbb{R}^N$,
each containing 0, and suppose that $f \in \mathbb{R}^N$ cannot be written as a convex combination
$c_1 f_1 + c_2 f_2$ with $f_i \in K_i$. Then there exists $\phi \in \mathbb{R}^N$ such that $\langle f, \phi \rangle > 1$ and $\langle g, \phi \rangle \leq 1$ for
every $g \in K_1 \cup K_2$.
With this theorem available to us, the strategy should be fairly obvious. Recall that we
need a decomposition f = g + h where g is bounded and h is uniform. We suppose that
no such decomposition exists and use Theorem 4.2 to derive a contradiction. Roughly, this
will be as follows: $\langle f, \phi \rangle$ will be large, but $\langle \nu, \phi \rangle$ will be small, contradicting the fact that
f ≤ ν. We hope to say that $\langle \nu, \phi \rangle$ is small since it is the sum of $\langle 1, \phi \rangle$, which is bounded,
and $\langle \nu - 1, \phi \rangle$, which is o(1) since ν − 1 is uniform and φ is anti-uniform. There are, however,
significant technical difficulties to be overcome before we can put this into action. First, we
need the following simple consequence of pseudorandomness.
Lemma 4.1 (Uniformity of ν − 1). If $\nu : \mathbb{Z}_N \to \mathbb{R}$ is simply k-pseudorandom, then
$$\|\nu - 1\|_{U^{k-1}} = o(1).$$
1 This is quite different from the Hahn-Banach theorem as it is usually stated; for a derivation of the version stated, see [3].
Proof sketch. Expand out the definitions and use the binomial theorem.
Now let us try to prove Theorem 4.1 using only this Lemma and the Hahn-Banach theorem. In terms of the latter, we have two closed convex subsets of $\mathbb{R}^N$: positive functions
bounded by 2 and functions which are η-uniform. If the decomposition does not hold, then
by Theorem 4.2 we can find some function φ such that
1. $\langle f, \phi \rangle > 1$,
2. $\langle g, \phi \rangle \leq 1$ for every g such that 0 ≤ g ≤ 2, and
3. $\langle h, \phi \rangle \leq 1$ for every h such that $\|h\|_{U^{k-1}} \leq \eta$.
In particular, by setting g to be the function g(x) = 2 whenever φ(x) ≥ 0 and g(x) = 0
otherwise, we see that $\langle 1, \phi_+ \rangle \leq 1/2$, where $\phi_+$ is the positive part of φ defined by
$\phi_+(x) := \phi(x)$ when φ(x) ≥ 0 and 0 otherwise. We have the following chain of inequalities:
$$1 < \langle f, \phi \rangle \leq \langle f, \phi_+ \rangle \leq \langle \nu, \phi_+ \rangle = \langle 1, \phi_+ \rangle + \langle \nu - 1, \phi_+ \rangle \leq \frac{1}{2} + \|\phi_+\|^*_{U^{k-1}} \|\nu - 1\|_{U^{k-1}}.$$
Using Lemma 4.1, to obtain a contradiction for N sufficiently large it suffices to show that $\phi_+$
is anti-uniform. The problem is that condition 3 only gives us a bound for $\|\phi\|^*_{U^{k-1}}$, and this is
not strong enough. The difficulty lies in passing from φ to $\phi_+$, which is necessary since we
can only deduce from f ≤ ν that $\langle f, \phi \rangle \leq \langle \nu, \phi \rangle$ if φ is non-negative; if some stronger
version of Theorem 4.2 were available that guaranteed φ ≥ 0 then the simple argument given
above would be sufficient. In particular, instead of simple pseudorandomness, all we would
need is the weaker condition $\|\nu - 1\|_{U^{k-1}} = o(1)$.
To fix this argument, we will show that φ+ can be approximated by a function that is
anti-uniform. This is technically messy, and we leave the details to Appendix A. It is in
proving this approximation, however, that the majority of the simple pseudorandomness
condition is required. It gives the following theorem.
Theorem 4.3 (Approximation with an anti-uniform function). Condition (3) above implies
that there exists a function ψ such that $\|\psi - \phi_+\|_\infty \leq 1/8$ and $\|\psi\|^*_{U^{k-1}} \leq A$ for some A
depending only on η.
Using this theorem, we may adapt the chain of inequalities above to use this approximation
to $\phi_+$, and obtain the inequality
$$1 < \langle f, \phi \rangle \leq \frac{3}{4} + o(1) + A \|\nu - 1\|_{U^{k-1}}.$$
Since A is fixed and $\|\nu - 1\|_{U^{k-1}}$ is o(1), we have a contradiction for N large enough, which
proves Theorem 4.1. Once again, the details are technical and left to Appendix A.
4.3 The Relative Szemerédi Theorem
We now have all we need to prove the main component of the Green-Tao theorem. The proof
below fills in the sketch given in [10], making some minor changes since our Decomposition
theorem is different to the form in which it is given there.
Theorem 4.4 (Relative Szemerédi Theorem). Let k ≥ 3 and δ > 0, and let $\nu : \mathbb{Z}_N \to \mathbb{R}^+$ be
k-pseudorandom. Suppose $f : \mathbb{Z}_N \to \mathbb{R}$ satisfies 0 ≤ f(x) ≤ ν(x) for all $x \in \mathbb{Z}_N$ and $Ef \geq \delta$.
Then for all sufficiently large N,
$$\Upsilon_k(f) \geq \frac{c_{k,\delta/3}}{2},$$
where $c_{k,\delta/3} > 0$ is the constant appearing in Theorem 2.2.
Proof. Let 0 < η < 1 be some parameter to be chosen later, and let f = g + h be the
decomposition given by Theorem 4.1. Hence we have 0 ≤ g ≤ 2 and h is η-uniform. We
would like to apply Szemerédi’s theorem to the function g; however, it is bounded above by
2 rather than 1, and its density is bounded below by a function of η, which we need to be
independent of our constant to be able to later take it sufficiently small. Hence we instead
consider the function (g + η)/(2 + η). We now have
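$$E\left( \frac{g + \eta}{2 + \eta} \right) = \frac{E(f) - E(h) + \eta}{2 + \eta} \geq \frac{\delta}{2 + \eta} > \frac{\delta}{3},$$
since $|E(h)| = \|h\|_{U^1} \leq \|h\|_{U^{k-1}} \leq \eta$. Furthermore, we have
$$0 \leq \frac{g + \eta}{2 + \eta} \leq 1.$$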
Hence for the function (g + η)/(2 + η), the conditions of Szemerédi’s theorem, Theorem 2.2,
are met and we may apply it to obtain the lower bound, for N sufficiently large (depending
only on k and δ),
$$\Upsilon_k(g + \eta) \geq \Upsilon_k\left( \frac{g + \eta}{2 + \eta} \right) \geq c_{k,\delta/3}$$
for some constant c dependent only on k and δ. Since η < 2, putting our upper bounds into
the definition of $\Upsilon_k$ gives us
$$\Upsilon_k(f_0, \ldots, f_{k-1}) \leq 2^{k-1} \eta = O_k(\eta)$$
whenever $f_j = \eta$ or g for 0 ≤ j ≤ k − 1, and at least one $f_i$ is equal to η. Hence
$$\Upsilon_k(g) \geq c_{k,\delta/3} - O_k(\eta).$$
On the other hand, $\|h\|_{U^{k-1}} \leq \eta$, and $|g|, |h| \leq f + g \leq \nu + 2$. Hence by applying Corollary 3.1
(and Remark 3.1) we see that
$$\Upsilon_k(f) = \Upsilon_k(g) + O_k(\eta) + o_k(1).$$
In particular,
$$\Upsilon_k(f) \geq c_{k,\delta/3} - O_k(\eta) - o_k(1).$$
By taking η small enough (depending on k and δ) we can ensure that the $O_k(\eta)$ term is less
than $c_{k,\delta/3}/4$, and by taking N sufficiently large, we can also ensure that the $o_k(1)$ term is less
than $c_{k,\delta/3}/4$. Hence, for N sufficiently large,
$$\Upsilon_k(f) \geq \frac{c_{k,\delta/3}}{2}$$
as required.
Weak Pseudorandomness
This section outlines a slightly different approach, the existence of which is hinted at by a
remark in [10].
If we can assume that δ is dependent on k, then we can prove a Relative Szemerédi
theorem from conditions which are strictly weaker than the pseudorandomness conditions
we have been using so far. This will be important in our application to the primes, when we
shall only be able to prove these weaker conditions.
Let εk be some sufficiently small constant depending only on k. Then we define weak
linear pseudorandomness and weak simple pseudorandomness by changing the asymptotics
and upper bounds required by adding a O(εk ) factor. This affects our argument as follows.
The Generalised von Neumann theorem is altered by a factor of Ok (εk ), using an almost
identical proof.
The Decomposition theorem requires the condition that η is sufficiently small depending
on $\varepsilon_k$. This is because a factor of $\|\nu - 1\|_{U^{k-1}}$ is no longer o(1), but is instead $O(\varepsilon_k) + o(1)$.
By using these modified theorems in the proof of our Relative Szemerédi theorem above,
we arrive at a lower bound of the form
$$\Upsilon_k(f) \geq c_{k,\delta/3} - O_k(\eta) - O_k(\varepsilon_k) + o(1).$$
If δ is dependent on k, then as long as we take εk sufficiently small, we may still arrive at the
required lower bound. This gives us the following alternative Relative Szemerédi theorem,
which we shall use in our application to the primes.
Theorem 4.5 (Alternative Relative Szemerédi Theorem). Let k ≥ 1 and let ν be a weak
k-pseudorandom function. Suppose $f : \mathbb{Z}_N \to \mathbb{R}$ satisfies 0 ≤ f(x) ≤ ν(x) for all $x \in \mathbb{Z}_N$
and $Ef \geq \frac{1}{10k}$ (say). Then for all sufficiently large N,
$$\Upsilon_k(f) \geq \frac{c_k}{2},$$
where $c_k = c_{k,1/30k} > 0$ is the constant appearing in Theorem 2.2.
Chapter 5
Progressions in the Primes
In this chapter we apply the Relative Szemerédi theorem to the problem of locating arithmetic
progressions within the prime numbers. The exposition here is original, and presents a
combined and simplified version of ideas used in [10] and [7].
To apply Theorem 4.5 to the primes, we require two things:
1. A suitable function f : ZN → R+ which counts only primes and with E(f ) ≥ δk for
some δk > 0 dependent only on k, and
2. A (weakly) k-pseudorandom function ν such that f ≤ ν.
5.1 Counting the Primes and the W-trick
The first obvious candidate for a function which counts only primes is the prime indicator
function, 1P (n), which is defined to be 1 if n is prime and 0 otherwise. This suffers from
the same difficulty which prevented us from applying the regular Szemerédi theorem to the
primes: the primes do not have positive density, since it follows from the prime number
theorem that $E(1_P) = O(1/\log N) = o(1)$.
There is, however, a standard method of avoiding this issue – instead of counting the
primes with weight 1, we count them with weight log p, using the von Mangoldt function1
$$\Lambda(n) := \begin{cases} \log n & \text{if } n \text{ is prime, and} \\ 0 & \text{otherwise.} \end{cases}$$
It is a simple consequence of the prime number theorem that E(Λ) = 1 + o(1), and hence is
certainly positive for sufficiently large N . Crucially, however, it is not bounded above – this
is the reason that we require the Relative Szemerédi theorem.
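The contrast between the two weightings is easy to see numerically. The following sketch, with a hypothetical helper `primes_up_to` and the arbitrary choice N = 100000, compares the density of the prime indicator with that of the log-weighted count over [1, N].

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes returning all primes p <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            # Cross out the multiples of p starting from p*p.
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i in range(n + 1) if sieve[i]]

N = 100_000  # illustrative cut-off
primes = primes_up_to(N)
density_indicator = len(primes) / N                       # E(1_P), decays like 1/log N
density_weighted = sum(math.log(p) for p in primes) / N   # E(Lambda), close to 1
print(density_indicator, density_weighted)
```

The price of the weighting is that the weights themselves are as large as log N, which is the unboundedness referred to above.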
We may hope to use Λ to count the primes, but there are two complications, the first
easy to deal with and the second more troublesome.
1 Normally the von Mangoldt function is also taken to count prime powers, and the contribution from these is shown to be negligible. For this application, however, including the prime powers would not make the argument any easier, so we have just excluded them from the definition.
The first is the wraparound problem noted in Chapter 2 – since we are working in ZN ,
we need to ensure that our arithmetic progressions are still progressions within [1, N ]. For
this we need a + (k − 1)b ≤ N which can be forced by only counting arithmetic progressions
in $\mathbb{Z}_N$ with a, b ≤ N/k. Again, for this to hold it is sufficient if the progression is contained
entirely within $[\epsilon_k N, 2\epsilon_k N]$ for any $\epsilon_k < 1/k$. To ensure this, we shall modify Λ to be zero
on n outside this interval.
The second problem is that the primes do not behave as randomly as we need them to.
Recall that we need to show that the primes are inside some pseudorandom set with positive
density. Pseudorandom sets behave similarly to random sets – in particular, they should
have uniform distribution across all congruence classes as N tends to infinity. Since the
primes have positive density within this set, they should also reflect this, and be roughly
uniformly distributed across all congruence classes.
The primes are obviously not distributed this uniformly, however – for instance, there is
only one prime congruent to 0 modulo 2. It appears then that our primes in fact do not sit
inside a pseudorandom set with the required positive density. To avoid this difficulty, Green
and Tao introduce the W -trick and restrict their attention to primes in certain restricted
congruence classes.
The key observation is that how pseudorandom we need our set to be depends only
on k, which is fixed. Hence the majorant need not be fully random, but only random
enough dependent on k. In particular, it does not need to be uniformly distributed across
all congruence classes, but only those small enough to be detected by the linear forms in the
definition of k-pseudorandom. Hence the primes which are inside this majorant should be
roughly uniformly distributed in these small congruence classes.
If we only look at integers belonging to one of these small congruence classes, both in the
majorant and in the primes, this avoids the difficulty. The primes do not need to be uniformly distributed across all small congruence classes, since we know that our pseudorandom
majorant only includes one of them.
To be precise, let w be some sufficiently large integer, depending only on k, and then
define
$$W := \prod_{p \leq w} p$$
to be the product of all primes below w. We restrict ourselves to looking at the congruence
class n ≡ 1 modulo W .2 Hence we look at Λ not over the interval {1, . . . , N }, but rather on
the set {W + 1, 2W + 1, . . . , N W + 1}, and set it to be zero everywhere else.
Restricting ourselves in this way reduces the primes we are counting by a factor polynomial
in W , and hence our eventual lower bound for the count of progressions in the primes will
be off by some polynomial factor in W . Since W is dependent only on k, however, we may
incorporate this factor into our constant ck .
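The effect of the W-trick can be seen on a toy scale. The sketch below uses the hypothetical small values w = 5 and M = 2000 (in the argument w is merely some sufficiently large integer depending on k). It confirms that every integer of the form Wn + 1 is congruent to 1 modulo each prime p ≤ w, and that primes are denser among these integers than among all integers up to the same height, by a factor close to W/φ(W).

```python
import math

def is_prime(n):
    """Trial-division primality test, adequate for the small ranges used here."""
    if n < 2:
        return False
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return False
    return True

w = 5  # illustrative threshold
small_primes = [p for p in range(2, w + 1) if is_prime(p)]
W = math.prod(small_primes)  # W = 2 * 3 * 5
M = 2000                     # illustrative number of terms
candidates = [W * n + 1 for n in range(1, M + 1)]

# No candidate is divisible by a prime p <= w: all residues equal 1.
residues_ok = all(c % p == 1 for c in candidates for p in small_primes)

density_restricted = sum(is_prime(c) for c in candidates) / M
density_all = sum(is_prime(m) for m in range(1, W * M + 2)) / (W * M + 1)
ratio = density_restricted / density_all  # heuristically near W / phi(W) = 30 / 8
print(W, residues_ok, ratio)
```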
In technical terms, we use the W-trick in verifying pseudorandomness to be able to deduce
that for p > w the linear forms we are estimating over are independent over $\mathbb{Z}_p$ as well as
over $\mathbb{Q}$. This enables us to bound certain local factor estimates with a crucial $p^{-2}$, and we
can obtain the estimates we require by taking w sufficiently large. For more details, see
Appendix B.
2 We may in fact choose any b coprime to W, and in fact choosing this using the pigeonhole principle avoids the use of the prime number theorem. We follow Green and Tao in choosing 1 for simplicity here.
Including both the wraparound factor and the W -trick in the definition of Λ gives the
required prime counting function.
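Definition 5.1 (Prime Counting Function).
$$\tilde{\Lambda}(n) := \begin{cases} \frac{\phi(W)}{2\lambda_k W} \log(Wn + 1) & \text{if } n \in [\epsilon_k N, 2\epsilon_k N] \text{ and } Wn + 1 \text{ is prime, and} \\ 0 & \text{otherwise,} \end{cases}$$
for some $\epsilon_k$ and $\lambda_k$ sufficiently small depending only on k.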
The factor φ(W )/W here is required largely to provide a lower bound for the density
which is independent of W . This is to avoid circularity, since how large we need to take w
partially depends on the density of Λ̃. The factor of 1/2λk is to ensure that it is majorized
by the pseudorandom function we shall construct in the next section.
To show that Λ̃ is suitable for use in the Relative Szemerédi theorem, it remains to show
that its density is bounded below. We require the following classical theorem of analytic
number theory.
Theorem 5.1 (Prime Number Theorem for Arithmetic Progressions).
$$\sum_{\substack{p \leq x \\ p \equiv a \pmod{b}}} \log p = \frac{x}{\phi(b)} (1 + o(1)).$$
We use this theorem to show that our prime counting function Λ̃ satisfies the density
requirement:
Lemma 5.1. For sufficiently large N,
$$E(\tilde{\Lambda}(x) : x \in \mathbb{Z}_N) \geq \frac{\epsilon_k}{4\lambda_k}.$$
Remark 5.1. The constants $\epsilon_k$ and $\lambda_k$ are those in the definition of the prime counting
function, above, and the pseudorandom majorant, below. Exact values could be computed,
but the important thing is that they are dependent only on k, and hence so is the lower
bound for the density given in this lemma.
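Proof. We simply expand out the expectation notation and apply Theorem 5.1 as follows:
$$E(\tilde{\Lambda}(x) : x \in \mathbb{Z}_N) = \frac{\phi(W)}{2\lambda_k W N} \sum_{\substack{\epsilon_k N \leq n \leq 2\epsilon_k N \\ Wn + 1 \text{ is prime}}} \log(Wn + 1)$$
$$= \frac{\phi(W)}{2\lambda_k W N} \sum_{\substack{\epsilon_k W N \leq p \leq 2\epsilon_k W N \\ p \equiv 1 \pmod{W}}} \log p + o(1)$$
$$= \frac{\phi(W)}{2\lambda_k W N} \left( \frac{2\epsilon_k W N}{\phi(W)} - \frac{\epsilon_k W N}{\phi(W)} \right) (1 + o(1))$$
$$= \frac{\epsilon_k}{2\lambda_k} (1 + o(1)).$$
In particular, for sufficiently large N, we can assume that $1 + o(1) \geq \frac{1}{2}$, which gives us the
result.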
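The density computation can be sanity-checked at a small scale. The sketch below drops the constant factor $1/(2\lambda_k)$ and uses hypothetical toy parameters (W = 30, N = 20000 and a cut-off `eps` = 0.25 standing in for the small constant of Definition 5.1); the normalised sum should then be close to `eps`, as the chain of equalities predicts. At this scale the agreement is only approximate, since the o(1) terms are not yet negligible.

```python
import math

def prime_sieve(n):
    """Return a bytearray s with s[i] == 1 exactly when i is prime, for 0 <= i <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return sieve

W, phi_W = 30, 8   # W = 2 * 3 * 5 and phi(W) = 1 * 2 * 4 (toy values)
N = 20_000         # illustrative modulus
eps = 0.25         # hypothetical stand-in for the interval constant
lo, hi = math.ceil(eps * N), math.floor(2 * eps * N)

sieve = prime_sieve(W * hi + 1)
weighted = sum(math.log(W * n + 1) for n in range(lo, hi + 1) if sieve[W * n + 1])

# E(Lambda~) with the factor 1/(2 lambda_k) omitted; predicted to be eps (1 + o(1)).
density = phi_W / (W * N) * weighted
print(density, eps)
```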
5.2 Pseudorandom Majorant
Now let us consider the construction of the pseudorandom majorant for Λ̃. Again, a natural
candidate is just Λ̃ itself. The problem is that verifying pseudorandomness for Λ̃ is extremely
difficult – the linear forms condition, for instance, is comparable in difficulty to the prime
tuples Conjecture 1.2, and hence harder than both the Twin Primes Conjecture and the
Goldbach Conjecture.
Instead, we use an idea from sieve theory. Traditional sieve theory methods have proven
extremely successful in obtaining estimates and asymptotics for the almost-primes (numbers
with few prime divisors), but its methods cannot be refined to include the primes themselves.
Fortunately, however, all we require now is a majorant for the weighted primes, and the
weighted almost-primes will do the job.
Recall that we counted the weighted primes using a restricted form of the Λ function.
Hence, we first consider the elementary identity
$$\Lambda(n) = \sum_{d \mid n} \mu(d) \log\left( \frac{n}{d} \right).$$
We need to adjust this to also count numbers with few prime factors. Note that if n has
many prime factors, then most of its divisors will be small in relation to n. In particular,
by truncating the sum in the identity above, only summing over divisors less than some
parameter R, we obtain a function that is approximately Λ for numbers with many prime
factors. Hence this modified function only differs from Λ on the almost-primes.
Motivated by this, we introduce the truncated divisor sum
$$\Lambda_R(n) := \sum_{\substack{d \mid n \\ d \leq R}} \mu(d) \log\left( \frac{R}{d} \right) = \sum_{d \mid n} \mu(d) \log_+\left( \frac{R}{d} \right).$$
If p is a prime larger than R then the only term counted in the sum
above will be d = 1, and so $\Lambda_R(p) = \log R$. Of course, $\Lambda_R$ will count more than just the primes
but, as noted above, it will effectively only count almost-primes. Hence $\Lambda_R$ can be viewed as
a weighted indicator function for the almost-primes, and so just as we used Λ modified by the
W-trick to count the primes, we shall use $\Lambda_R$ modified by the W-trick as a pseudorandom
majorant. One small obstacle to overcome is that we require our pseudorandom majorant to
be positive, whereas $\Lambda_R$ can take on negative values. We circumvent this by the simple
trick of squaring the function to guarantee positive values.
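Both the Möbius identity and the behaviour of the truncated sum can be verified directly for small arguments. In this sketch the function names are hypothetical, and the identity is checked against the classical von Mangoldt function, which also weights prime powers and for which the identity is exact. It also confirms that the truncated sum equals log R at a prime p > R and can be negative elsewhere, which is why it is squared.

```python
import math

def mobius(n):
    """Moebius function computed by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # n has a square factor
            result = -result
        p += 1
    if n > 1:
        result = -result  # one remaining prime factor
    return result

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def von_mangoldt_classical(n):
    """log p if n is a power of a prime p (exponent at least 1), else 0."""
    for p in range(2, n + 1):
        if n % p == 0:  # p is the smallest prime factor of n
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
    return 0.0

def truncated_divisor_sum(n, R):
    """Sum of mu(d) log(R/d) over divisors d of n with d <= R."""
    return sum(mobius(d) * math.log(R / d) for d in divisors(n) if d <= R)

identity_holds = all(
    abs(sum(mobius(d) * math.log(n / d) for d in divisors(n)) - von_mangoldt_classical(n)) < 1e-9
    for n in range(1, 201)
)
at_large_prime = truncated_divisor_sum(101, 10)  # prime above R: only d = 1 contributes
at_composite = truncated_divisor_sum(30, 10)     # many small divisors: negative value
print(identity_holds, at_large_prime, at_composite)
```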
Because $\Lambda_R$ also counts the almost-primes, obtaining estimates for it is significantly easier than for
Λ, and ideas from sieve theory can be applied. Green and Tao were fortunate here, in that
Goldston and Yıldırım had already considered the truncated divisor sum in their work on
small gaps between primes [2], and had effectively proven the linear forms estimate required.
It was this proof which was used in the original paper [10].
A significant simplification for this part of the proof was discovered later by Green and
Tao, and is outlined in [7] and [20]. In the original proof, to be able to provide the required
estimates for sums of $\Lambda_R^2$, the $\log_+(R/d)$ factor in the definition of $\Lambda_R$ was replaced by a
contour integral using the identity
$$\log_+ x = \frac{1}{2\pi i} \int_\Gamma \frac{x^z}{z^2}\, dz$$
for a vertical line Γ. This enabled them to replace the sum with a contour integral involving
the Riemann zeta function ζ, to which classical information about the zeta function could
be applied. In particular, they required the existence of a certain region to the left of the
line $\Re(z) = 1$ in which ζ is free of zeroes.
The new idea was as follows. They first replace the $\log_+(R/d)$ factor with a smooth
approximation,
$$\chi\left( \frac{\log d}{\log R} \right),$$
where χ is some smooth, bounded function with compact support. Note that $\log_+(R/d)$
corresponds to taking $\chi(x) = (\log R)(1 - x)$ for x ∈ (0, 1) and χ(x) = 0 otherwise. Then,
instead of replacing that with a contour integral, they could replace it with its Fourier
transform. The crucial fact here was that they could truncate the integral and consider it
over a bounded interval at the cost of o(1) errors since (as χ is smooth) the Fourier transform
decays very rapidly.
In the remainder of the argument then, instead of having to estimate functions involving
ζ(z) with |z| unbounded, they had only to consider z such that z = 1 + o(1). In this case,
the only fact about ζ required is the existence of a simple pole at z = 1. These ideas are
explained in detail in Appendix B.
To be able to use this simpler argument, we define a modified form of the truncated
divisor sum as follows.
Definition 5.2 (Modified Truncated Divisor Sum).
$$\tilde{\Lambda}_R(n) := \sum_{d \mid n} \mu(d)\, \chi\left( \frac{\log d}{\log R} \right),$$
where χ is some smooth, bounded function with compact support such that χ(0) = 1,
χ(x) = 0 for |x| ≥ 1 and $\int_0^1 |\chi'(x)|^2\, dx = 1$.
These conditions on χ are required for technical reasons, but χ(log d/ log R) should be
viewed as a smooth approximation to log+ (R/d), and so Λ̃R is a smooth approximation to
ΛR .
We require one final adjustment before the definition of the pseudorandom majorant. In
estimating the linear forms and correlation expectations required to demonstrate pseudorandomness, we get the hoped-for 1 + o(1) term, multiplied by a factor of $\frac{W}{\phi(W) \log R}$. To
compensate for this, we scale our pseudorandom majorant to remove this factor. Hence we
arrive at our final definition as follows.
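Definition 5.3 (Pseudorandom Majorant for the Primes). We define $\nu : \mathbb{Z}_N \to \mathbb{R}^+$ by
$$\nu(n) := \begin{cases} \frac{\phi(W) \log R}{W} \tilde{\Lambda}_R(Wn + 1)^2 & \text{if } n \in [\epsilon_k N, 2\epsilon_k N], \text{ and} \\ 1 & \text{otherwise,} \end{cases}$$
for some sufficiently small $\epsilon_k$ depending only on k, and where $R := N^{\lambda_k}$ for some sufficiently
small $\lambda_k$ depending only on k.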
Remark 5.2. This differs from the ν used in [10] by using the smoothed out ΛR for reasons
outlined above.
As outlined in the previous section, the W -trick is required so ν only considers numbers
congruent to 1 modulo W . The two-part definition of ν is needed due to difficulties in
passing between [1, N ] and ZN . Namely, we prove correlation estimates for Λ̃R over the
interval [1, N ], and to be able to apply these to the function ν (which is instead defined
over ZN ), we must truncate to some small interval. The λk determines how small the cut-off
parameter R is, and hence how ‘almost’ the almost-primes we count are.
We first verify that it is a majorant for our prime counting function Λ̃. The factor of
$\frac{\phi(W)}{2\lambda_k W}$ in the definition of Λ̃ is required partially to make this proof work.
Lemma 5.2. For N sufficiently large depending on k, $\nu(n) \ge \tilde\Lambda(n)$ for all $n \in \mathbb{Z}_N$.
Proof. Since we have squared the $\tilde\Lambda_R$ in the definition of ν, it is clear that ν(n) ≥ 0 for all
$n \in \mathbb{Z}_N$. The claim follows immediately unless Wn + 1 is a prime and $n \in [\epsilon_k N, 2\epsilon_k N]$ (for
otherwise Λ̃(n) = 0), so let us suppose we are in this case.
By taking N sufficiently large we may also suppose that $Wn + 1 \ge W\epsilon_k N + 1 > N^{\lambda_k} = R$
since W is dependent only on k.
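Therefore $\tilde\Lambda_R(Wn+1) = 1$, and so
\[ \nu(n) = \frac{\phi(W)}{W}\log R = \frac{\phi(W)}{\lambda_k W}\log N \ge \frac{\phi(W)}{\lambda_k W}\log\left(\frac{n}{2\epsilon_k}\right). \]
For n sufficiently large (which we may force by taking N sufficiently large)
\[ \frac{n}{2\epsilon_k} \ge \sqrt{Wn+1} \]
and so
\[ \nu(n) \ge \frac{\phi(W)}{\lambda_k W}\log\left(\sqrt{Wn+1}\right) = \tilde\Lambda(n). \]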
All that remains now is to verify that ν is weakly k-pseudorandom, for which we must
provide an asymptotic and an upper bound for ν taken over linear forms. The proofs of
these are long and technical, so have been postponed to Appendix B. The arguments there
are a simpler case of those given in [7], though we have made some simplifications since we
do not require the generality proven there.
Theorem 5.2. ν is weakly k-pseudorandom.
Proof. See Appendix B.
Remark 5.3. We only obtain the weak pseudorandomness condition discussed at the end
of Chapter 4, which is sufficient for these purposes since Lemma 5.1 gives a lower bound
for the density depending only on k. To obtain the strong linearly pseudorandom condition we
must take w as an increasing function of N (as is done in [10]), which would prevent us from
obtaining the uniform lower bound in Theorem 5.3.
We have constructed a (weakly) pseudorandom majorant for our prime counting function
Λ̃, and can apply the Relative Szemerédi theorem to finally complete the proof of the Green-Tao theorem.
5.3 The Green-Tao Theorem
We now have all that we need to prove the main result, that the primes contain infinitely
many arbitrarily long arithmetic progressions. In fact, we can deduce a stronger result,
giving an explicit lower bound. The proof below follows the outline given in [10], making it
explicit how to obtain the lower bound they mention as a remark.
Theorem 5.3 (The Green-Tao Theorem). For any k ≥ 3 there exists some constant $c'_k > 0$
depending only on k such that, for N sufficiently large,
\[ P(k,N) \ge c'_k \frac{N^2}{\log^k N}. \]
Proof. Choose M such that $N = \lceil 2\epsilon_k W M + 1 \rceil$. By Lemma 5.1, for sufficiently large N,
\[ E_{n \in \mathbb{Z}_M}(\tilde\Lambda) \ge \frac{\epsilon_k}{4\lambda_k}. \]
Furthermore, by Lemma 5.2 and Theorem 5.2, 0 ≤ Λ̃ ≤ ν where ν is weakly k-pseudorandom.
Hence by Theorem 4.5, the alternative Relative Szemerédi Theorem, there exists some constant $c_k > 0$ such that, for sufficiently large M,
\[ \Upsilon_k(\tilde\Lambda) \ge \frac{c_k}{2}. \]
If we define $1_P(n) = 1$ if Wn + 1 is prime and $n \in [\epsilon_k M, 2\epsilon_k M]$ and 0 otherwise, then
\[ \tilde\Lambda(n) = \frac{\phi(W)}{2\lambda_k W}\log(Wn+1)\,1_P(n) \le \frac{\phi(W)}{2\lambda_k W}\log N\,1_P(n). \]
Let a, a + b, . . . , a + (k − 1)b be an arithmetic progression of such n. Since $a + ib \in [\epsilon_k M, 2\epsilon_k M]$
for all 0 ≤ i ≤ k − 1, and since we may take $\epsilon_k < 1/k$, this will be an arithmetic progression
in [1, M], not just $\mathbb{Z}_M$. Furthermore, Wa + 1, Wa + 1 + Wb, . . . , Wa + 1 + (k − 1)Wb will
be an arithmetic progression of primes in [1, N], thanks to our initial choice of M.
Furthermore, since the degenerate case b = 0 can contribute at most $1/M$ to the expectation,
we see that, for N sufficiently large,
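\begin{align*}
P(k,N) &\ge M^2\left(\Upsilon_k(1_P) - \frac{1}{M}\right) \\
&\ge M^2 \left(\frac{2\lambda_k W}{\phi(W)\log N}\right)^{k} \frac{c_k}{4} \\
&\ge \left(\frac{N}{4\epsilon_k W}\right)^{2} \left(\frac{2\lambda_k W}{\phi(W)}\right)^{k} \frac{c_k}{4\log^k N} \\
&\ge c'_k \frac{N^2}{\log^k N}
\end{align*}
for some constant $c'_k > 0$, as long as we take W sufficiently large depending on k.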
Corollary 5.1. For any k, the primes contain infinitely many k-term arithmetic progressions.
It is crucial in the above proof that W was not dependent on N . If we take W as some
increasing function of N , as Green and Tao do, then although it makes certain parts of the
argument simpler (we do not need to appeal to weak pseudorandomness), we cannot get
a lower bound of the form we have stated. It is, however, strong enough to deduce this
corollary.
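For small N the quantity bounded in Theorem 5.3 can be computed by brute force. The sketch below assumes that P(k, N) counts k-term progressions a, a + b, . . . , a + (k − 1)b of primes in [1, N] with b > 0; the precise convention is fixed earlier in the essay, so this reading and the function names are assumptions.

```python
def primes_up_to(n):
    """Sieve of Eratosthenes; returns the list of primes <= n."""
    if n < 2:
        return []
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            for q in range(p * p, n + 1, p):
                sieve[q] = False
    return [i for i, is_p in enumerate(sieve) if is_p]

def count_prime_progressions(k, n):
    """Count progressions a, a+b, ..., a+(k-1)b of primes in [1, n] with b > 0."""
    primes = primes_up_to(n)
    prime_set = set(primes)
    count = 0
    for a in primes:
        b = 1
        while a + (k - 1) * b <= n:
            # a is prime already; check the remaining k - 1 terms
            if all(a + i * b in prime_set for i in range(1, k)):
                count += 1
            b += 1
    return count
```

For instance, the five 3-term progressions of primes up to 20 are (3, 5, 7), (3, 7, 11), (3, 11, 19), (5, 11, 17) and (7, 13, 19).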
Chapter 6
Further Results
6.1 Extensions of the Green-Tao Theorem
Although Theorem 5.3 was stated for primes, we may in fact use an almost
identical argument for subsets with positive density within the primes (since if A ⊆ B with
positive density and B ⊆ C with positive density then A ⊆ C with positive density). In
particular, since every prime congruent to 1 modulo 4 is the sum of two squares, Green and
Tao obtain the following previously unknown result:
Theorem 6.1. There are arbitrarily long arithmetic progressions where every term is the
sum of two squares.
A natural question to ask is whether the same methods can prove the stronger result that
the primes contain arbitrarily long polynomial progressions. This was shown by Tao and
Ziegler by similar methods in 2008, appealing this time to a Polynomial Szemerédi theorem
proven by Bergelson and Leibman in 1996. They show the following:
Theorem 6.2 (Tao-Ziegler [22]). Given any k polynomials with integer coefficients P1 , . . . , Pk
such that P1 (0) = · · · = Pk (0) = 0, the primes contain infinitely many progressions of the
form
x + P1 (m), . . . , x + Pk (m) with m > 0.
Generalising the other way, we can talk about prime elements in any ring, so it is natural
to ask whether Green-Tao results can be obtained in these alternative settings. Tao has
extended the method to deal with the Gaussian primes (those in the ring Z[i]):
Theorem 6.3 (Tao [21]). The Gaussian primes contain infinitely many instances of any
‘constellation’, that is, sets of the form
a + v0 b, . . . , a + vk−1 b all prime, with a ∈ Z[i] and b ∈ Z
for any fixed k and Gaussian integers v0 , . . . , vk−1 .
Hoàng Lê has also proven a version for the ring of polynomials over a finite field:
Theorem 6.4 (Hoàng Lê [16]). Let $\mathbb{F}_q$ be the finite field with q elements. Then for any k
we may find polynomials $f, g \in \mathbb{F}_q[t]$ with g ≠ 0 such that for any polynomial $P \in \mathbb{F}_q[t]$ with
degree less than k, the polynomial f + Pg is irreducible.
For all these extensions, the previous remarks apply so that they are also valid for sets of
positive density within the primes.
6.2 Asymptotics for P(k, N)
Recent advances by Green, Tao and Ziegler appear to have established the conjectured
asymptotic for all k. We will now briefly outline the status of these results.
In [7] they conditionally established the Hardy-Littlewood prime tuples conjecture, Conjecture 1.2, for linear forms which are not rational multiples of each other. This result was
conditional on two partially resolved conjectures: the inverse Gowers-norm conjecture, GI(s),
and the Möbius-nilsequence conjecture, MN(s). In particular, they prove the following:
Theorem 6.5 (Green-Tao, conditional). If the MN(k − 2) and GI(k − 2) conjectures hold,
then
\[ P(k,N) \sim C_k \frac{N^2}{\log^k N} \]
where the constant is defined by
\[ C_k := \prod_{p<k} \frac{p^{k-2}}{(p-1)^{k-1}} \prod_{p \ge k} \frac{p^{k-2}(p-k+1)}{(p-1)^{k-1}}. \]
The GI(k − 2) conjecture roughly says that if a bounded function has large Gowers $U^{k-1}$
norm then it correlates with a structured type of function known as a (k−2)-step nilsequence.
Hence if a function is not Gowers uniform then we can deduce a lot about its structure.
The MN(k − 2) conjecture roughly says that the (k − 2)-step nilsequence obtained from this
does not correlate well with the Möbius function µ.
Putting these together, the idea behind the proof (ignoring significant technical difficulties) is similar to the one used in the Green-Tao theorem: we measure the primes using some
variant on the von Mangoldt function, call it Λ. If the Gowers norm of Λ − 1 is large, then by
the inverse Gowers-norm conjecture it correlates well with a nilsequence. Due to the close
connection between the Möbius function and Λ, however, this contradicts
the Möbius-nilsequence conjecture. Hence we can decompose Λ into a bounded part and a
uniform part, show that the uniform part is negligible and use some Szemerédi-type theorem
to give an asymptotic for the bounded part.
Hence to give asymptotics for arithmetic progressions within the primes it suffices to
prove these conjectures. GI(1) and MN(1) are classical and can be deduced from the circle
methods used by Hardy, Littlewood and Vinogradov. Green and Tao have proven GI(2) in
[9] and MN(2) in [11], giving the following unconditional asymptotics for the cases k = 3, 4.
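Theorem 6.6 (Green-Tao).
\[ P(3,N) \sim C_3 \frac{N^2}{\log^3 N}, \quad\text{and}\quad P(4,N) \sim C_4 \frac{N^2}{\log^4 N}; \]
where the constants are defined by
\[ C_3 := 2\prod_{p \ge 3} \frac{p(p-2)}{(p-1)^2} \approx 1.3203, \quad\text{and}\quad C_4 := \frac{3}{4}\prod_{p \ge 5} \frac{p^2(p-3)}{(p-1)^3} \approx 0.4764. \]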
The Möbius-nilsequence conjecture was established in all cases in [8], hence Theorem 6.5
depends only on the inverse Gowers-norm conjecture. Green, Tao and Ziegler have
further established the GI(3) conjecture in [12], so we have the following.
Theorem 6.7 (Green-Tao-Ziegler).
\[ P(5,N) \sim C_5 \frac{N^2}{\log^5 N}, \]
where the constant is
\[ C_5 := \frac{27}{16}\prod_{p \ge 5} \frac{p^3(p-4)}{(p-1)^4} \approx 0.5189. \]
Recently, Green, Tao and Ziegler have extended the strategy employed in [12] to all cases
in [13], finally removing the conditional status of Theorem 6.5 and giving an asymptotic for
prime progressions of any length.
6.3 Explicit Bounds
We finally mention the related question of where the first k-term prime progression can
be found. Let [1, Nk ] be the smallest interval containing a k-term prime progression, say
{a, a + b, . . . , a + (k − 1)b}.
Let us first consider lower bounds for Nk . It is easily verified that a ≥ k and that every
prime less than k must divide b, which gives the lower bound
\[ N_k \ge k + (k-1)\prod_{p<k} p. \]
It follows from the prime number theorem that $\prod_{p<n} p = e^{(1+o(1))n}$, which gives the asymptotic lower bound
\[ N_k > e^{(1+o_{k\to\infty}(1))k}. \]
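The explicit lower bound is easy to tabulate. The following is a small sketch (the function names are ours) that evaluates k + (k − 1) times the product of the primes below k.

```python
def primes_below(n):
    """Primes strictly less than n, by trial division."""
    return [p for p in range(2, n)
            if all(p % q for q in range(2, int(p ** 0.5) + 1))]

def progression_lower_bound(k):
    """The quantity k + (k - 1) * prod_{p < k} p from the text."""
    product = 1
    for p in primes_below(k):
        product *= p
    return k + (k - 1) * product
```

For k = 3 and k = 5 this gives 7 and 29, which are exactly the last terms of the progressions 3, 5, 7 and 5, 11, 17, 23, 29, so the bound is attained in these two cases.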
Now we turn to the harder problem of an upper bound for Nk . It is possible to compute
one by keeping careful track of the constants and rates of decay in the o(1) errors in the
proof of the Green-Tao theorem above. Since the proof invokes Szemerédi’s theorem, the
bounds obtained are heavily dependent on the optimal bound for Szemerédi’s theorem. The
best obtained so far is from Gowers’ proof using Fourier analysis in [4]. For a set of density
δ this proof gives an upper bound for the first occurrence of a k-term arithmetic progression
of:
\[ 2^{2^{\delta^{-2^{2^{k+9}}}}}. \]
To calculate the best possible bound for the Green-Tao theorem from the given proof seems
a herculean task. Green and Tao have, however, in a brief note [6] estimated the constants
and errors in their original proof, obtaining the following (non-optimal) upper bound, where
c is some absolute constant:
\[ N_k < c\,2^{2^{2^{2^{2^{2^{2^{100k}}}}}}}. \]
Clearly, the gap between the lower and upper bounds is quite large. It has been conjectured
that there exist k-term prime progressions with step equal to (the smallest possible) $\prod_{p \le k} p$
for all k, and so the asymptotic lower bound gives the correct order of $N_k$. This has been
verified up to k = 21 by computer.
Appendix A
Proof of the Decomposition Theorem
In this appendix we give a proof of Theorem 4.3 and how Theorem 4.1 follows from this.
The ideas here are those in [3], although we present the argument in a different way and
take advantage of the narrowness of our goal to make some simplifications.
First, we make the following definition.
Definition A.1 (Basic anti-uniform functions). Let ν : ZN → R be simply pseudorandom.
A function g : ZN → R is a basic anti-uniform function if g = Df for some 0 ≤ f ≤ ν.
The following two lemmas are the only parts where we use the fact that ν is simply
pseudorandom. They give us bounds on anti-uniformity which we shall use to construct the
anti-uniform ψ in Theorem 4.3.
Lemma A.1 (Basic anti-uniform functions are bounded). If 0 ≤ f ≤ ν for some simply
pseudorandom ν then $\|Df\|_{L^\infty} = O_k(1)$.
Sketch proof. We in fact show that $\|Df\|_\infty \le 2^{2^{k-1}} + o(1)$, by writing out the definition of
Df and applying the simple pseudorandomness condition.
Lemma A.2 (Basic anti-uniform products are anti-uniform). Let $0 \le f_1, \dots, f_m \le \nu$ for
some simply pseudorandom measure ν. Then
\[ \|Df_1 \cdots Df_m\|^*_{U^{k-1}} = O_m(1). \]
Remark A.1. This is the part of the proof which requires the bulk of the simple pseudorandomness condition. If a way to avoid this lemma were found, then the Decomposition
theorem could be proven with only the hypothesis that $\|\nu - 1\|_{U^d}$ is sufficiently small for
some large d.
Sketch proof. Recalling the definition of the dual norm, we need to show that for all $f \in \mathbb{R}^N$
such that $\|f\|_{U^{k-1}} \le 1$, we have the bound
\[ \left\langle f, \prod_{j=1}^m Df_j \right\rangle = O_m(1). \]
After applications of the Cauchy-Schwarz and Hölder inequalities, together with the fact
that f ≤ ν, it is sufficient to show that
\[ E\left( E\left( \prod_{\omega \in C_{k-1}} \nu(y + \omega\cdot h) : y \in \mathbb{Z}_N \right)^{m} : h \in \mathbb{Z}_N^{k-1} \right) = O_m(1). \]
Using the simply pseudorandom condition and the triangle inequality, we may bound this
expression by $E(\tau^m) = O_m(1)$ for some weight function τ, and we are done.
The idea now is to construct ψ, our anti-uniform approximation to φ+ , as follows. We
show that φ is a small linear combination of basic anti-uniform functions, and then use this
fact to construct a polynomial in φ which is anti-uniform and approximates φ+ . For the first
task, we define the following norm.
Definition A.2. We define the basic norm by
\[ \|\phi\|_B := \inf\left\{ \sum_{i=1}^n |\lambda_i| : \phi = \sum_{i=1}^n \lambda_i Df_i,\ 0 \le f_1, \dots, f_n \le \nu \right\}, \]
and if $\|\phi\|_B \le \eta$ then we say that φ is η-basic.
Remark A.2. It is a simple exercise to verify that the expression above does in fact give
a norm on RN . In this definition, and for the rest of this proof, ν will be a fixed simply
pseudorandom function.
The k · kB norm measures how well we can approximate φ by a linear combination of basic
anti-uniform functions, which are well-behaved by the above lemmas. It is easy to deduce
the following analogues for Lemma A.1 and Lemma A.2.
Lemma A.3 (Basic functions are bounded). If φ is η-basic then $\|\phi\|_\infty = O(\eta)$.
Lemma A.4 (Basic powers are anti-uniform). If φ is η-basic then, for any integer m,
\[ \|\phi^m\|^*_{U^{k-1}} = O_m(\eta^m). \]
The following lemma constructs an anti-uniform approximation to φ, provided that it is
sufficiently basic.
Lemma A.5 (Approximation with an anti-uniform polynomial). There exists a polynomial
P such that if 1 > η > 0, then for every η-basic function φ,
1. $\|P\phi - \phi_+\|_\infty \le \frac{1}{8}$
2. $\|P\phi\|^*_{U^{k-1}} \le A$
for some A dependent only on η.
Proof. By Lemma A.3 we know that for every η-basic φ, we have $\|\phi\|_\infty \le C\eta < C$ for some
C independent of φ and η. Now choose some polynomial P such that $|P(x) - x_+| \le \frac{1}{8}$ on
[−C, C], and hence $\|P\phi - \phi_+\|_\infty \le \frac{1}{8}$. Note that P is independent of both φ and η.
Furthermore, by Lemma A.4, for any integer m and any η-basic φ, we have $\|\phi^m\|^*_{U^{k-1}} = O_m(\eta^m)$. If we denote the polynomial P by $a_n x^n + \cdots + a_0$ then by the triangle inequality
\begin{align*}
\|P\phi\|^*_{U^{k-1}} &\le |a_n|\,\|\phi^n\|^*_{U^{k-1}} + \cdots + |a_0| \\
&= O(|a_n|\eta^n + \cdots + |a_0|)
\end{align*}
and we denote this last quantity by A, noting that it is dependent only on η.
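The proof only needs the existence of such a polynomial, which the Weierstrass approximation theorem provides. As an illustration, the sketch below builds one explicit candidate: the degree-64 Bernstein polynomial of the positive part on [−C, C]. The choice of Bernstein polynomials, the degree and the value C = 1 are our assumptions; for a Lipschitz function the standard Bernstein error bound L/(2√n), with Lipschitz constant L = 2C after rescaling to [0, 1], guarantees an error of at most 1/8 here.

```python
import math

def positive_part(x):
    """x_+ = max(x, 0)."""
    return max(x, 0.0)

def bernstein_approx(f, n, c):
    """Degree-n Bernstein polynomial of f on [-c, c], returned as a callable."""
    nodes = [f(-c + 2.0 * c * j / n) for j in range(n + 1)]
    binom = [math.comb(n, j) for j in range(n + 1)]

    def poly(x):
        t = (x + c) / (2.0 * c)  # map [-c, c] onto [0, 1]
        return sum(binom[j] * nodes[j] * t ** j * (1.0 - t) ** (n - j)
                   for j in range(n + 1))

    return poly

C = 1.0  # hypothetical sup-norm bound on phi
P = bernstein_approx(positive_part, 64, C)

# Largest deviation from x_+ on a fine grid of [-C, C]
max_error = max(abs(P(x) - positive_part(x))
                for x in (-C + 2.0 * C * i / 2000 for i in range(2001)))
```

Since the polynomial depends only on C, and not on φ or η, this matches the independence noted in the proof.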
Finally, we show that condition (3) implies that φ is sufficiently basic to be able to
construct such an anti-uniform approximation.
Lemma A.6 (Lack of correlation with uniform functions implies basic). If $\langle h, \phi\rangle \le 1$ for
every η-uniform 0 ≤ h ≤ ν then φ is $\eta^{-2^{k-1}}$-basic.
Sketch proof. Use the fact that $\|Dh\|_B \le 1$ and Lemma 3.1 to deduce that if $\|h\|^*_B \le \eta^{2^{k-1}}$
then h is η-uniform. The result follows using the fact that the dual of a dual norm is the
original norm.
Putting these together, we get the following precise form of Theorem 4.3.
Theorem A.1. If $\langle h, \phi\rangle \le 1$ for every η-uniform function h then there exists a polynomial
P(x) such that $\|P\phi - \phi_+\|_\infty \le \frac{1}{8}$ and $\|P\phi\|^*_{U^{k-1}} \le A$ for some A dependent only on η.
Proof. Combine Lemmas A.6 and A.5.
We can now complete our original strategy for proving Theorem 4.1, as shown below.
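Proof of Theorem 4.1. Suppose no such decomposition exists. Then by the Hahn-Banach
theorem there exists a φ such that
1. $\langle f, \phi\rangle > 1$,
2. $\langle g, \phi\rangle \le 1$ for every g such that 0 ≤ g ≤ 2, and
3. $\langle h, \phi\rangle \le 1$ for every η-uniform h.
Condition (2) implies that $\langle 1, \phi_+\rangle \le \frac{1}{2}$, and by Theorem A.1, condition (3) implies there
exists a polynomial P such that $\|P\phi - \phi_+\|_\infty \le \frac{1}{8}$ and $\|P\phi\|^*_{U^{k-1}} \le A$ for some A dependent
only on η. Using this, we may obtain the bound
\begin{align*}
\langle \nu, \phi_+\rangle &= \langle 1, \phi_+\rangle + \langle 1, P\phi - \phi_+\rangle + \langle \nu - 1, P\phi\rangle + \langle \nu, \phi_+ - P\phi\rangle \\
&\le \frac{1}{2} + \frac{1}{8} + A\|\nu - 1\|_{U^{k-1}} + \frac{1}{8}(1 + o(1)) \\
&= \frac{3}{4} + o(1)
\end{align*}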
since, by Lemma 4.1, $\|\nu - 1\|_{U^{k-1}} = o(1)$. We have also used the fact that η, and hence A, is
fixed for the duration of this proof. Since f ≤ ν, and f and $\phi_+$ are both non-negative, we
can deduce that $\langle f, \phi_+\rangle \le \langle \nu, \phi_+\rangle$. Appealing to condition (1) above, we have the following
inequalities:
\[ 1 < \langle f, \phi\rangle \le \langle f, \phi_+\rangle \le \langle \nu, \phi_+\rangle \le \frac{3}{4} + o(1). \]
Hence we have a contradiction for N sufficiently large, and so the required decomposition must
exist.
Appendix B
Estimates for ΛR
In this appendix we outline the number theoretical arguments needed to show that the
function ν constructed in Chapter 5 is pseudorandom. The proofs given here are a synthesis
of those in Section 10 of [10] and Appendix D of [7].
Theorem B.1 (Linear Forms Estimate). Suppose we have m linear forms in t variables
$x = (x_1, \dots, x_t)$, each of the form $\psi_i(x) = \sum_{j=1}^t L_{ij}x_j + b_i$ with integer coefficients bounded
by $|L_{ij}| \le w^{1/4}/2$ and with the t-tuples $(L_{ij})_{j=1}^t$ never identically zero and none a rational
multiple of another. Let $B \subset \mathbb{N}^t$ be a product of t intervals, each of length at least $R^{10m}$.
Define the modified linear forms $\theta_i(x) = W\psi_i(x) + 1$. Then
\[ E\left(\Lambda_R(\theta_1(x))^2 \cdots \Lambda_R(\theta_m(x))^2 : x \in B\right) = e^{O_m(1/w)}(1 + o(1))\left(\frac{W}{\phi(W)\log R}\right)^m. \]
Theorem B.2 (Correlation Estimate). Suppose we have m linear forms of the form $\psi_i(x) = x + h_i$
for distinct $|h_i| \le N^2$, and let $\theta_i$ and B be as above. Then
\[ E\left(\Lambda_R(\theta_1(x))^2 \cdots \Lambda_R(\theta_m(x))^2 : x \in B\right) = e^{O_m(1/w)}(1 + o(1))\left(\frac{W}{\phi(W)\log R}\right)^m \prod_{p \mid \Delta}\left(1 + O(p^{-1/2})\right) \]
where ∆ denotes the integer
\[ \Delta = \prod_{1 \le i < j \le m} |h_i - h_j|. \]
Remark B.1. Note that the linear forms in the second theorem are not covered by the first,
since all the $(L_{ij})$ are identically {1}. We will combine the proofs of both below, since they
are identical except for the different evaluation of the Euler product $\prod_p F_p$. These evaluations are
included in separate sections below.
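Sketch proof. For both theorems we must estimate the same expectation, so let us denote
this by $E_{\psi,R}$. Expanding out the definitions we get
\[ E_{\psi,R} = \sum_{a,b \in \mathbb{N}^m} \left(\prod_{i=1}^m \mu(a_i)\mu(b_i)\,\chi\left(\frac{\log a_i}{\log R}\right)\chi\left(\frac{\log b_i}{\log R}\right)\right) E\left(\prod_{i=1}^m 1_{a_i, b_i \mid \theta_i(x)} : x \in B\right) \]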
where we write $a = (a_1, \dots, a_m)$ and $b = (b_1, \dots, b_m)$. Since χ(x) = 0 for x ≥ 1 we have
removed the restriction that a, b ≤ R. Furthermore, if we let D be the least common multiple
of $a_1, \dots, b_m$, then we may replace the presence of B in the expectation above with $\mathbb{Z}_D^t$, with
only (assuming $\lambda_k$ in the definition of R is sufficiently small) an o(1) error, which can be
included in the right hand side.
We shall denote the expectation factor as $\omega_{a,b}$, and expand it as an Euler product using
the Chinese Remainder Theorem:¹
\[ \omega_{a,b} := E\left(\prod_{i=1}^m 1_{a_i, b_i \mid \theta_i(x)} : x \in \mathbb{Z}_D^t\right) = \prod_p E\left(\prod_{\substack{j \text{ such that} \\ p \mid a_j \text{ or } p \mid b_j}} 1_{p \mid \theta_j(x)} : x \in \mathbb{Z}_p^t\right) = \prod_p \omega_{a,b}(p). \]
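Note that this is the only factor influenced by the choice of linear forms. Hence we can write
\[ E_{\psi,R} = \sum_{a,b \in \mathbb{N}^m} \left(\prod_{i=1}^m \mu(a_i)\mu(b_i)\,\chi\left(\frac{\log a_i}{\log R}\right)\chi\left(\frac{\log b_i}{\log R}\right)\right) \omega_{a,b}. \]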
Let ψ be the inverse Fourier transform of $e^x\chi(x)$, so that
\[ \chi(x) = \int_{\mathbb{R}} \psi(t)\,e^{-x(1+it)}\,dt. \]
Since $e^x\chi(x)$ is smooth with compact support, ψ is also smooth and decays rapidly: for
any A > 0, $|\psi(t)| = O_A((1 + |t|)^{-A})$.² By restricting the range of integration to $I :=
[-\log^{1/2} R, \log^{1/2} R]$, we see that (for any c ∈ R and A > 0)
\[ \chi\left(\frac{\log c}{\log R}\right) = \int_I c^{-\frac{1+it}{\log R}}\,\psi(t)\,dt + O_A\left(c^{-\frac{1}{\log R}}\log^{-A} R\right). \]
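To simplify notation, let $t' := \frac{1+it}{\log R}$. Since $\chi\left(\frac{\log c}{\log R}\right) = O(c^{-1/\log R})$, the above gives us
\[ \prod_{j=1}^m \chi\left(\frac{\log a_j}{\log R}\right)\chi\left(\frac{\log b_j}{\log R}\right) = \int_I \cdots \int_I \prod_{j=1}^m \frac{\psi(x_j)\psi(y_j)}{a_j^{x'_j} b_j^{y'_j}}\,dx_j\,dy_j + O_A\left(\log^{-A} R \prod_{j=1}^m (a_j b_j)^{-1/\log R}\right). \]
It can be shown³ that the error term contributes $O_A(\log^{O(1)-A} R)$ to $E_{\psi,R}$, which is o(1) for
A large enough. Hence we have
\[ E_{\psi,R} = \int_I \cdots \int_I \left(\sum_{a,b \in \mathbb{N}^m} \prod_p \omega_{a,b}(p) \prod_{j=1}^m \frac{\mu(a_j)\mu(b_j)}{a_j^{x'_j} b_j^{y'_j}}\right) \prod_{j=1}^m \psi(x_j)\psi(y_j)\,dx_j\,dy_j + o(1). \]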
¹ In the product we should properly have p|D, but this restriction can be dropped for the multiplicand is 1 otherwise.
² See Appendix C for details.
³ See [7], p. 69
By unique factorisation and the presence of µ, we may factor the first term as an Euler
product:
\[ \sum_{a,b \in \mathbb{N}^m} \prod_p \omega_{a,b}(p) \prod_{j=1}^m \frac{\mu(a_j)\mu(b_j)}{a_j^{x'_j} b_j^{y'_j}} = \prod_p \sum_{a,b \in \{1,p\}^m} \omega_{a,b}(p) \prod_{j=1}^m \frac{\mu(a_j)\mu(b_j)}{a_j^{x'_j} b_j^{y'_j}} =: \prod_p E_p, \]
and hence it remains to evaluate
\[ E_{\psi,R} = \int_I \cdots \int_I \prod_p E_p \prod_{j=1}^m \psi(x_j)\psi(y_j)\,dx_j\,dy_j + o(1). \]
Applying Lemma B.2, we get for some modified Euler factors $F_p$ defined below
\begin{align*}
E_{\psi,R} &= (1 + o(1)) \log^{-m} R \int_I \cdots \int_I \prod_p F_p \prod_{j=1}^m \frac{(1 + ix_j)(1 + iy_j)}{2 + i(x_j + y_j)}\,\psi(x_j)\psi(y_j)\,dx_j\,dy_j \\
&= (1 + o(1)) \prod_p F_p\,\log^{-m} R \left(\int_I \int_I \frac{(1 + ix)(1 + iy)}{2 + i(x + y)}\,\psi(x)\psi(y)\,dx\,dy\right)^m \\
&\qquad \text{(providing that we obtain an estimate for } \textstyle\prod_p F_p \text{ independent of } x'_j, y'_j\text{)} \\
&= (1 + o(1)) \prod_p F_p\,\log^{-m} R \left(\int_{\mathbb{R}} \int_{\mathbb{R}} \frac{(1 + ix)(1 + iy)}{2 + i(x + y)}\,\psi(x)\psi(y)\,dx\,dy + o(1)\right)^m \\
&= (1 + o(1)) \prod_p F_p\,\log^{-m} R.
\end{align*}
We could replace the limits of integration by $\mathbb{R}$ at the cost of o(1) factors thanks to the rapid
decay of ψ, and the final equalities are a consequence of Lemma B.1. Only now does
the proof for our two theorems diverge, in the estimation of $\prod_p F_p$. I have divided these up
into the sections below. Simply plug in Corollaries B.1 and B.2 respectively to obtain the
two theorems.
Lemma B.1 (Sieve Factor Calculation).
\[ \int_{\mathbb{R}} \int_{\mathbb{R}} \frac{(1 + ix)(1 + iy)}{2 + i(x + y)}\,\psi(x)\psi(y)\,dx\,dy = \int_0^\infty \chi'(t)^2\,dt = 1. \]
Proof sketch. For the first equality, evaluate the integral using the observation that
\[ \frac{1}{2 + i(x + y)} = \int_0^\infty e^{-(1+ix)t}\,e^{-(1+iy)t}\,dt \]
to separate the variables, and recalling that ψ(x) is the inverse Fourier transform of $e^x\chi(x)$.
The second equality follows from our original choice of χ.
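For the reader's convenience, the separation of variables in the proof sketch can be written out as follows. This is only a sketch: we assume that the order of integration may be exchanged (which the rapid decay of ψ should justify) and that χ is real-valued.

```latex
\begin{align*}
\chi(t) &= \int_{\mathbb{R}} \psi(x)\,e^{-(1+ix)t}\,dx
\quad\Longrightarrow\quad
\chi'(t) = -\int_{\mathbb{R}} (1+ix)\,\psi(x)\,e^{-(1+ix)t}\,dx, \\
\int_{\mathbb{R}}\int_{\mathbb{R}} \frac{(1+ix)(1+iy)}{2+i(x+y)}\,\psi(x)\psi(y)\,dx\,dy
&= \int_0^\infty \left(\int_{\mathbb{R}} (1+ix)\,\psi(x)\,e^{-(1+ix)t}\,dx\right)
   \left(\int_{\mathbb{R}} (1+iy)\,\psi(y)\,e^{-(1+iy)t}\,dy\right) dt \\
&= \int_0^\infty \left(-\chi'(t)\right)^2 dt
 = \int_0^1 |\chi'(t)|^2\,dt = 1.
\end{align*}
```

The last step uses that χ vanishes outside [−1, 1] together with the normalisation imposed on χ in Definition 5.2.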
It remains to give estimates for the Euler factors. Note that by explicitly considering all
the possible $a, b \in \{1, p\}^m$ we can rewrite the Euler factors $E_p$ in the more convenient form
\[ E_p = \sum_{a,b \in \{1,p\}^m} \omega_{a,b}(p) \prod_{j=1}^m \frac{\mu(a_j)\mu(b_j)}{a_j^{x'_j} b_j^{y'_j}} = \sum_{I,J \subseteq [m]} \frac{(-1)^{|I|+|J|}\,\omega_{I \cup J}(p)}{p^{\sum_{j \in I} x'_j + \sum_{j \in J} y'_j}} \]
where
\[ \omega_X(p) := E\left(\prod_{j \in X} 1_{p \mid \theta_j(x)} \,\Big|\, x \in \mathbb{Z}_p^t\right). \]
It will also be convenient to define the altered Euler factors
\[ E'_p := \prod_{j=1}^m \frac{(p^{1+x'_j} - 1)(p^{1+y'_j} - 1)}{p\,(p^{1+x'_j+y'_j} - 1)}, \]
and then, as promised, we define $F_p := E_p / E'_p$. These factors are much easier to evaluate than
$E_p$ directly, and the following lemma allows us to pass between them in the proof above.
Lemma B.2 (Euler Product Evaluation).
\[ \prod_p E_p = \prod_p F_p \cdot \frac{1 + o(1)}{\log^m R} \prod_{j=1}^m \frac{(1 + ix_j)(1 + iy_j)}{2 + i(x_j + y_j)}. \]
Proof sketch. This is a simple consequence of the definition of $E'_p$ and Lemma B.3 below.
Note that since, for instance, $x_j \in [-\log^{1/2} R, \log^{1/2} R]$ and $x'_j := \frac{1 + ix_j}{\log R}$ we have that $1 + x'_j =
1 + o(1)$ and so the lemma is applicable.
Lemma B.3 (Zeta Function Estimate). When ℜ(s) > 1 and s = 1 + o(1),
\[ \prod_p \left(1 - \frac{1}{p^s}\right) = (1 + o(1))(s - 1). \]
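Lemma B.3 is a restatement of the simple pole of ζ at s = 1, since the product on the left is 1/ζ(s). A quick numerical sanity check for real s is sketched below; the Euler-Maclaurin truncation used to evaluate ζ(s) and the function names are our own choices, not part of the argument.

```python
import math

def zeta(s, n_terms=1000):
    """Riemann zeta for real s > 1 via a truncated Euler-Maclaurin formula."""
    n = n_terms
    head = sum(k ** -s for k in range(1, n))
    tail = n ** (1.0 - s) / (s - 1.0) + 0.5 * n ** -s + s * n ** (-s - 1.0) / 12.0
    return head + tail

def inverse_zeta_ratio(s):
    """(1 / zeta(s)) / (s - 1), which Lemma B.3 says tends to 1 as s -> 1+."""
    return 1.0 / ((s - 1.0) * zeta(s))

for s in (1.5, 1.1, 1.01, 1.001):
    print(s, inverse_zeta_ratio(s))
```

The printed ratios approach 1 as s decreases to 1, in line with the 1 + o(1) factor in the lemma.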
B.1 Euler Product for independent linear forms
All that remains is to provide a suitable estimate for $\prod_p F_p$. The strategy in this section
and the next is the same, and runs as follows. We first obtain bounds on the local factor
estimates $\omega_X(p)$ for p ≥ w (it is here that the W-trick is applied). Secondly, we use these
bounds to estimate $E_p$ in terms of $E'_p$ for each prime p ≥ w. Finally, we use this to estimate
$\prod_p E_p$ in terms of $\prod_p E'_p$, and since $F_p = E_p / E'_p$, these give us an estimate for $\prod_p F_p$.
Lemma B.4 (Local Factor Estimate). For p ≥ w
\[ \omega_\emptyset(p) = 1, \]
\[ \omega_X(p) = \frac{1}{p} \quad\text{whenever } |X| = 1, \]
\[ \omega_X(p) \le \frac{1}{p^2} \quad\text{whenever } |X| \ge 2. \]
Proof. When X = ∅ we are simply taking the expectation of the empty product, which is 1.
When |X| = 1 then, for some j,
\[ \omega_X(p) = E\left(1_{p \mid \theta_j(x)} \mid x \in \mathbb{Z}_p^t\right) = \frac{\#\{x \in \mathbb{Z}_p^t : \theta_j(x) \equiv 0 \pmod p\}}{p^t} = \frac{1}{p} \]
since $\theta_j : \mathbb{Z}_p^t \to \mathbb{Z}_p$ is a uniform covering. For the final claim, note that it suffices to prove it
for the case |X| = 2, so let us suppose X = {j, k}. Write the linear forms as
\[ \psi_j(x) = \sum_{i=1}^t \frac{a_i}{b_i} x_i + l_j \quad\text{and}\quad \psi_k(x) = \sum_{i=1}^t \frac{c_i}{d_i} x_i + l_k. \]
Suppose that the pure linear forms $W(\psi_j - l_j)$ and $W(\psi_k - l_k)$ are multiples of each
other mod p, so that for some λ and every 1 ≤ i ≤ t we have
\[ W\frac{a_i}{b_i} \equiv \lambda W \frac{c_i}{d_i} \pmod p. \]
Since p ∤ W, we may rearrange this to give
\[ \frac{a_1 d_1}{b_1 c_1} \equiv \frac{a_2 d_2}{b_2 c_2} \equiv \cdots \equiv \frac{a_t d_t}{b_t c_t} \pmod p \]
and hence, for any 1 ≤ i ≤ t,
\[ a_1 d_1 b_i c_i \equiv b_1 c_1 a_i d_i \pmod p. \]
In other words,
\[ p \mid |a_1 d_1 b_i c_i - b_1 c_1 a_i d_i| \le |a_1 d_1 b_i c_i| + |b_1 c_1 a_i d_i| < w \le p \]
where for the inequalities we have used the bounds $|a_i|, |b_i|, |c_i|, |d_i| < w^{1/4}/2$. Hence we have
equality not only in $\mathbb{Z}_p$ but also in $\mathbb{Z}$, and so
\[ \frac{a_i}{b_i} = \frac{a_1 d_1}{b_1 c_1} \cdot \frac{c_i}{d_i}. \]
This contradicts our hypothesis that the pure linear forms are not rational multiples of one
another. Hence the pure linear forms are also independent over $\mathbb{Z}_p$.
Let Z be the set of solutions to $\theta_j(x) \equiv \theta_k(x) \equiv 0 \pmod p$. Since $W(\psi_j - l_j)$ and
$W(\psi_k - l_k)$ are not multiples of each other modulo p, it follows that Z is contained in the
intersection of two skew affine subspaces of $\mathbb{Z}_p^t$, and hence has cardinality at most $p^{t-2}$. By
definition, $\omega_X(p) = |Z| / p^t$, and we are done.
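The three cases of the lemma can be checked by brute force for a small prime. In the sketch below the linear forms, the value W = 6 (that is, w = 5) and the prime p = 5 are hypothetical test data chosen by us; the function simply counts solutions in $\mathbb{Z}_p^t$.

```python
from fractions import Fraction
from itertools import product

def local_factor(forms, W, p, t):
    """omega_X(p): proportion of x in (Z/pZ)^t with p | W*psi(x) + 1 for every psi.

    Each form is a pair (coefficients, constant) with integer entries.
    """
    hits = 0
    for x in product(range(p), repeat=t):
        if all((W * (sum(c * xi for c, xi in zip(coeffs, x)) + const) + 1) % p == 0
               for coeffs, const in forms):
            hits += 1
    return Fraction(hits, p ** t)

# Hypothetical data: w = 5, so W = 2 * 3 = 6, tested at the prime p = 5 >= w.
psi_1 = ((1, 2), 0)   # psi_1(x) = x_1 + 2 x_2
psi_2 = ((1, 3), 1)   # psi_2(x) = x_1 + 3 x_2 + 1

empty_case = local_factor([], 6, 5, 2)
single_case = local_factor([psi_1], 6, 5, 2)
pair_case = local_factor([psi_1, psi_2], 6, 5, 2)
```

Here the three values are 1, 1/5 and 1/25, matching the empty, |X| = 1 and |X| = 2 cases respectively.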
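Lemma B.5 (Euler Factor Estimate). For p ≥ w
\[ E_p = \left(1 + O_m\left(\frac{1}{p^2}\right)\right) E'_p. \]
Proof. We divide up the sum in the definition of $E_p$ into the cases I = J = ∅, |I ∪ J| = 1
and |I ∪ J| ≥ 2 and apply Lemma B.4 to get
\begin{align*}
E_p &:= \sum_{I,J \subseteq [m]} \frac{(-1)^{|I|+|J|}\,\omega_{I \cup J}(p)}{p^{\sum_{j \in I} x'_j + \sum_{j \in J} y'_j}} \\
&= \omega_\emptyset(p) - \sum_{j=1}^m \left(\frac{1}{p^{1+x'_j}} + \frac{1}{p^{1+y'_j}} - \frac{1}{p^{1+x'_j+y'_j}}\right) + \sum_{\substack{I,J \subseteq [m] \\ |I \cup J| \ge 2}} \frac{O_m(1/p^2)}{p^{\sum_{j \in I} x'_j + \sum_{j \in J} y'_j}} \\
&= 1 - \sum_{j=1}^m \frac{p^{x'_j} + p^{y'_j} - 1}{p^{1+x'_j+y'_j}} + O_m(1/p^2).
\end{align*}
Hence we need to show that
\[ \frac{E_p}{E'_p} = \frac{1 - \sum_{j=1}^m \frac{p^{x'_j} + p^{y'_j} - 1}{p^{1+x'_j+y'_j}} + O_m\left(\frac{1}{p^2}\right)}{\prod_{j=1}^m \frac{(p^{1+x'_j} - 1)(p^{1+y'_j} - 1)}{p\,(p^{1+x'_j+y'_j} - 1)}} = 1 + O_m\left(\frac{1}{p^2}\right), \]
which follows from Taylor expansion.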
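Lemma B.6.
\[ \prod_{p \ge w} \left(1 + O_m\left(\frac{1}{p^2}\right)\right) = e^{O_m(1/w)}. \]
Proof. We use the inequality $|1 + a| \le e^{|a|}$ to see that
\[ \prod_{p \ge w} \left(1 + O_m\left(\frac{1}{p^2}\right)\right) \le e^{O_m\left(\sum_{p \ge w} \frac{1}{p^2}\right)} \le e^{O_m\left(\sum_{n \ge w} \frac{1}{n^2}\right)}. \]
Since $x^{-2}$ is a decreasing function, we may also bound the sum above by an integral, to get
\[ \sum_{n \ge w} \frac{1}{n^2} \le \frac{1}{w} + \int_w^\infty \frac{1}{x^2}\,dx = \frac{2}{w}. \]
Combining these two inequalities gives us the result.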
Lemma B.7 (Euler Product Simplification).
\[ \prod_p E_p = e^{O_m(1/w)} \left(\left(\frac{W}{\phi(W)}\right)^m + o(1)\right) \prod_p E'_p. \]
Proof. We divide the product into two parts, p < w and p ≥ w, and evaluate each separately.
Applying first Lemma B.5 and then Lemma B.6, we get
\[ \prod_{p \ge w} E_p = \prod_{p \ge w} \left(1 + O_m\left(\frac{1}{p^2}\right)\right) \prod_{p \ge w} E'_p = e^{O_m(1/w)} \prod_{p \ge w} E'_p. \]
Since $E_p = 1$ for p < w, so in particular $\prod_{p<w} E_p = 1$, it remains to show that
\[ \left(\frac{W}{\phi(W)}\right)^m + o(1) = \prod_{p<w} E'^{-1}_p. \]
Noticing that (since $W = \prod_{p<w} p$ and φ is multiplicative)
\[ \frac{W}{\phi(W)} = \prod_{p<w} \frac{p}{\phi(p)} = \prod_{p<w} \frac{p}{p-1}, \]
it suffices in turn to show that for all p < w
\[ \left(\frac{p}{p-1}\right)^m + o(1) = E'^{-1}_p. \]
After some algebraic manipulation, we see that
\[ E'^{-1}_p = p^m \prod_{j=1}^m \frac{p^{1+x'_j+y'_j} - 1}{(p^{1+x'_j} - 1)(p^{1+y'_j} - 1)}. \]
Furthermore, since $x'_j := \frac{1+ix_j}{\log R}$ where $x_j \in \mathbb{R}$, $p^{x'_j} = 1 + o(1)$ and similarly for the other
two cases. Hence we get
\[ E'^{-1}_p = p^m \prod_{j=1}^m \frac{(1 + o(1))p - 1}{((1 + o(1))p - 1)^2} = p^m (p - 1 + o(1))^{-m} = \left(\frac{p}{p-1}\right)^m + o(1) \]
which concludes the proof.
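The identity W/φ(W) = ∏ p/(p − 1) over primes p < w, used in the middle of this proof, can be confirmed exactly for small w. The following sketch uses exact rational arithmetic; the helper names are ours.

```python
from fractions import Fraction
from math import gcd

def is_prime(p):
    """Trial-division primality test for small p >= 2."""
    return all(p % q for q in range(2, int(p ** 0.5) + 1))

def w_trick_modulus(w):
    """W = product of the primes p < w."""
    W = 1
    for p in range(2, w):
        if is_prime(p):
            W *= p
    return W

def euler_phi(n):
    """Euler's totient by direct count (fine for small n)."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)

def product_formula(w):
    """prod_{p < w} p / (p - 1) as an exact fraction."""
    result = Fraction(1)
    for p in range(2, w):
        if is_prime(p):
            result *= Fraction(p, p - 1)
    return result
```

With w = 10 we get W = 210 and φ(W) = 48, and both sides of the identity equal 35/8.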
Corollary B.1 (Euler Product Estimate).
\[ \prod_p F_p = e^{O_m(1/w)} (1 + o(1)) \left(\frac{W}{\phi(W)}\right)^m. \]
B.2 Euler product for simple linear forms
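Lemma B.8 (Local Factor Estimate). For p ≥ w
\[ \omega_X(p) = \frac{1}{p} \quad\text{whenever } |X| \ge 2 \text{ and } p \mid \Delta, \]
\[ \omega_X(p) = 0 \quad\text{whenever } |X| \ge 2 \text{ and } p \nmid \Delta, \]
\[ \omega_X(p) = \frac{1}{p} \quad\text{whenever } |X| = 1, \text{ and} \]
\[ \omega_\emptyset(p) = 1. \]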
Proof sketch. The final two claims proceed exactly as in the previous section. For the first
two claims, note that if |X| ≥ 2 then ωX (p) is equal to 1/p if all the residue classes hi
(mod p) are equal, and 0 otherwise.
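Lemma B.9 (Euler Factor Estimate).
\[ E_p = \left(1 + O\left(\frac{1}{p^2}\right)\right) E'_p \quad\text{whenever } p \nmid \Delta, \text{ and} \]
\[ E_p = \left(1 + O\left(\frac{1}{\sqrt{p}}\right)\right) E'_p \quad\text{whenever } p \mid \Delta. \]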
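Proof. For the first claim, we apply Lemma B.8 and argue as in the proof of Lemma B.5.
For the second, note that if p|∆ then, using the definition of $E_p$ and Lemma B.8,
\begin{align*}
E_p &= 1 + \frac{1}{p} \sum_{\substack{I,J \subseteq [m] \\ I \cup J \ne \emptyset}} \frac{(-1)^{|I|+|J|}}{p^{\sum_{j \in I} x'_j + \sum_{j \in J} y'_j}} \\
&= 1 - \frac{1}{p} + \frac{1}{p} \left(\sum_{I \subseteq [m]} \frac{(-1)^{|I|}}{p^{\sum_{j \in I} x'_j}}\right) \left(\sum_{J \subseteq [m]} \frac{(-1)^{|J|}}{p^{\sum_{j \in J} y'_j}}\right) \\
&= 1 - \frac{1}{p} + \frac{1}{p} \prod_{j=1}^m \left(1 - \frac{1}{p^{x'_j}}\right)\left(1 - \frac{1}{p^{y'_j}}\right) \\
&= 1 + O\left(\frac{1}{\sqrt{p}}\right).
\end{align*}
Similarly, one can show that $E'_p = 1 + O\left(\frac{1}{\sqrt{p}}\right)$, and we are done.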
Lemma B.10 (Euler Product Simplification).
\[ \prod_p E_p = e^{O_m(1/w)} \left(\left(\frac{W}{\phi(W)}\right)^m + o(1)\right) \prod_{p \mid \Delta} \left(1 + O(p^{-1/2})\right) \prod_p E'_p. \]
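Proof. As in the proof of Lemma B.7, we know that
\[ \prod_{p<w} E_p = \left(\left(\frac{W}{\phi(W)}\right)^m + o(1)\right) \prod_{p<w} E'_p \]
and also that
\[ \prod_{\substack{p \ge w \\ p \nmid \Delta}} E_p = e^{O_m(1/w)} \prod_{\substack{p \ge w \\ p \nmid \Delta}} E'_p. \]
An application of the above Euler factor estimate gives
\[ \prod_{\substack{p \ge w \\ p \mid \Delta}} E_p = \prod_{p \mid \Delta} \left(1 + O(p^{-1/2})\right) \prod_{\substack{p \ge w \\ p \mid \Delta}} E'_p \]
and combining these three products gives us the required result.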
Corollary B.2 (Euler Product Estimate).
\[ \prod_p F_p = e^{O_m(1/w)} (1 + o(1)) \left(\frac{W}{\phi(W)}\right)^m \prod_{p \mid \Delta} \left(1 + O(p^{-1/2})\right). \]
B.3 Pseudorandomness of ν
This section proves the linear and simple pseudorandomness conditions. The arguments in
this section are exactly those in [10], Section 9, and are included here for completeness.
Theorem B.3 (Weak Linear Pseudorandomness Condition). If we have $m \le k \cdot 2^{k-1}$ homogeneous linear forms $\psi_i$ in t ≤ 3k − 4 variables with rational coefficients bounded by k in
both numerator and denominator, and none equal to zero or a rational multiple of another,
then
\[ E\left(\nu(\psi_1(x)) \cdots \nu(\psi_m(x)) : x \in \mathbb{Z}_N^t\right) = 1 + \epsilon_m + o(1), \]
where $|\epsilon_m| \le \varepsilon_k$ for some $\varepsilon_k$ depending only on k.
Proof. First we clear denominators and assume that all the linear forms have integer coefficients, at the cost of increasing the bound on the coefficients to (k + 1)!. Taking w sufficiently
large, we can assume that $(k + 1)! < \sqrt{w/2}$ and so we can apply Theorem B.1 to these linear
forms.
We must first chop up the range of summation into boxes before we can apply Theorem B.1, to deal with the two-part definition of ν. Let $Q = \sqrt{N}$, and divide $\mathbb{Z}_N^t$ into $Q^t$
roughly equal sized boxes, $B_{u_1, \dots, u_t} = B_u$:
\[ B_u = \{x \in \mathbb{Z}_N^t : x_j \in [\lfloor u_j Q \rfloor, \lfloor (u_j + 1) Q \rfloor)\}. \]
Call $u \in \mathbb{Z}_Q^t$ nice if every linear form takes the box $B_u$ entirely inside or outside of the
interval $[\epsilon_k N, 2\epsilon_k N]$. Note that by definition of Q and the upper bound on m, $N/Q > R^{5m}$,
so we may apply Theorem B.1 to obtain
\[ E\left(\nu(\psi_1(x)) \cdots \nu(\psi_m(x)) \mid x \in B_{u_1, \dots, u_t}\right) = e^{O_m(1/w)} (1 + o(1)) \]
since we can replace each factor by either 1 or $\frac{\phi(W)\log R}{W}\tilde\Lambda_R(\theta_i(x))^2$.
So the nice boxes have already been dealt with, and give us the answer we’re looking for.
Next we show that most boxes are nice; more precisely, that the proportion of non-nice
boxes is at most O(1/Q).
Suppose u is not nice; then there exists some linear form ψ and $x, y \in B_u$ such that
$\psi(x) \in [\epsilon_k N, 2\epsilon_k N]$ but $\psi(y) \notin [\epsilon_k N, 2\epsilon_k N]$. Suppose $\psi(x) = \sum_{j=1}^t L_j x_j + b$. Then
\[ \psi(x), \psi(y) = \sum_{j=1}^t L_j \lfloor Q u_j \rfloor + b + O(Q). \]
Hence
\[ \text{either } \epsilon_k N \text{ or } 2\epsilon_k N = \sum_{j=1}^t L_j \lfloor Q u_j \rfloor + b + O(Q), \]
and so
\[ \sum_{j=1}^t L_j u_j = \epsilon_k Q + \frac{b}{Q} + O(1) \pmod Q. \]
But since $(L_j)$ is non-zero, the number of t-tuples u which satisfy this is at most $O(Q^{t-1})$, and
hence the proportion of non-nice boxes with respect to ψ is O(1/Q). But since the number
of linear forms is bounded also, the total proportion of non-nice boxes is also O(1/Q).
When u is not nice, we can bound ν by the trivial bound $1 + \frac{\phi(W)\log R}{W}\tilde\Lambda_R(\theta_i(x))^2$. Multiplying out and applying Theorem B.1 again, we get
\[ E\left(\nu(\psi_1(x)) \cdots \nu(\psi_m(x)) \mid x \in B_u\right) = e^{O_m(1/w)} (O(1) + o(1)). \]
Putting it all together
\begin{align*}
\text{LHS} &= E\left(E\left(\nu(\psi_1(x)) \cdots \nu(\psi_m(x)) \mid x \in B_u\right) \mid u \in \mathbb{Z}_Q^t\right) + o(1) \\
&= e^{O_m(1/w)} (1 + O(1/Q) + o(1)) = 1 + \epsilon_m + o(1)
\end{align*}
where $\epsilon_m$ can be taken sufficiently small by taking w sufficiently large.
Theorem B.4 (Weak Simple Pseudorandomness Condition). Whenever we have m ≤ 2k−1
simple linear forms ψi in t ≤ k variables, then
\[ \mathbb{E}(\nu(\psi_1(x)) \cdots \nu(\psi_m(x))) = 1 + \epsilon_m + o(1). \]
Furthermore, there exists a weight function $\tau_m : \mathbb{Z}_N \to \mathbb{R}^+$ such that $\mathbb{E}(\tau^q) = O_{m,q}(1)$ for all $1 \le q < \infty$, and for all $h_1, \ldots, h_m \in \mathbb{Z}_N$ we have the upper bound
\[ \mathbb{E}_{x \in \mathbb{Z}_N}(\nu(x + h_1) \cdots \nu(x + h_m)) \le (1 + \epsilon_m) \sum_{1 \le i < j \le m} \tau(h_i - h_j). \]
In both cases, $|\epsilon_m| \le \varepsilon_k$ for some $\varepsilon_k$ sufficiently small depending on $k$.
Proof. The first part is proven exactly as for the previous theorem. For the second, we
construct our weight function τ in the next lemma, with the additional requirement that
\[ \tau(0) := \exp(Cm \log N / \log\log N). \]
Note this preserves the bounds $\mathbb{E}(\tau^q) = O_{m,q}(1)$ for all $q$, since the weight at $0$ contributes at most $o_{m,q}(1)$.
First suppose at least two of the $h_i$ are equal. We may bound the left-hand side crudely by $\|\nu\|_\infty^m$. Standard estimates give us
\[ \|\nu\|_\infty \ll \exp(C \log N / \log\log N), \]
and so
\[ \|\nu\|_\infty^m \ll \tau(0) \le \sum_{1 \le i < j \le m} \tau(h_i - h_j), \]
which is the required bound.
Now suppose that all $h_i$ are distinct. By Theorem B.2,
\[
\begin{aligned}
\mathbb{E}(\nu(x + h_1) \cdots \nu(x + h_m) : x \in \mathbb{Z}_N) &= e^{O_m(1/w)} (1 + o_m(1)) \prod_{p \mid \Delta} (1 + O(p^{-1/2})) \\
&\le (1 + \epsilon_m)(1 + o_m(1)) \sum_{1 \le i < j \le m} \tau(h_i - h_j),
\end{aligned}
\]
and by choosing $w$ sufficiently large we may ensure that $\epsilon_m$ is as small as required. Furthermore, by adjusting the function $\tau$ by a constant factor depending only on $m$ (and hence $k$), we can absorb the $o_m(1)$ error into the sum, which gives the required result.
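Lemma B.11 (Construction of the weight function). For any $m \ge 1$ there is a weight function $\tau_m : \mathbb{Z} \to \mathbb{R}^+$ such that for all distinct $h_1, \ldots, h_m$ we have
\[ \prod_{p \mid \Delta} \left(1 + O_m\left(\frac{1}{\sqrt{p}}\right)\right) \le \sum_{1 \le i < j \le m} \tau(h_i - h_j), \]
where
\[ \Delta := \prod_{1 \le i < j \le m} |h_i - h_j|. \]
Furthermore, for any $0 < q < \infty$,
\[ \mathbb{E}(\tau^q(n) : 0 < |n| \le N) = O_{m,q}(1). \]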
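Proof. We take $\tau_m(n) := O_m(1) \prod_{p \mid n} \left(1 + \frac{1}{\sqrt{p}}\right)^{O_m(1)}$ for all $n \ne 0$. We note that by the arithmetic mean-geometric mean inequality,
\[
\begin{aligned}
\prod_{p \mid \Delta} \left(1 + O_m\left(\frac{1}{\sqrt{p}}\right)\right)
&\le \prod_{1 \le i < j \le m} \prod_{p \mid h_i - h_j} \left(1 + \frac{1}{\sqrt{p}}\right)^{O_m(1)} \\
&\le O_m(1) \sum_{1 \le i < j \le m} \prod_{p \mid h_i - h_j} \left(1 + \frac{1}{\sqrt{p}}\right)^{O_m(1)} \\
&= \sum_{1 \le i < j \le m} \tau(h_i - h_j).
\end{aligned}
\]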
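The arithmetic mean-geometric mean step is used here in the form $\prod_{i=1}^{M} a_i \le \frac{1}{M} \sum_{i=1}^{M} a_i^M$ for positive reals, with $M = \binom{m}{2}$ the number of pairs $i < j$; raising each factor to the power $M$ is what enlarges the exponents $O_m(1)$. A small numerical sanity check, written as a sketch with sample values chosen by us (the helper name `amgm_gap` is ours):

```python
import math
import random

def amgm_gap(values):
    """Return (1/M) * sum(a^M) - prod(a) for positive reals a_1, ..., a_M.
    AM-GM applied to the numbers a_i^M says this gap is non-negative."""
    M = len(values)
    return sum(a ** M for a in values) / M - math.prod(values)

random.seed(0)
m = 4                      # number of shifts h_1, ..., h_m (illustrative choice)
M = m * (m - 1) // 2       # number of pairs i < j
samples = [[1 + random.random() for _ in range(M)] for _ in range(1000)]
worst_gap = min(amgm_gap(s) for s in samples)
print(worst_gap >= -1e-9)  # the inequality holds on every sample, up to rounding
```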
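Hence it remains to show that
\[ \mathbb{E}\left( \prod_{p \mid n} \left(1 + \frac{1}{\sqrt{p}}\right)^{O_m(q)} : 0 < |n| \le N \right) = O_{m,q}(1) \]
for all $0 < q < \infty$. Since $\left(1 + \frac{1}{\sqrt{p}}\right)^{O_m(q)}$ is bounded by $1 + \frac{1}{p^{1/4}}$ for all but $O_{m,q}(1)$ many primes $p$, we have
\[ \mathbb{E}\left( \prod_{p \mid n} \left(1 + \frac{1}{\sqrt{p}}\right)^{O_m(q)} : 0 < |n| \le N \right) \le O_{m,q}(1)\, \mathbb{E}\left( \prod_{p \mid n} \left(1 + \frac{1}{p^{1/4}}\right) : 0 < n \le N \right). \]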
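We now use the fact that
\[ \prod_{p \mid n} \left(1 + \frac{1}{p^{1/4}}\right) \le \sum_{d \mid n} \frac{1}{d^{1/4}} \]
to get that
\[
\begin{aligned}
\mathbb{E}\left( \prod_{p \mid n} \left(1 + \frac{1}{\sqrt{p}}\right)^{O_m(q)} : 0 < |n| \le N \right)
&\le O_{m,q}(1) \frac{1}{2N} \sum_{1 \le |n| \le N} \sum_{d \mid n} \frac{1}{d^{1/4}} \\
&\le O_{m,q}(1) \frac{1}{2N} \sum_{d=1}^{N} \frac{N}{d^{5/4}} \\
&= O_{m,q}(1),
\end{aligned}
\]
and we are done.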
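Both facts used in the last step can be tested numerically for small $n$. The sketch below is illustrative only and its function names are ours. It checks that $\prod_{p \mid n} (1 + p^{-1/4}) \le \sum_{d \mid n} d^{-1/4}$ (the left side is the same sum restricted to squarefree $d$) and that the average of the divisor sum over $1 \le n \le N$ stays below $\sum_{d \ge 1} d^{-5/4} < 5$. We average over positive $n$ only, which gives the same value as averaging over $0 < |n| \le N$ since the summand depends only on $|n|$.

```python
def prime_divisors(n):
    """Distinct prime divisors of n >= 1, by trial division."""
    primes, p = [], 2
    while p * p <= n:
        if n % p == 0:
            primes.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        primes.append(n)
    return primes

def euler_product(n):
    """prod over primes p | n of (1 + p^(-1/4))."""
    result = 1.0
    for p in prime_divisors(n):
        result *= 1 + p ** -0.25
    return result

def divisor_sum(n):
    """sum over all d | n of d^(-1/4)."""
    return sum(d ** -0.25 for d in range(1, n + 1) if n % d == 0)

def average_divisor_sum(N):
    """E(sum_{d | n} d^(-1/4) : 0 < n <= N)."""
    return sum(divisor_sum(n) for n in range(1, N + 1)) / N

if __name__ == "__main__":
    print(all(euler_product(n) <= divisor_sum(n) + 1e-9 for n in range(1, 501)))
    print(average_divisor_sum(1000))  # stays below 1 + 4 = 5 for every N
```

The bound $5$ comes from comparing $\sum_{d \ge 1} d^{-5/4}$ with $1 + \int_1^\infty x^{-5/4}\,dx$.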
Appendix C
Fourier transform
In this appendix we prove a standard fact about rapid decay of the Fourier transform required
in the proof of Theorems B.1 and B.2. The proof here is taken from [25].
Lemma C.1. If $f$ is a bounded function with compact support, then $\hat{f}$ is also bounded. In fact, we have
\[ \|\hat{f}\|_\infty \le \|f\|_1. \]
Proof. For any $t \in \mathbb{R}$,
\[ |\hat{f}(t)| := \left| \int f(x) e^{-ixt} \, dx \right| \le \int |f(x)| \, dx =: \|f\|_1 < \infty. \]
Theorem C.1. Suppose $f$ is $C^N$ with compact support and $f^{(n)} \in L^1$ for all $0 \le n \le N$. Then
\[ \widehat{f^{(n)}}(t) = (it)^n \hat{f}(t) \]
when $0 \le n \le N$, and furthermore
\[ |\hat{f}(t)| = O((1 + |t|)^{-N}). \]
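Proof. We use induction on $N$. For $N = 1$, by integration by parts we have
\[ \widehat{f'}(t) = \int f'(x) e^{-ixt} \, dx = it \int e^{-ixt} f(x) \, dx = it \hat{f}(t), \]
where the boundary terms vanish because $f$ has compact support. The inductive step follows easily.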
It follows from Lemma C.1 that $\widehat{f^{(n)}}$ is bounded, and hence the first part of the theorem implies that $t^n \hat{f}$ is bounded if $n \le N$, say $|t^n \hat{f}| \le D$. Note that we can take this bound to be uniform over all $n$, since there are only finitely many $n$ to be considered. From the binomial theorem it follows that for some constant $C > 0$
\[ C(1 + |t|)^N \le \sum_{n=0}^{N} |t^n|, \]
and hence, since the sum has $N + 1$ terms,
\[ |\hat{f}| \sum_{n=0}^{N} |t^n| \le D(N + 1), \]
so
\[ |\hat{f}| \le \frac{D(N + 1)}{\sum_{n=0}^{N} |t^n|} \le \frac{D(N + 1)}{C(1 + |t|)^N} = O((1 + |t|)^{-N}). \]
Corollary C.1. If $f$ is smooth with compact support then $\hat{f}(t) = O_A((1 + |t|)^{-A})$ for any $A > 0$.
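The case $N = 1$ of Theorem C.1 can be illustrated numerically. The sketch below makes several assumptions of our own: the test function is $f(x) = (1 - x^2)^2$ on $[-1, 1]$ (extended by zero, so $f$ is $C^1$ with compact support), and the integrals are approximated by a plain trapezoid rule. It compares $\widehat{f'}(t)$ with $it\hat{f}(t)$ and checks the bounds $|\hat{f}(t)| \le \|f\|_1 = 16/15$ and $|t\hat{f}(t)| \le \|f'\|_1 = 2$ coming from Lemma C.1.

```python
import cmath

def fourier_transform(g, t, a=-1.0, b=1.0, steps=20000):
    """Trapezoid-rule approximation of the integral of g(x) e^{-ixt} over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (g(a) * cmath.exp(-1j * a * t) + g(b) * cmath.exp(-1j * b * t))
    for k in range(1, steps):
        x = a + k * h
        total += g(x) * cmath.exp(-1j * x * t)
    return total * h

def f(x):
    """(1 - x^2)^2 on [-1, 1] and zero elsewhere: C^1 with compact support."""
    return (1 - x * x) ** 2 if abs(x) <= 1 else 0.0

def f_prime(x):
    """Derivative of f."""
    return -4 * x * (1 - x * x) if abs(x) <= 1 else 0.0

if __name__ == "__main__":
    for t in (0.5, 3.0, 10.0):
        lhs = fourier_transform(f_prime, t)
        rhs = 1j * t * fourier_transform(f, t)
        print(t, abs(lhs - rhs), abs(rhs))  # the difference should be tiny
```

The agreement is limited only by the quadrature error, which is of order $h^2$ for step size $h$.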
Appendix D
The GI and MN Conjectures
In this appendix we give a formal statement of the conjectures mentioned in Chapter Six.
These statements are taken from [7], which contains an in-depth discussion of these conjectures and the results surrounding them.
Definition D.1 (Nilpotent). Let $G$ be a connected, simply connected Lie group with lower central series $G_0 \supseteq G_1 \supseteq G_2 \supseteq \ldots$ (that is, $G_0 = G_1 = G$ and $G_{i+1} = [G, G_i]$ for $i \ge 1$). We say that $G$ is $s$-step nilpotent if $G_{s+1} = 1$.
Definition D.2 (Nilmanifold). Let G be an s-step nilpotent group, and Γ ⊆ G a discrete,
cocompact subgroup. Then the quotient G/Γ is an s-step nilmanifold.
Definition D.3 (Nilsequence). An $s$-step nilsequence is a sequence of the form $(F(g^n x))_{n \in \mathbb{N}}$, where $g \in G$, $x \in G/\Gamma$ and $F : G/\Gamma \to \mathbb{R}$ is a continuous function, for some $s$-step nilmanifold $G/\Gamma$.
Conjecture D.1 (Inverse Gowers norm conjecture for $s$). Suppose that $0 < \delta \le 1$. Then there exists a finite collection $\mathcal{M}_{s,\delta}$ of $s$-step nilmanifolds with the following property. Given any $N$ and 1-bounded function $f$ on $[N]$ such that
\[ \|f\|_{U^{s+1}[N]} \ge \delta, \]
there is a nilmanifold in $\mathcal{M}_{s,\delta}$ and a 1-bounded $s$-step nilsequence $(F(g^n x))$ on it with a bounded Lipschitz constant (i.e. a bound dependent only on $s$ and $\delta$, not $N$) such that
\[ |\mathbb{E}_{n \in [N]} f(n) F(g^n x)| \gg_{s,\delta} 1. \]
Let us see what this gives us in the case $s = 1$, i.e. for the Gowers $U^2$ norm. A group is 1-step nilpotent if and only if it is Abelian. In this case, one can in fact take $G = \mathbb{R}$ and $\Gamma = \mathbb{Z}$, and $\mathcal{M}_{1,\delta}$ is just the singleton set $\{\mathbb{R}/\mathbb{Z}\}$ independent of $\delta$. This case of the conjecture is easy to prove: if $f$ has a large $U^2$ norm then it is easy to show that it correlates with a linear character, i.e. a function of the form $e^{2\pi i n/N}$, and this is a 1-step nilsequence on $\mathbb{R}/\mathbb{Z}$ taking $F$ to be the identity, $x = 1$ and $g = e^{2\pi i/N}$.
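The claim that a large $U^2$ norm forces correlation with a linear character can be seen concretely on the cyclic group $\mathbb{Z}_N$. The sketch below works on $\mathbb{Z}_N$ rather than the interval $[N]$, which is a simplifying assumption of ours, and all function names are ours. It uses the standard identity $\|f\|_{U^2}^4 = \sum_r |\hat{f}(r)|^4$ with $\hat{f}(r) = \mathbb{E}_n f(n) e^{-2\pi i r n/N}$; for 1-bounded $f$, Parseval then gives $\max_r |\hat{f}(r)|^2 \ge \|f\|_{U^2}^4$.

```python
import cmath

def fourier_coefficients(f):
    """fhat(r) = E_n f(n) e^{-2 pi i r n / N} for r in Z_N."""
    N = len(f)
    return [sum(f[n] * cmath.exp(-2j * cmath.pi * r * n / N) for n in range(N)) / N
            for r in range(N)]

def u2_norm_fourth_power(f):
    """||f||_{U^2}^4 = E_{x,h1,h2} f(x) conj(f(x+h1)) conj(f(x+h2)) f(x+h1+h2) on Z_N."""
    N = len(f)
    total = 0
    for x in range(N):
        for h1 in range(N):
            for h2 in range(N):
                total += (f[x] * f[(x + h1) % N].conjugate()
                          * f[(x + h2) % N].conjugate() * f[(x + h1 + h2) % N])
    return (total / N ** 3).real

N = 12
character = [cmath.exp(2j * cmath.pi * 3 * n / N) for n in range(N)]
signs = [complex(1 if n % 3 == 0 else -1) for n in range(N)]  # a 1-bounded example

if __name__ == "__main__":
    for f in (character, signs):
        fhat = fourier_coefficients(f)
        print(u2_norm_fourth_power(f), sum(abs(c) ** 4 for c in fhat),
              max(abs(c) for c in fhat))
```

For the character itself both quantities equal 1, the extreme case of the correlation.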
Conjecture D.2 (Möbius Nilsequence Conjecture). Let $G/\Gamma$ be an $s$-step nilmanifold and $(F(g^n x))$ a bounded $s$-step nilsequence. Then for any $A > 0$
\[ |\mathbb{E}_{n \in [N]} \mu(n) F(g^n x)| \ll \log^{-A} N, \]
where the implied constant depends on $A$, $s$, the nilmanifold and the Lipschitz constant of the nilsequence (but, importantly, not on the nilsequence itself, that is, not on $g$ or $x$).
Bibliography
[1] P. Erdős and P. Turán, On some sequences of integers, J. London Math. Soc. 11 (1936),
261–264.
[2] D. Goldston and C. Y. Yıldırım, Small gaps between primes, I, preprint available at
arXiv:0504336.
[3] W. T. Gowers, Decompositions, Approximate Structure, Transference, and the Hahn-Banach theorem, preprint available at arXiv:0811.3103.
[4] W. T. Gowers, A new proof of Szemerédi’s theorem, GAFA 11 (2001), 465–588.
[5] Ben Green, Long arithmetic progressions of primes, Analytic Number Theory: a tribute to Gauss and Dirichlet (W. Duke and Y. Tschinkel, eds.), Clay Mathematics Proceedings, 2007, pp. 149–168.
[6] Ben Green and Terence Tao, A bound for progressions of length k in the primes, available
at http://www.math.ucla.edu/ tao/preprints/Expository/quantitative AP.dvi.
[7] Ben Green and Terence Tao, Linear equations in primes, to appear in Annals of Math., preprint available at arXiv:0606088.
[8] Ben Green and Terence Tao, The Möbius function is strongly orthogonal to nilsequences, preprint available at arXiv:0807.1736.
[9] Ben Green and Terence Tao, An inverse theorem for the Gowers $U^3(G)$-norm, with applications, Proc. Edinburgh Math. Soc. 51 (2008), no. 1, 71–153.
[10] Ben Green and Terence Tao, The primes contain arbitrarily long arithmetic progressions, Annals of Math. 167 (2008), 481–547.
[11] Ben Green and Terence Tao, Quadratic uniformity of the Möbius function, Annales de l’Institut Fourier (Grenoble) 58 (2008), no. 6, 1863–1935.
[12] Ben Green, Terence Tao, and Tamar Ziegler, An inverse theorem for the Gowers $U^4$-norm, submitted to the Glasg. Math. J., preprint available at arXiv:0911.5681.
[13] Ben Green, Terence Tao, and Tamar Ziegler, An inverse theorem for the Gowers $U^{s+1}[N]$-norm, preprint available at arXiv:1009.3998.
[14] G. H. Hardy and J. E. Littlewood, Some problems of “partitio numerorum” III: On the
expression of a number as a sum of primes, Acta Math. 44 (1923), 1–70.
[15] D. R. Heath-Brown, Three primes and an almost-prime in arithmetic progression, J.
London Math. Soc. 23 (1981), 396–414.
[16] Thài Hoàng Lê, Green-Tao theorem in function fields, preprint available at
arXiv:0908.2642.
[17] Omer Reingold, Luca Trevisan, Madhur Tulsiani, and Salil Vadhan, New proofs
of the Green-Tao-Ziegler dense model theorem: An exposition, preprint available at
arXiv:0806.0381.
[18] K. F. Roth, On certain sets of integers, J. London Math. Soc. 28 (1953), 242–252.
[19] E. Szemerédi, On sets of integers containing no k elements in arithmetic progression,
Acta Arith. (1975), 299–345.
[20] Terence Tao, A remark on Goldston-Yıldırım correlation estimates, available at
http://www.math.ucla.edu/ tao/preprints/Expository/gy-corr.dvi.
[21] Terence Tao, What is good mathematics?, Bull. Amer. Math. Soc. 44 (2007), 623–634.
[22] Terence Tao and Tamar Ziegler, The primes contain arbitrarily long polynomial progressions, Acta Math. 201 (2008), 213–305.
[23] J. G. van der Corput, Über Summen von Primzahlen und Primzahlquadraten, Math.
Ann. 116 (1939), 1–50.
[24] P. Varnavides, On certain sets of positive density, J. London Math. Soc. 34 (1959),
358–360.
[25] Thomas Wolff, Lectures in harmonic analysis, available online at http://www.math.ubc.ca/ ilaba/wolff/.