SIEVES
1. Introduction
Let (a_n) be a sequence of positive real numbers. In many cases, this will be the
characteristic function of a set, such as numbers of the form n² + 1. We wish to
sift the sequence by a set of primes S. This will usually be all the primes, but
occasionally might be, say, only those primes which are 1 mod 4. We let S_z be the
primes in S that are less than z, and T_z their product.
One goal of sieve theory is to estimate

$$A(x, S_z) = \sum_{\substack{n \le x \\ \gcd(n, T_z) = 1}} a_n.$$

In particular, if S contains all the primes and z > √x, then the above sum is
essentially the sum of a_p over the primes p ≤ x. Of course, this is difficult to
obtain in practice.
The typical inputs to the sieve are asymptotic estimates for (a_n) in residue
classes. That is, we need information on the sums

$$A_d(x) = \sum_{\substack{n \le x \\ n \equiv 0 \ (d)}} a_n.$$

The d-uniformity in such estimates plays a large rôle in determining the strength of
our theorems. We write A(x) = A_1(x), and usually assume that A_d(x) ∼ g(d)A(x)
for some multiplicative function g(d) for some range of d. Another way of considering
this is to write

$$A_d(x) = g(d) A(x) + r_d(x)$$

and to assume some sort of bound on the error/remainder terms r_d.
The following upper bound is often called the "fundamental lemma of the sieve":

$$A(x, S_z) \ll A(x) \prod_{p \in S_z} \big( 1 - g(p) \big)$$

for z some small power of x. Typically we are able to take z of size about √x, at
least when A(x) is linear. Of course we need more assumptions about the uniformity
of (a_n) to be able to prove such a bound.
1.1. Acknowledgments. Much of this is based upon notes by H. Iwaniec. Notes
of D. Charles, B. Green, and A. Kumchev were also used, and I thank H. A. Helfgott
for useful conversations.
2. Basic tools
2.1. Inclusion-exclusion and Möbius inversion. The famous Möbius inversion
formula is based on the simple fact that

$$\sum_{d \mid n} \mu(d) = \delta(n) = \begin{cases} 1 & \text{for } n = 1, \\ 0 & \text{otherwise.} \end{cases}$$
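This identity is easy to check numerically. The following Python sketch uses a naive trial-division Möbius function, written here just for illustration:

```python
def mu(n):
    """Naive Möbius function via trial division: (-1)^k for squarefree n
    with k prime factors, and 0 if n is not squarefree."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:      # p^2 divides the original n
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def mobius_divisor_sum(n):
    """sum_{d | n} mu(d), which should be delta(n)."""
    return sum(mu(d) for d in range(1, n + 1) if n % d == 0)

# delta(1) = 1, and the sum vanishes for every n > 1
assert mobius_divisor_sum(1) == 1
assert all(mobius_divisor_sum(n) == 0 for n in range(2, 300))
```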
This can easily be shown by first noting that we can replace n by its squarefree
kernel n̂; letting k be the number of distinct prime factors of n̂, there are $\binom{k}{l}$
divisors of n̂ with exactly l prime factors, and so the above sum is

$$\sum_{l = 0}^{k} (-1)^l \binom{k}{l},$$

so that the result follows from the binomial theorem. An alternative proof can be
derived from expanding the Euler product $\zeta(s) = \prod_p (1 - 1/p^s)^{-1}$ and its
reciprocal $1/\zeta(s) = \prod_p (1 - 1/p^s) = \sum_n \mu(n)/n^s$ as sums and multiplying them together.
Dedekind's version of Möbius inversion is that

$$g(n) = \sum_{d \mid n} f(d) \iff f(n) = \sum_{d \mid n} \mu(d) g(n/d) = \sum_{de = n} \mu(d) g(e),$$

while Möbius originally phrased it more in terms of

$$G(x) = \sum_{n \le x} F(x/n) \iff F(x) = \sum_{m \le x} \mu(m) G(x/m).$$
A third form of this idea is that

$$(1) \qquad F(d) = \sum_{e} G(de) \iff G(m) = \sum_{n} \mu(n) F(mn).$$

This can be proven by writing d = nm on the left, multiplying by μ(n) and summing
over n, writing f = en, swapping the order of summation, and then applying the
above fact:

$$\sum_n \mu(n) F(nm) = \sum_n \mu(n) \sum_e G(emn) = \sum_n \mu(n) \sum_{\substack{f \\ f \equiv 0 \ (n)}} G(fm) = \sum_f G(fm) \sum_{n \mid f} \mu(n) = G(m).$$
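Since G will be finitely supported in our applications, the equivalence (1) can be checked directly; here is a small Python sketch (the support bound N and the random test values are arbitrary choices for illustration):

```python
import random

def mu(n):
    # naive Möbius function via trial division
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

random.seed(1)
N = 60
G = {k: random.randint(-5, 5) for k in range(1, N + 1)}  # finitely supported G

def F(d):
    # F(d) = sum_e G(de); the sum is finite since G vanishes beyond N
    return sum(G.get(d * e, 0) for e in range(1, N // d + 1))

def G_recovered(m):
    # G(m) = sum_n mu(n) F(mn), as claimed by (1)
    return sum(mu(n) * F(m * n) for n in range(1, N // m + 1))

assert all(G_recovered(m) == G[m] for m in range(1, N + 1))
```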
The first form of Möbius inversion can be formalised via the operation of Dirichlet
convolution of arithmetic functions. Given two such functions f and g, we define
their convolution as

$$(f \star g)(n) = \sum_{ab = n} f(a) g(b).$$

Note that δ acts as an identity in this algebra, as δ ⋆ f = f for all f. We see that
the first fact above can be re-written as μ ⋆ 1 = δ, so that μ is inverse to the
constant function 1. The Möbius inversion formula is equivalent to

$$g = 1 \star f \iff f = \mu \star g,$$

which follows from applying μ to each side of the left and using μ ⋆ 1 ⋆ f = δ ⋆ f = f.
The logarithm function L(n) = log n has an interesting place in this algebra. In
particular, we have the additive relation log(ab) = log(a) + log(b), and so we get
L · (f ⋆ g) = (L · f) ⋆ g + f ⋆ (L · g), implying that multiplication by L is a derivation.
We define the von Mangoldt function as Λ = μ ⋆ L, so that

$$\Lambda(n) = \sum_{ab = n} \mu(a) \log b = \sum_{d \mid n} \mu(d) \log \frac{n}{d}.$$
We can then use the fact that log(n/d) = log n − log d to get

$$\Lambda(n) = -\sum_{d \mid n} \mu(d) \log d + \log n \sum_{d \mid n} \mu(d),$$

and the second term is δ(n) log n and thus vanishes. Via inversion we get that

$$\log n = \sum_{d \mid n} \Lambda(d),$$

and from this it follows that

$$\Lambda(n) = \begin{cases} \log p & n = p^l, \ l > 0, \\ 0 & \text{otherwise.} \end{cases}$$
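As a sanity check, one can compute Λ = μ ⋆ L directly from the definition and verify both facts above (a small Python sketch with a naive Möbius function):

```python
from math import log, isclose

def mu(n):
    # naive Möbius function via trial division
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def von_mangoldt(n):
    # Lambda(n) = sum_{d | n} mu(d) log(n/d)
    return sum(mu(d) * log(n // d) for d in range(1, n + 1) if n % d == 0)

# Lambda(p^l) = log p, and Lambda vanishes off prime powers
assert isclose(von_mangoldt(8), log(2), abs_tol=1e-9)
assert isclose(von_mangoldt(7), log(7), abs_tol=1e-9)
assert abs(von_mangoldt(12)) < 1e-9

# log n = sum_{d | n} Lambda(d)
for n in range(2, 200):
    total = sum(von_mangoldt(d) for d in range(1, n + 1) if n % d == 0)
    assert isclose(total, log(n), abs_tol=1e-9)
```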
2.2. Notation, partial summation, and Cauchy's inequality. We use τ(n) to
denote the number of divisors of n; these are also given by the Dirichlet series
coefficients for ζ(s)². In general, we let τ_k(n) be the Dirichlet series coefficients
for ζ(s)^k, and this counts the number of distinct ways of writing n as a product of k
positive factors. We let ω(n) be the number of distinct prime divisors of n.
We use the notation f = O(g) to denote that f/g is bounded in some region of
interest, and f ≪ g means the same. For sums over dyadic intervals, we denote
the sum over x < n ≤ 2x as n ∼ x. The floor of x, that is, the greatest integer not
exceeding x, is denoted by ⌊x⌋, and the fractional part is given by {x} = x − ⌊x⌋.
We recall partial summation; this is easily proved by integration by parts
using Riemann-Stieltjes integrals. Letting (a_n) be a sequence and F(x) a
differentiable function, we have that

$$\sum_{n \le x} a_n F(n) = F(x) \sum_{n \le x} a_n - \int_1^x \Big( \sum_{n \le t} a_n \Big) F'(t) \, dt.$$
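When F' is integrated exactly over the unit intervals on which the partial sums are constant, this is an exact identity (Abel summation); the following Python sketch checks it for random a_n and the arbitrary choice F(t) = t²:

```python
import random

random.seed(2)
x = 50
a = {n: random.random() for n in range(1, x + 1)}

def A(t):
    # partial sum A(t) = sum_{n <= t} a_n
    return sum(a[n] for n in range(1, int(t) + 1))

def F(t):
    return t * t

lhs = sum(a[n] * F(n) for n in range(1, x + 1))
# integral of A(t) F'(t) over [1, x]: A(t) is constant on each [k, k+1),
# so the integral equals sum_k A(k) (F(k+1) - F(k))
integral = sum(A(k) * (F(k + 1) - F(k)) for k in range(1, x))
rhs = F(x) * A(x) - integral
assert abs(lhs - rhs) < 1e-9
```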
We also recall Cauchy's inequality, which states that if (b_n) and (c_n) are
sequences of complex numbers, then

$$\Big| \sum_n b_n c_n \Big|^2 \le \sum_n |b_n|^2 \sum_n |c_n|^2.$$
2.3. Consequences from prime number theory. We define slightly atypical
prime-counting functions, that will be used in our proof of the Bombieri-Vinogradov
theorem, and then later for small gaps between primes. Rather than the usual
ψ-functions that count prime powers, we define

$$\hat\psi(x) = \sum_{p \sim x} \log p, \qquad \hat\psi(x, \chi) = \sum_{p \sim x} \chi(p) \log p, \qquad \hat\psi(x; q, a) = \sum_{\substack{p \sim x \\ p \equiv a \ (\mathrm{mod}\ q)}} \log p,$$

which count only primes and count them in dyadic intervals. The prime number
theorem states, in particular, that there is some absolute constant ĉ > 0 such that

$$\psi(x) = x + O\Big( \frac{x}{e^{3\hat c \sqrt{\log x}}} \Big),$$

and for nonprincipal Dirichlet characters χ modulo q with q ≤ e^{ĉ√log x} we have

$$\hat\psi(x, \chi) \ll \frac{x}{e^{2\hat c \sqrt{\log x}}},$$
though here we need to be careful about possible Siegel zeros if we want this latter
statement to be effective.
In particular, we will say that a character is S-Siegel if its L-function has a real
zero β ≥ 1 − 1/(3 log S). It is then a standard theorem that there is at most one such
primitive character with q ≤ S. For all other characters with q ≤ e^{ĉ√log x} we obtain
the effective result that

$$(2) \qquad \hat\psi(x, \chi) \ll (\log x)^2 \cdot x \exp\Big( -\frac{\log x}{3 \log S} \Big) + x e^{-2\hat c \sqrt{\log x}}.$$

Taking S = e^{ĉ√log x}, when 2ĉ ≤ 1/(3ĉ) we recover the same bound as above (obviously
we can make ĉ sufficiently small to ensure that this inequality is true). Since the
class number formula implies that we have the bound β ≤ 1 − 1/√q for zeros of real
L-functions of modulus q, we see that a modulus q ≤ (3 log S)² cannot be S-Siegel.
Finally, we recall the theorem of Mertens, which states

$$\prod_{p \le x} \Big( 1 - \frac{1}{p} \Big)^{-1} \sim e^\gamma \log x.$$
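Mertens' theorem is easy to watch converge numerically (a Python sketch; the 1% tolerance at x = 10⁵ is an empirical choice, since the error term is only O(1/log x)):

```python
from math import exp, log

GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def primes_up_to(n):
    # sieve of Eratosthenes
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [p for p in range(2, n + 1) if sieve[p]]

x = 10 ** 5
product = 1.0
for p in primes_up_to(x):
    product /= 1.0 - 1.0 / p

# the product should be close to e^gamma log x
assert abs(product / (exp(GAMMA) * log(x)) - 1) < 0.01
```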
2.4. Bounds on multiplicative functions. Let f be a multiplicative, nonnegative
function supported on the squarefree numbers. We assume that

$$(3) \qquad \sum_{p \le t} f(p) \log p = \kappa \log t + O(1)$$

for some κ > 0. We shall call this the regularity hypothesis (which could be made
weaker, with similar effect on the results), and by partial summation it implies that

$$\sum_{p \le t} f(p) = \kappa \log \log t + O(1).$$

We also need the mild assumption

$$(4) \qquad \sum_{p} f(p)^2 \log p < \infty.$$

This boundedness implies

$$\sum_{d \le t} f(d) \le \prod_{p \le t} \big( 1 + f(p) \big) \le \exp\Big( \sum_{p \le t} f(p) \Big) \ll (\log t)^\kappa.$$
We can now use the theory of multiplicative functions from Wirsing to get (f
being nonnegative and supported on squarefree integers):

Theorem 2.1. With the above assumptions, we have that

$$\sum_{d \le t} f(d) = c_f (\log t)^\kappa + O\big( (\log t)^{\kappa - 1} \big), \qquad \text{where} \qquad c_f = \frac{1}{\Gamma(\kappa + 1)} \prod_p \Big( 1 - \frac{1}{p} \Big)^\kappa \big( 1 + f(p) \big).$$

A similar theorem can be shown when restricting the prime divisors of d to a given
set of primes, but we do not do this here (we could just modify f).
Proof. We write $M_f(x) = \sum_{m \le x} f(m)$. First, following Tchebyshev, we have that

$$M_f^L(x) = \sum_{m \le x} f(m) \log m = \sum_{p \le x} \sum_{n \le x/p} f(np) \log p,$$

as f(m) appears with multiplicity $\sum_{p \mid m} \log p = \log m$ in the right-hand expression.
Writing $\sum_{p \le t} f(p) \log p = \kappa \log t + E(t)$, this then simplifies to
$$\begin{aligned}
M_f^L(x) &= \sum_{n \le x} f(n) \sum_{\substack{p \le x/n \\ \gcd(p, n) = 1}} f(p) \log p
= \sum_{n \le x} f(n) \sum_{p \le x/n} f(p) \log p - \sum_{n \le x} f(n) \sum_{\substack{p \mid n \\ p \le x/n}} f(p) \log p \\
&= \sum_{n \le x} f(n) \Big[ \kappa \log \frac{x}{n} + E(x/n) \Big] - \sum_{p \le x} f(p) \log p \sum_{mp \le x/p} f(mp) \\
&= \kappa M_f(x) \log x - \kappa M_f^L(x) + \sum_{n \le x} f(n) E(x/n) - \sum_{mp^2 \le x} f(p) f(mp) \log p.
\end{aligned}$$
We move κM_f(x) log x and κM_f^L(x) to the left side to get

$$(5) \qquad (\kappa + 1) M_f^L(x) - \kappa M_f(x) \log x = N_f(x),$$

where, since E(t) = O(1) and using the boundedness hypothesis (4), we get

$$N_f(x) = \sum_{n \le x} f(n) E(x/n) - \sum_{mp^2 \le x} f(p) f(mp) \log p \ll M_f(x) + \sum_{m \le x} f(m) \sum_{p \le x} f(p)^2 \log p \ll M_f(x) \ll (\log x)^\kappa.$$
The idea is now that we have some sort of a functional equation for M_f(x), and
the smallness of N_f(x) will force the desired asymptotic for M_f(x).
To this end, via partial summation we note that

$$M_f^L(x) = M_f(x) \log x - \int_1^x M_f(u) \, \frac{du}{u},$$

so that (5) becomes

$$(6) \qquad M_f(x) \log x - (\kappa + 1) \int_1^x M_f(u) \, \frac{du}{u} = N_f(x).$$
It is convenient to move the starting point of the integral from 1 to 2, and this
introduces an extra term of size (κ + 1) log 2, which we absorb into N_f(x), the result
being denoted N_f^⋆(x). We are now faced with a standard problem; if the integral
involving M_f(u) involved N_f^⋆(u) instead, we would have a nice expression for M_f(x).
The main idea here is to multiply both sides of (6) by some factor K(x) and
then integrate from 2 to y. The factor will be chosen so that we get cancellation
on the left side upon swapping the order of integration. We then substitute the
resulting expression for the integral back into (6). We get

$$\int_2^y K(x) M_f(x) \log x \, dx - (\kappa + 1) \int_2^y K(x) \int_2^x M_f(u) \, \frac{du}{u} \, dx = \int_2^y K(x) N_f^\star(x) \, dx,$$
and so by reversing the order of integration we get

$$\int_2^y K(x) N_f^\star(x) \, dx = \int_2^y K(x) M_f(x) \log x \, dx - (\kappa + 1) \int_2^y \Big( \int_u^y K(x) \, dx \Big) M_f(u) \, \frac{du}{u}$$
$$= \int_2^y K(x) M_f(x) \log x \, dx + (\kappa + 1) \int_2^y \tilde K(u) M_f(u) \, \frac{du}{u} - (\kappa + 1) \tilde K(y) \int_2^y M_f(u) \, \frac{du}{u},$$

where K̃ is an antiderivative of K. The third term is exactly the sort of thing we
want, so as to be able to replace the integral in (6). The first and second terms
cancel when we choose K(x) log x = −(κ + 1)K̃(x)/x, so that we have

$$\log \tilde K(x) = -(\kappa + 1) \int \frac{dx}{x \log x} = -(\kappa + 1) \log \log x + C,$$

and so K̃(x) = e^C/(log x)^{κ+1}, where we can choose the value of C. So we can take
K(x) = 1/(x (log x)^{κ+2}), and replace the integral in (6) over M_f(u) with one over
N_f^⋆(u) to get

$$M_f(x) \log x = -\frac{1}{\tilde K(x)} \int_2^x K(u) N_f^\star(u) \, du + N_f^\star(x)
= (\kappa + 1) (\log x)^{\kappa + 1} \int_2^x \frac{N_f^\star(u)}{u (\log u)^{\kappa + 2}} \, du + N_f^\star(x).$$

The bound on N_f^⋆(x) implies that the integral is a constant plus O(1/log x), and so
by dividing through by log x we get the desired asymptotic.
We are left to compute c_f. Again the technique is well-known, as we consider
$\zeta_f(s) = \sum_m f(m)/m^s$ as s → 0 and compare it to ζ(s + 1)^κ. Writing the ζ_f(s) sum
as an integral and using integration by parts and then substituting x = e^t we get

$$\zeta_f(s) = \int_1^\infty \frac{dM_f(x)}{x^s} = -\int_1^\infty M_f(x) \, dx^{-s} = -\int_0^\infty M_f(e^t) \, de^{-st}$$
$$= -\int_0^\infty \Big( c_f + O\Big( \frac{1}{t} \Big) \Big) t^\kappa \, de^{-st} = \int_0^\infty \Big( c_f + O\Big( \frac{s}{u} \Big) \Big) \Big( \frac{u}{s} \Big)^\kappa e^{-u} \, du = \frac{c_f}{s^\kappa} \Gamma(\kappa + 1) + O\Big( \frac{1}{s^{\kappa - 1}} \Big)$$

as s → 0, where we substituted u = st.
Of course, we have ζ(s + 1)^κ ∼ 1/s^κ as s → 0, and the product over primes

$$\frac{\zeta_f(s)}{\zeta(s + 1)^\kappa} = \prod_p \Big( 1 - \frac{1}{p^{s+1}} \Big)^\kappa \Big( 1 + \frac{f(p)}{p^s} \Big)$$

has a limit as s → 0 by the regularity hypothesis. This gives the result.
Here is another useful fact, generalised from the harmonic series.

Lemma 2.1. Suppose that $\sum_{p \le t} f(p) \log p \ll t$. Then

$$(7) \qquad \sum_{n \le x} f(n) \ll \frac{x}{\log x} \sum_{n \le x} \frac{f(n)}{n}.$$
Proof. As above, we have the Tchebyshev relation, and get

$$\sum_{m \le x} f(m) \log m = \sum_{n \le x} \sum_{p \le x/n} f(np) \log p \le \sum_{n \le x} f(n) \sum_{p \le x/n} f(p) \log p \ll \sum_{n \le x} f(n) \, \frac{x}{n}.$$
This is a relation between the logarithmic weighting and the reciprocal weighting.
By partial summation we get

$$\begin{aligned}
\sum_{m \le x} f(m) &= f(1) + \frac{1}{\log x} \sum_{m \le x} f(m) \log m + \int_2^x \Big( \sum_{m \le t} f(m) \log m \Big) \frac{dt}{t (\log t)^2} \\
&\ll \frac{x}{\log x} \sum_{n \le x} \frac{f(n)}{n} + \int_2^x \Big( \sum_{m \le t} \frac{f(m)}{m} \Big) \frac{dt}{(\log t)^2} \\
&\ll \frac{x}{\log x} \sum_{n \le x} \frac{f(n)}{n} + \Big( \sum_{m \le x} \frac{f(m)}{m} \Big) \int_2^x \frac{dt}{(\log t)^2} \ll \frac{x}{\log x} \sum_{n \le x} \frac{f(n)}{n}.
\end{aligned}$$
3. Sieves: phrasing of sieve problems

3.1. A typical setup. We wish to estimate

$$A(x, S_z) = \sum_{\substack{n \le x \\ \gcd(n, T_z) = 1}} a_n.$$

Using the Möbius relation, we have that

$$A(x, S_z) = \sum_{\substack{n \le x \\ \gcd(n, T_z) = 1}} a_n = \sum_{n \le x} a_n \sum_{d \mid \gcd(n, T_z)} \mu(d) = \sum_{d \mid T_z} \mu(d) \sum_{\substack{n \le x \\ d \mid n}} a_n.$$

The inner sum is exactly the sort of congruential sum that, as input to our sieve
mechanism, we expect to be able to estimate asymptotically. Denoting it by A_d(x),
we get that

$$A(x, S_z) = \sum_{d \mid T_z} \mu(d) A_d(x).$$
We write

$$A_d(x) = g(d) A(x) + r_d(x),$$

where g(d) is a multiplicative function; the idea of this decomposition is to have
the error/remainder term r_d(x) as small as possible. With this decomposition, we
get that

$$A(x, S_z) = \sum_{d \mid T_z} \mu(d) g(d) A(x) + \sum_{d \mid T_z} \mu(d) r_d(x) = M(S_z) A(x) + R(x, S_z),$$

where

$$M(S_z) = \sum_{d \mid T_z} \mu(d) g(d) = \prod_{p \mid T_z} \big( 1 - g(p) \big) \qquad \text{and} \qquad R(x, S_z) = \sum_{d \mid T_z} \mu(d) r_d(x),$$

and so we want g to be sufficiently nice to be able to make estimates here. The
bulk of the problem is in getting good estimates for the error term R(x, S_z).
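For a_n = 1 and g(d) = 1/d this decomposition is an exact identity, with r_d(x) = ⌊x/d⌋ − x/d; a small Python sketch (the choices x = 10⁴ and z = 10 are arbitrary):

```python
from math import gcd

x, z = 10_000, 10
S_z = [2, 3, 5, 7]                       # the primes less than z

# squarefree divisors d of T_z, stored together with mu(d)
divisors = [(1, 1)]
for p in S_z:
    divisors += [(d * p, -m) for d, m in divisors]

A = x                                    # A(x) = sum_{n <= x} 1
g = lambda d: 1.0 / d                    # model density g(d) = 1/d
A_d = lambda d: x // d                   # A_d(x) = #{n <= x : d | n}

M = sum(m * g(d) for d, m in divisors)                  # = prod_{p in S_z} (1 - 1/p)
R = sum(m * (A_d(d) - g(d) * A) for d, m in divisors)   # remainder sum

T_z = 2 * 3 * 5 * 7
direct = sum(1 for n in range(1, x + 1) if gcd(n, T_z) == 1)
assert abs(M * A + R - direct) < 1e-6    # the decomposition is exact here
assert abs(R) <= 2 ** len(S_z)           # each |r_d| < 1, and there are 2^4 terms
```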
3.2. Twin primes. Let (a_n) be the characteristic function of numbers of the
form m(m + 2). When m ≤ x, if m(m + 2) has no prime factors less than √x,
then both m and m + 2 are prime. We wish to sift the sequence by the residue
classes 0, −2 (mod p), ideally for all primes up to √x. For each prime p > 2, we
should have that g(p) = 2/p, and it is not too difficult to estimate the remainder
terms. Unfortunately, we are unable to take z as large as √x; we can take it almost
this large, and this will give an upper bound.
3.3. Small and large sieves. The main difference between small and large sieves
is the average size of g(p). In fact, pg(p) measures the number of residue classes
mod p that are sieved out by our process. If the average of pg(p) exists, it is often
denoted by κ, and called the sieve dimension. The case where κ = 1 is extremely
important, and is called the linear sieve. Also important is the half-linear sieve
when κ = 1/2, and κ = 2 comes up as above in the twin prime problem. The
large sieve typically refers to the case where g(p) is itself not small on average. For
instance, sifting by the quadratic non-residues mod p eliminates (p − 1)/2 residue
classes for each p.
3.4. An example. We consider the characteristic function of the integers, a_n = 1
for all n. We wish to estimate

$$\Phi(x, S_z) = \sum_{\substack{n \le x \\ \gcd(n, T_z) = 1}} 1,$$

particularly in the case that T_z is the product of the primes up to z. As we
eliminate the 0 mod p class for each prime, it is clear that g(p) = 1/p best models
the situation. To estimate the remainder term r_d(x) we first note that, writing
each n uniquely as n = em with e composed only of primes dividing T_z and
gcd(m, T_z) = 1,

$$\lfloor x \rfloor = \sum_{n \le x} 1 = \sum_{\substack{e \\ p \mid e \Rightarrow p \mid T_z}} \sum_{\substack{m \le x/e \\ \gcd(m, T_z) = 1}} 1 = \sum_{\substack{e \\ p \mid e \Rightarrow p \mid T_z}} \Phi(\lfloor x/e \rfloor, T_z),$$

and so by Möbius inversion we have

$$\Phi(\lfloor x \rfloor, T_z) = \sum_{d \mid T_z} \mu(d) \lfloor x/d \rfloor.$$
Next we use ⌊x/d⌋ = (x/d) − {x/d} to get that

$$\Phi(x, S_z) = \sum_{d \mid T_z} \mu(d) \frac{x}{d} - \sum_{d \mid T_z} \mu(d) \Big\{ \frac{x}{d} \Big\} = x \prod_{p \mid T_z} \Big( 1 - \frac{1}{p} \Big) + R(x, S_z),$$

where as a trivial estimate we have |R(x, S_z)| ≤ 2^{#S_z}, while the main term is given
by xφ(T_z)/T_z. This gives an asymptotic as long as T_z does not have too many
prime factors. In particular, it fails when #S_z > log x.
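The Legendre formula and the trivial remainder bound can be checked directly (Python sketch; x and the prime set are arbitrary small choices):

```python
from math import gcd

x = 1000
S_z = [2, 3, 5, 7, 11]
T_z = 2 * 3 * 5 * 7 * 11

divisors = [(1, 1)]                    # pairs (d, mu(d)) with d | T_z
for p in S_z:
    divisors += [(d * p, -m) for d, m in divisors]

phi_direct = sum(1 for n in range(1, x + 1) if gcd(n, T_z) == 1)
phi_legendre = sum(m * (x // d) for d, m in divisors)
assert phi_direct == phi_legendre          # Legendre's formula is exact

main_term = x * sum(m / d for d, m in divisors)   # = x prod_{p} (1 - 1/p)
R = phi_legendre - main_term
assert abs(R) <= 2 ** len(S_z)             # trivial bound |R| <= 2^{#S_z}
```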
In general we cannot do much better than this, but in this specific example we
can use a trick (in its general form attributed to Rankin) to estimate {x/d} more
sharply. In particular, this is the case when d > x, which should be true for many
divisors of T_z. The inequality that we use is

$$\{x/d\} \le \min(1, x/d) \le (x/d)^\alpha,$$

where α can be chosen with 0 ≤ α ≤ 1. This gives us that

$$|R(x, S_z)| = \Big| \sum_{d \mid T_z} \mu(d) \{x/d\} \Big| \le \sum_{d \mid T_z} (x/d)^\alpha = x^\alpha \prod_{p \mid T_z} \big( 1 + 1/p^\alpha \big).$$

We now insert $\frac{\varphi(T_z)}{T_z} = \prod_{p \in S_z} (1 - 1/p)$ into the picture, so as to be able to derive
asymptotics more easily, and then note that the multiplicand is bigger than 1 but
decreases with p, and so we can replace the product over primes dividing T_z by the
product over primes up to z to get

$$|R(x, S_z)| \le x^\alpha \frac{\varphi(T_z)}{T_z} \prod_{p \mid T_z} (1 + 1/p^\alpha)(1 - 1/p)^{-1} \le x^\alpha \frac{\varphi(T_z)}{T_z} \prod_{p \le z} (1 + 1/p^\alpha)(1 - 1/p)^{-1}.$$
Choosing α so that z^{1−α} = C, we have C/p ≥ 1/p^α for all p ≤ z, and so

$$|R(x, T_z)| \le x \, \frac{\varphi(T_z)}{T_z} \, C^{-\log x / \log z} \prod_{p \le z} (1 + C/p)(1 - 1/p)^{-1},$$

where the product is bounded by O_C\big((\log z)^{C+1}\big) by the formula of Mertens. By
choosing log x/log z = B log log x, we thus get that Φ(x, T_z) ∼ (φ(T_z)/T_z) x provided
that #S_z ≤ π(z) ≤ z = x^{1/(B log log x)} and C + 1 < C^B. As C can be freely chosen,
we can take B as close to 1 as desired. This is much superior to our previous range
of uniformity.
3.5. Upper-bound and lower-bound sieves. The above example shows how
important it is to have good control over the remainder terms r_d(x). One idea is to
try to drop information before using the asymptotics for A_d(x). For instance, we
might be able to estimate

$$\sum_{d \mid T_z} \lambda_d^- A_d(x) \le A(x, S_z) \le \sum_{d \mid T_z} \lambda_d^+ A_d(x)$$

using various weights λ_d^±. This would be particularly useful if the λ_d^± were 0 for
large d.
Due to positivity we have that A_d(x) ≥ 0. Recalling the expansion

$$A(x, S_z) = \sum_{d \mid T_z} \mu(d) A_d(x),$$

we are thus led to seek systems of weights such that

$$\sum_{d \mid m} \lambda_d^- \le \sum_{d \mid m} \mu(d) \le \sum_{d \mid m} \lambda_d^+$$

for all m dividing T_z. Of course, the middle sum is just δ(m). We call such choices
of λ_d^± upper- and lower-bound sieves (though we introduce them together, we
can have an upper-bound sieve without a corresponding lower-bound sieve, and
vice-versa). If we have λ_d^± = 0 for d > D, then we say that the sieve is of level D.
The first sieves to be constructed were restrictions of the Möbius function to
integers which had their prime factors lying in given intervals. This was largely
developed by Brun following work of Merlin, and then later by Rosser. These are
usually called combinatorial sieves. Though we shall not consider them here, they
can be quite powerful, albeit clumsy to use.
3.6. Some sieve facts. We give some more basic facts about sieves. We write
σ^± = λ^± ⋆ 1, so that the sieve condition is that σ^− ≤ δ ≤ σ^+, at least on divisors
of T_z. Given any multiplicative function f supported on the divisors of T_z (in
particular on squarefree numbers), we use λ^± = σ^± ⋆ μ to get

$$\sum_{d \mid T_z} \lambda_d^\pm f(d) = \sum_{d \mid T_z} \sum_{b \mid d} \sigma_b^\pm \mu(d/b) f(b) f(d/b) = \sum_{ab \mid T_z} \sigma_b^\pm \mu(a) f(a) f(b)$$
$$= \sum_{b \mid T_z} \sigma_b^\pm f(b) \prod_{\substack{p \in S_z \\ p \nmid b}} \big( 1 - f(p) \big) = \sum_{b \mid T_z} \frac{\sigma_b^\pm f(b)}{\prod_{p \mid b} (1 - f(p))} \cdot \prod_{p \in S_z} \big( 1 - f(p) \big).$$
We assume that 0 ≤ f(p) < 1 for all p. It turns out that the multiplicative function
defined by

$$\hat f(p) = \frac{f(p)}{1 - f(p)}$$

is quite important. For instance, we have 1/f̂ = (1/f) ⋆ μ, but here we note that

$$\big( 1 - f(p) \big)^{-1} = 1 + \hat f(p) = \frac{\hat f(p)}{f(p)},$$

so that we can write the above as

$$\sum_{d \mid T_z} \lambda_d^\pm f(d) = \sum_{b \mid T_z} \sigma_b^\pm \hat f(b) \cdot \prod_{p \in S_z} \big( 1 - f(p) \big).$$

Noting that σ_1^± = λ_1^± = 1, the positivity/negativity of σ^± gives us that

$$\sum_{d \mid T_z} \lambda_d^- f(d) \le \prod_{p \in S_z} \big( 1 - f(p) \big) \le \sum_{d \mid T_z} \lambda_d^+ f(d),$$

where the rightmost sum is also thus seen to be nonnegative.
The partial converse statement that

$$\sum_{d \mid T_z} \lambda_d^+ f(d) \ll \prod_{p \in S_z} \big( 1 - f(p) \big)$$

is also quite often true, and is indeed related to the fundamental lemma. However,
unlike the above, it depends on a specific relation between the multiplicative
function f and the sieve. In particular, when f satisfies the regularity (3) and
boundedness (4) hypotheses, the choice of λ^+ in the Selberg upper-bound sieve will
imply this inequality. In many situations, this can then be used to get an upper
bound on A(x, S_z).
4. The Selberg upper-bound sieve

This is a quite general upper-bound sieve first described by Selberg, who called
it the Λ² sieve. There is also a related Λ²Λ− lower-bound sieve, but this is not so
useful, and we do not describe it here. As in the last section, we want an upper-bound
sieve (λ_d) of level D, which means that λ_1 = 1 and λ_d = 0 for d > D and
σ_m = \sum_{d \mid m} λ_d ≥ 0 for all m.
Selberg eases this last condition by choosing λ_d with

$$\sigma_m = \sum_{d \mid m} \lambda_d = \Big( \sum_{d \mid m} \rho_d \Big)^2,$$

where the ρ_d are real numbers with ρ_1 = 1 and ρ_d = 0 for d > √D. Obviously the
squaring ensures the desired inequality. From this λ-ρ relation, by inducting on the
number of prime divisors of l we get that

$$(8) \qquad \lambda_l = \sum_{\mathrm{lcm}(a, b) = l} \rho_a \rho_b.$$
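The relation (8) between the λ and ρ weights can be verified numerically; in this Python sketch the ρ_d are random numbers supported on the divisors of T_z = 30 (an arbitrary small choice):

```python
import random
from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

random.seed(3)
support = [1, 2, 3, 5, 6, 10, 15, 30]          # divisors of T_z = 30
rho = {d: (1.0 if d == 1 else random.uniform(-1, 1)) for d in support}

# lambda_l = sum over pairs (a, b) with lcm(a, b) = l of rho_a rho_b
lam = {}
for a in support:
    for b in support:
        l = lcm(a, b)
        lam[l] = lam.get(l, 0.0) + rho[a] * rho[b]

# sigma_m = sum_{d | m} lambda_d should equal (sum_{d | m} rho_d)^2
for m in support:
    sigma = sum(lam[d] for d in support if m % d == 0)
    square = sum(rho[d] for d in support if m % d == 0) ** 2
    assert abs(sigma - square) < 1e-9
```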
When we apply this to our situation with A(x, S_z), we get that

$$A(x, S_z) = \sum_{\substack{k \le x \\ \gcd(k, T_z) = 1}} a_k \le \sum_{k \le x} a_k \Big( \sum_{d \mid \gcd(k, T_z)} \rho_d \Big)^2 = \sum_{m \mid T_z} \sum_{n \mid T_z} \rho_m \rho_n \sum_{\substack{k \le x \\ k \equiv 0 \ (m) \\ k \equiv 0 \ (n)}} a_k$$
$$= \sum_{m, n \mid T_z} \rho_m \rho_n A_l(x) = A(x) \sum_{m, n \mid T_z} g(l) \rho_m \rho_n + \sum_{m, n \mid T_z} \rho_m \rho_n r_l(x),$$

where l = lcm(m, n) throughout. We wish to choose the ρ in such a way as to
optimise this inequality. If we ignore the remainder term, we can note that the main
term is just a quadratic form in the ρ. We thus diagonalise this, and minimise it.
The ρ that we obtain will be sufficiently small, in particular |ρ_d| ≤ 1, so as to make
estimation of the remainder not be our main worry in most cases.
We assume that 0 < g(p) < 1 for p | T_z and g(p) = 0 otherwise. We also take ρ
to be supported on divisors of T_z, and in particular squarefree numbers. The main
term in the above is now given by

$$G = \sum_{l \mid T_z} \lambda_l g(l) = \sum_{m, n \mid T_z} g(l) \rho_m \rho_n = \sum_{a, b, c \mid T_z} g(abc) \rho_{ac} \rho_{bc},$$

where we wrote m = ac and n = bc with c = gcd(m, n), so abc = mn/c = lcm(m, n).
Our goal is to make this look like a quadratic form, and then to diagonalise it so as
to compute its minimum subject to ρ_1 = 1.
We pull out the c to get (dropping the T_z-condition; it is in the ρ-support), and
use the multiplicativity of g and the squarefree support of ρ to get

$$G = \sum_c g(c) \sum_{\gcd(a, b) = 1} g(a) g(b) \rho_{ac} \rho_{bc}.$$
We next pull the standard trick of inserting the Möbius relation for something that
is equal to 1, in this case the gcd of a and b, and then swapping the sums yields

$$G = \sum_c g(c) \sum_a \sum_b g(a) g(b) \sum_{e \mid \gcd(a, b)} \mu(e) \rho_{ac} \rho_{bc} = \sum_c g(c) \sum_e \mu(e) \sum_{a \equiv 0 \ (e)} g(a) \rho_{ac} \sum_{b \equiv 0 \ (e)} g(b) \rho_{bc},$$

whereupon we replace m = a/e and n = b/e and use g-multiplicativity again to get

$$G = \sum_c g(c) \sum_e \mu(e) \Big( \sum_m g(em) \rho_{emc} \Big) \Big( \sum_n g(en) \rho_{enc} \Big) = \sum_c \frac{1}{g(c)} \sum_e \mu(e) \Big( \sum_m g(cem) \rho_{cem} \Big)^2.$$
The inner sum is looking good for our diagonalised quadratic form, and we can
simplify it via the multiplicative function

$$\hat g(p) = \frac{g(p)}{1 - g(p)} = \frac{1}{-1 + 1/g(p)}, \qquad \text{so that} \qquad 1/\hat g = 1/g \star \mu,$$

and by writing n = ce and s = cem we get

$$G = \sum_{n \mid T_z} (1/g \star \mu)(n) \Big( \sum_{s \equiv 0 \ (n)} g(s) \rho_s \Big)^2 = \sum_{n \mid T_z} \frac{1}{\hat g(n)} \Big( \sum_{s \equiv 0 \ (n)} g(s) \rho_s \Big)^2.$$
This can now be diagonalised by making the linear change of variables

$$\xi_d = \mu(d) \sum_{m \equiv 0 \ (d)} g(m) \rho_m = \mu(d) \sum_e g(de) \rho_{de}, \qquad \text{so that} \qquad G = \sum_{d \mid T_z} \frac{\xi_d^2}{\hat g(d)}.$$

We immediately see that ξ_d = 0 for d > √D. We can invert the ξ-ρ relation by
using the third Möbius relation (1) to get

$$\rho_l g(l) = \sum_n \mu(n) \xi_{nl} \mu(nl), \qquad \text{so that} \qquad \rho_l = \frac{\mu(l)}{g(l)} \sum_{d \equiv 0 \ (l)} \xi_d,$$

and with l = 1 this gives $1 = \rho_1 = \sum_d \xi_d$, where the sum is over divisors of T_z.
We want to minimise G on this hyperplane. Via Cauchy's inequality we have

$$(9) \qquad G H = \sum_{d \mid T_z} \frac{\xi_d^2}{\hat g(d)} \sum_{\substack{d \le \sqrt{D} \\ d \mid T_z}} \hat g(d) \ge \Big( \sum_{d \mid T_z} \xi_d \Big)^2 = 1^2 = 1.$$

Incidentally, here we have that

$$H \le \sum_{d \mid T_z} \hat g(d) = \prod_{p \in S_z} \big( 1 + \hat g(p) \big) = \prod_{p \in S_z} \big( 1 - g(p) \big)^{-1}.$$
√
Equality occurs in the above when ξd = ĝ(d)/H for d ≤ D, and this gives
µ(m)ĝ(m) X
µ(m) X
ĝ(e).
ĝ(d) =
(10)
ρm =
Hg(m) √
Hg(m)
√
d≤ D
d≡0(m)
e≤ D/m
gcd(e,m)=1
In particular, we can see that |ρ_m| ≤ 1 (the argument is due to van Lint and
Richert) by grouping the terms of H according to the gcd of d and m to get

$$H = \sum_{k \mid m} \sum_{\substack{d \le \sqrt{D} \\ \gcd(d, m) = k}} \hat g(d) = \sum_{k \mid m} \hat g(k) \sum_{\substack{e \le \sqrt{D}/k \\ \gcd(e, m) = 1}} \hat g(e)$$
$$\ge \sum_{k \mid m} \hat g(k) \cdot \sum_{\substack{e \le \sqrt{D}/m \\ \gcd(e, m) = 1}} \hat g(e) = \frac{\hat g(m)}{g(m)} \cdot \frac{g(m)}{\hat g(m)} \, \mu(m) \rho_m H = \mu(m) \rho_m H,$$

where we used that

$$\sum_{k \mid m} \hat g(k) = \prod_{p \mid m} \big( 1 + \hat g(p) \big) = \prod_{p \mid m} \frac{\hat g(p)}{g(p)} = \frac{\hat g(m)}{g(m)}.$$

Since the sum in (10) is nonnegative, we have μ(m)ρ_m ≥ 0, and so this gives |ρ_m| ≤ 1.
By the formula (8) for λ_d, this implies that |λ_d| ≤ τ_3(d).
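With g(p) = 1/p (so ĝ(p) = 1/(p−1)) the optimal weights (10) are easy to compute explicitly, and one can watch the van Lint-Richert bound hold (a Python sketch; z = 20 and √D = 50 are arbitrary choices):

```python
from math import gcd, prod

primes = [2, 3, 5, 7, 11, 13, 17, 19]            # S_z for z = 20
divs = [1]
for p in primes:
    divs += [d * p for d in divs]                # squarefree divisors of T_z

def mu(d):
    # d is squarefree here, so mu(d) = (-1)^{number of prime factors}
    return (-1) ** sum(1 for p in primes if d % p == 0)

g    = lambda d: prod(1 / p       for p in primes if d % p == 0)
ghat = lambda d: prod(1 / (p - 1) for p in primes if d % p == 0)

sqrtD = 50
H = sum(ghat(d) for d in divs if d <= sqrtD)

def rho(m):
    # the optimal weights from (10)
    s = sum(ghat(e) for e in divs if e * m <= sqrtD and gcd(e, m) == 1)
    return mu(m) * ghat(m) / (H * g(m)) * s

assert abs(rho(1) - 1.0) < 1e-9                        # rho_1 = 1
assert all(abs(rho(m)) <= 1 + 1e-9 for m in divs if m <= sqrtD)
```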
We can now recap most of the above:

Theorem 4.1. We have

$$A(x, S_z) = \sum_{\substack{n \le x \\ \gcd(n, T_z) = 1}} a_n \le \frac{A(x)}{H_D} + R_D(x, S_z),$$

where

$$H_D = \sum_{\substack{d \le \sqrt{D} \\ d \mid T_z}} \hat g(d) \qquad \text{and} \qquad R_D(x, S_z) = \sum_{m, n \mid T_z} |\rho_m \rho_n r_l(x)|.$$
Of course, for this to be of use, we need a good lower bound for H_D, and some
control over R_D(x, S_z). For simplicity, we will now take S to be the set of all primes
and z > √D.
In order to be able to lower-bound H_D, we need to assume that

$$\sum_{p \le t} g(p) \log p = \kappa \log t + O(1) \qquad \text{and} \qquad \sum_p g(p)^2 \log p < \infty,$$

so as to use the section on multiplicative functions. The relationship between the
functions g and ĝ implies that if the above assumptions hold for g, then since we
have ĝ(p) = g(p) + O(g(p)²), they also hold for ĝ, though maybe with different
implied constants. We get that

$$H_D = c_{\hat g} \big( \log \sqrt{D} \big)^\kappa \Big( 1 + O\Big( \frac{1}{\log D} \Big) \Big).$$
A typical bound for the remainder terms would be something like |r_d(x)| ≤ g(d)d,
or possibly with an extra τ(d) on the right. In many cases, the exact power of
logarithm does not matter much. Under the above assumption, and assuming that
pg(p) ≥ 1 for p ∈ S_z (so that dg(d) is multiplicatively non-decreasing) we get

$$\begin{aligned}
R_D(x, S_z) &\le \sum_{m, n \mid T_z} |\rho_m \rho_n g(l) l| \le \sum_{m, n \mid T_z} |\rho_m \rho_n g(mn) mn| = \Big( \sum_{d \le \sqrt{D}} d g(d) |\rho_d| \Big)^2 \\
&= \Big( \frac{1}{H_D} \sum_{d \le \sqrt{D}} d \hat g(d) \sum_{\substack{u \le \sqrt{D}/d \\ \gcd(d, u) = 1}} \hat g(u) \Big)^2 \le \Big( \frac{1}{H_D} \sum_{v \le \sqrt{D}} \sigma(v) \hat g(v) \Big)^2,
\end{aligned}$$

where we multiplied and divided by H_D using (10), and then wrote v = ud, taking
σ(v) as the sum of the divisors of v. Using the bound (7) from multiplicative
functions, we get a sum over σ(v)ĝ(v)/v, which is multiplicative and regular with
the same κ as ĝ, and so we get:

$$R_D(x, S_z) \ll \Big( \frac{1}{H_D} \cdot \frac{\sqrt{D}}{\log D} \sum_{v \le \sqrt{D}} \frac{\sigma(v) \hat g(v)}{v} \Big)^2 \ll \Big( \frac{\sqrt{D}}{H_D \log D} \cdot H_D \Big)^2 \ll \frac{D}{(\log D)^2}.$$
Our final theorem thus comes as:

Theorem 4.2. Suppose that (a_n) is a nonnegative sequence with

$$\sum_{p \le t} g(p) \log p = \kappa \log t + O(1)$$

for some κ > 0 and |r_d(x)| ≤ dg(d), with pg(p) ≥ 1 when it is nonzero and
$\sum_p g(p)^2 \log p < \infty$. Then for z > √D we have

$$A(x, S_z) = \sum_{\substack{n \le x \\ \gcd(n, T_z) = 1}} a_n \ll \frac{A(x)}{(\log D)^\kappa} + \frac{D}{(\log D)^2},$$

and when $D (\log D)^{\kappa - 2} \ll A(x)$ we have

$$A(x, S_z) \ll A(x) \prod_{p \le D} \big( 1 - g(p) \big).$$
Here we again used the regularity hypothesis for g(p) in the last step. Note that
when A(x) is essentially linear, we can take √D to be almost as large as √x.
One can show that our choice in Selberg's sieve gives

$$\rho_d \sim \mu(d) \Big( \frac{\log \sqrt{D}/d}{\log \sqrt{D}} \Big)^\kappa$$

for d not too large, and the weights on the right side are sometimes called the
quasi-optimal weights; they do not lose much, and are often easier to analyse.
We return to the idea that, for a given multiplicative function f supported
on the squarefree numbers and satisfying the regularity (3) and boundedness (4)
hypotheses, we should have some sieve λ_d^+ with

$$\sum_{d < D} \lambda_d^+ f(d) \ll \prod_{p < D} \big( 1 - f(p) \big).$$

The left sum is just G, and so the result follows from multiplicative function theory
as above.
5. Applications of the Selberg sieve

5.1. Twin primes. First we recall the above setting of twin primes. We let (a_n)
be the characteristic function of numbers of the form m(m + 2). For m ≤ x, we
have that m and m + 2 are both prime when the product has no prime factor
smaller than √x. Conversely, if √x ≤ m ≤ x with both m and m + 2 prime, then
m(m + 2) has no prime factor smaller than √x. Writing π_2(x) for the number of
m ≤ x with m and m + 2 both prime, we get the bound

$$\pi_2(x) - \pi_2(\sqrt{x}) \le A(x^2 + 2x, S_z),$$

where z = √x and S is the set of all primes.
We use the Selberg upper-bound sieve with g(p) = 2/p for p > 2, so that we
have the regularity condition with κ = 2. We have that m(m + 2) ≡ 0 (mod p)
precisely when m ≡ 0, −2 (mod p). Thus we have that

$$A_p(x^2 + 2x) = \sum_{\substack{m \le x \\ m \equiv 0, -2 \ (p)}} 1 = \frac{2}{p} \sum_{m \le x} 1 + E,$$

where |E| ≤ 2. From this, arguing in the same manner for composite moduli, we
readily obtain that |r_d(x)| ≤ dg(d). The Selberg sieve gives that

$$A(x^2 + 2x, S_z) \le \frac{x}{H_D} + O\Big( \frac{D}{(\log D)^2} \Big),$$
and we choose D = x/(log x) so that

$$H_D \sim \big( \log \sqrt{D} \big)^\kappa \frac{1}{\Gamma(\kappa + 1)} \prod_p \Big( 1 - \frac{1}{p} \Big)^\kappa \big( 1 + \hat g(p) \big)$$

to get that

$$\pi_2(x) \le \frac{x \cdot 2^2 \cdot 2!}{(\log x)^2} \prod_p \big( 1 - g(p) \big) \Big( 1 - \frac{1}{p} \Big)^{-2} \cdot \Big( 1 + O\Big( \frac{\log \log x}{\log x} \Big) \Big).$$

The main term is larger by a factor of κ! · 2^κ than the expected amount. The
argument works in the same manner for other polynomials, with pg(p) being the
number of roots mod p of the polynomial.
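For comparison, here is a brute-force twin prime count (Python sketch); π_2(100) = 8, the pairs being (3,5), (5,7), (11,13), (17,19), (29,31), (41,43), (59,61), (71,73):

```python
def primes_up_to(n):
    # sieve of Eratosthenes
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [p for p in range(2, n + 1) if sieve[p]]

def pi_2(x):
    # number of m <= x with m and m + 2 both prime
    ps = set(primes_up_to(x + 2))
    return sum(1 for m in ps if m <= x and m + 2 in ps)

assert pi_2(100) == 8
```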
5.2. The Brun-Titchmarsh theorem. Here we let (a_n) be the characteristic
function of an arithmetic progression in some given interval, namely we want that
n ≡ a (mod q) and x < n ≤ x + y, where gcd(a, q) = 1 and q < y. Our set S of
primes shall be those with p ∤ q, and z will be taken as x. As everything with a
prime divisor less than x is sieved out, we have the inequality

$$\pi(x + y; q, a) - \pi(x; q, a) \le A(x + y, S_x),$$

where the π are the usual prime-counting functions in arithmetic progressions.
We clearly want g(p) = 1/p when p ∤ q, and as we are just counting arithmetic
progressions, we see that

$$A_p(x + y) = \sum_{\substack{n \equiv a \ (q) \\ n \equiv 0 \ (p)}} a_n = \frac{y}{pq} + E,$$

where |E| ≤ 1, thus giving us the remainder condition. We could estimate H_D via
elementary means, but instead just note that

$$H_D \sim \log \sqrt{D} \cdot \prod_{p \mid q} \Big( 1 - \frac{1}{p} \Big) = \frac{\varphi(q)}{2q} \log D,$$
using multiplicative functions. From the Selberg sieve, since A(x + y) = y/q + E,
we thus get

$$\pi(x + y; q, a) - \pi(x; q, a) \le \frac{2y}{\varphi(q) \log D} + O\Big( \frac{D}{(\log D)^2} \Big),$$

and by choosing D = y/q we get the standard bound

$$\pi(x + y; q, a) - \pi(x; q, a) \le \frac{2y}{\varphi(q) \log(y/q)} + O\Big( \frac{y/q}{(\log(y/q))^2} \Big).$$

Thus we get an upper bound that is essentially twice the expected size. Montgomery
and Vaughan have shown that the error term can be omitted. Typically in
applications we will have y small compared to x, say of size log x maybe, while q
will be fixed. There is also a relationship here with Siegel zeros: if the factor of
two could be lowered, then they would not exist (see the work of Selberg).
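The Brun-Titchmarsh bound (in the Montgomery-Vaughan form, without the error term) can be spot-checked numerically (Python sketch; the parameters q, a, x, y are arbitrary choices):

```python
from math import gcd, log

def primes_up_to(n):
    # sieve of Eratosthenes
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [p for p in range(2, n + 1) if sieve[p]]

def pi_ap(t, q, a):
    # pi(t; q, a): primes p <= t with p = a (mod q)
    return sum(1 for p in primes_up_to(t) if p % q == a)

q, a, x, y = 5, 2, 2000, 500
count = pi_ap(x + y, q, a) - pi_ap(x, q, a)
phi_q = sum(1 for r in range(1, q + 1) if gcd(r, q) == 1)  # Euler phi
bound = 2 * y / (phi_q * log(y / q))
assert count <= bound
```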
6. The large sieve

The large sieve developed in various parts of mathematics. It can be seen as a
tool in functional analysis, or probability theory, but we shall look at its arithmetic
applications. The main idea is to estimate the trigonometric polynomial

$$S(\alpha) = \sum_n a_n e(\alpha n),$$

where we have the standard notation e(x) = e^{2πix}, and the a_n are complex numbers
supported on M < n ≤ M + N. Via Cauchy's inequality we get the bound

$$|S(\alpha)|^2 = \Big| \sum_n a_n e(\alpha n) \Big|^2 \le \sum_{M < n \le M + N} |e(\alpha n)|^2 \sum_n |a_n|^2 = N \cdot \sum_n |a_n|^2.$$

This must be best-possible for a given α, but if we have a sequence of well-spaced α's,
then we might expect to get some cancellation.
The usual hypothesis is that

$$\| \alpha_r - \alpha_s \| \ge \delta \qquad \text{for } r \ne s,$$

where ‖x‖ is the distance to the nearest integer (this is natural on the torus R/Z).
We say that such a sequence is δ-spaced. The large sieve inequality (which we shall
prove in its dual form) will state that

$$(11) \qquad \sum_r |S(\alpha_r)|^2 \le (\delta^{-1} + N) \sum_n |a_n|^2,$$

and indeed this is essentially best possible. Applications to number theory come
about from, say, taking the additive characters n ↦ e(an/q) for q ≤ Q, as the
distinct fractions a/q are 1/Q²-spaced.
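The inequality (11) can be tested numerically against random coefficients (Python sketch; the parameters N, δ, and the equally spaced α_r are arbitrary choices, since the inequality must hold for any δ-spaced points):

```python
import cmath
import random

random.seed(4)
N, delta = 40, 0.05
alphas = [r * delta for r in range(15)]   # delta-spaced points in R/Z
a = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]

def S(alpha):
    # S(alpha) = sum_{n=1}^{N} a_n e(alpha n)  (so M = 0 here)
    return sum(a[n - 1] * cmath.exp(2j * cmath.pi * alpha * n)
               for n in range(1, N + 1))

lhs = sum(abs(S(al)) ** 2 for al in alphas)
rhs = (1 / delta + N) * sum(abs(an) ** 2 for an in a)
assert lhs <= rhs
```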
6.1. Generalised Hilbert's inequality. The first step in proving the large sieve
inequality is to prove a generalised version of Hilbert's inequality. (It is related to
the classical Hilbert transform.)

Lemma 6.1. Let λ_r be real numbers with λ_{r+1} − λ_r ≥ δ. Then for any complex
numbers (z_r) we have

$$\Big| \sum_{r \ne s} \frac{z_r \bar z_s}{\lambda_r - \lambda_s} \Big| \le \frac{\pi}{\delta} \sum_r |z_r|^2.$$
Proof. We note that we can rescale any nonzero $\vec z$ to have norm 1. We shall estimate

$$W = \sum_r \Big| \sum_{s \ne r} \frac{\bar z_s}{\lambda_r - \lambda_s} \Big|^2$$

and then apply Cauchy's inequality. We note that for W we are estimating the
norm of the matrix/operator M = (μ_{rs}) with μ_{rs} = (λ_r − λ_s)^{−1} for r ≠ s and
μ_{rr} = 0 for all r. Indeed, $\sum_{s \ne r} \frac{z_s}{\lambda_r - \lambda_s}$ is the rth component of $M \vec z$,
and so we get $W = \| M \vec z \|^2 \le \| M \|^2$. We can also assume that $\vec z$ is extremal for M.
The skew-symmetry (or more precisely, skew-Hermiticity) of the matrix/operator
implies that such an extremal vector is an eigenvector, which says that there is
some real number η for which we have

$$\sum_{k \ne u} \frac{z_k}{\lambda_k - \lambda_u} = i \eta z_u$$

for each u.
By expanding the square in W and isolating the diagonal we get that

$$W = \sum_{s, t} \bar z_s z_t \sum_{r \ne s, t} \frac{1}{(\lambda_r - \lambda_s)(\lambda_r - \lambda_t)}$$
$$= \sum_s |z_s|^2 \sum_{r \ne s} \frac{1}{(\lambda_r - \lambda_s)^2} + \sum_{s \ne t} \frac{\bar z_s z_t}{\lambda_s - \lambda_t} \sum_{r \ne s, t} \Big( \frac{1}{\lambda_r - \lambda_s} - \frac{1}{\lambda_r - \lambda_t} \Big).$$
We further simplify the last inner sum by using that

$$\sum_{r \ne s, t} \Big( \frac{1}{\lambda_r - \lambda_s} - \frac{1}{\lambda_r - \lambda_t} \Big) = \frac{2}{\lambda_s - \lambda_t} + \sum_{r \ne s} \frac{1}{\lambda_r - \lambda_s} - \sum_{r \ne t} \frac{1}{\lambda_r - \lambda_t},$$
so that we have

$$W = \sum_s |z_s|^2 \sum_{r \ne s} \frac{1}{(\lambda_r - \lambda_s)^2} + \sum_{s \ne t} \frac{2 \bar z_s z_t}{(\lambda_s - \lambda_t)^2}$$
$$+ \sum_{s \ne t} \frac{\bar z_s z_t}{\lambda_s - \lambda_t} \sum_{r \ne s} \frac{1}{\lambda_r - \lambda_s} - \sum_{s \ne t} \frac{\bar z_s z_t}{\lambda_s - \lambda_t} \sum_{r \ne t} \frac{1}{\lambda_r - \lambda_t}.$$

To simplify the second line, we use the eigenvector property for the t-sum in the
first term and the s-sum in the second, to get that it is

$$-i \eta \sum_s |z_s|^2 \sum_{r \ne s} \frac{1}{\lambda_r - \lambda_s} + i \eta \sum_t |z_t|^2 \sum_{r \ne t} \frac{1}{\lambda_r - \lambda_t} = 0.$$
Thus for extremal vectors we can apply 2|z_s z_t| ≤ |z_s|² + |z_t|² to get

$$W = \sum_s |z_s|^2 \sum_{r \ne s} \frac{1}{(\lambda_r - \lambda_s)^2} + \sum_{s \ne t} \frac{2 \bar z_s z_t}{(\lambda_s - \lambda_t)^2} \le 3 \sum_s |z_s|^2 \sum_{r \ne s} \frac{1}{(\lambda_r - \lambda_s)^2}.$$
Finally we use the hypothesis that |λr − λs | ≥ δ|r − s| to get that
X
X
X
X
X
2ζ(2)
1
1
2
≤
3
≤
3
.
|z
|
|zs |2 ·
W ≤3
|zs |2
s
2
2 (r − s)2
(λ
−
λ
)
δ
δ2
r
s
s
s
s
r6=s
r6=s
This gives us an estimate for W , and now we use Cauchy’s inequality to get that
X
2 2
X
XX z̄s 2
X X zr z̄s 2 X
π
2
2
2
≤
≤W
,
|zr | ·
|zr | ≤
|zr |
λr − λs
λr − λs
δ2
r
r
r
r
r6=s
s6=r
from which the result follows.
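Since the inequality is completely explicit, it is easy to test numerically. The following sketch checks one configuration; the gaps, points, and weights here are arbitrary choices for illustration, not anything from the text.

```python
import math
import random

random.seed(1)

delta = 0.3
# lambda_r strictly increasing with consecutive gaps >= delta
gaps = [delta + 0.2 * random.random() for _ in range(12)]
lam = [sum(gaps[:r + 1]) for r in range(12)]
z = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(12)]

bilinear = sum(z[r] * z[s].conjugate() / (lam[r] - lam[s])
               for r in range(12) for s in range(12) if r != s)
bound = math.pi / delta * sum(abs(w) ** 2 for w in z)
print(abs(bilinear) <= bound)  # the inequality of Lemma 6.1 holds
```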
6.2. The large sieve inequality. Next we relate this generalised Hilbert inequality to exponential sums. We have the following lemma.
Lemma 6.2. Let the $\alpha_r$ be $\delta$-spaced modulo 1, and let the $z_r$ be complex. Then
\[ \Big| \sum_{r \neq s} \frac{z_r \bar z_s}{\sin \pi(\alpha_r - \alpha_s)} \Big| \le \delta^{-1} \sum_r |z_r|^2. \]
Proof. The idea is to apply the previous lemma to the doubly indexed sets given by $z_{m,r} = (-1)^m z_r$ and $\lambda_{m,r} = m + \alpha_r$ for $1 \le m \le K$. Of course, we will eventually let $K \to \infty$, as this is an averaging process over the $m$. This gives
\[ \Big| \sum_{(r,m) \neq (s,n)} \frac{(-1)^{m-n} z_r \bar z_s}{m - n + \alpha_r - \alpha_s} \Big| \le \frac{\pi}{\delta} \sum_r |z_r|^2 \sum_{m=1}^{K} |(-1)^m|^2 = \frac{\pi K}{\delta} \sum_r |z_r|^2. \]
Noting that when $r = s$ the $(m,n)$ term cancels the $(n,m)$ term for $n \neq m$, we change the sum-restriction to $r \neq s$, and write $k = m - n$ and divide by $K$ to get
\[ \Big| \sum_{r \neq s} z_r \bar z_s \sum_{k=-K}^{K} \frac{K - |k|}{K} \cdot \frac{(-1)^k}{k + \alpha_r - \alpha_s} \Big| \le \frac{\pi}{\delta} \sum_r |z_r|^2, \]
as the Fejér kernel appears from counting how often $k$ appears. By letting $K \to \infty$ and recalling that $\sum_k \frac{(-1)^k}{k + \alpha} = \frac{\pi}{\sin \pi \alpha}$ for $\alpha$ non-integral, we get the result.
Corollary 6.1. For any real $x$ we have
\[ \Big| \sum_{r \neq s} \frac{\sin 2\pi x (\alpha_r - \alpha_s)}{\sin \pi(\alpha_r - \alpha_s)} \, z_r \bar z_s \Big| \le \delta^{-1} \sum_r |z_r|^2. \]
Proof. Apply the preceding lemma to $z_r e(x \alpha_r)$ and $z_r e(-x \alpha_r)$, and take the difference of the two resulting bilinear forms.
We are now ready to prove the large sieve inequality. Since (in Banach spaces)
a linear operator and its adjoint (conjugate transpose in the matrix setting) have
the same norm, we shall show the dual inequality of (11).
Theorem 6.1. For any complex numbers $z_r$ and $\delta$-spaced $\alpha_r$ we have
\[ \sum_{M < n \le M+N} \Big| \sum_r z_r e(n \alpha_r) \Big|^2 \le (\delta^{-1} + N) \sum_r |z_r|^2. \]
Proof. By squaring out the left side, the diagonal gives $N \sum_r |z_r|^2$, and the rest is
\[ \sum_{r \neq s} z_r \bar z_s \sum_{M < n \le M+N} e\big( n (\alpha_r - \alpha_s) \big) = \sum_{r \neq s} z_r \bar z_s \, e\big( k (\alpha_r - \alpha_s) \big) \frac{\sin \pi N (\alpha_r - \alpha_s)}{\sin \pi(\alpha_r - \alpha_s)}, \]
where $k = M + \frac{1}{2}(N + 1)$. This exponential part can be inserted into the $z_r$ and $z_s$, and so by the corollary (with $x = N/2$) we get a bound of $\delta^{-1} \sum_r |z_r|^2$.
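The dual form is again easy to test numerically. In the sketch below the points $\alpha_r$, the spacing, and the weights are arbitrary illustrative choices.

```python
import cmath
import math
import random

random.seed(2)

def e(x):  # e(x) = exp(2*pi*i*x)
    return cmath.exp(2j * math.pi * x)

alpha = [0.05, 0.21, 0.41, 0.63, 0.82]  # pairwise >= 0.16 apart mod 1
delta = 0.16
z = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in alpha]

M, N = 10, 30
lhs = sum(abs(sum(zr * e(n * ar) for zr, ar in zip(z, alpha))) ** 2
          for n in range(M + 1, M + N + 1))
rhs = (1 / delta + N) * sum(abs(w) ** 2 for w in z)
print(lhs <= rhs)  # Theorem 6.1 for this configuration
```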
Corollary 6.2. Let $(a_n)$ be complex numbers supported on $M < n \le M+N$. Then
\[ \sum_{q \le Q} \sum_{b \bmod q}^{\star} |S(b/q)|^2 = \sum_{q \le Q} \sum_{b \bmod q}^{\star} \Big| \sum_{M < n \le M+N} a_n e(bn/q) \Big|^2 \le (Q^2 + N) \sum_n |a_n|^2. \]
Proof. The star restricts the sum to coprime residue classes, and the result is immediate as for distinct $\frac{a_1}{q_1}$ and $\frac{a_2}{q_2}$ we have $\big| \frac{a_1}{q_1} - \frac{a_2}{q_2} \big| \ge \frac{1}{q_1 q_2} \ge \frac{1}{Q^2}$.
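A direct numerical check of the corollary, with arbitrary real coefficients $a_n$ on an interval and the Farey points $b/q$ generated for all $q \le Q$; the parameter choices are ours.

```python
import cmath
import math
import random
from math import gcd

random.seed(3)

Q, M, N = 5, 0, 40
a = [random.uniform(-1, 1) for _ in range(N)]  # a_n on M < n <= M + N

def S(x):  # S(x) = sum_n a_n e(nx)
    return sum(an * cmath.exp(2j * math.pi * n * x)
               for n, an in zip(range(M + 1, M + N + 1), a))

lhs = sum(abs(S(b / q)) ** 2
          for q in range(1, Q + 1)
          for b in range(1, q + 1) if gcd(b, q) == 1)
rhs = (Q ** 2 + N) * sum(an ** 2 for an in a)
print(lhs <= rhs)  # the large sieve inequality holds
```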
Consider a sequence of complex numbers all of norm one. Then we might expect $S(b/q)$ to be of size $\sqrt N$ typically, giving us a total of $Q^2 N$ when summing over $b$ and $q$. The bound we would get is then $(Q^2 + N) N$, and thus it is seemingly the diagonal contribution which scuppers our accounting. This might appear rather strange, because it says that taking $N$ large, that is, having large inner sums, is bad; but recall that we square out and then swap the summations, so the cancellation is detected elsewise.
6.3. Estimates on sifted sets. Let $(a_n)$ be a finitely supported sequence. We take a slightly different viewpoint than before, and consider the sifted sum
\[ A(U) = \sum_{n \in U} a_n \]
where $U$ is a set of squarefree integers which satisfy various congruence conditions modulo primes. For instance, we might take $U$ to be the set of integers for which $-1$ is a square. For each prime $p$, we let $\varpi(p)$ be the number of residue classes modulo $p$ which are not in $U$. Then we let $\hat g(p) = \frac{\varpi(p)}{p - \varpi(p)}$ and extend it multiplicatively to squarefree integers. We consider the exponential sum and congruence sums given by
\[ S_U(\alpha) = \sum_{n \in U} a_n e(\alpha n) \qquad \text{and} \qquad X_q(b) = \sum_{\substack{n \in U \\ n \equiv b \,(\mathrm{mod}\, q)}} a_n. \]
Lemma 6.3. Let $q$ be squarefree. We have
\[ \hat g(q) |S_U(0)|^2 \le \sum_{b \bmod q}^{\star} |S_U(b/q)|^2. \]
We might first comment on the usefulness of this. Suppose that the reduction of $U$ mod $p$ only has one element for each prime $p$, so that $\hat g(p) = p - 1$. As might be expected, this says that all the sums on the right are the same size as the left, for each sum on the right is just a phase shift of $S_U(0)$. Conversely, if the reduction of $U$ has $p - 1$ elements, then $\hat g(p) = 1/(p-1)$, and the sum on the left can be estimated by the large sieve, even though this is essentially the small sieve case. Finally, when $U$ mod $p$ has about $p/2$ elements (quadratic residues for instance), then $\hat g \sim 1$; this can be useful in the case where $(a_n)$ is real and we can show cancellation in the sums on the right.
Proof. First consider $q = p$ a prime. Sorting by $b$ mod $p$ and applying Cauchy's inequality, we have that
\[ |S_U(0)|^2 = \Big| \sum_{n \in U} a_n \Big|^2 = \Big| \sum_{b \bmod p} X_p(b) \Big|^2 \le \Big( \sum_{\substack{b \bmod p \\ X_p(b) \neq 0}} 1 \Big) \sum_{b \bmod p} |X_p(b)|^2 \le \big( p - \varpi(p) \big) \sum_{b \bmod p} |X_p(b)|^2, \]
since the number of congruence classes for which $X_p(b) = 0$ is at least $\varpi(p)$. The last sum can be re-written via orthogonality of additive characters to get
\[ |S_U(0)|^2 \le \big( p - \varpi(p) \big) \sum_{\substack{m,n \in U \\ m \equiv n \,(\mathrm{mod}\, p)}} a_m \bar a_n = \frac{p - \varpi(p)}{p} \sum_{m,n \in U} a_m \bar a_n \sum_{b \bmod p} e(bm/p) e(-bn/p) = \frac{p - \varpi(p)}{p} \sum_{b \bmod p} |S_U(b/p)|^2, \]
which gives the result for prime moduli upon moving the $b = 0$ term to the left: subtracting $\frac{p - \varpi(p)}{p} |S_U(0)|^2$ from both sides leaves $\frac{\varpi(p)}{p} |S_U(0)|^2 \le \frac{p - \varpi(p)}{p} \sum^{\star}_{b \bmod p} |S_U(b/p)|^2$, which is the claim.
For $q = q_1 q_2$ with $\gcd(q_1, q_2) = 1$, we use the familiar Chinese Remainder trick
\[ \sum_{b \bmod q}^{\star} |S_U(b/q)|^2 = \sum_{b_1 \bmod q_1}^{\star} \sum_{b_2 \bmod q_2}^{\star} |S_U(b_1/q_1 + b_2/q_2)|^2. \]
Assuming the bound holds for $q_1$ and $q_2$, we change $a_n \to a_n e(n b_1/q_1)$ and estimate the $q_2$-sum and get
\[ \sum_{b \bmod q}^{\star} |S_U(b/q)|^2 \ge \hat g(q_2) \sum_{b_1 \bmod q_1}^{\star} |S_U(b_1/q_1)|^2 \ge \hat g(q_2) \hat g(q_1) |S_U(0)|^2, \]
and the result follows by inducting on the number of prime factors of $q$.
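For a concrete sanity check, take for $U$ the integers up to 200 coprime to 15, with arbitrary weights; then one class is removed mod 3 and one mod 5, $\hat g(15) = \frac{1}{2} \cdot \frac{1}{4} = \frac{1}{8}$, and the lemma can be tested at $q = 15$. The numbers below are illustrative choices.

```python
import cmath
import math
import random
from math import gcd

random.seed(4)

N, q = 200, 15  # q = 3 * 5, squarefree
U = [n for n in range(1, N + 1) if n % 3 != 0 and n % 5 != 0]
a = {n: random.uniform(0.5, 1.5) for n in U}  # weights supported on U

# one class is missing mod 3 and mod 5, so varpi(3) = varpi(5) = 1 and
# g_hat(15) = (1 / (3 - 1)) * (1 / (5 - 1)) = 1/8
g_hat = (1 / 2) * (1 / 4)

def S(x):
    return sum(an * cmath.exp(2j * math.pi * n * x) for n, an in a.items())

lhs = g_hat * abs(S(0)) ** 2
rhs = sum(abs(S(b / q)) ** 2 for b in range(1, q) if gcd(b, q) == 1)
print(lhs <= rhs)  # Lemma 6.3 for this configuration
```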
In particular, by summing over $q \le Q$ we get that
\[ \sum_{q \le Q} \hat g(q) \cdot |S_U(0)|^2 \le \sum_{q \le Q} \sum_{b \bmod q}^{\star} |S_U(b/q)|^2 \le (Q^2 + N) \sum_n |a_n|^2, \]
and by taking $(a_n)$ to be the characteristic function of an interval, we get an upper bound on the sifted sum $S_U(0)$ similar to that from the Selberg upper-bound sieve.
7. The Bombieri-Vinogradov theorem
We now prove the Bombieri-Vinogradov theorem. For $Q \le \sqrt x / e^{6 \hat c \sqrt{\log x}}$ this says
\[ \sum_{q \le Q} \max_{(a,q)=1} \Big| \hat\psi(x; q, a) - \frac{x}{\phi(q)} \Big| \ll \frac{x (\log x)^4}{e^{\hat c \sqrt{\log x}}} + Q \sqrt x (\log x)^4, \]
which essentially gives us GRH on average. Indeed, the expected value of the error term in the prime number theorem for arithmetic progressions for a given $a$ and $q$ is about size $\sqrt{x/q}$, which would give us $\sqrt{Qx}$ when summing over $q \le Q$. Sadly, we cannot quite achieve this; even GRH would only give us an error term of size $\sqrt x$ for each $q$, and summing this over $q$ gives $Q \sqrt x$ as indicated above. This has the impact of restricting the most efficacious use of the theorem to when $Q \le \sqrt x$.
One can also consider the square of the error term, and this is much easier to handle. In fact, one can handle other sequences in a similar manner, and this falls under the guise of the Barban-Davenport-Halberstam theorem.
7.1. Review of Gauss sums for Dirichlet characters. Let $\chi$ be a primitive Dirichlet character modulo $q$. For any integer $n$ we recall the Gauss sum
\[ G(n, \chi) = \sum_{b \bmod q} \chi(b) e(bn/q). \]
Lemma 7.1. For primitive $\chi$ we have $G(n, \chi) = \bar\chi(n) G(1, \chi)$.
Proof. When $\gcd(n, q) = 1$ we have
\[ G(n, \chi) = \bar\chi(n) \sum_{b \bmod q} \chi(bn) e(bn/q) = \bar\chi(n) \sum_{c \bmod q} \chi(c) e(c/q) = \bar\chi(n) G(1, \chi), \]
as the invertibility of $n$ implies that $bn$ runs over all the reduced residues.
When $g = \gcd(n, q) > 1$, we wish to show that $G(n, \chi) = 0$, and this follows from choosing (by primitivity) some $c$ congruent to 1 mod $q/g$ with $\chi(c) \neq 1$, as
\[ \chi(c) G(n, \chi) = \sum_{b \bmod q} \chi(cb) e(bn/q) = \sum_{b \bmod q} \chi(cb) e\Big( \frac{b (n/g)}{q/g} \Big) = \sum_{b \bmod q} \chi(cb) e\Big( \frac{bc (n/g)}{q/g} \Big) = \sum_{b \bmod q} \chi(cb) e(bcn/q) = \sum_{d \bmod q} \chi(d) e(dn/q) = G(n, \chi), \]
where we used that $e(b(n/g)/(q/g))$ depends only on $b$ mod $q/g$, and $bc \equiv b$ mod $q/g$. Since $\chi(c) \neq 1$, this forces $G(n, \chi) = 0$.
Lemma 7.2. For primitive $\chi$ we have $|G(1, \chi)| = \sqrt q$.
Proof. Writing $\delta(c, d) = 1, 0$ depending on whether $c = d$, and noting that by the previous lemma $\sum_b |G(b, \chi)|^2 = \phi(q) |G(1, \chi)|^2$, we have that
\[ \phi(q) |G(1, \chi)|^2 = \sum_{b \bmod q} |G(b, \chi)|^2 = \sum_{b \bmod q} \sum_{c \bmod q} \chi(c) e(bc/q) \sum_{d \bmod q} \bar\chi(d) e(-bd/q) \]
\[ = \sum_{c \bmod q} \chi(c) \sum_{d \bmod q} \bar\chi(d) \sum_{b \bmod q} e\big( b(c - d)/q \big) = q \sum_{c \bmod q} \chi(c) \sum_{d \bmod q} \bar\chi(d) \delta(c, d) = q \sum_{c \bmod q} |\chi(c)|^2 = q \phi(q). \]
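Both Gauss-sum lemmas can be verified numerically for, say, the Legendre symbol mod 7, which is a real primitive character (so $\bar\chi = \chi$); the modulus here is an arbitrary choice.

```python
import cmath
import math

q = 7
# Legendre symbol mod 7 via Euler's criterion: a primitive real character
chi = {0: 0}
for b in range(1, q):
    chi[b] = 1 if pow(b, (q - 1) // 2, q) == 1 else -1

def G(n):  # Gauss sum G(n, chi)
    return sum(chi[b % q] * cmath.exp(2j * math.pi * b * n / q)
               for b in range(q))

print(abs(G(1)))  # sqrt(7) = 2.6457...
print(max(abs(G(n) - chi[n % q] * G(1)) for n in range(14)))  # ~ 0
```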
Lemma 7.3 (Pólya-Vinogradov). Suppose that $\chi$ is primitive. Then
\[ \Big| \sum_{M < n < N} \chi(n) \Big| \le \sqrt q \log q. \]
Proof. We have that
\[ G(1, \bar\chi) \sum_{M < n < N} \chi(n) = \sum_{M < n < N} G(n, \bar\chi) = \sum_{M < n < N} \sum_{c \bmod q} \bar\chi(c) e(cn/q) = \sum_{c \bmod q} \bar\chi(c) \sum_{M < n < N} e(cn/q). \]
So we get
\[ \Big| \sum_{M < n < N} \chi(n) \Big| \le \frac{1}{\sqrt q} \sum_{c=1}^{q-1} \Big| \sum_{M < n < N} e(cn/q) \Big| \le \frac{1}{\sqrt q} \sum_{c=1}^{q-1} \csc(\pi c/q), \]
the last step from summing the geometric series to (essentially) get $\frac{e(cN/q) - e(cM/q)}{1 - e(c/q)}$, and then trivially bounding the numerator by 2 and multiplying the denominator by $e(-c/2q)$, which shows $|1 - e(c/q)| = 2 \sin(\pi c/q)$. Recalling that $\csc(\pi x) \le 1/(2x)$ for $0 < x \le 1/2$, and using the symmetry of the cosecant to handle $c \ge q/2$, for even $q$ we get
\[ \sum_{c=1}^{q-1} \csc(\pi c/q) \le 1 + \frac{q}{2} \cdot 2 \sum_{c=1}^{q/2 - 1} \frac{1}{c} \le 2 + q \log(q/2 - 1) \le q \log q, \]
and for odd $q$ we have
\[ \sum_{c=1}^{q-1} \csc(\pi c/q) \le \frac{q}{2} \cdot 2 \sum_{c=1}^{(q-1)/2} \frac{1}{c} \le 1 + q \log(q - 1) \le q \log q. \]
The result now follows immediately.
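As a quick numerical illustration (with the modulus an arbitrary choice), the maximal partial sum of the Legendre symbol mod 101 sits far below the Pólya-Vinogradov bound $\sqrt q \log q$:

```python
import math

def legendre(n, q):
    # Euler's criterion for an odd prime q
    if n % q == 0:
        return 0
    return 1 if pow(n, (q - 1) // 2, q) == 1 else -1

q = 101  # prime, so the Legendre symbol is a primitive character mod q
partial, best = 0, 0
for n in range(1, 5 * q):
    partial += legendre(n, q)
    best = max(best, abs(partial))

print(best, math.sqrt(q) * math.log(q))  # the bound comfortably dominates
```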
7.2. Large sieve for multiplicative characters.
Lemma 7.4. Let $(a_n)$ be a sequence of complex numbers. Then
\[ \sum_{q \le Q} \frac{q}{\phi(q)} \sum_{\chi \bmod q}^{\star} \Big| \sum_{n \le N} a_n \chi(n) \Big|^2 \le (Q^2 + N) \sum_{n \le N} |a_n|^2. \]
We only consider the inner sum over primitive characters; a more general statement is possible.
Proof. We call the sum $A$ and recall the Gauss sum relation $G(n, \chi) = \bar\chi(n) G(1, \chi)$ and $|G(1, \chi)| = \sqrt q$ to get
\[ A = \sum_{q \le Q} \frac{q}{\phi(q)} \sum_{\chi \bmod q}^{\star} \Big| \sum_{n \le N} a_n \bar\chi(n) \Big|^2 = \sum_{q \le Q} \frac{q}{\phi(q)} \frac{1}{|G(1, \chi)|^2} \sum_{\chi \bmod q}^{\star} \Big| \sum_{n \le N} a_n G(n, \chi) \Big|^2 = \sum_{q \le Q} \frac{1}{\phi(q)} \sum_{\chi \bmod q}^{\star} \Big| \sum_{b \bmod q} \chi(b) S(b/q) \Big|^2, \]
where
\[ S(b/q) = \sum_{n \le N} a_n e(bn/q) \qquad \text{and} \qquad G(n, \chi) = \sum_{b \bmod q} \chi(b) e(bn/q); \]
replacing $\chi$ by $\bar\chi$ permutes the primitive characters, so the first equality loses nothing.
We next extend the character sum to include imprimitive characters, and expand the square to get an upper bound of
\[ A \le \sum_{q \le Q} \frac{1}{\phi(q)} \sum_{\chi \bmod q} \sum_{b \bmod q} \chi(b) S(b/q) \sum_{c \bmod q} \bar\chi(c) \bar S(c/q) = \sum_{q \le Q} \sum_{b \bmod q} S(b/q) \sum_{c \bmod q} \bar S(c/q) \frac{1}{\phi(q)} \sum_{\chi \bmod q} \chi(b) \bar\chi(c) \]
\[ = \sum_{q \le Q} \sum_{b \bmod q}^{\star} |S(b/q)|^2 \le (Q^2 + N) \sum_{n \le N} |a_n|^2, \]
the penultimate step using orthogonality of characters, and then Corollary 6.2.
7.3. Combinatorial identities. Let $f$ be an arithmetic function. We wish to
write a sum over primes involving f as a combination of congruence sums (to relatively small moduli) and bilinear terms (which will take care of the larger moduli).
Of course, it is easier to use the von Mangoldt function than to sum over primes, and the difference herein can be neglected if our sequence is not too sparse; however, we will see below that this difference between primes and prime powers is already enough to limit the allowable range of moduli in the Bombieri-Vinogradov theorem (at least if we bound things in the simplest manner).
We start by noting that
\[ \sum_{n \sim X} \Lambda(n) f(n) = \sum_{n \sim X} f(n) \sum_{c | n} \Lambda(c) \sum_{b | (n/c)} \mu(b), \]
as the inner-most sum is nonzero only when $c = n$, and by inverting the order of summation (the pairs $(b, c)$ here are exactly those with $bc | n$) we get
\[ \sum_{n \sim X} \Lambda(n) f(n) = \sum_{n \sim X} f(n) \sum_{b | n} \mu(b) \sum_{c | (n/b)} \Lambda(c). \]
We now split-off the small values of $b$ at some parameter $B$, and use the summation formula $\sum_{c | m} \Lambda(c) = \log m$ for the von Mangoldt function to compute the $c$-sum in these parts. We get
\[ \sum_{n \sim X} \Lambda(n) f(n) = \sum_{n \sim X} f(n) \sum_{\substack{b | n \\ b \le B}} \mu(b) \log(n/b) + \sum_{n \sim X} f(n) \sum_{\substack{b | n \\ b > B}} \mu(b) \Big( \sum_{\substack{c | (n/b) \\ c \le B}} \Lambda(c) + \sum_{\substack{c | (n/b) \\ c > B}} \Lambda(c) \Big). \]
We can hopefully estimate the first sum via congruential means due to the smallness of $b$, and we similarly split off small values of $bc$ in the second sum. In the first part of this second sum, we can move the $b$-sum to the inside. If we take $B < X$ then we have $c \neq n$, so that the sum over all $b | (n/c)$ is zero; thus we can flip the $b > B$ condition to $b \le B$ via negating the result. We get
\[ \sum_{n \sim X} \Lambda(n) f(n) = \sum_{n \sim X} f(n) \sum_{\substack{b | n \\ b \le B}} \mu(b) \log(n/b) - \sum_{n \sim X} f(n) \sum_{\substack{c | n \\ c \le B}} \Lambda(c) \sum_{\substack{b | (n/c) \\ b \le B}} \mu(b) + \sum_{n \sim X} f(n) \sum_{\substack{b | n \\ b > B}} \mu(b) \sum_{\substack{c | (n/b) \\ c > B}} \Lambda(c) \]
\[ = \sum_{b \le B} \mu(b) \sum_{u \sim X/b} f(bu) \log u - \sum_{b \le B} \mu(b) \sum_{c \le B} \Lambda(c) \sum_{v \sim X/bc} f(bcv) + \sum_{b > B} \mu(b) \sum_{c > B} \Lambda(c) \sum_{v \sim X/bc} f(bcv) \]
\[ = \sum_{b \le B} \mu(b) \sum_{u \sim X/b} f(bu) \log u - \sum_{b \le B} \mu(b) \sum_{c \le B/b} \Lambda(c) \sum_{v \sim X/bc} f(bcv) - \sum_{b \le B} \mu(b) \sum_{B/b < c \le B} \Lambda(c) \sum_{v \sim X/bc} f(bcv) + \sum_{b > B} \mu(b) \sum_{c > B} \Lambda(c) \sum_{v \sim X/bc} f(bcv) \]
\[ = \sum_{b \le B} \mu(b) \sum_{u \sim X/b} f(bu) \log u - \sum_{b \le B} \mu(b) \sum_{c \le B/b} \Lambda(c) \sum_{v \sim X/bc} f(bcv) - \sum_{X/B^2 < v < 2X/B} \sum_{l \sim X/v} f(lv) \eta_l + \sum_{B < b < 2X/B} \sum_{m \sim X/b} f(bm) \mu(b) \xi_m. \]
In the third step we split $c$ according to whether $bc \le B$ (desiring $v \gg X/B$), and in the last step we wrote the final two sums as bilinear forms with
\[ \eta_l = \sum_{\substack{bc = l \\ b, c \le B, \ bc > B}} \mu(b) \Lambda(c) \qquad \text{and} \qquad \xi_m = \sum_{\substack{c | m \\ c > B}} \Lambda(c), \]
so that $|\eta_l| \le \log l$ and $|\xi_m| \le \log m$. In practise, the first two expressions (to the moduli $b$ and $bc$) can be handled by congruence sums or the like (with $f = \chi$ we will detect cancellation in the innermost sum via Pólya-Vinogradov), while the third and fourth will be viewed as bilinear forms and estimated via some other method (such as the large sieve). This is a rather particular version of Vaughan's identity that will suffice for our applications.
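The identity is exact, so it can be checked numerically term by term. The sketch below uses arbitrary small parameters $X$, $B$ and an arbitrary test function $f$; the names T1 through T4 for the four sums are ours, not from the text.

```python
import math

def factorize(n):
    fac, d = {}, 2
    while d * d <= n:
        while n % d == 0:
            fac[d] = fac.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        fac[n] = fac.get(n, 0) + 1
    return fac

def mangoldt(n):  # Lambda(n) = log p if n is a power of the prime p, else 0
    fac = factorize(n)
    return math.log(next(iter(fac))) if len(fac) == 1 else 0.0

def moebius(n):
    fac = factorize(n)
    if any(e > 1 for e in fac.values()):
        return 0
    return -1 if len(fac) % 2 else 1

X, B = 40, 6                    # need B < X
f = lambda n: math.cos(n)       # arbitrary test function
in_range = lambda n: X < n <= 2 * X   # the condition n ~ X

lhs = sum(mangoldt(n) * f(n) for n in range(X + 1, 2 * X + 1))

T1 = sum(moebius(b) * f(b * u) * math.log(u)
         for b in range(1, B + 1)
         for u in range(1, 2 * X + 1) if in_range(b * u))
T2 = sum(moebius(b) * mangoldt(c) * f(b * c * v)
         for b in range(1, B + 1)
         for c in range(1, B // b + 1)            # bc <= B
         for v in range(1, 2 * X + 1) if in_range(b * c * v))
T3 = sum(moebius(b) * mangoldt(c) * f(b * c * v)
         for b in range(1, B + 1)
         for c in range(B // b + 1, B + 1)        # bc > B with b, c <= B
         for v in range(1, 2 * X + 1) if in_range(b * c * v))

def xi(m):  # sum of Lambda(c) over divisors c > B of m
    return sum(mangoldt(c) for c in range(B + 1, m + 1) if m % c == 0)

T4 = sum(moebius(b) * f(b * m) * xi(m)
         for b in range(B + 1, 2 * X + 1)
         for m in range(1, 2 * X + 1) if in_range(b * m))

print(abs(lhs - (T1 - T2 - T3 + T4)))  # agrees to rounding error
```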
7.4. Separation of variables. It is often useful to be able to take summation
conditions such as mn ∼ x and relax this to m ∼ M , n ∼ N with M N ∼ x. The
following lemma gives a method for doing this, and usually loses only a logarithm.
The idea is to transform the summation condition to an expression with (mn)it
appearing, and this can then be split into m and n separately. When used in
conjunction with the large sieve, the terms like mit will not matter, because we can
use arbitrary sequences and changing am → am mit will not change the norm of the
sequence. As many of our bounds only depend on the coefficients of m and n being
small in size, this will not affect the end result.
Lemma 7.5. Let $x \ge 1$ be given. There is a function $g_x$ with $\int_{-\infty}^{\infty} |g_x(t)| \, dt < \log 6x$, and for every positive integer $l$ we have
\[ \int_{-\infty}^{\infty} g_x(t) \, l^{it} \, dt = \begin{cases} 1 & \text{if } l \le x, \\ 0 & \text{otherwise.} \end{cases} \]
Proof. Let $f$ be the function given by $f(u) = u$ for $0 \le u \le 1$, and $f(u) \equiv 1$ for $1 \le u \le \lfloor x \rfloor$, and $f(u) = \lfloor x+1 \rfloor - u$ for $\lfloor x \rfloor \le u \le \lfloor x+1 \rfloor$, and $f(u) \equiv 0$ elsewhere. Thus for positive integers $l$ we have $f(l) = 1$ for $l \le x$ and $f(l) = 0$ otherwise. The inverse Mellin transform of $f$ is given by
\[ f(u) = \int_{(0)} \frac{h(s)}{u^s} \frac{ds}{2\pi i} = \int_{-\infty}^{\infty} \frac{h(it)}{u^{it}} \frac{dt}{2\pi}, \]
where
\[ h(s) = \int_0^{\infty} f(u) u^{s-1} \, du = \int_0^1 u \cdot u^{s-1} \, du + \int_1^{\lfloor x \rfloor} u^{s-1} \, du + \int_{\lfloor x \rfloor}^{\lfloor x+1 \rfloor} \big( \lfloor x+1 \rfloor - u \big) u^{s-1} \, du \]
\[ = \frac{1}{s+1} + \frac{\lfloor x \rfloor^s - 1}{s} + \frac{\lfloor x+1 \rfloor}{s} \big( \lfloor x+1 \rfloor^s - \lfloor x \rfloor^s \big) + \frac{\lfloor x \rfloor^{s+1} - \lfloor x+1 \rfloor^{s+1}}{s+1} \]
\[ = \frac{\lfloor x+1 \rfloor^{s+1} - \lfloor x \rfloor^{s+1} - 1}{s(s+1)} = \frac{1}{s} \int_{\lfloor x \rfloor}^{\lfloor x+1 \rfloor} u^s \, du - \frac{1}{s} \int_0^1 u^s \, du. \]
Thus for imaginary $s = it$ the very first expression gives that $|h(it)| \le 1 + \log x$, while the very last gives $|h(it)| \le 2/|t|$ and the penultimate implies $|h(it)| \le \frac{2(x+1)}{|t|(|t|+1)}$. From these we obtain that
\[ \int_{-\infty}^{\infty} |h(it)| \, dt \le 2 \int_0^1 (1 + \log x) \, dt + 2 \int_1^x \frac{2}{t} \, dt + 2 \int_x^{\infty} \frac{2(x+1)}{t^2} \, dt \le 2 + 2 \log x + 4 \log x + \frac{4(x+1)}{x} \le 2\pi \log 6x. \]
The result follows upon taking $g_x(t) = h(-it)/2\pi$, the substitution $t \to -t$ in the inverse transform giving $f(l) = \int g_x(t) l^{it} \, dt$.
We leave it to the interested reader to show a similar result for summation conditions of the type $m \le n$; here we want $(m/n)^{it}$ to appear, and the integral of the absolute value of the $g$-function will be bounded by $\log 6x^2$. We can also, of course, restrict a summation to a dyadic interval by using the function $g_{2x} - g_x$.
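The closed form for $h$ makes the lemma easy to test numerically: truncating $\int g_x(t) l^{it} \, dt$ to a long finite range recovers the indicator of $l \le x$ up to the truncation error. The grid parameters below are arbitrary choices.

```python
import cmath
import math

def h(s, x):
    # Mellin transform of the trapezoid f:
    # (floor(x+1)^(s+1) - floor(x)^(s+1) - 1) / (s(s+1))
    fx, fx1 = math.floor(x), math.floor(x) + 1
    return (fx1 ** (s + 1) - fx ** (s + 1) - 1) / (s * (s + 1))

def g(t, x):  # g_x(t) = h(-it) / (2 pi)
    return h(-1j * t, x) / (2 * math.pi)

def recover(l, x, T=500.0, steps=50000):
    # midpoint rule for the real part of the integral of g_x(t) l^{it}
    # over [-T, T]; the offset grid avoids the removable point t = 0
    dt = 2 * T / steps
    return sum((g(-T + (j + 0.5) * dt, x)
                * cmath.exp(1j * (-T + (j + 0.5) * dt) * math.log(l))).real
               for j in range(steps)) * dt

x = 4.5
vals = [recover(l, x) for l in (2, 3, 4, 5, 6)]
print(vals)  # the first three are near 1, the last two near 0
```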
7.5. Proof of main theorem. We recall our definition that
\[ \hat\psi(x; q, a) = \sum_{\substack{p \sim x \\ p \equiv a \,(\mathrm{mod}\, q)}} \log p. \]
Theorem 7.1. Suppose that $2 \le Q \le \sqrt x / e^{6 \hat c \sqrt{\log x}}$. Then we have
\[ \sum_{q \le Q}^{\bullet} \max_{(a,q)=1} \Big| \hat\psi(x; q, a) - \frac{x}{\phi(q)} \Big| \ll \frac{x (\log x)^4}{e^{\hat c \sqrt{\log x}}} + Q \sqrt x (\log x)^4. \]
Here the sum excludes a possible primitive $P$-Siegel character for $P = e^{\hat c \sqrt{\log x}}$. Alternatively, such a character could be included, but at the cost of replacing the first term with $x/(\log x)^A$ for any $A > 0$ and making the result ineffective. Often the result is presented with a maximum also being taken over $y \le x$; this adds no difficulty to the proof, but it is unnecessary for our application.
The result is mainly useful when the first term dominates, and thus we cannot take $Q$ larger than $\sqrt x$ in size in applications. The Elliott-Halberstam conjecture implies (essentially) that we can ignore the second term for $Q$ as large as $x^{1-\epsilon}$ for any $\epsilon > 0$. Breaking the $\sqrt x$ barrier would be a major result and have many consequences. Through the use of Kloosterman sums, this barrier can be passed when, say, we fix $|a|$ to be small (this is most definitively done in work of Bombieri, Friedlander, and Iwaniec), but the general case is not currently known.
Proof. By orthogonality of characters (when $a$ and $q$ are coprime) we have that
\[ \hat\psi(x; q, a) = \sum_{\substack{p \sim x \\ p \equiv a \,(q)}} \log p = \frac{1}{\phi(q)} \sum_{\chi \bmod q} \bar\chi(a) \hat\psi(x, \chi), \]
where $\hat\psi(x, \chi) = \sum_{x < p \le 2x} \chi(p) \log p$. It has been noted by some authors that attempts to improve the Bombieri-Vinogradov theorem will likely exploit the variation of $\bar\chi(a)$ here. Indeed, we simply bound it in absolute value, which certainly gives a bound for the maximal sum. Subtracting off $x/\phi(q)$ with the principal character, and writing $S(x, Q)$ for the sum in question, we have
\[ S(x, Q) \le \sum_{q \le Q} \frac{1}{\phi(q)} \sum_{\chi \bmod q}^{\bullet} \big| \hat\psi(x, \chi) - \delta_\chi x \big| = T(x, Q) + U(x, Q), \]
where $\delta_\chi = 1$ when $\chi$ is principal and zero otherwise, and the $T$-$U$ splitting is by whether the character is principal. The contribution from the principal characters is easily bounded via the prime number theorem by
\[ T(x, Q) \ll \sum_{q \le Q} \frac{1}{\phi(q)} \cdot \frac{x}{e^{2 \hat c \sqrt{\log x}}} \ll (\log x) \cdot \frac{x}{e^{2 \hat c \sqrt{\log x}}}. \]
The $q$-sum is easily estimated either by elementary techniques or by comparing the residues of the corresponding Dirichlet series and $\zeta(s)$.
For the nonprincipal characters, we group the sum so as to combine the contributions from all characters induced by the same primitive character and get
\[ U(x, Q) \ll \sum_{q \le Q} \sum_{\chi \bmod q}^{\star} |\hat\psi(x, \chi)| \sum_{r \le Q/q} \frac{1}{\phi(qr)} \ll \sum_{q \le Q} \frac{\log Q}{\phi(q)} \sum_{\chi \bmod q}^{\star} |\hat\psi(x, \chi)|, \]
where we estimated the $r$-sum as above (either by elementary means or by a residue comparison). In essence, we took each primitive character modulo $q$, and then considered the lift to the modulus $qr$; these have the same error term (we need not fiddle with primes dividing $r$, as they are less than $x$).
For small moduli, we will bound the inner sum by using results related to the prime number theorem for arithmetic progressions. From (2), for $q \le P = e^{\hat c \sqrt{\log x}}$ we have $|\hat\psi(x, \chi)| \ll x/e^{2 \hat c \sqrt{\log x}}$, though this bound will not be effective unless we exclude a possible $P$-Siegel character. This gives that
\[ U(x, Q) \ll \frac{x P \log Q}{e^{2 \hat c \sqrt{\log x}}} + (\log x) \sum_{P < q \le Q} \frac{1}{\phi(q)} \sum_{\chi \bmod q}^{\star} |\hat\psi(x, \chi)| = U_1(x, Q) + U_2(x, Q). \]
As $U_1(x, Q)$ is already sufficiently small, we need only estimate $U_2(x, Q)$ here, and use the above combinatorial identity to dissect the $\hat\psi$-function. We write $\tilde\psi(x, \chi)$ for the dyadically-summed $\chi$-twisted von Mangoldt function, and note that we have the bound $\tilde\psi(x, \chi) - \hat\psi(x, \chi) \ll \sqrt x (\log x)^2$, as the summands only differ on prime
powers. From the above version of Vaughan's identity with $f = \chi$ we have
\[ \tilde\psi(x, \chi) = \sum_{n \sim x} \Lambda(n) \chi(n) = \sum_{b \le B} \chi(b) \mu(b) \sum_{m \sim x/b} \chi(m) \log m - \sum_{b \le B} \mu(b) \sum_{c \le B/b} \Lambda(c) \chi(bc) \sum_{v \sim x/bc} \chi(v) \]
\[ - \sum_{x/B^2 < v < 2x/B} \sum_{l \sim x/v} \chi(lv) \eta_l + \sum_{B < b < 2x/B} \sum_{m \sim x/b} \chi(bm) \mu(b) \xi_m. \]
Here we choose $B = e^{3 \hat c \sqrt{\log x}} \le \sqrt{x/Q}$.
Both of the first two sums can be estimated via the Pólya-Vinogradov inequality. The second is direct and, writing $l = bc$, we crudely estimate the triple sum as
\[ \ll \sum_{l \le B} \tau(l) (\log B) \sqrt q (\log q) \ll B \sqrt q (\log x)^3, \]
where $q$ is the modulus of the character (and is less than $x$). For the first sum we first use partial summation and get the same result:
\[ \sum_{b \le B} \Big| \sum_{m \sim x/b} \chi(m) \log m \Big| = \sum_{b \le B} \Big| \int_1^{2x/b} \Big( \sum_{\substack{m \sim x/b \\ t \le m}} \chi(m) \Big) \frac{dt}{t} \Big| \ll B \sqrt q (\log q)(\log x). \]
Putting these back into the expression for $U_2(x, Q)$ and recalling the difference between $\tilde\psi$ and $\hat\psi$, we get a contribution of no more than
\[ \ll (\log x) \sum_{P < q \le Q} \Big( B \sqrt q (\log x)^3 + \sqrt x (\log x)^2 \Big) \ll B Q^{3/2} (\log x)^4 + Q \sqrt x (\log x)^3, \]
which is sufficient provided that $B \le \sqrt{x/Q}$.
To estimate the contributions from the other two parts of $\tilde\psi(x, \chi)$ we split the outer sums dyadically, which gives an extra factor of $\log x$, and get a bound of
\[ \ll (\log x)^2 \sum_{P < q \le Q} \frac{1}{\phi(q)} \sum_{\chi \bmod q}^{\star} \Big( \Big| \sum_{v \sim V} \sum_{l \sim x/v} \eta_l \chi(lv) \Big| + \Big| \sum_{u \sim U} \sum_{j \sim x/u} \mu(u) \xi_j \chi(ju) \Big| \Big), \]
for some $V, U$ with $B \le V, U \le \frac{x}{B}$. We also split the $q$-sum into dyadic intervals indexed by $2^k P$ for $k \le \log_2(Q/P)$; this will allow us to induce $q/\phi(q)$ into our sums (as appears in the large sieve inequality) with little loss. We are desirous of using Cauchy's inequality, but first must use the variable-splitting trick of Lemma 7.5 with $h = g_{2x} - g_x$ to get a bound of
\[ \ll (\log x)^2 \sum_{k=0}^{\log_2(Q/P)} \int_{-\infty}^{\infty} |h(t)| F(t) \, dt \]
where $F(t)$ is
\[ \sum_{q \sim 2^k P} \frac{q/2^k P}{\phi(q)} \sum_{\chi \bmod q}^{\star} \Big( \Big| \sum_{v \sim V} \chi(v) v^{it} \Big| \Big| \sum_{l \sim x/V} \eta_l \chi(l) l^{it} \Big| + \Big| \sum_{u \sim U} \mu(u) \chi(u) u^{it} \Big| \Big| \sum_{j \sim x/U} \xi_j \chi(j) j^{it} \Big| \Big), \]
and the integral of $|h|$ is less than $\log x$ in size. Next we apply Cauchy's inequality and the multiplicative version of the large sieve to bound $F(t)$ by
\[ \frac{1}{2^k P} \Big( \sum_{q \le 2^{k+1} P} \frac{q}{\phi(q)} \sum_{\chi \bmod q}^{\star} \Big| \sum_{v \sim V} \chi(v) v^{it} \Big|^2 \Big)^{1/2} \Big( \sum_{q \le 2^{k+1} P} \frac{q}{\phi(q)} \sum_{\chi \bmod q}^{\star} \Big| \sum_{l \sim x/V} \eta_l \chi(l) l^{it} \Big|^2 \Big)^{1/2} \]
\[ + \frac{1}{2^k P} \Big( \sum_{q \le 2^{k+1} P} \frac{q}{\phi(q)} \sum_{\chi \bmod q}^{\star} \Big| \sum_{u \sim U} \mu(u) \chi(u) u^{it} \Big|^2 \Big)^{1/2} \Big( \sum_{q \le 2^{k+1} P} \frac{q}{\phi(q)} \sum_{\chi \bmod q}^{\star} \Big| \sum_{j \sim x/U} \xi_j \chi(j) j^{it} \Big|^2 \Big)^{1/2} \]
\[ \ll \frac{1}{2^k P} \Big( V + (2^k P)^2 \Big)^{1/2} \Big( \sum_{v \sim V} 1^2 \Big)^{1/2} \Big( \frac{x}{V} + (2^k P)^2 \Big)^{1/2} \Big( \sum_{l \sim x/V} |\eta_l|^2 \Big)^{1/2} + \frac{1}{2^k P} \Big( U + (2^k P)^2 \Big)^{1/2} \Big( \sum_{u \sim U} 1^2 \Big)^{1/2} \Big( \frac{x}{U} + (2^k P)^2 \Big)^{1/2} \Big( \sum_{j \sim x/U} |\xi_j|^2 \Big)^{1/2} \]
\[ \ll \frac{\sqrt x \log x}{2^k P} \Big( x + \Big( V + \frac{x}{V} + U + \frac{x}{U} \Big) (2^k P)^2 + (2^k P)^4 \Big)^{1/2} \ll \frac{\sqrt x \log x}{2^k P} \Big( \sqrt x + \sqrt{\frac{x}{B}} \, (2^k P) + (2^k P)^2 \Big). \]
The last line here uses $B \le U, V \le \frac{x}{B}$. Recalling the sum over $k$, the integral of $|h|$, and the above parts we estimated via Pólya-Vinogradov, we get a bound of
\[ U_2(x, Q) \ll (\log x)^3 \cdot \Big( \frac{x \log x}{P} + \frac{x (\log x)^2}{\sqrt B} + Q \sqrt x (\log x) \Big) + B Q^{3/2} (\log x)^4, \]
which gives the result upon taking $B = e^{3 \hat c \sqrt{\log x}} \le \sqrt{x/Q}$ and $P = e^{\hat c \sqrt{\log x}}$.
8. Two lemmata
Lemma 8.1. Suppose that $Z \ge 3$ and $1 \le u \le \sqrt{\log Z}$ is an integer. Then
\[ \sum_{m \le Z} \frac{\tau_u(m)}{m} \ll \frac{1}{\sqrt u} \frac{(\log Z)^{u+1}}{u!}. \]
Proof. Recalling that $\Gamma(s)$ has $e^{-x}$ as its inverse Mellin transform, we have
\[ \sum_{m \le Z} \frac{\tau_u(m)}{m} \le e \sum_{m=1}^{\infty} \frac{\tau_u(m)}{m} e^{-m/Z} = e \sum_{m=1}^{\infty} \frac{\tau_u(m)}{m} \int_{(2)} \frac{\Gamma(s)}{(m/Z)^s} \frac{ds}{2\pi i} = e \int_{(2)} \Gamma(s) \zeta(s+1)^u Z^s \frac{ds}{2\pi i} \ll Z^{\delta} \zeta(1+\delta)^u \int_{-\infty}^{\infty} |\Gamma(\delta + it)| \, dt, \]
upon moving the contour to $\Re s = \delta$. Recalling that $\zeta(1+\delta) \le (1+\delta)/\delta$ and taking $\delta = u/\log Z \le 1/\sqrt{\log Z}$, we get
\[ \sum_{m \le Z} \frac{\tau_u(m)}{m} \ll Z^{\delta} \Big( \frac{1+\delta}{\delta} \Big)^u \frac{1}{\delta} = e^u \Big( 1 + \frac{u}{\log Z} \Big)^u \Big( \frac{\log Z}{u} \Big)^{u+1} \ll \Big( \frac{e}{u} \Big)^u \frac{(\log Z)^{u+1}}{u} \ll \frac{1}{\sqrt u} \frac{(\log Z)^{u+1}}{u!}, \]
where we noted that the integral is $\ll 1/\delta$ and used that $(e/u)^u \ll \sqrt u / u!$ and $(1 + 1/v)^v \le e$ with $v = \sqrt{\log Z}$.
Lemma 8.2. Suppose that $Z \ge 3$ and $1 \le u \le \sqrt{\log Z}$ is an integer. Then
\[ \sum_{m \le Z} \tau_u(m) \ll \sqrt u \, Z \frac{(\log Z)^u}{u!}. \]
Proof. We imitate the proof of the previous lemma, getting
\[ \sum_{m \le Z} \tau_u(m) \ll \int_{(1+\delta)} \Gamma(s) \zeta(s)^u Z^s \frac{ds}{2\pi i} \ll Z^{1+\delta} \Big( \frac{1+\delta}{\delta} \Big)^u \ll \sqrt u \, Z \frac{(\log Z)^u}{u!}. \]
9. Recent work on gaps between primes
We will describe recent work of Goldston, Yıldırım, and others that shows that there are small gaps between primes on average. This will use the Bombieri-Vinogradov theorem, some typical techniques from analytic number theory, and consideration of prime $k$-tuples on average. The "Small gaps between primes exist" preprint of Goldston, Motohashi, Pintz, and Yıldırım shows that
\[ \liminf_{n \to \infty} \frac{p_{n+1} - p_n}{\log p_n} = 0, \]
and we include improvements that appear to yield
\[ \liminf_{n \to \infty} \frac{p_{n+1} - p_n}{(\log p_n)^{7/9} (\log\log p_n)^{1/9}} < \infty. \]
Here $p_n$ denotes the $n$th prime. It is claimed that an exponent of $\frac{1}{2}$ on $\log p_n$ can be obtained via similar methods. However, we should stress that the main thrust of the mechanism is not in reducing this exponent, but that an improvement in the Bombieri-Vinogradov theorem would imply the existence of a constant $C$ such that there are infinitely many pairs of primes that differ by no more than $C$.
We aim to show that, for large $N$, there are two primes in the interval $[N, 3N]$ that differ by no more than $c (\log N)^{7/9} (\log\log N)^{1/9}$. We first recall that Heath-Brown has shown (effectively) that if there is a $P$-Siegel character with $P = e^{\sqrt{\log N}}$, then there is a pair of twin primes larger than $\log P = \sqrt{\log N}$. (In fact, his proof is an application of sieve methods, using the idea that $\mu \approx \chi$ for an exceptional character, which gives a sifting function that is essentially periodic.) Thus we can assume that there is no such exceptional character (this only affects effectivity).
9.1. Overview. The main idea of the proof is in considering the sum
\[ Y = \sum_{n \sim N} \Big( \sum_{h \le H} \hat\Lambda(n+h) - \log 3N \Big) \Lambda^R_{\mathcal H}(n, k+l)^2. \]
Here $\hat\Lambda$ is the von Mangoldt function restricted to primes. If $Y$ is positive, then we must have some integer $n \sim N$ with
\[ \sum_{h \le H} \hat\Lambda(n+h) - \log 3N > 0, \]
and since a single prime contributes at most $\log 3N$, this means that there is some subinterval of length $H$ in $[N, 2N + H]$ that has at least two primes. Thus we want $H$ to be as small as possible.
If the Bombieri-Vinogradov theorem could be extended to handle moduli as large as $x^{\theta}$ for any $\theta > 1/2$, we would get a proof that there are bounded gaps between primes (we do not explicitly derive this result here, though it follows in the same manner). The argument we give here uses averaging over prime $k$-tuples, though this would be unnecessary with a Bombieri-Vinogradov extension. Indeed, much of our worry is in obtaining results uniform in $k$, whereas this problem would disappear if we could take $\theta > 1/2$ (we would only need $k$ to be larger than some constant).
Note that we have yet to mention the term $\Lambda^R_{\mathcal H}(n, k+l)^2$ in the above. Indeed, its positivity is most important, and other than that, we want to be able to compute asymptotics for the two parts of $Y$. Here $\mathcal H$ is an admissible $k$-tuple contained in $[1, H]$ and $\Lambda^R_{\mathcal H}(n, k+l)$ is a truncated version of the generalised von Mangoldt function. It is related to the quasi-optimal Selberg weights for a sieve of dimension $k + l$, so we are detecting $k$-tuples that have no more than $k + l$ prime factors, and so expect many primes. We will take $l \sim \sqrt k$ in the end. The truncation parameter $R$ is related to the sieving limit $D$ of before; we want it as large as possible, but we need (essentially) that $R^2 \ll N^{1/2}$ to be able to apply the Bombieri-Vinogradov theorem.
The crux of the argument is in computing asymptotics for the two parts of $Y$. The part without $\hat\Lambda$ can be analysed with standard tools from analytic number theory (such as contour integration, though here a bivariate version), while the second part can be handled similarly, with the Bombieri-Vinogradov theorem used to control the error term. This can be seen as a version of the so-called weighted sieve, as we allow positive and negative contributions, and determine which dominates.
9.2. Definitions. Let $\mathcal H$ be a tuple (ordered set) of distinct integers taken from the interval $[1, H]$. For each prime $p$, let $\Omega_{\mathcal H}(p)$ be the set of residue classes modulo $p$ given by the $-h$ for $h \in \mathcal H$, and extend $\Omega_{\mathcal H}(d)$ multiplicatively to squarefree numbers via the Chinese remainder theorem, so that $n \in \Omega_{\mathcal H}(d)$ when $n \in \Omega_{\mathcal H}(p)$ for all $p | d$. If we desire, we can invert the notation and note that
\[ n \in \Omega_{\mathcal H}(d) \iff d \,|\, P_{\mathcal H}(n) \qquad \text{where} \qquad P_{\mathcal H}(n) = \prod_{h \in \mathcal H} (n + h). \]
We call $\mathcal H$ admissible if $\#\Omega_{\mathcal H}(p) < p$ for all primes $p$.
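Since only primes $p \le k$ can have every residue class covered by a $k$-tuple, admissibility is a finite check. A minimal sketch (the example tuples below are ours, chosen for illustration):

```python
def primes_upto(n):
    return [p for p in range(2, n + 1)
            if all(p % d for d in range(2, int(p ** 0.5) + 1))]

def admissible(H):
    # only primes p <= len(H) can have every class mod p covered by the -h
    for p in primes_upto(len(H)):
        if len({(-h) % p for h in H}) == p:
            return False
    return True

print(admissible([7, 11, 13, 17, 19, 23]))  # True: misses a class mod 2, 3, 5
print(admissible([1, 3, 5, 7, 9, 11]))      # False: covers every class mod 3
```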
Recalling the quasi-optimal weights from the Selberg sieve, we define (for $a \ge 1$)
\[ \lambda_a(d) = \mu(d) \frac{1}{2\pi i} \int_{(1)} \Big( \frac{R}{d} \Big)^s \frac{ds}{s^{a+1}} = \begin{cases} \frac{\mu(d)}{a!} \big( \log R/d \big)^a & \text{if } d \le R, \\ 0 & \text{if } d > R, \end{cases} \]
where the integral is over the vertical line $\Re s = 1$. Our truncated approximation of a generalisation of the von Mangoldt function is then given by
\[ \Lambda^R_{\mathcal H}(n, a) = \sum_{n \in \Omega_{\mathcal H}(d)} \lambda_a(d) = \frac{1}{a!} \sum_{\substack{d | P_{\mathcal H}(n) \\ d \le R}} \mu(d) \big( \log R/d \big)^a. \tag{12} \]
The truncation parameter $R$ will be chosen as large as possible; it turns out that we need $R^2 \le Q$ where $Q$ is the maximal modulus allowed in the Bombieri-Vinogradov theorem, so we will have $R$ of size about $N^{1/4}$. We shall also choose $H \le \log N$, as else the final theorem is of no value in any case. We declare that the final choices of our parameters will be (for some small constant $\dot c > 0$)
\[ k \sim \dot c \frac{(\log N)^{4/9}}{(\log\log N)^{2/9}}, \qquad l \sim \sqrt k, \qquad R \sim N^{1/4} / e^{\sqrt{\log N}}, \]
which will give us positivity of $Y$ when $H \gg (\log N)^{7/9} (\log\log N)^{1/9}$. Much of our trouble is in attaining the uniformity in $k$; derivation of the result with $k$ an arbitrarily large constant is comparatively much easier, and leads to the qualitative result that there are infinitely many gaps smaller than any (constant) multiple of $\log N$.
9.3. First evaluation. First we will evaluate
\[ W(\mathcal H) = \sum_{n \sim N} \Lambda^R_{\mathcal H}(n, k+l)^2, \]
where $\mathcal H$ is an admissible $k$-tuple. We expand the square in this, and note that with $m = \mathrm{lcm}(d_1, d_2)$ we have that $\Omega_{\mathcal H}(d_1) \cap \Omega_{\mathcal H}(d_2) = \Omega_{\mathcal H}(m)$, so that we get
\[ W(\mathcal H) = \sum_{d_1} \lambda^R_{k+l}(d_1) \sum_{d_2} \lambda^R_{k+l}(d_2) \sum_{\substack{n \sim N \\ n \in \Omega_{\mathcal H}(m)}} 1 = N \sum_{d_1} \lambda^R_{k+l}(d_1) \sum_{d_2} \lambda^R_{k+l}(d_2) \frac{\#\Omega_{\mathcal H}(m)}{m} + O\Big( \frac{(\log R)^{2(k+l)}}{(k+l)!^2} \sum_{d_1 \le R} \sum_{d_2 \le R} \#\Omega_{\mathcal H}(m) \Big), \]
since the $n$-sum is $N \#\Omega_{\mathcal H}(m)/m + O\big( \#\Omega_{\mathcal H}(m) \big)$. The number of ways of writing $m$ as the lcm of $d_1$ and $d_2$ is bounded by $\tau_3(m)$, as can be seen from considering $m = (a/g)(b/g)g$ where $g = \gcd(a, b)$. Furthermore, since $\#\mathcal H = k$ we have that $\#\Omega_{\mathcal H}(m) \le \tau_k(m)$. Noting that $\tau_3(m) \tau_k(m) \le \tau_{3k}(m)$, we use Lemma 8.2 to bound the error term as
\[ E_1 \ll \sqrt k R^2 \frac{(\log R^2)^{3k}}{(3k)!} \cdot \frac{(\log R)^{2(k+l)}}{(k+l)!^2} \ll \frac{N}{(3k)!} \cdot \frac{\sqrt k (\log N)^{6k}}{e^{ck}}. \]
The fact that $R^2$ is much smaller than $N$ will imply that this is easily ignorable.
We are left to compute the double sum in the main term, and we note that, via using the integral representation for $\lambda^R_a$, this term is given by
\[ \hat W(\mathcal H) = \frac{N}{(2\pi i)^2} \int_{(1)} \int_{(1)} F_{\mathcal H}(s_1, s_2) \frac{R^{s_1 + s_2}}{(s_1 s_2)^{k+l+1}} \, ds_1 \, ds_2, \]
where (multiplicativity and computing the 3 nontrivial terms with $m = p$ gives the product)
\[ F_{\mathcal H}(s_1, s_2) = \sum_{d_1} \sum_{d_2} \frac{\mu(d_1)}{d_1^{s_1}} \frac{\mu(d_2)}{d_2^{s_2}} \frac{\#\Omega_{\mathcal H}(m)}{m} = \prod_p \Big( 1 - \frac{\#\Omega_{\mathcal H}(p)}{p} \Big( \frac{1}{p^{s_1}} + \frac{1}{p^{s_2}} - \frac{1}{p^{s_1 + s_2}} \Big) \Big). \]
Because we have that $\#\Omega_{\mathcal H}(p) = k$ for $p > H$, we consider the function
\[ G_{\mathcal H}(s_1, s_2) = F_{\mathcal H}(s_1, s_2) \Big( \frac{\zeta(s_1 + 1) \zeta(s_2 + 1)}{\zeta(s_1 + s_2 + 1)} \Big)^k, \]
and below show that this is regular somewhat to the left of the imaginary axes. Note that we have that $G_{\mathcal H}(0, 0)$ is equal to the singular series for $\mathcal H$, as we have
\[ G_{\mathcal H}(0, 0) = \mathfrak S(\mathcal H) = \prod_p \Big( 1 - \frac{1}{p} \Big)^{-k} \Big( 1 - \frac{\#\Omega_{\mathcal H}(p)}{p} \Big). \]
The method to get an asymptotic for $\hat W(\mathcal H)$ is sufficiently standard that here we just state the result, and postpone the details until later. We get that
\[ \hat W(\mathcal H) = N \cdot \mathfrak S(\mathcal H) \cdot \binom{2l}{l} \frac{(\log R)^{k+2l}}{(k+2l)!} \Big( 1 + O\Big( \frac{k^2 \sqrt{\log k}}{\log N} \Big) \Big). \]
Here the pole-order $k + 2l$ comes from $2(k+l) - k$, in which the two $(k+l)$'s come from $1/s_1^{k+l+1}$ and $1/s_2^{k+l+1}$ and the subtracted term comes from the $\zeta$-quotient in $G_{\mathcal H}(s_1, s_2)$. The factor of $\binom{2l}{l}$ appears from an integral like $\int (\xi + 1)^{2l} / \xi^{l+1} \, d\xi$ around a small circle; this integrand comes about by multiplying the $\zeta$-factor by $s_1 s_2 / (s_1 + s_2)$ and changing variables. It is not entirely ridiculous to try to improve the error term, and this would then affect the final result. Oppositely, an asymptotic with the bracketed term as $1 + O\big( e^{ck}/\log N \big)$ can be derived more easily. Finally, note that the bound on our previous error term $E_1$ fits in this error term, due to the fact (see below) that $\mathfrak S(\mathcal H) \gg 1/e^{ck}$ for admissible $\mathcal H$.
9.4. Second evaluation. We now evaluate the sum
\[ I_{\mathcal H}(h) = \sum_{n \sim N} \hat\Lambda(n + h) \Lambda^R_{\mathcal H}(n, k+l)^2, \]
first assuming that $h \notin \mathcal H$ and that $\mathcal H \cup \{h\}$ is admissible. Expanding, we get that
\[ I_{\mathcal H}(h) = \sum_{n \sim N} \hat\Lambda(n + h) \sum_{\substack{d_1 | P_{\mathcal H}(n) \\ d_1 \le R}} \lambda^R_{k+l}(d_1) \sum_{\substack{d_2 | P_{\mathcal H}(n) \\ d_2 \le R}} \lambda^R_{k+l}(d_2) = \sum_{d_1 \le R} \lambda^R_{k+l}(d_1) \sum_{d_2 \le R} \lambda^R_{k+l}(d_2) \sum_{c \in \Omega_{\mathcal H}(m)} \sum_{\substack{n \sim N \\ n \equiv c \bmod m}} \hat\Lambda(n + h) \]
\[ = \sum_{d_1 \le R} \lambda^R_{k+l}(d_1) \sum_{d_2 \le R} \lambda^R_{k+l}(d_2) \sum_{c \in \Omega_{\mathcal H}(m)} \hat\psi(N; m, c + h), \]
where $m$ is the lcm of $d_1$ and $d_2$. We now replace $\hat\psi$ by its asymptotic evaluation $N/\phi(m)$, calling the error term $E$. We postpone the main term for below, and note that the error is
\[ E_2 \ll \frac{(\log R)^{k+l}}{(k+l)!} \cdot \frac{(\log R)^{k+l}}{(k+l)!} \sum_{d_1 \le R} \sum_{d_2 \le R} \sum_{c \in \Omega_{\mathcal H}(m)} E(N; m, c + h) \ll \frac{(\log R)^{2(k+l)}}{(k+l)!^2} \sum_{m \le R^2} \tau_3(m) \cdot \#\Omega_{\mathcal H}(m) \cdot \max_r E(N; m, r). \]
Here we used that the number of ways of writing an lcm is at most $\tau_3$. We recall that $\#\Omega_{\mathcal H}(m) \le \tau_k(m)$ and $\tau_3 \tau_k \le \tau_{3k}$, and next we split the sum according to whether $\tau_{3k}(m) \le (\log N)^A$ for some parameter $A$ (a variant of Rankin's trick). Upon using the trivial bound that $E \le N/m$ for large $\tau_{3k}(m)$ and Bombieri-Vinogradov when $\tau_{3k}(m)$ is small, this gives (for any $\alpha > 0$) that
\[ E_2 \ll \frac{(\log R)^{2(k+l)+A}}{(k+l)!^2} \sum_{m \le R^2} \max_r E(N; m, r) + \frac{N}{(\log N)^{A\alpha}} \sum_{m \le R^2} \frac{\tau_{3k}(m)^{1+\alpha}}{m} \]
\[ \ll \frac{(\log R)^{2(k+l)+A}}{(k+l)!^2} \Big( \frac{N}{e^{\hat c \sqrt{\log N}}} + R^2 \sqrt N (\log N)^4 \Big) + \frac{N}{(\log N)^{A/9}} \sum_{m \le R^2} \frac{\tau_{3k}(m)^{10/9}}{m} \]
\[ \ll \frac{(\log R)^{2(k+l)+A}}{(k+l)!^2} \Big( \frac{N}{e^{\hat c \sqrt{\log N}}} + N (\log N)^{4 k^{10/9} - A/9} \Big) \ll \frac{N}{(2k)! \, e^{ck}}. \]
Here we chose $\alpha = 1/9$ and $A = 37 k^{10/9}$, and then used that $k \le (\log N)^{4/9}$ and $(10/9)(4/9) < 1/2$, while noting that $\tau_{3k}^{10/9} \le \tau_{4 k^{10/9}}$ before using Lemma 8.1.
32
SIEVES
9.4.1. Main term. The main term in $I_H(h)$ is given by
$$\sum_{d_1\le R}\lambda^R_{k+l}(d_1)\sum_{d_2\le R}\lambda^R_{k+l}(d_2)\cdot\frac{N}{\phi(m)}\sum_{c\in\Omega_H(m)}\delta\big(\gcd(c+h,m)\big).$$
Here $\delta(x)=1$ for $x=1$ and $\delta(x)=0$ otherwise. The inner sum is
$$\sum_{c\in\Omega_H(m)}\delta\big(\gcd(c+h,m)\big) = \prod_{p\mid m}\sum_{c\in\Omega_H(p)}\delta\big(\gcd(c+h,p)\big) = \prod_{p\mid m}\big(\#\Omega_{H^h}(p)-1\big).$$
Here $H^h$ denotes the union of $H$ with $\{h\}$, and we noted that the gcd is $1$ except for $c\equiv-h\pmod p$, which occurs in the $c$-sum exactly when $-h\bmod p$ lies in $\Omega_H(p)$.
The argument is now exactly analogous to the previous one. Noting that $\phi(p)=p-1$, we want to estimate the integral
$$\hat I_H(h) = \frac{N}{(2\pi i)^2}\int\!\!\int\prod_p\Bigg(1-\frac{\#\Omega_{H^h}(p)-1}{p-1}\Big(\frac1{p^{s_1}}+\frac1{p^{s_2}}-\frac1{p^{s_1+s_2}}\Big)\Bigg)\frac{R^{s_1+s_2}}{(s_1s_2)^{k+l+1}}\,ds_1\,ds_2,$$
and we multiply and divide by the $k$th power of a $\zeta$-quotient as before. For the evaluation at $(0,0)$ we get
$$\prod_p\Big(1-\frac{\#\Omega_{H^h}(p)-1}{p-1}\Big)\Big(1-\frac1p\Big)^{-k} = \prod_p\frac{p-1-\#\Omega_{H^h}(p)+1}{p-1}\Big(\frac{p-1}p\Big)^{-k} = \prod_p\frac{p-\#\Omega_{H^h}(p)}{p}\cdot\frac{p}{p-1}\Big(\frac{p-1}p\Big)^{-k} = \prod_p\Big(1-\frac{\#\Omega_{H^h}(p)}{p}\Big)\Big(1-\frac1p\Big)^{-(k+1)},$$
and this is exactly $S(H^h)$.
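The Euler product defining the singular series converges quickly and can be evaluated numerically. A Python sketch (not part of the original notes; the truncation bound is an arbitrary choice) computing $S(H)=\prod_p\big(1-\frac{\#\Omega_H(p)}{p}\big)\big(1-\frac1p\big)^{-k}$ for $k=|H|$:

```python
def primes_up_to(n):
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    return [p for p, ok in enumerate(sieve) if ok]

def singular_series(H, bound=10 ** 5):
    # S(H) = prod_p (1 - #Omega_H(p)/p) * (1 - 1/p)^(-k), truncated at `bound`
    k = len(H)
    value = 1.0
    for p in primes_up_to(bound):
        omega = len({h % p for h in H})   # #Omega_H(p): distinct residues of H mod p
        value *= (1.0 - omega / p) * (1.0 - 1.0 / p) ** (-k)
    return value

print(singular_series([0, 2]))           # ~1.3203, twice the twin prime constant
assert singular_series([0, 1]) == 0.0    # {0,1} is inadmissible: factor at p = 2 vanishes
```

An inadmissible tuple covers all residues at some prime, so the corresponding factor (and hence $S(H)$) is exactly zero, consistent with $S(H)\gg1/e^{ck}$ holding only for admissible $H$.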
So assuming that $h\notin H$ and $H^h$ is admissible, we get that
$$\hat I_H(h) = N\cdot S(H\cup\{h\})\cdot\binom{2l}{l}\frac{(\log R)^{k+2l}}{(k+2l)!}\Bigg(1+O\Big(\frac{k^2\sqrt{\log k}}{\log N}\Big)\Bigg),\qquad h\notin H.$$
Next we note that when $h\in H$, the quantity $\hat\Lambda(n+h)\Lambda^R_H(n,k+l)^2$ is nonzero only when $n+h$ is prime; in this case, writing $H_h=H\setminus\{h\}$, since $R<N$ we have that $d\mid P_H(n)\iff d\mid P_{H_h}(n)$ for $d\le R$, so that by (12) we have
$$\hat\Lambda(n+h)\Lambda^R_H(n,k+l)^2 = \hat\Lambda(n+h)\Lambda^R_{H_h}(n,k+l)^2.$$
We can then apply the above to $H_h$ upon making the change $k\to k-1$ and $l\to l+1$. So when $h\in H$ (and $H$ is admissible) we get that
$$\hat I_H(h) = N\cdot S(H)\cdot\binom{2l+2}{l+1}\frac{(\log R)^{k+2l+1}}{(k+2l+1)!}\Bigg(1+O\Big(\frac{k^2\sqrt{\log k}}{\log N}\Big)\Bigg),\qquad h\in H.$$
Finally, we note that the error term E2 from the Bombieri-Vinogradov theorem is
much smaller than the error term here.
9.5. Comparison of terms. Let $T_k(H)$ be the set of all distinct $k$-tuples of integers taken from $[1,H]$. We wish to estimate
$$Y = \sum^{\star}_{H\in T_k(H)}\sum_{n\sim N}\Big(\sum^{\star}_{h\le H}\hat\Lambda(n+h)-\log 3N\Big)\Lambda^R_H(n,k+l)^2,$$
where the star on the $h$-sum restricts to $h$ such that $H^h$ is admissible, and the outer star similarly insists that $H$ be admissible. We will then choose $H$ so that this is positive. This will then give two primes in some subinterval of length $H$ in $[N,2N+H]$. For the term involving $\hat\Lambda$, we split the $h$-sum depending on whether we have $h\in H$. Factoring out the common factor of $N(\log R)^{k+2l}\binom{2l}{l}\big/(k+2l)!$, we
get asymptotics of
$$Y_+ = \sum^{\star}_{H\in T_k(H)}\sum^{\star}_{\substack{h\le H\\ h\notin H}}S(H\cup\{h\})+\frac{(2l+2)(2l+1)}{(l+1)(l+1)}\cdot\frac{\log R}{k+2l+1}\sum^{\star}_{H\in T_k(H)}\sum_{\substack{h\le H\\ h\in H}}S(H)$$
from the terms with $\hat\Lambda$, and
$$Y_- = \log 3N\sum^{\star}_{H\in T_k(H)}S(H)$$
for the other term. The error terms give
$$Y_E \ll (Y_++Y_-)\,\frac{k^2\sqrt{\log k}}{\log N}.$$
We write
$$U_k(H) = \sum_{H\in T_k(H)}S(H)$$
(where it does not matter whether the sum is over admissible $H$) and see that the second double sum in the above display is $k\,U_k(H)$, as the $h$-sum has $k$ members, each of which repeats the $H$-sum. Meanwhile, the first double sum in $Y_+$ is $U_{k+1}(H)$; indeed, ignoring admissibility as we may, there is a one-to-one map between $(H,h)$ and $H\cup\{h\}$, where this union puts $h$ at the end of the tuple, and the latter enumerates $T_{k+1}(H)$.
We now use that the quotient $\frac{2(2l+1)}{l+1}$ is $4+O(1/l)$, combined with the estimate $k+2l+1=k\big(1+O(l/k)\big)$ and the choice $l\sim\sqrt k$, to get
$$Y_+-Y_--Y_E = U_{k+1}(H)+U_k(H)\log R\cdot\Big(4+O\Big(\frac1{\sqrt k}\Big)\Big)-\log 3N\cdot U_k(H)-O\Big(\frac{k^2\sqrt{\log k}}{\log N}\Big)\big(U_{k+1}(H)+U_k(H)\log N\big).$$
Recalling that $R=N^{1/4}/e^{\sqrt{\log N}}$, we get that $\hat Y=Y_+-Y_--Y_E$ is estimated by
$$\hat Y = U_k(H)\Bigg(\frac{U_{k+1}(H)}{U_k(H)}\Big(1-O\Big(\frac{k^4\log k}{(\log N)^2}\Big)\Big)-O\Big(\frac{k^4\log k}{\sqrt{\log N}}+\sqrt{\log N}+\frac{\log N}{\sqrt k}\Big)\Bigg).$$
Finally we use a fact (14) about $U_k(H)$ that we prove in Lemma 9.1 below, namely
$$\frac{U_{k+1}(H)}{U_k(H)} \ge H\cdot\Big(1-O\Big(\frac{k\log H}{H}\Big)\Big),$$
at least when $k\log H\ll H$. This gives us that
$$\hat Y \ge U_k(H)\Bigg(H-O\Big(\frac{k^4\log k}{\sqrt{\log N}}+\sqrt{\log N}+\frac{\log N}{\sqrt k}+k\log H\Big)\Bigg).$$
Our choice of $k\sim\dot c(\log N)^{4/9}/(\log\log N)^{2/9}$ implies that there is some constant $c>0$ such that this is positive when $H\ge c(\log N)^{7/9}(\log\log N)^{1/9}$. This gives the desired result of Goldston and Yıldırım about small gaps between primes. We now turn to proving the technical lemmata about $G_H(s_1,s_2)$ and $U_k(H)$, again noting that if $R$ could be taken with $\frac{\log R}{\log N}>1/4+\epsilon$ for some $\epsilon>0$, then we would get bounded gaps.
9.6. The singular series on average. We wish to estimate
$$U_k(H) = \sum_{H\in T_k(H)}S(H)$$
under a mild assumption such as $H\gg k\log H$. We let $z=\max(k^{100},H^{10})$ and let $Z=e^{100z}$ (or take the limit as $Z\to\infty$). The parameter $z$ is a sieving limit for which we detect quasi-primes, while the parameter $Z$ introduces an averaging over an immense number of shifts of an interval of length $H$.
We take $Q_z$ to be the set of integers which are coprime to all primes less than $z$. For each integer $i$ from $1$ to $Z$ we define
$$f_i = \sum_{\substack{i+1\le m\le i+H\\ m\in Q_z}}1.$$
This counts $z$-quasiprimes in the interval $[i+1,i+H]$. We let $a_i(k)=\binom{f_i}{k}\,k!$ be the number of $k$-tuples of distinct $z$-quasiprimes from this interval.
We can note that $a_i(k+1)=a_i(k)(f_i-k)$ by properties of binomial coefficients, and so
$$\sum_{i=1}^Z a_i(k+1) = \sum_{i=1}^Z f_i\,a_i(k)-k\sum_{i=1}^Z a_i(k).$$
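The recursion $a_i(k+1)=a_i(k)(f_i-k)$ and the ordering property exploited next are elementary; a quick Python check (not part of the original notes; the list `fs` is an arbitrary toy example):

```python
from math import comb, factorial

def a(f, k):
    # a(k) = C(f, k) * k! = number of ordered k-tuples of distinct objects
    return comb(f, k) * factorial(k)

# the recursion a(k+1) = a(k) * (f - k), including the degenerate cases k >= f
for f in range(12):
    for k in range(12):
        assert a(f, k + 1) == a(f, k) * (f - k)

# f_i and a_i(k) are similarly ordered, so Z * sum f_i a_i >= (sum f_i)(sum a_i)
fs = [3, 7, 2, 9, 5, 4]
Z, k = len(fs), 2
assert Z * sum(f * a(f, k) for f in fs) >= sum(fs) * sum(a(f, k) for f in fs)
```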
We note that $f_i-f_j$ and $a_i(k)-a_j(k)$ have the same sign for any $i,j,k$, and thus we get that
$$0 \le \sum_{i=1}^Z\sum_{j=1}^Z(f_i-f_j)\big(a_i(k)-a_j(k)\big) = 2\Bigg(Z\sum_{i=1}^Z f_i\,a_i(k)-\sum_{i=1}^Z f_i\sum_{i=1}^Z a_i(k)\Bigg).$$
Replacing the right side of the previous display, we get that
$$\sum_{i=1}^Z a_i(k+1) \ge \frac1Z\sum_{i=1}^Z f_i\cdot\sum_{i=1}^Z a_i(k)-k\sum_{i=1}^Z a_i(k).\tag{13}$$
We use a familiar averaging technique to exploit the regularity in the $f_i$-sum, getting
$$\sum_{i=1}^Z f_i = \sum_{i=1}^Z\sum_{\substack{i+1\le m\le i+H\\ m\in Q_z}}1 = H\sum_{\substack{H+1\le m\le Z-H\\ m\in Q_z}}1+\sum_{\substack{m\le H\\ m\in Q_z}}m+\sum_{\substack{Z-H+1\le m\le Z\\ m\in Q_z}}(Z+1-m) = H\sum_{\substack{m\le Z\\ m\in Q_z}}1+O(H^2).$$
Writing $P(z)=\prod_{p\le z}p$ and $V_z=\prod_{p\le z}(1-1/p)^{-1}\sim e^\gamma\log z$, we have that the sum here is $Z/V_z+O\big(P(z)\big)$, and we note that $P(z)\le e^{3z/2}=O(Z^{1/50})$. Also, since $H\le z^{1/10}$ we have that $H^2=O\big((\log Z)^{1/5}\big)$.
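The rearrangement of $\sum_i f_i$ is pure double counting: each quasiprime $m$ is counted once for every window $(i,i+H]$ containing it. A brute-force Python check (not part of the original notes; the parameters are small toy values):

```python
def is_quasiprime(m, z):
    # m in Q_z: coprime to every prime below z (inner test checks primality of p)
    return all(m % p != 0 for p in range(2, z)
               if all(p % q != 0 for q in range(2, p)))

Z, H, z = 500, 10, 8
f = [sum(1 for m in range(i + 1, i + H + 1) if is_quasiprime(m, z))
     for i in range(1, Z + 1)]

# window i contains m exactly when max(1, m-H) <= i <= min(Z, m-1)
total = sum(min(Z, m - 1) - max(0, m - H - 1)
            for m in range(2, Z + H + 1) if is_quasiprime(m, z))
assert sum(f) == total
```

The interior quasiprimes are each counted exactly $H$ times, and only the $O(H)$ boundary values of $m$ contribute the $O(H^2)$ error.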
Thus we are left to estimate the sum $\sum_i a_i(k)$. The main fact we use is that
$$a_n(k) = \sum_{\substack{H\in T_k(H)\\ P_H(n)\in Q_z}}1.$$
Indeed, there are $f_n$ integers $u$ in $[1,H]$ with $n+u$ coprime to $P(z)$, and the sum in the above display counts the number of ways of choosing $k$ of these (including order), which gives $\binom{f_n}{k}\,k!=a_n(k)$ as desired. Now we sum this over $n$ and switch
the order of summation to get
$$\sum_{n=1}^Z a_n(k) = \sum_{n=1}^Z\sum_{\substack{H\in T_k(H)\\ P_H(n)\in Q_z}}1 = \sum_{H\in T_k(H)}\sum_{\substack{n\le Z\\ P_H(n)\in Q_z}}1 = \sum_{H\in T_k(H)}R(H).$$
We now proceed to estimate the inner sum $R(H)$. We have that
$$R(H) = \sum_{\substack{n\le Z\\ P_H(n)\in Q_z}}1 = \sum_{n=1}^Z\sum_{\substack{d\mid P(z)\\ d\mid P_H(n)}}\mu(d) = \sum_{d\mid P(z)}\mu(d)\sum_{\substack{n\le Z\\ d\mid P_H(n)}}1.$$
The inner sum is quite regular, and we get that
$$R(H) = \sum_{d\mid P(z)}\mu(d)\,\#\Omega_H(d)\Big(\frac Zd+O(1)\Big) = Z\prod_{p\le z}\Big(1-\frac{\#\Omega_H(p)}{p}\Big)+O\Big(\prod_{p\le z}(1+k)\Big).$$
This gives
$$R(H) = \frac{Z}{V_z^k}\prod_{p\le z}\Big(1-\frac{\#\Omega_H(p)}{p}\Big)\Big(1-\frac1p\Big)^{-k}+O\big((k+1)^{3z/(2\log z)}\big).$$
Since we have $k\le z$, the error is $O(e^{3z/2})=O(Z^{1/50})$, so we get
$$R(H) = S(H)\,\frac{Z}{V_z^k}\exp\Bigg(O\Big(\sum_{p>z}\frac{k^2}{p^2}\Big)\Bigg)+O(Z^{1/50}).$$
The exponential term here is $1+O\big(k^2/(z\log z)\big)$. Plugging back into (13), we get that
$$\sum_{H\in T_{k+1}(H)}R(H) \ge \frac1Z\Big(\frac{HZ}{V_z}+O(Z^{1/5})\Big)\cdot\sum_{H\in T_k(H)}R(H)-k\sum_{H\in T_k(H)}R(H).$$
Evaluating $R(H)$, we get
$$\frac{Z}{V_z^{k+1}}\sum_{H\in T_{k+1}(H)}S(H) \ge \frac{Z}{V_z^k}\Big(\frac{H}{V_z}-2k\Big)\Big(1-O\Big(\frac{k^2}{z\log z}\Big)\Big)\sum_{H\in T_k(H)}S(H).$$
Summing and removing common factors gives us that
$$U_{k+1}(H) \ge U_k(H)\cdot(H-10k\log z)\cdot\Big(1-O\Big(\frac{k^2}{z\log z}\Big)\Big).$$
Finally, we use that $z=\max(k^{100},H^{10})$ and $H\gg k\log H$ to get the following:

Lemma 9.1. Suppose that $H\gg k\log H$. Then
$$U_{k+1}(H) \ge H\cdot U_k(H)\cdot\Big(1+O\Big(\frac{k\log H}{H}\Big)\Big).\tag{14}$$
9.7. Contour integrals. First we outline the method we use. We will consider
$$T(\alpha) = \frac{N}{(2\pi i)^2}\int_{(1)}\int_{(1)}F_\alpha(s_1,s_2)\,\frac{R^{s_1+s_2}}{(s_1s_2)^{k+l+1}}\,ds_1\,ds_2 = \frac{N}{(2\pi i)^2}\int_{(1)}\int_{(1)}G_\alpha(s_1,s_2)\bigg(\frac{\zeta(s_1+s_2+1)}{\zeta(s_1+1)\zeta(s_2+1)}\bigg)^k\frac{R^{s_1+s_2}}{(s_1s_2)^{k+l+1}}\,ds_1\,ds_2,$$
where
$$F_\alpha(s_1,s_2) = \prod_p\Bigg(1-\alpha_p\Big(\frac1{p^{s_1}}+\frac1{p^{s_2}}-\frac1{p^{s_1+s_2}}\Big)\Bigg).$$
In our applications we will have that $\alpha_p$ is either $\frac{\#\Omega_H(p)}{p}$ or $\frac{\#\Omega_{H^h}(p)-1}{p-1}$, but here we shall only assume that $0\le\alpha_p\le1-\frac1p$ for all $p$ and that $\alpha_p=\frac kp+O\big(\frac k{p^2}\big)$ for $p>H$.
√
To evaluate T (α) we shall move the contours close (about 1/ log N away)√to the
imaginary axes, and truncate them at a high height (this will be J = e log N ).
The largeness of the denominator (s1 s2 )k+l+1 will ensure that the truncations give
negligible impact. Then we move the inner contour across the imaginary
axis (the
√
same distance away). The new integral is small because of the 1/R1/ log N term,
the horizontal bits are negligible as before, and we get residues from poles at s1 = 0
and s1 = −s2 , while the zero-free region for the ζ-function excludes other poles.
The residue at s1 + s2 = 0 is small mainly because Rs1 +s2 ≪ 1 with everything
else controlled, though an accounting of logarithms is necessary to ensure this. By
moving the other contour similarly across the imaginary axis, the resulting integral
is again small, and we get a residue at s2 = 0. Thus the above expression for T (α)
is essentially given from the residue at s1 = s2 = 0. This double residue is then
expanded in terms of Gα (actually a slight normalisation of it, which eases the
calculation) and its logarithmic derivatives, for which we need good bounds, as we
want the leading Gα (0, 0) term to dominate.
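The main-term extraction just outlined ultimately rests on the Cauchy-type evaluation $\frac1{2\pi i}\oint R^s\,s^{-m-1}\,ds=\frac{(\log R)^m}{m!}$. A numerical Python illustration (not part of the original notes), integrating over a circle with the trapezoid rule:

```python
import cmath
import math

def contour_coeff(R, m, radius=1.0, steps=4096):
    # (1/2πi) ∮ R^s / s^(m+1) ds over the circle |s| = radius
    total = 0j
    for t in range(steps):
        theta = 2 * math.pi * t / steps
        s = radius * cmath.exp(1j * theta)
        ds = 1j * s * (2 * math.pi / steps)   # ds = i s dθ
        total += R ** s / s ** (m + 1) * ds
    return total / (2j * math.pi)

R = 100.0
for m in range(6):
    exact = math.log(R) ** m / math.factorial(m)
    assert abs(contour_coeff(R, m) - exact) < 1e-6 * max(1.0, exact)
```

The trapezoid rule on a periodic analytic integrand converges geometrically, so even a modest number of sample points reproduces the residue to high accuracy.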
9.7.1. Bounds for $G_\alpha$. Recall the definition of $G_\alpha=G_\alpha(s_1,s_2)$ as an Euler product:
$$G_\alpha = \prod_p\Bigg(1-\alpha_p\Big(\frac1{p^{s_1}}+\frac1{p^{s_2}}-\frac1{p^{s_1+s_2}}\Big)\Bigg)\Big(1-\frac{1/p}{p^{s_1}}\Big)^{-k}\Big(1-\frac{1/p}{p^{s_2}}\Big)^{-k}\Big(1-\frac{1/p}{p^{s_1+s_2}}\Big)^{k}.$$
The first part of this is $F_\alpha(s_1,s_2)$, while the second is a $\zeta$-quotient.
We write $\eta=\max(-\Re s_1,0)+\max(-\Re s_2,0)$, always taking $\eta\le1/\log H$ for convenience. We first consider primes $p\ge H$, for which we have $\alpha_p=\frac kp+O\big(\frac k{p^2}\big)$. Their contribution to $\log G_\alpha$ is
$$\sum_{p\ge H}\Bigg(\Big(\frac{k/p}{p^{s_1}}+\frac{k/p}{p^{s_2}}-\frac{k/p}{p^{s_1+s_2}}\Big)-\alpha_p\Big(\frac1{p^{s_1}}+\frac1{p^{s_2}}-\frac1{p^{s_1+s_2}}\Big)+O\Big(\frac{k^2p^{2\eta}}{p^2}+\frac{kp^{2\eta}}{p^2}\Big)\Bigg),$$
which gives a contribution $B_1$ bounded as
$$B_1 \ll k^2H^{2\eta-1} \ll \frac{k^2}{H} \ll \frac{k}{\log H}.$$
We introduce the parameter $U=(6k)^{1/(1-2\eta)}\le(6k)^{1+3\eta}\le12k$, and for primes with $U\le p\le H$ we use the fact that $3kp^{2\eta}/p\le1/2$ to bound the contribution $B_2$ to $\log G_\alpha$ by
$$B_2 \ll \sum_{U\le p\le H}\frac{3kp^{2\eta}}{p} \ll \frac{kH^{2\eta}}{\log H} \ll \frac{k}{\log H}.$$
For $p\le U$ we use the trivial upper bound of $\log(1+3kp^{2\eta-1})-3k\log(1-p^{2\eta}/p)$, which is bounded in size by $\log k+kp^{2\eta-1}$. The $p$-sum of the second term is bounded in size by $kH^{2\eta}$ and that of the first by $k$, so that, upon adding the above estimates for $B_1$ and $B_2$, we get that
$$\log|G_\alpha(s_1,s_2)| \ll k,\quad\text{and so}\quad |G_\alpha(s_1,s_2)| \ll e^{ck},\tag{15}$$
when $\eta\le1/\log H$. This is an upper bound for $G_\alpha$; note, however, that we could have $G_\alpha(s_1,s_2)=0$.
9.7.2. Moving contours. Now we return to the double integral representation
$$T(\alpha) = \frac{N}{(2\pi i)^2}\int_{L_2}\int_{L_1}G_\alpha(s_1,s_2)\bigg(\frac{\zeta(s_1+s_2+1)}{\zeta(s_1+1)\zeta(s_2+1)}\bigg)^k\frac{R^{s_1+s_2}}{(s_1s_2)^{k+l+1}}\,ds_1\,ds_2,$$
where we moved the contours to $L_1$, given by $\frac1{\log J}+it$, and $L_2$, given by $\frac1{2\log J}+it$, where $J=e^{\sqrt{\log N}}$, and we truncate the contours to $|t|\le J$. This truncation gives an error of no more than
$$E_3 \ll N\cdot\frac{e^{ck}(\log J)^{3k}R^{3/\log J}}{J^k} \ll \frac{N^{1+1/\sqrt{\log N}}(\log J)^{3k}}{J^k} \ll \frac{N}{(2k)!\,e^{ck}},$$
in which the denominator swamps the other contributions, the last step using (a ridiculously crude bound) that $J\ge k(\log k)^4$. Here we used that $\zeta$ and its reciprocal are both bounded in size by $\log J$ on the contours.
Now we move the contour $L_1$ to $L_1^-$, given by $\frac{-1}{\log J}+it$, where again the contribution from the horizontal segments is easy to estimate. This gives us residues from the poles at $s_1=0$ and $s_1=-s_2$, and the zero-free region for the zeta-function implies (for large $N$) that there are no other poles (there are better zero-free regions, but we do not require them). The integral over $L_1^-$ (and then over $L_2$) is small due to the dominant $R^{-1/(2\log J)}$ factor, and indeed we can bound its contribution to $T(\alpha)$ as
$$E_4 \ll N\cdot e^{ck}(\log J)^{3k}R^{-1/(2\log J)} \ll N\cdot\frac{(c\log J)^{3k}}{e^{0.1\sqrt{\log N}}} \ll \frac{N}{(2k)!\,e^{ck}}.$$
Here we noted that $\zeta$ and its reciprocal are bounded by $\log J$ on the contour, both near the origin and with $t$ near $J$; the final estimate again comes from comparing $k\log\log J+k\log k$ to $\sqrt{\log N}$, where the latter is much bigger since $k\le(\log N)^{4/9}$.
We are left with the estimation of
$$\tilde T(\alpha) = \frac{N}{2\pi i}\int_{L_2}\Big(\operatorname*{res}_{s_1=0}+\operatorname*{res}_{s_1=-s_2}\Big)\,ds_2.$$
We next note that the contribution from the residue at $s_1=-s_2$ is small. For this, we make a circle $C(s_2)$ of radius $\frac1{9\log J}$ about $-s_2$, noting that
$$\operatorname*{res}_{s_1=-s_2} = \frac1{2\pi i}\int_{C(s_2)}G_\alpha(s_1,s_2)\bigg(\frac{\zeta(s_1+s_2+1)}{\zeta(s_1+1)\zeta(s_2+1)}\bigg)^k\frac{R^{s_1+s_2}}{(s_1s_2)^{k+l+1}}\,ds_1.$$
Since we have the crude bound $\frac1{9\log J}\le\frac1{\log H}$, on $C(s_2)$ we can use the previous bound (15) of $|G_\alpha|\ll e^{ck}$, while we have $\zeta(s_1+s_2+1)\ll\log J$. Noting the
bounds $|s_1|\ll|s_2|\ll|s_1|$ and that $1/\big(s\,\zeta(s+1)\big)\ll1$ for our contours, we get
$$\frac{N}{2\pi i}\int_{L_2}\operatorname*{res}_{s_1=-s_2}\,ds_2 \ll N\,e^{ck}(\log J)^k\int_{L_2}\frac{|ds_2|}{|s_2|^{2l+2}} \ll N\cdot(c\log J)^{k+2l+2} \ll N(c\log N)^{k/2+l+1} \ll \frac{N(\log N)^k}{(k+2l+1)!\,e^{ck}},$$
where the last step as usual follows from making the comparison of logarithms: $(k/2)\log\log N+k\log k\le(17k/18)\log\log N\le k\log\log N$, recalling $k\le(\log N)^{4/9}$.
9.7.3. The double residue. Finally we want to estimate
$$\hat T(\alpha) = \frac{N}{2\pi i}\int_{L_2}\operatorname*{res}_{s_1=0}\,ds_2,$$
and for this we write
$$Z_\alpha(s_1,s_2) = G_\alpha(s_1,s_2)\bigg(\frac{(s_1+s_2)\,\zeta(s_1+s_2+1)}{s_1\zeta(s_1+1)\,s_2\zeta(s_2+1)}\bigg)^k = F_\alpha(s_1,s_2)\Big(\frac{s_1+s_2}{s_1s_2}\Big)^k,\tag{16}$$
which is regular (and nonzero) about $(0,0)$. This function has the advantage that the poles come from powers of $s_1$ and $s_2$ rather than from $\zeta$-functions, and this will allow a more direct analysis. We put this into the integral to get
$$\hat T(\alpha) = \frac{N}{2\pi i}\int_{L_2}\operatorname*{res}_{s_1=0}\frac{Z_\alpha(s_1,s_2)\,R^{s_1+s_2}}{(s_1+s_2)^k(s_1s_2)^{l+1}}\,ds_2,$$
and move the integral to the contour $L_2^-$ given by $\frac{-1}{2\log J}+it$. The only residue is at $s_2=0$, and the contributions from the horizontal parts at height $J$ and the integral over $L_2^-$ are seen to be small as before (the first as with $E_3$ since $J$ is big, and the second as with $E_4$ since the $R$-exponent is sufficiently negative). Writing the residues as integrals we get
$$\hat T(\alpha) = N\cdot\operatorname*{res}_{s_2=0}\operatorname*{res}_{s_1=0}\frac{Z_\alpha(s_1,s_2)\,R^{s_1+s_2}}{(s_1+s_2)^k(s_1s_2)^{l+1}} = \frac{N}{(2\pi i)^2}\int_{C_2}\int_{C_1}\frac{Z_\alpha(s_1,s_2)\,R^{s_1+s_2}}{(s_1+s_2)^k(s_1s_2)^{l+1}}\,ds_1\,ds_2,$$
where $C_1$ is a small circle and $C_2$ is twice as big. We substitute $s_1=s$, $s_2=s\xi$ and take $C_3:|\xi|=2$ to get
$$\hat T(\alpha) = \frac{N}{(2\pi i)^2}\int_{C_3}\int_{C_1}Z_\alpha(s,s\xi)\,R^{s(\xi+1)}\cdot\frac{s\,ds\,d\xi}{s^{k+2(l+1)}\,(\xi+1)^k\,\xi^{l+1}}.$$
Now we expand $Z_\alpha$ in a bivariate Taylor series, getting that
$$Z_\alpha(x,y) = \sum_{i=0}^\infty\sum_{j=0}^\infty\frac{Z_\alpha^{(i,j)}(0,0)}{i!\,j!}\,x^iy^j,$$
and by expanding $R^{s(\xi+1)}$ via the exponential function, we get
$$\hat T(\alpha) = \frac{N}{(2\pi i)^2}\sum_{i=0}^\infty\sum_{j=0}^\infty\frac{Z_\alpha^{(i,j)}(0,0)}{i!\,j!}\sum_{d=0}^\infty\frac{(\log R)^d}{d!}\int_{C_3}\int_{C_1}\frac{s^d\,s^{i+j}\,ds}{s^{k+2l+1}}\cdot\frac{(\xi+1)^d\,\xi^j\,d\xi}{(\xi+1)^k\,\xi^{l+1}}.$$
The $s$-integral picks out $d=k+2l-(i+j)$ to give
$$\hat T(\alpha) = \frac{N}{2\pi i}\sum_{i=0}^\infty\sum_{j=0}^\infty\frac{Z_\alpha^{(i,j)}(0,0)}{i!\,j!}\,\frac{(\log R)^{k+2l-(i+j)}}{\big(k+2l-(i+j)\big)!}\int_{C_3}(\xi+1)^{2l-(i+j)}\,\xi^{j-l-1}\,d\xi.$$
By typical methods (for instance, the residue at infinity), the $\xi$-integral is $2\pi i\binom{2l-(i+j)}{l-j}^{\!\star}$, where the star restricts the arguments to have the same sign (else it is zero), and where $\binom{-A}{-B}=(-1)^{A+B}\binom{B-1}{A-1}$ for $A,B>0$.
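The negative-index binomial convention just quoted can be verified directly. A Python check (not part of the original notes), interpreting $\binom{-A}{-B}$ via the symmetry $\binom nk=\binom n{n-k}$, i.e. as the generalized coefficient $\binom{-A}{B-A}$:

```python
from math import comb

def gbinom(x, k):
    # generalized binomial x(x-1)...(x-k+1)/k! for integer k; zero for k < 0
    if k < 0:
        return 0
    num, den = 1, 1
    for i in range(k):
        num *= x - i
        den *= i + 1
    return num // den   # a product of k consecutive integers is divisible by k!

for A in range(1, 9):
    for B in range(A, 9):   # the nonzero cases require B >= A
        assert gbinom(-A, B - A) == (-1) ** (A + B) * comb(B - 1, A - 1)
```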
9.7.4. Lower bounds on $G_\alpha(0,0)$ and bounds on derivatives. We now restrict ourselves to the bi-disc $D$ given by $|s_1|,|s_2|\le\frac1{8\sqrt k\log k}$. From our above bounds (15) on $B_1$ and $B_2$, the primes $p\ge U$ contribute no more than $\frac{ck}{\log H}$ to $\log G_\alpha$ in the larger domain $\eta\le1/\log H$, and so in particular in $D$. For the primes with $p\le U$ we have that $X_p(s_1,s_2)=p^{-s_1}+p^{-s_2}-p^{-s_1-s_2}$ is (in $D$) given by
$$\big(1-s_1\log p\big)+\big(1-s_2\log p\big)-\big(1-(s_1+s_2)\log p\big)+O\big((|s_1|^2+|s_2|^2)(\log p)^2\big),$$
and thus (calculating the $O$-constant) we get $|X_p(s_1,s_2)|\le1+\frac1{3p}$. We have assumed that $\alpha_p\le1-\frac1p$, so that $\alpha_p|X_p(s_1,s_2)|\le1-\frac1{2p}$ and we get
$$\Big|\log\Big(1-\alpha_p\Big(\frac1{p^{s_1}}+\frac1{p^{s_2}}-\frac1{p^{s_1+s_2}}\Big)\Big)\Big| \le \Big|\log\frac1{2p}\Big| \ll \log p.$$
Thus summing this over $p\le U$ gives a contribution to $\log G_\alpha$ that is bounded by $U\ll k$, and so by evaluating at $(0,0)$ we get the claimed lower bound of $G_\alpha(0,0)\gg1/e^{ck}$ on the singular series.
Next we turn to bounding $\partial_1Z_\alpha$ at $(0,0)$, in comparison to $Z_\alpha(0,0)$ itself. The $\zeta$-part of $Z_\alpha$ contributes an amount bounded by $ck$. With $G_\alpha$, we again note that for $p\ge U$ and $|s_1|,|s_2|\le1/\log H$, from the previous estimates for $B_1$ and $B_2$ we have the bound $\big|\log G_\alpha^{\ge U}(s_1,s_2)\big|\ll\frac k{\log H}$. Thus by the Cauchy estimate, the derivative of this with respect to either $s_1$ or $s_2$ is bounded by $\frac k{\log H}\cdot\log H$ for $|s_1|,|s_2|\le\frac1{2\log H}$, and so in particular in $D$.
For the primes $p\le U$, we can compute the derivative directly and get
$$\bigg|\frac{\partial_1F_\alpha^{\le U}}{F_\alpha^{\le U}}(s_1,s_2)\bigg| \ll \sum_{p\le U}\frac{\alpha_p\,\big|p^{-s_1}(1-p^{-s_2})\big|\log p}{\big|1-\alpha_pX_p(s_1,s_2)\big|} \ll \sum_{p\le U}\frac{p\,(\log p)^2}{\sqrt k\,\log k} \ll k^{3/2},$$
while the $\zeta$-quotient contribution to $G_\alpha^{\le U}$ and the $\zeta$-quotient in the definition (16) are bounded in size by $k$. Thus we have that
$$\bigg|\frac{Z_\alpha^{(1,0)}(s_1,s_2)}{Z_\alpha(s_1,s_2)}\bigg| \ll k^{3/2}$$
in the region $|s_1|,|s_2|\le\frac1{8\sqrt k\log k}$, and similarly for the $s_2$-derivative.
We write $Z_\alpha^{(1,0)}=Z_\alpha\cdot\frac{\partial_1Z_\alpha}{Z_\alpha}$, so that the product formula for derivatives gives
$$Z_\alpha^{(m+1,n)} = \sum_{u=0}^m\sum_{v=0}^n\binom mu\binom nv\,Z_\alpha^{(u,v)}\,\Big(\frac{\partial_1Z_\alpha}{Z_\alpha}\Big)^{(m-u,n-v)}.\tag{17}$$
We note that $Z_\alpha^{(m,0)}(0,0)=0$ for $m\ge1$, and similarly when taking derivatives in the second variable. Indeed, by the above formula we need only show that $(\log Z_\alpha)^{(m,0)}$ vanishes at $(0,0)$, and then the result follows by induction. We have
$$\log G_\alpha = \sum_pH_p(s_1,s_2)\quad\text{where}\quad H_p(s_1,s_2) = \sum_{j=1}^\infty\frac{(-1)^j\alpha_p^j}{j}\Big(\frac1{p^{s_1}}+\frac1{p^{s_2}}-\frac1{p^{s_1+s_2}}\Big)^j.$$
Since $\partial_1^mH_p(s_1,s_2)$ vanishes at $(0,0)$ for any $m\ge1$, using the product rule for derivatives gives the same evaluation of $(\log G_\alpha)^{(m,0)}$ at the bi-origin. For the $\zeta$-quotient part of $\log Z_\alpha$, we note it is of the form $V(s_1,s_2)=\log U(s_1+s_2)-\log U(s_1)-\log U(s_2)$ for some function $U$. Expanding $\log U(s)$ in a power series $\sum_lc_ls^l$, we get that the $m$th $s_1$-derivative of $V(s_1,s_2)$ at the bi-origin is $m!\,c_m-m!\,c_m=0$, as desired.
By the Cauchy formula for derivatives, there is some constant $c$ with
$$\bigg|\Big(\frac{\partial_1Z_\alpha}{Z_\alpha}\Big)^{(i,j)}(0,0)\bigg| = \bigg|\frac{i!\,j!}{(2\pi i)^2}\int\!\!\int\frac{\partial_1Z_\alpha}{Z_\alpha}(s_1,s_2)\,\frac{ds_1\,ds_2}{s_1^{i+1}s_2^{j+1}}\bigg| \le ck^{3/2}\cdot i!\cdot j!\cdot\big(8\sqrt k\log k\big)^{i+j},$$
where we integrated over the small circles $|s_1|=|s_2|=\frac1{8\sqrt k\log k}$.
We divide (17) by $Z_\alpha$, and write $Y_\alpha^{(m,n)}=\frac{Z_\alpha^{(m,n)}(0,0)}{Z_\alpha(0,0)}$; upon noting that $v=n$ does not contribute, due to the vanishing of derivatives, the above bound yields
$$\big|Y_\alpha^{(m+1,n)}\big| \le ck^{3/2}\sum_{u=0}^m\sum_{v=0}^{n-1}\binom mu\binom nv\,(m-u)!\,(n-v)!\,\big|Y_\alpha^{(u,v)}\big|\,\big(8\sqrt k\log k\big)^{m+n-u-v}.$$
u=0
When m = 1 we have
√
n
Yα(1,n) ≤ 2ck 3/2 · n! · 8 k log k ,
and by induction, we get (using the result when m ≤ n)
√
n
(18)
Yα(m,n) ≤ (2ck 3/2 )m · m! · 2n n! · 8 k log k ,
as from this assumption we can compute that (the u = m term dominates)
m
n−1
X √
√
n X
m−u
Yα(m+1,n) ≤ ck 3/2 · m! · n! · 8 k log k
8 k log k
(2ck 3/2 )u
2v
u=0
≤ ck
3/2
√
n
· m! · n! · 8 k log k · (m + 1) · 2(2ck 3/2 )m · 2n .
v=0
Returning to the expression
$$\hat T(\alpha) = N\cdot\sum_{i=0}^\infty\sum_{j=0}^\infty\frac{Z_\alpha^{(i,j)}(0,0)}{i!\,j!}\,\frac{(\log R)^{k+2l-(i+j)}}{\big(k+2l-(i+j)\big)!}\binom{2l-(i+j)}{l-j}^{\!\star},$$
the $i=j=0$ term gives the main contribution as
$$\tilde T(\alpha) = N\cdot Z_\alpha(0,0)\cdot\binom{2l}{l}\cdot\frac{(\log R)^{k+2l}}{(k+2l)!}.$$
For $2\le i+j\le2l$ we group according to $i+j=r$, taking $j\le i$ by symmetry, and bound the binomial coefficient by $\binom{2l}{l}$, and then multiply and divide by $Z_\alpha(0,0)$ before using (18) to get a contribution bounded as
$$E_5 \ll N\cdot Z_\alpha(0,0)\binom{2l}{l}\sum_{r=2}^{2l}\sum_{j=1}^{r/2}(2ck^{3/2})^j\big(16\sqrt k\log k\big)^{r-j}\,\frac{(\log R)^{k+2l-r}}{(k+2l-r)!}$$
$$\ll N\cdot Z_\alpha(0,0)\binom{2l}{l}\sum_{r=2}^{2l}\big(32ck^2\log k\big)^{r/2}\,\frac{(\log R)^{k+2l-r}}{(k+2l-r)!}$$
$$\ll N\cdot Z_\alpha(0,0)\binom{2l}{l}\frac{(\log R)^{k+2l}}{(k+2l)!}\sum_{r=2}^{2l}\Bigg(\frac{2k\sqrt{32ck^2\log k}}{\log R}\Bigg)^{r} \ll N\cdot Z_\alpha(0,0)\binom{2l}{l}\frac{(\log R)^{k+2l}}{(k+2l)!}\Bigg(\frac{k^2\sqrt{\log k}}{\log R}\Bigg)^2.$$
Here we pulled out the desired main term, noting that $k+2l\le2k$, and executed the $r$-sum assuming that $k^2\sqrt{\log k}=o(\log R)$.
When $2l\le i+j\le k+l$ we switch the binomial coefficient around and bound it by $\binom{j-l-1}{r-2l-1}\le\binom rl\le r^l/l!\le(6\sqrt k)^l$, due to $l\sim\sqrt k$, and get
$$E_5 \ll N\cdot Z_\alpha(0,0)\cdot(6\sqrt k)^l\sum_{r=2l}^{k+l}\sum_{j=1}^{r/2}(2ck^{3/2})^j\big(16\sqrt k\log k\big)^{r-j}\,\frac{(\log R)^{k+2l-r}}{(k+2l-r)!}$$
$$\ll N\cdot Z_\alpha(0,0)\cdot(6\sqrt k)^l\sum_{r=2l}^{k+l}\big(32ck^2\log k\big)^{r/2}\,\frac{(\log R)^{k+2l-r}}{(k+2l-r)!}$$
$$\ll N\cdot Z_\alpha(0,0)\,\frac{(\log R)^{k+2l}}{(k+2l)!}\,(6\sqrt k)^l\Bigg(\frac{32ck^2\log k}{(\log R)^2}\Bigg)^{l}(2k)^{2l} \ll N\cdot Z_\alpha(0,0)\,\frac{(\log R)^{k+2l}}{(k+2l)!}\cdot\Big(\frac12\Big)^{2l} \ll N\cdot Z_\alpha(0,0)\binom{2l}{l}\frac{(\log R)^{k+2l}}{(k+2l)!}\Bigg(\frac{k^2\sqrt{\log k}}{\log R}\Bigg)^2.$$
The last line notes essentially that $192ck^{9/2}\log k/(\log R)^2\le1/2$, due to the asymptotic $k\sim\dot c(\log N)^{4/9}/(\log\log N)^{2/9}$, and then that $(1/2)^{2l}\le\big(\frac{k^2\sqrt{\log k}}{\log R}\big)^2$ as $l\sim\sqrt k$.
Noting that $Z_\alpha(0,0)=G_\alpha(0,0)$, this gives the desired bound that
$$T(\alpha) = N\cdot G_\alpha(0,0)\cdot\binom{2l}{l}\cdot\frac{(\log R)^{k+2l}}{(k+2l)!}\Bigg(1+O\Big(\frac{k^2\sqrt{\log k}}{\log N}\Big)\Bigg).$$
We can note that the main part of the error term is induced by
$$\frac{Z_\alpha^{(1,1)}(0,0)}{Z_\alpha(0,0)} = -\sum_p\Bigg(\frac{\alpha_p(\log p)^2}{1-\alpha_p}-\frac{kp\,(\log p)^2}{(p-1)^2}\Bigg)-k(\gamma^2+2\gamma_1),$$
where $\zeta(s)=\frac1{s-1}+\gamma-\gamma_1(s-1)+O\big((s-1)^2\big)$, and the sum will indeed be of size $k^2\log k$ in the case when $1-\alpha_p\approx1/p$ for most primes $p\le k$.
10. Bombieri and the asymptotic sieve
We now discuss the limitations of sieve methods. For instance, if we only use information about congruence sums, we cannot detect information about primes. This is one instance of the parity problem. It has been overcome in some situations through the use of bilinear forms (similar to Vaughan's identity) but, as Selberg pointed out long ago, it cannot be overcome in general. We describe some of Bombieri's work in this genre.
10.1. Limits of sieves. Suppose that we have a nonnegative sequence $(a_n)$ with summatory function $A(x)$, and there is a multiplicative function $g$ such that the congruence sums satisfy $A_d(x)=g(d)A(x)+r_d(x)$ with
$$\sum_{d\le x^{1-\epsilon}}|r_d(x)| \ll_{B,\epsilon} \frac{A(x)}{(\log x)^B}$$
for all $B,\epsilon>0$. This is about as strong a statement as could be expected in general. Can we then get an asymptotic for $\sum_{p\le x}a_p$?
10.1.1. Selberg's example. The answer to this is in the negative. Let $\lambda(n)$ be the Liouville function, which is totally multiplicative and $-1$ on all primes. We have
$$F(s) = \sum_{n=1}^\infty\frac{\lambda(n)}{n^s} = \prod_p\Big(1-\frac1{p^s}+\frac1{p^{2s}}-+\cdots\Big) = \prod_p\Big(1+\frac1{p^s}\Big)^{-1} = \frac{\zeta(2s)}{\zeta(s)}.$$
By the zero-free region for $\zeta(s)$, as with the prime number theorem we obtain
$$\sum_{n\le x}\lambda(n) \ll \frac{x}{e^{c\sqrt{\log x}}}.$$
Now consider the sequence $a_n=1+\lambda(n)$, which is clearly nonnegative. We have
$$A_d(x) = \sum_{\substack{n\le x\\ n\equiv0(d)}}\big(1+\lambda(n)\big) = \sum_{m\le x/d}\big(1+\lambda(m)\lambda(d)\big) = \frac xd+O(1)+\lambda(d)\sum_{m\le x/d}\lambda(m) = \frac xd+O\Bigg(\frac{x/d}{e^{c\sqrt{\log x/d}}}\Bigg).$$
So with $g(d)=1/d$ for all $d$, for all $B,\epsilon>0$ we get
$$\sum_{d\le x^{1-\epsilon}}|r_d(x)| \ll \sum_{d\le x^{1-\epsilon}}\frac xd\cdot\frac1{e^{c\sqrt{\log x/d}}} \ll \frac{x\log x}{e^{c\sqrt{\epsilon\log x}}} \ll_{B,\epsilon} \frac{x}{(\log x)^B}.$$
However, it is clear that $\sum_pa_p=0$. Contrariwise, by taking $a_n\equiv1$, we get a sequence with the same $g$ and the same remainder estimate, but with $\sum_{p\le x}a_p\sim x/\log x$.
This exhibits the parity problem. As Bombieri's work will show, we can get results for sums supported on (say) numbers with either one or two prime factors, or five or six, etc., but information solely from congruence sums cannot distinguish between an even and an odd number of prime factors.
10.2. Preliminaries. We define a generalisation of the von Mangoldt function, given by $\Lambda_k=\mu\star L^k$ where $L(n)=\log n$, so that $L^k=1\star\Lambda_k$ and
$$\Lambda_k(n) = \sum_{d\mid n}\mu(d)\Big(\log\frac nd\Big)^k.$$
Thus we have that $\Lambda_1=\Lambda$, and we can note that
$$1\star\Lambda_{k+1} = L^{k+1} = L\cdot L^k = L\cdot(1\star\Lambda_k) = L\star\Lambda_k+1\star L\Lambda_k = 1\star\Lambda\star\Lambda_k+1\star L\Lambda_k,$$
and then, by grouping and convolving with $\mu$ on the left, we obtain the recurrence formula $\Lambda_{k+1}=L\Lambda_k+\Lambda\star\Lambda_k$. This gives that $\Lambda_k$ is nonnegative, and thus from $L^k=1\star\Lambda_k$ we get that $0\le\Lambda_k\le L^k$. We note that $\Lambda_k$ is supported on numbers with at most $k$ distinct prime factors. Finally, the additivity of the logarithm and the binomial theorem together imply
$$\Lambda_k(mn) = \sum_{j=0}^k\binom kj\Lambda_j(m)\Lambda_{k-j}(n)\qquad\text{when }\gcd(m,n)=1.\tag{19}$$
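These identities are cheap to verify by brute force. A Python check (not part of the original notes) of $\Lambda_k=\mu\star L^k$, the recurrence $\Lambda_{k+1}=L\Lambda_k+\Lambda\star\Lambda_k$, and the support on numbers with at most $k$ distinct prime factors:

```python
from math import log

def mobius(n):
    mu, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0        # a square divides n
            mu = -mu
        p += 1
    return -mu if n > 1 else mu

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def Lam(k, n):
    # generalized von Mangoldt function: Lambda_k = mu * L^k
    return sum(mobius(d) * log(n // d) ** k for d in divisors(n))

def omega(n):
    return sum(1 for p in range(2, n + 1)
               if n % p == 0 and all(p % q for q in range(2, p)))

# recurrence Lambda_{k+1}(n) = log(n) Lambda_k(n) + (Lambda * Lambda_k)(n)
for n in range(1, 60):
    for k in range(4):
        conv = sum(Lam(1, d) * Lam(k, n // d) for d in divisors(n))
        assert abs(Lam(k + 1, n) - (log(n) * Lam(k, n) + conv)) < 1e-6

# Lambda_2 vanishes when n has more than 2 distinct prime factors
assert all(abs(Lam(2, n)) < 1e-8 for n in range(2, 200) if omega(n) > 2)
```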
It will be technically convenient to have a slightly modified version of this, namely
$$\Lambda_k^x(n) = \sum_{d\mid n}\mu(d)\Big(\log\frac xd\Big)^k = \sum_{d\mid n}\mu(d)\Big(\log\frac xn+\log\frac nd\Big)^k = \sum_{j=0}^k\binom kj\Lambda_j(n)\Big(\log\frac xn\Big)^{k-j}.$$
Taking only the $j=k$ term gives a lower bound for $n\le x$, while an upper bound follows from iterating the bound $L\Lambda_k\le\Lambda_{k+1}$ to get $L^{k-j}\Lambda_j\le\Lambda_k$ for $j\le k$, so
$$\Lambda_k^x(n)\Big(\frac{\log n}{\log x}\Big)^k \le \frac{\Lambda_k(n)}{(\log x)^k}\sum_{j=0}^k\binom kj(\log n)^j\Big(\log\frac xn\Big)^{k-j} = \Lambda_k(n) \le \Lambda_k^x(n).\tag{20}$$
We also have that $\Lambda_k^x(n)\le(\log x)^k$ for $n\le x$, which easily follows by using the bound $\Lambda_j(n)\le(\log n)^j$ in the definition of $\Lambda_k^x(n)$ and applying the binomial theorem.
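The sandwich (20) and the trivial bound can be tested numerically. A Python sketch (not part of the original notes; $x=300$ is an arbitrary test value):

```python
from math import log

def mobius(n):
    mu, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            mu = -mu
        p += 1
    return -mu if n > 1 else mu

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def Lam(k, n):
    # Lambda_k(n) = sum_{d | n} mu(d) (log n/d)^k
    return sum(mobius(d) * log(n // d) ** k for d in divisors(n))

def Lam_x(k, n, x):
    # truncated variant: Lambda_k^x(n) = sum_{d | n} mu(d) (log x/d)^k
    return sum(mobius(d) * log(x / d) ** k for d in divisors(n))

x, eps = 300.0, 1e-8
for n in range(2, 301):
    for k in (1, 2, 3):
        upper = Lam_x(k, n, x)
        lower = upper * (log(n) / log(x)) ** k
        assert lower <= Lam(k, n) + eps <= upper + 2 * eps   # the sandwich (20)
        assert upper <= log(x) ** k + eps                    # the trivial bound
```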
10.2.1. More inequalities for the modified generalised von Mangoldt functions. We next generalise (19) for $\Lambda_k^x$. Assuming that $\gcd(m,n)=1$, we can write any divisor $d\mid mn$ uniquely as $d=ab$ with $a\mid m$ and $b\mid n$. Again from additivity of the logarithm we have $\log\frac xd=\log\frac x{bm}+\log\frac ma$, and so we get
$$\Lambda_k^x(mn) = \sum_{d\mid mn}\mu(d)\Big(\log\frac xd\Big)^k = \sum_{a\mid m}\mu(a)\sum_{b\mid n}\mu(b)\Big(\log\frac x{bm}+\log\frac ma\Big)^k = \sum_{j=0}^k\binom kj\sum_{a\mid m}\mu(a)\Big(\log\frac ma\Big)^{k-j}\sum_{b\mid n}\mu(b)\Big(\log\frac{x/m}b\Big)^j = \sum_{j=0}^k\binom kj\Lambda_{k-j}(m)\Lambda_j^{x/m}(n).$$
Now note that when $m>1$ we have $\Lambda_0(m)=0$, and so we can eliminate $j=k$ from the $j$-sum. We use $\Lambda_{k-j}\le L^{k-j}$ and $\Lambda_j^{x/m}\le\Lambda_j^x$, and assume that $mn\le x$ so that $\log m\le\log\frac xn$. Expanding the definition of $\Lambda_j^x$ then (sloppily) gives us that
$$\Lambda_k^x(mn) \le k\sum_{j=0}^{k-1}\binom{k-1}j(\log m)^{k-j}\,\Lambda_j^x(n) \le k\log m\sum_{j=0}^{k-1}\binom{k-1}j\Big(\log\frac xn\Big)^{k-j-1}\sum_{i=0}^j\binom ji\Lambda_i(n)\Big(\log\frac xn\Big)^{j-i}$$
$$\le k\log m\sum_{i=0}^{k-1}\Lambda_i(n)\Big(\log\frac xn\Big)^{k-1-i}\sum_{j=i}^{k-1}\binom{k-1}j\binom ji \le k2^k\log m\sum_{i=0}^{k-1}\binom{k-1}i\Lambda_i(n)\Big(\log\frac xn\Big)^{k-1-i} = k2^k\,\Lambda_{k-1}^x(n)\log m.\tag{21}$$
Since this is true when $mn\le x$, $\gcd(m,n)=1$ and $m>1$, we can induct on the number of prime divisors. For squarefree $n=\prod_ip_i$ with $r$ prime factors we get that $\Lambda_k^x\big(\prod_ip_i\big)\ll_k(\log x)^{k-r}\prod_i\log p_i$. Here $(\log x)^{k-r}$ is an upper bound for $\Lambda_{k-r}^x(1)$. Using that $\Lambda_k^x(n)$ is supported on integers with no more than $k$ prime divisors, for a nonnegative multiplicative function $g$ supported on squarefree integers we get
$$\sum_{n\le x}g(n)\Lambda_k^x(n) \ll_k \Big(\log x+\sum_{p\le x}g(p)\log p\Big)^k.\tag{22}$$
10.2.2. Some more sieve facts. We continue with some comments like those from Section 3.6, as we are interested in monotonicity of sieves. Let $(\lambda_d)$ be an upper bound sieve of level $z$, and $g$ a nonnegative multiplicative function supported on squarefree integers. Writing $\sigma=1\star\lambda$ so that $\lambda=\mu\star\sigma$, for all $q$ we have
$$\sum_{\gcd(d,q)=1}\lambda_dg(d) = \sum_{\substack{\gcd(a,b)=1\\ \gcd(ab,q)=1}}\mu(a)\sigma(b)g(a)g(b) = \sum_{\gcd(b,q)=1}\sigma(b)g(b)\prod_{p\nmid bq}\big(1-g(p)\big) = \frac{\hat g(q)}{g(q)}\sum_{\gcd(b,q)=1}\sigma(b)g(b)\prod_{p\nmid b}\big(1-g(p)\big) \le \frac{\hat g(q)}{g(q)}\sum_d\lambda_dg(d),$$
where $\hat g(p)=\frac{g(p)}{1-g(p)}$ as before (assuming $g(p)<1$), and the inequality comes from dropping the gcd condition on $b$ (exploiting the positivity of $\sigma$), which puts us in the $q=1$ case, and the argument is then reversed. By subtracting to get the complementary sum, for any prime $p\le z$ we get
$$\sum_{d\equiv0(p)}\lambda_dg(d) \ge \Big(1-\frac{\hat g(p)}{g(p)}\Big)\sum_d\lambda_dg(d) = -\hat g(p)\sum_d\lambda_dg(d).$$
Let $\alpha$ be multiplicative and $\beta$ additive, both nonnegative, and set
$$\eta(d) = \sum_{q\mid d}\alpha(q)\beta(q) = \sum_{q\mid d}\alpha(q)\sum_{p\mid q}\beta(p) = \sum_{p\mid d}\beta(p)\sum_{\substack{q\mid d\\ q\equiv0(p)}}\alpha(q) = \sum_{p\mid d}\beta(p)\alpha(p)\prod_{l\mid(d/p)}\big(1+\alpha(l)\big) = \sum_{p\mid d}\frac{\beta(p)\alpha(p)}{1+\alpha(p)}\cdot\prod_{l\mid d}\big(1+\alpha(l)\big),$$
where $l$ runs over primes.
We let $f(p)=g(p)\big(1+\alpha(p)\big)$, and make the assumption that $f(p)<1$ for all $p\le z$. We then use the above inequality with $f$ for each prime $p\le z$, and weight each contribution by $\alpha(p)\beta(p)\frac{g(p)}{f(p)}$ before summing, to get
$$\sum_{p\le z}\alpha(p)\beta(p)\frac{g(p)}{f(p)}\sum_{d\equiv0(p)}\lambda_df(d) \ge -\sum_{p\le z}\alpha(p)\beta(p)g(p)\,\frac{\hat f(p)}{f(p)}\sum_d\lambda_df(d).$$
We now proceed to manipulate the left side of this, while for the right side we simply define $\psi(z)=\sum_{p\le z}\frac{\alpha(p)\beta(p)g(p)}{1-f(p)}$. Noting that $\frac{\hat f(p)}{f(p)}=\frac1{1-f(p)}$, we get the inequality
$$-\psi(z)\sum_{d\le z}\lambda_df(d) \le \sum_{p\le z}\alpha(p)\beta(p)\frac{g(p)}{f(p)}\sum_{d\equiv0(p)}\lambda_df(d) = \sum_{d\le z}\lambda_df(d)\sum_{p\mid d}\alpha(p)\beta(p)\frac{g(p)}{f(p)} = \sum_{d\le z}\lambda_dg(d)\,\frac{f(d)}{g(d)}\sum_{p\mid d}\frac{\alpha(p)\beta(p)}{1+\alpha(p)} = \sum_{d\le z}\lambda_dg(d)\cdot\eta(d).\tag{23}$$
10.2.3. Multiplicative functions. The following subsections can get quite technical. In the sequel, we will assume an asymptotic (for $u\ge1$) on a nonnegative multiplicative function $g$ such as
$$\sum^\flat_{d\le u}g(d) = c_1\log u+c_2+O\Big(\frac1{(\log 9u)^B}\Big)\tag{24}$$
(for some large $B$), where the $\flat$ restricts to squarefree integers. Indeed, we shall often simply assume that $g$ is supported on squarefree integers. Also, it is nice to have $0\le g(d)<1$. Since the sum is asymptotic to a linear function in $\log u$, we are in the situation of the linear sieve. As with Theorem 2.1, we have that
$$c_1 = \prod_p\big(1+g(p)\big)\Big(1-\frac1p\Big),$$
which we want to be positive. Although the error term we assume in (24) is significantly stronger than that in Theorem 2.1, it can often be established in practice.
10.2.4. Coprimality restrictions. We show that (24) implies similar asymptotics when restricted by a coprimality condition. Indeed, for $q\ge1$ we shall show
$$\sum^\flat_{\substack{m\le x\\ \gcd(m,q)=1}}g(m) = \alpha_q\big(c_1(\log x+\beta_q)+c_2\big)+O\Big(\frac{\tau(q)}{(\log 9x)^B}\Big),\tag{25}$$
where
$$\alpha_q = \prod_{p\mid q}\frac1{1+g(p)}\qquad\text{and}\qquad \beta_q = \sum_{p\mid q}\frac{g(p)\log p}{1+g(p)}.$$
To show this, we write
$$P(s) = \prod_{p\mid q}\Big(1+\frac{g(p)}{p^s}\Big)^{-1} = \sum_n\frac{u_n}{n^s}\qquad\text{so that}\qquad \sum_{\gcd(m,q)=1}\frac{g(m)}{m^s} = P(s)\sum_m\frac{g(m)}{m^s},$$
implying that
$$\sum_{\substack{m\le x\\ \gcd(m,q)=1}}g(m) = \sum_{d\le x}u_d\sum_{m\le x/d}g(m) = \sum_{d\le x}u_d\Big(c_1\log\frac xd+c_2+O\Big(\frac1{(\log 9x/d)^B}\Big)\Big).$$
We also introduce
$$Q(s) = \prod_{p\mid q}\Big(1-\frac{g(p)}{p^s}\Big)^{-1} = \sum_n\frac{|u_n|}{n^s}\qquad\text{and}\qquad R(s) = -\frac{Q'}{Q}(s) = \sum_{p\mid q}\sum_{l=1}^\infty\frac{g(p)^l\log p}{p^{sl}}.$$
The main terms are estimated (with Rankin's trick for $d\ge x$) as
$$\sum_{d\le x}u_d = P(0)-\sum_{d\ge x}u_d = P(0)+O\Big(\sum_{d\ge x}\frac{|u_d|(\log d)^B}{(\log x)^B}\Big) = \alpha_q+O\Big(\frac{Q^{(B)}(0)}{(\log 9x)^B}\Big)$$
and
$$-\sum_{d\le x}u_d\log d = P'(0)+O\Big(\sum_{d\ge x}\frac{|u_d|(\log d)^{B+1}}{(\log x)^B}\Big) = P(0)\beta_q+O\Big(\frac{Q^{(B+1)}(0)}{(\log 9x)^B}\Big).$$
The last simplifies since $P(0)=\alpha_q$. Turning to an estimate of the error term, we
note that $(\log 9x)^B=\big(\log\frac{9x}d+\log d\big)^B\ll\big(\log\frac{9x}d\big)^B(\log 3d)^B$, so we get a bound of
$$\sum_{d\le x}\frac{|u_d|(\log 3d)^B}{(\log 9x)^B} = \frac1{(\log 9x)^B}\sum_{j=0}^B\binom Bj(\log 3)^{B-j}\sum_{d\le x}|u_d|(\log d)^j \ll \frac{\sum_j\binom Bj\,Q^{(j)}(0)}{(\log 9x)^B}.$$
Differentiating the equation $-Q'(s)=Q(s)R(s)$ repeatedly ($j-1$ times) gives
$$-Q^{(j)}(0) = \sum_{k=0}^{j-1}\binom{j-1}kQ^{(k)}(0)\,R^{(j-1-k)}(0) \ll_B \max_{k<j}\big|Q^{(k)}(0)\big|\,\sum_{p\mid q}\sum_{l=1}^\infty g(p)^l(2l\log p)^j.$$
The bound (24) implies that $g(p)\ll1/(\log p)^B$, and combined with $g(p)<1$ this gives that the last sum is bounded uniformly in $p$. So we get $|Q^{(j)}(0)|\ll_jQ(0)\,\omega(q)^j$ by induction, which gives the total error to be bounded as
$$\ll_B \frac{Q(0)\,\omega(q)^{B+1}}{(\log 9x)^B} \ll \frac{\alpha_q\tau(q)}{(\log 9x)^B},$$
where we noted that
$$Q(0)\cdot\omega(q)^{B+1} \ll \prod_{p\mid q}\frac1{1-g(p)}\cdot(4/3)^{\omega(q)}B^B \ll_{B,g} \prod_{p\mid q}\frac2{1+g(p)} = \alpha_q\tau(q),$$
since, again from (24), there are only finitely many primes with $\frac{1+g(p)}{1-g(p)}\ge3/2$.
10.2.5. Sums over primes. We now show that (24) implies the expected asymptotic
$$\sum_{p\le u}g(p)\log p = \log u+c+O\Big(\frac{\log 9u}{(\log 9u)^B}\Big)\tag{26}$$
for some constant $c$. To see this, we can use Tchebyshev's trick with $ng(n)$ to note that
$$\sum_{n\le x}g(n)\cdot n\log n = \sum_{mp\le x}g(mp)\cdot mp\log p = \sum_{p\le x}g(p)\,p\log p\sum_{\substack{m\le x/p\\ \gcd(m,p)=1}}mg(m).$$
By partial summation of (24) we have the asymptotic
$$\sum_{n\le x}g(n)\cdot n\log n = c_1x(\log x-1)+O\Big(\frac{x\log x}{(\log 9x)^B}\Big),$$
and by partial summation of (25) we have
$$\sum_{\substack{n\le x\\ \gcd(n,p)=1}}n\,g(n) = \frac{c_1}{1+g(p)}\,x+O\Big(\frac{x}{(\log 9x)^B}\Big).$$
Plugging this into the previous display, we get
$$c_1x\log x-c_1x = c_1x\sum_{p\le x}\frac{g(p)\log p}{1+g(p)}+O\Big(\frac{x\log x}{(\log 9x)^B}\Big).$$
To obtain (26), we use $g(p)<1$ and $g(p)\ll1/(\log p)^B$ to note that the sum here is
$$\sum_{p\le x}g(p)\log p+\sum_{p\le x}\sum_{k\ge2}(-1)^kg(p)^k\log p = \sum_{p\le x}g(p)\log p+\tilde c+O\Big(\frac{\log x}{(\log 9x)^B}\Big).$$
10.2.6. Sums over almost primes. Furthermore, once we have this relation for sums over primes, we can derive a relation for sums over almost primes, namely that
$$\sum_{n\le u}g(n)\Lambda_k(n) = (\log u)^k+c\,k(\log u)^{k-1}+O\big((\log u)^{k-2+1/(B-1)}\big).\tag{27}$$
Writing $n=md$, by the recurrence for $\Lambda_{k+1}$ we have
$$P_{k+1}(x) = \sum_{n\le x}g(n)\Lambda_{k+1}(n) = \sum_{n\le x}g(n)\Lambda_k(n)\log n+\sum_{m\le x}g(m)\Lambda_k(m)\sum_{\substack{d\le x/m\\ \gcd(m,d)=1}}g(d)\Lambda(d).$$
The inner sum is estimated from (26), and by putting this back into the above and renaming $m\to n$, the logarithm simplifies and we get
$$P_{k+1}(x) = \sum_{n\le x}g(n)\Lambda_k(n)\Bigg(\log x+c+O\Big(\sum_{p\mid n}g(p)\log p\Big)+O\Big(\frac{\log(9x/n)}{(\log 9x/n)^B}\Big)\Bigg).$$
We bound the contribution of the first error term as
$$\sum_{p\le x}g(p)\log p\sum_{m\le x/p}g(pm)\Lambda_k(pm) \ll_k \sum_{p\le x}g(p)^2\,\Lambda_k(p)\log p\sum_{m\le x}g(m)\Lambda_{k-1}(m) \ll \sum_{p\le x}g(p)^2(\log p)^{k+1}\sum_{m\le x}g(m)\Lambda_{k-1}(m) \ll (\log x)^{k-1},$$
where we used (19) to note that $\Lambda_k(pm)\ll_k\Lambda_k(p)\Lambda_{k-1}(m)$ as $\gcd(p,m)=1$ (taking care when $k=1$), and in the last step used induction before noting that the $p$-sum converges. Indeed, returning to the main term, induction now gives (27) upon
taking $y$ with $x/y=e^{(\log x)^{1/(B-1)}}$, so that $\frac{\log(9x/y)}{(\log 9x/y)^B}\ll\frac1{\log x}$, and noting that then
$$\sum_{n\le x}g(n)\Lambda_k(n)\,\frac{\log(9x/n)}{(\log 9x/n)^B} \ll \frac1{\log x}\sum_{n\le y}g(n)\Lambda_k(n)+\frac{\log(9x/y)}{(\log 9x/y)^B}\sum_{y\le n\le x}g(n)\Lambda_k(n),$$
and this is bounded by $(\log x)^{k-1}+(\log x)^{k-1}\log\frac xy \ll (\log x)^{k-1+1/(B-1)}$.
10.3. Statement of results. The first main result of Bombieri is the following:
Proposition 10.1. Suppose that (a_n) is a sequence of nonnegative real numbers supported on squarefree integers in [1, x]. Suppose there is some multiplicative function g and some constants c_1 > 0, B > 20, and c_2 such that we have the estimate A_d(x) = g(d)A(x) + r_d(x) with
$$\sum^{\flat}_{d\le u} g(d) = c_1\log u + c_2 + O\!\left(\frac{1}{(\log 9u)^B}\right)$$
for all u with 1 ≤ u ≤ x and, letting D ≥ x^{1−1/2k} ≥ x^{3/4} be a parameter,
$$\sum^{\flat}_{d\le D}\tau_5(d)|r_d(x)| \ll \frac{A(x)}{(\log x)^2}.$$
Supposing a crude bound like Σ_{n≤D} a_n ≪ A(x)/(log x)², for k ≥ 2 we have
$$(28)\qquad S_k(x) = \sum_{n\le x} a_n\Lambda_k(n) = Hk\cdot A(x)(\log x)^{k-1}\left(1 + O_{g,k}\!\left(\frac{\log(x/D)}{\log x}\right)\right),$$
where
$$H = \prod_p\big(1-g(p)\big)\Big(1-\frac1p\Big)^{-1}$$
[this corresponds to 1/c_ĝ in previous notation].
The condition that the sequence be supported on squarefree integers is technically convenient, and will be removed below. As for the condition on the remainder term, the thrust of Bombieri's final result is that D can be taken as x^{1−ǫ} for any ǫ > 0, so that the remainder term is quite well-behaved; the extra τ_5 in the sum should only perturb our estimate by a power of logarithm, which does not bother us much. The third condition just says that a_n is not too spiky. The condition that (a_n) is supported on [1, x] is commonly denoted the "local sieve", and allows us to ignore moduli larger than x without comment.
We shall first approximate Λ_k with Λ^x_k, and here the crude bound implies that the induced error is small. The term (log x/d)^k from the convolution for Λ^x_k is small (when k ≥ 2) for d near x, so the large moduli (larger than D²/x) contribute little. The main term is then computed via congruence sums for the smaller moduli.
A consequence of Proposition 10.1 is the Selberg relation; taking a_n = 1 for all n with 1 ≤ n ≤ x and k = 2, we get
$$\sum_{p\le x}(\log p)^2 + \sum_{p\le x}\sum_{q\le x/p}\log p\log q = \sum_{n\le x}\Lambda_2(n) \sim 2x\log x,$$
from which the prime number theorem can be obtained via elementary means.
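The Selberg relation lends itself to a quick numerical sanity check, again assuming the convolution definition Λ_2 = μ ⋆ (log)²; convergence is slow, since the secondary term is of size x:

```python
from math import log

def mobius_table(x):
    # μ(n) for n ≤ x by a sieve: flip sign at each prime, zero out at squares
    mu = [1] * (x + 1)
    is_prime = [True] * (x + 1)
    for p in range(2, x + 1):
        if is_prime[p]:
            for n in range(2 * p, x + 1, p):
                is_prime[n] = False
            for n in range(p, x + 1, p):
                mu[n] = -mu[n]
            for n in range(p * p, x + 1, p * p):
                mu[n] = 0
    return mu

def selberg_sum(x):
    # Σ_{n≤x} Λ2(n) with Λ2(n) = Σ_{d|n} μ(d) (log n/d)^2
    mu = mobius_table(x)
    total = 0.0
    for d in range(1, x + 1):
        if mu[d]:
            total += mu[d] * sum(log(n / d) ** 2 for n in range(d, x + 1, d))
    return total

x = 30_000
ratio = selberg_sum(x) / (2 * x * log(x))
```

At x = 30000 the ratio is already fairly close to 1, with the remaining deficit explained by the O(x) secondary term.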
10.4. First transformations. By definition we have S_k(x) = Σ_n a_n Λ_k(n). We define S^x_k(x) = Σ_n a_n Λ^x_k(n), and from (20) we get
$$S_k^x(x) \ge S_k(x) \ge \sum_{n\le x} a_n\Lambda_k^x(n)\Big(\frac{\log n}{\log x}\Big)^k \ge \Big(\frac{\log D}{\log x}\Big)^k\sum_{D\le n\le x} a_n\Lambda_k^x(n) + O\bigg(\sum_{n\le D} a_n(\log n)^k\bigg).$$
We extend the first sum to n ≤ x, and use the crude bound on the errors to get
$$S_k^x(x) \ge S_k(x) \ge (1-\epsilon)^k S_k^x(x) + O\!\left((\log x)^k\,\frac{A(x)}{(\log x)^2}\right),$$
where ǫ = log(x/D)/log x. So if we can prove the asymptotic formula (28) for S^x_k(x), then we will have |S_k(x) − S^x_k(x)| ≪ kǫ S^x_k(x), and so (28) will also be true for S_k(x).
10.5. The introduction of an upper-bound sieve. The next technical reduction is quite important. We intend to split the convolution for Λ^x_k(n) based on whether a divisor d is bigger than D²/x. However, this splitting forgets the information that Λ^x_k(n) is supported on integers with no more than k distinct prime factors. Bombieri introduces an upper-bound sieve λ_e of level z, where we will have z = x/D in the end. In particular, we have that
$$\lambda_1 = 1,\qquad |\lambda_e| \le \tau_3(e),\qquad \lambda_e = 0\ \text{ for } e > z,$$
(we could perhaps get |λ_e| ≤ 1 by using a combinatorial sieve) and we have that
$$\sigma_n = (1\star\lambda)(n) = \sum_{e|n}\lambda_e \ge 0\qquad\text{for all } n.$$
Let P_z = ∏_{p≤z} p and note σ_n = 1 when gcd(n, P_z) = 1 (only e = 1 contributes).
Since z ≤ x^{1/2k}, this is often true for things in the support of Λ^x_k(n). We get
$$\hat S_k^x(x) = \sum_{n\le x} a_n\sigma_n\Lambda_k^x(n) = \sum_{n\le x} a_n\Lambda_k^x(n) + O\!\left(\sum_{\substack{n\le x\\ \gcd(n,P_z)>1}} a_n\big(1+\sigma_n\big)\Lambda_k^x(n)\right).$$
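One concrete realisation of such weights (purely illustrative; Bombieri's actual λ_e are chosen in §10.5.1 below) is Selberg's Λ² construction: for any ρ_d supported on d ≤ √z with ρ_1 = 1, setting λ_e = Σ_{lcm(d1,d2)=e} ρ_{d1}ρ_{d2} gives σ_n = (Σ_{d|n} ρ_d)² ≥ 0. A sketch with the hypothetical choice ρ_d = μ(d), verifying the stated properties:

```python
from math import gcd, isqrt

def mobius(n):
    # μ(n): 0 if squarefull, else (-1)^(number of prime factors)
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

z = 30
root = isqrt(z)
rho = {d: mobius(d) for d in range(1, root + 1)}  # ρ_1 = 1, support d ≤ √z

# λ_e = Σ_{lcm(d1,d2)=e} ρ_{d1} ρ_{d2}, so that Σ_{e|n} λ_e = (Σ_{d|n} ρ_d)² ≥ 0
lam = {}
for d1, r1 in rho.items():
    for d2, r2 in rho.items():
        e = d1 * d2 // gcd(d1, d2)
        lam[e] = lam.get(e, 0) + r1 * r2

def sigma(n):
    return sum(c for e, c in lam.items() if n % e == 0)

def tau3(e):
    # τ3(e) = number of ordered factorisations e = a·b·c
    return sum(1 for a in range(1, e + 1) if e % a == 0
                 for b in range(1, e // a + 1) if (e // a) % b == 0)

Pz = 1
for p in range(2, z + 1):
    if all(p % q for q in range(2, p)):
        Pz *= p
```

All the properties are exact integer facts here: λ_1 = 1, the level is z (lcm of two integers ≤ √z is ≤ z), |λ_e| ≤ τ_3(e), σ_n ≥ 0, and σ_n = 1 whenever gcd(n, P_z) = 1.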
Due to the fact that |λ_e| ≤ τ_3(e), we get that |σ_n| ≤ 4^k for integers n with at most k prime factors. Noting that Λ^x_k(n) is supported on such integers, we break the sum at D to get a bound for the error term of
$$E_1 \ll 4^k\sum_{\substack{n\le x\\ \gcd(n,P_z)>1}} a_n\Lambda_k^x(n) \ll \sum_{\substack{D\le n\le x\\ \gcd(n,P_z)>1}} a_n\Lambda_k^x(n) + (\log x)^k\sum_{n\le D} a_n.$$
The second sum is small by the crude bound. By writing n = mp where p ≥ n^{1/k} is the largest prime factor of n and m ≤ n^{1−1/k} ≤ x^{1−1/k}, by (21) we get that the remaining part of the error term is bounded by
$$\hat E_1 \ll \sum_{\substack{m\le x^{1-1/k}\\ \gcd(m,P_z)>1}}\ \sum_{\substack{D/m\le p\le x/m\\ \gcd(p,m)=1}} a_{mp}\Lambda_k^x(mp) \ll \sum_{\substack{m\le x^{1-1/k}\\ \gcd(m,P_z)>1}} k2^k\,\Lambda_{k-1}^x(m)\sum_{\substack{D/m\le p\le x/m\\ \gcd(p,m)=1}} a_{mp}\log p.$$
Estimating log p ≤ log x, we now use a sieve inside a sieve (as it were) on the sum over p. That is, we let (b_l) = (a_{lm}), and note that for primes not dividing m, the distribution function g is the same for ~a and ~b. We write y = D/m and then estimate by Theorem 4.1 (Selberg's upper bound sieve) that
$$\sum_{\substack{p\le x/m\\ \gcd(p,P_y)=1}} a_{mp} \le \sum_{\substack{l\le x/m\\ \gcd(l,P_y)=1}} b_l \ll B(x)\prod_{\substack{p\le\sqrt y\\ p\nmid m}}\big(1-g(p)\big) + \sum^{\flat}_{l\le y}\tau_3(l)|r_{ml}(x)|.$$
Since B(x) = A_m(x), upon accounting for primes p|m the error estimate becomes
$$E_1 \ll \sum_{\substack{m\le x^{1-1/k}\\ \gcd(m,P_z)>1}}\Lambda_{k-1}^x(m)\log x\left(\frac{\hat g(m)}{g(m)}\,\frac{A_m(x)}{\log D/m} + \sum^{\flat}_{l\le D/m}\tau_3(l)|r_{ml}(x)|\right).$$
We note D/m ≥ x^{1/2k} so that log D/m ≫_k log x, and ĝ(m)/g(m) ≪ τ(m) as there are finitely many p with ĝ(p) ≥ 2g(p). Using A_m(x) = g(m)A(x) + r_m(x) we then get
$$E_1 \ll \sum_{\substack{m\le x^{1-1/k}\\ \gcd(m,P_z)>1}}\Lambda_{k-1}^x(m)\Big(\hat g(m)A(x) + \tau(m)|r_m(x)|\Big) + (\log x)^k\sum^{\flat}_{d\le D}\tau_4(d)|r_d(x)|.$$
The remainder terms are bounded by A(x)(log x)^{k−2}, and in the ĝ-sum we write m = lp where p ≤ z is prime, and use (21) and then (22) and (26) to get
$$\sum_{\substack{m\le x^{1-1/k}\\ \gcd(m,P_z)>1}}\hat g(m)\Lambda_{k-1}^x(m) \ll_k \sum_{p\le z}\hat g(p)\log p\sum_{l\le x}\hat g(l)\Lambda_{k-2}^x(l) \ll (\log z)(\log x)^{k-2}.$$
Since log z = ǫ log x, this gives a total error estimate of
$$\hat S_k^x(x) = S_k^x(x) + O\big(\epsilon A(x)(\log x)^{k-1}\big).$$
10.5.1. The choice of sieve. Let f(p) = g(p)·(2+g(p))/(1+g(p)) be multiplicative and let S_g be the (finite) set of primes p for which f(p) ≥ 1. We wish to shape our sieve after f, but at the same time we must exclude the primes with f(p) ≥ 1. From the Selberg sieve (9) we choose λ_e supported outside S_g in such a way as to minimise
$$G = \sum_e\lambda_e f(e) = \bigg(\sum_{\substack{e\le\sqrt z\\ e\notin S_g}}\hat f(e)\bigg)^{-1}.$$
From (26) we see that f satisfies the regularity hypothesis for a sieve of dimension 2, and the crude bound follows since g satisfies the same [due to (24)]. Letting f̃ be the restriction of f outside S_g (and similarly with g̃), from Theorem 2.1 we get
$$(29)\qquad \sum_e\lambda_e f(e) = \sum_e\lambda_e\tilde f(e) \ll \frac{1}{(\log z)^2}.$$
10.6. Splitting and first estimation. Now we split the sum Ŝ^x_k(x) according to the size of the divisor in the convolution for Λ^x_k. This gives that
$$\hat S_k^x(x) = \sum_{n\le x} a_n\sigma_n\sum_{\substack{d|n\\ d\le D^2/x}}\mu(d)\Big(\log\frac{x}{d}\Big)^k + \sum_{n\le x} a_n\sigma_n\sum_{\substack{d|n\\ d>D^2/x}}\mu(d)\Big(\log\frac{x}{d}\Big)^k = U_1 + U_2.$$
Here we estimate the secondary term U_2. Writing n = dt we have that
$$U_2 \ll \sum_{n\le x} a_n\sigma_n\sum_{\substack{d|n\\ d>D^2/x}}\Big(2\log\frac{x}{D}\Big)^k \le \Big(2\log\frac{x}{D}\Big)^k\sum_{n\le x} a_n\sigma_n\sum_{\substack{t|n\\ t<x^2/D^2}} 1 \ll \Big(2\log\frac{x}{D}\Big)^k\sum_{e\le z}\lambda_e\sum_{t<x^2/D^2}\ \sum_{\substack{n\le x\\ n\equiv 0\,(t)\\ n\equiv 0\,(e)}} a_n$$
$$\ll (2\epsilon\log x)^k A(x)\sum_{e\le z}\lambda_e\sum_{t<x^2/D^2} g(m) + (\log x)^k\sum_{e\le z}\tau_3(e)\sum_{t<x^2/D^2}|r_m(x)|,$$
where m = lcm(e, t). Provided that z·(x²/D²) ≤ D, the remainder term is
$$U_2^R \ll (\log x)^k\sum^{\flat}_{m\le zx^2/D^2}\tau_5(m)|r_m(x)| \ll A(x)(\log x)^{k-2}.$$
The choice of z = x/D and the requirement of D ≥ x^{3/4} ensure this.
We are left to deal with the primary term. Our choice of λ will save two logarithms here, and the t-sum will only lose one. Since g is supported on squarefree integers, we can assume that t is squarefree, and so we are left with
$$\hat U_2 \ll (\epsilon\log x)^k A(x)\sum_{e\le z}\lambda_e\sum^{\flat}_{t\le x^2/D^2} g(m) = (\epsilon\log x)^k A(x)\sum_{e\le z}\lambda_e\,g(e)\sum_{q|e}\ \sum^{\flat}_{\substack{l<x^2/qD^2\\ \gcd(l,q)=1}} g(l),$$
writing q = gcd(e, t) and l = t/q. The inner sum is now estimated by (25) to get
$$\sum^{\flat}_{\substack{l<x^2/qD^2\\ \gcd(l,q)=1}} g(l) = \alpha(q)\Big(c_1\log\frac{x^2}{qD^2} + c_2 + c_1\log q - c_1\hat\beta(q)\Big) + O\!\left(\frac{\tau(q)}{(\log x/D)^B}\right),$$
where α is multiplicative and β̂ is additive, given on primes by α(p) = 1/(1+g(p)) and β̂(p) = α(p) log p. We then sum over q and multiply by g(e) to get
$$\hat U_2 \ll (\epsilon\log x)^k A(x)\sum_{e\le z}\lambda_e\left(\Big(c_1\log\frac{x^2}{D^2} + c_2\Big)f(e) - c_1 g(e)\eta(e) + O\!\left(\frac{g(e)\tau_3(e)}{(\log x/D)^B}\right)\right),$$
where f(p) = g(p)(1 + α(p)) = g(p)·(2+g(p))/(1+g(p)) is multiplicative and η(e) = Σ_{d|e} α(d)β̂(d).
Since we have λ_e = 0 wherever (f, g) ≠ (f̃, g̃), in the above expression we can replace the former by the latter. Then we have f̃(p) < 1 for all primes, and so by (23) the e-sum here is bounded above by
$$\Big(c_1\log\frac{x}{D} + c_2 + c_1\tilde\psi(z)\Big)\sum_{e\le z}\lambda_e\tilde f(e) + O\!\left(\frac{1}{(\log z)^B}\prod_{p\le z}\big(1+9g(p)\big)\right),$$
where
$$\tilde\psi(z) = \sum_{p\le z}\frac{\alpha(p)\hat\beta(p)\tilde g(p)}{1-\tilde f(p)} = \sum_{p\le z}\frac{\tilde g(p)\log p}{1-2g(p)^2-g(p)^3} \ll \log z.$$
Recalling our choice of sieve weights and (29), we have that Σ_e λ_e f̃(e) ≪ 1/(log z)², while we can bound ∏_{p≤z}(1 + 9g(p)) ≤ ∏_{p≤z}(1 + g(p))⁹ ≪ (log z)⁹. Recalling that log z = ǫ log x, these then give the desired bound that (we use k ≥ 2 here)
$$U_2 \ll (\epsilon\log x)^k A(x)\cdot c_1\log z\cdot(\log z)^{-2} \ll A(x)(\epsilon\log x)^{k-1} \ll \epsilon A(x)(\log x)^{k-1}.$$
10.7. Second estimation. Next we turn to the estimation of U_1. We have
$$U_1 = \sum_{n\le x} a_n\sigma_n\sum_{\substack{d|n\\ d\le D^2/x}}\mu(d)\Big(\log\frac{x}{d}\Big)^k = \sum_{e\le z}\lambda_e\sum_{d\le D^2/x}\mu(d)\Big(\log\frac{x}{d}\Big)^k\cdot\sum_{\substack{n\le x\\ n\equiv 0\,(m)}} a_n,$$
where here we have m = lcm(d, e). The inner sum is A_m(x), and we insert its approximation g(m)A(x) + r_m(x), estimating the remainder term as
$$(\log x)^k\sum_{e\le z}\tau_3(e)\sum_{d\le D^2/x}|r_m(x)| \ll (\log x)^k\sum^{\flat}_{m\le D}\tau_5(m)|r_m(x)| \ll A(x)(\log x)^{k-2}.$$
We are left to evaluate the main term
$$\hat U_1 = A(x)\sum_{e\le z}\lambda_e\sum_{d\le D^2/x} g(m)\,\mu(d)\Big(\log\frac{x}{d}\Big)^k.$$
10.7.1. A trick to evaluate the main term. We shall next use a trick to avoid direct evaluation of the main term. We note that it is somewhat independent of the actual sequence, as it only depends on the multiplicative function g [and its size on A(x)]. In particular, we can choose a sequence (b_n) with the same g for which we can estimate the main term directly. We let b_n = μ(n)²ĝ(n) for n with x/e < n ≤ x, where ĝ is multiplicative with ĝ(p) = g(p)/(1 − g(p)), and e ≈ 2.718 is the base of the natural logarithm, so that log e = 1.
Since g satisfies (24), we have that ĝ does also, but with different constants. We can note that the leading coefficient is given by
$$c_3 = \prod_p\big(1+\hat g(p)\big)\Big(1-\frac1p\Big) = \prod_p\big(1-g(p)\big)^{-1}\Big(1-\frac1p\Big) = 1/H.$$
Thus for every squarefree d we have
$$B_d(x) = \sum_{\substack{x/e<n\le x\\ n\equiv 0\,(d)}}\mu(n)^2\hat g(n) = \hat g(d)\sum^{\flat}_{\substack{x/ed<m\le x/d\\ \gcd(d,m)=1}}\hat g(m) = c_3\,\hat g(d)\prod_{p|d}\frac{1}{1+\hat g(p)} + O\!\left(\frac{\tau(d)\hat g(d)}{(\log 9x/d)^B}\right),$$
where we used (25) again, and exploited that log e = 1.
Here we have that B(x) = c_3, and we note that ĝ(p)/(1+ĝ(p)) = g(p), so that ~b is well-approximated by g. We are left to estimate the cumulative error term as
$$\sum^{\flat}_{d\le D}\frac{\tau(d)\hat g(d)}{(\log 9x/d)^B} \ll \frac{1}{(\log x/D)^B}\prod_{p\le x}\big(1+2\hat g(p)\big) \ll \frac{1}{(\epsilon\log x)^B}\prod_{p\le x}\big(1+\hat g(p)\big)^2 \ll \frac{(\log x)^2}{(\log x)^B}.$$
This gives the remainder estimate, but with (B − 2) instead of B; this is OK because we required that B ≥ 20 to begin, and only used that B ≥ 15 in the proof. Thus ~b satisfies the conditions of Proposition 10.1 with the multiplicative function g. Finally we conclude by noting that for ~b, by (27), we have
$$S_k(x) = \sum_{n\le x} b_n\Lambda_k(n) = \sum_{x/e<n\le x}\mu(n)^2\hat g(n)\Lambda_k(n) = k(\log x)^{k-1} + O\big((\log x)^{k-2}\big).$$
10.8. Generalising from squarefree sequences. Now we show that if Proposition 10.1 is true for (a_n) supported on squarefree integers, then it is true if we drop the squarefree restriction. To this end, we write n = ln̂ where n̂ is squarefree and l | n̂^∞; in fact, n̂ is the squarefree kernel of n. We let
$$b_m = \mu(m)^2\sum_{l|m^\infty} a_{lm},$$
so that for squarefree d we have
$$B_d(x) = \sum_{\substack{m\le x\\ d|m}}\mu(m)^2\sum_{l|m^\infty} a_{lm} = \sum_{\substack{n\le x\\ d|n}} a_n\sum_{\substack{m|n\\ (n/m)|m^\infty}}\mu(m)^2.$$
Calling the last inner sum T (n), we have that every prime that divides n must also
divide m, but also that m must be squarefree for a nonzero contribution. Therefore
only when m = n̂ is there a contribution to the sum, so that T (n) = 1 and we
get that Bd (x) = Ad (x) for all squarefree d. Similarly, the definition of Λxk (n)
implies that only squarefree divisors d contribute, and for such d we note that d|n
is equivalent to d|n̂. This gives Λxk (n) = Λxk (n̂), and so
$$\sum_{m\le x} b_m\Lambda_k^x(m) = \sum_{m\le x}\mu(m)^2\sum_{\substack{n\le x\\ (n/m)|m^\infty}} a_n\Lambda_k^x(n) = \sum_{n\le x} a_n\Lambda_k^x(n)T(n) = \sum_{n\le x} a_n\Lambda_k^x(n).$$
We conclude that if (28) is true for (bm ) then it is also true for (an ).
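This reduction is easy to sanity-check numerically: for squarefree d, both B_d(x) and A_d(x) collect the same a_n, since b_m gathers the mass of all n with squarefree kernel n̂ = m. A minimal sketch with hypothetical random weights:

```python
import random

def radical(n):
    # squarefree kernel n̂ of n
    r, p = 1, 2
    while p <= n:
        if n % p == 0:
            r *= p
            while n % p == 0:
                n //= p
        p += 1
    return r

random.seed(0)
x = 300
a = {n: random.random() for n in range(1, x + 1)}

# b_m = μ(m)² Σ_{l | m^∞} a_{lm}: nonzero only for squarefree m,
# where it collects exactly the a_n with squarefree kernel m
b = {}
for n in range(1, x + 1):
    b[radical(n)] = b.get(radical(n), 0.0) + a[n]

def A_d(d):
    return sum(a[n] for n in range(1, x + 1) if n % d == 0)

def B_d(d):
    return sum(v for m, v in b.items() if m % d == 0)
```

For squarefree d the two congruence sums agree exactly, since d | n is equivalent to d | n̂.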
10.9. The parity problem. The above result of Bombieri's was only a first step in his work. Proposition 10.1 implies that if we have the remainder estimate
$$(30)\qquad \sum^{\flat}_{d\le x^{1-\epsilon}}\tau_5(d)|r_d(x)| \ll_\epsilon \frac{A(x)}{(\log x)^2}$$
for all ǫ > 0, then for all k ≥ 2 we have the asymptotic
$$\sum_{n\le x} a_n\Lambda_k(n) \sim H\cdot A(x)\cdot k(\log x)^{k-1}.$$
Of course, it is natural to expect this asymptotic also to be true when k = 1.
Bombieri shows that the asymptotic holds for k = 1 if and only if the sequence
(an ) gives equal weight to integers n with an even/odd number of prime divisors.
In particular, Selberg’s example puts all the weight on those with an even number
of prime divisors. Also, note that the generalised von Mangoldt functions Λk (n)
for k ≥ 2 give equal weight to those n with an even/odd number of prime factors.
It transpires that the generalised von Mangoldt functions Λ_k with k ≥ 2, together with the multi-convolutions Λ_~k = Λ_{k1} ⋆ · · · ⋆ Λ_{kr} with some k_i ≥ 2, form some sort of a basis for the functions that equally weight the parity of the number of prime factors.
Bombieri shows that knowing (30) for all ǫ > 0 also implies that
$$\sum_{n\le x} a_n\Lambda_{\vec k}(n) \sim H\cdot A(x)\cdot\frac{(\vec k)!}{(|\vec k|-1)!}\,(\log x)^{|\vec k|-1},$$
assuming that some k_i ≥ 2, where |~k| = Σ_i k_i and (~k)! = ∏_i k_i!.
Orthogonal to the functions that equally weight parity are the multi-convolutions of just Λ, namely ~k = (1, . . . , 1). If we are able to show an asymptotic for such vectors, then we can obtain an asymptotic formula for a_n summed over integers with a fixed number of prime factors. Writing P_r for the set of squarefree integers with exactly r prime factors, an approximation theorem of Stone–Weierstrass type would extend the result to get
$$(31)\qquad \sum_{\substack{n\le x\\ n\in P_r}} a_n F_r(n) \sim \delta_r\int_{T_r} G_r\,d\mu_r\cdot\frac{H\cdot A(x)}{\log x},$$
for some constants δ_r, where F_r(n) = G_r(log p_1/log n, . . . , log p_r/log n) for n = p_1 · · · p_r, and G_r is smooth and compactly supported on
$$T_r = \Big\{(u_1,\cdots,u_r) : 0 < u_r < \cdots < u_1 < 1,\ \text{with}\ \sum_{i=1}^r u_i = 1\Big\},$$
and
$$d\mu_r = \frac{du_1\cdots du_{r-1}}{u_1\cdots u_r}.$$
Knowing δ_r is essentially equivalent to estimating Σ_n a_n Λ_~k(n) with ~k = (1)^r, as the other contributions in the F_r-approximation are determined as indicated above. What Bombieri realised is that the δ_r are determined by δ_1, where the latter is given by the asymptotic formula
$$\sum_{p\le x} a_p \sim \delta_1\cdot H\cdot\frac{A(x)}{\log x}.$$
Obviously we expect δ_1 = 1 for any well-distributed sequence, while Selberg's example has δ_1 = 0 (taking different linear combinations of the constant sequence and the Liouville function can give any value with 0 ≤ δ_1 ≤ 2). Bombieri's phrasing of the parity problem is that we have
$$\delta_r = \begin{cases}\delta_1 & \text{if } r \text{ is odd},\\ 2-\delta_1 & \text{if } r \text{ is even}.\end{cases}$$
We can note that we have Λ_3 − LΛ_2 = Λ ⋆ Λ_2 = Λ ⋆ (LΛ) + Λ ⋆ Λ ⋆ Λ, and we can get asymptotics for both terms on the left via Bombieri's method (using partial summation for the second). However, it is much more difficult to do so with either term on the right individually, as they are supported on integers with exactly two or three prime factors respectively. Similar ideas can be used to isolate a function supported on (Λ⋆)^r and (Λ⋆)^s for any r and s of opposite parity.
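The convolution identity here is exact, so it can be verified numerically by computing each Λ_k as μ ⋆ (log)^k (the standard definition, assumed here) and forming the Dirichlet convolutions directly:

```python
from math import log

N = 200

def mobius(n):
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def conv(f, g):
    # Dirichlet convolution (f ⋆ g)(n) = Σ_{d|n} f(d) g(n/d), for all n ≤ N
    h = [0.0] * (N + 1)
    for d in range(1, N + 1):
        for n in range(d, N + 1, d):
            h[n] += f[d] * g[n // d]
    return h

mu = [0.0] + [float(mobius(n)) for n in range(1, N + 1)]
L = [0.0] + [log(n) for n in range(1, N + 1)]

def Lk(k):
    return [0.0] + [log(n) ** k for n in range(1, N + 1)]

Lam1 = conv(mu, Lk(1))   # Λ  = μ ⋆ log
Lam2 = conv(mu, Lk(2))   # Λ2 = μ ⋆ log²
Lam3 = conv(mu, Lk(3))   # Λ3 = μ ⋆ log³

LLam = [L[n] * Lam1[n] for n in range(N + 1)]
lhs = [Lam3[n] - L[n] * Lam2[n] for n in range(N + 1)]   # Λ3 − LΛ2
rhs_a = conv(Lam1, LLam)                  # Λ ⋆ (LΛ), two prime factors
rhs_b = conv(Lam1, conv(Lam1, Lam1))      # Λ ⋆ Λ ⋆ Λ, three prime factors
```

At n = 6 only the first right-hand term survives, and at n = 30 only the second, illustrating the stated supports.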
10.10. Bilinear information. The parity problem has been overcome in some instances via the use of bilinear information. We already saw the use of bilinear bounds in our proof of the Bombieri–Vinogradov theorem, but it was only in 1995 or so that Duke, Friedlander, and Iwaniec were able to give the first instance (though works on primes in short intervals by Jutila and Iwaniec, and then later by Heath-Brown, are closely related) of the breaking of the parity problem in the classical setting: they showed that any irreducible quadratic polynomial has equidistribution of its roots modulo p as p → ∞ (the corresponding result [for all degrees of polynomials] when averaging over all moduli is due to Hooley in the 1960s). The work of Fouvry and Iwaniec then considered the sequence given by the number of representations of n as n = a² + b², and showed
$$\sum\sum_{a^2+b^2\le x}\xi_b\,\Lambda(a^2+b^2) = \sum\sum_{a^2+b^2\le x}\xi_b\,\psi(b) + O_B\!\left(\frac{x}{(\log x)^B}\right),\qquad\text{where}\qquad \psi(l) = \prod_{p\nmid l}\left(1-\frac{\chi_4(p)}{p-1}\right).$$
In particular, by supporting ξ only on the primes, this shows that there are infinitely many primes p = a² + b² with b prime.
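As a purely illustrative brute-force search (not the method of proof), one can list the small primes of the form a² + b² with b prime:

```python
def is_prime(n):
    if n < 2:
        return False
    p = 2
    while p * p <= n:
        if n % p == 0:
            return False
        p += 1
    return True

limit = 10_000
found = set()
for b in range(2, int(limit ** 0.5) + 1):
    if not is_prime(b):
        continue
    for a in range(1, int(limit ** 0.5) + 1):
        s = a * a + b * b
        if s <= limit and is_prime(s):
            found.add(s)  # prime of the form a² + b² with b prime
```

The smallest examples are 5 = 1² + 2² and 13 = 2² + 3².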
Friedlander and Iwaniec then showed that the squares were sufficiently well-behaved that one could still get an asymptotic in the above when taking ξ to be the characteristic function of the squares; thus there are infinitely many primes of the form p = a² + b⁴. This was the first time that such methods had been used on a sequence whose support was significantly less than linear (here A(x) ∼ cx^{3/4}). In the wake of this, Heath-Brown showed that there are infinitely many primes of the form p = a³ + 2b³, and this was generalised with Moroz to other cubic forms.
The bilinear estimate used by Friedlander and Iwaniec is of the form (simplified)
$$\sum_m\bigg|\sum_{\substack{n\sim N\\ mn\le x}}\mu(n)\,a_{mn}\bigg| \ll \frac{A(x)}{(\log x)^9}\qquad\text{for all } N \text{ with } A(x)^{1/2+\epsilon}\le N\le x^{1/2-\epsilon},$$
mn≤x
where the exact power of logarithm is largely irrelevant, and m might be weighted
by a (generalised) divisor function. Looking at the inner sum, its smallness says
that we expect that there should be some µ-cancellation in the support of the
sequence, and furthermore that this should be true when considering the sequence
restricted to suitable arithmetic progressions. There is some hope that this might
hold when A(x) ≫ x2/3 , but for sparser sequences the inner sum has less than
one term (on average) when N ≤ x/A(x). In the case of Friedlander and Iwaniec,
the proof of this bilinear bound proceeds by re-interpreting the problem in the
domain of Gaussian integers, and then achieves the estimate via a use of Cauchy’s
inequality and much (about 80 pages) technical work to handle divisors (of the
determinant of a pair of lattice points) in various ranges (the result is ineffective [as
was Fouvry-Iwaniec] due to the handling of small moduli via zero-free regions). The
method does not work mutatis mutandis for a2 + b6 because the use of Cauchy’s
inequality loses too much in that case (in a recent paper called “The illusory sieve”,
Friedlander and Iwaniec show that a2 + b6 is infinitely often prime, but under the
[unlikely] assumption of Siegel zeros). On the other hand, Heath-Brown exploits a
multi-dimensional version of the large sieve to show a corresponding bilinear bound
(and the result is also ineffective).