Siegel`s Lemma and Minkowski`s Theorems

Siegel’s Lemma and
Minkowski’s Theorems
Master’s Thesis
Louna Seppälä
Department of Mathematical Sciences
University of Oulu
Autumn 2015
Contents
Introduction
3
1 Preliminaries
5
1.1
Some notations . . . . . . . . . . . . . . . . . . . . . . . . . .
5
1.2
Basics on algebraic number theory . . . . . . . . . . . . . . . .
6
2 Siegel’s lemma
8
2.1
Siegel’s lemma in the field of rational numbers . . . . . . . . .
9
2.2
Siegel’s lemma in imaginary quadratic fields . . . . . . . . . . 11
3 Siegel’s lemma in an arbitrary algebraic number field
22
4 Minkowski’s convex body theorems
29
4.1
Convex bodies . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.2
Lattices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.3
The convex body theorems . . . . . . . . . . . . . . . . . . . . 35
5 Some Diophantine inequalities
44
5.1
A few Diophantine inequalities over R
5.2
On simultaneous Diophantine inequalities . . . . . . . . . . . . 51
5.3
A Diophantine inequality over C . . . . . . . . . . . . . . . . . 53
1
. . . . . . . . . . . . . 44
6 Applying Siegel’s lemma
57
6.1
Approximations for eαt . . . . . . . . . . . . . . . . . . . . . . 57
6.2
Siegel’s lemma refined . . . . . . . . . . . . . . . . . . . . . . 61
6.3
Vandermonde determinants . . . . . . . . . . . . . . . . . . . 65
Bibliography
71
2
Introduction
Geometry of numbers is a branch of mathematics founded by the German
Hermann Minkowski 1 , which uses geometric arguments in n-dimensional
euclidean space to prove arithmetic results. The diverse applications reach
from coding theory to functional analysis, and have turned out to be especially useful in Diophantine approximation, the problem of approximating
irrational numbers by rational ones. Siegel’s
2
and Minkowski’s existential
theorems form the core of this work: When dealing with a group of linear
equations where the number of unknowns exceeds the number of equations,
Siegel’s lemma confirms the existence of a non-trivial solution whose size is
bounded by a certain positive function depending on the coefficients of the
linear forms and the number of unknowns. Minkowski’s theorems in turn concern convex bodies and lattices in the space Rn : when a convex body satisfies
a specific condition with respect to the lattice, it is bound to intersect the
lattice in a non-zero point.
In the first part of the text three versions of Siegel’s lemma are presented
and proved. The use of geometry in the proof of the second version is particularly delightful, as one can visualize the mathematics behind the theorem.
Geometric thinking comes into focus even more when moving to the second
1
2
1864 - 1909
Carl Ludwig Siegel, 1896 - 1981, Germany
3
part, the purpose of which is to introduce two theorems of Minkowski and
then explore the consequences by presenting a selection of Diophantine inequalities. The final part returns to Siegel’s lemma again with an example
on its use. A fourth version of the lemma is briefly mentioned because of the
possible affordance the example provides: the ending concentrates on finding
out whether this Bombieri-Vaaler version of Siegel’s lemma could be applied
to improve the result of the example in rational case.
Chapter 1, rather list-like as it is, should offer enough basics on algebraic
number theory for the reader to be able to follow. The results of linear algebra
are also used extensively throughout the text. Apart from these, there are
not much serious prerequisites. The source material is listed in Bibliography
and referred to in appropriate points of the study. Whenever a proof has
some kind of a source, it is mentioned in the beginning.
4
Chapter 1
Preliminaries
The reader is assumed to be familiar with the fundamentals of algebraic
numbers, but some important concepts are shortly covered here. For more
details, consult [10] or [14], for example. The definitions in this chapter follow
the lecture notes [10].
1.1
Some notations
The following sets appear in this text:
Q
rational numbers
Z = {. . . , −2, −1, 0, 1, 2, . . .}
rational integers
Z+ = {k ∈ Z | k ≥ 1}
positive rational integers
N = {0, 1, 2, . . .}
natural numbers
R+ = {x ∈ R | x > 0}
positive real numbers
R[x]
polynomials in x with coefficients in R
Mk×n (R)
k × n matrices with entries in R
Here R is a ring or field specified later. A column vector x̄ ∈ Rn is denoted by
5
(x1 , . . . , xn )T and the standard basis vectors by ē1 = (1, 0, . . . , 0)T , . . . , ēn =
(0, 0, . . . , 1)T . For the length of a vector x̄ ∈ Rn the norms
v
u n
n
X
uX
|xk |2 and kx̄k∞ = max |xk |
|xk |, kx̄k2 = kx̄k = t
kx̄k1 =
k=1
k=1
1≤k≤n
are used. The volume V(C) of a subset C ⊆ Rn means the Riemann or
Lebesgue integral over C, when it exists.
1.2
Basics on algebraic number theory
Definition 1. A number α ∈ C is algebraic if there exists a polynomial
p(x) ∈ Q[x] \ Q such that p(α) = 0. Otherwise, α is transcendental.
Definition 2. The minimum polynomial of an algebraic number α ∈ C is the
monic polynomial Mα (x) ∈ Q[x] \ Q of lowest degree for which Mα (α) = 0.
Definition 3. The degree of an algebraic number α ∈ C is the degree of its
minimum polynomial: degQ α := deg Mα (x).
Definition 4. An algebraic number α ∈ C is an algebraic integer when
Mα (x) ∈ Z[x]. The ring of all algebraic integers is denoted by the symbol B.
Definition 5. An algebraic number field K is a finite field extension of Q.
Theorem 1. If K is an algebraic number field, then K = Q(α) for some
α ∈ K. In other words, all algebraic number fields are field extensions of Q
generated by one element.
Proof. See [10, p. 47] or [14, p. 40], for example.
Definition 6. The degree of an algebraic number field K is [K : Q] :=
dimQ K.
6
Theorem 2. If K = Q(α) for some algebraic number α ∈ C and degQ α = m,
then [K : Q] = m.
Proof. See [14, p. 23]. (Follows from the fact that the field K is a vector space
over Q spanned by the elements 1, α, . . . , αm−1 .)
√
√
Definition 7. A quadratic field Q( d) = {a + b d | a, b ∈ Q}, where d ∈ Z
is square-free, is an algebraic number field of degree two over Q.
Definition 8. The set of integers of an algebraic number field K is ZK =
K ∩ B.
√
Theorem 3. Let K = Q( d), d ∈ Z, be a quadratic field. Then
o
n
√ ZK = a + b d a, b ∈ Z , d ≡ 2 or 3 (mod 4),
)
(
√ a+b d ZK =
a, b ∈ Z, a ≡ b (mod 2) , d ≡ 1
2
(mod 4).
Proof. See [10, p. 63] or [14, p. 67].
Theorem 4. Let K be an algebraic number field of degree m. Then there
exist exactly m different monomorphisms σi : K → C, i = 1, . . . , m.
Proof. See [14, p. 41].
Definition 9. Let K be an algebraic number field of degree m and σi , i =
1, . . . , m, its field monomorphisms. The conjugates of an element α ∈ K
relative to K are the m complex numbers σi (α) =: α(i) , i = 1, . . . m.
Definition 10. The field norm of an algebraic number α ∈ K is N (α) :=
Qm (i)
i=1 α , where m = [K : Q].
7
Chapter 2
Siegel’s lemma
Consider the system of equations



a11 x1 + a12 x2 + . . . + a1N xN = 0,






a21 x1 + a22 x2 + . . . + a2N xN = 0,
..



.





a x + a x + . . . + a x = 0,
M1 1
M2 2
MN N
where the coefficients amn are (rational or more general algebraic) integers
and M < N . Siegel’s lemma gives an estimate for the size of a non-zero
integer solution x̄ = (x1 , x2 , . . . , xN )T . It is a consequence of the so-called
pigeonhole principle or Dirichlet’s principle, which states:
If a set of n + 1 elements is subdivided into n subsets, then at
least one of these subsets contains at least two elements.
In the following the cases of rationals and imaginary quadratic fields are
studied, whereas Chapter 3 deals with the case of a general algebraic number
field.
8
2.1
Siegel’s lemma in the field of rational numbers
Theorem 5. Let
Lm (x̄) =
N
X
amn xn ,
m = 1, . . . , M,
n=1
be M non-trivial linear forms in N variables xn with coefficients amn ∈ Z.
Define
Am :=
N
X
n=1
|amn | ∈ Z+ ,
m = 1, . . . , M.
Suppose M < N ; then the system of equations
Lm (x̄) = 0,
m = 1, . . . , M,
has a non-zero integer solution z̄ = (z1 , . . . , zN )T ∈ ZN \ {0̄} with
j
k
1
1 ≤ max |zn | ≤ (A1 · · · AM ) N −M .
1≤n≤N
(2.1)
(2.2)
Proof. [9, p. 11] Since N > M , the homomorphism
L̄ = (L1 , . . . , LM )T : ZN → ZM .
between the additive groups (ZN , +) and (ZM , +) is not injective. Thus
ker L̄ 6= {0̄}, and there exists a non-zero solution z̄ = (z1 , . . . , zN ) ∈ ZN \ {0̄}
k
j
1
to the system of equations (2.1). Denote Z := (A1 · · · AM ) N −M . Then
1
(A1 · · · AM ) N −M − Z < 1, so A1 · · · AM < (Z + 1)N −M , where Z, A1 , . . . , AM
are positive integers. From this it follows that
(A1 Z + 1) · · · (AM Z + 1)
≤ A1 (Z + 1) · · · AM (Z + 1)
= A1 · · · AM (Z + 1)
< (Z + 1)N .
9
M
(2.3)
Now define a set
1 :=
x̄ ∈ ZN 0 ≤ xn ≤ Z .
The number of integer points in this box is #1 = (Z + 1)N , since each
component of the vector x̄ has Z + 1 possibilities. The linear mappings
Lm (x̄) =
N
X
amn xn ,
m = 1, . . . , M,
n=1
are bounded in the box 1 by
X
amn <0
amn xn ≤ Lm (x̄) ≤
X
amn xn ,
amn >0
P
as the components xn are non-negative there. Let −bm := amn <0 amn be the
P
sum of the negative coefficients and cm := amn >0 amn that of the positive
ones. Note that bm + cm = Am . Further,
− bm Z ≤ Lm (x̄) ≤ cm Z
(2.4)
in the box 1 .
Next define a second set
2 :=
¯l ∈ ZM − bm Z ≤ lm ≤ cm Z ,
where each component of the vector ¯l has #{lm } = (bm +cm )Z +1 = Am Z +1
possibilities, and therefore the number of integer points in this second box is
#2 = (A1 Z + 1) · · · (AM Z + 1). By the estimate (2.4) we have L̄(1 ) ⊆ 2 ,
where
#2 = (A1 Z + 1) · · · (AM Z + 1) < (Z + 1)N = #1 ,
using the estimate made in (2.3). Hence the mapping L̄ : 1 → 2 is not
injective on 1 . Therefore there exist two different vectors x̄1 , x̄2 ∈ 1 such
that L̄(x̄1 ) = L̄(x¯2 ), which further gives L̄(x̄1 − x̄2 ) = 0̄, where x̄1 − x̄2 ∈
10
±1 \ {0̄}. By denoting z̄ = (z1 , . . . , zN )T := x̄1 − x̄2 we get a non-zero
solution to the system of equations (2.1) satisfying the estimate
j
|zn | ≤ Z = (A1 · · · AM )
1
N −M
k
,
n = 1, . . . , N.
Remark 1. Because
Am ≤ N max |amn |,
1≤n≤N
then
max |amn |)M
A1 · · · AM ≤ (N
1≤m,n≤N
and the upper bound given in (2.2) can further be estimated to give a solution
z̄ with
1 ≤ max |zn | ≤
1≤n≤N
2.2
N
max |amn |
1≤m,n≤N
NM
−M
.
Siegel’s lemma in imaginary quadratic fields
Theorem 6. Let I denote the field Q of rational numbers or an imaginary
√
quadratic field Q( −D), where D ∈ Z+ , and ZI its ring of integers. Let
Lm (z̄) =
N
X
amn zn ,
m = 1, . . . , M,
(2.5)
n=1
be M non-trivial linear forms in N variables zn with coefficients amn ∈ ZI .
Define
Am :=
N
X
n=1
|amn | ∈ Z+ ,
m = 1, . . . , M.
(2.6)
Suppose M < N ; then there exist positive constants sI and tI such that the
system of equations
Lm (z̄) = 0,
m = 1, . . . , M,
11
(2.7)
has a non-zero solution z̄ = (z1 , . . . , zN )T ∈ ZN
I \ {0̄} with
M
√ √
1
N −M
1 ≤ max |zn | ≤ max 4 2 D, sI tI
(A1 · · · AM ) N −M ,
1≤n≤N
where sQ = tQ = 1,
sQ(√−D) =
and

√

 2√ 2 D 14 ,
π

 √2 D 14 ,
π
D ≡ 1 or 2
D≡3
(mod 4)
(mod 4)
5
tQ(√−D) = √ .
2 2
√
Proof. [4, p. 3] 1 Let c D ≤ B ∈ R+ , where c is some positive real constant.
√
Every integer z = x + y −D ∈ ZI for which |z| ≤ B lies inside the (filled)
ellipse
It has B and
area V(E) =
√B
D
2
πB
√ .
D
E := (x, y)T ∈ R2 x2 + Dy 2 ≤ B 2 .
as its semi-major and semi-minor axes, respectively, and
Assume first D ≡ 1 or 2 (mod 4). According to Theorem 3, now
n
o
√
ZI = a + b −D a, b ∈ Z .
Allocate each integer point (x0 , y0 )T ∈ Z2 to the square
1
T
2 ,
(x0 ,y0 ) := (x, y) ∈ R max {|x − x0 |, |y − y0 |} ≤
2
1
What is presented here is a slightly modified version of the original proof, developed
together with Tapani Matala-aho. It seems that in the case D ≡ 3 (mod 4) the value of
the constant c that was claimed in the article cannot be reached, at least not without
doing some further calculations.
12
which has area 1. Then the number of integer points inside the ellipse E can
be estimated using the shape
(
1
1
F1 := (s, t)T ∈ R2 s = x − , t = y − , x2 + Dy 2 ≤ B 2 ,
2
2
" r
#
"
#)
r
1
1
1
D
1
x∈
, B2 −
,y∈
,√
⊆ E.
B2 −
2
4
2 D
4
(See the picture on page 19. The dashed curve represents the boundary of
the set F1 , showing where the integer point (x0 , y0 ) can be located in order
to have the square (x0 ,y0 ) stay inside E.) For any (x0 , y0 )T ∈ F1 we have
(x0 ,y0 ) ⊆ E, so the number of integer points inside E is at least as big as the
area of F1 multiplied by four. 2 One can estimate the area 4V(F1 ) by noticing
that the ellipse E consists of four pieces identical to F1 plus some additional
parts, which in turn can be estimated by rectangles of area 2B and
2
3
2B 3
√
.
D
Due to symmetry, it is enough to consider only the first quarter of the ellipse E.
A lower bound for the area 4V(F1 ) can also be found by fitting a smaller ellipse E1
inside E so that the first quarter of E1 is inside F1 . This is a rather lengthy approach
where the method of Lagrange multipliers is needed, so to save some space, we will settle
for the block approximation.
13
(See the picture on page 20.) Subtracting these from V(E) results in
2B
πB 2
4V(F1 ) ≥ √ − 2B − √ + 1
D
D
!
√
2
πB
2
2 D
≥ √
−
1−
πB
πB
D
!
√
πB 2
2
2 D
≥ √
1− √ − √
D
πc D πc D
2
πB 2
2
−
≥ √
1−
πc πc
D
πB 2
4
= √
1−
πc
D
√ !
πB 2
2
≥ √
.
1−
c
D
(2.8)
When |zn | ≤ B, n = 1, . . . , N , for the linear forms (2.5) it holds
N
N
N
X
X
X
amn zn ≤
|amn zn | ≤
|Lm (z̄)| = |amn | B = Am B, m = 1, . . . , M.
n=1
n=1
n=1
So, if each component of the vector z̄ = (z1 , . . . , zN )T ∈ ZN
I lies inside
the ellipse E, then every component of its image in the mapping L̄ :=
M
(L1 , . . . , LM ) : ZN
I → ZI has to lie inside the ellipse
E 0 := (x, y)T ∈ R2 x2 + Dy 2 ≤ A2m B 2 .
With arguments similar to those justifying the estimate (2.8), one can define
a shape F2 (marked with the dashed curve in the figure on page 21) which
consists of the set
(
1
1
(s, t)T ∈ R2 s = x + , t = y + , x2 + Dy 2 ≤ A2m B 2 ,
2
2
)
Am B
x ∈ [0, Am B] , y ∈ 0, √
D
14
i
h
and the rectangles 0, 21 × 0, A√mDB + 12 and 0, Am B + 12 × 0, 21 . Then we
see that the ellipse E 0 contains at most
πA2m B 2
2Am B
√
+ 2Am B + √
+1
D
D
!
√
√
2 D
πA2m B 2
D
2
1+
= √
+
+
πAm B πAm B πA2m B 2
D
2
1
1
πA2m B 2
√
1+
+
1+ √
≤ √
πc
D
D
πc2 D
!
√
(2.9)
πA2m B 2
1
2 2
≤ √
+ 2
1+
πc
πc
D
!
√
2
1
πA2m B 2
+ 2
1+
≤ √
c
2c
D
2
πA2 B 2
1
= √m
1+ √
c 2
D
√
integer points. (Here the facts that Am , D ≥ 1 and B ≥ c D have been
4V(F2 ) =
taken into account.) Therefore the number of values of the linear forms (2.5)
with |zn | ≤ B, n = 1, . . . , N , is at most
π M B 2M
√ M
D
1
1+ √
c 2
2M Y
M
A2m .
m=1
Hence, the number of vectors z̄ ∈ ZN
I with |zn | ≤ B, n = 1, . . . , N , is greater
than the number of values of the linear forms when
√ !N
2M Y
M
π M B 2M
2
1
π N B 2N
√
≥
1
−
1
+
A2m .
√ N
√ M
c
c 2
m=1
D
D
(2.10)
It follows that
D
B≥√
1
4
π
N
√ !− 2N −2M
NM
−M
1
2
1+ √
1−
c
c 2
15
M
Y
m=1
Am
1
! N −M
.
(2.11)
√
Now choose c = 2 2, which gives
N
− 2N −2M
NM
1
D
5 −M
B≥√
4
π 2
M
Y
1
4
D
=√
1
4
π
1
√
2
N
− N −M
1
√
2
√ 1 NM
−M
2D 4
5
√
= √
π
2 2
NM
−M
M
Y
Am
m=1
Am
m=1
5
√
2 2
1
! N −M
1
! N −M
NM
−M
M
Y
Am
m=1
1
! N −M
(2.12)
.
√
√
At the beginning the assumption B ≥ c D = 2 2D was made, so let


1
! N −M
√ 1 NM
M

 √
Y
−M
2D 4
5
√
.
B 0 := max 2 2D, √
Am


π
2 2
m=1
By the pigeonhole principle, there exist two different vectors z̄ (1) , z̄ (2) ∈ ZN
I
(i) with zn ≤ B 0 , n = 1, . . . , N, i = 1, 2, and L̄ z̄ (1) = L̄ z̄ (2) . Thus we
have a solution z̄ = (z1 , . . . , zN )T := z̄ (1) − z̄ (2) ∈ ZN
I \ {0̄} to the system of
equations (2.7) such that
|zn | ≤ zn(1) + zn(2) ≤ 2B 0

√ 1 NM
 √
−M
5
2 2D 4
√
= max 4 2D, √

π
2 2
M
Y
m=1
Am

1
! N −M


for all n = 1, . . . , N . Here the coefficients sI and tI can be identified.
Secondly, let D ≡ 3 (mod 4), in which case Theorem 3 gives
√
a + b −D ZI =
a, b ∈ Z, a ≡ b (mod 2) .
2
Again the number of points corresponding to integers inside the ellipse E
T
∈ ( 21 Z)2 , x0 ≡ y0
is studied, but now by identifying each point x20 , y20
(mod 2), with the square
x0 y0 := (x, y)T ∈ R2
( , )
2
2
x − x0 + y − y0 ≤ 1 ,
2
2
2
16
whose area is 12 . (This is because the integer points are in this case somewhat
tighter packed on the plane than they were when D ≡ 1, 2 (mod 4).) Now
( x20 , y20 ) ⊆ (
) , so we can use the previous estimates (2.8) and (2.9) —
they just need to be multiplied by two since each point now corresponds to
x 0 y0
, 2
2
a square of area 21 . Hence the ellipse E contains at least
√ !
2
2πB 2
√
1−
c
D
and the ellipse E 0 at most
2πA2m B 2
√
D
1
1+ √
c 2
2
integer points. As in (2.10), demanding the number of N -tuples z̄ ∈ ZN
I to
be greater than the number of possible values of the linear forms (2.5) leads
to the inequality
2N π N B 2N
√ N
D
√ !N
2M Y
M
2
2M π M B 2M
1
≥ √ M
1+ √
A2m .
1−
c
c 2
m=1
D
We get (2.11) with 2π instead of π:
1
N
! N −M
√ !− 2N −2M
NM
1
M
Y
−M
1
2
D4
1−
.
1+ √
Am
B≥√
c
2π
c 2
m=1
√
Again choose c = 2 2, and comparison to (2.12) gives
1
! N −M
NM
1 M
Y
−M
4
5
D
√
.
Am
B≥√
π 2 2
m=1
Analogously to the case D ≡ 1, 2 (mod 4), there exists a solution z̄ ∈ ZN
I \{0̄}
to the system of equations (2.7) with

NM
1  √
−M
2D 4
5
√
kz̄k∞ ≤ max 4 2D, √

π 2 2
17
M
Y
m=1
Am

1
! N −M


.
√
Remark 2. The necessary assumption B ≥ 2 2D only matters when the
values of the row sums Am are small.
18
y
Ellipse E with the shape F1
√B
D
F1
B
19
x
y
Estimating the area 4V(F1 )
√B
D
V(F1 )
} 12
20
B
x
y
Ellipse E ′ with the shape F2
F2
}1
2
Am B
− A√mDB
21
x
Chapter 3
Siegel’s lemma in an arbitrary
algebraic number field
Let K be an algebraic number field of degree K and ZK its ring of integers
with the fixed basis {ω1 , . . . , ωK } ⊆ ZK .
Definition 11. Let α ∈ K and denote with α(1) := α, α(2) , . . . , α(K) its
conjugates relative to K. The house function
follows:
: K → R is defined as
α := max |α|, α(2) , . . . , α(K) .
Lemma 1. Let α, β ∈ K. Then
α±β ≤ α + β
and
αβ ≤ α β .
Proof. Since the field monomorphisms σi : K → C, i = 1, . . . , K, preserve
addition and multiplication, for the conjugates it holds
(α ± β)(i) = σi (α ± β) = σi (α) ± σi (β) = α(i) ± β (i)
22
and
(αβ)(i) = σi (αβ) = σi (α)σi (β) = α(i) β (i) ,
where i = 1, . . . , K. Directly
α ± β = max
1≤i≤K
= max
1≤i≤K
≤ max
1≤i≤K
≤ max
1≤i≤K
(α ± β)(i) (i)
α ± β (i) (i) (i) α + β (i) α + max β (i) 1≤i≤K
= α + β ,
and similarly
αβ = max
1≤i≤K
= max
1≤i≤K
≤ max
1≤i≤K
(αβ)(i) (i) (i) α β (i) α max β (i) 1≤i≤K
= α β .
Lemma 2. Let γ ∈ ZK and write γ = g1 ω1 +. . .+gK ωK , where g1 , . . . , gK ∈ Z
are unique. There exist two positive constants c1 and c2 depending solely on
the ring ZK , such that
γ ≤ c1 max{|g1 |, . . . , |gK |}
(3.1)
max{|g1 |, . . . , |gK |} ≤ c2 γ .
(3.2)
and
23
Proof. Mahler gives the outlines for this proof in [8, p. 91].
First, using Lemma 1 and the fact that g1 , . . . , gK ∈ Z,
γ = g1 ω1 + . . . + gK ωK
≤ g1 ω1 + . . . + gK ωK
= |g1 | ω1 + . . . + |gK | ωK
≤ ω1 + . . . + ωK max{|g1 |, . . . , |gK |}.
Choose c1 := ω1 + . . . + ωK > 0, which does not depend on γ.
From the properties of the field monomorphisms it follows that
(i)
(i)
γ (i) = g1 ω1 + . . . + gK ωK ,
i = 1, . . . , K.
(3.3)
Since {ω1 , . . . , ωK } is a basis for the ring ZK (over Z), it is also a basis for
the field K (over Q) due to the fact that for any α ∈ K there exists d ∈ Z+
such that dα ∈ ZK . Hence the discriminant
2
(1)
(1) ω1
· · · ωK .. ..
.
. (K)
(K) ω 1
· · · ωK of this basis is non-zero (see [14, p. 44]), so the determinant
(1)
(1) ω1
· · · ωK .. ..
.
. (K)
(K) ω 1
· · · ωK does not vanish. Therefore the equations (3.3) can be solved in the form
gj =
K
X
Ωij γ (i) ,
i=1
24
j = 1, . . . , K,
where the coefficients Ωij ∈ K depend only on the basis {ω1 , . . . , ωK } of ZK .
As a result,
)
( K
X
(i) Ωij γ max{|g1 |, . . . , |gK |} = max 1≤j≤K
i=1
( K
)
X
(i) ≤ max
|Ωij | γ 1≤j≤K
≤ max
1≤j≤K
= max
1≤j≤K
= γ
Choose c2 := max1≤j≤K
nP
K
i=1
(
i=1
K
(i) X
max γ
|Ωij |
1≤i≤K
(
γ
max
1≤j≤K
o
(
K
X
|Ωij |
i=1
K
X
i=1
)
i=1
)
)
|Ωij | .
|Ωij | > 0, which too is independent of γ.
Now we can generalize Siegel’s lemma.
Theorem 7. Let
ym =
N
X
amn xn ,
m = 1, . . . , M,
(3.4)
n=1
be M non-trivial linear forms in N variables xn with coefficients amn ∈ ZK .
Define
A := max amn > 0,
m,n
and suppose M < N . Then the system of equations
ym = 0,
m = 1, . . . , M,
has a non-zero integer solution z̄ = (z1 , . . . , zN )T ∈ ZN
K \ {0̄} with
M
max z1 , . . . , zN
≤ c(cN KA) N −M ,
where c is some positive constant.
25
(3.5)
Proof. [8, p. 92] Since amn ∈ ZK , the products amn ωj , j = 1, . . . , K, are
also in ZK and thus can be represented as linear combinations of the basis
elements ω1 , . . . , ωK :
amn ωj =
K
X
amnjk ωk ,
(3.6)
k=1
where m = 1, . . . , M, n = 1, . . . , N, j = 1, . . . , K, and the coefficients amnjk
are unique rational integers. Similarly, for unknown variables x1 , . . . , xN ∈ ZK
there exist unique rational integers xnj such that
xn =
K
X
xnj ωj ,
n = 1, . . . , N.
(3.7)
j=1
The set of N variables xn ∈ ZK has in this way been turned into a set of N K
variables xnj ∈ Z. Combining the formulae (3.4), (3.6) and (3.7) gives
ym =
N
X
amn xn
n=1
=
N
X
amn
n=1
=
K
X
xnj ωj
j=1
N X
K
X
amn ωj xnj
n=1 j=1
=
N X
K X
K
X
(3.8)
amnjk ωk xnj
n=1 j=1 k=1
=
N X
K
K X
X
amnjk xnj ωk
k=1 n=1 j=1
=
K
X
ymk ωk ,
k=1
where we have defined
ymk :=
N X
K
X
n=1 j=1
amnjk xnj ∈ Z,
m = 1, . . . , M, k = 1, . . . , K.
26
(3.9)
Let now c1 and c2 be the constants given by Lemma 2, and define
c3 := c2 max ωj + 1.
(3.10)
1≤j≤K
Then c3 , like c1 and c2 , only depends on the ring ZK . Next pick H ∈ Z+ in
such a way that
M
M
0 ≤ (N Kc3 A) N −M − 1 < 2H ≤ (N Kc3 A) N −M + 1.
(3.11)
Then H satisfies also the inequality
(2H + 1)N K = (2H + 1)(N −M )K (2H + 1)M K
> (N Kc3 A)M K (2H + 1)M K
(3.12)
≥ (2N Kc3 A + 1)M K ,
the last row following from the fact that H, N, K, c3 , A ≥ 1.
Consider the linear forms (3.9). If the possible values of the variables
xnj ∈ Z are restricted by the condition |xnj | ≤ H, the N K-tuple (xnj ) has
(2H + 1)N K distinct possibilities. By the representation (3.6) and the second
inequality (3.2) of Lemma 2 it follows in turn that
max |amnjk | ≤ c2 max amn ωj ≤ c2 max amn max ωj ≤ c3 A,
m,n,j,k
m,n
m,n,j
j
recalling the definitions (3.5) and (3.10) of the constants A and c3 . Hence
there is a limitation for the values of the linear forms (3.9) too:
N X
K
X
amnjk xnj max |ymk | = max m,k
m,k n=1 j=1
≤ max
m,k
N X
K
X
n=1 j=1
|amnjk ||xnj |
≤ N K max |amnjk | max |xnj |
m,n,j,k
≤ N Kc3 AH.
27
n,j
The M K-tuple (ymk ) has thus only (2N Kc3 AH + 1)M K different possibilites,
which by the estimate (3.12) is less than the number of possibilities the vector
(xnj ) has. Therefore two distinct vectors (x0nj ) and (x00nj ) must be mapped to
the same vector (ymk ) in the linear forms (3.9). By putting znj := x0nj − x00nj ∈
Z we have a non-zero N K-tuple (znj ) with
(3.11)
M
max |znj | ≤ max |x0nj | + max |x00nj | ≤ H + H ≤ (N Kc3 A) N −M + 1
n,j
n,j
n,j
satisfying ymk = 0 for all m = 1, . . . , M, k = 1, . . . , K, due to linearity.
It follows from (3.8) that the corresponding M algebraic integers ym also
vanish.
Finally, let c := max{2c1 , c3 }. Using the representation (3.7) and the first
inequality (3.1) of Lemma 2, we have a solution z̄ := (z1 , . . . , zN )T ∈ ZN
K \{0̄}
to the system of equations (3.4) with
max zn ≤ c1 max |znj |
n,j
M
≤ c1 (N Kc3 A) N −M + 1
1≤n≤N
M
≤ 2c1 (N Kc3 A) N −M
M
≤ c(cN KA) N −M .
28
Chapter 4
Minkowski’s convex body
theorems
In this chapter two important geometric theorems are presented, first of which
has consequences that are explored in chapter 5. Some amount of background
theory is needed before that, however; it is mostly based on Sections 2 and
3 of the lecture notes [9] and Chapter 2 of the lecture notes [5].
For this chapter, let n ∈ Z+ .
4.1
Convex bodies
Definition 12. A non-empty subset C ⊆ Rn is convex, if for any pair of
points ā, b̄ ∈ C it holds sā + (1 − s)b̄ 0 ≤ s ≤ 1 ⊆ C. In other words, a
line segment connecting any two points of a convex set C has to be included
in C.
A bounded convex subset of Rn is called a convex body.
Definition 13. A subset C ⊆ Rn is central symmetric (symmetric with
respect to origin), if C = −C.
29
The following two properties are simple to prove, but good to keep in
mind for later use.
Lemma 3. If λ ∈ R+ and C ⊆ Rn is a central symmetric convex body, then
also the dilation λC := {λc̄ | c̄ ∈ C} is a central symmetric convex body.
Proof. Immediately λC = λ(−C) = −λC, since the set C is central symmetric.
Next, let λā, λb̄ ∈ λC. Now
sλā + (1 − s)λb̄ 0 ≤ s ≤ 1 = λ sā + (1 − s)b̄ 0 ≤ s ≤ 1 ⊆ λC,
for C is convex. Lastly, C being bounded, the set λC is certainly bounded as
well, for the constant λ is finite.
Lemma 4. If C ⊆ Rn is a central symmetric convex body and L̄ : Rn → Rn
a linear mapping, then also the transformed set L̄(C) is a central symmetric
convex body.
Proof. Again we see at once that L̄(C) = L̄(−C) = −L̄(C), since the set C
is central symmetric and the mapping L̄ linear. Let then L̄ā, L̄b̄ ∈ L̄(C) for
some ā, b̄ ∈ C. Just like in the previous proof,
sL̄ā + (1 − s)L̄b̄ 0 ≤ s ≤ 1 = L̄ sā + (1 − s)b̄ 0 ≤ s ≤ 1 ⊆ L̄(C),
using the linearity of L̄ and the convexity of C. Moreover, every linear transformation on a finite-dimensional inner product space is bounded, so the
boundedness of C implies boundedness of its image L̄(C).
Intuitively it is clear that the smallest central symmetric convex body
containing given points v̄1 , . . . , v̄r ∈ Rn must be the polyhedron that has the
points ±v̄1 , . . . , ±v̄r ∈ Rn as its vertices. Lemma 5 confirms this formally.
30
Lemma 5. Let v̄1 , . . . , v̄r ∈ Rn . Then the set
)
( r
r
X
X
|xi | ≤ 1 ⊆ Rn
xi v̄i x1 , . . . , xr ∈ R,
Vr :=
i=1
i=1
is the smallest central symmetric convex body containing the vectors v̄1 , . . . , v̄r .
Proof. The fact that Vr is a central symmetric convex body is trivial to prove:
P
P
P
Let x̄ = ri=1 xi v̄i ∈ Vr . Then ri=1 | − xi | = ri=1 |xi | ≤ 1, so −x̄ ∈ Vr .
Also,
r
r
r
X
X
X
|xi |kv̄i k ≤ max kv̄i k
|xi | ≤ max kv̄i k,
xi v̄i ≤
kx̄k = 1≤i≤r
1≤i≤r
i=1
i=1
i=1
P
showing that Vr is bounded. If ȳ = sā + (1 − s)b̄ for some ā = ri=1 ai v̄i , b̄ =
Pr
i=1 bi v̄i ∈ V, 0 ≤ s ≤ 1, then
r
X
i=1
|yi | =
r
X
i=1
|sai + (1 − s)bi | ≤ s
r
X
i=1
|ai | + (1 − s)
meaning that ȳ ∈ Vr and Vr is convex.
r
X
i=1
|bi | ≤ s + (1 − s) = 1,
To prove the second claim, let C ⊆ Rn be any central symmetric convex
body that contains the points v̄1 , . . . , v̄r . If r = 1, clearly V1 ⊆ C since
the set V1 is precisely the line segment connecting the points v̄1 and −v̄1 .
Suppose now that the statement is true for r = k ∈ Z+ , and pick an arbitary
P
point x̄ = k+1
i=1 xi v̄i ∈ Vk+1 . It suffices to study the case where |xi | < 1 for
all i = 1, . . . , k + 1 and xi 6= 0 for some i = 1, . . . , k + 1, since otherwise
P
immediately x̄ ∈ C. By assumption Vk ⊆ C, so z̄ := ki=1 xi v̄i ∈ C. Denote
1
1
≤ Pk
.
1 − |xk+1 |
i=1 |xi |
P
P
Then also cz̄ ∈ C, for ki=1 |cxi | = c ki=1 |xi | ≤ 1 implies cz̄ ∈ Vk . Because
c :=
the set C is central symmetric, we have sign(xk+1 )v̄k+1 ∈ C, and the convexity
of C further implies
|xk+1 | sign(xk+1 )v̄k+1 + (1 − |xk+1 |)cz̄ = xk+1 v̄k+1 + z̄ =
31
k+1
X
i=1
xi v̄i = x̄ ∈ C,
indicating Vk+1 ⊆ C. As a result of the induction principle, the statement
holds true for all r ∈ Z+ .
4.2
Lattices
In the following, let (R, +, ×) be a ring formed from a non-empty set R with
ring product × and addition +. The ring R has zero and identity elements
0, 1 ∈ R, 0 6= 1.
Definition 14. Let R be a commutative ring. Then (M, +, ·) is an R-module,
if
1. (M, +) is an abelian group,
and the scalar product · : R × M → M satisfies the following axioms:
2. 1 · m = m, 1 ∈ R;
3. (rs) · m = r · (s · m);
4. (r + s) · m = r · m + s · m;
5. r · (m + n) = r · m + r · n
for all r, s ∈ R and m, n ∈ M .
The elements of R are called scalars.
Let now M be an R-module, S a subring of R and B ⊆ M .
Definition 15. A non-empty subset N ⊆ M is an S-submodule of M , if
(N, +) is a subgroup of (M, +), and s · n ∈ N for every s ∈ S, n ∈ N .
32
Definition 16. The smallest S-submodule containing the set B ⊆ M , denoted by
hBiS :=
\
N,
N is an S-submodule of M,
B⊆N
is called a linear hull over S generated by B. In particular, the set
hm1 , . . . , mk iS := Sm1 + . . . + Smk
is called a linear hull over S generated by the elements m1 , . . . , mk ∈ M .
Remark 3. When K is a field, a K-module is the same thing as a vector
space over K. Taking M = K n , we may write
M = K n = hē1 , . . . , ēn iK = K ē1 + . . . + K ēn ;
the vector space K n is a linear hull over the field K generated by the standard
basis vectors ē1 , . . . , ēn .
Definition 17. Let ¯l1 , . . . , ¯lr ∈ Rn be linearly independent over R. Then the
linear hull
Λ := ¯l1 , . . . , ¯lr Z = Z¯l1 + . . . + Z¯lr ⊆ Rn
over Z forms a lattice. The set ¯l1 , . . . , ¯lr forms a basis for the lattice Λ,
with rank Λ = r. If rank Λ = n, the Λ is called a full lattice.
p
The determinant of Λ is defined by det Λ := det(M T M ), where M :=
i
h
¯l1 · · · ¯lr is an n × r matrix formed from the basis vectors ¯l1 , . . . , ¯lr of the
lattice Λ.
The set
F := x1 ¯l1 + . . . + xr ¯lr xi ∈ R, 0 ≤ xi < 1, i = 1, . . . , r
is called the fundamental parallelepiped of the lattice Λ.
33
Remark 4. If Λ ⊆ Rn is a full lattice, the determinant becomes det Λ =
|det M |, since M is now a square matrix. Also V(F ) = det Λ, and we can
write
Rn =
[
ū∈Λ
(ū + F ),
where the translates ū + F := ū + x̄ x̄ ∈ F , ū ∈ Λ, are pairwise disjoint.
Remark 5. The integer lattice Λ = Zn = Zē1 + . . . + Zēn has determinant
i
h
det Λ = 1, as M = ē1 · · · ēn is now simply the identity matrix.
Lemma 6. If Λ ⊆ R2 is a full lattice, then the lattice Λm ⊆ R2m , m ∈ Z+ ,
has determinant (det Λ)m .
Proof. Let Λ = Z¯l1 + Z¯l2 =: Z(l11 , l12 )T + Z(l21 , l22 )T and denote M :=
h
i
¯l1 ¯l2 . Then
Λm =
(x̄1 , . . . , x̄m )T ∈ R2m x̄i ∈ Λ, i = 1, . . . , m
= (x11 l11 + x12 l21 , x11 l12 + x12 l22 , x21 l11 + x22 l21 , . . . ,
xm1 l12 + xm2 l22 )T ∈ R2m xij ∈ Z, i = 1, . . . , m, j = 1, 2
= x11 (l11 , l12 , 0, . . . , 0)T + x12 (l21 , l22 , 0, . . . , 0)T +
x21 (0, 0, l11 , l12 , 0, . . . , 0)T + . . . + xm2 (0, . . . , 0, l21 , l22 )T xij ∈ Z
= Z(l11 , l12 , 0, . . . , 0)T + Z(l21 , l22 , 0, . . . , 0)T +
Z(0, 0, l11 , l12 , 0, . . . , 0)T + . . . + Z(0, . . . , 0, l21 , l22 )T .
34
Using the notation of Definition 17, we can write det (Λm ) = |det Mm |, where
l11 l21 0 0 · · · 0 0 l12 l22 0 0 · · · 0 0 m
0 0 l11 l21 · · · 0 0 l11 l21 = (det M )m .
= det Mm = 0 0 l12 l22 · · · 0 0 l12 l22 ..
.. . .
..
..
..
.. . .
.
.
.
.
.
0 0 0 0 · · · l11 l21 0 0 0 0 · · · l12 l22 2m×2m
Hence det (Λm ) = | det Mm | = | det M |m = (det Λ)m .
Remark 6. Obviously Lemma 6 generalizes to the case Λ ⊆ Rn , but to avoid
more messing with indices, we stick to n = 2, which is enough for the purposes
of Chapter 5.
4.3
The convex body theorems
For the main theorem of this chapter two different proofs are presented, both
of which cleverly exploit the ever so familiar pigeonhole principle again.
Theorem 8 (The first Minkowski’s convex body theorem). Suppose Λ ∈ Rn
is a full lattice and C ⊆ Rn is a central symmetric convex body with


V(C) > 2n det Λ or

V(C) ≥ 2n det Λ,
if C is compact.
Then there exists a non-zero lattice point in the set C.
First proof. [13, pp. 121, 123] We start by examining the case Λ = Zn , which
can then be generalized using a linear transformation.
35
Fix t ∈ Z+ . The hyperplanes
xj =
2zj
,
t
j = 1, . . . , n, zj ∈ Z,
divide the space into hypercubes of volume
2 n
.
t
Let N (t) denote the number
of corners of these cubes inside the set C. Because C is bounded, this number
is finite. The smaller the hypercubes are, the bigger portion of the vertices
inside C are shared by several cubes contained in C. Therefore the number of
corners N (t) approximates the number of cubes in C, and we can write
n
2
V(C) = lim
N (t).
t→∞
t
Since V(C) > 2n by the assumption, for the limit we have limt→∞
N (t)
tn
> 1,
implying N (t) > tn for sufficiently large t. However, when n integers z1 , . . . , zn
are divided by t, the n-tuple of remainders has only tn different possibilities.
Consequently, the set C contains two distinct vertex points
!
!
(1)
(2)
(2)
(1)
2zn
2zn
2z1
2z1
,...,
,...,
and P̄2 :=
P̄1 :=
t
t
t
t
giving exactly the same n-tuple of remainders. This leads to the difference
(1)
zj
(2)
− zj
being divisible by t for all j = 1, . . . , n. Because C is central
symmetric, we have −P̄2 ∈ C. Hence the midpoint
1
M̄ :=
P̄1 − P̄2 =
2
(1)
(2)
(1)
(2)
z1 − z1
zn − zn
,...,
t
t
!
of the line segment connecting the points P̄1 and −P̄2 has integer coordinates
and, by the convexity of C, lies inside C. Lastly, P̄1 6= P̄2 implies M̄ 6= 0̄. Hence
we have a non-zero vector M̄ ∈ C ∩ Λ.
As for the second case, let C ⊆ Rn now be a closed central symmetric
convex body with V(C) = 2n . Then λC ) C and V(λC) > V(C) = 2n for all
λ > 1. In view of the first case, there is a non-zero lattice point in λC for all
36
λ > 1. Since any particular set λC is bounded, it cannot contain infinitely
many distinct lattice points. Therefore there has to be a lattice point in the
T
intersection λ>1 λC = C. (Here it is essential that C is closed, for actually
T
λ>1 λC = C ∪ ∂C and the point in question could be on the boundary ∂C of
the set C. However, C being closed means ∂C ⊆ C.)
Finally the generalization: Let Λ ∈ Rn be a full lattice with the basis
¯l1 , . . . , ¯ln ⊆ Rn and C ⊆ Rn a central symmetric convex body with V(C) >
2n det Λ. Define a bijective linear mapping L̄ : Rn → Rn by setting L̄x̄ =
x1 ¯l1 + . . . + xn ¯ln . Then
L̄(Zn ) = L̄(Zē1 + . . . + Zēn )
= ZL̄ē1 + . . . + ZL̄ēn
= Z¯l1 + . . . + Z¯ln
(4.1)
= Λ.
The volume of the transformed convex body L̄(C) obeys the rule
i
h
¯
¯
V L̄(C) = det l1 · · · ln V(C) = det(Λ)V(C),
i
h
the matrix of the mapping L̄ being ¯l1 · · · ¯ln =: L. Since L̄ is bijective,
it has an inverse L̄−1 : Rn → Rn , whose matrix L−1 has determinant
1
.
det L
According to Lemma 4 the set L̄−1 (C) ⊆ Rn is a central symmetric convex
body with volume
1
V(C) = V(C) > 2n .
V L̄−1 (C) = det L det Λ
By what was shown above, there exists a non-zero vector z̄ ∈ L̄−1 (C) ∩ Zn .
From the injectivity of L̄ and the equation (4.1) it follows that 0̄ 6= L̄z̄ ∈ C∩Λ.
Suppose in addition that C is closed and V(C) = 2n det Λ. The linear
mapping L̄ is continuous, so the preimage L̄−1 (C) is closed as well. The
37
dilation λL̄−1 (C) ) L̄−1 (C), λ > 1, has volume
V(C)
= 2n .
V λL̄−1 (C) > V L̄−1 (C) =
det Λ
T
Again the intersection λ>1 λL̄−1 (C) = L̄−1 (C) has to contain a non-zero
lattice point z̄ ∈ Zn , as the intersecting sets λL̄−1 (C) all contain one. Hence
0̄ 6= L̄z̄ ∈ C ∩ Λ.
Second proof. [5, p. 11] Assume V(C) > 2n det Λ, and let F stand for the
fundamental parallelepiped of the lattice Λ. Now the set 12 C has volume
V 21 C = 21n V(C) > det Λ. For ū ∈ Λ, define Sū := 12 C ∩ (ū + F ). Then the
S
sets Sū , ū ∈ Λ, are pairwise disjoint and ū∈Λ Sū = 12 C. Hence
X
1
V(Sū ) = V
C > det Λ.
2
ū∈Λ
Now all the sets Sū are shifted into the fundamental parallelepiped F by
defining
Sū∗
:= −ū + Sū =
1
−ū + C
2
∩ F,
ū ∈ Λ.
This doesn’t change the volumes of the sets, so
X
ū∈Λ
V(Sū∗ ) =
X
ū∈Λ
V(Sū ) > det Λ = V(F ) ≥ V
[
ū∈Λ
Sū∗
!
.
This implies that there exist two distinct points ū, v̄ ∈ Λ such that Sū∗ ∩ Sv̄∗ 6=
∅. Pick a point ā ∈ Sū∗ ∩ Sv̄∗ . Then ā = x̄ − ū = ȳ − v̄ for some x̄, ȳ ∈ 12 C,
whence z̄ := x̄− ȳ = ū− v̄ ∈ Λ\{0̄}. Now 2x̄, 2ȳ ∈ C, and because C is central
symmetric and convex, we have −2ȳ ∈ C and 21 ·2x̄+ 21 (−2ȳ) = x̄− ȳ = z̄ ∈ C.
Thus the set C contains a non-zero lattice point.
For the case that C is closed, see the first proof.
Remark 7. The second proof is essentially about cutting the set 21 C with the
lattice Λ and then rearranging the pieces into the fundamental parallelepiped
38
F . Since the added volume of the pieces (which is the volume of 21 C) is greater
than the volume of F , some intersecting must happen.
Remark 8. Actually #(C ∩ Λ) ≥ 3, for 0̄, M̄ , −M̄ ⊆ C ∩ Λ.
Definition 18. Let C ⊆ Rn be non-empty. The successive minima λ1 , . . . , λn ∈
R of the set C with respect to a lattice Λ ⊆ Rn are given by
λj = inf {λ > 0 | rankhλC ∩ ΛiZ ≥ j} ,
j = 1, . . . , n.
That is, the number λj is the lower bound of all λ > 0 such that the dilation
λC contains at least j linearly independent lattice points.
Lemma 7. Let C ⊆ Rn be a central symmetric convex body with 0̄ as an
interior point, and Λ ⊆ Rn a full lattice. Then the successive minima of C
with respect to Λ satisfy the inequality chain 0 < λ1 ≤ . . . ≤ λn < ∞.
Proof. From the obvious inclusion
{λ > 0 | rankhλC ∩ ΛiZ ≥ j + 1} ⊆ {λ > 0 | rankhλC ∩ ΛiZ ≥ j}
and the previous definition it follows that λj ≤ λj+1 , when j = 1, . . . , n − 1.
Letting λ → 0 we have λC → {0̄}, since C is bounded, so ultimately the only
lattice point contained in the intersection λC ∩ Λ is 0̄. Therefore λ1 6= 0.
Take now n linearly independent lattice points ¯l1 , . . . , ¯ln . By assumption
there exists a ball B(0̄, r) := {x̄ ∈ Rn | kx̄k < r} such that B(0̄, r) ⊆ C for
some r ∈ R. Therefore we can pick n vectors c̄1 , . . . , c̄n ∈ C such that ¯lk = λ̃k c̄k
n o
for some λ̃k ∈ R+ , k = 1, . . . , n. Choose λ̃ = max1≤k≤n λ̃k , whereupon it
follows from the convexity of the set λ̃C (see Lemma 3) that ¯l1 , . . . , ¯ln ∈ λ̃C
(the n line segments connecting 0̄ with the points λ̃c̄1 , . . . , λ̃c̄n contain each
of the chosen n lattice points ¯lk ). Thus {λ > 0 | rankhλC ∩ ΛiZ ≥ n} 6= ∅
and therefore the infimum λn exists.
39
Lemma 8. The successive minima of a closed, central symmetric convex
body C ⊆ Rn are actual minima.
Proof. Similar deduction as in the proof of Theorem 8 applies. Let λk ∈ R+ be
a successive minimum of the set C. Then λC contains k linearly independent
lattice points for all λ > λk . Because the sets λC are bounded, they contain
T
only finitely many lattice points. Therefore the intersection λ>λk λC = λk C
has to contain k linearly independent lattice points as well.
Theorem 9 (The second Minkowski’s convex body theorem). Let Λ ⊆ Rn
be a full lattice and C ⊆ Rn a central symmetric convex body with successive
minima λ1 , . . . , λn ∈ R+ . Then
2n
det Λ ≤ λ1 · · · λn V(C) ≤ 2n det Λ.
n!
Partial proof. [5, p. 19] We follow Chapter 2 of Evertse’s notes and prove
the lower bound assuming that C is closed, and the upper bound for the
euclidean unit ball Bn := {x̄ ∈ Rn | kx̄k ≤ 1}.
Let first ¯l1 , . . . , ¯ln ⊆ Rn be a basis for the lattice Λ. In light of Lemma
8, there exist n linearly independent vectors v̄1 , . . . , v̄n ∈ Λ such that v̄i ∈
λi C for each i = 1, . . . , m. Since each of these vectors has a representation
P
v̄i = nk=1 aik ¯lk , aik ∈ Z, i = 1, . . . , n, in the basis ¯l1 , . . . , ¯ln ⊆ Rn , we can
write
h
i h
i
v̄1 · · · v̄n = ¯l1 · · · ¯ln A,
where A ∈ Mn×n (Z) is a non-singular matrix. Moreover, by Lemma 5 the
set
S :=
(
)
n
X
xn x1
|xi | ≤ 1
v̄1 + . . . + v̄n x1 , . . . , xn ∈ R,
λ1
λn
i=1
is the smallest central symmetric convex body containing the points
C, i = 1, . . . , n. Hence S ⊆ C.
40
1
v̄
λi i
∈
The set S is the image of the n-dimensional octahedron
)
(
n
X
n |xi | ≤ 1
On := x̄ ∈ R i=1
under the linear transformation L̄ : Rn → Rn , L̄x̄ =
x1
v̄
λ1 1
+...+
xn
v̄ .
λn n
As it
is explained in [1, p. 429], the volume of an n-dimensional pyramid P ⊆ Rn is
V(P) = nh V(B), where h is the height of the pyramid and B ⊆ Rn−1 denotes
its (n − 1)-dimensional base. Each hyperoctahedron On is formed from two
pyramids of height 1 with On−1 as their common base. Therefore the formula
for the volume becomes recursive, yielding
1
2
1
2n
V(On ) = 2 · V(On−1 ) = · 2 ·
V(On−2 ) = . . . = ,
n
n
n−1
n!
since V(O1 ) = V([−1, 1]) = 2.
Consequently (L denoting, as usual, the matrix of the mapping L̄),
V(C) ≥ V(S)
= | det L| V(On )
i 2n
h
= det λ1 v̄1 · · · λ1 v̄n ·
n
1
n!
i 2n
h
1
det
=
v̄1 · · · v̄n ·
λ1 · · · λ n
n!
i h
2n
1
=
·
det ¯l1 · · · ¯ln A n! λ1 · · · λn
2n |det A| det Λ
·
=
n!
λ 1 · · · λn
det Λ
2n
.
·
≥
n! λ1 · · · λn
As for the upper bound, let C = Bn and define
µ :=
2n det Λ
V(Bn )λ1 · · · λn
Then it suffices to prove that µ ≥ 1.
41
n1
.
Pick again n linearly independent lattice vectors v̄1 , . . . , v̄n ∈ Λ such that
v̄i ∈ λi Bn , meaning kv̄i k = λi for i = 1, . . . , n. Using the Gram-Schmidt orthogonalization procedure, an orthonormal basis {ē1 , . . . , ēn } of the space Rn
can be constructed so that hē1 , . . . , ēi iR = hv̄1 , . . . , v̄i iR for i = 1, . . . , n. Let
L̄ : Rn → Rn be a linear transformation defined by L̄ēi = λi ēi , i = 1, . . . , n;
it has determinant det L = λ1 · · · λn . When the unit ball is transformed and
dilated, its volume becomes
V µL̄(Bn ) = µn λ1 · · · λn V(Bn ) = 2n det Λ.
Hence, according to Minkowski’s first convex body theorem, the set µL̄(Bn )
contains a non-zero lattice point 0̄ 6= z̄ ∈ Λ.
First, L̄−1 (z̄) ∈ µBn , so
−1 L̄ (z̄) ≤ µ.
(4.2)
Then, since z̄ ∈ Λ \ {0̄}, we have kz̄k ≥ λ1 . Let
j := max{i ∈ {1, . . . , n} | kz̄k ≥ λi }.
This means that z̄ ∈ hv̄1 , . . . , v̄j iR . Indeed, if j = n, then hv̄1 , . . . , v̄n iR =
Rn 3 z̄. If j < n, it holds kz̄k < λj+1 . Now the vectors v̄1 , . . . , v̄j ∈ Λ are
linearly independent and kv̄1 k, . . . , kv̄j k ≤ λj < λj+1 too. As there cannot be
j + 1 linearly independent lattice points having euclidean norm strictly less
than λj+1 , we must have z̄ ∈ hv̄1 , . . . , v̄j iR .
As mentioned before, hv̄1 , . . . , v̄j iR = hē1 , . . . , ēj iR , so we can write z̄ =
P
z1 ēi + . . . + zj ēj for some z1 , . . . , zj ∈ R. Accordingly, L̄−1 (z̄) = jk=1 λzkk ēk .
Using the definition of j and the fact that λ1 < . . . < λj results in
j
−1 2 X
L̄ (z̄) =
k=1
zk
λk
2
j
1 X 2 kz̄k2
z = 2 ≥ 1.
≥ 2
λj k=1 k
λj
42
(4.3)
Combining the inequalities (4.2) and (4.3) shows that µ ≥ 1, thus completing
the proof.
Remark 9. Proving the upper bound for the euclidean unit ball actually
proves it for any ellipsoid E = L̄(Bn ), where L̄ : Rn → Rn is a linear transformation: Denote Λ0 := L̄(Λ). Then the successive minima of the ellipsoid E
with respect to the lattice Λ0 are equal to the successive minima of the unit
ball Bn with respect to the lattice Λ. Furthermore, V(E) = | det L|V(Bn ) and
det Λ0 = | det L| det Λ.
V(E) =
1
Therefore
det Λ0
2n det Λ0
det Λ0 2n det Λ
=
.
V(Bn ) ≤
·
det Λ
det Λ λ1 · · · λn
λ 1 · · · λn
The full proof of Minkowski’s second theorem, requiring more advanced
theory, can be found in Chapter 8 of Cassels’ book [3].
1
If the vectors ¯l1 , . . . , ¯ln form a base for the lattice Λ, then the vectors L̄ ¯l1 , . . . , L̄ ¯ln
form a base for the image Λ0 . The latter equality follows directly from this.
43
Chapter 5
Some Diophantine inequalities
The first theorem of Minkowski from the preceding chapter is now put to
use.
5.1
A few Diophantine inequalities over R
Theorem 10. Let α1 , . . . , αm ∈ R and h1 , . . . , hm ∈ Z+ be given. Then
there exist m + 1 numbers p, q1 , . . . , qm ∈ Z, with some qi 6= 0, satisfying the
conditions
|qi | ≤ hi ,
i = 1, . . . , m
(5.1)
and
|p + q1 α1 + . . . + qm αm | <
1
.
h1 · · · hm
(5.2)
Proof. [9, p. 5] Writing L0 x̄ := x0 + α1 x1 + . . . + αm xm and Lk x̄ := xk , k =
1, . . . , m, defines m+1 linear mappings Li : Zm+1 → R, i = 0, 1, . . . , m+1. By
transforming the integer lattice Zm+1 = Zē1 + . . . + Zēm+1 with the mapping
44
L̄ := (L0 , L1 , . . . , Lm ) : Zm+1 → Rm+1 we get a full lattice
Λ := L̄(Zm+1 )
= ZL̄ē1 + . . . + ZL̄ēm+1
= Z(1, 0, . . . , 0)T + Z(α1 , 1, 0, . . . , 0)T + . . . + Z(αm , 0, . . . , 1)T
=: Z¯l0 + Z¯l1 + . . . + Z¯lm
⊆ Rm+1
with determinant
h
det Λ = det ¯l0 ¯l1
Now put
1 α 1 α 2
0 1 0
i · · · ¯lm = 0 0 1
.. ..
..
. .
.
0 0 0
|L0 x̄| <
and
···
···
···
...
···
1
1
:=
h
h1 · · · hm
1
|Lk x̄| ≤ hk + ,
2
αm 0 0 = 1.
.. . 1 k = 1, . . . , m.
These conditions determine a central symmetric convex body
1
1
T
m+1 m+1
C := (y0 , y1 , . . . , ym ) ∈ R
,
|y0 | < h , |yk | ≤ hk + 2 ⊆ R
where k = 1, . . . , m. The volume of this hyperrectangle is
1
1
2
· . . . · 2 hm +
V(C) = · 2 h1 +
h
2
2
Qm
1
m+1
2
k=1 hk + 2
=
h1 · · · hm
> 2m+1
= 2m+1 det Λ.
45
By the first Minkowski’s convex body theorem, there exists a non-zero vector
0̄ 6= ȳ := (p + q1 α1 + . . . + qm αm , q1 , . . . , qm )T ∈ C ∩ Λ,
where (p, q1 , . . . , qm )T ∈ Zm+1 \ {0̄}, |qi | ≤ hi for all i = 1, . . . , m, and
|p + q1 α1 + . . . + qm αm | <
1
.
h1 · · · hm
(5.3)
If qi = 0 for all i = 1, . . . , m, then the equation (5.3) above implies p = 0
too, which is a contradiction.
Remark 10. If −α ∈ R is an irrational number and h ∈ Z+ , according to
Theorem 10 there exist p, q ∈ Z, q 6= 0, such that
p
− α = 1 |p − qα| ≤ |p − qα| < 1 .
q
|q|
h
This is a proof for the fact that any real irrational number can be approximated by rationals with accuracy as good as we please. Since |q| ≤ h, actually we get the result first discovered by Dirichlet in 1842: For any irrational
number α ∈ R there exists infinitely many rational numbers
p
− α < 1 .
q
q2
p
q
∈ Q with
The fact that there are indeed infinitely many solutions to this equation will
be demonstrated later in Theorem 12.
Definition 19. An integer vector (r1 , . . . , rn )T ∈ Zn is said to be primitive,
if gcd(r1 , . . . , rn ) = 1.
With this definition a tuned version of the previous theorem can easily
be achieved:
46
Theorem 11. Let α1 , . . . , αm ∈ R and h1 , . . . , hm ∈ Z+ be given. Then there
exists a primitive vector (s, r1 , . . . , rm ) ∈ Zm+1 , with some ri 6= 0, satisfying
the conditions
|ri | ≤ hi ,
i = 1, . . . , m
and
|s + r1 α1 + . . . + rm αm | <
1
.
h1 · · · hm
Proof. [9, p. 6] Theorem 10 gives an integer vector (p, q1 , . . . , qm )T ∈ Zm+1 \
{0̄}, with some qi 6= 0, satisfying the conditions (5.1) and (5.2). Denoting
d := gcd(p, q1 , . . . , qm ) ∈ Z+ we have (p, q1 , . . . , qm )T = d(s, r1 , . . . , rm )T ,
where (s, r1 , . . . , rm )T ∈ Zm+1 \ {0̄} is now primitive. Furthermore,
d |s + r1 α1 + . . . + rm αm | = |p + q1 α1 + . . . + qm αm | <
1
,
h1 · · · hm
meaning
|s + r1 α1 + . . . + rm αm | <
1
1
≤
.
dh1 · · · hm
h1 · · · hm
Also |ri | ≤ |qi | ≤ hi for all i = 1, . . . , m, and ri =
1
q
d i
6= 0 for some i =
1, . . . , m.
Theorem 12. Let 1, α1 , . . . , αm ∈ R be linearly independent over Q. Then
there exist infinitely many primitive integer vectors v̄k := (pk , q1,k , . . . , qm,k )T ∈
Zm+1 \ {0̄} satisfying the condition
|pk + q1,k α1 + . . . + qm,k αm | <
1
1
=: ,
h1,k · · · hm,k
hk
(5.4)
where hi,k := max{1, |qi,k |}, i = 1, . . . , m.
Proof. [9, p. 7] Suppose on the contrary that there exist only finitely many
primitive vectors v̄k ∈ Zm+1 \ {0̄} satisfying the inequality (5.4). As these
47
vectors are non-zero and the elements 1, α1 , . . . , αm linearly independent, a
positive minimum exists: denote
1
:= min |pk + q1,k α1 + . . . + qm,k αm | > 0.
k
R
(5.5)
Choose then m such numbers ĥi ∈ Z+ , i = 1, . . . , m, that for their product
ĥ1 · · · ĥm =: ĥ it holds
1
ĥ
≤
1
.
R
Theorem 11 gives then a primitive vector
(p̂, q̂1 , . . . , q̂m ) ∈ Zm+1 with max {1, |q̂i |} ≤ ĥi , i = 1, . . . , m, satisfying the
condition
|p̂ + q̂1 α1 + . . . + q̂m αm | <
1
ĥ
≤
1
.
R
This contradicts the minimum assumption (5.5).
As a corollary of Theorem 12 we get
Theorem 13. Let 1, α1 , . . . , αm ∈ R be linearly independent over Q. If there
exist positive constants c, ω ∈ R+ such that the inequality
|β0 + β1 α1 + . . . + βm αm | ≥
c
(h1 · · · hm )ω
(5.6)
holds for all vectors (β0 , β1 , . . . , βm )T ∈ Zm+1 \{0̄} with hk := max{1, |βk |}, k =
1, . . . , m, then ω ≥ 1.
Proof. [9, p. 7] Again we prove by contradiction: assume ω < 1. By Theorem
12 and the assumption (5.6) there exists an infinite amount of primitive
vectors v̄k = (pk , q1,k , . . . , qm,k )T ∈ Zm+1 \ {0̄} satisfying
c
1
,
≤ |pk + q1,k α1 + . . . + qm,k αm | <
ω
(h1,k · · · hm,k )
h1,k · · · hm,k
where hi,k = max{1, |qi,k |}, i = 1, . . . , m. This results in
c
·
(h1,k · · · hm,k )ω
1
c
(h1,k · · · hm,k )1−ω
48
−1
> 0,
and the fact that c, h1,k , . . . , hm,k > 0 further implying
1
c
Consequently
1
c
(h1,k · · · hm,k )1−ω
> 1.
> (h1,k · · · hm,k )1−ω , meaning
log(h1,k · · · hm,k ) <
log 1c
.
1−ω
However, from the infinity of solutions such a vector v̄k can be chosen that
the numbers h1,k , . . . , hm,k ∈ Z+ are large enough to fulfill the condition
log(h1,k · · · hm,k ) ≥
Therefore we must have ω ≥ 1.
log 1c
.
1−ω
The definitions 3, 4, 9, 10 and 11 from Chapters 1 and 3 should be recalled
before the following theorem, which presents an inequality on the powers of
an algebraic integer.
Theorem 14. Let α ∈ B with degQ α = m + 1, and denote by α(0) :=
α, α(1) , . . . , α(m) its conjugates relative to the field Q(α). Then the inequality
1
m
|p + q1 α + . . . + qm α | >
, A := max 1, α ,
(3mH)m Am2
holds true for all rational integer vectors (p, q1 , . . . , qm )T ∈ Zm+1 with 1 ≤
H := max1≤i≤m |qi |.
Proof. [9, p. 8] Let (p, q1 , . . . , qm )T ∈ Zm+1 \{0̄}. First note that m, H, A ≥ 1.
So, if |p + q1 α + . . . + qm αm | ≥ 1, the claim holds. Hence it suffices to study
the case |p + q1 α + . . . + qm αm | < 1. The triangle inequality gives
|p| − |q1 α + . . . + qm αm |
≤ |p| − |q1 α + . . . + qm αm |
≤ |p + q1 α + . . . + qm αm |
< 1,
49
leading to
|p| < 1 + |q1 α + . . . + qm αm |
≤1+
m
X
i=1
≤1+H
≤1+H
|qi ||α|i
m
X
i=1
|α|i
m
X
max 1, |α|i
i=1
≤ 1 + mH(max {1, |α|})m .
Then
m p + q1 α(i) + . . . + qm α(i) m < |p| + q1 α(i) + . . . + qm α(i) m
≤ 1 + mH(max {1, |α|})m + mH max 1, α(i) (i) m
≤ 3mH max 1, α
0≤i≤m
m
= 3mH max 1, α
= 3mHAm .
Because α is an algebraic integer of degree degQ α = m + 1, the polynomial
expression Θ := p + q1 α + . . . + qm αm ∈ Z[α] \ {0̄} is a non-zero algebraic
integer and therefore N (Θ) ∈ Z \ {0}. Adopting, as with α, the notation
50
Θ(0) := Θ we have
1 ≤ |N (Θ)|
m
Y
= Θ(i) i=0
m Y
(i) m (i)
≤ |Θ|
p + q1 α + . . . + qm α
i=1
2
< |Θ| (3mH)m Am .
5.2
On simultaneous Diophantine inequalities
Before concluding the chapter with a Diophantine inequality over C, a couple
of results on simultaneous Diophantine approximations are introduced.
Theorem 15. Let the numbers α1 , . . . , αm ∈ R and f1 , . . . , fm ∈ R≥0 with
f1 + . . . + fm = 1 be given. Then there exist p ∈ Z+ and q1 , . . . , qm ∈ Z
satisfying the equations
|pαi + qi | <
1
,
p fi
i = 1, . . . , m.
Proof. [9, p. 10] Let q ∈ Z+ and define a set
1
1
T
m+1 C := (x0 , x1 , . . . , xm ) ∈ R
|x0 | ≤ q + 2 , |x0 αi + xi | < q fi ,
where i = 1, . . . , m. This is a central symmetric convex body with volume
V(C) = (2q + 1)
2
2m+1 q 2m
2
+
> 2m+1 .
·
·
·
=
q f1
q fm
q
q
Hence the first Minkowski’s convex body theorem with the integer lattice
Λ = Zm+1 gives a non-zero integer vector 0̄ 6= (p, q1 , . . . , qm )T ∈ C ∩ Λ.
51
Here p 6= 0, for otherwise the equation |pαi + qi | <
1
q fi
would imply q1 =
. . . = qm = 0 as well. Because the set C is central symmetric, we have
(−p, −q1 , . . . , −qm )T ∈ C ∩ Λ too, so we can choose p to be positive. Since
now p ≤ q, we have also
|pαi + qi | <
1
1
≤
,
f
qi
p fi
i = 1, . . . , m,
as was to be proved.
Remark 11. If (p, q1 , . . . , qm )T = d(s, r1 , . . . , rm )T ∈ Zm+1 is the integer vector given by Theorem 15 with d := gcd(p, q1 , . . . , qm ), then
1
1
≤
,
f
pi
sfi
|sαi + ri | ≤ |pαi + qi | <
i = 1, . . . , m,
so there exists a primitive solution as well.
Theorem 16. Let at least one of the numbers α1 , . . . , αm ∈ R be irrational.
Then there exist infinitely many primitive vectors v̄k := (pk , q1,k , . . . , qm,k )T ∈
Zm+1 \ {0̄} with pk ∈ Z+ satisfying the conditions
|pk αi + qi,k | <
1
1
m
,
i = 1, . . . , m.
pk
Proof. We proceed as in the proof of Theorem 12, and suppose that there
are only finitely many primitive solutions v̄k . Since these are non-zero and
some number αj is irrational, there is a positive minimum
1
:= min |pk αj + qj,k | > 0.
k
R
Choose then q ≥ Rm and fi =
1
,
m
i = 1, . . . , m, in the proof of Theorem 15. It
follows that there exists a primitive vector 0̄ 6= (p̂, q̂1 , . . . , q̂m )T ∈ Zm+1 , p̂ ∈
Z+ , such that
|p̂αi + q̂i | <
1
q
1
m
≤
1
1
p̂ m
52
,
i = 1, . . . , m.
In particular,
|p̂αj + q̂j | <
1
q
1
m
≤
1
,
R
which is a contradiction.
5.3
A Diophantine inequality over C
√
Theorem 17. Let I = Q( −D), where D ∈ Z+ , denote an imaginary
quadratic field, and ZI its ring of integers. Let the numbers Θ1 , . . . , Θm ∈ C
and H1 , . . . , Hm ∈ Z+ be given. Then there exists a non-zero integer vector
\ {0̄} with |βj | ≤ Hj , j = 1, . . . , m, satisfying the
(β0 , β1 , . . . , βm )T ∈ Zm+1
I
inequality
1
|β0 + β1 Θ1 + . . . + βm Θm | ≤
where
τ=


1,

1,
2
2τ D 4
√
π
D ≡ 1 or 2
D≡3
!m+1
1
,
H1 · · · Hm
(mod 4)
.
(mod 4)
Proof. [11, p. 8] The ring of integers ZI of the field I, given by Theorem 3,
√
can be written in the form ZI = Z + Z h + l −D , where




1, D ≡ 1 or 2 (mod 4)
0, D ≡ 1 or 2 (mod 4)
and l =
.
h=


 1 , D ≡ 3 (mod 4)
 1 , D ≡ 3 (mod 4)
2
2
√ First define a full lattice λ = Z(1, 0) + Z h, l D ⊆ R2 , which has determinant
√
√ −2h
1
h det λ = √ = l D = D2 .
0 l D (5.7)
Define also a complex disc
o
n
√
√
DR := x + y h + l −D ∈ C x, y ∈ R, x + y h + l −D ≤ R
53
of radius R > 0, and a corresponding real disc
CR := (v, w)T ∈ R2 v 2 + w2 ≤ R2 .
√
Here V(CR ) = πR2 . Now x + y h + l −D ∈ DR ∩ ZI means that x, y ∈ Z
√ 2
√ T
and (x + yh)2 + yl D
≤ R2 , implying x + yh, yl D
∈ CR ∩ λ.
√ T
The other way round, x + yh, yl D
∈ CR ∩ λ means exactly that x +
√
y h + l −D ∈ DR ∩ ZI . In this way the intersection CR ∩ λ is a real counterpart for the set DR ∩ ZI .
Next define a lattice Λ := λm+1 ⊆ R2m+2 . Recalling Lemma 6, it has
m+1
√
. By using the following notations
D2−2h
determinant det Λ =

√


a
+
b
h
+
l
−D
= −(z1 Θ1 + . . . + zm Θm ), a, b ∈ Z





√


zk = xk + yk h + l −D , xk , yk ∈ R, k = 0, 1, . . . , m,








vk = xk + yk h = Re(zk ),



√
D = Im(zk ),
w
=
y
l
k
k



1 m+1


1
2τ√D 4

,
R
:=

0
H1 ···Hm
π








1, D ≡ 1 or 2 (mod 4)



τ=





 1 , D ≡ 3 (mod 4)
2
two more sets are defined:
n
√
D := (z0 , z1 , . . . , zm )T ∈ Cm+1 z0 − a + b h + l −D ≤ R0 ,
o
|zk | ≤ Hk , k = 1, . . . , m
and
n
C := (v0 , w0 , v1 , w1 , . . . , vm , wm )T ∈ R2m+2 o
√ 2
(v0 − (a + bh))2 + w0 − bl D ≤ R02 , vk2 + wk2 ≤ Hk2 , k = 1, . . . , m .
54
The set C is a compact convex body, and if the coordinate axes are shifted
√
so that the origin lies at the point (a + bh, bl D, 0, . . . , 0)T ∈ Λ (this doesn’t
change the lattice), it is also central symmetric. The volume of the set C is
by definition
ZZ
···
= πR02
ZZ
V(C) =
ZZ




···

ZZ
√ 2
(v0 −(a+bh))2 +(w0 −bl D ) ≤R02
ZZ
= ...



ZZ
v12 +w12 ≤H12


dv0 dw0 
 dv1 dw1 · · · dvm dwm

dv1 dw1  dv2 dw2 · · · dvm dwm
2 2
= π m+1 H12 · · · Hm
R0
2
= π m+1 H12 · · · Hm
= 22m+2
= 22m+2
√ !m+1
1
22τ D
2
2
π
H1 · · · Hm
τ √ m+1
D
√ !m+1
D
2h
2
= 22m+2 det Λ.
The neat result explains the seemingly random definition of the radius R0 .
Pay also attention to how the volume, like the determinant in (5.7), can be
expressed in terms of h due to the way the numbers h, l and τ are defined.
Now, by the first Minkowski’s convex body theorem, there exists a non-zero
lattice point
√ T
x0 + y0 h, y0 l D, . . . , xm + ym h, ym l D ∈ C ∩ Λ \ {0̄}.
√
The correspondence between the sets CR ∩ λ and DR ∩ ZI implies then the
55
existence of a non-zero integer vector
(β0 , β1 , . . . , βm )T
T
√
√
:= x0 + y0 h + l −D , . . . , xm + ym h + l −D
\ {0̄},
∈ D ∩ Zm+1
I
with |βk | ≤ Hk , k = 1, . . . , m, satisfying the condition
1
|β0 + β1 Θ1 + . . . + βm Θm | ≤
56
2τ D 4
√
π
!m+1
1
.
H1 · · · Hm
Chapter 6
Applying Siegel’s lemma
Let again I stand for an imaginary quadratic field or the field Q of rational
numbers, and denote with ZI its ring of integers.
6.1
Approximations for eαt
In the article [4] the writers study linear forms of exponential values, and the
following theorem is one of the lemmas they need in order to prove a lower
bound for the expression |β0 + β1 eα1 + . . . + βm eαm |, where (β0 , β1 , . . . , βm )T ∈
Zm+1
and α1 , . . . , αm ∈ I. Here it serves as an enlightening example on how
I
Siegel’s lemma (Theorem 6) can be used.
Definition 20. The order of a power series P (x) =
ordx=c P (x) := inf {k ∈ N | ak 6= 0}.
Theorem 18. Let αj =
xj
yj
P∞
k=0
ak (x − c)k is
∈ I \ {0} be m different numbers with xj ∈ ZI \
{0}, yj ∈ Z+ and gcd(xj , yj ) = 1 for j = 1, . . . , m. Let also ν1 , . . . , νm ∈ Z+
and l1 , . . . , lm ∈ Z+ be such that 1 ≤ νj ≤ lj for j = 1, . . . , m, and denote
M := ν1 + . . . + νm ≤ L := l1 + . . . + lm . Write a := max1≤j≤m {|xj | + |yj |}
57
n
and b := max1≤j≤m 1 +
|xj |
|yj |
o
, as well as ᾱ := (α1 , . . . , αm )T ∈ Im and
ν̄ := (ν1 , . . . , νm )T ∈ Zm . Then there exists a non-zero polynomial
P0,0 (t) =
L
X
ch
h=0
L! h
t ∈ ZI [t]
h!
depending on L, with
1
M
2 L+1−M
√ √
L+1−M
M L M2
a b
,
|ch | ≤ max 4 2 D, sI tI
h = 0, 1, . . . , L. (6.1)
Furthermore, there exist non-zero polynomials P0,j (t) ∈ ZI [t, ᾱ], j = 1, . . . , m,
depending on L and ν̄, such that
P0,0 (t)eαj t − P0,j (t) = R0,j (t),
where


degt P0,j (t) ≤ L,
j = 1, . . . , m,
j = 0, 1, . . . , m,

L + νj + 1 ≤ ordR0,j (t) < ∞,
t=0
j = 1, . . . , m.
Proof. [4, p. 11] Let
P0,0 (t)eαj t =
∞
X
rN,j tN ,
j = 1, . . . , m,
N =0
where
rN,j :=
X
h+n=N
0≤h≤L
ch
L! αjn
.
h! n!
Cut this series after L + 1 terms and let
P0,j (t) :=
L
X
rN,j tN ,
N =0
58
j = 1, . . . , m.
Set rL+ij ,j = 0 for ij = 1, . . . , νj , j = 1, . . . , m. Then also
L+ij
(L + ij )! yj
0=
i rL+ij ,j
L!
xjj
L+ij
(L + ij )! yj
=
i
L!
xjj
X
=
h+n=L+ij
0≤h≤L
X
h+n=L+ij
0≤h≤L
ch
L! xnj
h!n! yjn
(L + ij )! n−ij L+ij −n
xj yj
ch
h!n!
(6.2)
L
X
(L + ij )!
xL−h
yjh ch
j
h!(L
+
i
j − h)!
h=0
L X
L + ij L−h h
=
xj yj ch
h
h=0
=
for ij = 1, . . . , νj , j = 1, . . . , m, meaning that we have M equations in L + 1
L−h h
j
unknowns ch , h = 0, 1, . . . , L, with coefficients L+i
xj yj ∈ ZI . These
h
coefficients satisfy
Aj,ij
L X
L + ij L−h h xj yj :=
h
h=0
L 1 X L + ij
|xj |L+ij −h |yj |h
=
|xj |ij h=0
h
L+ij 1 X L + ij
|xj |L+ij −h |yj |h
<
|xj |ij h=0
h
1
(|xj | + |yj |)L+ij
i
j
|xj |
i
|yj | j
L
= (|xj | + |yj |) 1 +
,
|xj |
=
when ij = 1, . . . , νj , j = 1, . . . , m. (Recall that in order to apply Theorem 6,
59
the product of the row sums (2.6) is needed.) Then
Y
Aj,ij <
j,ij
Y
j,ij
=
Y
j
Y
(|xj | + |yj |)
L
(|xj | + |yj |)
νj L
≤ aM L b
M2
2
j,ij
|yj |
1+
|xj |
Y
j
i j
|yj |
1+
|xj |
νj (ν2j +1)
.
The last step follows by employing the definition M = ν1 + . . . + νm , which
P
Pm 2 Pm
2
implies m
ν
(ν
+1)
=
j
j
j=1
j=1 νj +
j=1 νj ≤ M . Thus by Theorem 6 there
\ {0̄} to the group of M equations
exists a solution (c0 , c1 , . . . , cL )T ∈ ZL+1
I
derived in (6.2) with
1
M
2 L+1−M
√ √
L+1−M
M L M2
|ch | ≤ max 4 2 D, sI tI
a b
,
Writing
R0,j (t) :=
∞
X
h = 0, 1, . . . , L.
rN,j tN
N =L+νj +1
we get
P0,0 (t)eαj t − P0,j (t) = R0,j (t),
j = 1, . . . , m.
Here P0,j (t) are non-zero polynomials for all j = 0, 1, . . . , m, since the solution
(c0 , c1 , . . . , cL )T is a non-zero vector. In addition


degt P0,j (t) ≤ L, j = 0, 1, . . . , m,

L + νj + 1 ≤ ordR0,j (t) < ∞,
t=0
as the series R0,j (t) is non-zero as well.
60
j = 1, . . . , m,
6.2
Siegel’s lemma refined
Suppose now that I = Q and αj ∈ Z \ {0} (xj ∈ Z \ {0}, yj = 1) , j =
1, . . . , m, m ≥ 2, are m different rational integers. Then the equations
L X
L + ij
h=0
h
yjh ch = 0,
xL−h
j
ij = 1, . . . , νj , j = 1, . . . , m,
derived in (6.2) can be written in matrix form:

L
L−1

L+1
L+1
L+1
L+1
x
x1
···
x1
1
L−1
L
 0 1

 L+2 L
L+2
L+2
L+2 
L−1

 0 x1
x1
···
x1
1
L−1
L


..
.. 
..
..

...

.
. 
.
.

L−1

 L+ν1 L
L+ν1
L+ν1 
L+ν1

 0 x1
x1
···
x1
1
L
L−1



.
.
..
..
...
..
.. 


.
.


L+νm
L+νm
L+νm
L+νm
L
L−1
xm
xm
···
xm
0
L
1
L−1
M ×(L+1)

c0





 c1 


 .. 
 .  = 0̄.




cL−1 


cL
(6.3)
Denote with B the above matrix of coefficients.
The interest in this matrix representation is explained when looking at
the following improved version of Siegel’s lemma, proved by Bombieri and
Vaaler in 1983:
Theorem 19. Let A := [amn ] ∈ MM ×N (Z) with rank A = M < N . Then
the equation Ax̄ = 0̄ has N − M linearly independent integer solutions
x̄1 , . . . , x̄N −M ∈ ZN \ {0̄}
satisfying
NY
−M
k=1
kx̄k k∞ ≤
p
(6.4)
det(AAT )
,
D
where D is the greatest common divisor of all M × M minors of the matrix
A.
61
Proof. The proof, which is beyond the scope of this work, can be found in
the article [2].
Write

āT
 1


 .. 
A = [amn ] =  .  ,
 
āTM
where āTm := (am1 , . . . , amN ) denotes the mth row of A. The Gram determinant det(AAT )
1
satisfies
q
p
det(AAT ) = det [āl · ām ]1≤l,m≤M
v
uM
uY
≤t
ām · ām
m=1
=
≤
≤
≤
M
Y
m=1
M
Y
m=1
M
Y
m=1
N
kām k
kām k1
N max |amn |
1≤n≤N
max |amn |
1≤m,n≤N
M
.
For the solutions (6.4) we can write kx̄1 k∞ ≤ kx̄2 k∞ ≤ . . . ≤ kx̄N −M k∞ .
1
The Gram determinant is always non-negative, and since rank A = M , in this case it
is strictly positive. We have also the so-called Hadamard inequality
det [āl · ām ]1≤l,m≤M ≤
62
M
Y
m=1
ām · ām .
Then
kx̄1 k∞ ≤
NY
−M
k=1
kx̄k k∞
1
! N −M
1
! N −M
p
det(AAT )
≤
D
NM
−M
1
≤
N max |amn |
.
1
1≤m,n≤N
D N −M
In view of Remark 1, this means that in the rational case the estimate (6.1)
could possibly be sharpened if the number D for the matrix B in (6.3) happened to be some other than one.
Example. Take m = M = L = 2 in (6.3), whereupon
 
  2
3
3
3
2
x
x
x
3x
3
1
1
1
1
1
2 
,
B =  0
=
3
3
3
2
2
x2 1 x2 2
x2 3x2 3
0
and its 2 × 2 minors are
2
x1 3x1 = 3x21 x2 − 3x1 x22 = 3x1 x2 (x1 − x2 ),
2
x2 3x2 3x1 3
= 9x1 − 9x2 = 9(x1 − x2 ),
3x2 3
2 x1 3 2
2
2 = 3x1 − 3x2 = 3(x1 + x2 )(x1 − x2 ).
x2 3 The greatest common divisor of these appears to be D = 3|x1 − x2 | > 1.
Let now
0 ≤ k1 < k 2 < . . . < k M ≤ L
63
(6.5)
and consider an arbitrary M × M minor
L−k2
L+1 L−k1
L+1
k1 x 1
x1
k2
L+2 L−k1
L+2
2
k1 x 1
xL−k
1
k2
..
..
.
.
L+ν1 L−k1
L+ν1
2
k1 x 1
xL−k
1
k2
L−k2
L+1 L−k1
L+1
k x2
x2
k2
1
..
..
.
.
L+νm L−k1 L+νm L−k2
k xm
xm
k2
1
···
···
...
···
···
...
···
L−kM L+2
x
1
kM
..
.
L−kM L+ν1
.
x1
kM
L−kM L+1
x2
kM
..
.
L+νm
L−kM xm
k
L+1
kM
M
xL−k
1
(6.6)
M
Interchange then the second and the (ν1 + 1)th row, so that the first two rows
of the minor are


L+1
k1
L+1
k1
1
xL−k
1
1
xL−k
2
L+1
k2
L+1
k2
2
xL−k
1
···
2
···
xL−k
2
L+1
kM
L+1
kM
M
xL−k
1
M
xL−k
2

.
(6.7)
(This of course only affects the sign of the minor.) Expanding this permutated
determinant with the Laplace formula, each time with respect to the bottom
row, means that eventually we are calculating 2 × 2 minors of the first two
rows (6.7). Let 1 ≤ r < s ≤ M . Then 0 ≤ kr < ks ≤ L by (6.5) and
L+1 L−kr L+1 L−ks kr x 1
x1 ks
L+1 L−kr L+1 L−ks k x2
x2 ks
r L + 1 L + 1 L−kr L−ks
L + 1 L + 1 L−ks L−kr
=
−
x1 x2
x1 x2
kr
ks
kr
ks
L + 1 L + 1 L−ks L−ks ks −kr
=
x1 x2
− xk2s −kr .
x1
kr
ks
Since (x1 − x2 ) divides x1ks −kr − xk2s −kr , it is a common factor of all the
2 × 2 minors of the first two rows. Therefore (x1 − x2 ) is a factor of the minor
(6.6), implying D ≥ |x1 − x2 |.
Even more: Because any two rows of the determinant (6.6) can be changed
to the top before expanding, actually we have |xi − xj |D for all i, j =
64
1, . . . , m, i 6= j. Notably,
D ≥ max |xi − xj |.
1≤i,j≤m
This raises the question of whether one could get the whole product of
these differences out of the minor (6.6). Because the determinant essentially
consists of decreasing powers of some integers, we shall next explore something similar.
6.3
Vandermonde determinants
The Vandermonde determinant formula, about to be proved in Lemma 10,
is well worth studying as it so closely resembles our case. A property of
Vandermonde matrices is needed first, though.
Lemma 9. Let a0 , a1 , . . . , an ∈ C, n ∈ Z+ , be n + 1 distinct numbers. Then
the Vandermonde matrix

is non-singular.
1 a0


h i  1 a1
aji = 
 .. ..
. .

1 an
···

an0


· · · an1 

.. 
..
. .

· · · ann
h i
Proof. Denote A := aji . Following the idea of Example 3 in [6, p. 84], we
show that the only solution to the equation Ac̄ = 0̄ is c̄ = 0̄ ∈ Cn+1 . Let
c̄ = (c0 , c1 , . . . , cn )T ∈ Cn+1 and q(x) = c0 + c1 x + . . . + cn xn ∈ C[x]. The
equation Ac̄ = 0̄ implies precisely that q(a0 ) = q(a1 ) = . . . = q(an ) = 0; in
other words, the polynomial q(x) has n + 1 distinct zeros. But deg q ≤ n, so
it must be that q(x) = 0(x). Then c̄ = 0̄, which means that det A 6= 0.
65
Lemma 10 (Vandermonde determinant). Let a0 , a1 , . . . , an ∈ C, n ∈ Z+ .
Then
1 a0
h i 1 a1
det aji = . .
0≤i,j≤n
.. ..
1 an
···
an0 Y
· · · an1 =
(aj − ai ).
.
..
. .. 0≤i<j≤n
· · · ann Proof. [7, p. 5] [14, p. 44] If ai = aj for some i 6= j, it follows that
h i
det aji = 0
0≤i,j≤n
and the claim holds. Suppose then that the numbers a0 , a1 , . . . , an are dish i
tinct. Denote det0≤i,j≤n aji =: ∆(a0 , a1 , . . . , an−1 , an ). Then the expression
∆(a0 , a1 , . . . , an−1 , x) is a polynomial in x of degree n with the leading coef-
ficient
1 a0
1 a1
..
..
.
.
1 an−1
···
a0n−1 · · · a1n−1 .. = ∆(a0 , a1 , . . . , an−1 ) 6= 0.
..
.
. n−1 · · · an−1
This polynomial has the distinct zeros a0 , a1 , . . . , an−1 ∈ C, since inserting
x = ai for any i = 0, 1, . . . , n − 1 results in two same rows in the determinant
∆(a0 , . . . , an−1 , x). Hence the polynomial ∆(a0 , a1 , . . . , an−1 , x) factorizes as
∆(a0 , a1 , . . . , an−1 , x) = ∆(a0 , a1 , . . . , an−1 ) · (x − an−1 ) · · · (x − a1 )(x − a0 ).
So it follows inductively (keeping Lemma 9 in mind)
∆(a0 , a1 , . . . , an−1 , an )
= ∆(a0 , a1 , . . . , an−1 )(an − an−1 ) · · · (an − a1 )(an − a0 )
= ...
Y
(aj − ai ).
=
0≤i<j≤n
66
Now rearrange the minor (6.6) in such a way that the first m rows are

L−k1 L+1 L−k2
L−kM 
L+1
L+1
x
·
·
·
x
x1
1
k2
kM
 k1 1

 L+1 L−k1 L+1 L−k2
L−kM 
L+1
 k1 x 2

· · · kM x 2
x2
k2


.
(6.8)


..
..
.
..
..
.


.
.

L−k
L−k
L−k 
L+1
L+1
L+1
1
2
xm
· · · kM x m M
xm
k1
k2
m×M
Let 1 ≤ q1 < q2 < . . . < qm ≤ M , whereupon 0 ≤ kq1 < kq2 < . . . < kqm ≤ L
by (6.5). Examine an arbitrary m × m minor of the first m rows in (6.8) (by
definition, M = ν1 + . . . + νm ≥ m):
L−kqm L+1 L−kq1 L+1 L−kq2
L+1
· · · kq x 1
x1
kq x 1
kq 2
m
1
L+1 L−kq1 L+1 L−kq2
L−kqm L+1
kq x 2
· · · kq x 2
x2
kq 2
m
1
..
..
..
..
.
.
.
.
L−kqm L+1 L−kq1 L+1 L−kq2
L+1
k xm
· · · kq x m
xm
kq 2
q
m
m×m
1
L−kq1
L−kq2
L−kqm x1
· · · x1
x1
L−kq1
L−k
L−k m x2
x 2 q2 · · · x 2 qm Y
L
+
1
= .
..
.. ..
.
kqi
.
.
.
. i=1
L−kq1
L−k
L−k xm
x m q2 · · · x m qm {z
}
|
=∆
m Y
i=1
(6.9)
=:∆
L+1
.
kqi
Denote x := (x1 , x2 , . . . , xm ). Then the determinant ∆ = ∆(x) ∈ Z[x] is an
alternating polynomial in m variables x1 , . . . , xm , for it changes sign when
interchanging any two of them.
Lemma 11. Every alternating polynomial A(x) ∈ Q[x], x = (x1 , x2 , . . . , xm ),
is a product of some symmetric polynomial
2
2
and the Vandermonde polyno-
A symmetric polynomial in the variables x1 , x2 , . . . , xm stays invariant under permu-
tation of these variables.
67
mial
V(x) :=
Y
1≤i<j≤m
(xj − xi ).
Proof. [12, p. 28] By definition,
A(x1 , . . . , xi , . . . , xj , . . . , xm ) = −A(x1 , . . . , xj , . . . , xi , . . . , xm ),
(6.10)
when i, j ∈ {1, . . . , m}, i 6= j,. Let c · axsi xtj be any term of the polynomial
A(x). Here c ∈ Q, s, t ∈ N and a denotes the product of the powers of the
other variables xk , k 6= i, j. Without loss of generality we may assume that
s ≥ t. From (6.10) it follows that
2A(x1 , . . . , xi , . . . , xj , . . . , xm )
= A(x1 , . . . , xi , . . . , xj , . . . , xm ) − A(x1 , . . . , xj , . . . , xi , . . . , xm )
= . . . + c · axsi xtj − c · axti xsj + . . .
= . . . + c · axti xtj xis−t − xjs−t + . . .
Now (xi − xj ) xis−t − xjs−t . Since the chosen term c · axsi xtj was an arbitrary
one, we have (xi − xj )|2A(x) and so (xi − xj )|A(x) in the ring Q[x]. Further-
more, V (x)|A(x) as the factors of the Vandermonde polynomial are coprime
in the ring Q[x], which is a UFD. The quotient
A(x)
V (x)
∈ Q[x] is symmetric since
interchanging any two variables changes the signs of both the numerator and
the denominator.
Thus ∆(x) = V(x)S(x) for some symmetric polynomial S(x) ∈ Z[x]. One
can argue inductively that the polynomial S(x) really has integer coefficients:
First write ∆(x) = (xm − xm−1 )S1 (x). Suppose not all the coefficients of the
polynomial
S1 (x) =
X
i≥0
n
(i)
(i)
ci · x1 1 · · · xnmm ∈ Q[x],
68
(i)
(i)
n1 , . . . , n m
∈ N,
are integers, and let c · axkm be a (not necessarily unique) term of S1 (x) for
which c ∈ Q \ Z and the power of xm is the highest possible. Here k ∈ N and
a denotes the product of the powers of the other variables x1 , . . . , xm−1 . Now
the product xm S1 (x) contains the term c · axk+1
m , but the product xm−1 S1 (x)
can only contain an integer multiple of axk+1
m , for otherwise there would be a
contradiction with the choice of the term c · axkm . As a result, the polynomial
∆(x) = xm S1 (x) − xm−1 S1 (x) ∈ Z[x]
contains a term c0 · axkm , where c0 ∈ Q \ Z. This is a contradiction, implying
S1 (x) ∈ Z[x].
Next, S1 (x) = (xm − xm−2 )S2 (x), where again S2 (x) ∈ Z[x] by repeating
the reasoning above. Continuing this way we have
∆(x) = (xm − xm−1 )S1 (x)
= (xm − xm−1 )(xm − xm−2 )S2 (x)
= ...
= (xm − xm−1 )(xm − xm−2 ) . . . (x3 − x2 )(x3 − x1 )S (m−1)m−2 (x)
2
= V(x)S(x),
where S (m−1)m−2 (x) = (x2 − x1 )S(x) ∈ Z[x], and so S(x) ∈ Z[x].
3
2
Returning to the equation (6.9) we see that the product
Y
1≤i<j≤m
(xj − xi )
is a factor of all the m × m minors of the first m rows in (6.8), which means
that it divides the minor (6.6). (As seen before, this is a consequence of the
Laplace expansion formula.) Since the chosen minor was an arbitrary one,
3
Actually S(x) =
∆(x)
V(x)
is a so-called Schur polynomial. More on these in Chapter 2 of
Procesi’s book [12].
69
the greatest common divisor D of all the M × M minors of the matrix B in
(6.3) indeed satisfies
D≥
Y
1≤i<j≤m
70
|xj − xi |.
Bibliography
[1] Berger, Marcel: Geometry Revealed, Springer, Berlin, 2010.
[2] Bombieri, E.; Vaaler, J.: On Siegel’s Lemma, Invent. math., vol 73, pp.
11 - 32, 1983.
[3] Cassels, J. W. S.: An Introduction to the Geometry of Numbers, SpringerVerlag, Berlin, 1997.
[4] Ernvall-Hytönen, Anne-Maria; Leppälä, Kalle; Matala-aho, Tapani: An
explicit Baker type lower bound of exponential values, Proc. Roy. Soc.
Edinburgh Sect. A. (In press.) http://cc.oulu.fi/~tma/JULKAISUJA.
html
[5] Evertse, Jan-Hendrik: Diophantine approximation, lecture notes, Leiden University, 2015. http://www.math.leidenuniv.nl/~evertse/
dio.shtml
[6] Johnson, Lee W.; Riess, R. Dean; Arnold, Jimmy T.: Introduction to
Linear Algebra, Addison-Wesley, Reading, 1993.
[7] Krattenthaler,
C.:
Advanced
determinant
calculus,
Séminaire
Lotharingien Combin. 42 (1999) (”The Andrews Festschrift”), Article B42q, 67 pp. math.CO/9902004. http://www.mat.univie.ac.
at/~kratt/papers.html
71
[8] Mahler, Kurt: Lectures on Transcendental Numbers, Springer-Verlag,
Berlin, 1976.
[9] Matala-aho, Tapani: A geometric face of Diophantine analysis, lecture
notes given at Summer School for Master and PhD students in Diophantine Analysis, Würzburg 2014. http://cc.oulu.fi/~tma/JULKAISUJA.
html
[10] Matala-aho, Tapani: Algebralliset luvut, lecture notes, University of
Oulu, 2014. http://cc.oulu.fi/~tma/ALGEBRALLISET.html
[11] Matala-aho, Tapani: On Baker type lower bounds for linear forms. (Submitted.) http://cc.oulu.fi/~tma/JULKAISUJA.html
[12] Procesi, Claudio: Lie Groups: An Approach through Invariants and Representations, Springer, New York, 2007.
[13] Steuding, Jörn: Diophantine Analysis, Chapman & Hall / CRC, Boca
Raton, 2005.
[14] Stewart, I. N.; Tall, D. O.: Algebraic Number Theory, Chapman & Hall
/ CRC, London, 1978.
72