
Lecture notes
Fundamental inequalities: techniques and applications
Manh Hong Duong
Mathematics Institute, University of Warwick
Email: [email protected]
January 4, 2017
Abstract

Inequalities are ubiquitous in mathematics (and in real life). For example, in optimization theory (particularly in linear programming) inequalities are used to describe constraints. In analysis, inequalities are frequently used to derive a priori estimates, to control errors and to obtain the order of convergence, just to name a few uses. Of particular importance are the Cauchy-Schwarz inequality, Jensen's inequality for convex functions and Fenchel's inequality for duality. These inequalities are simple and flexible enough to be applicable in various settings such as linear algebra, convex analysis and probability theory.

The aim of this mini-course is to introduce undergraduate students to these inequalities together with useful techniques and some applications. In the first section, through a variety of selected problems, students will become familiar with many frequently used techniques. The second section discusses applications to matrix inequalities/analysis, estimating integrals and relative entropy. No advanced mathematics is required.

This course is taught for Warwick's team for the International Mathematics Competition for University Students (24th IMC 2017). The competition is intended for students completing their first, second, third or fourth year of university education.
Contents

1 Fundamental inequalities: Cauchy-Schwarz inequality, Jensen's inequality for convex functions and Fenchel's dual inequality
  1.1 Cauchy-Schwarz inequality
  1.2 Convex functions
  1.3 Jensen's inequality
  1.4 Convex conjugate and Fenchel's inequality
  1.5 Some techniques to prove inequalities

2 Applications to matrix inequalities, estimating integrals and relative entropy
  2.1 Minkowski's inequality for determinants
  2.2 Hadamard's Inequality
  2.3 Hermite-Hadamard inequality
  2.4 Relative entropy
1 Fundamental inequalities: Cauchy-Schwarz inequality, Jensen's inequality for convex functions and Fenchel's dual inequality
In this section we review three basic inequalities: the Cauchy-Schwarz inequality, Jensen's inequality for convex functions and Fenchel's inequality for duality. For simplicity of presentation, we only consider the simplest underlying spaces, such as R^n or a finite set. These inequalities, however, can be stated in much more general settings. Many techniques for proving inequalities are presented via selected examples and exercises in Section 1.5.
1.1 Cauchy-Schwarz inequality
Let u, v be two vectors of an inner product space (over R, for simplicity). The Cauchy-Schwarz inequality states that

|⟨u, v⟩| ≤ ‖u‖ ‖v‖.

Proof. If v = 0, the inequality is obviously true. Let v ≠ 0. For any λ ∈ R, we have

0 ≤ ‖u + λv‖² = ‖u‖² + 2λ⟨u, v⟩ + λ²‖v‖².

Consider the right-hand side as a quadratic function of λ: since it is non-negative for every λ ∈ R, its discriminant must be non-positive, i.e., 4⟨u, v⟩² − 4‖u‖²‖v‖² ≤ 0. Therefore we have

⟨u, v⟩² ≤ ‖u‖² ‖v‖²,

which implies the Cauchy-Schwarz inequality.
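As a quick numerical sanity check, one may verify the inequality on random vectors; the following minimal Python sketch (assuming NumPy is available) does this:

    import numpy as np

    rng = np.random.default_rng(0)
    u, v = rng.normal(size=10), rng.normal(size=10)
    # |<u, v>| should never exceed ||u|| ||v||
    assert abs(u @ v) <= np.linalg.norm(u) * np.linalg.norm(v) + 1e-12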
1.2 Convex functions
Definition 1.1 (Convex function). Let X ⊂ R^n be a convex set.

• A function f : X → R is called convex if for all x1, x2 ∈ X and t ∈ [0, 1],

f(t x1 + (1 − t) x2) ≤ t f(x1) + (1 − t) f(x2).

• A function f : X → R is called strictly convex if for all x1, x2 ∈ X with x1 ≠ x2 and all t ∈ (0, 1),

f(t x1 + (1 − t) x2) < t f(x1) + (1 − t) f(x2).
Example 1.1. Examples of convex functions:

• affine functions: f(x) = a · x + b, for a ∈ R^n and b ∈ R,

• exponential functions: f(x) = e^{ax} for any a ∈ R,

• the Euclidean norm: f(x) = ‖x‖ = (Σ_{i=1}^n xi²)^{1/2}.
Verifying convexity of a differentiable function. Note that in Definition 1.1 the function f is not required to be differentiable. The following criteria can be used to verify the convexity of f when it is differentiable.

(1) If f : X → R is differentiable, then it is convex if and only if

f(x) ≥ f(y) + ∇f(y) · (x − y) for all x, y ∈ X.

(2) If f : X → R is twice differentiable, then it is convex if and only if its Hessian ∇²f(x) is positive semi-definite for all x ∈ X.
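As an illustration of criterion (1), here is a minimal numerical sketch in Python (assuming NumPy) for the convex function f(x) = ‖x‖², whose gradient is ∇f(y) = 2y:

    import numpy as np

    rng = np.random.default_rng(1)
    f = lambda z: z @ z          # f(x) = ||x||^2 is convex
    grad = lambda z: 2 * z       # its gradient
    x, y = rng.normal(size=5), rng.normal(size=5)
    # first-order condition: f(x) >= f(y) + grad f(y) . (x - y)
    assert f(x) >= f(y) + grad(y) @ (x - y) - 1e-12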
Some important properties of convex functions
Lemma 1.2. Below are some important properties of convex functions.
(1) If f and g are convex functions, then so are m(x) = max{f (x), g(x)} and s(x) =
f (x) + g(x).
(2) If f and g are convex functions and g is non-decreasing, then h(x) = g(f (x)) is convex.
Proof. These properties can be directly proved by verifying the definition.
1.3 Jensen's inequality
Theorem 1.3 (Jensen's inequality). Let f be a convex function and let 0 ≤ αi ≤ 1, i = 1, . . . , n, be such that Σ_{i=1}^n αi = 1. Then for all x1, . . . , xn we have

f(Σ_{i=1}^n αi xi) ≤ Σ_{i=1}^n αi f(xi).    (1)
Proof. We prove the statement by induction. The cases n = 1, 2 are obvious. Suppose that the statement is true for n = 1, . . . , K − 1, and let α1, . . . , αK be non-negative numbers such that Σ_{i=1}^K αi = 1. We need to prove that

f(Σ_{i=1}^K αi xi) ≤ Σ_{i=1}^K αi f(xi).

Since Σ_{i=1}^K αi = 1 and K ≥ 2, at least one of the coefficients must be strictly less than 1; we may assume that α1 < 1, so that 1 − α1 > 0. Then, using the convexity of f (the case n = 2) and the induction hypothesis applied to the K − 1 weights αi/(1 − α1), i = 2, . . . , K, we obtain

f(Σ_{i=1}^K αi xi) = f(α1 x1 + (1 − α1) Σ_{i=2}^K (αi/(1 − α1)) xi)
≤ α1 f(x1) + (1 − α1) f(Σ_{i=2}^K (αi/(1 − α1)) xi)
≤ α1 f(x1) + (1 − α1) Σ_{i=2}^K (αi/(1 − α1)) f(xi)
= Σ_{i=1}^K αi f(xi),

where we have used the fact that Σ_{i=2}^K αi/(1 − α1) = 1.
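A quick numerical illustration of (1), as a sketch in Python (assuming NumPy), with the convex function f(x) = e^x and random weights:

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.normal(size=6)
    a = rng.random(6); a /= a.sum()          # weights alpha_i >= 0 summing to 1
    # f(sum alpha_i x_i) <= sum alpha_i f(x_i) for f(x) = e^x
    assert np.exp(a @ x) <= (a * np.exp(x)).sum() + 1e-12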
Remark 1.4 (Jensen's inequality, probabilistic form). Jensen's inequality can also be stated in a probabilistic form. Let (Ω, A, µ) be a probability space. If g is a real-valued function that is µ-integrable and if f is a convex function on the real line, then

f(∫_Ω g dµ) ≤ ∫_Ω f ◦ g dµ.
Example 1.2 (Examples of Jensen's inequality).

1) For all real numbers x1, . . . , xn, it holds that

(Σ_{i=1}^n xi)² ≤ n Σ_{i=1}^n xi².

Proof. Since f(x) = x² is convex, we have

((1/n) Σ_{i=1}^n xi)² = f((1/n) Σ_{i=1}^n xi) ≤ (1/n) Σ_{i=1}^n f(xi) = (1/n) Σ_{i=1}^n xi²,

and multiplying both sides by n² gives the desired statement.
2) Arithmetic-geometric mean (AM-GM) inequality. Let (xi)_{1≤i≤n} and (λi)_{1≤i≤n} be real numbers satisfying

xi ≥ 0,  λi ≥ 0,  Σ_{i=1}^n λi = 1.

Then, with the convention 0^0 = 1,

Σ_{i=1}^n λi xi ≥ Π_{i=1}^n xi^{λi}.    (2)

In particular, taking λ1 = . . . = λn = 1/n yields

Σ_{i=1}^n xi ≥ n (x1 · · · xn)^{1/n}.

Proof. The inequality is trivial if some xi = 0 with λi > 0, and factors with λi = 0 equal 1, so we may assume that all xi > 0. Taking logarithms of both sides, (2) is equivalent to

Σ_{i=1}^n λi ln(xi) ≤ ln(Σ_{i=1}^n λi xi).

This is exactly Jensen's inequality applied to the convex function f(x) = − ln(x).
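The weighted AM-GM inequality (2) can also be tested numerically; a minimal Python sketch (assuming NumPy):

    import numpy as np

    rng = np.random.default_rng(3)
    x = 10 * rng.random(5)                   # x_i >= 0
    lam = rng.random(5); lam /= lam.sum()    # lambda_i >= 0, summing to 1
    # weighted arithmetic mean dominates weighted geometric mean
    assert lam @ x >= np.prod(x ** lam) - 1e-12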
1.4 Convex conjugate and Fenchel's inequality
The convex conjugate of a function f : R^d → R is the function f∗ : R^d → R ∪ {+∞} defined by

f∗(y) = sup_{x∈R^d} {x · y − f(x)}.

Fenchel's inequality: for all x, y ∈ R^d we have

f(x) + f∗(y) ≥ x · y,

which follows immediately from the definition of f∗.
Example 1.3 (Examples of Fenchel's inequality).

1) f(x) = |x|²/2. Then f∗(y) = sup_{x∈R^d} {x · y − |x|²/2} = |y|²/2, and Fenchel's inequality reads

(1/2)(|x|² + |y|²) ≥ x · y.

2) f(x) = (1/p)|x|^p with p > 1. Then f∗(y) = sup_{x∈R^d} {x · y − (1/p)|x|^p} = (1/q)|y|^q, where 1/p + 1/q = 1. Fenchel's inequality becomes: for x, y ∈ R^d and p, q > 1 such that 1/p + 1/q = 1, we have

(1/p)|x|^p + (1/q)|y|^q ≥ x · y.
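A quick numerical check of example 2) above, as a Python sketch (assuming NumPy; the exponent p = 3 is an arbitrary choice):

    import numpy as np

    rng = np.random.default_rng(4)
    x, y = rng.normal(size=3), rng.normal(size=3)
    p = 3.0; q = p / (p - 1)                 # conjugate exponents: 1/p + 1/q = 1
    nx, ny = np.linalg.norm(x), np.linalg.norm(y)
    # Fenchel's inequality for f = |.|^p / p:  |x|^p/p + |y|^q/q >= x . y
    assert nx**p / p + ny**q / q >= x @ y - 1e-12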
1.5 Some techniques to prove inequalities

In practice, the three inequalities introduced in the previous sections often do not appear in their standard forms, and it is crucial to be able to recognize them. In this section, through exercises, we will learn some techniques to prove inequalities.
Exercise 1. Let ai, bi ∈ R with bi > 0 for i = 1, . . . , n. Prove that

Σ_{i=1}^n ai²/bi ≥ (Σ_{i=1}^n ai)² / Σ_{i=1}^n bi.

Proof. By the Cauchy-Schwarz inequality we have

(Σ_{i=1}^n ai)² = (Σ_{i=1}^n (ai/√bi) · √bi)² ≤ (Σ_{i=1}^n ai²/bi) · (Σ_{i=1}^n bi),

and dividing both sides by Σ_{i=1}^n bi gives the claim.
Exercise 2 (Problem 6, IMC 2015). Prove that

Σ_{n=1}^∞ 1/(√n (n + 1)) < 2.

Proof. By the AM-GM inequality we have

2(n + 1) − 1 = n + (n + 1) > 2√(n(n + 1)),

which implies that

2(n + 1) − 2√(n(n + 1)) > 1.

Dividing both sides by √n (n + 1) yields

1/(√n (n + 1)) < 2 (1/√n − 1/√(n + 1)).

Hence, summing over n, we obtain

Σ_{n=1}^∞ 1/(√n (n + 1)) < 2 Σ_{n=1}^∞ (1/√n − 1/√(n + 1)) = 2,

since the last series telescopes.
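Numerically, the partial sums indeed stay below 2; a small Python sketch:

    import math

    # partial sums of sum_{n>=1} 1/(sqrt(n)*(n+1)) stay strictly below 2
    s = sum(1.0 / (math.sqrt(n) * (n + 1)) for n in range(1, 10**6))
    print(s)      # roughly 1.86
    assert s < 2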
Exercise 3. Let A, B, C be the three angles of a triangle. Prove that

sin A + sin B + sin C ≤ 3√3/2.

Proof. Consider the function f(x) = sin x on [0, π]. Since f''(x) = − sin x ≤ 0 on [0, π], f is concave there. Therefore, by Jensen's inequality (applied to the concave function f with equal weights 1/3),

(sin A + sin B + sin C)/3 ≤ sin((A + B + C)/3) = sin(π/3) = √3/2,

which gives the desired inequality.
Exercise 4 (Problem 3, IMC 2016). Let n be a positive integer and let a1, . . . , an and b1, . . . , bn be real numbers such that ai + bi > 0 for i = 1, . . . , n. Prove that

Σ_{i=1}^n (ai bi − bi²)/(ai + bi) ≤ [Σ_{i=1}^n ai · Σ_{i=1}^n bi − (Σ_{i=1}^n bi)²] / Σ_{i=1}^n (ai + bi).    (3)

Proof. We notice the similar form of both sides of (3). For A, B ∈ R with A + B > 0 we have

(AB − B²)/(A + B) = B − 2B²/(A + B).    (4)

Applying (4) with A = ai, B = bi, we get

LHS = Σ_{i=1}^n (ai bi − bi²)/(ai + bi) = Σ_{i=1}^n (bi − 2bi²/(ai + bi)) = Σ_{i=1}^n bi − 2 Σ_{i=1}^n bi²/(ai + bi).

Similarly, applying (4) with A = Σ_{i=1}^n ai and B = Σ_{i=1}^n bi, we obtain

RHS = Σ_{i=1}^n bi − 2 (Σ_{i=1}^n bi)² / Σ_{i=1}^n (ai + bi).

Therefore (3) is equivalent to

(Σ_{i=1}^n bi)² / Σ_{i=1}^n (ai + bi) ≤ Σ_{i=1}^n bi²/(ai + bi).    (5)

By the Cauchy-Schwarz inequality we have

(Σ_{i=1}^n bi)² = (Σ_{i=1}^n (bi/√(ai + bi)) · √(ai + bi))² ≤ (Σ_{i=1}^n bi²/(ai + bi)) · (Σ_{i=1}^n (ai + bi)),

which implies (5), as desired.
Exercise 5 (Problem 1, IMC 2010). Let 0 < a < b. Prove that

∫_a^b (x² + 1) e^{−x²} dx ≥ e^{−a²} − e^{−b²}.

Proof. By the AM-GM inequality, x² + 1 ≥ 2x > 0 for any 0 < a ≤ x ≤ b. Hence

∫_a^b (x² + 1) e^{−x²} dx ≥ ∫_a^b 2x e^{−x²} dx = [−e^{−x²}]_a^b = e^{−a²} − e^{−b²}.
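A numerical sanity check of this estimate, as a Python sketch (assuming NumPy; a = 0.5 and b = 2 are arbitrary choices, and the integral is approximated by the trapezoidal rule):

    import numpy as np

    a, b = 0.5, 2.0                          # any 0 < a < b
    x = np.linspace(a, b, 100001)
    y = (x**2 + 1) * np.exp(-x**2)
    integral = np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))   # trapezoidal rule
    assert integral >= np.exp(-a**2) - np.exp(-b**2)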
Exercise 6 (Problem 6, IMC 2001). Let n be a positive integer and let fn(x) = sin x · sin(2x) · · · sin(2^n x). Prove that

|fn(x)| ≤ (2/√3) |fn(π/3)|.

Proof. Let g(x) = |sin x| · |sin(2x)|^{1/2}. Since |sin(2x)| = 2 |sin x| |cos x|, we have

g(x) = √2 |sin x|^{3/2} |cos x|^{1/2} = (√2/3^{1/4}) (sin²x · sin²x · sin²x · 3cos²x)^{1/4}.

By the AM-GM inequality,

(sin²x · sin²x · sin²x · 3cos²x)^{1/4} ≤ (sin²x + sin²x + sin²x + 3cos²x)/4 = (3sin²x + 3cos²x)/4 = 3/4,

hence

g(x) ≤ (√2/3^{1/4}) · (3/4) = (√3/2)^{3/2} = g(π/3)   for all x.

Now factor |fn(x)| = |sin x| · |sin(2x)| · · · |sin(2^n x)| as

|fn(x)| = (|sin x| |sin(2x)|^{1/2})^{c_0} · (|sin(2x)| |sin(4x)|^{1/2})^{c_1} · · · (|sin(2^{n−1}x)| |sin(2^n x)|^{1/2})^{c_{n−1}} · |sin(2^n x)|^{1 − c_{n−1}/2}
= g(x)^{c_0} g(2x)^{c_1} · · · g(2^{n−1}x)^{c_{n−1}} · |sin(2^n x)|^{1 − c_{n−1}/2},

where the exponents are determined by c_0 = 1 and c_k = 1 − c_{k−1}/2, so that c_k = 2/3 + (1/3)(−1/2)^k ∈ [1/2, 1] for all k. In particular K := c_{n−1} = (2/3)[1 − (−1/2)^n], and the exponent of the last factor is 1 − K/2 ∈ [0, 1]. Since |sin(2^n x)| ≤ 1 and g(2^k x) ≤ g(π/3) = g(2^k π/3) for every k (note that |sin(2^k π/3)| = √3/2 for all k ≥ 0), we obtain, using the same factorization at x = π/3,

|fn(x)| ≤ g(π/3)^{c_0} g(2π/3)^{c_1} · · · g(2^{n−1}π/3)^{K} = |fn(π/3)| / |sin(2^n π/3)|^{1 − K/2} = |fn(π/3)| (2/√3)^{1 − K/2} ≤ (2/√3) |fn(π/3)|,

since 0 ≤ 1 − K/2 ≤ 1 and 2/√3 > 1. This is the desired inequality.
Exercise 7 (IMO 1995). Let a, b, c be positive real numbers such that abc = 1. Prove that

1/(a³(b + c)) + 1/(b³(c + a)) + 1/(c³(a + b)) ≥ 3/2.

Proof. Let x = 1/a, y = 1/b, z = 1/c. Then x, y, z are positive real numbers and xyz = 1. Using abc = 1 we have

1/(a³(b + c)) = x³/(1/y + 1/z) = x³yz/(y + z) = x²/(y + z),

and similarly

1/(b³(c + a)) = y²/(z + x),   1/(c³(a + b)) = z²/(x + y).

By the Cauchy-Schwarz inequality (see also Exercise 1) and the AM-GM inequality we have

x²/(y + z) + y²/(z + x) + z²/(x + y) ≥ (x + y + z)²/(2(x + y + z)) = (x + y + z)/2 ≥ (3/2)(xyz)^{1/3} = 3/2.
Exercise 8 (Problem 1, The 26th Annual Vojtech Jarnik International Competition 2016). Let a, b, c be positive real numbers such that a + b + c = 1. Prove that

(1/a + 1/(bc)) (1/b + 1/(ca)) (1/c + 1/(ab)) ≥ 1728.

Proof. By the AM-GM inequality we have

1/a + 1/(bc) = 1/a + 1/(3bc) + 1/(3bc) + 1/(3bc) ≥ 4/(27 a b³c³)^{1/4},

and similarly for the other two factors. Moreover,

(1/3)³ = ((a + b + c)/3)³ ≥ abc,   i.e.,   1/(abc) ≥ 27.

Therefore,

(1/a + 1/(bc)) (1/b + 1/(ca)) (1/c + 1/(ab)) ≥ 64/(27³ a^7 b^7 c^7)^{1/4} = (64/27^{3/4}) (1/(abc))^{7/4} ≥ (64/27^{3/4}) · 27^{7/4} = 64 × 27 = 1728.
Exercise 9. Let x ∈ R and y > 0. Prove that

e^x + y(ln y − 1) ≥ x · y.

Proof. Let f(x) = e^x. Then for y > 0 we have f∗(y) = sup_{x∈R} {x · y − e^x} = y(ln y − 1) (the supremum is attained at x = ln y). By Fenchel's inequality, f(x) + f∗(y) = e^x + y(ln y − 1) ≥ x · y, as desired.
Exercise 10 (IMO 2001). Let a, b, c be positive real numbers. Prove that

a/√(a² + 8bc) + b/√(b² + 8ca) + c/√(c² + 8ab) ≥ 1.

Proof. Since the expression on the left-hand side does not change when we replace (a, b, c) by (ka, kb, kc) for arbitrary k > 0, we may assume that a + b + c = 1. Since x ↦ 1/√x is convex for x > 0, applying Jensen's inequality with weights a, b, c we obtain

a/√(a² + 8bc) + b/√(b² + 8ca) + c/√(c² + 8ab) ≥ 1/√(a(a² + 8bc) + b(b² + 8ca) + c(c² + 8ab)) = 1/√(a³ + b³ + c³ + 24abc).    (6)

Next we show that

a³ + b³ + c³ + 24abc ≤ 1 = (a + b + c)³ = a³ + b³ + c³ + 3(a²b + a²c + b²c + b²a + c²a + c²b) + 6abc,    (7)

which is equivalent to showing that

a²b + a²c + b²c + b²a + c²a + c²b ≥ 6abc.

This is indeed true by the AM-GM inequality:

a²b + a²c + b²c + b²a + c²a + c²b ≥ 6 (a²b · a²c · b²c · b²a · c²a · c²b)^{1/6} = 6abc.

The desired inequality follows from (6) and (7).
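A quick random test of the inequality, as a Python sketch (assuming NumPy):

    import numpy as np

    rng = np.random.default_rng(5)
    for _ in range(1000):
        a, b, c = rng.random(3) + 1e-6       # positive reals
        lhs = (a / np.sqrt(a**2 + 8*b*c)
               + b / np.sqrt(b**2 + 8*c*a)
               + c / np.sqrt(c**2 + 8*a*b))
        assert lhs >= 1 - 1e-12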
2 Applications to matrix inequalities, estimating integrals and relative entropy

This section discusses several applications of the inequalities introduced in the previous sections: matrix inequalities in Sections 2.1 and 2.2, integral estimates in Section 2.3, and relative entropy in Section 2.4.
2.1 Minkowski's inequality for determinants
Lemma 2.1 ([Vil03, Lemma 5.23]). Let A and B be two non-negative symmetric n × n matrices and let λ ∈ [0, 1]. Assume that A is invertible. Then

det(λA + (1 − λ)B)^{1/n} ≥ λ(det A)^{1/n} + (1 − λ)(det B)^{1/n},    (8)

and

det(λA + (1 − λ)B) ≥ (det A)^λ (det B)^{1−λ}.    (9)
Proof. We first prove (8). Since det(λA) = λ^n det A, to prove (8) it is sufficient to prove that

det(A + B)^{1/n} ≥ (det A)^{1/n} + (det B)^{1/n}.    (10)

This inequality is known as Minkowski's inequality. Since A + B = A^{1/2}(I + A^{−1/2}BA^{−1/2})A^{1/2} = A^{1/2}(I + C)A^{1/2}, where C = A^{−1/2}BA^{−1/2} is a non-negative symmetric matrix, and det(MN) = (det M)(det N), we have

det(A + B)^{1/n} = (det A)^{1/n} det(I + C)^{1/n},   (det A)^{1/n} + (det B)^{1/n} = (det A)^{1/n} [1 + (det C)^{1/n}].

We next show that

det(I + C)^{1/n} ≥ 1 + (det C)^{1/n}   for all C non-negative and symmetric,

which will imply (10). Since C is non-negative symmetric, it has non-negative eigenvalues λ1, . . . , λn and

det(I + C)^{1/n} = Π_{i=1}^n (1 + λi)^{1/n}   and   (det C)^{1/n} = Π_{i=1}^n λi^{1/n}.

We need to prove that

Π_{i=1}^n (1 + λi)^{1/n} ≥ 1 + Π_{i=1}^n λi^{1/n}.

This inequality indeed holds true since, using the AM-GM inequality of Example 1.2, we have

[1 + Π_{i=1}^n λi^{1/n}] / Π_{i=1}^n (1 + λi)^{1/n} = Π_{i=1}^n (1/(1 + λi))^{1/n} + Π_{i=1}^n (λi/(1 + λi))^{1/n} ≤ (1/n) Σ_{i=1}^n 1/(1 + λi) + (1/n) Σ_{i=1}^n λi/(1 + λi) = 1.

We now prove (9). From (8) and the AM-GM inequality we have

det(λA + (1 − λ)B)^{1/n} ≥ λ(det A)^{1/n} + (1 − λ)(det B)^{1/n} ≥ (det A)^{λ/n} (det B)^{(1−λ)/n},

which implies

det(λA + (1 − λ)B) ≥ (det A)^λ (det B)^{1−λ},

that is (9), as expected.
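Inequality (8) can be checked on random matrices; a minimal Python sketch (assuming NumPy; A is built to be strictly positive definite so that it is invertible, and λ = 0.3 is arbitrary):

    import numpy as np

    rng = np.random.default_rng(6)
    n, lam = 4, 0.3
    X, Y = rng.normal(size=(n, n)), rng.normal(size=(n, n))
    A = X @ X.T + np.eye(n)                  # symmetric positive definite
    B = Y @ Y.T                              # symmetric positive semi-definite
    lhs = np.linalg.det(lam * A + (1 - lam) * B) ** (1 / n)
    rhs = lam * np.linalg.det(A) ** (1 / n) + (1 - lam) * np.linalg.det(B) ** (1 / n)
    assert lhs >= rhs - 1e-9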
2.2 Hadamard's Inequality
Theorem 2.2 (Hadamard's inequality). Let A be an n × n real matrix with columns {ai}_{i=1}^n. Then

|det A| ≤ Π_{i=1}^n ‖ai‖,    (11)

where ‖ · ‖ is the Euclidean norm.
Proof. The inequality is obviously true if A is singular. Therefore, assume that the columns of A are linearly independent. By dividing each column by its length, the inequality reduces to the special case where each column has length 1. Suppose then that {bi}_{i=1}^n are unit column vectors and that B is the matrix with the bi as columns. We need to show that

|det B| ≤ 1.

Indeed, let C = B^T B. Then C is non-negative symmetric and its diagonal entries are all 1, so tr C = n. Let λ1, . . . , λn ≥ 0 be the eigenvalues of C. By the AM-GM inequality we have

(det B)² = det C = Π_{i=1}^n λi ≤ ((1/n) Σ_{i=1}^n λi)^n = ((tr C)/n)^n = 1,

i.e., |det B| ≤ 1, as required.
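Hadamard's inequality (11) is easy to test numerically; a Python sketch (assuming NumPy):

    import numpy as np

    rng = np.random.default_rng(7)
    A = rng.normal(size=(5, 5))
    # |det A| is at most the product of the Euclidean lengths of the columns
    assert abs(np.linalg.det(A)) <= np.prod(np.linalg.norm(A, axis=0)) + 1e-9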
Corollary 2.3. Let A be an n × n positive definite matrix. Then

det A ≤ Π_{i=1}^n Aii.    (12)

Proof. Since A is positive definite, there exists an invertible matrix B such that A = B^T B. Let bi be the columns of B; then ‖bi‖² = bi · bi = Aii. By Hadamard's inequality (11) we have

det A = (det B)² ≤ Π_{i=1}^n ‖bi‖² = Π_{i=1}^n Aii.    (13)
Proposition 2.4 ([Dan01, Theorem 2.8]). Let A and B be n × n positive definite matrices. Then

n (det A · det B)^{m/n} ≤ tr(A^m B^m)    (14)

for any positive integer m.

Proof. Let λ1, . . . , λn > 0 be the eigenvalues of A and write A = P Λ P^T, where P is an orthogonal matrix and Λ = diag(λ1, . . . , λn). Let b11(m), . . . , bnn(m) denote the diagonal elements of (P^T B P)^m. Since the trace is invariant under cyclic permutations, we have

(1/n) tr(A^m B^m) = (1/n) tr(P Λ^m P^T B^m) = (1/n) tr(Λ^m P^T B^m P) = (1/n) tr(Λ^m (P^T B P)^m) = (1/n) Σ_{i=1}^n λi^m bii(m).

Using this identity and the AM-GM inequality (note that (P^T B P)^m is positive definite, so bii(m) > 0), we have

(1/n) tr(A^m B^m) ≥ Π_{i=1}^n (λi^m)^{1/n} · Π_{i=1}^n (bii(m))^{1/n}.    (15)

On the other hand, applying (12) to the positive definite matrix (P^T B P)^m, we have

(det A · det B)^{m/n} = (det Λ^m · det((P^T B P)^m))^{1/n} ≤ Π_{i=1}^n λi^{m/n} · Π_{i=1}^n (bii(m))^{1/n}.

Together with (15) we obtain the claimed inequality (14).
2.3 Hermite-Hadamard inequality
Theorem 2.5 (Hermite-Hadamard inequality). Let f : [a, b] → R be a convex function. Then

f((a + b)/2) ≤ (1/(b − a)) ∫_a^b f(x) dx ≤ (f(a) + f(b))/2.    (16)

Proof. Using the change of variable x = (1 − λ)a + λb we get

(1/(b − a)) ∫_a^b f(x) dx = ∫_0^1 f((1 − λ)a + λb) dλ.

Similarly, with x = λa + (1 − λ)b, we also get

(1/(b − a)) ∫_a^b f(x) dx = ∫_0^1 f(λa + (1 − λ)b) dλ.

So we have

(1/(b − a)) ∫_a^b f(x) dx = (1/2) [∫_0^1 f((1 − λ)a + λb) dλ + ∫_0^1 f(λa + (1 − λ)b) dλ].    (17)

Since f is convex on [a, b], for every λ ∈ [0, 1] we have

f((a + b)/2) = f([λa + (1 − λ)b + (1 − λ)a + λb]/2)
≤ [f(λa + (1 − λ)b) + f((1 − λ)a + λb)]/2
≤ (1/2)[λf(a) + (1 − λ)f(b) + (1 − λ)f(a) + λf(b)] = (f(a) + f(b))/2.

Integrating with respect to λ from 0 to 1 and using (17), we obtain (16).
Remark 2.6. We can apply the Hermite-Hadamard inequality on the two sub-intervals [a, (a + b)/2] and [(a + b)/2, b] to obtain

f((3a + b)/4) ≤ (2/(b − a)) ∫_a^{(a+b)/2} f(x) dx ≤ [f(a) + f((a + b)/2)]/2,

f((a + 3b)/4) ≤ (2/(b − a)) ∫_{(a+b)/2}^b f(x) dx ≤ [f((a + b)/2) + f(b)]/2.

Summing up both sides we obtain

(1/2)[f((3a + b)/4) + f((a + 3b)/4)] ≤ (1/(b − a)) ∫_a^b f(x) dx ≤ (1/4)[f(a) + 2f((a + b)/2) + f(b)].

We can get further improvements by repeatedly decomposing into smaller sub-intervals.
Exercise 11. Let 0 < x < y be real numbers. Prove the following geometric-logarithmic-arithmetic mean inequalities:

√(xy) ≤ (y − x)/(ln(y) − ln(x)) ≤ (x + y)/2.    (18)

Proof. Let f : R → R, f(t) = e^t. Then f is convex. By the Hermite-Hadamard inequality, for arbitrary real numbers a < b we have

e^{(a+b)/2} ≤ (1/(b − a)) ∫_a^b e^t dt = (e^b − e^a)/(b − a) ≤ (e^a + e^b)/2.

Taking a = ln x and b = ln y yields

√(xy) = e^{(ln x + ln y)/2} ≤ (y − x)/(ln y − ln x) ≤ (x + y)/2.
Exercise 12. Let n be a positive integer. Prove that

Σ_{k=1}^{n−1} 2k/(k + 2) ≤ ln(n!) ≤ Σ_{k=1}^{n−1} (1/2)(k + k/(k + 1)).    (19)

Proof. Let f : [0, ∞) → R, f(x) = 1/(1 + x). Then f is convex since f''(x) = 2/(1 + x)³ > 0. Applying the Hermite-Hadamard inequality to f with a = 0 and b = k, where k is a positive integer, we get

1/(1 + k/2) ≤ (1/k) ∫_0^k 1/(1 + x) dx = (1/k) ln(k + 1) ≤ (1/2)(1 + 1/(1 + k)),

i.e.,

2k/(2 + k) ≤ ln(k + 1) ≤ (1/2)(k + k/(1 + k)).

Summing over k from 1 to n − 1 we obtain

Σ_{k=1}^{n−1} 2k/(2 + k) ≤ Σ_{k=1}^{n−1} ln(k + 1) = ln(n!) ≤ Σ_{k=1}^{n−1} (1/2)(k + k/(1 + k)),

as claimed.
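The two bounds in (19) can be tested numerically; a Python sketch (n = 20 is an arbitrary choice):

    import math

    n = 20
    lower = sum(2 * k / (k + 2) for k in range(1, n))
    upper = sum(0.5 * (k + k / (k + 1)) for k in range(1, n))
    assert lower <= math.lgamma(n + 1) <= upper       # lgamma(n+1) = ln(n!)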
2.4 Relative entropy
Definition 2.7 (Relative entropy). Let P and Q be two probability measures on a space X. Suppose that P is absolutely continuous with respect to Q. The relative entropy of P with respect to Q is given by

H(P ||Q) = ∫_X (dP/dQ) log(dP/dQ) dQ.

In particular:

1. (Discrete probability measures) If X is a finite set with |X| = n, P = (p1, . . . , pn) and Q = (q1, . . . , qn) with pi, qi ≥ 0 and Σ_{i=1}^n pi = Σ_{i=1}^n qi = 1, then

H(P ||Q) = Σ_{i=1}^n pi log(pi/qi).

2. (Continuous probability measures) Let P(dx) = p(x) dx and Q(dx) = q(x) dx be two probability measures on X = R^d. Then

H(P ||Q) = ∫_{R^d} p(x) log(p(x)/q(x)) dx.
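In the discrete case the relative entropy is straightforward to compute; a minimal Python sketch (assuming NumPy, with the usual convention 0 log 0 = 0 and qi > 0 whenever pi > 0):

    import numpy as np

    def relative_entropy(p, q):
        """H(P||Q) = sum_i p_i log(p_i / q_i), with the convention 0 log 0 = 0."""
        p, q = np.asarray(p, float), np.asarray(q, float)
        mask = p > 0
        return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

    print(relative_entropy([0.5, 0.5], [0.9, 0.1]))   # strictly positive
    print(relative_entropy([0.3, 0.7], [0.3, 0.7]))   # 0.0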
Lemma 2.8 (Log-sum inequality). Let a1, . . . , an and b1, . . . , bn be non-negative real numbers. Prove that

Σ_{i=1}^n ai log(ai/bi) ≥ (Σ_{i=1}^n ai) log(Σ_{i=1}^n ai / Σ_{i=1}^n bi).    (20)

In particular, if Σ_{i=1}^n ai = Σ_{i=1}^n bi = 1, then (compare with Lemma 2.9 below)

Σ_{i=1}^n ai log(ai/bi) ≥ 0.
Proof. Consider the function f(x) = x log x for x > 0, which is convex. Let αi = bi / Σ_{j=1}^n bj and xi = ai/bi. By Jensen's inequality we have

f(Σ_{i=1}^n αi xi) ≤ Σ_{i=1}^n αi f(xi),

i.e.,

(Σ_{i=1}^n αi xi) log(Σ_{i=1}^n αi xi) ≤ Σ_{i=1}^n αi xi log xi.

Substituting the values of αi and xi, we obtain

(Σ_{i=1}^n ai / Σ_{i=1}^n bi) log(Σ_{i=1}^n ai / Σ_{i=1}^n bi) ≤ (Σ_{i=1}^n ai log(ai/bi)) / Σ_{i=1}^n bi,

which is equivalent to the log-sum inequality, as required.
Lemma 2.9 (Positivity of the relative entropy). Prove that H(P ||Q) ≥ 0.

Proof. Since the function ϕ(x) = x log x is convex for x > 0, by Jensen's inequality in the probabilistic form of Remark 1.4 we have

H(P ||Q) = ∫_X ϕ(dP/dQ) dQ ≥ ϕ(∫_X (dP/dQ) dQ) = ϕ(1) = 0.
Lemma 2.10 (Convexity of the relative entropy). Let P1, P2, Q1, Q2 be probability measures on a finite set X. Prove that for any 0 ≤ λ ≤ 1 we have

H(λP1 + (1 − λ)P2 ||λQ1 + (1 − λ)Q2) ≤ λ H(P1 ||Q1) + (1 − λ) H(P2 ||Q2).    (21)

Proof. For each x ∈ X, applying the log-sum inequality with a1 = λp1(x), a2 = (1 − λ)p2(x), b1 = λq1(x), b2 = (1 − λ)q2(x), we have

(λp1(x) + (1 − λ)p2(x)) log[(λp1(x) + (1 − λ)p2(x)) / (λq1(x) + (1 − λ)q2(x))]
≤ λp1(x) log(λp1(x)/(λq1(x))) + (1 − λ)p2(x) log((1 − λ)p2(x)/((1 − λ)q2(x)))
= λp1(x) log(p1(x)/q1(x)) + (1 − λ)p2(x) log(p2(x)/q2(x)).

Summing over x ∈ X we obtain (21).
Lemma 2.11 (Duality formula for the relative entropy). Let P = (p1, . . . , pn) and Q = (q1, . . . , qn) be probability measures on a finite set X = {1, . . . , n}. Prove that

H(P ||Q) = sup_{x=(x1,...,xn)} { Σ_{i=1}^n xi pi − log(Σ_{i=1}^n e^{xi} qi) }.
Proof. Let f(p) := H(P ||Q) = Σ_{i=1}^n pi log(pi/qi), viewed as a function of p on the simplex S := {p = (p1, . . . , pn) : pi ≥ 0, Σ_{i=1}^n pi = 1}. We will compute the convex conjugate f∗ of f:

f∗(x) = sup_p { Σ_{i=1}^n xi pi − Σ_{i=1}^n pi log(pi/qi) },

where the supremum is taken over all p ∈ S. Since S is compact and the function inside the supremum is continuous, the supremum is attained at some p which satisfies

xi − 1 − log(pi/qi) + λ = 0

for some Lagrange multiplier λ ∈ R. Solving this equation, we get pi = qi e^{xi − 1 + λ}. Since Σ_{i=1}^n pi = 1, we must have

1 = Σ_{i=1}^n qi e^{xi − 1 + λ} = e^{λ − 1} Σ_{i=1}^n qi e^{xi},

which gives λ = 1 − log(Σ_{i=1}^n qi e^{xi}). Therefore,

f∗(x) = Σ_{i=1}^n qi e^{xi − 1 + λ} xi − Σ_{i=1}^n qi e^{xi − 1 + λ} (xi − 1 + λ)
= −(λ − 1) Σ_{i=1}^n qi e^{xi − 1 + λ}
= −(λ − 1) = log(Σ_{i=1}^n qi e^{xi}).

Hence, since f is convex and lower semicontinuous it coincides with its biconjugate, and

H(P ||Q) = f(p) = sup_x {x · p − f∗(x)} = sup_{x=(x1,...,xn)} { Σ_{i=1}^n xi pi − log(Σ_{i=1}^n e^{xi} qi) }.
Remark 2.12. The continuous counterpart of the duality formula for the relative entropy is

H(P ||Q) = sup_{f ∈ Cb(X)} { ∫_X f dP − log ∫_X e^f dQ }.
Lemma 2.13 (Pinsker's inequality). Let P, Q be probability measures on a finite set X. Let ‖P − Q‖_1 = Σ_{x∈X} |p(x) − q(x)| be the total variation distance between P and Q. Prove that

H(P ||Q) ≥ (1/2) ‖P − Q‖_1².
Proof. Let A := {x ∈ X : p(x) ≥ q(x)}. It follows that

‖P − Q‖_1 = Σ_{x∈A} (p(x) − q(x)) + Σ_{x∈X−A} (q(x) − p(x)) = 2 Σ_{x∈A} (p(x) − q(x)) = 2(P(A) − Q(A)).

By the log-sum inequality we have

H(P ||Q) = Σ_{x∈X} p(x) log(p(x)/q(x)) = Σ_{x∈A} p(x) log(p(x)/q(x)) + Σ_{x∈X−A} p(x) log(p(x)/q(x))
≥ (Σ_{x∈A} p(x)) log(Σ_{x∈A} p(x) / Σ_{x∈A} q(x)) + (Σ_{x∈X−A} p(x)) log(Σ_{x∈X−A} p(x) / Σ_{x∈X−A} q(x))
= P(A) log(P(A)/Q(A)) + P(X − A) log(P(X − A)/Q(X − A))
≥ 2(P(A) − Q(A))² = (1/2) ‖P − Q‖_1²,

where in the last step we have used the elementary inequality

a log(a/b) + (1 − a) log((1 − a)/(1 − b)) ≥ 2(a − b)²   for 0 ≤ a, b ≤ 1,

applied with a = P(A) and b = Q(A).
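Pinsker's inequality can be tested on random discrete distributions; a Python sketch (assuming NumPy):

    import numpy as np

    rng = np.random.default_rng(8)
    for _ in range(1000):
        p = rng.random(5) + 1e-12; p /= p.sum()
        q = rng.random(5) + 1e-12; q /= q.sum()
        kl = np.sum(p * np.log(p / q))       # H(P||Q)
        tv = np.sum(np.abs(p - q))           # ||P - Q||_1
        assert kl >= 0.5 * tv**2 - 1e-12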
References
[Dan01] F. M. Dannan. Matrix and operator inequalities. Journal of Inequalities in Pure
and Applied Mathematics, Volume 2, Issue 3, Article 34, 2001.
[Vil03] Cédric Villani. Topics in optimal transportation, volume 58 of Graduate Studies in
Mathematics. American Mathematical Society, Providence, RI, 2003.