Journal of Theoretical Probability, Vol. 16, No. 1, January 2003 (© 2003)
An Inequality for Tail Probabilities of Martingales
with Differences Bounded from One Side
V. Bentkus 1
Received May 15, 2001; revised December 6, 2001
Let $M_n = X_1 + \cdots + X_n$ be a martingale with differences $X_k = M_k - M_{k-1}$ bounded from above, so that $P\{X_k \le \varepsilon_k\} = 1$ with some non-random positive $\varepsilon_k$. Let the conditional variance $\tau_k^2 = E(X_k^2 \mid X_1, \ldots, X_{k-1})$ satisfy $\tau_k^2 \le \sigma_k^2$ with probability one, where the $\sigma_k^2$ are non-random numbers. Write $s_k^2 = \max\{\varepsilon_k^2, \sigma_k^2\}$ and $s^2 = s_1^2 + \cdots + s_n^2$. We prove the inequality
$$P\{M_n \ge x\} \le \min\big\{\exp\{-x^2/(2s^2)\},\ c_0(1 - \Phi(x/s))\big\}$$
with the constant $c_0 = 1/(1 - \Phi(\sqrt{3})) \le 25$.
KEY WORDS: Probabilities of large deviations; martingale; bounds for tail
probabilities; inequalities; bounded differences and random variables; measure
concentration phenomena; product spaces; Lipschitz functions; Hoeffding’s
inequalities; Azuma’s inequality.
1. INTRODUCTION AND RESULTS
Our attention to this topic was drawn by the seminal paper of Hoeffding, (4) where the classical inequalities for probabilities of large deviations of sums of bounded independent random variables (as well as of martingales) were obtained. For almost forty years these inequalities were improved only in the papers of Pinelis, (9, 10) Talagrand, (13) and Bentkus. (1, 2) In this paper we extend and improve Theorem 2 of Hoeffding. (4) The result is new already for sums of independent random variables. Most probably our result is, up to an absolute constant, the best possible that can be achieved using normal-like tails as an upper bound.
1 Vilnius Institute of Mathematics and Informatics, Akademijos 4, 232600 Vilnius, Lithuania.
E-mail: [email protected]
Let $\mathcal{F}_0 = \{\emptyset, \Omega\} \subset \mathcal{F}_1 \subset \cdots \subset \mathcal{F}_n \subset \mathcal{F}$ be a family of $\sigma$-algebras of a measurable space $(\Omega, \mathcal{F})$. Let $M_n = X_1 + \cdots + X_n$ be a martingale (we define $M_0 = 0$) with differences $X_j = M_j - M_{j-1}$ bounded from above by some non-random $\varepsilon_j \ge 0$, so that
$$P\{X_j \le \varepsilon_j\} = 1, \qquad \text{for } j = 1, \ldots, n. \tag{1.1}$$
Assume that the conditional variance $\tau_j^2 = E(X_j^2 \mid \mathcal{F}_{j-1})$ is bounded from above, that is, that $P\{\tau_j^2 \le \sigma_j^2\} = 1$ for some non-random $\sigma_j^2$. Write
$$s_j^2 = \max\{\sigma_j^2, \varepsilon_j^2\}, \qquad s^2 = s_1^2 + \cdots + s_n^2.$$
Let $I(x) = 1 - \Phi(x) = \int_x^\infty \varphi(t)\,dt$ be the survival function of the standard normal distribution with density $\varphi(t) = (2\pi)^{-1/2}\exp\{-t^2/2\}$. Introduce $D(x) = 1$ for $x \le 0$, and
$$D(x) = \min\{\exp\{-x^2/2\},\ cI(x)\}, \qquad \text{for } x \ge 0,$$
where $c$ is an absolute constant. Our result is the following upper bound for the tail probabilities of $M_n$.

Theorem 1.1. Let $2 \le c \le c_0$ with $c_0 = 1/I(\sqrt{3})$. Then, for $x \in \mathbb{R}$, we have
$$P\{M_n \ge x\} \le D(x/s). \tag{1.2}$$
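For a quick numerical impression of the theorem (an illustrative sketch, not part of the paper; Python with SciPy is assumed, and the helper names are ours), one can evaluate $c_0$ and the majorant $D$:

```python
# Evaluate c0 = 1/(1 - Phi(sqrt(3))) and the bound D(x/s) of Theorem 1.1.
from math import exp, sqrt
from scipy.stats import norm

c0 = 1.0 / norm.sf(sqrt(3))          # norm.sf(x) = 1 - Phi(x) = I(x); c0 ~ 24.0 <= 25

def D(x, c=c0):
    """The majorant: D(x) = 1 for x <= 0, min(exp(-x^2/2), c*I(x)) for x >= 0."""
    if x <= 0:
        return 1.0
    return min(exp(-x * x / 2), c * norm.sf(x))

print(c0)        # ~24.0
print(D(2.0))    # bound on P{M_n >= 2s}; here the exponential branch is smaller
```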
Theorem 1.1 complements the inequalities (1.5) below. For the differences $X_j$ consider the condition that $X_j$ have non-random sizes $2\varepsilon_j$, namely
$$P\{a_j - \varepsilon_j \le X_j \le a_j + \varepsilon_j\} = 1, \qquad \text{for } j = 1, \ldots, n, \tag{1.3}$$
where the $a_j$ are some random variables measurable with respect to the $\sigma$-algebra $\mathcal{F}_{j-1}$. Since $M_j = X_1 + \cdots + X_j$, condition (1.3) is equivalent to
$$P\{b_j - \varepsilon_j \le M_j \le b_j + \varepsilon_j\} = 1, \qquad \text{for } j = 1, \ldots, n, \tag{1.4}$$
where the $b_j$ are random variables measurable with respect to the $\sigma$-algebra $\mathcal{F}_{j-1}$.

Assume that one of the conditions (1.3) or (1.4) is fulfilled. Then (see Bentkus (2)), for $x \in \mathbb{R}$, we have
$$P\{M_n \ge x\} \le D(x/\varepsilon), \qquad P\{M_n > x\} \ge 1 - D(-x/\varepsilon), \tag{1.5}$$
where $\varepsilon^2 = \varepsilon_1^2 + \cdots + \varepsilon_n^2$, and where $D$ is defined with $c = 435$.
The upper bound $c \le c_0 \le 25$ for $c$ in (1.2) is not optimal and can be improved. Extending the methods of Bentkus, (1) one can show that
$$\sup_n\ \sup_{M_n}\ P\{M_n \ge x\} \ge 2I(x/\varepsilon),$$
where the supremum is taken over all $n$ and over all Bernoulli type martingales $M_n$ such that $\varepsilon_1 = \cdots = \varepsilon_n$ and $P\{|X_j| \le \varepsilon_j\} = 1$ (we call a martingale of Bernoulli type if the differences conditionally have two-point distributions). Hence, the constant $c$ in (1.2) has to satisfy $c \ge 2$.
In the very special case of independent identically distributed Rademacher random variables $X_1, \ldots, X_n$ of size $2/\sqrt{n}$, such that $P\{X_1 = -1/\sqrt{n}\} = P\{X_1 = 1/\sqrt{n}\} = 1/2$, we obviously have $s^2 = 1$ and
$$\sup_n\ P\{M_n \ge x\} \ge I(x),$$
which, for larger $x$, differs from (1.2) only by the factor $2 \le c \le c_0$. Thus, Theorem 1.1 shows that martingale type dependence does not much influence the bounds for tail probabilities compared to the independent and i.i.d. cases.
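This is easy to observe empirically. A small Monte Carlo sketch (illustrative only, not from the paper; NumPy and SciPy are assumed) encodes the Rademacher walk via binomial counts:

```python
# Empirical tail of M_n = X_1 + ... + X_n with X_k = ±1/sqrt(n), compared to I(x).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, trials, x = 400, 200_000, 2.0
heads = rng.binomial(n, 0.5, size=trials)   # number of +1/sqrt(n) steps per path
M_n = (2 * heads - n) / np.sqrt(n)          # s^2 = 1 for this martingale

print(np.mean(M_n >= x))   # near (due to lattice effects slightly above) I(2)
print(norm.sf(x))          # I(2) ~ 0.0228
```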
Theorem 1.1 and (1.5) improve and extend an inequality of Hoeffding, (4) which is proved for martingales satisfying condition (1.3) with non-random $a_j$. In our notation his inequality (Theorem 2 in Hoeffding (4)) reads as
$$P\{M_n \ge x\} \le \exp\{-x^2/(2\varepsilon^2)\}, \qquad x \ge 0. \tag{1.6}$$
In the case of condition (1.3) with random $a_j$, the inequality (1.6) is contained in McDiarmid (6) as Theorem 6.7. Therefore, the new component in the bounds (1.2) and (1.5) is the inequality $P\{M_n \ge x\} \le cI(x/s)$. For larger $x \ge c_1 s$, the bounds (1.2) and (1.5) are better than (1.6), since $I(x) \le c_2 x^{-1}\exp\{-x^2/2\}$ for $x > 0$ (here the $c_j$ are absolute constants). Hence, the improvement is just the factor $s/x$. Theorem 1.1 and (1.5) extend a result of Pinelis, (10) who proved $P\{M_n \ge x\} \le 4.48\,I(x/\varepsilon)$ for martingales under the assumption that $a_j = 0$ for all $j$. The constant in Pinelis' inequality is better than our constants; however, the values of $c$ in (1.2) and (1.5) can be considerably improved. Compared to the bound of Pinelis, we get rid of the symmetric boundedness assumption $a_j = 0$, which is very important in applications to measure concentration, and instead of two-sided boundedness of the differences we require only boundedness from above.
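For $t = x/s$ large one has $I(t) \sim \varphi(t)/t$, so the ratio $c_0 I(t)/\exp\{-t^2/2\}$ behaves like $c_0/(t\sqrt{2\pi})$ and drops below one for $t \gtrsim c_0/\sqrt{2\pi} \approx 9.6$; the minimum in $D$ guarantees that (1.2) is never worse than (1.6). A quick tabulation (illustrative; SciPy assumed):

```python
# Hoeffding's bound exp(-t^2/2) versus the Gaussian-tail bound c0*I(t), t = x/s.
from math import exp, sqrt, pi
from scipy.stats import norm

c0 = 1.0 / norm.sf(sqrt(3))
for t in (2.0, 5.0, 10.0, 20.0):
    hoeffding = exp(-t * t / 2)
    gaussian = c0 * norm.sf(t)
    print(t, hoeffding, gaussian, gaussian / hoeffding, c0 / (t * sqrt(2 * pi)))
```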
Inequalities of the type (1.6) (and hence (1.2) and (1.5) as well) have extensive applications in combinatorics, operations research, computer science, and random graphs (see McDiarmid (6)), and in the theory of Banach spaces (see Milman and Schechtman (7)). The result applies to measure concentration for separately Lipschitz functions on product spaces (see Bentkus (2) for applications of (1.5)), as well as to some non-linear statistics. In these models the implied bounds are the sharpest among the known ones. The space for improvements is restricted (cf. the discussion above). For statistical applications, optimal bounds for finite (that is, fixed) $n$ are of interest (see Bentkus and van Zuijlen (3)). In this sense our result is not optimal and can be improved by extending the methods of Bentkus. (1)
The history of inequalities for tail probabilities is very rich (see, for example, the books of Petrov (8) and Shorack and Wellner (11)). The names of Chernoff, Bennett, Prokhorov, Hoeffding, and others come to mind. Our methods are different from those of Hoeffding, (4) Pinelis, (9) and Talagrand. (13) The proof of Theorem 1.1 is based on induction on $n$ and multiple applications of Chebyshev's inequality. By Chebyshev's inequality we understand the implication $\int f \le \int g$ whenever $f \le g$, and in this paper we always apply it with a quadratic function $g$. Our methods are well designed for applications where we have martingale type dependence.
For independent random variables $X_1, \ldots, X_n$ satisfying (1.3) with $a_j = 0$ and $\varepsilon_j = 1$ for all $j$, a bound $P\{M_n \ge x\} \le B_n(x)$ with a function $B_n(x)$ essentially smaller than $cI(x/\varepsilon)$, $\varepsilon = \sqrt{n}$, was obtained earlier in Bentkus. (1) One can show that the bound $B_n(x)$ is sharp on martingales, for integer $x$ and all $n$. Heuristically, the basic ideas and methods are already contained in that paper.
2. THE PROOF

By the definition of $c_0$, we have $c_0 I(\sqrt{3}) = 1$. Hence $c_0 I(x) \ge 1$ for $x \le \sqrt{3}$, and the function $c_0 I$ is strictly decreasing.
Below we prove the following bounds.
Theorem 2.1. For $x \in \mathbb{R}$, we have
$$P\{M_n \ge x\} \le c_0 I(x/s).$$

Theorem 2.2. For $x \ge 0$, we have
$$P\{M_n \ge x\} \le \exp\{-x^2/(2s^2)\}.$$

Proof of Theorem 1.1. It suffices to combine the bounds of Theorems 2.1 and 2.2. ∎
Proof of Theorem 2.1. We apply induction on $n$. Without loss of generality we assume that $\varepsilon_j > 0$ for all $j = 1, \ldots, n$. Rescaling if necessary, we can assume as well that $\varepsilon_1^2 = 1$. Throughout we write
$$s^2 = r^2 + \bar{s}^2, \qquad \text{where } r^2 = s_1^2 \text{ and } \bar{s}^2 = s_2^2 + \cdots + s_n^2.$$
Due to the assumption $\varepsilon_1 = 1$, we have $P\{X_1 \le 1\} = 1$ and $s^2 \ge r^2 = s_1^2 \ge 1$.

We have to prove that
$$P\{M_n \ge x\} \le c_0 I(x/s), \qquad \text{for } x \ge s\sqrt{3}. \tag{2.1}$$
Indeed, for $x \le s\sqrt{3}$ the trivial bound $P\{M_n \ge x\} \le 1$ yields (2.1), since for such $x$ we have $x/s \le \sqrt{3}$ and therefore $c_0 I(x/s) \ge c_0 I(\sqrt{3}) = 1$ by the definition of $c_0$.
The Case $n = 1$. In essence, in this case there is nothing to prove. Notice that now $M_1 = X_1$ and $P\{X_1 \ge x\} \le \mathbb{I}\{x \le 1\}$, since $X_1$ is bounded from above by $1$; here $\mathbb{I}\{A\}$ is the indicator function of an event $A$. The definition of $c_0$ and the bound $s \ge 1$ show that $\mathbb{I}\{x \le 1\} \le c_0 I(x/s)$ (indeed, for $x \le 1$ we have $x/s \le 1 < \sqrt{3}$, whence $c_0 I(x/s) \ge 1$), which concludes the proof for $n = 1$.
The Case $n > 1$. By the induction assumption we have
$$P\{Z_{n-1} \ge x\} \le c_0 I(x/b), \qquad \text{for all } x \in \mathbb{R}, \tag{2.2}$$
for any martingale sequence $Z_0 = 0, Z_1, \ldots, Z_{n-1}$ such that
$$P\{Z_k - Z_{k-1} \le a_k\} = 1, \qquad P\{E((Z_k - Z_{k-1})^2 \mid \mathcal{F}_{k-1}) \le \gamma_k^2\} = 1$$
with some non-random $a_k$ and $\gamma_k^2$, where
$$b^2 = \max\{a_1^2, \gamma_1^2\} + \cdots + \max\{a_{n-1}^2, \gamma_{n-1}^2\}.$$
Notice that $\bar{s} > 0$. Using (2.2) and conditioning on $X_1$, we have
$$P\{M_n \ge x\} = E\,P\{X_2 + \cdots + X_n \ge x - X_1 \mid X_1\} \le E\,c_0 I\Big(\frac{x - X_1}{\bar{s}}\Big),$$
since, for given $X_1$, the sequence
$$Z_0 = 0, \quad Z_1 = X_2, \quad \ldots, \quad Z_{n-1} = X_2 + \cdots + X_n$$
is a martingale sequence with differences such that
$$P\{Z_k - Z_{k-1} \le \varepsilon_{k+1}\} = 1, \qquad P\{E((Z_k - Z_{k-1})^2 \mid \mathcal{F}_k) \le \sigma_{k+1}^2\} = 1.$$
To simplify notation, write
$$f(t) = c_0 I\Big(\frac{x - t}{\bar{s}}\Big), \qquad \text{so that} \qquad E\,c_0 I\Big(\frac{x - X_1}{\bar{s}}\Big) = Ef(X_1).$$
Notice that $f$ depends on $x$ and other parameters, which is not reflected in the notation. We have to prove that $Ef(X_1) \le c_0 I(x/s)$.
Let us note that

(i) the function $t \mapsto f''(t)$ is positive and strictly increasing for $t \le 1$.

Indeed, introducing the variable $z = (x - t)/\bar{s}$, so that $z \ge (x - 1)/\bar{s}$ for $t \le 1$, the assertion (i) is equivalent to

(ii) the function $z \mapsto I''(z)$ is positive and strictly decreasing for $z \ge (x - 1)/\bar{s}$.

We have $I''(z) = z\varphi(z)$. The function $z \mapsto z\varphi(z)$ is positive and strictly decreasing for $z \ge 1$. Hence, to prove (i) it suffices to verify that $(x - 1)/\bar{s} \ge 1$, or, equivalently, that $x^2 \ge \bar{s}^2 + 2\bar{s} + 1$. Using $r \ge 1$ and $2r\bar{s} \le r^2 + \bar{s}^2$, we have
$$1 + 2\bar{s} + \bar{s}^2 \le r^2 + 2r\bar{s} + \bar{s}^2 \le 2s^2 \le x^2,$$
since by our assumption $x^2 \ge 3s^2$. This proves (i).

Due to (i), we can apply the following Lemma 2.3 (we give its proof later).
Lemma 2.3. Let $f : (-\infty, 1] \to [0, \infty)$ be a function such that the second derivative $t \mapsto f''(t)$ is a positive strictly increasing function of $t \le 1$. Let $z < 1$. Then the quadratic polynomial
$$P(t) = at^2 + bt + c,$$
where
$$a = (z - 1)^{-2}\big((z - 1)f'(z) - f(z) + f(1)\big),$$
$$b = (z - 1)^{-2}\big((1 - z^2)f'(z) + 2zf(z) - 2zf(1)\big),$$
$$c = (z - 1)^{-2}\big((z^2 - z)f'(z) + (1 - 2z)f(z) + z^2 f(1)\big),$$
satisfies $P(t) \ge f(t)$ for all $t \le 1$.
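This quadratic majorant is the version of Chebyshev's inequality mentioned in the introduction. A numerical check (illustrative only; NumPy assumed, and the choices $f = \exp$, $z = -2$ are ours):

```python
# Verify P(t) >= f(t) on (-inf, 1] for f = exp, whose f'' is positive and increasing.
import numpy as np

def majorant_coeffs(f, fprime, z):
    d = (z - 1.0) ** (-2)
    a = d * ((z - 1) * fprime(z) - f(z) + f(1.0))
    b = d * ((1 - z * z) * fprime(z) + 2 * z * f(z) - 2 * z * f(1.0))
    c = d * ((z * z - z) * fprime(z) + (1 - 2 * z) * f(z) + z * z * f(1.0))
    return a, b, c

a, b, c = majorant_coeffs(np.exp, np.exp, z=-2.0)
t = np.linspace(-10.0, 1.0, 10_001)
assert np.all(a * t * t + b * t + c >= np.exp(t) - 1e-12)
```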
Writing $z = -r^2$, the quadratic polynomial $P(t) = at^2 + bt + c$ from Lemma 2.3 satisfies
$$f(t) \le P(t), \qquad \text{for all } t \le 1. \tag{2.3}$$
Using (2.3), $EX_1 = 0$ and $EX_1^2 \le r^2$, a small elementary calculation shows that
$$Ef(X_1) \le EP(X_1) = a\,EX_1^2 + c \le ar^2 + c = c_0(1 - p)\,I\Big(\frac{x + r^2}{\bar{s}}\Big) + c_0\,p\,I\Big(\frac{x - 1}{\bar{s}}\Big) \tag{2.4}$$
with
$$1 - p = \frac{1}{1 + r^2} \qquad \text{and} \qquad p = \frac{r^2}{1 + r^2};$$
here we used that the leading coefficient $a$ of $P$ is positive, and that $ar^2 + c = pP(1) + (1 - p)P(-r^2)$, since the two-point law with mass $p$ at $1$ and mass $1 - p$ at $-r^2$ has mean zero and second moment $r^2$.
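The moments of this two-point law can be checked symbolically (an illustration; SymPy assumed):

```python
# The law with mass p = r^2/(1+r^2) at 1 and mass 1-p at -r^2: mean 0, variance r^2.
import sympy as sp

r = sp.symbols('r', positive=True)
p = r**2 / (1 + r**2)
mean = p * 1 + (1 - p) * (-r**2)
second_moment = p * 1 + (1 - p) * r**4
assert sp.simplify(mean) == 0
assert sp.simplify(second_moment - r**2) == 0
```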
The inequality (2.4) reduces the proof of (2.1) to checking that
$$\Delta \overset{\text{def}}{=} (1 - p)\,I\Big(\frac{x + r^2}{\bar{s}}\Big) + p\,I\Big(\frac{x - 1}{\bar{s}}\Big) - I\Big(\frac{x}{s}\Big) \le 0. \tag{2.5}$$
Introduce a variable, say $y$, such that $0 \le y \le r/s$. Write $h = \sqrt{1 - y^2}$ and consider the function
$$w(y) \overset{\text{def}}{=} (1 - p)\,I\Big(\frac{x + rsy}{sh}\Big) + p\,I\Big(\frac{x - sy/r}{sh}\Big) - I\Big(\frac{x}{s}\Big).$$
It is clear that $w(0) = 0$ and $w(r/s) = \Delta$. To simplify the notation, write $u = x/s$. The condition $x^2 \ge 3s^2$ is equivalent to $u^2 \ge 3$. Then
$$w(y) = (1 - p)\,I\Big(\frac{u + ry}{h}\Big) + p\,I\Big(\frac{u - y/r}{h}\Big) - I(u). \tag{2.6}$$
Let us prove (2.5). Using (2.6), we have
$$w'(y) = -(1 - p)\,\varphi\Big(\frac{u + ry}{h}\Big)\frac{uy + r}{h^3} - p\,\varphi\Big(\frac{u - y/r}{h}\Big)\frac{uy - 1/r}{h^3}. \tag{2.7}$$
A bit later we prove that $w'(y) \le 0$ for $0 \le y \le r/s$. Therefore $w(y)$ is a decreasing function of $y \ge 0$, and to prove (2.5) (that is, to show that $w(r/s) = \Delta \le 0$) it suffices to check that $w(0) = 0$, which holds by the definition of $w$. The proof of (2.5) is completed.
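A numerical spot check of this monotonicity (illustrative; SciPy assumed, and the parameter choices, subject to $u \ge \sqrt{3}$, $r \ge 1$ and $r < s$, are ours):

```python
# Check that w(0) = 0 and that w decreases on [0, r/s].
import numpy as np
from scipy.stats import norm

def w(y, u, r):
    p = r**2 / (1 + r**2)
    h = np.sqrt(1 - y**2)
    return ((1 - p) * norm.sf((u + r * y) / h)
            + p * norm.sf((u - y / r) / h)
            - norm.sf(u))

for u, r, s in [(2.0, 1.0, 1.5), (3.0, 2.0, 2.5)]:
    y = np.linspace(0.0, r / s, 1001)
    vals = w(y, u, r)
    assert abs(vals[0]) < 1e-12 and np.all(np.diff(vals) <= 1e-12)
```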
Let us prove that $w'(y) \le 0$. Using $\varphi(t) = (2\pi)^{-1/2}\exp\{-t^2/2\}$ and (2.7), the inequality $w'(y) \le 0$ is equivalent to
$$(uy + r)\exp\{E\} + r^2(uy - 1/r) \ge 0, \tag{2.8}$$
where
$$E = -\frac{uy}{h^2}\Big(\frac{1}{r} + r\Big) + \frac{y^2}{2h^2}\Big(\frac{1}{r^2} - r^2\Big).$$
If $y = 0$ then (2.8) is just the equality $0 = 0$. If $uy - 1/r \ge 0$ then (2.8) is obviously fulfilled. Hence, it suffices to prove (2.8) for $y$ such that $0 < y < 1/(ru)$. Write
$$C = uy + r, \qquad D = r - uyr^2.$$
Then (2.8) is equivalent to the inequality
$$\frac{C}{D}\exp\{E\} \ge 1 \tag{2.9}$$
and therefore to the inequality
$$v \overset{\text{def}}{=} \ln C - \ln D + E \ge 0. \tag{2.10}$$
A bit later we prove that $\partial_y v \ge 0$, that is, that $y \mapsto v(y)$ is an increasing function. This proves (2.10) since $v(0) = 0$.
Let us prove that $\partial_y v(y) \ge 0$. We have
$$\partial_y C = u, \qquad \partial_y D = -ur^2, \tag{2.11}$$
and
$$\partial_y E = -\frac{u(1 + y^2)}{h^4}\Big(\frac{1}{r} + r\Big) + \frac{y}{h^4}\Big(\frac{1}{r^2} - r^2\Big).$$
It is clear that
$$\partial_y v = \frac{\partial_y C}{C} - \frac{\partial_y D}{D} + \partial_y E$$
and
$$\operatorname{sign}(\partial_y v) = \operatorname{sign}(D\,\partial_y C - C\,\partial_y D + CD\,\partial_y E), \tag{2.12}$$
where the sign function is defined as $\operatorname{sign}(z) = -1$ for $z < 0$, $\operatorname{sign}(z) = 1$ for $z > 0$, and $\operatorname{sign}(z) = 0$ if $z = 0$. Using (2.9) and (2.11), the relation (2.12) is equivalent to
$$\operatorname{sign}(\partial_y v) = \operatorname{sign}(u + ur^2 + B\,\partial_y E), \tag{2.13}$$
where
$$B = (uy + r)(1 - uyr) = r + uy(1 - r^2) - u^2 r y^2.$$
We have
$$\partial_y E = \frac{1 + r^2}{r^2 h^4}\,A, \qquad A \overset{\text{def}}{=} -ur + y(1 - r^2) - y^2 ur.$$
Hence, the relation (2.13) is equivalent to
$$\operatorname{sign}(\partial_y v) = \operatorname{sign}\big(ur^2 h^4(1 + r^2) + B(1 + r^2)A\big) = \operatorname{sign}(ur^2 h^4 + BA).$$
Write
$$B = r + yg, \qquad A = -ur + y\tau$$
with
$$g = u(1 - r^2) - u^2 ry, \qquad \tau = 1 - r^2 - ury.$$
Then
$$BA = -ur^2 + y(r\tau - urg + yg\tau)$$
and, using $h^2 = 1 - y^2$, we obtain
$$ur^2 h^4 + BA = yQ, \qquad Q \overset{\text{def}}{=} -2ur^2 y + ur^2 y^3 + r\tau - urg + yg\tau.$$
Hence, $\operatorname{sign}(\partial_y v) = \operatorname{sign}(Q)$. A small elementary calculation shows that
$$Q = ra_0 + uya_1 + 2y^2 ru^2 a_2 + y^3 ur^2(1 + u^2), \tag{2.14}$$
where
$$a_0 = (r^2 - 1)(u^2 - 1), \qquad a_1 = r^2 u^2 + 1 + r^4 - 5r^2, \qquad a_2 = r^2 - 1.$$
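The decomposition (2.14) is pure algebra and can be verified symbolically (an illustration; SymPy assumed, with $u$ denoting $x/s$ as above):

```python
# Expand Q - (r*a0 + u*y*a1 + 2*y^2*r*u^2*a2 + y^3*u*r^2*(1+u^2)) and check it is 0.
import sympy as sp

u, y, r = sp.symbols('u y r', positive=True)
g = u * (1 - r**2) - u**2 * r * y
tau = 1 - r**2 - u * r * y
Q = -2*u*r**2*y + u*r**2*y**3 + r*tau - u*r*g + y*g*tau

a0 = (r**2 - 1) * (u**2 - 1)
a1 = r**2 * u**2 + 1 + r**4 - 5*r**2
a2 = r**2 - 1
rhs = r*a0 + u*y*a1 + 2*y**2*r*u**2*a2 + y**3*u*r**2*(1 + u**2)
assert sp.expand(Q - rhs) == 0
```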
We assume that $r \ge 1$ and $u^2 \ge 3$. Therefore we have
$$a_1 \ge 1 + r^4 - 2r^2 = (r^2 - 1)^2 \ge 0.$$
The other terms in (2.14) are clearly non-negative. This means that $Q \ge 0$ and therefore $\partial_y v \ge 0$, proving the theorem. ∎
Proof of Theorem 2.2. The proof of this theorem is quite a standard one. Let $h \ge 0$. Replacing the indicator function $t \mapsto \mathbb{I}\{t \ge x\}$ by the larger function $t \mapsto \exp\{h(t - x)\}$, we have
$$P\{M_n \ge x\} \le \exp\{-hx\}\,E\exp\{hM_n\}. \tag{2.15}$$
Below we prove that
$$E\exp\{hM_n\} \le \exp\{h^2 s^2/2\}. \tag{2.16}$$
Choosing $h = x/s^2$ (which minimizes $\exp\{-hx + h^2 s^2/2\}$ over $h \ge 0$) and combining (2.15) and (2.16), we get $P\{M_n \ge x\} \le \exp\{-x^2/(2s^2)\}$, which proves the theorem.
It remains to prove (2.16). A bit later we prove that
$$E\exp\{hM_n\} \le \prod_{k=1}^{n} F_k, \qquad F_k = (1 - p_k)\exp\{-h\varepsilon_k g_k\} + p_k\exp\{h\varepsilon_k\} \tag{2.17}$$
with $g_k = \max\{1, \sigma_k^2/\varepsilon_k^2\}$ and $p_k = g_k/(1 + g_k)$. The inequality (2.17) yields (2.16). Indeed, we can apply to each factor $F_k$ in (2.17) the estimate of the following Lemma 2.4 with $g = g_k$ and $\eta = h\varepsilon_k$ (we give the proof of Lemma 2.4 later); this bounds $F_k$ by $\exp\{g_k h^2\varepsilon_k^2/2\} = \exp\{h^2 s_k^2/2\}$, since $g_k\varepsilon_k^2 = \max\{\varepsilon_k^2, \sigma_k^2\} = s_k^2$.
Lemma 2.4. For $\eta \ge 0$ and $g \ge 1$, we have
$$\frac{1}{1 + g}\exp\{-g\eta\} + \frac{g}{1 + g}\exp\{\eta\} \le \exp\Big\{\frac{g\eta^2}{2}\Big\}.$$
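A quick grid check of Lemma 2.4 (an illustration, not a proof; NumPy assumed):

```python
# Check exp(-g*eta)/(1+g) + g*exp(eta)/(1+g) <= exp(g*eta^2/2) on a grid.
import numpy as np

eta = np.linspace(0.0, 3.0, 2001)[None, :]
g = np.linspace(1.0, 20.0, 500)[:, None]
lhs = (np.exp(-g * eta) + g * np.exp(eta)) / (1 + g)
rhs = np.exp(g * eta**2 / 2)
assert np.all(lhs <= rhs * (1 + 1e-12))   # equality holds at eta = 0
```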
Let us prove (2.17). It suffices to show that
$$E\exp\{hM_k\} \le F_k\,E\exp\{hM_{k-1}\}, \qquad \text{for } k = 1, \ldots, n. \tag{2.18}$$
By induction, (2.18) yields (2.17). Conditioning on $\mathcal{F}_{k-1}$, we have
$$E\exp\{hM_k\} = E\big(\exp\{hM_{k-1}\}\,E(\exp\{hX_k\} \mid \mathcal{F}_{k-1})\big). \tag{2.19}$$
Write $Z = X_k/\varepsilon_k$. Then $P\{Z \le 1\} = 1$ and
$$E(\exp\{hX_k\} \mid \mathcal{F}_{k-1}) = E(f(Z) \mid \mathcal{F}_{k-1}) \qquad \text{with } f(t) = \exp\{h\varepsilon_k t\}. \tag{2.20}$$
The function $t \mapsto f(t)$ has a strictly increasing positive second derivative. Therefore we can apply Lemma 2.3 with $z = -g_k$. Let $P(t) = at^2 + bt + c$ be the polynomial given in Lemma 2.3. Then $f(t) \le P(t)$ and, using $E(Z \mid \mathcal{F}_{k-1}) = 0$ and $a \ge 0$, we have
$$E(f(Z) \mid \mathcal{F}_{k-1}) \le E(P(Z) \mid \mathcal{F}_{k-1}) = a\,E(Z^2 \mid \mathcal{F}_{k-1}) + c = a\varepsilon_k^{-2}E(X_k^2 \mid \mathcal{F}_{k-1}) + c \le a\varepsilon_k^{-2}s_k^2 + c. \tag{2.21}$$
A small calculation shows that $a\varepsilon_k^{-2}s_k^2 + c = F_k$. Hence, the relations (2.19)–(2.21) together yield (2.18), which concludes the proof of Theorem 2.2. ∎
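That small calculation can also be verified symbolically (an illustration; SymPy assumed, with $\eta$ standing for $h\varepsilon_k$ and $g$ for $g_k$, so that $\varepsilon_k^{-2}s_k^2 = g$):

```python
# Check that a*g + c = F_k for f(t) = exp(eta*t) and z = -g in Lemma 2.3.
import sympy as sp

t, g, eta = sp.symbols('t g eta', positive=True)
f = sp.exp(eta * t)
fp = sp.diff(f, t)
z = -g
d = (z - 1)**(-2)
a = d * ((z - 1) * fp.subs(t, z) - f.subs(t, z) + f.subs(t, 1))
c = d * ((z**2 - z) * fp.subs(t, z) + (1 - 2*z) * f.subs(t, z) + z**2 * f.subs(t, 1))
F_k = sp.exp(-g * eta) / (1 + g) + g * sp.exp(eta) / (1 + g)
assert sp.simplify(a * g + c - F_k) == 0
```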
Proof of Lemma 2.3. It is easy to check that
$$P(z) = f(z), \qquad P'(z) = f'(z), \qquad P(1) = f(1). \tag{2.22}$$
Furthermore, we have
$$P''(z) - f''(z) = (z - 1)^{-2}\big(2(z - 1)f'(z) - 2f(z) + 2f(1) - (z - 1)^2 f''(z)\big) = 2E(1 - \theta)\big(f''(z + (1 - z)\theta) - f''(z)\big), \tag{2.23}$$
by an application of the Taylor expansion
$$f(1) = f(z) + (1 - z)f'(z) + (1 - z)^2\,E(1 - \theta)\,f''(z + (1 - z)\theta),$$
where $\theta$ is a random variable uniformly distributed in the interval $[0, 1]$ (note that $2E(1 - \theta) = 1$). The function $t \mapsto f''(t)$ is a strictly increasing function of $t \le 1$. Since $z + (1 - z)\theta > z$ for $\theta \ne 0$ and $z < 1$, the expression under the expectation sign in (2.23) is positive, and therefore $P''(z) - f''(z) > 0$. This means that the function $t \mapsto P(t) - f(t)$ is positive for $t$ sufficiently close to $z$ such that $t \ne z$.

Now we can prove that $P(t) - f(t) \ge 0$ for $t \le 1$. Assume that the inequality does not hold. Then there exists $t_0 < 1$ such that $P(t_0) - f(t_0) < 0$. Due to (2.22) and the positivity of $P(t) - f(t)$ for $t$ close to $z$, the function $t \mapsto P(t) - f(t)$ has at least four zeros counting multiplicity: by (2.22) it has three, and at least one additional zero is guaranteed by $P(t_0) - f(t_0) < 0$. Therefore the function $P''(t) - f''(t) = 2a - f''(t)$ has at least two zeros, which contradicts the assumption that $t \mapsto f''(t)$ is a strictly increasing function. ∎
Proof of Lemma 2.4. The inequality we have to prove is equivalent to
$$v(\eta) \overset{\text{def}}{=} \exp\{-g(\eta + \eta^2/2)\} + g\exp\{\eta - g\eta^2/2\} - 1 - g \le 0. \tag{2.24}$$
Below we prove that $v'(\eta) \le 0$ for $\eta \ge 0$. Since $v(0) = 0$, this proves (2.24) and the lemma.

Let us prove that $v'(\eta) \le 0$. Using (2.24), the inequality $v'(\eta) \le 0$ is equivalent to
$$-(1 + \eta)\exp\{-g\eta\} + (1 - g\eta)\exp\{\eta\} \le 0. \tag{2.25}$$
The inequality (2.25) is clearly fulfilled if $1 - g\eta \le 0$. Hence, we have to verify (2.25) only for $0 \le \eta < 1/g$. For such $\eta$, the inequality (2.25) is equivalent to
$$\frac{1 + \eta}{1 - g\eta}\exp\{-g\eta - \eta\} \ge 1,$$
or to the inequality
$$u(\eta) \overset{\text{def}}{=} \ln(1 + \eta) - \ln(1 - g\eta) - g\eta - \eta \ge 0. \tag{2.26}$$
To prove (2.26), it suffices to verify that $u'(\eta) \ge 0$. Elementary calculations give
$$u'(\eta) = \frac{1}{1 + \eta} + \frac{g}{1 - g\eta} - g - 1 = \frac{(1 + g)\,\eta\,(g + g\eta - 1)}{(1 + \eta)(1 - g\eta)},$$
so that $u'(\eta) \ge 0$ is equivalent to $g + g\eta \ge 1$. The latter inequality holds for all $\eta \ge 0$ since we assume that $g \ge 1$. This proves (2.26) and the lemma. ∎
ACKNOWLEDGMENT
Research supported by the Max Planck Institute for Mathematics, Bonn.
REFERENCES
1. Bentkus, V. (2001). An inequality for large deviation probabilities of sums of bounded i.i.d.r.v. Lithuanian Math. J. 41, 144–153.
2. Bentkus, V. (2001). On measure concentration for separately Lipschitz functions in product spaces. To appear in Israel J. Math.
3. Bentkus, V., and van Zuijlen, M. (2001). Upper confidence bounds for mean. Submitted to Lithuanian Math. J.
4. Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58, 13–30.
5. Ledoux, M. (1999). Concentration of measure and logarithmic Sobolev inequalities. In Séminaire de Probabilités, XXXIII, Lecture Notes in Math., Vol. 1709, Springer, Berlin, pp. 120–216.
6. McDiarmid, C. (1989). On the method of bounded differences. In Surveys in Combinatorics, 1989 (Norwich, 1989), London Math. Soc. Lecture Note Ser., Vol. 141, Cambridge University Press, Cambridge, pp. 148–188.
7. Milman, V. D., and Schechtman, G. (1986). Asymptotic Theory of Finite-Dimensional Normed Spaces, Lecture Notes in Mathematics, Vol. 1200, Springer.
8. Petrov, V. V. (1975). Sums of Independent Random Variables, Ergebnisse der Mathematik und ihrer Grenzgebiete, Band 82, Springer-Verlag, New York/Heidelberg.
9. Pinelis, I. (1994). Extremal probabilistic problems and Hotelling's $T^2$ test under a symmetry assumption. Ann. Statist. 22(4), 357–368.
10. Pinelis, I. (1998). Optimal tail comparison based on comparison of moments. In High Dimensional Probability (Oberwolfach, 1996), Progr. Probab., Vol. 43, Birkhäuser, Basel, pp. 297–314.
11. Shorack, G. R., and Wellner, J. A. (1986). Empirical Processes with Applications to Statistics, Wiley Series in Probability and Mathematical Statistics, Wiley, New York.
12. Talagrand, M. (1995). Concentration of measure and isoperimetric inequalities in product spaces. Inst. Hautes Études Sci. Publ. Math. 81, 73–205.
13. Talagrand, M. (1995). The missing factor in Hoeffding's inequalities. Ann. Inst. H. Poincaré Probab. Statist. 31(4), 689–702.