ON PROBABILITIES OF LARGE DEVIATIONS
by
Wassily Hoeffding
University of North Carolina
Institute of Statistics Mimeo Series No. 443
September 1965
This research was supported in part by the Mathematics Division of the Air Force Office of Scientific Research. Part of the work was done while the author was a visiting professor at the Research and Training School of the Indian Statistical Institute in Calcutta under the United Nations Technical Assistance Program.
DEPARTMENT OF STATISTICS
UNIVERSITY OF NORTH CAROLINA
Chapel Hill, N. C.
ON PROBABILITIES OF LARGE DEVIATIONS¹
by
Wassily Hoeffding
University of North Carolina
Summary. The paper is concerned with the estimation of the probability that the empirical distribution of $n$ independent, identically distributed random vectors is contained in a given set of distributions. Sections 1-3 are a survey of some of the literature on the subject. In section 4 the special case of multinomial distributions is considered and certain results on the order of magnitude of the probabilities in question are obtained.
1. The general problem. Let $X_1, X_2, \ldots$ be a sequence of independent $m$-dimensional random vectors with common distribution function (d.f.) $F$. If we want to obtain general results on the behavior of the probability that $X^{(n)} = (X_1, \ldots, X_n)$ is contained in a set $A^*$ when $n$ is large, we must impose some restrictions on the class of sets. One interesting class consists of the sets $A^*$ which are symmetric in the sense that if $X^{(n)}$ is in $A^*$ then every permutation $(X_{j_1}, \ldots, X_{j_n})$ of the $n$ component vectors of $X^{(n)}$ is in $A^*$. The restriction to symmetric sets can be motivated by the fact that under our assumption all permutations of $X^{(n)}$ have the same distribution.

Let $F_n = F_n(\cdot \mid X^{(n)})$ denote the empirical d.f. of $X^{(n)}$. The empirical distribution is invariant under permutations of $X^{(n)}$, and for any symmetric set $A^*$ there is at least one set $A$ in the space $\mathcal{G}$ of $m$-dimensional d.f.'s such that the events $X^{(n)} \in A^*$ and $F_n(\cdot \mid X^{(n)}) \in A$ are equivalent. The latter event will be denoted by $F_n \in A$ for short. Thus when we restrict ourselves to symmetric sets, we may as well consider the probabilities $P\{F_n \in A\}$, where $A = A_n$ may depend on $n$. (It is understood that $A \subset \mathcal{G}$ is such that the set $\{x^{(n)} \mid F_n(\cdot \mid x^{(n)}) \in A\}$ is measurable.) Since $F_n$ converges to $F$ in a well-known sense (Glivenko-Cantelli theorem), we may say that $P\{F_n \in A_n\}$ is the probability of a large deviation of $F_n$ from $F$ if $F$ is not in $A_n$ and not "close" to $A_n$, implying that $P\{F_n \in A_n\}$ approaches 0 as $n \to \infty$. For certain classes of sets $A_n$ estimates of $P\{F_n \in A_n\}$ (some of which are mentioned below) have been obtained which hold uniformly for "large" and for "small" deviations.

For any two d.f.'s $F$ and $G$ in $\mathcal{G}$ let $\mu$ be some sigma-finite measure which dominates the two distributions (for instance, $d\mu = d(F+G)$), and let $f$ and $g$ be the corresponding densities, $dF = f\,d\mu$, $dG = g\,d\mu$. Define

$I(G,F) = \int (\log(g/f))\, g\, d\mu$,

with the usual convention that the integrand is 0 whenever $g = 0$. (The value of $I(G,F)$ does not depend on the choice of $\mu$.) We have $0 \le I(G,F) \le \infty$; $I(G,F) = 0$ if and only if $G = F$; $I(G,F) < \infty$ only if $F$ dominates $G$. Let $I(A,F) = \inf_{G \in A} I(G,F)$, $I(A,F) = +\infty$ if $A$ is empty.

Sanov [12] has shown that under certain restrictions on $A_n$

(1)  $P\{F_n \in A_n\} = \exp\{-n I(A_n,F) + o(n)\}$.

If $F$ is discrete and takes only finitely many values, the distribution of $F_n$ may be expressed in terms of a multinomial distribution. In this case the estimate (1), with $o(n)$ replaced by $O(\log n)$, holds under rather mild restrictions on $A_n$ (see [5] and section 4). In [12] (where only sets $A$ independent of $n$ and one-dimensional distributions $F$ are considered) Sanov obtains (1) for a certain class of sets $A$ such that $P\{F_n \in A\}$ can be approximated by multinomial probabilities.
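As a concrete illustration of the quantity $I(G,F)$ for finitely supported distributions (the multinomial case taken up in section 4), the following sketch computes $I(G,F)$ and the corresponding first-order exponent $\exp\{-nI(G,F)\}$ appearing in (1). It is an illustrative aid, not part of the original paper, and the probability vectors used are arbitrary examples.

```python
import numpy as np

def info_number(g, f):
    """Information number I(G,F) = sum_i g_i * log(g_i / f_i) for finitely supported
    distributions; terms with g_i = 0 contribute 0 by convention, and the value is
    +inf unless F dominates G (f_i = 0 forces g_i = 0)."""
    g, f = np.asarray(g, float), np.asarray(f, float)
    if np.any((f == 0) & (g > 0)):
        return np.inf
    mask = g > 0
    return float(np.sum(g[mask] * np.log(g[mask] / f[mask])))

# Hypothetical numbers: F puts mass (0.5, 0.3, 0.2) and G puts mass (0.6, 0.3, 0.1).
f = [0.5, 0.3, 0.2]
g = [0.6, 0.3, 0.1]
n = 100
print("I(G,F) =", info_number(g, f))                        # about 0.040
print("exp(-n I(G,F)) =", np.exp(-n * info_number(g, f)))   # crude first-order exponent in (1)
```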
Some necessary conditions for (1) to be true are easily noticed. Let $\mathcal{G}_n(F)$ denote the set of all $G \in \mathcal{G}$ such that $nG$ is integer-valued and $\int_E dF = 0$ implies $\int_E dG = 0$ for every open set $E \subset R^m$. Then $F_n \in \mathcal{G}_n(F)$ with probability one and $P\{F_n \in A\} = P\{F_n \in A \cap \mathcal{G}_n(F)\}$. Let $\mathcal{G}(F)$ denote the set of all $G$ which are dominated by $F$. Then $I(A,F) < \infty$ only if $A \cap \mathcal{G}(F)$ is not empty. Hence $\exp\{-n I(A,F)\}$ can be a non-trivial estimate of $P\{F_n \in A\}$ only if both $A_n \cap \mathcal{G}_n(F)$ and $A_n \cap \mathcal{G}(F)$ are non-empty.

If $F$ is discrete and takes only finitely many values, then $\mathcal{G}_n(F) \subset \mathcal{G}(F)$; in this case (1) is always true with $A_n$ replaced by $A_n^{(n)} = A_n \cap \mathcal{G}_n(F)$ (see (32), section 4), and (1) holds if $I(A_n^{(n)},F) - I(A_n,F)$ is not too large. If $F$ is absolutely continuous with respect to Lebesgue measure, then $\mathcal{G}_n(F)$ and $\mathcal{G}(F)$ are disjoint; for (1) to be true and nontrivial, $A$ must, as a minimum requirement, contain both values of $F_n$ (which are discrete) and d.f.'s which are dominated by $F$ (hence also by the Lebesgue measure).

In the following two sections the approximation (1) will be related to known results for certain classes of sets, which give more precise estimates of the probability.

2. Half-spaces. Let $\varphi$ be a real-valued measurable function on $R^m$ and let

(2)  $H = H(\varphi) = \{G \mid \int \varphi\, dG \ge 0\}$

be the set of all $G \in \mathcal{G}$ such that $\int \varphi\, dG$ is defined and nonnegative. The set $H$ may be called a half-space in $\mathcal{G}$. The asymptotic behavior of $P\{F_n \in H\} = P\{\sum_{j=1}^n \varphi(X_j) \ge 0\}$ has been studied extensively. To relate these results to the estimate (1) we first prove the following lemma. We shall write $G[B]$ for $\int_B dG$ and $G[\varphi \in E]$ for $G[\{x \mid \varphi(x) \in E\}]$.

Lemma 1. Let $H = \{G \mid \int \varphi\, dG \ge 0\}$, $M(t) = \int \exp(t\varphi)\, dF$.

(A) We have

(3)  $I(H,F) = -\log \inf_{t>0} M(t)$.

(B) $0 < I(H,F) < \infty$ if $\int \varphi\, dF < 0$, $F[\varphi \ge 0] > 0$, and $M(t) < \infty$ for some $t > 0$.

(B₁) If, in addition, $F[\varphi > 0] > 0$, then $\inf_{t>0} M(t) = M(t_\varphi)$, where $t_\varphi > 0$ is the unique root of $M'(t) = 0$, $M'(t) = dM(t)/dt$.

(C) $I(H,F) = 0$ if $\int \varphi\, dF \ge 0$ or $M(t) = \infty$ for all $t > 0$.

(D) $I(H,F) = \infty$ if $F[\varphi \ge 0] = 0$.
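For a finitely supported $\varphi(X_1)$ the quantity in Lemma 1(A) can be evaluated numerically by minimizing $M(t)$ over $t > 0$. The following sketch (with an arbitrary example distribution, not taken from the paper) does this with a simple grid search; under the conditions of (B₁) the minimizer approximates $t_\varphi$.

```python
import numpy as np

# Hypothetical example: phi(X_1) takes the values -1, 0, 2 with probabilities 0.6, 0.3, 0.1,
# so that E[phi] < 0 and F[phi > 0] > 0, the situation covered by parts (B) and (B_1) of Lemma 1.
values = np.array([-1.0, 0.0, 2.0])
probs  = np.array([0.6, 0.3, 0.1])

# M(t) = E exp(t*phi) evaluated on a grid of t > 0; a root-finder for M'(t) = 0 would
# locate t_phi more precisely, but a grid is enough to illustrate Lemma 1 (A).
ts = np.linspace(1e-4, 5.0, 50001)
Ms = np.exp(np.outer(ts, values)) @ probs
i = int(np.argmin(Ms))
t_phi, M_min = ts[i], Ms[i]
I_HF = -np.log(M_min)                      # Lemma 1 (A): I(H,F) = -log inf_{t>0} M(t)
print(f"t_phi ~ {t_phi:.4f}, inf M(t) ~ {M_min:.6f}, I(H,F) ~ {I_HF:.6f}")
```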
Proof. If $G \in H$ and $I(G,F) < \infty$ then for $t > 0$

(4)  $-I(G,F) \le t\int \varphi\, dG - I(G,F) = \int \log(\exp(t\varphi)\, f g^{-1})\, g\, d\mu \le \log \int \exp(t\varphi)\, f\, d\mu = \log M(t)$

by Jensen's inequality. Hence $I(H,F) \ge -\log \inf_{t>0} M(t)$. The equality sign holds in both inequalities in (4) if $\int \varphi\, dG = 0$ and $\exp(t\varphi) f g^{-1} = \mathrm{const}$. If $M(t) < \infty$ and $M'(t)$ exists, these conditions are equivalent to $dG = \exp(t\varphi)\, dF/M(t)$ and $M'(t) = 0$. Under the hypothesis of (B), if $F[\varphi > 0] > 0$, there is a unique $t_\varphi \in (0,\infty)$ such that $M'(t_\varphi) = 0$ and $M(t_\varphi) = \min M(t) < 1$; if $F[\varphi > 0] = 0$ then $\inf_{t>0} M(t) = M(\infty) = F[\varphi = 0]$, and if $G[\varphi = 0] = 1$ then $G \in H$ and $I(G,F) = -\log F[\varphi = 0]$. This implies (B) and (B₁), and (3) is proved for this case. The statements (3), (C) and (D) in the cases $\int \varphi\, dF \ge 0$ and $F[\varphi \ge 0] = 0$ are easily verified.

It remains to show that if $M(t) = \infty$ for all $t > 0$, $\int \varphi\, dF < 0$ and $F[\varphi \ge 0] > 0$, then $I(H,F) = 0$. Let $M(t,c) = \int_{\varphi \le c} \exp(t\varphi)\, dF$, which is finite for $t \ge 0$, $c < \infty$. Since $\int \varphi\, dF < 0$, we may assume that $\int_{\varphi \le c} \varphi\, dF < 0$ for all $c \in (0,\infty)$. For each $c \in (0,\infty)$ there is a $t(c) \in (0,\infty)$ such that $\partial M(t,c)/\partial t = 0$ for $t = t(c)$. It is easy to show that $t(c) \to 0$ and $M(t(c),c) \to 1$ as $c \to \infty$. Let $G_c$ be the d.f. defined by $dG_c = \exp(t(c)\varphi)\, dF/M(t(c),c)$ for $\varphi \le c$, $G_c[\varphi > c] = 0$. Then $G_c \in H$ and $I(G_c,F) = -\log M(t(c),c) \to 0$ as $c \to \infty$. Thus $I(H,F) = 0$, as was to be shown.

We have the elementary and well-known inequality

(5)  $P\{F_n \in H\} = P\{\sum_{j=1}^n \varphi(X_j) \ge 0\} \le \inf_{t>0} M(t)^n = \exp\{-n I(H,F)\}$.

Equality is attained only in the trivial cases $F[\varphi \le 0] = 1$ and $F[\varphi \ge 0] = 1$.

Let $H_c = H(\varphi - c) = \{G \mid \int \varphi\, dG \ge c\}$, where $c$ is a real number. Then $P\{F_n \in H_c\} = P\{\sum_{j=1}^n \varphi(X_j) \ge nc\}$. If $\int \varphi\, dF \le c$, $F[\varphi > c] > 0$, and $M(t) < \infty$ for some $t > 0$, then, by Lemma 1,

$I(H_c,F) = c\, t(c) - L(t(c)) = I^*(c)$, say,

where $L(t) = \log M(t)$ and $t(c) = t_{\varphi-c}$ is defined by $L'(t(c)) = c$.
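The exponent $I^*(c) = c\,t(c) - L(t(c))$ is a Legendre-type transform of $L(t) = \log M(t)$; for a concrete distribution it can be computed by solving $L'(t(c)) = c$. The sketch below (again with an arbitrary example distribution, not from the paper) does this by bisection and also reports the crude bound $\exp\{-nI^*(c)\}$ of the form (5).

```python
import numpy as np

# Hypothetical example: phi(X_1) = -1 or +1 with probabilities 0.7 and 0.3 (so E[phi] = -0.4).
values = np.array([-1.0, 1.0])
probs  = np.array([0.7, 0.3])

def L(t):
    """Cumulant generating function L(t) = log M(t)."""
    return np.log(np.sum(probs * np.exp(t * values)))

def Lprime(t, h=1e-6):
    # numerical derivative; adequate for this illustration
    return (L(t + h) - L(t - h)) / (2 * h)

def I_star(c, lo=0.0, hi=50.0, iters=200):
    """I*(c) = c t(c) - L(t(c)) with t(c) the root of L'(t) = c, found by bisection.
    Requires E[phi] <= c < max(values)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if Lprime(mid) < c:
            lo = mid
        else:
            hi = mid
    t_c = 0.5 * (lo + hi)
    return c * t_c - L(t_c)

c, n = 0.2, 200
print("I*(c) =", I_star(c))
print("Chernoff-type bound exp(-n I*(c)) =", np.exp(-n * I_star(c)))
```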
A theorem of Cramér [3] as sharpened by Petrov [8] can be stated as follows. Suppose that

(6)  $\int \varphi\, dF = 0$, $F[\varphi \ne 0] > 0$, $M(t) < \infty$ if $|t| < t_0$ for some $t_0 > 0$.

Then for $c = c_n > \alpha n^{-1/2}$ ($\alpha > 0$), $c = o(1)$ as $n \to \infty$ we have

(7)  $P\{\sum_{j=1}^n \varphi(X_j) \ge nc\} = b_n(c)\exp\{-n I^*(c)\}(1 + O(c))$,

where, with $\Phi(x) = (2\pi)^{-1/2}\int_{-\infty}^x \exp(-y^2/2)\, dy$,

(8)  $b_n(c) = (1 - \Phi(x))\exp(x^2/2)$, $x = n^{1/2} c/\sigma$, $\sigma^2 = \int \varphi^2\, dF$.

(Usually the theorem is stated in terms of an expansion of $I^*(\sigma n^{-1/2} x)$ in powers of $n^{-1/2}x$.) Petrov [8] also shows that for any $\epsilon > 0$ equation (7) with $O(c)$ replaced by $r\sigma\epsilon$ holds uniformly for $0 < c < \sigma\epsilon$, where $r$ does not exceed an absolute constant. (Compare also the earlier paper of Feller [4].)

Bahadur and Rao [1] have obtained an asymptotic expression for the probability in (7) when $c$ is fixed. It implies that if conditions (6) are satisfied, $c > 0$ is fixed and $F[\varphi > c] > 0$ then²

(9)  $P\{\sum_{j=1}^n \varphi(X_j) \ge nc\} \asymp n^{-1/2}\exp\{-n I^*(c)\}$.

From (8) it is seen that $b_n(c) \asymp c^{-1} n^{-1/2}$ if $c \ge a n^{-1/2}$ ($a > 0$). Hence the quoted results imply the following uniform estimate of the order of magnitude of the probability under consideration. Let $a$ and $\beta$ be positive numbers such that $F[\varphi > \beta] > 0$. If conditions (6) are satisfied then

(10)  $P\{\sum_{j=1}^n \varphi(X_j) \ge nc\} \asymp c^{-1} n^{-1/2}\exp\{-n I^*(c)\}$

uniformly for $a n^{-1/2} < c < \beta$.

Lemma 1 shows that $\exp\{-n I(H,F)\}$ does not approximate $P\{F_n \in H\}$ if $M(t) = \infty$ for all $t > 0$. In this case estimates of $P\{\sum_{j=1}^n \varphi(X_j) \ge nc\}$ have been obtained by Linnik [7] and Petrov [9] for "moderate" deviations, where $c - \int \varphi\, dF$ tends to 0 at a specified rate.
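The order relation (10) is easy to check empirically for small $n$. The following Monte Carlo sketch (an illustration with an arbitrary two-point distribution, not part of the paper) compares the simulated probability $P\{\sum \varphi(X_j) \ge nc\}$ with $c^{-1}n^{-1/2}\exp\{-nI^*(c)\}$; by footnote 2 one only expects the ratio to stay bounded away from 0 and infinity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical centered two-point distribution: phi = +1 w.p. 0.3, phi = -3/7 w.p. 0.7,
# so that E[phi] = 0 and condition (6) holds.
p_plus, v_plus, v_minus = 0.3, 1.0, -3.0 / 7.0
values = np.array([v_plus, v_minus])
probs  = np.array([p_plus, 1 - p_plus])

def I_star(c, lo=0.0, hi=60.0, iters=100):
    """Legendre-type transform I*(c) = c t(c) - L(t(c)), with L'(t(c)) = c (bisection)."""
    L  = lambda t: np.log(np.sum(probs * np.exp(t * values)))
    Lp = lambda t: (L(t + 1e-6) - L(t - 1e-6)) / 2e-6
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if Lp(mid) < c else (lo, mid)
    t = 0.5 * (lo + hi)
    return c * t - L(t)

n, c, reps = 100, 0.2, 500000
# sum_j phi(X_j) = K*v_plus + (n-K)*v_minus with K ~ Binomial(n, p_plus)
K = rng.binomial(n, p_plus, size=reps)
sums = K * v_plus + (n - K) * v_minus
p_hat = np.mean(sums >= n * c)
approx = np.exp(-n * I_star(c)) / (c * np.sqrt(n))   # right side of (10), constant omitted
print(f"simulated = {p_hat:.3e}   c^-1 n^-1/2 exp(-n I*(c)) = {approx:.3e}   ratio = {p_hat/approx:.2f}")
```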
For certain sets $A$ the results on half-spaces enable us to obtain upper and/or lower bounds for $P\{F_n \in A\}$ of the form (1). If $A$ is any subset of $\mathcal{G}$, it follows from the definition of $I(A,F)$ that $A \subset B = \{G \mid I(G,F) \ge I(A,F)\}$. Suppose that $0 < I(A,F) < \infty$ and that there is a $G_0 \in A$ such that $I(G_0,F) = I(A,F)$. If $I(G,F)$ and $I(G,G_0)$ are finite, we have $I(G,F) = \int \log(g_0/f)\, dG + I(G,G_0)$, where $f = dF/d\mu$, $g_0 = dG_0/d\mu$. Hence the half-space $H = \{G \mid \int \log(g_0/f)\, dG \ge I(A,F)\}$ is a subset of $B$, and $I(H,F) = I(A,F)$. In general $H$ neither contains nor is contained in $A$.

Suppose that $A$ contains a half-space $H$ such that $I(H,F) = I(A,F)$. For example, if $A$ is the union of a family of half-spaces $H(\varphi)$, $\varphi \in \Phi$ (so that the complement of $A$ is convex), it is easily seen that $I(A,F) = \inf\{I(H(\varphi),F),\ \varphi \in \Phi\}$. If the infimum is attained in $\Phi$, the stated assumption is satisfied. Then we have the lower bound $P\{F_n \in A\} \ge P\{F_n \in H\}$, which, under appropriate conditions, can be estimated explicitly, as in (7) or (9), where $I^*(c) = I(A,F)$.

If $A$ is contained in a half-space $H$ and $I(H,F) = I(A,F)$, we have analogous upper bounds, including $P\{F_n \in A\} \le \exp\{-n I(A,F)\}$.

Now suppose that the set $A$ is contained in the union of a finite number $k = k(n)$ of half-spaces $H_i$, $i = 1,\ldots,k$. Then (using (5))

(11)  $P\{F_n \in A\} \le \sum_{i=1}^k P\{F_n \in H_i\} \le \sum_{i=1}^k \exp\{-n I(H_i,F)\} \le k\exp\{-n \min_i I(H_i,F)\}$.

If $\min_i I(H_i,F)$ is close to $I(A,F)$ and $k = k(n)$ is not too large, even the crudest of the three bounds in (11) may be considerably better than the upper bound implied by (1).

The following example serves as an illustration. Let

(12)  $A = \{G \mid \sup_{x \in R^m} |G(x) - F(x)| \ge c\}$, $0 < c < 1$.

The set $A$ is the union of the half-spaces $H_x^+ = \{G \mid G(x) - F(x) \ge c\}$, $H_x^- = \{G \mid F(x) - G(x) \ge c\}$, $x \in R^m$. Sethuraman [13] has shown that the estimate (1) holds in the present case with $c$ fixed, and for more general unions of half-spaces.

It follows from Lemma 1 by a simple calculation that $I(H_x^+,F) = J(F(x),c)$ and $I(H_x^-,F) = J(1-F(x),c)$, where

(13)  $J(p,c) = (p+c)\log((p+c)/p) + (1-p-c)\log((1-p-c)/(1-p))$ if $0 < p < 1-c$, $J(1-c,c) = -\log(1-c)$, $J(p,c) = \infty$ if $p = 0$ or $p > 1-c$.

I shall assume for simplicity that the one-dimensional marginal d.f.'s of $F$ are continuous. Then $F(x)$ takes all values in $(0,1)$ and we have

(14)  $I(A,F) = \min_p J(p,c) = J(p(c),c) = K(c)$, say,

where $p(c)$ is the unique root in $(0,1-c)$ of $\partial J(p,c)/\partial p = 0$. It is easy to show that $(1-c)/2 < p(c) < \min(1/2,\,1-c)$. For $K'(c) = dK(c)/dc$ we find

(15)  $K'(c) = c/[p(c)(1-p(c))] < 4c/(1-c^2)$.

For any $x$ with $F(x) = p(c)$ we have

$P\{F_n \in A\} \ge P\{F_n \in H_x^+\} \ge \binom{n}{r} p(c)^r (1-p(c))^{n-r}$,

where $r = r(n,c)$ is the integer defined by $r \ge n(p(c)+c) > r-1$. An application of Stirling's formula shows that this lower bound is greater than $c_1 n^{-1/2}\exp\{-n J(p(c), c_n)\}$, where $c_1$ is a positive constant independent of $c$ and $c_n = (r/n) - p(c) = c + \theta_1/n$, $0 \le \theta_1 < 1$. Hence it can be shown that for every $\epsilon > 0$ there is a positive constant $c_2$ which depends only on $\epsilon$ such that for $0 < c < 1-\epsilon$

(16)  $P\{\sup_{x \in R^m} |F_n(x) - F(x)| \ge c\} \ge c_2 n^{-1/2}\exp\{-n K(c)\}$.
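The quantities $J(p,c)$, $p(c)$ and $K(c)$ in (13)-(15) are easy to compute numerically. The sketch below (illustrative only; the particular value of $c$ is arbitrary) minimizes $J(\cdot,c)$ on a grid and checks the bounds $(1-c)/2 < p(c) < 1/2$ and $K'(c) < 4c/(1-c^2)$.

```python
import numpy as np

def J(p, c):
    """J(p,c) from (13), for 0 < p < 1-c."""
    return (p + c) * np.log((p + c) / p) + (1 - p - c) * np.log((1 - p - c) / (1 - p))

def K(c, grid=200000):
    """K(c) = min_p J(p,c) and the minimizing p(c), by grid search on (0, 1-c)."""
    p = np.linspace(1e-6, 1 - c - 1e-6, grid)
    vals = J(p, c)
    i = int(np.argmin(vals))
    return vals[i], p[i]

c = 0.3
Kc, pc = K(c)
dK = (K(c + 1e-4)[0] - K(c - 1e-4)[0]) / 2e-4     # numerical K'(c)
print(f"p(c) = {pc:.4f}  (bounds: {(1-c)/2:.4f} < p(c) < 0.5)")
print(f"K(c) = {Kc:.5f},  K'(c) = {dK:.4f},  4c/(1-c^2) = {4*c/(1-c**2):.4f}")
```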
Now let $k$ be a positive integer. Since the marginal d.f.'s $F^{(i)}(x^{(i)})$ of $F(x) = F(x^{(1)},\ldots,x^{(m)})$ are continuous, there are numbers $a_j^{(i)}$, $-\infty = a_0^{(i)} < a_1^{(i)} < \cdots < a_{k-1}^{(i)} < a_k^{(i)} = +\infty$, $i = 1,\ldots,m$, such that $F^{(i)}(a_j^{(i)}) = j/k$ for all $i, j$. If $a_{j_i-1}^{(i)} \le x^{(i)} < a_{j_i}^{(i)}$ for $i = 1,\ldots,m$ then

$G(x) - F(x) \le G(a) - F(a) + m/k$,

where $a = (a_{j_1}^{(1)},\ldots,a_{j_m}^{(m)})$, $1 \le j_i \le k$, and we have a similar upper bound for $F(x) - G(x)$. Hence the set $A$ is contained in the union of the $2k^m$ half-spaces $\{G \mid G(a) - F(a) \ge c - m/k\}$, $\{G \mid F(a) - G(a) \ge c - m/k\}$ corresponding to the $k^m$ values $a$. If $c - m/k > 0$, we have for each of these half-spaces $H$ the inequality $I(H,F) \ge K(c - m/k)$. We have $K(c-m/k) = K(c) - (m/k)K'(c - \theta m/k)$, $0 < \theta < 1$. With (15) this implies $K(c-m/k) > K(c) - (m/k)\,4c/(1-c^2)$ and hence, by (11),

$P\{F_n \in A\} \le 2k^m\exp\{-nK(c) + (nm/k)\,4c/(1-c^2)\}$.

If we choose $k$ so that $k-1 \le 4nc(1-c^2)^{-1} \le k$ and take account of the assumption $c > mk^{-1}$, we obtain

(17)  $P\{\sup_{x \in R^m} |F_n(x) - F(x)| \ge c\} < 2e^m\{4cn/(1-c^2) + 1\}^m\exp\{-nK(c)\}$ if $4c^2 n > m(1-c^2)$.

For $c$ fixed the bound is of order $n^m\exp\{-nK(c)\}$. The power $n^m$ can be reduced by using the closer bounds in (11). An upper bound of a different form for the probability in (17) has been obtained by Kiefer and Wolfowitz [6].
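For orientation, the upper bound (17) and the lower bound (16) can be evaluated side by side for specific values of $n$, $c$ and $m$. The short sketch below does this (the constant $c_2$ in (16) is unknown, so only the factor $n^{-1/2}\exp\{-nK(c)\}$ is reported); all numbers are illustrative.

```python
import numpy as np

def K(c, grid=100000):
    """K(c) = min_p J(p,c) with J from (13)."""
    p = np.linspace(1e-6, 1 - c - 1e-6, grid)
    J = (p + c) * np.log((p + c) / p) + (1 - p - c) * np.log((1 - p - c) / (1 - p))
    return float(J.min())

n, c, m = 1000, 0.1, 2
Kc = K(c)
upper = 2 * np.exp(m) * (4 * c * n / (1 - c**2) + 1) ** m * np.exp(-n * Kc)   # bound (17)
lower_factor = n ** -0.5 * np.exp(-n * Kc)                                    # n^{-1/2} exp(-nK(c)), cf. (16)
print(f"K({c}) = {Kc:.5f}")
print(f"upper bound (17): {upper:.3e}   (valid since 4c^2 n = {4*c*c*n:.1f} > m(1-c^2) = {m*(1-c**2):.2f})")
print(f"lower-bound factor n^-1/2 exp(-nK(c)): {lower_factor:.3e}")
```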
3. Sums of independent random vectors. Let $\varphi = (\varphi_1,\ldots,\varphi_k)$ be a measurable function from $R^m$ to $R^k$ and consider the set

(18)  $A = \{G \mid \int \varphi\, dG \in D\}$,

where $D$ is a $k$-dimensional Borel set. Then $P\{F_n \in A\}$ is the probability that the sum $n^{-1}\sum_{j=1}^n \varphi(X_j)$ of $n$ independent, identically distributed random vectors is contained in the set $D$. We have

(19)  $I(A,F) = \inf_{s \in D} I(A(s),F)$,  $A(s) = \{G \mid \int \varphi\, dG = s\}$.

For $t \in R^k$ let

$M(t) = \int \exp(t,\varphi)\, dF$,  $L(t) = \log M(t)$,

where $(t,\varphi) = \sum_{i=1}^k t_i\varphi_i$. Let $\Theta$ denote the set of points $t$ for which $M(t) < \infty$. Suppose that the set $\Theta^0$ of inner points of $\Theta$ is not empty. The derivatives $L_i'(t) = \partial L(t)/\partial t_i$ exist in $\Theta^0$. Let $\Omega^0$ denote the set of points $L'(t) = (L_1'(t),\ldots,L_k'(t))$, $t \in \Theta^0$. The following lemma, in conjunction with (19), is a partial extension of Lemma 1.

Lemma 2. If $s \in \Omega^0$ then

(20)  $I(A(s),F) = (t(s),s) - L(t(s)) = -\min_{t \in R^k}[L(t) - (t,s)]$,

where $t(s)$ satisfies the equation $L'(t(s)) = s$. Also, $I(A(s),F) = I(G_s,F)$, where $G_s$ is the d.f. in $A(s)$ defined by

(21)  $dG_s = \exp\{(t(s),\varphi)\}\, dF/M(t(s))$.
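Lemma 2 says that the infimum of $I(G,F)$ over $\{G \mid \int\varphi\, dG = s\}$ is attained at an exponentially tilted distribution (21) whose tilt $t(s)$ solves $L'(t(s)) = s$. For a finitely supported $\varphi(X_1)$ this can be computed directly; the sketch below (an arbitrary two-dimensional example, not from the paper) uses damped Newton steps on $L'(t) = s$ and reports $I(A(s),F)$ and the tilted probabilities of $G_s$.

```python
import numpy as np

# Hypothetical example: phi(X_1) takes 4 values in R^2 with the probabilities below (k = 2).
phi = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # support points of phi(X_1)
p   = np.array([0.4, 0.3, 0.2, 0.1])                                # their probabilities under F
s   = np.array([0.55, 0.45])                                        # target mean, assumed in Omega^0

def L(t):
    return np.log(np.sum(p * np.exp(phi @ t)))

def L_grad(t):
    w = p * np.exp(phi @ t); w /= w.sum()
    return phi.T @ w                                   # L'(t) = mean of phi under the tilted law

def L_hess(t):
    w = p * np.exp(phi @ t); w /= w.sum()
    m = phi.T @ w
    return (phi.T * w) @ phi - np.outer(m, m)          # covariance of phi under the tilted law

t = np.zeros(2)
for _ in range(80):                                    # damped Newton iteration for L'(t) = s
    t -= 0.5 * np.linalg.solve(L_hess(t), L_grad(t) - s)

I_As = float(t @ s - L(t))                             # (20): I(A(s),F) = (t(s),s) - L(t(s))
tilt = np.exp(phi @ t - L(t))                          # (21): dG_s/dF at each support point
print("t(s) =", np.round(t, 4), " I(A(s),F) =", round(I_As, 6))
print("tilted probabilities G_s:", np.round(p * tilt, 6),
      " mean under G_s:", np.round(phi.T @ (p * tilt), 4))
```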
Proof. If $G \in A(s)$, we find as in (4) that $-I(G,F) \le L(t) - (t,s)$ for all $t \in R^k$, with equality holding only if

(22)  $dG = \exp\{(t,\varphi)\}\, dF/M(t)$.

The d.f. $G$ defined by (22) is in $A(s)$ if and only if $\int \varphi\exp\{(t,\varphi)\}\, dF/M(t) = s$ which, for $t \in \Theta^0$, is equivalent to $L'(t) = s$. Since $s \in \Omega^0$ there is at least one point $t(s) \in \Theta^0$ which satisfies this equation. The lemma follows. (If the distribution of the random vector $\varphi(X_1)$ is concentrated on a hyperplane in $R^k$, the solution $t(s)$ of $L'(t) = s$ is not unique; but the distribution $G_s$ can be shown to be the only $G \in A(s)$ for which $I(G,F) = I(A(s),F)$.)

It is seen from (20) and (21) that if $s \in \Omega^0$ then

$dF(x) = \exp\{-I(A(s),F) - (t(s),\varphi(x)-s)\}\, dG_s(x)$,

$\prod_{j=1}^n dF(x_j) = \exp\{-n I(A(s),F) - n(t(s), \int\varphi\, dF_n - s)\}\prod_{j=1}^n dG_s(x_j)$.

(Here the same notation $F_n$ is used for the value $F_n(\cdot \mid x^{(n)})$ as for the random function $F_n(\cdot \mid X^{(n)})$.) Hence the distribution of the sum $\int\varphi\, dF_n$ can be expressed in the form

(23)  $\int_{\{\int\varphi\, dF_n = s\}}\prod_{j=1}^n dF(x_j) = \exp\{-n I(A(s),F)\}\int_{\{\int\varphi\, dF_n = s\}}\prod_{j=1}^n dG_s(x_j)$

for values $s \in \Omega^0$. The integral on the right is the value at $s$ of the distribution of $\int\varphi\, dF_n$ when $X_1,\ldots,X_n$ have the common distribution $G_s$, in which case $s = \int\varphi\, dG_s$ is the expected value of $\int\varphi\, dF_n$. The higher moments of this distribution are finite, and the well-known results on the approximation of the density of a sum of independent random vectors in the center of the distribution can be used to approximate the density (on the left in (23)) at points remote from the center. This, in turn, can be used to approximate $P\{\int\varphi\, dF_n \in D\}$, at least for $D \subset \Omega^0$. This approach has been used by Borovkov and Rogozin [2] to derive an asymptotic expansion of the probability $P\{\int\varphi\, dF_n \in D_n\}$ for an extensive class of sets $D_n$ under the assumption that the distribution of $\int\varphi\, dF_n$ is absolutely continuous with respect to Lebesgue measure in $R^k$ for some $n$.
Borovkov and Rogozin make the following assumptions concerning $D_n$ (in my notation). Let $-\psi_n$ denote the essential infimum relative to $k$-dimensional Lebesgue measure of $I(A(s),F)$ for $s \in D_n$. (Thus $\psi_n = -I(A_n^*,F)$, where $A_n^* = \{G \mid \int\varphi\, dG \in D_n^*\}$ and $D_n^*$ differs from $D_n$ by a set of Lebesgue measure 0.) Let $\Theta'$ be a compact subset of $\Theta^0$ and $\Omega' = \{L'(t) \mid t \in \Theta'\}$.

Assumption (A): For some $\delta > 0$, $D_n \cap \{s \mid I(A(s),F) < -\psi_n + \delta\} \subset \Omega'$.

Assumption (B): There is a union $U$ of finitely many half-spaces in $R^k$ such that

$D_n \cap \{s \mid I(A(s),F) \ge -\psi_n + \delta\} \subset U \subset \{s \mid I(A(s),F) \ge -\psi_n + \delta/2\}$.

Under these assumptions the leading term of the asymptotic expansion obtained in [2] is

(24)  $P\{\int\varphi\, dF_n \in D_n\} \sim (2\pi)^{-k/2} n^{k/2}\exp(n\psi_n)\int_0^\infty e^{-nu}\varphi_n(u)\, du$,

where $\varphi_n(u) = \int_{\Gamma(-\psi_n+u)} |\Sigma(s)|^{-1/2}\, ds$, $\Gamma(c) = \{s \in D_n \mid I(A(s),F) = c\}$, $|\Sigma(s)|$ is the determinant of the covariance matrix of $\varphi(X_1)$ when $X_1$ has the distribution $G_s$, and the last integral is extended over the indicated surface.

It should be feasible to obtain an analogous expansion for the case of lattice-valued random vectors. An extension of the Euler-Maclaurin sum formula to the case of a function of several variables, due to R. Ranga Rao (in a Ph.D. dissertation which is unpublished at this writing; compare [10]), would be useful here. The order of magnitude of the probability $P\{\int\varphi\, dF_n \in D_n\}$ for a fairly extensive class of sets $D_n$ can be determined in a rather simple way, as is shown in section 4 for the multinomial case. Richter [11] derived an estimate of $P\{\int\varphi\, dF_n \in D\}$ for a special class of sets $D$ in the lattice vector case as well as in the absolutely continuous case; it is akin to the Cramér-Petrov estimate for the one-dimensional case but seems to have no simple relation to the Sanov-type estimate (1).
The preceding discussion has an interesting statistical interpretation. Lemma 2 shows that if $s \in \Omega^0$ then the infimum $I(A(s),F)$ is attained in the "exponential" subclass of $\mathcal{G}$ which consists of the distributions $G$ defined by (22). Suppose that $F = F_\theta$ is a member of the class $\{F_\theta,\ \theta \in \Theta\}$, $dF_\theta = f_\theta\, d\nu$, where

(25)  $f_\theta(x) = \exp\{(\theta,\varphi(x)) - L(\theta)\}$,

$\nu$ is a sigma-finite measure on the $m$-dimensional Borel sets, $\varphi$ a function from $R^m$ to $R^k$, and $\Theta$ is the set of points $\theta \in R^k$ for which $\exp L(\theta) = \int\exp(\theta,\varphi)\, d\nu$ is finite. (If the null vector 0 is in $\Theta$, which could be assumed with no loss of generality, then $d\nu = dF_0$.) Let $f_{\theta,n} = f_{\theta,n}(x^{(n)})$ be the density of $X^{(n)}$, so that

(26)  $f_{\theta,n} = \exp n\{(\theta, \int\varphi\, dF_n) - L(\theta)\}$.

Here $\int\varphi\, dF_n$ is a sufficient statistic and it is natural to restrict attention to sets $A$ of the form (18).

We have $\int\varphi\, dF_\theta = L'(\theta)$ for $\theta \in \Theta^0$, and

(27)  $I(F_{\theta'},F_\theta) = (\theta'-\theta, L'(\theta')) - L(\theta') + L(\theta) = I^*(\theta',\theta)$, say,

for $\theta' \in \Theta^0$, $\theta \in \Theta$. From (20) with $F = F_\theta$ we have $I(A(s),F_\theta) = I^*(\theta',\theta)$, where $s = L'(\theta')$.

A maximum likelihood estimator of $\theta$ is a function $\hat\theta_n = \hat\theta_n(x^{(n)})$ from $R^{mn}$ into $\Theta$ such that $\hat\theta_n(x^{(n)})$ maximizes $f_{\theta,n}(x^{(n)})$. If $\int\varphi\, dF_n \in \Omega^0$ then $\hat\theta_n$ is a root of $L'(\theta) = \int\varphi\, dF_n$, and we have $\max_\theta f_{\theta,n} = f_{\hat\theta_n,n} = \exp n\{(\hat\theta_n, L'(\hat\theta_n)) - L(\hat\theta_n)\}$. Hence

(28)  $f_{\theta,n} = f_{\hat\theta_n,n}\exp\{-n I^*(\hat\theta_n,\theta)\}$.

Equation (28), which is related to (23), shows that $f_{\theta,n}$ depends on $\theta$ only through $I^*(\hat\theta_n,\theta)$. Note that the likelihood ratio test for testing the simple hypothesis $\theta = \theta'$ against the alternatives $\theta \ne \theta'$ rejects the hypothesis if $I^*(\hat\theta_n,\theta')$ exceeds a constant. For the special case where the distribution of $\int\varphi\, dF_n$ is multinomial the author has shown in [5] that the likelihood ratio test has certain asymptotically optimal properties.
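Relation (28) is the whole content of the likelihood ratio statistic for the family (25): the log-likelihood ratio of $\theta = \theta'$ against $\hat\theta_n$ is exactly $-n I^*(\hat\theta_n,\theta')$. A minimal numerical sketch (arbitrary one-parameter example, $k = 1$, not from the paper) verifying this identity:

```python
import numpy as np

rng = np.random.default_rng(1)

# One-parameter exponential family on {0,1,2,3} with phi(x) = x; the base measure nu is the
# counting measure, normalized through L(theta). This is a hypothetical example chosen only
# to check identity (28).
support = np.array([0.0, 1.0, 2.0, 3.0])

def L(theta):
    return np.log(np.sum(np.exp(theta * support)))      # exp L(theta) = sum_x exp(theta*phi(x))

def sample(theta, n):
    p = np.exp(theta * support - L(theta))
    return rng.choice(support, size=n, p=p)

theta_true, theta0, n = 0.4, 0.0, 500
x = sample(theta_true, n)
T = x.mean()                                             # sufficient statistic: integral of phi dF_n

# MLE: root of L'(theta) = T, found by bisection (L' is increasing).
lo, hi = -20.0, 20.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    Lp = np.sum(support * np.exp(mid * support - L(mid)))
    lo, hi = (mid, hi) if Lp < T else (lo, mid)
theta_hat = 0.5 * (lo + hi)

def loglik(theta):
    return n * (theta * T - L(theta))                    # log of (26)

I_star = (theta_hat - theta0) * T - L(theta_hat) + L(theta0)   # (27), using L'(theta_hat) = T
print("log f_{theta0,n} - log f_{thetahat,n} =", loglik(theta0) - loglik(theta_hat))
print("-n I*(thetahat, theta0)              =", -n * I_star)
```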
4. Multinomial probabilities. The case where $X_1$ takes only finitely many values can be reduced to the case where $X_1$ is a vector of $k$ components and takes the $k$ values $(1,0,\ldots,0), (0,1,0,\ldots,0), \ldots, (0,\ldots,0,1)$ with respective probabilities $p_1, p_2, \ldots, p_k$ whose sum is 1. The sum $n Z(n) = X_1 + \cdots + X_n$ takes the values $n z^{(n)} = (n_1,\ldots,n_k)$, $n_i \ge 0$, $\sum_{i=1}^k n_i = n$, and we have

(29)  $P\{n Z(n) = (n_1,\ldots,n_k)\} = \dfrac{n!}{n_1!\cdots n_k!}\, p_1^{n_1}\cdots p_k^{n_k}$.

The d.f. $F$ of $X_1$ and the empirical d.f. $F_n$ are respectively determined by the vectors $p = (p_1,\ldots,p_k)$ and $Z(n)$, whose values lie in the simplex $\Omega = \{(x_1,\ldots,x_k) \mid x_1 \ge 0, \ldots, x_k \ge 0,\ \sum_{i=1}^k x_i = 1\}$. It will be convenient to write $I(Z(n),p)$ for $I(F_n,F)$, where

(30)  $I(x,p) = \sum_{i=1}^k x_i\log(x_i/p_i)$

for $x$ and $p$ in $\Omega$. We have

(31)  $P_p\{Z(n) = z^{(n)}\} = \exp\{-n I(z^{(n)},p)\}\, P_{z^{(n)}}\{Z(n) = z^{(n)}\}$,

where $P_q$ denotes the probability computed with the cell probabilities $q$; this corresponds to equations (23) and (28).

For any set $A \subset \Omega$ let $I(A,p) = \inf\{I(x,p) \mid x \in A\}$ and let $A^{(n)}$ denote the set of points $z^{(n)} \in A$. In [5] it is shown that

(32)  $P\{Z(n) \in A\} = \exp\{-n I(A^{(n)},p) + O(\log n)\}$

uniformly for $A \subset \Omega$ and $p \in \Omega$. Clearly $I(A^{(n)},p) \ge I(A,p)$. Hence if $A_n$ is a sequence of sets such that

(33)  $I(A_n^{(n)},p) < I(A_n,p) + O(n^{-1}\log n)$,

then

(34)  $P\{Z(n) \in A_n\} \asymp n^{r_n}\exp\{-n I(A_n,p)\}$,

where $r_n$ is bounded. Sufficient conditions for (33) to hold are given in the appendix of [5].
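In the multinomial case everything in (29)-(32) is finite and can be computed exactly for small $n$ and $k$. The sketch below (illustrative parameters only) enumerates the lattice points $z^{(n)}$, evaluates $P\{Z(n) \in A\}$ for a set of the form $A = \{x \mid x_1 \ge c\}$ exactly, and compares it with the crude exponent $\exp\{-n I(A^{(n)},p)\}$.

```python
from itertools import product
from math import lgamma, log, exp

def log_multinomial_pmf(counts, p):
    """log of (29): log[ n!/(n_1!...n_k!) * prod p_i^{n_i} ]."""
    n = sum(counts)
    out = lgamma(n + 1) - sum(lgamma(c + 1) for c in counts)
    return out + sum(c * log(pi) for c, pi in zip(counts, p) if c > 0)

def I(x, p):
    """(30): I(x,p) = sum x_i log(x_i/p_i), with 0 log 0 = 0."""
    return sum(xi * log(xi / pi) for xi, pi in zip(x, p) if xi > 0)

n, p, c = 30, (0.2, 0.3, 0.5), 0.4          # hypothetical parameters; A = {x : x_1 >= c}
prob_A, I_An = 0.0, float("inf")
for n1, n2 in product(range(n + 1), repeat=2):   # enumerate lattice points z^(n) = (n1,n2,n3)/n
    n3 = n - n1 - n2
    if n3 < 0:
        continue
    z = (n1 / n, n2 / n, n3 / n)
    if z[0] >= c:                                # z^(n) in A
        prob_A += exp(log_multinomial_pmf((n1, n2, n3), p))
        I_An = min(I_An, I(z, p))                # I(A^(n), p) = min over lattice points in A
print(f"exact P(Z(n) in A)   = {prob_A:.4e}")
print(f"exp(-n I(A^(n), p))  = {exp(-n * I_An):.4e}   (crude exponent, cf. (32))")
```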
Here we shall consider the determination of the order of magnitude of $P\{Z(n) \in A_n\}$, which amounts to the determination of $r_n$ in (34). The point $p$ will be held fixed. (The results to be derived hold uniformly for $p_i \ge \epsilon$, $i = 1,\ldots,k$, where $\epsilon$ is any fixed positive number.)

Lemma 3. For every real $m$ there is a number $d$ (which depends only on $m$ and $k$) such that uniformly for $A \subset \Omega$

(35)  $P\{Z(n) \in A\} = P\{Z(n) \in A,\ I(Z(n),p) < I(A,p) + d n^{-1}\log n\} + O(n^{-m}\exp\{-n I(A,p)\})$.

The lemma follows from (32) with $A$ replaced by $\{x \mid I(x,p) \ge I(A,p) + d n^{-1}\log n\}$ and $d$ suitably chosen. It should be noted that $Z(n) \in A$ implies $I(Z(n),p) \ge I(A,p)$. Thus if the remainder term in (35) is negligible, the main contribution to $P\{Z(n) \in A\}$ is from the intersection of $A$ with a narrow strip surrounding the (convex) set $\{x \mid I(x,p) < I(A,p)\}$.

Let $\Omega_\epsilon$ denote the set of all $x \in \Omega$ such that $x_i > \epsilon$, $i = 1,\ldots,k$.

Lemma 4. For $\epsilon > 0$ fixed we have uniformly for $A \subset \Omega_\epsilon$

(36)  $P\{Z(n) \in A\} \asymp n^{-(k-1)/2}\sum_{z^{(n)} \in A}\bigl(\prod_{i=1}^k z_i^{(n)}\bigr)^{-1/2}\exp\{-n I(z^{(n)},p)\}$.

We now approximate the sum in (36) by an integral. To determine the order of magnitude only, a crude approximation will suffice.
Let $R(z^{(n)})$ denote the set of points $(x_1,\ldots,x_{k-1}) \in R^{k-1}$ with $|x_i - z_i^{(n)}| < k^{-1}n^{-1}$, $i = 1,\ldots,k-1$, and write $x_k = 1 - x_1 - \cdots - x_{k-1}$.

Lemma 5. For $\epsilon > 0$ fixed we have uniformly for $A \subset \Omega_\epsilon$

$\sum_{z^{(n)} \in A}\bigl(\prod_{i=1}^k z_i^{(n)}\bigr)^{-1/2}\exp\{-n I(z^{(n)},p)\} \asymp n^{k-1}\int\cdots\int_{A^*}\bigl(\prod_{i=1}^k x_i\bigr)^{-1/2}\exp\{-n I(x,p)\}\, dx_1\cdots dx_{k-1}$,

where $A^* = \bigcup_{z^{(n)} \in A} R(z^{(n)})$.

Proof. The sets $R(z^{(n)})$ are disjoint and each has volume $(2k^{-1}n^{-1})^{k-1}$. Also, $I(x,p) = I(z,p) + O(\max_i |x_i - z_i|)$ uniformly for $x$, $z \in \Omega_\epsilon$, so that $n I(x,p) = n I(z^{(n)},p) + O(1)$ if $x \in R(z^{(n)})$, and likewise $\prod x_i \asymp \prod z_i^{(n)}$. These facts imply the lemma.

Now let $f(x)$ be a function defined on $\Omega$, let

(37)  $A(c) = \{x \mid f(x) \ge c\}$,

and suppose that for every $\epsilon' > 0$ there is a number $a_1(\epsilon')$ such that

(38)  $|f(z) - f(x)| \le a_1(\epsilon')\,|z - x|$ if $z, x \in \Omega_{\epsilon'}$,

where $|z - x| = \max_i |z_i - x_i|$. This condition is satisfied if the first partial derivatives of $f$ exist and are continuous in $\Omega_0$ (the set where $x_i > 0$ for all $i$). Let

(39)  $D(c,\delta) = \{x \mid f(x) \ge c,\ I(x,p) \le I(A(c),p) + \delta\}$,

(40)  $V_c(u) = \int\cdots\int_{D(c,u)} dx_1\cdots dx_{k-1}$,

and, if the derivative

(41)  $V_c'(u) = dV_c(u)/du$

exists for $0 < u < \delta$,

(42)  $K_n(c,\delta) = \int_0^\delta e^{-nu}\, V_c'(u)\, du$.
Theorem 1. Let $A(c)$ be defined by (37), where $f$ satisfies (38) for every $\epsilon' > 0$. Let $(c_n)$ be a real number sequence and suppose that for every $a_1 > 0$ there are positive numbers $\delta$ and $n_0$ such that $D(c_n - a_1 n^{-1}, \delta) \subset \Omega_\epsilon$ for $n > n_0$. Then for every real number $m$ there are positive numbers $d$ and $a$ such that

(43)  $P\{Z(n) \in A(c_n)\} \asymp n^{(k-1)/2}\exp\{-n I(A(c),p)\}\, K_n(c,\delta_n) + O(n^{-m}\exp\{-n I(A(c_n),p)\})$,

where $c = c_n + \theta a n^{-1}$ with $|\theta| \le 1$, $\delta_n = d\, n^{-1}\log n$, and it is assumed that for each such $c$ the derivative $V_c'(u)$ exists for $0 < u < \delta_n$.

Proof. From lemmas 3, 4 and 5 we obtain

$P\{Z(n) \in A(c_n)\} \asymp n^{(k-1)/2} J_{1,n} + O(n^{-m}\exp\{-n I(A(c_n),p)\})$,

where

$J_{1,n} = \int\cdots\int_{D_n^*}\bigl(\prod_{i=1}^k x_i\bigr)^{-1/2}\exp\{-n I(x,p)\}\, dx_1\cdots dx_{k-1}$,

$D_n^* = \bigcup_{z^{(n)} \in D(c_n,\delta_n)} R(z^{(n)})$, and $\delta_n = d\, n^{-1}\log n$. It follows from condition (38), which is also satisfied by $I(\cdot,p)$, that there is a number $a > 0$ such that

$D(c_n + a n^{-1},\ \delta_n - a n^{-1}) \subset D_n^* \subset D(c_n - a n^{-1},\ \delta_n + a n^{-1})$.

Since $(\prod x_i)^{-1/2}$ is bounded in $\Omega_\epsilon$, we obtain $J_{1,n} \asymp J_{2,n}(c_n - \theta a n^{-1},\ \delta_n + \theta a n^{-1})$, where

$J_{2,n}(c,\delta) = \int\cdots\int_{D(c,\delta)}\exp\{-n I(x,p)\}\, dx_1\cdots dx_{k-1}$

and $|\theta| \le 1$. If the derivative $V_c'(u)$ exists for $0 < u < \delta$, we can write

$J_{2,n}(c,\delta) = \exp\{-n I(A(c),p)\}\int_0^\delta e^{-nu}\, V_c'(u)\, du$.

The theorem follows.

If we had not suppressed the factor $(\prod x_i)^{-1/2}$, we would have obtained (43) with $V_c(u)$ replaced by

$V_{1,c}(u) = \int\cdots\int_{D(c,u)}\bigl(\prod_{i=1}^k x_i\bigr)^{-1/2}\, dx_1\cdots dx_{k-1}$.

In this form the first term on the right of (43) is analogous to the right side of (24). The integer $k$ in (24) is here replaced by $k-1$ since the distribution is $(k-1)$-dimensional.

To apply theorem 1 we need to determine the order of magnitude of $K_n(c_n,\delta_n)$, where $\delta_n \asymp n^{-1}\log n$, so that $\delta_n \to 0$ and $n\delta_n \to \infty$. If, for instance,

$V_{c_n}'(u) \asymp b(c_n)\, u^r$ uniformly with respect to $n$ as $u \to 0+$,

then

$K_n(c_n,\delta_n) \asymp b(c_n)\int_0^{\delta_n} e^{-nu} u^r\, du \asymp b(c_n)\, n^{-r-1}$.
Concerning the determination of the order of magnitude of $V_c'(u)$ we observe the following. Note that $I(A(c),p) = 0$ if $f(p) \ge c$. Assume $f(p) < c$. The continuity condition (38) implies that $I(A(c),p) > 0$. Let $Y$ denote the set of points $y \in A(c)$ such that $I(y,p) = I(A(c),p)$. The assumption $D(c,\delta) \subset \Omega_\epsilon$, condition (38) and the convexity of $I(\cdot,p)$ imply that $f(y) = c$ if $y \in Y$.

Suppose first that the set $A(c)$ is contained in a half-space $H$ such that $I(H,p) = I(A(c),p)$. (This is true if the function $-f(x)$ is convex, so that the set $A(c)$ is convex.) Then the set $Y$ consists of a single point $y$ and we have

(44)  $I(x,p) - I(A(c),p) = I(x,p) - I(y,p) = \sum(\log(y_i/p_i))(x_i - y_i) + I(x,y)$.

Hence $H = \{x \mid \sum(\log(y_i/p_i))(x_i - y_i) \ge 0\}$, and $x \in A(c)$ implies $I(x,p) - I(A(c),p) \ge I(x,y)$. Therefore if $x \in D(c,\delta_n)$ then $I(x,y) < \delta_n$. Now $I(x,y) = \tfrac12 Q_2(x,y) + O(|x-y|^3)$, where $Q_2(x,y) = \sum(x_i - y_i)^2/y_i$. Hence if $x \in D(c,\delta_n)$ then $|x-y| = O(\delta_n^{1/2})$, $I(x,y) - \tfrac12 Q_2(x,y) = O(\delta_n^{3/2}) = O(n^{-1})$, and the inequality

(45)  $I(x,p) - I(A(c),p) < u$

may be written

(46)  $\sum(\log(y_i/p_i))(x_i - y_i) + \tfrac12 Q_2(x,y) < u + O(n^{-1})$.

An inspection of the proof of theorem 1 shows that in the present case the theorem remains true if in the domain of integration $D(c,u)$ of the integral $V_c(u)$ the left-hand side of (45) is replaced by the left-hand side of (46).

Now suppose further that the partial derivatives $f_i(x) = \partial f(x)/\partial x_i$, $f_{ij}(x) = \partial^2 f(x)/\partial x_i\partial x_j$ and the third-order derivatives exist and are continuous in $\Omega_0$. Then

$f(x) - c = f(x) - f(y) = \sum f_i(y)(x_i - y_i) + \tfrac12 F(x-y) + O(|x-y|^3)$,

uniformly for $y \in \Omega_\epsilon$, where $F(x-y) = \sum\sum f_{ij}(y)(x_i - y_i)(x_j - y_j)$. This implies that if $x \in D(c,\delta_n)$, the inequality $f(x) \ge c$ may be written as

(47)  $\sum f_i(y)(x_i - y_i) + \tfrac12 F(x-y) \ge r$, with $r = O(|x-y|^3) = O(n^{-1})$.

Furthermore, the half-space $\{x \mid \sum f_i(y)(x_i - y_i) \ge 0\}$ is identical with $H$. Hence $y = y(c)$ satisfies the equations

$\log(y_i/p_i) = t(c)\, f_i(y) + s(c)$, $i = 1,\ldots,k$,

where $t(c) > 0$ and $s(c)$ are constants, as well as the equations $f(y) = c$ and $\sum y_i = 1$. It follows that under the present assumptions theorem 1 remains true with $K_n(c_n,\delta_n)$ replaced by $K_n(c_n,\delta_n,r_n)$, where

$K_n(c,\delta,r) = \int_0^\delta e^{-nu}\, V_{c,r}'(u)\, du$

and $V_{c,r}'(u)$ is the derivative with respect to $u$ of the volume $V_{c,r}(u)$ of $\Omega^*(c,r,u)$, the set of points $(x_1,\ldots,x_{k-1})$ which satisfy the inequalities

(48)  $\sum(\log(y_i/p_i))(x_i - y_i) + \tfrac12 Q_2(x,y) < u$,

(49)  $\sum f_i(y)(x_i - y_i) + \tfrac12 F(x-y) \ge r$.

If we make the substitution $z_i = y_i^{-1/2}(x_i - y_i)$, $i = 1,\ldots,k$, we obtain

$Q_2(x,y) = \sum_{i=1}^k z_i^2$, $\sum y_i^{1/2} z_i = 0$,

$\sum f_i(y)(x_i - y_i) = \sum(f_i(y) - a(c))\, y_i^{1/2} z_i = \beta(c)\sum b_i z_i$,

where $a(c) = \sum y_i f_i(y)$, $\beta^2(c) = \sum(f_i(y) - a(c))^2 y_i$, and $b_i = \beta^{-1}(c)(f_i(y) - a(c))\, y_i^{1/2}$; we have $\sum b_i^2 = 1$ and $\sum b_i y_i^{1/2} = 0$. Hence we can perform an orthogonal transformation $(z_1,\ldots,z_k) \to (v_1,\ldots,v_k)$, where $v_1 = \sum b_i z_i = \beta^{-1}(c)\sum f_i(y)(x_i - y_i)$ and $v_k = \sum y_i^{1/2} z_i = 0$. The inequalities (48), (49) are transformed into

(50)  $t(c)\beta(c)\, v_1 + \tfrac12\sum_{i=1}^{k-1} v_i^2 < u$,

(51)  $\beta(c)\, v_1 + \tfrac12 G(v_1,\ldots,v_{k-1}) \ge r$,

where $G(v_1,\ldots,v_{k-1})$ is a quadratic form in $v_1,\ldots,v_{k-1}$. Thus $V_{c,r}(u) = C(c)\, W_{c,r}(u)$, $V_{c,r}'(u) = C(c)\, W_{c,r}'(u)$, where $C(c)$ is the modulus of the determinant of the linear transformation $(x_1,\ldots,x_{k-1}) \to (v_1,\ldots,v_{k-1})$ and $W_{c,r}(u)$ is the volume of the set defined by (50) and (51). In the estimation of $W_{c,r}'(u)$ we may assume that $u = O(\delta_n)$ and $r = O(\delta_n^{3/2}) = O(n^{-1})$.

The replacement of $V_c(u)$ by $V_{c,r}(u)$ may be possible under conditions different from those assumed in the two preceding paragraphs. Suppose that $c > f(p)$ is fixed and that the set $Y$ consists of a finite number $s$ of points. Choose $\eta > 0$ so small that the $s$ sets $S_y = \{x \mid |x-y| < \eta\}$, $y \in Y$, are disjoint. Then for $\delta$ small enough $D(c,\delta)$ is contained in the union of the sets $S_y$. If, for each $y \in Y$, the surfaces $f(x) = c$ and $I(x,p) = I(A(c),p)$ are not too close in the neighborhood of $y$, then $x \in S_y \cap D(c,\delta)$ will imply $|x-y| = O(\delta^{1/2})$ and we arrive at analogous conclusions as in the preceding case.
If $f(x) = \sum_{i=1}^k a_i x_i$ is a linear function, $n f(Z(n))$ is the sum of $n$ independent random variables, each of which takes the values $a_1,\ldots,a_k$ with respective probabilities $p_1,\ldots,p_k$. The following theorem can be deduced from theorem 1 but is a special case of (10).

Theorem 2. Let $A(c) = \{x \mid \sum_{i=1}^k a_i x_i \ge c\}$, where $a_1,\ldots,a_k$ are fixed (not all equal) and $\sum_{i=1}^k a_i p_i = 0$. Then

(52)  $P\{Z(n) \in A(c)\} \asymp c^{-1} n^{-1/2}\exp\{-n I(A(c),p)\}$

uniformly for $\alpha n^{-1/2} < c < \max a_i - \beta$, where $\alpha$ and $\beta$ are arbitrary positive constants.

The next theorem gives an analogous uniform estimate for the distribution of $I(Z(n),p)$.

Theorem 3. Let $p_{\min} = \min_i p_i$ and let $\alpha$ and $\beta$ be arbitrary positive constants. Then

(53)  $P\{I(Z(n),p) \ge c\} \asymp (nc)^{(k-3)/2}\exp\{-nc\}$

uniformly for $\alpha n^{-1} < c < -\log(1-p_{\min}) - \beta$.

Proof. In this case $D(c,\delta) = \{x \mid c \le I(x,p) \le c + \delta\}$. It can be shown that $I(x,p) < -\log(1-p_{\min})$ implies $x_i > 0$ for all $i$. The assumption $c < -\log(1-p_{\min}) - \beta$, $\beta > 0$, implies $D(c,\delta) \subset \Omega_\epsilon$ for some $\epsilon > 0$ if $\delta$ is small enough. Let

$V(u) = \int\cdots\int_{I(x,p) < u} dx_1\cdots dx_{k-1}$.

We first prove the following lemma.

Lemma 6. The derivative $V'(u) = dV(u)/du$ exists, is continuous and positive for $0 < u < -\log(1-p_{\min})$, and

$V'(u) \asymp u^{(k-3)/2}$ as $u \to 0$.

(Heuristically, as $u \to 0$, $V(u)$ is approximated by the volume of the ellipsoid $Q_2(x,p) < 2u$, which is proportional to $u^{(k-1)/2}$.)

We shall write $I_k(x,p)$ for $I(x,p)$ to indicate the number of components of the arguments $x$ and $p$, and $V_{k,p}(u)$ for $V(u)$. We may and shall assume that $p_k = p_{\min}$. For $k \ge 2$ let $y = (y_1,\ldots,y_{k-1})$, $y_i = x_i/(1-x_k)$; $z = (z_1,z_2) = (1-x_k, x_k)$; $q = (q_1,\ldots,q_{k-1})$, $q_i = p_i/(1-p_k)$; $r = (r_1,r_2) = (1-p_k, p_k)$. Then we have the identity

(54)  $I_k(x,p) = z_1 I_{k-1}(y,q) + I_2(z,r)$.

Hence we obtain the recurrence relation

(55)  $V_{k,p}(u) = \int_{a(u)}^{b(u)} z_1^{k-2}\, V_{k-1,q}\bigl(z_1^{-1}(u - I_2(z,r))\bigr)\, dz_1$.

Since $p_k = p_{\min} \le 1-p_k$, we have $r_1 \ge r_2$. We have $V_{2,r}(u) = b(u) - a(u)$, where, for $0 < u < -\log(1-p_{\min}) = -\log r_1$, $a(u)$ and $b(u)$ are the two roots of the equation $I_2((z_1,1-z_1),r) = u$, $0 < a(u) < r_1 < b(u) < 1$. Hence it is easy to show that the lemma is true for $k = 2$. From (55) we obtain for $k \ge 3$

(56)  $V_{k,p}'(u) = \int_{a(u)}^{b(u)} z_1^{k-3}\, V_{k-1,q}'\bigl(z_1^{-1}(u - I_2(z,r))\bigr)\, dz_1$.

It now can be shown that the lemma holds for $k = 3$ and, by induction, that equation (56) and the lemma are true for any $k$.

Under the conditions of theorem 1 we have $V_c(u) = V(c+u) - V(c)$. The lemma implies that $V_c'(u) = V'(c+u) \asymp (c+u)^{(k-3)/2}$ uniformly for $0 < c+u < -\log(1-p_{\min}) - \beta$. It follows that uniformly for $\alpha n^{-1} < c < -\log(1-p_{\min}) - \beta$

$K_n(c,\delta_n) \asymp \int_0^{\delta_n} e^{-nu}(c+u)^{(k-3)/2}\, du \asymp c^{(k-3)/2}\, n^{-1}$.

This establishes the theorem under the restriction $nc > a + \alpha$, where $a$ is the number which appears in (43). That the result holds for $nc > \alpha$ with any $\alpha > 0$ follows from the well-known fact that $2n I(Z(n),p)$ has a chi-square limit distribution.
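The chi-square limit fact used at the end of the proof is easy to see in simulation: $2nI(Z(n),p)$ is approximately chi-square with $k-1$ degrees of freedom for moderate $n$. A small Monte Carlo sketch (illustrative parameters only):

```python
import numpy as np

rng = np.random.default_rng(2)

def info(x, p):
    """(30): I(x,p) = sum_i x_i log(x_i / p_i), with 0 log 0 = 0."""
    mask = x > 0
    return np.sum(x[mask] * np.log(x[mask] / p[mask]))

p = np.array([0.2, 0.3, 0.5])                        # hypothetical cell probabilities, k = 3
n, reps = 200, 20000
counts = rng.multinomial(n, p, size=reps)            # n Z(n) for each replication
stats = np.array([2 * n * info(z / n, p) for z in counts])

# Compare a few upper-tail frequencies with the chi-square(2) tail exp(-c/2).
for c in (2.0, 4.0, 6.0):
    print(f"P(2nI >= {c}): simulated {np.mean(stats >= c):.4f}   chi2_2 tail {np.exp(-c/2):.4f}")
```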
Since $A \subset \{x \mid I(x,p) \ge I(A,p)\}$ for any subset $A$ of $\Omega$, theorem 3 immediately implies

Theorem 4. If $\alpha$ and $\beta$ are any positive numbers, there is a constant $C = C(\alpha,\beta,p)$ such that for any set $A$ which satisfies $\alpha n^{-1} \le I(A,p) \le -\log(1-p_{\min}) - \beta$ we have

$P\{Z(n) \in A\} \le C\,(n I(A,p))^{(k-3)/2}\exp\{-n I(A,p)\}$.

Remark. We have $\max_{x \in \Omega} I(x,p) = -\log p_{\min}$. It seems plausible that the estimate (53) of theorem 3 holds uniformly for $\alpha n^{-1} < c < -\log p_{\min} - \beta$. If so, theorem 4 holds with an analogous modification.

For the functions $f(x) = \sum a_i x_i$ and $f(x) = I(x,p)$ of theorems 2 and 3 the order of magnitude of $P\{f(Z(n)) \ge c\}$ is expressed in the form $c^r n^s\exp\{-n I(A(c),p)\}$ in a wide range of $c$. That this is not true in general is shown by the following example. Let

$f(x) = Q_2(x,p)$.

We have $I(A(c),p) = \tfrac12 c + O(c^{3/2})$ as $c \to 0$. Since $I(x,p) = \tfrac12 Q_2(x,p) + O(|x-p|^3)$, $Q_2(x,p) < c$ implies $I(x,p) = \tfrac12 Q_2(x,p) + O(c^{3/2})$. Hence it follows from theorem 3 that if $c = O(n^{-2/3})$ and $nc > \alpha > 0$ then $P\{Q_2(Z(n),p) \ge c\}$ is of the same order of magnitude as $P\{I(Z(n),p) \ge c/2\}$, uniformly for $\alpha n^{-1} < c \le n^{-2/3}$. By theorem 3 this implies

$P\{Q_2(Z(n),p) \ge c\} \asymp (nc)^{(k-3)/2}\exp\{-nc/2\}$

in the same range of $c$. On the other hand, if $c$ is bounded away from 0 and from $\max_x Q_2(x,p) = p_{\min}^{-1} - 1$, it can be deduced from theorem 1 (see the remarks after the proof of theorem 1 and section 8 of [5]) that

$P\{Q_2(Z(n),p) \ge c\} \asymp n^{-1/2}\exp\{-n I(A(c),p)\}$.

In this case the probability of the set $A(c)$ is of the same order of magnitude as the probability of any of the half-spaces contained in $B = \{x \mid I(x,p) \ge I(A(c),p)\}$ and bounded by the supporting hyperplanes of the convex set $B' = \{x \mid I(x,p) < I(A(c),p)\}$ at the common boundary points $y$ of the sets $A(c)$ and $B$. This result holds for a wide class of functions $f$ when $c$ is fixed. (However, theorem 4 of [12] is inaccurate in the stated generality, as is seen from theorem 3 above.)

An asymptotic expression for $P\{Q_2(Z(n),p) \ge c\}$ with $c = O(1)$ as $n \to \infty$ has been obtained by Richter [11].
FOOTNOTES

¹ This research was supported in part by the Mathematics Division of the Air Force Office of Scientific Research. Part of the work was done while the author was a visiting professor at the Research and Training School of the Indian Statistical Institute in Calcutta under the United Nations Technical Assistance Program.

² The notation $a_n \asymp b_n$ means that $a_n$ and $b_n$ are of the same order of magnitude, that is, $a_n/b_n$ is bounded away from zero and infinity from some $n$ on.
References

[1] Bahadur, R. R. and Ranga Rao, R. (1960). On deviations of the sample mean. Ann. Math. Statist. 31 1015-1027.

[2] Borovkov, A. A. and Rogozin, B. A. (1965). On the central limit theorem in the multidimensional case. (Russian) Teor. Verojatnost. i Primenen. 10 61-69.

[3] Cramér, H. (1938). Sur un nouveau théorème-limite de la théorie des probabilités. Actualités Sci. Indust., No. 736.

[4] Feller, W. (1943). Generalization of a probability limit theorem of Cramér. Trans. Amer. Math. Soc. 54 361-372.

[5] Hoeffding, Wassily (1965). Asymptotically optimal tests for multinomial distributions. Ann. Math. Statist. 36 369-401.

[6] Kiefer, J. and Wolfowitz, J. (1958). On the deviations of the empiric distribution function of vector chance variables. Trans. Amer. Math. Soc. 87 173-186.

[7] Linnik, Ju. V. (1961/62). Limit theorems for sums of independent variables taking into account large deviations, I, II, III. (Russian) Teor. Verojatnost. i Primenen. 6 145-163, 377-391; 7 122-134. (English translation in Theor. Probab. Appl. 6 131-148, 345-360; 7 115-129.)

[8] Petrov, V. V. (1954). Generalization of Cramér's limit theorem. (Russian) Uspehi Matem. Nauk (N.S.) 9, No. 4(62), 195-202.

[9] Petrov, V. V. (1963/64). Limit theorems for large deviations when Cramér's condition is violated, I, II. (Russian) Vestnik Leningrad Univ. Ser. Mat. Meh. Astronom. 18, No. 4, 49-68; 19, No. 1, 58-75.

[10] Ranga Rao, R. (1961). On the central limit theorem in R_k. Bull. Amer. Math. Soc. 67 359-361.

[11] Richter, W. (1964). Mehrdimensionale Grenzwertsätze für grosse Abweichungen und ihre Anwendung auf die Verteilung von χ². Teor. Verojatnost. i Primenen. 9 31-42.

[12] Sanov, I. N. (1957). On the probability of large deviations of random variables. (Russian) Mat. Sbornik N.S. 42 (84), 11-44. English translation: Select. Transl. Math. Statist. and Probability 1 (1961) 213-244.

[13] Sethuraman, J. (1964). On the probability of large deviations of families of sample means. Ann. Math. Statist. 35 1304-1316.