
Convergence of Probability Measures
WILEY SERIES IN PROBABILITY AND STATISTICS
PROBABILITY AND STATISTICS SECTION
Established by WALTER A. SHEWHART and SAMUEL S. WILKS
Editors: Vic Barnett, Noel A. C. Cressie, Nicholas I. Fisher,
Iain M. Johnstone, J. B. Kadane, David G. Kendall, David W. Scott,
Bernard W. Silverman, Adrian F. M. Smith, Jozef L. Teugels;
Ralph A. Bradley, Emeritus, J. Stuart Hunter, Emeritus
A complete list of the titles in this series appears at the end of this volume.
Convergence of
Probability Measures
Second Edition
PATRICK BILLINGSLEY
The University of Chicago
Chicago, Illinois
A Wiley-Interscience Publication
JOHN WILEY & SONS, INC.
New York
Chichester
Weinheim
Brisbane
Singapore
Toronto
This text is printed on acid-free paper.
Copyright © 1999 by John Wiley & Sons, Inc.
All rights reserved. Published simultaneously in Canada.
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any
form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise,
except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without
either the prior written permission of the Publisher, or authorization through payment of the
appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA
01923, (978) 750-8400, fax (978) 750-4744. Requests to the Publisher for permission should be
addressed to the Permissions Department, John Wiley & Sons, Inc., 605 Third Avenue, New York,
NY 10158-0012, (212) 850-6011, fax (212) 850-6008, E-Mail: PERMREQ@WILEY.COM.
For ordering and customer service, call 1-800-CALL-WILEY.
Library of Congress Cataloging in Publication Data:
Billingsley, Patrick.
Convergence of probability measures / Patrick Billingsley. - 2nd
ed.
p. cm. -(Wiley series in probability and statistics.
Probability and statistics)
“Wiley-Interscience publication.”
Includes bibliographical references and indexes.
ISBN 0-471-19745-9 (alk. paper)
1. Probability measures. 2. Metric spaces. 3. Convergence.
I. Title. II. Series: Wiley series in probability and statistics.
Probability and statistics.
QA273.6.B55 1999
519.2'4--dc21
99-30372
CIP
Printed in the United States of America
10 9 8 7 6
PREFACE
From the preface to the first edition. Asymptotic distribution theorems in probability and statistics have from the beginning depended
on the classical theory of weak convergence of distribution functions
in Euclidean space: convergence, that is, at continuity points of the
limit function. The past several decades have seen the creation and
extensive application of a more inclusive theory of weak convergence
of probability measures on metric spaces. There are many asymptotic
results that can be formulated within the classical theory but require
for their proofs this more general theory, which thus does not merely
study itself. This book is about weak-convergence methods in metric
spaces, with applications sufficient to show their power and utility.
The second edition. A person who read the first edition of this
book when it appeared thirty years ago could move directly on to
the periodical literature and to research in the subject. Although the
book no longer takes the reader to the current boundary of what is
known in this area of probability theory, I think it is still useful as a
textbook one can study before tackling the industrial-strength treatises
now available. For the second edition I have reworked most of the
sections, clarifying some and shortening others (most notably the ones
on dependent random variables) by using discoveries of the last thirty
years, and I have added some new topics. I have written with students
in mind; for example, instead of going directly to the space D[0, ∞),
I have moved, in what I hope are easy stages, from C[0, 1] to D[0, 1]
to D[0, ∞). In an earlier book of mine, I said that I had tried to
follow the excellent example of Hardy and Wright, who wrote their
Introduction to the Theory of Numbers with the avowed aim, as they
say in the preface, of producing an interesting book, and I have again
taken them as my model.
Chicago,
January 1999
Patrick Billingsley
For mathematical information and advice, I thank Richard Arratia,
Peter Donnelly, Walter Philipp, Simon Tavaré, and Michael Wichura.
For essential help on TeX and the figures, I thank Marty Billingsley
and Mitzi Nakatsuka.
PB
CONTENTS
Introduction 1

Chapter 1. Weak Convergence in Metric Spaces 7
Section 1. Measures on Metric Spaces, 7
Measures and Integrals. Tightness. Some Examples. Problems.
Section 2. Properties of Weak Convergence, 14
The Portmanteau Theorem. Other Criteria. The Mapping Theorem. Product Spaces. Problems.
Section 3. Convergence in Distribution, 24
Random Elements. Convergence in Distribution. Convergence in Probability. Local vs. Integral Laws. Integration to the Limit. Relative Measure.* Three Lemmas.* Problems.
Section 4. Long Cycles and Large Divisors,* 38
Long Cycles. The Space Δ. The Poisson-Dirichlet Distribution. Size-Biased Sampling. Large Prime Divisors. Technical Arguments. Problems.
Section 5. Prohorov's Theorem, 57
Relative Compactness. Tightness. The Proof. Problems.
Section 6. A Miscellany,* 65
The Ball σ-Field. Skorohod's Representation Theorem. The Prohorov Metric. A Coupling Theorem. Problems.

Chapter 2. The Space C 80
Section 7. Weak Convergence and Tightness in C, 80
Tightness and Compactness in C. Random Functions. Coordinate Variables. Problems.
Section 8. Wiener Measure and Donsker's Theorem, 86
Wiener Measure. Construction of Wiener Measure. Donsker's Theorem. An Application. The Brownian Bridge. Problems.
Section 9. Functions of Brownian Motion Paths, 94
Maximum and Minimum. The Arc Sine Law. The Brownian Bridge. Problems.
Section 10. Maximal Inequalities, 105
Maxima of Partial Sums. A More General Inequality. A Further Inequality. Problems.
Section 11. Trigonometric Series,* 113
Lacunary Series. Incommensurable Arguments. Problem.

Chapter 3. The Space D 121
Section 12. The Geometry of D, 121
The Definition. The Skorohod Topology. Separability and Completeness of D. Compactness in D. A Second Characterization of Compactness. Finite-Dimensional Sets. Random Functions in D. The Poisson Limit.* Problems.
Section 13. Weak Convergence and Tightness in D, 138
Finite-Dimensional Distributions. Tightness. A Criterion for Convergence. A Criterion for Existence.* Problem.
Section 14. Applications, 146
Donsker's Theorem Again. An Extension. Dominated Measures. Empirical Distribution Functions. Random Change of Time. Renewal Theory. Problems.
Section 15. Uniform Topologies,* 156
The Uniform Metric on D[0,1]. A Theorem of Dudley's. Empirical Processes Indexed by Convex Sets.
Section 16. The Space D[0, ∞), 166
Definitions. Properties of the Metric. Separability and Completeness. Compactness. Finite-Dimensional Sets. Weak Convergence. Tightness. Aldous's Tightness Criterion.

Chapter 4. Dependent Variables 180
Section 17. More on Prime Divisors,* 180
Introduction. A General Limit Theorem. The Brownian Motion Limit. The Poisson-Process Limit.
Section 18. Martingales, 193
Triangular Arrays. Ergodic Martingale Differences.
Section 19. Ergodic Processes, 196
The Basic Theorem. Uniform Mixing. Functions of Mixing Processes. Diophantine Approximation.

Chapter 5. Other Modes of Convergence 207
Section 20. Convergence in Probability, 207
A Convergence-in-Probability Version of Donsker's Theorem.
Section 21. Approximation by Independent Normal Sequences, 211
Section 22. Strassen's Theorem, 220
The Theorem. Preliminaries on Brownian Motion. Proof of Strassen's Theorem. Applications.

Appendix M 236
Metric Spaces. Analysis. Convexity. Probability.

Some Notes on the Problems 264
Bibliographical Notes 267
Bibliography 270
Index 275

* Starred topics can be omitted on a first reading.
INTRODUCTION
The De Moivre-Laplace limit theorem says that, if

(1)    F_n(x) = 𝖯[ (S_n − np)/√(np(1−p)) ≤ x ]

is the distribution function of the normalized number S_n of successes in n Bernoulli trials, and if

(2)    F(x) = (1/√(2π)) ∫_{−∞}^{x} e^{−t²/2} dt

is the standard normal distribution function, then

(3)    F_n(x) → F(x)

for all x (n → ∞, the probability p of success fixed).
We say of arbitrary distribution functions F_n and F on the line that F_n converges weakly to F, which we indicate by writing F_n ⇒ F, if (3) holds at every continuity point x of F. Thus the De Moivre-Laplace theorem says that (1) converges weakly to (2); since (2) is everywhere continuous, the proviso about continuity points is vacuous in this case.
If F_n and F are defined by

(4)    F_n(x) = 1_{[n^{-1}, ∞)}(x)

(1 denoting the indicator function) and

(5)    F(x) = 1_{[0, ∞)}(x),

then again F_n ⇒ F, and this time the proviso does come into play: (3) fails at x = 0.
For a better understanding of this notion of weak convergence, which underlies a large class of limit theorems in probability, consider the probability measures P_n and P generated by arbitrary distribution functions F_n and F. These probability measures, defined on the class of Borel subsets of the line, are uniquely determined by the requirements

P_n(−∞, x] = F_n(x),    P(−∞, x] = F(x).

Since F is continuous at x if and only if the set {x} consisting of x alone has P-measure 0, F_n ⇒ F means that the implication

(6)    P{x} = 0    implies    P_n(−∞, x] → P(−∞, x]

holds for each x.

Let ∂A denote the boundary of a subset A of the line; ∂A consists of those points that are limits of sequences of points in A and are also limits of sequences of points outside A. Since the boundary of (−∞, x] consists of the single point x, (6) is the same thing as

(7)    P_n(A) → P(A)    if P(∂A) = 0,

where we have written A for (−∞, x]. The fact of the matter is that F_n ⇒ F holds if and only if the implication (7) is true for every Borel set A, a result proved in Chapter 1.
Let us distinguish by the term P-continuity set those Borel sets A for which P(∂A) = 0, and let us say that P_n converges weakly to P, and write P_n ⇒ P, if P_n(A) → P(A) for each P-continuity set A, that is, if (7) holds. As just asserted, P_n ⇒ P if and only if the corresponding distribution functions satisfy F_n ⇒ F.
This reformulation clarifies the reason why we allow (3) to fail if F has a jump at x. Without this exemption, (4) would not converge weakly to (5), but this example may appear artificial. If we turn our attention to probability measures P_n and P, however, we see that P_n(A) → P(A) may fail if P(∂A) > 0 even in the De Moivre-Laplace theorem. The measures P_n and P generated by (1) and (2) satisfy

(8)    P_n(A) = 𝖯[ (S_n − np)/√(np(1−p)) ∈ A ]

and

(9)    P(A) = (1/√(2π)) ∫_A e^{−t²/2} dt
for Borel sets A. Now if A consists of the countably many points

(k − np)/√(np(1−p)),    n = 1, 2, ...,   k = 0, 1, ..., n,

then P_n(A) = 1 for all n and P(A) = 0, so that P_n(A) → P(A) is impossible. Since ∂A is the entire real line, this does not violate (7).
Although the concept of weak convergence of distribution functions is tied to the real line (or to Euclidean space, at any rate), the concept of weak convergence of probability measures can be formulated for the general metric space, which is the real reason for preferring the latter concept. Let S be an arbitrary metric space, let S be the class of Borel sets (S is the σ-field generated by the open sets), and consider probability measures P_n and P defined on S. Exactly as before, we define weak convergence P_n ⇒ P by requiring the implication (7) to hold for all Borel sets A. In Chapter 1 we investigate the general theory of this concept and see what it reduces to in various special cases. We prove there, for example, that P_n converges weakly to P if and only if

(10)    ∫_S f dP_n → ∫_S f dP

holds for all bounded, continuous real-valued functions f on S. (In order to conform with general mathematical usage, we take (10) as the definition of weak convergence, so that (7) becomes a necessary and sufficient condition instead of a definition.)
Chapter 2 concerns weak convergence on the space C = C[0, 1] with the uniform topology; C is the space of all continuous real functions on the closed unit interval [0, 1], metrized by taking the distance between two functions x = x(t) and y = y(t) to be

(11)    ρ(x, y) = sup_{0≤t≤1} |x(t) − y(t)|.

An example of the sort of application made in Chapter 2 will show why it is both interesting and useful to develop a general theory of weak convergence, one that goes beyond the Euclidean case. Let ξ_1, ξ_2, ... be a sequence of independent, identically distributed random variables defined on some probability space (Ω, F, 𝖯). If the ξ_n have mean 0 and variance σ², then, by the Lindeberg-Lévy central limit theorem, the distribution of the normalized sum

(12)    S_n/(σ√n),    where S_n = ξ_1 + ⋯ + ξ_n,
converges weakly, as n tends to infinity, to the normal distribution
defined by (9).
We can formulate a refinement of the central limit theorem by proving weak convergence of the distributions of certain random functions constructed from the partial sums S_n. For each integer n and each sample point ω, construct on the unit interval the polygonal function that is linear on each of the subintervals [(i − 1)/n, i/n], i = 1, 2, ..., n, and has the value S_i(ω)/(σ√n) at the point i/n (S_0(ω) = 0). In other words, construct the function X^n(ω) whose value at a point t of [0, 1] is

(13)    X_t^n(ω) = (1/(σ√n)) S_{i−1}(ω) + ((t − (i − 1)/n)/(1/n)) (1/(σ√n)) ξ_i(ω),    if t ∈ [(i − 1)/n, i/n].

For each ω, X^n(ω) is an element of the space C, a random function. Let P_n be the distribution of X^n(ω) on C, defined for Borel subsets A of C (Borel sets relative to the metric (11)) by

P_n(A) = 𝖯[ω: X^n(ω) ∈ A]
(the definition is possible because the mapping ω → X^n(ω) turns out to be measurable in the right way). In Chapter 2 we prove Donsker's theorem, which says that

(14)    P_n ⇒ W,

where W is Wiener measure. We also prove the existence in C of Wiener measure, which describes the probability distribution of the path traced out by a particle in Brownian motion.
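To make the construction (13) concrete, here is a minimal simulation sketch (not part of the book; the ±1 steps, the grid size, and the helper name polygonal_path are illustrative choices). It builds one sample path of X^n(ω) by linear interpolation of the rescaled partial sums:

```python
import numpy as np

def polygonal_path(xi, t, sigma=1.0):
    """Evaluate X_t^n as in (13): the polygonal function through the points
    (i/n, S_i/(sigma*sqrt(n))), i = 0, 1, ..., n, with S_0 = 0."""
    n = len(xi)
    s = np.concatenate(([0.0], np.cumsum(xi))) / (sigma * np.sqrt(n))
    grid = np.arange(n + 1) / n          # the points i/n
    return np.interp(t, grid, s)         # linear on each [(i-1)/n, i/n]

rng = np.random.default_rng(0)
n = 1000
xi = rng.choice([-1.0, 1.0], size=n)     # symmetric random walk steps, variance 1
t = np.linspace(0.0, 1.0, 201)
x = polygonal_path(xi, t)                # one element of C
print(x[-1])                             # its value at t = 1, i.e. S_n/(sigma*sqrt(n))
```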
If A = [x: x(1) ≤ α], then, since the value of the function X^n(ω) at t = 1 is X_1^n(ω) = S_n(ω)/(σ√n), we have P_n(A) = 𝖯[ω: S_n(ω)/(σ√n) ≤ α]. It turns out that W(∂A) = 0, so that (14) implies

𝖯[ω: S_n(ω)/(σ√n) ≤ α] → W[x: x(1) ≤ α].
It also turns out that

W[x: x(1) ≤ α] = (1/√(2π)) ∫_{−∞}^{α} e^{−t²/2} dt,

so that (14) does contain the Lindeberg-Lévy theorem.
If ξ_n takes the values +1 and −1 with probability 1/2 each, S_n can be interpreted as the position at time n in a symmetric random walk. The central limit theorem says that this position, normalized by √n (σ = 1), is, for n large, approximately distributed as the position at time t = 1 of a particle in Brownian motion. The relation (14) says that the entire path of the random walk during the first n steps is, for n large, distributed approximately as the path up to time t = 1 of a particle in Brownian motion.
To see in a concrete way that (14) contains information going beyond the central limit theorem, consider the set

A = [x: sup_{0≤t≤1} x(t) ≤ α].

Again it turns out that W(∂A) = 0, so that (14) implies

(15)    lim_{n→∞} 𝖯[ω: (1/(σ√n)) max_{k≤n} S_k(ω) ≤ α] = lim_{n→∞} P_n(A) = W[x: sup_{0≤t≤1} x(t) ≤ α].
We can evaluate the rightmost member of (15) by finding the limit on the left for the special case of symmetric random walk, which is easy to analyze. And then we have a limit theorem for the distribution of max_{k≤n} S_k under the hypothesis of the Lindeberg-Lévy theorem.
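For a numerical check of (15) (an illustration added here, not the book's argument): the left side can be estimated by simulating the symmetric random walk, and for α ≥ 0 the right side equals 2Φ(α) − 1 by the reflection principle for Brownian motion; the sample sizes below are arbitrary.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n, reps, alpha = 2000, 5000, 1.0

# Left side of (15): P[ max_{k<=n} S_k <= alpha*sqrt(n) ] for the symmetric walk (sigma = 1).
steps = rng.choice([-1.0, 1.0], size=(reps, n))
running_max = np.maximum(np.cumsum(steps, axis=1).max(axis=1), 0.0)
empirical = np.mean(running_max <= alpha * np.sqrt(n))

# Right side: W[x: sup_t x(t) <= alpha] = 2*Phi(alpha) - 1 by the reflection principle.
print(empirical, 2 * norm.cdf(alpha) - 1)
```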
For another example involving X^n(ω), take A to be the set of x in C for which the set [t: x(t) > 0] has Lebesgue measure at most α (where 0 ≤ α ≤ 1). As before, P_n(A) → W(A). Since the Lebesgue measure of [t: X_t^n(ω) > 0] is essentially the fraction of the partial sums S_1, S_2, ..., S_n that exceed 0, this argument leads to an arc sine law under the hypotheses of the Lindeberg-Lévy theorem. Chapter 2 contains the details of these derivations.
We can in this way use the theory of weak convergence in C to obtain a whole class of limit theorems for functions of the partial sums S_1, S_2, ..., S_n. The fact that Wiener measure W is the weak limit of
the distribution over C of the random function X^n(ω) can also be used to prove theorems about W, and W is interesting in its own right.
Chapter 3 specializes the theory of weak convergence to another space of functions on [0, 1], the space D[0, 1] of functions having only discontinuities of the first kind (jump discontinuities). This is the natural space in which to analyze the Poisson process and other processes with paths that are necessarily discontinuous. We also study spaces of discontinuous functions on [0, ∞) and on certain other sets that play the role of a generalized "time", for example, the set of convex subsets of the unit square. Chapter 4 concerns weak convergence of the distributions of random functions derived from various sequences of dependent random variables. Chapter 5 has to do with other asymptotic properties of random functions; there we prove Strassen's theorem, a far-reaching generalization of the law of the iterated logarithm.
Many of the conclusions in Chapters 2 through 5, although not requiring function-space concepts for their statement, could hardly have been derived without function-space methods. Standard measure-theoretic probability and metric-space topology are used from the beginning of the book. Although the point of view throughout is that of functional analysis (a function is a point in a space), nothing of functional analysis is assumed (beyond an initial willingness to view a function as a point in a space). All function-analytic results needed are proved in the text or else in Appendix M at the end of the book. This appendix also gathers together for easy reference some results in metric-space topology, analysis, and probability.
CHAPTER 1
WEAK CONVERGENCE IN
METRIC SPACES
SECTION 1. MEASURES ON METRIC SPACES
We begin by studying probability measures on the general metric space.
Denote the space by S, and let S be the Borel σ-field, the one generated by the open sets;† its elements are the Borel sets. A probability measure on S is a nonnegative, countably additive set function P satisfying PS = 1.
If probability measures P_n and P satisfy* P_n f → Pf for every bounded, continuous real function f on S, we say that P_n converges weakly to P and write P_n ⇒ P. In Chapter 1 we study the basic, general theory of weak convergence, together with the associated concept of convergence in distribution. We first derive some properties of individual measures on (S, S). Although S is sometimes assumed separable or complete, most of the theorems in this chapter hold for all metric spaces.
Measures and Integrals
Theorem 1.1. Every probability measure P on (S, S) is regular, that is, for every S-set A and every ε there exist a closed set F and an open set G such that F ⊂ A ⊂ G and P(G − F) < ε.
PROOF. Denote the metric on S by ρ(x, y) and the distance from x to A by ρ(x, A) [M1].§ If A is closed, we can take F = A and G = A^δ = [x: ρ(x, A) < δ] for some δ, since the latter sets decrease to A as δ ↓ 0. Hence we need only show that the class 𝒢 of S-sets with
† Or, for example, by the closed sets. All unindicted sets and functions are assumed measurable with respect to S. Another convention: all ε's and δ's are positive.
* Write Pf for ∫_S f dP.
§ A reference [Mn] is to paragraph n of the Appendix starting on p. 236.
the asserted property is a σ-field. Given sets A_n in 𝒢, choose closed sets F_n and open sets G_n such that F_n ⊂ A_n ⊂ G_n and P(G_n − F_n) < ε/2^{n+1}. If G = ⋃_n G_n, and if F = ⋃_{n≤n_0} F_n, with n_0 so chosen that P(⋃_n F_n − F) < ε/2, then F ⊂ ⋃_n A_n ⊂ G and P(G − F) < ε. Thus 𝒢 is closed under the formation of countable unions; since it is obviously closed under complementation, 𝒢 is a σ-field. □
Theorem 1.1 implies that P is completely determined by the values of PF for closed sets F. The next theorem shows that P is also determined by the values of Pf for bounded, continuous f. The proof depends on approximating the indicator I_F by such an f, and the function f(x) = (1 − ρ(x, F)/ε)^+ works. It is bounded, and it is continuous, even uniformly continuous, because |f(x) − f(y)| ≤ ρ(x, y)/ε. And x ∈ F implies f(x) = 1, while x ∉ F^ε implies ρ(x, F) ≥ ε and hence f(x) = 0. Therefore,

(1.1)    I_F(x) ≤ f(x) = (1 − ρ(x, F)/ε)^+ ≤ I_{F^ε}(x).
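As a small illustration of the approximating function in (1.1) (added here; the closed set F = [0, 1] on the line and the helper names are my own choices), the following sketch shows that f equals 1 on F, vanishes outside F^ε, and is sandwiched as in (1.1):

```python
import numpy as np

def smoothed_indicator(dist_to_F, eps):
    """Return f(x) = (1 - rho(x, F)/eps)^+, which is 1 on F, 0 outside F^eps,
    and Lipschitz with constant 1/eps (hence uniformly continuous)."""
    return lambda x: np.maximum(0.0, 1.0 - dist_to_F(x) / eps)

# Closed set F = [0, 1] on the line; rho(x, F) = max(0, -x, x - 1).
dist = lambda x: np.maximum.reduce([np.zeros_like(x), -x, x - 1.0])
f = smoothed_indicator(dist, eps=0.5)

xs = np.linspace(-1.0, 2.0, 7)
print(f(xs))
print((xs >= 0) & (xs <= 1), dist(xs) < 0.5)   # I_F <= f <= I_{F^eps} pointwise
```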
Theorem 1.2. Probability measures P and Q on S coincide if Pf = Qf for all bounded, uniformly continuous real functions f.

PROOF. For the bounded, uniformly continuous f of (1.1), PF ≤ Pf = Qf ≤ QF^ε. Letting ε ↓ 0 gives PF ≤ QF, provided F is closed. By symmetry and Theorem 1.1, P = Q. □
Because of theorems like this, it is possible to work with measures
PA or with integrals Pf, whichever is simpler or more natural. We
defined weak convergence in terms of the convergence of integrals of
functions, and in the next section we characterize it in terms of the
convergence of measures of sets.
Tightness
The following notion of tightness plays a fundamental role both in
the theory of weak convergence and in its applications. A probability
measure P on (S, S) is tight if for each ε there exists a compact set K such that PK > 1 − ε. By Theorem 1.1, P is tight if and only if PA is, for each A in S, the supremum of PK over the compact subsets K of A.
Theorem 1.3. If S is separable and complete, then each probability measure on (S, S) is tight.

PROOF. Since S is separable, there is, for each k, a sequence A_{k1}, A_{k2}, ... of open 1/k-balls covering S. Choose n_k large enough that P(⋃_{i≤n_k} A_{ki}) > 1 − ε/2^k. By the completeness hypothesis, the totally bounded set ⋂_k ⋃_{i≤n_k} A_{ki} has compact closure K. But clearly PK > 1 − ε. □
Some Examples
Here are four metric spaces, together with some facts about them we need further on. Define a subclass A of S to be a separating class if two probability measures that agree on A necessarily agree also on the whole of S: The values of PA for A in A are enough to separate P from all the other probability measures on S. For example, by Theorem 1.1 the closed sets form a separating class. Recall that a class is a π-system if it is closed under the formation of finite intersections and that A is a separating class if it is a π-system generating the σ-field S [PM.42].†
Example 1.1. Let R^k denote k-dimensional Euclidean space with the ordinary metric |x − y| = (Σ_{i≤k} (x_i − y_i)²)^{1/2}, and denote by R^k the corresponding class of Borel sets, the k-dimensional Borel sets. The distribution function corresponding to a probability measure P on R^k is

(1.2)    F(x) = P[y: y_i ≤ x_i, i ≤ k].

Since the sets on the right here form a π-system that generates R^k, they constitute a separating class. Therefore, F completely determines P.

By Theorem 1.3, each probability measure on (R^k, R^k) is tight. But tightness is in this case obvious because the space is σ-compact: it is a countable union of compact sets (the closed balls B(0, n)^−, for example). □
Example 1.2. Let R^∞ be the space of sequences x = (x_1, x_2, ...) of real numbers, the product of countably many copies of R^1. If b(α, β) = 1 ∧ |α − β|, then b is a metric on R^1 equivalent to the usual one, and under it, R^1 is complete as well as separable [M4]. Metrize R^∞ by ρ(x, y) = Σ_i b(x_i, y_i)/2^i. Obviously,* ρ(x^n, x) →_n 0 implies b(x_i^n, x_i) →_n 0 for each i; but by the M-test [PM.543], the reverse implication also holds. Under ρ, therefore, R^∞ has the topology of pointwise convergence: x^n →_n x if and only if x_i^n →_n x_i for each i.

† These are page references to Probability and Measure, third edition, for any who may find them useful.
* As often in the book, the superscript is not a power: x^n = (x_1^n, x_2^n, ...).
Let π_k: R^∞ → R^k be the natural projection: π_k(x) = (x_1, ..., x_k). Since convergence in R^∞ implies coordinatewise convergence, π_k is continuous, and therefore the sets

(1.3)    N_{k,ε}(x) = [y: |y_i − x_i| < ε, i ≤ k]

are open. Moreover, y ∈ N_{k,ε}(x) implies ρ(x, y) < ε + 2^{−k}. Given a positive r, choose ε and k so that ε + 2^{−k} < r; then N_{k,ε}(x) ⊂ B(x, r). This means that the sets (1.3) form a base for the topology of R^∞. It follows that the space is separable: one countable, dense subset consists of those points having only finitely many nonzero coordinates, each of them rational. If {x^n} is fundamental, then each {x_i^n} is fundamental and hence converges to some x_i, and of course x^n converges to the point with coordinates x_i. Therefore, R^∞ is also complete. (These facts are proved in a more general setting in [M6].)

Since R^∞ is separable and complete, it follows by Theorem 1.3 that each probability measure on R^∞ is tight.
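A small numerical sketch of this metric (an illustration added here; the truncation level and the particular sequences are arbitrary): ρ(x, y) = Σ_i 2^{−i}(1 ∧ |x_i − y_i|), computed for sequences given as functions of the index, shows how closeness in the first k coordinates controls ρ up to the tail bound 2^{−k}.

```python
import numpy as np

def rho(x, y, terms=60):
    """Metric on R^infinity: sum over i >= 1 of 2^{-i} * min(1, |x_i - y_i|).
    x and y map an index i to the i-th coordinate; the series is truncated
    after `terms` terms, so the neglected tail is below 2**-terms."""
    i = np.arange(1, terms + 1)
    xi = np.array([x(j) for j in i], dtype=float)
    yi = np.array([y(j) for j in i], dtype=float)
    return float(np.sum(np.minimum(1.0, np.abs(xi - yi)) / 2.0 ** i))

x = lambda i: 1.0 / i
y = lambda i: 1.0 / i + (0.001 if i <= 10 else 5.0)   # close only on the first 10 coordinates
print(rho(x, y))   # below 0.001 + 2**-10: the bound rho < eps + 2^{-k} with eps = 0.001, k = 10
```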
In the case of R^k, tightness follows directly from the fact that the space is σ-compact, an argument which cannot work here, since R^∞ is not σ-compact. Indeed, if y_i^n = x_i for i ≤ k and y_i^n = n for i > k, then the sequence {y^n} is contained in (1.3) but has no convergent subsequence. This means that the closure of (1.3) is not compact, and therefore no closed ball B(x, ε)^− is compact. This of course implies that R^∞ is not locally compact. Furthermore, if K is compact, then an arbitrary open ball B must share some point x with K^c (B ⊂ K being impossible), and so (since B ∩ K^c is open) there is an ε such that B(x, ε) ⊂ B ∩ K^c. Therefore, every compact set is nowhere dense. Finally, by Baire's category theorem [M7], this implies that R^∞ is not σ-compact.
Let R_f^∞ be the class of finite-dimensional sets, that is to say, the sets of the form π_k^{-1}H for k ≥ 1 and H ∈ R^k. Since π_k is continuous, it is measurable R^∞/R^k [M10], and so R_f^∞ ⊂ R^∞. Moreover, since π_k^{-1}H = π_{k+1}^{-1}(H × R^1), the set of indices in the specification of an R_f^∞-set can always be enlarged, and it follows that two sets A and A' in R_f^∞ can be represented as A = π_k^{-1}H and A' = π_k^{-1}H' for the same value of k. And now A ∩ A' = π_k^{-1}(H ∩ H') makes it clear that R_f^∞ is a π-system (even a field). Further, since the sets (1.3) form a basis and each lies in R_f^∞, it follows by separability that each open set is a countable union of sets in R_f^∞, which therefore generates the Borel σ-field R^∞: R_f^∞ is a separating class.
If P is a probability measure on (R^∞, R^∞), its finite-dimensional distributions are the measures Pπ_k^{-1} on (R^k, R^k), k ≥ 1, and since R_f^∞ is a separating class, these measures completely determine P. □
Example 1.3. Let C = C[0, 1] be the space of continuous functions x = x(·) on [0, 1]. Define the norm of x as ||x|| = sup_t |x(t)|, and give C the uniform metric

(1.4)    ρ(x, y) = ||x − y|| = sup_t |x(t) − y(t)|.

The random-walk polygons of the Introduction lie in C, which will be studied systematically in Chapter 2. Since ρ(x_n, x) → 0 means that x_n converges to x uniformly, it implies pointwise convergence. But of course the converse is false: Consider the function z_n that increases linearly from 0 to 1 over [0, n^{-1}], decreases linearly from 1 to 0 over [n^{-1}, 2n^{-1}], and stays at 0 to the right of 2n^{-1}; that is,

(1.5)    z_n(t) = nt·1_{[0, n^{-1}]}(t) + (2 − nt)·1_{(n^{-1}, 2n^{-1}]}(t).

This z_n converges pointwise to the 0-function, while ρ(z_n, 0) = 1.
The space C is separable. For let D_k be the set of polygonal functions that are linear over each subinterval I_{ki} = [(i − 1)/k, i/k] and have rational values at the endpoints. Then ⋃_k D_k is countable. To show that it is dense, given x and ε choose k so that |x(t) − x(i/k)| < ε for t ∈ I_{ki}, 1 ≤ i ≤ k, which is possible by uniform continuity, and then choose a y in D_k so that |y(i/k) − x(i/k)| < ε for each i. Now y(i/k) is within 2ε of x(t) for t ∈ I_{ki}, and similarly for y((i − 1)/k). Since y(t) is a convex combination of y((i − 1)/k) and y(i/k), it too is within 2ε of x(t): ρ(x, y) ≤ 2ε.
And C is also complete: If {x_n} is fundamental, which means that ε_n = sup_{m≥n} ρ(x_n, x_m) →_n 0, then, for each t, {x_n(t)} is fundamental on the line and hence has a limit x(t). Letting m → ∞ in the inequality |x_n(t) − x_m(t)| ≤ ε_n gives |x_n(t) − x(t)| ≤ ε_n; therefore, x_n(t) converges uniformly to x(t), x is continuous, and ρ(x_n, x) → 0.

Since C is separable and complete, it follows, again by Theorem 1.3, that each probability measure on the Borel σ-field C is tight.
Like R^∞, C is not σ-compact. To see this, consider the functions (1.5) again. No subsequence of εz_n can converge, because if ρ(εz_{n_i}, x) →_i 0, then x must be the 0-function, while ρ(εz_{n_i}, 0) = ε. Therefore, the closed ball B(0, ε)^− is not compact, and in fact no closed ball B(x, ε)^− is compact (consider the points x + εz_n). It follows as before that every compact set is nowhere dense and that C is not σ-compact.
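The functions (1.5) are easy to examine numerically (a sketch added here; the grid and the sample point t = 0.37 are arbitrary): each z_n eventually vanishes at every fixed t, while its uniform distance from the 0-function stays equal to 1.

```python
import numpy as np

def z(n, t):
    """The tent function (1.5): rises from 0 to 1 on [0, 1/n], falls back to 0
    on [1/n, 2/n], and equals 0 to the right of 2/n."""
    t = np.asarray(t, dtype=float)
    return np.where(t <= 1.0 / n, n * t,
                    np.where(t <= 2.0 / n, 2.0 - n * t, 0.0))

t = np.linspace(0.0, 1.0, 10001)
for n in (10, 100, 1000):
    # Pointwise z_n(0.37) is already 0, yet rho(z_n, 0) = sup_t |z_n(t)| = 1.
    print(n, float(z(n, 0.37)), float(np.max(np.abs(z(n, t)))))
```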
For 0 ≤ t_1 < ⋯ < t_k ≤ 1, define the natural projection from C to R^k by π_{t_1...t_k}(x) = (x(t_1), ..., x(t_k)). In C the finite-dimensional sets are those of the form π_{t_1...t_k}^{-1}H, H ∈ R^k, and they lie in C because the projections are continuous. As in the preceding example, the index set defining a finite-dimensional set can always be enlarged. For suppose we want to enlarge t_1, t_2 to t_1, s, t_2 (where t_1 < s < t_2). For the projection ψ from R^3 to R^2 defined by ψ(u, v, w) = (u, w), we have π_{t_1 t_2} = ψπ_{t_1 s t_2} and hence π_{t_1 t_2}^{-1}H = π_{t_1 s t_2}^{-1}ψ^{-1}H, and of course ψ^{-1}H ∈ R^3 if H ∈ R^2. The proof for the general case involves more notation, but the idea is the same. It follows as before that the class C_f of finite-dimensional sets is a π-system. Furthermore, we have B(x, ε)^− = ⋂_r [y: |y(r) − x(r)| ≤ ε], where r ranges over the rationals in [0, 1]. Therefore, the σ-field σ(C_f) generated by C_f contains the closed balls, hence the open balls, and hence (separability) the open sets. Since C_f is a π-system and σ(C_f) = C, C_f is a separating class. □
The final example clarifies several technical points. Let S_0 be the σ-field generated by the open balls; we can call it the ball σ-field. Of course, S_0 ⊂ S. If S is separable, then each open set is a countable union of open balls, and therefore S_0 = S, a fact we used in each of the preceding two examples. In the nonseparable case, S_0 is usually† smaller than S.
Example 1.4. Let S be an uncountable discrete space (ρ(x, y) = 1 for x ≠ y); S is complete but not separable. Since the open balls are the singletons and S itself, S_0 consists of the countable and the cocountable sets. But since every set is open, S = 2^S, and so S_0 is strictly smaller than S.

Suppose there exists on S a probability measure P that is not tight. If S_0 consists of all the x for which P{x} > 0, then S_0 is countable‡ and hence PS_0 < 1 (since otherwise P would be tight). But then, if μA = P(A ∩ S_0^c) for A in S, μ is a finite, nontrivial measure (countably additive) on the class 2^S, and μ{x} = 0 for each x. If S has the power of the continuum, then, assuming the axiom of choice and the continuum hypothesis, one can show [PM.46] that this is impossible. Thus Theorem 1.3 sometimes holds in the nonseparable case. □
The ball σ-field will play a role only in Sections 6 and 15.
† For some nonseparable spaces, the two σ-fields coincide, by accident, so to speak. See Talagrand [64].
‡ Since P is finite, S cannot contain an uncountable, disjoint collection of sets of positive P-measure [PM.162].
Problems
A simple assertion is understood to be prefaced by "Show that." See Some Notes on the Problems, p. 264.
1.1. The open finite-dimensional sets in R^∞ form a basis for the topology. Is this true in C?
1.2. Are R_f^∞ and C_f σ-fields?
1.3. Show that, if A and B are at positive distance, then they can be separated by a uniformly continuous f, in the sense that I_A ≤ f ≤ I_{B^c}. If A and B have disjoint closures but are at distance zero, this holds for a continuous f but not for a uniformly continuous one.
1.4. If S is a Banach space, then either (i) no closed ball of positive radius is compact or else (ii) they all are. Alternative (i) holds for R^∞ and C. Alternative (ii) holds if and only if S has a finite basis; see Liusternik and Sobolev [44], p. 69.
1.5. Suppose only that P is finitely additive. Show that, if each PA is the supremum of PK over the compact subsets K of A, then P is countably additive after all.
1.6. A real function on a metric space S is by definition a Borel function if it is measurable with respect to S. It is by definition a Baire function if it is measurable with respect to the σ-field generated by the continuous functions. Show (what is not true for the general topological space) that the two concepts coincide.
1.7. Inequivalent metrics can give rise to the same class of Borel sets.
1.8. Try to reverse the roles of F and G in Theorem 1.1. When is it true that, for each ε, there exist an open G and a closed F such that G ⊂ A ⊂ F and P(F − G) < ε?
1.9. If S is separable and locally compact, then it is σ-compact, which implies of course that each probability measure on S is tight. Euclidean space is an example.
1.10. Call a class F of bounded, continuous functions a separating class if Pf = Qf for all f in F implies that P = Q. The functions (1.1) form a separating class. Show that F is a separating class if each bounded, continuous function is the uniform limit of elements of F.
1.11. Suppose that S is separable and locally compact. Since S is then σ-compact, each compact set in S is contained in the interior of another compact set. Suppose that Pf = Qf for all continuous f with compact support; show that P and Q agree for compact sets, for closed sets, for Borel sets: The continuous functions with compact support form a separating class in the sense of the preceding problem.
1.12. Completeness can be replaced by topological completeness [M4] in Theorem
1.3, and separability can be replaced by the hypothesis that P has separable
support.
1.13. The hypothesis of completeness (or topological completeness) cannot be suppressed in Theorem 1.3. Let S be a subset of [0, 1] with inner and outer Lebesgue measures 0 and 1: λ_*(S) = 0, λ*(S) = 1. Give S the usual metric, and let P be the restriction of λ* to S ([M10] & [PM.39]). If K is compact, then PK = 0, and so P is not tight.
1.14. If S consists of the rationals with the relative topology of the line, then each
P on S is tight, even though S is not topologically complete (use the Baire
category theorem).
1.15. If r ≤ ε/((1 + ε)2^k), then (see (1.3)) B(x, r) ⊂ N_{k,ε}(x).
1.16. If A is nowhere dense, then A° = ∅, but the converse is false. Find an A that is everywhere dense even though A° = ∅.
1.17. Every locally compact subset of C is nowhere dense.
1.18. In connection with Example 1.3, consider the space C_b(T) of bounded, continuous functions on the general metric space T; metrize it by (1.4) (we specify boundedness because otherwise ||x|| may not be finite). Show that C_b(T) is complete. Show that it need not be separable, even if T is totally bounded.
SECTION 2. PROPERTIES OF WEAK CONVERGENCE

We have defined P_n ⇒ P to mean that P_n f → Pf for each bounded, continuous real f on S. Note that, since the integrals Pf completely determine P (Theorem 1.2), a single sequence {P_n} cannot converge weakly to each of two different limits. Although it is not important to the subject at this point, it is easy to topologize the space of probability measures on (S, S) in such a way that weak convergence is convergence in this topology: Take as the basic neighborhoods the sets of the form [Q: |Qf_i − Pf_i| < ε, i ≤ k], where the f_i are bounded and continuous. If S is separable and complete, this topology can be defined by a metric, the Prohorov metric; see Section 6.
Weak convergence is the subject of the entire book. We start with
a pair of simple examples to illustrate the ideas lying behind the definition.
Example 2.1. On an arbitrary S, write δ_x for the unit mass at x, the probability measure on S defined by δ_x(A) = I_A(x). If x_n → x_0 and f is continuous, then

(2.1)    δ_{x_n} f = f(x_n) → f(x_0) = δ_{x_0} f,

and therefore δ_{x_n} ⇒ δ_{x_0}. On the other hand, if x_n ↛ x_0, then there is an ε such that ρ(x_0, x_n) > ε for infinitely many n. If f is the function in (1.1) for F = {x_0}, then f(x_0) = 1 and f(x_n) = 0 for infinitely many n, and so (2.1) fails: δ_{x_n} ⇏ δ_{x_0}. Therefore, δ_{x_n} ⇒ δ_{x_0} if and only if x_n → x_0. This simplest of examples is useful when doubtful conjectures present themselves. □
Example 2.2. Let S be [0, 1] with the usual metric, and for each n, suppose that x_{nk}, 0 ≤ k < r_n, are r_n points of [0, 1]. Suppose that these points are asymptotically uniformly distributed, in the sense that, for each subinterval J,

(2.2)    (1/r_n) #[k: x_{nk} ∈ J] →_n |J|,

where |J| denotes length: As n → ∞, the proportion of the points x_{nk} that lie in an interval is asymptotically equal to its length. Take P_n to have a point-mass of 1/r_n at each x_{nk} (if several of the x_{nk} coincide, let the masses add), and let P be Lebesgue measure restricted to [0, 1].

If (2.2) holds, then P_n ⇒ P. For suppose that f is continuous on [0, 1]. Then it is Riemann integrable, and for any given ε, there is a finite decomposition of [0, 1] into subintervals J_i such that, if v_i and u_i are the supremum and infimum of f over J_i, then the upper and lower Darboux sums Σ_i v_i|J_i| and Σ_i u_i|J_i| are within ε of the Riemann integral Pf = ∫_0^1 f(x) dx. By (2.2),

lim sup_n P_n f ≤ lim sup_n Σ_i v_i P_n(J_i) = Σ_i v_i |J_i| ≤ Pf + ε.

This, together with the symmetric bound from below, shows that P_n f → Pf. Therefore, P_n ⇒ P; this fact, as Theorem 2.1 will show, contains information going beyond (2.2).

As a special case, take r_n = 10^n, and let x_{nk} = k·10^{-n} for 0 ≤ k < r_n. If J = (a, b], then (2.2) holds because the set there consists of those k satisfying ⌊a10^n⌋ < k ≤ ⌊b10^n⌋. That P_n ⇒ P holds in this case is an expression of the fact that one can produce approximately uniformly distributed observations by generating a stream of random digits (somehow) and for (a suitably chosen) large n breaking them into groups of n with decimal points to the left.

As a second special case, take x_{nk} to be the fractional part of kθ, 0 ≤ k < r_n = n. If θ is irrational, then (2.2) holds [PM.328], which is expressed by saying that the integer multiples of θ are uniformly distributed modulo 1. □
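The second special case is easy to check numerically (an added illustration; θ equal to the golden ratio and the interval J = (0.2, 0.5] are arbitrary choices):

```python
import numpy as np

theta = (1 + np.sqrt(5)) / 2        # an irrational number
a, b = 0.2, 0.5                     # J = (a, b], with |J| = 0.3

for n in (10, 100, 10000):
    x = np.modf(theta * np.arange(n))[0]       # fractional parts of k*theta, 0 <= k < n
    print(n, np.mean((x > a) & (x <= b)))      # left side of (2.2); it approaches |J| = 0.3
```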
The Portmanteau Theorem

The following theorem provides useful conditions equivalent to weak convergence; any of them could serve as the definition. A set A in S whose boundary ∂A satisfies P(∂A) = 0 is called a P-continuity set (note that ∂A is closed and hence lies in S). Let P_n, P be probability measures on (S, S).
Theorem 2.1. These five conditions are equivalent:
(i) P_n ⇒ P.
(ii) P_n f → Pf for all bounded, uniformly continuous f.
(iii) lim sup_n P_n F ≤ PF for all closed F.
(iv) lim inf_n P_n G ≥ PG for all open G.
(v) P_n A → PA for all P-continuity sets A.
To see the significance of these conditions, return to Example 2.1. Suppose that x_n → x_0, so that δ_{x_n} ⇒ δ_{x_0}, and suppose further that the x_n are all distinct from x_0 (take x_0 = 0 and x_n = 1/n on the line, for example). Then the inequality in part (iii) is strict if F = {x_0}, and the inequality in (iv) is strict if G = {x_0}^c. If A = {x_0}, then convergence does not hold in (v); but this does not contradict the theorem, because the limit measure of ∂A = {x_0} is 1, not 0.

And suppose in Example 2.2 that (2.2) holds, so that P_n ⇒ P. If A is the set of all the x_{nk}, for all n and k, then A is countable and supports each P_n, so that P_n A = 1 ↛ PA = 0; but of course, ∂A = S in this case. By regularity (Theorem 1.1), there is an open G such that A ⊂ G and PG < 1/2 (say); for this G the inequality in part (iv) is strict. To end on a positive note: If (2.2) holds for intervals J, then by part (v) of the theorem, it also holds for a much wider class of sets, those having boundary of Lebesgue measure 0.
PROOF OF THEOREM 2.1. Of course, the implication (i) → (ii) is trivial.

Proof that (ii) → (iii). The f of (1.1) is bounded and uniformly continuous. By the two inequalities in (1.1), condition (ii) here implies lim sup_n P_n F ≤ lim sup_n P_n f = Pf ≤ PF^ε. If F is closed, letting ε ↓ 0 gives the inequality in (iii).
The equivalence of (iii) and (iv) follows easily by complementation.
Proof that (iii) and (iv) → (v). If A° and A^− are the interior and closure of A, then conditions (iii) and (iv) together imply

(2.3)    PA^− ≥ lim sup_n P_n A^− ≥ lim sup_n P_n A ≥ lim inf_n P_n A ≥ lim inf_n P_n A° ≥ PA°.

If A is a P-continuity set, then the extreme terms here coincide with PA, and (v) follows.
Proof that (v) → (i). By linearity we may assume that the bounded f satisfies 0 < f < 1. Then Pf = ∫_0^∞ P[f > t] dt = ∫_0^1 P[f > t] dt, and similarly for P_n. If f is continuous, then ∂[f > t] ⊂ [f = t], and hence [f > t] is a P-continuity set except for countably many t. By condition (v) and the bounded convergence theorem,

P_n f = ∫_0^1 P_n[f > t] dt → ∫_0^1 P[f > t] dt = Pf.    □
Other Criteria

Weak convergence is often proved by showing that P_n A → PA holds for the sets A of some advantageous subclass of S.

Theorem 2.2. Suppose (i) that A_P is a π-system and (ii) that each open set is a countable union of A_P-sets. If P_n A → PA for every A in A_P, then P_n ⇒ P.
PROOF. If A_1, ..., A_r lie in A_P, then so do their intersections; hence, by the inclusion-exclusion formula,

P_n(⋃_{i≤r} A_i) = Σ_i P_n A_i − Σ_{i<j} P_n(A_i ∩ A_j) + ⋯ →_n Σ_i PA_i − Σ_{i<j} P(A_i ∩ A_j) + ⋯ = P(⋃_{i≤r} A_i).

If G is open, then G = ⋃_i A_i for some sequence {A_i} of sets in A_P. Given ε, choose r so that P(⋃_{i≤r} A_i) > PG − ε. By the relation just proved, PG − ε ≤ P(⋃_{i≤r} A_i) = lim_n P_n(⋃_{i≤r} A_i) ≤ lim inf_n P_n G. Since ε was arbitrary, condition (iv) of the preceding theorem holds. □
The next result transforms condition (ii) above in a useful way.

Theorem 2.3. Suppose (i) that A_P is a π-system and (ii) that S is separable and, for every x in S and positive ε, there is in A_P an A for which x ∈ A° ⊂ A ⊂ B(x, ε). If P_n A → PA for every A in A_P, then P_n ⇒ P.

PROOF. The hypothesis implies that, for each point x of a given open set G, x ∈ A_x° ⊂ A_x ⊂ G holds for some A_x in A_P. Since S is separable, there is a countable subcollection {A_{x_i}} of {A_x: x ∈ G} that covers G (the Lindelöf property [M3]), and then G = ⋃_i A_{x_i}. Therefore, the hypotheses of Theorem 2.2 are satisfied. □
Call a subclass A of S a convergence-determining class if, for every P and every sequence {P_n}, convergence P_n A → PA for all P-continuity sets in A implies P_n ⇒ P. A convergence-determining class is obviously a separating class in the sense of the preceding section. To ensure that a given A is a convergence-determining class, we need conditions implying that, whatever P may be, the class A_P of P-continuity sets in A satisfies the hypothesis of Theorem 2.3. For given A, let A_{x,ε} be the class of A-sets satisfying x ∈ A° ⊂ A ⊂ B(x, ε), and let ∂A_{x,ε} be the class of their boundaries. If ∂A_{x,ε} contains uncountably many disjoint sets, then at least one of them must have P-measure 0, which provides a usable condition.

Theorem 2.4. Suppose (i) that A is a π-system and (ii) that S is separable and, for each x and ε, ∂A_{x,ε} either contains ∅ or contains uncountably many disjoint sets. Then A is a convergence-determining class.

Since ∂B(x, r) ⊂ [y: ρ(x, y) = r], the finite intersections of open balls satisfy the hypotheses.
PROOF. Fix an arbitrary P and let A_P be the class of P-continuity sets in A. Since

(2.4)    ∂(A ∩ B) ⊂ ∂A ∪ ∂B,

A_P is a π-system. Suppose that P_n A → PA for every A in A satisfying P(∂A) = 0, that is, for every A in A_P. If ∂A_{x,ε} does not contain ∅, then it must contain uncountably many distinct, pairwise disjoint sets; in either case, it contains a set of P-measure 0. This means that each A_{x,ε} contains an element of A_P, which therefore satisfies the hypothesis of Theorem 2.3. Since P_n A → PA for each A in A_P, it follows that P_n ⇒ P. □
Example 2.3. Consider R^k, as in Example 1.1, and let A be the class of rectangles, the sets [y: a_i < y_i ≤ b_i, i ≤ k]. Since it obviously satisfies the hypotheses of Theorem 2.4, A is a convergence-determining class.

The class of sets Q_x = [y: y_i ≤ x_i, i ≤ k] is also a convergence-determining class. For suppose that P_n Q_x → PQ_x for each x such that P(∂Q_x) = 0. The set E_i of t satisfying P[y: y_i = t] > 0 is at most countable, and so the set D = (⋃_i E_i)^c is dense. Let A_P be the class of rectangles such that each coordinate of each vertex lies in D. If A ∈ A_P, then, since ∂Q_x ⊂ ⋃_i [y: y_i = x_i], Q_x is a P-continuity set for each vertex x of A, and it follows by inclusion-exclusion that P_n A → PA. From this and the fact that D is dense, it follows that A_P satisfies the hypothesis of Theorem 2.3.

That these sets Q_x form a convergence-determining class can be restated as a familiar fact. Let F(x) = P(Q_x) and F_n(x) = P_n(Q_x) be the distribution functions for P and P_n. Since F is continuous at x if and only if Q_x is a P-continuity set, P_n ⇒ P if and only if F_n(x) → F(x) for all continuity points x of F. □
Example 2.4. In Example 1.2 we showed that the class R_f^∞ of finite-dimensional sets is a separating class. It is also a convergence-determining class: Given x and ε, choose k so that 2^{−k} < ε/2 and consider the finite-dimensional sets A_η = [y: |y_i − x_i| < η, i ≤ k] for 0 < η < ε/2. Then x ∈ A_η° = A_η ⊂ B(x, ε). Since ∂A_η consists of the points y such that |y_i − x_i| ≤ η for all i ≤ k, with equality for some i, these boundaries are disjoint. And since R^∞ is separable, Theorem 2.4 applies: R_f^∞ is a convergence-determining class, and P_n ⇒ P if and only if P_n A → PA for all finite-dimensional P-continuity sets A. □
Example 2.5. In Example 1.3, the sequence of functions z_n defined by (1.5) shows that the space C is not σ-compact, but it also shows something much more important: Although the class C_f of finite-dimensional sets in C is a separating class, it is not a convergence-determining class. For let P_n = δ_{z_n}, and let P = δ_0 be the unit mass at the 0-function. Then P_n ⇏ P, because z_n ↛ 0.

On the other hand, if 2n^{-1} is less than the smallest nonzero t_i, then π_{t_1...t_k}(z_n) = π_{t_1...t_k}(0) = (0, ..., 0), and so P_n π_{t_1...t_k}^{-1}H = Pπ_{t_1...t_k}^{-1}H for all H. In this example, P_n A → PA for all sets A in C_f (including those that are not P-continuity sets, as it happens), even though P_n ⇏ P. In the space C, the arguments and results involving weak convergence go far beyond the finite-dimensional theory. □
Theorem 2.2 has a corollary used in Section 4. Recall that A is a semiring if it is a π-system containing ∅ and if A, B ∈ A and A ⊂ B together imply that there exist finitely many disjoint A-sets C_i such that B − A = ⋃_{i≤m} C_i.

Theorem 2.5. Suppose that (i) A is a semiring and (ii) each open set is a countable union of A-sets. If PA ≤ lim inf_n P_n A for each A in A, then P_n ⇒ P.

PROOF. If A_1, ..., A_r lie in A, then, since A is a semiring, ⋃_{i≤r} A_i can be represented as a disjoint union ⋃_j B_j of other sets in A [PM.168], and it follows
that

P(⋃_{i≤r} A_i) = Σ_j PB_j ≤ Σ_j lim inf_n P_n B_j ≤ lim inf_n P_n(⋃_{i≤r} A_i).

The proof is completed as before. □
A further simple condition for weak convergence:

Theorem 2.6. A necessary and sufficient condition for P_n ⇒ P is that each subsequence {P_{n_i}} contain a further subsequence {P_{n_i(m)}} converging weakly (m → ∞) to P.

PROOF. The necessity is easy (but not useful). As for sufficiency, if P_n ⇏ P, then P_n f ↛ Pf for some bounded, continuous f. But then, for some positive ε and some subsequence {P_{n_i}}, |P_{n_i} f − Pf| > ε for all i, and no further subsequence can converge weakly to P. □
The Mapping Theorem

Suppose that h maps S into another metric space S', with metric ρ' and Borel σ-field S'. If h is measurable S/S' [M10], then each probability P on (S, S) induces on (S', S') a probability Ph^{-1} defined as usual by Ph^{-1}(A) = P(h^{-1}A). We need conditions under which P_n ⇒ P implies P_n h^{-1} ⇒ Ph^{-1}. One such condition is that h is continuous: If f is bounded and continuous on S', then fh is bounded and continuous on S, and by change of variable [PM.216], P_n ⇒ P implies

(2.5)    P_n h^{-1} f = P_n(fh) → P(fh) = Ph^{-1} f.
Example 2.6. Since the natural projections π_k from R^∞ to R^k are continuous, if P_n ⇒ P holds on R^∞, then P_n π_k^{-1} ⇒ Pπ_k^{-1} holds on R^k for each k. As the following argument shows, the converse implication is a consequence of the fact that the class R_f^∞ of finite-dimensional sets in R^∞ is a convergence-determining class (Example 2.4).

From the continuity of π_k it follows easily that ∂π_k^{-1}H ⊂ π_k^{-1}∂H for H ⊂ R^k. Using special properties of the projections we can prove inclusion in the other direction. If x ∈ π_k^{-1}∂H, so that π_k x ∈ ∂H, then there are points α^(u) in H and points β^(u) in H^c such that α^(u) → π_k x and β^(u) → π_k x (u → ∞). Since the points (α_1^(u), ..., α_k^(u), x_{k+1}, ...) lie in π_k^{-1}H and converge to x, and since the points (β_1^(u), ..., β_k^(u), x_{k+1}, ...) lie in (π_k^{-1}H)^c and also converge to x, it follows that x ∈ ∂(π_k^{-1}H). Therefore, ∂π_k^{-1}H = π_k^{-1}∂H.

If A = π_k^{-1}H is a finite-dimensional P-continuity set, then we have Pπ_k^{-1}(∂H) = P(π_k^{-1}∂H) = P(∂π_k^{-1}H) = P(∂A) = 0, and so H is a Pπ_k^{-1}-continuity set. This means that, if P_n π_k^{-1} ⇒ Pπ_k^{-1} for all k, then P_n A → PA for each P-continuity set A in R_f^∞, and hence (since R_f^∞ is a convergence-determining class) it means that P_n ⇒ P. Therefore: P_n ⇒ P if and only if P_n π_k^{-1} ⇒ Pπ_k^{-1} for all k. This is essentially just a restatement of the fact that the finite-dimensional sets form a convergence-determining class. The theory of weak convergence in R^∞ is applied in Section 4 to the study of some problems in number theory and combinatorial analysis. □
Example 2.7. Because of the continuity of the natural projections π_{t_1...t_k} from C to R^k, if P_n ⇒ P for probabilities on C, then P_n π_{t_1...t_k}^{-1} ⇒ Pπ_{t_1...t_k}^{-1} for all k and all k-tuples t_1, ..., t_k. But the converse is false, because, as Example 2.5 shows, C_f is not a convergence-determining class. In fact, for P_n and P as in that example, P_n ⇏ P, even though for 2n^{-1} less than the smallest nonzero t_i we have P_n π_{t_1...t_k}^{-1} = Pπ_{t_1...t_k}^{-1} (a unit mass at the origin of R^k). Again: Weak-convergence theory in C goes beyond the finite-dimensional case in an essential way. The space C is studied in detail in Chapter 2. □
By (2.5), P_n ⇒ P implies P_n h^{-1} ⇒ Ph^{-1} if h is a continuous mapping from S to S', but the continuity assumption can be weakened. Assume only that h is measurable S/S', and let D_h be the set of its discontinuities; D_h lies in S [M10]. The mapping theorem:
Theorem 2.7. If P_n ⇒ P and PD_h = 0, then P_n h^{-1} ⇒ Ph^{-1}.
PROOF. If x ∈ (h^{-1}F)^−, then x_n → x for some sequence {x_n} such that hx_n ∈ F; but then, if x ∈ D_h^c, hx lies in F^−. Therefore, D_h^c ∩ (h^{-1}F)^− ⊂ h^{-1}(F^−). If F is a closed set in S', it therefore follows from PD_h^c = 1 that

lim sup_n P_n h^{-1}F ≤ lim sup_n P_n(h^{-1}F)^− ≤ P(h^{-1}F)^− = P(D_h^c ∩ (h^{-1}F)^−) ≤ P(h^{-1}F^−) = Ph^{-1}F.

Condition (iii) of Theorem 2.1 holds.† □
Example 2.8. Let F be a distribution function on the line, and let φ be the corresponding quantile function: φ(u) = inf[x: u ≤ F(x)] for 0 < u < 1 (put φ(0) = φ(1) = 0, say). If P is Lebesgue measure restricted to [0, 1], as in Example 2.2, then Pφ^{-1} is the probability measure having distribution function F, since Pφ^{-1}(−∞, x] = P[u: φ(u) ≤ x] = P[u: u ≤ F(x)] = F(x). Since φ has at most countably many discontinuities, PD_φ = 0. If P_n is also defined as in Example 2.2, then P_n ⇒ P, and it follows that P_n φ^{-1} ⇒ Pφ^{-1}. Consider the case where x_{nk} = k·10^{-n}. If φ is calculated for each approximately uniformly distributed observation as it is generated, this gives a sequence of observations approximately distributed according to F. □
Example 2.9. If S_0 ∈ S, then [M10] the Borel σ-field for S_0 in the relative topology consists of the S-sets contained in S_0. Suppose that P_n and P are probability measures on S and that P_n S_0 ≡ PS_0 = 1. Let Q_n and Q be the restrictions of P_n and P to S_0. The identity map h from S_0 to S is continuous, and P_n = Q_n h^{-1}, P = Qh^{-1}. Therefore, by the mapping theorem, Q_n ⇒ Q implies P_n ⇒ P.

The converse holds as well. The general open set in S_0 is G ∩ S_0, where G is open in S. But Q_n(G ∩ S_0) = P_n(G ∩ S_0) = P_n G, and similarly for Q. If P_n ⇒ P, then lim inf_n Q_n(G ∩ S_0) = lim inf_n P_n G ≥ PG = Q(G ∩ S_0). Therefore:

If P_n S_0 ≡ PS_0 = 1, then P_n ⇒ P (on S) if and only if Q_n ⇒ Q (on S_0). □

Example 2.10. Suppose again that S_0 ∈ S and P_n S_0 ≡ PS_0 = 1. And suppose that the measurable map h: S → S' is continuous when restricted to S_0, in the sense that, if points x_n of S_0 converge to a point x of S_0, then hx_n → hx. If P_n ⇒ P, then the Q_n and Q of the preceding example satisfy Q_n ⇒ Q. The restriction h_0 of h to S_0 is a continuous map from S_0 to S', and the mapping theorem gives (since S_0 ∩ h^{-1}A' = h_0^{-1}A') P_n h^{-1} = Q_n h_0^{-1} ⇒ Qh_0^{-1} = Ph^{-1}. Therefore:

If P_n S_0 ≡ PS_0 = 1 and h is continuous when restricted to S_0, then P_n ⇒ P implies P_n h^{-1} ⇒ Ph^{-1}.‡ □
† For a different approach to the mapping theorem, see Problem 2.10.
‡ See Problem 2.2.
Product Spaces

Assume that the product T = S' × S'' is separable, which implies that S' and S'' are separable and that the three Borel σ-fields are related by T = S' × S'' [M10]. Denote the marginal distributions of a probability measure P on T by P' and P'': P'(A') = P(A' × S'') and P''(A'') = P(S' × A''). Since the projections π'(x', x'') = x' and π''(x', x'') = x'' are continuous, and since P' = P(π')^{-1} and P'' = P(π'')^{-1}, it follows by the mapping theorem that P_n ⇒ P implies P_n' ⇒ P' and P_n'' ⇒ P''. The reverse implication is false.† But consider the π-system A of measurable rectangles A' × A'' (A' ∈ S' and A'' ∈ S''). If we take the distance between (x', x'') and (y', y'') to be ρ'(x', y') ∨ ρ''(x'', y''), then the open balls in T have the form [M10]

(2.7)    B((x', x''), r) = B_{ρ'}(x', r) × B_{ρ''}(x'', r).

These balls lie in A, and since the ∂B((x', x''), r) are disjoint for different values of r, A satisfies the hypothesis of Theorem 2.4 and is therefore a convergence-determining class.

There is a related result that is more useful. Let A_P be the class of A' × A'' in A such that P'(∂A') = P''(∂A'') = 0. Applying (2.4) in S' and in S'' shows that A_P is a π-system. And since

∂(A' × A'') ⊂ ((∂A') × S'') ∪ (S' × (∂A'')),

each set in A_P is a P-continuity set. Since the B_{ρ'}(x', r) in (2.7) have disjoint boundaries for different values of r, and since the same is true of the B_{ρ''}(x'', r), there are arbitrarily small r for which (2.7) lies in A_P. It follows that Theorem 2.3 applies to A_P: P_n ⇒ P if and only if P_n A → PA for all A in A_P. Therefore, we have the following theorem, in which (ii) is an obvious consequence of (i).

Theorem 2.8. (i) If T = S' × S'' is separable, then P_n ⇒ P if and only if P_n(A' × A'') → P(A' × A'') for each P'-continuity set A' and each P''-continuity set A''.

(ii) If T is separable, then P_n' × P_n'' ⇒ P' × P'' if and only if P_n' ⇒ P' and P_n'' ⇒ P''.
Problems

2.1. According to Examples 2.4 and 2.5, R_f^∞ is a convergence-determining class, while C_f is not. What essential difference between R^∞ and C makes this possible? (See Problem 1.1.)

† See Problem 2.7.
2.2. Show that the assertion in Example 2.10 is false without the assumption that P_n S_0 ≡ 1, which points up the distinction between continuity of h at each point of S_0 and continuity of h when it is restricted to S_0.
2.3. If S is countable and discrete, then P_n ⇒ P if and only if P_n{x} → P{x} for each singleton. Show that in this case sup_{A∈S} |P_n A − PA| → 0.
2.4. In connection with Example 2.3, show for k = 1 that F has at most countably many discontinuities. Show for k = 2 that, if F has at least one discontinuity, then it has uncountably many. Show that, if F_n(x) → F(x) fails for one x, then it fails for uncountably many.
2.5. The class of P-continuity sets (P fixed) is a field but may not be a σ-field.
2.6. If f is bounded and upper semicontinuous [M8], then P_n ⇒ P implies that lim sup_n P_n f ≤ Pf. Show that this contains part (iii) of Theorem 2.1 as a special case. Generalize part (iv) in the same way.
2.7. The uniform distribution on the unit square and the uniform distribution on its diagonal have identical marginal distributions. Relate this to Theorem 2.8.
2.8. Show that, if δ_{x_n} ⇒ P, then P = δ_x for some x.
2.9. If f is S-measurable and P|f| < ∞, then for each ε there is a bounded, uniformly continuous g such that P|f − g| < ε.
2.10. (a) Without using the mapping theorem, show that P_n ⇒ P implies P_n f → Pf for every bounded f such that PD_f = 0. (See the proof of (v)→(i) in Theorem 2.1.)
(b) Now give a second proof of the mapping theorem. (Use (2.5) and the fact that D_{fh} ⊂ D_h.)
SECTION 3. CONVERGENCE IN DISTRIBUTION
The theory of weak convergence can be paraphrased as the theory
of convergence in distribution. When stated in terms of this second
theory, which involves no new ideas, many results assume a compact
and perspicuous form.
Random Elements
Let X be a mapping from a probability space (Ω, F, 𝖯) to a metric space S. We call X a random element if it is measurable F/S; we say that it is defined on its domain Ω and in its range S, and we call it a random element of S. We call X a random variable if S is R^1, a random vector if S is R^k, a random sequence if S is R^∞, and a random function if S is C or some other function space.†

The distribution of X is the probability measure P = 𝖯X^{-1} on (S, S) defined by

(3.1)    PA = 𝖯(X^{-1}A) = 𝖯[ω: X(ω) ∈ A] = 𝖯[X ∈ A].

† Often in the literature an arbitrary random element is called a random variable.
This is also called the law of X and denoted L(X). In the case S = R^k, there is also the associated distribution function of X = (X_1, . . . , X_k), defined by

(3.2)    F(x_1, . . . , x_k) = P[X_1 ≤ x_1, . . . , X_k ≤ x_k].

Note that the underlying measure P is a probability measure on an arbitrary measurable space, whereas the distribution P = PX^{-1} is always defined on the Borel σ-field of a metric space. The distribution P contains the essential information about the random element X. For example, if f is a real measurable function on S (measurable S/R^1), then by change of variable,

(3.3)    ∫_Ω f(X(ω)) P(dω) = ∫_S f(x) P(dx),

in the sense that both integrals exist or neither does, and they have the same value if they do exist.

Each probability measure on each metric space is the distribution of some random element on some probability space. In fact, given P on (S, S), we can simply take X to be the identity:

(3.4)    (Ω, F, P) = (S, S, P),   X(ω) = ω for ω ∈ Ω = S.

Then X is a random element on Ω with values in S (measurable F/S), and it has P as its distribution.
Convergence in Distribution
We say a sequence {X_n} of random elements converges in distribution to the random element X if P_n ⇒ P, where P_n and P are the distributions of X_n and X. In this case we write X_n ⇒ X. (This double use of the double arrow causes no confusion.) Thus X_n ⇒ X means that L(X_n) ⇒ L(X). Although this definition of course makes no sense unless the image space S (the range) and the topology on it are the same for all the X, X_1, X_2, . . ., the underlying probability spaces (the domains), (Ω, F, P) and (Ω_n, F_n, P_n), say, may all be distinct. These spaces ordinarily remain offstage; we make no mention of them because their structures enter into the argument only by way of the distributions on S they induce. For example, if we write E_n for integrals with respect to P_n, then, by (3.3), P_nf → Pf if and only if E_n[f(X_n)] → E[f(X)]. But we simply write P in place of P_n and E in place of E_n: P and E will refer to whatever probability space the
random element in question is defined on. Thus X_n ⇒ X if and only if E[f(X_n)] → E[f(X)] for all bounded, continuous f on S.

Theorem 2.1 asserts the equivalence of the following five statements. Call a set A in S an X-continuity set if P[X ∈ ∂A] = 0.

(i) X_n ⇒ X.
(ii) E[f(X_n)] → E[f(X)] for all bounded, uniformly continuous f.
(iii) lim sup_n P[X_n ∈ F] ≤ P[X ∈ F] for all closed F.
(iv) lim inf_n P[X_n ∈ G] ≥ P[X ∈ G] for all open G.
(v) P[X_n ∈ A] → P[X ∈ A] for all X-continuity sets A.

This requires no proof; it is just a matter of translating the terms. Each theorem about weak convergence can be recast in the same way. Suppose h: S → S' is measurable S/S' and let D_h be the set of its discontinuity points. If X has distribution P, then h(X) has distribution Ph^{-1}. Therefore, the mapping theorem becomes: X_n ⇒ X (on S) implies h(X_n) ⇒ h(X) (on S') if P[X ∈ D_h] = 0.
The following hybrid terminology is convenient. If X_n and X are random elements of S, and if P_n and P are their distributions, then X_n ⇒ X means P_n ⇒ P. But we can just as well write X_n ⇒ P or P_n ⇒ X. Thus there are four contexts for the double arrow:

(3.5)    P_n ⇒ P,
         X_n ⇒ X,
         X_n ⇒ P,
         P_n ⇒ X.

The last three relations are defined by the first. If X_n are random variables having asymptotically the standard normal distribution, this fact is expressed as X_n ⇒ N, and one can interpret N as the standard normal distribution on the line or (better) as any random variable having this distribution. In all that follows, N will be such a random variable, normally distributed with mean 0 and variance 1.
Example 3.1. If S_0 ∈ S, then the Borel σ-field of S_0 for the relative topology is S_0 = [A ∩ S_0: A ∈ S] [M10], and S_0 ⊂ S. If X: Ω → S_0, then X is a random element of S_0 (measurable F/S_0) if and only if it is a random element of S (measurable F/S), the common requirement being that the set [ω: X(ω) ∈ A ∩ S_0] = [ω: X(ω) ∈ A] lie in F for every A in S.

Suppose now that X_n and X are random elements of S_0. The general open set in S_0 is G ∩ S_0 with G open in S. Therefore, there is convergence in distribution in the sense of S_0 if and only if there is
convergence in distribution in the sense of S, the common condition being that lim inf_n P[X_n ∈ G ∩ S_0] ≥ P[X ∈ G ∩ S_0] for every open set G in S. This is essentially the convergence-in-distribution version of Example 2.9.  □
Convergence in Probability
If, for an element a of S,

(3.6)    lim_n P[ρ(X_n, a) ≥ ε] = 0

for each ε, we say X_n converges in probability to a. Conceive of a as a constant-valued random element on an arbitrary space, and suppose that (3.6) holds. If G is open and a ∈ G, then, for small enough ε, lim inf_n P[X_n ∈ G] ≥ lim_n P[ρ(X_n, a) < ε] = 1 = P[a ∈ G], whereas if a ∉ G, then lim inf_n P[X_n ∈ G] ≥ 0 = P[a ∈ G]. Thus (3.6) implies X_n ⇒ a, in the sense of the second line of (3.5). On the other hand, if X_n ⇒ a in this sense, then (3.6) follows because [x: ρ(x, a) < ε] is open. For this reason, we express convergence in probability as

(3.7)    X_n ⇒ a.

This can also be interpreted by the third line of (3.5): Identify a with δ_a. Again, the random elements X_n in (3.7) can be defined on different probability spaces. By the mapping theorem, X_n ⇒ a implies h(X_n) ⇒ h(a) if h is continuous at a.†
Suppose that (X_n, Y_n) is a random element of S × S; this implies (since the projections (x, y) → x and (x, y) → y are continuous) that X_n and Y_n are random elements of S.‡ If ρ is the metric on S, then ρ(x, y) maps S × S continuously to the line, and so it makes sense to speak of the distance ρ(X_n, Y_n), the random variable with value ρ(X_n(ω), Y_n(ω)) at ω. The next two theorems generalize much-used results about random variables.

Theorem 3.1. Suppose that (X_n, Y_n) are random elements of S × S. If X_n ⇒ X and ρ(X_n, Y_n) ⇒ 0, then Y_n ⇒ X.
† For X and X_n all defined on the same Ω, P[ρ(X_n, X) < ε] →_n 1 defines convergence of X_n to X in probability, but this concept is not used in the book until Chapter 5 (except for Problem 3.8). Before that, X will always be a constant and the X_n may be defined on different spaces.
‡ The reverse implication holds if S is separable, but not in complete generality [M10].
PROOF. Apply the convergence-in-distribution version of Theorem 2.1 twice. If F_ε is defined as [x: ρ(x, F) ≤ ε], then

    P[Y_n ∈ F] ≤ P[ρ(X_n, Y_n) ≥ ε] + P[X_n ∈ F_ε].

Since F_ε is closed, the hypotheses imply

    lim sup_n P[Y_n ∈ F] ≤ lim sup_n P[X_n ∈ F_ε] ≤ P[X ∈ F_ε].

If F is closed, then F_ε ↓ F as ε ↓ 0.  □

Take X_n ≡ X:

Corollary. Suppose that (X, Y_n) are random elements of S × S. If ρ(X, Y_n) ⇒ 0, then Y_n ⇒ X.
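The corollary is easy to test numerically on the line. The sketch below is not from the text; the sample sizes, the choice of NumPy, and the use of empirical distribution functions are illustrative assumptions. It perturbs a sequence X_n by a distance that goes to 0 and checks that the distribution of the perturbed sequence approaches that of X.

```python
import numpy as np

# Illustration of Theorem 3.1 on S = R^1 (a sketch, not the author's method):
# if X_n => X and rho(X_n, Y_n) => 0, then Y_n => X.
rng = np.random.default_rng(0)

def empirical_cdf(sample, x):
    """Fraction of the sample at or below x."""
    return float(np.mean(sample <= x))

x = 0.5  # a continuity point of the standard normal limit
for n in (10, 100, 1000, 10000):
    X = rng.standard_normal(n)               # X_n already has the limit law
    Y = X + rng.uniform(-1.0 / n, 1.0 / n, n)  # rho(X_n, Y_n) <= 1/n -> 0
    print(n, empirical_cdf(X, x), empirical_cdf(Y, x))
# Both empirical CDFs at x = 0.5 settle near Phi(0.5), about 0.6915.
```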
Theorem 3.2. Suppose that (X_{un}, X_n) are random elements of S × S. If X_{un} ⇒_n Z_u for each u,† if Z_u ⇒_u X, and if

(3.8)    lim_u lim sup_n P[ρ(X_{un}, X_n) ≥ ε] = 0

for each ε, then X_n ⇒ X.

PROOF. For F_ε defined as before, we have

    P[X_n ∈ F] ≤ P[ρ(X_{un}, X_n) ≥ ε] + P[X_{un} ∈ F_ε].

Since X_{un} ⇒_n Z_u and F_ε is closed,

    lim sup_n P[X_n ∈ F] ≤ lim sup_n P[ρ(X_{un}, X_n) ≥ ε] + P[Z_u ∈ F_ε].

And since Z_u ⇒_u X, (3.8) gives

    lim sup_n P[X_n ∈ F] ≤ P[X ∈ F_ε].

Let ε ↓ 0, as before.  □
Other standard results can be extended beyond the Euclidean case in a routine way:

† X_{un} ⇒_n Z_u indicates weak convergence as n → ∞ with u fixed.
Example 3.2. Assume of the random elements (X_n, Y_n) of S × S that X_n and Y_n are independent in the sense that the events [X_n ∈ A] and [Y_n ∈ B] are independent for A, B ∈ S; assume that X and Y are independent as well. If S is separable, then (Theorem 2.8) X_n ⇒ X and Y_n ⇒ Y imply (X_n, Y_n) ⇒ (X, Y).

If S = R^1, we can go further and conclude from the mapping theorem that, for example, X_nY_n ⇒ XY. If X_n ≡ a_n and X ≡ a, then X_n ⇒ X is the same thing as a_n → a (Example 2.1), and the independence condition is necessarily satisfied: a_n → a and Y_n ⇒ Y imply a_nY_n ⇒ aY.  □
Local vs. Integral Laws
Suppose P_n and P have densities f_n and f with respect to a measure μ on (S, S). If

(3.9)    f_n(x) → f(x)

outside a set of μ-measure 0, then, by Scheffé's theorem [PM.215],

(3.10)    sup_{A∈S} |P_nA − PA| ≤ ∫_S |f_n − f| dμ →_n 0.

Thus (3.9), a local limit theorem, implies P_n ⇒ P, an integral limit theorem. But the reverse implication is false:
Example 3.3. Let P = μ be Lebesgue measure on S = [0, 1]. Take f_n to be n^2 times the indicator of the set ∪_{k=0}^{n−1} (kn^{-1}, kn^{-1} + n^{-3}). Then μ[f_n > 0] = n^{-2}, and by the Borel-Cantelli lemma, f_n(x) → 0 outside a set B of μ-measure 0; redefine f_n as 0 on B, so that f_n(x) → 0 everywhere. If P_n has density f_n with respect to μ, then |P_n[0, x] − x| ≤ 1/n, and it follows that P_n ⇒ P, a limit theorem for integral laws. With respect to μ, P has density f(x) ≡ 1, and so (3.9) does not hold for any x at all: There is no local limit theorem.  □
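Example 3.3 can be checked numerically. The sketch below is illustrative only (the grid of n values and the test point are arbitrary choices, not from the text): it verifies that the integral law P_n[0, x] stays within 1/n of x even though the density f_n vanishes at a fixed point for all large n.

```python
# Numerical check of Example 3.3 (illustrative sketch).
# f_n = n^2 on the union of intervals (k/n, k/n + n^-3), k = 0, ..., n-1.
def P_n_interval(n, x):
    """P_n[0, x] for the measure whose density is f_n."""
    total = 0.0
    for k in range(n):
        a, b = k / n, k / n + n ** -3
        total += n ** 2 * max(0.0, min(x, b) - a)
    return total

x = 0.37
for n in (10, 100, 1000):
    print(n, abs(P_n_interval(n, x) - x))   # at most about 1/n
# Yet f_n(0.37) = 0 for all large n, while the limiting density is 1:
# the integral laws converge although no local limit theorem holds.
```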
Local laws imply integral laws also in the case where S = R^k, P has a density with respect to Lebesgue measure, and P_n is supported by a lattice. Let δ(n) = (δ_1(n), . . . , δ_k(n)) be a point of R^k with positive coordinates, let a(n) = (a_1(n), . . . , a_k(n)) be an arbitrary point of R^k, and denote by L_n the lattice consisting of the points of the form (u_1δ_1(n) − a_1(n), . . . , u_kδ_k(n) − a_k(n)), where u_1, . . . , u_k range independently over the integers, positive and negative. If x is a point of L_n, then

(3.11)    [y: x_i − δ_i(n) < y_i ≤ x_i, i ≤ k]
is a cell of volume v_n = δ_1(n) · · · δ_k(n), and R^k is the union of this countable collection of cells.

Suppose now that P_n and P are probability measures on R^k, where P_n is supported by L_n and P has density p with respect to Lebesgue measure. For x ∈ L_n, let p_n(x) be the P_n-mass (possibly 0) at that point.
Theorem 3.3. Suppose that

(3.12)    δ_i(n) → 0,   i ≤ k,

and that, if x_n is a point of L_n varying with n in such a way that x_n → x, then

(3.13)    p_n(x_n)/v_n → p(x).

Then P_n ⇒ P.
PROOF. Define a probability density q_n on R^k by setting q_n(y) = p_n(x)/v_n if y lies in the cell (3.11). Since x_n → x implies (3.13), it follows by (3.12) that q_n(x) → p(x). Let X_n have the density q_n, and define Y_n on the same probability space by setting Y_n = x if X_n lies in the cell (3.11). We are to prove that Y_n ⇒ P. Since |X_n − Y_n| ≤ |δ(n)|, this will follow by (3.12) and Theorem 3.1 if we prove that X_n ⇒ P. But since q_n converges pointwise to p, this is a consequence of Scheffé's theorem.  □
Example 3.4. If S_n is the number of successes in n Bernoulli trials with success probability p (and q = 1 − p), and v_n = 1/√(npq), then†

    √(npq) P[S_n = k] → (2π)^{-1/2} e^{-x^2/2},

provided k varies with n in such a way that (k − np)/√(npq) →_n x. Therefore, Theorem 3.3 applies to the lattice of points of the form (k − np)/√(npq): P_n is the distribution of (S_n − np)/√(npq) and P is the standard normal distribution. This gives the central limit theorem for Bernoulli trials: (S_n − np)/√(npq) ⇒ N.  □
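The local limit theorem in Example 3.4 is easy to check numerically. The following sketch is illustrative only (the parameter values and the use of the standard library are assumptions, not part of the text): it compares √(npq) P[S_n = k] with the standard normal density at the corresponding lattice point.

```python
import math

# Illustrative check of the Bernoulli local limit theorem (Example 3.4).
def binom_pmf(n, k, p):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

p, x = 0.3, 1.0
for n in (50, 500, 5000):
    sigma = math.sqrt(n * p * (1 - p))
    k = round(n * p + x * sigma)          # lattice point nearest to x
    local = sigma * binom_pmf(n, k, p)
    normal_density = math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
    print(n, round(local, 4), round(normal_density, 4))
# Both columns approach 0.2420, the value of the N(0,1) density at x = 1.
```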
Integration to the Limit
If X_n ⇒ X for random variables, when does EX_n → EX hold?

† Feller [28], p. 184.
Theorem 3.4. If X_n ⇒ X, then E|X| ≤ lim inf_n E|X_n|.

PROOF. By the mapping theorem, |X_n| ⇒ |X|, and therefore P[|X_n| > t] → P[|X| > t] for all but countably many t. By Fatou's lemma on the line,

    E|X| = ∫_0^∞ P[|X| > t] dt ≤ lim inf_n ∫_0^∞ P[|X_n| > t] dt = lim inf_n E|X_n|.  □
The X_n are by definition uniformly integrable if

(3.15)    lim_{α→∞} sup_n ∫_{[|X_n| ≥ α]} |X_n| dP = 0.

This obviously holds if the X_n are uniformly bounded. If α is large enough that the supremum in (3.15) is at most 1, then sup_n E|X_n| ≤ 1 + α < ∞.
Theorem 3.5. If the X_n are uniformly integrable and X_n ⇒ X, then X is integrable and EX_n → EX.

PROOF. Since the E|X_n| are bounded, Theorem 3.4 implies that X is integrable. And since X_n^+ ⇒ X^+ and X_n^- ⇒ X^- by the mapping theorem, the variables can be assumed nonnegative, in which case

(3.16)    EX_n = ∫_0^a P[t < X_n < a] dt + ∫_{[X_n ≥ a]} X_n dP

and

(3.17)    EX = ∫_0^a P[t < X < a] dt + ∫_{[X ≥ a]} X dP.

By uniform integrability, for given ε there is an a such that the second term on the right in each equation is less than ε, and it is therefore enough to show that the first term on the right in (3.16) converges to the corresponding term in (3.17). But a can be chosen in such a way that P[X = a] = 0, and then this follows by the bounded convergence theorem applied in the interval [0, a].  □
0
A simple condition for uniform integrability is that
(3.18)
SUP
n
EIIXnll+'] < 00
for some ε; in this case, (3.15) follows from

    ∫_{[|X_n| ≥ α]} |X_n| dP ≤ α^{-ε} E[|X_n|^{1+ε}] ≤ α^{-ε} sup_m E[|X_m|^{1+ε}].
Theorem 3.6. If X and the X_n are nonnegative and integrable, and if X_n ⇒ X and EX_n → EX, then the X_n are uniformly integrable.

PROOF. From the hypothesis and (3.16) and (3.17) it follows that

    lim_n ∫_{[X_n ≥ a]} X_n dP = ∫_{[X ≥ a]} X dP

if P[X = a] = 0. Choose a so that the limit here is less than a given ε. Then, for n beyond some n_0, ∫_{[X_n ≥ a]} X_n dP < ε. Increase a enough to take care of each of the integrable random variables X_1, X_2, . . . , X_{n_0}.  □
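A small simulation contrasts these results. The example below (a sketch under stated assumptions: the particular family X_n = n with probability 1/n, the sample size, and the use of NumPy are all illustrative choices) has X_n ⇒ 0 and E|X_n| = 1 bounded, so Theorem 3.4 applies, but the family is not uniformly integrable and EX_n fails to converge to EX = 0.

```python
import numpy as np

# Sketch contrasting Theorems 3.4-3.6: X_n = n with probability 1/n, else 0.
# Then X_n => 0 and E|X_n| = 1, but uniform integrability fails and
# EX_n does not converge to 0.
rng = np.random.default_rng(1)
for n in (10, 100, 1000):
    sample = np.where(rng.random(100_000) < 1.0 / n, float(n), 0.0)
    print(n, round(sample.mean(), 3))   # stays near 1, not 0
# Truncating each X_n at a fixed level restores uniform integrability,
# and then the means do converge, as in Theorem 3.5.
```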
Relative Measure*
Let P_T be the probability measure on (R^1, R^1) corresponding to a uniform distribution over [−T, T]:

(3.19)    P_T A = (1/2T) |A ∩ [−T, T]|,   A ∈ R^1,

where the bars refer to Lebesgue measure. Now define P_∞A as

(3.20)    P_∞A = lim_{T→∞} P_T A,

provided the limit exists; this is called the relative measure of A. Note that, since P_∞A = 0 if A is bounded, P_∞ is not countably additive on its domain of definition. For Borel functions f, define

(3.21)    E_∞[f] = lim_{T→∞} (1/2T) ∫_{−T}^{T} f(ω) dω,

provided the integrals and the limit exist; E_∞f is the mean value of f. To write P_∞A or E_∞f is to assert or assume that the corresponding limit exists. If f is bounded and has period T_0, then it has a mean value,† and in fact, E_∞f = E_{T_0}f. And if a set A has period T_0, in the sense that its indicator function does, then P_∞A = P_{T_0}A.

† Almost periodic functions also have means; see Bohr [10].
Suppose that X: R^1 → S is measurable R^1/S. Then X, regarded as a random element defined on (R^1, R^1, P_T), has P_TX^{-1} as its distribution over S. If†

(3.22)    P_TX^{-1} ⇒_T P

for some probability measure P on S, then P is called the distribution of X. By definition, this means that E_∞[f(X)] = Pf for every bounded, continuous f on S, and by Theorem 2.1, it holds if and only if, for every P-continuity set A, P_∞[X ∈ A] = PA. By the mapping theorem, if X has distribution P, and if h is a measurable map from S to S' for which PD_h = 0, then h(X) has distribution Ph^{-1}. We can derive the distributions of some interesting random elements by using standard methods of probability theory. For example, by the continuity theorem for characteristic functions, if f(ω) is real, then it has distribution P if and only if E_∞[exp(itf(ω))] coincides with the characteristic function of P.
If λ is a nonzero real number, then cos λω has period T_λ = 2π/|λ|, and its distribution is described by

(3.23)    P_∞[cos λω ≤ x] = P_{T_λ}[cos λω ≤ x] = 1 − (1/π) arccos x,   −1 ≤ x ≤ +1,

where for the arc cosine we use the continuous version on [−1, +1] that takes the values π and 0 at −1 and +1.
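The relative measure in (3.23) can be approximated directly. The sketch below is illustrative only (the frequency λ, the cutoff T, and the grid are arbitrary assumptions): it averages the indicator of [cos λω ≤ x] over [−T, T] and compares the result with 1 − arccos(x)/π.

```python
import numpy as np

# Numerical illustration of (3.23): the relative measure of
# [omega: cos(lambda*omega) <= x] equals 1 - arccos(x)/pi.
lam, T = 3.7, 500.0
omega = np.linspace(-T, T, 2_000_001)
for x in (-0.5, 0.0, 0.5):
    relative_measure = np.mean(np.cos(lam * omega) <= x)
    print(x, round(float(relative_measure), 4),
          round(1 - np.arccos(x) / np.pi, 4))
```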
By using the simple relation

(3.24)    E_∞[e^{isω}] = 1 if s = 0,   E_∞[e^{isω}] = 0 if s ≠ 0,

we can calculate some moments and characteristic functions. Since E_∞ is obviously linear, it follows by (3.24) and the assumption λ ≠ 0 that

(3.25)    E_∞[cos^r λω] = 2^{-r} (r choose r/2) if r is even,   = 0 if r is odd.

We have

    E_T[e^{it cos λω}] = Σ_{r=0}^{∞} ((it)^r / r!) E_T[cos^r λω],

† In the theory of weak convergence, T can obviously go to infinity continuously just as well as through the integers.
where the expected value can be taken inside the sum because the series on the right converges absolutely. And by the M-test, we can let T go to infinity on each side:

(3.26)    φ(t) := E_∞[e^{it cos λω}] = Σ_{r=0}^{∞} ((it)^r / r!) E_∞[cos^r λω].

Because of (3.23), this characteristic function is the same for all λ.
A two-dimensional version of the same argument shows that

(3.27)    E_∞[exp(it_1 cos λ_1ω + it_2 cos λ_2ω)] = Σ_{r_1=0}^{∞} Σ_{r_2=0}^{∞} ((it_1)^{r_1}/r_1!)((it_2)^{r_2}/r_2!) E_∞[cos^{r_1} λ_1ω · cos^{r_2} λ_2ω].

Suppose now that λ_1 and λ_2 are incommensurable (neither is 0 and λ_1/λ_2 is irrational). We can show in this case that

(3.28)    E_∞[cos^{r_1} λ_1ω · cos^{r_2} λ_2ω] = E_∞[cos^{r_1} λ_1ω] · E_∞[cos^{r_2} λ_2ω].
The argument is like that for (3.25):

(3.29)    E_∞[cos^{r_1} λ_1ω · cos^{r_2} λ_2ω] = 2^{-r_1-r_2} Σ_{j_1=0}^{r_1} Σ_{j_2=0}^{r_2} (r_1 choose j_1)(r_2 choose j_2) E_∞[e^{i((2j_1−r_1)λ_1 + (2j_2−r_2)λ_2)ω}].

By (3.24) and the assumption that λ_1 and λ_2 are incommensurable, this last expected value is 1 or 0 according as 2j_1 − r_1 = 0 = 2j_2 − r_2 or not. But the product

    E_∞[e^{i(2j_1−r_1)λ_1ω}] · E_∞[e^{i(2j_2−r_2)λ_2ω}]

is also 1 or 0 according as 2j_1 − r_1 = 0 = 2j_2 − r_2 or not, and substituting this product into (3.29) gives (3.28); see (3.25). Finally, (3.26), (3.27), and (3.28) together imply

(3.30)    E_∞[exp(it_1 cos λ_1ω + it_2 cos λ_2ω)] = φ(t_1)φ(t_2).
Let η_1, η_2, . . . be independent random variables, each having the characteristic function φ and the distribution function in (3.23). By the continuity theorem for characteristic functions in two dimensions, the random element (cos λ_1ω, cos λ_2ω) of R^2 has the same distribution (in the sense of (3.22)) as the random vector (η_1, η_2), provided λ_1 and λ_2 are incommensurable. Now suppose that λ_1, λ_2, . . . are linearly independent in the sense that m_1λ_1 + · · · + m_kλ_k = 0 for integers m_i can hold only if all the m_i are 0. Then the preceding argument extends from R^2 to R^n. Since the η_k have mean 0 and variance 1/2 by (3.25), it is probabilistically natural to replace them by ξ_k = √2 η_k. By (3.23),

(3.31)    P[ξ_k ≤ x] = 1 − (1/π) arccos(x/√2),   −√2 ≤ x ≤ +√2.

Since the distribution of (ξ_1, . . . , ξ_n) is absolutely continuous with respect to Lebesgue measure in R^n, we arrive at the following theorem.
Theorem 3.7. Suppose that λ_1, λ_2, . . . are linearly independent and that ξ_1, ξ_2, . . . are independent, each distributed according to (3.31). Then (√2 cos λ_1ω, . . . , √2 cos λ_nω) has the distribution of (ξ_1, . . . , ξ_n) for each n:

(3.32)    P_∞[(√2 cos λ_1ω, . . . , √2 cos λ_nω) ∈ A] = P[(ξ_1, . . . , ξ_n) ∈ A]

if ∂A has Lebesgue measure 0.
Since the ξ_k have mean 0 and variance 1, the Lindeberg-Lévy theorem implies a result of Kac and Steinhaus:

(3.33)    P_∞[ x ≤ n^{-1/2} Σ_{k≤n} √2 cos λ_kω ≤ y ] →_n (2π)^{-1/2} ∫_x^y e^{-u^2/2} du.

This approximates the relative time a superposition of vibrations with incommensurable frequencies spends between x and y. For a refinement, see Theorem 11.2.
Return to the random variables η_k, and let ψ(x) = (2π)^{-1} arccos x, with the arc cosine defined as in (3.23). Then the distribution function on the right in (3.23) is F(x) = 1 − 2ψ(x) for −1 ≤ x ≤ +1, and since F(η_k) is uniformly distributed over [0, 1] (the probability transformation), β_k = ψ(η_k) is uniformly distributed over [0, 1/2]. And for every x, ψ(cos 2πx) is ⟨x⟩, the distance from x to the nearest integer. From the linear independence of {λ_k} follows that of {2πλ_k}, and so
the cos 2πλ_kω are distributed like the η_k. By the mapping theorem, we can apply ψ to each cos 2πλ_kω if we replace each η_k by β_k. Therefore: Suppose that λ_1, λ_2, . . . are linearly independent and that β_1, β_2, . . . are independent random variables, each uniformly distributed over [0, 1/2]. Then, for each n,

(3.34)    P_∞[(⟨λ_1ω⟩, . . . , ⟨λ_nω⟩) ∈ A] = P[(β_1, . . . , β_n) ∈ A]

if ∂A has Lebesgue measure 0, a connection with Diophantine approximation.
Suppose that 0 < α < 1/2 and take A to be the set of (x_1, . . . , x_n) such that 0 ≤ x_i ≤ α for exactly k values of i. If U_n(ω, α) is the number of i, 1 ≤ i ≤ n, for which ⟨λ_iω⟩ ≤ α, then the event on the left in (3.34) is [U_n(ω, α) = k] (for the corresponding A, ∂A has Lebesgue measure 0), and the right side is a binomial probability:

(3.35)    P_∞[U_n(ω, α) = k] = (n choose k)(2α)^k (1 − 2α)^{n−k}.

We can apply the binomial central limit theorem here, either the local or the integral one. Or take a Poisson limit:
(3.36)
Three Lemmas*
The rest of the section is needed only for the proofs of Theorems 14.4
and 14.5 and in Section 17.
Theorem 3.8. If (X_n, X) is a random element of S × S and ρ(X_n, X) ⇒ 0, and if A is an X-continuity set, then it follows that P([X_n ∈ A] Δ [X ∈ A]) → 0.
PROOF. For each positive ε,

    P[X_n ∈ A, X ∉ A] ≤ P[ρ(X_n, X) ≥ ε] + P[ρ(X, A) < ε, X ∉ A].

This, the same inequality with A^c in place of A, and the assumption ρ(X_n, X) ⇒ 0, together imply

    lim sup_n P([X_n ∈ A] Δ [X ∈ A]) ≤ P[ρ(X, A) < ε, X ∉ A] + P[ρ(X, A^c) < ε, X ∈ A].

If A is an X-continuity set, then the right side goes to 0 as ε → 0.  □
Let (X', X'') and (X'_n, X''_n) be random elements of T = S' × S''. If T is separable, then, by Theorem 2.8(i),

(3.37)    (X'_n, X''_n) ⇒ (X', X'')

holds if and only if

(3.38)    P[X'_n ∈ A', X''_n ∈ A''] → P[X' ∈ A', X'' ∈ A'']

holds for all X'-continuity sets A' and all X''-continuity sets A''.

Theorem 3.9. Suppose that T is separable. If X'_n ⇒ X' and X''_n ⇒ a'', then (X'_n, X''_n) ⇒ (X', a'').

PROOF. We must verify (3.38) for X'' = a''. Suppose that A' is an X'-continuity set and that a'' ∉ ∂A''. If a'' ∈ A'', then P[X''_n ∉ A''] → 0, and (3.38) follows from X'_n ⇒ X' and

    P[X'_n ∈ A'] − P[X''_n ∉ A''] ≤ P[X'_n ∈ A', X''_n ∈ A''] ≤ P[X'_n ∈ A'].

If a'' ∉ A'', then (3.38) follows from

    P[X'_n ∈ A', X''_n ∈ A''] ≤ P[X''_n ∈ A''] → 0.  □
Suppose that Y'' and all the (X'_n, X''_n) are defined on the same (Ω, F, P). Suppose that F_0 is a field in F and let F_1 = σ(F_0).

Theorem 3.10. Suppose that T is separable, X' and X'' are independent, and X'' has the same distribution as Y''. If ρ(X''_n, Y'') ⇒ 0, if

(3.39)    P([X'_n ∈ A'] ∩ E) → P[X' ∈ A']P(E)

for each X'-continuity set A' and each E in F_0, and if each X'_n is measurable F_1/S', then (X'_n, X''_n) ⇒ (X', X'').

PROOF. Fix an X'-continuity set A' and an X''-continuity set A''. It follows from the hypothesis and Rényi's theorem on mixing sequences [M21] that (3.39) holds for every E in F, and therefore,

(3.40)    P[X'_n ∈ A', Y'' ∈ A''] → P[X' ∈ A']P[Y'' ∈ A''].

We are to prove (3.38), which, since X' and X'' are independent and the latter has the same distribution as Y'', reduces to

    P[X'_n ∈ A', X''_n ∈ A''] → P[X' ∈ A']P[Y'' ∈ A''].

Since ρ(X''_n, Y'') ⇒ 0, it follows by Theorem 3.8 that this is the same thing as (3.40).  □
Problems
3.1. Let X_1, X_2, . . . be independent random elements having a common distribution P on S. Let P̂_n^ω be the empirical measure corresponding to the observation (X_1(ω), . . . , X_n(ω)):

    P̂_n^ω(A) = (1/n) Σ_{k=1}^n I_A(X_k(ω)).

Show that, if S is separable, then P̂_n^ω ⇒_n P with probability 1. Use the strong law of large numbers and Theorem 2.3.

3.2. If X' and X'' are random elements of S' and S'', then (X', X''): Ω → T = S' × S'' is measurable F/S' × S''. And (X', X'') is a random element of T if T is separable but may not be a random element in the nonseparable case [M10].

3.3. Let X_n and X be random elements of a separable S, defined on (Ω, F, P). Then [ω: X_n(ω) → X(ω)] lies in F; if it has probability 1, then ρ(X_n, X) ⇒ 0, and hence X_n ⇒ X.

3.4. Prove directly that P_n ⇒ δ_a if and only if P_n(B(a, ε)) → 1 for positive ε. Prove directly from the definition (3.6) that X_n ⇒ a implies h(X_n) ⇒ h(a) if h is continuous at a.

3.5. Apply the method of Example 3.4 to the hypergeometric distribution (see Problem 10 on p. 194 of Feller [28]).

3.6. Suppose there is a nonnegative random variable Y such that P[|X_n| ≥ t] ≤ P[Y ≥ t] for all n and all positive t, and EY < ∞ (Y and the X_n can be defined on different spaces). Show that X_n ⇒ X implies EX_n → EX. This corresponds to the dominated convergence theorem.

3.7. Prove (in the notation of (3.34)) that

3.8. For random variables η_n and η on the same probability space, write η_n →_p η if η_n − η ⇒ 0 in the sense of (3.6). Let S_n be the partial sums of independent and identically distributed random variables with mean 0 and variance 1. By the Lindeberg-Lévy theorem, S_n/√n ⇒ N. Show that S_n/√n →_p η cannot hold for any random variable η. But see Chapter 5.
SECTION 4. LONG CYCLES AND LARGE DIVISORS*
Here we study the cycle structure of random permutations on n letters, obtaining in particular the joint limiting distribution (n → ∞) of the greatest cycle lengths. And we derive analogous results on the limiting distribution of the largest prime divisors of an integer drawn at random from among the first n integers. These distributions are described by a certain measure, the Poisson-Dirichlet measure, on a certain space of sequences. We study weak convergence in this space, obtaining results
that have applications to population biology as well as to long cycles
and large divisors. The essential tool is the mapping theorem.
Long Cycles
Every permutation can be written as a product of cycles. For example, the permutation

(4.1)    (1 4 2 7)(3)(5 6)

on the seven "letters" 1, . . . , 7 sends 1 → 4 → 2 → 7 → 1, 3 → 3, and 5 → 6 → 5. To standardize the representation, start the first cycle with 1 and start each successive cycle with the smallest integer not yet encountered.
Of the n! permutations on 1, . . . , n, for how many does the leftmost cycle have length i (1 ≤ i ≤ n)? The answer is

(4.2)    (n − 1)(n − 2) · · · (n − i + 1) × (n − i)! = (n − 1)!

To see this, note that what we want is the number of products αβ, where α is an i-long cyclic permutation of 1 together with i − 1 other letters and β is a permutation of the remaining n − i letters. The number of ways of choosing α is the product to the left of the × in (4.2); and, α having been chosen, the number of ways of then choosing β is (n − i)!.
Suppose now that a permutation is chosen at random, all n! possibilities having the same probability, and suppose that it is written as a product of cycles in standard order, on the pattern of (4.1). That is to say, the first cycle starts with 1, the second cycle starts with the smallest element not contained in the first cycle, and so on. Let C_1^n, C_2^n, . . . be the lengths of the successive cycles; if the number of cycles is ν, set C_i^n = 0 for i > ν (Σ_i C_i^n = n). It follows by (4.2) that P[C_1^n = i] = 1/n for 1 ≤ i ≤ n. Suppose that C_1^n = i and that in fact the initial cycle consists of i specified letters (including 1, of course). Then there remain n − i letters that make up the rest of the permutation, and conditionally, the chance that C_2^n = j is just the probability that the first cycle in a random permutation on the remaining n − i letters has length j, and this is 1/(n − i) for 1 ≤ j ≤ n − i. Therefore,

(4.3)    P[C_1^n = i] = 1/n,   1 ≤ i ≤ n,
         P[C_1^n = i, C_2^n = j] = (1/n)(1/(n − i)),   2 ≤ i + j ≤ n,
and so on.
Let L_i^n = C_i^n/n be the relative cycle lengths, and let B_1, B_2, . . . be independent random variables, each uniformly distributed over [0, 1]. (The uniform distribution has the beta-(1,1) density [M11]; hence the notation B_i.) Since L_1^n is uniformly distributed over its possible values i/n, it should for large n be distributed approximately as B_1. That this is true can be shown by the argument in Example 2.2: If f is continuous over [0, 1], then E[f(L_1^n)] = n^{-1} Σ_{i=1}^n f(i/n) is a Riemann sum converging to ∫_0^1 f(x) dx = E[f(B_1)]. Therefore, L_1^n ⇒ B_1.

If C_1^n = i, then C_2^n is, conditionally, uniformly distributed over the values j for j ≤ n − i = n − C_1^n. Therefore, conditionally on L_1^n = i/n, L_2^n is uniformly distributed over the values j/n for j/n ≤ 1 − i/n = 1 − L_1^n. This makes it plausible that (L_1^n, L_2^n/(1 − L_1^n)) will be, in the limit, uniformly distributed over the unit square, like (B_1, B_2). This follows by a two-dimensional Riemann-sum argument: If f is continuous over the square [0, 1]^2, then E[f(L_1^n, L_2^n/(1 − L_1^n))] is a two-dimensional Riemann sum converging to ∫_0^1 ∫_0^1 f(x, y) dx dy = E[f(B_1, B_2)]. Therefore, (L_1^n, L_2^n/(1 − L_1^n)) ⇒ (B_1, B_2), and it follows by the mapping theorem that (L_1^n, L_2^n) ⇒ (B_1, (1 − B_1)B_2).
Define

(4.4)    G_1 = B_1,   G_i = (1 − B_1) · · · (1 − B_{i−1})B_i,   i > 1.

A three-dimensional version of the argument given above shows that (L_1^n, L_2^n/(1 − L_1^n), L_3^n/(1 − L_1^n − L_2^n)) ⇒ (B_1, B_2, B_3), which, by the mapping theorem, implies (L_1^n, L_2^n, L_3^n) ⇒ (G_1, G_2, G_3). This extends to the general r: (L_1^n, . . . , L_r^n) ⇒_n (G_1, . . . , G_r). We can regard (L_1^n, L_2^n, . . .) and (G_1, G_2, . . .) as random elements of R^∞ (Example 2.6), and since the finite-dimensional sets in R^∞ form a convergence-determining class, we have

(4.5)    L^n = (L_1^n, L_2^n, . . .) ⇒_n G = (G_1, G_2, . . .).
This describes the limiting distribution of the relative cycle lengths
of a random permutation when it is written as a product of cycles in
standard order.
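The convergence in (4.5) is easy to see in simulation. The sketch below is not from the text; the helper name, the permutation size, and the number of replications are illustrative assumptions. It extracts cycle lengths in standard order and compares the first relative length with a uniform variable.

```python
import numpy as np

# Simulation sketch for (4.5): relative cycle lengths of a random
# permutation, taken in standard order, behave like stick-breaking.
rng = np.random.default_rng(3)

def relative_cycle_lengths(n):
    """Cycle lengths of a uniform random permutation, standard order, over n."""
    perm = rng.permutation(n)
    seen, lengths = np.zeros(n, dtype=bool), []
    for start in range(n):            # each new cycle starts at the smallest
        if not seen[start]:           # letter not yet encountered
            j, length = start, 0
            while not seen[j]:
                seen[j] = True
                j, length = perm[j], length + 1
            lengths.append(length / n)
    return lengths

n, reps = 1000, 2000
first = [relative_cycle_lengths(n)[0] for _ in range(reps)]
print(round(np.mean(first), 3), round(np.var(first), 3))
# Compare with B_1 uniform on [0,1]: mean 1/2, variance 1/12 (about .083).
```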
Induction shows that 1 − Σ_{i=1}^r G_i = Π_{i=1}^r (1 − B_i), and since, by the Borel-Cantelli lemma, there is probability 1 that B_i > 1/2 holds infinitely often, it follows that Σ_{i=1}^∞ G_i = 1 with probability 1. The random sequence G can be viewed as a description of a random dissection of the unit interval into an infinite sequence of subintervals. A piece of length B_1 is broken off at the left, which leaves a piece of length 1 − B_1. From this, a piece of length (1 − B_1)B_2 is broken off, which leaves a piece of length (1 − B_1)(1 − B_2), and so on. What (4.5) says is that the splitting of a random permutation into cycles is, in the limit, like this random dissection of an interval.

Let L_(·)^n = (L_(1)^n, L_(2)^n, . . .) be L^n with the components ranked by size: L_(1)^n ≥ L_(2)^n ≥ · · ·. The details of how this is to be done are taken up below. To study the asymptotic properties of the longest cycles of a random permutation is to study the limiting distribution of L_(·)^n. Let G_(·) be G with the components ranked by size. After the appropriate groundwork, we will be in a position to deduce L_(·)^n ⇒ G_(·) from (4.5) by means of the mapping theorem, as well as to describe the finite-dimensional distributions of G_(·). It will be very much worth our while to do this in a more general context.
The Space Δ

Consider the subset

(4.6)    Δ = [x ∈ R^∞: x_i ≥ 0 for all i, Σ_i x_i = 1]

of R^∞ with the relative topology, that of coordinatewise convergence. Let Δ_{ε,k} be the set of x in R^∞ for which x_1, . . . , x_k ≥ 0 and 1 − ε ≤ Σ_{i≤k} x_i ≤ 1; it is closed in R^∞, and since Δ = ∩_ε ∪_{k≥1/ε} Δ_{ε,k} (intersection over the positive rationals ε), (4.6) is a Borel set, a member of R^∞. From Σ_i C_i^n = n follows L^n ∈ Δ, and since Σ_i G_i = 1 with probability 1, P[G ∈ Δ] = 1. This leads us to study weak convergence in the space Δ.
If Y^n and Y are random elements of Δ, then they are also random elements of R^∞; see Example 3.1. And Y^n ⇒ Y holds in the sense of Δ if and only if it holds in the sense of R^∞, which in turn holds if and only if (Y_1^n, . . . , Y_r^n) ⇒ (Y_1, . . . , Y_r) holds for each r. Therefore:
Y^n ⇒ Y holds for random elements of Δ if and only if (Y_1^n, . . . , Y_r^n) ⇒ (Y_1, . . . , Y_r) holds for each r.
Change the G in (4.5) on a set of probability 0 in such a way that Σ_i G_i = 1 holds identically, so that G becomes a random element of Δ. Now (4.5) is to be interpreted as convergence in distribution on Δ rather than on R^∞.
We turn next to ranking. The ranking function ρ: Δ → Δ is defined this way: If x ∈ Δ, then x_i →_i 0, and so there is a maximum x_i, and this is the first component of y = ρx. There may of course be ties for the maximum (finitely many), and if there are j components x_i of maximum size, take y_1, . . . , y_j to have this common value. Now take y_{j+1} to be the next-largest component of x, and so on: ρ is then defined in an unambiguous way.
Lemma 1. The ranking function ρ: Δ → Δ is continuous.

PROOF. To prove that x^n → x implies ρx^n → ρx, we can start by applying an arbitrary permutation to the coordinates of x and applying the same permutation to the coordinates of each x^n: This does not change the hypothesis, which requires convergence x_i^n →_n x_i in each coordinate individually, and it does not change the conclusion because it leaves ρx and the ρx^n invariant. Use the permutation that ranks the x_i. Assume then that x_1 ≥ x_2 ≥ · · · (and hence ρx = x) and that x^n → x; we are to prove that ρx^n → x. Since Σ_i x_i^n = Σ_i x_i = 1 and the coordinates are all nonnegative, it follows by Scheffé's theorem that Σ_i |x_i^n − x_i| →_n 0. Suppose that

    a_1 = x_1 = · · · = x_{i_1} > a_2 = x_{i_1+1} = · · · = x_{i_2} > a_3 = x_{i_2+1} = · · · .

For n beyond some n_ε, the coordinates x_1^n, . . . , x_{i_1}^n all lie in the range a_1 ± ε, x_{i_1+1}^n, . . . , x_{i_2}^n all lie in the range a_2 ± ε, and x_{i_2+1}^n, x_{i_2+2}^n, . . . are all less than a_3 + ε. If a_1 − ε > a_2 + ε > a_2 − ε > a_3 + ε, then (for n > n_ε) the first i_1 coordinates of ρx^n are some permutation of x_1^n, . . . , x_{i_1}^n, all of which are within ε of a_1; and the next i_2 − i_1 coordinates are some permutation of x_{i_1+1}^n, . . . , x_{i_2}^n, all of which are within ε of a_2. A simple extension of this argument proves continuity.  □
An immediate consequence of Lemma 1 and the mapping theorem:

Theorem 4.1. If Y^n and Y are random elements of Δ, and if Y^n ⇒_n Y, then ρY^n ⇒_n ρY.

Since (4.5) now means weak convergence in Δ, we can conclude that the vector L_(·)^n of ranked relative cycle lengths satisfies

(4.7)    L_(·)^n = ρL^n ⇒_n ρG = G_(·).
For this result to have real significance, we need detailed information about the distribution of G_(·) over Δ.

The Poisson-Dirichlet Distribution
Let B_1, B_2, . . . be independent random variables on (Ω, F, P), each having the beta-(1, θ) density θ(1 − x)^{θ−1} over [0, 1], where θ > 0. If θ = 1, the B_i are uniformly distributed. Define G = (G_1, G_2, . . .) by (4.4), as in the special case above. For the general θ, we still have Σ_i G_i = 1 with probability 1; alter G on a set of probability 0 so as to make it a random element of Δ. The distribution PG^{-1} of G, a probability measure on Δ, is called the GEM distribution with parameter θ. And as before, let G_(·) = ρG = (G_(1), G_(2), . . .) be the ranked version of G. The distribution PG_(·)^{-1} of G_(·) on Δ is the Poisson-Dirichlet distribution with parameter θ.
For θ = 1, the GEM distribution describes the random splitting of the unit interval into a sequence of subintervals, and the Poisson-Dirichlet distribution describes the random splitting after the pieces have been ordered by size (the longest piece to the left). And, also for θ = 1, by (4.5) the GEM distribution describes the asymptotic distribution of the relative cycle lengths of a random permutation when the cycles are written in standard order, and by (4.7) the Poisson-Dirichlet distribution describes the asymptotic distribution when the cycles are written in order of size (the longest one first).
We can get a complete specification of the Poisson-Dirichlet distribution for the general θ by an indirect method that comes from population genetics. Imagine a population of individuals (genes) divided into n types (alleles),† the proportion or relative frequency of individuals of the ith type being Z_i, i = 1, . . . , n. In the diffusion approximation to the Wright-Fisher model for the evolution of the population, the stationary distribution for the relative frequency vector (Z_1, . . . , Z_{n−1}) has the symmetric Dirichlet density [M13]

(4.8)    (Γ(nα)/Γ(α)^n) (z_1 · · · z_{n−1})^{α−1} (1 − z_1 − · · · − z_{n−1})^{α−1}

for z_1, . . . , z_{n−1} > 0, z_1 + · · · + z_{n−1} < 1.

† In the genetics literature, the number of types is denoted by K. For a general treatment of mathematical population genetics, see Ewens [27]. The account here is self-contained but ignores the biology.
Here α represents the mutation rate per type. To study the limiting case of a population in which all mutations produce new types, there being infinitely many such types available, let n tend to infinity while the total mutation rate θ = nα is held fixed. This forces the individual components to 0, and so we focus on the largest components, the components corresponding to types having appreciable proportions in the limiting population, which means studying the first r components Z_(1), . . . , Z_(r) of the ranked relative frequencies.

The program is first to derive the limiting distribution of the ranked frequencies, and then later to show that this is in fact the Poisson-Dirichlet distribution with parameter θ. Some of the arguments involve considerable uphill calculation, and these are put at the end of the section, so as not to obscure the larger structure of the development.

Changing the notation, let Z^n be a random element of Δ for which (Z_1^n, . . . , Z_{n−1}^n) is distributed according to the density (4.8), Z_n^n = 1 − Z_1^n − · · · − Z_{n−1}^n, and Z_i^n = 0 for i > n. And let Z_(·)^n = ρZ^n. Here n will go to infinity, α will be a positive function of n such that nα → θ > 0, and r will be fixed. Define

(4.9)    M_r = [(z_1, . . . , z_r): z_1 > z_2 > · · · > z_r > 0, z_1 + · · · + z_r < 1].

Let g_m(·; α) be the m-fold convolution of the density αz^{α−1} on (0, 1), and write (assume n > r + 2) (n)_r = n(n − 1) · · · (n − r + 1).
Lemma 2. The density of ZT;,, . . . ,Zc, at points of
(4.10)
r ( n 4a - - ( n - T )
a-1
Z1
..,
a-1
'T-1
(n-r+l)a-2
'T
gn-T
(
MT
as
1 - Z1 - . . . ZT
For the proof, see the end of the section. To carry the calculation further, we need the limit of g_m(·; α). Suppose that θ > 0. For k ≥ 1, let C_k(x) be the (s_1, . . . , s_k)-set where s_1, . . . , s_k > 1 and Σ_{i=1}^k s_i < x, and define

(4.11)    J_k(x; θ) = ∫_{C_k(x)} (s_1 · · · s_k)^{-1} (x − s_1 − · · · − s_k)^{θ−1} ds_1 · · · ds_k;

note that, for x ≤ k, C_k(x) = ∅ and J_k(x; θ) = 0. Define J_0(x; θ) = x^{θ−1}.
Lemma 3. For 0 < x < m, we have
(4.12) g m ( x ; a )=
0
C (-l)k mk
O<k<x
rm-k(+m
r((m- k ) a )
where for k = 0 the integral is to be replaced b y xma-l. If m + 00 and
ma + 8 > 0, then gm(x;a ) converges to
(4.13)
f o r each positive x ; ge is a probability density o n ( 0 , ~ having
)
Laplace
transf o r m
Although ge is integrable, term-by-term integration in (4.13) gives
C f ~ Here
. y is Euler's constant and E l ( t ) is
e-"u-'du, the
exponential integral function. The sum in (4.13) can be extended to
all k 2 0, since the summand vanishes for k 2 x . Again the proof is
deferred.
Lm
Theorem 4.2. A s n 00 and na 8 ( r f i e d ) , the density (4.10)
of Zc,, . . . ,Zc, converges to the probability density
-+
-+
(4.15)
o n the set
MT
defined by (4.9).
Let d_r(z_r; θ) be (4.15) with z_1, . . . , z_{r−1} integrated out (for fixed z_r, (z_1, . . . , z_{r−1}, z_r) ranges over M_r), so that d_r(·; θ) is the rth marginal density.
Theorem 4.3. The rth marginal density of (4.15) is
(4.16)
dT(x;8) = xe-2
C
O<k<X-'-T
(- q k e T f k
l-x
JkST - 1 7
;0)
k ! ( r - l)!
(
for 0 < x < r^{-1}. The corresponding moments, for m = 0, 1, 2, . . . , are

(4.17)
The two preceding theorems are proved at the end of the section.
Size-Biased Sampling
Let X = (X_1, X_2, . . .) be a random element of Δ. The size-biased version of X is another random element X̂ = (X̂_1, X̂_2, . . .) of Δ defined informally in the following way. Take X̂_1 = X_i with probability X_i. That done, take X̂_2 = X_j (j ≠ i) with the conditional probability X_j/(1 − X_i); to put it another way, take X̂_1 = X_i, X̂_2 = X_j (i ≠ j) with probability X_i · X_j/(1 − X_i). Continue in this way. We must set up a probability mechanism for making the choices, and we must avoid divisions by 0.
Replace the probability space on which X is defined by its product with another space, in such a way that the enlarged space supports random variables ξ_1, ξ_2, . . . that are independent of X and of each other and each is uniformly distributed over [0, 1]; arrange that 0 < ξ_u(ω) < 1 for every u and ω. Let S_i = X_1 + · · · + X_i (S_0 = 0); since X is a random element of Δ, S_i ↑ 1. Let ζ_u = i if S_{i−1} < ξ_u ≤ S_i. Then, conditionally on X, the ζ_u are independent and each assumes the value i with probability X_i (if X_i = 0, then [ζ_u = i] = ∅). Let τ_1 = ζ_1; let τ_2 = ζ_u, where u is the smallest index for which ζ_u is distinct from τ_1; let τ_3 = ζ_v, where v is the smallest index for which ζ_v is distinct from τ_1 and τ_2 (v necessarily exceeds u); and so on. If we define X̂_u = X_{τ_u}, then X̂ = (X̂_1, X̂_2, . . .) will have the structure we want, provided we accommodate the case where we run out of new values for the τ_u.

Suppose that the procedure just described defines τ_1, . . . , τ_u unambiguously but that X_{τ_1} + · · · + X_{τ_u} = 1. Then τ_{u+1} is not defined, because there is no ζ_w distinct from τ_1, . . . , τ_u. In this case, take τ_v = ∞ and X̂_v = 0 for v > u. Under this definition, if X has infinitely many positive components, then all the components of X̂ are positive, and they are a (random) permutation of the positive components of X; and if X has exactly u positive components, then X̂_1, . . . , X̂_u are positive,
a permutation of the positive components of X, and X̂_v = 0 for v > u. And Σ_{v=1}^∞ X̂_v = Σ_{i=1}^∞ X_i = 1 in any case, so that X̂ is a random element of Δ (verify in succession the measurability of ζ_u, τ_u, X̂_u). This defines the size-biased version X̂ of X.
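For a sequence with only finitely many positive components, size-biased sampling amounts to drawing the positive entries one at a time with probabilities proportional to their sizes. The sketch below is an illustration under that simplifying assumption (the function name and the finite-vector setup are not from the text, which handles the general element of Δ).

```python
import numpy as np

# Size-biased permutation of the positive entries of a finite vector
# (an illustrative simplification of the construction in the text).
rng = np.random.default_rng(5)

def size_biased(x):
    x = np.asarray(x, dtype=float)
    order, remaining = [], list(np.flatnonzero(x > 0))
    while remaining:
        probs = x[remaining] / x[remaining].sum()
        pick = rng.choice(len(remaining), p=probs)   # draw proportional to size
        order.append(remaining.pop(pick))
    return x[order]

print(size_biased([0.5, 0.3, 0.2, 0.0]))
# Over many draws, the first component equals x_i with relative frequency
# close to x_i, matching the informal description above.
```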
The significance of size-biasing for the biological model is this. If (X_1, . . . , X_r) has the density (4.15) for each r, then X = (X_1, X_2, . . .) describes a population with infinitely many types and mutation rate θ. Now X̂ = (X_{τ_1}, X_{τ_2}, . . .), and for a given X, X_{τ_1} is the conditional probability that τ_1 is the oldest type, X_{τ_2} is the conditional probability that τ_2 is the second oldest type, and so on. For us the significance of size-biasing will be that we can use it to show that the densities (4.15) specify the finite-dimensional distributions of the Poisson-Dirichlet distribution on Δ. This will give us the asymptotic distribution of the lengths of the long cycles in a random permutation, and we can also use size-biasing to analyze the distribution of the large prime divisors of a random integer.
Return now to the Z^n of Theorems 4.2 and 4.3. By (4.8) and Dirichlet's formula [M12], (Z_1^n, . . . , Z_r^n) has at (z_1, . . . , z_r) the density

(4.18)    h_r(z_1, . . . , z_r) = (Γ(nα)/(Γ(α)^r Γ((n − r)α))) (z_1 · · · z_r)^{α−1} (1 − z_1 − · · · − z_r)^{(n−r)α−1}

if r < n, z_1, . . . , z_r > 0, and z_1 + · · · + z_r < 1; and by symmetry, this is also the density for Z_{i_1}^n, . . . , Z_{i_r}^n for any of the (n)_r permutations i_1, . . . , i_r of size r from {1, 2, . . . , n}.
For size-biasing on Z^n,

(4.19)    P[τ_1 = i_1, . . . , τ_r = i_r ‖ Z^n] = Z_{i_1}^n · (Z_{i_2}^n/(1 − Z_{i_1}^n)) · · · (Z_{i_r}^n/(1 − Z_{i_1}^n − · · · − Z_{i_{r−1}}^n)) = p(Z_{i_1}^n, . . . , Z_{i_r}^n);

the last equality defines p, and there are no divisions by 0. Since (Ẑ_1^n, . . . , Ẑ_r^n) = (Z_{τ_1}^n, . . . , Z_{τ_r}^n),

    P[(Ẑ_1^n, . . . , Ẑ_r^n) ∈ H ‖ Z^n] = Σ p(Z_{i_1}^n, . . . , Z_{i_r}^n) I_H(Z_{i_1}^n, . . . , Z_{i_r}^n),

where the sum extends over all (n)_r choices for i_1, . . . , i_r. And now it follows by symmetry that

(4.20)    P[(Ẑ_1^n, . . . , Ẑ_r^n) ∈ H] = (n)_r ∫_H p(z_1, . . . , z_r) h_r(z_1, . . . , z_r) dz_1 · · · dz_r.
Let [0, 1]^∞ be the subspace of R^∞ consisting of the points x satisfying 0 ≤ x_i ≤ 1 for all i. Suppose that B^n is a random element of [0, 1]^∞ such that the components B_1^n, B_2^n, . . . are independent and B_i^n has the beta-(α + 1, (n − i)α) density

(4.21)    (Γ((n − i + 1)α + 1)/(Γ(α + 1)Γ((n − i)α))) x^α (1 − x)^{(n−i)α−1}

on (0, 1). Define G^n by G_1^n = B_1^n and G_i^n = (1 − B_1^n) · · · (1 − B_{i−1}^n)B_i^n. Consider the map given by z_1 = x_1, z_i = (1 − x_1) · · · (1 − x_{i−1})x_i, 1 < i ≤ r. The inverse map is x_1 = z_1, x_i = z_i/(1 − z_1 − · · · − z_{i−1}), 1 < i ≤ r, and so (G_1^n, . . . , G_r^n) has density

(4.22)

Since the Jacobian is Π_{i=2}^r (1 − z_1 − · · · − z_{i−1})^{-1}, algebra reduces (4.22) to the integrand in (4.20):

(4.23)    (Ẑ_1^n, . . . , Ẑ_r^n) =_d (G_1^n, . . . , G_r^n),

in the sense that the two have the same distribution.
As n → ∞ and nα → θ, (4.21) converges to the beta-(1, θ) density for 1 ≤ i ≤ r. If B_1, B_2, . . . are independent, each having this limiting distribution, then of course (B_1^n, . . . , B_r^n) ⇒_n (B_1, . . . , B_r). And if G_1, G_2, . . . are defined in terms of these B_i by (4.4), as before, then the mapping theorem gives (G_1^n, . . . , G_r^n) ⇒_n (G_1, . . . , G_r). It follows by (4.23) that (Ẑ_1^n, . . . , Ẑ_r^n) ⇒_n (G_1, . . . , G_r), and since this holds for each r, Ẑ^n ⇒_n G. And further, since the components of Ẑ^n are a (random) permutation of those of Z^n, ρZ^n and ρẐ^n coincide, and it follows by Theorem 4.1 that

(4.24)    Z_(·)^n = ρZ^n = ρẐ^n ⇒_n ρG = G_(·).

Since Theorems 4.2 and 4.3 give the limiting densities for the components of Z_(·)^n, we have a complete description of the Poisson-Dirichlet distribution:

Theorem 4.4. The random vector (G_(1), . . . , G_(r)) has density (4.15), G_(r) has density (4.16), and the moments E[(G_(r))^m] are given by (4.17).
Numerical values for some of these distributional quantities can be calculated. Below is a short table of the distribution of G_(1) for the case θ = 1. The median is e^{-1/2}, which is about .61, and so there is, for large n, probability about 1/2 that the length of the longest cycle exceeds .61 × n.

    x               .20   .30   .40   .50   .60   .70   .80   .90
    P[G_(1) ≤ x]    .00   .02   .13   .31   .49   .65   .78   .90

Moments have also been computed: For θ = 1, the first three components G_(1), G_(2), G_(3) of G_(·) have means .62, .21, .09 and standard deviations .19, .11, .07. [A graph of d_1(·; 1) appears at this point in the original.]
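The table can be reproduced approximately by Monte Carlo, using a stick-breaking sampler like the one sketched earlier. The replication count and truncation level below are arbitrary illustrative assumptions.

```python
import numpy as np

# Monte Carlo check of the table of P[G_(1) <= x] for theta = 1.
rng = np.random.default_rng(6)

def longest_piece(theta=1.0, k=200):
    """Largest of the first k stick-breaking pieces (truncation error is tiny)."""
    b = rng.beta(1.0, theta, size=k)
    stick = np.concatenate(([1.0], np.cumprod(1.0 - b)[:-1]))
    return float((stick * b).max())

draws = np.array([longest_piece() for _ in range(20_000)])
for x in (0.4, 0.6, 0.8):
    print(x, round(float(np.mean(draws <= x)), 3))   # compare with .13, .49, .78
```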
Large Prime Divisors
Let N_n be an integer randomly chosen from among the first n integers: P[N_n = m] = 1/n for 1 ≤ m ≤ n. Let Q_{ni} be the distinct prime divisors of N_n, ordered by size: Q_{n1} > Q_{n2} > · · ·, where Q_{nu} = 1 if N_n has fewer than u distinct prime divisors,† and let T_n = Π_u Q_{nu} be the product of the distinct prime divisors.

Theorem 4.5. We have

(4.25)    (log Q_{n1}/log T_n, log Q_{n2}/log T_n, . . .) ⇒_n G_(·),

where G_(·) has the Poisson-Dirichlet distribution for θ = 1.

Note that the vector on the left does lie in Δ. We prove below that

(4.26)    log T_n / log N_n ⇒_n 1,

† In the limit, the large prime divisors all have multiplicity 1: P[Q_{ni}^2 divides N_n] →_n 0 for each i. See Problem 4.4.
from which it follows that the asymptotic properties of the ratios log Q_{ni}/log N_n are the same as those of the ordered relative cycle lengths of a random permutation. For example, the chance is about 1/2 that the largest prime factor of N_n exceeds N_n^{.61}.
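This last statement can be tested empirically. The sketch below is illustrative only: the bound on the sampled integers, the number of replications, and the use of the sympy library for factorization are assumptions, not part of the text.

```python
import numpy as np
import sympy

# Empirical sketch of Theorem 4.5 for the largest prime factor:
# log(Q_n1)/log(N_n) should exceed .61 about half the time.
rng = np.random.default_rng(7)
n, reps = 10**9, 2000
ratios = []
for _ in range(reps):
    m = int(rng.integers(2, n))
    q1 = max(sympy.factorint(m))            # largest prime divisor of m
    ratios.append(np.log(q1) / np.log(m))
print(round(float(np.mean(np.array(ratios) > 0.61)), 3))   # roughly 1/2
```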
PROOF. Since T_n ≤ N_n ≤ n, (4.26) will follow if we prove that

(4.27)    log T_n / log n ⇒_n 1.

We need two facts from number theory:†

(4.28)    Σ_{p≤x} (log p)/p = log x + O(1),    Σ_{p≤x} log p = O(x)

as x → ∞. Since

    E[log T_n] = (1/n) Σ_{m≤n} Σ_{p|m} log p = (1/n) Σ_{p≤n} ⌊n/p⌋ log p ≥ Σ_{p≤n} (log p)/p − (1/n) Σ_{p≤n} log p,

(4.28) implies that E[log n − log T_n] is bounded, and (4.27) follows from this.
Size-bias the vector in (4.25), then multiply each component of the result by log T_n and apply exp(·) to it. This gives a sequence D_{n1}, D_{n2}, . . . of random variables such that, if N_n has exactly r distinct prime factors and p_1, . . . , p_r is an arbitrary permutation of them, then

(4.29)    P[D_{n1} = p_1, . . . , D_{nr} = p_r ‖ N_n] = (log p_1/log T_n)(log p_2/(log T_n − log p_1)) · · · (log p_r/(log T_n − log p_1 − · · · − log p_{r−1})).

And D_{ni} = 1 if i > r. Note that the conditional probabilities (4.29) summed over the r! permutations come to 1. Define V_{n1} = n and V_{ni} = n/(D_{n1} · · · D_{n,i−1}), and take

(4.30)    B_{ni} = log D_{ni} / log V_{ni}.

Finally, define G_{n1} = B_{n1} and G_{ni} = (1 − B_{n1}) · · · (1 − B_{n,i−1})B_{ni}. If we can prove that

(4.31)    (B_{n1}, B_{n2}, . . .) ⇒_n (B_1, B_2, . . .),
† Hardy & Wright [37], Theorems 414 and 425, or [PM.85].
where the B_i are independent random variables, each uniformly distributed over [0, 1], then the mapping theorem will imply that G_n = (G_{n1}, G_{n2}, . . .) ⇒_n G = (G_1, G_2, . . .), where the G_i are defined by (4.4). Now G_{ni} = log D_{ni}/log n (note that V_{n,i−1}/D_{n,i−1} = V_{ni} and that G_{ni} = 0 if i exceeds the number of distinct prime divisors of N_n), and if G'_{ni} = log D_{ni}/log T_n, then it follows by (4.27) that G'_n ⇒ G. Theorem 4.1 now gives ρG'_n ⇒ ρG = G_(·), and since the D_{ni} are a permutation of the Q_{ni}, this is (4.25).
To understand how (4.31) is proved, consider first an approximate argument for B_{n1}. Suppose that 0 ≤ a ≤ 1. If t(m) is the product of the distinct prime divisors of m (T_n = t(N_n)), then
P[Bn1 5 a] = P[D,1 5 na] =
C -P[D,1
n
h
m=l
n
h
p<na,p(m
1% P
logt(rn)
1
-
5 nalN, = rn]
m=l
h
m=l
n
p<na,plm
C,-.a,
1% P
1
logn
=E&im<n,plm
C P&i
Plna
pin"
where the first approximation works because log x increases so slowly, the second works because there are about n/p multiples of p preceding n, and the final step is a consequence of (4.28).
By Theorem 2.5, in order to prove (4.31) we need only show that, for each r,

(4.32)    lim inf_n P[a_i < B_{ni} ≤ b_i, i ≤ r] ≥ Π_{i=1}^r (b_i − a_i)

if 0 < a_i < b_i < 1. Note that there are at least r distinct prime divisors if the event in (4.32) occurs. By the definitions,

(4.33)    P[a_i < B_{ni} ≤ b_i, i ≤ r] = (1/n) Σ_{m=1}^n Σ' P[D_{n1} = p_1, . . . , D_{nr} = p_r ‖ N_n = m],

where the inner sum Σ' extends over certain permutations p_1, . . . , p_r of the distinct prime divisors of m, namely, over those permutations that satisfy

(4.34)    (n/(p_1 · · · p_{i−1}))^{a_i} < p_i ≤ (n/(p_1 · · · p_{i−1}))^{b_i},   i ≤ r.
By (4.29), each term in the sum satisfies (t(m)defined as before)
Decrease the right side by increasing t(m)to n. This gives
and we can further decrease the right side by tightening (4.34) to
(0
< E < 1). For i = r, the right-hand inequality here implies pT 5
llbi
I
e n / p i . * * p T - iand hence p i - * - p TI
en. But then the number
Ln/p1- * . p T ] of multiples m of p i * pT satisfying m I
n is greater than
(1- e ) n / p l - * p T ,and reversing the sums in (4.36) gives
PT
+
T
the sum extending over the permutations satisfying (4.37).
If ni = n / p l * *pi-1, then (4.37) implies ni/ni+l = pi I
n: and
hence n,+l 2 nt-bi, and it follows by recursion that ni 2 nT 2 n6 ,
where 6 =
implies
nL1(l- bi) > 0. By (4.28), there is a u, such that u 2 u,
and if n 2 u,"I6,then this holds for each u = ni. Fix P I , . . .pT-l, sum
out pT, and apply (4.39) for u = nT:
where here the sum extends over p l , . . . ,pT-l and the term -r/logn6
is to account for the fact that in summing out p , we must avoid the
values of the earlier p i . And now successively sum out pT-1,. . . , p l in
the same way:
T
for n 2 u:I6. This proves (4.32).
0
Technical Arguments
PROOF
OF LEMMA2. Let
(4.40)
By symmetry, the density of ZE),. . . ,Zrr) a t a point
(21,.
c,"=,
. . ,zr) of M,.
where the domain of integration is given by 0 < y l , . . . ,ym
for 2,") 0 < 1 - s yi < Z r . The change of variables
integral here to
< zr
yi
is
and (to account
= zrs, reduces the
where now the variables are constrained by 0 < s1, . . . , sm < 1 and c- 1 <
si <
c. If S is a (positive) random variable having the density gm, then the integral in
(4.42) is the integral of (c - S)O-' over the event c - 1 < S < c (if c < 1, the
constraint c - 1 < S is vacuous). Therefore, (4.41) can be written as
Since the last integral is gm+l(c;a),this proves (4.10).
0
PROOF
OF LEMMA
3. The Laplace transform [ ( l - c l ~ ~ ( l - e - " ) z a - l d x ) l ~ a ] m o
of g m ( .; a ) converges to the right-hand member of (4.14). This by itself, however,
does not imply that the limit transform comes from a density to which gm(. ;a)
converges; see Example 3.3. Let Am(z) be the set where 0 < s i r .. . ,sm < 1 and
sE< x ; let &(5) be the set where s1,. . . , s k > 0 and
sz < z; and define
Ck(x) as in (4.11). First, for 0 < 5 < m ,we have
czl
(4.43)
cf=l
lxgm(u)du
=/
ams;I-'
...sE-;'dsl...ds,.
Arn(x)
Use the inclusion-exclusion principle, together with symmetry, to express this as the
integral over Bm(x) minus an alternating sum of integrals over the regions where
sl,.. . , S k 2 1, S k f l , . . . , sm > 0,and
si < z. This reduces (4.43) to
czl
where in the term for k = 0 the iterated integral is replaced by a single integral over
&(z). Dirichlet's integral formula [M12] further reduces this to
where in the term for k = 0 the integral is replaced by Pa.
The derivative of the term for k = 0 in (4.44) matches the corresponding term
in (4.12), and we turn to the case 1 5 k < x < m. Write p = (m - k)a > 0 and
u = s1 . . . ck. Since (4.43) does have a derivative, we need only find the limit
for h decreasing to 0 of the difference quotient
+ +
(4.45)
Since z
again)
- .
'J
Ck(X+h)-ck(X)
s?-~...s;-'(z
+ h - c7)'dSi
...dSk
+ h - u 5 h and /3 > 0, the first term here is at most (Dirichlet's formula
r k ( a ) (z + h)ko - xka hP-rk(a) z k o - - l -+h 0,
N
h
Wa)
The difference quotient in the second term in (4.45) goes to P(z - ~ 7 ) O - l ~and so the
derivative of the kth term (k 2 1) in (4.44) will agree with the kth term in (4.12)
if we can integrate to the limit. But by the mean-value theorem, the difference
quotient is bounded by / 3 ( 2 ~ ) ~if -/3~2 1 (and h < z) and by the limit /3(z - 0 ) O - l
if 0 < /3 < 1, and in either case, the dominated convergence theorem applies. (This
argument is like the proof of the chain rule, which does not work here.)
Suppose now that m -+ 00 and m a -+ 8, and consider a fked k. Since r'(1) =
-y (for our purposes this can be taken as the definition of y), we have &(a) =
r ( l a) = 1 - y a O ( a 2 )and hence amrm(a)
-+ e--ye.
Since (m)k
mk and
rn/r(a) = r n a / r ( l + a) -+ 8, it now follows that the factor in front of the integral
in (4.12) converges to the corresponding factor in (4.13). It is now clear that the
term for k = 0 in (4.12) converges to the term for k = 0 in (4.13). Consider the case
1 5 k < z < m. Let Dk(Z) be the set defined by yl, . . . ,yk > 1/z and
yi < 1.
Transforming the integrals in (4.11) and (4.12) by si = zyi, consider whether
+
+
N
cf=l
+ z'-~. The two integrals themselves
holds for each positive z. Of course, Po-'
differ by at most
The integrand here goes to 0 pointwise; if 0 < 00 < 0, then (m- k)o > 00 for large
m , and so the integrand is at most 2(1 - ~Ek_lyz)eO-l,
which is integrable over
&(I) (Dirichlet yet again). By the dominated convergence theorem, the integral
goes to 0, and we do have g m ( x ;a ) + g e ( x ) pointwise. The limit go is nonnegative,
and by Fatou's lemma, it is integrable.
Although the series in (4.13) cannot be integrated term by term, it can be if
we multiply through by e-tx for t > 0. Reversing the integral in the kth term then
gives I'(0)t-e(E1(t))k. Putting this into (4.13) and summing over k gives the middle
expression in (4.14), and sum and integral here can be reversed because if we work
with absolute values (suppress the ( - l ) k ) , the resulting sum is finite.
For t > 0, the right-hand equality in (4.14) is equivalent to y log t E l ( t ) =
s,"(l- e-")u-'du, which follows by integration by parts on each side, together with
the fact that -7 = r'(1) =
e-" log u du. Finally, since the right-hand member
of (4.14) converges to 1 as t 10, g integrates to 1-is a probability density.
0
+
+
PROOF
OF THEOREM
4.2. Recall the definitions (4.40). Since r is fixed, we have
m + 00 and ma + 0, and so, by Lemma 3 (m + 1 in place of m),(4.10) converges
pointwise to (4.15). By Fatou's lemma, (4.15) integrates to at most 1, and we must
show that the integral in fact equals 1; this will be part of the next proof.
0
PROOF
OF THEOREM
4.3. By (4.15) and (4.13),
+
+
where the integration extends over s1 > . . . > sr-l > x and s 1 + . . . sr-l z < 1.
First, restrict the sum by 0 5 k 5 2-l - r (the other terms are 0) and take it outside
the integral; second, lift the constraint s1 > ... > s,-1 and divide by ( r - l ) ! to
compensate; and third, change variables: yi = s i / x . This leads to
(4.46) d,(x;O) = x ' - ~
O<kSx-'-r
k!(r - l)!
+ +
where here the integration extends over y l , . . . ,yr-1 > 1 and yl . . . yr-l <
(1 - x ) / x . Replace the Jk in (4.46) by its definition (4.11), and intersect the region
of integration for the yi with that for the si to arrive at (4.16).
It is probabilistically clear that d, must be supported by (0,~-l).
It is convenient to take the support as (0, l ) , which is consistent because all the summands in
(4.16) vanish if 2 > r-'. The mth moment of d,( . ; O ) is
To simplifty the notation, take w = Ic
+ r - 1; by (4.11),the last integral in (4.47) is
+ + . - . + sv.
The change of variable s = a / y converts the
where a stands for 1 s1
inner integral into [Mll]
Therefore, the last integral in (4.47) is (w = k
+ r - 1)
(4.48)
Expanding the right-hand exponential in (4.17) in a series gives
What (4.17) says is that the right member of (4.47) is identical with (4.49), and this
will be true if the two match term for term, for each k . Since the last integral in
(4.47) reduces to (4.48), and since r(0 1) = W(e), the question now is whether
(take w = k r - 1 again in (4.49))
+
+
This last integral is (take si = xi/y)
+ + +
Since the inner integral on the right here is (1 s1 . . . sv)-mI'(m), we arrive at
the integral on the left in (4.50). This proves (4.17).
We have still to prove that (4.15) integrates to 1, which will be true if the
marginal density (4.16) does. Since (4.17) holds for m = 0, the question is whether
(4.51)
For r = 1, this is
(4.52)
ey-le-Ve-@El(v)dy = 1
1
and it holds because the integrand is the derivative of e-gE1(v). Multiply each side
of (4.52) by 0-' and repeatedly differentiate the resulting identity with respect to
8. This gives a sequence of identities equivalent to (4.51).
The proofs of Lemmas 2 and 3 and Theorems 4.2 and 4.3 are now complete. 0
Problems
4.1. Show that
&z;e)
k
= -Jk-l(25
i;q,
forz
>k 2 1
if 0 = 1. Now show that &(s; 1) is differentiable except at
4.2. Let X be the sizebiased version of X . Show that X ,
ax.
2
+X
= 1/2.
implies X ,
+X .
4.3. Denote the size-biased version of X by
Show that p 2 X = p X , a 2 X = d
p a x = p X , upX = d ax. Show that, if X has the Poisson-Dirichlet
ax,
distribution, then p X = X and a X has the GEMdistribution. Show that, if
X has the GEMdistribution, then u X = d X and pX has the Poisson-Dirichlet
distribution.
4.4. (a) Let A_b be the set of integers m such that p^2 divides m for some prime p exceeding b. Show that, if b_n → ∞, then P[N_n ∈ A_{b_n}] → 0.
(b) Redefine the Q_{ni} to consist of all the prime divisors of N_n, multiplicity accounted for, in nonincreasing order: Q_{n1} ≥ Q_{n2} ≥ · · ·. Show that (4.25) still holds if N_n is substituted for T_n.
4.5. If 2 has density (4.13), then the denisty he of (1
+ Z)-’
is related to d l ( . , O )
Show that E[G;y] = Br(B)eYe.
SECTION 5. PROHOROV’S THEOREM
Relative Compactness
Let Π be a family of probability measures on (S, S). We call Π relatively compact if every sequence of elements of Π contains a weakly convergent subsequence; that is, if for every sequence {P_n} in Π there exist a subsequence {P_{n_i}} and a probability measure Q (defined on (S, S) but not necessarily an element of Π) such that P_{n_i} ⇒_i Q. Even though P_{n_i} ⇒_i Q makes no sense if QS < 1, it is to be emphasized that we do require QS = 1: we disallow any escape of mass, as discussed below. For the most part we are concerned with the relative compactness of sequences {P_n}; this means that every subsequence {P_{n_i}} contains a further subsequence {P_{n_{i(m)}}} such that P_{n_{i(m)}} ⇒_m Q for some probability measure Q.
Example 5.1. Suppose we know of probability measures P, and
P on (C,C) that the finite-dimensional distributions of P, converge
weakly to those of P: PnT;1..,,+, P7r;1..tk for all lc and all t l , . . . ,&.
We have seen (Example 2.5) that P, need not converge weakly to P.
Suppose, however, that we also know that {P,} is relatively compact.
Then each {Pni}
contains some {P,,(,,,,} converging weakly to some Q.
58
W E A K CONVERGENCE IN
METRICSPACES
Since the mapping theorem then gives Pn,c,,.rrl;l..t, =+, QrGf..t,,
and since Pnr,l!..t, +n PnG1..tk by assumption, we have QrG1..t, =
PrG;..tk for all t l , . , . ,t k . Thus the finite-dimensional distributions of
P and Q are identical, and since the class Cf of finite-dimensionalsets is
a separating class (Example 1.3), P = Q. Therefore, each subsequence
contains a further subsequence converging weakly to P-not to some
fortuitous limit, but specifically to P. It follows by Theorem 2.6 that
the entire sequence {Pn} converges weakly to P. Therefore:
If { Pn} is relatively compact and the finite-dimensional distrabutions of Pn converge weakly t o those of P, then Pn J n P .
This idea provides a powerful method for proving weak convergence
in C and other function spaces. Note that, if {Pn} does converge
weakly to P, then it is relatively compact, so that this is not too
strong a condition.
0
Example 5.2. Now suppose we know that a sequence of measures {Pa} on (C,C) is relatively compact and that, for all k and
all t l , . . . ,t k , PnTG1,,tkconverges weakly to some probability measure
pt l...tk on (Rk,7Zk)-the point being that we do not assume at the
outset that the pt l...tk are the finite-dimensional distributions of a
probability measure on (C,C). Some subsequence { Pni} converges
weakly to some limit P (sub-subsequencesare not relevant here). Since
PnirG1..tk ~i Prt,1kt, (again by the mapping theorem), and since
PnxG!..t, +n pt l...tk (again by assumption), P T G ~ . =
~ , pt *...t, for all
t l , . . . ,t k . This time we conclude that there exists a probability measure P having the pt l...t, as its finite-dimensional distributions. Therefore:
If {Pn} as relatively compact, and if PnrG1..t, +, pt l...t, f o r all
t l , . . . t k , then some P satisfies Pr<l..tk= pt l...tk f o r all t l , . . . , t k .
This provides a method of proving the existence of measures on
(C,C) having prescribed properties.
0
Tightness
To use this method requires an effective means of proving relative compactness.
Ezample 5.3. Suppose Pn are probability measures on the line.
How might we try to prove that {Pn} is relatively compact? Let F,
be the distribution functions corresponding to the Pn. By the Helly
selection theorem [PM.336],every subsequence {&*} contains a further
SECTION
5 . PROHOROV’S
THEOREM
59
subsequence { Fni(m)
} for which there exists a nondecreasing, rightF
(x)+, F ( x ) for all continuity
continuous function such that Fni(m)
points x of F. Of course, 0 5 F ( x ) 5 1 for all z, and if F has limits
0 and 1 at -GO and +GO, then F is the distribution function of a
probability measure Q, and Pni(m)+
, Q follows (Example 2.3). One
can try in this way to prove {P,} relatively compact.
If {P,} is not, in fact, relatively compact, then of course the effort
must fail for certain subsequences. Suppose for example that P, = 6,.
Then F ( x ) 0 is the only possibility for the limit function, and accordingly, there are no weak limits at all: If Pni(m) +
, Q, then
Q(-k, k ) 5 liminf, P,i(m)(-k, k ) = 0 for all k , and so Q cannot be a
probability measure. The reason this sequence is not relatively compact is that mass is “escaping to infinity.”
For a second example, let p, be the uniform distribution over
[-n,+n], and take P, = 60 for even n and P, = &50
i p , for odd
n. Then {Pni}contains a weakly convergent subsequence if ni is even
for infinitely many i , but what if ni runs through odd integers only?
Then the single possible limit F ( x ) is for x < 0 and $ for x 2 0, and
again Pni(m)=+, Q is impossible: It would imply Q(-k, k ) 5 for all
k . In this example, the mass lost (namely
rather than heading off
in any particular direction, simply evaporates.
0
=
+
i),
The condition that prevents this escape of mass generalizes the
tightness concept of Theorem 1.3. The family I’I is tight if for every E
there exists a compact set K such that P K > 1 - E for every P in n.
Theorem 5.1. If n is tight, then it is relatively compact.
This is the direct half of Prohorov’s theorem, and the main purpose
of the section is to prove it. The proof, like that of Helly’s theorem,
will depend on a diagonal argument. The following corollary extracts
the essence of the argument in Example 5.1.
Corollary. If {P,} is tight, and i f each subsequence that converges
weakly at all in fact converges weakly to P , then the entire sequence
converges weakly to P: P, +, P .
PROOF.
By the theorem, each subsequence contains a further subsequence converging weakly to some limit, and by the hypothesis, this
limit must be P. Apply Theorem 2.6.
0
EzampZe 5.4. Return to Example 5.1, and suppose that P, is a
unit mass at the function z, defined by (1.5). No subsequence can
60
W E A K CONVERGENCE IN
METRICSPACES
converge weakly (Example 2.7), and the reason for this is that {P,} is
not tight: If P,K > 1 - E > 0 for all n, then K must contain all the
0
z, and hence cannot be compact.
Theorem 5.2. Suppose that S as separable and complete. If TJ: as
relatively compact, then at as tight.
This is the converse half of Prohorov’s theorem. It contains Theorem 1.3, since a II consisting of a single measure is obviously relatively
compact. Although this converse puts things in perspective, the direct
half is what is essential to the applications.
PROOF.Consider open sets G, increasing to S . For each E there
is an n such that PG, > 1 - E for all P in n: Otherwise, for each n
we have P,G, 5 1 - E for some P, in II, and by the assumed relative
compactness, PniJi Q for some subsequence and some probability
measure Q, which is impossible because then QG, 5 liminfi P,,G, 5
lim infi PniG,, 5 1 - E , while G, t S.
From this it follows that, if Akl, A k 2 , . . . is a sequence of open balls
of radius l / k covering S (separability), then there is an nk such that
P(Uil,, A k i ) > 1 - ~ / 2for~ all P in n. If K is the closure of the totally
Ui<,,, A k i , then K is compact (completeness), and
bounded set
PK > 1 - E for ill P i n II.
0
n,
The Proof
There remains the essential agendum of the section, the proof of Theorem 5.1. Suppose that {P,} is a sequence in the tight family n. We
and a probability measure P such that
are to find a subsequence {Pni}
Pni=+-i P.
CONSTRUCTION.
Choose compact sets K, in such a way that K1 c
K2 c
and PnK, > 1 - u-l for all u and n. The set ( J U K Uis
separable, and hence [M3] there exists a countable class A of open sets
with the property that, if z lies both in U,K, and in G , and if G is
open, then z E A c A- c G for some A in A. Let ‘FI consist of 0 and
the finite unions of sets of the form A- n K , for A E A and u 2 1.
Using the diagonal procedure, choose from the given sequence {P,}
a subsequence {Pni}
along which the limit
a ( H ) = lim P,,H
2
SECTION
5. PROHOROV’S
THEOREM
61
exists for each H in the countable class H. Our objective is to construct
on S a probability measure P such that
PG = sup a ( H )
HCG
for all open sets G. If there does exist such a probability measure P ,
then the proof will be complete: If H C G, then a ( H ) = limi Pn, ( H ) 5
lim infi PniG, whence PG 5 lim infi PniG follows via (5.2), and therefore Pni=+i P.
To construct a P satisfying (5.2)’ note first that 1-I is closed under
the formation of finite unions and that a ( H ) , defined by (5.1) for each
H in 1-I, has these three properties:
a(Hi)5 a(H2) if H I
(5.3)
c H2;
And clearly 4 0 ) = 0. For open sets G define
= 0. Finally, for arbitrary subsets
then is monotone and p(0) = a(@)
M of S define
(5.7)
it is clear that y(G) = P(G) for open G.
Suppose we succeed in proving that y is a n outer measure. Recall
[PM.165] that M is by definition y-measurable if y(L) 2 y(M n L ) +
y(MCn L ) for all L c S, that the class M of y-measurable sets is a
a-field, and that the restriction of y to M is a measure. Suppose also
that we are able to prove that each closed set lies in M . It will then
follow, first, that S c M , and second, that the restriction P of y to S
is a measure satisfying PG = y ( G ) = P(G), so that (5.2) will hold for
open G. But P will be a probability measure because (each K,, having
a finite covering by A-sets, lies in 1-I)
(5.8)
1
2 PS = p(S) 2 supa(K,) 1: Sup(1U
U
u - ’ ) = 1.
62
WEAK CONVERGENCE IN
METRICSPACES
We proceed in steps.
Step 1: If F c G, where F is closed and G is open, and i f F c H
f o r some H in ‘FI, then F c Ho c G for some Ho an 3-1. To see
this, choose, for each z in F , an A, in the class A in such a way that
5 E A, c A; c G. These A, cover F, and since F is compact (being
a subset of H ) , there is a finite subcover A,, , . . .A,, . Since F c K,
for some u, we can take Ho = UF=l (A& n K,).
H
Step 2 P is finitely subadditive (on the open sets). Suppose that
c G1 U G2, where H E 3-1 and G1 and G2 are open. Define
If z E F1 and x 4 G I , then 2 E G2, so that, since Gg is closed,
p(z, G i ) = 0 < p(z, G;), a contradiction. Thus F1 c G I ,and similarly
F2 c G2. Since F1 c H a n d H E 3-1, it follows by Step 1 that F1 c H1 c
G1 for some H1 in 3-1; similarly, F2 c H2 C G for some H2 in 3-1. But
then a ( H ) 5 a(H1 U H2) 5 a ( H i ) + a(H2) 5 P(Gi) + P(G2) by (5.31,
( 5 4 , and (5.6). Since we can vary H inside GI U G2, P(G1 U G2) 5
@ ( G I ) P(G2) f
~
~
~
~
~
~
.
+
Step 3: P is countably subadditive (on the open sets). If H c
G,, then, since H is compact, H c Un<no
G, for some no, and
G,) 5 Ensno
P(G,) 5
finite subadditivity implies a ( H ) 5 /3(Un<no
C,P(G,). Taking the supremum over If contained in U,G, gives
P(U, Gn) I P(Gn).
U,
c,
Step 4: y is a n outer measure. Since y is clearly monotone and
satisfies ~ ( 0=
) 0, we need only prove that it is countably subadditive.
Given a positive E and arbitrary subsets M, of S, choose open sets
G, such that M, C G, and P(G,) < y(M,) ~/2,. Then, by the
+
SECTION5 . PROHOROV'S
THEOREM
63
countable subadditivity of P , y(U, M,) L P(u,G,) I
P(Gn)
C , y ( M n ) E ; since E was arbitrary, y(U, M,) 5 C, y ( M n ) .
+
<
+
Step 5: P(G) 2 y ( F n G) y(FCn G) for F closed and G open.
Choose, for given E , an H1 in 7-l for which H1 C FCr l G and a(H1) >
P(FCn G) - E. Now choose an HO in 7-l for which HO c H t n G and
a(H0) > p(H,CnG)-E. Since Ho and H1 are disjoint and are contained
in G, it follows by (5.6), (5.4), and (5.7) that P(G) 2 a(H0 U Hi) =
a(H0)+a(H1) > P( H,CnG)+P(F"nG) -2~ 2 y ( F n G )+y(FCnG)- 2 ~ .
But E was arbitrary.
Step 6 F E M if F is closed. By the inequality of Step 5,
P(G) 2 y ( F n L ) + y(FCn L ) if G is open and G 3 L; taking the
infimum over these G shows that F is y-measurable. This completes
the construct ion.
0
Prohorov's theorem (the direct half) will be used many times in
what follows. Here is a standard application to classical analysis:
ExarnpZe 5.5. Let P be a probability measure on the Bore1 subsets of the half-line S = [ 0,m). The Laplace transform of P is the
function defined for nonnegative t by L ( t ) = Jx>o edt'P(dz). There is
a uniqueness theorem [PM.286] according to which L completely determines P. Suppose we also have a sequence of measures P, with
transforms L,. If P, + P , then obviously L,(t) ---t L ( t ) for each t.
Let us prove the converse. The argument depends on the inequality
Since L ( t ) is continuous and L(0) = 1, there is for given E a u such
L ( t ) for all t ,
that u - l J t ( 1 - L ( t ) ) d t < E/e. But then, if L,(t)
u-l J,"(l- L n ( t ) dt
) < e/e for n beyond some no. And if a = u-l, then
(5.9) gives Pn[
0, a] 2 1-E for n > no, Therefore, {P,} is tight (increase
a to take care of 9 , .. . ,P,,), and by the corollary to Theorem 5.1 it
is enough to show that if a subsequence converges weakly at all then
it must converge to P. But Pni+ i Q implies that Q has transform
0
limi Lni(t)= L ( t ) ,so that, by uniqueness, P must be the limit.
--f
The arguments in Examples 5.1 and 5.5 prove very different propositions but have the same structure. Each argument rests on three
64
W E A K CONVERGENCE IN
METRICSPACES
things: (i) the concept of tightness, (ii) a previously established converse proposition, and (iii) a previously established proposition concerning uniqueness.
As to Example 5.1, on conditions under which P,nt,1..,, =+ Pnt,l..tk
for all t l , , . . ,t k implies P, + P on C:
(i) Methods for proving tightness will be developed in Chapter 2.
(ii) The previously established converse says that Pn =+ P implies
1
P,nt,1..,, =+ p.rrt,...,, for all t i , . . . , t k .
(iii) The previously established uniqueness proposition says that the
.~~
determine P.
finite-dimensional distributions P x ~ , ! . uniquely
As to Example 5 . 5 , on the fact that pointwise convergence of the
corresponding Laplace transforms (L,(t) + L ( t ) for t 2 0) implies
P, =+ P on [ 0 , ~ ) :
(i) Tightness was proved by means of (5.9).
(ii) The previously established converse says that Pn =+ P implies
Ln(t)+ L ( t )for all t.
(iii) The previously established uniqueness proposition says that the
transform L uniquely determines P.
Although these two arguments have the same structure, there is an
essential difference: If the Laplace transforms in Example 5.5 converge
pointwise, then, because of (5.9), {P,} is necessarily tight. On the
other hand, if the finite-dimensional distributions in Example 5.1 converge weakly, then {P,} may or may not be tight-and the essential
task in Chapter 2 is to sort out the cases.
Problems
5.1. In connection with Example 5.2, assume that P,n;f..,,
+ ptl...tk for all
tl...tk and prove directly that the pt l . . . t k are consistent in the sense of
Kolmogorov's existence theorem.
5.2. If S is compact, every separating class is a limit-determining class.
5.3. Let ll consist of the 6, for 2 in A . Show directly from the definition that II
is relatively compact if and only if A- is compact in S.
5.4. A weak limit of a tight sequence is tight.
II is tight, then its elements have a common a-compact support, but the
converse is false.
5.5. If
5.6. A sequence of probability measures on the line is tight if and only if, for
the corresponding distribution functions, we have limr+oo F,(z) = 1 and
limr+-oo Fn(z)= 0 uniformly in n.
SECTION6. A MISCELLANY
65
5.7. A class of normal distributions on the line is tight if and only if the means and
variances are bounded (a normal distribution with variance 0 being a point
mass).
5.8. A sequence of distributions of random variables X , is tight if it is uniformly
integrable.
5.9. Probability measures on S’x S” are tight if and only if the two sets of marginal
distributions are tight on S’ and S”.
5.10. As in Problem 1.10, suppose that S is separable and locally compact. Assume
that P, f + P f for all continuous f with compact support; show first that
{P,} is tight and second that P, + P.
5.11. Consider the following analogues of the concepts of this section: Points on the
line play the role of probability measures on S, ordinary convergence on the
line plays the role of weak convergence, relative compactness has the same
definition (every sequence has a convergent subsequence), and boundedness
plays the role of tightness. For each of the following facts, find the analogue
in the theory of weak convergence.
(a) If each subsequence of {x,} contains a further subsequence converging to
x, then x, + x.
(b) If (2,) is bounded, then it is relatively compact (and conversely).
( c ) If {x,} is relatively compact and every subsequence that converges at all
converges to x, then x,, + x.
(d) The same with “bounded” in place of “relatively compact.”
(e) If {x,} is bounded, sinx, + sinx, and sinrx, + sinrx, then x, + x.
5.12. The example in Problem 1.13 shows that Theorem 1.3 fails without the assumption of completeness, and hence so does Theorem 5.2. Here is an example of a separable S on which is defined a relatively compact (even compact),
non-tight set of probability measures the individual elements of which are
tight.
(a) Let Q be the closed unit square and put L, = [(x,y):O 5 y 5 11 for
0 5 x 5 1. Let K be the class of compact K in Q that meet each L,.
From the fact that K contains each [(x,y): 0 5 x 5 1 ] , 0 5 y 5 1, conclude
that K has the power of the continuum.
(b) Let x c) K , be a one-to-one correspondence between [ 0,1] and K. For
each x choose a point p , in K , n L,, and let A = [p,: 0 5 x 5 11. Show
that (i) A meets each L , and (ii) A meets each K in K.
( c ) Let S = Q - A . Show that (i’) S misses exactly one point on each L , and
(ii’) if K is a compact subset of S, then K misses some L,.
(d) Let P, be Lebesgue measure on L , n S (just one point on L, is missing).
Show that x, + x implies P,,, =+ P,. Show that each P, is tight but
that [P,:O5 x 5 11 is not tight.
SECTION 6. A MISCELLANY*
Of the four topics in this section, the first, on the ball a-field, is used
only in Section 15, and the second, on Skorohod’s representation theorem, is not used at all. The last two, on the Prohorov metric and a
coupling theorem, are used only in Section 21.
66
W E A K CONVERGENCE IN
METRICSPACES
The Ball +Field
Applications to nonseparable function spaces require a theory of weak
convergence for probability measures on the ball a-field So, the one
generated by the open balls. Although So coincides with the full Bore1
a-field S in the separable case, it may be smaller if S is not separable.
In this case, a continuous real function on S may not be measurable
So, which creates problems: If S is discrete and uncountable (Example
1.4) and A is neither countable nor cocountable, then I A is continuous
but not measurable So.
Lemma 1. (i) If M is separable, then p( , M ) is So-measurable
as well as uniformly continuous.
(ii) If M is separable, then M 6 E SOf o r positive 6 .
(iii) If M is closed and separable, in particular if at is compact, then
M E So.
PROOF.Since [ x : p ( x , y ) < u] = B(y,u) E So, it follows that
p( y) is So-measurable for each y . If D is countable, then p( - ,0 )
is So-measurable; and if D is dense in M , then p( ,0 ) = p( - ,M ) .
s ,
Hence (i), and (ii) follows. Finally, (iii) follows from (ii) and the fact
that M =
M1/, for closed M .
0
n,
The theory of weak convergence for So is closely analogous to that
for S.
Theorem 6.1. If P is a probability measure o n (,!?,SO),A is a n
So-set, and E is positive, then there exist a closed So-set F and an open
So-set G such that F c A C G and P(G - F ) < E .
PROOF.
The proof is analogous to that of Theorem 1.1. Let Q be
the class of So-sets with the asserted property. If A = [y : p ( x , y ) 5 r]
is a closed ball, take F = A and G = [ y : p(x, y) < r 61 for small 6:
A E Q. It is now enough to show that Q is a a-field, and the argument
in Section 1 carries over without change.
0
+
Call a probability measure on So separable if it has a separable
support (in So). By Lemma l(iii), the support can be assumed closed.
Theorem 6.2. Suppose probability measures P and Q o n SO are
separable. If Pf = Qf for all bounded, uniformly continuous, Someasurable functions f, then P = Q.
PROOF.Let M be a common closed, separable support for P and
Q. By part (i) of Lemma 1, the function
f(x) = 1 - (1 - p ( ~(F')C
,
nM ) / E ) +
(6.1)
SECTION6. A MISCELLANY
67
is So-measurable as well as uniformly continuous, and by part (iii), the
set A, = (F')' n M lies in So. If z E F , then p(z, A') 2 E, and hence
f (x)= 1. And f is supported by A: = F' U M c . Thus IF 5 f 5 I A ~ .
If F E So, then PF 5 Pf = Qf 5 QA: = Q(F' U M c ) = Q(F').
Suppose that F is closed and let E 1 0: PF 5 QF.Since, by symmetry,
QF 5 PF as well, P and Q agree for closed So-sets and hence, by
Theorem 6.1, agree on So.
0
For probability measures P, and P on SO,if P,f + Pf for every
bounded, continuous, So-measurable real function f on S , we say that
P, converges weak'ly to P and write P, =$" P. By Theorem 6.2, a
sequence cannot have two distinct, separable weak0 limits.
Theorem 6.3. If P has a separable support, then these five conditions are equivalent
(i)
(ii)
(iii)
(iv)
(v)
P.
P, f + Pf for all bounded, So-measurable, uniformly continuous f.
limsup, PnF 5 PF for all closed F in So.
liminf, P,G 1. PG f o r all open G in So.
P,A + PA f o r every So-set A f o r which So contains a n open G
and a closed F such that G c A c F and P(F - G)= 0.
P,
JO
PROOF.As in Theorem 2.1, the implication (i) + (ii) is trivial. To
prove (ii) + (iii) we can use (6.1) and the set A, again, where M is a
closed, separable support for P. Since lim SUP, P,F 5 lim SUP, P, f =
Pf 5 PA: = P(F' U M C )= P(F'),we can let E 1 0, as before.
The equivalence of (iii) and (iv) is again simple. To prove that (iii)
and (iv) together imply (v), replace A" by G and A- by F in (2.3).
Finally, suppose that (v) holds and that f is bounded, continuous,
and measurable with respect to SO.Let Gt = At = [f > t] and Ft =
[f 2 t ] . Then the So-sets Gt and Ft are open and closed, respectively,
and Ft - Gt C [f = t ] . Except for countably many t , therefore, we
have P(Ft - Gt)= 0 and hence P,[f > t] + P[f> t ] ,and so the old
argument goes through.
0
There is also a mapping theorem. Let S' be a second metric space,
with ball a-field S;. Let h map S into S'.
Theorem 6.4. Suppose that M (in So) is a separable support f o r
P and that h is measurable So/S; and continuous at each point of M .
If Pn +"P (in S),then P,h-' *"Ph-' (in S').
68
WEAK CONVERGENCE IN
METRIC SPACES
PROOF.Let G' be an open Sh-set, and put A = h-lG'. Since
h is continuous at each point of M , A n M c A'. Since A n M is
separable, there is a countable union G of open balls which satisfies
A n M c G c A" c A. But since SO contains A n M and the open
set G, liminf, P,h-l(G') = liminf, P,A 2 liminf, P,G 2 P G 2
0
P ( A n M ) = P A = Ph-'(GI).
Define a sequence {P,} of probability measures on So to be tight'
if for every E there is a compact set K such that, for each 6,t
By Lemma 1, K and K 6 are in SO,and so is K6 = [ z : p ( z , K )5 61. If
{P,} is tight' and P, J' P, then, since K6 is closed and contains K 6 ,
it follows by Theorem 6.3, together with (6.2), that PK6 2 1 - E. Let
6 10: PK 2 1 - E , and so P is tight in the old sense.
The next result is the analogue of Prohorov's theorem.
Theorem 6.5. If {P,} is tight', then each subsequence {Pni}
contains a further subsequence {Pni(m)} such that Pni(m) =+& P for
some separable probability measure P on SO.
PROOF.
Since a subsequence of a tight' sequence is itself tight', it
is enough to construct a weakly convergent subsequence of { P,} itself.
Choose compact sets Ku in such a way that K1 c K2 c - .and
for all u and 6. Now follow the proof of Theorem 5.1.
Define the classes A and 7-i just as before. By the diagonal method,
along which the limit a o ( H 6 )= limi PniH6
choose a subsequence {Pni}
exists for each of the countably many sets H 6 for H in 'H and 6 a
positive rational. Now put a ( H ) = limb ao(H6),where 6 decreases to
0 through the rationals.
~ Hf U H i , it also
Clearly, a satisfies (5.3). Since (HI U H z ) =
satisfies (5.5). And if H I and H2 are disjoint, then (compactness) Hf
and H l are also disjoint for small 6, and so a satisfies (5.4). Define
P(G) for open G by (5.6) and y(M) for arbitrary M by (5.7). Then
Steps 1 through 6 of the previous proof go through exactly as before.
t Problem 6.1 shows why the 6 and the limit inferior are needed here, even though
they are not needed for the theory based on S.
SECTION6. A MISCELLANY
69
Let P be the restriction of y to So. By (6.3), ao(K$) 2 1 - u-l
( K , E X),and so a(&) 2 1 - u-l.Therefore, (5.8) holds again, and
P is a probability measure on SO.Finally, if H c G, where H E Fl,
G is open, and G E So, then (compactness again) H 6 c G for small
6, and so a ( H ) 5 ar~(H‘) 5 liminfi PniG. Therefore, PG = P(G) 5
lim infi P,,G, and Pni+: P.
The measure P , as the weak’ limit of the tight’ sequence {P,}, is
tight in the old sense, and so it is certainly separable.
0
Suppose that p and p’ are two metrics on S , and let So, S, SA, S‘
be the corresponding ball and Bore1 a-fields. If p’ is finer [M2] than p,
then each popen set is also p’-open, and hence S c S’. In Section 15
there arise a function space S and a pair of metrics for which
(6.4)
s = s; c s‘.
If Pn and P are probability measures on S = SA, one can ask whether
(6.5)
Pn
+P
rep
(“re” for “with respect to”) and whether
(6.6)
P,
+’ P
rep‘.
Suppose P has a separable support M (in S = SA).If (6.6) holds, then
A and hence holds for
limsup, P,F 5 P F for each p’-closed F in S
each pclosed F in S. Thus (6.6) implies (6.5). Now assume also that
pconvergence to a limit in M implies p’-convergence. If the SA-set F is
p’-closed, and if F1 is its pclosure, then M n F 1 c F , and (6.5) implies
lim SUP, PnF 5 lim SUP, PnF1 5 PF1 = P ( M n F1) 5 P F , from which
(6.6) follows.
Theorem 6.6. Suppose that p’ is finer than p, that p-convergence
to a limit in M implies p’-convergence, that (6.4) holds, and that M
(in S = SA) is separable and supports P. Then (6.5) and (6.6) are
equivalent.
Even though (6.5) and (6.6) are equivalent here, (6.6) can give
more information: If h is a real So-measurable function on S , then
P,h-’ + Ph-’ follows from (6.5) if h is pcontinuous and from (6.6)
if h is p’-continuous. And if p’ is strictly finer than p, then there are
more p’-continuous functions than there are pcontinuous functions.+
Problem 6.4.
70
WEAK CONVERGENCE IN
METRICSPACES
Skorohod's Representation Theorem
If P, =+-, P , then there are (see (3.4)) random elements X , and X
such that X , has distribution P,, X has distribution P , and X, =+X .
According to the following theorem of Skorohod, if P has a separable
support, then it is possible to constuct the X and all the X , on a common probability space and to do it in such a way that X,(w) -',
X(w)
for each w . This leads to some simple proofs. For example, if P, +, P
(and P has separable support), and if h: S -' S' is continuous, construct X , and X as just described; then h ( X n ( w ) -)'n h ( X ( w ) )for all
w , and hence (see the corollary to Theorem 3.1) h X , J
, h X , which
is equivalent to P,h-l =+-, Ph-l. This gives an alternative approach
to the mapping theorem.
Theorem 6.7. Suppose that P, = s P~ and P has a separable
support. Then there exist random elements X , and X , defined on a
common probability space ( Q F P),
, such that L ( X n )= P,, L ( X )= P ,
and X n ( w )+, X ( w )for every w .
PROOF.
We first show that, for each E , there is a finite S-partition
Bo,B1,.. . , B k of S such that
PBo < E ,
P(BBi) = 0, i = 0 , 1 , . . . , k,
diam Bi < E , i = 1,.. . ,k.
Let M be a separable S-set for which PA4 = 1. For each z in M ,
choose r, so that 0 < r, < ~ / 2and P ( a B ( z , r , ) ) = 0. Since M is
separable, it can be covered by a countable subcollection A l , A2, . . .
of the B(x,?-,). As k t 00, P(&Ai)
t F'(UzlAi) 1 P M = 1,
and we can choose k so that P(&
Ai) > 1 - E . Let B1 = A1 and
Bi = A; n ... n A;-l n Ai, i = 2 , . . .,k,and take Bo = (&A$.
Then the middle relation in (6.4) holds because dBi C U,"=,aAj for
i = 0, 1,. . . ,k, and the other two are obvious.
Take Em = 1/2m; there are S-partitions B p , B r , . . . ,B E such that
Amalgamate with B r (which need not have small diameter) all the
B y for which P ( B T ) = 0, so that P( - [ B y )is well defined for i 2 1.
SECTION
6. A MISCELLANY
71
By the middle relation in (6.8) and the assumption that P,
is for each m an nm such that n 2 nm implies
& ( B y ) 2 (1- cm)I'(B,"),
(6.9)
=$
P , there
i = 0,1,. . . ,km.
We can assume nl < n2 <
There is a probability space supporting a random element with any
given distribution on ( S ,S),and by passing to the appropriate infinite
product spacet we can find an (a,3,P ) that supports random elements
X , Y, ( n 2 l), Yni(n,i 2 l), 2, ( n 2 1) of S and a random variable
E , all independent of one another, having these three properties: First,
L ( X )= P and L(Y,) = P,. Second, if nm I n < n m + l , then L(Y,i) =
P,(. IBT). Third, if nm 5 n < nm+l,then L(Zn)= p,, where
C P~(AIBz">[P,(B,")
- (1- ~ m ) p ( ~ r ) l ;
km
p , ( ~ )= E ; ; E ~
i=O
by (6.9), pn is a probability measure on ( S ,S). Finally, [ is uniformly
distributed over [ 0,1].
For n < n1 (if n1 > l),take X , = Y,. For n, 5 n < mm+l,define
xn =
c
km
+
4 6 5 1 - ~ m , x E B ~ ] y n 246>1-cm]&.
i=O
If nm 5 n < nm+1, then, by independence and the definitions,
P[X, E A]
-
C P[J 5 1 km
i=O
= (1- Em)
Em,
X E .By, Yni E A]
+ P[< > 1 - Em, Zm E A]
C P[X E B,"]P(A\Brj+ e m p n ( A ) = P,A.
km
i=O
Thus L ( X n ) = P, for each n. Let Em = [ X $! B r , [ I 1 - em] and
E = lim inf, Em. If nm 5 n, < nm+l,then on the set E m , X, and X
lie in the same B Y ,which has diameter less than em. It follows that
X , +, X on E , and since P(E&) < 2 ~ m we
) have PE = 1 by the
Borel-Cantelli lemma. Redefine X, as X outside E . This aoes not
change its distribution, and now X,(w) +, X ( w )for all w.
0
t
See Halmos [36], p. 157,
72
W E A K CONVERGENCE IN
METRICSPACES
The Prohorov Metric
Let P be the space of probability measures on the Borel a-field S
of a metric space S. If one topologizes P by taking as the general
basic neighborhood of P the set of Q such that lPfi - Qfil < E for
i = 1,.. . , Ic, where the fi are bounded and continuous on S, then weak
convergence is convergence in this topology. Here we study the case
where P can be metrized.
The Prohomv distance n(P,Q ) between elements P and Q of P is
defined as the infimum of those positive E for which the two inequalities
P A I Q A ' + E , QALPA'+r
(6.10)
hold for all Borel sets A. We first prove a sequence of facts connecting
this distance with weak convergence and then, at the end, state the
most important of them as a theorem.
(i) The Prohorov distance is a metric o n P . Obviously r ( P , Q ) =
r ( Q , P ) and T(P,
P ) = 0. If r ( P , Q ) = 0 , then for positive E , P F 5
QF'+E, and if F is closed, letting E J. 0 gives PF 5 Q F ; the symmetric
inequality and Theorem 1.1 now imply P = Q. If r ( P ,&) < €1 and
n(Q,R) < € 2 , then PA 5 QA'l + E I 5 R(AE1)E2
+ E I + E Z I R(A'1+'2)
€1 + € 2 . The triangle inequality follows from this and the symmetric
relation.
+
(ii) If P A 5 QA'+E for all Borel sets A, then n(P, Q ) 5 E . In other
words, if the first inequality in (6.10) holds for all A, then so does the
second. Note that A C S - BEand B c S - A' are equivalent because
each is equivalent to the condition that p(z, y) 2 E for all z in A and y
in B. Given A, take B = S - A'; if the first inequality in (6.10) holds
for B , then PA' = 1 - P B 2 1 - QB'- E = Q ( S - B E )- E 2 & A - c .
(iii) Ifr(P,, P ) --t 0 , then P, + P . Suppose that T(P,, P ) < E,
0. If F is closed, then limsup, P,F 5 limsup,(PF'n + ),E = P F .
+
(iv) If S is separable and P, + P , then n(P,, P ) + 0. For given
let {A*}be an S-partition of S into sets of diameter less than E .
Choose k so that P(Ui,I,Ai) < E and let 8 be the finite class of open
sets (Ai, U . - .U Aim)' for 1 5 il <
< ,i 5 k. If P, =$ P ,
then there exists an no such that n 2 no implies that P,G > P G - e
for each G in B. For a given A, let A0 be the union of those sets
among Al,. . . ,AI,that meet A. Then A6 E 8, and n 2 no implies
P A I PA0 P(Ui,k Ai) 5 PA0 E <
2~ 5 PnA2' 2 ~ .
Use (ii).
E,
+
+
+
+
SECTION6. A MISCELLANY
73
(v) If S is separable, then a set in P is relatively compact in the
sense of Section 5 if and only if its n-closure is n-compact. In the separable case, weak convergence and n-convergence are the same thing,
and so this is just a matter of translating the terms.
(vi) If S is separable, then P is separable. Given E , choose an Spartition {Ai}as in the proof of (iv). In each nonempty Ai choose
a point xi, and let II, be the countable set of probability measures
having the form CiskriSzi for some k and some set of rationals ri.
Given P E P , choose k so that P(Ui,kAi) < E, then choose rationals
r 1 , . . . , r k in such a way that
ri = 1 and CiskIri - P(Ai)I < E ,
and put Q = CiSk
ribzi. Given set A , let I be the set of indices i
(i 5 k) for which Ai meets A. If A0 = Uier Ai, then P A 5 PA0 E =
CiGz
P(Ai) E 5 Xiel ri 2~ = QAo 2~ 5 QA' 2 ~ By
. (ii), n, is
a countable 2e-net for P .
a
+
+
+
+
+
(vii) If S is separable and complete, then P is complete. Suppose
that {P,} is n-fundamental. It will be enough to show that the sequence is tight, since then it must (by Theorem 5.1 and (iv) above)
contain a n-convergent subsequence. The proof can be completed by
the argument in the proof of Theorem 5.2 if we show that for all E and 6
there exist finitely many S-balls Ci such that P,(C1 U.- - U C,) > 1 - E
for all n. First, choose so that 0 < 277 < E A 6. Second, choose
no so that n 2 no implies r(Pno,P,) < 7. Third, cover S by balls
Ai = B(z i , ~ and
) choose rn so that P,(A1 U
U A,) > 1 - 77 for
n 5 no. Third, let Bi = B(xi,27). If n 2 no, then P,(& U . . SUB,) 2
Pn((AIU.,.UA,)~)2 Pno(AlU...UA,)-q2 1-277. I f n 5 no, then
Pn(B1U .. -U B,) 2 P,(Al U U A,) 2 1 - 7. Take Ci = B(zi,S),
i 5 m.
q
.
Theorem 6.8. Suppose that S is separable and complete. T h e n
weak convergence is equivalent to r-convergence, P is separable and
complete, and a set in P is relatively compact if and only if its nclosure is n-compact.
A Coupling Theorem
Suppose M is a probability measure on S x S that has marginal measures P and Q on S and satisfies
I.
M [ ( s ,t ) :P ( S , t>>
(6.11)
< a,
where p is the metric on S. Then, for positive q, QA = M ( S x A ) 5
M [ ( s ,t ) :p ( s , t ) > a] M((A")- x S ) < a P(A*)- 5 a P(A"+T).
+
+
+
WEAKCONVERGENCE
IN METRICSPACES
74
This and the symmetric relation imply that the Prohorov distance
satisfies n(P, Q ) 5 a. The following converse, the Strassen-Dudley
theorem, is used in proving the approximation theorem in Section 21.
Theorem 6.9. If S is separable and n(P,Q) < a , then there is
a probability measure M on S x S that has marginals P and Q and
satisfies (6.11).
To begin the proof, choose P so that n ( P , Q ) < P < a,and then
choose E so that p 2~ < a and E < p. As in the proof of Theorem 6.7,
use separability to find a partition Bo,B1, . . . , BI,such that PBo < E
and diamBi < E for i > 1. Find a similar partition for Q and replace
{Bi}by the common refinement, so that PBo < E , QBo < E , and
diamBi < E < / ? f o r i = 1,..., k.
Let pi = PBi and qj = QBj.for 0 5 i , j 5 Ic, and let A be the set
of pairs ( i , j ) such that 1 5 i,j 5 k and dist (Bi,Bj)< p. It will be
shown later that there exists a (k 1) x (k 1) array of nonnegative
numbers pij such that
+
+
+
Assuming for the moment that there exists such an array, we can
complete the proof this way: Let Mij be the product of the conditional
probability measures P( * )Bi)and Q( 1Bj);if PBi or QBj is 0, take
Mij(S x S) = 0. Then Mij is a measure on S x S supported by
Bi x Bj. Let M be the probability measure CijpijMij. Then (sum
over the (z,j) for which pij > 0) M ( A x S ) = &pijMij(A x S ) =
pijP(A(Bi)Q(S(Bj)
= Cijp i j P ( A n B i ) / p =
i
P(AnB,) = P A .
Thus P is the first marginal measure for M , and similarly Q is the
second.
To prove (6.11), observe first that, if (i,j)E A, then Bi and Bj have
Bj) < P. Choose sij
diameters less than E (since i , j 2 1) and dist (Bi,
in Bi and t i j in Bj in such a way that p(sij, t i j ) < P. If s E Bi and
t E Bj,then p ( s , t ) F p ( s ,sij) p ( s i j , t i j ) p ( t i j , t ) < ,O 2~ < a. It
follows by (6.12) that
xi
zij
+
Hence (6.11).
+
+
SECTION6. A MISCELLANY
75
There remains the (considerable) task of constructing an array
(6.12). We start with finite sets X , Y ,and R c X x Y.
If (x, y) lies in R, write xRy: x and y stand in the relation (represented
by) R. For A C X , let AR be the set of y in Y such that z R y for some
z in A. Indicate cardinality by bars.
{ p i j } satisfying
Lemma 2. Suppose that IdR[2 IAl for each subset A of X . Then
there is a one-to-one map cp of X into Y such that xRcp(x) for every
x inX.
This can be understood through its interpretation as “the marriage lemma”:
X is a set of women, Y is a set of men, and x R y means that 2 and y are mutually
compatible. The hypothesis is that for each group of women there is a group of men,
at least as large, each of whom is compatible with at least one woman in the group.
The conclusion is that each woman in X can be married off to a Compatible man.
Some men may be left celibate, but not if the total numbers of men and women
are the same; in the notation of the lemma, if 1
x1 = lYl,then the map cp of the
conclusion carries X onto Y.
PROOF.For U C X and V c Y ,call the pair ( U , V )an R-couple
if [AR n VI 2 IAl for every A c U. This is just the hypothesis of the
theorem formulated for the relation R n (U x V )on U x V . Call f an
V )if it is a one-to-one map of U into V and u R f ( u )
R-pairing for (U,
for all u E U.We want to find an R-pairing cp for the R-couple ( X ,Y ) .
The proof of the lemma goes by induction on n =
It obviously
holds for n = 1. Assume that it holds for sets of size less than n. The
induction hypothesis implies this:
1x1.
(T) If U is a proper subset of X and (U,
V )is an R-couple, then there
is an R-pairing for (U,
V).
Fix an zo in X ; by the hypothesis, zoRyo for some yo in Y. There
are two possibilities: (a) ( X - {zo}, Y - {yo}) is an R-couple, and (b)
it is not. Assume (a). It then follows by (T) that there is an R-pairing
f for ( X - {zo}, Y - {yo}). Extend f to an R-pairing cp for ( X ,Y )by
taking cp(x0)= yo, and we are done.
So assume (b). Then ( X - {zo}, Y - {yo}) is not an R-couple, and
there is an A c X - {xo} such that IAR\{yo}l < [A[. But since ] A R /2
( A (by the hypothesis of the lemma, we have [ A R [2 IAl > IAR\{yo}l.
This means that (yo E AR and) [AR]= IAl. Since (by the hypothesis
of the lemma) IBRn ARI 2 IBI for B c A, (A, AR) is an R-couple,
and by (T) there is an R-pairing 1c, for (A,AR). Again consider two
possibilities: (ao) ( X - A, Y - AR) is an R-couple, and (b”) it is not. If
76
WEAK CONVERGENCE IN
METRIC SPACES
(a") holds, then, by (T),there is an R-pairing x for ( X - A,Y - AR).
But this x can be combined with 1c, to give an R-pairing cp for ( X , Y ) .
It remains only to rule out (b") as a possibility. If (b") does hold,
then there is a C c X - A such that ICR\AR( < (GI. But then (recall
) have [ ( AU C)RI= [ A R [ [@\AR[ < IAI ICI =
that lARl = [ A [we
0
/ AU CI, which contradicts the hypothesis of the lemma.
+
+
Next consider finite sets I , J , and a subset D of I x J . Consider
also a rectangular array of cells corresponding to the ( i , j ) in I x J ,
together with nonnegative numbers Q and r j . Think of the Q in a row
below the array (i running from left to right) and the rj in a column
to the left ( j running from bottom to top). For E c I , let E D be the
set of j such that ( i , j ) E D for some i E E .
Lemma 3. Suppose that, for each E
c I,
(6.13)
Then there are nonnegative numbers
(i,j) E D,and
mij
such that
mij
> 0 only for
(6.14)
If
xiGI
ci = CjE rj, then there is equality on the right in (6.14).
PROOF.
Suppose first that the ci and rj are rational. By multiplying through by a common multiple of the denominators, we can ((6.13)
and (6.14) are homogeneous) arrange that the ci and rj are nonnegative integers. Take Ai to be disjoint sets with (Ail = ci, take Bj to
be disjoint sets with lBjl = r j , and take X = UiAi,Y = Uj Bj. Now
define R c X x Y by putting into R those (x,y) such that the indices
of the Ai and Bj containing them (x E Ai, y E Bj) satisfy ( i , j ) E D.
There is an interpretation of this setup in terms of wholesale marriage. Suppose
the women are divided into clans and the men are divided into hordes. Certain clans
and hordes are compatible, and a woman-man pair is compatible if and only if the
clan and horde they belong to are compatible.
We can verify the hypothesis of Lemma 2. Suppose that A c X ,
and take E = [ i : A n A i# 81. If i $ E , then ( A n A i ) R
= 0. If i E E ,
then ( An Ai)R= U j : l i j I E Bj.
D Therefore, AR = UiEEUj:(i,j)ED
Bj =
U j E E D Bj. By (6.13)~lARl = C ~ E E DlBjl 2 X ~ E]Ail
E = IAl. The
hypothesis of Lemma 2 is thus satisfied, and there is a one-to-one map cp
SECTION
6. A MISCELLANY
77
of X into Y such that, for each z in X, zRq(z).Let mij be the number
of points in Ai that are mapped into Bj by cp. Then C jmij = ci (every
mij 5 IBj I = rj (9 is
point of Ai is mapped into some Bj),and
one-to-one). Hence (6.14). If mij > 0, then there are points z and y
of Ai and Bj such that zRy,which implies (i,j) E D.
To treat the general case, choose rational ci(n>and r j ( n ) such that
0 5 cj(n) cj and r j ( n ) rj ( D does not change). Let {rnij(n)} be a
solution for these marginals, and pass to a sequence along which each
mij(n)converges to some mij. This gives a solution to (6.14).
If
ci = Cj r j , then Cj
mij = Cj r j , and none of the inequalities in (6.14) can be strict.
0
xi
xi
xi
COMPLETION OF THE PROOF OF THEOREM
6.9. w e can use
Lemma 3 to finish the proof of Theorem 6.9. We must construct a
(k 1) x (k 1) array { p i j } satisfying (6.12). Let I = { O , l , . . . , k},
J = {0,1,.. . , k, GQ},ci = pi = PBi for 0 I
i 5 k, rj = q j = QBj
for 0 I
j 5 Ic, and roo = ,B 2 ~ .Use for D the elements of the
set A (defined just above (6.12)) together with the pairs (i,ca)for
a = O , l , . . . , k . If E is a nonempty subset of I , then, since ,B was
chosen so that n(P, Q ) < p,
+
+
+
EiEEci
~ P B O + P ( U ~ ~ ~ \<~+&(u,
~~)B')
B,)'+P
' '(
+
Bo
"
8EE\tO)
( ~n 13;))
f
U~EE\{O)
+ p.
Suppose that z E Bf, where 1 5 i 5 k, and z $ Bo; then z E Bj for
some j , 1 5 j 5 k, and dist (Bi,Bj)< fl, which implies ( i , j ) E A,
which in turn implies j E ED\(m}. Therefore, Ui,E\{o)(Bf n B,C) c
U j E E ~ \ { mBj,
) and it follows that
Thus (6.13) holds, and there exist
for (i,j) E D ,such that
(6.15)
{
(4
(b)
(4
mij
for (i,j) E I x J , positive only
CjEJmij = Pi,
&mij
I qj7
xi,1 mioo
I
2 E + P.
for 0 5 i 5 k,
for 0 5 j 5 k,
Reduce J to J' = {0,1,. . . , k}, let D' = I x J', and apply Lemma 3
a second time, with ci = miooand ri = qj rnij (all nonnegative).
xiel
WEAKCONVERGENCE
IN METRIC
SPACES
78
If E c I and E # 8, then ED’= J’, and by (6.15a) and the fact that
CiE-Pj = CjEJt
q j = 1, we have
The hypothesis of Lemma 3 is therefore satisfied. Since (6.16) also
= CjEJ,
r;, there exist mij such that
implies
xiGI
4
(6.17)
(a)
CjEJt
mGj = ci
mf. = r‘.
$3
Take pij = m i j
3
for i E I ,
for j E J’.
+ mij for (i,j)E D’.F’rom (6.15a) and (6.17a) follows
from (6.17.b) follows C i E l p i j = q j . And finally, since
mij > 0 only if (i,j)E A, it follows by (6.15a) and (6.15~)
that
C j E J t p i j = pi;
Therefore, (6.12) is satisfied.
Problems
0
+
6.1. Metrize S = [ 0 , l ) by p ( z , y) = z y for z # 9. Describe the balls in this
space; show that SO consists of the ordinary Borel sets and that S = 2[O1’).
Show that each point of (0,l) is isolated. An infinite compact set must consist
of 0 and a sequence converging to 0 in the ordinary sense. Define P,, on SO
as the uniform distribution over [2-n-1,2-n]. If K = {0}, then PnK6 = 1
for large n, and so {P,} is tight’. On the other hand, if K is compact, then,
for each n, PnK6 = 0 for small enough 6.
6.2. Does SOconsist of the S-sets that are either separable or coseparable?
6.3. Prove in steps that a separable probability measure on SO has a unique extension to S. Let P be the measure and M the separable support; let U and
0 be the classes of open balls and open sets in S.
(a) Show [PM.159]that the ball and the Borel a-fields in M (with the relative
topology) are M n a@) = M nSOand M n a(U) = M nS and that they
coincide.
(b) Define a probability measure P’ on M nSo = M nS by P’( M nB ) = P B
for B E SO.Define a probability measure P” on S by setting P”A =
P’(M n A ) for A E S.
( c ) Now show that P”B = P B for B E SOc S.
(d) Suppose that Q“ is a second extension. If A E S,then M n A = M n B for
some B E So; use this fact to prove that PI’ and Q” are identical (on S).
SECTION6. A MISCELLANY
79
6.4. Suppose that every p'-continuous function is also pcontinuous; show that p
is finer than p'.
6.5. There is another way to define weak convergence for SO.
If p is a probability
s*
f dp
measure on SOand f is bounded, define the upper [lower] integral
f dp] as the infimum [supremum] of s g d p for bounded, So-measurable
functions g satisfying g 2 f [g 5 f ] . Let P, be probability measures on SO,
let P be a probability measure on S, and take PO to be the restriction of P
to SO.Define P, + P (weak*) to mean that
[S,
*/K:
f dP, = lim/ f dP, =
/
f dP
for all bounded, continuous f . Show that this is equivalent to P,
JO
PO.
6.6. Let h and h, be maps from S to S', each measurable SIS'. Let E be the set
of z such that hnzn f , hz for some sequence {z,} converging to z. Suppose
that P, + P , P has separable support, E E S, and P E = 0. Use Theorem
6.7 to show that P,hi1 + Ph-l.
6.7. Suppose that S is separable. For probability measures P and Q, take M ( P ,Q )
to be the set of probability measures on S x S having marginal measures P
and Q. Let d ( P , Q ) = inf inf[a:M [ ( s ,t ) :p(s, t ) > a] < a],where the outer
infimum is over the M in M ( P ,Q ) . Show that d ( P ,Q ) = r ( P , Q).
6.8. Show that r ( P , Q ) is unchanged if we require (6.10) only for closed sets A.
+
z) and Jz1< c, then
n(P,Q) < E .
6.10. For distribution functions F on the line, consider the completed graphs r F =
at
[(z,y): F ( z - ) 5 y 5 F ( z ) ] .Each line y = a - z meets each of r F and
a single point; let L(F,G) be the supremum over a of the distance between
these two points. Show that L , the Livy metric, is indeed a metric. Show that
this metric carried over to P in the natural way is equivalent to the topology
of weak convergence and to the Prohorov metric for the case of the line.
6.9. For measures on the line, show that, if Q ( A )= P ( A
Convergence of Probability Measures
Patrick Billingsley
Copyright 01999 by John Wiley & Sons, Inc
CHAPTER 2
THESPACE C
SECTION 7. WEAK CONVERGENCE AND TIGHTNESS
IN C
The Introduction shows by example some of the reasons for studying
weak convergence in the space C = C [0,1] of continuous functions on
the unit interval, where we give to C the uniform topology, defining
the distance between two points z and y (two continuous functions x(-)
and y(.) on [ 0,1]) as
P h Y ) = llz - YII = SUP I4t) - Y(t>I.
t
Although weak convergence in C need not follow from the weak
convergence of the finite-dimensional distributions alone (Example 2 . 5 ) ,
it does in the presence of relative compactness (Example 5.1). And
since tightness implies relative compactness by Prohorov’s Theorem
5.1, we have this basic result:
Theorem 7.1. Let Pn,P be probability measures o n (C,C). If the
finite-dimensional distributions of Pn converge weakly to those of P ,
and i f {Pn} is tight, then P, + P .
Tightness and Compactness in C
In order to use Theorem 7.1 to prove weak convergence in C , we need
an exact understanding of tightness and hence of compactness in this
space. The modulus of continuity of an arbitrary function z(-) on [ 0,1]
is defined by
(74
%(6) = W ( 2 , S )= sup
Is-tl<b
- z(t)I,
0 < 6 5 1.
A necessary and sufficient condition for x to be uniformly continuous
over [ 0,1] is
(7.2)
80
lim ~ ~ (=60. )
640
SECTION
7.
W E A K CONVERGENCE AND
c
TIGHTNESS
IN
81
- wy(6)I 5
And an z in C satisfies (7.2). Note that, since Iwx(6)
2 p ( z , y), w(z,S) is, for fixed positive 6, continuous in z.
Recall that A is relatively compact if A- is compact, which is equivalent to the condition that each sequence in A contains a convergent
subsequence (the limit of which may not lie in A).t The Arzelh-Ascoli
theorem completely characterizes relative compactness in C:
Theorem 7.2. The set A is relatively compact if and only if
(7.3)
and
(7.4)
lim sup wx(6)
= 0.
6+0 X E A
The functions in A are by definition equicontinuous at to if, as t +
to, supzEAIz(t)- z(to)l + 0; and (7.4) defines uniform equicontinuity
(over [ 0,1]) of the functions in A .
If A consists of the functions z, defined by (1.5), then A is not rela-
tively compact; since (7.3) holds, (7.4) must fail, and in fact w(z,, 6) =
1 for n 2 6-l.
PROOF.If A- is compact, (7.3) follows easily. Since w(z,n-l)is
continuous in z and nonincreasing in n, (7.2) holds uniformly on A if
A- is compact [M8], and (7.4) follows.
Suppose now that (7.3) and (7.4) hold. Choose k large enough that
SUP,^^ w X ( k - ' ) is finite. Since
it follows that
(7.5)
supsup Iz(t)l < 00.
t
XEA
The idea now is to use (7.4) and (7.5) to prove that A is totally
bounded; since C is complete, it will follow that A- is compact.
t Although this is analogous to the relative compactness in Prohorov's theorem,
it is technically different because in Section 5 the space of probability measures on
(S, S ) is not assumed metrizable. But see Theorem 6.8.
THESPACEC
82
Let a be the supremum in (7.5). Given E , choose a finite E-net
H in the interval [-a,a] on the line, and choose k large enough that
wz(l/k)< E for all z in A. Take B to be the finite set consisting of
the (polygonal) functions in C that are linear on each interval Iki =
[(i - l)/k,i/k], 1 5 i 5 k, and take values in H at the endpoints. If
z E A , then Iz(i/k)l 5 a , and therefore there is a point y in B such
that I z ( i / k ) - y(i/k)l < E for i = 0,1,. . . ,k. Now y(i/k) is within
2~ of z ( t ) for t E I k i , and similarly for y((i - l ) ) / k . Since y(t) is a
convex combination of y((i - l ) / k ) and y(i/k), it too is within 2~ of
0
z ( t ) :p ( z , y ) < 2 ~ Thus
.
B is a finite 2e-net for A.
Let P, be probability measures on (C,C).
Theorem 7.3. The sequence {P,} is tight if and only if these two
conditions hold
(i) For each positive q, there exist an a and an no such that
(ii) For each positive E and q, there exist a 6 , 0 < 6 < 1, and an no
such that
In connection with (7.7), note that w( 6) is measurable because
it is continuous. Condition (ii) can be put in a more compact form:
For each positive E ,
a
(7.8)
,
lim lim sup P, [z: wz (6) 2 E ] = 0.
6 4 0 n+w
PROOF.
Suppose {P,} is tight. Given q, choose a compact K such
that P,K > 1 - q for all n. By the Arzelh-Ascoli theorem, we have
K c [z: Iz(0)l 5 a] for large enough a and K C [z: w,(6)5 E] for small
enough 6, and so (i) and (ii) hold, with no = 1 in each case. Hence the
necessity.
Since C is separable and complete, a single measure P is tight
(Theorem 1.3), and so by the necessity of (i) and (ii), for a given q
there is an a such that P[z:Iz(0)l 1 a] 5 q, and for given E and q
there is a 6 such that P[z:w,(b) > E] 2 q. If {P,} satisfies (i) and(ii),
therefore, we may ensure that the inequalities in (7.6)and (7.7) hold
SECTION
7.
W E A K CONVERGENCE AND
TIGHTNESS
IN c
83
for the finitely many n preceding no by increasing a and decreasing
6 if necessary: In proving sufficiency, we may assume that the no is
always 1.
Assume then that {P,} satisfies (i) and (ii), with no = 1. Given q,
choose a so that, if B = [x:Ix(0)l I
a ] , then P,B 2 l - q for all n; then
< l/lc], then P,Bk 2 1 - ~ / for2 ~
choose 61, so that, if BI,= [IC: ~~(6,)
all n. If K is the closure of A = B n BI,,then P,K 2 1 - 2q for all
n. Since A satisfies (7.3) and (7.4), K is compact. Therefore, {P,} is
tight.
0
nk
Theorem 7.3 transforms the concept of tightness in C simply by
substituting for relative compactness its Arzelh-Ascoli characterization. The next theorem and its corollary go only a small step beyond
this but, even so, fill our present needs.
Theorem 7.4. Suppose that 0 = t o
(7.9)
< tl < - . < t, = 1 and
+
min (ti - t i - 1 ) 2 6.
l<i<v
Then, for arbitrary x,
and, for arbitramj P ,
Note that (7.9) does not require ti - ti-1 2 6 for i = 1 or i = v.
PROOF.Let rn be the maximum in (7.10). If s and t lie in the same
interval Ii = [ti-l,ti],then IIC(S) - x(t)l 2 Ix(s) - z(ti-l)l Ix(t)x(ti-1)( 5 2rn. If s and t lie in adjacent intervals Ii and Ii+1, then
Ix(s) - z(t)l I
I.(s)
- z(ti-l)l+ Iz(ti-1) - z(ti)l+ Iz(ti)- z(t)l I
3m.
If Is - tl 5 6, then s and t must lie in the same Ii or in adjacent ones,
0
which proves (7.10). And now (7.11) follows by subadditivity.
+
E
Corollary. Condition (ii) of Theorem 7.3 holds iJ for each positive
and q, there exist a 6 , 0 < 6 < 1, and an integer no such that
THESPACEC
84
for every t in [ 0, I].
If t > 1 - 6 ) restrict s in the supremum in (7.12) to t 5 s 5 1. Note
that (7.12) is formally satisfied if 6 > l/q; but we require 6 < 1.
PROOF.Take ti = i6 for i < v = 11/61. If (7.12) holds, then, by
(7.11), (7.7) holds as well ( 3 in
~ place of E ) .
0
Random Functions
Let ( R , F , P ) be a probability space, and let X map R into C: X ( w )
is an element of C with value X t ( w ) = X ( t , w ) at t. For fixed t , let
X t = X ( t ) denote the real function on R with value X t ( w )at w; X t
is the composition T t X , where 7rt is the natural projection defined in
Example 1.3. And let ( X t , ,. . . Xt,) denote the map carrying w to the
point ( x t , ( w ) ,. . . ,Xt,(w)) = 7rtl...tk(X(W))
in IP.
If X is a random function-that is, measurable F/C-then the
,that (Xt,, . . . Xt,) is a
composition 7rtl...tkX is measurable 7 / R kso
random vector. But the argument can be reversed: The general finiteand if 7rttl...tkX
dimensional set has the form A = T ; ~ . ~ , HH, E Rk,
is measurable F/Rk,
then X - l A = (7rt1...tkX)-lH E 7 ; but since
the class Cf of finite-dimensional sets generates C, it follows that X is
measurable F / C . Therefore, X is a random function if and only if each
(Xt,, . . . ,Xt,) is a random vector, and of course this holds if and only
if each X t is a random variable. Let P = PX-' be the distribution
of X ; see (3.1). Then P[(Xt,,..., Xt,) E HI = PT;~..~,H,and so
the finitedimensional distributions of P (Example 1.3 again) are the
distributions of the various random vectors ( X t , . . Xt,).
Suppose that X ,X ' , X 2 , .. . are random functions.
)
)
) .
)
Theorem 7.5. If
holds for all
(7.14)
tl,
.. .
)
tk,
and if
lim lim sup P [ w ( X n 6)
, 2
6-0
n-co
E]
=0
for each positive E , then X n =+, X .
FIRSTPROOF.Let P and P, be the distributions of X and the X,.
Since (7.13) is the same thing as P,7rG1 t k =+, P7rG!..tkand X n +, X is
the same thing as P, +, P , the result'will follow by Theorem 7.1 if we
SECTION
c
7. W E A K CONVERGENCE AND TIGHTNESS
IN
85
show that {X"} is tight in the sense that {P,} is tight. Now Xg J~Xo
implies that { P n r i l } is tight, which in turn implies condition (i) of
Theorem 7.3. Since (7.14) translates into (7.8), condition (ii) of the
theorem also holds, and {P,} is indeed tight.
0
There is a second proof, very different, based on Theorem 3.2.
SECOND
PROOF.
For u = 1 , 2 , . . ., define a map Mu of C into itself
this way: Let Mux be the polygonal function in C that agrees with
z at the points i / u , 0 5 i 5 u,and is defined by linear interpolation
between these points:
2-1
( M u z ) ( t )= ( 2 - .t)z(--)
+ (ut - (i - l ) ) z ( LU ) ,
2-1
2
U
I t s -U .
Since Mux agrees with z at the endpoints of each [(i - l ) / n , 2/71], it is
clear that p(M,z, z) I 2w,(l/u). And now define a map L,: R"+l -+
C this way: For a = (ao,. . . ,a,), take (L,a)(t) = (i - ut)ai-l+ (ut (i - 1))aifor t in [(i - l)/u,i/u]; this is another polygonal function,
and its values are ai at the corner points. Clearly, p(L,a,L,P) =
maxi Iai -,$I, so that L, is continuous; and M, = Lurto...t,if ti = 2/u.
By (7.13), rto...tuxn
+, rt o...t,X (ti = i / u again), and since L,
is continuous, the mapping theorem gives M,Xn J~MuX. Since
p(M,X, X)5 2w(X,l / u ) and X is an element of C , p(M,X, X ) goes
to 0 everywhere and hence goes to 0 in probability, so that MuX =+, X
by the corollary to Theorem 3.1. Finally, from (7.14) and the inequality
p(MuXn,Xn)5 2w(Xn, l/u), it follows that
limlimsupP[p(M,Xn,Xn) 2
u
n
Combine this with M,Xn
=+n
M,X
J,
E]
= 0.
X and apply Theorem 3.2. 0
This second proof does not require the concepts of relative compactness and tightness, and it makes no use of Prohorov's theorem.
See the discussion following the proof of Donsker's theorem in the next
section.
Coordinate Variables
The projection rt, with value r t ( z ) = z ( t )at z, is a random variable
on (C,C). We denote it by zt: For fixed t , zt has value z ( t ) at z.
If P is a probability measure on (C,C) and t is thought of as a time
parameter, then [zt:O I t <_ 11 becomes a stochastic process, and the
86
THESPACEC
xt are commonly called the coordinate varables or functions. We speak
of the distribution of xt under P and often write P[xt E HI in place
of P[x:xt E HI and J z t d P in place of J c z t P ( d z ) . Finally, when
t is a complicated expression, we sometimes revert back to x ( t ) , still
intended as a coordinate variable.
Problems
-
7.1. Let X ( t ) = t [ , where [ is a random variable satisfying P[I[l 2 a] aY-ll2;
for every n,let P,, be the distribution of X . Then {P,} is tight but does not
satisfy (7.12).
7.2. Show that, if the functions in A are equicontinuous at each point of [ 0,1],
then
A is uniformly equicontinuous.
SECTION 8. WIENER MEASURE AND
DONSKER'S THEOREM
Wiener Measure
Wiener measure, denoted here by W ,is a probability measure on (C,C)
having the following two properties. First, each xt is normally distributed under W with mean 0 and variance t:
For t = 0 this is interpreted to mean that W[zo = 01 = 1. Second, the
stochastic process [ q : O 5 t 5 11 has independent increments under
w: If
then the random variables
are independent under W . We must prove the existence of such a
measure.
If W has these two properties, and if s 5 t , then xt (normal with
mean 0 and variance t ) is the sum of the independent random variables
xs (normal with mean 0 and variance s) and xt - xs,so that x t - xs
SECTION
8. WEINER MEASUREAND DONSKER'S
THEOREM
87
must be normal with mean 0 and variance t - s , as follows from dividing
the characteristic functions. Therefore, when (8.2) holds,
In particular, the increments are stationary (the distribution of xt - xs
under W depends only on the difference t - s) as well as independent.
xs(xt- zs),it follows from the independence of
Since x,xt =
the increments that the covariance of xs and xt under W is s if s 5 t.
By a linear change of variables, the joint distribution of (xtl, . . . ,xt,)
can therefore be writen down explicitly; it is the centered normal distribution for which xti and xtj have covariance ti A t j . But (8.4) is the
clearest way to specify the finite-dimensional distributions.
Wiener measure gives a model for Brownian motion. In proving its
existence, we face the problem of proving the existence on (C,C) of a
probability measure having specified finite-dimensional distributions.
There can for an arbitrary specification be at most one such measure
(Example 1.3), and for some specifications there can be none at all
(there is, for example, no P under which the distribution of xt is a
unit mass at 0 for t < and at 1 for t 2
+
i
i).
Construction of Wiener Measure
We start with a sequence & , < 2 , . . . of independent and identically distributed random variables (on some probability space) having mean 0
and finite, positive variance u2. Let S, =
(So = 0), and
let X n ( w ) be the element of C having the value
+ + <,
at t. Thus X " ( w ) is the function defined by linear interpolation between its values X&(w) = S i ( w ) / u f iat the points i/n. Since (8.5)
defines a random variable for each t , X n is a random function, the one
discussed in the Introduction. If <i takes the values f l with probability
each, then u = 1 and X n is the path corresponding to a symmetric
random walk.
If gn,t is the rightmost term in ( 8 . 5 ) , then gn,t = s 0~ by Chebyshev's inequality. Therefore, by the Lindeberg-L6vy central limit theorem (together with Theorem 3.1, Example 3.2, and the fact that
THESPACEC
88
[ntJ/n-+ t ) , X p
+,
&N, where, as always, N has the standard
normal distribution. Similarly, if s 2 t , then
1
(X,",xz"- Xsn) = -(qnsJ
7 q n t j - q n s j1
(8.6)
u f i
+ (&s,
?In$ - &,s)
(Nl,N2),
where N1 and N2 are independent and normal with variances s and t-s.
And by the mapping theorem, (X,",
Xr)+, (N1,N1+ N2). The obvious extension shows that the limiting distributions of the random vectors ( X g, . . . Xt",) are exactly those specified as the finite-dimensional
distributions of the measure W we are to construct. To put it another
way, if P, is the distribution on C of X n , then for each t l , . . . ,t k ,
P,7r;1..tk converges weakly to what we want W7r;1,,tk to be.
Suppose we can show that {P,} is tight. It will then follow by
the direct half of Prohorov's theorem that some subsequence {Pni}
converges weakly to a limit we can call W . But then Pni7r;1.,tk+i
WrG!..tk,
and therefore, by what we have in fact just proved, W7r;1..tk
will be the probability measure on Rk we want. See Example 5.2.
The argument for the tightness of {X"} will be clearer if we first
consider the more general case in which {&} is stationary (the distribution of (Jk,. . . , J k + j ) is the same for all k).
=Sn
Lemma. Suppose that Xn is defined by (8.5), that {&} is stationary, and that
(8.7)
lim limsupX2P max ( S ~2I xu+]
X-tw
,--roo
[ kln
= 0.
Then { X " } is tight.+
PROOF.
Use Theorem 7.3. Since X g = 0, {P,} obviously satisfies
condition (i) of the theorem. Condition (ii) we prove by (7.8), which
translates into the requirement that
(8.8)
lim limsup P[w(X", 6) 2
6-0
TI-+W
€1 = 0
for each E . And to this we can apply Theorem 7.4. If (7.9) holds, then
so do (7.10) and (7.11):
if min (ti - t i - 1 )
l<z<u
t
The condition (8.7) is in fact necessary for X "
2 6.
+ W . See Problem 8.3.
SECTION
8. WEINER MEASUREAND DONSKER'STHEOREM
89
This becomes easier to analyze if we take ti = mi/n for integers mi
satisfying 0 = mo < ml <
< mu = n. The point is that, because
of the polygonal character of the random function Xn, if the ti have
this special form, then the supremum in (8.9) becomes a maximum of
differences Is, - S m i - l \/0&i:
(8.10)
P[w(Xn,S) 2 3 ~ 5]
2)
P
i=l
[
max
mi-lsksmi
lsk
- smi-1 I
ofi
1
€1
where the equality is a consequence of the assumed stationarity. The
inequality holds if the condition on the right in (8.9) does, which requires that
(8.11)
For a further simplification, take mi = im for 0 I i < w (and
mu = n), where m is an integer (a function of n and 6) chosen according
to these criteria: Since we need mi - mi-1 = m 1 n6 for i < v, take
m = [nSl; since we also need (v - l ) m < n 5 vm, take v = [n/ml.
Then
and it follows by (8.10) that, for large n,
(8.12)
~ [ w ( ~S)n2,
3 ~I
] v . P [max Isk(2 m f i ]
ksm
Take X and S to be functions of one another: X = ~/d%.
Expressed
in terms of A, (8.12) is
4X2
P[w(Xn,S) 1 3 ~ 5] -€2
.
P[max\SkJ
ksm
2 Xufi].
For given positive E and q , there is, by (8.7), a X such that
Once X and S are fixed, rn goes to infinity along with n, and so (8.8)
follows.
0
THESPACEC
90
We can complete the construction of Wiener measure by using
the independence of the random variables ( n in the definition (8.5).
Because of this independence, Etemadi’s inequality [M19] implies
(8.13)
P[max ISu[2 a]5 3max P[lSuI 2 a/3].
usm
u<m
Therefore, (8.7) will follow if
(8.14)
lim lirnsupA2maxP[(~k(
2 ~ a J n 7= 0.
x-.00
12400
K n
For the construction of Wiener measure, we can use any sequence {ti}
that is convenient. Suppose the are independent and each has the
standard normal distribution ( a = l), in which case S h / & also has
the standard normal distribution. Since
(8.15)
P[(NI2 A] < EN4 * A-4 = 3A-4,
we have P[lSkl 2 Aufl = P[v/jFINI 2 Aafi < 3/A4a4 for k 5 n,
which implies (8.14). This proves the existence of Wiener measure:
Theorem 8.1. There exists o n (C,C) a probability measure W
having the finite-dimensional distributions specified by (8.4).
Denote by W not only Wiener measure, but also a random function having Wiener measure as its distribution over C. It is possible
to reverse the approach taken above, constructing the random function first and the corresponding measure second. There are a number
of ways to do this. For one, use Komogorov’s existence theorem to
show that there exists a stochastic process [W(t):O5 t 5 11 having
the finite-dimensional distributions specified by the right-hand side of
(8.4). By a further argument involving Etemadi’s inequality (or something similar), modify the process so as to ensure that the sample path
W (* ,w)is continuous for each w [PM.503]. None of this involves the
, can regard it as a
space C at all. But once we have this W ( w ) we
random element of (C,C),and then we can define Wiener measure as
its distribution.
Donsker’s Theorem
Donsker’s theorem is the one discussed in the Introduction:
Theorem 8.2. I f (1, 52,. . . are independent and identically distributed with mean 0 and variance u2, and i f X n is the random function.
defined by (8.5), then X n +n W .
SECTION 8. WEINER
MEASUREAND DONSKER'STHEOREM
91
PROOF.The proof depends on Theorem 7.5. Now that the existence of Wiener measure and the corresponding random function
W have been established, we can write (8.6) as (X,",X,"
- X,") + n
(W., W,- W . ) ,and this implies (X,",X r ) +n (Ws,Wt).An easy extension gives (7.13) with W in the role of X .
We have already proved tightness by means of (8.14), but under
the additional assumption that the
are normal. In place of this,
we can use the central limit theorem, provided we consider separately
the small and the large values of k in the maximum in (8.14). By the
central limit theorem, if kx is large enough and kx k 5 n, then (use
(8.15)) P[lSkl 2 Xafl 5 P[(Skl 2 X o h ] < 3/X4. For k 5 kx we can
use Chebyshev's inequality: P[lSkl 2 X o f l 5 kx/X2n. The maximum
0
in (8.14) is therefore dominated by (3/X4) V (kx/X2n).
This argument is based on Theorem 7.5, the second proof of which
makes no use of Section 5. But Theorem 8.2 assumes the prior existence
of W ,and our proof that W does exist depended heavily on the theory
of Section 5 (compare Examples 5.1 and 5.2). On the other hand, as
pointed out above, the existence of W can be proved in an entirely
different way, and so one can do Donsker's theorem without reference
to tightness and Prohorov's theorem. On the otherother hand, without
the concept of tightness, the condition (7.14) is somewhat artificial, an
ad hoc device.
An Application
Donsker's theorem has this qualitative interpretation: X n + W says
that, if r is small, then a particle subject to independent displacements
(1, &, . . . at successive times ~ , 2 r... will, viewed from far off, appear
to perform a Brownian motion.
Specifically, we can use the theorem to derive limit laws for various
functions of the partial sums Sn. The Introduction indicates how to
use the relation X" + W to derive the limiting distribution of
(8.16)
Mn
= max
O<i<n
Si.
We are now in a position to carry through the details.
Since h ( z ) = sup,z(t) is a continuous function on C,it follows
from X n + W and the mapping theorem that suptXr =+-sup,Wt.
Obviously, sup, Xr = Mn/afi,and SO
(8.17)
THESPACEC
92
Thus we would have the limiting distribution of M,/u& (under the
hypothesis of Theorem 8.2) if we knew the distribution of supt Wt. The
idea now is to find the latter distribution by calculating the limiting
distribution of Mn/Ov/;; in an easy special case.
For the easy special case, assume that the independent & take the
values f l with probability each, so that SO,S1,. . . are the successive
positions in a symmetric random walk starting from the origin. We
first show that for each nonnegative integer a ,
The case a = 0 being easy ( M , 2 So = 0), assume a
P[Mn 2 a] - P[S, = a] = P[M, 2 a,
> 0. Since
s, < a] + P[M,
and since the second term on the right is just PIS,
follow if
1 a, s, > a ] ,
> a ] , (8.18) will
Now all 2, possible paths (&,. . . ,S,) have the same probability,
and so (8.19) will follow provided the number of paths contributing to
the left-hand event is the same as
the number of paths contributing to
the right-hand event. Given a path
contributing to the left-hand event in
(8.19), match it with the path obtained
by reflecting through a all the partial
I? sums after the first one that achieves
the height a (replace Sk by a-(sk-a)).
Since this describes a one-to-one correspondence, (8.19) and (8.18)
follow. This is an example of the reflection principle.
Assume a 2 0 and put a, = [an1I21.From (8.18) it follows that
P[Mn/fi 2 a]= 2P[S, > an] P[S, = a,]. The second term here
goes to 0, and P[S, > a,] -, P[N > a] by the central limit theorem,
and so P[M,/fl + 2P[N > a]for a 2 0. Combine this with (8.17):
+
(8.20)
2
P[supWt 5 a]= t
6
0
(the left side is 0 if a < 0). We have derived a fact about Brownian
motion by combining Donsker’s theorem with a computation involving
SECTION
8. WEINER MEASUREAND DONSKER'S
THEOREM
93
random walk, a computation which is simple because it reduces to
enumeration and because a random walk cannot pass above an integer
without passing through it.
Under the hypotheses of Donsker's theorem (drop the assumption
& = kl),(8.17) holds, and from (8.20) it now follows that
Under the hypotheses of the Lindeberg-Levy theorem, the limiting
distribution of Mn (normalized) is the folded normal law.
The next section has further examples that conform to this pattern.
If h is continuous on C-or continuous outside a set of Wiener measure
O-then Xn + W implies h(Xn) =$ h ( W ) .We can find the limiting
distribution of h ( X n ) if we can find the distribution of h ( W ) ,and
we can in many cases find the distribution of h(W)by finding the
limiting distribution of h ( X n ) in some simple special case and then
using h ( X n )+ h ( W )in the other direction.
Therefore, if the & are independent and identically distributed
with mean 0 and variance 0 2 ,then the limiting distribution of h ( X n )
does not depend on any further properties of the ti. For this reason,
Donsker's theorem is often called the (or an) invariance principle. It
is perhaps better called a functional limit theorem, or since the limit is
Brownian motion in this case, a functional central limit theorem.
The Brownian Bridge
A random element X of (C,C) is Gaussian if all its finite-dimensional
distributions are normal. The distribution over C of a Gaussian X
is completely specified by the means E[Xt] and the product moments
E[X,Xt],because these determine the finite-dimensional distributions.
For W ,the moments are E[Wt] = 0 and E[WsWt] = s A t.
A second important random element of C is the Brownian bridge, a
Gaussian random function W"specified by the requirements E[W,"] = 0
and E[W,"W,"]
= s(1 - t ) for s 5 t. The simplest way to show that
there is such a random function W" is to construct it from W by
setting W," = Wt - tW1 for 0 5 t _< 1. Obviously, W " ,thus defined, is
a Gaussian random element of C , and a calculation shows that it has
the required moments .
We also use W" to denote the distribution on C of the random
C carries 2 to the function with the value
function W " . If h: C
s ( t ) - tz(1) at t, then the measures W and W" are related by
W" = Wh-'.
--f
THESPACEC
94
Problems
8.1. Show that W has no locally compact support. See Problem 1.17.
8.2. Define % = (1 t)W,O/(,+,).The process [K:t 2 01 has continuous sample
paths, the finite-dimensional distributions are Gaussian, and the moments are
EK = 0 and E[V,&] = s A t . The process thus represents a Brownian motion
over the time interval [ 0,m).
8.3. Assume that X " is defined by (8.5) and that X n + W . Use (8.20) and
symmetry to show that sup, [Wtlhas a fourth moment. Conclude that (8.7)
holds.
+
..,&,k, be independent random 2variables with mean 0 and variances
-k * ' . Fni, S f i = 6,1 -k . . ' U i i , and S: = sfk,. Let
put Sni =
X" be the random function that is linear on each interval [ S $ - ~ / S : , sfi/sz]
and has values X"(S:~/S~) = Sni/Sn at the points of division. Assume the
Lindeberg condition holds and generalize Donsker's theorem by showing that
8.4. Let
&I,.
Uii;
X"
+
+
* w.
SECTION 9. FUNCTIONS OF BROWNIAN MOTION
PATHS
The technique used in the preceding section to find the distribution
of supt Wt and the limiting distribution of Mn/afi we here apply to
other functions of Brownian motion paths and partial sums. We also
compute some distributions associated with the Brownian bridge. In
subsequent sections, convergence in distribution to W will be proved for
a great variety of random functions, and in each case the calculations
carried out here apply.
Although the theory of weak convergence of probability measures
on function spaces had its origin in problems of the kind considered
here, and although these problems and their solutions are indeed
interesting, it is possible to understand the general theory of the
subsequent sections without studying this one.
In this section, the Lindeberg-LQvycase will mean the one in which
the Sn are partial sums of independent, identically distributed random
variables with mean 0 and finite, positive variance u2,and X n will
always denote the random function defined by (8.5). The random walk
case will be that in which each takes the values f l with probabilities
each (u2= 1).
4
Maximum and Minimum
Let m = inft W,and M = sup, W,,and let
SECTION
9. FUNCTIONS
OF BROWNIAN
MOTIONPATHS
95
be the corresponding quantities for the partial sums. The mapping
, z ( t ) ,~ ( 1 )of) R3 is
carying the point 2 of C to the point (inft z ( t ) sup,
everywhere continuous, and so by Donsker’s theorem and the mapping
theorem,
in the Lindeberg-L&y case.
We first find an explicit formula for
in the random walk case. We show that if
then
for integers a, b, and v satisfying
Since a < b, the series in (9.5) are really finite sums. Notice that both
sides of (9.5) vanish if n and v have opposite parity or if v is either a
or b.
For particular values of n , a , b , v , denote the equation (9.5) by
[n,a,b,w]. Then [n,u,b,v] is valid if (9.6) holds, and the proof
of this goes by induction on n. For n = 1, this follows by a
straightforward examination of cases. Assume as induction hypothesis
that [n - 1,a, b, U] holds for a, b, v satisfying (9.6). If a = 0, then (9.3)
vanishes (i starts at 0 in the minimum in (9.1), and So = 0), and
the sums on the right in (9.5) cancel because p n ( j ) = p n ( - j ) . Thus
[n,a, b, v] is valid if (9.6) holds and a = 0; we may dispose of the case
b = 0 in the same way. To complete the induction step, we must verify
[n,a, b, v] under the assumption that a < 0 < b and a 5 v 5 6. But in
this case, a+ 1 5 0 and b - 1 2 0, so that [n- 1,a - 1,b - 1 , v - 11 and
[n- 1,a + 1,b + 1,u 11 both come under the induction hypothesis and
hence both hold. And now [n,a, b, v] follows by the probabilistically
+
THESPACEC
96
obvious recursions (condition on the direction of the first step of the
random walk)
and
This proves (9.5), and it follows by summation over w that, if
a<O<b,
(9.7)
a<u<v<b,
then
(9.8)
P[a < m,
C
00
=
I M , < b, u < S, < w]
P[u
+ 2k(b - a ) < S, < v + 2k(b - a)]
kZ-00
C
00
-
P[2b - w
+ 2k(b - a) < s, < 2b - u + 2k(b - a ) ] .
k=-W
- 1 in this formula leads to
P[M, < b, u < s, < w] = P[u < s, < w]
- P[2b - 2, < s, < 2b - u ] ,
Taking a = -n
(9.9)
valid for -n - 1 5 u < w 5 b, b >_ 0. From (9.9) it is possible to derive
(8.18) again.
Now (9.8) holds in the random walk case, and because of (9.2), we
can find the distribution of (m,M , W1) by passing to the limit. If a, b, u,
and v are real numbers satisfying (9.7), replace them in (9.8) by the
integers [ a f i ] , r b f i 1 , LufiJ, and [wfil, respectively. Because of
the central limit theorem and the continuity of the normal distribution,
a termwise passage to the limit in (9.8) yields
(9.10)
P[a < m 5 M < b, u < Wl
c
00
=
P[u
< w]
+ 2k(b - a ) < N < + 2 k ( b - a ) ]
2,
k=-W
-
00
P[2b - v
+ 2k(b - a ) < N < 2b - u + 2 k ( b - a ) ] .
SECTION 9.
97
FUNCTIONS
OF BROWNIAN
MOTIONPATHS
The interchange of the limit with the summation over k can be justified
by the series form of Scheff6's theorem [PM.215].
The joint distribution of M and W1 alone could be obtained by
letting a tend to -w in (9.10), but it is simpler to return to the
random walk case and pass to the limit in (9.9), which gives
P[M <b, u < W1< v]
= P [U < N < V ] - P [2b - 21 < N
(9.11)
valid for u
< 2b - U ] ,
< v 5 b, b 2 0. Taking v = b and letting u
PIO 5 M
(9.12)
--+
--oo leads to
< b] = 2P[O 5 N < b];
this is the same thing as (8.20).
From (9.10) for u = a and v = b we get
(9.13)
P[a < rn 5 M
< b]
C (-l)'P[a + k(b - a ) < N < b + k ( b - a ) ] ,
00
=
k=-m
valid for a 5 0 5 b. And this result with a = -b gives
00
(9.14)
P[sup lWtl < b] =
t
(-l)kP[(2k
k=-W
- 1 ) b < N < (2k + l ) b ]
for b 2 0. By continuity, the strict inequalities in all these formulas
can be relaxed to allow equality. And the right sides can all be written
out as sums of normal integrals.
Although we derived (9.10) through (9.14) by passing to the
limit in the random walk case, we have the limiting distributions
Mn), and maxiin JSnl (all
for (mn,Mn,&), (Mn,Sn), Mn, (mn,
normalized by nfi) in the more general Lindeberg-L6vy case because
(9.2) holds there.
The Arc Sine Law
2 in C, let h l ( z ) be the supremum of those t in [ 0,1] for which
~ ( t=) 0; let hz(z) be the Lebesgue measure of those t in [ O , l ] for
For
which z ( t ) > 0; and let hs(z)be the Lebesgue measure of those t in
[ 0, hl(z)] for which z ( t ) > 0. Then T = h l ( W )is the time at which
W last passes through 0, U = hz(W)is the total amount of time W
THESPACEC
98
spends above 0, and V = h3(W )is the amount of time W spends above
0 in the interval [ 0, TI. The object is to find the joint distribution of
( T ,u,
It can be shown [M15] that each of the functions hl, ha, and h3
is measurable and is continuous outside a set of Wiener measure 0.
Therefore,
v,w.
(9.15)
(hl(Xn),h2(Xn)ih3(zn),Xln)
(T,U,V,Wl)
in the Lindeberg-Levy case. In the random walk case, the vector on
the left has a simple interpretation: T, = n h l ( X n ) is the maximum
i, 1 5 i 5 n, for which Si = 0; U, = nh2(Xn)is the number of i,
1 5 i 5 n, for which Si-1 and Si are both nonnegative; V, = nh3(Xn)
is the number of i, 1 5 i 5 T,, for which Si-1 and Si are both
nonnegative; and, of course, X y = S,/fi.
With these definitions we therefore have
in the random walk case; the distribution of (T,U, V,W1)will be found
by a passage to the limit. In the general Lindeberg-LBvy case, the left
side of (9.15) is a somewhat more complicated function of the partial
sums Si, but it will still be possible to derive limit theorems associated
in a natural way with these partial sums.
Since the random vector (T,U , V,W1)is constrained by
(9.17)
1-T+V
u={ V
if W 1 2 0 ,
if W1 5 0 ,
it suffices to consider (T,V," 1 ) and the related vector (T,, V,, S,).
The distribution of the latter vector in the random walk case will be
derived from three facts which admit of elementary proofs we do not
carry through here.
First, we need the local central limit theorem for random walk: If
m tends to infinity and j varies with m in such a way that j and rn
have the same parity and j / f i
y, then+
(9.18)
Feller [28], p. 184, has the local limit theorem for the binomial distribution,
and (9.18) follows because i(Sm m) is binomially distributed.
+
SECTION
9. FUNCTIONS
OF BROWNIAN
MOTIONPATHS
99
Second, we need the fact that
(9.19)
P[S1 > 0,. . . ,Sm-1 > 0, Sm = 3.) = - pj m ( j )
m
for j p0sitive.t If S2m = 0, then U2, = V2, assumes one of the values
0 , 2 , . . . ,2m; the third fact we need3 is that these m + 1 values all have
the same conditional probability:
(9.20)
P[h,
1
i=O,l,
m+i7
= 22 I S2m = 0] = -
...,m.
To compute the probability that T, = 2k, V, = 22, and S, =
Conditionally on this event,
(So,.. . , S 2 k ) and (SZk+l,.. . ,
are independent, V, depends only
on the first sequence, and T, = 2k and S, = j both hold if and only if
the elements of the second sequence are nonzero and the last one is j .
By (9.19) and (9.20) we conclude that
j , condition on the event s 2 k = 0.
s,)
if
(9.22)
0<2i<2k<n,
j>O.
Both sides of (9.21) vanish if n and j have opposite parity. For j
negative, the same formula holds with ( j (in place of j on the right.
We apply Theorem 3.3 to the lattice of points ( 2 k / n , 2 i / n , j / f i )
for which j and n have the same parity. Suppose that k, i, and j tend
to infinity with n in such a way that
2k
n
-+t,
2i
-+w,
n
-
where 0 < TJ < t < 1 and x > 0. Then (9.22) holds for large n, and it
follows by (9.21) and (9.18) that
P[T, = 2k, V, = 2i, S, = j ] + g ( t , x ) ,
t
See Feller [28], p. 73.
3 See Feller [28], p. 94.
THESPACEC
100
where
The same result holds for negative z by symmetry. Since local limit
theorems imply global ones (Theorem 3.3),
(9.24)
has (in the random walk case) the limiting distribution in R3 specified
by the density
(9.25)
f ( t ,v,z) =
{p
if O < v < t < l ,
other wise.
By (9.16), (T,V,Wl)has this density. Because of (9.17), the
distribution of (T,U , V,"1) can be written out explicitly as well.
From (9.25) it follows that the conditional distribution of V given
T and W1 is uniform on [O,T];this corresponds to (9.20). By (9.17),
if T = t and W1 = z, then U is distributed uniformly over [l- t , 11 for
x > 0 and uniformly over [ 0, t] for z < 0. Using (9.25) to account for
the possible values of t and z, we find the density of U alone:
Now the integral of g ( t , z ) over the range x > 0 is l/[27rt3/2(l - t)lj2],
which is the derivative of -7r-'((l- t ) / t ) 1 / 2 and
,
hence (9.26) reduces
to 1 / [ d 2 ( 1 - u ) ' / ~ ]Therefore
.
(9.27)
P[U 5 u]=
1od-
ds
2
= - arcsin&,
=
O < u < 1.
This is Paul LQvy's arc sine distribution. A similar computation shows
that T also follows the arc sine law:
2
P[T5 t] = - arcsin di, o < t < 1.
(9.28)
lr
Let us now combine (9.15) with the facts just derived t o obtain a
limit theorem for the general Lindeberg-LBvy case. Let us agree to say
that a zero-crossing takes place at i if the event
(9.29)
Ei = [Si = 01 U [Si-l > 0 > Si] U [Si-l < 0 < Si]
SECTION9. FUNCTIONS
OF BROWNIAN
MOTIONPATHS
101
occurs (which in the random walk case is to say that Si = 0). Let T'
be the largest i, 1 5 i 5 n, for which a zero-crossing takes place at i;
let UA be the number of i, 1 5 i 5 n, for which Si > 0; and let VL be
the number of i, 1 5 i 5 TA,for which Si > 0. It will follow that
if we can show that the left side here approximates the left side of
(9.15).
Clearly, TA/n is within 1/n of hl(Xn). If yn is the number of i,
1 5 i 5 n, for which Ei occurs-the number of zero-crossings-then
UA/nand Vh/n are within y n / n of h 2 ( X n )and h3(Xn), respectively.
Therefore, (9.30) will follow from (9.15) and Theorem 3.1 if we prove
that yn/n +, 0, and for this it is enough to show that
(9.31)
1
PEi
E[yn/n] = n 2.= 1
But
PEi 5 P [ I < ~2I 6
4
-+ 0.
+ P[ISi-ll 5 €4
for each positive 6, and hence, by the central limit theorem, PEi -+ 0.
And now (9.31) is a consequence of the theorem on CesBro means.
From (9.30) we may conclude for example that UA/n and T'/n
have arc sine distributions in the limit.
The Brownian Bridge
The Brownian bridge W" behaves like a Wiener path W conditioned
by the requirement W1 = 0. With an appropriate passage to the limit
to take account of the fact that [Wl = 01 is an event of probability
0, this observation can be used to derive distributions associated with
W".
Let P, be the probability measure on (C,C) defined by
The first step is to prove that
(9.32)
P, =+, W"
102
THESPACEC
as E tends to 0. Take W as a random function defined on some
probability space, and take W" to be defined on the same space by
Wt = Wt - tW1. If we prove that
(9.33)
limsupP[W E F 10 i Wi
€40
i €1 i PIWo E F ]
for every closed F in C, then (9.32) will follow by Theorem 2.1.
F'rom the normality of the finite-dimensional distributions it
follows that W1 is independent of each (Wt",,. . . , Wfk) because it is
uncorrelated of each component. Therefore,
(9.34)
PIWo E A, W1 E B] = PIWo E A]P[Wi E B]
if A is a finite-dimensional set in C and B lies in R1. But for B fixed
the set of A in C that satisfy (9.34) is a monotone class and hence
[PM.43] coincides with C. Therefore,
PIWo E A 10 5 W 1 5 E] = PIWo E A ] .
Since p(W, W " )= IWll, where p is the metric on C,(W1(<_ 6 and
W E F imply W" E F6 = [ z : p ( z , F )5 61. Therefore, if E < 6,
The limit superior in (9.33) is thus at most PIWo E Fb], which decreases
to PIWo E F ] as 6 .1 0 if F is closed. This proves (9.33) and hence
(9.32).t
Suppose now that h is a measurable mapping from C to Rk and
that the set Dh of its discontinuities satisfies PIWo E Dh]= 0. It
follows by (9.32) and the mapping theorem that
(9.35)
P[h(W") 5 a] = lim P[h(W) 5 cx
€40
1 o 5 W1 5 E]
holds for each a at which the left side is continuous (as a funtion of
a ranging over Rk).From (9.35) we can find explicit forms for some
distributions connected with W". Sometimes an alternative form of
(9.35) is more convenient:
(9.36)
t
P[h(Wo)I a] = lim P[h(W) _< a I - E 5
€+O
~1
5 01.
This part of the argument in effect repeats the proof of Theorem 3.1.
SECTION
9. FUNCTIONS
OF BROWNIAN
MOTIONPATHS
The proof is the same (we could use any subset of
Lebesgue measure).
Define
mo = inf W f , M" = sup W f .
t
Suppose that a
[-E,E]
of positive
t
< 0 < b and 0 < E < b; by (9.10) we have, if c = b - a,
P[a < m 5 M
(9.37)
103
< b, 0 < W1 < E ]
C
00
=
c
< 2kc+ E ]
P[2kc < N
kZ-00
m
-
P[2lcc+2b-
E
< N < 2kc+2b].
k=-W
Since
1
1
lim - ~ [ z
< N < z + €1 = - e - x 2 / 2 ,
(9.38)
6
€+O E
and since the series in (9.37) converge uniformly in E , we can take the
limit ( E + 0) inside the sums, and it follows by (9.35) that
(9.39)
P[a < mo
< M" 5 b] =
C
00
e-2(kc)2-
c
00
e-2(b+kc)2
*
Thus we have the distribution of (m",M " ) . Taking -a = b gives
(9.40)
P[sup IWJ 5 b] = 1
t
+2
c
co
(-1)
ke-2k2b2
,
b>0.
k=-W
By an entirely similar analysis applied to (9.11),
P[m" < b] = 1 - e-2b2, b > 0.
(9.41)
Let U" be the Lebesgue measure of those t in [ 0,1] for which
W,O > 0. It will follow that U" is uniformly distributed over [ 0,1]
if we prove
(9.42)
limP[U<aI - ~ < W 1 < 0 ] = c x , O < a < 1 .
€40
Because of (9.17), the conditional probability here is P[V 5 a I - E 5
W1 < 01. From the form of the density (9.25) we saw that the
THESPACEC
104
distribution of V for given T and W1 is uniform on (0,T). In other
words, L = V / T is uniformly distributed on (0,l) and is independent
of (T,W1). Therefore, the conditional probability in (9.42) is
P[TL 5 Q 1
- E 5 W1 <_ 01 =
P[T 5 O!/S
I
-E
< W1 5 O]ds,
and (9.42) will follow by the bounded convergence theorem if we prove
the intuitively obvious relation
But this follows by (9.38) and the form of the density (9.25). Therefore,
(9.43)
PIUO_< a] = Q ,
0<Q
< 1.
Problems
9.1. Show by reflection in the random walk case that
(9.44)
{
if v 2 b,
P[Mn2 b, > S,, = v] = Pn(')
pn(2b- v) if w 5 b
for 6 2 0. Derive (9.9) from this.
. . ,ck; v) be the probability that an nstep random walk ( nfixed) meets c1 (one or more times), then meets -c2, then
and ends at v. Use (9.44) and induction
meets ~ 3 ,. .., then meets (-1)"'Ck,
on k to show that
9.2. For nonnegative integers G ,let r(c1,.
(Reflect through (-l)kCk-l the part of the path to the right of the
first passage through that point following successive passages through
c1, -Q,. . . ,(-l)k-1Ck-2.) Derive (9.5) by showing that p,(a, b , v ) is
9.3. For z in C let h ( z ) be the smallest t for which z ( t )= sups ~
( 8 )Show
.
that h
is measurable and continuous on a set of W-measure 1. Let T,, be the smallest
k for which s k = maxis,, si and prove
P ?<a
[ n - I
+ - ar
2 cs i nf i ,
0<cr<1,
n.
in the Lindeberg-L6vy case. (See Feller [28], p. 94, for the random walk case.)
SECTION
10. MAXIMALINEQUALITIES
105
9.4. Derive the joint limiting distribution of the maximum and minimum of
Si-in-'S,,, 0 < i < n,in the Lindeberg-Lhvy case. (Consider yt" = X p - t X ;
with X" defined by (8.5).)
SECTION 10. MAXIMAL INEQUALITIES
To prove tightness in Section 8, we used Etemadi's inequality (8.13),
which requires the assumption of independence. Since we are also
interested in functional limit theorems for dependent sequences of
random variables, we want variants of (8.13) itself. This means that
we need usable upper bounds for probabilities of the form
P[muk<n Iskl 2 A],
that is, maximal inequalities. The inequalities derived here are useful
in probability theory itself and also in applications of probability to
analysis and number theory.
Maxima of Partial Sums
Let (1,. . . ,t n be random variables (stationary or not, independent or
not), let sk = & - . & (SO= 0), and put
+ +
(Because of the absolute values, the notation differs from (8.16) and
(9.1).) We derive upper bounds for P[Mn 1 A] by an indirect approach.
Let
and put
(10.3)
There is a useful companion inequality. If ISnl = 0, then, trivially,
THESPACEC
106
But this also holds if lSnl > 0. For in that case, ISkl 2 (Sn- Skl holds
for k = n but not for k = 0, and hence there is a k, 1 5 k 5 n,
for which ISkl 2 ISn - Skl but ISk-l( < (Sn - Sk-11; for this k,
(Sn- Skl = mOkn 5 Ln and ISk-11 = mO,h-l,n 5 Ln, which implies
IS, - Shl 5 2Ln ICkl: (10.5) again holds.
that IS,( 5 ISk-11 I&I
Finally, (10.4) and (10.5) combine to give
+ +
+
If we have a bound on Ln-that is to say, an upper bound on the
right tail of its distibution-as well as a bound either on ISnl or on
maxk I&[, then we can use (10.4) or (10.6) to get a bound on Ma, as
required to establish tightness. And in the next chapter, bounds on
Ln itself will play an essential role. We can derive useful bounds on
Ln by assuming bounds on the mijk.
Theorem 10.1. Suppose that a >
are nonnegative numbers such that
(10.7)
P[mijk
2 A] 5
1
-(
x4p
i<l<k
)I.
and ,Ll 2 0 and that u1,.
. . ,un
2a
,
0 5 i 5 j 5 k 5 n,
forA > 0. Then
(10.8)
for A > 0 , where K = Ka,p depends only on a and p.
Before turning to the proof, consider the independent case as a first
illustration.
EzampZe 10.1. Take a = ,Ll = 1 and suppose that the & are
independent and identically distributed with mean 0 and variance 1.
By Chebyshev’s inequality and the inequality xy 5 (x+ Y)~,
- i) (k - j ) < (k - i)’
<--( jA2
A2
- A4
*
Thus (10.7) holds with ul E 1, and by (10.8), P[Ln 2 A] 5 Kn2/A4
and hence P [ L n / f i 2 A] 5 KIA4 (normalize as usual by 6).
By
SECTION
10. MAXIMAL
INEQUALITIES
stationarity,
[
P max->A
k<n
fi
107
] I n P [F
-2X ]
Write $(A) for this last integal. By (10.6),
Since $($A) -, 0 as X + 00, we can use this inequality to prove (8.7)
and hence tightness. Thus we have a new proof of Donsker's theorem:
We have proved tightness by Theorem 10.1 instead of by the central
0
limit theorem and Etemadi's inequality.
This alternative proof still uses independence, however, and the real
point of Theorem 10.1 is that it does not require independence. See, for
example, the proof of Theorem 11.1,on trigonometric series, and the
proof of Theorem 17.2, on prime divisors. Postponing again the proof
of Theorem 10.1, we turn to the following variant of it, where conditions
are put on the individual ISj - Sil rather than on the minima rnijk.
Theorem 10.2. Suppose that Q > f and P 1 0 and that u1,.
. . ,un
are nonnegative numbers such that
f o r X > 0. T h e n
(10.10)
f o r X > 0, where K' = KA,pdepends only on a and P ,
PROOF.By Schwarz's inequality, P ( A n B ) 5 P1/2(A)P1/2(B),and
therefore, by (10.9) (xy I (x Y ) ~ ) ,
+
If s = Cllnul, then P[L, > A] 5 Ks2"/X4@by Theorem 10.1. But
PIISnl 2 A] < sza/X40 by (10.9), and so it follows by (10.4) that
P[M, 2 XI 5 (K + 1 ) 2 4 ~ ~ / ~ 4 ~ .
0
THESPACEC
108
By Markov’s inequality, (10.9) holds if
(10.11)
does.
Ezample 10.2. Return to Example 10.1, where a = ,B = 1. If the
~then
,
it is a simple matter to show [PM.85]
that ES; = kr4 3k(k - 1) 5 3kr4 (T >_ o = 1). In this case, (10.11)
holds for u1 = f i r 2 . By applying Theorem 10.2 we can now conclude
that P[M, 2 A] I ~ K ’ ~ ’ T ~or/ X
P[M,/@
~,
2 A] 5 ~ K ‘ T ~ /This
A~.
is enough to establish tightness in Theorems 8.1 and 8.2. And as
before, the real point of Theorem 10.2 is that it does not require
0
independence.
& have fourth moment T
+
Just as (10.11) implies (10.9), there is a moment inequality that
implies ( 10.7), namely
A More General Inequality
If we generalize Theorem 10.1, the proof becomes simpler. Let T be a
Bore1 subset of [ 0,1] and suppose that y = [yt: t E TI is a stochastic
process with time running through T . We suppose that the paths of the
process are right-continuous in the sense that if points s of T converge
from the right to a point t of T , then y8(w) + yt(w) at all sample
points w (if T is finite, this imposes no restriction). Let
(10.13)
mrst
= 1% - rrl A
1%
- Ysl,
and define
(10.14)
Theorem 10.3. Suppose that
finite measure o n T such that
cy.
> f and ,B 2 0 and that
p is a
SECTION
10. MAXIMAL
INEQUALITIES
109
for X > 0 and r,s,t E T. Then
(10.16)
for X
> 0, where K
= Ka,p depends only on a and ,B.
To deduce Theorem 10.1 from Theorem 10.3, we need only take
T = [i/n:O5 i I n] and r ( i / n )= Si,0 5 i 5 n , and let p have mass
ui at i l n , 1 5 i I n.
PROOF.
Write m(r,s, t ) for mr,s,t.The argument goes by cases.
Case 1: Suppose first that T = [0,1] and that p is Lebesgue
measure. Let DI, be the set of dyadic rationals 2 / 2 k , 0 5 i 5 2 k .
Let BI, be the maximum of m ( t 1 , t 2 , t 3 ) over triples in Dk satisfying
tl 5 t 2 5 t 3 . Let Ak be the same maximum but with the further
constraint that t l , t 2 , t 3 are adjacent: t 2 - tl = t 3 - t 2 = 2 - k . For t in
Dk, define a point t‘ in D k - 1 by
Then Ir(t)- y(t’)I 5 Ak for t in
Dk,
and therefore, for
t l , t 2 , t3
in Dk,
and
If tl < t 2 < t 3 , then ti I t; I ti, and since ti,t;, t$ lie in Dk-1, it
follows that m ( t 1 7 t 2 , t 3 ) 5 Bk-1+2Ak7 and therefore, B k 5 Bk-1+2Ak.
Since A0 = BO = 0, it follows by induction that BI,5 2(A1+ - - + Ak)
for k 2 1, and it follows further by the right-continuity of the paths
that L(y) 5 2CF1Ak.
THESPACEC
110
Ak. Suppose that 0 < 0 < 1 and choose C
We need to control
so that C c E 10' = Then
i.
Since p is assumed to be Lebesgue measure, (10.15) implies
Since 4/32 0 and 2a - 1 > 0, there is a 0 in (0,l) for which the series
here converges. This shows how to define K and disposes of Case 1.
Case 2 Suppose that T = [0,1] and p has no atoms, so that
F ( t ) = p [ O , t ] is continuous. If F is strictly increasing and F(1)= c,
take a =
so that a4@cza= 1, and define a new process
< by
<(t)= a y ( F - l ( c t ) ) .
<
comes under Case 1, and the theorem holds for 77 because
L(y) = aL(<). If F is continuous but not strictly increasing, consider
, then let E
first the measure having distribution function F ( t ) + ~ tand
go to 0.
Then
Case 3 Suppose that T is finite (which in fact suffices for Theorem
10.1). If 0 # T , let y(0) = y ( t l ) ,where t l is the first point of T , and
take p(0) = 0. If 1 # T , take y(1) = y ( t v - l ) , where t,-l is the last
point of T , and take p{l} = 0. Then the new process y and measure
p satisfy the hypotheses, and so we may assume that T consists of the
points
0 = to < tl < ' * . < t, = 1.
Define a process y' by
If mist denotes (10.13) for the process y', then mist vanishes unless T ,
s, t all lie in different subintervals [ti,t i + l ) . Suppose that
(10.17)
T
E
[ti,ti+l),
S
E [tj,tj+l), t E [tk,tk+l),
<j <
SECTION
10. MAXIMALINEQUALITIES
111
Then m:st = m(ti,t j , t k ) , and hence, by the hypothesis of the theorem
for the process y,
Now let Y be the measure that corresponds to a uniform
distribution of mass p{tl-1) p { t l } over the interval [tl-l,tl], for
1 I 12 v. Then
+
and so (10.18) implies
( 10.19)
P[mi,, 2 A]
I -v2y?-,
t].
x4p
1
Although (10.17) requires t < 1, (10.19) follows for t = 1 by a small
modification of the argument.
Thus (10.19) holds for 0 5 r I s I t I 1, and Case 2 applies to
the process y':
Since L(y') = L(y), if we replace the K that works in Cases 1 and 2
by 22aK,then the new K works in Cases 1, 2 , and 3.
Case 4: For the general T and p, consider finite sets
such that Tn C Tn+l and UTn is dense in T . Let pn have mass
P((tn,i-l,tn,i]n T ) at the point tni. If y(n) is the process y with the
time-set cut back to Tn, then L ( T ( ~1)L) ( y ) by the right-continuity of
the paths. Since each y(n)comes under Case 3, we have
Let
E
go to 0.
0
112
THESPACEC
A Further Inequality
The last theorem of this section will be used in Chapter 3. Suppose
that y = [yt:t E TI has right-continuous paths, as before, and for
positive 6 define L(y,6) by (10.14), but with the supremum further
restricted by the inequality t - r 5 6.
Theorem 10.4. Suppose that a
finite measure on T such that
> f and ,B 2 0 and that
p is a
1
(10.20) P[mrst 2 A] 5 @pZff(Tn (r,t ] ) , r 5 s 5 t , t - r < 26
for X
> 0 and r,s,t E T . Then
for X > 0 , where K = Ka,p depends only on a and p.
This theorem is somewhat analogous to Theorem 7.4 and its
corollary. The K's in (10.16) and (10.21) are the same.
PROOF.Take v = Ll/S], ti = i6 for 0 5 i < v, and t , = 1. If
Ir - tl 5 6, then r and t lie in the same [ti-l, ti] or in adjacent ones,
and hence they lie in the same [ti-l,ti+l] for some i, 1 5 i 5 v - 1.
If Zi is the supremum of mrst for r, s, t ranging over T n [ti-l, ti+l]
(r -< s <
- t ) , then P[Zi L A] 5 Kp2"(Tr l Eti-1, ti+l])/X4pby Theorem
10.3. If pi = p(T n [ti-l, t i + l ] ) ,then
and (10.20) follows from this.
0
Problems
10.1. Prove a theorem standing to Theorem 10.3 as Theorem 10.2 stands to
Theorem 10.1.
10.2. Weaken (10.11) by assuming only that it holds for i = 0 and j = n, but
compensate by assumin that SI,. . . ,S, is a martingale and 4/3 2 1. Show
that P[M, 2 A] 5 A- 4% (zl<nul)2a,
an inequality of essentially the same
-
strength as (10.10).
10.3. Adapt the proof of Menshov's inequality (Doob [19], p. 156) to show that, if
(10.12) holds with Q 2 112 and /3 1 114, then
SECTION
11. TRIGONOMETRIC
SERIES
113
Now deduce that, if (10.11) holds with a 2 1 and /3 2 1/2, then
If
Q
= /3 = 1, this gives Menshov’s inequality again.
10.4. Use Theorem 10.2 to show that, i f < l , & ,. . . satisfies (10.11) for 0 I i I j
where /3 2 0 and a > 1/2, then
<i converges with probability 1.
ci
< 00,
SECTION 11. TRIGONOMETRIC SERIES”
In this section we prove functional limit theorems for lacunary series
and for series defined in terms of incommensurable arguments.
Lacunary Series
A trigonometric series x E l ( a k cos mkz+br, sinmkz) is called lacunary
if the m k are positive integers increasing at an exponential rate:
m k + l / m k 2 q > 1. If z is chosen at random from [ 0,27r],then the series
has some of the properties of a sum of independent random variables.
Here we prove a functional central limit theorem for the partial sums
of a pure cosine series.
In this theorem, P and E will refer to normalized Lebesgue measure
on the Bore1 sets in Cl = [0,27r]. Since the cosine integrates to 0 over
[ 0,27r], it follows from the relation
(11.1)
1
cos 8 . cos 8’ = - cos(e
2
+ + -21cos(e el)
el)
that
(11.2)
2
E [ c o s ~ ~=] 0, E[COS
1
2
mu] = -,
E[cosmw * cosm’w] = 0,
1 5 m < m’
Because of the $ here, it is probabilistically natural to multiply the
this makes the variance equal to 1. And so, define
cosine by
&(W) = &ah cos m k w and consider the sums
a;
n
n
k=l
k= I
114
THESPACEC
Define a random element Y n of C by
for
This is a trigonometric analogue of the random function (8.5) of
Donsker's theorem: Y n is the polygonal function resulting from linear
= s k / s n at the points t = s i / s i ,
interpolation between its values
k = 0 , 1 , . . . ,n.t Strengthen the lacunarity condition to
(11.6)
mk+l/mk
2 2.
Theorem 11.1. If (11.6) holds and
(11.7)
then Y n + W .
To show that the finite-dimensionaldistributions converge it is best
to consider a triangular array first: Let c n k ( w ) = &ank cos m k w for
1 5 k 5 n. In the following lemma and its proof, k ranges from 1 to n
in the sums, products, and maxima.
Lemma 1. If (11.6) holds, Zf c
-+ 0, then c k <nk =$ a N .
k a i k +
a2
>
0, and
Zf C r i
=
mElXk a i k
PROOF.The hypothesis of convergence to a2 we can strengthen
to equality, and we can arrange that a = I: If e; =
c k a i k and
<Ak(u) = &%~lu-lankcosmkw,
then On + 1, & ( 8 L 1 a - 1 a n k ) 2 = I ,
and (Example 3.2) E k c L k + N implies c k ( n k + a N . Assume,
therefore, that c k a i k = 1.
t If a;,, = 0, there is a division by 0 in (11.5); but in that case, no t can satisfy
the defining inequality anyway. Or simply assume all the af to be positive.
115
SECTION11. TRIGONOMETRIC
SERIES
< 1, the principal value of log(1 + iz) can be defined as
~ ( x )where
,
~ ( x=) ~ ~ 1 3 ( - l ) u ( i x ) u + 1and
/ u lr(x>I 5 1xI3
This gives
Then
exp [it
(11.8)
and so XI, <,k
+N
zk
<nk]
+
= ~ , e - ~ A,,
~ / ~
will follow if we prove
EA,
(11.9)
+0
and
ET, = 1.
(11.10)
We first show that
(11.11)
x k c k
From the assumption that
formula that
xk
&(W)
=
* 1*
cka:k =
1, it follows by the double-angle
xk
2a;k cos2mkw = 1
And now (11.2)-orthogonality-implies
and (11.11) follows from this.
+
ck
2
ank
cos 2mkw.
116
THESPACEC
F'rom
have
l<nkl
I filankI
it follows that, for all sufficiently large n, we
By this and (ll.ll),Zn + et2/2. But since 1 + u I eu, we also have
lTnl 5
t2aEk)'/2 I
exp(Ckt2aEk/2)_< e t 2 , and so An + 0.
Finally, by (11.8), lAnl I 1 lTnl I 1 et2, and so, by Theorem 3.5,
we can integrate to the limit: (11.9) follows.
There remains the proof of (11.10). Expanding the product that
defines Tngives
nk(l+
+
ETn = 1+ ) : ( f i i t ) " a j ,
(11.12)
+
1
[
-**aj,EC O S ~ ~ , U * * . C O S ~ ~, , W
where the sum extends over v = 1 , . . . ,n and n 2
And repeated use of (11.1)leads to
j1
> . > j , 2: 1.
(11.13) E [ c o s ~ ~ , u * * . c o s ~ ~ , u ]
1
-2"-1
C E [ c o s ( ~ ~f, mj, f
* * *
fm j , ) ~ ] ,
where here the sum extends over the 2"-l choices of sign. The expected
value in the sum is 0 unless
(11.14)
But if
mjl f mjz f
j1
(11.15)
* *
f mj, = 0.
> ... > j,, then by (11.6),
mj, f m j 2
1
f
e
.
9
1
2"- 1
and so (11.14) is impossible. The expected values on the right in (11.13)
and (11.12) are therefore 0, and (11.10) follows.
0
We need a second lemma for the proof of tightness. Define Sn by
(11.3), as before.
Lemma 2. If(11.6) holds, then
(11.16)
where C is a universal constant.
117
SECTION11. TRIGONOMETRIC
SERIES
PROOF.The proof uses (10.11) and Theorem 10.2 for
By (11.13) for = 4,
Q
= p = 1.
, " ~ E [ c o s ( mf m
j , j , f m j , &mj,)w],
(11.17) E[Si]= c ' a j , * * * a j ~
where in C' the four indices range independently from 1 to n, and C"
extends over the eight choices of sign. In place of (11.6) temporarily
make the stronger assumption that mk+l/mk 2 4. We must sort
out the cases where the coefficient of w in (11.17) vanishes for some
choice of signs. Suppose that j1 2 j 2 2 j 3 2 j 4 and consider whether
L: = mj, f mj, = f m j , f mj, =: R is possible. It is if j 1 = j , and
j 3 = j 4 , but this is the only case: If j 1 > j 2 , then mj, 2 4mj, and
hence L 2 4mj, fmj, > 2mj, 2 mj, mj, 2 R. Suppose on the other
hand that jl = j 2 but j 3 > j 4 . Then L = mj, k m j , is either 0 or 2mj2;
2mj,
but R # 0 rules out the first case and R 5 mj, mj, < 2mj, I
rules out the second.
Therefore (if mk+l/mk 2 4), the only sets of indices j 1 , . . . , j 4 that
contribute to the sum in (11.17) are those consisting of a single integer
repeated four times and those consisting of two distinct integers each
occuring twice. Since in any case the inner sum has modulus at most 1,
+
+
( 11.18)
Since (11.6) implies mk+2/mk 2 4, it implies that (11.18) holds if
k is restricted to even values in the sum Sn and also in the sum on
the right. The same is true for odd values, and so by Minkowski's
inequality, (11.18) holds if the right side is multiplied by 24. Finally,
the same inequality holds with Sj - Si in place of S,:
And now (11.16) follows by Theorem 10.2 if C = 16. 3 . K'.
0
PROOFOF THEOREM
11.1. We prove Y n + W by means of
Theorem 7.5 (with Y nand W in the roles of X n and X ) , and we turn
- lakl/Sn. Taking ank = a k / S n , k 5 n,
first to (7.13). Let an = maxk<,
in Lemma 1 gives Yy = Sn/sn + W1. Fix a t in (0,l) and let k n ( t )
THESPACEC
118
be the largest k for which s",sE
( t ) + l , and therefore,
4,
5 t. Then ts: lies between s i n ( t ) and
( 11.19)
By (11.5), qn is within f i a n of S k , ( t ) / S n , and it converges in
distribution to Wt by (11.19) and the lemma. And ( q n , Y r ) +
(Wt,W1)will follow by the Cram&-Wold method [PM.383] if we show
that uqn+vYy + uWt+vWl for all u, v. But this also follows from the
lemma if we take unk = (u v)ak/sn for k 5 k n ( t ) and ank = vak/sn
for k n ( t ) < k 5 n. This argument extends to the case of three or
more time points, and therefore (q:,. . . , &!) +n (Wt,,. . . ,Wt,) for
all t l , . . . ,t k .
We turn next to (7.14), which we prove by means of (7.10). Let
X = an.If 1 = 0 and m = n, then (11.16) can be written
+
But this holds for any pair of integers satisfying 0 5 1 5 m 5 n,
because it depends only on the condition (11.6). Suppose now that ti
are points satisfying 0 = t o < - < tv = 1 and (7.9), so that, by (7.10))
Suppose further that we can choose the ti in such a way that ti =
s k i / s : for integers mi satisfying 0 = mo < - - - < mv = n. Since the
random function Y" is a polygon with break-points over the ti, (11.21)
implies
w ( Y n , 6 )5 3 max max Isk - Sui-11.,
l<i<v ui-l<k<ui
Sn
the inner maximum here equals the supremum in (11.21). It follows
by (11.20) that
c C(ti
v - ti-^)^ 5 C
m
(11.22) P[w(Yn,6) 2 361 5 7
i=l
a ( t i - ti-1).
€4 l < i < v
Suppose we can, for all sufficiently large n, arrange that
SECTION
11. TRIGONOMETRIC
SERIES
119
The first two of these conditions will ensure that (11.22) holds for large
n, and it will then follow from the third one that P[w(Y",b) 2 3 ~ 15
36C/e4. And of course this implies (7.14), which will complete the
proof.
Integers mi satisfying (11.23) can be chosen this way: Take v =
[1/(26)1 and m, = n, and for 0 2 i 5 - 1, take mi = kn(2Si),
where k n ( t ) is defined as before. For large n, (11.23) then follows from
(11.19).
Incommensurable Arguments
Replace the integers m l , m2,. . . in (11.3) by a sequence XI, X2,. . . of
linearly independent real numbers, as at the end of Section 3. And
for simplicity take U k G 1. That is, replace & ( w ) = fiUkcosmkw by
G ( w ) = ficos&w and replace (11.3) by
(11.24)
k=l
For each w E R1,define a function Z " ( w ) in C by
k
for - I t < - k + l , 0 5 k < n .
n
n
This, a second trigonometric analogue of ( 8 . 5 ) , is a polygonal function
with values Z;,,(w) = Sl,(w)/fi
at the corner points. The objective
is to extend (3.33) to a functional central limit theorem.
Let <k be the random variables in (3.31). If we put v,(w) =
(ficos Xlw, . . . , ficosX,w) and V, = (0,.
. . ,&), then what (3.33)
says is that PTV;' JT PV;',
where P refers to the space the
independent sequence {<k} is defined on. Define 7,:Rn -+ C this
way: If z = (21 ,...,zn), take 7nz = z, where z(0) = 0, z(k/n) =
C;=,z i / f i for 1 5 k 5 n, and z is linear between these points. Since
T~ is continuous, it follows by the mapping theorem that PTV;'T;~
+T
PV;'T;~.
But T,(zI,(w)) is exactly Zn(w)as defined by (11.25), and
Xn = T ~ isV a ~special case of the random function (8.5) of Donsker's
theorem (a= 1). Therefore, Donsker's theorem applies:
PT(Zn)-l
+T
P(Xn)-l J
n
PW-l,
120
THESPACEC
where the P at the right refers to some space supporting a Wiener
random function W .
Now suppose that h: C -+ R1 is measurable, and let Dh be the set
of its points of discontinuity. If
P[Xn E Dh]= P[W E Dh] = 0,
(11.26)
n 2 no,
for some no, then two further applications of the mapping theorem give
P ~ ( Z ~ ) - l h -JT
l P(Xn)-lh-' +n PW- 1h-1 .
Finally, if H is a linear Bore1 set and
P[h(Xn) E a H ] = 0 = P[h(W) E a H ] , n 2 no,
(11.27)
then
PT[h(Zn(w))E HI
-'T
P[h(Xn) E H ] -$n P[h(W) E HI.
What the argument shows is this: If (11.26) and (11.27) hold, then
(11.28)
lim P,[h(Zn(w)) E HI = P[h(W) E HI.
n+m
Take H = (--oo,x]. Then (11.27) holds for all but countably many x,
and we arrive at our theorem.
Theorem 11.2. If XI, X I , . . . are linearly independent and Z"(w)
is defined over R1 by (11.25), and if (11.26) holds, then
(11.29)
lim P,[h(Zn(w)) 5 x] = P[h(W) 5 x]
n-mo
for all but countably many x.
If h(x) = suptx(t), for example, then (11.26) holds because h is
continuous everywhere on C. On the other hand, h has discontinuities
if h(x) is sup[t:z(t) = 01, for example, or if h(x) is the Lebesgue
measure of [ t : x ( t )> 01; but P[W E Dh] is 0 for these functions as
well [M15]. Since the random variables & in the definition of Xn have
continuous distributions in this particular case (see (3.31)),we also
have P[Xn E D h ] = 0 as long as n > 1. Therefore, Theorem 11.2
gives the folded normal distribution as the limiting distribution of the
maximum of the sums ~ C i < k c o s X i w1, 5 k 5 n, as well as an
arc sine law for the fraction of positive partial sums, and so on.
Problem
11.1. Prove a theorem that stands to (3.41) as Theorem 11.2 stands to (3.33).
Convergence of Probability Measures
Patrick Billingsley
Copyright 01999 by John Wiley & Sons, Inc
CHAPTER 3
THE SPACE D
SECTION 12. THE GEOMETRY OF D
The space C is unsuitable for the description of processes that, like
the Poisson process and unlike Brownian motion, must contain jumps.
In this chapter we study weak convergence in a space that includes
certain discontinuous functions.
The Definition
Let D = D [ 0,1] be the space of real functions x on [ 0,1] that are
right-continuous and have left-hand limits:
(i) For 0 5 t < 1, x(t+) = lim,ltz(s) exists and x(t+) = ~ ( t ) .
(ii) For 0 < t 5 1, x ( t - ) = lim,px(s) exists.
Functions having these two properties are called cadlag (an acronym
for “continu 8. droite, limites 8. gauche”) functions. A function x is said
) x(t+) exist
to have a discontinuity of the first kind at t if ~ ( t - and
but differ and x ( t ) lies between them. Any discontinuities of a cadlag
function-of an element of D-are of the first kind; the requirement
x ( t ) = x(t+) is a convenient normalization. Of course, C is a subset
of D .
With very little change, the theory can be extended to functions on [ 0,1] taking
values in metric spaces other than R1.What changes are needed will be indicated
along the way. Denote the space, metric, and Bore1 a-field by V, v , and V ; V will
always be assumed separable and complete. The definition of the cadlag property
needs no change at all.
For x E D and T c [0,1], put
(12.1)
Wz(T)
= W ( Z , T ) = sup Ix(s>- x(t)l.
s,tET
121
THESPACED
122
The modulus of continuity of z, defined by (7.1), can be written as
(12.2)
w,(6) = w ( x ,6 ) = sup
Oltll-a
w,[t, t + 61.
A continuous function on [ 0,1] is uniformly continuous. The following
lemma gives the corresponding uniformity idea for cadlag functions.
Lemma 1. For each x in D and each positive E , there exist points
to, t l , . . . ,t, such that
(12.3)
0 =to
< tl <
* . *
< t, = 1
and
(12.4)
wz[ti-l,ti) < €,
i = 1 , 2 , . . . ,v.
PROOF.
Let to be the supremum of those t in [ 0,1] for which [ 0, t )
can be decomposed into finitely many subintervals [ti-l, ti) satisfying
(12.4). Since z(0) = x(O+), we have to > 0; since z(to-) exists, [ 0 , t o )
can itself be so decomposed; to < 1 is impossible because %(to)=
z(t"+) in that case.
n
F'rom this lemma it follows that there can be at most finitely many
points t at which the jump Iz(t) - z(t-)l exceeds a given positive
number; therefore, z has at most countably many discontinuities. It
follows also that z is bounded:
(12.5)
11x11 = SUP 1
4 < 00.
t
Finally, it follows that x can be uniformly approximated by simple
functions constant over intervals, so that it is Borel measurable.
If x takes its values in the metric space V, replace $he lx(s) - z ( t ) (in (12.1) by
v(x(s),z(t)); the magnitude of the jump in z at t is v(x(t),z(t-)). Lemma 1 and
its proof need no change, and in place of (12.5) we have the fact that the range of x
has compact closure. Indeed, for given points x(t,) in the range, choose a sequence
{ni}and a t so that either t,, 1 t or tni T t ; in the first case, z(tn,) -t x ( t ) , and
in the second, z(tni)+ x ( t - ) . As for measurability, a function that assumes only
finitely many values is measurable B/U,where 13 is the a-field of Borel sets in [ 0,1],
and [MlO] a limit of functions measurable B/U is itself measurable B/U.
We need a modulus that plays in D the role the modulus of continuity plays in C. Call a set {ti}satifying (12.3) 6-sparse if it satisfies
m i n l ~ i ~ v( tti-1)
i
> 6. And now define, for 0 < 6 < 1,
(12.6)
wL.6) = w'(z,6) = inf max wx[ti-l,ti),
{ti} l l i l v
SECTION
12. THEGEOMETRY
OF
D
123
where the infimum extends over all 6-sparse sets {ti}. Lemma 1 is
equivalent to the assertion that limb wg(6) = 0 holds for every z in D.
Notice that wL(6)is unaffected if the value z(1) is changed.
Even if z does not lie in D ,the definition of wL(6)makes sense. Just
as limb w,(6)= 0 is necessary and sufficient for an arbitrary function
z on [ 0 , 1 ] to lie in C, limb wL.6) = 0 is necessary and sufficient for z
to lie in D.
Now to compare wL(6)with ~~(6).
Since [ 0 , l ) can, for each 6 < $,
be split into subintervals [ti-l, ti) satisfying 6 < ti - ti-1 5 26, we have
(12.7)
There can be no general inequality in the opposite direction because
of the fact that w,(6) does not go to 0 with 6 if z has discontinuities.
But consider the maximum (absolute) jump in z:
(12.8)
the supremum is achieved because only finitely many jumps can exceed
a given positive number. We have
(12.9)
+
To see this, choose a &sparse {ti}such that w,[ti-l,ti) < wL(6) E
for each i. If 1s - tl 5 6, then s and t lie in the same [ti-l,ti) or else
in adjacent ones, and Iz(s) - z(t)l is at most wh(6) E in the first case
and at most 2w;(6)+e+j(z) in the second. Letting E + 0 gives (12.9).
Since j ( z ) = 0 if z is continuous, we also have
+
(12.10)
Because of (12.7) and (12.10), the moduli w,(6) and wk(6)are
essentially the same for continuous functions z.
The Skorohod Topology
Two funtions z and y are near one another in the uniform topology
used for C if the graph of z ( t ) can be carried onto the graph of y ( t )
by a uniformly small perturbation of the ordinates, with the abscissas
kept fixed. In D ,we allow also a uniformly small deformation of the
time scale. Physically, this amounts to the recognition that we cannot
measure time with perfect accuracy any more than we can position.
The following topology, devised by Skorohod, embodies this idea.
THESPACED
124
Let A denote the class of strictly increasing, continuous mappings
of [0,1] onto itself. If X E A, then A0 = 0 and A 1 = 1. For z and y in
D ,define d ( z , y ) to be the infimum of those positive E for which there
exists in A a X satisfying
sup I X t - tl = sup It - X-ltl < E
(12.11)
t
t
and
(12.12)
sup I x ( t ) - y(Xt)( = sup (z(x-'t)
t
t
- y(t)J < E .
To express this in more compact form, let I be the identity map on
[ 0,1] and use the notation (12.5). Then the definition becomes
By (12.5),d ( z , y ) is finite (take A t t ) . Of course, d ( z , y ) 2 0; and
d ( z , y ) = 0 implies that for each t either z ( t ) = y ( t ) or z ( t ) = y(t-),
which in turn implies z = y. If X lies in A, so does X-'; d ( x ,y) = d ( y , x)
follows from (12.11) and (12.12). If XI and A2 lie in A, so does their
composition X 1 X 2 ; the triangle inequality follows from (IX1X2 - Ill 5
l1X1 -Ill+ l l X 2 - I I I together with ~ ~ z - z X ~ X i
~ I (1z-yX211+I(y-zX1((.
I
Thus d is a metric.
This metric defines the Skorohod topology. The uniform distance
(Iz - y(( between z and y may be defined as the infimum of those
positive E for which sup, Iz(t)- y ( t ) ( < E . The X in (12.11) and (12.12)
represents the uniformly small deformation of the time scale referred
to above.
Elements z, of D converge to a limit z in the Skorohod topology if
and only if there exist functions A, in A such that limn zn(Xnt)= x ( t )
uniformly in t and limn Ant = t uniformly in t . If z, goes uniformly to
z, then there is convergence in the Skorohod topology (take Ant = t ) .
On the other hand, there is convergence z, = IIO,a+l/n) + z =
in the Skorohod topology (0 5 a < l ) , whereas zn(t)--t z ( t ) fails in
this case at t = a. Since
(12.14)
IznW - WI I
I%(t>- &wl I z ( W - z(t>l,
+
Skorohod convergence does imply that zn(t)+ z ( t ) holds for continuity points t of z and hence for all but countably many t. Moreover, it follows from (12.14) that if z is (uniformly) continuous on
all of [ 0,1], then Skorohod convergence implies uniform convergence:
llzn - zII 5 l)zn- zXnII w,(llA, - Ill). Therefore:
The Skorohod topology relativized to C coincides with the uniform
topology there.
+
SECTION
12. THEGEOMETRY
OF
D
125
Example 12.1. Consider again j ( z ) , the maximum jump in z.
Clearly, l j ( z ) - j ( y ) l < E if llz - yII < ~ / 2 ,and so j ( - ) is continuous in the uniform topology. But it is also continuous in the Skorohod
topology: Since j ( y ) = j(yX) for each A, d(z,y) < ~ / will
2 imply for
an appropriate X that l j ( z ) - j ( y ) ( = l j ( z ) - j(yX)I < E .
The space D is not complete under the metric d:
Example 12.2. Let zn be the indicator of [0, 1/2n). Suppose that
Xn(1/2,) = 1/2n+1. If A, is linear on [0, 1/2,] and on [1/2,, 11, then
((zn+lX,- znll = 0 and ((A, - 111 = 1/2n+1. On the other hand, if A,
does not map 1/2n to 1/2nf1, then ((zn+l
-znI( is 1 instead of 0, and it
follows that d(x,, zn+l) = 1/2n+1. Therefore, { z n }is d-fundamental.
Since zn(t)+ 0 for t > 0 and the distance from zn to the 0-function
0
is 1, the sequence {z,} is not d-convergent.$
We can define in D another metric do-a metric that is equivalent
to d (gives the Skorohod topology) but under which D is complete.
Completeness facilitates characterizing the compact sets. The idea in
defining do is to require that the time-deformation X in the definition
of d be near the identity function in a sense which at first appears more
stringent than (12.11); namely, we are going to require that the slope
( A t - X s ) / ( t - s ) of each chord be close to 1 or, what is the same thing
and analytically more convenient, that its logarithm be close to 0.
If X is a nondecreasing function on [0,1] satisfying A0 = 0 and
A 1 = 1, put
( 12.15)
If IIXJJ"is finite, then the slopes of the chords of X are bounded away
from 0 and infinity and therefore it is both continuous and strictly
increasing and hence is a member of A. Although IIXII" may be infinite
even if X does lie in A, these elements of A do not enter into the
following definition.
Let dO(x,y) be the infimum of those positive E for which A contains
some X such that ((All" < E and (12.12) holds. In other words, let
t We can replace [ 0,1/2") by [to, t 0 + 1 / 2 ~ here,
)
the point being that the example
does not depend on special properties of 0 (like A0 = 0).
THESPACED
126
Since IU - 11 5 el'Ogul - 1 for u > 0,
A t - A0
(12.17)
I <- ellhllo -
-1
1.
And since w 5 e" - 1 for all w , it follows by (12.13) that
Symmetry and the triangle inequality for do follow from \lX-lllo
and the inequality
= IlXll"
That do(z,y) = 0 implies z = y follows from (12.18) together with the
corresponding property for d. Therefore, do is a metric.
z) + 0 implies d(z,, z) -+ 0. The reverse
Because of (12.18), do(zn,
implication is a consequence of the following lemma.
Lemma2. ~ f d ( z , y <
) S2 and6 5 1/2, thendo(z,y)5 46+wk(S).
In connection with Example 12.1, note that, if A,(1/2,)
= 1/2nS1,
then the chord from (0, XO
, ) to (1/2,, X,(1/2,))
has slope 1/2, from
which it follows that do(zn,z,+l)= IIXnllo = log2. Therefore, {z,}
is not do-fundamental, as must be the case if D is to be do-complete.
And if 6 is just greater than
then d(zn,z,+l) < S2, and the
lemma applies; but since wkn(S) = 1, all that follows from this is the
fact that log 2 5 22-2/n 1.
0,
+
PROOF.Take 6 < S and then take {ti}to be a S-sparse set satisfying wz[ti-1, ti) < wk(S) E for each i. Now choose from A a p such
that
+
and
(12.21)
sup Jpt- t ( < 62.
t
We want to define in A a X that will be near p but will not, as p
may, have chords with slopes far removed from 1. Take X to agree with
p at the points ti and to be linear in between. Since the composition
SECTION
127
12. THEGEOMETRY
OF D
p-lX fixes the ti and is increasing, t and p-lXt always lie in the same
subinterval
[ti-1, t i ) . Therefore,
by (12.20) and the choice of { t i } ,
It is now enough to prove (IAIJ" I 46. Since X agrees with p at the
it follows by the condition (12.21) and the inequality ti - ti-1 > 6
that ( ( M i- X t i - 1 ) - (ti - ti-1)1 < 262 < 26(ti - t i - 1 ) . From this we
can deduce that I ( X t - As) - (t - s)l I 261t - sI for all s and t: Because
of the polygonal character of A, this is clear for s and t in the same
[ti-l, t i ] .And it follows in the general case because, for every triple u1,
ti,
u2, u3,
Therefore,
At - As
log(1 - 26) 5 log t-s 5 log(1
+ 26).
Since 1 log(1 fu)I 5 2 ( u ( for (u(
5 1/2, it follows that ((X(1" I 46.
0
By Lemmas 1 and 2 together, d(z,, x) + 0 implies dO(xn,
z)
0;
and as already observed, the reverse implication follows from (12.18):
--f
Theorem 12.1. The metrics d and do are equivalent.
Separability and Completeness of D
The following lemma will simplify several proofs. Suppose that the
< s k = 1, and define a map
set g = {s,} satisfies 0 = so < s1 <
A,: D + D in the following way. Take A,z to have the constant value
z(s,-~) over the interval [ s u - l , s u ) for 1 5 u I k and to agree with x
at t = 1. If x is sufficiently regular, then A,z will be close to z in the
metric d :
Lemma 3. Ifmax(s, - su-l)
I 6 , then d(A,x,x) 5 6 V wL.6).
PROOF.Write A,x = i to simplify the notation. Let Ct be the s,
"just to the left" o f t , in the sense that (1 = s k = 1 and <t = su-l
for t E [s,-l,s,).
Then i ( t )= z(<t).Given E, find a &sparse set { t i }
such that zur[ti-l,ti) < wk(6) E for each i. Let X t i be the sv "just
to the right" of ti, in the sense that Xto = SO = 0 and X t i = s, for
+
THESPACED
128
( s v - l , s v ] . Since ti - ti-1 > 6 2 sv - s,-l, the X t i increase with i.
Extend X to an element of A by interpolating linearly between the ti.
Obviously, sup, I X t - tl 5 6.
It is now enough to show that I2(t)-~(X-'t)l= Iz(<t)-z(X-lt)l5
wL(6)+ E (let E -, 0 at the end). This holds if t is 0 or 1, and it is
enough to show that, for 0 < t < 1, <t and X-lt always lie in the same
[ti-l, ti). To prove this it is enough to show that ti 5 <t is equivalent
to ti 5 X - l t (and hence <t < t j is equivalent to X-lt < t j ) . The case
t j = 0 being trivial, suppose that t j E (sv-l,sv].In this case, since
<t is one of the si7 t j 5 Ct is equivalent to s, _< <t, which in turn is,
by the definition of <, equivalent to sv 5 t. But from ti E (sv-l,s,] it
also follows that X t j = sv, and hence sv 5 t is equivalent to X t j 5 t , or
ti E
tj
5 X-lt.
0
Theorem 12.2. T h e space D is separable u n d e r d and do and is
complete u n d e r do.
PROOF.
Separability. Since d and do are equivalent and separability is a topological property, we can work with d. Let BI,be the set of
functions having a constant, rational value over each [(u- l ) / k , u / k )
and a rational value at t = 1. Then B = UI,
BI, is countable. Given
z and E , choose k so that k-l < E and wL(k-') < E . Apply Lemma 3
with u = { u / k } : d(z, A,z) < E . Clearly, d(A,z, y ) < E for some y in
BI,:d ( z , y ) < 2~ and y E B .
Completeness. It is enough to show that each do-fundamental
sequence contains a subsequence that is do-convergent. If {zk} is
do-fundamental, it contains a subsequence {y,} = {q,}such that
do(y,, yn+l) < 1/2,. Then A contains a pn for which
(12.22)
IlPnlI0 < 1/2"
and
The problem is to find a function y in D and functions A, in A for
which I X,I o --t 0 and y,(&lt> 3 y(t) uniformly in t.
A heuristic argument: Suppose that yn(A,'t)
does go uniformly to a limit
y(t). By (12.23), yn(pLIA;tlt) is within 1/2" of yn+l(Aiilt), and therefore it, like
y(A,'t),
must go uniformly to y(t). This suggests trying to choose A, in such a
way that yn(pL1A;ilt) is in fact identically equal to y,(A,'t),
or p;'A;:,
= A;',
SECTION 12.
THEGEOMETRY
OF D
129
or An = An+1/1n = An+2/1n+lpn = ..., and this in turn suggests trying to define
A, as an infinitely iterated composition: A, = . . * p n + l p n . If this idea works at all,
An should be near the identity for large R , just as the tail of a convergent infinite
product is near 1.
Since e" - 1 5 2u for 0 5 u <_ 1/2, it follows by (12.17) and (12.22)
that
This means that, for fixed n, the functions pn+m * . .pn+lpntare uniformly fundamental as m -+ 00. Therefore, the sequence converges
uniformly to a limit
The function An is continuous and nondecreasing and fixes 0 and
1. If we prove that IIXnll" is finite, it will follow that An is strictly
increasing and hence is a member of A. By (12.19) and (12.22),
Letting m + 00 in the first member of this inequality shows that
IIXnll" 5 1/2n-1; in particular, ( ( A n ( ( "is finite and An E A.
From (12.24) follows An = Xn+lpn, which is the same thing as
Xi!l
= pnXG1. Therefore, by (12.23),
It follows that the functions yn(X,lt), which are elements of D , are
uniformly fundamental and hence converge uniformly t o a limit function y(t). It is easy to show that y must be an element of D . Since
[(An((" -+ 0, yn is do-convergent to y.
0
It is interesting to observe that if do is replaced by d and llAllo is temporarily redefined as sup, / A t - t J , then this proof of completeness goes through word
for word, except at one place: sup,lAnt - tl 5 1/2"-' does not imply that A,
is strictly increasing. In fact, if ,!in carries 1/2" to 1/2"+' and is linear over
THESPACED
130
[ 0,1/2"] and [1/2", 11 (this is the function of Example 12.1 in new notation), then
p,+,,, . * . p n + i p n ( l / 2 n=
) 1/2"+,,,+'; (12.24) now implies that Xn(1/2n) = 0.
Compactness in D
We turn now to the problem of characterizing compact sets in D. Using
the modulus wk(S) defined by (12.6), we can prove an analogue of
Theorem 7.2, the Arzellt-Ascoli theorem:
Theorem 12.3. A necessay and suficient condition for a set A
to be relatively compact in the Skorohod topology is that
(12.2 5)
and
(12.26)
lim supwL(6) = 0.
6+0 x E A
This theorem differs from the Arzellt-Ascoli theorem in that for
no single t do SUP,~A ( z ( t ) <
J 00 and (12.26) together imply (12.25)
(consider xn = nI[a,ll). The important part of the theorem is the
sufficiency, which will be used to prove tightness.
PROOFOF SUFFICIENCY.
Let a! be the supremum in (12.25).
Given E , choose a finite €-net H in [-a,a!] and choose 6 so that 6 < E
and wk(6)< E for all z in A . Apply Lemma 3 for any 0 = { s u } satisfying max(s, - su-l) < 6: z € A implies d(x,A,x) < E . Take B t o
be the finite set of y that assume on each [su-l,su) a constant value
from H and satisfy y ( 1 ) E H . Since B obviously contains a y for which
d(A,x,y) < E , it is a finite 2enet for A in the sense of d. Thus A is
totally bounded in the sense of d.
But we must show that A is totally bounded in the sense of do,
since this is the metric under which D is complete. Given (a new) E ,
choose (a new) 6 so that 0 < 6 5 1/2 and so that 46 wi(6)< E holds
for all z in A. We have already seen that A is d-totally bounded, and
so there exists a finite set B' that is a b2-net for A in the sense of d.
But then, by Lemma 2, B' is an E-net for A in the sense of do.
+
The proof of necessity requires a lemma.
Lemma 4. For @ed 6 , w'(x, 6 ) is upper-semicontinuous in x .
PROOF.Let x, 6, and E be given. Let {ti}be a 6-sparse set such
that w,[ti-l, ti) < wk(6) E for each i. Now choose q small enough
+
SECTION
12. THEGEOMETRY
OF
D
131
that 6+2q < min(ti-ti-1) and q < E . Suppose that d ( z ,y) < q. Then,
forsomeAinA,wehavesuptly(t)-z(At)I <qandsuptIA-lt-tl < q .
Let si = A-lti. Then si - si-1 > ti - ti-1 - 2q > S. Moreover, if s and
t both lie in [si-l, sj), then As and A t both lie in [ti-l,ti) and hence
~Y(s)- Y(t)l < Iz(As) - z(At)l+ 2q 5 w;(S) c 277. Thus d(z,9) < 77
0
implies wh(S)< wL(6) 3 ~ .
+ +
+
PROOF
OF NECESSITY
IN THEOREM
12.3. If A- is compact, then
it is d-bounded (has finite diameter), and since supt Iz(t)(is the ddistance from z to the 0-function, (12.25) follows.
By Lemma 1, w’(z, 6) goes to 0 with 6 for each z. But since w’(.
,S)
is upper semicontinuous, the convergence is uniform on compact sets
[W*
0
For the general range space V ,write supt v(z(t), y(t)) in place of 112 - yI1 in all
the definitions and arguments (it is mostly a matter of using the triangle inequality
in V rather than on the line); [IX - I [ (and (IX((’ need no change. In Example 12.1,
take zn to have value a on [ 0,1/2”) and value b on [1/2n, 11, where a and b are
distinct points of V . In the argument for separability, any countable set dense in
V can replace the rationals. The completeness argument, which uses the assumed
completeness of V ,goes through as before. In place of (12.25) in the condition for
5 E A , t E [ 0, I] ] has compact closure, and
compactness, assume that the set [z(t):
in the proof of sufficiency use an +net from this set.
A Second Characterization of Compactness
Although the modulus wk(6)leads to a complete characterization of
compactness, a different one is in some ways easier to work with. This is
the supremum extending over all triples t l , t , t 2 in [ 0 , 1 ] satisfying the
constraints. Suppose that w;(S) < w , and let { ~ i } be a &sparse set
such that wZ[7i-1, ~ i )< w for all i. If t 2 - t l 5 6,then tl and t 2 cannot
lie on opposite sides of any of the subintervals [ ~ i - 1, ~ i ) , and therefore,
if tl 5 t 5 t 2 , either t l and t lie in a common subinterval or else t
and t 2 do, so that either Iz(t)- z(t1)l < w or else Jz(t1)- z(t)l < w.
Therefore (let w 1 w;(S)),
(12.28)
w;(S) 5 wL(6).
There can be no inequality in the opposite direction: For the functions
(12.29)
z n = I[O,n-l),
= I[l-n-l,l],
THESPACED
132
we have wgn(b) = 0, whereas win(6) = 1 for n 2 6-I. In neither case
does {z,} have compact closure, and so there can be no compactness
condition involving (in addition to (12.25)) a restriction on w$(6) alone.
It is possible, however, to formulate a condition in terms of wi(6) and
the behavior of 2 near 0 and 1.
Theorem 12.4. A necessaq and suficient condition for a set
A to have compact closure in the Skorohod topology is that it satisfy
(12.25) and
lims+o S U P , ~ A ~"(6)= 0,
lim6+0 SUP,€A Iz(6) - z(O)( = 0,
lim6,osupZEA \z(l-) - x (1- 6)l = 0.
(12.30)
PROOF.
It is enough to show that (12.30) is equivalent to (12.26).
That (12.30) follows from (12.26) is clear from (12.28). In fact,
(12.31)
w:(6)
V Iz(6) - z(0)I V [ ~ ( l --)~ (-16
)l
5 wk(26).
We can prove the reverse implication by showing that
(12.32) wk(i6) 5 24{w:(6)
V Iz(6) - z(0)l V Iz(l-)
- z(1 - 6)l).
We first show that
(12.33)
Ix(s)
- z ( t l ) (V Iz(t2)- z(t)l I
2w:(6)
t 5 t 2 , t2 - tl I
6.
if tl 5 s I
Indeed, if lz(s)-z(tl)l > w;(6), then, by the definition, Iz(t)-z(s)l 5
wg(6) and (z(t2)- z(s)l 5 wg(S), so that Iz(t2)- z ( t ) (5 2w,'(6).
Next,
(12.34).
w,[tl,t2) I 2(wi(6)
+ Iz(t2)- z(tl)l) if t2 - tl 5 6.
To see this, note that, if tl 5 t < t2 and Iz(t)- z(t1)l > wi(6), then
Iz(t2)- z(t)I I
w,N(6), and therefore, ( z ( t )- z ( t l ) (5 ( z ( t )- x(t2)(
Iz(t2)- z(t1)l Iw:(6)
lz(t2)- z(tl)l. Hence (12.34).
F'rom (12.34) it follows that w,[O,6) 5 2(wg(S)+)2(S)-z(O)\) and
wz[l - 6 , l ) 5 2 ( 4 ( 6 ) lz(l-) - z(1 - &)I) (for the second inequality, take tl = 1 - S and let t 2 increase to 1). Because of these two
inequalities, (12.32) will follow if we show that
+
+
+
(12.35)
w k ( i 6 ) 5 6{w:(6)
V
w,[O,6) V w2[1 - 6,l)).
SECTION
133
12. THEGEOMETRY
OF D
Let a exceed the maximum on the right in (12.35). It will be enough
to show that wh(6/2) 5 6 a . Suppose that z has jumps exceeding 2 a
at u1 and u 2 . If u2 - u1 < 6, then there are disjoint intervals ( t 1 , s )
and ( t ,t 2 ) containing u1 and u 2 , respectively, and satisfying t 2 - tl < 6,
and if these intervals are short enough, we have a contradiction with
(12.33). Thus [ 0 , 1 ] cannot contain two points, within 6 of one another,
at each of which z jumps by more than 2 a . And neither [0,6) nor
[l - 6 , l ) can contain a point at which z jumps by more than 2 a .
Thus there exist points si, satisfying 0 = so < s1 < - . < sT = 1,
such that si - si-1 2 6 and such that any point at which z jumps
by more than 2 a is one of the si. If s j - sj-1 > 6 for a pair of
adjacent points, enlarge the system {si} by including their midpoint.
Continuing in this way leads to a new, enlarged system S O , . . . , sT (with
a new r ) that satisfies
1
2
-6 < si - si-1 5 6, i = 1 , 2 , . . . ,r.
Now (12.35) will follow if we show that
(12.36)
wZ[si-l,
$2)
5 6a
for each i. Suppose that si-1 5 tl < t 2 < Si. Then t 2 - tl < 6.Let u1
be the supremum of those points u in [ t l , t 2 ] for which the inequality
suptllu<o Iz(u) - z(t1)l5 2 a holds. Let u2 be the infimum of those u
in [tl,t 2 ] for which supolultz 1 z ( t 2 ) - z ( u ) I 5 2 a . If 01 < 0 2 , then there
exist points s just to the right of u1 satisfying Ix(s) - z(t1)l > 2 a and
there exist points t just to the left of 02 satisfying Iz(t2)- z(t)l > 2a;
since we can arrange that s < t , this contradicts (12.33). Therefore,
u 2 5 01 and it follows that Iz(ul-) -z(tl)l 5 2 0 and ( z ( t 2 ) - z(01)l 5
2 a . Since tl < 01 5 t 2 , 01 is interior to ( s i - l , s i ) , and so the jump at
u1 is at most 2a. Thus lz(t2)- z(tl)l 5 60. This establishes (12.36),
hence (12.35), and hence (12.32), which proves the theorem.
0
Finite-Dimensional Sets
Finite-dimensional sets play in D the same role they do in C. For
0 5 tl < . . < t k 5 1, define the natural projection 7rtt,...tkfrom D to
R~ as usual:
(12.37)
Ttl”’tk(x)
= (z(tl),* * * z ( t k ) ) ‘
Since each function in A fixes 0 and 1, no and T I are continuous.
Suppose that 0 < t < 1. If points zn converge to z in the Skorohod
THESPACED
134
topology and 5 is continuous at t , then (see (12.14)) x n ( t ) + x ( t ) .
Suppose, on the other hand, that x is discontinuous at t. If A, is the
element of A that carries t to t - l / n and is linear on [ 0, t ] and [t,11,
and if zn(s)3 x(X,s), then xn converges to 2 in the Skorohod topology
but x n ( t ) does not converge to x ( t ) . Therefore: If 0 < t < 1, then rt
is continuous at x if and only if x is continuous at t .
We must prove that rt1...tk
is measurable with respect to the Bore1
a-field D,We need consider only a single time point t (since a mapping into Rkis measurable if each component mapping is), and we may
assume t < 1 (since TI is continuous). Let h,(x) = 6 - l
x ( s )ds.
If x, --f 2 in the Skorohod topology, then zn(s) x ( s ) for continuity points s of 2 and hence for points s outside a set of Lebesgue
measure 0; since the z, are uniformly bounded, limn he(xn)= h,(x)
follows. Thus he( . ) is continuous in the Skorohod topology. By rightcontinuity, hm-l(x) -P r t ( x ) for each x as m -+ 00. Therefore: Each
rt is measurable.
Having proved the 7rtl...tk measurable, we may, as in C,define in D
the class D Dof~ finite-dimensional sets-those of the form T G ~ . . ~for
~H
k 2 1 and H E Rk.If T is a subset of [ 0 , 1 ] ,let p [ r t : t E T ] be the
class of sets r G f , . t k Hwhere
,
k is arbitrary, the ti are points of T , and
H E Rk;
these are the finite-dimensional sets based on time-points in
T . Then p[.rrt:t E T ] is a r-system (see the argument in Example 1.3).
Let a[rt:t E T ] be the a-field generated by the real functions rt for
t E T ;it is also generated by the class p [ r t : t E TI.
At+'
--f
Theorem 12.5. (i) The projections no and 7r1 are continuous; for
0 < t < 1, rt is continuous at x if and only if x is continuous at t .
(ii) Each nt is measurable V / R 1and
, each ntl...tkis measurable D / R k .
(iii) I f T contains 1 and is dense in [ 0 , 1 ] , then a[7rt:t E T ] = D and
p [ ~ tt :E TI is a separating class.
PROOF.Only (iii) remains to be proved. By right-continuity and
the assumption that T is dense, it follows that 7ro is measurable with
respect to a [ r : t E TI, and so we may as well assume that T contains
0. For each rn, choose points SO,.. . ,s k of T in such a way that 0 =
SO <
< sk = 1 and max(s, - su-l) < rn-', and take Om = { s u } ;
here k and the s, are functions of m. For o = ( 0 0 , . . . ,ak)in
let
Vma be the element of D taking the constant value a,-1 on [su-l, s u ) ,
1 i u 5 k, and taking the value crk at t = 1. Clearly, Vm:Rk+l +
D is continuous (relative to the Euclidean and Skorohod topologies)
is
and hence is measurable with respect to 7Zk+1/D. Since rso...sk
SECTION
135
12. THEGEOMETRY
OF D
measurable a[7rt:t E T]/R"', the composition Vmn,o...,kis measurable
a[7rt:t E T ] / D . But this composition is just the map A,, of Lemma
3, which is therefore measurable ~ [ n tt : E T ] / D . Since d ( z , Aom) 5
rn-l V u~l(m-~)
by the lemma, z = lim, Aa,z for each z. This means
[MlO] that the identity function is measurable a[7rt:t E T]/D,
which
implies D c a[7rt:t E TI. Since p[7rt:t E TI is therefore a n-system
0
generating D,it is a separating class.
It follows from Lemma 4 that w'(.,6) is D-measurable. Since the
functions in D are right-continuous, the supremum in (12.27) is unchanged if t l , t, t 2 are restricted to the rationals; since the individual
7rt are D-measurable, it follows that w"( ,6) is also D-measurable.
-
Random Functions in D
Just as in the case of C, the finite-dimensional distributions of a probability measure P on ( D ,D)are the measures P7rg1..tk.Since the projections are not everywhere continuous on D , there is this difference with
C: Pn 3 P does not always imply Pn7rG1..tk+ P7rg1..tk;see the next
section. On the other hand, D is like C in that if the finite-dimensional
distributions do converge weakly, the measures on D they come from
may not; in fact, Example 2.5 applies to D just as well as to C.
If (R,.F, P) is a probability space and X maps R into D , then X
is a random element of D-in the sense that it is measurable F/Dif and only if each Xt(w) = nt(X(w)) defines a random variable; the
argument is as for C (see p. 84). Finally, each coordinate function
7 r t ( z ) = z ( t )= zt can be viewed as a random variable on (D,D)
(p.
86).
The Poisson Limit*
A probability measure Pa on D describes a Poisson process with rate
cr if the increments are independent under Pa and have Poisson distributions:
(12.38)
P,[zt - z, = i] = e -a(t-s)
(QG- S H i
i!
To prove the existence of such a measure, first construct the corresponding random function: Call a function in D a count path if it is
nondecreasing, takes integers as values, and has jumps of exactly 1 at
its points of discontinuity. Let D, be the set of count paths. The idea
is that the elements of D, are the possible paths for a point process,
a process that counts events. Take a Poisson process [Xt:O 5 t 5 11,
THESPACED
136
arrange that each sample path X ( . , w )lies in D, [PM.297], and let Pa
be the distribution of X ( w ) ,
The following theorem shows that weak convergence to Pa is just a
matter of the convergence of the finite-dimensional distributions. Let
Pn, P be probability measures on (D,D).
Theorem 12.6. Suppose that E E D and To is a countable, dense
set in [ 0,1]. Suppose further that, if x ,xn E E and zn(t) --$ z(t) for
t E TO,then xn + x an the Skorohod topology. If PnE = PE = 1 and
PnnGt...,tk J n PT;; ..,,tk for all k-tuples an To, then Pn +n P.
The set D, of count paths satisfies the hypothesis on E , as will be
shown.
PROOF.
The idea is, in effect, to embed E in R" with the product
topology; see Example 2.4. Let TO= { t l ,t 2 , . . .} and define 7r: D -, R"
by T ( Z ) = ( z ( t l )z(tg),
,
. . .). If 7rk is the natural projection from Rm
to RL,defined by nk(z1,z2,. . .) = ( ~ 1 , .. . ,zk),then 7 r k ~is the natural
projection Tt,...tk from D to Rk.Therefore, T-'(TF'H) =
E 2)
for H E R k , and since the sets ";'HI the elements of Ry,generate
R", it follows that 7r is measurable D / R m .
For A c D,define A* = r-l(.rrA)-. Then A* E D and A c A*. If z
lies in ( A m ) *so
, that T X E ( T ( A n E ) ) -then
,
there is a sequence { x n }
in A n E such that m, -, 7rx. If z also lies in E , then by the hypothesis,
xn converges to z in the Skorohod topology. Therefore, ( An E)* n E
is contained in the closure A- of A in the Skorohod topology. Since
-1
Pn7rG!..tk +, 7rtl...tk
by hypothesis, and since 7rtl...tk = 7rk7r, we have
PnT-lKil J n Pn-'7rF1. But since in R" weak convergence is the
same thing as weak convergence of the finite-dimensional distributions,
it follows that Pnxr-l + P7r-l on R". Therefore, if A E D ,
lim sup PnA 5 lim sup P,A*
n
n
= limsup PnT-'(TA)n
5 PT-'(TA)- = PA*.
And now, since P,E = PE = 1 and ( An E)* n E
c A-,
lim sup PnA = lim sup Pn(An E )
n
Therefore, Pn =+-P.
n
< P ( An E)* = P ( ( An E)* n E ) 5 PA-.
0
SECTION
13. W E A K CONVERGENCE AND TIGHTNESS
IN D
137
It is not hard to see that the set D, of count paths is closed in
the Skorohod topology. Let To be any countable, dense set in [ 0,1],
and suppose that z,z, E D, and that, for each t in TO,z n ( t )-, ~ ( t )
(which means that zn(t) = z ( t )for large n). A function in D,has only
finitely many discontinuities; order those for 2: 0 < tl < - .. < t k 5 1.
For a given E , choose points ui and ui in To in such a way that ui <
ti 5 vi, ui - ui < E , and the intervals [ui-l,
ui]are disjoint. Then, for
n exceeding some no, z, agrees with z over each [ui-l,ui] and has a
single jump in each [ui,
4. If A, (in A) carries ti to the point in [ui,
ui]
where 2, has a jump (and is defined elsewhere by linearity, say), then
supt IX,t - tl 5 E and z:,(A,t) = z ( t ) . Therefore, z, converges to 2
in the Skorohod topology, and so D, satisfies the hypothesis on E in
Theorem 12.6.
Exumple 12.3. Suppose that, for each n, <,I,. . . ,tnnn
are independent and take the values 1 and 0 with probabilities a / n and 1 - a/n.
Define a random function X n by X r ( w )=
&i(w). Then X n has
independent increments; since X r - XF has the binomial distribution
for lntJ - LnsJ trials, with probability of success a / n at each, it has in
the limit the Poisson distribution with mean a(t - s). It follows easily
from the theorem that X n + Pa. This can be extended in various
ways, for example to the (Aiw) of (3.34) and (3.35).
13
Problems
12.1. Let D+ be the class of functions on [ 0,1] that have only discontinuities of
the first kind in (0,l) and have right-hand limits at 0 and left-hand limits at
1. Convert D+ into a pseudo-metric space in such a way that (i) 5 and y are
at distance 0 if and only if they agree at their common continuity points and
at 0 and 1, and (ii) D is isometric to the standard quotient space (see Kelley
[41],p. 123).
12.2. Under the Skorohod topology and pointwise addition of functions, D is not a
topological group.
12.3. The set C is nowhere dense in D .
12.4. Put
w r ( 6 ) = sup
SUP {wz(ti,t) A W , ( t , t z ) } .
tz-t~<dtl<t<tl
Show that
w r ( 6 ) = sup
inf
ta-t1<6 ti<t<tz
and
W:(a)
12.5. Suppose that
functions
{wz(tlr t ) V w z ( t ,t z ) }
5 w:"(6) 5 2 4 6 ) .
< is uniformly distributed over [i,I],and consider the random
x = 24€,1],
X" = I[c-n-',l]+ IIE+n-l,l].
138
THESPACED
Show that X" ji X , even though XG...tk
does Theorem 12.6 not apply?
* X t , ...t ,
for all t l , . . . ,t k . Why
SECTION 13. WEAK CONVERGENCE AND TIGHTNESS
IN D
Prove weak convergence in function space by proving weak convergence
of the finite-dimensionaldistributions and then proving tightness-this
was the technique so useful in C,and we want to adapt it to D . Since D
is separable and complete under the metric do, a family of probability
is relatively compact if and only if it is tight, and
measures on (D,D)
so there is no difficulty on that point. On the other hand, the fact
that the natural projections are not continuous complicates matters
somewhat.
Finite-Dimensional Distributions
For probability measures P on ( D ,D),denote by T p the set o f t in [ 0,1]
for which the projection ~t is continuous except at points forming a
set of P-measure 0. Since TO and T I are everywhere continuous, the
points 0 and 1 always lie in T p . If 0 < t < 1, then ~t is continuous at
z if and only if z is continuous at t , and it follows that t E T p if and
only if PJt = 0, where
(13.1)
Jt = [ z : z ( t#
) z(t-)].
(This equivalence holds only in the interior of the unit interval: 0 lies
in T p , and each function z is continuous at 0; 1 lies in T p , but an x
may or may not be continuous at 1.)
An element of D has at most countably many jumps. Let us prove
the corresponding fact that PJt > 0 is possible for at most countably
many t. Let Jt(e) = [z: Iz(t)- z(t-)l > €1. For fixed, positive E and
6, there can be at most finitely many t for which P ( J t ( e ) ) 2 6, since
if this held for infinitely many distinct tn, then P(limsup, Jt,( E ) ) 2 6
would follow, contradicting the fact that for a single z the saltus can
exceed e at only finitely many places. Since P ( J t ( e ) ) 1 PJt as E 1 0,
the result follows:
T h e set T p contains 0 and 1, and its complement in [ 0,1] is at
most countable.
If t l , . . . ,t k all lie in T p , then Ttl...tk is continuous on a set of Pmeasure 1, and it follows by the mapping theorem that
(13.2)
Pn
+P
SECTION
13. W E A K CONVERGENCE AND TIGHTNESS
IN D
139
implies
(13.3)
If some ti lies outside Tp,then (13.3) may not follow from (13.2): Take
P to be a unit mass at Ipt)and Pn to be a unit mass at Ipt+n-i).
Here is the analogue of Theorem 7.1:
Theorem 13.1. If {Pn} is tight, and if PnTGf,,t,
whenever t i , . . . t k all lie an Tp, then P n + P .
+ PIT;^..^, holds
PROOF.By the corollary to Theorem 5.1, it is enough to show
that, if a subsequence {Pn,} converges weakly to some Q, then Q must
coincide with P. Assume that Pn, +i Q. If t i , . . . ,t k lie in Tp, then
Pnir&!..tk =+i PTG1,.tkby the hypothesis of the theorem. If t i , . . . , t k
lie in TQ,then Pnir&f..t, +i Qr;1..,, by the assumption. Therefore,
if t l , . . . , t h lie in Tp n TQ,then PrL1,.tk= QT;~..~,.But by Theorem
12.5, this implies that P = Q, because Tp n TQcontains 0 and 1 and
has countable complement-since Tp and TQboth do.
0
Tightness
The analysis of tightness in C began with a result (Theorem 7.3) which
substituted for compactness its ArzelA-Ascoli characterization. Theorem 12.3, which characterizes compactness in D , gives the following
result. Let {Pn} be a sequence of probability measures on ( D ,D).
Theorem 13.2. The sequence {Pn} is tight if and only if these
two conditions hold
(i) We have
(ii) For each
(13.5)
E,
limlimsup P,[x: wL(6)2
6
n
€1 = 0.
PROOF.Conditions (i) and (ii) here are exactly conditions (i) and
(ii) of Theorem 7.3 with 1 1 ~ 1 1in place of Iz(0)l and w' in place of w.
Since D is separable and complete, a single probability measure on D
is tight, and so the previous proof goes through.
0
140
THESPACED
There are two convenient alternative forms for (13.4). The first
controls Ilzll by controlling Iz(t)l for the individual values of t , the
second by controlling Iz(0)l and j ( z ) (defined by (12.8)).
Corollary. Either of the following two conditions can be substituted for (i) in Theorem 13.2:
(i‘) For each t in a set T that is dense an [ 0,1] and contains 1,
lim limsupP,[z: Iz(t)l 2 a] = 0.
(13.6)
a+oo
n
(i“) The relation (13.6) holds for t = 0, and
lim limsupP,[z:j(rc) >_ a] = 0.
(13.7)
a4oo
n
The proof will show that (i), (i’), and (i”) are all equivalent in the
presence of (ii).
PROOF.It is clear that (i) implies (i‘) and (i”). It is therefore
enough to show that, in the presence of (ii), (i) follows from (i’) and
also from (i”).
Assume (ii) and (i’). Let {to,.. .t v }be a &sparse set for which
w,[t+l,ti) < wL(6) 1, i 5 v. Choose from T points sj such that
0 = so < s1 < - .- < sk = 1 and sj - sj-1 < 6. (The ti depend on
z, but the s j do not.) If m ( z ) = maxo<jikJz(sj)l,then, since each
[ti-I,ti) contains an sj, 11zl1 5 m(z) wi(6) 1. If (13.5) and (13.6)
hold, then there exist a 6 and an a such that Pn[z:wL(6) _> 11 < 77 and
Pn[z:m ( z )2 a] < 77 for large n. But then P,[z: 11x11 2 a 21 < 277 for
large n, which proves (13.4). Thus (ii) and (i’) imply (i).
Assume (ii) and (i”). Let {ti}be the &sparse set (depending on
z) described above. Now (z(ti)- z(ti-1)l I
wk(6) 1 j ( z ) , and
so maxisv (z(ti)l5 Iz(0)l w(wk(6) 1 j(rc)) and (since 6v 5 1)
11z11 5 Jz(O)( 6-’(w;(6) 1 j ( z ) ) wk(6).If (13.5) holds, there
is a 6 such that Pn[z:w;(6) 2 11 < q for large n. Further, if (13.6)
holds for t = 0, and if (13.7) holds, then there is an a such that
Pn[z:
Iz(O)( 2 a] < 77 and Pn[z:
E 1 ( 2 j ( z ) ) 1 2 a] < 77 for large n.
And then Pn[z:llzll 2 2a] < 377 for large n. Thus (ii) and (i”) imply (i).
+
+
+
+
+
+
+ +
+ +
+ +
+
+
+
We can use Theorem 12.4 in place of Theorem 12.3 here. In fact,
the inequalities (12.31) and (12.32) make it clear that we can replace
SECTION
13. W E A K CONVERGENCE AND TIGHTNESS
IN D
141
(13.5) by the condition that, for each positive E and q, there exist a 6,
0 < 6 < 1, and an integer no such that
for n 2 no. As the next theorem shows, the second and third conditions
in (13.8) are automatically satisfied in the most important cases.
Theorem 13.3. Suppose that Pnxgl..,, + PT;!..~, whenever the
ti all lie an T p . Suppose further that, for every E ,
lim P[z: Iz(1) - z( 1 - 6) I
(13.9)
6+0
and that, for positive
such that
and q, there exist a 6 , 0 < 6 < 1, and a n no
Pn[z:w:(6) 2
(13.10)
Then P,
E
> E ] = 0,
€1 I
q,
n 2 no.
+ P.
PROOF.We need only prove that {P,} is tight, and for this it is
enough to check (13.8) and (13.6) with T p in the role of 5". For each t
in Tp, the weakly convergent sequence {P,x,l} is tight, which implies
(13.6).
As for (13.8), we need consider only the second and third conditions, since of course (13.10) takes care of the first one. By rightcontinuity we have P[z:Iz(S) - z(0)I E ] < q for small enough 6; by
hypothesis, P,T~:+ P.0,: if 6 E T p , and it then follows that the
middle condition in (13.8) holds for all sufficiently large n. With one
change, the symmetric argument works for the third condition. From
the fact that cadlag functions have left-hand limits at 1, it follows that
P[z: lz(l-) - z(1 - S)l 2 E ] -+ 0 as S + 0, and from (13.9) it follows
further that P[z:Iz(1)-z(l-)l 2 E ] = 0-that is, PJ1 = 0. Therefore,
there is continuity from the left at 1 outside a set of P-measure 0, and
the symmetric argument does go through.
0
>
Let $t:D
R2 carry z to (z(t),z(l-)). One can replace (13.9)
in this theorem by the condition that Pn$tl+ P$tl for t E T p . But
some restriction on the oscillations near 1 is needed: Consider the case
where P, is a unit mass at 1p-,-1,~1.
--f
THESPACED
142
Theorem 13.3 can be stated in terms of random elements of D ;compare Theorem 7.5. The second proof of Theorem 7.5 makes very little
use of the general theory of weak convergence, and the convergence-indistribution version of Theorem 13.3 can be given a similar proof.+
Let X n and X be random elements of D.
Theorem 13.4. Suppose that Xn
and only i f j ( X n )=+ 0.
+X.
Then P[X E C ] = 1 if
PROOF.By Example 12.1 and the mapping theorem, j ( X n ) =+
0
j ( X ) . And j ( X )= 0 if and only if X E C.
A corollary shows what happens if we use the conditions for weak
convergence in C.
Corollary. Suppose the conditions (7.6) and (7.7) hold for probability measures on (D,D).
Suppose further that PnrG1,.tk =+ P K ; ~ . . ~ ~
for all t l , . . . ,t k . Then Pn + P , and Pc=1.
PROOF.
The proof of Theorem 7.3 shows that condition (i) of Theorem 13.2 holds, and by (12.7), condition (ii) holds as well. Therefore,
Pn =+ P,or Xn
X for the corresponding random functions. And
since j(Xn)
5 W(Xnl6)for each 6 , (7.7) implies that j ( X n )+ 0. tl
A Criterion for Convergence
Write Tx for T p , where P is the distribution of X .
Theorem 13.5. Suppose that
for points ti of Tx,that
and that, for r 5 s 5 t , n 2 1, and X > 0 ,
(13.13)
1
P[IXz - XFI A lXr - XrI 2 A] 5 -[F(t)
X4P
i,
- F(r)I2",
where /3 2 0 and CY > and F is a nondecreasing, continuous function
on [0,1].ThenXn*n X .
t
See Problem 13.1.
SECTION
13. W E A K CONVERGENCE AND TIGHTNESS
IN D
143
There is a more restrictive version of (13.13) involving moments,
namely
PROOF.By Theorem 13.3, it is enough to show that for each
positive E and q there is a 6 such that P[w"(Xn,6) 2 €1 5 q for all
n. We can prove this by applying Theorem 10.4 to each Xn, since the
paths are right-continuous and w"(Xn, 6) = L ( X n ,6). Let T = [ 0,1]
and define p by p(s, t] = F ( t ) - F ( s ) . With Xn in the role of y, (10.20)
is the same thing as (13.13). It follows by (10.21) that
P[WN(Xn,6) 2
€1 < -2K
p[
€4P
0,1]
sup
O<t< 1-26
p2"-l[t, t
+ 261
where WF is the modulus of continuity. Since F is uniformly continuous
and (Y > f , it is possible, for given E and q , to choose 6 so that the
right side here is less than q.
0
For applications of Theorem 13.5, see the next section.
A Criterion for Existence*
These ideas lead to a condition for the existence in D of a random
element with specified finite-dimensional distributions. For each ktuple tl, . . . ,t k , let pt l...tk let be a probability measure on (Rk,
Rk),
and assume that these measures satisfy the consistency conditions of
Kolmogorov's existence theorem.
Theorem 13.6. There exists in D a random element with finitedimensional distributions pt, ...tk,provided these distributions are consistent; provided that, f o r tl t t 2 ,
< <
where p 2 0, (Y > f, and F is a nondecreasing, continuous function
on [ 0,1]; and provided that
THESPACED
144
By right-continuity, (13.16) is in fact necessary for the existence of
such a random element of D.
PROOF.
For each n, consider the points ty = i / 2 n for i = 0 , . . . ,2n,
and let X n be a random function that is constant over each [ty-l,tP)
and for which (Xz",,. . . ,X&,) has distribution pt 0 . . . t 2 n . Let Y nbe the
process X n with the time-set cut back to Tn= {t?}.Let pn have mass
F ( t l ) - F(ty-L_,)
at tP for i = 1 , . . . , 2 n , and apply Theorem 10.4. By
(13.15),
and it follows by (10.21) that
Suppose that T I s 5 t and t - s I 6. Move T , s , ~to the left
endpoints T ' , s', t' of the intervals [t&l,t r ) containing them (t' = 1 if
t = 1). Then IXr-XFIAIXF-XrI = lY$-T?[Al~?-YTl
and t'-s' I
6 + 2-n. It follows that, if 2-n I 6, then w"(Xn,
6) 5 L ( Y n2, 6 ) . And,
again if 2-n I 6, pn(Tnn [t,t 461) 5 F ( t + 46 2-n) - F ( t - 2 - n ) I
~ ~ ( 42 62-n) I ~ ~ ( 6 6Since
) . pn(Tn)
= F(l) - F(O), we arrive at
+
+
P[w"(Xn,6)2 €1
I ,4p[F(1)
2K
+
-F(O)](w~(66))~"-~.
Fkom the uniform continuity of F , it follows that, for given
there exists a 6 such that
(13.17)
P[w"(Xn,6)2
E
and 77,
€1 5 77
for n large enough that 2+ 5 6.
The distributions of the Xn will be tight if they satisfy (13.4) and
(13.8). If 2-k 5 6, then
Since the distributions of the first term on the right all coincide for
n 2 k, it follows by (13.17) that (13.4) is satisfied.
SECTION
13. W E A K CONVERGENCE AND TIGHTNESS
IN D
145
The first condition in (13.8) of course holds because of (13.17). To
take care of the second and third conditions, we temporarily assume
that, for some positive 60, h 2 60 implies
Under this assumption, the second and third conditions in (13.8) hold,
so that {X"} is tight. Suppose that X is the limit in distribution of
some subsequence. Because of the consistency hypothesis, the distribution of ( X t ,, . . . ,X t k )is pt l . . . t k for dyadic rational ti, and the general
case follows via (13.16) by approximation from above.
It remains to remove the restriction (13.18). For a 60 < take f ( t )
tobeOtotheleftof60, 1 totherightof1-607and(t-60)/(l-260) in
~ si = f(ti). Then the vtl...tk
between. Now define utl . . . t k as P , ~ . . . ,for
satisfy the conditions of the theorem with a new F , as well as the
special condition (13.18), so that there is a random element 2 of D
with these finite-dimensional distributions. We now need only define
X by X ( t )= Z(60 t(1 - 260)) for 0 2 t 5 1.
O
i,
+
Example 13.1. We can use this theorem to construct a random
function representing an additive process. Suppose that F is nondet 5 1, let vt be a
creasing and continuous over [ 0,1], and for 0 I
measure on the line for which q(B1) = F ( t ) . Suppose now that, for
s 5 t , v,(A) 5 v t ( A) for all A , so that vt - v, is a measure with total
mass F ( t )- F ( s ) . By the general theory [PM.372],there is an infinitely
divisible distribution having characteristic function
the mean and variance are 0 and F ( t ) - F ( s ) .
We are to construct a random element of D for which the increments are independent and Xt - X,has characteristic function (13.19).
Since q5,,t = q$,,+,,t, the distributions specified for X , - X , and X t - X ,
convolve to that specified for X t - X,,and so the implied finitedimensional distributions are consistent. Further, by Chebyshev's inequality and the assumed independence, the left side of (13.15) is at
most
And (13.16) follows by another application of Chebyshev's inequality.
146
THESPACED
In the case of Brownian motion, vt consists of a mass of t at the
origin. If vt consists of a mass of F ( t ) at 1, then (13.19) is the characteristic function of a Poisson variable with its mean F ( t ) - F ( s )
subtracted away. We can add the means back in, which gives a Poisson process with rate function F ( t ) : The increments are independent]
and X t - X , has the Poisson distribution with mean F ( t ) - F ( s ) . 0
Problem
13.1. Prove Theorem 13.3 by the method of the second proof of Theorem 7.5. In
place of the Mu of the previous proof, use the A, of Lemma 3 in Section
12.
SECTION 14. APPLICATIONS
This is a sampling of the applications of weak-convergence theory in
D. Of the results here, only Theorems 14.2 and 14.4 are used in later
proofs.
The identity map i from C to D is continuous and is therefore
measurable C/D. If W is Wiener measure on (C,C), then Wi-' is
a probability measure on (D,D),
and it is easy to see that W and
Wi-' have the same finite-dimensional distributions. F'rom now on,
we denote this new measure by W rather than WZ-'; W will also
denote a random element of D having this distribution.
Donsker's Theorem Again
Given random variables &, &, . . . on an (52,3,
P), with partial sums
*
6 ,let X n ( w )be the function in D with value
Sn = [I
+ +
(14.1)
at t . This random element of D (each X r is measurable D / R 1 ) is the
analogue of (8.5); it is slightly easier to analyze.
Theorem 14.1. Suppose the [n are independent and identically
distributed with mean 0 and variance u2. Then the random functions
defined by (14.1) satisfy X n =%W .
PROOF.The proof in Section 8 that the finite-dimensional distributions converge carries over with essentially no change. Since W is
147
SECTION14. APPLICATIONS
continuous at 1, we can apply Theorem 13.5. By independence,
E[IX:
- XZ",l21X2",
- X:I2]
1
= ;lz(LntJ -
for t l 5 t 5 t 2 . If t 2 - tl 2 l/n, the right side here is at most
4(t2 - t ~ ) If~ t2. - tl < l/n, then either t l and t lie in the same
subinterval [(i - l)/n,i/n) or else t and t 2 do; in either case the left
side vanishes. Therefore, (13.14) holds for Q = ,8 = 1 and F ( t ) = 2t. 0
The equations (9.2) and (9.10) through (9.14) can be derived as
before. Some functions of the partial sums Si can be more simply
expressed in terms of the random element of D defined by (14.1) than
in terms of the random element of C defined by (8.5). For example,
for x E D , let h(x) be the Lebesgue measure of the set o f t for which
x ( t )> 0. Then [M15] h is measurable D / R 1 and is continuous on a set
of Wiener measure 1. If X n is defined by (14.1), then h ( X n )is exactly
n-l times the number of positive partial sums among S1,. . . , Sn-l,
which leads to a simple derivation of the arc sine law.
An Extension
Let X be the random function constructed in Example 13.1. Suppose
are independent random variables with
that, for each n, & I , . . . ,
mean o and variances U i k . Let s& =
and M~ = m a x a i k ,
and assume that siTn = 1 and Mn + 0. Let kn(t) be the maximum k
for which s i k 5 t , and define measures vn,t and random functions X n
bY
enrn
cLluzi
If vn,t converges vaguely to vt (in the sense that vn,t(u,b] +n vt(u,b] if
vt{u}= vt{b}= 0), then it follows by the theory of infinitely divisible
distributions [PM.375] that X r + X t . Since vn,t - vn,3then converges
vaguely to vt - vs for s < t , it follows also that X r - XF +- X t X 3 ;and by the independence of the increments, the finite-dimensional
distributions of X n converge weakly to those of X .
We can easily prove that
(14.2)
xn*x
THESPACED
148
oik,
if we assume that vt(R1) E t and that, if m, = mink
then M, 5
Km, for some K . The variance of Xp - Xp is Ck,(sl<k5k,(t)
ff2
nk <
(t - s M,), and so
+
t 2 - tl 2 m, implies that the right side here does not exceed
(2K + 1)2(tz- t ~ )while
~ , t 2 - tl < m, implies that the left side is 0.
This proves (13.14) for a = ,Ll = 1 and F ( t ) = (2K + l)t, and (13.12)
holds because X1 - XI-6 has variance 6. Hence (14.2). This extends
Theorem 14.1 to triangular arrays satisfying the Lindeberg condition.
It also covers Example 12.3.
Now
Dominated Measures
The random variables [, in Theorem 14.1 are defined on a space
(a,F,P1, and we can replace P by any probability measure PO absolutely continuous with respect to it. Suppose, for example, that
= [ 0,1], 3 consists of the Bore1 subsets of [ 0,1], and
is the nth
Rademacher function: &(w) = 2w, - 1, where w, is the nth digit in
the dyadic representation of w . Theorem 14.1 applies to {&} if w is
drawn from [ 0,1] according to Lebesgue measure, but this is also true
if w is drawn according to a distribution having a density with respect
to Lebesgue measure:
<,
Theorem 14.2. Theorem 14.1 remains true if P is replaced by a
Po absolutely continuous with respect to it.
PROOF.
Define X" by
(14.3)
where the pn are integers going to infinity slowly enough that p, =
o(n). Since llXn - Xnll I
E:zl l&[/ofi, it follows by Chebyshev's
inequality that (d is Skorohod distance)
(14.4)
d ( X n , X n ) 5 llXn - X y
* 0,
where this is interpreted in the sense of P (that is, P governs the
distribution of the &). By Theorem 14.1,
(14.5)
xn* w
SECTION
14. APPLICATIONS
149
in the sense of P, and it follows by (14.4) and Theorem 3.1 that
(14.6)
X"*W
in the sense of P.
Let A be a W-continuity set (in D ) , temporarily fixed; (14.6) implies that P[Xn E A] --+ W(A). Let Fo be the field of cylinderssets of the form [(&, . . . ,&) € HI. If E € Fo,then, since pn + 00,
P([Xn E A] n E ) = P[Xn E A]P(E) for large n, and therefore
(14.7)
P([Xn E A] n E ) -, W(A)P(E).
Since the events [X" E A] all lie in ~ ( F oit )
follows
, by Rhyi's theorem
on mixing sequences [M21] that Po[X" E A] -, W(A). This holds for
each W-continuity set A, and so (14.6) holds in the sense of PO (as well
as P). Since P dominates Po, it follows that (14.4) holds in the sense
of Po (think of the E-S version of absolute continuity), and by another
application of Theorem 3.1, so does (14.5).
0
Empirical Distribution Functions
The empirical distribution for observations & ( w ) , . . . ,&(w) in [ 0,1] is
defined as F n ( t , w ) = n-l Cy=lI[o,ti(&(w)).Assume the En are independent and have a common distribution function F over [ 0,1], and
consider the random function defined by
This describes the empirical process.
Theorem 14.3. If &, &, . . . are independent and have a c o m m o n
distribution function F over [ 0,1], and i f (14.8) defines Y", then Y n+
Y ,where Y i s the Gaussian random element of D specified by E K = 0
and E[Y,yt] = F(s)(l- F ( t ) ) f o r s 5 t .
That such a random function Y exists will be part of the proof.
PROOF.We first prove the theorem under the assumption that
F ( t ) = t , in which case Y is the Brownian bridge W" (extended from
C to 0 ) . Let UF be the number among < I , . . . ,t n that lie in [ 0, t ] .
For 0 = t o < tl <
< t h = 1, the Uc - Uc-, have the multinomial
distribution with parameters ti-ti-1, and it follows by the central limit
theorem for multinomial trials that the finite-dimensional distributions
THESPACED
150
of the Y nconverge weakly to those of W". By Theorem 13.5, it suffices
to prove that
for tl 5 t 5 t2.
Let pl = t - t l , p2 = t2 - t and p3 = 1 - pl - p2; for 1 <
-i < n,
let at be 1 - pi or -pi as ti lies in (tl,t] or not, and let Pi be 1 - p2
or -p2 as lies in (t,tz] or not. Then the first inequality in (14.9) is
equivalent to
(14.10)
E
[(2
at)'
i=l
(5Pi)
i=l
2]
I 6n2p1p2.
Since Eai = EPi = 0, considerations of symmetry show that the left
side of (14.10) is
And now (14.10) follows from
That proves the theorem for the uniform case. The quantile function 4 ( s ) = inf[t:s 5 F(t)] satisfies +(s) 5 t if and only if s 5 F ( t ) .
If qn is uniformly distributed over [0,1], then q5(qn) has distribution
function F , and so we can use the representation tn = q5(qn),where
the qn are independent and uniformly distributed.
If Gn(.,w) is the empirical distribution for ql(w), . . . ,qn(w) and
Z r ( w ) = ,/E[Gn(t,u ) - t], then Zn + W" by the case already treated.
But Fn(t,w)
= Gn(F(t),w),so that Y nas defined by (14.8) satisfies
qn(w)= ZFct,(w). Define $:D + D by ($z)(t) = z ( F ( t ) ) . If z,
converges to z in the Skorohod topology and z E C,then the convergence is uniform, so that $xn converges to $x uniformly and hence
in the Skorohod topology. From the mapping theorem and the fact
that Zn=+ W " ,it follows that Y n= +(Zn) $(Wo);since +(Wo)is
Gaussian and has the means and covariances specified for Y ,the proof
0
is complete.
*
SECTION
14. APPLICATIONS
151
For an example, apply the mapping theorem:
If F is continuous, the limiting varable here has the same distribution
as sup, IWtI-namely that given by (9.40). If F ( t ) = t , we can also
apply (9.39) through (9.43).
Random Change of Time
If S n / a f i is approximately normal for large n, and if Y is a random
index that is large with high probability, then perhaps S v / a f i or
S v / a f i will be approximately normal. Since S v / a f i = Xu”/n if X n
is defined by (14.1), this leads to the question of what happens to a
random function if the time scale is subjected to a random change, a
question best considered first in a general context.
Let DOconsist of those elements 4 of D that are nondecreasing and
satisfy 0 5 +(t)5 1 for all t. Such a 4 represents a transformation of
the time interval [ 0,1]. We topologize Do by relativizing the Skorohod
topology of D . Since Do E D,as is easily shown, the a-field DO of
Bore1 sets in Do consists of the subsets of Do that lie in 27 [MlO]. For
z E D and E Do, the composition 2 o 4, with value z ( ~ $ ( tat
) ) t , is
clearly an element of D . Define +: D x Do + D by
+
+
then [M16] is measurable D x Do/D.
Let X be a random element of D and let @ be a random element
of Do. We assume X and @ have the same domain, so that ( X ,@) is a
random element of D x Do with the product topology [M6]. If X o @
has value X ( w )o @(w)at w-that is, if X o @ = + ( X , @)-then X o @ is
the random element of D that results from subjecting X to the timechange represented by @. Suppose that, in addition to X and @, we
have, for each n, random elements X n and Qn--of D and Do-having
a common domain (which may vary with n).
Lemma. I f ( X n , a n )+ ( X , @ and
) P[X E C] = 1, t h e n X n o P +
XO@.
PROOF.This will follow by the mapping theorem if we show that
2 E C. Suppose that
zn + z and q5n + 4 in the Skorohod topology, and choose A, E A in
+ (defined by (14.11)) is continuous at (z,4) for
THESPACED
152
such a way that
in C,
IlX,
- 111 + 0 and
ll4n - 4XnII + 0. Then, since 3: lies
Izn4nt - z4Xntl I Izn4nt - z+ntl+ Izbnt - z4Xntl
I IIZn - zll+ W z ( l l 4 n - $ A n [ [ )
0.
+ + en.
+
0
Return now to a consideration of sums S, = 51
Let un be
positive-integer-valued random variables defined on the same space as
the &. Define X" by (14.1), and define Y nby
(14.12)
Theorem 14.4. If
Vn
3 8,
an
where 8 is a positive constant and the an are constants going to infinity,
then X n + W implies Y n+ W.
( 14.13)
PROOF.
There is no loss in generality in assuming that 0 < 8 < 1
(this can be arranged by passing to new constants an) and that the a,
are integers. Define
Since
(14.15)
the Skorohod distance from an to the element $ ( t )= 8t of DOgoes to
0 in probability. Because of the assumptions X n + W and an 3 00
it follows by Theorem 3.9 that ( X a n , a n )+ (W,4)and hence by the
lemma above that X a n o an + W o 4. If
1
RZ"(w)= -q v n ( w ) t j(4
then X a n o an and Rn have the same value at w if v n ( w ) / a n < 1,
the probability of which goes to 1 by (14.13) and the fact that 8 < 1.
Therefore, Rn + W o 4. Now
*
and hence Y n 8-1/2(W o 4). Since this limit has the same distribution as W ,Y n+ W .
0
SECTION14. APPLICATIONS
153
Of course, Y nj. W implies S,,/a,,&
+ N , as well as an arc sine
law and limit theorems for maxima and so on. Since the hypothesis
requires only (14.13) and X n + W, it applies to various dependent
sequences {tn}.
If the limit in (14.13) is not constant, we need more specific assumptions about the 5,. Again define Y nby (14.12).
Theorem 14.5. Suppose <I, &, . . . are independent and identically
distributed with mean 0 and variance u. If
(14.16)
-,I
vn
I
e +o,
where 0 is a positive random variable and the a, are constants going
t o infinity, then Y n+ W .
PROOF.Assume first that there is a constant K such that 0 < 0 <
K with probability 1. We may adjust the a , so that they are integers
and K < 1.
If we define an by (14.14), and if is the random element of DO
defined by at = O t , then ( d is the Skorohod distance) d ( P , @ ) + 0.
From this and (14.16), it follows that the distance in Do x R1 between
( a n , v n / n )and (@,e) goes to 0 in probability, and it follows by the
corollary to Theorem 3.1 that
(14.17)
(an,
vn/an) + (a,0).
Define X" by (14.3), where pn + 00 and p , = o ( n ) . As before, we
have (14.4) and( 14.6)-and hence Xan + W-and (14.7) holds for E
in the field Fo of cylinders. And now, by (14.17) and Theorem 3.10,
where 0" is
in the sense of the product topology on D x (DOx R1),
independent of W and @a,O = 0"t. By (14.4),
The mapping that carries the point ( z , 4 , a ) to a - l j 2 ( z o 4) is
continuous at that point if 2 E C , 4 E DO,and a > 0. By the mapping
theorem, therefore,
( V n / a n ) - 1 / 2 ( X a n0
an>+ (eo)-1/2(w0 a").
THESPACED
154
Since 8" and W are independent, (8°)-1/2(Wo a') has the same distriubtion as W . Moreover, (vn/an)-1/2(Xano a') coincides with Y n
if un/an < 1, the probability of which goes to 1 because K < 1. Thus
Y n+ W if 8 is bounded.
Suppose 8 is not bounded. For positive u,define 0, = 0 and
uu," = u, if 8 5 u,and define OU = u and Vu,n = anu if 8 > u. Then,
for each u,IVu,n/an - OU( + 0 as n
00, and by the case already
treated, if
--f
then Yuyn +n W . Since P[Yuin# Y"] 5 P[B
Theorem 3.2 that Y" + W .
> u],it follows by
0
Renewal Theory
These ideas can be used to derive a functional central limit theorem
connected with renewal theory. Let 71,72,.. . be positive random variables and define
c
k
(14.18)
ut = max [k:
7)i
5 t ] , t 2 0;
i=l
take ut = 0 if t < 71. If q k is the length of time between the occurrences
of the (k - 1)st and kth events in a series, then ut is the number of
occurrences up to time t.
We assume the existence of positive constants p and CT such that,
if
then Xn + W . This will be true if, as in the usual renewal setting,
the q n are independent and identically distributed with mean p and
variance C T ~ .Define Zn by
Theorem 14.6. If Xn =+ W , then 2"
+W.
PROOF.
We assume in the proof that p
mater of scale. We first show that
( 14.19)
sup I - -u,- - [ + , Ol . o
ogvgu
pu
> 1, since this is only a
SECTION14. APPLICATIONS
155
The hypothesis X n 3 W implies
(14.20)
since replacing s-l by s-1/2 would give convergence in distribution.
By the definition (14.18), v, > t implies ltl qt 5 21. Therefore,
,
implies
(14.21)
Similary, if
E
< p-', then
implies
(14.22)
By (14.20), the probabilities of (14.21) and (14.22) go to 0 as u
which proves (14.19).
Put
vtn(w>/n if h&4/n 5 1,
@XW> =
qp
otherwise.
--f
00,
{
By (14.19), an + 4, where 4(t)= t / p , so that, by the lemma (p. 151)
again, X n o an + W o 4. Let
Y n= XnoW if un/n 5 1, the probability of which goes to 1 by (14.19)
and the assumption p > 1. Therefore, Y n+ W o 4.
THESPACED
156
By the definition (14.18),
From mmisn l q i l / f i =+ 0 it follows that sup,,l- Iq,,t+ll/afi
which in turn implies that, if
+ 0,
then Rn =$ W o $ . Therefore, p112Rn=$ W (recall that 4(t) = t / p ) ,
0
from which Zn =$ W follows because of the symmetry of W .
Problems
14.1. Let
Show that q5,, + q5 (in the Skorohod topology) but that x o &
this to the lemma in this section.
f ,Z
O ~ .Relate
14.2. Under the hypotheses of the Lindeberg-LBvy theorem, there are limiting distributions for
Construct the relevant mappings from D to R' and prove that they are measurable and continuous on a set of W-measure l . For the forms of the limiting
distributions, see Donsker [18] for (a) and (b) and Mark (471 for (c).
14.3. Let & be the Rademacher functions on the unit interval, and let P be Lebesgue
measure, so that Theorem 14.1 applies. Show that the requirement in Theorem 14.2 that PObe dominated by P is essential: The therorem fails if PO is
a point mass or (more interesting) if PO is Cantor measure.
SECTION 15. UNIFORM TOPOLOGIES*
In this section we use the weak-convergence theory for the ball o-field
(Section 6) to prove two results in empirical-process theory.
SECTION
15. UNIFORMTOPOLOGIES
157
The Uniform Metric on D [0,1]
Let d be the Skorohod metric on
there:
D and let u be the uniform metric
Clearly u is finer than d; and in fact, since U ( I I~~
~J
=
,~
1]for
~
) ,s # t ,
the u-topology is not separable. If D is the a-field of d-Bore1 sets, and
if D j is the class of finite-dimensional sets as defined above Theorem
12.5, then, by that theorem, D = a ( D j ) ; and since D is d-separable,
D coincides with the d-ball a-field Do.
Let UOand 24. be the ball and Borel a-fields for the uniform metric
u. The five classes of sets are related by
Do = D
(15.2)
= a ( D r ) = uo
c u,
where the inclusion is strict. We have already noted the first two
equalities. For the third equality, observe first that the closed u-ball
Bu(z,
c)- is by right-continuity the intersection over rational T of the
) z ( T ) + E ] , so that Uo c a ( D j ) . The reverse
Dj-sets [y:z(r)--E 5 y ( ~ 5
inclusion will follow if we prove for each t and a that
(15.3)
[z:7rt(z)= z ( t )> a] =
u
n
Bu(z,,n)-
for an appropriate {z,} depending on t and a. If t < 1, take zn =
c ~ + ( n + n - l ) I [ ~ , ~It+ is
~ -easy
~ ) .to show that the right side is contained
in the left side. Suppose on the other hand that x ( t )> a and choose n
so that z(s) > a n-' for s E [t,t n-') and Iz(s)[ < n - (a(
for all s ;
that Iz(s) - zn(s)I 5 n for all s then follows by separate consideration
of the cases where s does and does not lie in [t,t n-'). If t = 1,
take zn = a ( n n-l)l{t)and use a similar argument. Hence (15.3),
which proves the right-hand equality in (15.2).
Let 23 be the class of ordinary Borel sets in [ 0,1] and define 4 :
[ 0 , 1 ] + D by 4(t) = Ilt,l1. For each s and a , q5-1[z:z(s) < a] lies
in B, and since the finite-dimensional sets [z:z(s) < a] generate D ,
4 is measurable B / D . But 4 is not measurable B/u,because the set
A = UtEHBu(4(t),f) is u-open for each H , while +-'A = H need not
lie in B. This proves that the inclusion in (15.2) is strict.
Consider next the empirical process, the random function Y ndefined by (14.8) for random variables & on a probability space (a,F ,P).
+
+ +
+
+
THESPACED
158
Now Y nis a random element of D with the Skorohod metric d, because
it is measurable T / D . But Y nis not a randomelement of D with the
uniform metric u (is not measurable T / U ) , as follows by essentially
the same argument that proves the inclusion relation in (15.2) to be
strict. By Theorem 14.3, Y n+ Y ,and so, if P, is the distribution on
D of Y nand P is the distribution on D of Y ,then 'P, + P holds in
the sense of d:
P,
(15.4)
+P
red.
Since these measures are defined on Uo,one can ask whether there is
weak' convergence in the sense of Section 6:
Pn
(15.5)
J'
P reu.
But here Theorem 6.6 applies. Since (15.2) holds, and since dconvergence to a point of the separable set C of continuous functions
implies u-convergence, it is enough to ask whether C supports P , which
is true if the distribution function F common to the & is continuous.
For in that case, j ( Y " )= l / f i (see (12.8)),and Theorem 13.4 implies
that PC = P[Y E C]= 1. Therefore, (15.4) and (15.5) both hold. As
explained in Section 6, (15.5) contains more information than (15.4)
does.
A Theorem of Dudley's
Let (T,h ) be a compact metric space and let 7 be the Borel a-field
for T . Take M = M ( T ) to be the space of all bounded, real, Tmeasurable functions on T , with the metric u defined (on M now) by
(15.1). Note that, for T = [0,1], M is much larger than D[O,11. In
Theorem 15.2 below, h will be the Hausdorff metric [Mli'] on the space
T of closed, convex sets in the unit square. But for now, (T,h)can be
any compact space. Let C = C(T)be the space of continuous functions
on T . Since an element of C is measurable 7 and is bounded because
of compactness, C is a closed subset of M . And C is separable: For
each integer q, decompose T into finitely many 7-sets A,i of diameter
less than l/q. Consider the class of functions that, for some q, have a
constant, rational value over each Api; it is countable, and it is dense
in C because the continuous functions are uniformly continuous.
Let Mo and M be the ball a-field and Borel a-field for ( M ,u).We
can define the modulus of continuity for elements of M in the usual
way:
(15.6)
w2(6)= w(z, 6) = sup
h(s,t)l6
Iz(s)
- z(t)I.
SECTION
15. UNIFORM
TOPOLOGIES
159
It is easy enough to see that w(.,S) is u-continuous and hence M -
measurable, but conceivably it is not always Mo-measurab1e.t On the
other hand, 11 11 is obviously Mo-measurable.
Let P, be probability measures on ( M ,Mo), and let P,* be the
corresponding outer measures (on the class of all subsets of M ) .
-
Theorem 15.1. Suppose that f o r every Q there exist a n a and a n
no such that
(15.7)
Suppose further that f o r every E and Q there exist a S and a n no such
that
(15.8)
n 2 no.
P,"[z:wx(S)2 E ] < 7 ,
T h e n {Pn}i s tight', and C supports the limit of every weak'ly convergent subsequence.
This, of course, looks like Theorem 7.3. The outer measure in
(15.8) covers the possibility that the set in question does not belong to
the a-field Mo on which the P, are defined. In order to prove Theorem
15.1, we need a generalization of the Arzelh-Ascoli theorem.
Lemma. A set A in M is relatively compact i f
(15.9)
and
(15.10)
lim sup wx(6)= 0.
6 4 0 XEA
In Theorem 7.2, it was sufficient to control Iz(0)l (see (7.3)), but
(15.9) is needed here; consider T = { a ,b } and A = [x:.(a) = 01. It is
a consequence of (15.10) that A- is in fact contained in C.
PROOF.Since T is compact by assumption, it contains a countable,
dense subset To. Given a sequence { z n }in A , first use (15.9) to replace
it by a subsequence along which z n ( t )converges to a limit for each t in
TO.Given E , choose 6 so that wx(S)< E for all x in A , and choose in TO
a finite S-net { t l ,. . . ,t k } for T . Finally, choose no so that rn,n 2 no
implies that Izm(ti)- zn(ti)l5 E for each i. For each s in T , there is
an i such that h(s,ti) 5 6 and hence Ix,(s) - zn(ti)l5 E for all n. But
- z,(s)l 5 3 ~ Thus
.
(2,)
is uniformly
then, m,n 2 no implies lz,(s)
fundamental and hence converges uniformly to a continuous function
0
on T .
t Whether this can in fact happen seems to be unknown.
THESPACED
160
PROOF
OF THEOREM
1 5 . 1 . We must show that, for each E , A4
contains a compact set K with the property that, for each positive y,
P,Kr 2 1 - E for all large enough n. Given the E , first choose an a and
an no such that a > 1 and
P,[z:
(15.11)
€
Ilzll 2 a] 5 -2
for n 2 no.
Then choose a sequence { ~ i } such that 0 < ai+l < ai and a sequence
{no(i)}of positive integers such that no < no(i)and
(15.12)
-1
1
E
P,' z: wZ(cq) 2 22 5 for n 2 no(i).
2%
And now define
(15.13)
(a >
1) and
(15.14)
This, a compact subset of C because of the lemma, is our K .
Let B and Bi be the sets in (15.11) and (15.12). Given y, choose
rn so that 1/2m-1 < y and take Gm = BC n
B: and nm =
maxi<, no(i).By (15.11) and (15.12), the inner measure (Pn)*
satisfies
(Pn)*(Gm)2 1 - E for n 2 nm. If we prove that
ni,,
( 15.15)
G,
c KY,
then PnKr 2 (P,)*(Gm) 2 1 - E will hold for n 2 nm, which will
complete the proof. (In the application below, Theorem 15.2, we have
P,K 5 P,C = 0, and so we definitely need the KY,not just K.)
Fix an z that lies in Gm. Then z satisfies
( 1 ~ 1 1Ia,
w,:(Q~) 5 1/22
for i 5 rn.
To prove (15.15), we must find a z such that
(15.16)
zE
K,
112 - zll
< y.
If h(s,t) 2 ai,then (~(3)- z(t)l 5 2a 5 2ah(s,t)/ai = h ( ~ , t > / 2 ~ & .
And if h ( s , t ) 5
and i 5 rn, then Iz(s) - x ( t ) l 5 1/22. Therefore, for
all s and t ,
(15.17)
SECTION
15. UNIFORM
TOPOLOGIES
161
Choose a finite set T, in T with the properties that (i) h ( s ,t ) 2 6
,
for distinct s and t in T, and (ii) for each s in T we have h(s,t ) < 6
, for
some t in T,. (If we choose points one by one to satisfy (i), we must,
by compactness, end with a finite set satisfying (ii).) If s and t are
, (15.17) for i = m gives
distinct points of T,, then, since h ( s , t ) 2 6
Iz(s) -z(t)I 5 h ( s ,t)/2,6,,
which of course also holds if s = t. This is
a Lipschitz condition for the function z restricted to T,, and [M9] the
restricted function can be extended to a function t that is defined on
all of T ,has the same bound, and satisfies the same Lipschitz condition
for all s and t in T :
(15.18)
This is our t.
Since llzll 5 a, z E K will follow if we show that ~ ~ ( 6 5
2 )3/2a
for all i. If i 2 m and h ( s , t ) 5 62, then by (15.18) and the fact that
2262 is decreasing, I t ( s ) - z(t)l 5 h ( ~ , t ) / 2 ~ 65, h ( ~ , t > / 2 ~ 652 1/2i.
Now suppose that i < rn and h ( s , t ) 5 6i.Choose points s, and t , in
T, such that h(s,s,) < 6, and h(t,t,) < 6
,
. Since h(s,s,) < 6
,
(15.18) gives Iz(s)-z(s,)~
5 h ( ~ , s , ) / 2 ~ 56 ~1/2" 5 1/22. Similarly,
Iz(t)- z(tm)l 5 1/22. And since h(s,,t,)
5 6i +26, 5 362 5 a2 and t
agrees with z on T,, Iz(s,)-z(t,)l
= Iz(s,)-z(t,)l
5 w,(ai) 5 1/22.
And now the triangle inequality gives I t ( s ) - z ( t ) I 5 3/2i, and so z does
lie in K .
For the other condition in (15.16), it will be enough to show that
Iz(s) - z(s)l 5 2/2" for all s. Choose from T, an ,s such that
h(s,s,) < 6
,
. From ~(s,) = z(s,) it follows that Iz(s) - t ( s ) l
Iz(s) -z(s,)l+
Iz(s,) - t ( s ) l . Since z E G, and h ( s ,s,) < 6
, 5 a,,
we have Iz(s) -z(s,)(
5 w,(a,) 5 1/2,. And from (15.18) it follows
that Iz(s,)
- z(s)l 5 h(s,, s)/2,6,
5 1/2,.
It remains to show that, if Pni+-O P , then PC = 1. Since C is
closed and separable, it lies in Mo (Lemma 1, Section 6). Each of
the compact sets K constructed above is contained in C, and since
P,KY 2 1 - y for large n, P((CY)-)2 P ( ( K Y ) - )2 limsup, P,KY 2
1 - E . Let y
0 and then E + 0: PC = 1.
0
---f
Empirical Processes Indexed by Convex Sets
Let T be the space of closed, convex subsets of Q = [ 0, 112, and let h be
the Hausdorff metric [M17] on T: (T, h ) is a compact metric space. Let
7,
M , C,Mo, and M be the specializations to this case of the spaces
THESPACED
102
and classes of sets defined above. Define the projection 7rtl...tb: M + Rk
in the usual way: 7 r t l . . . t k ( z )= ( z ( t l ).,. . ,z(tk)).And let M f be the
class of finite-dimensional sets, those of the form 7r;!..tkH for H E Rk.
Note that ( M ,u ) is not separable.
First, 7rt is measurable Mo/R1. To prove this, it is enough to
verify (15.3) (same symbols in a new context), where now IC,, = LY
( n n-l)IitI. Since { t }is h-closed and hence lies in 7, IC,
E M . That
the right side of (15.3) is contained in the left is again easy. If z ( t )> a ,
choose n so that z ( t )2 LY n-' and I I C ( S ) ~ 5 n - la1 for all s;to show
that IIC(S) - z,(s)l 5 n for all s, consider separately the cases s = t
and s # t . Since each 7rt is Mo-measurable,
+
+
+
(15.19)
a ( M f )c Mo C M .
Let M' be the set of y in M that are continuous from above in the
sense that, if t,, -1 t in the set-theoretic sense, then y ( t n ) + y(t). There
is [M18] a countable set To in T such that, for each t E T , there exists
a sequence {tn}in TOfor which tn 1t. From this it follows that
Suppose that (0,F,P) is a probability space and 2:R + M ; let
Z ( t , w ) = 7rt(Z(w)). Call Z a random element of ( M , M o ) if it is
measurable F/Mo-because M is not separable, Mo, not M, is the
relevant a-field here. If Z is a random element of ( M , M o ) then,
,
by
(15.19), each Z(t, .) is a random variable (measurable F/R1).
Suppose, on the other hand, that Z(t, .) is a random variable for each t
and that Z( - , w ) lies in M' for each w. Then, by (15.20),
and therefore Z is a random element of ( M ,Mo).
Suppose that (1 , <2 , . . . are independent two-dimensional random
vectors on (0,3,
P), each uniformly distributed over Q. Suppose that
& ( w ) E Q for each i and w , and define p,,(w) = pn(. ,u) by
SECTION
15. UNIFORMTOPOLOGIES
163
Of course, p n ( t , u ) is well defined for arbitrary subsets t of Q, and if
it is restricted to the a-field of ordinary Borel subsets of Q, it is a
probability measure. But we restrict it further to the class T . Since
p n ( .,u)is the restriction of a probability measure, it lies in M ' , and
since each p n ( t , .) is a random variable, p n is a random element of
( M , M o ) . And if X is Lebesgue measure on the Borel subsets of Q,
then
(15.22)
defines another random element of ( M ,Mo).
Theorem 15.2. There is a random element X of ( M , M o )such
that X n =+O X ; and X E C with probability 1.
Of course, X n = J O X means that the corresponding distributions
P.
on Mo satisfy Pn j0
PROOF.By the multinomial central limit theorem, the random
vector ( ( X " ( t l ) ., . . ,X n ( t k ) )has asymptotically the centered normal
distribution with covariances X ( t i n t j ) - X ( t i ) X ( t j ) . Suppose we can
verify the hypotheses of Theorem 15.1. It will follow by Theorems
6.5 and 15.1 that every subsequence of {P,} contains a further subsequence that converges weak'ly to a limit supported by C. We showed
above that rt is measurable M o / R 1 ,and it is obviously w-continuous.
Therefore, each rt1...tkis measurable M o / R k and u-continuous. If
Pni=+; P , then it follows by Theorem 6.4 that PnirG1.,tk+i PrG!..tk
and hence that PrG1..tk is the normal distribution specified above.
Since C C M' (tn I t implies h(t,,t) 3 0 [M17]), (15.20) holds if
M' is replaced by C. Therefore, CnB,(z, r ) - E a ( M fnC) and hence
MonC c a ( M f n C ) :M f n C is a r-system which generates MonC. It
follows that, if P and Q are two probability measures on ( M ,Mo),and
. . Q~ X
~ G ~ .for
. ~ all
~ t l , . . . , t k , then P = Q.
if PC = QC = 1 and P X G ~ =
This means that there is only one limit for the weak'ly convergent
subsequences-the one having the normal distributions specified above
as its finite-dimensional distributions. Since this will complete the
proof, it is enough to verify the hypotheses of Theorem 15.1.
Assume for the moment that (15.8) holds, and consider the 6 and
no corresponding to an E of 1, say. Let t l , . . . ,t k be a &net. Since
the finite-dimensional distributions converge, there is an a such that
P[maxilk IXcl 2 a] I 77 for all n. But then P[llXnll 2 a 11 5 2q for
n 2 no, and (15.7) follows. It is therefore enough t o verify (15.8).
+
THESPACED
164
Preliminaries: Let An(s,t) = IXn(s)- Xn(t)I.Since I,(&) - It(<i)
has variance at most A(sAt), Bernstein’s inequality [M20] gives (x > 0)
(15.23)
P[An(s,t) 2
45 2exp
Let Tm be the 2-m-net T(2-m) constructed in [M18]; the number Nm
of points (sets) in Tm satisfies
(15.24)
log Nm
A
6 log 2m 5 Bm2m/2
for constants A and B. For each t in T, let tA and
[M18(32)]:
(15.25)
t& c t C t;,
A ( t C - t&)< 2-m,
tk
be the sets in
h ( t , t k )< 2-m;
for each m, the t A and tk are functions o f t . There are Nm pairs tk,
tk. Fix a 8 in the range < 0 < 1, and define a function m = m(n)
by m = [(a8 log 2)-l log nJ. Then
(15.26)
2em i fi < 2e(m+1),
The next step is to show that
(15.27)
1
(tk is a function of t )
I?,* [x:sup IX(tk)- x(t)l >
tET
< e < 1.
€1
+n
0
for each E . We have
and the last term goes to 0 by (15.26). Let Cn be the maximum of
An(tk,th)for t ranging over T . Since there are only N m pairs (th,t;),
it follows by (15.23) through (15.26) that
This implies (15.27),
165
SECTION
15. UNIFORMTOPOLOGIES
Because of ( 1 5 . 2 7 ) , ( 1 5 . 8 ) will follow if we prove that
(15.28)
~ , 2 el = 0,
lim lim sup P max n n ( st;)
6
n
[h(s,t)56
where the supremum has become a maximum (and there are no measurability problems) because Tm is finite. We prove ( 1 5 . 2 8 ) by the
method of chaining. We take S = 2 - k , where k will be specified later.
Suppose that rn > k, and consider the chain of transitions
To bound the increment in ( 1 5 . 2 8 ) ,we add the increments across the
links of the chain:
(15.29)
An(sk,tk)I
An(s:,s:-l)
k<i<m
+ An(s:,
tg)
+
An((',
k<i<m
Define
Erik =
Note that
max
h(s,t,)52-k
An(si,tK).
Fn(S)= F n ( 2 - k ) = En,m(n).By ( 1 5 . 2 9 ) ,
(15.30)
i=k+l
and we can treat the terms on the right separately.
The number of possible pairs (ty,t:-l) in Dni exceeds Ni, because
for each 7 in Ti there are several t in T for which T = t:, and these
different t's can give different values for ty-l. But the number of these
pairs is certainly bounded by N:. By ( 1 5 . 2 5 ) ,A(t:At:-_,)5 A(t:At)
A(tAt:-_,)< 2 - ( i - 2 ) . If i 5 rn, then fi 2 2ei, and (15.23) and (15.24)
imply
+
166
THESPACED
where bl is a positive constant independent of i.
If h(s,t) 5 2-', then [M17(30)],A(sAt) < 45 2-' and hence, by
(15.25), X(sKAtK) < 47.2-'. Suppose that k 5 m and use (15.23) and
(15.24) again:
(15.32)
P[E,k 2 E ] IN: - 2exp -
[
+
1
2(47. 2-' E 2 ~ 1 2 ~ ' )
5 2exp[-b2~2~'] for k 5 rn,
where b2 is independent of k and E . Given E , choose k large enough
< E . Since (15.31) and (15.32) both hold if k 5 i 5 rn,
that &ki-2
it follows by (15.30) that
P[Fn(2-le) 2 3 ~5]
m
2 exp[-b12'~]
+ 2 exp[-b2~2~']].
i=k+l
Increase the sum here by requiring only i > k. By further increasing
k, we can ensure that the right side is less than q. Therefore, for each
E and q , there is a k, such that, if S = 2-',
then P[F,(S) 2 3~1< 7 for
n large enough that m > k. This proves (15.28).
0
If the index set T in Theorem 15.2 is replaced by the smaller set
consisting of the rectangles [O,u] x [O,v]for 0 5 u,w 5 1, the theo-
rem then has to do with the distribution of two-dimensional empirical
distribution functions: The methods touched on in this section have
implications for empirical-process theory.
SECTION 16. THE SPACE D[O,oo)
Here we extend the Skorohod theory to the space D , = D [ 0 , ~ of)
cadlag functions on [ 0, oo),a space more natural than D = D [0, I]
for certain problems. This theory is needed in Chapter 4 but not in
Chapter 5.
Definitions
In addition to D,, consider for each t > 0 the space Dt = D[O,t]
of cadlag functions on [ 0, t ] . All the definitions for D1 have obvious
analogues for Dt: ) ) z ) )=
t supSltlz(s)I, At, IlAl ;, d;, dt. And all the
theorems carry over from D1 to Dt in an obvious way. If z is an
element of D,, or if z is an element of D, and t < u , then z can also
167
SECTION16. THESPACED [0,m)
be regarded as an element of Dt by resticting its domain of definition.
This new cadlag function will be denoted by the same symbol; it will
always be clear what domain is intended.
One might try to define Skorohod convergence x, --+ x in D , by
requiring that dZ)(x,,z) +, 0 for each finite, positive t (2, and x restricted to [ O , t ] , of course). But in a natural theory, x, = I~o,l-l/,~
will converge to x = I[O,l]in D,, while d;(x,, x) = 1. The problem
here is that x is discontinuous at 1, and the definition must accommodate discontinuities.
Lemma 1. Let x, and x be elements of D,. If d;(z,,x)
and m < u, and if x is continuous at m, then d&(xn,x) t, 0.
0
-+,
PROOF.We can (Theorem 12.1) work with the metrics d, and d,.
By hypothesis, there are elements A, of A, such that IIX, - Ill, -+
0 ,
and llz, - xXnIlu -+,
0. Given 6 , choose 6 so that It - m( I
26 implies
Ix(t) - z(m)I < ~ / 2 . Now choose no so that, if n 2 no and t 5 u ,
then IX,t - tl < 6 and lxn(t)- x(X,t)l < ~ / 2 .Then, if n 2 no and
It - ml I 6, we have IX,t - ml I
[Ant - tl It - m( < 26 and hence
I ~ n ( t-) x(m)I I
Ixn(t) - x(Xnt)l I x ( L t )- x(m)I < 6 . Thus
+
(16.1)
sup Ix(t)-z(m)I< E ,
lt-mI56
+
sup Ix,(t)-x(m)l
lt-rn156
< 6 , for n 2 no.
If (i) ~ , m< m, let p, = m-n-'; if (ii) ~ , m> m, let p, = ~ ; ' ( m n-I); and if (iii) X,m = m, let p, = m. In case (i), Ip, - mJ = n-l; in
case (ii), Ip,-ml I
IX,'(m-n-')-(m-n-')l+n-';
andincase (iii),
p, = m. Therefore,p, -+ m. Since IX,p,-mJ 5 IX,p,-p,I+Ip,-mI,
we also have X,p, -+ m. Define p, E Am so that p,t = Ant on [ 0, p,]
and p,m = m; and interpolate linearly on [p,,m]. Since pnm = m
and pn is linear over [Pn,m], we have Ipnt - tl I
IX,p, - pml there,
and therefore, p,t ---t t uniformly on [ 0, m]. Increase the no of (16.1)
so that p, > m - 6 and X,pn > m - 6 for n 2 no. If t 5 p,, then
Ix,(t) - x(pnt)l = Ix,(t) - x(X,t)l I
1
1
2
, - xX,I,.
On the other hand,
tI
m and n 2 no, then m 2 t >p, > m - 6 and m 2 p,t 2
ifp, I
pnp, = Xnp, > m - 6, and therefore, by (16.1), Ix,(t) - x(p,t)l 5
) ~(p,t)l
0
Ixn(t) - x(m)I [ ~ ( m
- )~ ( / l n t ) l< 26. Thus I ~ n ( tuniformly on [ 0, m].
0
+
-+
The metric on D , will be defined in terms of the metrics d h ( x ,y)
for integral m,t but before restricting z and y to [ 0, m], we transform
t In what follows, m will be an integer, although the m in Lemma 1 can be
arbitrary.
THESPACED
168
them in such a way that they are continuous at m. Define
gm(t> =
(16.2)
{
1
m-t
0
if t 5 m - 1,
i f m - 1 5 t 5 m,
ift2m.
For z E D,, let xm be the element of D, defined by
z:"(t)= g m ( t ) z ( t ) , t 2 0.
(16.3)
And now take
(16.4)
d&,(z, y) =
2-m(l A d k ( z m ,y")).
m= 1
If d&((z,y) = 0, then d & ( z , y ) = 0 and xm = ym for all m, and this
implies z = y . The other properties being easy to establish, d& is a
metric on D,; it defines the Skorohod topology there. If we replace
d& by dm in (16.4), we have a metric d , equivalent to d&.
Properties of the Metric
Let A, be the set of continuous, increasing maps of [ 0, 00) onto itself.
Theorem 16.1. There as convergence d & ( z n , z )
and only i f there exist elements An of A, such that
+
0 in D , i f
(16.5)
and, f o r each m,
(16.6)
PROOF.Suppose that d & ( z n , x) and d w ( z n ,x) go to 0. Then there
exist elements ;A
of Am such that
€2= 111 - A,"llm v IIZFA~
- zm(lm--tn O
for each m. Choose 1, so that n 2 Zm implies E; < l / m . Arrange
that 1, < Zm+l, and for 1, 5 n < Zm+l, let m n = m. Since lm, _< n <
Imn+l, we have mn --tn 00 and e? < l/mn. Define
SECTION16. THESPACED [0,m)
169
Then IX,t - tl < l/m, for t 2 m, as well as for t 5 m,, and therefore,
supt IX,t - tl 5 l/m, +, 0. Hence (16.5). Fix c. If n is large enough,
then c < m,-l, and so Ilx,Xn-x1lc = 1 1 ~ p X p - xI I C~I~l/m, -+n 0 ,
which is equivalent to (16.6).
Now suppose that (16.5) and (16.6) hold. Fix m. First,
holds uniformly on [ 0, m].Definep, and pn as in the proof of Lemma 1.
As before, pnt + t uniformlyon [0, m].Fort 5 p,, Ix:"(t>-zF(p,t)I =
Izm(t)- x:,"(Ant)l, and this goes to 0 uniformly by (16.7). For the case
p , I t I m, first note that Ixm(u)I I gm(u)IIxIlm for all u 2 0 and
hence
By (16.5), for large n we have X,(2m) > m and hence IIxCnllmI
IIxnXnllzm;and llxnXnllarn-+n IIx11zrn by (16.6). This means that llxnllm
is bounded ( mis fixed). Given E , choose no so that n 2 no implies that
p n and pnpn both lie in ( m- E , m],an interval on which g, is bounded
by E. If n 2 no and pn I t 5 m, then t and pnt both lie in ( m- E , m ] ,
and it follows by (16.8) that Ixm(t)- xF(p,t)l I ~(IIxll, Ilz,llrn).
Since IIx,Ilm is bounded, this implies that Ix:"(t)-xF(p,t)( +, 0 holds
uniformly on Ip, m] as well as on [ 0, p,] . Therefore, d, (xr, xm) +, 0
for each m, and hence d,(z,,
x) and dL(z,, x) go to 0.
0
+
A second characterization of convergence in D,:
Theorem 16.2. There is convergence d&(x,,x) + 0 in D, if
and only ifd:(x,, x) -+ 0 for each continuity point t of x.
This theorem almost brings us back to that first, unworkable definition of convergence in D,.
PROOF.If d&(x,,x) -+ 0, then d&(xF,xm) -+
0 for
,each m.
Given a continuity point t of x, fix an integer m for which t < m - 1.
By Lemma 1 (with t and m in the roles of m and u)and the fact that
y and ym agree on [0, t ] ,d;(x,, x) = d,"(xr,zm)+, 0.
To prove the reverse implication, choose continuity points t , of x
in such a way that t , t 00. The argument now follows the first part of
the proof of Theorem 16.1. Choose elements A? of At, in such a way
that
E? = IlX? - q t , v IIxCnX? - 41tm +n 0
THESPACED
170
for each m. As before, define integers m, in such a way that m,
and e p < l / m n , and this time define A, E A, by
Ant =
{ yt
--f
m
if t 5 tm,,
if t 2 tm,.
Then IX,t - tl 5 l/m, for all t , and if c < tm,, then 1)z,A, - zllc =
Ilz,Xp - zllc 5 l/m, +, 0. This implies that (16.5) and (16.6) hold,
which in turn implies that dL(z,, z) 0.
Separability and Completeness
define $mx as zm restricted to [O,m]. Then, since
dh($mzn,$mz) = dh(zF,zm),$m is a continuous map of D, into
Dm. In the product space II = D1 x 0 2 x
the metric p ( a , P ) =
Cg=l2-m(l A d R ( a m , & ) ) defines the product topology, that of COordinatewise convergence [M6]. Now define $:D, + II by $2 =
For z E D,,
s..,
($13,$ 2 2 ,
* *
.I:
$m:
D,
--t
Dm, 4: D,
--t
II.
Then d&(z,y) = p($z,$y): $ is an isometry of D , into n.
Lemma 2. The image $D, is closed in II.
PROOF.
Suppose that z, E D , and a E II, and p($z,,a) -’,
0;
then d;(zF,am) +, 0 for each m. We must find an z in D, such
that a = $z-that is, am= $mz for each m.
Let T be the dense set of t such that, for every m 2 t , am is
continuous at t. Since dh(zF,a m ) +, 0, t E T n [0, m] implies z;(t) =
g m ( t ) Z n ( t ) +, am(t). This means that, for every t in T,the limit
z ( t ) = lim,z,(t) exists (consider an m > t 1, so that g n ( t ) = 1).
NOWgm(t)z(t)= a m ( t ) on T n [O,m]. It follows that z ( t )= am(t) on
T n [ 0, m - 11, so that z can be extended to a cadlag function on each
[ 0, m - 11 and then to a cadlag function on [ 0, m). And now, by right
continuity, g m ( t ) z ( t )= a m ( t ) on [O,m],or ?,!Jmz= xm = am.
0
+
Theorem 16.3. The space D, is separable and complete.
PROOF.
Since II is separable and complete [M6],so are the closed
subspace $D, and its isometric copy D,.
0
Compactness
Theorem 16.4. A set A is relatively compact in D , if and only
if, f o r each m, $mA is relatively compact in Dm.
SECTION
16. THESPACED[0, m)
171
PROOF.
If A is relatively compact, then A- is compact and hence
[M5] the continuous image $,A- is also compact. But then, $,A, as
a subset of $,A-, is relatively compact.
Conversely, if each $,A is relatively compact, then each ($,A)- is
compact, and therefore [M6] B = ($1A)- x ($2A)- x . - .and (Lemma
2) E = $Dm n B are both compact in II. But x E A implies $,x E
($,A)- for each m, so that $x E B. Hence $ A c E , which implies
that $ A is totally bounded and so is its isometric image A .
For an explicit analytical characterization of relative compactness,
analogous to the Arzelk-Ascoli theorem, we need to adapt the w’(z, 6)
of (12.6) to D,. For an x in D, (or an x in D , restricted to [O,m]),
define
(16.9)
where the infimum extends over all decompositions [&-I, ti),1 5 i 5 w,
of [ 0, m ) such that t i - ti-1 > 6 for 1 5 i < w. Note that the definition
does not require t , - tv-l > 6: Although 1 plays a special role in the
theory of D1,the integers m should play no special role in the theory
of D,.
The exact analogue of w’(x,S) is (16.9), but with the infimum
extending only over the decompositions satisfying ti - ti-1 > 6 for
i = w as well as for i < w. Call this G,(z, 6). By an obvious extension
of Theorem 12.3, a set B in D, is relatively compact if and only if
supzEBllzll, < 00 and lim6supxEBG ( x , S ) = 0. Suppose that A c
D,, and transform the two conditions by giving $,A the role of B.
By Theorem 16.4, A is relatively compact if and only if, for every m
(recall that $,x = xm),
(16.10)
and
(16.11)
lim s u p ~ , ( z ~6), = 0.
6-0
xEA
The next step is to show that (16.10) and (16.11) are together
equivalent to the condition that, for every m,
(16.12)
172
THESPACED
and
(16.13)
lim sup &(z, 6) = 0.
6 4 0 xEA
The equivalence of ( 16.10) and (16.12) (for all m)follows easily because
llxrnllrn5 11x11, 5 llxm+lllm+l.Suppose (16.12) and (16.13) both hold,
and let K , be the supremum in (16.12). If x E A and 6 < 1, then
we have Ixm(t)I 5 K,S for rn - 6 5 t < m. Given E, choose 6 so that
K,6 < e/4 and the supremum in (16.13) is less than ~ / 2 .If x E A
and m - 6 lies in the interval [tj-l, t j ) of the corresponding partition,
replace the intervals [ti-l, t i ) for i 2 j by the single interval [tj-i, m).
This new partition shows that 2om(x,6) < E. Hence (16.11).
That (16.11) implies (16.13) is clear because wA(x, 6) 5 2o,(x, 6):
An infimum increases if its range is reduced.
This gives us the following criterion.
Theorem 16.5. A set A in D, is relatively compact if and only
if (16.12) and (16.13) hold f o r all m.
Finite-Dimensional Sets
Let D, and D, be the Bore1 a-fields in D, and D, for the metrics
dh and d&. And let rt l...tk:D , + Rk (ti 2 0) and rg...tk:
D, -, Rk
(0 5 ti 5 m) be the natural projections. We know from Theorem 12.5
that each r p is measurable Dm/R1; and $
, is measurable D,/D,
because it is continuous. But rt = ry$+,, for t 5 m - 1, and therefore
each rt is measurable DD,/R1 and each 7rtl...tk is measurable D,/7Zk.
Finally, the argument following (12.37) shows that TO is everywhere
continuous and that, for t > 0, rt is continuous at z if and only if x is
continuous at t.
Suppose that T C [ 0, m). Define a[7rt: t E T ] in the usual way,
and let p [ r t : t E TI be the r-system of sets of the form rl;ll..tkH for
k >_ 1, ti E T, and H E Rk-the finite-dimensional D,-sets based on
time-points lying in T .
Theorem 16.6. (i) The projection TO is continuous, and f o r t > 0 ,
x if and only if x is continuous at t .
(ii) Each 7rt is measurable D, /R1;
each rt l...tk is measurable D,/7Zk.
(iii) If T is dense in [ 0 , oo),then u[rt:t E TI = D, and p[nt: t E T ] is
a separating class.
rt is continuous at
PROOF.
Only (iii) needs proof. If we show that
(16.14)
D,
C o[rt:t E
TI,
173
SECTION16. THESPACED [0 , ~ )
then there is actually equality, since each .rrt is measurable D,/R1.
And (16.14) will imply that the n-system p [ n t :t E TI generates D,
and hence is a separating class. The finite case of (16.14) is Theorem
12.5(iii), which applies to T, = (T n [ 0, m ] )U { m } :
D,
(16.15)
= a[7r,m:tE T,].
The problem is to derive (16.14) from (16.15).
If t 5 m, then
If t = m, then g,(t) = 0, and this set is D , or 8 according as 0 E H or
0 @ H-an element of a[nt:t E TI in either case. If t < m, then (16.16)
is [IC: n t Z E ( g m ( t ) ) - l H ] and
, this, too, lies in a[.rrt:t E TI (if H E R1).
Therefore, if A = ( n T ) - l H and t E T,, then $GIA E a[.rrt:t E TI. By
, is
(16.15), the sets A of this form generate D,, and it follows that $
measurable a[.rrt:t E T]/D,.
Now d&( ,a ) is measurable D,/R1 for each a in D,, and so (composition) d&($,(
a ) is measurable a[.rrt:t e T]/R1,as is do( y) =
C2-,(l A d&($,( .),qm(y))). This means that a[.rrt:t E T ] contains
the balls and hence (separability) contains 27,.
0
e ) ,
e
,
Weak Convergence
Let
D,)
Pn and P be probability measures on (D,, D,).
Lemma 3. A necessary and suficient condition for Pn + P ( o n
=& P$;l ( o n D,) for every m.
is that
PROOF.Since $
, is continuous, the necessity follows by the mapping theorem.
For the sufficiency, we also need the isometry $ of D , into II; since
$ maps D , onto $D, there is the inverse isometry 5:
Consider now the Bore1 a-field P for II with the product topology
[M6]. Define the (continuous) projection &:II ---t D1 x
x DI, by
< ~ , ( a=) ( a 1 , .. . ,a k ) , and let Pj be the class of sets (rlH for k 2 1
and H E D1 x . . x DI,.
The argument in Example 2.4 shows that P,f
is a convergence-determining class: Given a ball B ( a , c )in II,take k
so that 2-k < c/2 and argue as before (use Theorem 2.4) with the sets
THESPACED
174
A, = [p E II:d;(ai, p i ) < q, i 5 k] for 0 < q < ~ / 2 .And now argue as
in Example 2.6: a<ilH= <i'aH for H E Dl x - .. x Dh and hence,
for probability measures Q, and Q on II, Q&'
+ Q<i'
for every k
implies Qn Q.
We need one further map. For y E Dk,let pk(y) = (a',.. . ,arc)
be the point of D1 x
x Dk for which ai(t) = g i ( t ) y ( t ) on [O,i],
*
i = l , ..., k:
The map pk is continuous: Suppose that vn + y in Dk;then gi . yn +
gi . y in Dk,and this is also true if the functions are restricted to [ 0, i]
because gi . y is continuous at i (use Lemma 1).
The hypothesis now is that Pn$i' =& P$[' (on Dk)for each k,
and by the mapping theorem this implies that P,$,'p,l
+ P$i'pil
(on D1 x - .x Dk).Since pk$k = <k$, we have Pn$-'<il P$-'pk'
(on Dk) for each k. It follows by the argument given above that
P,$-' + P$-l (on II).
Extend the isometry [ defined above to a map q on II by giving
it some fixed value (in D,) outside the closed set $D,. Then q is
continuous when restricted to $Dm, and since $Doo supports P$-'
and the I?,$-',
it follows by Example 2.10 that (q$ is the identity on
D,) P, = P,$-lq-'
PTpq-1 = P.
0
*
*
As in the case of D [0,1], define T p as the set of t for which rt is
continuous outside a set of P-measure 0. As in Section 13, t E Tp if
and only if PJt = 0 where Jt is the set of x that are disconinuous at t.
And as before, T p contains 0, and its complement is at most countable.
For 5 in D,, let rtz be the restriction of 2 to [ O , t ] .
We must prove that rt is measurable Doo/Dt. Define rFz as the element of Dt having the value z ( ( i - l ) t / k ) on [ ( i - l ) t / k , i t / k ) , 1 5 i 5 k,
and the value z ( t )at t. Since the T i t / k are measurable Dm/R1, it follows as in the proof of Theorem 12.5(iii) that rf is measurable D,/Dt.
By Lemma 3 of Section 12, dt(rtz,rts) 5 tk-' V w i ( x , t k - ' ) -+k 0 for
each z in D,. Therefore [MlO], rt is measurable.
Theorem 16.7 A necessary and suficient condition for P,
is that P,r;'
+ Pr,' for every t an T p .
+P
PROOF.Given a t in Tp, fix an integer m exceeding t + 1; if
d&(zn,z)-+ 0, then d&(z,,z) --t 0, and if z is continuous at t , then
SECTION
16. THESPACED [0 , ~ )
175
it follows by Lemma 1 that dZ(rtzn,rtz)+ 0. In other words, the set
of points at which rt is continuous contains Jf,and if P n =+ P and
t E T p , then the mapping theorem gives PnrF' + Pr;'.
For the reverse implication, it is enough, by Lemma 3, to show that
Pn$G' + P$,'.
Choose t so that t E Tp and t 2 m. Let rm be the
continuous mapping from D, to D, defined by
Since $
, = Tmrt, the mapping theorem gives Pn$m' = (PnrF')TG'
(Pr,-').;'=
P$;1.
+
0
Tightness
Here is the analogue of Theorem 13.2; it follows from Theorem 16.5 by
the analogous argument.
Theorem 16.8. The sequence {P,} is tight i f and only if these
two conditions hold
(i) For each m,
(16.17)
lim limsvq Pn[z:llz/lm2 a] = 0.
a-co
(ii) For each m and
(16.18)
n
E,
limlimsupPn[z: w & ( z , ~ )2
6
n
E]
= 0.
And there is the corresponding corollary. Let
(16.19)
jm(z>= sup Iz(t)- Z(t+
tsrn
Corollary- Either of t h e following two c u d z t i o n s can be substituted f o r (i) in Theorem 16.8:
(i') For each t in a set T that is dense in [ 0, oo),
(16.20)
lim limsupPn[z: J z ( t )2( u]= 0.
a+oo
n
(i") The relation (16.20) holds for t = 0, and for each m,
(16.21)
lim limsupPn[z:jm(z) 2 a] = 0.
a-+w
n
176
THESPACED
PROOF.The proof is almost the same as that for the corollary
to Theorem 13.2. Assume (ii) and (i'). Choose points ti such that
O=to<tl <...<t,=m,ti-ti-l>6forl<Z<v-l(perhapsnot
for i = v), and w,[ti-i,ti) < wh(z,6) + 1 for 1 I
iI
v. Choose from
T points S j such that 0 = SO < s1 < < sk = m and S j - sj-1 < 6
for 1 I
j I
k. Let m ( z )= mao<j<k Iz(sj)l. If t, - t,-l > 6 , then
11~11rn< m ( z ) + w;(z,6) + 1, just as before. If t , - t,-l 5 6 (and
6 < 1, so that t,-l > m - l ) ,then llzllm-l 5 m(z>+wA(z,6)+ l . The
1~
by IIzllrn-l,
old argument now gives (16.17), but with 1 1 ~ 1 replaced
which is just as good.
In the proof that (ii) and (i") imply (i), we have (v - 1)s 5 m
instead of v6 I
1. But then, 21 I
m6-I 1, and the old argument goes
0
through.
+
Aldous's Tightness Criterion
Let Xn be random elements of D,. We need probabilistic conditions
under which the (distributions of the) X n are tight. First, (16.17)
translates into
(16.22)
lim limsupP[llXnIIm 2 a] = 0,
a+w
n
which must hold for each rn. There is a useful stopping-time condition
that implies (16.18).
A stopping time for the process X n is a nonnegative random variable r with the property that, for each t 2 0, the event [r 5 t ] lies
t ] . A stopping time also satisfies [I- = t] E
in the a-field a[Xr:s I
a [ X r : s _< t ] . All our stopping times will be discrete in the sense that
they have finite range. Consider two conditions.
Condition l o . For each E , q, m, there exist a 60 and an no such
that, if 6 5 60 and n 2 no, and if r is a discrete Xn-stopping time
satisfying r 5 m, then
(16.23)
P[lX,n,, -X,nI
Condition 2'. For each
that, if n 2 no and 7 1 and
r1 5 7 2 5 m, then
(16.24)
E,
r2
2 €1 i 7 .
q, m, there exist a 6 and a n no such
are Xn-stopping times satisfying 0 5
P[IX7"2 - XPJ 2 E ,
72 - 7 1
I
61 I
q.
SECTION
16. THESPACED [0, m)
177
Note that, if (16.24) holds for a value of 6, then it holds for all
smaller ones.
Theorem 16.9. Conditions 1" and 2" are equivalent.
+
PROOF.Since T 6 is a stopping time, 2" implies 1". For the
converse, suppose that T I m and choose 60 so that 6 5 260 (note the
factor 2) and n 2 no together imply (16.23). Fix an n 2 no and a
6 5 60,and let (enlarge the probability space for X n ) 8 be a random
variable independent of F = a[X?: s 2 01 and uniformly distributed
over J = [ 0,261. For the moment, fix an z in D, and points tl and t 2
satisfying 0 5 tl 5 t2. Let p be the uniform distribution over J , and
let I = [ 0 , 6 ] , Mi = [s E J : Iz(ti s ) - z(ti)l < €1, and d = t 2 - t l .
Suppose that
+
(16.25)
t2
- tl 5 6
and
(16.26)
3
p ( M i ) = P[8 E Mi] > 4, fori = 1a n d i = 2.
If p ( M 2 n I ) 5 $, then p(M2) 5 $, a contradiction. Hence p ( M 2 n I ) >
$, and (0 I d IS)p ( ( M 2 + d ) nJ ) 2 p ( ( M 2 n I ) + d ) = p ( M 2 n I ) >
Thus p ( M l ) + p ( ( M z + d ) n J ) > 1,which implies p ( M 1 n ( M 2 + d ) )> 0.
There is therefore an s such that s E M I and s - d E M2, from which
follows
i.
Thus (16.25) and (16.26) together imply (16.27). To put it another
way, if (16.25) holds but (16.27) does not, then either P[8 E Mf]2 $
or P[8 E Mi] 2 Therefore,
i.
Since 0 5 8 5 26 5 260, and since 8 and .F are independent, it
follows by (16.23) that the final term here is at most 877. Therefore, 1"
implies 2".
0
THESPACED
178
This is Aldous's theorem:
Theorem 16.10. If(16.22) and Condition 1" hold, then {X"}is
tight.
PROOF.
By Theorem 16.8, it is enough to prove that
limlimsup P[wk(Xn, 6) 2 E ] = 0.
(16.28)
6
n
Let Ak be the set of nonnegative dyadic rationals j/2k of order k.
Define random variables T;, T?, . . . by T; = 0 and
T?
= min[t E A,:
Tr-1
< t 5 m, lXF - X$-l I 3 €1,
with T? = m if there is no such t. The T? depend on E , m, and k as
well as on i and n, although the notation does not show this. It is easy
to prove by induction that the T? are all stopping times.+
Because of Theorem 16.9, we can assume that Condition 2" holds.
For given E , q , m, choose 6' and no so that
P[lx+ - Xn
q-1 I 2 E, T? - (-1
5 6'1 5 77
for i 2 1 and n 2 no. Since T,'" < m implies that 1X$a - X$-lI 2
have
(16.29)
P[T? < m, T?
-
T?-~ 5 6'1 5 q ,
E,
we
i 2 1, n 2 no.
Now choose an integer q such that q6' 2 2m. There is also a 6 such
that (increase no if necessary) P[T? < m, T?
5 61 5 q / q for i 2 1
and n 2 no. But then
9
(16.30)
57, n2no.
P(U[T?<~,T?-T.~,~_~<~I)
i=l
Although the T? depend on k, (16.29) and (16.30) hold for all k simult aneously.
By (16.29),
E[T? - T?-~~T:
< m] 2 ~'P[T?- T?-~ 2 6'1~: < m]
2 6'(1 - 'V/p[T: < m ] ) ,
t This is easy because the 7," are discrete; proofs of this kind are more complicated
in the continuous case.
179
SECTION
16. THESPACED [0,m)
and therefore,
m 2 E [ T ~ ~ T<; m] =
Q
E[T? - T E ~ ~<
T m]
: 2 q6'(1 - ~/P[T:
< m]).
i=l
Since q6' 2 2m by the choice of q, this leads to P [ T ~< m] 5 27. By
this and (16.30),
Let Ank be the complement of the set in (16.31). On this set, let
v be the first index for which T," = m. Fix an n beyond no. There are
< t; = m and tf - tf-l > 6
points tt (the T,'") such that 0 = ti <
for 1 5 i < v (but not necessarily for i = v). And IXp - XcI < E if s
and t lie in the same [tf-l, t f ) as well as in A,. If An = lim supk Ank,
then PA,
1 - 37, and on An there is a sequence of values of k along
which v is constant (v 5 q ) and, for each i 5 v, tf converges to some
< t, = m, ti - ti-1 2 6 for i < v, and
ti. But then, 0 = t o <
by right continuity, lXF - XfI 5 E if s and t lie in the same [ t i - l , t i ) .
e
It follows that w&(Xn,6) 5
Hence (16.28).
E
.
1
on a set of probability at least 1 - 37.
0
From the corollary to Theorem 16.8 follows this one:
Corollary. If, for each m, the sequences {Xg} and {jm(Xn)}are
tight on the line, and i f Condition 1" holds, then {X"} is tight.
Finally, Condition 1" can be restated in a sequential form: If T,
are discrete Xn-stopping times with a common upper bound, and if 6,
are constants converging to 0, then
(16.32)
n
X~n+6n
-XT",
Jn
0.
Convergence of Probability Measures
Patrick Billingsley
Copyright 01999 by John Wiley & Sons, Inc
CHAPTER 4
DEPENDENT VARIABLES
SECTION 17. MORE ON PRIME DIVISORS *
Introduction
Section 4 treats, among other things, the asymptotic distribution of
the largest prime divisors of a random integer.t Here we study the
total number of prime divisors of a random integer and the number of
them in a given range, and we show that the limit is normal in some
cases and Poisson in others. And we prove the associated functional
limit theorems.
On the a-field of all subsets of R = { 1 , 2 , .. .}, let P, be the probability measure corresponding to a mass of 1/n at m for 1 5 m 5 n,
so that P,A is the proportion among the first n integers that lie in
A. And let E, denote the corresponding expected value. For each
integer a,
(17.1)
1-1
1 1
1 n
1
- - - < P,[m:alm] = 5-:
a n
n u
a
the probability that a divides m is approximately 1/a for large n. By
the fundamental theorem of arithmetic, distinct primes p and q individually divide m if and only if their product does, and this, together
with (17.1), leads to the approximate equation
the three probabilities having the approximate values l/pq, l/p, and
l / q . The events “divisible by p” and “divisible by q” are therefore
approximately independent, and similarly for three or more distinct
t The theory of Section 4 is not required here; it is used only in passing, for
comparison.
180
SECTION
181
17. MORE ON PRIME DIVISORS
primes, and this means that probability theory can be brought to bear
on problems of multiplicative arithmetic.
Let Sp(m) be 1 or 0 according as p divides m or not. By (17.2),
the Sp behave approximately like independent random variables. Now
the sum
(17.3)
Un<PlVn
is the number of prime divisors p of m lying in the range un < p I v,.
There are many theorems about sums of stictly independent random
variables, and many of these can be carried over to the arithmetic case.
A General Limit Theorem
Replace (17.3) by the more general sum
(17.4)
fn(m)
=
C
Pln
fn(p>Sp(m),
1 I m I 72.
Consider independent random variables Ep (on some probability space),
one for each prime p , that take the values 1 and 0 with probabilities
p-' and 1 - p - l . The properties of the sum S n =
fn(p)<p can
serve as a guide to those of f n . Define
xp5n
Assume that ui > 0, at least for large n.
If m is drawn at random from {1,2,. . . ,n } , then fn becomes a
random variable governed by P,. Write fn
E to indicate that this
random variable converges in distribution to [.
*
that
Theorem 17.1. Suppose there exist positive constants an such
where the second relation holds for each positve e . If
(17.7)
DEPENDENT
VARIABLES
182
and the distribution of ( is determined by its moments, then
(17.8)
The proof depends on the method of moments. Consider first an
application.
EzampZe 17.1. Suppose that f n ( p ) G 1, so that fn(m)is the total
number of prime divisors of rn. (Even though the function f n is the
same for all n in this case, the probability mechanism changes with n,
and so it is clearer to preserve it in the notation.) All that is needed
from number theory is the fact that there exists a constant c such thatt
(17.9)
Z:
= logloga:+c+o(l) =logloga:+O(l).
P5X
From this it follows that the Pn and on of (17.5) are each log log n +
O(1). If an = nl/loglogn,then the first sum in (17.6) is logloglogn
0(1),and the second is [an] = o(n'): (17.6) is satisfied. Since on -+ 00
and the random variables tp- l / p are uniformly bounded, it follows
by the Lindeberg (or the Liapounov) central limit theorem that (17.7)
holds with N in the role of t. Therefore, Theorem 17.1 gives the
striking result of Erdos and Kac:
+
4
Dividing through by
of Hardy and Ramanujan:
(17.11)
leads to the weak law of large numbers
1
log log n
Cs,=+ 1.
Pln
0
PROOFOF THEOREM
17.1. We can assume that on
from f n ( p ) to fn(p)/on).Let
t Hardy & Wright (371, Theorem 427, or [PM.240].
G
1 (pass
SECTION
17. MOREON P R I M E DIVISORS
183
By the first condition in (17.6),
an<pln
Therefore, (17.8) is equivalent to
(17.12)
* 5,
gn - un
and similarly, (17.7) is equivalent to
(17.13)
Tn
- Vn
+ 5.
It is therefore enough to show that (17.13) implies (17.12).
For each positive integer r ,
by the multinomial theorem, where C' extends over those u-tuples
T I , .. . , T, of positive integers adding to r and C" extends over the utuples P I , . . . , p , of primes satisfying p i < . - < p , 5 a n . And En[gk]
is the same thing with
replaced by
Since these two expected values differ by at most l / n , I E[TA] -En [gk]I 5
~L-~(C,,,~
Ifn(p)l)T. Two binomial expansions, together with the second condition in (17.6), now give
(17.15) IE[(Tn - ~ n ) ' ] - En[(gn - ~
n > ~ ] l
184
DEPENDENT
VARIABLES
(Notice that the first condition in (17.6) requires large values for the
a,, while the second requires small ones.)
By (17.13), if E[(T, - v,)~] is bounded in n for even r, then (see
holds for all r. It will follow by (17.15)
(3.18)) E[(T, - v,)~] t
that En[(gn- v,)']
E[r]
holds for all r, which, by the method of
moments [PM.388], will in turn imply (17.12). If qp = tP- p-', then
E[r]
--f
Since the qp are independent and have mean 0, we can require in C'
that each ri exceed 1. If pn is the maximum on the right in (17.6), then
pn I p for some p, and if we take p > 1,then Ifn(p)qpITi5 p f i ( p ) q E
for ri >_ 2. Therefore, the inner sum on the right in (17.16) has modulus
at most
C%P"f,2(p1)7;,1
Now 1 = u: >_
*..E~PTuf,2(Pu)~;J
I DT(
c
f3P)E[v;1)U.
Plan
cpSa,
fi(p)E[qi],and it follows by (17.16) that
Thus IE[(T, - v,)']I
is indeed bounded for each r.
This argument uses from number theory only the fundamental theorem of arithmetic. The applications of Theorem 17.1 require further
number theory-(17.9), for example.
EzampZe 17.2. To prove the functional version of (17.10), we need
a slight generalization of the argument leading to it. Suppose that
Ifn(p)I < 00 and on
00.
Then the central limit theorem
applies to S n just as before. Take a, = n1iU:. Then the first sum in
0(1)= o(u,); the second sum is O(nl/"':), which is
(17.6) is logo:
~ ( n 'because
)
u,
00. Therefore, (17.6) holds, and we can conclude
that
+
(17.17)
--f
-
f n un
-pn = L
un~
( f , ( p ) ( 6 p - ~ ) )+ N .
P<n
0
SECTION
185
17. MORE ON P R I M E DIVISORS
An example where the limit is Poisson:
Example1 7.3. Suppose that f, counts the prime divisors between
u, and v,:
(17.18)
f n ( P ) = qun,vn](P),
fn(m) =
c
Sp(m).
U n <P<Vn
Let
<A
= <(A) have the Poisson distribution with mean A. If
Y, = o ( n E ) ,
(17.19)
u, 4 00,
where the second condition holds for each positive E , then the f,
in (17.18) satisfies f, =$ <A. To prove this, first note that, since
maxun<p<vnl / p -, 0 by the third condition in (17.19), it follows by
a standard limit theorem for the Poisson case [PM.302] that S, =
C u n < p < v n P + <A. And since p, and uz go to A, it follows further
that (S, - p,)/un + 77 = (<A - X ) / 6 Apply Theorem 17.1 with
a, = v,; the first and third conditions in (17.6) are obviously satisfied,
and the second condition in (17.19) implies the second one in (17.6).
Therefore, (fn - p,>/a, + 77, which in turn implies f, + CA.
Let u, = eSCnand Y, = etcn, where 0 < s < t. If c,
00 and
c,/ log n ---t 0, then (17.19) holds for X = log t - logs (use the middle
< logp 5 tc,] +
member of (17.9)), and we obtain C[Gp(m):sc,
[(logt-log s). To compare this Poisson case with the Poisson-Dirichlet
case (Section 4),take c, = log log n:
--f
(17.20)
c
logp
[Sp(m):
s < log log n
5 t ] =$, <(logt - logs).
This magnifies what goes on just to the right of 0 in Theorem 4.5. 0
The Brownian Motion Limit
Suppose as in Example 17.1 that f , ( p ) E 1. For 1 5 m 5 n, define an
element X n ( m )of D [0,1] by
(17.21)
The reason for the range in the sum is this: Define a new f n ( p ) as 1 if
P -< .clogtn and 0 otherwise; then the corresponding p, and 0: of (17.5)
186
DEPENDENT
VARIABLES
+
are both t log log n 0(1)by (17.9). The variance of X p under P , is
therefore approximately t , the variance of Wt.
Theorem 17.2. For the X n of (17.21), X n =+W .
PROOF. It follows immediately from (17.10) that XT
pose that 0 < s < t < 1, and apply Example 17.2 to
fn(P) =
{
+N.
Sup-
a i f p 5 elog'n,
- elogt 72,
b if elog'n < p <
0 otherwise.
+
+
Then the a; of (17.5) is (a2s b2(t - s))loglogn 0(1),and (17.17)
gives
ax: b ( ~ -r x:) (u2s b2(t - s ) ) ~ / ~ N .
+
*
+
+
Since the limit here has the same distribution as uWs b(Wt- W,),it
follows by the Cram&-Wold argument that (X,",X,"- X,") converges
in distribution to (W,,Wt - Ws). An extension shows that all the
finite-dimensional distributions of X n converge weakly to those of W .
Define Y nby (17.21) but with p further constrained by p 5 n1l4
in the sum. Then
by Markov's inequality (first order) and (17.9). Therefore, it is enough
to prove Y" + W ,and since the finite-dimensionaldistributions of Y n
also converge to those of W ,it is enough to show that {Y"}is tight.
Let r and s be positive integers, let U and V be disjoint sets of
primes, and consider the four sums
We have (see (17.14))
where C' extends over the u-tuples {ri} adding to r and the v-tuples
{ s j } adding to s , and C" extends over the {pi} c U and { q j } c V
satisfying p l < * - < p , and q1 < . * < qv. And E,[f&f$-] is the same
thing with the inner summand replaced by n-l Ln/pl - p,ql . . q v j .
a
SECTION
17. MORE ON PRIME DIVISORS
187
Therefore, IE[ShS$] - En[f&fG]I 5 n-11UITIV(s7
where IUI and IVI
denote cardinality. This also holds if T = 0 or s = 0, and the proof is
q-'. By a double binomial
simpler. Let a = x p E u p - l and b =
expansion,
'&
Take T = s = 2 and W = U U V . Since we have E[(Su - a)'] =
cpEup-l(l
- p - l ) I a and similarly for Sv,it follows by (17.22) that
and Schwarz's inequality)
(use xy 5 (z
+
If Q < p 5 n1/4and W is contained in the set of primes in the range
Q < p 5 0,
then
Suppose that Q < T 5 p 5 n1I4,and let U and V consist of the
primes in the ranges Q < p I T and T < p 5 p, respectively. The
preceding inequality gives
Suppose now that X > 4, so that X/4 exceeds the maximum absolute
value of the summands 6, -p-l. Apply Theorem 10.1 and then (10.6):
(17.23)
P[,
max
a<r<P
1
a<p<T
($ -
i)
12 A] 5 -
DEPENDENT
VARIABLES
188
Let 52 = CP5+p-'.
Change the definition of Y": Normalize by
sn rather than d l z . It is enough to prove tightness for the new
random functions Yn. By (17.23), there is a constant K', such that, if
a < p 5 n1/4and n is large enough that ES" > 4, then
As 2 increases from 0 to n, 5 2 increases in jumps from 0 to ):5 and
no jump exceeds 1. Therefore, for a given 6, if n is large enough
that s i < 6, then there exist ao, . . . ,avsuch that, for ti = sii/s;,
0 = t o < tl < . . * < t, = 1 and 6 < ti - ti-1 < 26 for each a < v. By
(17.24) and (7.11),
for large n. This proves tightness.
0
We can clarify the number-theoretic meaning of this theorem by
transforming (17.21). First, P,[m: log log m 5 (1 - E) log log n] ---t 0 by
(17.9). Therefore, if P, governs the choice of m,
(17.25)
log log m
log log n
*"1.
Let y+ = CP<,p-';
this is essentially s;, but we now regard it as an
approximate mean rather than variance. Define an( m) by
a,
(17.26)
Q,(O,m)= 0; Qn(l,m) = 1;
and interpolate linearly between these points. It follows by (17.25) and
two more applications of (17.9) that
and therefore, if I is the identity function on [ 0,1], then Q, =+ I in
, by the
the sense of D[O,l]. By Theorem 3.9, (X", iPn) + ( W , I )and
SECTION
189
17. MORE ON P R I M E DIVISORS
lemma to Theorem 14.4, X n o an =+ W o I = W . But X n o an is the
random function Zn defined by
Z r ( m )=
(17.27)
1
c
JmGG P:-YPl%a+l<t (s,(m,-
‘).
P
Thus Zn =$ W . We could also divide by J m - t h e
result is the
same.
The random function Zn can be described this way: If t = yp/ym+l,
then
(17.28)
and if p‘ is the next prime after p , then Z n ( m ) is constant on the
interval [yp/ym+l,yP!/ym+l],which has length proportional to p - l .
Suppose an idiot-savant successively checks the primes up to m in order to see
which ones divide m, the amount of time he devotes to p being proportional to
l / p - h e calculates with ever-increasing fury. Call p “advantageous” with respect to
m if
6,(m) > 7,; this means that, with respect to divisibility by the primes
preceding p , m is “more composite’’ or “less prime-like” than the average integer. If
q’ is the product of the prime divisors of m that precede p , then p divides m if and
only if it divides m/q‘, and the latter is easier to check than the former. The savant
knows all this. If p is advantageous, then q’ is likely to be large, which simplifies
the computational task and hence irritates the idiot (computation being his life).
The proportion T,(m) of time he spends on advantageous (and to him vexatious)
primes is the Lebesgue measure of the set of t in [ 0,1] for which Z p ( m ) > 0, and
in the limit this follows the arc sine law (9.27). Since the corresponding density is
U-shaped, T,, is more likely to be near 0 or 1 than to be near the middle of [ 0,1].
For example, for about 20% of the integers under n (n large), T,, > .9, for about
20%, T~ < .1, and for only about 6% do we have .45 < T,, < .55.
Let a p ( m )be the highest power of p that divides m. If we redefine 2” by
substituting cyp(m)for b,(m) in (17.27), then 2“ + W still holds. Indeed, if
p P ( m )= cyp(m)- 6,(m), then Pn[pp 2 k ] = n-llnp-k-’J 5 p-“-’ , and hence
En[&] 5
p-k-’ = ( p ( p - l))-’. It follows that E , , [ z p p p is
] bounded, and
p p / log log n J, 0: The new Zn does converge
therefore, by Markov’s inequality,
in distribution to W .
We can now imagine that the idiot-savant finds for each successive p its exact power in the prime factorization of m. Now he can quit after he has encountered the largest prime divisor pl(m) of m. By Theorem 4.5, logpl(m)/logm =
c,,,
czl
c,
190
DEPENDENT
VARIABLES
x,(m) 3, x,where x has the density &(x; 1) as defined by (4.16). It follows that
loglogpl(m)/loglogm = 1 logx,(m)/loglogm + n 1. Therefore, if the idiot
quits after encountering pl(m), he saves (to his satisfaction) very little time.
The skewing of the distribution of T, toward the ends of [ 0,1], as described
above, is even more pronounced if our idiot-savant spends the same amount of time
on each p : Let A6 be the set of x in D [0,1] that are nonzero and have constant
sign over [l - 6,1]. For each E there is a 6 such that P[W E A6 n C] > 1- E (see
(9.28)). And by Theorem 2.l(iv) and the fact that A: n C = A6 n C, it follows
that liminf, P,[Z" E A&] 2 P[W E A:] = P[W E A6 f l C] > 1 - E . The points of
~ +
logl logp/ log log m, and the great
discontinuity of Zn(m)are the ratios ~ ~ / ' y %
majority of these lie to the right of 1- 6. In fact, if ~ ( x is) the number of primes
up to x, then the proportion of the discontinuity points less than 1 - 6 is about
r(exp(log'-6m))/?r(m), and this goes to 0 in probability because ~ ( z is) of the
order x/logz. Therefore, under the new regime for our idiot-savant, the limiting
distribution of 7, has mass at 0 and mass f at 1.
+
The Poisson-Process Limit
There is a functional limit theorem corresponding to (17.20). First,
transform the ratio there by taking its logarithm. If
u, = exp exp(s
+ log log log n ),
vUn= exp exp(t
+ log log log n),
then
(17.29)
z[6,(rsn
<)loglogp
:
- logloglogn 5 t]
=
c
un<Plvn
bp(m)J n C(t - 4.
For each prime divisor p of m, place a point at log log p - log log log n.
If P, governs the choice of m, this gives a point process on (-00, 00).
Let X r ( m )be the number of points in (0, t] for t 2 0 and minus the
number in ( t ,01 for t 5 0. Then XF(m)- Xr(m)is the sum in (17.29).
Now consider a Poisson process on (00, w) with a constant intensity
of 1, and let Xt be the number of points in ( O , t ] or minus the number
in ( t , O ] as t 2 0 or t 5 0. Then X has independent increments, and
X , - X , has the Poisson distribution with mean t - s. Thus (17.29)
can be written as Xr - X r + X , - X,.
We can extend this in the usual way by the Cram&-Wold method.
Suppose that T < s < t , define T~ = expexp(r - logloglogn), and
SECTION 17.
MOREON
define un and vn
PRIME
a~ before.
DIVISORS
191
NOWdefine
By the analysis above, Sk and f; converge in distribution to X, - Xr,
and S: and f:
converge in distribution to Xt - X,. By independence,
Theorem 2.8(ii), and the mapping theorem, S, = ask bS: + a(X, Xr) b(Xt - X,) = <. If fn = af; bf:, A' = logs - log r , and XI' =
log t - logs, then the pn and a: of (17.5) satisfy p n + p = aX' bX"
and U: + a2 = a2X' b2X". But now, (Sn - p n ) / a n + (< - p ) / a ,
and it follows by Theorem 17.1 that (fn - pn)/an+ (< - p ) / a . But
then it follows further that fn + 5. This means that af; + bf: +
a(X, - XT) b(Xt - X,) for all a and b, and therefore, (X,"- XF, XF X:) = (fA,f,")
+ (X, - Xt,Xt - X,). The argument extends: The
finite dimensional distributions of the process Xn converge to those of
+
+
+
+
+
+
X.
The sample paths of Xn and X are elements of the space D ( -00, 00)
of cadlag functions over (-00,oo). If we restrict X n and X to [s,t],
then, by Theorem 12.6, there is convergence in distribution in the sense
of the space D [ s ,t]. It is possible to adapt the reasoning of Section 16
to the space D(-m,00). Replace the g m of (16.2) by the function
that is 1 on [-m 1 , m - 11 and 0 outside [-m, m], and is linear on
[-m, -m 11 and [m- 1,m]. Define P ( t ) by (16.3), but for all t on
the line, and define a metric on D ( - ~ o , ~ oby
) (16.4), where now d&
is the metric for the Skorohod topology on D[-m, m]. One can derive
for D(-00, 00) the analogues of Theorems 16.1 through 16.7. If r,,p is
the restriction of z to [s,t ] ,then, as observed above, the distributions
P, and P of Xn and X satisfy P,r;t + Pr;; for all s and t , and by
the analogue of Theorem 16.7, P n + P:
+
+
Theorem 17.3. We have X n
D(-0O, 00).
+X
an the sense of the space
For negative [nonnegative] i, let pi be the successive X-points
< p-2 < p-1 <
(points where X jumps) preceding [following] 0:
0 < PO < p1 < . * (with probability 1, 0 is not an X-point). It is a
standard property of the Poisson process that the interarrival intervals
pi - pi-1 have the exponential density &(z) = e-" for i # 0, while
,130- p-1 has density $2(z) = ($1 * $1)(x) = z e - x .
0
.
.
DEPENDENT
VARIABLES
192
Thus the interarrival intervals have mean 1, except for the one that
covers 0, which has mean 2: A long interval has a better chance of
covering 0 than a short one has. In fact, for every fixed t , the interval
that covers it has density 49 and mean 2. This seems paradoxical,
because an arbitrary interval can be made to cover t simply by shifting
t the right amount. But this is like trying to place your bet after
the roulette wheel has come to rest, which casino operators vigorously
discourage.
For negative [nonnegative]i, let pi(m) be the prime divisors of m
preceding [following] logn: . < p-2(m) < p-l(m) < logn < po(m) <
pl(m) < - - - (p = log n is impossible because e is transcendental).
Then the Xn-points corresponding to the pi are the points pr(m) =
loglogpi(m)-logloglogn. Let D,be the set of functions in D(-w, 00)
that are nondecreasing, take integers as values, and have jumps of
exactly 1 at the points of discontinuity. The corresponding set for
D [0,1] is described after (12.38).
Suppose that zn and z lie in D, and that zn + z in the topology of
D ( - o ~ ,00). Assume that s < 0 < t and that z is continuous at s and
t. Then z has a finite number M of jumps in [ s , t ] , and it is not hard
to show (see the discussion following the proof of Theorem 12.6) that,
for large n, x, has exactly M jumps in [s,t] and that the positions of
these jumps converge to the positions of the jumps in z. Since X n and
X have their values in D,, it follows by Example 3.1 and the mapping
theorem that (p,",. . . ,,Or) (&, . . . ,&) for all u and v. Therefore,
~ - pi-1, and from this it follows
loglogpi(m) - loglogpi-i(m) = s pi
exp(Pi - ,&I) and (for y > 1)
that logpi(m)/logpi-l(m)
*
*,
(17.30)
limP,[m:pi(m) 2 ~ : - ~ ( m )=] {;::(I
n
+logy)
if i = 0,
if i # 0.
Also, E[exp(pi - Pi-l)] = 00 (for i = 0 as well as for i # 0), and since
pi-pi-1 >_ logpi/logpi-l, Theorem 3.4 implies that En[pi-pi-l] + 00.
Because of (17.25), Amn = logloglogm - logloglogn = s 0~ if
P, governs the choice of m. The points corresponding to the process X n are the loglogp - log loglogn. If instead we use the points
loglogp - logloglogm, we have a different process Yn.Since this is
just a matter of translating the time scale by the amount Amn,the distance in D ( - o ~ ,co)between X n and Y ngoes to 0 in probability, and
therefore, by Theorem 3.1, Y n+ X. For example, (17.30) also holds
< p - z ( m ) < p-l(m) < logm <
if we redefine the pi(m) so that
po(m) < p d m ) < * * ' .
193
SECTION 18.MARTINGALES
SECTION 18. MARTINGALES
This section makes use of the basic facts of ergodic theory and discretetime martingale theory. Here is a statistical example to which the limit
theory of the section applies.
EzampZel8.1. Let {(o, G , . . .} be a stationary Markov chain with
finite state space. Suppose the transition and stationary probabilities
p ( z , j ; 0) and p ( i ;0) are positive and depend on a parameter 0 ranging
over an open interval. Define (assume the derivatives exist)
cn
The log-likelihood of the observation C O , . . . , is Ln(0) = logp((0; 0) +
CE=llogp(<k-l,&;O), and its derivative is L p ) = xE=o&. It is
easy to verify that E[<o] = 0 and E[&Il(o,. . . ,&-I] = 0, from which it
follows that {t,}is a martingale difference and (equivalently) { L c ( 0 ) }
is a martingale.
Large-sample theory for Markov chains starts with a central limit
theorem: If 0 is the true value of the parameter, then n-1/2Lk(0) =$
a ( 8 ) N , where a2(0)= E[J:].
But {[1,<~,...} is stationary (not so
if (0 is included) and ergodic, and it follows by Theorem 18.3 that
& + E[<:]N. And because of the norming factor n-1/2,
n-1/2
this still holds if we include t o in the sum, which gives the required
limit theorem. These arguments can be made to cover chains of higher
0
order as well as more general processes.
Triangular Arrays
Suppose we have a triangular array of random variables. For each
n, <,I, & 2 , . . . is a martingale difference with respect to the a-fields
. .: &k is q-measurable, and E [ t , k l l q - l ] = 0. There may
be a different probability space for each n. Suppose that the t,k have
If the martingale is
second moments, and put a& = E[Jik(IG-,].
originally defined only for 1 5 k 5 T,, take Jnk = 0 and q =
for
k > T,. Assume that
&k and Ezl
converge with probability 1. The development begins with the following theorem [PM.476].
q,c,.
cgl
aik
Theorem 18.1. Assume that
00
(18.1)
k=l
cn
DEPENDENT
VARIABLES
194
where u is a nonnegative constant,tand that
M
(18.2)
k=O
each E. Then EL1
[nk
for
=+ uN.
We turn now to the corresponding functional limit theorem. The
are still a martingale difference, and the notation is the same as
before.
(nk
Theorem 18.2. Assume that
(18.3)
for
every t and that
(18.4)
ksnt
*
for every t and E. If xr = &,nt - (nk, then Xn W an the sense of
D[O,m).
A simple way to construct a Brownian motion W on [ 0, m) is t o
take a Brownian bridge W" on [ 0,1] and define Wt = (1 t)W,"/(l+tl
for t E [O,m).
+
PROOF. Suppose that s < t , and define qnk as a(& for k 5 [ns]
and as btnk for 8L1.
< k 5 [nt]. BY (18.3), ~ k < n t E [ ~ ~ ~ (=$l ~
a2s b2(t - s); therefore, Theorem 18.1 applies to {ink}, and it follows
that ax:
b(Xr - X,") aW, b(Wt - W s ) .By the Cramdr-Wold
argument, the two-dimensional distributions converge, and the same is
true for higher dimensions.
For tightness, we use the corollary to Theorem 16.10. Since k:[ 5
e2 (;kI[l<,,kl>f] , (18.4) implies that E[maxkSnm($1 --tn o for each rn.
Therefore, {jm(Xn)}is tight on the line for each rn, and of course
{Xt} is tight.
To verify Condition 1" of the theorem, we use the sequential version, (16.32). Assume then that Tn are discrete Xn-stopping times
bounded by m and that 6, + 0. We must show that
+
+
*
+
+
(18.5)
t
Q
X7,+bn - x
:
n
The theorem is stated [PM.476] for
= 0 as well.
Q
*n
0-
> 0, but the proof there covers the case
- l ]
195
SECTION 18.MARTINGALES
If A[ is the event where LnTnl < k 5 Ln(rn+&)I, and if Cnk = IA&k,
then the difference in (18.5) is x k &. For large n, 6, < 1, and in that
case ( T ~5 m) A[ # 8 implies lc 5 n(m + l ) ,and we can take the last
sum as &n(m+l)Cnk*
Now apply Theorem 18.1 for the case where
cr2 = 0 in (18.1): (18.5) will follow if we prove, first, that the Cnk form
a martingale difference with respect to the
and, second, that
(18.6)
EIC:kll-?-ll
kln(mf1)
and
kln(m+l)
5 ( & I,
(18.7) is a consequence of (18.4).
is the largest point in the (finite) range of T~ for which T <
k / n , then [LnTnJ < k] = [nTn < k] = [ T ~5 T ] E a [ X F : s 5 7-1 c
C q-l.Since T~
6, is another discrete Xn-stopping time,
[ l n (+6,
~)]~ 2 k] also lies in .?"-I, and hence A; E q-l.From this it
follows that E [ < n k l l c - l ] = IA;E[&IIG-~] = 0, and so { & k } is indeed
a martingale difference,as required.
Only (18.6) remains to be proved. For each integer q, let
Since
If
T
qn,.,
+
By (18.3), PB,"--t, 1 for each q. For large n, 6, < l / q , and then
6, < T~ l / q 5 m 1, which implies that
T~
+
+
+
2
IAEank
EICikllz-ll =
k<n(m+l)
k<n(m+l)
Given t , choosej so that ( j - l ) / q
and so, on the set B i ,
Since PB;
--tn
< t 5 j / q . If t 5 m, t h e n j 5 qm+l,
1 for each q, (18.6) follows.
0
196
DEPENDENT
VARIABLES
Ergodic Martingale Differences
Now consider the stationary case. Let
be a doubly infinite sequence of random variables. It is a martingale
difference if tn is Fn-measurable and E[&llFn-l]= 0, where F n =
a[&:k 5 n].
Theorem 18.3. Let (18.8) be a stationary, ergodic martingale
diflerence for which E[<,] = cr2 is positive and finite. If we take X p =
Cklnt& / u f i , then X n =+W in the sense of D [0,oo).
{en}
PROOF.Represent
as the coordinate variables on the space
of doubly infinite sequences [M22] and put c r i = E[[ilIFn-l].
Then [M22] {a:} is stationary and ergodic.
Now we use Theorem 18.2. Let tnk = & / U f i ,
= F,, and
unk:
2 = E[[&llq-l]
= u;/u2n. By stationarity,
P O o
Hence (18.4). By the ergodic theorem,
llklnt
1
nu2
lsklnt
Hence (18.3).
0
Theorem 18.3 also holds if (18.8) is replaced by a one-sided sequence ( t i , t 2 , . . .) for which E[[nll[l,. . . , &1]
= 0: Represent the
process on R+" [M22].
SECTION 19. ERGODIC PROCESSES
The Basic Theorem
Consider a stationary, ergodic process
197
SECTION
19. ERGODIC
PROCESSES
Represent the process on the space of doubly infinite sequences [ M 2 2 ] ,
use 11 11 to denote the L2 norm, let T be the shift, and put
the
Theorem 19.1. Assume that (19.1) is s t a t i o n a q and ergodic, that
tn have finite second moments, and that
(19.3)
n=l
Then
and the series
03
(19.5)
n=l
converges absolutely. I f a > 0 and
n
(19.6)
Sn
=
Ctk,
XF = S l n t j
k=l
then Xn + W in the sense of D[O,ca).
That (19.4) holds is usually obvious a priori. The condition (19.3)
can be replaced by
M
(19.7)
n=l
This is simply a matter of reversing time: T-l is ergodic if T is,
t k has the same distribution as
t - k , and (19.7) is the same
thing as
I(E[<-n((Go]((< 00.
If we start with a one-sided sequence ( t o , [I,. . .) that is stationary
and ergodic and satisfies (19.7), we can construct the two-sided version
[M22] and still conclude that Xn + W .
The proof uses Theorem 18.3, on martingale differences. Before
proceeding to the proof itself, consider two special cases that explain
cg=l
c,"=1
cgZ1
198
DEPENDENT
VARIABLES
the idea behind it. The first idea is to consider q k = & - E[&lIFk-1].
Now E[qkIJFk-l] = O-{qn} is a martingale difference-and Mk =
C;=,q k will be approximately normally distributed by the theory of
the preceding section. On the other hand, the difference S, - M, is
C;=,E[&II3k-1], and there is no a priori reason why this sum should
be small (unless, for example, (5,) is a martingale difference to start
are 1-dependent.1 If we redefine
with). But now suppose that the
v k as
<,
qk =I k
(19.8)
+ E[Sk+lllFk] - E“kllFk-11,
then, since E[E[&+i))Fk])JFk--i] = E[&+illFk-i] = 0, {q,} is again
a martingale difference. This time, S, = M, + ~ ~ = l ( E [ & + l l l F k ] E [ & ~ ~ F ~ - - i and
] ) , the last sum telescopes to E[&+lllF,]
- E[&llFo],
which is small compared with S, and M, (we use a norming factor of
If {&} is 2-dependent, the same argument works for
the order fi).
( 19.9)
vrC = [k -k E[6c+ll( Fk]-k E[<k+2 IlFk] - E[<k (1 Fk-11 - E [ [ k + l ( (F k - 1 1 .
The assumption (19.3) makes it possible t o replace the “correction”
terms in (19.8) and (19.9) by infinite series.
PROOF OF THE
(19.10)
THEOREM.
First [M22(40)],
E[sillFklTn = E[ti+nIIFk+,].
By Schwarz’s inequality, we have IE[&&]I
5 E[l&l IE[&11Fo0]1] 5
115011 * IIE[&11Fo’o]II, and it follows by (19.3) that the series (19.5) converges absolutely. If pk = E[Jo&], then, by stationarity, E[S$ =
npo 2 Czz:(n- ~ c ) p k .From
+
it now follows that
The tnare m-dependent if (ti,.
. . ,(k) and ( ( k + n , . . . ( j ) are independent whenever n > m. An independent process is 0-dependent.
If u = 0, then, by Chebyshev’s inequality, S,,/J;; =+ 0.
199
SECTION
19. ERGODIC
PROCESSES
By Lyapounov's inequality again, together with (19.3),
E[x
00
(19.12)
00
IE[&+illFkl]
i=l
5
<
~ ~ E [ ~ k + i ~ ~ ~ k l ~ ~
i=l
Define
(19.13)
i=l
Because of (19.12), the series here converges with probability 1. Now
define q k = [ k 81, - e k - 1 (( 19.8) and (19.9) are special cases). Because
of (19.12) again, the partial sums of the series in (19.13) are dominated by an integrable random variable, and so the sum can be interchanged with E[ * ll.Fk-1]: E[ekIl.Fk-l] = CEl E[&+iII.Fk-1]. Therefore,
E[qk11.Fk-1] = 0, and {qn} is a martingale difference.
To apply the results of the preceding section, we must show that
the [n have finite second moments. If
= E[&[I.Fo], then E[@] 5
C& E[IPil . [Pjl] I C;=, llPill - IIPjll, and this is finite by (19.3).
Therefore, T~ = E[$J is finite, and it is a consequence of Theorem 18.3
that n-lI2Mn = n-lI2 Cz=l qn + T N . But Mn = Sn + 8, - 00, and
--t 0. This means in the
since Ilenll = lleoll is finite, lln-lj2(Mn first place that n-1/2Sn + T N , and in the second place, that T = CT
because of (19.11) and the fact that the martingale property implies
lln-1/2Mn112= T ~ Therefore,
.
Sn/Ufi +N.
In most of the preceding sections, we have proved weak convergence
by verifying tightness and the convergence of the finite-dimensional
distributions. But here we can prove it directly by comparing X n with
the random function Y ndefined by yt" = M L n t j / a f i , and we need
only tighten the preceding argument and use the full force of Theorem
18.3. By Theorem 16.7, it is enough to show that, for each m,
+
Since E[@] < 00, we have CnP [ e i / n 2 E] = En P[ei 2 ne] < 00, and
hence, by the Borel-Cantelli lemma, Oz/n --+ 0 with probability 1. But
1
n k<mn
-
and (19.14) follows from this.
0
DEPENDENT
VARIABLES
200
Example 19.1. Suppose that (19.1) is rn-dependent. If E[&] = 0,
then IIE[(,1130]11 = 0 for n > rn, and so (19.3) holds. If 5, = Cn - Cn+1,
where the
are independent with mean 0 and variance 1, then {&}
is 1-dependent. In this case, E[@ = 2, but the a2 of (19.5) is 0.
0
<,
Exumple 19.2. Define an autoregressive process as a sum tn =
CEO@p"cn-i,
where \PI < 1 and the & are independent and identically distributed with mean 0 and variance 1. The a-field Fo is
generated by the & for k I 0 and hence also by the <k for k I 0.
Since (1.. . Cn are independent of Fo,E [ & ( ( F 0 ] = E r n @Cn-i and
IIE[&~~F
=o
P2,/(l
]~~~
- P2). Therefore, (19.3) holds, and since {&} is
ergodic [PM.495], Theorem 19.1 applies.
0
Uniform Mixing
Consider three measures of dependence between the a-fields
Fk and
&+, of (19.2) (t E Fk means 6 is Fk-measurable):
(19.15)
an = sup[(P(An B ) - PA PB(:A E Fk,B E &+n],
*
(19.16)
Pn = suP"~rl11:J
E Fk,E[t1 = 0, 1
1
1
t1I 1,
rl E G k S n , E[rll = 0, llrlll I 11,
(19.17)
cpn = sup[lP(BIA) - PBI: A E Fk,PA
(19.18)
an I Pn
> 0, B E B k + n ].
These are independent of k if {tn} is stationary. The idea is that,
if one or another of these quantities is small, then Fk and &+n are
"almost" independent. If limn an = 0, the sequence {&} is defined t o
be a-mixing, and the notions of p-mixing and cp-mixing are defined in
the same way.
The three measures of the rate of mixing are related by the inequalities [M23(46)]
I 2fin.
Theorem 19.2. Assume that (19.1) is stationary, that E[&] = 0
and the tn have second moments, and that
( 19.19)
2,Pn <
00.
T h e n the conclusions of Theorem 19.1 follow.
PROOF.First [M23(48)], IIE[&&Fo]ll I pnll(oll. Therefore, (19.19)
implies (19.3). We need only prove ergodicity, and for this it suffices
[M22] to show that the shift satisfies P(A n T n B )+ P A . P B for
cylinders A and B. But since an 5 pn, from A E Fi and B E Gj it
follows that (P(An T n B )- P A . PB( 5
0
<
- pj-i+n --t 0.
201
SECTION
19. ERGODIC
PROCESSES
{en}
en f(cn),
Example 19.3. Let
be a stationary Markov chain with finite
state space, and let
=
where f is a real function over the
states. Let p$) be the nth-order transition probabilities and let pv be
the stationary probabilities. Suppose that the chain is irreducible and
aperiodic, so that the p , are all positive and [PM.131]
(19.20)
I -Puv
-(n)
I
Pu
1 (<KOn, O < 1
for all u, TJ, n. Then for cylinders A =
B = [&+n+i = vi, 0 5 i m ] ,we have
<
(19.21)
[ck-i
= ui, 0 5 i
< 11 and
IP(A n B) - PA - PBI 5 KOn. PA. PB.
For fixed A, the class of B satisfying (19.21) is a A-system, and for
fixed B , the class of A satisfying (19.21) is also a A-system. Since the
class of cylinders A and the class of cylinders B are 7r-systems, (19.21)
holds for A E o[ci,i 5 k] and B E a[ci,i 2 k n]. Since these last
a-fields are larger than Fk and &+n, we have yn 5 KOn. And now
(19.19) follows by (19.18), so that Theorem 19.2 applies.
0
+
Functions of Mixing Processes
Let f be a measurable mapping from the space of doubly infinite sequences of real numbers to the real line: f(. . . ,z-1, zo, XI,.. .) E R1.
Let . . . q-1, qo, r]l . . . be a function of the process (19.1), in the sense
that
where tn occupies the 0th place in the argument o f f . (Although the qn
are real, the tn could now take values in a general measurable space.)
Let fk be a measurable map from the space of left-infinite sequences
to R': f ( . . . ,z-1, Xo) E R1.Put
(19.23)
rlkn =
fd
* *
,tn+k-l,<n+k).
We want to show that {qn} satisfies the conclusions of Theorem 19.1
if {&} is p-mixing and there exist functions fk for which
(19.24)
DEPENDENT
VARIABLES
202
Theorem 19.3. Suppose that (19.1) is stationary and Enpn < 00.
Suppose the qn of (19.22) have mean 0 and finite variance and that
the qkn of (19.23) have second moments and satisfy (19.24). T h e n
. . . ,rl-1, qo, ql,. . . satisfies the conclusions of Theorem 19.1.
The qn of (19.22) involves all the ti and hence extends its influence
into the future of the q-process. But qkn does not involve the for
a > n k, and hence its influence on the future is muted. But (19.24)
controls the influence of qn itself by ensuring that it is near qkn for
large k. There is an alternative version of the theorem: Reverse time
and replace (19.23) by q k n = fk(fn-k,(n-k+l,. . .).
+
PROOF.
If q has a second moment and a-fields M and N satisfy
M C h',then it follows by Jensen's inequality that JIE[vJJM]JJ2
=
E[(E[E[77lln/l llMl>21I E[E[(E[~11N~211M11
= llE[qIln/l[I2. Therefore,
(19.25)
l l ~ ~ ~ l lI
~ llEb?llNll?
lll
IIE[~IlM111I ll~ll,
where the second inequality follows from the first.
Denote the kth term in (19.24) by p k . Since q k 0 , as defined by
(19.23), is &-measurable, it follows by (19.25) that 11qko- E[qollFk]ll=
IIE[qko - qo113jc]ll I P k . Therefore, I(qo- E[qo11Fk]ll I 2 P k . This means
that if we take
(19.26)
qkn = E['%Il-%+k],
then (19.24) still ho1ds.t For each k ,
0 < k < n. We have
{ v k n } is
stationary. Suppose that
+ IIE~~kOllGnlII*
IIE[~01l~nlII
5 IIE[~011~nl
- E[~kollGnlll
By a second application of (19.25), the first term on the right is at
most 11qo -qkoll. Since E[qko] = E[qo] = 0 by the new definition (19.26)
of q k o , the second term on the right is at most [M23(48)] pn-k(lqkoll,
and by a third application of (19.25), JlqkoII 5 Ilqol(. This brings us to
IIE[qo11Gn]ll Illrlo--77/coII+pn-lc(1q011. Take k = [n/21. From (19.24) and
< 00 it now follows that En IIE[qollGn]ll< 00.
the assumption
A final application of (19.25): If ?in= a(q,, qn+l,. , .), then ?in c
Gn? and hence I I E [ ~ o l I ~ n lI
l l IIEb?ollGnlll. Therefore,
~ ~ ~ [ q o ~<
~?in]~~
00. Since {En} is ergodic (being pmixing), {qn}is also ergodic and
hence satisfies the hypotheses of Theorem 19.1 (see (19.7)).
0
cnpn
c,
t The point is that this new qkn is Fn+k-measurable,not that it has the specific
form (19.23) (although it does in fact have this form: see PM.186, Problem 13.3 and
the note).
203
SECTION
19. ERGODIC
PROCESSES
Example 19.4. Suppose that the [ n are independent and identically distributed with mean 0 and variance 1 and define
M
where we assume that
Cia:<
If for (19.23) we take
00.
k
then the requirement (19.24) becomes
f~2
k=l
< oo.
0
i=k+l
For the one-sided version of Theorem 19.3, suppose that
is stationary and define
[ I , (2,
...
and
Take F n = a(&,. . . ,[ n ) and Gn = a ( [ n , [ n + l , . . .), and modify (19.15),
(19.16), and (19.17) by inserting to the left of each supremum another
supremum extending over positive k. Again assume that Cnpn<
00, the q n have mean 0 and finite variance, and the qkn have second
Ilql-qkl 11 < 00. Then q1, q 2 , . . . satisfies
moments; and assume that
the conclusions of Theorem 19.3.
Ezample 19.5. Let P be Lebesgue measure on the unit interval,
and let [ l ( a : ) , & ( a : ) , . . . be the digits of the dyadic expansion of a:. This
is a stationary, independent process under P, and hence it is ergodic.
A random variable q n of the form (19.27) can be regarded as a function
q n ( 2 ) = f ( T n - b ) ,where f is a function on the unit interval and T
is the transformation Ta: = 2z(mod 1). Suppose that f is square- fk(z))2dZ
integrable and that XI,,& < 00, where @ =
and each fk depends on a: only through the first k digits of its dyadic
~i(f(a:)
DEPENDENT
VARIABLES
204
expansion. It then follows by Theorem 19.3 (one-sided version) that,
if p = $' f(z)dz, then
(19.29)
for the appropriate asymptotic variance a2. And there is the corresponding functional limit theorem. If there exist such functions fk at
all, they may be taken to be
since then fk(.&, . , . &) = E [ q 1 [ 1 3 k ] .
If f = I p t ) and fk is defined by (19.30), then p k 5 2-k, so that
(19.29) holds ( p = t ) . If t is a dyadic rational, then f = fk for some lc;
but if t is not dyadic, then f(z)involves the entire expansion of z.
If f is continuous and fk is defined by (19.30), then @ 5 w;(2-",
where wf is the modulus of continuity. Therefore, (19.29) holds if
Ckwf(2-') < 00, a condition which follows if f satisfies a uniform
Holder condition of some positive order: If(z) - f(z')l5 Klz - z 1I e ,
8 > 0. For example, we can take f(z)= z. In this last case, {qn}
is not 9-mixing (or even a-mixing) at all-to know Tnz is to know
T~+'z,
~ ~ + ~
. exactly.
z . .
0
\
Diophantine Approximation
Suppose that R consists of the irrationals in [ 0,1], F consists of the
linear Bore1 sets contained in R, and P is Gauss's measure:
(19.31)
Let T be the continued-fraction transformation: T z = z-' - [z-'J.
And let u k ( z ) be the lcth partial quotient in the continued-fraction
expansion of z, so that
(u1,u2,.. .) will play the role of the (<I,&,
. . .) in Theorem 19.3.
SECTION
19. ERGODIC
PROCESSES
205
We need several facts about continued fractions [PM.319-3241. The
n t h convergent to x is
(19.33)
where the fraction is in lowest terms, and
firther,
(19.35)
and
(19.36)
Finally,
(19.37)
The basic fact [M24] is that {a,} is cp-mixing, where
(19.38)
cpn 5 KOn, O < 1.
En
En
Since
cp1I2 < 00, we have
Pn < 00, as required in Theorem 19.3.
Let qn(x) = - logTn-lx - p, where p = 7r2/1210g2; this in effect has
the form (19.27) (& = ai). And take
qkn(x) = - 1%
Then, by (19.35),
D [0 , 1 ] defined by
[
11q1 - qklll
+ +1* *
-
< 00. If X n is the random element of
1
xp=->qk
(19.39)
’
ufi
lntl
k=l
for the appropriate u,t then X n + W .
By (19.36),
qk - (log qn - n p ) , is bounded, from which it follows that (logq, - n p ) / a f i + N . This leads to theorems connected
with Diophantine approximation. A fraction p/q is a best approximation to x if it minimizes the form Iq’ . x - p’l over fractions p‘/q‘ with
denominators q‘ not exceeding q. The successive best approximations
to x are* just the convergents pn(x)/qn(x), and so the value of the form
cizi
t Which is positive; see Remark 5 on p. 482 of Gordin’s paper [32].
*
See, for example, Khinchine [42] or Rockett & Sziisz [58].
DEPENDENT
VARIABLES
206
for the nth in the series of best approximations is
d n ( 4 = Ian(.>
Since
1 - logd,(x)
*
- Pn(4l.
- logq,+l(x)l 5 log2 by (19.34), we arrive at
1
(- logdn(x) - nr2 ) + N .
a f i
12 log 2
(19.40)
There is the corresponding functional limit theorem. If we define
Znby
then 2” + W in the sense of D[O,11. By (19.40), k-llogdk =+k - p ,
so that the discrepency &(z) has normal order e - k p . Call the kth best
approximation p k (x)/ q k (x) “superior” if
dk(4 < e
-k~~/1210g2
and “inferior” otherwise. If T ~ ( z is) the fraction of superior ones among
the convergents
Pn(4
Pl(4
41(4 ’. * ’ ’ Qn(4 ’
then, by (9.28),
(19.42)
P[T,
2
n-
5 u] + -arcsin&
O
< u < 1.
It is possible to prove Theorem 14.2 for X” and Zn as defined by
(19.39) and (19,41). To prove that X n + W still holds if P (defined by
(19.22)) is replaced by a probability measure Po absolutely continuous
with respect to it, define X n by (14.3) with qi in place of ti. If E lies
in Fk,then
IP([Xn E A] n E ) - P[Xn E A] PE( <_
*
(pp,-k
Uk
+
,
0,
and so (14.7) holds just as before. Since 7-l =
Fk is a field and each
[X”E A] lies in a(‘FI),we have [M21] P,[Xn E A] + W(A). The rest
of the proof is the same as that for Theorem 14.2. And Zn is treated
the same way. It follows, for example, that the P in (19.42) can be
taken to be Lebesgue measure instead of Gauss’s measure.
Convergence of Probability Measures
Patrick Billingsley
Copyright 01999 by John Wiley & Sons, Inc
CHAPTER 5.
OTHER MODES OF CONVERGENCE
SECTION 20. CONVERGENCE IN PROBABILITY
There are four major modes of convergence for random variables:
1": Convergence in distribution.
2": Convergence in probability.
3": Convergence with probability 1 (or almost-sure convergence).
40:Convergence in L2.
Neither of 3" and 4" implies the other, but each implies 2", and
2" implies 1". Each of these concepts extends to random elements of
other metric spaces; 4" for the space C would be E[llX" - Yn1I2]-+ 0,
where X n and Y nare random functions defined on the same probability
space. And the relations between the four carry over as well. That 2"
implies lo, for example, is the corollary to Theorem 3.1.
Donsker's theorem and most of the other results of the preceding
chapters concern convergence in distribution. Instances of convergence
in distribution, even for random variables, often cannot be strengthened as they stand to convergence in probabi1ity.t But sometimes, if
rl, + ( cannot be replaced by convergence in probability, it is possible
to prove (on a new probability space) that qn - tn+ 0, where each
has the distribution o f t . In this section and in the next, Donsker's
theorem will be strengthened in this way. And Strassen's theorem of
Section 22 concerns the convergence with probability 1, in C , of the
random paths ( 8 . 5 ) with new norming constants; it is a greatly generalized version of the law of the iterated logarithm.
In connection with the law of large numbers, 2" is called the "weak"
law, and 3" is called the "strong" law. For convergence of random
functions, the terminology in the literature is sometimes confusing.
A theorem of type 1" concerns "weak convergence" as defined at the
en
t Problem 3.8.
207
208
OTHER
MODESOF
CONVERGENCE
beginning of this book, the invidious “weak” being an inheritance from
functional analysis. A theorem of type 2” for random functions is often,
for contrast, called “strong,)) a term best reserved for theorems of type
3”. Probably the simplest way out is to use the phrases in 1”through
4”,either as they stand or converted into adjectives.
As explained at the end of Section 8, the term “invariance principle”
(IP) was first applied to Donsker’s theorem (and its earlier forms) to
indicate that various limit distributions are invariant under changes in
the distribution common to the summands ti (provided the mean and
variance do not change). But now “invariance principle” has become a
catchall phrase applied to a large miscellany of approximation theorems
in probability; there are IPS in distribution, IPS in probability, almostsure IPS, and L2-IPs.
Section 22 depends on Theorem 20.1 but not on anything else in
Sections 20 and 21.
A Convergence-in-Probability Version of Donsker’s Theorem
Let ,<2, . . . be independent and identically distributed random variables on (a,3,P); suppose they have mean 0 and variance 1. Let
Sk = CiSk
ti, and let X” be the random polygonal function (8.5) (for
Q = 1): XGn = &I&, and X” is linear between the points iln, i 5 n.
According to Donsker’s theorem, X” + W in the sense of C = C [0,1].
By Skorohod’s representation theorem-Theorem 6.7-there exist on
of
some (new) common probability space random elements X” and
C such that L ( X n ) = L ( X n ) ,L ( w )= L(W),and X n ( w )+ w ( w ) for
each w. But in this construction, the relations between the various X”
are not carried over to the X”:L ( X n , X ” ) has no connection with
L(X”,X”), for example.
Here we use an entirely different construction, also due to Skorohod.
Let (a’,F’,
P’) be a space on which is defined a standard Brownian
motion [B(t):O5 t < 001: The increments are independent, the paths
are continuous, and B ( t ) is normally distributed with mean 0 and
variance t. (We reserve W for Brownian motion as a random element
of C [0,1]). This is Skorohod’s theorem [PM.519]:
w
Theorem 20.1. O n (a’,3’,P’) there exists a nondecreasing sequence TO = O , q , 7 2 , . . . of random variables such that the differences
<n = B(Tn)- B(7n-1) are independent and have the distribution com-
SECTION
20. CONVERGENCE IN PROBABILITY
209
m o n t o the Jn (above) and the differences Tn - ~ " - 1 are independent
and identically distributed with m e a n 1.t
Let T k = Cilk<i= B ( T n ) , and define a random element Y" of
C by linear interpolation between the values yZyn = T i / f i at the
points i/n. Then the system {[i, T k , Y " }on R' is an exact probabilistic
replica of the system {ti,S k , X " } on R. Define Bn(t)= B ( n t ) / f ifor
0 5 t 5 1. Then Bn is a random element of C , and it has there the
same distribution as W (check the means and variances). Let 11 . 11 be
the supremum norm on C: 112 - 311 = sup, Iz(t)- y ( t ) I .
Theorem 20.2. With these definitions and assumptions, we have
llYn- Brill
(20.1)
* 0.
Since L ( B n )= L ( W )and L(Y")= L ( X n ) ,it follows by the corollary to Theorem 3.1 that (20.1) implies Donsker's theorem: X n jW .
No weak-convergence theory is required for the statement and proof of
Theorem 20.2 itself.
PROOF.Define a random element 2" of C by linear interpolation
between the values Z&, = B"(i/n) at the i/n. A summary of the
notation:
These random functions are linear on the subintervals [(i - l)/n, i / n ] .
We prove (20.1) in two steps:
llB"
(20.3)
-
Znll rj 0
and
First [M25(58)]
(20.5)
t If the &
P'[supsg IB(s)l 2 a] 5 4e-a2/2t7
had variance u 2 ,the
T~
-~
~
a 2 0.
would
1 have mean 0'.
210
OTHER
MODES OF
CONVERGENCE
Since Z"( is Bn(.
) made linear between the i/n, llBn - Znll is at
most the maximum over i 5 n of the supremum over t 5 1 / n of
21Bn(t + i / n ) - Bn(i/n)I . Since the increments of Brownian motion
are stationary, (20.5) gives
a )
2 €1 I nP'[2suptsl/n IBn(t)IL €1
= nP'[supssl ~ ~ ( s2>en1/2/2~
l
5 4ne-Ean/8+" 0.
(20.6) P'[llB"-Pll
Hence (20.3).
Becaue of the polygonal form of Y nand Z",
(20.7)
llyn-
= n-1/2 maxisn I ~ ( r i-) ~ ( i ) l
= m a i l n IB"(ri/n) - Bn(i/n)I.
Since the differences ri - ri-1 are independent and identically distributed with mean 1, it follows by the strong law of large numbers
that ri/i +i 1 with PI-probability 1. This means that ri/n is near i/n
(i large), and we can use path continuity to show that Bn(ri/n) is near
Bn( i / n ).
Let E be given. If we define Bn(t)as B ( n t ) / f i for all t 2 0 (rather
than for 0 I t 5 1only), then it is another standard Brownian motion,
and so its paths are uniformly continuous over [ 0,2]. Therefore, there
is a 6 small enough that
(20.8)
P'bPsg,lt-,Is6
IW)- Bn(s)l 2 €1 < E .
This holds for every n. Since r i / i +i 1 with probability 1, there is an
m such that
(20.9)
P' max
Li5"
I n- ni I-> 61 <- P
Ti
I
I 1
Ti
sup 7
- 1 >6
2
<E.
Fix the pair 6,m. If IBn(t)- Bn(s)I < E for s 5 1 and It - sI I 6,and
if 1ri/n - i/nl < 6 for m I i 5 n, then IBn(ri/n)- Bn(i/n)I < E for
m I i 5 n. Therefore, by (20.8) and (20.9),
This holds for all n. But obviously there is an no such that
And now (20.4) follows from (20.7), (20.10), and (20.11).
SECTION
211
21. APPROXIMATION
BY NORMAL
SEQUENCES
SECTION 21. APPROXIMATION BY INDEPENDENT
NORMAL SEQUENCES
The approximation theorem is this:
Theorem 21.1. Let {ti} be an independent, identically distributed
P), and assume that E[&] = 0
sequence of random variables o n (Q3,
and E[<:] = 1. Suppose there is o n (Cl,3,P) a random variable U that
is independent of {ti} and is uniformly distrubuted over [0,1]. Then
there exists o n ( Q F P)
, a sequence {qi} of standard normal random
variables such that
(21.1)
For an example, take the probability space to be the unit square
with Lebesgue measure, and define U ( w 1 , w ~=
) w2 and & ( ~ 1 , w 2=
)
ri(w1), where the ri are the Rademacher functions. Like Theorem 20.2,
this theorem implies that of Donsker.
It is impossible to strengthen convergence in probability to convergence with probability 1 in ( 2 l . l ) , t although this is possible if the
norming factor f i is replaced by d
m (see (22.25) in the next
section). The sequences {ti} and {vi} cannot be independent of one
another, since this would contradict the central limit theorem theorem
for {ti - qi}.
We give two proofs of this theorem, and each of them requires some
preliminary definitions and results. Let S and T be metric spaces with
Borel 0-fields S and 7. Call S and T Borel isomorphic, and write
S T, if there is a one-to-one map cp of S onto T such that cp is
measurable S / 7 and cp-' is measurable 7 / S . We need the following
isomorphism relations:
N
R'
(21.2)
-
(0,l)
N
[ 0,1]
-
[ 0 , 1]0°
-
Rm.
The first one is easy: There are many continuous, increasing maps
cp of R onto (0,l). For the second, let zn be points of (0, that
increase to
let yn be points of
1) that decrease to $, let cp map
x1 0, xn+l + zn,y1 + 1, yn+l + yn, and take cp to be the identity
elsewhere.
For the third isomorphism, we construct a map cp: 10° + I , where
I = [ 0,1]. Associate with each point of I a unique binary expansion:
i,
+
t Major [45].
(i,
i)
212
OTHER
MODESOF
CONVERGENCE
Take 0 = -00 . - and 1= .11. -,and use the nonterminating expansion
(say) for points of (0,l). Let n/ = {1,2,. . .}, take f to be some oneto-one map of N onto N x N , and let g = f - l . For y E I write
y = . y 1 y 2 - - ' , and for x = (x1,x2,...) E 1" write xi = . x i l x i 2 - . . .
Define a one-to-one map cp of 1O0 onto I by requiring of y = p(x) that
yk = x ~ f ( hfor
) all k: Intertwine the expansions of the coordinates of
x to arrive at the expansion of (~(5).
Then cp-'(y) = x if and only if
xij = yg(ij) for all i , j . Let Z be the Borel a-field in I . If J is the set
of points of I whose expansions start with the digits d l , . . . ,dm, then
cp-'J is the set of x such that zcf(k) = dk for 1 5 k 5 m, a set that lies
in Z"O. Since these sets J generate Z, cp is measurable P / Z .On the
other hand, if Ji is the set of points of I whose expansions start with
the digits dil,. . . ,dimi, then cp( J1 x * x Ji x I x - consists of those
y such that yglg(ij)= dij for 1 5 i 5 1 and 1 5 j 5 mi. This set lies in
Z, and it follows that 9-l is measurable Z/Z"".
To construct the final isomorphism, take an isomorphism from I
onto R1 and apply it to each coordinate of the points in Iw.
In the following lemma, R, S, and T can be any metric spaces that
are Borel isomorphic to [ 0,1]. In fact, every uncountable, separable,
complete metric space is Borel isomorphic to [ 0,1],t but we have proved
this result-and need it-only for the spaces in (21.2). In Lemma 2
below, p is a random element on ($2, F,P) with values in R. In both
P), with
lemmas, a , T , and u are random elements on the same (Q,.F,
values in S, T , [ 0,1], respectively, u is a probability measure on S x T
with marginal measure p on S, and L(a) = p:
a
-
p E R, u E S,
u E [0, I], /L( * ) = V( *
a )
X
T ) , L(u) = /L;
T
E T.
Lemma 1. Assume that S and T are Borel isomorphic to [ 0,1],
that L(a)= p, that u is uniformly distributed over [ 0,1], and that u
and u are independent. Then there is o n (Q,3,
P) a random element
a function of (CY,u ) , such that L(a,T ) = u.
T,
Lemma 2. Assume that R, S, T are Borel isomorphic t o [ 0,1],
that C(a) = p, that u is uniformly distributed over [ 0,1], and that p, u,
and u are independent. Then there is o n ( Q F ,
P) a random element
r , a function of ( c r , ~ ) such
,
that L(u,T)
= u and p is independent of
4.
(0,
t Parthasarathy [48], p. 14.
SECTION
21. APPROXIMATION
BY NORMALSEQUENCES
213
PROOFS.We prove the second lemma first. Assume the result
holds in the special case R = S = T = [ 0 , I ] = I , and let ( P R ,ps, (PT
be isomorphisms of R, S , T onto I . Let ~ S T ( St ,) = ( p ~ ( spT(t)),
),
so that ST is an isomorphism of S x T onto I x I; take v’ = vp&,
p‘ = pp;’,
p’ = p ~ p and
,
a‘ = psa. Then L(a’) = p‘ (the first
marginal of Y’),and p’, a’,u are independent. By the special case,
there is a random element T‘ of I , a function of (a’,u), such that
L(a’,7’) = v’ and p‘ is independent of (a’,
u ) . If T = pT17’,then T is a
function of (a,
u),L(a,T ) = v and p‘ is independent of (a,
u).The point
is that isomorphic spaces are measure-theoretically indistinguishable,
and the lemma has to do with measure theory only (no topology).
In the special case, there is for v a conditional probability distribution over the second component of I x I given the first. That
is, there exists a function p ( s , B ) , defined for s E I and B E 1,
such that, for fixed s, p ( s , - ) is a probability measure on 1,and for
A , B E 1,p ( , B ) is 1-measurable and v ( A x B ) = J,p(s, B ) p ( d s ) .
Define a distribution function F ( s , by F ( s , z ) = p ( s , [ 0 , z ] ) ,and let
4(s,
be the corresponding inverse, or quantile function: 4(s, t ) =
inf[z: t 5 F ( s , z ) ] ,so that 4(s, t ) 5 z if and only if t 5 F ( s , z ) .
Then [(s,t ) :4(s, t ) 5 z ] = n r ( [ s :T 5 F ( s , z ) ] x [T, l ] ) ,where the intersection extends over rational T . Thus 4 is measurable 1 x 1,and
~(w)
=:q5(a(w),u(w))is a random variable. If X is Lebesgue measure
on I , then X [ t : + ( s , t ) 5 21 = X [ t : t 5 F ( s , z ) ] = F ( s , z ) = p ( s , [ O , z ] ) ,
and it follows that X [ t : 4(s, t ) E B] = p ( s , B ) for B E 1.Since L ( a ) = p
and L ( u ) = A, and since t~ and u are independent, L ( a , u ) = p x A.
Therefore,
a
a
)
)
p ( s , B ) p ( d s )=
= ( p x X)[(S,
X [ t : 4(s,
t ) E B]p(ds)
t ) :s E A , 4 ( ~t ), E B]
= P[a E A , $ ( a , u )E B] = P[o E A , T E B] :
(a,
T ) has distribtion v.
In Lemma 2 , p is assumed independent of ( a , ~and
) , so it is inde4(a,u))= (a,
T ) as well.
pendent of (a,
For the proof of Lemma 1 , simply remove from the preceding argument all reference to R and p (or introduce a dummy space R and
take p T for some T in R).
0
=
FIRSTPROOFOF THEOREM
21.1. The first proof depends on
Theorem 20.2. Changing the notation there, write for (i and 7; for
214
OTHER
MODESOF
CONVERGENCE
B ( i )- B(i - 1); these are defined on the space 0’. Then L({C})=
L({&}), $C = B ( T ~and
) , the qi are independent standard normal
variables. And by (20.4), (21.1) holds if we replace & and qi (variables
on n) by and q: (variables on a’).
But we can use Lemma 1 to pull everything back to a. Take
S = T = Rw and v = L({<i},{qj}),
the first marginal of which is
L({{a})= L({&}). Take Q = {ti} on (n,.F,P), and use U in the
role of u. By the lemma, there is on ( Q F P)
, a 7 = { q i } such that
L({&},{qi}) = L({<i},(7:)). And this {qi} is exactly the sequence we
0
want.
This first proof depends on Theorem 20.1. Our second proof is
intricate but has the advantage that it avoids the Skorohod representation theorem.+
SECOND PROOF OF
THEOREM
21.1. Suppose E given. Define
Next define
(21.4)
Let Sj =
that
(21.5)
By the central limit theorem, there is a ~ o ( E ) such
.(L(j-%j),
L ( N ) )< 2 ,
j
> jO(E),
T is the Prohorov distance. And since nk + 00, it follows that
.(L(&),L(N)) 5 8 for k exceeding some kO(E). By Theorem 6.9,
there exists a probability measure v k on R1 x R1 that has first and
second marginal measures L(&) and L ( N ) ,respectively, and satisfies
where
It is possible [PM.265] to define independent random variables
270,U1, Uz,. . ., each uniformly distributed over [ 0,1] and each a func-
tion of U ,so that, by the hypothesis of the theorem, {Ui}is independent of {ti} and hence of { X i } . The next step is to construct on
t Which does not extend in a simple way to vector-valued processes, for example;
see Monrad & Philipp [46].
SECTION
21. APPROXIMATION
BY NORMAL
SEQUENCES
215
(0,3,P ) an independent sequence Y1,Y2,.. . of standard normal variables, each independent of Uo, such that
(21.7)
P[lXk - Y k l >
< c6,
k > ~o(E).
First apply Lemma 1 for S = T = R1,u = X I , u = U1, and v = v1.
Since X 1 and U1 are independent, there is a random variable T =
Y l 1a function of ( X l , U l ) , such that L ( X l , Y 1 ) = vl. Since L(Y1)
is the second marginal of v1, namely L ( N ) ,Yl is a standard normal
, and hence of ({ti},U l ) , it
variable. Since Y1 is a function of ( X I U1)
is independent of Uo.
Suppose that Y1,.. . ,Y k have been defined: They are independent,
standard normal variables, Y ,is a function of ( X i ,Ui),and L ( X i ,Y,)=
ui (i 5 k). Apply Lemma 2 for R = Rkl S = T = R1, p =
(Y1,.. . , Y k ) , u = Xk+l, u = Uk+l, and v = vk+l. Since p is a function
of ( X I , . . ,X k ,U1,. . . , Uk),the three random elements p, u,u are independent. Therefore, there is a random variable T = Yk+1, a function
of ( X k + l ,Uk+l), such that .C(Xk+l,Y k + d = Vk+l, and ( Y l , . 7 Yk)and
( X k + l ,Yk+1) are independent. Again Yk+1 is a standard normal variable: L(Yk+1)is L ( N ) ,the second marginal of vk+l. And Yk+1, being a
function of ({ti},Uk+l), is independent of Uo. This gives the sequence
we want: (21.7) holds because of (21.6).
Next let ( I , & , . . . be an independent sequence of standard normal
variables on some (new) probability space and set
(21.8)
Apply Lemma 1 again, this time with S = T = R M ,u = (Y1,Y2,.. .),
u = Uo, and v = L({Wk},
{Ci}). Since u and 2~ are independent, there
exists (on the original (al3,P)) a random element T = (71,772,. . .) of
R"O such that L({Yk},{7i})
= v. This means in the first place that
the qi (like the ci) are independent standard normal variables, and in
the second place that the joint law of { Y k }and {qi}is the same as the
joint law of { W k } and {Ci}. But then (21.8) implies that
(21.9)
holds with probability 1 on (al3,P ) .
To sum up: We have constructed independent standard normal
variables qi such that the Y k given by (21.9) satisfy (21.7). Write
216
OTHER
MODES OF
CONVERGENCE
Sj = C a.Ijti and Tj = CiSjqi. To prove (21.1) is to prove that there
exists an absolute constant K with the property that for each E there
is an no(€) such that
(21.10)
P[n-1/2maxj5n
IS^ - ~
j
2l €1 5 K E ,
n
> no(€).
The following argument does prove (21.10), but it leaves a major
problem: The sequence {qi} we have constructed depends on E itself. Nonetheless, we proceed to prove (21.10), leaving to the end the
resolution of this problem.
Define M , s, m (functions of n and E) by
tM-1 2 n
(21.11)
< t ~ s, = [ E - ~ ] ,
m = M - s.
By the definitions (21.3), for large n we have n
t ~ / 4 and
,
so it
is enough to find an upper estimate for P[maxjS, ISj - T'l 2 7 e t z ] .
Define
Then
And for m < k 5 M , each term in this last maximum is at most
M-1
112
ni IXi-Y,l. Therefore, it will be enough to bound the terms in
=I
+ I1 + I11 + IV + v.
SECTION
217
21. APPROXIMATION
BY NORMAL
SEQUENCES
We estimate the terms separately. In several places we use the fact
that 1x1 2 x/2 if x 2 2. We can use Kolmogorov's inequality on the
first two terms. By (21.3) and (21.11),
+
(21.13) I < ~-~t,$Var[St,] 5 2 ~ - ~ ( 1e4)-'
5 2E-2 exp
<a
--E-5
1
1og(1+ e4)) 5 2e-2 exp( -ze-l)
<K ~ E ,
where K1 is a constant large enough that the last inequality holds for
0 < E < 1. This and the same argument for Tt, give
I < KlE,
(21.14)
I1 < KlE.
By Etemadi's inequality [M19],
The number of summands here is M - m = s 5 E - ~ . To estimate
111, consider separately the terms for i 5 n1l2 and i > n1/2. By
Chebyshev's inequality, the terms for i 5 n1/2 contribute to I11 at
most (use (21.11))
(21.16)
3 ~ - ~ ( ~ t z / 3 ) =- 2~7n~ -~~ /t ;~n l / ~< 2 7 ~ n--'I2
~ < E,
where the last inequality holds for n sufficiently large.
If k < M and i < nk, then i < n M = tM+1- t M and hence
-<
tM
tM+l-tM
tM
5 2
+
(1
E4)M+1
-
+
(1
(1
+
E4)M
E4)M
+1
2E4
+
2
(1
+ E4)M'
Since M goes to infinity along with n, i/tM < 4e4 for large n. But
also, for large n, i > n1/2 implies that i exceeds the k O ( E ) of (21.7),
and (if e7 < 1/12) it follows that [M25(55)]
218
OTHER
MODES OF
CONVERGENCE
where K2 is a constant large enough that the last inequality holds for
0 < E < 1. Since the number of summands in (21.15) is at most E - ~ ,
the terms for i > nm contribute to I11 at most ~ E - ~ =K3 K
~ 2E~ . ~
Therefore
And IV satisfies the same inequality.
To estimate V, note that, by Schwarz's inequality,
It follows by (21.7) that
Putting the five estimates into (21.12) leads to (21.10) with K =
2 K 1 + 6K2 + 3 . We must still find a single sequence {qi} that works
for every €-satisfies (21.10) for every value of E .
There exists a (new and eventually irrelevant) probability space on
i,p 2 1 , and U b ) ,p 2 1, such
which are defined random variables
that all are independent of one another, each
is distributed like the
original ti,and each U(P)is uniformly distributed over [ 0 , 1 ] . Let SF) =
&k
@I.
The argument leading to (21.10) shows that for each p there
(VIP),
q?), . . .)
exists on this probability space an independent sequence
of standard normal variables, a function of (U(P), (PI ,t2
(PI , . . .), such
that, if T t ) = &k
(21.19)
$),
P[maxk<,In
then there is an no(p) such that
-1/2s(p)
- n-l/2T(P)I 2
k
PI I 2 - ~ ,
for n 2 no(p).
By the construction, the sets {&'),$'):
i = 1 , 2 , . . .} for different values of p are independent of one another (but SF) and TF' are not
independent).
Put T ( V ) = CP5,n o ( p ) , and define
(21.20)
c = &\(,), rli = Tl&r(p)?
I
(PI
for
T(P)
< 2 I r(p+ 1).
SECTION
21. APPROXIMATION
BY NORMALSEQUENCES
219
Then {ti}is an independent sequence of random variables, each having
the distribution of the original ti,and { r ] : } is an independent sequence
of standard normal variables. Define S; = CiSk
ti and TL =
7;.
Given E , choose po so that 2-P0 < E . And choose n* large enough that
P[n-ll2maxb<r(po)IS; - T ~2 I€1 5 E for n 2 n*.
Suppose that n > n* V r ( p o ) , and choose u so that r(w) < n 5
r ( u 1). Write
(21.21)
+
If k 5 ~ ( p o )then
,
+
< &$PO)
0 -
+ 1) and q 2 po, then
c
If r ( q ) < k 5 r(q
v k
c
VrT$+')+Vk
r(9) <
- Ml(Po)+
M$)+')+M&,.
PO<P<9
PO<P<9
And now, since r(v) < n 5 r ( u 1) (which, together with n > ~ ( p o ) ,
implies u 2 PO),
vk
0 -< vr(P0)
0
+
M$ L
Ml(po)+
c
r(p+l)
Mr(p)
PO<P<V
By (21.20), if ~ ( p<) k 5 r ( p + l ) ,then
+ M&).
Put all this together:
M g : = max ln-1/2(Si- q')l 5 max In-1/2(Si- T,)I
a<n
ilr(po)
+
By (21.21), P[A 2 E] 5 E. If po 5 p < u,then n > r ( u ) 2 no(p l ) ,
and hence, by (21.19), P[Bp 2 2-P] 5 2-P. And n > r(w) 2 no(u),
so that, by (21.19) again, P[C 2 2-"1 5 2-'. It follows that P[M$ 2
2 4 I+ Cpo<p<v
2 - p < 2E.
Thus (7:) stands in the desired relation to {ti}.What we want,
however, is a sequence {qi} on the original ( Q F , P ) that stands in this
relation to the original sequence {ti}. But now we can apply Lemma
1 again, just as in the first proof of the theorem.
0
f
220
OTHER
MODESOF
CONVERGENCE
SECTION 22. STRASSEN’S THEOREM
The Theorem
Let &, (2, . . . be an independent and identically distributed sequence of
random variables on (0,F,P); suppose the (i have mean 0 and variance
1. Let Sn = Cis, (So = 0), and let Xn be the random element on
0, with values in C = C[O,
11, defined by
for 0 5 t 5 1. This is the random function (8.5) of Donsker’s theorem,
but with a very different norming constant. (We consider (22.1) only
for n 2 3, so that the square root is well defined.) Strassen’s theorem
has to do not with the asymptotic distribution of Xn (the approximate
behavior of X n ( w )for fixed large n and w varying over 0), but with
the properties of the entire sequence
(22.2)
( X 3 ( w )X
, 4 ( 4* * .)
for each individual w in a set of probability 1.
Consider elements x of C that are absolutely continuous in the
sense of having a derivative x‘ outside a set of Lebesgue measure 0, a
derivative that is integrable and in fact integrates back to x, so that
x ( t )- x(0)= x:’(s)ds for 0 5 t 5 1. Let K be the set of absolutely
continuous functions x in C for which x ( 0 ) = 0 and the associated X I
satisfies
(22.3)
Lemma 1. The set K is compact.
PROOF.
If x E K and s 5 t , then by Schwarz’s inequality,
It follows by the Arzelh-Ascoli theorem that K is relatively compact.
But K is also closed: Suppose that points x, of K converge to a point
SECTION
22. STRASSEN’S
THEOREM
221
z of C. Then there exists [PM.246] a sequence {ni}of integers and a
function 2‘ on [ 0 , 1 ] such that J;(~’(s))~ds5 1 and zi,(s)z(s) ds +i
Jt
Ji
z’(s)z(s) ds for all square-integrable z on [ 0,1]. (This is the weak
compactness of the unit ball in L 2 . ) But then we have z ( t )- z(0) =
t
1imi(znj(t)- zni(o>)
= limi Jozii(s)
cis =
z’(s) ds.
0
Jl
It is not hard to see that K is convex as well as compact. But
since the set is nowhere dense (Example 1.3), it bears no resemblence
to, say, a closed disk in the plane-it is more like a closed line-segment
in the plane. Strassen’s theorem:
Theorem 22.1. With probability 1, the sequence (22.2) is relatively compact and the set of its limit points coincides with K .
In this section, to say that a property holds with probability 1
means that, if E is the set of w having the property, then there is an
F-set F such that F c E and P F = 1. If (0,F ,P) is complete, it
follows that E E F and P E = 1. Theorem 22.1 implies the ordinary
law of the iterated logarithm, and this and other consequences are
discussed at the end of the section.
Preliminaries on Brownian Motion
To simplify the notation, write
(22.5)
L, = &Gii&&
Let [B(t):
t 2 01 be a standard Brownian motion on some (R, F,P),
define a random element Bn of C by
(22.6)
B?(w) = L,’B,t(w),
0 5 t 5 1,
and consider the sequence
(22.7)
( B 3 ( w )B
, 4 ( w ) ., . .).
Before proving Theorem 22.1 for the Xn,we prove the analogous result
for the Bn.
Theorem 22.2. With probability 1, the sequence (22.7) is relatively compact and the set of its limit points coincides with K .
For w in a set of probability 1, the path B( . ,w ) is nowhere differentiable [PM.505], and for no such w can B n ( w ) lie in K ; the reason
222
OTHER
MODESOF
CONVERGENCE
why subsequences of (22.7) can approach the points of K is that K is
nowhere dense.
PROOF.
The proof is in two parts. In the first we show that, with
probability 1, (22.7) is relatively compact and all its limit points lie in
K . In the second part we show that each point of K is such a limit
point.
First part. It is enough to show that, with probability 1, for each
(rational) E we have Bm(w) E K Efor all sufficiently large n. For then
there is probability 1 that dist(Bn(w),K)-, 0, and there are points
yn(w)of K such that llB”(w) - yn(w)ll --t 0, which implies that (22.7),
like { y n ( w ) } ,is relatively compact and all its limit points lie in K .
Given E , fix a large positive integer m and a real number T just
greater than 1, each to be specified later. We have
m
(22.8) P[Bn $! K‘] L P [ m x ( B n ( i ) - B n (2-1
m ) ) ’ 2 r2]
i=l
Let x& have the X2-distribution with m degrees of freedom. Since
f i ( B n ( i / m-) B n ( ( i- l)/m)) has variance n L i 2 = 1/(2loglogn),
(22.9)
I = P[& 2 2r210glogn]
-
t(m/2)-le-t/2dt
as n ---t 00 (l’H6pital).
Let Zn be the random function obtained by linear interpolation
between its values Z n ( i / m )= Bn(i/m)at the points i / m , 0 5 i 5
m. The paths of Zn are absolutely continuous, and the derivative is
(Zn)’(t)= m(Bn(i/m)
- B n ( ( i- l)/m)) for t in ((i - l ) / m , i/m). The
second condition in I1 is exactly the requirement that s,’((Zn)’(t))2dt<
r 2 ,which implies r-lZn E K . Therefore,
I1 5 P[r-lZn E K , Bn $2 K‘] 5 P[r-lZn E K , (Ir-lZn - Brill 2
+
El.
Now Ilr-lZn - BnJIL Ilr-lZn - Znll llZn and the first term
n~
by~ (22.4)
,
at most r - 1 if r-lZn E K .
on the right, ( r - l ) ~ ~ r - l Z is
SECTION
223
22. STRASSEN7S THEOREM
Choose T close enough to 1 that
T -
I1 5 P[llzn -
1 < r/2. Then
2 €/2].
Since
and the suprema here all have the same distribution, it follows by the
definition of Zn that
The T of (22.8) has already been chosen so that 0 < T - 1 < r/2.
Choose m large enough and then choose c close enough to 1 that
(22.12)
c2m/16 > 1, 1 < c < r 2 , c < r2m/16.
By (22.8), (22.9), and (22.11), we have, for all sufficiently large n,
P [ B ~@ ~
(22.13)
Take nk =
1.".
€ 515me-c10g10gn.
Then nk 2 ck-' for large k, and
and therefore CkP[Bnk @ K'] < 00. By the Borel-Cantelli lemma,
there is probability 1 that B n k E K' for all large enough k. This is
what we want to prove, but we must also account for n's not of the
form nk. To do this, we must control the size of
224
OTHER
MODESOF
CONVERGENCE
and
For large k, nk-1 2 c ~ - so
~ ,that nk-1 5 n 5 nk implies t 5
nkt/n Inktlnk-1 I c2t. Thus
where
[
14exp --cE 2
4
Therefore,
k-3
1
P[A', L c] < 00. Also, for large k [M25(57)],
.
SECTION
22. STRASSEN'S
THEOREM
Since
xi"=,
c(i-1)/2
225
= ( c ~/ 1~ ) / ( d 2 - l ) ,we have
(22.17)
where a is a constant. Now c was chosen to satisfy (22.12); if we
move it closer to 1, then the factor in front of the iterated logarithm
in (22.17) will exceed 1, in which case, & P[Ai 2 E ] < 00. Therefore,
XI, P[Ak 2 €1 < 00.
The maximum preceding the supremum in (22.16) is at most
By moving c still closer to 1, we can arrange that e2/c2(1 - c - , ) ~ >
a > 1. Then, for large k, the factor in front of the iterated logarithm
in (22.18) will exceed a , so that C kP[Bk 2 E ] < 00.
Since we have controlled A k and B k , the Mk of (22.14) satisfies
P[Mk 5 E ] < 00 for each E , and there is probability 1 that Mk + k 0.
As we already know, there is probability 1 that B n k E K Efor all large
k, and hence Bn E K2' for all large n. This completes the first part of
the proof.
Second part. Suppose that, for each z in K , there is probability 1
that some subsequence of (22.7) converges to z, or
Xk
(22.19)
liminf llB"(w) - 211 = 0.
n
There is a countable, dense subset D of K , and it follows that, for w
in a set of probability 1, (22.19) holds for every z in D . Fix such an w .
For y in K and E arbitrary, there is an z in D such that (Iy- $11 < E .
Since (22.19) holds, for infinitely many n we have llBn(w)- 211 < E
and hence llB"(w)- yII < 2 ~ For
. each y in K , this holds for every E ,
and so a subsequence of (22.7) converges to y. It is therefore enough
to deal with (22.19) for a single z in K , and we can even assume that
P l
(22.20)
226
OTHER
MODESOF
CONVERGENCE
since the z with this property are dense in K .
Fix such an z and let E be given. First choose m and then choose
6 so that
me2 > I,
(22.21)
< E,
x(l/m) < E ,
m6
< E;
the third condition is possible because z(0) = 0. Let A, = A,(z, m, S)
be the w-set where
(22.22)
I (.,(A)
-B " ( y ) ) - ( z ( ; ) - z ( G ) )
I< 6,
note that i starts at 2 rather than 1. Write
Since the distribution of N is symmetric about 0, it follows by the
definition (22.6) of Bn that
apply this with
By (22.4), ( z ( t )- ~ ( s ) )I~ (t - s) J;(z'(~))~du;
s = (i - l ) / m and t = i/m, add, and use (22.20): m C z l d : I
J;(~'(u))~du
< 1. By reducing 6 still further, we can arrange that
mC%l(di ~ 5 <
) ~1. If n is large enough that 6L,,m 2 6,
then
[M25(56)1
+
P[di I L , k N I di
+ 61 2 exp[-L:,,(di + 6)2/2]
= exp[-m(di
+ s )log
~ log n],
and it follows that
m
PA, 2 exp[-rn c ( d i
a= 1
+ S)2 log log n] 2 exp[- log log 713 = &.1
Let n k = mk. If F ( s , t ) is the a-field generated by the differences
B(w)- B(u) for s 2 u 5 v 5 t , then A, E F ( n / m , n ) , and A,, E
F(rn"', m'). Therefore the A,, are independent. (If we had started
i at 1 instead of at 2 in the definition of A,, then A,, would lie in
F(O,mk), and we could not have drawn the conclusion that the A,,
SECTION22. STRASSEN'S
THEOREM
227
were independent.) Since PA, 2 1/ logn, X I ,PA,, = 00, and by the
Borel-Cantelli lemma, there is probability 1 that infinitely many of the
A,, occur. For large n we have [M25(55)]
€1 = P"" 2
Since me2 > 1 by (22.21), X I ,PIIBnk(l/m)l
P[IBn(l/m)I 2
5 2(logn)-m'2.
2 E] < 00, and hence, with
probability 1, lBnk(l/m)l 5 E for all large k.
We have now shown that, if (22.21) holds, then there is probability
1 that (22.22) and IBn(l/m)I 5 E both hold for infinitely many values
of n. Since 1z(l/m)I < E by (22.21), it follows for such an n that
/ B n ( l / m ) - z ( l / m ) l 5 2 ~ But
. now, this and (22.22) imply ( B n ( 0 )= 0)
IBn(i/m)- z(i/m)I5 m6 + 2E,
(22.23)
0 5 i 5 m.
We know from the first part of the proof that, with probability
1 and for large n, llBn - yII 5 E for some y E K , so that, by (22.4)
for y, ~ ~ n (-t~) ( (- l>/m)I
i
5 2~
< 3~ for (i - l)/m 5
< E for
t 5 i/m. Again by (22.4), Iz(t)- x ( ( i - l)/m)I 5
(i - l)/m 5 t 5 i/m. We conclude from (22.21) and (22.23) that
13
I)Bn- 211 5 m6 6~< 7 ~ .
+
+
Proof of Strassen's Theorem
We are now in a position to complete the proof of Strassen's theorem;
the argument remaining is similar to that for Theorem 20.2. We start
with the sequence {&} on (0'3,
P). It is independent, and the t,
all have the same distribution, with mean 0 and variance 1. Now
t 01 and the stopping times -rn
consider the Brownian motion [ B ( t ) : 2
of Theorem 20.1. They are defined on a new space (R', 3',
P'). Of
course, Theorem 22.2 applies to this Brownian motion. Define random
variables <i and TI,as for Theorem 20.2, and define random elements
Xn,Yn,and Zn of C by linear interpolation between these values at
the points i/n:
x,:
X;, = L;lSi = L ; ~&i - <h (on R),
yn: y n = L-W
12
z - L12- C
~ h <-i C h = L;lB(-rz)
Zn: Z;, = Bn(i/n) = ~ ; l ~ ( i(on
) 0').
$,/
(0.W
This is the same as (20.2), except for the new norming constant, and
X n coincides with the random function of (22.1). Since the have the
228
OTHER
MODESOF
CONVERGENCE
same joint distribution as the <i, the X" will satisfy Theorem 22.1 on
fl if the Y" satisfy the same theorem on 0'. The B" are now defined
on 0' rather than on fl, but they do satisfy Theorem 22.2.
Suppose we can prove that
((Y"
- B"((+ 0
(22.25)
holds with PI-probability 1. By Theorem 22.2, there is PI-probability
1 that ( B 3 ,B 4 . ..) is relatively compact and the set of its limit points
is K . But then the sequence ( Y 3 , Y 4.,.). will have the same property
by (22.25), which will complete the proof.
We prove (22.25) in two steps:
(IBn- Z"II
(22.26)
---f
0
--f
0
and
llY" - Z"ll
(22.27)
hold with PI-probability 1. For large n we have [M25(58)]
Now (22.26) follows by the Borel-Cantelli lemma.
As for (22.27), llY" - Znll = L;lmaxis, IB(q)- B(i)l by the
definitions, and so it is enough to prove that
L,llB(T") - B ( n ) (* 0
(22.28)
with PI-probability 1. Let E be given. By the strong law of large
numbers (recall that the T~ - ~ " - 1 are independent and identically
distributed with mean l), there is an no such that, with probability
exceeding 1 - E , we have 17, - n1 < nE for n > no, in which case
( B ( T-~ B(n)l
)
5 sup ( B ( t ) B ( n ) \ ,where the supremum extends over
(1 - ~ ) 5nt 5 (1 ~ ) nDefine
.
n', a function of n, by n' = [(l ~ ) n 1 .
For large n, (1 - ~ ) n / n>/ 1 - 2 ~ and
, this implies
+
+
SECTION22. STRASSEN’S
THEOREM
For large n we have L,t/L,
229
< 2, and then
(22.29) L i l l B ( ~ , )- B(n)l 5 4
=4
sup
Li:IB(sn’) - B(n’)l
sup
p q s ) - P’(l)l.
1-2E5S51
1-2E5S51
With probability exceeding 1 - E , (22.29) holds for all large enough
n. By Theorem 22.2, there is probability exceeding 1 - 2~ that both
(22.29) and Bn’ E K Ehold for large n. But then, by (22.4), we have
for 1 - 2~ 5 s 5 1. Therefore, with
IBn’(s)- Bn’(l)l 5 2e
probability exceeding 1 - 2 ~ ,
+
for all sufficiently large values of n. This shows that (22.28) holds with
0
probability 1 and completes the proof of Theorem 22.1.
Applications
The applications use this result:
Lemma 2. If 4 is a continuous map from C to R1, then, for w in
a set of probability 1, the sequence
(22.30)
(6(X3(wh4(X4(w)),* *)
is relatively compact and the set of its limit points coincides with the
closed internal # ( K ) ,
PROOF.If z and y are points of C , then the function f ( t ) =
4 ( ( 1 - t ) z + ty) on [ 0 , 1 ] is continuous and assumes the values ~ ( I c )
and 4(y) at the endpoints and hence must assume all values between
the two. Since K is convex, 4 ( K )is a closed interval. It is easy to see
that, if (22.2) is relatively compact and the set of its limit points is K ,
then (22.30) is also relatively compact and the set of its limit points is
0
6 ( K ) .The result follows by Theorem 22.1.
Exumple 22.1 Take ~ ( I c =
) z ( 1 ) . By (22.4), 1x(1)1 5 1 for IC E K ,
and since +(x) is +1 and -1 for z ( t ) = t and x ( t ) = -t, we have
+ ( K )= [-1, +1]. And
(22.31)
By the lemma, the limit points of the sequence defined by (22.31)
exactly fill out the interval [-1,+1]. This is the Hartman-Wintner
0
theorem.
230
OTHER
MODESOF
CONVERGENCE
Example 22.2 If +(x) = sup, x ( t ) ,then 4 ( K )= [0,1], and
The limit points exactly fill out [0,1].
0
Example 22.3 Let f have bounded derivative on [ 0,1] and take
$(x) = J i x ( t ) f ( t ) d t . If F ( t ) = f ( s ) d s and x E K , then partial integration gives 4(x) = F(t)x'(t)dt. This is an inner product
(F,z') in L2[0,11, and under the constraint 11x'11; = Ji(~'(t))~dt5 1,
it is maximal for x' = F/llF112 and the maximum is m = llFll2 =
[s,'F 2 ( t )dt]1/2. Therefore, $ ( K )= [-m, m].
We can approximate 4 ( X " ) = J-,l X : f ( t ) dt by
6
Jt
To see this, note first that
If A bounds I f 1 and B bounds If' [ , then the integrand is bounded by
AIX? - L;'Sil+ Bn-lL;lISi(, and we have
n
n
By the strong law of large numbers, this goes to 0 with probability 1.
Therefore, the limit points of {Un}fill out [-m,rn].
If f(t) = c , then rn = lcl/fi. If f(t) = ta, a > 1, then m =
((a 2)(a 3/2))-lI2.
0
+
+
Example 22.4. Because of Example 22.1, it is interesting to investigate the frequency of the integers i for which LclSa exceeds a fixed
c. Define random variables yi by
(22.34)
231
SECTION
22. STRASSEN'S
THEOREM
For 0 < c < 1, there is probability 1 that
(22.35)
l i m s u pl - xny i = 1 -exp(-4($
n
n 2=3
.
- 1)).
The proof of this falls into two distinct parts. Define
(22.36)
vc(x):= X [ t : x ( t ) 2
C h ] ,
s*(c): = sup vc(x),
xEK
where X is Lebesgue measure and t is restricted to [ 0,1]. We show first
that
(22.37)
s*(c) = 1 - exp(-4(cd2 - 1))
and second that
(22.38)
with probability 1.
To prove (22.37),we show first that the supremum s*(c) is achieved.
Suppose that zm + z (in the sense of C) and consider the inequalities
X[t:z,(t) 2 c&] 5 X [ t : z ( t )2 c&- k-'1 < X [ t : x ( t ) 2 c&] + E . Given
E, choose k so that the second inequality holds; for large m, the first
holds as well, which proves upper semicontinuity: lim suprnvc(x,) 5
vc(x). Now choose {xm} in K so that vc(x,) converges to s*(c), and
by passing to subsequence, arrange that x, + 20. Then zoE K , and
by upper semicontinuity, xo is a maximal point of K : It achieves the
supremum s*(c). It will turn out that there is only one of them, but
for now let xo be any maximal point.
If s < t and z E K , it follows by Jensen's inequality that
(22.39)
and there is equality if and only if x is linear over [s,t 1. If z ( t )2 c&
and t > 0, then (22.39) gives Jl(x'(u))2du
2 ( ~ ( t ) )2~c2.
/ t Therefore,
since the integral goes to 0 with t , we must have z ( t )< c& in some
interval (0, so). This is true of each x in K , and it implies vc(x) > 0.
OTHERMODESOF CONVERGENCE
232
If zo is maximal and p2 = J i ( z b ( ~ ) ) ~<d u1, then z1 = p - l z o
lies in K and ~ ~ ( $ exceeds
1)
vc(zo)(since the latter must be positive).
Therefore, every maximal zo satisfies
(22.40)
Next, the set A = [t:zo(t)> c&] is empty: If 1 E A , then there
is an a > c and a t’ such that zo(t’) = a and zo(t) > a on (t’,11.
Redefine zo so that zo(t)= a on [ t ’ , l ] .This leaves vc(zo)unchanged
d t (22.39), which contradicts (22.40). It
but decreases J ; ( ~ b ( t ) ) ~by
follows that, if A # 8, then there are points tl and t2 such that 0 <
tl < t 2 < 1, zo(t1) = c f i , z o ( t 2 ) = c f i , and z:o(t)> c& on ( t l , t 2 ) .
Since p ( t ) = c& is strictly concave, zo cannot be linear over [ t l ,t z ] . If
each point of ( t l ,t2) can be enclosed in an open interval on which zo is
linear, then by compactness, 20 is linear over [tl E , t2 - E ] for each E
and hence is linear over [tl,tz]. Therefore, there is a to in ( t l ,t2) that
is contained in no open interval over which zo is linear. There is some
7 small enough that the line segment L connecting ( t o - 77, zo(t0- 7 ) )
to ( t o g,zo(to 77)) lies above the graph of p . Redefine zo over
[to-7, to+7] so that its graph coincides with L there. This leaves vC(zo)
unchanged, and again we have a contradiction of (22.40). Therefore,
A is indeed empty.
We know that zo(t) 5 c& for all t , and zo(t) < c& on (0,so).
Take SO to be the infimum of those positive t for which zo(t) = c d ,
with SO = 1 if there are no such t. Then ~ ( t=)t c / f i o on [ O , S O ] ,
for if 20 is not linear there, then (22.39) (strict inequality) for z = zo,
s = 0, t = so contradicts (22.40) once again. We prove below that
z:o(t)= c& on [so, 11, so that
+
+
+
(22.41)
Let us assume this and complete the proof of (22.37). If (22.41) holds
for the maximizing 20,then, by (22.40), c2 S , 1 , ( ~ / 2 & ) ~ d=t 1, and
this gives SO = so(c), where so(c):= exp(-4(1 - c - ~ ) ) .This proves
(22.37), as well as the fact that 0 < so < 1.
To prove (22.41), we must eliminate two possibilities. First, s u p
pose there exist s1 and s2 such that so 5 s1 < s2 5 1, zo(s1) = c f i ,
zo(s2) = c f i , and zo(t)< c& on (s1,s2). Let S = 8 2 - s1 and define
+
SECTION
22. STRASSEN’S
THEOREM
233
a continuous yo by
If t is a point of [so, s1] and zo(t)2 c f i , then t+6 is point of [s0+6, s2]
and yo(t 6) 1 c
m (algebra); if t is a point of [sp,11, then yo(t) 2
zo(t) (algebra). Therefore, vC(yo) 2 vC(zo)and yo is a maximal point.
An easy computation, together with the facts that yh(t) = zb(t - 6) on
[so 6, sp] and yh(t) = zb(t) on [s2,1],shows that
+
+
, -
(yL(t))2dt = c2,
J so
Therefore, by (22.39),
which contradicts (22.40) and rules out the first possibility.
The second possibility is that there exists an s1 such that so 5
s1 < 1, zo(t)= c f i on SO,^], and z ~ ( t<) c& on (s1,1]. Redefine
20 by taking ~ ( t= )cJ-S-; on [sl,11. Since zo is still maximal, (22.40)
gives so = s l e x p ( - 4 ( ~ - ~
- 1)). But from s1 < 1 follows s1 - so <
1-exp(-4(~-~-1)). Since this last quantity is acheived by (22.41)’ the
redefined zo cannot be maximal, which rules out the second possibility.
This completes the proof of (22.37). It remains to prove (22.38).
Suppose we can show that, for 0 < c‘ < c < c” < 1, there is probability
1 that
(22.43)
and
(22.44)
C ya 2 s* (c”).
l n
lim sup -
*
a=3
234
OTHER
MODESOF
CONVERGENCE
This will be enough, since (22.37) shows that s*(c) is continuous and
decreasing.
To prove (22.43), first restrict the range of summation to
(22.45)
.
ran1 + 1 5 i 5 n,
where 0 < a < 1. Since this decreases the averaget by at most a , if
we can show that (22.43) holds for the new average, then (22.43) itself
will follow: Let a decrease to 0 through the rationals. Fix the a and
take E small enough that c ( l - E) - 2 ~ / & > c'. Suppose that i is in
the range (22.45), that t E IF = [(i - l)/n,i/n], and that n is large
enough that (log log an/ log log n)lI22 1- E. Then yi = 1 implies
(22.46)
X & = L,%i
2
c(l -E
) f i
2 c ( l - €)A.
With probability 1, for large n there is an zn in K for which llXn znll < E , and then (22.46) implies zn(i/n) 2 c ( l - E ) & - E , which in
turn implies zn(t) 2 c( 1- ~ )
- 2~
f ifor n large enough that
<E
(use (22.4)). And finally, (22.45) and t E IF further imply zn(t) 2
(c(1 - E) - ~E/,/E)&
> c'&. To sum up, there is probability 1 that,
for n large and i in the range (22.45), yi = 1implies that zn(t)2 c ' d
on I?. This means that the average in (22.43) for the new range (22.45)
is at most X [ t : zn(t) 2 c'&], which proves the original (22.43).
As for (22.44), we can restrict the sum by
fi
(22.47)
LanJ - 15 i < n,
since this only decreases the average. Take zo to be the maximal point
of K for c", and take a less than so(c") as defined after (22.41). Then
X[t: a I
t 5 1, zco(t) 2 c v i ] = X [ t : zo(t) 2
C'lA]
= s*(cI').
Now X & > c f l
implies Si > c G L , 2 c L i and hence yi = 1.
With probability 1, there are infinitely many n such that, for i in the
(d' - c ) G . Suppose
range (22.47), llXn - zoll < (c" - c ) m I
that, for such an n and i, we have xo(t) 2 c"& for some t in IF+l.
Then
t These are not quite averages, since the number of summands is less than
TI.
SECTION
22. STRASSEN7S THEOREM
235
and hence yi = 1. Since [a,
11 is covered by the
for i in the range
(22.47), it follows that n-l C ~ ~ ~ a”yinis, at
- lleast
This proves (22.44).
As Strassen points out, .99999 < s*(1/2) < .999999. Therefore, for
c = 1/2 there is probability 1that n-l
”yi exceeds .99999 infinitely
often but exceeds .999999 only finitely often.
0
Convergence of Probability Measures
Patrick Billingsley
Copyright 01999 by John Wiley & Sons, Inc
APPENDIX M
METRIC SPACES
We review here a few properties of metric spaces, taking as known the
very first definitions and facts.
M1. Some Notation. We denote the space on which the metric is
defined by S and the metric itself by p ( z , y ) ; the metric space proper
is the pair (S,p). For subsets A of S , denote the closure by A - , the
interior by A”, and the boundary by r3A = A - - A ” . The distance from
z to A is p(z, A ) = inf[p(z,y): y E A ] ;from p(z, A ) 5 p ( z , y) p ( y , A )
it follows that p( . , A ) is uniformly continuous. Denote by B ( z ,r ) the
open r-ball [y: p ( z , y) < r ] ; “ball” will mean “open ball,” and closed
balls will be denoted B ( z , r ) - . The €-neighborhood of a set A is the
open set A‘ = [ z : p ( x , A )< €1.
+
M2. Comparing Metrics. Suppose p and p’ are two metrics on the
same space S . To say that the p’-topology is larger than the ptopology
is to say that the corresponding classes 0 and 0’of open sets stand in
the relation
(1)
0 c 0’.
This holds if and only if for every II: and r , there is an r‘ such that
B‘(z,r’) c B ( z ,r ) , and so in this case the p’-topology is also said to be
finer than the p-topology. (The term stronger is sometimes confusing.)
Regard the identity map i on S as a map from ( S ,p’) t o ( S ,p ) . Then
i is continuous if and only if G E 0 implies G = i-lG E 0’-that is,
if and only if (1) holds. But also, i is continuous in this sense if and
only if
p’(z,, z) + O implies p(z,, z)
0.
--f
This is another way of saying that the p’-topology is finer than the
ptopology. The metric p is discrete if p(z, y) = 1 for z # y; this gives
to S the finest (largest) topology possible.
236
APPENDIX
M
237
Two metrics and the corresponding topologies are equivalent if each
is finer than the other: (S,p) and (S,p’) are homeomorphic. If p’ is
finer than p, then the two may be equivalent; in other words, “finer”
does not mean “strictly finer.”
M3. Separability. The space S is separable if it contains a countable,
dense subset. A base for S is a class of open sets with the property
that each open set is a union of sets in the class. An open cover of A
is a class of open sets whose union contains A .
Theorem. These three conditions are equivalent:
(i) S is separable.
(ii) S has a countable base.
(iii) Each open cover of each subset of S has a countable subcover.
PROOF.Proof that (i) + (ii). Let D be countable and dense, and
take V to be the class of balls B ( d , r ) for d in D and r rational. Let
G be open; to prove that V is a base, we must show that, if GI is the
union of those elements of V that are contained in G, then G = G I .
Clearly, G1 c G, and to prove G c GI it is enough to find, for a given
z in G, a d in D and a rational r such that z E B ( d , r ) c G. But
if z E G, then B ( ~ , Ec) G for some positive E . Since D is dense,
there is a d in D such that p ( z , d ) < ~ / 2 .Take a rational r satisfying
p(z, d ) < r < ~ / 2 :z E B(d,r ) c B ( z ,E ) .
Proof that (ii) + (iii). Let {Vl,V2,.. .} be a countable base, and
suppose that {G,} is an open cover of A ( a ranges over an arbitrary
index set). For each V k for which there exists a G, satisfying V k c G,,
let G,, be some one of these G, containing it. Then A c Uk G a k .
Proof that (iii) + (i). For each n, [ B ( z , n - l ) : zE S] is an open
cover of S . If (iii) holds, there is a countable subcover [B(z,k,n - l ) : k =
1 , 2 , . . The countable set [z,k: n, k = 1 , 2 , .. .] is dense in S.
0
.I.
A subset M of S is separable if there is a countable set D that
is dense in M ( M C D - ) . Although D need not be a subset of M ,
this can easily be arranged: Suppose that { d k } is dense in M , and
take z k n to be a point common to B ( d k , n - l ) and M , if there is one.
Given an z in M and a positive E > 0 choose n and then d k so that
p ( z , d k ) < n-l < ~ / 2 .Since B ( d k , n - ’ ) contains the point z of M , it
contains z k n , and p(z,zk,) < E . The z k n therefore form a countable,
dense subset of M .
Theorem. Suppose the subset M of S is separable.
238
APPENDIXM
(i) There is a countable class A of open sets with the property that,
i f x E G n M andGisopen, t h e n x E A c A - c G f o r s o m e A i n A .
(ii) Every open cover of M has a countable subcover.
PROOF.Proof of (i). Let D be a countable, dense subset of M
and take A to consist of the balls B ( d , r ) for d in D and T rational.
If z E G n M and G is open, choose 6 so that B ( x ,E ) c G, then
,
finally, choose a rational r
choose d in D so that p ( x , d ) < ~ / 2 and
so that p ( z , d ) < r < c/2. It follows that z E B ( d , r ) c B(d,r)- C
B ( z , e )c G.
Proof of (ii). Let A = { A l ,Az,. . .} be the class of part (i). Given
an open cover { G a }of M , choose for each Ak a G,, containing it (if
0
there is one). Then M C UkG a k .
These arguments are essentially those of the preceding proof. Part
(ii) is the Lindelof property.
Separability is a topological property: If p and p’ are equivalent
metrics, then M is pseparable if and only if it is p’-separable.
M4. Completeness. A sequence { x n } is fundamental, or has the
Cauchy property, if
SUP ~ ( z zi j,) +n
i,j>n
0.
A set M is complete if every fundamental sequence in M has a limit
lying in M . A complete set is obviously closed. Usually, the question
is whether S itself is complete. A fundamental sequence converges if it
contains a convergent subsequence, which provides a convenient way
of checking completeness.
Completeness is not a topological property: S = [I,m) is complete
under the usual metric ( p ’ ( s , y ) = Iz - 91) but not under the equivalent
metric p(x,y) = 1z-l - y-’l (or, to put it another way, [l,ca)and
(0,1] are homeomorphic, although the first is complete and the second
is not). A metric space (S,p) is topologically complete if, as in this
example, there is a metric equivalent to p under which it is complete.
Given a metric p on S, define
b(x,Y > = 1 A p(s, Y).
(2)
Since +(t)= 1A t is nondecreasing and satisfies +(s t) <_ $(s) + $ ( t )
for s, t 2 0, b is a metric; it is clearly equivalent to p. Further, since
+(t) 5 t for t L 0 and +(t) = t for 0 5 t 5 1, a sequence is b
fundamental if and only if it is pfundamental; this means that S is p
complete if and only if it is bcomplete. The metric b has the advantage
that it is bounded: b(z,y)5 1.
+
239
APPENDIXM
M5. Compactness. The set A is by definition compact if each open
cover of A has a finite subcover. An +net for A is a set of points
{zk} with the property that for each z in A there is an z k such that
p ( z , z k ) < E ; A is totally bounded if for each positive E it has a finite
E-net (the points of which are not required to lie in A ) .
Theorem. These three conditions are equivalent
(i) A- is compact.
(ii) Each sequence in A has a convergent subsequence (the limit of
which necessarily lies in A - ) .
(iii) A is totally bounded and A- is complete.
PROOF.It is easy to see that (ii) holds if and only if each sequence
in A- has a subsequence converging to a point in A- and that A is
totally bounded if and only if A- is. Therefore, we may assume in the
proof that A = A- is closed.
The proof is clearer if we put three more properties between (i)
and (ii):
(il) Each countable open cover of A has a finite subcover.
(i2) If A c UnGn, where the Gn are open and G1 C G2 C .,., then
A c G, f o r some n.
(i3) If A Fl 3 F2 3 ., where the Fn are closed and nonempty, then
Fn is nonempty.
We first prove that (il), (iz), (i3), (ii), (iii) are all equivalent.
Proof that (il)
(i2). Obviously (il) implies (iz). As for the
converse, if {Gn}covers A , simply replace Gn by UkCn
- Gk.
nn
-
Proof that (i2)
(i3). First, (i2) says that A n G , t A implies that
A n G, = A for some n. And (i3) says that A n Fn J. 0 implies that
A n Fn = 8 for some n (here the Fn need not be contained in A ) . If
Fn = GZ, the two statements say the same thing.
Proof that (is) c) (ii). Assume that (is) holds. If {zn}is a sequence
in A , take Bn = {zn,z n + l , . . .} and Fn = B i . Each Fn is nonempty,
F, contains some z. Since J: is in the closure
and hence, if (i3) holds,
of B,, there is an i, such that in 2 n and p(z,zi,) < n - l ; choose the
in inductively so that 21 < 22 < . .. Then limn p ( z , xi,) = 0: (ii) holds.
On the other hand, if Fn are decreasing, nonempty closed sets and (ii)
holds, take zn E Fn and let z be the limit of some subsequence; clearly,
zE
Fn: (i3) holds.
Proof that (ii)+(iii). If A is not totally bounded, then there exists
a poitive E and an infinite sequence { z n }in A such that p(z,, z n ) 2 E
f-f
nn
nn
240
APPENDIX
M
for m # n. But then {z,} contains no convergent subsequence, and
so (ii) must imply total boundedness. And (ii) implies completeness,
because, if { z n } is fundamental and has a subsequence converging t o
z, then the entire sequence converges to z.
Proof that (iii)+(ii). Use the the diagonal method [PM.538]. If
A is totally bounded, it can for each n be covered by finitely many
open balls & I , . . . ,Bnk, of radius n-l. Given a sequence { z m } in
A , first choose an increasing sequence of integers mil, m12,. . . in such
a way that zmll,zmlz,
. . . all lie in the same B l k , which is possible
because there are only finitely many of these balls. Then choose a
sequence m21, m22,. . ., a subsequence of m l l , m l 2 , . . ., in such a way
z m Z 2. ,. .all lie in the same B 2 k . Continue. If ri = mii, then
that zmZ1,
zCTn,zrnfl,
. . . all lie in the same B,k. It follows that zT1,zTz,.. . is
fundamental and hence by completeness converges to some point of A .
Thus (il) through (iii) are equivalent. Since obviously (i) implies
(il), we can complete the proof by showing that (il) and (iii) together
imply (i). But if A is totally bounded, then it is clearly separable, and
it follows by the Lindelof property that an arbitrary open cover of A
has a countable subcover. And now it follows by (il) that there is a
further subcover that is finite.
0
Compactness is a topological property, as follows by condition (ii)
of the theorem. A set A is bounded if its diameter sup[p(z,y): 2,y E A]
is finite. The closure of a totally bounded set is obviously bounded in
this sense; the converse is false, since, for example, the closed balls in C
(Example 1.3) are not compact. On the other hand, a set in Euclidean
k-space is totally bounded if and only if it is bounded, and so here the
bounded sets are exacly the ones with compact closure.
A set A is relatively compact if A- is compact. This is equivalent t o the condition that every sequence in A contains a convergent
subsequence, the limit of which may not lie in A .
A useful fact: The continuous image of a compact set as compact.
For suppose that f : S --t S’ is continuous and A is compact in S . If
{f(zn)}
is a sequence in f ( A ) , choose {ni} so that { z n i } converges to
a point z of A . By continuity, { f ( z n i )converges
}
to the point f(z)of
f (4.
M6. Products of Metric Spaces. Suppose that (Si,pi),i = 1,2,. . .,
are metric spaces and consider the infinite Cartesian product S =
241
APPENDIX
M
S1 x
5'2 x
a
- -. It is clear that
is a metric, the metric of coordinatewise convergence.
If each Si is separable, then S is separable. For suppose that Di
is a countable set dense in Si,and consider the countable set D in S
consisting of the points of the form
(4)
where k 2 1, xi is a variable point of Di for i 5 k, and xy is some fixed
point of Si for i > k. Given an E and a point y of S, choose k so that
2-i < E and then choose points xi of the Di so that pi(yi, xi) < E.
With this choice, the point (4)satisfies p ( y , x ) < 2 ~ .
If each Sa is complete, then S is complete. Suppose that xn =
(xy,xt,. . .) are points of S forming a fundamental sequence. Then
each sequence xi,x:, . . . is fundamental in Si and hence pi (xa,xi) +n 0
for some xi in Si. By the M-test, p ( x n , x )---t 0.
If Ai is compact in Si, then A1 x A2 x . . is compact in S . (This
is a special case of Tihonov's theorem.) Given a sequence of points
xn = (x?,xg,. . .) in A , consider for each i the sequence x t , x:, . . . in
Ai. Since Ai is compact, there is a sequence n1, n2, . . . of integers such
that xyk +k xi for some xi in Ai. But by the diagonal method, the
sequence { n k } can be chosen so that x y k + k xi holds for all i at the
same time. And then, x n k +k ( X I , ~ 2 ,.).~ .
M7. Baire Category. A set A is dense in B if B c A - . And A
is everywhere dense (or simply dense) if S = A - , which is true if and
only if A is dense in every open ball B. And A is defined to be nowhere
dense if there is n o open ball B in which it is dense. The Cantor set is
nowhere dense in the unit interval, for example, but a nowhere-dense
set can be entirely ordinary: A line is nowhere dense in the plane.
To say that A is nowhere dense is to say that, for every open ball
B , A fails to be dense in B , which in turn is to say that B contains an
x such that, for some E, the ball B ( x ,E ) fails to meet A : B ( x ,E ) c A".
But since B is open, B ( ~ , E
c )B for small enough E:
(5)
B(~,E
C )B n A C .
APPENDIXM
242
Thus A is nowhere dense if and only if every open ball B contains a ball
B ( x ,E ) satisfying ( 5 ) . This condition is usually taken as the definition,
although the words “nowhere dense” then lose their direct meaning.
By making E smaller, we can strengthen ( 5 ) to B ( x ,E ) - C B fl A“. The
Baire category theorem:
Theorem. If S is complete, then it cannot be represented as a
countable union of nowhere dense sets.
PROOF.
Suppose that each of A1, A2,. . . is nowhere dense. There
is in S an x1 such that B(x1,€1)- c SnA; for some €1. And B ( x ~ , E ~ )
contains an 2 2 such that B ( 5 2 , ~ ) -C B ( x l , ~ 1n) A; for some € 2 .
Continue. The en can be chosen in sequence so that en < 2 - n , and
then, since p(x,,z,+l) < 2-,, the sequence {IC,}is fundamental and
hence has a limit x. For each k, x lies in B(xk,~k)-and hence lies
0
outside Ak: S = UkA k is impossible.
A set is defined to be of the first category if it can be represented as
a countable union of nowhere-dense sets; otherwise it is of the second
category. According to Baire’s theorem, the space itself must be of the
second category if it is complete.
M8. Upper Semicontinuity. A function f is upper semicontinuous
at x if for each E there is a 6 such that p ( x , y) < 6 implies f (y) <
f (x) E . It is easy to see that f is everywhere upper semicontinuous
if and only if, for each real a , [x:f ( x ) < a] is an open set. Dini’s
theorem:
+
Theorem. If f,(x) 10 f o r each x, and i f each f, is everywhere upper semicontinuous, then the convergence is uniform o n each compact
set.
PROOF.For each E , the open sets G, = [x:f,(z) < E ] cover S . If
K is compact, then K C G, for some n, and uniformity follows.
0
M9. Lipschitz Functions. A function satisfying a Lipschitz condition on a subset A of S can be extended to the entire space.
).(fI
Theorem. Supose that f i s a function on A that satisfies
f (y)J 5 Kp(x,y) for x and y in A . There is an extension g off to S
that satisfies the same condition: l g ( x ) - g ( y ) J5 K p ( x ,y) for x and y
in S . Iff satisfies If I 5 a o n A , then g can be taken to satisfy 191 5 a
on S.
APPENDIX
M
243
PROOF.
Fix an arbitrary point z of A . If y E A , then for all x E S ,
+ K P h Y ) = f ( 4 + KP(x,Y) + (f(9)- f ( 4 )
2 f ( 4 + Kp(x,Y) - K p ( y , z )2 f ( z ) - K d x , 4.
Therefore, the function g(x) = infyEA(f (y) + Kp(z,y)) is well defined
f(Y)
+
on S . If x and y lie in A , then f ( y ) Kp(x,y) 2 f(x),with equality
for y = x: g ( x ) = f ( x ) for x in A .
Let x and x‘ be points of S . Given E, choose y in A so that g(x) 2
f ( y ) K p ( z , y ) - Then
+
g(x’>- 9 ( 4 I
f ( Y ) + KP(X’7 Y) - [f( 3 ) + K d x ,Y) - El
= K(p(x’,Y)) - P ( x , 9 ) ) + E L K&’, 4
+
€7
and so g(z’) - g(x) 5 K p ( x ’ , x ) . Interchange x‘ and x to get the
Lipschitz condition for g.
0
If a bounds
on A , truncate g above at a and below at -a.
If1
M10. Topology and Measurability. The Borel a-field S for ( S ,p )
is the one generated by the open sets. Let (S’,p’) be a second metric
space, with Borel a-field S’. If h : S ---t S’ is continuous, then it is
measurable S/S’ (in the sense that A‘ E S‘ implies A E S). To prove
this, it is enough [PM.182] to show that h-’G’ E S if G’ is an open
set in S‘; but of course h-’G’ is open. In particular, a continuous real
function on S is S-measurable.
Let (0,F)be a measurable space, and let hn and h be maps from
SZ to S . If each h, is measurable 3 / S , and if limn hnx = hx for every
x, then h is also measurable F / S . In fact, h-’F c liminf, hi1$‘‘ c
h-’Fz‘. Intersect over positive, rational E: If F is closed, then h-lF =
lim inf, h;lF‘, which lies in 3.
The set D h of points at which h is not continuous lies in S. This is
true even if h is not measurable S/S’. To prove it, let A,a be the set of
x in S for which there exist points y and z in S satisfying p ( x , y) < 6,
p ( x , z ) < 6, and p’(hy, h z ) 2 E. Then A,6 is open, and Dh E S because
Dh = 0, A,s ( E and 6 ranging over the positive rationals).
Subspaces. A subset So of S is a metric space in its own right. If
0 and 00are the classes of open sets in S and SO,then 0 0 = 0 n So
(= [Gn So: G E O ] ) ,and it follows [PM.159] that the Borel a-field in
SOis
n,
u,
(6)
so=snso.
APPENDIXM
244
If SOlies in S, (6) becomes
So = [A:A C So,A E S].
(7 )
Product Spaces. Let S’ and S” be metric spaces with metrics p‘
and p“ and Borel a-fields S‘ and S”, and consider the product space
T = S‘ x S”. The product topology in T may be specified by various
metrics, for example,
and
(9)
t ( ( dd
, l ) ,(y!, y”)) = p y x / ,y’) v p y d l , y q .
Under either of these metrics there is convergence (&, z): + (z’,d’)
in T if and only if xk t x’ in S‘ and x: --+ x” in S”. Under the metric
(9) we have, in an obvious notation,
(10)
~ ~ ( ( d , d ’ )=, rB)P t ( d , r )x BPll(z’’,~),
which is convenient.
Consider the projections d : T
S‘ and d’ : T + S“ defined
by d ( z ’ , d ’ ) = x‘ and 7r”(d,d’)= z”; each is continuous. If To is
countable and dense in T ,then dT0 and #TO are countable and dense
A and S[ are countable and dense
in S‘ and S“. On the other hand, if S
in S‘ and S“, then Sl, x S[ is countable and dense in T . Therefore: T
is separable af and only if S’ and S” are both separable.
--f
Let 7 be the Borel a-field in T . Consider also the product a-field
S‘ x S”-the one generated by the measurable rectangles, the sets A’ x
A” for A‘ E S‘ and A“ E S“. Now this rectangle is (n’)-lA’n(n”)-lA”;
since the two projections are continuous, they are measurable 7 / S ’
and ‘T/S“, respectively, and it follows that the rectangle lies in T.
Therefore, S‘ x S“ C 7. On the other hand, if T is separable, then
each open set in T is a countable union of the sets (10) and hence lies
in S‘ x S“. It follows that
(11)
S’ x S” = T,
if T is separable.+
t Without separability, (11) may fail: If S’ = S” is discrete and has power
exceeding that of the coninuum, then the diagonal [(z,y):z = y] lies in 7 but not
in S’ x S”. See Problem 2 on p. 261 of Halmos [36].
APPENDIX
M
245
ANALYSIS
M11. The Gamma and Beta Functions. The gamma function is
defined by
r(a)=
(12)
Jo
0;)
F-1
e --I dx
for positive a. Integration by parts shows that r ( a ) = (a - l)r(a- 1)
for a > 1, from which it follows that r(m)= (m - l)! for positive
integers m.
The gamma-a distribution (with unit scale parameter) has density
over ( 0 , ~ ) .A calculation with the Laplace transform shows that
ga and gp convolve to ga+p, from which it follows that ga+p(l) =
J;
ga(Y)gp(l - Y) dY, or
The left side here defines the beta function; the beta-(a, P ) distribution
has over (0,l) the density y"-'(l - y)P-'r(a P)/r(a)I'(P).
+
M12. Dirichlet's Formula. Let Dk be the Rk-set defined by the inti < 1. Dirichlet's integral formula is
equalities t l , . . . , t k > 0 and
xf=l
+
+ a k and the ai are positive. To prove it, fix
+ . - . + t k ( a = 0 if k = 2), and consider the
where a = a1 . . .
t 3 , . . . , t k , write a = t 3
integral
By (14), the change of variables
Jacobian s2) reduces this to
tl =
(1 -
s1)s2, t 2
= s1s2 (with
APPENDIX
M
246
And now (15) follows by induction.
M13. The Dirichlet Distribution. The random vector X =
( X I , .. . ,X k ) has the Dirichlet distribution with (positive) parameters
k- 1
xi and ( X I , . . ,Xk-l) has density
a1,. . . , Crk if x k = 1 -
on the set where t l , . . . ,t k - 1 are positive and add to a t most 1. By
(14) and (15), (16) does integrate to 1. A change of variables shows
that, if ai = a j , then the distribution of X remains the same if Xi
and X j are interchanged. If the ai are all the same, then X has the
symmetric Dirichlet distribution; in this case, the distribution of X is
invariant under all permutations of the components.
M14. Scheff6’s Theorem. Suppose that g: are nonnegative, that
C m y & = Cmgm(finite) for all n, and that y$ jnYm for all m.
Then, by the series form of Scheffe’s theorem [PM.215],
m
If f is a bounded and continuous real function, then
m .
m
For, if A4 bounds f , then
m
m
m
m
To the first sum on the right apply (17) and to the second apply the
bounded convergence theorem.
M15. Measurability of Some Mappings. Let 7 be the class of
Bore1 subsets of T = [ 0,1]. For each t , the projection 7rt from C to R1
(Example 1.3) is measurable C. Since the mapping
(19)
(z, t ) + 4 4
APPENDIX
M
247
from C x T to R1 is continuous in the product topology, and since
C x 7 is the a-field of Borel sets for this topology (see ( l l ) ) , (19) is
measurable C x 7/R1.
For z E C,let h ( z )be the Lebesgue measure of the set o f t in T for
which z ( t )> 0. We want to prove that h is measurable C , and we can
derive this from a more general result. If TJ is the indicator of (O,m),
then
h ( z )=
1
1
v ( z ( t )dt.
)
It will be enough to prove that, if w is Borel measurable, bounded,
and continuous except on a set of Lebegue measure 0, then (20) is
measurable C ; we prove at the same time that it is continuous except
on a set of Wiener measure 0.
Since v ( z ( t ) )is a bounded, Borel measurable function of t , (20)
is well defined. And since (19) is measurable C x 7, the mapping
$ : C x T + R1 defined by $(z, t ) = v ( z ( t ) )is also measurable. Since
$ is bounded, h ( z )= $(z, t ) d t is measurable C as a function of z
[PM.233].
If D, is the set of discontinuities of TJ, then, by assumption, AD, =
0, where X is Lebesgue measure. Let E be the set of (z, t ) for which
z ( t )E D,.If W is Wiener measure, then W [ z(z,
: t) E El = 0, and it
follows by Fubini's theorem applied to the measure W x X on C x 7 that
X [ t : (2, t) E El = 0 for z @ A , where A is a C-set satisfying WA = 0.
Suppose that zn + z in the topology of C.If z $! A , then z ( t )$! D, for
) v(z(t))
almost all t and hence, since zn(t)--+ z ( t )for all t , v ( z n ( t )+
for almost all t. It follows by the bounded convergence theorem that
Jl
Thus h is continuous except at points forming a set of W-measure 0.
The argument goes through if W is replaced by a P with the property that P.rr;l is absolutely continuous with respect to Lebesgue measure for almost all t. This is true of W".
Except in two places, this argument also goes through word for
word if C is replaced by D . First, if zn + z in the topology of D , then
zn(t)+ z ( t )for all but countably many t (rather than for all t ) ,which
is still enough for (21). Second, the proof that (19) is measurablein this case, that it is measurable 2) x IIRl-requires modification.
APPENDIXM
248
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
APPENDIX
M
249
Define (see the proof in Section 12 that 7rt is measurable) h,(x,t)as
t
~ - ' S , ~ + ' z ( s ) dfor
s t 5 and as E-' J,-cx ( s ) d s for t >
Then h,
is continuous on D x T , and since h,-~(x,t) --tm z ( t )for all z and t ,
measurability follows.
One more function on C remains to be analyzed, namely, the supremum h ( z ) of those t in [ 0 , 1 ] for which x ( t ) = 0. Since [x:h ( z )< a]
is open, h is measurable. If h is discontinuous at x, then h ( t ) must
keep to one side of 0 in ( h ( z ) , l ) and keep to the same side of 0 in
( h ( z )- ~ , h ( z for
) ) some E . That h is continuous except on a set of
Wiener measure 0 will therefore follow if we show that, for each t o ,
the supremum and infimum of W over [to,1] have continuous distributions. Since Wt - W,, for t ranging over [to,11 is distributed as a
Wiener path with a linearly transformed time scale, -Wt, +supt2,, W,
has a continuous distribution (see (8.20)). This last random variable
and Wt, are independent, and hence their sum also has a continuous
distribution. The infimum is treated the same way.
3.
M16. More Measurability. In Section 14 we defined the mapping
+ : D x Do --+ D by +(z,4) = z 0 4 (see (14.11)), and we are to prove it
measurable 2)x Do/D. Since the finite-dimensional sets in D generate
D,it is enough to prove that, for each t , the mapping
is measurable D x Do.If & ( t ) = [ k @ ( t ) l / kthen
,
& ( t ) 1 d ( t ) for each
t. Hence the mapping
converges pointwise to (22), and it is enough to prove this latter mapping measurable 2) x Do.Now [(x,4 ) :x ( & ( t ) )I a] is the union of
(24)
[(x,4 ) :W )= 01 n [(x,4 ) :4 0 ) I
a]
with the sets
then [$ E Do:+(t) E H ] = DOn TT'H lies in DO,and
If H E R1,
E H] E
therefore [(x,4):4(t)E H ] E D x Do. Similarly, [(x,qh):z(t)
D x DO.Thus the sets (24) and (25) all lie in D x DO,which proves
the measurability of (23).
APPENDIXM
250
CONVEXITY
We treat here a few facts about planar convex sets needed for Section
15. We assume the basic theory (support lines, convex hulls, and
so on), and the treatment is somewhat informa1.t Every convex set
considered is assumed to be compact. Denote Euclidean distance by
d(z,y) = Iz - yI, and write A+' = (A')- = [x:d(z,A) 5 E ] .
The Hausdorff distance between two compact sets is
(26)
h ( A , B )= (sup inf )x- yI) V (sup inf
ye B x E A
x E A yEJ3
Equivalently, h(A,B ) is the infimum of those
E
)IC - yl).
for which
Let T be the space of (compact) convex subsets of the unit square
Q = [ 0, 112. For the theory of Section 15, we must show that T is htotally bounded, and in fact we need to estimate the size of the smallest
E-net.
M17. Preliminaries on Convexity. We first show that T is hcomplete, which, once it has been shown to be h-totally bounded, will
imply that T is h-compact. Suppose that {C,} is h-fundamental, and
Cj)-; note that, since the union
define C as the closed set
decreases with i, the intersection is the same for all n. Suppose we
Cj C C: for i 2 n, and hence
have h(C,, Ci) < E for i 2 n. Then
C c (U,"=,Cj)- C Cz'. On the other hand, if z E C,, then the disc
B ( z ,E ) meets Ci for each i 2 n, and therefore B ( x ,E ) - n
Cj)is, for i 2 n, a decreasing sequence of nonempty, closed sets; it follows
Cj)- = B ( z ,E ) - n C is also
that the intersection B ( z ,E ) - n
nonempty. This means that IC E C+',and hence C, c 0'
Thus
.
h(C,,C) 5 E , which shows that h(C,,C) -+ 0.
It remains to show that the closed set C is convex. Choose E,
so that h(C,,C) < E, -+ 0. Suppose that x and y lie in C and
consider a convex combination z = px gy. There are points IC,
and
yn of C, such that 12 - xnl < En and ( y - yI, < en, and of course
z, = PZn qy, + z . Also, z, lies in C, and hence lies within E, of a
nF,(u,"=i
uci
(Ugi
+
+
t Eggleston [25],for example, has a detailed account of the general theory. He
defines the Hausdorff distance as a sum rather than a maximum (which gives an
equivalent metric), and he gives it no name.
251
APPENDIX
M
point z i of C. This means that zk also converges to z , which proves
that z lies in C. Therefore, the limit set C is convex, and T is indeed
complete.
We also need this fact: Suppose that C C C,+l c C, for all n.
Then: C, 1 C if and only if h(C,C,) I 0. Suppose first that C, 1 C
but h(C,, C) > E. There is then an z, in C, for which d(z,, C) 2 E .
Pass to a subsequence along which z, converges to some IC. Then
d ( z , C) 2 E , hence z $! C, hence z $ Cno for some no, and hence
d(z,Cno)> 0. But z, E C,, for n 2 no, and so z, cannot converge
to z, a contradiction. Suppose on the other hand that h(C,C,) -, 0.
Of course, C, 1 A for some A , and C c A c C, c C+' for large n; let
€10: A = C .
We next prove that
XC+€- XC =
Z ( ~ C+
) Em2 if C" # 0,
~ z ( C+
) Em2 if C" = 8.
where X denotes two-dimensional Legesgue measure and 1 denotes arc
length. The case C" = 0 is easy, since then C is a closed line segment,
dC = C, and X(C) = 0. We show in the course of the proof that dC
is rectifiable.
Suppose first that C is a convex polygon. On each side Si of C,
construct a rectangle R, of height E , as shown in Figure 1. Between
& and &+I lies a fan-shaped region Fi, a sector of a disc of radius E.
If -yi is the angle at the apex, then
yi = 2n, and so
XFi = r e 2 .
And of course
XRi = Z(dC)e. Hence (28) for the polygona; case.
Next, take D to be a polygon inscribed in the general C, and apply
(28) for E = 1: Z(dD) < XD+l - AD 5
Thus l ( 8 D ) is bounded,
and d D is indeed rectifiable. And if C c Q, then Cfl c [-1,212,
which gives
(29)
xi
xi
xi
Z(X)5 9
if C
c Q.
(It is intuitively clear that, in fact, Z(dC) 5 4.)
Before proving the general case of (28)) we show that X(dC) = 0.
If C" = 8, this is easy. In the opposite case, translate C so that the
origin is an interior point. The sets cy(dC)are disjoint, and X(cy(dC)) =
a2X(dC),which is impossible if X(dC) > 0.
To prove (28) for the general C with nonempty interior, note first
that each point of C is the limit of points in Go:Given u in C, choose
z1 and z2 in C" in such a way that u , z1, z2 are not colinear and hence
APPENDIX
M
252
are the vertices of a triangle V contained in C ; but obviously, V" c C"
and hence u E avo. Now choose inscribed polygons C, in such a
l(8C) (by the
way that C, A , where C" c A C C. Then Z(aC,)
XA = XC (since X(aC) = 0). Also,
definition of arc length), and XC,
CL' C C;il C C+'. If z E C', then 12 - yI < E for some y E C". But
then y E C, for some n, so that d ( z , C,) < E and 2 E CLe: C' c C;'.
Thus C;' 1 B , where C' C B c C+', and so XC
';
t XB = XC+€
(since X(aC+') = 0). Hence (28).
A further inequality:
r
(30)
r
X(ClAC2)I
45h(C1,C2)
for C1,C2 EQ.
Suppose that h(C1,C2) < h. Then C2 nCf C CTh- C1 and C1 n Cg c
C:h - C2, and so, by ( 2 8 ) , X(C1ACz)5 2(Z(aCl) Z(dC2))h 27rh2.
And now (29) gives X(ClAC2)5 36h 27rh2. Since h(C1,C2) 5
letting h tend to h(C1,Cz)gives (30).
A boundary point of a convex set is regular if through it there runs
only one support line for the set (it is not a "corner" of the boundary).
Another fact we need is that, if C is convex, then every boundary point
z of the larger convex set Cf' is regular. Indeed, there is a unique point
y of C such that Iz- yI = E , and B ( y ,e ) - c C+'; since any line through
z that supports C+' must also support B(y, E ) - , there can be only one
of them. Since h(C,C+') = E, it follows that every convex set can
be approximated arbirarily closely in the Hausdorff metric by a larger
convex set whose boundary points are all regular and whose interior is
nonempty. (Although the boundary of the approximating set has no
corners, it may contain straight-line segments.)
+
+
+
a,
M18. The Size of +Nets. The basic estimate:
Theorem. There is a constant A such that, for 0 < e < 1, there
exists in T an E-net T ( E )consisting of N ( E )convex polygons, where
logN(E) 5 A
Et
-log-.
Further, for each C in T , there is a C' in T ( E )such that C
h(C,C') < E (approximation from above).
c C' and
It is possible to remove the logarithmic factor on the right in (3l),t
but the proof of the stronger inequality is both more complicated and
t Dudley [21], p. 62.
253
APPENDIX
M
less intuitive. We prove only the inequality (31) as it stands, since this
suffices for the application in Section 15.
PROOF.Let r be a positive integer, later to be taken of the order
fl.
Assume that r 2 8. We want to approximate an element C of
T by a convex polygon with a limited number of sides, and we proceed
by constructing a sequence of approximations C1,(72,. . .. These sets
may not be contained in Q and hence may not be elements of T , but
we can remedy this at the end by intersecting the last approximation
with Q. First, we take C1 to be a convex set that has only regular
boundary points, has nonempty interior, and satisfies C c C1 and
h(C,C1)< l/r2, as we know we can.
For z on dC1, let L ( z ) be the (unique) support line through z,
and let u ( z ) be the unit vector that starts at z,is normal to L ( z ) ,
and is directed away from C1. If p ( z ) is the endpoint of ~ ( z after
)
it has been translated to the origin, then p(.) maps dC1 continuously
onto the unit circle (it may not be one-to-one, since dC1 may have
flat places). For 0 5 j < r , let wj be a point of dC1 such that p(wj)
has polar coordinates 1 and 27rj/r. Since C1 may not be contained
in Q, (29) is not available as it stands. But C1 c [-1,212, hence
CT' c [-2,312, and the argument leading to (29) gives l(dC1)5 25.
Therefore, there are points on dC1, at most 25r of them, such that
.
the distance along dC1 from one to the next is at most 1 / ~ Merge
this set with set of the wj.This gives points XO,X I , .. . , Xk-1, ordered
clockwise (say) around E l , such that, first, 1zi-zi+lI 5 1/r ( i + l = 0
for i = k - l ) ,second, the angle Qi between the support lines L ( z i )and
L(zi+l) satisfies 0 5 Qi 5 27r/r, and third, k 5 26r. Now L ( z i )is the
boundary of two closed half-planes; let H ( z i ) be the one containing
C1. Then C2 =
H ( z i ) is a convex polygon which contains C1.
Suppose at first that all the angles Oi are positive, which implies
that CZhas exactly k sides. Consider successive points xi and zi+l:
and let yi be the point where the two lines L ( z i )and L(zi+l)intersect
(Figure 2). If ai and Pi are the angles at the vertices zi and zi+l of
the triangle ziyizi+l, and if zi is the foot of the perpendicular from yi
to the opposite side, then ai +Pi = Qi. From T 2 8 follows Qi 5 27r/r 5
7r/4 and therefore,t Iyi - ziI = 1zi - ziI t a n a i 5 1
z
i - zi+lltanQi 5
( 1 / ~tan(2nlr)
)
5 47r/r2. This implies that ea,ch point of the segment
from zi to yi is within distance 16/r2 of a point on dC1. The same
i to the corresponding point
argument applies to the segment from z
flfzi
If 0 5 t 5 ~ / 4 then
,
cost 2
114, and so tant 5 t/cost 5 2t.
APPENDIXM
254
on the other side. The segment from yi-1 to yi is one of the k sides
of C2, and each of its points is within 16/r2 of 8Ci. Since C1 C C2, it
follows that h(C1,C2) 5 16/r2.
If a Bi is 0, join the sides lying on L ( z i )and L(zi+l) (lines which
coincide in this case) into one longer side. If we do this for each i such
that Bi = 0, we see that C2 now has 26r or fewer noncollinear sides
(k 5 26r), and the angle between each side and the next is at most
27r/r.
There are infinitely many possibilities for the polygon C2, because
there is no restriction on the vertices. The next step is to replace
the vertices of C2 by elements of the lattice C, of points of the form
( i / r 2 , j / r 2(i) and j integers), which will lead to a convex polygon C3
for which there are only finitely many possibilities.
Consider a vertex yi joining adjacent sides Si and Si+lof C2 (Figure
3 ) . We want to move yi to a nearby element of C,. Extend Si and
Si+l past yi to get rays Ri and R: from yi; let Ai be the open region
bounded by Ri and Ri. Since r > 4 , so that 27r/r < 7r/2, the angle
between Ri and R: is obtuse. It is clear that, for p > l / r 2 , any disc
of radius p contains a point of L=, in its interior. Consider the disc of
radius p that is tangent to the rays & and R;. Since the angle at yi is
obtuse, Iyi - a\ < p = \ b - a \ , and so Jyi- bl < 2 p . Take p = 4/3r2:Ai
contains a point y: that lies in C, and satisfies Iyi - yll < 4 / r 2 .
Let C3 be the convex hull of the points y:. (Some of the yi may
be interior to C3.) Since y: lies in the open set Ai, a ray (Figure 4 )
starting from y: and passing through yi will enter the interior of C2
and will eventually pass between a pair 9; and
(or through one of
them). This means that yi lies in the triangle with vertices y;, y,; and
y;+l and hence lies in the convex hull C3. And since this is true for
each vertex of the polygon C2, it is a subset of C3. Finally, since each
yi is within 4/r2 of a point of C2, we arrive at h(C2,C3)< 4/r2.
The successive Haudorff distances from C to C1 to C2 to C3 are less
than l / r 2 ,16/r2,and 4 / r 2 ,and therefore, h(C,Cs) < 21/r2. And the
sets are successively larger. How many of these convex polygons C3 are
there? Although C3 may extend beyond Q, it is certainly contained in
[-1, 212. The number of points of C, in this larger square is less than
9(r2+ 1 ) 2 and hence less than 36r4. Since each of the 26r or fewer
vertices of C3 is one of these points, the number of polygons C3 is a t
most
Mr =
36r4 5 26r( 36r4
2 6 r ) 5 26r(36r4)26'.
yi-1
c y( )
a=1
APPENDIX
M
255
(The first inequality holds if 26r is less than half of 36r4, which only
requires r > 1.) Clearly, log Mr 5 Br log T for a constant B.
If C4 = Q n C3, then C4 is a convex polygon containing C ; then
h(C,C4) < 21/r2,and MT bounds the number of these special elements
2 r - 1. Then
of T . Given an E , take r an integer such that r >
21/r2 < E , and if E is small enough, then r 2 8 (a condition used in
the construction). Since r 5 2 m , (31) follows.
0
Suppose that C is in T , and choose a polygon C: in T ( E in
) such a
way that C c C! and h(C,C:) < E . Now let C: = [z:d ( z ,(C:)') 2 E ] .
Then C: is convex and closed. And we can show that C: c C , that
is, that 2 $ C implies d ( z ,(C:)') < E . Since this inequality certainly
holds if z 4 C:, assume that z E C: - C. If y is the point of C nearest
to IC (Figure 5 ) , then C is supported by the line through y that is
perpendicular to the segment from y to z. Extend this segment until
it meets aC: at the point z . Then y is the point of C nearest to z ,
and since h(C,C:) < E, we have Iy - z ( < E , and hence d ( z ,(C:)") < E .
Therefore, C: C C C Cr.
A simple argument shows that X(C:-C:)
5 l ( a C : ) ~5 9 ~ On
: each
side of the convex polygon C: construct a rectangle that has height E
and overlaps C: (Figure 6 ) . The extra factor of 9 can be eliminated by
9 place of E and noting that replacing E by ~ / in
9
starting over with ~ / in
(31) gives a function of the same order.+ (It is not true that h(C:, Cr)
is always small: Take C: to be an isosceles triangle with angle E at the
apex.)
To sum up: If C is an element of T , there are an element C: of
T ( E )and a companion convex set C:, such that
Since C: is completely determined b y C:, it follows that the number of
possible pairs (CL, C:) is N ( E ) .
Let TObe the countable class of finite intersections of sets contained
in U, T ( l / n ) . If Cn =
Cy,kl then Cn E To and h(C,Cn)3 0,
and since C c Cn+l c Cn, it follows that CT1
1 C.
t In marking examination papers, Egorov (of Egorov's theorem) used to take off
a point if a student ended up with 2~ instead of c.
APPENDIXM
256
PROBABILITY
M19. Etemadi’s Inequality. If S1,.
. . ,S n are sums of independent
random variables, then
To prove this, consider the sets Bk where
j < k. Since the B k are disjoint,
2 3 a but
lSjl
< 30 for
M20. Bernstein’s Inequality. Suppose that 171,. . . ,qn are independent random variables, each with mean 0 and variance 0 2 ,and each
qn; it has mean 0 and variance
bounded by 1. Let S = 171
s2 = no2. Then
+ +
+
This is a version of Bernstein’s inequality.
To prove (33), suppose that 0 < t < 1 and put G ( t ) = 1/(1 - t ) .
Then, since 17: 5 7: for r 1 2,
By independence,
APPENDIX
M
257
and therefore, for positive y,
> etY]I e-tyE[etS] 5 exp[3G(t)
t2s2
P[S 2 y] = P [ ets -
- ty]
,
We want to bound the probability on the left, and the method will be
clear if at first we operate as though G ( t ) were constant:G(t) = G.
This makes it easy to minimize the right side over t. The minimum is
at
t=- Y
(34)
s2G'
and substitution gives
(35)
Although this argument is incorrect, a modification of it leads a
good bound. Replace the G in (34) by 1/(1- t ) and solve for t : t =
y/(s2 + y). Since this t is between 0 and 1! (35) does hold if we replace
the G there by 1/(1- t ) = ( s 2 y)/s2:
+
Replace y by xJ;I and use s2 = no2;the resulting inequality, together
with the symmetric inequality on the other side, gives (33).
M21. R6nyi Mixing. Events Al,A2,.. . in ( Q , F P, ) are mizing in
the sense of Re'nya if
(36)
P(A, n E ) -+ aP(E)
for every E in 3. In this case, P(A,) + a. Suppose that 30is a field
in 3,and let 3 1 = ~ ( 3 0 ) 30
: C3
1 C 3.
Theorem. Suppose that (36) holds for every E in 30and that
A, E 3 1 for all n. Then
(37)
holds af g is 3-measurable and P-integrable; in particular, (36) holds
for all E in 3. And af P dominates PO, a second probability measure
o n 3 , then
(38)
Po(&)
+
a.
APPENDIX
M
258
PROOF.The class of 3-sets E satisfying (36) is a X-system (use the
M-test) which contains the field 30and therefore [PM.42] contains 31.
It follows that (37) holds if g is the indicator of an 31-set and hence
if it is a simple, F1-measurable function. If g is 31-measurable and
P-integrable, choose simple, F1-measurable functions g k that satisfy
19k1 I
191 and 9k
9. Now
+
00: (37) follows by the dominated
Let n + 00 and then let k
convergence theorem.
Now suppose that g is 3-measurable and P-integrable and use conditional expected values. Since A, € 3 1 and E[g((.Fl]satisfies (37), it
follows that
--f
L , g d P = L n E[gll31]dP + f2
1
E[gJl31]dp = f2 1 9 d P ;
(37) again holds. To prove (38), take g = dPo/dP.
0
M22. The Shift Operator. Let R+03be the space of doubly infinite
sequences z = (. . . , ~ - ~ ( z ) , ~ o ( ~ ). .,.)~of~ real
( z )numbers;
,
the Jk are
the coordinate variables. Take 72:"
to be the field of finite-dimensional
sets (cylinders), and let R+" = o(R$"). Let T be the (left) shift
operator, defined by the equation &(Tz)= En+1(z).If 6 is a function
on R+", define BT as the function with value B(Tz) at z;for example,
SnT = En+l.
Define R", R", and
as in Example 1.2: z = ( ~ { ( x ) , & (. z. .).
),
Let T' be the shift operator here: <L(T'z)= tk+l(z)for n 2 1.
Any stochastic process . . .q-1, 770, 771,. . . can be realized on the
space (R+03,R+"),in the sense that, by Kolmogorov's existence the-
",r
under which the
orem, there is a probability measure P, on ",+"
coordinate process {&} has the same finite-dimensional distributions
as { qn}. Since all convergence-in-distributionsresults depend only on
the finite-dimensional distributions, {qn} and {&} are equivalent for
our purposes. Now (7,) is stationary if and only if T preserves P, on
R+03: P,T-l = P,. And by definition, (77,) is ergodic if T is ergodic
under P,. Similarly, a one-sided process (771,772, . . .) can be realized on
(R', R", Pb) for the appropriate Pb; the original process is stationary
if and only if T' preserves Pb and is by definition ergodic if T' is ergodic
under P,.
APPENDIX
M
259
But if (71,7 2 , . . .) is stationary, it can also be represented on the
space (RS",RS", P,) if, under P,, (&, . . . , &) has the distribution of
(71 . . . ,q1+2)--2L)
for all u and v. Thus a stationary one-sided process
can be extended to a stationary two-sided process with the same finitedimensional distributions. And T , T', and the two processes are all
ergodic or none is. This is because ergodicity depends only on the
finite-dimensional distributions: For example, if T is ergodic under P,
then it follows by the ergodic theorem and the bounded convergence
theorem that
(39)
C
1 ,
P(A n T - ~ B )-+ P(A)P(B)
n
k=l
-
for all A and B in R+". Conversely, this implies ergodicity: If A = B
is invariant, then PA is 0 or 1. And (39) holds in general if it holds for
A and B in Rt": Given A and B in R+" and an E , choose sets A,
and B, in R
:" in such a way that P(AAA,) < E and P(BAB,) < E ,
and prove (39) by approximation.
If E[&ll<ll.. . ,<,-I] = 0, so that ((I,&, . . .) is a stationary martingale difference, then the two-sided extension satisfies the condidtion
E[&ll&k,.
. . ,&-I] = 0 for each k 2 1, and it follows by a standard
convergence theorem [PM.470] that E[&ll. . . ,&-2, &-1] = 0.
If F is a a-field in RSm,let T-lF be thz a-field of sets T-lA for
A E F.Suppose that P is preserved by T . If 8 (measurable R+") is
integrable and A E Fl then, by change of variable,
and therefore,
(40)
E[OllF]T = E[8T(IT-1F].
If Fn = a [ & : k 5 n], then T-'F, = & + I ,
variables & are integrable, then (40) gives
(41)
In particular,
and if the coordinate
E[JillFnlT = E[<Z+lIl-Tn+ll.
APPENDIXM
260
For any 0, {OTn:
n = 0, f l ,. . .} is a stationary stochastic process, and
[PM.495] if T is ergodic, so is this new process. Thus {E[<n113n-1]}
and { E[<z/I.Fn-l]} (for 4; integrable) are ergodic processes.
M23. Uniform Mixing. Let 11 . 11 stand for the L2 norm, and write
E A if 4 is A-measurable. There are three standard measures of
dependence between o-fields A and l?:
5
(43)
a(A,B) = S U ~ [ J P (nAB ) - P A . PBJ:A E A, B E l?],
These measures are successively stronger; specifically,
To prove the left-hand inequality, simply take
I B - PB.
xi
< = I A - PA and 77 =
xj
The right-hand inequality in (46) is harder to prove. It is enough
u~IA
and
~q =
vjI~~,
to consider simple random variables E =
where {Ai) and { B j }are finite decompositions of R into A-sets and
&sets, respectively. By Schwarz's inequality
Hence it is enough to prove that
APPENDIX
M
261
Since
j
a
(47) will follow if we show that
C IP(BjlAi)
-
PBjI I 2 d A , B )
j
holds for each i. If C z [C,:] is the union of those Bj for which the
difference P(Bj(Ai) - PBj is positive [nonpositive], then C z and Ctlie in 13, and therefore,
C IP(Bj1Ai)
-
PBjl = [P(CTIAi) - PCT]
j
+ [PCF - P(CJAi)]
I
2 d A , a).
This proves the right-hand inequality in (46). (It is easy to strengthen
the inequality between the first and third members of (46) to a(A,23) 5
$44 a).)
Finally, if q is B-measurable and has a second moment, then
(48)
IIE[rllldI
-
E[rllll
I P(A,B)llrl- E[77111.
We may assume that E[q] = 0. By the definition (44))IE[qllA]q]l 5
p(d,B)llE[q11A]II. 11q11. But the left side here is IE[E[qllA]qllA]]I=
lE[(E[qllA])2]1= IIE[q11d]J12,from which (48) now follows.
M24. The Gauss-Kuamin-L6vy Theorem. The theorem is this:
Theorem. Suppose that fo has two continuous derivatives o n [ 0,1]
and that f o ( 0 ) = 0 and fo(1) = 1. Define f l , f 2 , . . . b y the recursion
C[fn(')
3
f n 3( +Lt ) ]
W
(49)
Then, f o r n
and
fn+dt) =
j=1
2 0 and 0 I
t I 1,
-
APPENDIX
M
262
where Is,(t)l
v Is;(t)l I
K B ~K, > 0, o < B < I.+
This can be used for a simple proof of the uniform mixing condition
(19.38). Let
A
(52)
= [z:u ~ ( z=
) ai, 1 1. i
5 k],
and define fn(t) as the conditional probability P([X:T"~Z 5 t](A),
where P is Gauss's measure (19.22). Since
00
j=1
1
j+t
3
it follows by countable additivity that the f, satisfy (49). Let a(t)
[,f3(t)]
be the smaller [larger] of
and G+.
. .+d-.
Then, since x = d m
d a k ( x ) T k x , A n [x:Tkx5 t] is the
set of irrationals in [ ~ ( t@
) ,( t ) ]Therefore,
.
fi+..+&
+ +
+
and fo satisfies the conditions of the Gauss-Kuzmin-Levy theorem.
- *, it follows by the
Since Tk+,-lz =
uniqueness of the partial quotients in an expansion that the general
element of the a-field &+, = a ( ~ k + ~ , a ~ +.,.+.) lhas
,
the form B =
[x:Tk+n-lx
E HI for H E R'. Therefore, by (50),
+d
P(B1A) =
where
that
s,
a+
fL-l(t) d t = P H
+ &-l,
18,-11 5 KOn-'. But P H = PB by stationarity, and it follows
IP(A n B ) - P A . PBI 5 KBn-lPA
(53)
holds if A has the form (52) and B E .Fk+n. The class of sets A satisfying (53) is closed under the formation of countable disjoint unions,
and each element of %k = ~ ( a l ,. . ,a h ) is such a union of sets (52).
Therefore, (53) holds for A E 7-lk and B E &+,, which proves (19.38).
~
~~
~
~
~~
t See Rockett&Sziisz [58], p. 152 for the proof. They do not give (50) (which
obviously implies (51)) as part of the statement of the theorem, but it follows from
their method of proof.
263
APPENDIX
M
M25. Normal Tails. For x > 0,
Therefore, for x
> 0,
(54)
This obviously follows from (54) if x 2 l/G.In the opposite case,
we have P[N x] 5 < e-1f4R 5 e-"f2.
For an inequality in the opposite direction, assume that 0 5 x < y.
Since the normal density exceeds e - Y 2 / 2 / f i on (x, y), we have
i
P[x < N
Y - xe-y2/2
< y]2 -
&
If [B(s):O5 s 5 t] is a Brownian motion and a 2 0, then, by
symmetry and the reflection principle (see (8.20)),
- B ( s ) 2 a] = 4P"
P[sup,,t IB(s)I 2 a] = 2P[SUP,<t
Combine this with (54) to get
Combine it with (55) to get
2 a/&].
Convergence of Probability Measures
Patrick Billingsley
Copyright 01999 by John Wiley & Sons, Inc
SOME NOTES ON THE PROBLEMS
1.11. Given E and a compact K , cover K by finitely many balls
B ( x i ,ri),i 5 k, for which xi E K , ri < ~ / 2 and
, B(zi,~ i ) - is compact;
take G = U i B ( z i , r i )and K1 = G-. Choose 6 so that 6 < ~ / 2and
p ( K , G C )> 6. Then K C K b C K1 C K'. If f(x) = (1 - p ( x , K ) / 6 ) + ,
then IK 5 f 5 I K ~and
, hence the uniformly continuous f has compact
support. If Pf = Qf, then PK Pf = Qf 5 QK1 5 QK'. It follows
that P K 5 QK and (symmetry) PK = QK.
1.17. Suppose that L is locally compact and dense in the open
ball B , and arrange that B- n L is compact; choose points x, of B so
that p(zm,2), 2 E > 0 for m # n, then choose points yn of B- L so
that p ( y n , x n ) < ~ / 3 and
,
conclude that B- n L is not compact.
1.18. Take T = (0,1]. Let X be a Hamel basis for [l,GO) that
contains 1 [PM.198], and take U to consist of the reciprocals of the
)
elements of X\{l}. Consider the elements of Cb(T)defined by z U ( t=
sin(27r/ut) for u E U . Choose a and p so that 0 < a/27r < p/27r <
1/4. Then, for each distinct u and 2) in U , there is (Hardy&Wright
[36], Theorem 443) a t in T such that ( { l / u t }{l/vt})
,
E (0, a / 2 ~x)
(p/27r, where the brackets denote fractional part. But then we have
lxu(t)-xv(t)l2 sinp-sina > 0. The I[xU-x2,11are therefore bounded
away from 0.
2.8. If { x n } contains a subsequence converging to 2, then P = 6,
is easy to prove. Suppose on the other hand that {z,} contains no convergent subsequence. Then, for each k, {z,} contains no subsequence
converging to xk, hence there is an rk such that z, $ B ( z k , r k )for inQ ) ) = 0.
finitely many n, and hence P ( B ( x k r, k ) ) lim inf, bZn(B(zk,
But if F consists of the xk and G = F C ,then F is closed (no sequence
in F converges) and G is open, so that P G liminf, 6,,G = 0. But
B ( Q ,~ k ) ,a countable union of sets of P-measure 0.
now S = G U
2.9. Let F p be the field (Problem 2.5) of P-continuity sets. If F
is closed, then F EJ F and F' E F p for all but countably many 6 . The
<
i),
<
uk
264
<
SOMENOTES ON
THE
PROBLEMS
265
field F p therefore generates S, and [PM.169] for each A in S and each
E , P ( A A B ) < E for some B E F p . But P ( B - - B ) = 0: For each
A in S and each E , P ( A A F ) < E for some closed F . If Plfl < 00,
k
there is, for given E , a simple hl = Ci&lAi
(ti # 0) such that
P ( f - hll < E . Choose closed sets Fi such that P(AiAFi) < €/Itilk
and set h2 =
&IF,; then Plf - h2l < 2 6 . Now choose 6 so that
PFi 5 PF,“ < PFi ~/lti(lc
for each i. Let gi(x) = (1 - p ( x , Fi)/S)+
and g = C tigi. Show that Plf - g1 < 3 ~ .
3.8. It would follow that (Szn- Sn)/fi = fiS2n/d% - S n / f i
converges in distribution to N and in probability to (fi
- 1)q. But
l ) q would have the same distribution as N , which is
then r] and
impossible.
4.2. Assume that X n
X (on A ) . To show that ( X Y , X ; )
( X l , X 2 ) , define pij(x,y) as x y / ( l - x) if x # 1; if x = 1, take it to be
1 or 0 as j = i 1 or not. Let J consist of the pairs of distinct positive
integers and let J k consist of the ( z , j )in J for which i , j I k . Suppose
that f is bounded and continuous on R2. By Theorem 3.5,it will be
enough to show that
+
(a-
*
*
+
That this holds if J is replaced by J k is clear. From C J p i j ( X ? ,Xj”) =
C J p i j ( X i , X j )= 1, it follows that C J , . p i j ( X ? , X j ” ) * C J k c p i j ( X , , X j ) .
Therefore, for given E and r ] , there exist k and no such that n 2 no imp i j ( X 1 ,Xj”) > E ] < r ] . Now use the fact that f is bounded
plies P[CJL
to derive (1).
4.3. Let G and ll have the GEM and Poisson-Dirichlet distributions, so that pG =d n. In the argument leading to Theorem
4.4,we showed that aZn
G and concluded by Theorem 4.1 that
pZn = puZn + pG =d n. But now (see the preceding problem)
apZn
an and apZn =d aZn + G, so that an =d G. And
=d U n =d G.
aG =d
5.12. (a) The class of Bore1 sets in Q has the power of the continuum [PM.36]. (d) I f f is bounded and uniformly continuous, then
*
*
where wfis the modulus of continuity.
SOMENOTES ON
266
THE
PROBLEMS
6.4. Consider the functions p'(z,
6.5. It is easy to show that P, --+ P (weak*) implies P, J O PO.
Suppose now that P, JO Po and that f is bounded and continuous
(and hence S-measurable). If we can, for given r ] , construct a bounded,
continuous, So-measurable g such that f 5 g and Pg 5 Pf r ] , then
P, -P P (weak*) will follow without difficulty. Assume that 0 < f < 1,
and choose points ti such that 0 = t o < * . . < tl, = 1, ti - ti-1 < r ] ,
and P[f = ti] = 0. If Fi = [ti-l 5 f 5 t i ] , then f 5 CitiIFi and
C i t i P F i 5 Pf 7. Choose c SO that C i t i P ( F t ) < C i t i P F i q.
Define gi by (6.1) with Fi in place of F ( M supports P ) , and take g =
tigi. Then f 5 g and Pg 5
tiP(((F;>"n M ) ' ) =
tiP(F;) <
tiPFi 7 5 Pf 2 ~ .
7.2. See Dudley [23],p. 40.
8.1. Show that every locally compact subset of CO= [xE C: z(0) =
01 is nowhere dense and that each ball in COhas positive Wiener measure.
a).
+
xi
+
+
+
xi
+
Xi
Convergence of Probability Measures
Patrick Billingsley
Copyright 01999 by John Wiley & Sons, Inc
BIBLIOGRAPHICAL NOTES
Since this second edition is more a textbook than a research monograph, I think few historical notes are needed. In [6] (the first edition
of this book) there are some notes of this kind; for more extensive accounts of the history, see Dudley [23], Ethier & Kurtz [26], Hall & Heyde
[35], Jacod & Shiryaev [39], and Pollard [53].
Books devoted pimarily to weak convergence are Davidson [15]
(for students of economics), Ethier & Kurtz [26], Jacod & Shiryaev (391,
Parthasarathy [48], Pollard [53], Shorack & Wellner [SO], and van der
Waart & Wellner [65]. Books partly devoted to weak convergence are
Dudley [23], Durrett [24], Hall & Heyde [35], Karatzas & Shreve [40],
and Strook & Varadhan [63].
Section 4. Vervaat [66], p. 90, shows that (4.13) has (4.14) as its
Laplace transform and connects it with limit theory for record values.
Griffiths [33], p. 145, gives the joint moments for the density (4.15);
this generalizes (4.17). For a different approach to Theorem 4.4, see Arratia, Barbour, &Tavark [4]. For the case of cycle lengths (0 = l ) , the
result is due to Shepp & Lloyd [59]. Theorem 4.5 was proved in [8]; the
proof given here is that of Donnelly& Grimmet [17]. For the proof of
Theorem 4.5 itself, the argument in [8]is the most straightforward, but
the Donnelly-Grimmet approach is better because it puts the result in
a very general and interesting context; see Arratia, Barbour, & Tavar6
[3] and [4], and also Arratia [2]. For more on computation, see Griffiths
[33] and [34]. The arguments at the end of Section 4 are laborious but
require only calculus.
Section 5. Prohorov’s fundamental paper is (561. The proof of
Theorem 5.1 given here is from [7]; for a similar construction on locally
compact spaces, see Halmos [36], p. 231. The construction in Problem
5.12 is due to Leger; see Choquet [13]. In this connection, see also
Preiss [55].
Section 6. “Weak’ly” can be pronounced “weak-circly.”
267
268
BIBLIOGRAPHICAL
NOTES
Section 7. The second proof of Theorem 7.5 is due to Wichura [67].
Section 10. The maximal inequalities of this section are from [6].
The proofs here, much easier than those in [6], are due to Michael
Wichura. See Bickel & Wichura [5] for extensions to multidimensional
time.
Section 11. Theorem 11.1 is from a manuscript version of [6]; it
was cut to keep the book short. For generalizations, see Gonz6les
Villalobos [30]. For a different approach to functional limit theorems
for lacunary series, see Philipp & Stout [52].
Section 12. Skorohod’s basic paper is [61].
Section 14. The proofs here all follow the same pattern: We prove
weak convergence of the finite-dimensional distributions and then verify tightness. Some patterns of proof circumvent the tightness issue.
An application of Theorem 12.6, for example, requires no consideration
of tightness, since that is effectively built into its hypothesis; see the
proof of Theorem 17.3. Theorem 19.4 of [6] is another example. For a
systematic treatment of this approach, see Ethier & Kurtz [26].
Section 15. Theorem 15.1 is from Dudley [20] and Theorem 15.2
from Bolthausen [ll]. The proof in [ll] uses the theory of analytic
sets, but as the proof here shows, this is unnecessary. The theorem on
the size of €-nets [M18] is due to Dudley [21].
Section 16. The development is based on Lindvall [43] and Aldous
PI
*
Section 17. Theorem 17.2 is another result cut from [6] for reasons
of space. For a different approach to these problems, see Philipp [49].
De Koninck and Galambos [16] have a result related to Theorem 17.3;
they prove convergence of the finite-dimensional distributions but do
not formulate a functional limit theorem.
Section 18. For statistical applications of martingale limit theorems, see Hall & Heyde [35].
Section 19. In [6], thirty pages (Sections 20 and 21) were necessary
to cover what is achieved very simply and efficiently in this section by
the use of Gordin’s method of approximating stationary processes by
martingales; see Gordin [31]. Although convergence to diffusions was
briefly treated in [6] (see the comments on Section 14 above), it has
become a subject of its own, and I omit any account of it here. See
Ethier&Kurtz [26] for a systematic treatment; a good place to start
is Durrett [24]. I have cut Section 22 of [6], on empirical distribution
functions for dependent random variables.
BIBLIOGRAPHICAL
NOTES
269
Section 20. For a survey of various kinds of invariance principles,
see Philipp [51]. Theorem 20.2 can be found in Breiman [12], p. 279,
and in Freedman [29], p. 83.
Section 21. The second proof of Theorem 21.1 was shown to me
by Walter Philipp; it is based on [50].
Section 22. The proof is from Strassen [62] and (on several points)
Freedman [29].
Convergence of Probability Measures
Patrick Billingsley
Copyright 01999 by John Wiley & Sons, Inc
BIBLIOGRAPHY
1. David Aldous: Stopping times and tightness, Ann. Probab. 6(1978)
335-340.
2. Richard Arratia: On the central role of scale invariant Poisson processes on (0, m), Discrete Math. Theoret. Comput. Sci., 41(1998)
21-41.
3. Richard Arratia, A.D. Barbour, and Simon Tavard: Random combinatorial structures, Notices Amer. Math. SOC.,44( 1997)903-910.
4. Richard Arratia, A.D. Barbour, and Simon Tavark: Logarithmic
Combinatorial Structures, book in preparation.
5. P.J. Bickel and M.J. Wichura: Convergence criteria for multiparameter stochasic processes and some applications, Ann. Math.
Statist., 42(1971)1656-1670.
6. Patrick Billingsley: Convergence of Probability Measures, first edition, New York, Wiley, 1968.
7. Patrick Billingsley: Weak Convergence of Measures: Applications
in Probability, Philadelphia, SIAM, 1971.
8. Patrick Billingsley: On the distribution of large prime divisors,
Periodica Mathematica Hungarica, 2( 1972)283-289.
9. Patrick Billingsley: Probability and Measure, third edition, New
York, Wiley, 1995.
10. Harald Bohr: Almost Periodic Functions, New York, Chelsea, 1947.
11. Erwin Bolthausen: Weak convergence of an empirical process indexed by the closed convex subsets of 12, 2. Wahrscheinlichkeitstheorie 43(1978)173-181.
12. Leo Breiman: Probability, Reading, Addison-Wesley, 1968.
13. G. Choquet : Sur les ensembles uniformkment ndgligeables, Sem.
Choquet (Initiation Ci l'analyse), Paris, ge annke (1969/1970) no. 6.
270
BIBLIOGRAPHY
271
14. J. Czipszer and L. Gehkr: Extension of functions satisfying a Lipschitz condition, Acta Math. Acad. Sci. Hungar. 6(1955)213-220.
15. James Davidson: Stochastic Limit Theory, New York, Oxford University Press, 1994.
16. J.-M. De Konink and J. Galambos: The intermediate prime divisors of integers, Proc. Amer. Math. SOC.101(1987)213-216.
17. Peter Donnelly and Geoffrey Grimmett: On the asymptotic distribution of large prime factors, J. London Math. SOC.(2), 47(1993)
395-404.
18. M. Donsker: An invariance principle for certain probability limit
theorems, Mem. Amer. Math. SOC.6(1951).
19. J.L. Doob: Stochastic Processes, New York, Wiley, 1953.
20. R.M.Dudley: Weak convergence of probabilities on nonseparable
metric spaces and empirical measures on Euclidean spaces, I1linois
J. Math. 10(1966)109-126.
21. R.M. Dudley: Metric entropy of some classes of sets with differentiable boundaries, J . Approximation Th. 10(1974)227-236; Correction, ibid. 26(1979)192-193.
22. R.M. Dudley: A Course on Empirical Processes, Springer Lectures
Notes 1097(1984), New York, Springer.
23. Richard M. Dudley: Real Analysis and Probability, Pacific Grove,
Wadsworth and Brooks/Cole, 1989.
24. Richard Durrett: Stochastic Calculus, Boca, Raton, CRC Press,
1996.
25. H.G. Eggleston: Convexity,Cambridge, The University Press, 1958.
26. Stewart N. Ethier and Thomas G. Kurtz: Marlcov Processes: Characterization and Convergence, New York, Wiley, 1986.
27. W.J. Ewens: Mathematical Population Genetics, New York, Springer, 1979.
28. William Feller: A n Introduction to Probability Theory and Its Applications, third edition, New York, Wiley, 1968.
29. David Freedman: Brownian Motion and Diflusion, San Francisco,
Holden-Day, 1971.
30. Alvaro GonzBles Villalobos: Some functional limit theorems for
dependent random variables, Ann. Probab. 6( 1974)1090-1107.
31. M.I. Gordin: The central limit theorem for stationary processes,
Soviet Math. Dokl. 10(1969)1174-1176.
272
BIBLIOGRAPHY
32. M.I. Gordin: On the behavior of the variances of sums of random variables forming a stationary process, Theory Probab. Appl.
16(1971)474-484.
33. R.C. Griffiths: On the distribution of allele frequencies in a diffusion model, Theor. Pop. Biol.,15(1979)140-158.
34. R.C. Griffiths: On the distribution of points in a Poisson Dirichlet
process, J. Appl. Prob., 25(1988)336-345.
35. P. Hall and C.C. Heyde: Martingale Limit Theory and Its Application, New York, Academic Press, 1980.
36. Paul R. Halmos: Measure Theory, New York, Van Nostrand, 1950.
37. G.H. Hardy and E.M. Wright: An Introduction to the Theory of
Numbers, fourth edition, London, Oxford University Press, 1960.
38. I.A. Ibragimov and Yu.V. Linnik: Independent and Stationary Sequences of Random Variables,Wolters-Noordhoff, Groningen, 1971.
39. Jean Jacod and Albert N. Shiryaev: Limit Theorems for Stochastic
Processes, Berlin, Springer, 1987.
40. Ioannis Karatzas and Steven E. Shreve: Brownian Motion and
Stochastic Calculus, New York, Springer, 1988.
41. John L. Kelley: General Topology,Princeton, Van Nostrand, 1955.
42. A. Ya. Khinchine: Continued Fractions, third edition (English
translation), Chicago, University of Chicago Press, 1964.
43. Torgny Lindvall: Weak convergence in the function space D [0, m),
J. Appl. Probab. 10(1973)109-121.
44. L.A. Liusternik and V. J. Sobolev: Elements of Functional Analysis,
New York, Frederick Ungar, 1961.
45. Peter Major: Approximation of partial sums of i.i.d.r.v.s when the
summands have only two moments, 2. Wahrscheinlich~eitstheorie,
35( 1976)221-229.
46. D. Monrad and W. Philipp: The problem of embedding vectorvalued martingales in a Gaussian process, Theory Probab. Appl.
35(1990)374-377.
47. A.M. Mark: Some probability theorems, Bull. Amer. Math. Soc.
55(1949)885-900.
48. K.R. Parthasarathy: Probability Measures on Metric Spaces, New
York, Academic Press, 1967.
49. Walter Philipp: Mixing sequences of random variables and probabilistic number theory, Mem. Amer. Math. SOC.114(1971).
.
BIBLIOGRAPHY
273
50. Walter Philipp: Weak and LP-invariance principles for sums of Bvalued random variables, Ann. Probab. 8(1980)68-82; Correction,
ibid. 14(1986)1095-1101.
51. Walter Philipp: Invariance principles for independent and weakly
dependent random variables, pp. 225-268 in Dependence in Probability and Statistics, Ernst Eberlein and Murad Taqqu, editors,
Boston, Birkhaiiser, 1986.
52. Walter Philipp and William Stout: Almost sure invariance principles for partial sums of weakly dependent random variables, Mem.
Amer. Math. SOC.,161(1975).
53. David Pollard: convergence of Stochastic Processes, New York,
Springer, 1984.
54. Jean-Pierre Portmanteau: Espoir pour l'ensemble vide? Annales
de 1 'Universite' de Felletin, CXLI(1915)322-325.
55. David Preiss: Metric spaces in which Prohorov's theorem is not
valid, 2. Wahrscheinlichlceitstheorie und Verw.Gebiete 27( 1973)109
-116.
56. Yu.V. Prohorov: Convergence of random processes and limit theorems in probability theory, Theory Probab. Appl., 1(1956)157-214.
57. Philip Protter: Stochastic Integration and Differential Equations,
New York, Springer, 1990.
58. Andrew M. Rockett and Peter Sziisz: Continued Fractions, New
Jersey, World Scientific, 1992.
59. L.A. Shepp and S.P. Lloyd: Ordered cycle lengths in a random
permutation, Trans. Amer. Math. SOC.121(1966)340-357.
60. Galen R. Shorack and Jon A. Wellner: Empirical Processes with
Applications to Statistics, New York, Wiley, 1986.
61. A.V. Skorohod: Limit theorems for stochastic processes, Theory
Probab. Appl., 1(1956)261-290.
62. V. Strassen: An invariance principle for the law of the iterated logarithm, 2. Wahrscheinlichkeitstheorie und Verw. Gebiete, 3( 1964)
211-226.
63. Daniel W. Strook and S.R. Srinivasa Varadhan: Multidimensional
Diffusion Processes, New York, Springer, 1979.
64. M. Talagrand: Les boules peuvent-elles engendrer la tribu borelienne d'un espace mbtrisable non separable? Se'minair Choquet, Paris,
17e annbe: C5, 1978.
65. Aad W. van der Waart and Jon A. Wellner: Weak Convergence
and Empirical Processes, New York, Springer, 1996.
274
BIBLIOGRAPHY
66. W. Vervaat: Success Epochs in Bernoulli trials, Mathematical Centre Tracts, No. 41, Amsterdam, 1972.
67. M.J. Wichura: A note on the weak convergence of stochastic processes, Ann. Math. Statist. 42( 1971)1769-1772.
Convergence of Probability Measures
Patrick Billingsley
Copyright 01999 by John Wiley & Sons, Inc
INDEX
Greek symbols are alphabetized by their representation in roman letters (phi for
so on). A reference [PM.n] is to page n of Probability and Measure, third
edition.
4, and
A,, 127
Aldous, 270
Aldous’s tightness criterion, 176
&-mixing, 201
Arc sine law, 100
Arratia, 269
Arratia, Barbour, & Tavark, 269
Arzel&-Ascolitheorem, 81, 159
Baire category theorem, 241
Ball a-field, 12, 65
Barbour, 269
Base for a topology, 237
Bernstein’s inequality, 256
Beta function, 245
Bickel & Wichura, 270
Bohr, 10
Bolthausen, 270
Borel set, 7
Borel a-field, 7
Boundary, 236
Bounded set, 240
Breiman, 271
Brownian bridge, 93, 100
Brownian motion, 87
Cadlag function, 121
Cauchy sequence, 238
Chaining, 165
Choquet, 269
Compact set, 241
Complete set, 240
Continued fractions, 206, 264
Continuity set, 15
Convergence, in distribution, 25
in probability, 27
Convergence-determining class, 18
Convex sets, 161, 261
Coordinate variables 86
Coupling theorem, 73
Cover, 237
Cram& Wold met hod, [PM .382]
Davidson, 269
De Konick & Galambos, 270
8, 2, 236
&sparse, 122
De Moivre-Laplace theorem, 1
Diagonal method, [PM.538]
Dini’s theorem, 242
Diophantine approximation, 205
Dirichlet distribution, 246
formula, 245
Discontinuity of the first kind, 121
Discrete metric, 236
Distribution function, 25
Distribution of a function on the line, 33
Distribution of a random element, 24
Dominated measures, 148
Donnelly & Grimmett, 269
Donsker’s theorem, 4, 90, 146
Doob, 112
Dudley, 74, 158, 254, 269, 270
Dudley’s theorem, 158
Durrett, 269, 270
Eggleston, 251
275
INDEX
276
Egorov, 255
Empirical distributions, 149
Empirical processes, 156, 161
€-net, 239
Equicontinuity, 81
Equivalent metrics, 237
Erdos, 182
Ergodic process, 197, 260
Ergodic theorem, [PM.314]
Etemadi’s inequality, 256
Ethier & Kurtz, 269, 270
Ewens, 43
Feller, 30, 38, 98, 99, 104
Finer topology, 236
Finite-dimensional sets, in C, 12
in D ,134
in R-, 10
Freedman, 271
Functional analysis, 6
Functional limit theorem, 93
Functions of mixing processes, 202
Fundamental sequence, 240
Gamma function, 245
Gauss-Kuzmin-LBvy theorem, 263
Gaussian random function, 93
Gauss’s measure, 205
GEMdistribution, 43
Golambos, 270
Gonzdes Villalobos, 270
Gordin, 270
Griffiths, 269
Grimmett, 269
Hall& Heyde, 269, 270
Halmos, 71, 246, 269
Hardy& Wright, 50, 182, 266
Hartman-Wintner theorem, 231
Hausdorff distance, 251
Helley’s selection theorem, [PM.336]
Heyde, 269, 270
Incommensurable
(linearly independent)
arguments, 34, 119
Independent increments, 86
Integral limit theorems, 29
Integration to the limit, 30
Invariance principle, 93, 210
Jacod & Shiryaev, 269
Kac, 182
Kac-Steinhaus theorem, 35
Karatzas & Shreve, 269
Kurtz, 269, 270
Kuzmin, 263
L,25
Lacunary series, 113
A, 124
lIAIIo, 125
Laplace transform, 63
Large prime divisors, 49
Larger topology, 238
Law, 25
LBger, 269
LBvy, 100, 263
Lindeberg-L6vy case, 94
Lindeberg-LBvy theorem, [PM.357]
Lindeberg theorem, [PM.359]
Lindelov property, 238
Lindvall, 270
Linearly independent, 34, 119
Lipschitz function, 242
Liusternik & Sobolev, 13
Lloyd, 269
Local limit theorems, 29
Local vs. integral laws, 29
Long cycles, 49
Lyapounov theorem, [PM.362]
m-dependent, 199
M-test, [PM.542]
Mapping theorem, 20
Marriage lemma, 75
Maximal inequalities, 105
Mean value, 32
Measurable S/B,
IPM.1821
[Mnl, 7
Modulus of continuity, 80
Modulus of right continuity, 122
Natural projection, from C , 12
from R-, 10
Net, 239
Norm, 11
Nowhere dense, 241
Open cover, 237
Parthasarathy, 269
&mixing, 201
Philipp, 270, 271
277
INDEX
Philjpp & Stout, 270
n-system, 9
[PM.n],9
Poisson-Dirichlet distribution, 43
Poisson limit, 135
Pollard, 269
Portmanteau, 273
Portmanteau theorem, 15
Preiss, 269
Probability measure, 7
Products of metric spaces 244
Prohorov, 269
Prohorov metric, 72
Prohorov’s theorem, 59
Projection, from C , 12
from D, 133
from Rm, 10
R’, R”, 9
Rademacher function, 148
Ramanujan. 182
Random element, 24
function, 24
sequence. 24
variable, 24
vector, 24
Random walk case. 94
Ranking function, 42
Reflection principle, 93
Regular measure, 7
Relative compactness, of measures, 57
of sets, 81
Relative measure, 32
Relatively compact, 57. 240
Renewal theory, 154
Rdnyi mixing, 257
pmixing, 200, 260
Scheffk’s theorem, [PM.214]
Semicontinuity, 242
Semiring, 19
Separability, 237
Separating class, 9
Shift operator, 258
Shiryaev, 269
Shorack Rc Wellner. 269
a-compact, 9
Sizebiased sampling, 46
Shepp & Lloyd, 59
Shreve, 269
Skorohod, 70, 208. 270
Skorohod representation theorem. 70
topology, 123
Sparse, 122
Steinhaus, 35
Stopping time, 176
Stout, 270
Strssen. 74, 271
Strassen’s theorem, 223
Strassen-Dudley theorem, 74
Strook & Varadhan, 269
Subspaces, 243
Talagrand, 12
TavarB, 269
Tight measure. 8
family of measures, 59
Tight’, 68
Topological completeness, 238
Totally bounded set, 239
Under, 86
IJniform distribution modulo 1, 15
Uniform integrability, 31
Uniform metric, 11
Uniform mixing, 201, 260
Unit mass, 14
Upper semicontinuity. 242
van der Waart k Wellner. 269
Varadhan, 269
Vervaat, 269
W Z ( b ) , 80
w m ,121
W L ( b ) , 122
w;(E). 131
Weak convergence, 7
Weak’ convergence, 67
Wellner, 269
Wichura, 270
Wiener measure, 86
Wiritner, 231
Wright, 50. 182, 266