MATH 7550-01
INTRODUCTION TO PROBABILITY
FALL 2011
Lecture 14. The zero-one law. Convergence of random variables.
So for an infinite sequence of independent events either finitely many of them occur almost surely, or almost surely infinitely many occur (according to whether the series ∑_{𝑖=1}^{∞} 𝑃(𝐴𝑖) converges or diverges). It turns out that this is a particular case of a more general statement, and the reason why this general statement (which will be formulated later) can be applied here is that whether the event {infinitely many of the events 𝐴𝑖 occur} occurs is determined by the tails of the sequence: by its subsequence 𝐴𝑛, 𝐴𝑛+1, 𝐴𝑛+2, ..., for every natural 𝑛 (because 𝐴1, 𝐴2, ..., 𝐴𝑛−1 are only finitely many).
It is more convenient to speak about a sequence of random variables 𝜉1 , 𝜉2 , ..., 𝜉𝑛 , ...;
and we are going to do so (with the events 𝐴𝑖 we can associate their indicator random
variables 𝐼𝐴𝑖 ).
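This dichotomy is easy to observe numerically; here is a quick simulation sketch (the probabilities 1/𝑖² and 1/𝑖 below are chosen just as examples of a convergent and of a divergent series):

```python
import random

# Sketch: independent events A_i with P(A_i) = 1/i^2 (summable series)
# versus P(A_i) = 1/i (divergent series).  By the dichotomy recalled
# above, the number of occurred events should stabilize in the first
# case and keep growing (roughly like log n) in the second.

def count_occurred(prob, n, seed=0):
    rng = random.Random(seed)
    # I_{A_i} = 1 with probability prob(i), independently over i
    return sum(1 for i in range(1, n + 1) if rng.random() < prob(i))

for n in (1_000, 10_000, 100_000):
    print(n,
          count_occurred(lambda i: 1 / i**2, n),  # stabilizes
          count_occurred(lambda i: 1 / i, n))     # keeps growing
```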
Now we go to the set-theoretic introduction to probability theory.
Let 𝜉1 , 𝜉2 , ..., 𝜉𝑛 , ... be an infinite sequence of random variables (independent or
not, does not matter now: we are in the set-theoretic introduction to probability theory,
where the probability 𝑃 and all things based on it, in particular, independence, are not
admitted).
Let us introduce 𝜎-algebras of events in the sample space Ω (i. e., 𝜎-algebras being
parts of the fundamental 𝜎-algebra ℱ):
ℱ≤𝑛 = 𝜎(𝜉𝑖, 1 ≤ 𝑖 ≤ 𝑛),    (14.1)
generated by the random variables 𝜉1, ..., 𝜉𝑛;
ℱ[𝑛, 𝑚] = 𝜎(𝜉𝑖, 𝑛 ≤ 𝑖 ≤ 𝑚);    (14.2)
ℱ≥𝑛 = 𝜎(𝜉𝑖, 𝑖 ≥ 𝑛),    (14.3)
generated by the random variables 𝜉𝑛, 𝜉𝑛+1, 𝜉𝑛+2, ... .
Now we introduce the tail 𝜎-algebra ℱ≥∞ : the limit as 𝑛 → ∞ of the 𝜎-algebras ℱ≥𝑛
defined by (14.3). (The word “tail” is used when we are dealing with infinite sequences, and it means
the parts of the sequences after their 𝑛-th element, for 𝑛 being large. E. g., we can say that convergence
of an infinite series depends only on the tails of the sequence of its terms.)
What does the limit of a sequence of 𝜎-algebras mean?
Remember that 𝜎-algebras are classes of sets (subsets of Ω, in our case); and a
class of sets is just a set whose elements happen to be sets; and that we have defined
what the limit of a sequence of sets is in the cases of a non-decreasing sequence and of a
non-increasing one.
The sequence of 𝜎-algebras ℱ≥𝑛, 𝑛 = 1, 2, 3, ..., is non-increasing:
ℱ≥𝑛 = 𝜎(𝜉𝑛, 𝜉𝑛+1, 𝜉𝑛+2, 𝜉𝑛+3, ...) ⊇ ℱ≥𝑛+1 = 𝜎(𝜉𝑛+1, 𝜉𝑛+2, 𝜉𝑛+3, ...),    (14.4)
because the first 𝜎-algebra has one more random variable, namely 𝜉𝑛, among its generators.
So by definition
ℱ≥∞ = ∩_{𝑛=1}^{∞} ℱ≥𝑛.    (14.5)
Note that ℱ≥∞ is defined as the limit at infinity, and not as 𝜎(𝜉𝑖, 𝑖 ≥ ∞): there is no random variable 𝜉∞ (or random variables 𝜉𝑖 numbered with indices that are greater than ∞: all 𝑖’s are just natural numbers).
The intersection of any number of 𝜎-algebras is again a 𝜎-algebra; so ℱ≥∞ is one.
There are no random variables 𝜉𝑖 that participate in generating this 𝜎-algebra: 𝜉𝑛 is excluded at the stage at which we include ℱ≥𝑛+1 in our intersection. Does this mean that the 𝜎-algebra ℱ≥∞ is the same as the 𝜎-algebra in Ω generated by an empty set of random variables – i. e., the 𝜎-algebra {Ω, ∅} consisting of two events only? No, as the following example shows:
Let 𝐴1 , 𝐴2 , ..., 𝐴𝑛 , ... be a sequence of events. Let the random variables 𝜉𝑖 be their
indicators: 𝜉𝑖 = 𝐼𝐴𝑖 . Then the event
{infinitely many of 𝐴𝑖, 𝑖 = 1, 2, ... occur}    (14.6)
belongs to the tail 𝜎-algebra ℱ≥∞.
Indeed, the event (14.6) is the same as the event
{infinitely many of 𝐴𝑖, 𝑖 = 𝑛, 𝑛 + 1, 𝑛 + 2, ... occur} = ∩_{𝑘=𝑛}^{∞} ∪_{𝑖=𝑘}^{∞} 𝐴𝑖;    (14.7)
this event is represented by applying countable set-theoretic operations to events 𝐴𝑛 , 𝐴𝑛+1 ,
𝐴𝑛+2 , ..., and so it belongs to the 𝜎-algebra 𝜎{𝐴𝑛 , 𝐴𝑛+1 , 𝐴𝑛+2 , ...} generated by these
events, which is the same as the 𝜎-algebra 𝜎(𝜉𝑛 , 𝜉𝑛+1 , 𝜉𝑛+2 , ...) = ℱ≥𝑛 generated by their
indicators.
So we have:
{infinitely many of 𝐴𝑖, 𝑖 = 1, 2, ... occur} ∈ ℱ≥𝑛    (14.8)
for every 𝑛, and the event (14.6) belongs to the intersection of these 𝜎-algebras:
{infinitely many of 𝐴𝑖, 𝑖 = 1, 2, ... occur} ∈ ℱ≥∞.    (14.9)
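The union–intersection structure of (14.7) translates directly into nested all/any quantifiers; the following minimal sketch checks the analogous condition on a finite truncation of the sequence (on a finite horizon one can, of course, only approximate the genuinely infinite event):

```python
def infinitely_often(occurred):
    """Finite-horizon stand-in for the event (14.7), i.e. for
    'for every k there is some i >= k such that A_i occurs'.
    `occurred` is a list of 0/1 outcomes of A_1, ..., A_N."""
    N = len(occurred)
    return all(any(occurred[i] for i in range(k, N)) for k in range(N))

print(infinitely_often([1, 0, 1, 0, 1]))  # True on this horizon
print(infinitely_often([1, 1, 0, 0, 0]))  # False: nothing occurs in the tail
```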
We can produce many more examples of “tail” events belonging to the 𝜎-algebra ℱ≥∞; I am going to show one more example, and see also Problems 21 – 27.
Let 𝜉1, 𝜉2, ..., 𝜉𝑛, ... be a sequence of real-valued random variables. Let us prove that the event
{lim_{𝑛→∞} 𝜉𝑛 = 𝑎}    (14.10)
belongs to the tail 𝜎-algebra ℱ≥∞ (𝑎 is an arbitrary real number).
We know that lim_{𝑛→∞} 𝜉𝑛(𝜔) = 𝑎 means that for every 𝜀 > 0 there exists a natural 𝑘 such that for every 𝑖 ≥ 𝑘 we have ∣𝜉𝑖(𝜔) − 𝑎∣ < 𝜀; so
{𝜔 : lim_{𝑛→∞} 𝜉𝑛(𝜔) = 𝑎} = ∩_{𝜀>0} ∪_{𝑘=1}^{∞} ∩_{𝑖=𝑘}^{∞} {𝜔 : ∣𝜉𝑖(𝜔) − 𝑎∣ < 𝜀}.    (14.11)
We feel a little uncomfortable having an uncountable intersection ∩_{𝜀>0} here; applying uncountable operations to events may, in general, produce a set not belonging to our 𝜎-algebra ℱ of events. But we can take as 𝜀 all numbers of the form 1/𝑚 and rewrite the event as follows:
{𝜔 : lim_{𝑛→∞} 𝜉𝑛(𝜔) = 𝑎} = ∩_{𝑚=1}^{∞} ∪_{𝑘=1}^{∞} ∩_{𝑖=𝑘}^{∞} {𝜔 : ∣𝜉𝑖(𝜔) − 𝑎∣ < 1/𝑚}.    (14.12)
Now, this is clearly an event, and even one belonging to ℱ≥1 , because all events
{∣𝜉𝑖 − 𝑎∣ < 1/𝑚} belong to it.
We can also rewrite this event in the form
{lim_{𝑛→∞} 𝜉𝑛 = 𝑎} = ∩_{𝑚=1}^{∞} ∪_{𝑘=𝑛}^{∞} ∩_{𝑖=𝑘}^{∞} {∣𝜉𝑖 − 𝑎∣ < 1/𝑚}    (14.13)
for an arbitrary natural 𝑛; all events {∣𝜉𝑖 − 𝑎∣ < 1/𝑚} here belong to the 𝜎-algebra ℱ≥𝑛 ,
so the event {lim𝑛→∞ 𝜉𝑛 = 𝑎} belongs to the 𝜎-algebra ℱ≥𝑛 for every natural 𝑛; and so
also to their intersection, the tail 𝜎-algebra ℱ≥∞ .
Theorem 14.1 (the 0 – 1 law). Let 𝜉1, 𝜉2, ..., 𝜉𝑛, ... be a sequence of independent random variables. Then every event belonging to the tail 𝜎-algebra ℱ≥∞ has probability either 0 or 1.
Proof. Let us take
𝒜 = ∪_{𝑛=1}^{∞} ℱ≤𝑛.    (14.14)
This class of sets is an algebra in Ω (but not a 𝜎-algebra: otherwise, since it contains all
events of the form {𝜉𝑖 ∈ 𝐶𝑖 }, it would contain the smallest 𝜎-algebra ℱ≥1 containing all
such events – which is not the case).
Indeed, clearly Ω ∈ 𝒜 (it belongs to every term of the union (14.14)). Now,
about the complement: Let 𝐴 ∈ 𝒜. By its definition (14.14), this means that there exists
a natural 𝑛 such that 𝐴 ∈ ℱ≤𝑛 . Since ℱ≤𝑛 is a 𝜎-algebra, we have 𝐴𝑐 ∈ ℱ≤𝑛 , and of
course this complement belongs to the union (14.14).
Finally, about the union of two (or finitely many) sets. Let 𝐴, 𝐵 ∈ 𝒜; this means
that there exist natural 𝑛 and 𝑚 such that 𝐴 ∈ ℱ≤𝑛 , 𝐵 ∈ ℱ≤𝑚 . Without restriction
of generality we can assume that 𝑚 ≥ 𝑛. Clearly ℱ≤𝑛 ⊆ ℱ≤𝑚 , because the second
𝜎-algebra is generated by a larger number of random variables; so we have also 𝐴 ∈ ℱ≤𝑚 .
From 𝐴 ∈ ℱ≤𝑚 , 𝐵 ∈ ℱ≤𝑚 we obtain: 𝐴 ∪ 𝐵 ∈ ℱ≤𝑚 ⊆ 𝒜.
Clearly, the 𝜎-algebra ℱ≥1 is generated by the algebra 𝒜. (In the lecture, I used, for some
reason, for this 𝜎 -algebra the notation ℱ≤∞ ; the notation ℱ<∞ would be more appropriate – but the
notation ℱ≥1 had been already introduced.)
Now let us prove that every event 𝐴 ∈ 𝒜 and every 𝐵 ∈ ℱ≥∞ are independent. Let
us take the event 𝐴 fixed. By definition, there exists a natural 𝑛 such that 𝐴 ∈ ℱ≤𝑛 . Also
by definition, 𝐵 ∈ ℱ≥𝑘 for every natural 𝑘; in particular, 𝐵 ∈ ℱ≥𝑛+1 .
Let us introduce the algebra
𝒞 = ∪_{𝑚=𝑛+1}^{∞} ℱ[𝑛+1, 𝑚]    (14.15)
(the proof of this being an algebra is the same as for 𝒜).
Every 𝐵 ∈ 𝒞 is independent of our event 𝐴 by Theorem 13.1 (applied to the finitely many random variables 𝜉1, ..., 𝜉𝑛 and 𝜉𝑛+1, ..., 𝜉𝑚). By Theorem 12.2 (we remember that every algebra is a semi-algebra) we get that every 𝐵 ∈ 𝜎(𝒞) = ℱ≥𝑛+1 is independent of 𝐴. Since every 𝐵 ∈ ℱ≥∞ belongs to ℱ≥𝑛+1, we get that every 𝐵 ∈ ℱ≥∞ is independent of every event 𝐴 ∈ 𝒜.
Now we use Theorem 12.2 once more, and get that 𝐵 (an arbitrary event in ℱ≥∞) and every 𝐴 ∈ 𝜎(𝒜) = ℱ≥1 are independent.
Now we take an arbitrary 𝐴 ∈ ℱ≥∞, and 𝐵 = 𝐴. This event belongs to the 𝜎-algebra ℱ≥1, so by what we have proved, 𝐴 and 𝐴 are independent:
𝑃(𝐴 ∩ 𝐴) = 𝑃(𝐴) = 𝑃(𝐴) ⋅ 𝑃(𝐴).    (14.16)
So 𝑃(𝐴) is a solution of the quadratic equation 𝑃(𝐴)² = 𝑃(𝐴); but this equation has only two solutions, 𝑃(𝐴) = 0 and 𝑃(𝐴) = 1.
The theorem is proved.
Many concrete things follow from this theorem: that a series with independent summands can either converge almost surely, or diverge almost surely; that the distribution function of the random variable 𝜂 = lim_{𝑛→∞} (𝜉1 + ... + 𝜉𝑛)/𝑛 can only take the values 0 or 1 (what can you say about such a random variable?); etc.
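For instance, whether the series ∑_{𝑖=1}^{∞} 𝜀𝑖/𝑖 with independent random signs 𝜀𝑖 = ±1 converges is a tail event, so its probability must be 0 or 1; a simulation sketch (the harmonic series with random signs is chosen here just for illustration) suggests that it is 1:

```python
import random

# Sketch: partial sums of  sum_i s_i / i  with independent signs s_i = ±1.
# Convergence of this series is a tail event, so by Theorem 14.1 its
# probability is 0 or 1; every sampled path appears to settle down,
# consistent with probability 1.

def partial_sums(n, seed):
    rng = random.Random(seed)
    s, path = 0.0, []
    for i in range(1, n + 1):
        s += rng.choice((-1.0, 1.0)) / i
        path.append(s)
    return path

for seed in range(3):
    path = partial_sums(100_000, seed)
    tail_oscillation = max(path[-1000:]) - min(path[-1000:])
    print(seed, round(path[-1], 4), round(tail_oscillation, 6))
```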
A question may arise whether a zero-one law is true for an uncountable family of
independent random variables. But, first of all, I would like to remind you that we have not
proved existence of an uncountable family of random variables with prescribed distributions
𝜇𝛼 , 𝛼 ∈ 𝐴 (except in the simple case that all these distributions except countably many
of them are concentrated at one point: 𝜇𝛼 = 𝛿𝑎𝛼 , the unit measure concentrated at the
point 𝑎𝛼 ). I won’t tell you whether such an existence theorem holds (we’ll return to this
later); but if an uncountable family of independent random variables 𝜉𝑡, 𝑡 ∈ [0, ∞), does exist, the stochastic process it forms wouldn’t possess any good properties, e. g., it wouldn’t be continuous in any sense at any point 𝑡 ∈ [0, ∞); and so it would be of no use as a mathematical model for any extra-mathematical phenomenon.
But here we notice that we don’t know what continuity, as applied to random variables,
means. Of course, continuity is based on convergence; and it will be in the next lecture
that we touch upon the question of convergence – or rather convergences – of random
variables.
All of us have heard the following (or something like this): the relative frequency
of an event in a very large number of repetitions of the experiment (under the same
conditions) is approximately equal to its probability; or: the relative frequency of an event
in 𝑛 repetitions of an experiment becomes closer and closer to the probability of the event
as the number 𝑛 of repetitions grows.
For random variables, this can be reformulated as follows: if a real-valued random
variable has a (finite) expectation, the arithmetic mean of its values in a large number
of repetitions of our experiment is close to the expectation of our random variable; or:
becomes closer and closer to it as the number of experiments grows.
We can see that this is, basically, the same thing if we consider, say, a random variable 𝜉 taking three values: 𝑎, 𝑏, and 𝑐; the arithmetic mean of the values of our random
variable in 𝑛 experiments is equal to 𝑎 times the relative frequency of the event {𝜉 = 𝑎},
plus the same for 𝑏 and 𝑐, and this is approximately equal to the expectation.
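A quick simulation sketch of this identity (the values 𝑎 = 1, 𝑏 = 2, 𝑐 = 5 and their probabilities below are made up for the illustration):

```python
import random

# Sketch: for a three-valued random variable, the sample mean equals
# a * freq{xi=a} + b * freq{xi=b} + c * freq{xi=c}, and both are close
# to the expectation for large n.

values, probs = (1.0, 2.0, 5.0), (0.5, 0.3, 0.2)
expectation = sum(v * p for v, p in zip(values, probs))  # = 2.1

rng = random.Random(0)
n = 100_000
sample = rng.choices(values, weights=probs, k=n)

mean = sum(sample) / n
freqs = {v: sample.count(v) / n for v in values}
mean_via_freqs = sum(v * freqs[v] for v in values)

print(mean, mean_via_freqs, expectation)  # all three nearly coincide
```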
These are statements belonging not to mathematics: rather their place is at the borderline between mathematics and the extra-mathematical world: just outside mathematics.
But we can aspire to create a mathematical model for this.
Something has already been done in this direction: an idealized mathematical model
for observing the values of a random variable in repeated experiments is a sequence of
independent random variables; we are happy now that such a thing does exist (as well as
an infinite sequence of independent events with any probabilities).
So we may aspire to have mathematical theorems stating that for a sequence of random variables 𝜉1, 𝜉2, ..., 𝜉𝑛, ..., under such and such conditions, the arithmetic mean (𝜉1 + ... + 𝜉𝑛)/𝑛 of the first 𝑛 of them converges to ... But here we stop for some time: we haven’t specified what conditions are imposed on the random variables 𝜉𝑖, and we don’t even know whether they all have the same expectation 𝐸𝜉𝑖.
However, it is even more important that we haven’t specified in what sense the convergence of random variables should be understood here.
You see, random variables are functions (of the sample point 𝜔); and whereas we
have, basically, only one concept of convergence for sequences of numbers, we have many
types of convergence for functions: e. g., pointwise convergence; uniform convergence; or
L𝑝 -convergence; etc. So we have to introduce and study some types of convergence for
random variables.
It turns out that uniform convergence or convergence at all points do not play any
significant role in probability theory. Let me introduce two other types of convergence.
Let 𝜂1, 𝜂2, ..., 𝜂𝑛, ... be a sequence of random variables on a probability space (Ω, ℱ, 𝑃); 𝜁, a random variable on the same space. We say that the sequence 𝜂𝑛 converges in probability to 𝜁 (notation: 𝜂𝑛 →𝑃 𝜁 (𝑛 → ∞), or: (𝑃) lim_{𝑛→∞} 𝜂𝑛 = 𝜁) if for every positive 𝜀
𝑃{∣𝜂𝑛 − 𝜁∣ < 𝜀} → 1    (𝑛 → ∞).    (14.17)
Of course, (14.17) is equivalent to
lim_{𝑛→∞} 𝑃{∣𝜂𝑛 − 𝜁∣ ≥ 𝜀} = 0,    𝜀 > 0.    (14.18)
We can define convergence in probability also for random variables taking values in an arbitrary metric space 𝑋, supposing the distance dist(𝑥, 𝑦) is a measurable function of the pair (𝑥, 𝑦), by
𝑃{dist(𝜂𝑛, 𝜁) < 𝜀} → 1    (𝑛 → ∞);    (14.19)
but for simplicity’s sake we’ll stick to 𝑋 = ℝ¹ or ℝⁿ.
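A small simulation sketch of this definition (the sequence 𝜂𝑛 = 𝜁 + 𝑈𝑛/𝑛, with 𝑈𝑛 uniform on [−1, 1], is made up for the illustration; here 𝑃{∣𝜂𝑛 − 𝜁∣ ≥ 𝜀} = 𝑃{∣𝑈𝑛∣ ≥ 𝑛𝜀} → 0, so 𝜂𝑛 →𝑃 𝜁):

```python
import random

# Sketch: estimate the probability in (14.18) by Monte Carlo for the
# made-up sequence eta_n = zeta + U_n/n, U_n uniform on [-1, 1].

def estimate_tail_prob(n, eps, trials=100_000, seed=0):
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        u = rng.uniform(-1.0, 1.0)
        if abs(u / n) >= eps:   # |eta_n - zeta| = |U_n| / n
            bad += 1
    return bad / trials

for n in (1, 10, 100):
    print(n, estimate_tail_prob(n, eps=0.05))  # decreases to 0 as n grows
```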
We say that the sequence 𝜂𝑛 converges to 𝜁 almost surely if
𝑃{lim_{𝑛→∞} 𝜂𝑛 = 𝜁} = 𝑃{𝜔 : lim_{𝑛→∞} 𝜂𝑛(𝜔) = 𝜁(𝜔)} = 1,    (14.20)
or, equivalently,
𝑃{𝜂𝑛 ↛ 𝜁 (𝑛 → ∞)} = 0.    (14.21)
Of course, we have to check first that the 𝜔-sets in (14.20) or (14.21) are indeed events. The set under the probability sign in (14.20) consists of all 𝜔 such that the limit exists, and is equal to 𝜁(𝜔). The set
{𝜔 : lim_{𝑛→∞} 𝜂𝑛(𝜔) exists}    (14.22)
belongs to ℱ (is an event), and the limit is an ℱ-measurable function on this set (a fact
from the set-theoretic introduction to measure theory, see Lecture 6, the text around
formula (6.24)). So the set under the probability sign in (14.20) is one on which the
ℱ-measurable function lim𝑛→∞ 𝜂𝑛 (𝜔) − 𝜁(𝜔) belongs to the Borel set consisting of one
point 0, and so this set belongs to ℱ (is an event).
Convergence in probability and almost sure convergence are considered also in measure
theory, under the names of convergence in measure and convergence almost everywhere
(remember that systematic disregard of sets of zero measure or events of zero probability
is a common trait of measure theory and probability theory).
The types of convergence that we introduced have some properties that we are not
accustomed to: namely, the limit in probability – or the almost-sure limit – is not unique,
in general. Indeed, if we have in our probability space some non-empty events having
probability 0 (and such is the situation, for example, always when we are considering
continuous random variables), we can change the limiting random variable 𝜁 arbitrarily
on such a set of probability measure 0, and still it will be a version of the same limit. Let
us say that a random variable 𝜁′ is equivalent to the random variable 𝜁 if 𝑃{𝜁′ ≠ 𝜁} = 0.
Then 𝜂𝑛 →𝑃 𝜁′ if and only if 𝜂𝑛 →𝑃 𝜁, and 𝜂𝑛 → 𝜁′ almost surely if and only if 𝜂𝑛 → 𝜁 almost surely. This is because for equivalent random variables 𝜁 and 𝜁′ we have:
{∣𝜂𝑛 − 𝜁′∣ < 𝜀} Δ {∣𝜂𝑛 − 𝜁∣ < 𝜀} ⊆ {𝜁′ ≠ 𝜁},    (14.23)
∣𝑃{∣𝜂𝑛 − 𝜁′∣ < 𝜀} − 𝑃{∣𝜂𝑛 − 𝜁∣ < 𝜀}∣ ≤ 𝑃{𝜁′ ≠ 𝜁} = 0,    (14.24)
and also
∣𝑃{lim_{𝑛→∞} 𝜂𝑛 = 𝜁′} − 𝑃{lim_{𝑛→∞} 𝜂𝑛 = 𝜁}∣ ≤ 𝑃{𝜁′ ≠ 𝜁} = 0.    (14.25)
To overcome this difficulty (or: to go back to what we are accustomed to), we can
consider these convergences not for random variables, but rather on the set of equivalence
classes of random variables; or we can just state that the limits in these senses are not
unique, but almost unique.
But we have to prove that if 𝜂𝑛 →𝑃 𝜁 and 𝜂𝑛 →𝑃 𝜁′, or if both 𝜂𝑛 → 𝜁 and 𝜂𝑛 → 𝜁′ almost surely, then 𝜁′ ∼ 𝜁 (𝜁′ is equivalent to 𝜁, which means that 𝑃{𝜁′ ≠ 𝜁} = 0).
Suppose first that 𝜂𝑛 →𝑃 𝜁 and 𝜂𝑛 →𝑃 𝜁′. This means that (14.18) holds for every positive 𝜀, and also
𝑃{∣𝜂𝑛 − 𝜁′∣ ≥ 𝜀} → 0    (𝑛 → ∞).    (14.26)
We have, for any positive 𝜀 and natural 𝑛:
{∣𝜁′ − 𝜁∣ ≥ 2𝜀} ⊆ {∣𝜂𝑛 − 𝜁′∣ ≥ 𝜀} ∪ {∣𝜂𝑛 − 𝜁∣ ≥ 𝜀},    (14.27)
so
𝑃{∣𝜁′ − 𝜁∣ ≥ 2𝜀} ≤ 𝑃({∣𝜂𝑛 − 𝜁′∣ ≥ 𝜀} ∪ {∣𝜂𝑛 − 𝜁∣ ≥ 𝜀}) ≤ 𝑃{∣𝜂𝑛 − 𝜁′∣ ≥ 𝜀} + 𝑃{∣𝜂𝑛 − 𝜁∣ ≥ 𝜀} → 0    (𝑛 → ∞).    (14.28)
Since 𝜀 > 0 is arbitrary, we get:
𝑃{𝜁′ ≠ 𝜁} = 𝑃(∪_{𝑚=1}^{∞} {∣𝜁′ − 𝜁∣ ≥ 2/𝑚}) = lim_{𝑚→∞} 𝑃{∣𝜁′ − 𝜁∣ ≥ 2/𝑚} = 0    (14.29)
(because {𝜁′ ≠ 𝜁} = ∪_{𝑚=1}^{∞} {∣𝜁′ − 𝜁∣ ≥ 2/𝑚}).
For the almost-sure convergence it is quite simple:
{𝜁′ ≠ 𝜁} ⊆ {lim_{𝑛→∞} 𝜂𝑛 ≠ 𝜁} ∪ {lim_{𝑛→∞} 𝜂𝑛 ≠ 𝜁′},    (14.30)
𝑃{𝜁′ ≠ 𝜁} ≤ 𝑃{lim_{𝑛→∞} 𝜂𝑛 ≠ 𝜁} + 𝑃{lim_{𝑛→∞} 𝜂𝑛 ≠ 𝜁′} = 0.    (14.31)
It turns out that almost sure convergence is stronger than convergence in probability:
Theorem 14.2. Let 𝜂𝑛 → 𝜁 almost surely. Then 𝜂𝑛 →𝑃 𝜁.
Proof. Suppose 𝜂𝑛 → 𝜁 almost surely. Then we have, for an arbitrary positive 𝜀 (see the similar formula (14.11)):
{𝜔 : 𝜂𝑛(𝜔) → 𝜁(𝜔)} = ∩_{𝜀>0} ∪_{𝑛=1}^{∞} ∩_{𝑖=𝑛}^{∞} {𝜔 : ∣𝜂𝑖(𝜔) − 𝜁(𝜔)∣ < 𝜀} ⊆ ∪_{𝑛=1}^{∞} ∩_{𝑖=𝑛}^{∞} {𝜔 : ∣𝜂𝑖(𝜔) − 𝜁(𝜔)∣ < 𝜀},    (14.32)
1 = 𝑃{𝜂𝑛 → 𝜁} ≤ 𝑃(∪_{𝑛=1}^{∞} ∩_{𝑖=𝑛}^{∞} {∣𝜂𝑖 − 𝜁∣ < 𝜀}).    (14.33)
So the probability on the right-hand side is equal to 1. Let us pass to the complements:
𝑃(∩_{𝑛=1}^{∞} ∪_{𝑖=𝑛}^{∞} {∣𝜂𝑖 − 𝜁∣ ≥ 𝜀}) = 0.    (14.34)
But the events ∪_{𝑖=𝑛}^{∞} {∣𝜂𝑖 − 𝜁∣ ≥ 𝜀} form a non-increasing sequence, so the intersection ∩_{𝑛=1}^{∞} in (14.34) is nothing but the limit lim_{𝑛→∞} of these events, and the probability of the limit is equal to the limit of the probabilities:
lim_{𝑛→∞} 𝑃(∪_{𝑖=𝑛}^{∞} {∣𝜂𝑖 − 𝜁∣ ≥ 𝜀}) = 0.    (14.35)
Now we use the fact that {∣𝜂𝑛 − 𝜁∣ ≥ 𝜀} ⊆ ∪_{𝑖=𝑛}^{∞} {∣𝜂𝑖 − 𝜁∣ ≥ 𝜀}, and get:
lim_{𝑛→∞} 𝑃{∣𝜂𝑛 − 𝜁∣ ≥ 𝜀} ≤ lim_{𝑛→∞} 𝑃(∪_{𝑖=𝑛}^{∞} {∣𝜂𝑖 − 𝜁∣ ≥ 𝜀}) = 0.    (14.36)
As a matter of fact, we did not prove that almost sure convergence is stronger than
that in probability: we proved rather that it is not weaker. But in fact it is stronger:
there are cases in which convergence in probability does take place, but not almost sure
convergence.
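A classical example of this (the “moving bump” construction; it is standard, though it was not given in this lecture): on Ω = [0, 1] with the uniform probability, let 𝜂𝑛 be the indicator of the 𝑗-th of 𝑘 equal subintervals, enumerated sweep by sweep. Then 𝑃{∣𝜂𝑛 − 0∣ ≥ 𝜀} = 1/𝑘 → 0 for 0 < 𝜀 ≤ 1, so 𝜂𝑛 →𝑃 0; but every 𝜔 falls into exactly one interval of every sweep, so 𝜂𝑛(𝜔) = 1 infinitely often and 𝜂𝑛(𝜔) converges for no 𝜔. A simulation sketch:

```python
import random

# Sketch of the classical "moving bump" counterexample: eta_n is the
# indicator of the j-th of k equal subintervals of [0, 1], enumerated
# sweep by sweep (k = 1, 2, 3, ...).  P{eta_n = 1} = 1/k -> 0, so
# eta_n -> 0 in probability; but every omega is hit once per sweep,
# hence eta_n(omega) = 1 infinitely often and never converges.

def bump_interval(n):
    """Interval of eta_n: sweep k consists of k intervals of length 1/k."""
    k = 1
    while n > k:   # peel off sweeps 1, 2, ... until n falls into sweep k
        n -= k
        k += 1
    return ((n - 1) / k, n / k)

def eta(n, omega):
    a, b = bump_interval(n)
    return 1 if a <= omega < b else 0

omega = random.Random(0).random()
hits = [n for n in range(1, 2001) if eta(n, omega) == 1]
print(len(hits), hits[:5], hits[-3:])  # 1's keep coming, ever more sparsely
```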