STA 348: INTRODUCTION TO STOCHASTIC PROCESSES

EVGENIJ KRITCHEVSKI

Date: January 14, 2010.

1. Probability spaces, the different theoretical frameworks.

Probability theory, statistics, and stochastic processes can be studied in various frameworks, at different levels of generality and abstraction. Each framework has its intrinsic mathematical advantages and drawbacks. From the perspective of the student, each framework has its own learning process and offers the learner the opportunity to develop a distinct set of skills. All frameworks share the central concept of a probability space, which is roughly as follows. We are given a set Ω and a "probability" function P that assigns to subsets E of Ω a number P(E) ∈ [0, 1]. One can think of P(E) as the mass of E. The function P needs to be additive, i.e. P(E_1 ∪ E_2) = P(E_1) + P(E_2) whenever E_1 ∩ E_2 = ∅. We also require the total mass to be equal to one, i.e. P(Ω) = 1.

From a purely abstract and somewhat dry point of view, one can say that probability theory is nothing but a theory of mass distributions. The interesting part, however, is that the abstract theory can explain and in some sense predict the outcome of many real life experiments. For example, if you decide to toss a fair coin 1000 times and to record the number x of times that heads appears, then from purely theoretical considerations I can be pretty much sure that you will observe |x/1000 − 1/2| ≤ 0.1. How can I predict this result? Well, I would look at the set Ω of all possible outcomes of the experiment of tossing the coin 1000 times. The outcome of the experiment is a word ω = ω(1), ω(2), ..., ω(1000), where ω(k) = H if the k'th toss yields heads and ω(k) = T if the k'th toss yields tails. There are in total 2^1000 such words, and the set Ω consists of all those words. Then I would choose the mass distribution P giving the same mass 2^(−1000) to each word ω ∈ Ω. That means that for every subset E ⊂ Ω,

P(E) = |E|/|Ω| = |E|/2^1000,

i.e. the number of elements in E divided by the total number of elements in Ω. Then I would look at the subset E of Ω consisting of words ω with |x/1000 − 1/2| > 0.1. By some mathematical tricks (that we will learn) I would compute that P(E) ≤ 10^(−8). That number is very small, so I would think of E as having a very small probability, and therefore I would be pretty much sure that the result ω of the experiment will not belong to the set E.

The simplest possible framework of finite discrete probability: We are given a finite set Ω = {ω_1, ..., ω_n}. There are in total 2^n possible subsets of Ω. Let us denote by 2^Ω the collection of all these subsets. For example, if n = 10, the set E = {ω_3, ω_7} is an element of 2^Ω.

Definition 1.1. A probability measure on Ω is a function P : 2^Ω → [0, 1] such that
(1) P(Ω) = 1,
(2) P(E_1 ∪ E_2) = P(E_1) + P(E_2) for all E_1, E_2 ∈ 2^Ω with E_1 ∩ E_2 = ∅.

In the finite discrete framework, there is a very simple way to describe probability measures. Let p_i = P({ω_i}) for 1 ≤ i ≤ n. Then property (2) implies that

P(E) = ∑_{i : ω_i ∈ E} p_i,

and so the numbers p_i determine the probability measure P completely. Conversely, if p_1, p_2, ..., p_n is a set of numbers such that each p_i ≥ 0 and ∑_{i=1}^{n} p_i = 1, then we can define a probability measure P by setting P(E) := ∑_{i : ω_i ∈ E} p_i. The point here is that the basic building blocks of P are the masses of individual points, i.e. the numbers p_i = P({ω_i}), 1 ≤ i ≤ n.
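The bound in the coin-tossing example can in fact be checked exactly on a computer, since P(E) = |E|/2^1000 reduces to counting the words with a given number of heads: there are exactly C(1000, k) words with k heads. A minimal sketch (in Python; the notes themselves contain no code, and the variable names are illustrative):

```python
from math import comb

# The bad event E: words with |x/1000 - 1/2| > 0.1, where x is the number
# of heads. There are comb(n, k) words with exactly k heads, each of mass
# 2^(-n), so P(E) is an exact ratio of (large) integers.
n = 1000
bad_count = sum(comb(n, k) for k in range(n + 1) if abs(k / n - 0.5) > 0.1)
print(bad_count / 2**n)  # prints a value below 10^(-8), consistent with the bound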
Some terminology: The set Ω together with the function P will be called a discrete finite probability space. The subsets of Ω are called events. The number P(E) is referred to as the probability of the event E.

Some examples:
(1) Ω = {1, 2, 3, 4, 5, 6} and P({j}) = 1/6 for 1 ≤ j ≤ 6.
(2) Ω = {H, T} and P({H}) = P({T}) = 1/2.
(3) Ω = {a, b, c} and P({a}) = 0, P({b}) = 1, P({c}) = 0.
(4) Ω = {Homework, Project, Final} and P({Homework}) = 0.4, P({Project}) = 0.15, P({Final}) = 0.45. Then P({Project, Final}) = 0.15 + 0.45 = 0.6.
(5) Ω is a set of 36 cards. A card is specified by its rank from {6, 7, 8, 9, 10, Jack, Queen, King, Ace} and its suit from {spades, hearts, diamonds, clubs}. For each card ω ∈ Ω, P({ω}) = 1/36. (Remark: (Ω, P) can naturally be seen as a product space; see the sketch at the end of this discussion.) The event E = {ω : ω is an Ace} has probability P(E) = 1/9.

The main advantage of studying probability theory in the present framework is that the finiteness assumption reduces technicalities to the absolute minimum. The main mathematical objects are very easy to define and to understand intuitively. The proofs of most results are easy, since we never have to worry about convergence questions, limits, infinite series, etc. Thus the novice learner can focus on the structure and phenomenology of the theory, start developing probabilistic intuition, and study the heart of the matter, instead of agonizing over technical details. Despite the finiteness limitation, the framework is rich enough to capture fundamental principles such as the law of large numbers, the central limit theorem, and measure concentration.

The main disadvantage of this framework is again the limitation of finiteness. While we can study finite sequences of coin tosses, or more generally, finite words with letters chosen randomly from a finite alphabet, we cannot deal with any kind of infinite sequence of random numbers, and we cannot even formulate statements like "with probability one, a sequence of random numbers converges". Another serious problem is the impossibility of dealing with continuous probability distributions, i.e. having random real numbers chosen uniformly from the interval [0, 1], or chosen from the whole real axis according to the Gaussian density (2π)^(−1/2) e^(−t²/2).
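To make example (5) concrete, here is a minimal sketch (in Python; the names ranks, suits and omega are illustrative, not from the notes) that builds the deck as a product of ranks and suits and recovers P(E) = 1/9 for the Ace event:

```python
from fractions import Fraction
from itertools import product

# The 36-card deck of example (5), built as a product space of ranks and suits.
ranks = ["6", "7", "8", "9", "10", "Jack", "Queen", "King", "Ace"]
suits = ["spades", "hearts", "diamonds", "clubs"]
omega = list(product(ranks, suits))        # |Omega| = 36 cards
P = {w: Fraction(1, 36) for w in omega}    # uniform mass 1/36 per card

aces = [w for w in omega if w[0] == "Ace"]  # the event "the card is an Ace"
print(sum(P[w] for w in aces))              # 1/9
```

Using exact fractions rather than floats keeps the computed probabilities identical to the hand computation.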
The most general framework of measure theory: We are given a nonempty set Ω and we do not impose any restriction on the cardinality of Ω (it could be finite, countable, uncountable, or even bigger). The case when Ω is countable is essentially a very simple generalization of the finite case (see the discussion of the countable discrete framework below). The real difference starts when Ω is at least uncountable, for example Ω = R. The definition of a probability measure on Ω is no longer an obvious matter, and must in general involve something more elaborate than assigning a mass P({ω}) to each point ω ∈ Ω. Why? Well, if we require ∑_{ω∈Ω} P({ω}) = 1, then automatically there must exist a countable subset C ⊂ Ω such that P({ω}) = 0 for all ω ∈ Ω\C. There is no way to sum up to one uncountably many positive numbers.

As in the finite case, let us think of a probability as a mass distribution on Ω. In some cases, we have a good a priori intuitive understanding (i.e. not based on a formal mathematical definition) of a probability measure. Let us look at the unit interval Ω = [0, 1] = {ω ∈ R : 0 ≤ ω ≤ 1}. What should a uniform mass distribution P on Ω look like? The mass of any subinterval is the length of the interval, i.e. P([a, b]) = b − a for all 0 ≤ a ≤ b ≤ 1. The mass of any single point has to be zero, P({ω}) = 0 for all ω ∈ Ω, since a point is the limiting case of a very small interval. Thus P([a, b]) = P((a, b]) = P([a, b)) = P((a, b)). Also, if a set is a countable union of disjoint intervals, S = ∪_{n=1}^∞ [a_n, b_n], then it is very natural to have P(S) = ∑_{n=1}^∞ (b_n − a_n). We see that the basic building blocks of P are the intervals and not the (zero mass) individual points. More generally, if we are given a Riemann integrable function p : R → [0, ∞) with ∫_{−∞}^{∞} p(x) dx = 1, for example p(x) = (2π)^(−1/2) e^(−x²/2), we have a natural mass distribution on R assigning the mass ∫_a^b p(x) dx to each interval [a, b].

In the most general setting, we do not assume any a priori (algebraic or topological) structure on Ω. In order to define a probability measure, the basic building blocks are the subsets of Ω. A probability measure on Ω should be a function P that assigns a number P(E), 0 ≤ P(E) ≤ 1, to subsets E ⊂ Ω. We certainly want the following two properties to hold:
1) P(Ω) = 1.
2) If (E_n)_{n≥1} are pairwise disjoint subsets of Ω, i.e. E_n ∩ E_m = ∅ for n ≠ m, then

P(∪_{n≥1} E_n) = ∑_{n≥1} P(E_n).

It turns out that, in many important cases, trying to define P for all possible subsets E ⊂ Ω leads to catastrophic problems. For example, one can show that there exists no probability measure P on Ω = [0, 1], defined for all subsets E ⊂ Ω, satisfying properties 1), 2) together with P([a, b]) = b − a. In the general theory one therefore defines P(E) only for a certain system of subsets E.

Definition 1.2. Let Ω be a nonempty set. A σ-field on Ω is a nonempty collection F of subsets of Ω that is closed under complements and countable unions, i.e.
(1) If E ∈ F then Ω\E ∈ F.
(2) If (E_n)_{n≥1} is a sequence of subsets of Ω and E_n ∈ F for all n ≥ 1, then ∪_{n=1}^∞ E_n ∈ F.
A measurable space is a pair (Ω, F), where F is a σ-field on Ω.

Note that the definition of a σ-field implies that we must have ∅ ∈ F and Ω ∈ F. Of course F = 2^Ω = {all subsets of Ω} is a σ-field, the biggest possible. At the other extreme, F = {∅, Ω} is the smallest possible σ-field. Most of the time, one works with an "intermediate" σ-field F which is not as big as 2^Ω and not as small as {∅, Ω}.

Definition 1.3. Let (Ω, F) be a measurable space. A function P : F → [0, 1] is called a probability measure if P(Ω) = 1 and

P(∪_{n=1}^∞ E_n) = ∑_{n=1}^∞ P(E_n)

whenever E_n ∈ F for all n ≥ 1 and E_n ∩ E_m = ∅ for n ≠ m. The triple (Ω, F, P) is called a probability space.

This axiomatic definition of a probability space is due to A. N. Kolmogorov (1930s) and is now the standard definition. The framework encompasses the finite setting as a simple special case: Ω is a finite set and F = 2^Ω. In that case |F| = 2^|Ω| is finite, and hence not big enough to cause any problems.

There are a number of challenges when one studies probability in the general framework. 1) There is a high degree of abstraction, and one can have difficulty developing intuition. 2) The rigorous construction of any interesting example is involved. 3) Many objects are defined by a nontrivial procedure, and it requires some effort simply to digest the definitions. 4) There are very important issues of convergence, i.e. limits, infinite series, integrals, interchange of the order of limiting operations, functions taking very small and very large values, etc. All of these challenges are absent from the finite discrete setting.
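For a finite Ω, Definition 1.2 can be explored by brute force: repeatedly closing a collection of subsets under complements and unions (for finite Ω, countable unions reduce to finite ones) produces the smallest σ-field containing the collection. A sketch, assuming Python; generated_sigma_field is an illustrative name, not terminology from the notes:

```python
from itertools import combinations

# Close a collection of subsets of a finite Omega under complements and
# (finite) unions. For finite Omega this yields the sigma-field generated
# by the collection, in the sense of Definition 1.2.
def generated_sigma_field(omega, generators):
    field = {frozenset(), frozenset(omega)} | {frozenset(g) for g in generators}
    changed = True
    while changed:                              # iterate until a fixed point
        changed = False
        for e in list(field):
            c = frozenset(omega) - e            # property (1): complements
            if c not in field:
                field.add(c); changed = True
        for e1, e2 in combinations(list(field), 2):
            u = e1 | e2                         # property (2): unions
            if u not in field:
                field.add(u); changed = True
    return field

F = generated_sigma_field({1, 2, 3, 4}, [{1}, {2}])
print(sorted(map(sorted, F)))  # 8 sets: generated by the atoms {1}, {2}, {3, 4}
```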
The learning process is also slower in the general setting, because one needs to take the necessary time to understand many details. So why would we bother studying probability theory in such great generality when we can stick to the finite setting and understand most of the deep and beautiful results there with less effort? One could go even further and argue that the "real world" is discrete, that infinite spaces are simply approximations to large finite spaces, and that finite probability spaces are the only ones we should care about. The main reward of studying probability theory in the general setting is that one obtains very powerful results that have far-reaching applications in many disciplines (analysis, partial differential equations, quantum mechanics, and many others). General probability theory is very closely related to measure theory (in both theories the objects are almost identical, but measure theory and probability are concerned with different sets of problems). Many fundamental subjects like Fourier series and integrals require an understanding of measure theory.

Continuous probability framework: In this framework one can understand basic properties of random numbers drawn from nice continuous distributions. A probability density on R is a function p : R → [0, ∞) such that ∫_{−∞}^{∞} p(t) dt = 1. One usually wants the function p to be nice enough for the integral to make sense. Piecewise continuous functions are nice enough, and these are all that we will care about. The "probability" or "mass" of an interval E = (a, b) is defined to be the number P(E) = ∫_a^b p(t) dt. One can more generally take E to be a finite union of pairwise disjoint intervals, E = ∪_{i=1}^m (a_i, b_i), and then P(E) := ∑_{i=1}^m ∫_{a_i}^{b_i} p(t) dt.

Some examples:
1) Uniform probability density: We are given a finite interval Ω = [x, y] ⊂ R. Let

p(t) = 1/(y − x) if x < t < y, and p(t) = 0 if t ∉ (x, y).

2) Normal probability density: p(t) = (2π)^(−1/2) e^(−t²/2).
3) Exponential probability density:

p(t) = e^(−t) if t ≥ 0, and p(t) = 0 if t < 0.

Probability densities on R are used to model a random real number. Random pairs of real numbers are modeled by probability densities on R². A probability density on R² is a function p : R² → [0, ∞) such that

∫_{−∞}^{∞} ∫_{−∞}^{∞} p(x, y) dx dy = 1.

The simplest examples of probability densities on R² are obtained as follows. Take probability densities p_1, p_2 on R and set p(x, y) = p_1(x) p_2(y). Then p(x, y) is a probability density on R². More generally, a random vector in R^m is modeled by a probability density on R^m, that is, a function p : R^m → [0, ∞) such that

∫_{−∞}^{∞} ∫_{−∞}^{∞} ⋯ ∫_{−∞}^{∞} p(x_1, x_2, ..., x_m) dx_1 dx_2 ⋯ dx_m = 1.

Countable discrete probability framework: This framework is a straightforward generalization of finite discrete probability. The set Ω is no longer finite but countable. That means that you can enumerate all the elements of Ω as Ω = {ω_0, ω_1, ω_2, ...}. A probability measure on Ω is described by an infinite sequence p_0, p_1, p_2, ... such that 0 ≤ p_n ≤ 1 for each n and ∑_{n=0}^∞ p_n = 1. For each subset E ⊂ Ω we define P(E) = ∑_{i : ω_i ∈ E} p_i.

The most important example: Let Ω = {0, 1, 2, ...} be the set of natural numbers. Let 0 < λ < ∞ be a parameter and let

p_n = e^(−λ) λ^n / n!.

The sequence p_n defines the Poisson distribution.
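The Poisson masses are easy to inspect numerically. A minimal sketch, assuming Python; the truncation of the infinite sum at 100 terms and the choice λ = 3 are made here for illustration and are not from the notes:

```python
from math import exp, factorial

# Poisson distribution on Omega = {0, 1, 2, ...}: p_n = e^(-lam) * lam^n / n!.
# The infinite total-mass sum is truncated at 100 terms; for lam = 3 the
# discarded tail is negligible.
lam = 3.0
p = [exp(-lam) * lam**n / factorial(n) for n in range(100)]
print(sum(p))   # ~ 1.0: the masses p_n do sum to one
print(p[:4])    # p_0, p_1, p_2, p_3 for lam = 3
```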
1.1. Constructing probability spaces. Let us discuss things only in the finite discrete framework, but every concept here can be extended to any of the other frameworks. There are natural ways to start with one or several probability spaces and to construct a new probability space.

(1) (mapping) Let (Ω, P) be a given finite discrete probability space. Let Ω′ be another finite set and f : Ω → Ω′ a function. Then

(1.1) P′(E′) = P(f^(−1)(E′))

defines a probability measure on Ω′. We write P′ = f_*(P). This is most interesting when |Range(f)| < |Ω| (think why!).

(2) (product spaces) Let (A = {a_1, ..., a_n}, P) and (B = {b_1, ..., b_m}, Q) be finite probability spaces. Let

Ω = A × B = {(a_k, b_j) : 1 ≤ k ≤ n, 1 ≤ j ≤ m}.

Of course |Ω| = nm. Then the product probability measure P × Q on Ω is defined by

(1.2) (P × Q)({(a_k, b_j)}) = P({a_k}) Q({b_j}).

2. HOMEWORK 1

In the discrete finite probability framework.
1) Show that there are exactly 2^n subsets of the set Ω = {ω_1, ω_2, ..., ω_n}.
2) (About mapping) Show that formula (1.1) indeed defines a probability measure on Ω′.
3) (About product spaces) Show that formula (1.2) indeed defines a probability measure on Ω.
4) Let A = {1, 2, 3, 4, 5, 6} with the probability measure P({k}) = 1/6, 1 ≤ k ≤ 6. Let Ω = A × A with the product probability measure P × P. Let Ω′ = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12} and let f : Ω → Ω′ be the function given by f((k, j)) = k + j. Describe the probability measure f_*(P × P).
5) Let Ω = {ω_1, ω_2, ..., ω_n}. A probability measure on Ω is naturally identified with a vector (p_1, p_2, ..., p_n) ∈ R^n. Show that the set M_Ω of all probability measures on Ω is convex. You have to formulate the natural definition of convexity. Give a graphical representation of M_Ω for n = 1, 2 and 3. What are the extreme points of M_Ω?

In the measure theoretical framework.
1) Let F be a σ-field on Ω. Show that for every sequence E_1, E_2, ... in F, we have ∩_{n=1}^∞ E_n ∈ F.
2) Let (Ω, F) be a measurable space. For a sequence of events E_1, E_2, ... in F we define

lim sup_{n→∞} E_n = ∩_{n≥1} ∪_{k≥n} E_k   and   lim inf_{n→∞} E_n = ∪_{n≥1} ∩_{k≥n} E_k.

a) Show that ω ∈ lim sup_{n→∞} E_n if and only if ω ∈ E_n for infinitely many n.
b) Formulate and prove the analogous statement about lim inf_{n→∞} E_n.
c) Show that

Ω \ (lim sup_{n→∞} E_n) = lim inf_{n→∞} (Ω\E_n).

3) Let (Ω, F, P) be a probability space. a) Show that P(∅) = 0. b) Show that P(E) ≤ P(F) when E ⊂ F. c) Show that P(∪_{n≥1} E_n) ≤ ∑_{n≥1} P(E_n) for any sequence E_1, E_2, ... in F.
4) Let (Ω, F, P) be a probability space. a) Suppose that E_1 ⊂ E_2 ⊂ E_3 ⊂ ⋯. Show that P(∪_{n≥1} E_n) = lim_{n→∞} P(E_n). b) Suppose that E_1 ⊃ E_2 ⊃ E_3 ⊃ ⋯. Show that P(∩_{n≥1} E_n) = lim_{n→∞} P(E_n).

In the continuous probability framework.
1) Show that ∫_{−∞}^{∞} e^(−t²/2) dt = √(2π).
2) About convolution. Suppose that p_1(t) and p_2(t) are probability densities on R. The convolution of p_1 and p_2 is a new function (p_1 ∗ p_2) : R → R defined by

(p_1 ∗ p_2)(t) = ∫_{−∞}^{∞} p_1(t − x) p_2(x) dx.

a) Show that p_1 ∗ p_2 is a probability density. b) Show that p_1 ∗ p_2 = p_2 ∗ p_1.
3) Let s > 0 be a parameter. Let

p_s(t) = C_s e^(−st) if t ≥ 0, and p_s(t) = 0 if t < 0.

Find the value of the constant C_s that makes p_s a probability density. Assuming s_1 ≠ s_2, find a simple formula for the convolution p_{s_1} ∗ p_{s_2}.

In the countable discrete probability framework.
1) Suppose that p_0, p_1, p_2, ... defines a probability measure on Ω = {0, 1, 2, ...}. Show that lim_{n→∞} p_n = 0.
2) For the Poisson distribution with parameter λ,

p_n = e^(−λ) λ^n / n!,  n ≥ 0,

compute ∑_{n=0}^∞ n p_n and ∑_{n=0}^∞ n² p_n.
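The two constructions of Section 1.1 are easy to experiment with on finite spaces. A minimal sketch, assuming Python, with a probability measure stored as a dictionary {outcome: mass}; the helper names product_measure and pushforward are illustrative, and the map max(k, j) is chosen here purely for illustration (it deliberately avoids the sum of problem 4, which is left to the reader):

```python
from fractions import Fraction
from itertools import product

def product_measure(P, Q):
    # Formula (1.2): (P x Q)({(a, b)}) = P({a}) * Q({b}).
    return {(a, b): P[a] * Q[b] for a, b in product(P, Q)}

def pushforward(P, f):
    # Formula (1.1): P'({w'}) = P(f^{-1}({w'})), i.e. sum the masses of all
    # outcomes w that f maps to w'.
    P2 = {}
    for w, mass in P.items():
        P2[f(w)] = P2.get(f(w), 0) + mass
    return P2

# Example: two fair dice and the map f((k, j)) = max(k, j).
die = {k: Fraction(1, 6) for k in range(1, 7)}
dist_max = pushforward(product_measure(die, die), lambda w: max(w))
print(dist_max)  # masses 1/36, 3/36, 5/36, 7/36, 9/36, 11/36 for 1, ..., 6
```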