STA 348: INTRODUCTION TO STOCHASTIC PROCESSES

EVGENIJ KRITCHEVSKI

Date: January 14, 2010.

1. Probability spaces, the different theoretical frameworks.

Probability theory, statistics, and stochastic processes can be studied in various frameworks, at different levels of generality and abstraction. Each framework has its intrinsic mathematical advantages and drawbacks. From the perspective of the student, each framework has its own learning process and offers the learner the opportunity to develop a distinct set of skills. All frameworks share the central concept of a probability space, which is roughly as follows. We are given a set Ω and a "probability" function P that assigns to subsets E of Ω a number P(E) ∈ [0, 1]. One can think of P(E) as the mass of E. The function P needs to be additive, i.e. P(E_1 ∪ E_2) = P(E_1) + P(E_2) whenever E_1 ∩ E_2 = ∅. We also require the total mass to be equal to one, i.e. P(Ω) = 1.

From a purely abstract and somewhat dry point of view, one can say that probability theory is nothing but a theory of mass distributions. The interesting part, however, is that the abstract theory can explain and in some sense predict the outcome of many real life experiments. For example, if you decide to toss a fair coin 1000 times and to record the number x of times that heads appears, then from purely theoretical considerations I can be pretty much sure that you will observe |x/1000 − 1/2| ≤ 0.1. How can I predict this result? Well, I would look at the set Ω of all possible outcomes of the experiment of tossing the coin 1000 times. The outcome of the experiment is a word ω = ω(1), ω(2), ..., ω(1000), where ω(k) = H if the k'th toss yields heads and ω(k) = T if the k'th toss yields tails. There are in total 2^1000 such words, and the set Ω consists of all those words. Then I would choose the mass distribution P giving the same mass 2^(−1000) to each word ω ∈ Ω. That means that for every subset E ⊂ Ω,

P(E) = |E|/|Ω| = |E|/2^1000,

i.e. the number of elements in E divided by the total number of elements in Ω. Then I would look at the subset E of Ω consisting of words ω with |x/1000 − 1/2| > 0.1. By some mathematical tricks (that we will learn) I would compute that P(E) ≤ 10^(−8). That number is very small, so I would think of E as having a very small probability, and therefore I would be pretty much sure that the result ω of the experiment will not belong to the set E.

The simplest possible framework of finite discrete probability: We are given a finite set Ω = {ω_1, ..., ω_n}. There are in total 2^n possible subsets of Ω. Let us denote by 2^Ω the collection of all these subsets. For example, if n = 10, the set E = {ω_3, ω_7} is an element of 2^Ω.

Definition 1.1. A probability measure on Ω is a function P : 2^Ω → [0, 1] such that
(1) P(Ω) = 1,
(2) P(E_1 ∪ E_2) = P(E_1) + P(E_2) for all E_1, E_2 ∈ 2^Ω with E_1 ∩ E_2 = ∅.

In the finite discrete framework, there is a very simple way to describe probability measures. Let p_i = P({ω_i}) for 1 ≤ i ≤ n. Then property (2) implies that

P(E) = ∑_{i : ω_i ∈ E} p_i,

and so the numbers p_i determine the probability measure P completely. Conversely, if p_1, p_2, ..., p_n is a set of numbers such that each p_i ≥ 0 and ∑_{i=1}^{n} p_i = 1, then we can define a probability measure P by setting P(E) := ∑_{i : ω_i ∈ E} p_i. The point here is that the basic building blocks of P are the masses of individual points, i.e. the numbers p_i = P({ω_i}), 1 ≤ i ≤ n.
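The bound in the coin-tossing example can in fact be checked exactly on a computer, since P(E) = |E|/2^1000 reduces to counting the words with a given number of heads: there are exactly C(1000, k) words with k heads. A minimal sketch (in Python; the notes themselves contain no code, and the variable names are illustrative):

```python
from math import comb

# The bad event E: words with |x/1000 - 1/2| > 0.1, where x is the number
# of heads. There are comb(n, k) words with exactly k heads, each of mass
# 2^(-n), so P(E) is an exact ratio of (large) integers.
n = 1000
bad_count = sum(comb(n, k) for k in range(n + 1) if abs(k / n - 0.5) > 0.1)
print(bad_count / 2**n)  # prints a value below 10^(-8), consistent with the bound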
Some terminology: The set Ω together with the function P will be called a discrete finite probability space. The subsets of Ω are called events. The number P(E) is referred to as the probability of the event E.

Some examples:
(1) Ω = {1, 2, 3, 4, 5, 6} and P({j}) = 1/6 for 1 ≤ j ≤ 6.
(2) Ω = {H, T} and P({H}) = P({T}) = 1/2.
(3) Ω = {a, b, c} and P({a}) = 0, P({b}) = 1, P({c}) = 0.
(4) Ω = {Homework, Project, Final} and P({Homework}) = 0.4, P({Project}) = 0.15, P({Final}) = 0.45. Then P({Project, Final}) = 0.15 + 0.45 = 0.6.
(5) Ω is a set of 36 cards. A card is specified by its rank from {6, 7, 8, 9, 10, Jack, Queen, King, Ace} and its suit from {spades, hearts, diamonds, clubs}. For each card ω ∈ Ω, P({ω}) = 1/36. (Remark: (Ω, P) can naturally be seen as a product space; see the sketch at the end of this discussion.) The event E = {ω : ω is an Ace} has probability P(E) = 1/9.

The main advantage of studying probability theory in the present framework is that the finiteness assumption reduces technicalities to the absolute minimum. The main mathematical objects are very easy to define and to understand intuitively. The proofs of most results are easy, since we never have to worry about convergence questions, limits, infinite series, etc. Thus the novice learner can focus on the structure and phenomenology of the theory, start developing probabilistic intuition, and study the heart of the matter, instead of agonizing over technical details. Despite the finiteness limitation, the framework is rich enough to capture fundamental principles such as the law of large numbers, the central limit theorem, and measure concentration.

The main disadvantage of this framework is again the limitation of finiteness. While we can study finite sequences of coin tosses, or more generally, finite words with letters chosen randomly from a finite alphabet, we cannot deal with any kind of infinite sequence of random numbers, and we cannot even formulate statements like "with probability one, a sequence of random numbers converges". Another serious problem is the impossibility of dealing with continuous probability distributions, i.e. having random real numbers chosen uniformly from the interval [0, 1], or chosen from the whole real axis according to the Gaussian density (2π)^(−1/2) e^(−t²/2).
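To make example (5) concrete, here is a minimal sketch (in Python; the names ranks, suits and omega are illustrative, not from the notes) that builds the deck as a product of ranks and suits and recovers P(E) = 1/9 for the Ace event:

```python
from fractions import Fraction
from itertools import product

# The 36-card deck of example (5), built as a product space of ranks and suits.
ranks = ["6", "7", "8", "9", "10", "Jack", "Queen", "King", "Ace"]
suits = ["spades", "hearts", "diamonds", "clubs"]
omega = list(product(ranks, suits))        # |Omega| = 36 cards
P = {w: Fraction(1, 36) for w in omega}    # uniform mass 1/36 per card

aces = [w for w in omega if w[0] == "Ace"]  # the event "the card is an Ace"
print(sum(P[w] for w in aces))              # 1/9
```

Using exact fractions rather than floats keeps the computed probabilities identical to the hand computation.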
The most general framework of measure theory: We are given a nonempty set Ω and we do not impose any restriction on the cardinality of Ω (it could be finite, countable, uncountable, or even bigger). The case when Ω is countable is essentially a very simple generalization of the finite case (see the discussion of the countable discrete framework below). The real difference starts when Ω is at least uncountable, for example Ω = R. The definition of a probability measure on Ω is no longer an obvious matter, and must in general involve something more elaborate than assigning a mass P({ω}) to each point ω ∈ Ω. Why? Well, if we require ∑_{ω∈Ω} P({ω}) = 1, then automatically there must exist a countable subset C ⊂ Ω such that P({ω}) = 0 for all ω ∈ Ω\C. There is no way to sum up to one uncountably many positive numbers.

As in the finite case, let us think of a probability as a mass distribution on Ω. In some cases, we have a good a priori intuitive understanding (i.e. not based on a formal mathematical definition) of a probability measure. Let us look at the unit interval Ω = [0, 1] = {ω ∈ R : 0 ≤ ω ≤ 1}. What should a uniform mass distribution P on Ω look like? The mass of any subinterval is the length of the interval, i.e. P([a, b]) = b − a for all 0 ≤ a ≤ b ≤ 1. The mass of any single point has to be zero, P({ω}) = 0 for all ω ∈ Ω, since a point is the limiting case of a very small interval. Thus P([a, b]) = P((a, b]) = P([a, b)) = P((a, b)). Also, if a set is a countable union of disjoint intervals, S = ∪_{n=1}^∞ [a_n, b_n], then it is very natural to have P(S) = ∑_{n=1}^∞ (b_n − a_n). We see that the basic building blocks of P are the intervals and not the (zero mass) individual points. More generally, if we are given a Riemann integrable function p : R → [0, ∞) with ∫_{−∞}^{∞} p(x) dx = 1, for example p(x) = (2π)^(−1/2) e^(−x²/2), we have a natural mass distribution on R assigning the mass ∫_a^b p(x) dx to each interval [a, b].

In the most general setting, we do not assume any a priori (algebraic or topological) structure on Ω. In order to define a probability measure, the basic building blocks are the subsets of Ω. A probability measure on Ω should be a function P that assigns a number P(E), 0 ≤ P(E) ≤ 1, to subsets E ⊂ Ω. We certainly want the following two properties to hold:
1) P(Ω) = 1.
2) If (E_n)_{n≥1} are pairwise disjoint subsets of Ω, i.e. E_n ∩ E_m = ∅ for n ≠ m, then

P(∪_{n≥1} E_n) = ∑_{n≥1} P(E_n).

It turns out that, in many important cases, trying to define P for all possible subsets E ⊂ Ω leads to catastrophic problems. For example, one can show that there exists no probability measure P on Ω = [0, 1], defined for all subsets E ⊂ Ω, satisfying properties 1), 2) together with P([a, b]) = b − a. In the general theory one therefore defines P(E) only for a certain system of subsets E.

Definition 1.2. Let Ω be a nonempty set. A σ-field on Ω is a nonempty collection F of subsets of Ω that is closed under complements and countable unions, i.e.
(1) If E ∈ F then Ω\E ∈ F.
(2) If (E_n)_{n≥1} is a sequence of subsets of Ω and E_n ∈ F for all n ≥ 1, then ∪_{n=1}^∞ E_n ∈ F.
A measurable space is a pair (Ω, F), where F is a σ-field on Ω.

Note that the definition of a σ-field implies that we must have ∅ ∈ F and Ω ∈ F. Of course F = 2^Ω = {all subsets of Ω} is a σ-field, the biggest possible. At the other extreme, F = {∅, Ω} is the smallest possible σ-field. Most of the time, one works with an "intermediate" σ-field F which is not as big as 2^Ω and not as small as {∅, Ω}.

Definition 1.3. Let (Ω, F) be a measurable space. A function P : F → [0, 1] is called a probability measure if P(Ω) = 1 and

P(∪_{n=1}^∞ E_n) = ∑_{n=1}^∞ P(E_n)

whenever E_n ∈ F for all n ≥ 1 and E_n ∩ E_m = ∅ for n ≠ m. The triple (Ω, F, P) is called a probability space.

This axiomatic definition of a probability space is due to A. N. Kolmogorov (1930s) and is now the standard definition. The framework encompasses the finite setting as a simple special case: Ω is a finite set and F = 2^Ω. In that case |F| = 2^|Ω| is finite, and hence not big enough to cause any problems.

There are a number of challenges when one studies probability in the general framework. 1) There is a high degree of abstraction, and one can have difficulty developing intuition. 2) The rigorous construction of any interesting example is involved. 3) Many objects are defined by a nontrivial procedure, and it requires some effort simply to digest the definitions. 4) There are very important issues of convergence, i.e. limits, infinite series, integrals, interchange of the order of limiting operations, functions taking very small and very large values, etc. All of these challenges are absent from the finite discrete setting.
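For a finite Ω, Definition 1.2 can be explored by brute force: repeatedly closing a collection of subsets under complements and unions (for finite Ω, countable unions reduce to finite ones) produces the smallest σ-field containing the collection. A sketch, assuming Python; generated_sigma_field is an illustrative name, not terminology from the notes:

```python
from itertools import combinations

# Close a collection of subsets of a finite Omega under complements and
# (finite) unions. For finite Omega this yields the sigma-field generated
# by the collection, in the sense of Definition 1.2.
def generated_sigma_field(omega, generators):
    field = {frozenset(), frozenset(omega)} | {frozenset(g) for g in generators}
    changed = True
    while changed:                              # iterate until a fixed point
        changed = False
        for e in list(field):
            c = frozenset(omega) - e            # property (1): complements
            if c not in field:
                field.add(c); changed = True
        for e1, e2 in combinations(list(field), 2):
            u = e1 | e2                         # property (2): unions
            if u not in field:
                field.add(u); changed = True
    return field

F = generated_sigma_field({1, 2, 3, 4}, [{1}, {2}])
print(sorted(map(sorted, F)))  # 8 sets: generated by the atoms {1}, {2}, {3, 4}
```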
The learning process is also slower in the general setting, because one needs to take the necessary time to understand many details. So why would we bother studying probability theory in such great generality when we can stick to the finite setting and understand most of the deep and beautiful results there with less effort? One could go even further and argue that the "real world" is discrete, that infinite spaces are simply approximations to large finite spaces, and that finite probability spaces are the only ones we should care about. The main reward of studying probability theory in the general setting is that one obtains very powerful results that have far-reaching applications in many disciplines (analysis, partial differential equations, quantum mechanics, and many others). General probability theory is very closely related to measure theory (in both theories the objects are almost identical, but measure theory and probability are concerned with different sets of problems). Many fundamental subjects like Fourier series and integrals require an understanding of measure theory.

Continuous probability framework: In this framework one can understand basic properties of random numbers drawn from nice continuous distributions. A probability density on R is a function p : R → [0, ∞) such that ∫_{−∞}^{∞} p(t) dt = 1. One usually wants the function p to be nice enough for the integral to make sense. Piecewise continuous functions are nice enough, and these are all that we will care about. The "probability" or "mass" of an interval E = (a, b) is defined to be the number P(E) = ∫_a^b p(t) dt. One can more generally take E to be a finite union of pairwise disjoint intervals, E = ∪_{i=1}^m (a_i, b_i), and then P(E) := ∑_{i=1}^m ∫_{a_i}^{b_i} p(t) dt.

Some examples:
1) Uniform probability density: We are given a finite interval Ω = [x, y] ⊂ R. Let

p(t) = 1/(y − x) if x < t < y, and p(t) = 0 if t ∉ (x, y).

2) Normal probability density: p(t) = (2π)^(−1/2) e^(−t²/2).
3) Exponential probability density:

p(t) = e^(−t) if t ≥ 0, and p(t) = 0 if t < 0.

Probability densities on R are used to model a random real number. Random pairs of real numbers are modeled by probability densities on R². A probability density on R² is a function p : R² → [0, ∞) such that

∫_{−∞}^{∞} ∫_{−∞}^{∞} p(x, y) dx dy = 1.

The simplest examples of probability densities on R² are obtained as follows. Take probability densities p_1, p_2 on R and set p(x, y) = p_1(x) p_2(y). Then p(x, y) is a probability density on R². More generally, a random vector in R^m is modeled by a probability density on R^m, that is, a function p : R^m → [0, ∞) such that

∫_{−∞}^{∞} ∫_{−∞}^{∞} ⋯ ∫_{−∞}^{∞} p(x_1, x_2, ..., x_m) dx_1 dx_2 ⋯ dx_m = 1.

Countable discrete probability framework: This framework is a straightforward generalization of finite discrete probability. The set Ω is no longer finite but countable. That means that you can enumerate all the elements of Ω as Ω = {ω_0, ω_1, ω_2, ...}. A probability measure on Ω is described by an infinite sequence p_0, p_1, p_2, ... such that 0 ≤ p_n ≤ 1 for each n and ∑_{n=0}^∞ p_n = 1. For each subset E ⊂ Ω we define P(E) = ∑_{i : ω_i ∈ E} p_i.

The most important example: Let Ω = {0, 1, 2, ...} be the set of natural numbers. Let 0 < λ < ∞ be a parameter and let

p_n = e^(−λ) λ^n / n!.

The sequence p_n defines the Poisson distribution.
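The Poisson masses are easy to inspect numerically. A minimal sketch, assuming Python; the truncation of the infinite sum at 100 terms and the choice λ = 3 are made here for illustration and are not from the notes:

```python
from math import exp, factorial

# Poisson distribution on Omega = {0, 1, 2, ...}: p_n = e^(-lam) * lam^n / n!.
# The infinite total-mass sum is truncated at 100 terms; for lam = 3 the
# discarded tail is negligible.
lam = 3.0
p = [exp(-lam) * lam**n / factorial(n) for n in range(100)]
print(sum(p))   # ~ 1.0: the masses p_n do sum to one
print(p[:4])    # p_0, p_1, p_2, p_3 for lam = 3
```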
1.1. Constructing probability spaces. Let us discuss things only in the finite discrete framework, but every concept here can be extended to any of the other frameworks. There are natural ways to start with one or several probability spaces and to construct a new probability space.

(1) (mapping) Let (Ω, P) be a given finite discrete probability space. Let Ω′ be another finite set and f : Ω → Ω′ a function. Then

(1.1) P′(E′) = P(f^(−1)(E′))

defines a probability measure on Ω′. We write P′ = f_*(P). This is most interesting when |Range(f)| < |Ω| (think why!).

(2) (product spaces) Let (A = {a_1, ..., a_n}, P) and (B = {b_1, ..., b_m}, Q) be finite probability spaces. Let

Ω = A × B = {(a_k, b_j) : 1 ≤ k ≤ n, 1 ≤ j ≤ m}.

Of course |Ω| = nm. Then the product probability measure P × Q on Ω is defined by

(1.2) (P × Q)({(a_k, b_j)}) = P({a_k}) Q({b_j}).

2. HOMEWORK 1

In the discrete finite probability framework.
1) Show that there are exactly 2^n subsets of the set Ω = {ω_1, ω_2, ..., ω_n}.
2) (About mapping) Show that formula (1.1) indeed defines a probability measure on Ω′.
3) (About product spaces) Show that formula (1.2) indeed defines a probability measure on Ω.
4) Let A = {1, 2, 3, 4, 5, 6} with the probability measure P({k}) = 1/6, 1 ≤ k ≤ 6. Let Ω = A × A with the product probability measure P × P. Let Ω′ = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12} and let f : Ω → Ω′ be the function given by f((k, j)) = k + j. Describe the probability measure f_*(P × P).
5) Let Ω = {ω_1, ω_2, ..., ω_n}. A probability measure on Ω is naturally identified with a vector (p_1, p_2, ..., p_n) ∈ R^n. Show that the set M_Ω of all probability measures on Ω is convex. You have to formulate the natural definition of convexity. Give a graphical representation of M_Ω for n = 1, 2 and 3. What are the extreme points of M_Ω?

In the measure theoretical framework.
1) Let F be a σ-field on Ω. Show that for every sequence E_1, E_2, ... in F, we have ∩_{n=1}^∞ E_n ∈ F.
2) Let (Ω, F) be a measurable space. For a sequence of events E_1, E_2, ... in F we define

lim sup_{n→∞} E_n = ∩_{n≥1} ∪_{k≥n} E_k   and   lim inf_{n→∞} E_n = ∪_{n≥1} ∩_{k≥n} E_k.

a) Show that ω ∈ lim sup_{n→∞} E_n if and only if ω ∈ E_n for infinitely many n.
b) Formulate and prove the analogous statement about lim inf_{n→∞} E_n.
c) Show that

Ω \ (lim sup_{n→∞} E_n) = lim inf_{n→∞} (Ω\E_n).

3) Let (Ω, F, P) be a probability space. a) Show that P(∅) = 0. b) Show that P(E) ≤ P(F) when E ⊂ F. c) Show that P(∪_{n≥1} E_n) ≤ ∑_{n≥1} P(E_n) for any sequence E_1, E_2, ... in F.
4) Let (Ω, F, P) be a probability space. a) Suppose that E_1 ⊂ E_2 ⊂ E_3 ⊂ ⋯. Show that P(∪_{n≥1} E_n) = lim_{n→∞} P(E_n). b) Suppose that E_1 ⊃ E_2 ⊃ E_3 ⊃ ⋯. Show that P(∩_{n≥1} E_n) = lim_{n→∞} P(E_n).

In the continuous probability framework.
1) Show that ∫_{−∞}^{∞} e^(−t²/2) dt = √(2π).
2) About convolution. Suppose that p_1(t) and p_2(t) are probability densities on R. The convolution of p_1 and p_2 is a new function (p_1 ∗ p_2) : R → R defined by

(p_1 ∗ p_2)(t) = ∫_{−∞}^{∞} p_1(t − x) p_2(x) dx.

a) Show that p_1 ∗ p_2 is a probability density. b) Show that p_1 ∗ p_2 = p_2 ∗ p_1.
3) Let s > 0 be a parameter. Let

p_s(t) = C_s e^(−st) if t ≥ 0, and p_s(t) = 0 if t < 0.

Find the value of the constant C_s that makes p_s a probability density. Assuming s_1 ≠ s_2, find a simple formula for the convolution p_{s_1} ∗ p_{s_2}.

In the countable discrete probability framework.
1) Suppose that p_0, p_1, p_2, ... defines a probability measure on Ω = {0, 1, 2, ...}. Show that lim_{n→∞} p_n = 0.
2) For the Poisson distribution with parameter λ,

p_n = e^(−λ) λ^n / n!,  n ≥ 0,

compute ∑_{n=0}^∞ n p_n and ∑_{n=0}^∞ n² p_n.
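The two constructions of Section 1.1 are easy to experiment with on finite spaces. A minimal sketch, assuming Python, with a probability measure stored as a dictionary {outcome: mass}; the helper names product_measure and pushforward are illustrative, and the map max(k, j) is chosen here purely for illustration (it deliberately avoids the sum of problem 4, which is left to the reader):

```python
from fractions import Fraction
from itertools import product

def product_measure(P, Q):
    # Formula (1.2): (P x Q)({(a, b)}) = P({a}) * Q({b}).
    return {(a, b): P[a] * Q[b] for a, b in product(P, Q)}

def pushforward(P, f):
    # Formula (1.1): P'({w'}) = P(f^{-1}({w'})), i.e. sum the masses of all
    # outcomes w that f maps to w'.
    P2 = {}
    for w, mass in P.items():
        P2[f(w)] = P2.get(f(w), 0) + mass
    return P2

# Example: two fair dice and the map f((k, j)) = max(k, j).
die = {k: Fraction(1, 6) for k in range(1, 7)}
dist_max = pushforward(product_measure(die, die), lambda w: max(w))
print(dist_max)  # masses 1/36, 3/36, 5/36, 7/36, 9/36, 11/36 for 1, ..., 6
```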