ST213 Mathematics of Random Events: Outline notes
Spring Term 1999-2000
Lecturer: Jonathan Warren
Notes by: Wilfrid S. Kendall
Department of Statistics, University of Warwick

This is /home/fisher/wsk/ms/ST213/st213.tex (unix) version 1.6. Last edited: 16:23:17, 24/01/2000.

Contents

1 Introduction
  1.1 Aims and Objectives
  1.2 Books
  1.3 Resources (including examination information)
  1.4 Motivating Examples

2 Probabilities, algebras, and σ-algebras
  2.1 Motivation
  2.2 Revision of sample space and events
  2.3 Algebras of sets
  2.4 Limit Sets
  2.5 σ-algebras
  2.6 Countable additivity
  2.7 Uniqueness of probability measures
  2.8 Lebesgue measure and coin tossing

3 Independence and measurable functions
  3.1 Independence
  3.2 Borel-Cantelli lemmas
  3.3 Law of large numbers for events
  3.4 Independence and classes of events
  3.5 Measurable functions
  3.6 Independence of random variables
  3.7 Distributions of random variables

4 Integration
  4.1 Simple functions and Indicators
  4.2 Integrable functions
  4.3 Expectation of random variables
  4.4 Examples

5 Convergence
  5.1 Convergence of random variables
  5.2 Laws of large numbers for random variables
  5.3 Convergence of integrals and expectations
  5.4 Dominated convergence theorem
  5.5 Examples

6 Product measures
  6.1 Product measure spaces
  6.2 Fubini's theorem
  6.3 Relationship with independence

1 Introduction

1.1 Aims and Objectives

The main purpose of the course ST213 Mathematics of Random Events (which we will abbreviate to MoRE) is to work over again the basics of the mathematics of uncertainty.
You have already covered this in a rough-and-ready fashion in (a) ST111 Probability, and (b) even in ST114 Games and Decisions. In this course we will cover these matters with more care. It is important to do this because a proper appreciation of the fundamentals of the mathematics of random events (a) gives an essential basis for getting a good grip on the basic ideas of statistics; (b) will be of increasing importance in the future as it forms the basis of the hugely important field of mathematical finance.

It is appropriate at this level that we cover the material emphasizing concepts rather than proofs: by and large we will concentrate on what the results say, and so will on some occasions explain them rather than prove them. The third-year courses MA305 Measure Theory and ST318 Probability Theory go into the matter of proofs. For further discussion of how Warwick probability courses fit together, see our road-map to probability at Warwick at www.warwick.ac.uk/statsdept/teaching/probmap.html

1.2 Books

The book with contents best matching this course is Williams [3], though this gives more details (and especially many more proofs!) than we cover here; a still more extensive treatment is given by Billingsley [1]. The book by Grimmett and Stirzaker [2] also gives helpful explanations of some (but not all) of the concepts dealt with in this course.

1.3 Resources (including examination information)

The course is composed of 30 lectures, valued at 12 CATS credit. It has an assessed component (20%) as well as an examination in the summer term. The assessed component will be conducted as follows: an exercise sheet will be handed out approximately every fortnight, totalling 4 sheets. In the 10 minutes at the start of a subsequent lecture you produce, under examination conditions, an answer to one question specified at the start of that lecture. Model answers will be distributed after the test, and an examples class will be held a week after the test. The tests will be marked, and the assessed component will be based on the best 3 out of 4 of your answers. This method helps you learn during the lecture course, so it should:

• improve your exam marks;
• increase your enjoyment of the course;
• cost less time than end-of-term assessment.

Further copies of exercise sheets (after they have been handed out in lectures!) can be obtained at the homepage for the ST213 course: www.warwick.ac.uk/statsdept/teaching/ST213.html

There are various resources available for you as part of the course. First of all, naturally enough, are the lectures. These will be supplemented about once a fortnight by an examples class. You are expected to come to this class prepared to work through examples; I and some helpers will circulate to offer help and advice when requested.

These notes will also be made available at the above URL, chapter by chapter as they are covered in lectures. Notice that they do not cover all the material of the lectures: their purpose is to provide a basic skeleton of summary material to supplement the notes you make during lectures. For example, no proofs are included. In particular you will not find it possible to cover the course by ignoring lectures and depending on these notes alone!
The notes are in Acrobat pdf format: this is a good way to disseminate information including mathematical formulae, and can be read using widely available free software (Adobe Acrobat Reader: follow links from the Adobe Acrobat Reader downloads homepage; it is also on numerous CD-ROMs accompanying the more reputable computer magazines!). Acrobat pdf format allows me to include numerous hypertext references, and I have made full use of this. As a rule of thumb, clicking on coloured text is quite likely to:

• move you to some other relevant text, either in the current document (such as here: Aims and objectives) or occasionally in supporting documents or other course notes;
• launch a Worldwide Web browser (assuming your system has a browser configured appropriately);
• send me an email, as here: [email protected].

In due course I expect also to experiment with animations. You should notice that Adobe Acrobat Reader includes a facility for going back from the current page to the previously visited page: bear this in mind should you get lost!

Documents always have typographical errors: please email me to notify me of any you think you have found. When you do this, please include a reference to the version and the date of production of these notes (see the page header information on every page!). In any case I expect to update these notes throughout the lecturing term as I myself discover and correct errors, and think up improvements.

If you try to print out these pages you will likely discover a snag! I have formatted them to fit comfortably on computer screens; printing via Adobe Acrobat Reader will use up a lot of pages unless you know how to play clever tricks with PostScript. The version to be placed in the library will be re-formatted to fit A4 pages.

Finally, notice that if you download a copy of these notes to your own computer then you may discover that some of the links cease to work. In particular these and other web-based teaching materials of the Department of Statistics all reside on a special sub-area which is only accessible from web-browsers originating on machines based within the Warwick campus. So don't be surprised if you can't access the notes from your own internet account!

Further related material (eg: related courses, some pretty pictures of random processes, ...) can be obtained by following links from W.S. Kendall's homepage: www.warwick.ac.uk/statsdept/Staff/WSK/

The Statistics Department will in the summer term sell booklets of the previous year's examination papers together with (rough) outline solutions, and we will run two revision classes for this course at that time.

Finally, there is a unix newsgroup for this course: uwarwick.stats.course.st213. It is intended for self-help discussion of ST213-related matters. Lecturers will lurk on the group (this means they will not post, except for occasional announcements of an administrative nature, but will read it at approximately weekly intervals to get an idea of course feedback).

1.4 Motivating Examples

Here are some examples to help us see what the issues are.

Example 1.1 (J. Bernoulli, circa 1692): Suppose that A1, A2, ... are mutually independent events, each of which has probability p. Define

   Sn = #{ events Ak which happen for k ≤ n }.

Then the probability that Sn/n is close to p increases to 1 as n tends to infinity:

   P[ |Sn/n − p| ≤ ε ] → 1  as n → ∞, for all ε > 0.
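Bernoulli's statement is easy to explore by simulation. Here is a minimal Python sketch (illustrative only, not part of the lectured material; the function name, the choices p = 0.3, ε = 0.05 and the sample sizes are all arbitrary):

    import random

    def estimate_closeness_probability(n, p=0.3, eps=0.05, trials=2000):
        """Estimate P[ |S_n/n - p| <= eps ] by Monte Carlo simulation."""
        close = 0
        for _ in range(trials):
            s = sum(1 for _ in range(n) if random.random() < p)  # S_n
            if abs(s / n - p) <= eps:
                close += 1
        return close / trials

    for n in [10, 100, 1000, 10000]:
        print(n, estimate_closeness_probability(n))
    # The printed probabilities increase towards 1 as n grows,
    # exactly as Example 1.1 asserts.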
Example 1.2: Suppose the random variable U is uniformly distributed over the continuous range [0,1]. Why is it that for all x in [0,1] we have

   P[ U = x ] = 0

and yet

   P[ a ≤ U ≤ b ] = b − a   whenever 0 ≤ a ≤ b ≤ 1?

Why can't we argue as follows?

   P[ a ≤ U ≤ b ] = P[ ⋃_{x∈[a,b]} {x} ] = Σ_{x∈[a,b]} P[ U = x ] = 0?

Example 1.3 (The Banach-Tarski paradox): Consider a sphere S². In a certain qualified sense it is possible to do the following curious thing: we can "find" a subset F ⊂ S² and (for any k ≥ 3) rotations τ_1^k, τ_2^k, ..., τ_k^k such that

   S² = τ_1^k F ∪ τ_2^k F ∪ ... ∪ τ_k^k F.

What then should we suppose the surface area of F to be? Since S² = τ_1^3 F ∪ τ_2^3 F ∪ τ_3^3 F we can argue for area(F) = 1/3. But since S² = τ_1^4 F ∪ τ_2^4 F ∪ τ_3^4 F ∪ τ_4^4 F we can equally argue for area(F) = 1/4. Or similarly for area(F) = 1/5. Or 1/6, or ...

Example 1.4: Reverting to Bernoulli's example (Example 1.1 above) we could ask: what is the probability that, when we look at the whole sequence S1/1, S2/2, S3/3, ..., we see the sequence tends to p? Is this different from Bernoulli's statement?

Example 1.5: Here is a question which is apparently quite different, but which turns out to be strongly related to the above ideas! Can we generalize the idea of a "Riemann integral" in such a way as to make sense of rather discontinuous integrands, such as the case given below?

   ∫₀¹ f(x) dx   where f(x) = 1 when x is a rational number, and f(x) = 0 when x is an irrational number.

2 Probabilities, algebras, and σ-algebras

2.1 Motivation

Consider two coins A and B which are tossed in the air so as each to land with either heads or tails upwards. We do not assume the coin-tosses are independent! It is often the case that one feels justified in assuming the coins individually are equally likely to come up heads or tails. Using the fact P[A = T] = 1 − P[A = H], etc, we find

   P[ A comes up heads ] = P[ B comes up heads ] = 1/2.

To find probabilities such as P[HH] = P[A = H, B = H] we need to say something about the relationship between the two coin-tosses. It is often the case that one feels justified in assuming the coin-tosses are independent, so

   P[ A = H, B = H ] = P[ A = H ] × P[ B = H ].

However this assumption may be unwise when the person tossing the coin is not experienced! We may decide that some variant of the following is a better model: the event determining [B = H] is C if [A = H], and D if [A = T], where

   P[ C = H ] = 3/4,   P[ D = H ] = 1/4,

and A, C, D are independent.

There are two stages of specification at work here. Given a collection C of events, and specified probabilities P[C] for each C ∈ C, we can find P[Cᶜ] = 1 − P[C], the probability of the complement Cᶜ of C, but not necessarily P[C ∩ D] for C, D ∈ C.
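The dependent model above can be checked by brute-force enumeration. A minimal Python sketch (my own illustration; the probabilities are those of the model in the text):

    # P[A=H] = 1/2; P[C=H] = 3/4; P[D=H] = 1/4; A, C, D independent.
    # B is determined by C if A = H, and by D if A = T.
    pA = {'H': 0.5, 'T': 0.5}
    pC = {'H': 0.75, 'T': 0.25}
    pD = {'H': 0.25, 'T': 0.75}

    joint = {}  # joint law of the pair (A, B)
    for a in 'HT':
        for c in 'HT':
            for d in 'HT':
                b = c if a == 'H' else d
                joint[(a, b)] = joint.get((a, b), 0.0) + pA[a] * pC[c] * pD[d]

    print(joint[('H', 'H')])                                  # P[A=H, B=H] = 3/8
    print(sum(v for (a, b), v in joint.items() if b == 'H'))  # P[B=H] = 1/2

So B is still individually a fair coin, but A and B are not independent: P[A=H, B=H] = 3/8, not 1/4.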
2.2 Revision of sample space and events

Remember from ST111 that we can use notation from set theory to describe events. We can think of events as subsets of a sample space Ω. If A is an event, then the event that A does not happen is the complement or complementary event Aᶜ = {ω ∈ Ω : ω ∉ A}. If B is another event then the event that both A and B happen is the intersection A ∩ B = {ω ∈ Ω : ω ∈ A and ω ∈ B}. The event that either A or B (or both!) happen is the union A ∪ B = {ω ∈ Ω : ω ∈ A or ω ∈ B}.

2.3 Algebras of sets

This leads us to identify classes of sets for which we want to find probabilities.

Definition 2.1 (Algebra of sets): An algebra (sometimes called a field) of subsets of Ω is a class C of subsets of a sample space Ω satisfying:
(1) closure under complements: if A ∈ C then Aᶜ ∈ C;
(2) closure under intersections: if A, B ∈ C then A ∩ B ∈ C;
(3) closure under unions: if A, B ∈ C then A ∪ B ∈ C.

Definition 2.2 (Algebra generated by a collection): If C is a collection of subsets of Ω then A(C), the algebra generated by C, is the intersection of all algebras of subsets of Ω which contain C.

Here are some examples of algebras:

(i) the trivial algebra A = {Ω, ∅};

(ii) supposing Ω = {H, T}, another example is A = {Ω = {H,T}, {H}, {T}, ∅};

(iii) now consider the following class of subsets of the unit interval [0,1]: A = { finite unions of subintervals }. This is an algebra. For example, if

   A = (a0, a1) ∪ (a2, a3) ∪ ... ∪ (a_{2n}, a_{2n+1})

is a non-overlapping union of intervals (and we can always re-arrange matters so that any finite union of intervals is non-overlapping!) then

   Aᶜ = [0, a0] ∪ [a1, a2] ∪ ... ∪ [a_{2n+1}, 1].

This checks point (1) of the definition of an algebra of sets. Point (2) is rather easy, and point (3) follows from points (1) and (2) (take complements, intersect, then take the complement again).

(iv) Consider A = {{1,2,3}, {1,2}, {3}, ∅}. This is an algebra of subsets of Ω = {1,2,3}. Notice it does not include events such as {1}, {2,3}.

(v) Just to give an example of a collection of sets which is not an algebra, consider {{1,2,3}, {1,2}, {2,3}, ∅}.

(vi) Algebras get very large. It is typically more convenient simply to give a collection C of sets generating the algebra. For example, if C = ∅ then A(C) = {∅, Ω} is the trivial algebra described above!

(vii) If Ω = {H,T} and C = {{H}} then A(C) = {{H,T}, {H}, {T}, ∅} as in example (ii) above.

(viii) If Ω = [0,1] and C = { intervals in [0,1] } then A(C) is the collection of finite unions of intervals as in example (iii) above.

(ix) Finally, if Ω = [0,1] and C is the collection of single points in [0,1] then A(C) is the collection of (a) all finite sets in [0,1] and (b) all complements of finite sets in [0,1].

In realistic examples algebras are rather large: not surprising, since they correspond to the collection of all "true-or-false" statements you can make about a certain experiment! (If your experiment's results can be summarised as n different "yes"/"no" answers – such as: result is hot/cold, result is coloured black/white, etc – then there are 2^n possible combined outcomes, and the relevant algebra of statements about them is composed of 2^(2^n) different subsets!)

Therefore it is of interest that the typical element of an algebra can be written down in a rather special form:

Theorem 2.3 (Representation of typical element of algebra): If C is a collection of subsets of Ω then the event A belongs to the algebra A(C) generated by C if and only if

   A = ⋃_{i=1}^{N} ⋂_{j=1}^{M_i} C_{i,j}

where for each i, j either C_{i,j} or its complement C_{i,j}ᶜ belongs to C. Moreover we may write A in this form with the sets

   D_i = ⋂_{j=1}^{M_i} C_{i,j}

being disjoint. (This result corresponds to a basic remark in logic: logical statements, however complicated, can be reduced to statements of the form (A1 and A2 and ... and Am) or (B1 and B2 and ... and Bn) or ... or (C1 and C2 and ... and Cp), where the statements A1 etc are either basic statements or their negations, and no more than one of the (...) or ... or (...) can be true at once.)
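On a small finite Ω the algebra A(C) of Definition 2.2 can be computed directly, by closing C under complements and unions until nothing new appears. A Python sketch (illustrative only; the function name and example are my own):

    def generated_algebra(omega, collection):
        """Algebra of subsets of omega generated by `collection` (Definition 2.2).

        Sets are frozensets; we close under complements and pairwise unions
        until a fixed point is reached. Intersections then come for free,
        since A & B is the complement of (complement of A) | (complement of B).
        """
        omega = frozenset(omega)
        algebra = {frozenset(), omega} | {frozenset(c) for c in collection}
        changed = True
        while changed:
            changed = False
            for a in list(algebra):
                if omega - a not in algebra:          # closure under complements
                    algebra.add(omega - a)
                    changed = True
                for b in list(algebra):
                    if a | b not in algebra:          # closure under unions
                        algebra.add(a | b)
                        changed = True
        return algebra

    for event in sorted(generated_algebra({1, 2, 3}, [{1, 2}]),
                        key=lambda s: (len(s), sorted(s))):
        print(set(event))
    # prints set(), {3}, {1, 2}, {1, 2, 3}: example (iv) above.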
We are now in a position to produce our first stab at a set of axioms for probability. Given a sample space Ω and an algebra A of subsets, probability P[·] assigns a number between 0 and 1 to each event in the algebra A, obeying the rules given below. There is a close analogy with the notion of length of subsets of [0,1] (and also with notions of area, volume, ...), as the table below makes clear:

   Probability                          Length of subset of [0,1]
   P[∅] = 0                             Length(∅) = 0
   P[Ω] = 1                             Length([0,1]) = 1
   P[A ∪ B] = P[A] + P[B]               Length([a,b] ∪ [c,d]) = Length([a,b]) + Length([c,d])
     (if A ∩ B = ∅)                       (if a ≤ b < c ≤ d)

There are some consequences of these axioms which are not completely trivial. For example, the "law of negation"

   P[Aᶜ] = 1 − P[A];

the "generalized law of addition", holding when A ∩ B is not necessarily empty,

   P[A ∪ B] = P[A] + P[B] − P[A ∩ B]

(think of "double-counting"); and finally the "inclusion-exclusion law"

   P[A1 ∪ A2 ∪ ... ∪ An] = Σ_i P[Ai] − Σ_{i<j} P[Ai ∩ Aj] + ... + (−1)^{n+1} P[A1 ∩ A2 ∩ ... ∩ An].
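The inclusion-exclusion law is easy to verify by brute force on a finite sample space. A short Python check (my own illustration; the three events and the 12 equally likely outcomes are arbitrary):

    from itertools import combinations

    omega = set(range(12))                       # 12 equally likely outcomes
    A = [set(range(0, 6)), set(range(4, 9)), {0, 3, 8, 10}]

    def prob(s):
        return len(s) / 12                       # uniform probability

    lhs = prob(A[0] | A[1] | A[2])
    rhs = 0.0
    for k in range(1, 4):                        # alternating sum over k-fold intersections
        for idx in combinations(range(3), k):
            inter = set(omega)
            for i in idx:
                inter &= A[i]
            rhs += (-1) ** (k + 1) * prob(inter)
    print(lhs, rhs)                              # the two sides agree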
2.4 Limit Sets

Much of the first half of ST111 is concerned with calculations using these various rules of probabilistic calculation. Essentially the representation theorem above tells us we can compute the probability of any event in A(C) just so long as we know the probabilities of the various events in C and also of all their intersections, whether by knowing events are independent or whether by knowing various conditional probabilities. (We avoid discussing conditional probabilities here for reasons of shortage of time: they have been dealt with in ST111 and figure very largely in ST202.)

However these calculations can become long-winded and ultimately either infeasible or unrevealing. It is better to know how to approximate probabilities and events, which leads us to the following kind of question: suppose we have a sequence of events Cn which are decreasing (getting harder and harder to satisfy) and which converge to a limit C: Cn ↓ C. Can we say P[Cn] converges to P[C]?

Here is a specific example. Suppose we observe an infinite sequence of coin tosses, and think therefore of the collection C of events Ai that the ith coin comes up heads. Consider the probabilities

(a) P[ second toss gives heads ] = P[A2];
(b) P[ first n tosses all give heads ] = P[ ⋂_{i=1}^{n} Ai ];
(c) P[ the first toss which gives a head is even-numbered ].

There is a difference! The first two can be dealt with within the algebra. The third cannot: suppose Cn is the event "the first toss in numbers 1, ..., n which gives a head is even-numbered, or else all n of these tosses give tails"; then Cn lies in A(C), and converges down to the event C "the first toss which gives a head is even-numbered", but C is not in A(C).

We now find a number of problems raise their heads.

• Problems with "everywhere being impossible": Suppose we are running an experiment with an outcome uniformly distributed over [0,1]. Then we have a problem as mentioned in the second of our motivating examples: under reasonable conditions we are working with the algebra of finite unions of sub-intervals of [0,1], and the probability measure which gives P[[a,b]] = b − a, but this means P[{a}] = 0. Now we need to be careful, since if we rashly allow ourselves to work with uncountable unions we get

   P[ ⋃_{x∈[0,1]} {x} ] = Σ_{x∈[0,1]} 0 = 0.

But this contradicts P[[0,1]] = 1 and so is obviously wrong.

• Problems with specification: if we react to the above example by insisting we can only give probabilities to events in the original algebra, then we can fail to give probabilities to perfectly sensible events, such as example (c) in the infinite sequence of coin-tosses above. On the other hand, if we rashly prescribe probabilities then how can we avoid getting into contradictions such as the above?

It seems sensible to suppose that at least when we have Cn ↓ C then we should be allowed to say P[Cn] ↓ P[C], and this turns out to be the case as long as the set-up is sensible. Here is an example of a set-up which is not sensible:

Example 2.4: Ω = {1, 2, 3, ...}, C = {{1}, {2}, ...}, P[{n}] = 1/2^{n+1}. Then A(C) is the collection of finite and co-finite (co-finite: complement is finite) subsets of the positive integers, and

   P[{1, 2, ..., n}] = Σ_{m=1}^{n} 1/2^{m+1} = (1/2) × (1 − 1/2^n) → 1/2 ≠ 1.
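A few lines of Python make Example 2.4's failure of continuity concrete: the sets {1, ..., n} increase up to Ω, but their probabilities increase only to 1/2, not to P[Ω] = 1 (illustrative sketch only; the function name is mine):

    def p_initial_segment(n):
        """P[{1, ..., n}] under P[{m}] = 1/2**(m+1), as in Example 2.4."""
        return sum(0.5 ** (m + 1) for m in range(1, n + 1))

    for n in [1, 5, 10, 30]:
        print(n, p_initial_segment(n))
    # 0.25, 0.4843..., 0.4995..., 0.4999...: the limit is 1/2, not 1,
    # so this finitely-additive P cannot be countably additive.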
We must now investigate how we can deal with limit sets.

2.5 σ-algebras

The first task is to establish a wide range of sensible limit sets. Boldly, we look at sets which can be obtained by any imaginable combination of countable set operations: the collection of all such sets is a σ-algebra (σ stands for "countable").

Definition 2.5 (σ-algebra): A σ-algebra of subsets of Ω is an algebra which is also closed under countable unions.

In fact σ-algebras are even larger than ordinary algebras; it is difficult to describe a typical member of a σ-algebra, and it pays to talk about σ-algebras generated by specified collections of sets.

Definition 2.6 (σ-algebra generated by a collection): For any collection of subsets C of Ω, we define σ(C) to be the intersection of all σ-algebras of subsets of Ω which contain C:

   σ(C) = ⋂ {S : S is a σ-algebra and C ⊆ S}.

Theorem 2.7 (Monotone limits): Note that σ(C) defined above is indeed a σ-algebra. Furthermore, it is the smallest σ-algebra containing C which is closed under monotone limits.

Examples of σ-algebras include: all algebras of subsets of finite sets (because then there will be no non-finite countable set operations); the Borel σ-algebra generated by the family of all intervals of the real line; and the σ-algebra for the coin-tossing example generated by the infinite family of events

   Ai = [ ith coin is heads ].

2.6 Countable additivity

Now we have established a context for limit sets (they are sets belonging to a σ-algebra), we can think about what sort of limiting operations we should allow for probability measures.

Definition 2.8 (Measures): A set-function µ : A → [0, ∞] is said to be a finitely-additive measure if it satisfies:

(FA) µ(A ∪ B) = µ(A) + µ(B) whenever A, B ∈ A are disjoint.

It is said to be countably-additive (or σ-additive) if in addition

(CA) µ(⋃_{i=1}^{∞} Ai) = Σ_{i=1}^{∞} µ(Ai) whenever the Ai ∈ A are disjoint and their union ⋃_{i=1}^{∞} Ai lies in A.

We abbreviate "finitely-additive" to (FA), and "countably-additive" to (CA). We often abbreviate "countably-additive measure" to "measure". Notice that if A were actually a σ-algebra then we wouldn't have to check the condition "⋃_{i=1}^{∞} Ai lies in A" in (CA).

Definition 2.9 (Probability measures): A set-function P : A → [0,1] is said to be a finitely-additive probability measure if it is a (FA) measure such that P[Ω] = 1. It is a (CA) probability measure (we often just say "probability measure") if in addition it is (CA).

Notice various consequences for probability measures: µ(∅) = 0; (FA) follows from (CA); and if (CA) holds, we always have µ(⋃_{i=1}^{∞} Ai) ≤ Σ_{i=1}^{∞} µ(Ai) even when the union is not disjoint; etc.

(CA) is a kind of continuity condition. A similar continuity condition is that of "monotone limits".

Definition 2.10 (Monotone limits): A set-function µ : A → [0,1] is said to obey the monotone limits property (ML) if it satisfies: µ(Ai) → µ(A) whenever the Ai increase upwards to a limit set A which lies in A.

(ML) is simpler to check than (CA), but is equivalent to it for finitely-additive measures.

Theorem 2.11 (Equivalence for countable additivity): (CA) ⟺ (FA) + (ML).

Lemma 2.12 (Another condition for countable additivity): Suppose P is a finitely-additive probability measure on (Ω, F), where F is an algebra of sets. Then P is countably additive if and only if

   lim_{n→∞} P[An] = 1

whenever the sequence of events An belongs to the algebra F and moreover An ↑ Ω.

2.7 Uniqueness of probability measures

To illustrate the next step, consider the notion of length/area. (To avoid awkward alternatives, we talk about the measure instead of length/area/volume/...) It is easy to define the measure of very regular sets. But for a stranger, more "fractal-like", set A we would need to define something like an "outer measure"

   µ*(A) = inf { Σ_i µ(Bi) : where the Bi cover A }

to get at least an upper bound for what it would be sensible to call the measure of A. Of course we must give equal priority to considering what is the measure of the complement Aᶜ. Suppose for definiteness that A is contained in a simple set Q of finite measure (a convenient interval for length, a square for area, a cube for volume, ...) so that Aᶜ = Q \ A. Then consideration of µ*(Aᶜ) leads us directly to consideration of an "inner measure" for A:

   µ_*(A) = µ(Q) − µ*(Aᶜ).

Clearly µ*(A) ≥ µ_*(A); moreover we can only expect a truly sensible definition of measure on the set

   F = { A : µ*(A) = µ_*(A) }.

The fundamental theorem of measure theory states that this works out all right!

Theorem 2.13 (Extension theorem): If µ is a measure on an algebra A which is σ-additive on A, then it can be extended uniquely to a countably additive measure on F defined as above; moreover σ(A) ⊆ F.

The proof of this remarkable theorem is too lengthy to go into here. Notice that it can be paraphrased very simply: if your notion of measure (probability, length, area, volume, ...) can be defined consistently on an algebra in such a way that it is σ-additive whenever the two sides of

   µ(⋃_{i=1}^{∞} Ai) = Σ_{i=1}^{∞} µ(Ai)

make sense (whenever the disjoint union ⋃_{i=1}^{∞} Ai actually belongs to the algebra), then it can be extended uniquely to the (typically much larger) σ-algebra generated by the original algebra, so as again to be a (σ-additive) measure.
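To get a feel for the outer-measure definition, here is a small Python sketch (my own illustration, not part of the lectured material) computing interval covers of the middle-thirds Cantor set: the level-n construction covers it by 2^n intervals of length 3^(−n), so its outer measure is at most (2/3)^n for every n, hence 0; its inner measure is 0 too, so by the recipe above it is measurable with measure 0.

    def cantor_cover(level):
        """Intervals of the level-n cover of the middle-thirds Cantor set."""
        intervals = [(0.0, 1.0)]
        for _ in range(level):
            next_intervals = []
            for a, b in intervals:
                third = (b - a) / 3.0
                next_intervals.append((a, a + third))        # keep left third
                next_intervals.append((b - third, b))        # keep right third
            intervals = next_intervals
        return intervals

    for n in [1, 5, 10, 20]:
        cover = cantor_cover(n)
        print(n, len(cover), sum(b - a for a, b in cover))   # total length (2/3)^n -> 0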
There is an important special part of this theorem which is worth stating separately.

Definition 2.14 (Π-system): A Π-system of subsets of Ω is a collection of subsets including Ω itself and closed under finite intersections.

Theorem 2.15 (Uniqueness for probability measures): Two finite measures which agree on a Π-system Π also agree on the generated σ-algebra σ(Π).

2.8 Lebesgue measure and coin tossing

The extension theorem can be applied to the "uniform probability space": Ω = [0,1], A given by finite unions of intervals, P given by lengths of intervals. It turns out P is indeed σ-additive on A (showing this is non-trivial!) and so the extension theorem tells us there is a unique countably additive extension P on the σ-algebra B = σ(A) (the Borel σ-algebra restricted to [0,1]). We call this Lebesgue measure.

There is a significant connection between infinite sequences of coin tosses and numbers in [0,1]. Briefly, we can expand a number x ∈ [0,1] in binary (as opposed to decimal!): we write x as .ω1ω2ω3... where ωi equals 1 or 0 according as the integer part of 2^i x is odd or even. The coin-tossing σ-algebra can be viewed as generated by the sequence {ω1, ω2, ω3, ...} with 0 standing for tails, 1 for heads. In effect we get a map from coin-tossing space 2^ℕ to number space [0,1], with the slight cautionary note that this map very occasionally maps two sequences onto one number (think of .0111111... and .100000...). In particular

   [ω1 = a1, ω2 = a2, ..., ωd = ad] = [x, x + 2^{−d})

where x is the number corresponding to (a1, a2, ..., ad). Remarkably, we can now use the uniqueness theorem to show that the map T : (a1, a2, ..., ad) ↦ x preserves probabilities, in the sense that Lebesgue measure is exactly the same as we get by finding the probability of the event T^{−1}(A) as a coin-tossing event, if the coins are independent and fair.
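This correspondence is easy to test by simulation. The following Python sketch (illustrative only; the sample size and d = 3 are arbitrary choices of mine) extracts binary digits from uniform random numbers and checks that each pattern of the first d digits occurs with frequency close to 2^(−d), just as for independent fair coin tosses:

    import random
    from collections import Counter

    def binary_digits(x, d):
        """First d binary digits of x in [0, 1)."""
        digits = []
        for _ in range(d):
            x *= 2
            digits.append(int(x))   # integer part: 0 or 1
            x -= int(x)
        return tuple(digits)

    d, n = 3, 100_000
    counts = Counter(binary_digits(random.random(), d) for _ in range(n))
    for pattern in sorted(counts):
        print(pattern, counts[pattern] / n)   # each close to 1/2**3 = 0.125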
It is reasonable to ask whether there are any non-measurable sets, since σ-algebras are so big! It is indeed very hard to find any. Here is the basic example, which is due in essence to Vitali.

Example 2.16: Consider the following equivalence relation on (Ω, B, P) with Ω = [0,1]: we say x ∼ y if x − y is a rational number. Now construct a set A by choosing exactly one member from each equivalence class. So for any x ∈ [0,1] there is one and only one y ∈ A such that x − y is a rational number. If A were Lebesgue measurable then it would have a value P[A]. What would this value be? Imagine [0,1] folded round into a circle. It is the case that P[A] does not change when one turns this circle. In particular we can now consider Aq = {a + q (mod 1) : a ∈ A} for rational q. By construction Aq and Ar are disjoint for different rationals q, r in [0,1). Now we have

   ⋃_{q rational} Aq = [0,1]

and since there are only countably many rationals q, and P[Aq] doesn't depend on q, we determine

   P[[0,1]] = Σ_{q rational} P[Aq] = Σ_{q rational} P[A].

But this cannot make sense if P[[0,1]] = 1! We are forced to conclude that A cannot be Lebesgue measurable. This example has a lot to do with the Banach-Tarski paradox described in the motivating Example 1.3 above.

3 Independence and measurable functions

3.1 Independence

In ST111 we formalized the idea of independence of events. Essentially we require a "multiplication law" to hold:

Definition 3.1 (Independence of an infinite sequence of events): We say the events Ai (for i = 1, 2, ...) are independent if, for any finite subsequence i1 < i2 < ... < ik, we have

   P[A_{i1} ∩ ... ∩ A_{ik}] = P[A_{i1}] × ... × P[A_{ik}].

Notice we require all possible multiplication laws to hold: it is possible to build interesting examples where events are independent pair-by-pair, but altogether give non-trivial information about each other.

We need to talk about infinite sequences of events (often independent). We often have in the back of our minds a sense that the sequence is revealed to us progressively over time (though this need not be so!), suggesting two natural questions. First, will we see events occur in the sequence right into the indefinite future? Second, will we after some point see all events occur?

Definition 3.2 ("Infinitely often" and "Eventually"): Given a sequence of events B1, B2, ..., we say

• Bi holds infinitely often ([Bi i.o.]) if there are infinitely many different i for which the statement Bi is true: in set-theoretic terms

   [Bi i.o.] = ⋂_{i=1}^{∞} ⋃_{j=i}^{∞} Bj;

• Bi holds eventually ([Bi ev.]) if for all large enough i the statement Bi is true: in set-theoretic terms

   [Bi ev.] = ⋃_{i=1}^{∞} ⋂_{j=i}^{∞} Bj.

Notice these two concepts ev. and i.o. make sense even if the infinite sequence is just a sequence, with no notion of events occurring consecutively in time! Notice also (you should check this yourself!)

   [Bi i.o.] = ([Biᶜ ev.])ᶜ.

3.2 Borel-Cantelli lemmas

The multiplication laws appearing above in Section 3.1 force a kind of "infinite multiplication law".

Lemma 3.3 (Probability of infinite intersection): If the events Ai (for i = 1, 2, ...) are independent then

   P[⋂_{i=1}^{∞} Ai] = ∏_{i=1}^{∞} P[Ai].

We have to be careful what we mean by the infinite product ∏_{i=1}^{∞} P[Ai]: we mean of course the limiting value

   lim_{n→∞} ∏_{i=1}^{n} P[Ai].

We can now prove a remarkable pair of facts about P[Ai i.o.] (and hence its twin P[Ai ev.]!). It turns out it is often easy to tell whether these events have probability 0 or 1.

Theorem 3.4 (Borel-Cantelli lemmas): Suppose the events Ai (for i = 1, 2, ...) form an infinite sequence. Then

(i) if Σ_{i=1}^{∞} P[Ai] < ∞ then P[Ai holds infinitely often] = P[Ai i.o.] = 0;

(ii) if Σ_{i=1}^{∞} P[Ai] = ∞ and the Ai are independent then P[Ai holds infinitely often] = P[Ai i.o.] = 1.

Note the two parts of the above result are not quite symmetrical: the second part also requires independence. It is a good exercise to work out a counterexample to part (ii) if independence fails.
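A simulation over a finite horizon makes the Borel-Cantelli dichotomy vivid. Below is a Python sketch (my own illustration; the horizon, the seed and the two probability schedules are arbitrary) taking independent events with P[An] = 1/n, whose probabilities sum to infinity, against P[An] = 1/n², whose probabilities sum finitely:

    import random

    def last_occurrence(prob, horizon=200_000):
        """Index of the last event A_n (n <= horizon) to occur, with P[A_n] = prob(n)."""
        last = 0
        for n in range(1, horizon + 1):
            if random.random() < prob(n):
                last = n
        return last

    random.seed(1)
    print(last_occurrence(lambda n: 1 / n))      # typically a large index: events keep recurring
    print(last_occurrence(lambda n: 1 / n**2))   # typically a small index: occurrences dry up

This only probes a finite horizon, of course, but it matches the theorem: in the first case events occur infinitely often with probability 1, in the second only finitely many ever occur.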
3.3 Law of large numbers for events

As a consequence of these ideas it can be shown that "limiting frequencies" exist for sequences of independent trials with the same success probability.

Theorem 3.5 (Law of large numbers for events): Suppose that we have a sequence of independent events Ai, each with the same probability p. Let Sn count the number of events A1, ..., An which occur. Then

   P[ |Sn/n − p| ≤ ε ev. ] = 1   for all positive ε.

3.4 Independence and classes of events

The idea of independence stretches beyond mere sequences of events. For example, consider (a) a set of events concerning a football match between Coventry City and Aston Villa at home for Coventry, and (b) a set of events concerning a cricket test between England and Australia at Melbourne, both happening on the same day. At least as a first approximation, one might assume that any combination of events concerning (a) is independent of any combination concerning (b).

Definition 3.6 (Independence and classes of events): Suppose C1, C2 are two classes of events. We say they are independent if A and B are independent whenever A ∈ C1, B ∈ C2.

Here our notion of Π-systems becomes important.

Lemma 3.7 (Independence and Π-systems): If two Π-systems are independent, then so are the σ-algebras they generate.

Returning to sequences, the above is the reason why we can jump immediately from assumptions of independence of events to deducing that their complements are independent.

Corollary 3.8 (Independence and complements): If a sequence of events Ai is independent, then so is the sequence of complementary events Aiᶜ.

3.5 Measurable functions

Mathematical work often becomes easier if one moves from sets to functions. Probability theory is no different. Instead of events (subsets of sample space) we can often find it easier to work with random variables (real-valued functions defined on sample space). You should think of a random variable as involving lots of different events, namely those events defined in terms of the random variable taking on different sets of values. Accordingly we need to take care that the random variable doesn't produce events which fall outwith our chosen σ-algebra. To do this we need to develop the idea of a measurable function.

Definition 3.9 (Measurable space): (Ω, F) is a measurable space if F is a σ-algebra of subsets of Ω.

Definition 3.10 (Borel σ-algebra): The Borel σ-algebra B is the σ-algebra of subsets of R generated by the collection of intervals of R. In fact we don't need all the intervals of R: it is enough to take the closed half-infinite intervals (−∞, x].

Definition 3.11 (Measurable function): Suppose we are given two measurable spaces (Ω, F) and (Ω′, F′). We say the function f : Ω → Ω′ is measurable if f^{−1}(A) = {ω : f(ω) ∈ A} belongs to F whenever A belongs to F′.

Definition 3.12 (Random variable): Suppose that X : Ω → R is measurable as a mapping from (Ω, F) to (R, B). Then we say X is a random variable.

As we have said, to each random variable there is a class of related events. This actually forms a σ-algebra.

Definition 3.13 (σ-algebra generated by a random variable): If X : Ω → R is a random variable then the σ-algebra generated by X is the family of events σ(X) = {X^{−1}(A) : A ∈ B}.
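On a finite sample space, σ(X) can be listed explicitly: it consists of the preimages of all sets of values of X. A Python sketch of Definition 3.13 (illustrative only; the function name and the example X are my own):

    from itertools import combinations

    def sigma_generated_by(X, omega):
        """sigma(X) on a finite sample space: preimages of all subsets of X's range."""
        values = sorted(set(X[w] for w in omega))
        sigma = set()
        for k in range(len(values) + 1):
            for vals in combinations(values, k):
                sigma.add(frozenset(w for w in omega if X[w] in vals))
        return sigma

    # X distinguishes only "odd or even" on omega = {1, 2, 3, 4}:
    omega = {1, 2, 3, 4}
    X = {1: 0, 2: 1, 3: 0, 4: 1}
    for event in sorted(sigma_generated_by(X, omega), key=lambda s: (len(s), sorted(s))):
        print(set(event))
    # prints set(), {1, 3}, {2, 4}, {1, 2, 3, 4}: exactly the events X can "see".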
3.6 Independence of random variables

Random variables can be independent too! Essentially here independence means that an event generated by one of the random variables cannot be used to give useful predictions about an event generated by the other random variable.

Definition 3.14 (Independence of random variables): We say random variables X and Y are independent if their σ-algebras σ(X), σ(Y) are independent.

Theorem 3.15 (Criterion for independence of random variables): Let X and Y be random variables, and let P be the Π-system of subsets of R formed by all half-infinite closed intervals (−∞, x]. Then X and Y are independent if and only if the collections of events X^{−1}P, Y^{−1}P are independent. (Here we define X^{−1}P = {X^{−1}(A) : A ∈ P} = {X^{−1}((−∞, x]) : x ∈ (−∞, ∞)}.)

3.7 Distributions of random variables

We often need to talk about random variables on their own, without reference to other random variables or events. In such cases all we are interested in is the probabilities they have of taking values in various regions:

Definition 3.16 (Distribution of a random variable): Suppose that X is a random variable. Its distribution is the probability measure P_X on R given by

   P_X[B] = P[X ∈ B]   whenever B ∈ B.

4 Integration

One of the main things to do with functions is to integrate them (find the area under the curve). One of the main things to do with random variables is to take their expectations (find their average values). It turns out that these are really the same idea! We start with integration.

4.1 Simple functions and Indicators

Begin by thinking of the simplest possible function to integrate. That is an indicator function, which only takes two possible values, 0 or 1:

Definition 4.1 (Indicator function): If A is a measurable set then its indicator function is defined by

   I[A](x) = 1 if x ∈ A;   I[A](x) = 0 if x ∉ A.

The next stage up is to consider a simple function taking only a finite number of values, since it can be regarded as a linear combination of indicator functions.

Definition 4.2 (Simple functions): A simple function h is a measurable function h : Ω → R which only takes finitely many values. Thus we can represent it as

   h(x) = c1 I[A1](x) + ... + cn I[An](x)

for some finite collection A1, ..., An of measurable sets and constants c1, ..., cn.

It is easy to integrate simple functions ...

Definition 4.3 (Integration of simple functions): The integral of a simple function h with respect to a measure µ is given by

   ∫ h dµ = ∫ h(x) µ(dx) = Σ_{i=1}^{n} ci µ(Ai)

where h(x) = c1 I[A1](x) + ... + cn I[An](x) as above.

Note that one really should prove that the definition of ∫ h dµ does not depend on exactly how one represents h as the sum of indicator functions.

Integration for such functions has a number of basic properties which one uses all the time, almost unconsciously, when trying to find integrals.

Theorem 4.4 (Properties of integration for simple functions):
(1) if µ(f ≠ g) = 0 then ∫ f dµ = ∫ g dµ;
(2) Linearity: ∫ (af + bg) dµ = a ∫ f dµ + b ∫ g dµ;
(3) Monotonicity: f ≤ g implies ∫ f dµ ≤ ∫ g dµ;
(4) min{f, g} and max{f, g} are simple.
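Definition 4.3 translates directly into code once a measure is available for the sets Ai. Here is a Python sketch (my own illustration) for simple functions on [0,1] built from intervals, with Lebesgue measure as interval length; it also checks numerically the point noted above, that two different representations of the same simple function give the same integral:

    def integral_simple(terms):
        """Integral of h = sum_i c_i * I[(a_i, b_i)] against length measure on [0,1]."""
        return sum(c * (b - a) for c, (a, b) in terms)

    # h = 2 on (0, 1/2], 3 on (1/2, 1], written in two different ways:
    h1 = [(2, (0.0, 0.5)), (3, (0.5, 1.0))]
    h2 = [(2, (0.0, 1.0)), (1, (0.5, 1.0))]   # 2 everywhere, plus 1 on (1/2, 1]
    print(integral_simple(h1), integral_simple(h2))   # both 2.5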
Simple functions are rather boring. For more general functions we use limiting arguments. We have to be a little careful here, since some functions will have integrals built up from +∞ when they are integrated over one part of the region, and −∞ over another part. Think for example of

   ∫_{−∞}^{∞} (1/x) dx = ∫_{0}^{∞} (1/x) dx + ∫_{−∞}^{0} (1/x) dx, which "equals" ∞ − ∞?

So we first consider just non-negative functions.

Definition 4.5 (Integration for non-negative measurable functions): If f ≥ 0 is measurable then we define

   ∫ f dµ = sup { ∫ g dµ : for simple g such that 0 ≤ g ≤ f }.

4.2 Integrable functions

For general functions we require that we don't get into this situation of "∞ − ∞".

Definition 4.6 (Integration for general measurable functions): If f is measurable and we can write f = g − h for two non-negative measurable functions g and h, both with finite integrals, then

   ∫ f dµ = ∫ g dµ − ∫ h dµ.

We then say f is integrable.

One really needs to prove that the integral ∫ f dµ does not depend on the choice f = g − h. In fact if there is any choice which works then the easy choice

   g = max{f, 0},   h = max{−f, 0}

will work. One can show that the integral on integrable functions agrees with its definition on simple functions and is linear.

What starts to make the theory very easy is that the integral thus defined behaves very well when studying limits.

Theorem 4.7 (Monotone convergence theorem (MON)): If fn ↑ f (all being non-negative measurable functions) then

   ∫ fn dµ ↑ ∫ f dµ.

Corollary 4.8 (Integrability and simple functions): If f is non-negative and measurable then for any sequence of non-negative simple functions fn such that fn ↑ f we have

   ∫ fn dµ ↑ ∫ f dµ.

Definition 4.9 (Integration over a measurable set): If A is measurable and f is integrable then

   ∫_A f dµ = ∫ I[A] f dµ.

4.3 Expectation of random variables

The above notions apply directly to random variables, which may be thought of simply as measurable functions defined on the sample space!

Definition 4.10 (Expectation): If P is a probability measure then we define expectation (with respect to this probability measure) for all integrable random variables X by

   E[X] = ∫ X dP = ∫ X(ω) P(dω).

The notion of expectation is really only to do with the random variable considered on its own, without reference to any other random variables. Accordingly it can be expressed in terms of the distribution of the random variable.

Theorem 4.11 (Change of variables): Let X be a random variable and let g : R → R be a measurable function. Assuming that the random variable g(X) is integrable,

   E[g(X)] = ∫_R g(x) P_X(dx).

4.4 Examples

You need to work through exercises such as the following to get a good idea of how the above really works out in practice. See the material covered in lectures for more on this.

Exercise 4.12: Evaluate ∫₀¹ x Leb(dx).

Exercise 4.13: Consider Ω = {1, 2, 3, ...}, P[{i}] = pi where Σ_{i=1}^{∞} pi = 1. Show that ∫ f dP = Σ_{i=1}^{∞} f(i) pi.

Exercise 4.14: Evaluate ∫₀^y e^x Leb(dx).

Exercise 4.15: Evaluate ∫₀ⁿ f(x) Leb(dx) where f(x) = 1 if 0 ≤ x < 1; = 2 if 1 ≤ x < 2; ...; = n if n − 1 ≤ x < n.

Exercise 4.16: Evaluate ∫ I[0,θ](x) sin(x) Leb(dx).
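Exercise 4.12 can be attacked exactly in the spirit of Corollary 4.8: approximate f(x) = x from below by dyadic "staircase" simple functions and watch the integrals increase to 1/2. A Python sketch (my own illustration):

    def staircase_integral(n):
        """Integral of the simple function f_n = sum_k (k/2^n) I[k/2^n, (k+1)/2^n),
        which approximates f(x) = x from below on [0, 1)."""
        step = 1.0 / 2**n
        return sum((k * step) * step for k in range(2**n))

    for n in [1, 2, 5, 10, 20]:
        print(n, staircase_integral(n))
    # 0.25, 0.375, ..., increasing up to the true value 1/2
    # (in fact the n-th value is exactly 1/2 - 1/2**(n+1)).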
5 Convergence

Approximation is a fundamental key to making mathematics work in practice. Instead of being stuck, unable to do a hard problem, we find an easier problem which has almost the same answer, and do that instead! The notion of convergence (see first-year analysis) is the formal structure giving us the tools to do this. For random variables there are a number of different notions of convergence, depending on whether we need to approximate a whole sequence of actual random values, or just a particular random value, or even just probabilities.

5.1 Convergence of random variables

Definition 5.1 (Convergence in probability): The random variables Xn converge in probability to Y,

   Xn → Y in prob.,

if for all positive ε we have P[|Xn − Y| > ε] → 0.

Definition 5.2 (Convergence almost surely / almost everywhere): The random variables Xn converge almost surely to Y,

   Xn → Y a.s.,

if we have P[Xn → Y] = 1. The (measurable) functions fn converge almost everywhere to f if the set {x : fn(x) → f(x) fails} is of Lebesgue measure zero.

The difference is that convergence in probability deals with just a single random value Xn for large n. Convergence almost surely deals with the behaviour of the whole sequence. Here are some examples to think about.

Example 5.3: Consider random variables defined on ([0,1], B, Leb) by Xn(ω) = I[[0,1/n]](ω). Then Xn → 0 a.s.

Example 5.4: Consider the probability space above and the events A1 = [0,1], A2 = [0,1/2], A3 = [1/2,1], A4 = [0,1/4], ..., A7 = [3/4,1], ... Then Xn = I[An] converges to zero in probability but not almost surely.

Example 5.5: Suppose in the above that

   Xn = Σ_{k=1}^{n} (k/n) I[[(k−1)/n, k/n]].

Then Xn → X a.s., where X(ω) = ω for ω ∈ [0,1].

Example 5.6: Suppose in the above that Xn ≤ a for all n. Let Yn = max_{m≤n} Xm. Then Yn ↑ Y a.s. for some Y.

Example 5.7: Suppose in the above that the Xn are not bounded, but are independent, and furthermore

   lim_{a→∞} ∏_{n=1}^{∞} P[Xn ≤ a] = 1.

Then Yn ↑ Y a.s., where

   P[Y ≤ a] = ∏_{n=1}^{∞} P[Xn ≤ a].

As one might expect, the notion of almost sure convergence implies that of convergence in probability.

Theorem 5.8 (Almost sure convergence implies convergence in probability): Xn → X a.s. implies Xn → X in prob.
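Example 5.4 (the "sliding indicator" sequence) is worth seeing numerically: for any fixed ω the sequence Xn(ω) takes the value 1 along an infinite subsequence, so Xn(ω) does not converge to 0, yet P[Xn ≠ 0] equals the interval length, which tends to 0. A Python sketch (my own illustration; the indexing convention for the intervals is mine):

    def interval(n):
        """The n-th interval A_n of Example 5.4: [0,1]; [0,1/2], [1/2,1]; [0,1/4], ..."""
        level = 0
        while n >= 2 ** (level + 1):   # find the dyadic level containing index n
            level += 1
        k = n - 2 ** level             # position within that level
        return (k / 2 ** level, (k + 1) / 2 ** level)

    omega = 0.3
    hits = [n for n in range(1, 200) if interval(n)[0] <= omega <= interval(n)[1]]
    print(hits)     # X_n(0.3) = 1 at one index on every dyadic level: no pointwise limit 0

    lengths = [interval(n)[1] - interval(n)[0] for n in (1, 10, 100, 1000)]
    print(lengths)  # P[X_n != 0] = length of A_n -> 0: convergence in probability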
Almost sure convergence allows for various theorems telling us when it is OK to exchange integrals and limits. Generally this doesn't work: consider the example

   1 = ∫₀^∞ λ exp(−λt) dt ↛ ∫₀^∞ lim_{λ→∞} λ exp(−λt) dt = ∫₀^∞ 0 dt = 0.

However we have already seen one case where it does work: when the limit is monotonic. In fact we only need this to hold almost everywhere (i.e. when the convergence is almost sure).

Theorem 5.9 (MON): If the functions fn, f are non-negative and if fn ↑ f µ-a.e. then

   ∫ fn dµ ↑ ∫ f dµ.

It is often the case that the following simple inequalities are crucial to figuring out whether convergence holds.

Lemma 5.10 (Markov's inequality): If f : R → R is increasing and non-negative and X is a random variable then

   P[X ≥ a] ≤ E[f(X)] / f(a)   for all a such that f(a) > 0.

Corollary 5.11 (Chebyshev's inequality): If E[X²] < ∞ then

   P[|X − E[X]| ≥ a] ≤ Var(X)/a²   for all a > 0.

In particular we can get a lot of mileage by combining this with the fact that, while in general the variance of a sum of random variables is not additive, it is additive in the case of independence.

Lemma 5.12 (Variance and independence): If a sequence of random variables Xi is independent then

   Var(Σ_{i=1}^{n} Xi) = Σ_{i=1}^{n} Var(Xi).

5.2 Laws of large numbers for random variables

An important application of these ideas is to show that the law of large numbers extends from events to random variables.

Theorem 5.13 (Weak law of large numbers): If a sequence of random variables Xi is independent, and if the random variables all have the same finite mean and variance, E[Xi] = µ and Var(Xi) = σ² < ∞, then

   Sn/n → µ in prob.,

where Sn = X1 + ... + Xn is the partial sum of the sequence.

As you will see, the proof is really rather easy when we use Chebyshev's inequality above. Indeed it is also quite easy to generalize to the case when the random variables are correlated, as long as the covariances are small ... However the corresponding result for almost sure convergence, rather than convergence in probability, is rather harder to prove.

Theorem 5.14 (Strong law of large numbers): If a sequence of random variables Xi is independent and identically distributed, and if E[Xi] = µ, then

   Sn/n → µ a.s.,

where Sn = X1 + ... + Xn is the partial sum of the sequence.

5.3 Convergence of integrals and expectations

We already know a way to relate integrals to limits (MON). What about a general sequence of non-negative measurable functions?

Theorem 5.15 (Fatou's lemma (FATOU)): If the functions fn : R → R are non-negative then

   ∫ lim inf fn dµ ≤ lim inf ∫ fn dµ.

We can also go "the other way":

Theorem 5.16 ("Reverse Fatou"): If the functions fn : R → R are bounded above by g µ-a.e. and g is integrable then

   lim sup ∫ fn dµ ≤ ∫ lim sup fn dµ.

5.4 Dominated convergence theorem

Although in general one can't interchange limits and integrals, this can be done if all the functions (equivalently, random variables) involved are bounded in absolute value by a single non-negative function (random variable) which has finite integral.

Corollary 5.17 (Dominated convergence theorem (DOM)): If the functions fn : R → R are bounded in absolute value by g µ-a.e. (so |fn| < g a.e.), g is integrable, and fn → f, then

   lim ∫ fn dµ = ∫ f dµ.

This is a very powerful result ...

5.5 Examples

Example 5.18: If the Xn form a bounded sequence of random variables and they converge almost surely to X then E[Xn] → E[X].

Example 5.19: Suppose that U is a random variable uniformly distributed over [0,1] and

   Xn = Σ_{k=0}^{2^n − 1} k 2^{−n} I[k 2^{−n} ≤ U < (k+1) 2^{−n}].

Then E[log(1 − Xn)] → −1.

Example 5.20: Suppose that the Xn are independent and X1 = 1, while for n ≥ 2

   P[Xn = n + 1] = P[Xn = 1/(n + 1)] = 1/n³,   P[Xn = 1] = 1 − 2/n³,

and Zn = ∏_{i=1}^{n} Xi. Then the Zn form an almost surely convergent sequence with limit Z∞, and E[Zn] → E[Z∞].
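The dominated convergence theorem is easy to see numerically. Take fn(x) = x^n on [0,1]: fn → 0 a.e., |fn| ≤ 1 with g ≡ 1 integrable, so ∫ fn → 0. Contrast the family λ exp(−λt) above, which admits no integrable dominating g, and whose integrals stay at 1. A Python sketch using crude numerical integration (my own illustration; the step counts and cut-off at t = 50 are arbitrary):

    import math

    def integrate(f, a, b, steps=100_000):
        """Crude midpoint-rule integral of f over [a, b]."""
        h = (b - a) / steps
        return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

    for n in [1, 5, 25, 125]:
        print(n, integrate(lambda x: x ** n, 0.0, 1.0))
    # the integrals 1/(n+1) tend to 0, the integral of the limit function

    for lam in [1.0, 10.0, 100.0]:
        print(lam, integrate(lambda t: lam * math.exp(-lam * t), 0.0, 50.0))
    # these stay near 1, even though the pointwise limit is 0: no domination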
6 Product measures

6.1 Product measure spaces

The idea here is, given two measure spaces (Ω, F, µ) and (Ω′, F′, ν), to build a measure space on Ω × Ω′ by using "rectangle sets" A × B with measures µ(A) × ν(B). As you might guess from the "product form" µ(A) × ν(B), in the context of probability this is related to independence.

Definition 6.1 (Product measure space): Define the "product measure" µ ⊗ ν on the Π-system R of rectangle sets A × B as above. Let A(R) be the algebra generated by R.

Lemma 6.2 (Representation of A(R)): Every member of A(R) can be expressed as a finite disjoint union of rectangle sets.

It is now possible to apply the Extension Theorem 2.13 (we need to check σ-additivity; this is non-trivial but works) to define the "product measure" µ ⊗ ν on the whole σ-algebra σ(R).

6.2 Fubini's theorem

There are three big results on integration. We have already met two: MON and DOM, which tell us cases when we can exchange integrals and limits. The other result arises in the situation where we have a product measure space. In such a case we can integrate any function in one of three possible ways: either using the product measure, or by first doing a "partial integration" holding one coordinate fixed, and then integrating with respect to that one. We call this alternative iterated integration, and obviously there are two ways to do it depending on which variable we fix first. The final big result is due to Fubini, and tells us that as long as the function is modestly well-behaved it doesn't matter which of the three ways we do the integration, we still get the same answer:

Theorem 6.3 (Fubini's theorem): Suppose f is a real-valued function defined on the product measure space above which is either (a) non-negative or (b) µ ⊗ ν-integrable. Then

   ∫ f d(µ ⊗ ν) = ∫_{Ω′} ( ∫_{Ω} f(ω, ω′) µ(dω) ) ν(dω′).

Notice the two alternative conditions. Non-negativity (sometimes described as Tonelli's condition) is easy to check but can be limited. Think carefully about Fubini's theorem and especially Tonelli's condition, and you will see that the only thing which can go wrong is when in the product form you have an ∞ − ∞ problem!

6.3 Relationship with independence

Suppose X and Y are independent random variables. Then the distribution of the pair (X, Y), a measure on R × R given by

   µ*(A) = P[(X, Y) ∈ A],

is exactly the product measure µ ⊗ ν, where µ is the distribution of X and ν is the distribution of Y.

End of outline notes

References

[1] P. Billingsley. Probability and Measure. John Wiley & Sons, 1985.
[2] G.R. Grimmett and D.R. Stirzaker. Probability and Random Processes. Oxford University Press, 1982.
[3] D. Williams. Probability with Martingales. Cambridge University Press, 1991.