BTRY 4080 / STSCI 4080, Fall 2009
Theory of Probability
Instructor: Ping Li, Department of Statistical Science, Cornell University

General Information
• Lectures: Tue, Thu 10:10 - 11:25 am, Malott Hall 253
• Section 1: Mon 2:55 - 4:10 pm, Warren Hall 245
• Section 2: Wed 1:25 - 2:40 pm, Warren Hall 261
• Instructor: Ping Li, [email protected]. Office hours: Tue, Thu 11:25 am - 12 pm, 1192 Comstock Hall
• TA: Xiao Luo, [email protected]. Office hours: (1) Mon, 4:10 - 5:10 pm, Warren Hall 245; (2) Wed, 2:40 - 3:40 pm, Warren Hall 261
• Prerequisites: Math 213 or 222 or equivalent; a course in statistical methods is recommended
• Textbook: Ross, A First Course in Probability, 7th edition, 2006

• Exams: Locations TBD
  – Prelim 1: In class, September 29, 2009
  – Prelim 2: In class, October 20, 2009
  – Prelim 3: In class, TBD
  – Final Exam: Wed. December 16, 2009, from 2 pm to 4:30 pm
  – Policy: Closed book, closed notes

• Homework: Weekly
  – Please turn in your homework either in class or to the BSCB front desk (Comstock Hall, 1198).
  – No late homework will be accepted.
  – Before computing your overall homework grade, the assignment with the lowest grade (if ≥ 25%) will be dropped; the one with the second lowest grade (if ≥ 50%) will also be dropped.
  – It is the students' responsibility to keep copies of the submitted homework.

• Grading:
  1. The homework: 30%
  2. The prelim with the lowest grade: 5%
  3. The other two prelims: 15% each
  4. The final exam: 35%
• Course letter grade assignment: A ≈ 90% (in the absolute scale), C ≈ 60% (in the absolute scale). In borderline cases, participation in section and class interactions will be used as a determining factor.

Material (topic: textbook sections)
Combinatorics: 1.1 - 1.6
Axioms of Probability: 2.1 - 2.5
Conditional Probability and Independence: 3.1 - 3.5
Discrete Random Variables: 4.1 - 4.9
Continuous Random Variables: 5.1 - 5.7
Jointly Distributed Random Variables: 6.1 - 6.7
Expectation: 7.1 - 7.9
As time permits: Limit Theorems 8.1 - 8.6, Poisson Processes and Markov Chains 9.1 - 9.2, Simulation 10.1 - 10.4

Probability Is Increasingly Popular & Truly Useful
An Example: Probabilistic Counting
Consider two sets S1, S2 ⊂ Ω = {0, 1, 2, ..., D − 1}, for example D = 2000,
S1 = {0, 15, 31, 193, 261, 495, 563, 919}
S2 = {5, 15, 19, 193, 209, 325, 563, 1029, 1921}
Q: What is the size of the intersection a = |S1 ∩ S2|?
A: Easy! a = |{15, 193, 563}| = 3.

Challenges
• What if the size of the space is D = 2^64?
• What if there are 10 billion sets, instead of just 2?
• What if we only need approximate answers?
• What if we need the answers really quickly?
• What if we are mostly interested in sets that are highly similar?
• What if the sets are dynamic and frequently updated?
• ...

Search Engines
Google/Yahoo!/Bing have all collected at least 10^10 (10 billion) web pages.
Two examples (among many tasks) • (Very quickly) Determine if two Web pages are nearly duplicates A document can be viewed as a collection (set) of words or contiguous words. This is a counting of set intersection problem. • (Very quickly) Determine the number of co-occurrences of two words A word can be viewed as a set of documents containing that word This is also a counting of set intersection problem. 9 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Term Document Matrix Doc 1 Doc 2 Doc m Word 1 1 0 0 1 0 Word 2 0 1 0 1 0 0 0 0 1 1 0 Word 3 Word 4 Word n ≥ 1010 documents (Web pages). ≥ 105 common English words. (What about total number of contiguous words?) 10 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li One Method for Probabilistic Counting of Intersections Consider two sets S1 , S2 ⊂ Ω = {0, 1, 2, ..., D − 1} S1 = {0, 15, 31, 193, 261, 495, 563, 919} S2 = {5, 15, 19, 193, 209, 325, 563, 1029, 1921} One can first generate a random permutation π : π : Ω = {0, 1, 2, ..., D − 1} −→ {0, 1, 2, ..., D − 1} Apply π to sets S1 and S2 (and all other sets). For example π (S1 ) = {13, 139, 476, 597, 698, 1355, 1932, 2532} π (S2 ) = {13, 236, 476, 683, 798, 1456, 1968, 2532, 3983} 11 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Store only the minimums min{π (S1 )} = 13, min{π (S2 )} = 13 Suppose we can theoretically calculate the probability P = Pr (min{π (S1 )} = min{π (S2 )}) S1 and S2 are more similar =⇒ larger P (i.e., closer to 1). Generate (small) k random permutations and every times store the minimums. Count the number of matches to estimate the similarity between S1 and S2 . 12 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Later, we can easily show that P = Pr (min{π (S1 )} = min{π (S2 )}) = where a |S1 ∩ S2 | a = |S1 ∪ S2 | f1 + f2 − a = |S1 ∩ S2 |, f1 = |S1 |, f2 = |S2 |. Therefore, we can estimate P as a binomial probability from k permutations. 13 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Combinatorial Analysis Ross: Chapter 1, Sections 1.1 - 1.6 Instructor: Ping Li 14 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Basic Principles of Counting, Section 1.2 1. The multiplicative principle A project requires two sub-tasks, A and B. There are m ways to complete A, and there are n ways to complete B. Q: How many different ways to complete this project? 2. The additive principle A project can be completed either by doing task A, or by doing task B. There are m ways to complete A, and there are n ways to complete B. Q: How many different ways to complete this project? 15 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li The Multiplicative Principle B A 1 2 1 2 Destination Origin m n Q: How many different routes from the origin to reach the destination? Route 1: Route 2: ... Route n: Route n+1: ... Route ? { A:1 + B:1} { A:1 + B:2} { A:1 + B:n} { A:2 + B:1} { A:m + B:n} 16 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li The additive Principle A 1 2 Destination Origin m B 1 2 n Q: How many different routes from the origin to reach the destination? 17 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Example: A college planning committee consists of 3 freshmen, 4 sophomores, 5 juniors, and 2 seniors. A subcommittee of 4, consisting of 1 person from each class, is to be chosen. Q: How many different subcommittees are possible? —————– Multiplicative or additive? 
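(A worked check, not filled in on the slide: forming the subcommittee means completing four sub-tasks in sequence, one choice from each class, so the multiplicative principle applies and there are 3 × 4 × 5 × 2 = 120 possible subcommittees.)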
18 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Example: In a 7-place license plate, the first 3 places are to be occupied by letters and the final 4 by numbers, e.g., ITH-0827. Q: How many different 7-place license plates are possible? —————– Multiplicative or additive? 19 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Example: How many license plates would be possible if repetition among letters or numbers were prohibited? 20 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Permutations and Combinations, Section 1.3, 1.4 Need to select 3 letters from 26 English letter, to make a license plate. Q: How many different arrangements are possible? ——————— The answer differs if • Order of 3 letters matters =⇒ Permutation e.g., ITH, TIH, HIT, are all different. • Order of 3 letters does not matter =⇒ Combination 21 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Permutations Basic formula: Suppose we have n objects. How many different permutations are there? n × (n − 1) × (n − 2) × ... × 1 = n! Why? 22 Cornell University, BTRY 4080 / STSCI 4080 1 Fall 2009 2 Instructor: Ping Li 3 4 ... n ... n choices n−1 choices n−2 choices Which counting principle?: ... Multiplicative? additive? 1 choice 23 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Example: A baseball team consists of 9 players. Q: How many different batting orders are possible? Instructor: Ping Li 24 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Example: A class consists of 6 men and 4 women. The students are ranked according to their performance in the final, assuming no ties. Q: How many different rankings are possible? Q: If the women are ranked among themselves and the men among themselves how many different rankings are possible? 25 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Example: How many different letter arrangements can be formed using P,E,P ? The brute-force approach • Step 1: Permutations assuming P1 and P2 are different. P1 P2 E P1 E P2 P2 P1 E P2 E P1 E P1 P2 E P2 P1 • Step 2: Removing duplicates assuming P1 and P2 are the same. PPE PEP EPP Q: How many duplicates? Instructor: Ping Li 26 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Example: How many different letter arrangements can be formed using P,E,P,P,E,R ? Answer: 6! = 60 3! × 2! × 1! Why? 27 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li A More General Example of Permutations: How many different permutations of n objects, of which n1 are identical, n2 are identical, ..., nr are identical? • Step 1: Assuming all n objects are distinct, there are n! permutations. • Step 2: For each permutation, there are n1 ! × n2 ! × ... × nr ! ways to switch objects and still remain the same permutation. • Step 3: The total number of (unique) permutations are n! n1 ! × n2 ! × ... × nr ! 28 Cornell University, BTRY 4080 / STSCI 4080 1 2 3 4 Fall 2009 5 Instructor: Ping Li 6 7 8 9 10 11 How many ways to switch the elements and remain the same permutation? 29 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Example: A signal consists of 9 flags hung in a line, which can be made from a set of 4 white flags, 3 red flags, and 2 blue flags. All flags of the same color are identical. Q: How many different signals are possible? 
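A worked check (not filled in on the slide), using the permutation-with-identical-objects formula just derived: 9!/(4! × 3! × 2!) = 362880/288 = 1260 different signals. A one-line Matlab check, in the same style as the Matlab code later in the notes:
factorial(9) / (factorial(4)*factorial(3)*factorial(2))   % = 1260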
30 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Combinations Basic question of combination: The number of different groups of r objects that could be formed from a total of n objects. Basic formula of combination: n n! = r (n − r)! × r! Pronounced as “n-choose-r ” 31 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Basic Formula of Combination When orders are relevant, the number of different groups of r objects Step 1: formed from n objects = 1 n × (n − 1) × (n − 2) × ... × (n − r + 1) 2 3 4 ... r ... n choices Step 2: n−1 choices n−2 choices ... n−r+1 choices When orders are NOT relevant, each group of r objects can be formed by r! different ways. 32 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Step 3: Therefore, when orders are NOT relevant, the number of different groups of r objects would be n × (n − 1) × ... × (n − r + 1) r! n × (n − 1) × ... × (n − r + 1)×(n − r) × (n − r − 1) × ... × 1 r!×(n − r) × (n − r − 1) × ... × 1 n! = (n − r)! × r! n = r = 33 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Example: A committee of 3 is to be formed from a group of 20 people. Q: How many different committees are possible? Instructor: Ping Li 34 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Example: How many different committees consisting of 2 women and 3 men can be formed, from a group of 5 women and 7 men? 35 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Example (1.4.4c): Consider a set of n antennas of which m are defective and n − m are functional. Q: How many linear orderings are there in which no two defectives are consecutive? A: n−m+1 . m Line up n − m functional antennas. Insert (at most) one defective antenna between two functional antennas. Considering the boundaries, there are in total n − m + 1 places to insert defective antennas. 36 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li n Useful Facts about Binomial Coefficient r 1. n 0 2. n r 3. n r = 1, and nn = n n−r = n−1 r−1 = 1. Recall 0! = 1. n−1 r + Proof 1 by algebra n−1 r−1 + n−1 r = (n−1)! (n−r)!(r−1)! + (n−1)! (n−r−1)!r! = (n−1)!(r+n−r) (n−r)!r! Proof 2 by a combinatorial argument A selection of r objects either contains object 1 or does not contain object 1. 37 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 4. Some computational tricks • Avoid factorial n!, which is easily very large. 170! = 7.2574 × 10306 • n Take advantage of r = n n−r • Convert products to additions n n(n − 1)(n − 2)...(n − r + 1) = r! r r−1 r X X = exp log(n − j) − log j j=0 j=1 • Use accurate approximations of the log factorial function (log n!). Check http://en.wikipedia.org/wiki/Factorial. 38 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li The Binomial Theorem The Binomial Theorem: n (x + y) = n X n k=0 Example for n 2 k xk y n−k = 2: 2 2 (x + y) = x + 2xy + y , 2 0 = 2 2 = 1, 21 =2 39 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Proof of the Binomial Theorem by Induction Two key components in proof by induction: 1. Verify a base case, e.g., n = 1. 2. Assume the case n − 1 is true, prove for the case n. Proof: The base case is trivially true (but always check ! !) 
Assume (x + y)n−1 = n−1 X k=0 n − 1 k n−1−k x y k 40 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Then n X n n−1 X n k n−k xk y n−k = x y + xn + y n k k k=0 k=1 n−1 n−1 X n − 1 X n − 1 = xk y n−k + xk y n−k + xn + y n k k−1 k=1 k=1 n−1 n−1 X n − 1 X n − 1 =y xk y n−1−k − y n + xk y n−k + xn + y n k k−1 k=0 k=1 n−2 X n − 1 =y(x + y)n−1 + xj+1 y n−1−j + xn , (let j = k − 1) j j=0 =y(x + y)n−1 + x n−1 X j=0 n − 1 j n−1−j x y − xn + xn j =y(x + y)n−1 + x(x + y)n−1 = (x + y)n 41 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Multinomial Coefficients A set of n distinct items is to be divided into r distinct groups of respective size n1 , n2 , ..., nr , where Pr i=1 ni = n. Q: How many different divisions are possible? Answer: n n1 , n2 , ..., nr = n! n1 !n2 !...nr ! 42 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Proof n Step 1: There are n 1 ways to form the first group of size n1 . n−n1 Step 2: There are n2 ... ways to form the second group of size n2 . n−n1 −n2 −...−nr−1 Step r : There are nr ways to form the last group of size nr . Therefore, the total number of divisions should be n − n1 n − n1 − n2 − ... − nr−1 × × ... × n2 nr n! (n − n1 )! (n − n1 − ... − nr−1 )! = × × ... × (n − n1 )!n1 ! (n − n1 − n2 )!n2 ! (n − n1 − ...nr−1 − nr )!nr ! n! = n1 !n2 !...nr ! n n1 43 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Example: A police department in a small city consists of 10 officers. The policy is to have 5 patrolling the streets, 2 working full time at the station, and 3 on reserve at the station. Q: How many different divisions of 10 officers into 3 groups are possible? 44 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li The Number of Integer Solutions of Equations Suppose x1 , x2 , ..., xr , are positive, and x1 + x2 + ... + xr = n. Q: How many distinct vectors (x1 , x2 , ..., xr ) can satisfy the constraints. A: Line up n 1’s. Draw r (numbers). n−1 r−1 − 1 lines from in total n − 1 places to form r groups 45 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Suppose x1 , x2 , ..., xr , are non-negative, and x1 + x2 + ... + xr = n. Q: How many distinct vectors (x1 , x2 , ..., xr ) can satisfy the constraints. A: n+r−1 , r−1 by padding r zeros after n one’s. One can also let yi = xi + 1, i = 1 to r . Then the problem becomes finding positive integers (y1 , ..., yr ) so that y1 + y2 + ... + yr = n + r . 46 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Example: 8 identical blackboards are to be divided among 4 schools. Q (a): How many divisions are possible? Q (b): How many, if each school must receive at least 1 blackboard? Instructor: Ping Li 47 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li More Examples for Chapter 1 From a group of 8 women and 6 men a committee consisting of 3 men and 3 women is to be formed. (a): How many different committees are possible? (b): How many, if 2 of the men refuse to serve together? (c): How many, if 2 of the women refuse to serve together? (d): How many, if 1 man and 1 woman refuse to serve together? 48 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Solution: (a): How many different committees are possible? 8 3 6 3 × = 56 × 20 = 1120 (b): How many, if 2 of the men refuse to serve together? 8 3 × 4 3 + 2 1 4 2 = 56 × 16 = 896 (c): How many, if 2 of the women refuse to serve together? 
6 3 + 2 1 6 2 × 6 3 = 50 × 20 = 1000 (d): How many, if 1 man and 1 woman refuse to serve together? 7 3 5 7 5 7 5 3 + 3 2 + 2 3 = 350 + 350 + 210 = 910 8 6 7 5 Or 3 3 − 2 2 = 910 Instructor: Ping Li 49 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Fermat’s Combinatorial Identity Theoretical Exercise 11. X n n i−1 = , k k−1 i=k n≥k There is an interesting proof (different from the book) using the result for the number of integer solutions of equations, by a recursive argument. 50 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Proof: Let f (m, r) be the number of positive integer solutions, i.e., x1 + x2 + ... + xr = m. We know f (m, r) Suppose m = m−1 r−1 . = n + 1 and r = k + 1, Then m−1 n f (n + 1, k + 1) = = k r−1 x1 can take values from 1 to (n + 1) − k , because xi ≥ 1, i = 1 to k + 1. Decompose the problem into (n + 1) − k sub-problems by fixing a value for x1 . = 1, then x2 + x3 + ... + xk+1 = m − 1 = n. The total number of positive integer solutions is f (n, k). If x1 51 Cornell University, BTRY 4080 / STSCI 4080 If x1 If x1 Fall 2009 = 1, there are still f (n, k) Instructor: Ping Li n−1 possibilities, i.e., k−1 . = 2, there are still f (n − 1, k) n−2 possibilities, i.e., k−1 ... If x1 = (n + 1) − k , there are still f (k, k) k−1 possibilities, i.e., k−1 Thus, by additive counting principle, f (n + 1, k + 1) = n+1−k X j=1 But, is this what we want? Yes! f (n + 1 − j, k) = n+1−k X j=1 n−j k−1 52 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li By additive counting principle, f (n + 1, k + 1) = n+1−k X j=1 n+1−k X f (n + 1 − j, k) n−j = k−1 j=1 n−1 n−2 k−1 = + + ... + k−1 k−1 k−1 k−1 k n−1 = + + ... + k−1 k−1 k−1 n X i−1 = k−1 i=k 53 Cornell University, BTRY 4080 / STSCI 4080 Example: Fall 2009 Instructor: Ping Li A company has 50 employees, 10 of whom are in management positions. Q: How many ways are there to choose a committee consisting of 4 people, if there must be at least one person of management rank? A: 4 X 50 40 10 40 − = 138910 = 4 4 k 4−k k=1 ———10 Can we also compute the answer using 1 49 3 = 184240? No. 10 49 It is not correct because the two sub-tasks, 1 and 3 , are overlapping; and hence we can not simply apply the multiplicative principle. 54 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Axioms of Probability Chapter 2, Sections 2.1 - 2.5, 2.7 Instructor: Ping Li 55 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Section 2.1 In this chapter we introduce the concept of the probability of an event and then show how these probabilities can be computed in certain situations. As a preliminary, however, we need the concept of the sample space and the events of an experiment. 56 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Sample Space and Events, Section 2.2 Definition of Sample Space: The sample space of an experiment, denoted by S , is the set of all possible outcomes of the experiment. Example: S = {girl, boy}, if the output of an experiment consists of the determination of the gender of a newborn. Example: If the experiment consists of flipping two coins (H/T), the the sample space S = {(H, H), (H, T ), (T, H), (T, T )} 57 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Example: If the outcome of an experiment is the order of finish in a race among 7 horses having post positions, 1, 2, 3, 4, 5, 6, 7, then S = {all 7! 
permutations of (1, 2, 3, 4, 5, 6, 7)} Example: If the experiment consists of tossing two dice, then S = {(i, j), i, j = 1, 2, 3, 4, 5, 6}, What is the size |S|? Example: If the experiment consists of measuring (in hours) the lifetime of a transistor, then S = {t : 0 ≤ t ≤ ∞}. 58 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Definition of Events: Any subset E of the sample space S is an event. Example: S = {(H, H), (H, T ), (T, H), (T, T )}, E = {(H, H)} is an event. E = {(H, T ), (T, T )} is also an event Example: S = {t : 0 ≤ t ≤ ∞}, E = {t : 1 ≤ t ≤ 5} is an event, But {t : −10 ≤ t ≤ 5} is NOT an event. Instructor: Ping Li 59 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Set Operations on Events Let A, B , C be events of the same sample space S A ⊂ S, B ⊂ S, C ⊂ S. Notation for subset: ⊂ Set Union: A ∪ B = the event which occurs if either A OR B occurs. Example: S = {1, 2, 3, 4, 5, ...} = the set of all positive integers. A = {1}, B = {1, 3, 5}, C = {2, 4, 6}, A ∪ B = {1, 3, 5}, B ∪ C = {1, 2, 3, 4, 5, 6}, A ∪ B ∪ C = {1, 2, 3, 4, 5, 6} 60 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Set Intersection: A ∩ B = AB = the event which occurs if both A AND B occur. Example: S = {1, 2, 3, 4, 5, ...} = the set of all positive integers. A = {1}, B = {1, 3, 5}, C = {2, 4, 6}, A ∩ B = {1}, B ∩ C = {} = ∅, A∩B∩C =∅ Mutually Exclusive: If A ∩ B = ∅, then A and B are mutually exclusive. 61 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Set Complement: Ac = S − A = the event which occurs only if A does not occur. Example: S = {1, 2, 3, 4, 5, ...} = the set of all positive integers. A = {1, 3, 5, ...} = the set of all positive odd integers. B = {2, 4, 6, ...} = the set of all positive even integers. Ac = S − A = B , B c = S − B = A. Instructor: Ping Li 62 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Laws of Set Operations Commutative Laws: A∪B =B∪A A∩B =B∩A Associative Laws: (A ∪ B) ∪ C = A ∪ (B ∪ C) = A ∪ B ∪ C (A ∩ B) ∩ C = A ∩ (B ∩ C) = A ∩ B ∩ C = ABC Distributive Laws: (A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C) = AC ∪ BC (A ∩ B) ∪ C = (A ∪ C) ∩ (B ∪ C) These laws can be verified by Venn diagrams. Instructor: Ping Li 63 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li DeMorgan’s Laws n [ Ei i=1 n \ i=1 Ei !c !c = n \ Eic i=1 = n [ i=1 Eic 64 Cornell University, BTRY 4080 / STSCI 4080 Proof of ( Sn c i=1 Ei ) = Fall 2009 Tn c E i i=1 Sn Tn c Let A = ( i=1 Ei ) and B = i=1 Eic 1. Prove A ⊆B ∈ A. =⇒ x does not belong to any Ei . Tn c =⇒ x belongs to every Ei . =⇒ x ∈ i=1 Eic = B . Suppose x 2. Proof B ⊆A ∈ B . =⇒ x belongs to every Eic . Sn c =⇒ x does not belong to any Ei . =⇒ x ∈ ( i=1 Ei ) = A. Suppose x 3. Consequently A =B Instructor: Ping Li 65 Cornell University, BTRY 4080 / STSCI 4080 Proof of Let Fi ( c Tn i=1 Ei ) = = Eic . We have proved ( Fall 2009 Sn c E i=1 i c Sn i=1 Fi ) = n \ i=1 Ei !c = Instructor: Ping Li = Tn i=1 n \ Fic i=1 n [ i=1 Fic Fi !c !c !c = n [ i=1 Fi = n [ i=1 Eic 66 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Axioms of Probability, Section 2.3 For each event E in the sample space S , there exists a value P (E), referred to as the probability of E . P (E) satisfies three axioms. Axiom 1: 0 ≤ P (E) ≤ 1 Axiom 2: P (S) = 1 Axiom 3: For any sequence of mutually exclusive events, E1 , E2 , ..., (that is, Ei Ej = ∅ whenever i 6= j ), then P ∞ [ i=1 Ei ! 
= ∞ X i=1 P (Ei ) 67 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Consequences of Three Axioms Consequence 1: P (∅) = 0. P (S) = P (S ∪ ∅) = P (S) + P (∅) =⇒ P (∅) = 0 Consequence 2: If S Sn = i=1 Ei and all mutually-exclusive events Ei are equally likely, meaning P (E1 ) = P (E2 ) = ... = P (En ), then P (Ei ) = 1 , n i = 1 to n 68 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Some Simple Propositions, Section 2.4 Proposition 1: Proof: P (E c ) = 1 − P (E) S = E ∪ E c =⇒ 1 = P (S) = P (E) + P (E c ) Proposition 2: If E ⊂ F , then P (E) ≤ P (F ) Proof: E ⊂ F =⇒ F = E ∪ (E c ∩ F ) =⇒ P (F ) = P (E) + P (E c ∩ F ) ≥ P (E) Instructor: Ping Li 69 Cornell University, BTRY 4080 / STSCI 4080 Proposition 3: Fall 2009 Instructor: Ping Li P (E ∪ F ) = P (E) + P (F ) − P (EF ) Proof: E ∪ F = (E − EF )∪(F − EF )∪EF , union of three mutually exclusive sets. =⇒ P (E ∪ F ) = P (E − EF ) + P (F − EF ) + P (EF ) = P (E) − P (EF ) + P (F ) − P (EF ) + P (EF ) = P (E) + P (F ) − P (EF ) Note that E = EF ∪ (E − EF ), union of two mutually exclusive sets. 70 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li A Generalization of Proposition 3: P (E ∪ F ∪ G) =P (E) + P (F ) + P (G) − P (EF ) − P (EG) − P (F G) + P (EF G) Proof: Let B = F ∪ G. Then P (E ∪ B) = P (E) + P (B) − P (EB) P (B) = P (F ∪ G) = P (F ) + P (G) − P (F G) P (EB) = P (E(F ∪ G)) = P (EF ∪ EG) = P (EF ) + P (EG) − P (EF EG) = P (EF ) + P (EG) − P (EF G) 71 Cornell University, BTRY 4080 / STSCI 4080 Proposition 4: Fall 2009 Instructor: Ping Li P (E1 ∪ E2 ∪ ... ∪ En ) = ? Read this proposition and its proof yourself. We will temporarily skip this proposition and relevant examples 5m, 5n, and 5o. However, we will come back to those material at a later time. 72 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Example: A total of 36 members of a club play tennis, 28 play squash, and 18 play badminton. Furthermore, 22 of the members play both tennis and squash, 12 play both tennis and badminton, 9 play both squash and badminton, and 4 play all three sports. Q: How many members of this club play at least one of these sports? 73 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Solution: A = { members playing tennis } B = { members playing squash } C = { members playing badminton } |A ∪ B ∪ C| =?? Instructor: Ping Li 74 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Sample Space Having Equally Likely Outcomes, Sec. 2.5 Assumption: In an experiment, all outcomes in the sample space S are equally likely to occur. For example, S = {1, 2, 3, ..., N }. Assuming equally likely outcomes, then P ({i}) = 1 , i = 1 to N N For any event E : |E| P (E) = = Number of outcomes in S |S| Number of outcomes in E 75 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Example: If two dice are rolled, what is the probability that the sum of the upturned faces will equal 7? S = {(i, j), i, j = 1, 2, 3, 4, 5, 6} E = {??} 76 Cornell University, BTRY 4080 / STSCI 4080 Example: Fall 2009 Instructor: Ping Li If 3 balls are randomly drawn from a bag containing 6 white and 5 black balls, what is the probability that one of the drawn balls is white and the other two black? S = {??} E = {??} 77 Cornell University, BTRY 4080 / STSCI 4080 Example: Fall 2009 Instructor: Ping Li An urn contains n balls, of which one is special. If k of these balls are withdrawn randomly, which is the probability that the special ball is chosen? 
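A hedged sketch (not worked out on the slide): counting equally likely outcomes, the special ball is among the k withdrawn with probability C(n−1, k−1)/C(n, k) = k/n. A quick Monte Carlo check in Matlab, with illustrative values n = 10 and k = 3 (these values are not from the slides):
n = 10; k = 3; trials = 100000;         % illustrative values
hits = 0;
for t = 1:trials
    perm = randperm(n);                 % a uniformly random ordering of the n balls
    hits = hits + any(perm(1:k) == 1);  % ball 1 plays the role of the special ball
end
[hits/trials, k/n]                      % the estimate should be close to k/n = 0.3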
Revisit Probabilistic Counting of Set Intersections
Consider two sets S1, S2 ⊂ Ω = {0, 1, 2, ..., D − 1}. For example,
S1 = {0, 15, 31, 193, 261, 495, 563, 919}
S2 = {5, 15, 19, 193, 209, 325, 563, 1029, 1921}
One can first generate a random permutation π:
π : Ω = {0, 1, 2, ..., D − 1} −→ {0, 1, 2, ..., D − 1}
Apply π to sets S1 and S2. For example
π(S1) = {13, 139, 476, 597, 698, 1355, 1932, 2532}
π(S2) = {13, 236, 476, 683, 798, 1456, 1968, 2532, 3983}

Store only the minimums
min{π(S1)} = 13, min{π(S2)} = 13
Suppose we can theoretically calculate the probability
P = Pr(min{π(S1)} = min{π(S2)})
S1 and S2 are more similar =⇒ larger P (i.e., closer to 1).
Generate (small) k random permutations and each time store the minimums. Count the number of matches to estimate the similarity between S1 and S2.

Now, we can easily (How?) show that
P = Pr(min{π(S1)} = min{π(S2)}) = |S1 ∩ S2| / |S1 ∪ S2| = a / (f1 + f2 − a),
where a = |S1 ∩ S2|, f1 = |S1|, f2 = |S2|.
Therefore, we can estimate P as a binomial probability from k permutations.
How many permutations (what value of k) do we need? That depends on the variance of this binomial distribution.

Poker Example (2.5.5f): A poker hand consists of 5 cards. If the cards have distinct consecutive values and are not all of the same suit, we say that the hand is a straight. What is the probability that one is dealt a straight?
A 2 3 4 5 6 7 8 9 10 J Q K A

Example: A 5-card poker hand is a full house if it consists of 3 cards of the same denomination and 2 cards of another denomination. (That is, a full house is three of a kind plus a pair.) What is the probability that one is dealt a full house?

Example (The Birthday Attack): If n people are present in a room, what is the probability that no two of them celebrate their birthday on the same day of the year (365 days)? How large need n be so that this probability is less than 1/2?

p(n) = [365 × 364 × ... × (365 − n + 1)] / 365^n = (365/365) × (364/365) × ... × ((365 − n + 1)/365)

p(23) = 0.4927 < 1/2, p(50) = 0.0296

(Figure: p(n) plotted against n for n up to 60; the curve decreases from 1 toward 0 and drops below 1/2 near n = 23.)

Matlab Code
n = 5:60;
for i = 1:length(n)
    p(i) = 1;
    for j = 2:n(i);
        p(i) = p(i) * (365-j+1)/365;
    end;
end;
figure; plot(n,p,'linewidth',2); grid on; box on;
xlabel('n'); ylabel('p(n)');
How to improve the efficiency for computing many n's?

Example: A football team consists of 20 offensive and 20 defensive players. The players are to be paired into groups of 2 for the purpose of determining roommates. The pairing is done at random.
(a) What is the probability that there are no offensive-defensive roommate pairs? (b) What is the probability that there are 2i offensive-defensive roommate pairs? Denote the answer to (b) by P2i . The answer to (a) is then P0 . 88 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Solution Steps: 1. Total ways of dividing the 40 players into 20 unordered pairs. 2. Ways of pairing 2i offensive-defensive pairs. 3. Ways of pairing 20 − 2i remaining players within each group. 4. Applying multiplicative principle and equally likely outcome assumption. 5. Obtaining numerical probability values. 89 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Solution Steps: 1. Total ways of dividing the 40 players into 20 unordered pairs. ordered unordered 40 (40)! = 2, 2, ..., 2 (2!)20 (40)! (2!)20 (20)! 2. Ways of pairing 2i offensive-defensive pairs. 20 20 × × (2i)! 2i 2i 3. Ways of pairing 20 − 2i remaining players within each group. (20 − 2i)! 210−i (10 − i)! 2 90 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 4. Applying multiplicative principle and equally likely outcome assumption. P2i = 2 20 2i (2i)! h (20−2i)! 10−i 2 (10−i)! 40! 220 20! i2 5. Obtaining numerical probability values. Stirling’s approximation: n+1/2 −n n! ≈ n e √ 2π Be careful! • Better approximations are available. • n! n could easily overflow, so could k . • The probability never overflows, P ∈ [0, 1]. • Looking for possible cancelations. • Rearranging terms in numerator and denominator to avoid overflow. 91 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Conditional Probability and Independence Chapter 3, Sections 3.1 - 3.5 Instructor: Ping Li 92 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Introduction, Section 3.1 Conditional Probability: one of the most important concepts in probability: • Calculating probabilities when partial (side) information is available. • Computing desired probabilities may be easier. 93 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li A Famous Brain-Teaser • There are 3 doors. • Behind one door is a prize. • Behind the other 2 is nothing. • First, you pick one of the 3 doors at random. • One of the doors that you did not pick will be opened to reveal nothing. • Q: Do you stick with the door you chose in the first place, or do you switch? 94 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li A Joke: Tom & Jerry Conversation • Jerry: I’ve heard that you will take a plane next week. Are you afraid that there might be terrist activities? • Tom: Yes. That’s exactly why I am prepared. • Jerry: How? • Tom: It is reported that the chance of having a bomb in a plane is 1 106 . According to what I learned in my probability class (4080), the chance of having two bombs in the plane will be 10112 . • Jerry: So? • Tom: Therefore, to dramatically reduce the chance from bring one bomb to the plane myself! • Jerry: AH... 1 1 106 to 1012 , I will 95 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Section 3.2, Conditional Probabilities Example: Suppose we toss a fair dice twice. • What is the probability that the sum of the 2 dice is 8? = {??} Event E = {??} Sample space S • What if we know the first dice is 3? (Reduced) S = {??} (Reduced) E = {??} 96 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li The Basic Formula of Conditional Probability Two events E, F ⊆ S. 
P (E|F ): conditional probability that E occurs given that F occurs. If P (F ) > 0, then P (E|F ) = P (EF ) P (F ) 97 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li The Basic Formula of Conditional Probability P (E|F ) = P (EF ) P (F ) How to understand this formula? —- Venn diagram Given that F occurs, • What is the new (ie, reduced) sample space S ′ ? • What is the new (ie, reduced) event E ′ ? • P (E|F ) = |E ′ | |S ′ | = |E ′ |/S |S ′ |/S =?? 98 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Example (3.2.2c): In the card game bridge the 52 cards are dealt out equally to 4 players — called East, West, North, and South. If North and South have a total of 8 spades among them, what is the probability that East has 3 of the remaining 5 spades? The reduced sample space S ′ = {??} The reduced event E ′ = {??} 99 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li A transformation of the basic formula P (E|F ) = P (EF ) P (F ) =⇒ P (EF ) = P (E|F ) × P (F ) = P (F |E) × P (E) P (E|F ) may be easier to compute than P (F |E), or vice versa. 100 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Example: A total of n balls are sequentially and randomly chosen, without replacement, from an urn containing r red and b blue balls (n ≤ r + b). Given that k of the n balls are blue, what is the probability that the first ball chosen is blue? Two solutions: 1. Working with the reduced sample space — Easy! Basically a problem of choosing one ball from n balls randomly. 2. Working with the original sample space and basic formula — More difficult! 101 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Working with original sample space: Event B : the first ball chosen is blue. Event Bk : a total of k blue balls are chosen. Desired probability: P (BBk ) P (Bk |B)P (B) P (B|Bk ) = = P (Bk ) P (Bk ) P (B) =?? P (Bk ) =?? P (Bk |B) =?? 102 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Desired probability: P (B|Bk ) = b P (B) = , r+b P (BBk ) P (Bk |B)P (B) = P (Bk ) P (Bk ) P (Bk ) = Therefore, P (Bk |B)P (B) P (B|Bk ) = = P (Bk ) b k r n−k r+b n b−1 r k−1 n−k r+b−1 n−1 P (Bk |B) = , b × × r+b b−1 r k−1 n−k r+b−1 n−1 r+b n b r k n−k k = n 103 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Example (3.2.2f): Suppose an urn contains 8 red balls and 4 white balls. We draw 2 balls from the urn without replacement. Assume that at each draw each ball in the urn is equally likely to be chosen. Q: what is probability that both balls drawn are red? Two solutions: 1. The direct approach 2. The conditional probability approach 104 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Two solutions: 1. The direct approach 8 2 12 2 14 = 33 2. The conditional probability approach 8 7 14 × = 12 11 33 105 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li The Multiplication Rule An extremely important formula in science P (E1 E2 E3 ...En ) = P (E1 )P (E2 |E1 )P (E3 |E1 E2 )...P (En |E1 E2 ...En−1 ) Proof? 106 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Example (3.2.2g): An ordinary deck of 52 playing cards is randomly divided into 4 piles of 13 cards each. Q: Compute the probability that each pile has exactly 1 ace. A: Two solutions: 1. Direct approach. 2. Conditional probability (as in the textbook). 
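Before the two analytical solutions that follow, a Monte Carlo sketch (not from the slides) that can be used to check the answer numerically; it should reproduce the value 0.1055 quoted below:
trials = 20000;
count = 0;
for t = 1:trials
    deck = randperm(52);                         % shuffle; cards 1-4 play the role of the aces
    piles = ceil(find(deck <= 4) / 13);          % pile index (1 to 4) holding each ace
    count = count + (numel(unique(piles)) == 4); % all four aces in different piles?
end
count/trials                                     % roughly 0.1055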
107 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Solution by the direct approach • The size of sample space) 52! 52 39 26 52! = = 13! × 13! × 13! × 13! (13!)4 13 13 13 • Fix 4 aces in four piles • Only need to select 12 cards for each pile (from 48 cards total) 48 36 24 48! = 12 12 12 (12!)4 • Consider 4! permutations • The desired probability 48 36 12 12 52 39 13 13 24 12 26 13 48! (13!)4 × 4! = 4! = 0.1055 52! (12!)4 108 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Solution using the conditional probability Define • E1 = {The ace of spades is in any one of the piles } • E2 = {The ace of spades and the ace of hearts are in different piles } • E3 = {The ace of spades, hearts, and diamonds are all in different piles } • E4 = {All four aces are in different piles} The desired probability P (E1 E2 E3 E4 ) =P (E1 ) P (E2 |E1 ) P (E3 |E1 E2 ) P (E2 |E1 ) P (E4 |E1 E2 E3 ) 109 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li • P (E1 ) = 1 • P (E2 |E1 ) = 39 51 • P (E3 |E1 E2 ) = 26 50 • P (E4 |E1 E2 E3 ) = 13 49 • P (E1 E2 E3 E4 ) = 39 26 13 51 50 49 = 0.1055. 110 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Section 3.3, Baye’s Formula The basic Baye’s formula P (E) = P (E|F )P (F ) + P (E|F c ) (1 − P (F )) Proof: • By Venn diagram. Or • By algebra using the conditional probability formula Instructor: Ping Li 111 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Example (3.3.3a): An accident-prone person will have an accident within a fixed one-year period with probability 0.4. This probability decreases to 0.2 for a non-accident-prone person. Suppose 30% of the population is accident prone. Q 1: What is the probability that a random person will have an accident within a fixed one-year period? Q 2: Suppose a person has an accident within a year. What is the probability that he/she is accident prone? 112 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Example: Let p be the probability that the student knows the answer and 1 − p the probability that the student guesses. Assume that a student who guesses will be 1 correct with probability m , where m is the total number of multiple-choice alternatives. Q: What is the conditional probability that a student knew the answer, given that he/she answered it correctly? Solution: Let C = event that the student answers the question correctly. Let K = event that he/she actually knows the answer. P (K|C) =?? 113 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Solution: Let C = event that the student answers the question correctly. Let K = event that he/she actually knows the answer. P (K|C) = P (KC) P (C|K)P (K) = P (C) P (C|K)P (K) + P (C|K c )P (K c ) 1×p = 1 1×p+ m (1 − p) mp = 1 + (m − 1)p ———— What happens when m is large? 114 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Example (3.3.3d): A lab blood test is 95% effective in detecting a certain disease when it is represent. However, the test also yields a false positive result for 1% of the healthy tested. Assume 0.5% of the population actually has the disease. Q: What is the probability that a person has the disease given that the test result is positive? Solution: Let D = event that the tested person has the disease. Let E = event that the test result is positive. P (D|E) = ?? 
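A hedged numerical check (the slide leaves the computation open), plugging the stated numbers into Bayes' formula with P(D) = 0.005, P(E|D) = 0.95, and P(E|D^c) = 0.01:
P(D|E) = P(E|D)P(D) / [P(E|D)P(D) + P(E|D^c)P(D^c)] = 0.00475 / (0.00475 + 0.00995) ≈ 0.323.
In Matlab, 0.95*0.005 / (0.95*0.005 + 0.01*0.995) returns about 0.323; even after a positive test, the chance of disease is only about 32% because the disease is rare.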
The Generalized Bayes' Formula
Suppose F1, F2, ..., Fn are mutually exclusive events such that ∪_{i=1}^n Fi = S. Then
P(E) = Σ_{i=1}^n P(E|Fi)P(Fi),
which is easy to understand from the Venn diagram.
P(Fj|E) = P(EFj)/P(E) = P(E|Fj)P(Fj) / Σ_{i=1}^n P(E|Fi)P(Fi)

Example (3.3.3m): A bin contains 3 different types of disposable flashlights. The probability that a type 1 flashlight will give over 100 hours of use is 0.7, with the corresponding probabilities for type 2 and type 3 flashlights being 0.4 and 0.3 respectively. Suppose that 20% of the flashlights in the bin are type 1, 30% are type 2, and 50% are type 3.
Q 1: What is the probability that a randomly chosen flashlight will give more than 100 hours of use?
Q 2: Given the flashlight lasted over 100 hours, what is the conditional probability that it was type j?

Q 1: What is the probability that a randomly chosen flashlight will give more than 100 hours of use?
0.7 × 0.2 + 0.4 × 0.3 + 0.3 × 0.5 = 0.14 + 0.12 + 0.15 = 0.41
Q 2: Given the flashlight lasted over 100 hours, what is the conditional probability that it was type j?
Type 1: 0.14/0.41 = 0.3415
Type 2: 0.12/0.41 = 0.2927
Type 3: 0.15/0.41 = 0.3659

Independent Events
Definition: Two events E and F are said to be independent if
P(EF) = P(E)P(F)
Consequently, if E and F are independent, then P(E|F) = P(E) and P(F|E) = P(F).
Being independent is different from being mutually exclusive!

Check Independence by Verifying P(EF) = P(E)P(F)
Example: Two coins are flipped and all 4 outcomes are equally likely. Let E be the event that the first lands heads and F the event that the second lands tails. Verify that E and F are independent.
Example: Suppose we toss 2 fair dice. Let E denote the event that the sum of the two dice is 6 and F denote the event that the first die equals 4. Verify that E and F are not independent.

Proposition 4.1: If E and F are independent, then so are E and F^c.
Proof: Given P(EF) = P(E)P(F), we need to show P(EF^c) = P(E)P(F^c). Which formula contains both P(EF) and P(EF^c)?

Independence of Three Events
Definition: The three events E, F, and G are said to be independent if
P(EFG) = P(E)P(F)P(G)
P(EF) = P(E)P(F)
P(EG) = P(E)P(G)
P(FG) = P(F)P(G)

Independence of More Than Three Events
Definition: The events Ei, i = 1 to n, are said to be independent if, for every subcollection Ei1, Ei2, ..., Eir, r ≤ n, of these events,
P(Ei1 Ei2 ... Eir) = P(Ei1)P(Ei2)...P(Eir)

Example (3.4.4f): An infinite sequence of independent trials is to be performed. Each trial results in a success with probability p and a failure with probability 1 − p.
Q: What is the probability that
1. at least one success occurs in the first n trials?
2. exactly k successes occur in the first n trials?
3. all trials result in successes?
124 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Solution: Q: What is the probability that 1. at least one success occurs in the first n trials. 1 − (1 − p)n 2. exactly k successes occur in the first n trials. n k pk (1 − p)n−k How is this formula related to the binomial theorem? 3. all trials result in successes? pn → 0 if p < 1. Instructor: Ping Li 125 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Example (3.4.4h): Independent trials, consisting of rolling a pair of fair dice, are performed. What is the probability that an outcome of 5 appears before an outcome of 7 when the outcome is the sum of the dice? Solution 1: Let En denote the event that no 5 or 7 appears on the first n − 1 trials and a 5 appears on the nth trial, then the desired probability is P ∞ [ n=1 En ! Why Ei and Ej are mutually exclusive? = ∞ X n=1 P (En ) 126 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li P (En ) = P ( no 5 or 7 on the first n − 1 trials AND a 5 on the nth trial) By independence P (En ) = P ( no 5 or 7 on the first n − 1 trials)×P ( a 5 on the nth trial) P ( a 5 on the nth trial) = P (a 5 on any trial) P ( no 5 or 7 on the first n − 1 trials) = [P (no 5 or 7 on any trial)] n−1 P (no 5 or 7 on any trial) = 1 − P (either a 5 OR a 7 on any trial) P (either a 5 OR a 7 on any trial) = P (a 5 on any trial) + P (a 7 on any trial) 127 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li P (En ) = P ( no 5 or 7 on the first n − 1 trials)×P ( a 5 on the nth trial) P (En ) = 4 6 − 1− 36 36 n−1 4 1 = 36 9 13 18 n−1 The desired probability P ∞ [ n=1 En ! ∞ X 1 1 2 = P (En ) = 13 = 5 . 9 1 − 18 n=1 128 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Solution 2 (using a conditional argument): Let E = the desired event. Let F = the event that the first trial is a 5. Let G = the event that the first trial is a 7. Let H = the event that the first trial is neither a 5 nor 7. P (E) = P (E|F )P (F ) + P (E|G)P (G) + P (E|H)P (H) 129 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li A General Result: If E and F are mutually exclusive events of an experiment, then when independent trials of this experiment are performed, E will occur before F with probability P (E) P (E) + P (F ) 130 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li The Gambler’s Ruin Problem (Example 3.4.4k) Two gamblers, A and B , bet on the outcomes of successive flips of a coin. On each flip, if the coin comes up head, A collects 1 unit from B , whereas if it comes up tail, A pays 1 unit to B . They continue until one of them runs out of money. If it is assumed that the successive flips of the coin are independent and each flip results in a head with probability p, what is the probability that A ends up with all the money if he starts with i units and B starts with N − i units? Intuition: What is this probability if the coin is fair (i.e., p = 1/2)? 131 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Solution: E = event that A ends with all the money when he starts with i. H = event that first flip lands heads. Pi = P (E) = P (E|H)P (H) + P (E|H c )P (H c ) P0 =?? PN =?? P (E|H) =?? P (E|H c ) =?? Instructor: Ping Li 132 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Pi = P (E) =P (E|H)P (H) + P (E|H c )P (H c ) =pPi+1 + (1 − p)Pi−1 = pPi+1 + qPi−1 =⇒Pi+1 = Pi − qPi−1 q =⇒ Pi+1 − Pi = (Pi − Pi−1 ) p p q P2 − P1 = P1 p 2 q q P3 − P2 = (P2 − P1 ) = P1 p p ... 
i−1 q Pi − Pi−1 = P1 p 133 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li " Because PN 2 i−1 # q q q Pi =P1 1 + + + ... + p p p i q 1 − p =P1 if q 6= p q 1− p = 1, we know 1 − pq P1 = N 1 − pq i 1 − pq Pi = N 1 − pq (Do we really have to worry about q if q 6= p if q 6= p = p = 12 ?) 134 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li The Matching Problem (Example 3.5.5d) At a party n men take off their hats. The hats are then mixed up, and each man randomly selects one. We say that a match occurs if a man selects his own hat. Q: What is the probability of (a) no matches; (b) exactly k matches? 135 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Solution (a): using recursion and conditional probabilities. 1. Let E = event that no matches occurs. Denote Pn = P (E). Let M = event that the first man selects his own hat. 2. Pn = P (E) = P (E|M )P (M ) + P (E|M c )P (M c ) 3. Express Pn as a function a Pn−1 , Pn−2 , etc. 4. Identify the boundary conditions, eg P1 5. Prove Pn , eg, by induction. =?, P2 =?. 136 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Solution (a): using recursion and conditional probabilities. 1. Let E = event that no matches occurs. Denote Pn = P (E). Let M = event that the first man selects his own hat. 2. Pn = P (E) = P (E|M )P (M ) + P (E|M c )P (M c ) P (M c ) = P (E|M ) = 0, n−1 n , Pn = n−1 c n P (E|M ) 3. Express Pn as a function a Pn−1 , Pn−2 , etc. P (E|M c ) = 1 n−1 Pn−2 + Pn−1 , 4. Identify the boundary conditions. P1 = 0, P2 = 1/2. 5. Prove Pn by induction. Pn = 1 2! − 1 3! + 1 4! − ... + (−1)n n! . (What is a good approximation of Pn ?) Pn = n−1 n Pn−1 + n1 Pn−2 . 137 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Solution (b): 1. Fix a group of k men. 2. The probability that these k men select their own hats. 3. The probability that none of the remaining n − k men select their own hats. 4. Multiplicative counting principle. 5. Adjustment for selecting k men. 138 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Solution (b): 1. Fix a group of k men. 2. The probability that these k men select their own hats. 1 1 1 1 ... n n−1 n−2 n−k+1 = (n−k)! n! 3. The probability that none of the remaining n − k men select their own hats. Pn−k 4. Multiplicative counting principle. (n−k)! n! × Pn−k 5. Adjustment for selecting k men. (n−k)! n n! Pn−k k 139 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li P (.|F ) Is A Probability Conditional probabilities satisfy all the properties of ordinary probabilities. Proposition 5.1 • 0 ≤ P (E|F ) ≤ 1 • P (S|F ) = 1 • If Ei , i = 1, 2, ..., are mutually exclusive events, then ! ∞ ∞ [ X P Ei |F = P (Ei |F ) i=1 i=1 140 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Conditionally Independent Events Definition: Two events, E1 and E2 , are said to be conditionally independent given event F , if P (E1 |E2 F ) = P (E1 |F ) or, equivalently P (E1 E2 |F ) = P (E1 |F )P (E2 |F ) 141 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Laplace’s Rule of Succession (Example 3.5.5e) There are k + 1 coins in a box. The ith coin will, when flipped, turn up heads with probability i/k , i = 0, 1, ..., k . A coin is randomly selected from the box and is then repeatedly flipped. Q: If the first n flip all result in heads, what is the conditional probability that the (n + 1)st flip will do likewise? 
Key: Conditioning on that the ith coin is selected, the outcomes will be conditionally independent. 142 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Solution: Let Ci = event that the ith coin is selected. Let Fn = event the first n flips all result in heads. Let H = event that the (n + 1)st flip is a head. The desired probability: P (H|Fn ) = k X i=0 By conditional independence P (H|Fn Ci )P (Ci |Fn ) P (H|Fn Ci ) =?? Solve P (Ci |Fn ) using Baye’s formula. 143 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li The desired probability: P (H|Fn ) = k X i=0 By conditional independence P (H|Fn Ci )P (Ci |Fn ) P (H|Fn Ci ) = P (H|Ci ) = i k Using Baye’s formula P (Ci Fn ) P (Fn |Ci )P (Ci ) P (Ci |Fn ) = = = Pk P (Fn ) j=0 P (Fn |Cj )P (Cj ) The final answer: P (H|Fn ) = j n+1 j=0 k Pk j n j=0 k Pk i n 1 k k+1 Pk j n 1 j=0 k k+1 144 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Chapter 4: (Discrete) Random Variables Definition: A random variable is a function from the sample space S to the real numbers. A discrete random variable is a random variable that takes only on a finite (or at most a countably infinite) number of values. 145 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Example (1a): Suppose the experiment consists of tossing 3 fair coins. Let Y denote the number of heads appearing, then Y is a discrete random variable taking on one of the values 0, 1, 2, 3 with probabilities P (Y = 0) = P {(T, T, T )} = 1 8 3 P (Y = 1) = P {(H, T, T ), (T, H, T ), (T, T, H)} = 8 P (Y = 2) = P {(T, H, H), (H, T, H), (H, H, T )} = P (Y = 3) = P {(H, H, H)} = 1 8 Check P 3 [ ! {Y = i} i=0 = 3 X i=0 P {Y = i} = 1. 3 8 146 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Example (1c) Independent trials, consisting of the flipping of a coin having probability p of coming up heads, are continuously performed until either a head occurs or a total of n flips is made. If we let X denote the number of times the coin is flipped, then X is a random variable taking on one of the the values 1, 2, ..., n. P (X = 1) = P {H} = p P (X = 2) = P {(T, H)} = (1 − p)p P (X = 3) = P {(T, T, H)} = (1 − p)2 p ... P (X = n − 1) = (1 − p)n−2 p P (X = n) = (1 − p)n−1 Check Pn i=1 P (X = i) = 1?? 147 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Example (1d) 3 balls are randomly chosen from an urn containing 3 white, 3 red, and 5 black balls. Suppose that we win $1 for each white ball selected and lose $1 for each red ball selected. Let X denote our total winnings from the experiments. Q 1: What possible values can X take on? Q 2: What are the respective probabilities? 148 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Solution 1: What possible values can X take on? X ∈ {3, 2, 1, 0, −1, −2, −3}. Solution 2: What are the respective probabilities? P (X = 0) = 5 3 + 3 3 1 1 11 3 5 1 P (X = 3) = P (X = −3) = 55 165 3 = 3 11 3 1 = 165 149 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li The Coupon Collection Problem (Example 1e) There are N distinct types of coupons and each time one obtains a coupon equally likely from one of the N types, independent of prior selections. A random variable T = number of coupons that needs to be collected until one obtains a complete set of at least one of each type. Q: What is P (T = n)? 150 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Solution: 1. 
Suffice to study P (T > n) because P (T = n) = P (T > n) − P (T > n − 1) 2. Define event Aj = no type j coupon is collected among first n. 3. P (T > n) = P Then what? S N j=1 Aj Instructor: Ping Li 151 Cornell University, BTRY 4080 / STSCI 4080 P N [ j=1 + Aj = X j1 <j2 <j3 Fall 2009 X j P (Aj ) − Instructor: Ping Li X P (Aj1 Aj2 ) j1 <j2 P (Aj1 Aj2 Aj3 ) + ... + (−1)N +1 P (A1 A2 ...AN ) 152 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 n N −1 P (Aj ) = N n N −2 P (Aj1 Aj2 ) = N ... n N −k P (Aj1 Aj2 ...Ajk ) = N ... n 1 P (Aj1 Aj2 ...AjN −1 ) = N P (A1 A2 ...AN ) = 0 Instructor: Ping Li 153 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 P (T > n) = P + X N [ j=1 Instructor: Ping Li Aj = X j P (Aj ) − X P (Aj1 Aj2 ) j1 <j2 P (Aj1 Aj2 Aj3 ) + ... + (−1)N +1 P (A1 A2 ...AN ) j1 <j2 <j3 n n N N −2 N N −3 =N − + 2 N 3 N n N 1 N + ... + (−1) N −1 N n N −1 X N N −j (−1)j+1 = j N j=1 N −1 N n 154 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Cumulative Distribution Function For a random variable X , its cumulative distribution function F is defined as F (x) = P {X ≤ x} F (x) is a non-decreasing function: If x ≤ y , then F (x) ≤ F (y). 155 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Section 4.2: Discrete Random Variables Discrete random variable A random variable X that take on at most a countable number of possible values. The probability mass function p(a) = P {X = a} If X takes one of the values x1 , x2 , ..., then p(xi ) ≥ 0 ∞ X i=1 p(xi ) = 1 156 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Example (4.2.2a): The probability mass function of a random variable X is given by p(i) = cλi i! , i = 0, 1, 2, .... Q: Find • the relation between c and λ. • P {X = 0}; • P {X > 2}; 157 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Solution: • The relation between c and λ. ∞ X p(i) = 1 =⇒ c i=0 ∞ X λi i=0 i! = 1 =⇒ ceλ = 1 =⇒ c = e−λ . • P {X = 0}; P {X = 0} = c = e−λ . • P {X > 2}; P {X > 2} =1 − P {X = 0} − P {X = 1} − P {X = 2} λ2 =1 − c − cλ − c 2 −λ =1 − e −λ − λe λ2 e−λ − . 2 158 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Section 4.3: Expected Value Definition: If X is a discrete random variable having a probability mass function p(x), the expected value of X is defined by E(X) = X xp(x) x;p(x)>0 The expected value is a weighted average of possible values that X can take on. 159 Cornell University, BTRY 4080 / STSCI 4080 Example Fall 2009 Instructor: Ping Li Find E(x) where X is the outcome when we roll a fair die. 6 6 X 1 1X 7 E(X) = i = i= 6 6 i=1 2 i=1 Example E(X) =?? Let I be an indicator variable for event A, if 1 I= 0 if A occurs if Ac occurs 160 Cornell University, BTRY 4080 / STSCI 4080 Example (3d) Fall 2009 Instructor: Ping Li A school class of 120 students are driven in 3 buses to a symphony. There are 36 students in one of the buses, 40 in another, and 44 in the third bus. When the buses arrive, one of the 120 students is randomly chosen. Let X denote the number of students on the bus of that randomly chosen student, find E(X). Solution: X = {??} P (X) =?? 161 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Example (4a) P {X = −1} = 0.2, P {X = 0} = 0.5, P {X = 1} = 0.2. Find E(X 2 ). Solution: Let Y = X 2. 
P {Y = 0} =P {X = 0} = 0.5 P {Y = 1} =P {X = 1 OR X = −1} =P {X = 1} + P {X = −1} =0.2 + 0.3 = 0.5 Therefore E(X 2 ) = E(Y ) = 0 × 0.5 + 1 × 0.5 = 0.5 162 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Verify X i P {X = xi }x2i =P {X = −1}(−1)2 + P {X = 0}(0)2 + P {X = 1}(1)2 ?? =0.5 = E(X 2 ) 163 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Section 4.4: Expectation of a Function of Random Variable Proposition 4.1 values xi , i function g If X is a discrete random variable that takes on one of the ≥ 1, with respective probabilities p(xi ), then for any real-valued E (g(X)) = X g(xi )p(xi ) i Proof: Group the i’s with the same g(xi ). 164 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Proof: Group the i’s with the same g(xi ). X g(xi )p(xi ) = i X j = g(xi )p(xi ) i:g(xi )=yj X yj X yj P {g(X) = yj } j = X j X p(xi ) i:g(xi )=yj =E(g(X)). 165 Cornell University, BTRY 4080 / STSCI 4080 Corollary 4.1 Fall 2009 Instructor: Ping Li If a and b are constants, then E(aX + b) = aE(X) + b. Proof Use the definition of E(X). The nth moment n E(X ) = X x;p(x)>0 xn p(x) 166 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Section 4.5: Variance The expectation E(X) measures the average of possible values of X . The variance V ar(X) measures the variation, or spread of these values. Definition If X is a random variable with mean µ, then the variance of X is defined by V ar(X) = E (X − µ) 2 167 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li An alternative formula for V ar(X) 2 V ar(X) = E(X 2 ) − (E(X)) = E(X 2 ) − µ2 Proof: 2 V ar(X) =E (X − µ) X = (x − µ)2 p(x) x = X x = X x (x2 − 2µx + µ2 )p(x) 2 x p(x) − 2µ X =E(X 2 ) − 2µ2 + µ =E(X 2 ) − µ2 . x 2 xp(x) + µ 2 X x p(x) 168 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Two important identities If a and b are constants, then V ar(a) = V ar(b) = 0 V ar(aX + b) = a2 V ar(X) Proof: Use the definitions. Instructor: Ping Li 169 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li E(aX + b) = aE(x) + b V ar(aX + b) =E ((aX + b) − E(aX + b))2 =E ((aX + b) − aEX − b)2 2 =E (aX − aEX) 2 =a2 E (X − EX) =a2 V ar(X) 170 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Section 4.6: Bernoulli and Binomial Random Variables Bernoulli random variable X ∼ Bernoulli(p) P {X = 1} = p(1) = p, P {X = 0} = p(0) = 1 − p. Properties: E(X) = p V ar(X) = p(1 − p) 171 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Binomial random variable Definition A random variable X presents the number of total successes that occur in n independent trials. Each trial results in a success with probability p and a failure with probability 1 − p. X ∼ Binomial(n, p). The probability mass function n i p(i) = P {X = i} = p (1 − p)n−i i • Proof? Pn • i=0 p(i) = 1? 172 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li A very useful representation: X ∼ binomial(n, p) can be viewed as the sum of n independent Bernoulli random variables with parameter p. X= n X i=1 Yi , Yi ∼ Bernoulli(p) 173 Cornell University, BTRY 4080 / STSCI 4080 Example: Fall 2009 Instructor: Ping Li Five fair coins are flipped. If the outcomes are assumed independent, find the probability mass function of the numbers of heads obtained. Solution: • X = number of heads obtained. • X ∼ binomial(n, p). • P {X = i} =? n=??, p=?? 
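As a numerical companion to this example (not part of the original slides): the sketch below fills in n = 5 and p = 1/2, the values suggested by the five fair coins, builds the Binomial(5, 1/2) mass function directly from the formula above, and compares it with a simulation. Only base MATLAB functions (nchoosek, rand, mean) are used.

Matlab code (sketch):
n = 5; p = 0.5;                        % five fair coins
i = 0:n;
pmf = zeros(1, n+1);
for k = i
    pmf(k+1) = nchoosek(n, k) * p^k * (1-p)^(n-k);   % P{X = k}
end
disp([i' pmf'])                        % table of i and P{X = i}
m = 1e5;                               % Monte Carlo check
heads = sum(rand(m, n) < p, 2);        % number of heads in each of m experiments
empirical = arrayfun(@(k) mean(heads == k), i);
disp([pmf' empirical'])                % theoretical pmf vs. simulated frequencies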
174 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Properties of X Expectation: Variance: Proof:?? Instructor: Ping Li ∼ binomial(n, p) E(X) = np. V ar(X) = np(1 − p). 175 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li n X n i E(X) = i p (1 − p)n−i i i=0 n X n! pi (1 − p)n−i = i i!(n − i)! i=0 = n X i=1 =np i n! pi (1 − p)n−i i!(n − i)! n X i=1 =np n−1 X j=0 =np (n − 1)! pi−1 (1 − p)(n−1)−(i−1) (i − 1)!((n − 1) − (i − 1))! (n − 1)! pj (1 − p)(n−1)−j j!((n − 1) − j)! 176 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li n n X X n n! E(X 2 ) = i2 pi (1 − p)n−i = i2 pi (1 − p)n−i i i!(n − i)! i=0 i=1 n X (n − 1)! =np i pi−1 (1 − p)(n−1)−(i−1) (i − 1)!((n − 1) − (i − 1))! i=1 n−1 X (n − 1)! pj (1 − p)(n−1)−j =np (j + 1) j!((n − 1) − j)! j=0 Y ∼ binomial(n − 1, p) =npE(Y + 1) =np((n − 1)p + 1) 2 V ar(X) = E(X 2 ) − (E(X)) = np((n − 1)p + 1) − (np)2 = np(1 − p) 177 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 What Happens When n X ∼ binomial(n, p), Instructor: Ping Li → ∞ and np → λ? n → ∞ and np → λ n! pi (1 − p)n−i P {X = i} = (n − i)!i! (1 − p)n pi = n(n − 1)(n − 2)...(n − i + 1) i! (1 − p)i n i (1 − p) (np) n(n − 1)(n − 2)...(n − i + 1) = i! (1 − p)i ni e−λ λi ≈ i! 178 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 lim n→∞ Instructor: Ping Li λ 1− n n = e−λ For fixed (small) i, n(n − 1)(n − 2)...(n − i + 1) lim =1 n→∞ (n − λ)i 179 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Section 4.7: The Poisson Random Variable Definition: A random variable X , taking on one of the values 0, 1, 2, ..., is a Poisson random variable with parameter λ, if for some λ i λ p(i) = P {X = i} = e−λ , i! i = 0, 1, 2, 3, ... Check ∞ X i=0 p(i) = ∞ X i=0 i −λ λ e > 0, i! = 1?? 180 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Facts about Poisson Distribution • Poisson(λ) approximates binomial (n, p), when n large and np ≈ λ. −λ i n i e λ n−i p (1 − p) ≈ i i! • Poisson is widely used for modeling rare (p ≈ 0) events (but not too rare). – The number of misprints on a page of a book – The number of people in a community living to 100 years of age – ... 181 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Compare Binomial with Poisson n = 20 p = 0.2 n = 20 p = 0.2 Binomial Poisson 0.25 Poisson Binomial 0.25 0.15 0.15 p(i) 0.2 p(i) 0.2 0.1 0.1 0.05 0.05 0 0 5 10 i 15 Poisson approximation is good • n does not have to be too large. • p does not have to be too small. 20 0 0 5 10 i 15 20 182 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Matlab Code function comp_binom_poisson(n,p); i=0:n; P1=binopdf(i,n,p); P2=poisspdf(i,n*p); figure; bar(i,P1,’g’); hold on; grid on; bar(i,P2,’r’); legend(’Binomial’,’Poisson’); xlabel(’i’); ylabel(’p(i)’); axis([-0.5 n 0 max(P1)+0.05]); title([’n = ’ num2str(n) ’ p = ’ num2str(p)]); figure; bar(i,P2,’r’); hold on; grid on; bar(i,P1,’g’); legend(’Poisson’,’Binomial’); xlabel(’i’); ylabel(’p(i)’); axis([-0.5 n 0 max(P1)+0.05]); title([’n = ’ num2str(n) ’ p = ’ num2str(p)]); 183 Cornell University, BTRY 4080 / STSCI 4080 Example Fall 2009 Instructor: Ping Li Suppose that the number of typographical errors on a single page of a book has a Poisson distribution with parameter λ = 12 . Q: Calculate the probability that there is at least one error on this page. P (X ≥ 1) = 1 − P (X = 0) = 1 − e−1/2 = 0.393. 
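A numerical sketch of this example and of the binomial-Poisson approximation discussed above (not from the original slides). It reuses poisspdf and binopdf, the same Statistics Toolbox functions called in the comp_binom_poisson code; the values n = 500 and p = 0.001 are hypothetical, chosen only so that np = 1/2.

Matlab code (sketch):
lambda = 0.5;
P_at_least_one = 1 - poisspdf(0, lambda)       % = 1 - exp(-1/2), about 0.393
n = 500; p = 0.001;                            % hypothetical n, p with np = 1/2
i = 0:5;
exact  = binopdf(i, n, p);                     % Binomial(n, p) probabilities
approx = poisspdf(i, n*p);                     % Poisson(np) probabilities
disp([i' exact' approx'])                      % the two columns agree closely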
184 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Expectation and Variance of Poisson X ∼ P oisson(λ) Y ∼ binomial(n, p) X approximates Y when (1): n large, (2): p small, and (3): np ≈ λ E(Y ) = np, Guess E(X) = ??, V ar(X) =?? V ar(Y ) = np(1 − p) 185 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li X ∼ P oisson(λ) ∞ X ∞ ∞ i i−1 j X X λ λ −λ −λ −λ λ E(X) = ie =λ e =λ e =λ i! (i − 1)! j! i=1 i=1 j=0 ∞ i i−1 X λ −λ λ 2 2 −λ =λ ie E(X ) = i e i! (i − 1)! i=1 i=1 ∞ X =λ ∞ X j −λ λ (j + 1)e j=0 V ar(X) = λ2 + λ − λ2 = λ j! = λ (E(X) + 1) = λ2 + λ 186 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Section 4.8.1: Geometric Random Variable Suppose that independent trials, each having probability p of Definition: being a success, are performed until a success occurs. X is the number of trials required. Then P {X = n} = (1 − p)n−1 p, Check P∞ n=1 P {X = n} = P∞ n=1 (1 n = 1, 2, 3, ... − p)n−1 p = 1?? 187 Cornell University, BTRY 4080 / STSCI 4080 Expectation Fall 2009 X ∼ Geometric(p). E(X) = ∞ X n=1 We know Instructor: Ping Li n(1 − p)n−1 p (q = 1 − p) =p + 2qp + 3q 2 p + 4q 3 p + 5q 4 p + ... 2 3 4 =p 1 + 2q + 3q + 4q + 5q + ... =?? 1 + q + q 2 + q 3 + q 4 + ... = 1 1−q How about I = 1 + 2q + 3q 2 + 4q 3 + 5q 4 + ... =?? 188 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li I = 1 + 2q + 3q 2 + 4q 3 + 5q 4 + ... =?? First trick q × I = q + 2q 2 + 3q 3 + 4q 4 + 5q 5 + ... 1 I − qI = 1 + q + q + q + q + ... = 1−q 1 =⇒I = (1 − q)2 p 1 =⇒E(X) = pI = = (1 − q)2 p 2 3 4 189 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 I = 1 + 2q + 3q 2 + 4q 3 + 5q 4 + ... =?? Second trick d 2 3 4 5 q + q + q + q + q + ... I= dq d 1 1 I= = dq 1 − q (1 − q)2 Instructor: Ping Li 190 Cornell University, BTRY 4080 / STSCI 4080 Variance Fall 2009 Instructor: Ping Li X ∼ Geometric(p). E(X 2 ) = ∞ X n=1 n2 (1 − p)n−1 p 2 2 2 (q = 1 − p) 2 3 2 4 =p 1 + 2 q + 3 q + 4 q + 5 q + ... J = 1 + 22 q + 32 q 2 + 42 q 3 + 52 q 4 + ... =?? 191 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li J = 1 + 22 q + 32 q 2 + 42 q 3 + 52 q 4 + ... =?? Trick 1 qJ = q + 22 q 2 + 32 q 3 + 42 q 4 + 52 q 5 + ... J(1 − q) = 1 + (22 − 12 )q + (32 − 22 )q 2 + (42 − 32 )q 3 + ... = 1 + 3q + 5q 2 + 7q 3 + ... J(1 − q)q = q + 3q 2 + 5q 3 + 7q 4 + ... J(1 − q)2 = 1 + 2q + 2q 2 + 2q 3 + ... = −1 + 1+q J= (1 − q)3 p(2 − p 2−p E(X ) = = p3 p2 2 2 1+q = 1−q 1−q 1−p V ar(X) = p2 192 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 J = 1 + 22 q + 32 q 2 + 42 q 3 + 52 q 4 + ... =?? Trick 2 ∞ X ∞ X d n 2 n−1 J= n q = nq dq n=1 n=1 "∞ # X d d q n = nq = E(X) dq n=1 dq p q d 1 q 1 = = p dq 1 − q p (1 − q)2 Instructor: Ping Li 193 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Section 4.8.2: Negative Binomial Distribution Definition: Independent trials, each having probability p, of being a success are performed until a total of r successes is accumulated. Let X denote the number of trials required. Then X is a negative binomial random variable X ∼ N B(r, p) X ∼ Geometric(p) is the same as X ∼ N B(1, p). What is probability mass function of X ∼ N B(r, p)? 194 Cornell University, BTRY 4080 / STSCI 4080 P {X = n} = Fall 2009 n−1 r p (1 − p)n−r , r−1 E(X k ) = = ∞ X n=r ∞ X n=r Now what? Instructor: Ping Li n = r, r + 1, r + 2, ... nk n−1 r p (1 − p)n−r r−1 nk (n − 1)! pr (1 − p)n−r (r − 1)!(n − r)! 195 Cornell University, BTRY 4080 / STSCI 4080 E(X k ) = ∞ X nk−1 n=r ∞ X Fall 2009 Instructor: Ping Li n! 
pr (1 − p)n−r (r − 1)!(n − r)! r n! k−1 = n pr+1 (1 − p)n−r , p n=r r!(n − r)! (m = n + 1) ∞ r X (m − 1)! = (m − 1)k−1 pr+1 (1 − p)m−1−r p m=r+1 r!(m − 1 − r)! r = E[(Y − 1)k−1 ], p Y ∼ N B(r + 1, p) E(X) =??, E(X 2 ) =??, V ar(X) =?? 196 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li r E(X k ) = E[(Y − 1)k−1 ], p Let k Y ∼ N B(r + 1, p) = 1. Then r r E(X) = E[(Y − 1)0 ] = p p Let k = 2. Then r r E(X ) = E[(Y − 1)] = p p 2 r+1 −1 p r(1 − p) V ar(X) = E(X ) − E X = p2 2 2 197 Cornell University, BTRY 4080 / STSCI 4080 Example (4.8.8a) Fall 2009 Instructor: Ping Li An urn contains N white and M black balls. Balls are randomly selected, one at a time, until a black one is obtained. Each selected ball is replaced before the next one is drawn. Q(a): What is the probability that exactly n draws are needed? Q(b): What is the expected number of draws needed? Q(c): What is the probability that at least k draws are needed? 198 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Solution: This is a geometric random variable with p= M M +N . Q(a): What is the probability that exactly n draws are needed? P (X = n) = (1 − p) n−1 p= N M +N n−1 M M N n−1 = M +N (M + N )n Q(b): What is the expected number of draws needed? 1 M +N N E(X) = = =1+ p M M Q(c): What is the probability that at least k draws are needed? P (X ≥ k) = ∞ X n=k (1 − p)n−1 p = k−1 p(1 − p) = 1 − (1 − p) N M +N k−1 199 Cornell University, BTRY 4080 / STSCI 4080 Example (4.8.8e): Fall 2009 Instructor: Ping Li A pipe-smoking mathematician carries 2 matchboxes, 1 in the left-hand pocket and 1 in his right-hand pocket. Each time he needs a match he is equal likely to take it from either pocket. Consider the moment when the mathematician first discovers that one of his matchboxes is empty. If it is assumed that both matchboxes initially contained N matches, what is the probability that there are exactly k matches in the other box, k = 0, 1, ..., N ? 200 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Solution: 1. Assume the left-hand pocket is empty. 2. X = total matches taken at the moment when left-hand pocket is empty. 3. View X as a random variable X ∼ N B(r, p). What is r , p? 4. Remember there are k matches remaining in the right-hand pocket. 5. The desired probability = P {X = n}. What is n? 6. Adjustment by considering either left-hand or right-hand can be empty. 201 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Solution: 1. Assume the left-hand pocket is empty. 2. X = total matches taken at the moment when left-hand pocket is empty. 3. View X as a random variable X ∼ N B(r, p). What is r , p? p = 21 , r = N + 1. 4. Remember there are k matches remaining in the right-hand pocket. 5. The desired probability = P {X = n}. What is n? n = 2N − k + 1. 6. Adjustment by considering either left-hand or right-hand can be empty. 2 2N −k N 1 2N −k+1 . 2 202 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Section 4.8.3: Hypergeometric Distribution Definition A sample of size n is chosen randomly without replacement from − m are black. Let X denote the number of white balls selected, then X ∼ HG(N, m, n) an urn containing N balls, of which m are white and N P {X = i} = m i N −m n−i N n , i = 0, 1, ..., n. n − (N − m) ≤ i ≤ min(n, m) 203 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li If the n balls are chosen with replacement, then it is binomial when m and N are large relative to n. 
m X ∼ HG(N, m, n) ≈ binomial n, N nm E(X) ≈ np = ?? N ?? nm m 1− ?? V ar(X) ≈ np(1 − p) = N N ?? n, p = m N . 204 Cornell University, BTRY 4080 / STSCI 4080 E(X k ) = n X i i=1 = n X ik−1 i=1 n−1 X m k i Fall 2009 Instructor: Ping Li 205 N −m n−i N n (N − m)! n!(N − n)! m! (i − 1)!(m − i)! (n − i)!(N − m − n − i)! N! m! (N − m)! n!(N − n)! j!(m − 1 − j)! (n − 1 − j)!(N − m − n − 1 − j)! N! j=0 nm k−1 = E (Y + 1) , Y ∼ HG(N − 1, m − 1, n − 1) N = (j + 1)k−1 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Section 4.9: Properties of Cumulative Distribution Function F (x) = P {X ≤ x} 1. F (x) is a non-decreasing function: If a < b, then F (a) ≤ F (b) 2. limx→∞ = 1 3. limx→−∞ = 0 4. F (x) is right continuous. 206 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Chapter 5: Continuous Random Variables X is a continuous random variable if there exists a non-negative function f , defined for all real x ∈ (−∞, ∞), having the property that for any set B of real numbers, Z P {X ∈ B} = f (x)dx Definition: B The function f is called the probability density function of the random variable X . 207 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Some Facts • P {X ∈ (−∞, ∞)} = R∞ −∞ f (x)dx = 1 • P {X ∈ (a, b)} = P {a ≤ X ≤ b} = • P {X = a} = Ra a • f (x) = d dx F (x) a f (x)dx f (x)dx = 0 • F (a) = P {X ≤ a} = P {X < a} = • F (∞) = 1, Rb F (−∞) = 0 Ra −∞ f (x)dx 208 Cornell University, BTRY 4080 / STSCI 4080 Example 5.1.1a: Fall 2009 Suppose that X is a continuous random variable whose probability density function is given by C(4x − 2x2 ) 0 < x < 2 f (x) = 0 otherwise Q(a): What is the value of C ? Q(b): Find P {X Instructor: Ping Li > 1}. 209 Cornell University, BTRY 4080 / STSCI 4080 Example 5.1.1d: Fall 2009 Instructor: Ping Li Denote FX and fX the cumulative function and the density function of a continuous random variable X , respectively. Find the density function of Y = 2X . 210 Cornell University, BTRY 4080 / STSCI 4080 Example 5.1.1d: Fall 2009 Instructor: Ping Li Denote FX and fX the cumulative function and the density function of a continuous random variable X , respectively. Find the density function of Y = 2X . Solution: FY (y) = P {Y ≤ y} = P {2X ≤ y} = P {X ≤ y/2} = FX (y/2) d d 1 fY (x) = FY (y) = FX (y/2) = fX (y/2) dy dy 2 211 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Derivatives of Common Functions http://en.wikipedia.org/wiki/Derivative • Exponential and logarithmic functions d x d x x x e = e , a = a log a, dx dx d 1 d 1 log x = , log x = a dx x dx x log a • Trigonometric functions d dx sin x = cos x, d dx cos x = − sin x, (Notation: log x = loge x) d dx tan x = sec2 x • Inverse trigonometric functions d d √ 1 √ 1 arcsin x = , arccos x = − , dx dx 1−x2 1−x2 d dx arctan x = 1 1+x2 , 212 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Integration by Parts Z b Z b Z b b f (x)dx = f (x)x|a − a f (x)d[g(x)] = a f (x)h(x)dx = a where h(x) Z b xf ′ (x)dx a f (x)g(x)|ba Z = H ′ (x). − Z b g(x)f ′ (x)dx a b f (x)d[H(x)] = a b f (x)H(x)|a − Z b H(x)f ′ (x)dx a 213 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Examples: Z Z a b b log xdx a =x log x|ba − Z b ′ x [log x] dx a b 1 =b log b − a log a − x dx x a =b log b − a log a − b + a. 
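A quick numerical check of this result (a sketch, not from the original slides; the endpoints a and b are arbitrary, and quad is base MATLAB's adaptive quadrature routine):

Matlab code (sketch):
a = 1; b = 5;
numeric  = quad(@(x) log(x), a, b);            % numerical value of the integral
analytic = b*log(b) - a*log(a) - b + a;        % the integration-by-parts formula
disp([numeric analytic])                       % both are about 4.047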
b Z 1 1 log xd[x ] = x2 log x|ba − 2 2 a Z 1 2 1 2 1 b = b log b − a log a − xdx 2 2 2 a 1 2 1 2 1 2 2 = b log b − a log a − b −a 2 2 4 1 x log xdx = 2 Z 2 Z b a ′ x2 [log x] dx 214 Cornell University, BTRY 4080 / STSCI 4080 Z Z Z b b xn ex dx = a b xn cos xdx = a Z Fall 2009 Z b Instructor: Ping Li xn d[ex ] = a b xn d[sin x] = a n sin x sin 2xdx =2 a Z b xn ex |ba b xn sin x|a n b 2 n+2 b n+2 sin x a ex xn−1 dx a −n sin x sin x cos xdx = 2 a = −n Z Z Z b sin x xn−1 dx a b a sinn+1 xd[sin x] 215 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Expectation Definition: Given a continuous random variable X with density f (x), its density is defined as E(X) = Z ∞ −∞ xf (x)dx 216 Cornell University, BTRY 4080 / STSCI 4080 Example: Fall 2009 Instructor: Ping Li The density of X is given by X Find E e . 2x if 0 ≤ x ≤ 1 f (x) = 0 otherwise Two solutions: 1. • Find the density function of Y = eX , denoted by fY (y). R∞ • E(Y ) = −∞ yfY (y)dy . 2. Apply the formula E [g(X)] = Z ∞ −∞ g(x)f (x)dx 217 Cornell University, BTRY 4080 / STSCI 4080 Solution 1: Fall 2009 Instructor: Ping Li Y = eX . X FY (y) =P {Y ≤ y} = P e ≤ y Z log y 2ydy = log2 y =P {X ≤ log y} = 0 Therefore, 2 fY (y) = log y, 1≤y≤e y Z ∞ Z e E(Y ) = yfY (y)dy = 2 log ydy −∞ = e 2y log y|1 1 − Z 1 e 1 2y dy = 2 y 218 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Solution 2: X E e = = Z 1 2xex dx = 0 1 2xex |0 −2 Z Z 1 2xdex 0 1 ex dx 0 =2e − 2(e − 1) = 2 219 Cornell University, BTRY 4080 / STSCI 4080 Lemma 5.2.2.1 Fall 2009 Instructor: Ping Li For a non-negative random variable X , E(X) = Z ∞ 0 P {X > x}dx 220 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li For a non-negative random variable X , Lemma 5.2.2.1 E(X) = Z ∞ 0 P {X > x}dx Proof: Z ∞ 0 P {X > x}dx = Z ∞ 0 Z ∞Z Z ∞ fX (y)dydx x y fX (y)dxdy 0 0 Z y Z ∞ = fX (y) dx dy 0 0 Z ∞ = fX (y)ydy = E(X) = 0 221 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 X is a continuous random variable. • E(aX + b) = aE(X) + b, where a and b are constants. 2 • V ar(X) = E (X − E(X)) = E(X 2 ) − E 2 (X). • V ar(aX + b) = a2 V ar(X) Instructor: Ping Li 222 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Section 5.3: The Uniform Random Variable Definition: X is a uniform random variable on the interval (a, b) if its probability density function is given by f (x) = Denote X 1 b−a 0 if a<x<b otherwise ∼ U nif orm(a, b), or simply X ∼ U (a, b). 223 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Cumulative function F (x) = If X 0 if x≤a x−a b−a if a<x<b 1 if x≥b ∼ U (0, 1), then F (x) = x, if 0 ≤ x ≤ 1. Expectation E(X) = a+b 2 Variance (b − a)2 V ar(X) = 12 224 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Section 5.4: Normal Random Variable Definition: given by X is a normal random variable, X ∼ N µ, σ 2 1 − (x−µ) f (x) = √ e 2σ2 , 2πσ 2 −∞ < x < ∞ if its density is 225 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li • If X ∼ N (µ, σ 2 ), then E(X) = µ and V ar(X) = σ 2 . • The density function is a “bell-shaped” curve. • It was first introduced by French Mathematician Abraham DeMoivre in 1733, to approximate the binomial when n was large. • The distribution became truly famous because of Karl F. 
Gauss and hence its another common name is “Gaussian distribution.” • It is called “normal distribution” (probably started by Karl Pearson) because “most people” believed that it was “normal” for any well-behaved data set to follow this distribution. 226 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Check that Z ∞ f (x)dx = −∞ Note that, by letting Z ∞ −∞ z= ∞ −∞ (x−µ)2 1 − 2σ2 √ e dx = 1. 2πσ x−µ σ , 2 1 − (x−µ) √ e 2σ2 dx = 2πσ It suffices to show that I 2 Z ∞ −∞ 1 − z2 √ e 2 dz = I. 2π = 1. ∞ 1 1 − w2 2 √ e √ e 2 dw I = dz × 2π 2π −∞ −∞ Z ∞Z ∞ 1 − z2 +w2 2 = e dzdw 2π −∞ −∞ 2 Z ∞ Z 2π Z 2π Z ∞ 2 2 r r 1 1 r = e− 2 rdθdr = dθ e− 2 d = 1. 2π 0 2π 0 2 0 0 Z ∞ Z 2 − z2 Z 227 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Expectation and Variance If X ∼ N (µ, σ 2 ), then E(X) = µ and V ar(X) = σ 2 . ∞ 2 1 − (x−µ) 2 E(X) = x√ e 2σ dx 2πσ −∞ Z ∞ Z ∞ 2 (x−µ)2 1 1 − 2σ2 − (x−µ) = (x − µ) √ e dx + µ√ e 2σ2 dx 2πσ 2πσ −∞ −∞ Z ∞ 2 1 − (x−µ) = (x − µ) √ e 2σ2 d[x − µ] + µ 2πσ −∞ Z ∞ z2 1 − 2σ 2 dz + µ = z√ e 2πσ −∞ =µ. Z 228 Cornell University, BTRY 4080 / STSCI 4080 2 Fall 2009 2 Instructor: Ping Li V ar(X ) =E (X − µ) Z ∞ (x−µ)2 1 − 2σ2 2 = (x − µ) √ e d[x − µ] 2πσ −∞ Z ∞ z2 1 − 2σ 2 2 e = z √ dz 2πσ −∞ Z ∞ 2 z 1 − z22 h z i 2 √ e 2σ d =σ 2 σ 2π −∞ σ Z ∞ 2 − t2 2 2 1 =σ t √ e d [t] 2π −∞ 2 Z ∞ t2 1 t 2 − 2 t√ e d =σ 2 2π −∞ Z ∞ h t2 i 1 2 =σ −t √ d e− 2 2π −∞ Z ∞ ∞ t2 t2 1 1 √ e− 2 dt − √ =σ 2 σ 2 te− 2 = σ2 −∞ 2π 2π −∞ 229 Cornell University, BTRY 4080 / STSCI 4080 2 − t2 Why limt→∞ te 2 − t2 lim te t→∞ Fall 2009 Instructor: Ping Li = 0?? = lim t→∞ t et2 /2 [t]′ 1 = lim 2 ′ = lim t2 /2 = 0 t→∞ et /2 t→∞ te 230 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Normal Distribution Approximates Binomial Suppose X ∼ Binomial(n, p). For fixed p, as n → ∞ Binomial(n, p) ≈ N (µ, σ 2 ), µ = np, σ 2 = np(1 − p). Instructor: Ping Li 231 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li n = 10 p = 0.2 0.35 Density (mass) function 0.3 0.25 0.2 0.15 0.1 0.05 0 0 1 2 3 4 x 5 6 7 8 9 10 232 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li n = 20 p = 0.2 0.25 Density (mass) function 0.2 0.15 0.1 0.05 0 −10 −5 0 5 10 x 15 20 25 233 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li n = 50 p = 0.2 0.16 0.14 Density (mass) function 0.12 0.1 0.08 0.06 0.04 0.02 0 −10 −5 0 5 10 x 15 20 25 30 234 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li n = 100 p = 0.2 0.1 0.09 Density (mass) function 0.08 0.07 0.06 0.05 0.04 0.03 0.02 0.01 0 −10 0 10 20 x 30 40 50 235 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li n = 1000 p = 0.2 0.035 Density (mass) function 0.03 0.025 0.02 0.015 0.01 0.005 0 100 150 200 x 250 300 236 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 237 Matlab code function NormalApprBinomial(n,p); mu = n*p; sigma2 = n*p*(1-p); figure; bar((0:n),binopdf(0:n,n,p),’g’);hold on; grid on; x = mu - 3*sigma2:0.001:mu+3*sigma2; plot(x,normpdf(x,mu,sqrt(sigma2)),’r-’,’linewidth’,2); xlabel(’x’);ylabel(’Density (mass) function’); title([’n = ’ num2str(n) ’ p = ’ num2str(p)]); Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li The Standard Normal Distribution Z ∼ N (0, 1) is called the standard normal. • E(Z) = 0, V ar(Z) = 1. 2 • If X ∼ N µ, σ , then Y = x−µ σ ∼ N (0, 1). • Z ∼ N (0, 1). Denote density function fZ (z) by φ(z) and cumulative function FZ (z) by Φ(z). 
Z z 2 1 −z 1 − z2 √ e 2 dz φ(z) = √ e 2 , Φ(z) = 2π 2π −∞ • φ(−x) = φ(x), Φ(−x) = 1 − Φ(x). Numerical table on Page 222. 238 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li X ∼ N (µ = 3, σ 2 = 9). Find Example 5.4.4b: • P {2 < X < 5} • P {X > 0} • P {|X − 3| > 6}. Solution: Let Y = X−µ σ = X−3 3 . Then Y ∼ N (0, 1). 2−3 X −3 5−3 < < 3 3 3 1 2 2 1 =Φ −Φ − =P − < Y < 3 3 3 3 =Φ (0.6667) − 1 + Φ (0.3333) P {2 < X < 5} =P ≈0.7475 − 1 + 0.6306 ≈ 0.3781 239 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Section 5.5: Exponential Random Variable Definition: An exponential random variable X has the probability density function given by λe−λx f (x) = 0 if if x≥0 x<0 (λ > 0) X ∼ EXP (λ). Cumulative function: F (x) = F (∞) =?? Z x 0 λe−λx dx = 1 − e−λx , (x ≥ 0). 240 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Moments: E(X n ) = Z ∞ xn λe−λx dx 0 Z ∞ =− xn de−λx 0 Z ∞ = e−λx nxn−1 dx 0 n = E(X n−1 ) λ E(X) = λ1 , E(X 2 ) = 2 λ2 , V ar(X) = 1 λ2 , E(X n ) = n! λn , 241 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Applications of Exponential Distribution • The exponential distribution often arises as being the distribution of the amount of time until some specific event occurs. • For instance, the amount of time (starting from now) until an earthquake occurs, or until a new war breaks out, etc. • The exponential distribution is memoryless. P {X > s + t|X > t} = P {X > s}, for all s, t ≥ 0 242 Cornell University, BTRY 4080 / STSCI 4080 Example 5b: Fall 2009 Instructor: Ping Li Suppose that the length of a phone call in minutes is an exponential random variable with parameter λ = 1 10 . If someone arrives immediately ahead of you at a public telephone booth, find the probability that you will have to wait • (a) more than 10 minutes; • (b) between 10 and 20 minutes. Solution: • P {X > 10} = e−λ10 = e−1 = 0.368. • P {10 < X < 20} = e−1 − e−2 = 0.233 243 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Section 5.6.1: The Gamma Distribution Definition: A random variable has a gamma distribution with parameters (α, λ), where α, λ > 0, if its density function is given by −λx λe (λx)α−1 , x ≥ 0 Γ(α) f (x) = 0 x<0 Denote X R∞ 0 ∼ Gamma(α, λ). f (x)dx = 1?? 244 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li The Gamma function is defined as Γ(α) = Γ(1) = 1. Z ∞ 0 Γ(0.5) = √ e−y y α−1 dy = (α − 1)Γ(α − 1). π. Γ(n) = (n − 1)!. 245 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Connection to exponential distribution • When α = 1, the gamma distribution reduces to an exponential distribution. Gamma(α = 1, λ) = EXP (λ). • A gamma distribution Gamma(n, λ) often arises as the distribution of the amount of time one has to wait until a total of n events has occurred. 246 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 X ∼ Gamma(α, λ). Expectation and Variance: E(X) = α λ, V ar(X) = E(X) = Z α λ2 . ∞ 0 ∞ λe−λx (λx)α−1 dx x Γ(α) e−λx (λx)α = dx Γ(α) 0 Z Γ(α + 1) 1 ∞ e−λx (λx)α λ dx = Γ(α) λ 0 Γ(α + 1) Γ(α + 1) 1 = Γ(α) λ α = . λ Z Instructor: Ping Li 247 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Section 5.6.4: The Beta Distribution Definition: A random variable X has a beta distribution if its density is given by f (x) = Denote X 1 a−1 x (1 B(a,b) 0 − x)b−1 0<x<1 otherwise ∼ Beta(a, b). 
The Beta function Γ(a)Γ(b) B(a, b) = = Γ(a + b) Z 1 0 xa−1 (1 − x)b−1 dx 248 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 X ∼ Beta(a, b). Expectation and Variance: a , E(X) = a+b ab V ar(X) = (a + b)2 (a + b + 1) Instructor: Ping Li 249 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Sec. 5.7: Distribution of a Function of a Random Variable Given the density function fX (x) and cumulative function FX (x) for a random variable X , what is the general formula for the density function fY (y) of the random variable Y Example 5.7.7a: = g(X)? X ∼ U nif orm(0, 1). Y = g(X) = X n . FY (y) = P {Y ≤ y} = P {X n ≤ y} = P {X ≤ y 1/n } = FX (y 1/n ) = y 1/n 1 fY (y) = y 1/n−1 , n =fX 0≤y≤1 d −1 −1 g (y) g (y) dy y = g(x) = xn =⇒ g −1 (y) = y 1/n . 250 Cornell University, BTRY 4080 / STSCI 4080 Theorem 7.1: Fall 2009 Instructor: Ping Li Let X be a continuous random variable having probability density function fX . Suppose g(x) is a strictly monotone (increasing or decreasing), differentiable function of x. Then the random variable Y = g(X) has a probability density function given by fX g −1 (y) d g −1 (y) dy fY (y) = 0 if y = g(x) for some x if y 6= g(x) for all x 251 Cornell University, BTRY 4080 / STSCI 4080 Example 7b: Fall 2009 Y = X2 2 FY (y) = P {Y ≤ y} = P X ≤ y √ √ =P {− y ≤ X ≤ y} √ √ =FX ( y) − FX (− y) 1 √ √ fY (y) = √ (fX ( y) + fX (− y)) 2 y Does this example violate Theorem 7.1? Instructor: Ping Li 252 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Example (Cauchy distribution): Suppose a random variable X has density function 1 f (x) = , π(1 + x2 ) −∞ < x < ∞ 1 Q: Find the distribution of X . R∞ One can check −∞ f (x) = 1, but E(X) = ∞. 253 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Solution: Denote the cumulative function of X by FX (x). Let Z = 1/X . Assume z > 0, FZ (z) =P {1/X ≤ z} = P {X ≤ 0} + P {X ≥ 1/z, X ≥ 0} Z ∞ 1 1 = + dx 2 2 1/z π(1 + x ) ∞ π 1 1 1 1 − tan−1 (1/z) = + tan−1 (z) = + 2 π 2 π 2 1/z Therefore, for z > 0, i′ 1 hπ 1 fZ (z) = − tan−1 (1/z) = . 2 π 2 π(1 + z ) 254 Cornell University, BTRY 4080 / STSCI 4080 Assume z Fall 2009 < 0, FZ (z) =P {1/X ≤ z} = P {X ≥ 1/z, X ≤ 0} Z 0 1 = dx 2 1/z π(1 + x ) 0 1 1 −1 −1 = tan (z) = 0 − tan (1/z) π π 1/z Therefore, for z < 0, ′ 1 1 −1 fZ (z) = 0 − tan (1/z) = . 2 π π(1 + z ) Thus, Z and X have the same density function. Instructor: Ping Li 255 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Chapter 6: Jointly Distributed Random Variables Definition: For any two random variables X and Y , the joint cumulative probability distribution is defined as F (x, y) = P {X ≤ x, Y ≤ y} , −∞ < x, y < ∞ 256 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 F (x, y) = P {X ≤ x, Y ≤ y} , Instructor: Ping Li −∞ < x, y < ∞ The distribution of X is FX (x) =P {X ≤ x} =P {X ≤ x, Y ≤ ∞} =F (x, ∞) The distribution of Y is FY (y) =P {Y ≤ y} = F (∞, y) Example: P {X > x, Y > y} = 1 − FX (x) − FY (y) + F (x, y) 257 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Jointly Discrete Random Variables X and Y are both discrete random variables. 
Joint probability mass function XX p(x, y) = P {X = x, Y = y}, x p(x, y) = 1 y Marginal probability mass function pX (x) = P {X = x} = pY (y) = P {Y = y} = X p(x, y) X p(x, y) y:p(x,y)>0 x:p(x,y)>0 258 Cornell University, BTRY 4080 / STSCI 4080 Example 1a: Fall 2009 Instructor: Ping Li Suppose that 3 balls are randomly selected from an urn containing 3 red, 4 white, and 5 blue balls. Let X and Y denote, respectively, the number of red and white balls chosen. Q: The joint probability mass function p(x, y) = P {X = x, Y = y}=?? 259 Cornell University, BTRY 4080 / STSCI 4080 p(0, 0) = p(0, 1) = p(0, 2) = p(0, 3) = Fall 2009 5 10 3 12 = 220 3 4 5 40 1 2 = 12 220 3 4 5 30 2 1 = 12 220 3 4 4 3 12 = 220 3 84 pX (0) = p(0, 0) + p(0, 1) + p(0, 2) + p(0, 3) = 220 9 84 3 = pX (0) = 12 220 3 Instructor: Ping Li 260 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Jointly Continuous Random Variables X and Y are jointly continuous if there exists a function f (x, y) defined for all real x and y , having the property that for every set C of pairs of real numbers P {(X, Y ) ∈ C} = Z Z f (x, y)dxdy (x,y)∈C The function f (x, y) is called the joint probability density function. 261 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li The joint cumulative function F (x, y) = P {X ∈ (−∞, x], Y ∈ (−∞, y]} Z y Z x = f (x, y)dxdy −∞ −∞ The joint probability density function ∂2 F (x, y) f (x, y) = ∂x∂y The marginal probability density functions fX (x) = Z ∞ −∞ f (x, y)dy, fY (y) = Z ∞ −∞ f (x, y)dx 262 Cornell University, BTRY 4080 / STSCI 4080 Example 1c: Fall 2009 Instructor: Ping Li The joint density function is given by Ce−x e−2y f (x, y) = 0 • (a) Determine C . • (b) P {X > 1, Y < 1}, • (c) P {X < Y }, • (d) P {X < a}. 0 < x < ∞, 0 < y < ∞ otherwise 263 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Solution (a): Z ∞ Z ∞ Ce−x e−2y dxdy 0 0 Z ∞ Z ∞ =C e−x dx e−2y dy 1= 0 =C × Therefore, C 0 e−x |0∞ 1 −2y 0 C × e |∞ = 2 2 = 2. Solution (b): P {X > 1, Y < 1} = Z 1 Z 0 1 =e−x |1∞ ∞ 2e−x e−2y dxdy e−2y |01 = e−1 (1 − e−2 ) 264 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Solution (c): P {X < Y } = Z Z 2e−x e−2y dxdy = (x,y):x<y Z ∞Z y 2e−x e−2y dxdy 0 0 Z ∞ Z y = e−x dx 2e−2y dy 0 0 Z ∞ −2y −y = 1−e 2e dy 0 Z ∞ = 2e−2y − 2e−3y dy = 0 2 1 =1 − = 3 3 265 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Solution (d): P {X < a} =P {X < a, Y < ∞} Z aZ ∞ = 2e−2y e−x dydx 0 0 Z a e−x dx = 0 =1 − e−a 266 Cornell University, BTRY 4080 / STSCI 4080 Example 1e: Fall 2009 Instructor: Ping Li The joint density is given by e−(x+y) f (x, y) = 0 0 < x < ∞, 0 < y < ∞ otherwise Q: Find the density function of the random variable X/Y . 267 Cornell University, BTRY 4080 / STSCI 4080 Solution: Let Z Fall 2009 Instructor: Ping Li = X/Y , 0 < Z < ∞. FZ (z) =P {Z < z} = P {X/Y ≤ z} Z Z = e−(x+y) dxdy x/y≤z = Z ∞ 0 = Z ∞ Z yz 0 −y e 0 Z ∞ e−(x+y) dx dy e−x |0yz dy e−y − e−y−yz dy 0 −y−yz ∞ e =1− 1 =e−y |0∞ + z + 1 0 z+1 = h Therefore, fZ (z) = 1 − 1 z+1 i′ = 1 (z+1)2 . 
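A simulation check of this density (a sketch, not from the original slides). Exponential(1) variables are generated by the inverse transform -log(U) with U ~ Uniform(0,1), and the empirical distribution of Z = X/Y is compared with F_Z(z) = 1 - 1/(z+1).

Matlab code (sketch):
m = 1e6;
X = -log(rand(m,1));                 % X ~ Exponential(1)
Y = -log(rand(m,1));                 % Y ~ Exponential(1), independent of X
Z = X ./ Y;
z = [0.5 1 2 5 10];
empirical = arrayfun(@(t) mean(Z <= t), z);
theory    = 1 - 1 ./ (z + 1);
disp([z' empirical' theory'])        % empirical and theoretical F_Z(z) agree closely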
268 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li FZ (z) =P {Z < z} = P {X/Y ≤ z} Z Z = e−(x+y) dxdy x/y≤z = ∞ Z 0 = ∞ Z 0 = Z ∞ "Z ∞ # e−(x+y) dy dx x z h i x z e−x e−y |∞ dy x e−x− z dy 0 = 0 −x− x z e 1+ 1 z ∞ = 1 1+ 1 z =1− 1 z+1 269 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Multiple Jointly Distributed Random Variables Joint cumulative distribution function F (x1 , x2 , x3 , ..., xn ) = P {X1 ≤ x1 , X2 ≤ x2 , ..., Xn ≤ xn } Suppose X1 , X2 , ..., Xn are discrete random variables. Joint probability mass function p(x1 , x2 , ..., xn ) = P {X1 = x1 , X2 = x2 , ..., Xn = xn } 270 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li X1 , X2 , ..., Xn are jointly continuous if there exists a function f (x1 , x2 , ..., fn ) such that for any set C in n-dimensional space Z Z Z f (x1 , x2 , ..., xn )dx1 dx2 ...dxn P {(X1 , X2 , ..., Xn ) ∈ C} = ... (x1 ,x2 ,...,xn )∈C f (x1 , x2 , ..., xn ) is the joint probability density function. 271 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Section 6.2: Independent Random Variables The random variable X and Y are said to be independent if for Definition: any two sets of real numbers A and B P {X ∈ A, Y ∈ B} = P {X ∈ A} × P {Y ∈ B} equivalently P {X ≤ x, Y ≤ y} = P {X ≤ x} × P {Y ≤ y} or, equivalently, p(x, y) = pX (x)pY (y) f (x, y) = fX (x)fY (y) for all x, y when X and Y are discrete for all x, y when X and Y are continous 272 Cornell University, BTRY 4080 / STSCI 4080 Example 2b: Fall 2009 Instructor: Ping Li Suppose the number of people that enter a post office on a given day is a Poisson random variable with parameter λ. Each person that enters the post office is male with probability p and a female with probability 1 − p. Let X and Y , respectively, be the number of males and females that enter the post office. • (a): Find the joint probability mass function P {X = i, Y = j}. • (b): Find the marginal probability mass functions P {X = i}, P {Y = j}. • (c): Show that X and Y are independent. 273 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Key: Suppose we know there are N = n people that enter a post office. Then the number of males is a binomial with parameters (n, p). [X|N = n] ∼ Binomial (n, p) [N = X + Y ] ∼ P oi(λ). Therefore, P {X = i, Y =} = ∞ X n=0 P {X = i, Y = j|N = n}P {N = n} =P {X = i, Y = j|N = i + j}P {N = i + j} because P {X = i, Y = j|N 6= i + j} = 0. 274 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Solution (a) Conditional on X Instructor: Ping Li + Y = i + j: P {X = i, Y = j} =P {X = i, Y = j|X + Y = i + j}P {X + Y = i + j} i+j i+j i λ = p (1 − p)j e−λ (i + j)! i pi (1 − p)j e−λ λi λj = i!j! pi (1 − p)j e−λ(p+(1−p)) λi λj = i!j! i j (λp) (λ(1 − p)) = e−λp e−λ(1−p) i! j! 275 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Solution (b) P {X = i} = ∞ X j=0 P {X = i, Y = j} ∞ X i j (λp) (λ(1 − p)) = e−λp e−λ(1−p) i! j! j=0 ∞ i X j (λp) (λ(1 − p)) = e−λp e−λ(1−p) i! j! j=0 i −λp (λp) = e i! 276 Cornell University, BTRY 4080 / STSCI 4080 Example 2c: Fall 2009 Instructor: Ping Li A man and a woman decide to meet at a certain location. Each person independently arrives at a time uniformly distributed between 12 noon and 1 pm. Q: Find the probability that the first to arrive has to wait longer than 10 minutes. Solution: Let X = number of minutes by which one person arrives later than 12 noon. Let Y = number of minutes by which the other person arrives later than 12 noon. 
X and Y are independent. The desired probability = P {|X − Y | ≥ 10}. 277 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li P {|X − Y | ≥ 10} = Z Z f (x, y)dxdy Z Z fX (x)fY (y)dxdy Z Z 1 1 dxdy 60 60 |x−y|>10 = |x−y|>10 =2 y<x−10 =2 Z 60 10 25 = . 36 Z 0 x−10 1 dydx 3600 278 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li A Necessary and Suffice Condition for Independence Proposition 2.1 The continuous (discrete) random variables X and Y are independent if and only if their joint probability density (mass) function can be expressed as fX,Y (x, y) = h(x)g(y), −∞ < x, y < ∞ Proof: By definition, X and Y are independent if fX,Y (x, y) = fX (x)fY (y). 279 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Example 2f: (a): Two random variables X and Y with joint density given by f (x, y) = 6e−2x e−3y , 0 < x, y < ∞ are independent. (b): Two random variables X and Y with joint density given by f (x, y) = 24xy are NOT independent. (Why?) 0 < x, y < 1, 0 < x + y < 1 280 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Independence of More Than Two Random Variables Definition: The n random variables, X1 , X2 , ..., Xn , are said to be independent if, for all sets of real numbers, A1 , A2 , ..., An P {X1 ∈ A1 , X2 ∈ A2 , ..., Xn ∈ An } = n Y i=1 P {Xi ∈ Ai } equivalently, P {X1 ≤ x1 , X2 ≤ x2 , ..., Xn ≤ xn } = for all x1 , x2 , ..., xn . n Y i=1 P {Xi ≤ xi }, 281 Cornell University, BTRY 4080 / STSCI 4080 Example 2h: Fall 2009 Instructor: Ping Li Let X , Y , Z be independent and uniformly distributed over (0, 1). Compute P {X ≥ Y Z}. Solution: P {X ≥ Y Z} = Z Z Z f (x, y, z)dxdydz x≥yz = Z 1 0 = Z 1 0 = Z 0 = 3 4 1 Z Z 1 0 1 0 Z 1 dxdydz yz (1 − yz) dydz z 1− dz 2 282 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Chapter 6.3: Sum of Independent Random Variables Suppose that X and Y are independent. Let Z =X +Y. Z Z FZ (z) =P {X + Y ≤ z} = fX (x)fY (y)dxdy x+y≤z = = = Z ∞ −∞ Z ∞ −∞ ∞ Z −∞ Z z−y fX (x)fY (y)dxdy −∞ fY (y) Z z−y fX (x)dxdy −∞ fY (y)FX (z − y)dy ∂ fZ (z) = FZ (z) = ∂z Z ∞ −∞ fY (y)fX (z − y)dy 283 Cornell University, BTRY 4080 / STSCI 4080 The discrete case: then Z Fall 2009 Instructor: Ping Li X and Y are two independent discrete random variables, = X + Y has probability mass function: pZ (z) = X j pY (j)pX (z − j) ——————– Example: Suppose X, Y ∼ Bernouli(p). Then Z = X + Y ∼ Binomial(2, p) 284 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li ∼ Bernouli(p) are independent. Then Z = X + Y ∼ Binomial(2, p). Suppose X, Y pZ (0) = X 2 pY (j)pX (0 − j) =pY (0)pX (0) = (1 − p)2 = (1 − p)2 0 X pY (j)pX (1 − j) =pY (0)pX (1 − 0) + pY (1)pX (1 − 1) X 2 2 pY (j)pX (2 − j) =pY (1)pX (2 − 1) = p2 = p 2 j pZ (1) = j pZ (2) = j 2 p(1 − p) =(1 − p)p + p(1 − p) = 1 285 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li ∼ Binomial(n, p) and Y ∼ Binomial(m, p) are independent. Then Z = X + Y ∼ Binomial(m + n, p). Suppose X pZ (z) = X j Note pY (j)pX (z − j). 0 ≤ j ≤ m and z − n ≤ j ≤ z . We can assume m ≤ n. There are three cases to consider: 1. Suppose 0 ≤ z ≤ m, then 0 ≤ j ≤ z 2. Suppose m 3. Suppose n < z ≤ n, then 0 ≤ j ≤ m. < z ≤ m + n, then z − n ≤ j ≤ m. 
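Before working through these cases analytically, here is a quick numerical sanity check of the claim (a sketch, not from the original slides; n, m, p are arbitrary, and binopdf is the same routine used in the earlier Matlab snippet):

Matlab code (sketch):
n = 7; m = 4; p = 0.3;                 % arbitrary parameters
pX = binopdf(0:n, n, p);               % mass function of X ~ Binomial(n, p)
pY = binopdf(0:m, m, p);               % mass function of Y ~ Binomial(m, p)
pZ = conv(pX, pY);                     % discrete convolution: sum_j pY(j) pX(z - j)
pZ_direct = binopdf(0:(n+m), n+m, p);  % Binomial(n + m, p) mass function
disp(max(abs(pZ - pZ_direct)))         % essentially zero: the two agree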
286 Cornell University, BTRY 4080 / STSCI 4080 Suppose 0 Fall 2009 Instructor: Ping Li ≤ z ≤ m, then 0 ≤ j ≤ z pZ (z) = X j pY (j)pX (z − j) = z X m z X j=0 pY (j)pX (z − j) n pz−j (1 − p)n−z+j j z−j j=0 z X m n =pz (1 − p)m+n−z j z−j j=0 m+n z = p (1 − p)m+n−z z = pj (1 − p)m−j because we have seen (e.g., using a combinatorial argument) z X m n m+n = . j z − j z j=0 287 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Sum of Two Independent Uniform Random Variables X ∼ U (0, 1) and Y ∼ U (0, 1) are independent. Let Z = X + Y . fZ (z) = Must have Z ∞ −∞ fY (y)fX (z − y)dy = Z 0 1 fX (z − y)dy 0 ≤ y ≤ 1 and 0 ≤ z − y ≤ 1, i.e., z − 1 ≤ y ≤ z . Two cases to consider 1. If 0 ≤ z ≤ 1, then 0 ≤ y ≤ z . 2. If 1 < z ≤ 2, then z − 1 < y ≤ 1. 288 Cornell University, BTRY 4080 / STSCI 4080 If 0 Fall 2009 Instructor: Ping Li ≤ z ≤ 1, then 0 ≤ y ≤ z . Z 1 Z fZ (z) = fX (z − y)dy = 0 If 1 < z ≤ 2, then z − 1 ≤ y ≤ 1. Z 1 Z fZ (z) = fX (z − y)dy = 0 fZ (z) = z 2−z 0 z dy = z. 0 1 z−1 dy = 2 − z. 0<z≤1 1<z<2 otherwise 289 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Sum of Two Gamma Random Variables X ∼ Gamma(t, λ) and Y ∼ Gamma(s, λ) are independent. Let Z =X +Y. Show that Z ∼ Gamma(s + t, λ). For an intuition, see notes page 246. Instructor: Ping Li 290 Cornell University, BTRY 4080 / STSCI 4080 fZ (z) = Z ∞ Z z −∞ = 0 Fall 2009 Instructor: Ping Li fY (y)fX (z − y)dy = λe−λy (λy)s−1 Γ(s) e−λz λ2 λs+t−2 = Γ(s)Γ(t) λe−λz λs+t−1 = Γ(s)Γ(t) Z Z Z z 0 fY (y)fX (z − y)dy λe−λ(z−y) (λ(z − y))t−1 dy Γ(t) z 0 −λy s−1 λy t−1 e y e (z − y) dy z 0 y s−1 (z − y)t−1 dy 291 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Z λe−λz λs+t−1 z s−1 fZ (z) = y (z − y)t−1 dy Γ(s)Γ(t) 0 −λz Z z λe (λz)s+t−1 Γ(s + t) s−1 t−1 y (z − y) dy = Γ(s + t) z s+t−1 Γ(s)Γ(t) 0 −λz Z z s−1 s+t−1 t−1 λe (λz) Γ(s + t) y y y = 1− d Γ(s + t) Γ(s)Γ(t) 0 z z z −λz Z λe (λz)s+t−1 Γ(s + t) 1 s−1 t−1 = θ (1 − θ) dθ Γ(s + t) Γ(s)Γ(t) 0 Z 1 Γ(s + t) θ s−1 (1 − θ)t−1 dθ =fW (z) Γ(s)Γ(t) 0 W ∼ Gamma(s + t, λ). Because fZ (z) is also a density function, we must have Γ(s + t) Γ(s)Γ(t) Z 1 0 θ s−1 (1 − θ)t−1 dθ = 1 292 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Recall the definition of Beta function in Section 5.6.4 B(s, t) = Z 1 0 θ s−1 (1 − θ) t−1 dθ and its property (Eq. (6.3) in Section 5.6.4) B(s, t) = Γ(s)Γ(t) . Γ(s + t) Thus, Γ(s + t) Γ(s)Γ(t) Z 0 1 θ s−1 (1 − θ)t−1 dθ = Γ(s + t) Γ(s)Γ(t) =1 Γ(s)Γ(t) Γ(s + t) 293 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Generalization to more than two gamma random variables If X1 , X2 , ..., Xn are independent gamma random variables with respective parameters (s1 , λ), (s2 , λ), ..., (sn , λ), then X1 + X2 + ... + Xn = n X i=1 Xi ∼ Gamma n X i=1 ! si , λ 294 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Sum of Independent Normal Random Variables = 1 to n, are independent random variables that are normally distributed with respective parameters ui and σi2 , i = 1 to n. Then Proposition 3.2 If Xi , i X1 + X2 + ... + Xn = n X i=1 Xi ∼ N n X µi , i=1 i=1 ———————— It suffices to show X 1 + X 2 ∼ N µ1 + µ2 , σ12 + σ22 n X σi2 ! 295 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li X ∼ N (µx , σx2 ), Y ∼ N (µy , σy2 ). Let Z = X + Y . 
Z ∞ fX (z − y)fY (y)dy −∞ # " Z ∞ (y−µy )2 (z−y−µx )2 − 2σ2 1 1 − 2 2σx y √ √ = e e dy 2πσx 2πσy −∞ 2 Z ∞ 2 x ) − (y−µy ) − (z−y−µ 1 1 2 2 2σx 2σy √ e dy =√ 2π −∞ 2πσx σy 2 +(y−µ )2 σ 2 Z ∞ (z−y−µx )2 σy y x − 1 1 2 2 2σx σy √ =√ e dy 2π −∞ 2πσx σy 2 +σ 2 )−2y(µ σ 2 −(µ −z)σ 2 )+(µ −z)2 σ 2 +µ2 σ 2 Z ∞ y 2 (σx y x x x y y y y x − 1 1 2 σ2 2σx y √ =√ e dy 2π −∞ 2πσx σy fZ (z) = 296 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li = σx σy , B = σx2 σy2 , C = σx2 + σy2 , D = −2(µy σx2 − (µx − z)σy2 ), E = (µx − z)2 σy2 + µ2y σx2 . Let A fZ (z) = Z ∞ −∞ 1 =√ 2π = Z ∞ −∞ Z ∞ −∞ fX (z − y)fY (y)dy 2 +σ 2 )−2y(µ σ 2 −(µ −z)σ 2 )+(µ −z)2 σ 2 +µ2 σ 2 y 2 (σx y x x x y y y y x − 2 2 2σx σy 1 √ e 2πσx σy 2 1 − Cy +Dy+E 2B √ dy = e 2πA (y−µ) − 1 2σ 2 because −∞ √ e 2πσ R∞ 2 r B D2 −4EC e 8BC 2 CA dy = 1, for any µ, σ 2 . dy 297 Cornell University, BTRY 4080 / STSCI 4080 Z = Z ∞ −∞ ∞ −∞ Fall 2009 2 1 − Cy +Dy+E 2B √ e dy 2πA 1 √ e 2πA ∞ − 2 − D 2 +E C 4C − B 2 C =e y 2 + D y+ E C C 2B C dy 2 2 y+ D ) − D 2 + E ( 2C C 4C − B 2 C dy e 1 √ = 2πA −∞ 2 − D 2 +E Z − 4C B C 2 C =e Z Instructor: Ping Li ∞ −∞ r √ 2π B = CA2 1 q B C r √AB 2 y+ D ) ( 2C − 2B C e dy C B D2 −4EC e 8BC 2 CA 298 Cornell University, BTRY 4080 / STSCI 4080 1 fZ (z) = √ 2π r Fall 2009 Instructor: Ping Li B D2 −4EC e 8BC 2 CA 1 q e =√ 2π σx2 + σy2 2 −(µ −z)σ 2 )2 −((µ −z)2 σ 2 +µ2 σ 2 )(σ 2 +σ 2 ) (µy σx x x y y y x x y 2 2 2 2 2σx σy (σx +σy ) ( x y) − 2(σ2 +σ2 ) 1 x y =√ q e 2π σx2 + σy2 Therefore, Z z−µ −µ = X + Y ∼ N µx + µy , 2 σx2 + σy2 . 299 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Sum of Independent Poisson Random Variables If X1 , X2 , ..., Xn are independent Poisson random variables with respective parameters λi , i = 1 to n, then X1 + X2 + ... + Xn = n X i=1 Xi ∼ P oisson n X i=1 λi ! 300 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Sum of Independent Binomial Random Variables If X1 , X2 , ..., Xn are independent binomial random variables with respective parameters (mi , p), i = 1 to n, then X1 + X2 + ... + Xn = n X i=1 Xi ∼ Binomial n X i=1 mi , p ! 301 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Sections 6.4, 6.5: Conditional Distributions Recall P (E|F ) = P (EF ) P (F ) . If X and Y are discrete random variables, then p(x, y) pX|Y (x|y) = pY (y) If X and Y are continuous random variables, then f (x, y) fX|Y (x|y) = fY (y) Instructor: Ping Li 302 Cornell University, BTRY 4080 / STSCI 4080 Example 4b: Fall 2009 Instructor: Ping Li Suppose X and Y are independent Poisson random variables with respective parameters λx and λy . Q: Calculate the conditional distribution of X , given that X —————– What is the distribution of X + Y ?? + Y = n. 303 Cornell University, BTRY 4080 / STSCI 4080 Example 4b: Fall 2009 Instructor: Ping Li Suppose X and Y are independent Poisson random variables with respective parameters λx and λy . Q: Calculate the conditional distribution of X , given that X + Y = n. Solution: P {X = k, X + Y = n} P {X = k|X + Y = n} = P {X + Y = n} P {X = k, Y = n − k} P {X = k}P {Y = n − k} = = P {X + Y = n} P {X + Y = n} −λx k " −λy n−k # λ e e λx n! y = k! (n − k)! e−λx −λy (λx + λy )n k n−k n λx λx = 1− k λx + λy λx + λy λx Therefore, X ∼ Binomial n, p = λ +λ . x y 304 Cornell University, BTRY 4080 / STSCI 4080 Example 5b: Fall 2009 The joint density of X and Y is given by f (x, y) = Q: Find P {X Instructor: Ping Li e−x/y e−y y 0 > 1|Y = y}. 
————————– P {X > 1|Y = y} = R∞ 1 fX|Y (x|y)dx 0 < x, y < ∞ otherwise 305 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Solution: f (x, y) fX|Y (x|y) = = fY (y) P {X > 1|Y = y} = Z 1 e−x/y e−y y R ∞ e−x/y e−y dx y 0 ∞ e−x/y 1 = R ∞ −x/y = e−x/y y e dx 0 fX|Y (x|y)dx = Z ∞ 1 1 −x/y e dx = e−1/y y 306 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li The Bivariate Normal Distribution The random variables X and Y have a bivariate normal distribution if, for constants, ux , uy , σx , σy , −1 all −∞ < ρ < 1, their joint density function is given, for < x, y < ∞, by 1 − 2(1−ρ 1 2) p f (x, y) = e 2πσx σy 1 − ρ2 If X and Y are independent, then ρ 1 − 12 f (x, y) = e 2πσx (y−µy )2 + 2 σy (x−µx )(y−µy ) −2ρ σx σy = 0, and (x−µx )2 2 σx (x−µ )2 − 2σ2x x 1 √ = e 2πσx (x−µx )2 2 σx (y−µy )2 + 2 σy (y−µy )2 − 2σ2 y 1 √ × e 2πσy 307 Cornell University, BTRY 4080 / STSCI 4080 fX (x) = = = = = Z Z Z Z ∞ −∞ ∞ −∞ ∞ −∞ ∞ −∞ Z ∞ Fall 2009 Instructor: Ping Li f (x, y)dy −∞ 1 − 2(1−ρ 1 2) p e 2πσx σy 1 − ρ2 1 − 2(1−ρ 1 2) p e 2πσx 1 − ρ2 (x−µx ) 1 − p e 2 2πσx 1 − ρ (x−µx )2 2 σx (x−µx )2 2 σx (y−µy )2 + 2 σy +t2 −2ρ r (x−µx )(y−µy ) −2ρ σx σy (x−µx )t σx 2 +t2 σ 2 −2ρ(x−µ )tσ x x x 2 2(1−ρ2 )σx 2 1 − Ct +Dt+E 2B √ dt = e 2πA √ p A = 2π 1 − ρ2 σx , D = −2ρ(x − µx )σx , dy dt dt x) B D2 −4EC 1 − (x−µ 2 2σx e 8BC = √ e 2 CA 2πσx B = (1 − ρ2 )σx2 , E = (x − µx )2 C = σx2 , 2 308 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Therefore, X ∼ N (µx , σx2 ), Y ∼ N (µy , σy2 ) The conditional density fX|Y (x|y) = A useful trick: f (x, y) fY (y) Only worry about the terms that are functions of x; the rest terms are just normalizing constants. 309 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Any (reasonable) non-negative function g(x) can be essentially viewed as a probability density function because one can always normalize it. g(x) = Cg(x) ∝ g(x), g(x)dx −∞ f (x) = R ∞ is a probability density function. where C R∞ (One can verify −∞ f (x) C is just a (normalizing) constant. When y is treated as a constant, C(y) is still a constant. X|Y = y means y is a constant. 1 g(x)dx −∞ = R∞ = 1). 310 Cornell University, BTRY 4080 / STSCI 4080 For example, Similarly, g(x) − g(x) = e Instructor: Ping Li (x−1)2 2 2 −2x 2 −x =e Fall 2009 , is essentially the density function of N (1, 1). 2 −2xy −x g(x) = e 2 are both (essentially) the density functions of normals. ————– 1 = C Z ∞ 2 −2xy −x e 2 √ y2 dx = 2πe 2 −∞ Therefore, 1 − y2 1 − (x−y)2 2 f (x) = Cg(x) = √ e 2 × g(x) = √ e 2π 2π is the density function of N (y, 1), where y is merely a constant. 311 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li (X , Y ) is a bivariate normal. f (x, y) ∝ f (x, y) fX|Y (x|y) = fY (y) 1 − 2(1−ρ 2) ∝e (x−µx )2 2 σx (x−µx )(y−µy ) −2ρ σx σy σ (x−µx )2 −2ρ(x−µx )(y−µy ) x σy − 2 2 2(1−ρ )σx ∝e − ∝e σ x−µx −ρ(y−µy ) x σy 2 2 2(1−ρ )σx 2 Therefore, X|Y ∼ N Y |X ∼ N σx , (1 − ρ2 )σx2 σy σy µy + ρ(x − µx ) , (1 − ρ2 )σy2 σx µx + ρ(y − µy ) 312 Cornell University, BTRY 4080 / STSCI 4080 Example 5d: Fall 2009 Instructor: Ping Li Consider n + m trials having a common probability of success. Suppose, however, that this success probability is not fixed in advance but is chosen from U (0, 1). Q: What is the conditional distribution of this success probability given that the n + m trials result in n successes? Solution: Let X = trial success probability. X ∼ U (0, 1). Let N = total number of successes. 
The desired probability: N |X = x ∼ Binomial(n + m, x). X|N = n. 313 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Solution: Let X = trial success probability. X ∼ U (0, 1). Let N = total number of successes. N |X = x ∼ Binomial(n + m, x). P {N = n|X = x}fX (x) P {N = n} n+m n m x (1 − x) = n P {N = n} ∝ xn (1 − x)m fX|N (x|n) = Therefore, X|N Here X ∼ Beta(n + 1, m + 1). ∼ U (0, 1) is the prior distribution. 314 Cornell University, BTRY 4080 / STSCI 4080 If X|N Fall 2009 Instructor: Ping Li ∼ Beta(n + 1, m + 1), then n+1 E(X|N ) = (n + 1) + (m + 1) Suppose we do not have a prior knowledge of the success probability X . We observe n successes out of n + m trials. The most intuitive guess (estimate) of X should be X̂ = n n+m Assuming a uniform prior on X leads to the add-one smoothing. 315 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Section 6.6: Order Statistics Let X1 , X2 , ..., Xn be n independent and identically distributed, continuous random variables having a common density f and distribution function F . Define X(1) = smallest of X1 , X2 , ..., Xn X(2) = second smallest of X1 , X2 , ..., Xn ... X(j) = j th smallest of X1 , X2 , ..., Xn ... X(n) = largest of X1 , X2 , ..., Xn ≤ X(2) ≤ ... ≤ X(n) are known as the order statistics corresponding to the random variables X1 , X2 , ..., Xn . The ordered values X(1) 316 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li For example, the CDF of the largest (i.e., X(n) ) should be FX(n) (x) = P (X(n) ≤ x) = P (X1 ≤ x, X2 ≤ x, ..., Xn ≤ x) because, if the largest is smaller than x, then everyone is smaller than x. Then, by independence FX(n) (x) =P (X1 ≤ x, X2 ≤ x, ..., Xn ≤ x) =P (X1 ≤ x)P (X2 ≤ x)...P (Xn ≤ x) =F (x)n , and the density function is simply ∂F n (x) fX(n) (x) = = nF n−1 (x)f (x) ∂x 317 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li The CDF of the smallest (i.e., X(1) ) should be FX(1) (x) = P (X(1) ≤ x) =1 − P (X(1) > x) =1 − P (X1 > x, X2 > x, ..., Xn > x) because, if the smallest is larger than x, then everyone is larger than x. Then, by independence FX(1) (x) =1 − P (X1 > x)P (X2 > x)...P (Xn > x) =1 − [1 − F (x)]n and the density function is simply n ∂1 − [1 − F (x)] n−1 fX(1) (x) = = n [1 − F (x)] f (x) ∂x 318 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Joint density function of order statistics for x1 ≤ x2 ≤ ... ≤ xn , fX(1) ,X(2) ,...,X(n) (x1 , x2 , ..., xn ) = n!f (x1 )f (x2 )...f (xn ), Intuitively (and heuristically) P {X(1) = x1 , X(2) = x2 } =P {{X1 = x1 , X2 = x2 } OR {X2 = x1 , X1 = x2 }} =P {X1 = x1 , X2 = x2 } + P {X2 = x1 , X1 = x2 } =f (x1 )f (x2 ) + f (x1 )f (x2 ) =2!f (x1 )f (x2 ) Warning, this is a heuristic argument, not a rigorous proof. Please read the book. 319 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Density function of the k th order statistics, X(k) : Instructor: Ping Li fX(k) (x). • Select 1 from the n values X1 , X2 , ..., Xn . Let it be equal to x. • Select k − 1 from the remaining n − 1 values. Let them be smaller than x. • Let the rest be larger than x. • Total number of choices is ??. • For a given choice, the probability should be ?? 320 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li • Select 1 from the n values X1 , X2 , ..., Xn . Let it be equal to x. • Select k − 1 from the remaining n − 1 values. Let them be smaller than x. • Let the rest be larger than x. • n Total number of choices is 1,k−1,n−k . 
• For a given choice, the probability should be f (x) [F (x)] k−1 n−k [1 − F (x)] . Therefore, fX(k) (x) = n k−1 n−k f (x) [F (x)] [1 − F (x)] 1, k − 1, n − k 321 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Cumulative function of the k th order statistics • Via integrating the density function: = FX(k) (y) = Z y −∞ Z y fX(k) (x) −∞ n k−1 n−k f (x) [F (x)] [1 − F (x)] dx 1, k − 1, n − k Z Z n = 1, k − 1, n − k n = 1, k − 1, n − k y f (x) [F (x)] −∞ F (y) 0 k−1 n−k [1 − F (x)] [z]k−1 [1 − z]n−k dz dx 322 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li • Via a direct argument. FX(k) (y) =P {X(k) ≤ (y)} = P {≥ k of the Xi ’s are ≤ y} FX(k) (y) =P {X(k) ≤ y} =P {≥ k of the Xi ’s are ≤ y} = n X j=k = P {j of the Xi ’s are ≤ y} n X n j=k j j [F (y)] [1 − F (y)] n−j 323 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li • A more intuitive solution by dividing y into non-overlapping segments. [ {X(k) ≤ y} ={y ∈ [X(n) , ∞)} {y ∈ [X(n−1) , X(n) )} [ [ ... {y ∈ [X(k) , X(k+1) )} n X n j n−j = [F (y)] [1 − F (y)] j j=k n P {y ∈ [X(n) , ∞)} = [F (y)]n n P {y ∈ [X(n−1) , X(n) )} = n [F (y)]n−1 [1 − F (y)] n−1 n P {y ∈ [X(k) , X(k+1) )} = [F (y)]k [1 − F (y)]n−k k 324 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li An interesting identity by letting X n X n j=k j y j [1 − y] n = 1, k − 1, n − k ∼ U (0, 1): n−j Z y 0 xk−1 [1 − x] n−k dx, 0 < y < 1. 325 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Joint Probability Distribution of Functions of Random Variables X1 and X2 are jointly continuous random variables with probability density function fX1 ,X2 . Y1 = g1 (X1 , X2 ), Y2 = g2 (X1 , X2 ). The goal: Determine the density function fY1 ,Y2 . 326 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li The formula to be remembered fY1 ,Y2 (y1 , y2 ) = fX1 ,X2 (x1 , x2 )|J(x1 , x2 )|−1 where, the inverse functions: x1 = h1 (y1 , y2 ), x2 = h2 (y1 , y2 ) and the Jocobian: J(x1 , x2 ) = Two conditions ∂g1 ∂x1 ∂g2 ∂x1 ∂g1 ∂x2 ∂g2 ∂x2 = ∂g1 ∂g2 − ∂g2 ∂g1 ∂x1 ∂x2 ∂x1 ∂x2 • y1 = g1 (x1 , x2 ) and y2 = g2 (x1 , x2 ) can be uniquely solved: x1 = h1 (y1 , y2 ), and x2 = h2 (y1 , y2 ) • The Jocobian J(x1 , x2 ) 6= 0 for all points (x1 , x2 ). 327 Cornell University, BTRY 4080 / STSCI 4080 Example 7a: Fall 2009 Instructor: Ping Li Let X1 and X2 be jointly continuous random variables with probability density fX1 ,X2 . Let Y1 = X1 + X2 and Y2 = X1 − X2 . Q: Find fY1 ,Y2 in terms of fX1 ,X2 . 328 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Solution: y1 = g1 (x1 , x2 ) = x1 + x2 , y2 = g2 (x1 , x2 ) = x1 − x2 , =⇒ y1 −y2 2 , x = h (y , y ) = x1 = h1 (y1 , y2 ) = y1 +y 2 2 1 2 2 2 , Therefore, 1 1 J(x1 , x2 ) = 1 −1 = −2; fY1 ,Y2 (y1 , y2 ) =fX1 ,X2 (x1 , x2 )|J(x1 , x2 )|−1 1 y1 + y2 y1 − y2 = fX1 ,X2 , 2 2 2 329 Cornell University, BTRY 4080 / STSCI 4080 Example 7b: Fall 2009 Instructor: Ping Li The joint density in the polar coordinates. x = h1 (r, θ) = r cos θ , y = h2 (r, θ) = r sin θ , i.e., p r = g1 (x, y) = x2 + y 2 , θ = g2 (x, y) = tan−1 xy . Q (a): Express fR,Θ (r, θ) in terms of fX,Y (x, y). Q (b): What if X and Y are independent standard normal random variables? 
330 Cornell University, BTRY 4080 / STSCI 4080 Solution: Fall 2009 First, compute the Jocobian J(x, y): ∂g1 x =p = cos θ 2 2 ∂x x +y ∂g1 y =p = sin θ 2 2 ∂y x +y ∂g2 1 −y −y − sin θ = = = ∂x 1 + (y/x)2 x2 x2 + y 2 r ∂g2 1 1 x cos θ = = 2 = 2 2 ∂y 1 + (y/x) x x +y r ∂g1 ∂g2 ∂g1 ∂g2 cos2 θ sin2 θ 1 J= − = + = ∂x ∂y ∂y ∂x r r r Instructor: Ping Li 331 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li fR,Θ (r, θ) = fX,Y (x, y)|J|−1 = rfX,Y (r cos θ, r sin θ) where 0 < r < ∞, 0 < θ < 2π . If X and Y are independent standard normals, then 2 r − x2 +y2 re−r /2 2 fR,Θ (r, θ) = rfX,Y (r cos θ, r sin θ) = e = 2π 2π R and Θ are independent. Θ ∼ U (0, 2π). 332 Cornell University, BTRY 4080 / STSCI 4080 Example 7d: Fall 2009 Instructor: Ping Li X1 , X2 , and X3 , are independent standard normal random variables. Let Y1 = X1 + X2 + X3 Y2 = X1 − X2 Y3 = X1 − X3 Q: Compute the joint density function fY1 ,Y2 ,Y3 (y1 , y2 , y3 ). 333 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Solution: Compute the Jocobian 1 1 1 J = 1 −1 0 1 0 −1 Solve for X1 , X2 , and X3 Y1 + Y2 + Y3 X1 = , 3 =3 Y1 − 2Y2 + Y3 X2 = , 3 Y1 + Y2 − 2Y3 X3 = 3 fY1 ,Y2 ,Y3 (y1 , y2 , y3 ) = fX1 ,X2 ,X3 (x1 , x2 , x3 )|J|−1 x2 +x2 +x2 − 12 1 1 − 1 22 3 = = e e 3/2 3/2 3(2π) 3(2π) 2 y1 3 2y 2 + 32 2y 2 + 33 2y y − 23 3 334 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li More about Determinant a d g b c e f = aei + bf g + cdh − ceg − bdi − af h h i x1 0 0 0 ... 0 × x2 0 0 ... 0 × x3 0 ... 0 × × ... 0 × × x4 × × × × ... 0 × × × × × xn = x1 x2 x3 ...xn 335 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Chapter 7: Properties of Expectation If X is a discrete random variable with mass function p(x), then E(X) = X xp(x) x If X is a continuous random variable with density f (x), then E(X) = Z ∞ −∞ If a ≤ X ≤ b, then a ≤ E(X) ≤ b. xf (x)dx 336 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Proposition 2.1 If X and Y have a joint probability mass function p(x, y), then E[g(x, y)] = XX y g(x, y)p(x, y) x If X and Y have a joint probability density function f (x, y), then E[g(x, y)] = What if g(X, Y ) =X +Y? Z ∞ −∞ Z ∞ −∞ g(x, y)f (x, y)dxdy 337 Cornell University, BTRY 4080 / STSCI 4080 Example 2a: Fall 2009 Instructor: Ping Li An accident occurs at a point X that is uniformly distributed on a road of length L. At the time of the accident an ambulance is at a location Y that is also uniformly distributed on the road. Assuming that X and Y are independent, find the expected distance between the ambulance and the point of the accident. Solution: X ∼ U (0, L), Y ∼ U (0, L). E(|X − Y |) = Z Z To find E(|X − Y |). |x − y|f (x, y)dxdy 338 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Solution: X ∼ U (0, L), Y ∼ U (0, L). To find E(|X − Y |). Z Z E(|X − Y |) = |x − y|f (x, y)dxdy 1 = 2 L 1 = 2 L Z Z 1 = 2 L Z 1 = 2 L L = 3 Z L 0 0 LZ L 0 L 0 Z L |x − y|dxdy 0 x L 1 (x − y) dydx + 2 (y − x) dydx L 0 x 0 L Z L 2 2 x y y 1 xy − dx + 2 − xy dx 2 0 L 0 2 x Z L 2 x2 1 L x2 dx + 2 − xL + dx 2 L 0 2 2 Z L Z 339 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Let X and Y be continuous random variables with density f (x, y). g(X, Y ) = X + Y . 
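Example 2a's answer of L/3 is easy to confirm by simulation before carrying out the E(X + Y) computation. A small sketch (not part of the slides; the road length L = 4 and the sample size are arbitrary choices):

# Monte Carlo confirmation of Example 2a: for independent X, Y ~ U(0, L),
# E|X - Y| = L/3.
import numpy as np

rng = np.random.default_rng(0)
L, reps = 4.0, 1_000_000
x = rng.uniform(0, L, reps)
y = rng.uniform(0, L, reps)

print(np.abs(x - y).mean(), L / 3)   # both approximately 1.333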
E[g(x, y)] = E(X + Y ) Z ∞Z ∞ = (x + y)f (x, y)dxdy = Z = Z −∞ ∞ Z −∞ ∞ Z ∞ xf (x, y)dxdy + −∞ −∞ −∞ Z ∞ Z ∞ Z x f (x, y)dy dx + = −∞ ∞ −∞ −∞ xfX (x)dx + =E(X) + E(Y ) Z ∞ −∞ Z ∞ yf (x, y)dxdy −∞ Z ∞ ∞ y f (x, y)dx dy −∞ yfY (y)dy −∞ 340 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Linearity of Expectations E(X + Y ) = E(X) + E(Y ) E(X + Y + Z) = E((X + Y ) + Z) =E(X + Y ) + E(Z) = E(X) + E(Y ) + E(Z) E(X1 + X2 + ... + Xn ) = E(X1 ) + E(X2 ) + ... + E(Xn ) Linearity is not affected by correlations among Xi ’s. No assumption of independence is needed. 341 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li The Sample Mean Let X1 , X2 , ..., Xn be identically distributed random variables having expected value µ. The quantity X̄ , defined by n 1X X̄ = Xi n i=1 is called the sample mean. 342 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li " n X # 1 Xi n i=1 " n # X 1 = E Xi n i=1 E(X̄) =E n 1X = E[Xi ] n i=1 n 1X = µ n i=1 1 = nµ n =µ When µ is unknown, an unbiased estimator of µ is X̄ . 343 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Expectation of a Binomial Random Variable X ∼ Binomial(n, p) X = Z1 + Z2 + ... + Zn where Zi , i = 1 to n, are independent Bernoulli with probability of success p. Because E(Zi ) = p, E(X) = E n X i=1 Zi ! = n X i=1 E(Zi ) = np 344 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Expectation of a Hypergeometric Random Variable Suppose n balls are randomly selected without replacement, from an urn containing N balls of which m are white. Denote X ∼ HG(N, m, n). Q: Find E(X). Solution 1: Let X = X1 + X2 + ... + Xm , where 1 if the ith white ball is selected Xi = 0 otherwise Note that the Xi ’s are not independent; but we only need the expectation. 345 Cornell University, BTRY 4080 / STSCI 4080 Solution 1: Therefore, Fall 2009 Let X = X1 + X2 + ... + Xm , where 1 if the ith white ball is selected Xi = 0 otherwise E(Xi ) =P {Xi = 1} = P {ith white ball is selected} 1 N −1 n 1 n−1 = = N N n mn E(X) = . N Instructor: Ping Li 346 Cornell University, BTRY 4080 / STSCI 4080 Solution 2: Fall 2009 Let X = Y1 + Y2 + ... + Yn , where 1 if the ith ball selected is white Yi = 0 otherwise E(Yi ) =P {Yi = 1} = P {ith ball elected is white} m = N Therefore, mn E(X) = . N Instructor: Ping Li 347 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Expected Number of Matches A group of N people throw their hats into the center of a room. The hats are mixed up and each person randomly selects one. Q: Find the expected number of people that selects their own hat. 348 Cornell University, BTRY 4080 / STSCI 4080 Solution: Fall 2009 Instructor: Ping Li Let X denote the number of matches and let X = X1 + X2 + ... + XN , where 1 if the i th person selects his own hat Xi = 0 otherwise 1 E(Xi ) = P {Xi = 1} = N Thus E(X) = N X i=1 E(Xi ) = 1 N =1 N 349 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Expectations May Be Undefined A More Rigorous Definition: If X is a discrete random variable with mass function p(x), then E(X) = X xp(x), x provided X x |x|p(x) < ∞ If X is a continuous random variable with density f (x), then E(X) = Z ∞ −∞ xf (x)dx, provided Z ∞ −∞ |x|f (x) < ∞ 350 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Example (Cauchy Distribution): 1 1 f (x) = , π x2 + 1 E(X) = Z ∞ −∞ −∞ < x < ∞ x 1 dx = 0 ??? 
π x2 + 1 (Answer is NO) because Z ∞ −∞ ∞ |x| 1 dx = π x2 + 1 Z ∞ 0 1 1 2 = d[x + 1] 2+1 π x 0 ∞ 1 2 = log(x + 1)0 π 1 = log ∞ = ∞ π Z 2x 1 dx π x2 + 1 351 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Section 7.3: Moments of The Number of Events X= Pn i=1 Ii : 1 if A occurs i Ii = 0 otherwise number of these events Ai that occur. We know E(X) = n X i=1 E(Ii ) = n X P (Ai ) i=1 Suppose we are interested in the number of pairs of events that occur, eg. Ai Aj 352 Cornell University, BTRY 4080 / STSCI 4080 X= Pn X(X−1) 2 i=1 Ii : = ⇐⇒ P i<j Ii Ij : Therefore, Fall 2009 Instructor: Ping Li number of these events Ai that occur. X 2 : number of pairs of event that occur. number of pairs of event that occur, because 1 if both A and A occur i j Ii Ij = 0 otherwise E X(X − 1) 2 E X 2 = X P (Ai Aj ) i<j − E (X) = 2 X i<j P (Ai Aj ) 353 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Second Moments of a Binomial Random Variables n independent trials, with each trial being a success with probability p. Ai = the event that trial i is a success. If i 6= j , then P (Ai Aj ) = p2 . E X 2 n 2 − E (X) = 2 P (Ai Aj ) = 2 p = n(n − 1)p2 2 i<j X E(X 2 ) = n2 p2 − np2 + np 2 2 2 2 V ar(X) = n p − np + np − [np] = np(1 − p) 354 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Second Moments of a Hypergeometric Random Variables n balls are randomly selected without replacement from an urn containing N balls, of which m are white. Ai = event that the ith ball selected is white. X = number of white balls selected. We have known n X m E(X) = P (Ai ) = n . N i=1 355 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li mm−1 P (Ai Aj ) = P (Ai )P (Aj |Ai ) = , N N −1 E X 2 i 6= j n mm−1 − E (X) = 2 P (Ai Aj ) = 2 2 N N −1 i<j X m m − 1 nm + E(X ) = n(n − 1) N N −1 N 2 m nm(N − n)(N − m) m N − n V ar(X) = =n 1− N 2 (N − 1) N N N −1 One can view m N = p. Note that N −n N −1 ≈ 1 if N is much larger than n. 356 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Second Moments of the Matching Problem Let Ai = event that person i selects his/her own hat in the match problem, i =1 to N . 1 1 P (Ai Aj ) = P (Ai )P (Aj |Ai ) = , N N −1 i 6= j X = number of people who select their own hat. We have shown E(X) = 1. 1 E(X ) − E(X) = 2 P (Ai Aj ) = N (N − 1) =1 N (N − 1) i<j 2 X V ar(X) = E(X 2 ) − E 2 (X) = 1 357 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Section 7.4: Covariance and Correlations Proposition 4.1: If X and Y are independent, then E(XY ) = E(X)E(Y ). For any function h and g , E[h(X) g(Y )] = E[h(X)] E[g(Y )]. 
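Before the proof, a quick numerical illustration of Proposition 4.1 (not part of the slides; the distributions and functions are arbitrary choices): take independent X ~ N(0, 1) and Y ~ Exp(1), with h(X) = X² and g(Y) = e^{−Y}, so that E[h(X)] = 1 and E[g(Y)] = 1/2.

# Sanity check of E[h(X) g(Y)] = E[h(X)] E[g(Y)] for independent X and Y.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(2_000_000)
y = rng.exponential(1.0, 2_000_000)

print(np.mean(x**2 * np.exp(-y)))            # close to 0.5
print(np.mean(x**2) * np.mean(np.exp(-y)))   # close to 0.5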
Instructor: Ping Li 358 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Proof: E[h(X) g(Y )] = = = Z ∞ −∞ Z ∞ Z g(x)h(y)f (x, y)dxdy −∞ Z ∞ −∞ ∞ Z ∞ −∞ g(x)h(y)fX (x)fY (y)dxdy −∞ Z ∞ g(x)fX (x)dx h(y)fY (y)dy =E[h(X)] E[g(Y )] −∞ 359 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Covariance Definition: The covariance between X and Y , is defined by Cov(X, Y ) = E[(X − E[X])(Y − E[Y ])] Equivalently Cov(X, Y ) = E(XY ) − E(X)E(Y ) Instructor: Ping Li 360 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Proof: Cov(X, Y ) = E[(X − E[X])(Y − E[Y ])] = E[XY − E[X]Y − E[Y ]X + E[X]E[Y ]] = E[XY ] − E[E[X]Y ] − E[E[Y ]X] + E[E[X]E[Y ]] = E[XY ] − E[X]E[Y ] − E[Y ]E[X] + E[X]E[Y ] = E[XY ] − E[X]E[Y ] Note that E(X) is a constant, so is E(X)E(Y ) 361 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li X and Y are independent =⇒ Cov(X, Y ) = E(XY ) − E(X)E(Y ) = 0. However, that X and Y are uncorrelated Cov(X, Y ) does not imply that X and Y are independent = 0, 362 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Properties of Covariance: Proposition 4.2 • Cov(X, Y ) = Cov(Y, X) • Cov(X, X) = V ar(X) • Cov(aX, Y ) = aCov(X, Y ) Directly follow from the definition Cov(X, Y ) • Cov n X i=1 Xi , m X j=1 Yj = = E(XY ) − E(X)E(Y ). n X m X i=1 j=1 Cov(Xi , Yj ) 363 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Proof: Cov n X i=1 Xi , m X j=1 Yj =E =E = n X i=1 i=1 j=1 n X m X n X m X i=1 j=1 = j=1 n X m X i=1 j=1 = Xi m X n X m X i=1 j=1 Yj − E Xi Yj − E(Xi Yj ) − n X n X Xi i=1 E(Xi ) i=1 n X m X ! m X j=1 E(Yj ) j=1 E(Xi )E(Yj ) i=1 j=1 [E(Xi Yj ) − E(Xi )E(Yj )] Cov(Xi , Yj ) m X E Yj 364 Cornell University, BTRY 4080 / STSCI 4080 Cov n X Fall 2009 Xi , i=1 m X j=1 Instructor: Ping Li Yj = n X m X Cov(Xi , Yj ) i=1 j=1 If n = m and Xi = Yi , then ! n n n n X n X X X X V ar Xi =Cov Xi , Xj = Cov(Xi , Xj ) i=1 i=1 = n X j=1 i=1 j=1 Cov(Xi , Xi ) + i=1 = n X i=1 = n X i=1 X Cov(Xi , Xj ) i6=j V ar(Xi ) + X Cov(Xi , Xj ) i6=j V ar(Xi ) + 2 X i<j Cov(Xi , Xj ) 365 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li If Xi ’s are pairwise independent, then V ar n X i=1 Example 4b: Xi ! = n X V ar(Xi ) i=1 X ∼ Binomial(n, p) can be written as X = X1 + X2 + ... + Xn , where Xi ∼ Bernoulli(p). Xi ’s are independent. Therefore ! n X V ar(X) =V ar Xi i=1 = n X i=1 V ar(Xi ) = n X i=1 p(1 − p) = np(1 − p). 366 Cornell University, BTRY 4080 / STSCI 4080 Example 4a: Fall 2009 Instructor: Ping Li Let X1 , X2 , ..., Xn be independent and identically distributed random variables having expected value µ and variance σ 2 . Let X̄ be the sample mean and S 2 be the sample variance, where n 1X Xi , X̄ = n i=1 Q: Find V ar(X̄) and E(S 2 ). n 1 X 2 S = (Xi − X̄)2 n − 1 i=1 367 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Solution: V ar(X̄) =V ar 1 n 1 = 2 V ar n n X Xi i=1 n X ! Xi i=1 ! n 1 X = 2 V ar (Xi ) n i=1 n σ2 1 X 2 σ = = 2 n i=1 n We’ve already shown E(X̄) = µ. As the sample size n increases, the variance of X̄ , the unbiased estimator of µ, decreases. 368 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li The Less Clever Approach (different from the book) E(S 2 ) =E 1 n−1 1 E = n−1 1 = E n−1 1 = E n−1 n X i=1 n X i=1 n X i=1 n X i=1 n X (Xi − X̄)2 Xi2 ! 2 + X̄ − 2X̄Xi Xi2 + nX̄ 2 − 2X̄ Xi2 − nX̄ 2 ! 1 2 2 = E(Xi ) − nE X̄ n − 1 i=1 n 2 2 2 µ + σ − E X̄ = n−1 ! n X i=1 ! Xi ! 
369 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li !2 n n X X X 1 1 2 2 Xi = 2E Xi + 2 Xi Xj E(X̄ ) = 2 E n n i=1 i i<j n X 1 X 2 = 2 E(Xi ) + 2 E(Xi Xj ) n i i<j n X 1 X 2 2 = 2 (µ + σ ) + 2 µ2 n i i<j Or, directly, 2 σ 1 + µ2 = 2 n µ2 + σ 2 + n(n − 1)µ2 = n n 2 E(X̄ ) =V ar X̄ + E 2 σ2 X̄ = + µ2 n 370 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Therefore, n 2 2 2 E(S ) = µ + σ − E X̄ n−1 2 n σ = µ2 + σ 2 − − µ2 n−1 n 2 =σ 2 Sample variance S 2 = 1 n−1 Pn i=1 (Xi − X̄)2 is an unbiased estimator of σ 2 . 1 1 Why n−1 instead of n ? Pn 2 (X − X̄) is a sum of n terms with only n − 1 degrees of freedom, i i=1 P n 1 because X̄ = n i=1 Xi . 371 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li The Standard Clever Technique for evaluating E(S 2 ) (as in the textbook) (n − 1)S 2 = = = n X i=1 n X i=1 n X i=1 (Xi − X̄)2 = (Xi − µ)2 + n X i=1 n X i=1 (Xi −µ + µ − X̄)2 (X̄ − µ)2 − 2 n X i=1 (Xi − µ)(X̄ − µ) (Xi − µ)2 − n(X̄ − µ)2 E[(n − 1)S 2 ] =(n − 1)E(S 2 ) = E " n X i=1 (Xi − µ)2 − n(X̄ − µ)2 # σ2 =nV ar(Xi ) − nV ar(X̄) = nσ − n = (n − 1)σ 2 n 2 372 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li The Correlation of Two Random Variables Definition: The correlation of two random variables X and Y , denoted by ρ(X, Y ), is defined, as long as V ar(X) > 0 and V ar(Y ) > 0, by ρ(X, Y ) = p Cov(X, Y ) V ar(X)V ar(Y ) It can be shown that −1 ≤ ρ(X, Y ) ≤ 1 373 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 −1 ≤ ρ(X, Y ) = p Proof: Instructor: Ping Li Cov(X, Y ) V ar(X)V ar(Y ) The basic trick: Variance V ar(X) V ar = ≤1 ≥ 0 always. X Y p +p V ar(X) V ar(Y ) ! V ar(X) V ar(Y ) 2Cov(X, Y ) + +p V ar(X) V ar(Y ) V ar(X)V ar(Y ) =2 (1 + ρ(X, Y )) ≥ 0 Therefore, ρ(X, Y ) ≥ −1. 374 Cornell University, BTRY 4080 / STSCI 4080 V ar Fall 2009 X Y p −p V ar(X) V ar(Y ) Instructor: Ping Li ! V ar(X) V ar(Y ) 2Cov(X, Y ) p = + − V ar(X) V ar(Y ) V ar(X)V ar(Y ) =2 (1 − ρ(X, Y )) ≥ 0 Therefore, ρ(X, Y ) ≤ 1. 375 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li • ρ(X, Y ) = +1 =⇒ Y = a + bX , and b > 0. • ρ(X, Y ) = −1 =⇒ Y = a − bX , and b > 0. • ρ(X, Y ) = 0 =⇒ X and Y are uncorrelated • ρ(X, Y ) is close to +1 =⇒ X and Y are highly positively correlated. • ρ(X, Y ) is close to −1 =⇒ X and Y are highly negatively correlated. 376 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Section 7.5: Conditional Expectation If X and Y are jointly discrete random variables E[X|Y = y] = X x = X xP {X = x|Y = y} xpX|Y (x|y) x If X and Y are jointly continuous random variables E[X|Y = y] = Z ∞ −∞ xfX|Y (x|y)dx 377 Cornell University, BTRY 4080 / STSCI 4080 Example 5b: Fall 2009 Instructor: Ping Li Suppose the joint density of X and Y is given by 1 −x/y −y e , f (x, y) = e y 0 < x, y < ∞ Q: Compute E[X|Y = y] = What the answer should be like? Z ∞ xfX|Y (x|y)dx −∞ (a function a y ) 378 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Solution: fX|Y (x|y) = f (x, y) f (x, y) = R∞ fY (y) f (x, y)dx −∞ 1 −x/y −y e ye = R ∞ 1 −x/y −y e e dx 0 y = = 1 −x/y −y e ye 0 −x/y −y e e ∞ 1 −x/y −y e ye e−y 1 = e−x/y , y E[X|Y = y] = Z ∞ −∞ y>0 xfX|Y (x|y)dx = Z ∞ 0 x −x/y e dx = y y 379 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Computing Expectations by Conditioning Proposition 5.1: The tower property E(X) = E[E[X|Y ]] In some cases, computing E(X) is difficult; but computing E(X|Y ) may be much more convenient. 
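Example 5b also gives a concrete way to see Proposition 5.1: under the joint density f(x, y) = (1/y) e^{−x/y} e^{−y}, the marginal of Y is Exp(1) and X given Y = y is exponential with mean y, so E(X) = E[E(X|Y)] = E(Y) = 1. A small simulation sketch (not from the slides):

# Tower-property illustration using the Example 5b density:
# draw Y ~ Exp(1), then X | Y = y ~ Exponential with mean y.
import numpy as np

rng = np.random.default_rng(0)
y = rng.exponential(1.0, 1_000_000)   # marginal of Y
x = rng.exponential(y)                # conditional draw of X given Y

print(np.mean(x), np.mean(y))         # both close to 1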
Instructor: Ping Li 380 Cornell University, BTRY 4080 / STSCI 4080 Proof: Fall 2009 Instructor: Ping Li Assume X and Y are continuous. (Book assumes discrete variables) E[E[X|Y ]] = Z ∞ E(X|Y )fY (y)dy −∞ Z ∞ Z ∞ = xfX|Y (x|y)dx fY (y)dy −∞ −∞ Z ∞Z ∞ = xfX|Y (x|y)fY (y)dxdy −∞ −∞ Z ∞Z ∞ = x fX|Y (x|y)fY (y) dxdy −∞ −∞ Z ∞Z ∞ = xf (x, y)dxdy −∞ −∞ Z ∞ Z ∞ = x f (x, y)dy dx −∞ −∞ Z ∞ = xfX (x)dx = E(X) −∞ 381 Cornell University, BTRY 4080 / STSCI 4080 Example 5d: Fall 2009 Instructor: Ping Li Suppose N the number of people entering a store on a given day is a random variable with mean 50. Suppose further that the amounts of money spent by these customers are independent random variables having a common mean $8. Assume the amount of money spent by a customer is also independent of the total number of customers to enter the store. Q: What is the expected amount of money spent in the store on a given day? Solution: N : the number of people entering the store. E(N ) = 50. Xi : the amount spent by the ith customer. E(Xi ) = 8. X : the total amount of money spent in the store on a give day. E(X) =?? 382 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li We can not use the linearity directly: E(X) = E N X i=1 Xi ! = N X E(Xi ) = i=1 N X 8 ??? i=1 because N , the number of terms in the summation, is also random. E(X) =E N X i=1 " =E E =E " Xi ! N X i=1 N X i=1 Xi |N E(Xi ) !# # =E(N 8) = 8E(N ) =8 × 50 = $400 383 Cornell University, BTRY 4080 / STSCI 4080 Example 2e: Fall 2009 Instructor: Ping Li A game is begun by rolling an ordinary pair of dice. • If the sum of the dice is 2, 3, or 12, the player loses. • If the sum is 7 or 11, the player wins. • If the sum is any other numbers i,the player continues to roll the dice until the sum is either 7 or i. – If it is 7, the player loses; – If it is i, the player wins; —————————– Q : Compute E(R), where R= number of rolls in a game. 384 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Conditional on S , the initial sum, E(R) = 12 X E(R|S = i)P (S = i) i=2 Let P (S = i) = Pi . 1 , 36 4 , P5 = 36 5 P8 = , 36 2 P11 = , 36 P2 = 2 , 36 5 P6 = , 36 4 P9 = , 36 1 P12 = 36 P3 = 3 , 36 6 P7 = , 36 3 P10 = , 36 P4 = Equivalently, Pi = P14−i i−1 = , 36 i = 2, 3, ..., 7 385 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Obviously, 1, if i = 2, 3, 7, 11, 12 E(R|S = i) = 1 + E(Z), otherwise Z = the number of rolls until the sum is either i or 7. Note that i ∈ / {2, 3, 7, 11, 12} at this point. Z is a geometric random variable with success probability = Pi + P7 . 1 Thus E(Z) = P +P . i 7 Therefore, E(R) =1 + 6 X i=4 10 X Pi Pi + = 3.376 Pi + P7 i=8 Pi + P7 386 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Computing Probabilities by Conditioning If Y is discrete, then P (E) = X P (E|Y = y)P (Y = y) y If Y is continuous, then P (E) = Z ∞ −∞ P (E|Y = y)fY (y)dy 387 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li The Best Prize Problem: Deal? No Deal! We are presented with n distinct prizes in sequence. After being presented with a prize we must decide whether to accept it or to reject it and consider the next prize. The objective is to maximize the probability of obtaining the best prize. Assuming that all n! orderings are equally likely. Q: How well can we do? What would be a good strategy? —————————– Many related applications: online algorithms, hiring decisions, dating, etc. 
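Looking back at Example 2e, the value E(R) ≈ 3.376 can be reproduced exactly from the conditioning formula. A short sketch (not part of the slides):

# E(R) for Example 2e: condition on the initial sum S and use
# E(R | S = i) = 1 + 1/(P_i + P_7) when i is not in {2, 3, 7, 11, 12}.
from fractions import Fraction

P = {i: Fraction(6 - abs(i - 7), 36) for i in range(2, 13)}   # P(S = i)

ER = Fraction(0)
for i in range(2, 13):
    if i in (2, 3, 7, 11, 12):
        ER += P[i]                              # game ends after one roll
    else:
        ER += (1 + 1 / (P[i] + P[7])) * P[i]    # 1 + expected geometric wait

print(float(ER))   # 3.3757..., i.e. the 3.376 quoted on the slide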
388 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 0 ≤ k ≤ n. Reject the first k prizes and then accepts the first one that is better than all of those first k . A plausible strategy: Fix a number k , Let X = the location of the best prize, i.e., 1 ≤ X ≤ n. Let Pk (best) = the best prize is selected using this strategy. Pk (best) = n X Pk (best|X = i)P (X = i) i=1 n X 1 = n i=1 Pk (best|X = i) Obviously Pk (best|X = i) = 0, if i ≤k 389 Cornell University, BTRY 4080 / STSCI 4080 Assume i Fall 2009 Instructor: Ping Li > k , then Pk (best|X = i) =Pk (best of the first i − 1 is among the first k|X = i) = k i−1 Therefore, n 1X Pk (best) = Pk (best|X = i) n i=1 n k X 1 = n i−1 i=k+1 Next question: which k to use? We choose the k that maximizes Pk (best). Exhaustive search is the best thing to do, which only involves a linear scan. However, it is often useful to have an “analytical” (albeit approximate) expression. 390 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li A useful approximate formula n X 1 i=1 1 1 1 = 1 + + + ... + ≈ log n + γ i 2 3 n where Euler’s constant γ = 0.57721566490153286060651209008240243104215933593992... The approximation is asymptotically (as n → ∞) exact. 391 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 n k X 1 Pk (best) = n i−1 i=k+1 k 1 1 1 = + + ... + n k k+1 n−1 ! n−1 k−1 k X1 X1 − = n i=1 i i i=1 k (log(n − 1) − log(k − 1)) n n−1 k = log n k−1 ≈ Instructor: Ping Li 392 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li k n−1 k n ≈ log Pk (best) ≈ log n k−1 n k ′ [Pk (best)] ≈ Also, when k = n e, 1 n k1 n n log − = 0 =⇒ log = 1 =⇒ k = n k nk k e Pk (best) ≈ n1 ne ≈ 0.36788, as n → ∞. 393 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Conditional Variance Proposition 5.2: V ar(X) = E[V ar(X|Y )] + V ar(E[X|Y ]) Proof: (from the opposite direction, compared to the book) E[V ar(X|Y )] + V ar(E[X|Y ]) 2 2 2 2 =E E(X |Y ) − E (X|Y ) + E(E (X|Y )) − E (E[X|Y ]) 2 2 2 2 = E(E(X |Y )) − E (E[X|Y ]) + −E(E (X|Y )) + E(E (X|Y )) =E(X 2 ) − E 2 (X) =V ar(X) 394 Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li Let X1 , X2 , ..., XN be a sequence of independent and Example 5o: identically distributed random variables and let N be a non-negative integer-valued random variable independent of Xi . Q: Compute V ar P N i=1 Xi . Solution: V ar N X i=1 Xi ! " =E V ar N X i=1 Xi |N !# + V ar E =E (V ar(X)N ) + V ar(N E(X)) =V ar(X)E(N ) + E 2 (X)V ar(N ) " N X i=1 Xi |N #! 395
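The formula in Example 5o, as well as the ≈ 0.368 success probability of the prize-selection strategy, can both be checked by simulation. The sketches below are not part of the slides; all distributional choices and sample sizes are arbitrary.

First, the compound-sum variance, with N ~ Poisson(3) and X_i exponential with mean 2, so Var(X)E(N) + E(X)²Var(N) = 4·3 + 4·3 = 24:

# Monte Carlo check of Var(sum_{i=1}^N X_i) = Var(X) E(N) + E(X)^2 Var(N).
import numpy as np

rng = np.random.default_rng(0)
reps = 100_000
n = rng.poisson(3.0, reps)
totals = np.array([rng.exponential(2.0, k).sum() for k in n])

print(totals.var())   # close to 24

Second, the best-prize strategy with cutoff k ≈ n/e:

# Simulate the strategy: skip the first k prizes, then take the first one
# that beats everything seen so far; success means taking the overall best.
import math
import numpy as np

rng = np.random.default_rng(1)
n, reps = 100, 20_000
k = round(n / math.e)

wins = 0
for _ in range(reps):
    prizes = rng.permutation(n)                  # larger value = better prize
    threshold = prizes[:k].max()
    pick = next((v for v in prizes[k:] if v > threshold), None)
    wins += (pick == n - 1)                      # n - 1 is the best prize

print(wins / reps, 1 / math.e)                   # both roughly 0.37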