Theory of Probability

BTRY 4080 / STSCI 4080, Fall 2009

Instructor: Ping Li
Department of Statistical Science, Cornell University
General Information
• Lectures: Tue, Thu 10:10-11:25 am, Malott Hall 253
• Section 1: Mon 2:55 - 4:10 pm, Warren Hall 245
• Section 2: Wed 1:25 - 2:40 pm, Warren Hall 261
• Instructor: Ping Li, [email protected],
Office Hours: Tue, Thu 11:25 am - 12 pm, Comstock Hall 1192
• TA: Xiao Luo, [email protected].
Office hours:
(1) Mon, 4:10 - 5:10pm Warren Hall 245;
(2) Wed, 2:40 - 3:40pm, Warren Hall 261.
• Prerequisites: Math 213 or 222 or equivalent; a course in statistical methods
is recommended
• Textbook: Ross, A First Course in Probability, 7th edition, 2006
• Exams: (Locations TBD)
– Prelim 1: In Class, September 29, 2009
– Prelim 2: In Class, October 20, 2009
– Prelim 3: In Class, TBD
– Final Exam: Wed. December 16, 2009, from 2pm to 4:30pm.
– Policy: Closed book, closed notes
• Homework: Weekly
– Please turn in your homework either in class or to BSCB front desk
(Comstock Hall, 1198).
– No late homework will be accepted.
– Before computing your overall homework grade, the assignment with the lowest grade (if ≥ 25%) will be dropped, and the one with the second lowest grade (if ≥ 50%) will also be dropped.
– It is the students’ responsibility to keep copies of the submitted homework.
• Grading:
1. The homework: 30%
2. The prelim with the lowest grade: 5%
3. The other two prelims: 15% each
4. The final exam: 35%
• Course Letter Grade Assignment
A ≈ 90% (in the absolute scale)
C ≈ 60% (in the absolute scale)
In borderline cases, participation in section and class interactions will be used as
a determining factor.
Material
Topic                                        Textbook
Combinatorics                                1.1 - 1.6
Axioms of Probability                        2.1 - 2.5
Conditional Probability and Independence     3.1 - 3.5
Discrete Random Variables                    4.1 - 4.9
Continuous Random Variables                  5.1 - 5.7
Jointly Distributed Random Variables         6.1 - 6.7
Expectation                                  7.1 - 7.9

As time permits: Limit Theorems 8.1 - 8.6, Poisson Processes and Markov Chains 9.1 - 9.2, Simulation 10.1 - 10.4
Probability Is Increasingly Popular & Truly Useful
An Example: Probabilistic Counting
Consider two sets S1, S2 ⊂ Ω = {0, 1, 2, ..., D − 1}, for example D = 2000:

S1 = {0, 15, 31, 193, 261, 495, 563, 919}
S2 = {5, 15, 19, 193, 209, 325, 563, 1029, 1921}

Q: What is the size of the intersection, a = |S1 ∩ S2|?

A: Easy!   a = |{15, 193, 563}| = 3.
Challenges
• What if the size of the space is D = 2^64?
• What if there are 10 billion sets, instead of just 2?
• What if we only need approximate answers?
• What if we need the answers really quickly?
• What if we are mostly interested in sets that are highly similar?
• What if the sets are dynamic and frequently updated?
• ...
Search Engines
Google/Yahoo!/Bing have each collected at least 10^10 (10 billion) web pages.
Two examples (among many tasks)
• (Very quickly) Determine if two Web pages are nearly duplicates
A document can be viewed as a collection (set) of words or contiguous words.
This is a counting of set intersection problem.
• (Very quickly) Determine the number of co-occurrences of two words
A word can be viewed as the set of documents containing that word.
This is also a counting of set intersection problem.
Term Document Matrix
(Figure: a binary term-document matrix. Rows are words (Word 1, ..., Word n), columns are documents (Doc 1, ..., Doc m); the (i, j) entry is 1 if word i appears in document j, and 0 otherwise.)

≥ 10^10 documents (Web pages).
≥ 10^5 common English words. (What about the total number of contiguous words?)
One Method for Probabilistic Counting of Intersections
Consider two sets S1, S2 ⊂ Ω = {0, 1, 2, ..., D − 1}
S1 = {0, 15, 31, 193, 261, 495, 563, 919}
S2 = {5, 15, 19, 193, 209, 325, 563, 1029, 1921}
One can first generate a random permutation π :
π : Ω = {0, 1, 2, ..., D − 1} −→ {0, 1, 2, ..., D − 1}
Apply π to sets S1 and S2 (and all other sets). For example
π (S1 ) = {13, 139, 476, 597, 698, 1355, 1932, 2532}
π (S2 ) = {13, 236, 476, 683, 798, 1456, 1968, 2532, 3983}
Store only the minimums
min{π (S1 )} = 13,
min{π (S2 )} = 13
Suppose we can theoretically calculate the probability
P = Pr (min{π (S1 )} = min{π (S2 )})
S1 and S2 are more similar =⇒ larger P (i.e., closer to 1).
Generate (a small number) k random permutations and each time store the minimums.
Count the number of matches to estimate the similarity between S1 and S2 .
Later, we can easily show that
P = Pr(min{π(S1)} = min{π(S2)}) = |S1 ∩ S2| / |S1 ∪ S2| = a / (f1 + f2 − a)

where a = |S1 ∩ S2|, f1 = |S1|, f2 = |S2|.
Therefore, we can estimate P as a binomial probability from k permutations.
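A minimal MATLAB sketch of this estimator (not from the slides; the choices of D, the two sets, and k below are only illustrative):

% Estimate |S1 n S2| / |S1 u S2| by minwise hashing with k random
% permutations of Omega = {0, 1, ..., D-1}.
D  = 2000;
S1 = [0 15 31 193 261 495 563 919];
S2 = [5 15 19 193 209 325 563 1029 1921];
k  = 500;                          % number of random permutations
matches = 0;
for t = 1:k
    perm = randperm(D) - 1;        % a random permutation of {0, ..., D-1}
    m1 = min(perm(S1 + 1));        % perm(x+1) is the image of element x
    m2 = min(perm(S2 + 1));
    matches = matches + (m1 == m2);
end
Phat = matches / k;                % estimates a / (f1 + f2 - a)
trueR = length(intersect(S1, S2)) / length(union(S1, S2));
fprintf('estimate = %.3f, true = %.3f\n', Phat, trueR);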
Combinatorial Analysis
Ross: Chapter 1, Sections 1.1 - 1.6
Basic Principles of Counting, Section 1.2
1. The multiplicative principle
A project requires two sub-tasks, A and B.
There are m ways to complete A, and there are n ways to complete B.
Q: How many different ways to complete this project?
2. The additive principle
A project can be completed either by doing task A, or by doing task B.
There are m ways to complete A, and there are n ways to complete B.
Q: How many different ways to complete this project?
The Multiplicative Principle
(Figure: to travel from the origin to the destination, one must first take one of the m routes through stage A (1, 2, ..., m) and then one of the n routes through stage B (1, 2, ..., n).)

Q: How many different routes from the origin to reach the destination?

Route 1:   {A:1 + B:1}
Route 2:   {A:1 + B:2}
...
Route n:   {A:1 + B:n}
Route n+1: {A:2 + B:1}
...
Route ?:   {A:m + B:n}
The Additive Principle

(Figure: from the origin one can reach the destination either via task A, which offers m routes (1, 2, ..., m), or via task B, which offers n routes (1, 2, ..., n).)
Q: How many different routes from the origin to reach the destination?
Example:
A college planning committee consists of 3 freshmen, 4 sophomores, 5 juniors,
and 2 seniors. A subcommittee of 4, consisting of 1 person from each class, is to
be chosen.
Q: How many different subcommittees are possible?
—————–
Multiplicative or additive?
Example:
In a 7-place license plate, the first 3 places are to be occupied by letters and the
final 4 by numbers, e.g., ITH-0827.
Q: How many different 7-place license plates are possible?
—————–
Multiplicative or additive?
Example:
How many license plates would be possible if repetition among letters or numbers
were prohibited?
Permutations and Combinations, Section 1.3, 1.4
Need to select 3 letters from the 26 English letters to make a license plate.
Q: How many different arrangements are possible?
———————
The answer differs if
• Order of 3 letters matters =⇒ Permutation
e.g., ITH, TIH, HIT, are all different.
• Order of 3 letters does not matter =⇒ Combination
Permutations
Basic formula:
Suppose we have n objects. How many different permutations are there?
n × (n − 1) × (n − 2) × ... × 1 = n!
Why?
(Figure: filling positions 1, 2, 3, ..., n in order gives n choices for the first position, n−1 choices for the second, n−2 choices for the third, ..., and 1 choice for the last.)

Which counting principle? Multiplicative or additive?
Example:
A baseball team consists of 9 players.
Q: How many different batting orders are possible?
Example:
A class consists of 6 men and 4 women. The students are ranked according to
their performance in the final, assuming no ties.
Q: How many different rankings are possible?
Q: If the women are ranked among themselves and the men among themselves, how many different rankings are possible?
Example:
How many different letter arrangements can be formed using P,E,P ?
The brute-force approach
• Step 1: Permutations assuming P1 and P2 are different.
P1 P2 E
P1 E P2
P2 P1 E
P2 E P1
E P1 P2
E P2 P1
• Step 2: Removing duplicates assuming P1 and P2 are the same.
PPE
PEP
EPP
Q: How many duplicates?
Example:
How many different letter arrangements can be formed using P,E,P,P,E,R ?
Answer:   6! / (3! × 2! × 1!) = 60
Why?
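A quick brute-force sanity check of this count in MATLAB (not from the slides):

% Count the distinct arrangements of P,E,P,P,E,R by brute force and
% compare with the formula 6!/(3! * 2! * 1!).
letters = 'PEPPER';
allPerms = perms(letters);                    % 6! = 720 orderings
numDistinct = size(unique(allPerms, 'rows'), 1);
formula = factorial(6) / (factorial(3) * factorial(2) * factorial(1));
fprintf('brute force: %d, formula: %d\n', numDistinct, formula);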
A More General Example of Permutations:
How many different permutations of n objects, of which n1 are identical, n2 are
identical, ..., nr are identical?
• Step 1: Assuming all n objects are distinct, there are n! permutations.
• Step 2: For each permutation, there are n1 ! × n2 ! × ... × nr ! ways to
switch objects and still remain the same permutation.
• Step 3: The total number of (unique) permutations is

n! / (n1! × n2! × ... × nr!)
(Figure: a permutation of 11 objects laid out in positions 1 through 11.)
How many ways to switch the elements and remain the same permutation?
Example:
A signal consists of 9 flags hung in a line, which can be made from a set of 4
white flags, 3 red flags, and 2 blue flags. All flags of the same color are identical.
Q: How many different signals are possible?
Combinations
Basic question of combination: the number of different groups of r objects that could be formed from a total of n objects.

Basic formula of combination:

C(n, r) = n! / ((n − r)! × r!)

Pronounced as "n-choose-r". (Below, the binomial coefficient is written C(n, r).)
Basic Formula of Combination
Step 1: When orders are relevant, the number of different ordered groups of r objects formed from n objects = n × (n − 1) × (n − 2) × ... × (n − r + 1)

(Figure: filling positions 1, 2, ..., r gives n choices, then n−1 choices, n−2 choices, ..., and finally n−r+1 choices.)

Step 2: When orders are NOT relevant, each group of r objects can be formed in r! different ways.
Step 3: Therefore, when orders are NOT relevant, the number of different groups
of r objects would be
[n × (n − 1) × ... × (n − r + 1)] / r!
= [n × (n − 1) × ... × (n − r + 1) × (n − r) × (n − r − 1) × ... × 1] / [r! × (n − r) × (n − r − 1) × ... × 1]
= n! / ((n − r)! × r!)
= C(n, r)
Example:
A committee of 3 is to be formed from a group of 20 people.
Q: How many different committees are possible?
Example:
How many different committees consisting of 2 women and 3 men can be formed,
from a group of 5 women and 7 men?
Example (1.4.4c):
Consider a set of n antennas of which m are defective and n − m are functional.
Q: How many linear orderings are there in which no two defectives are
consecutive?
A:   C(n − m + 1, m).
Line up n − m functional antennas. Insert (at most) one defective antenna
between two functional antennas. Considering the boundaries, there are in total
n − m + 1 places to insert defective antennas.
Useful Facts about the Binomial Coefficient C(n, r)

1. C(n, 0) = 1 and C(n, n) = 1. Recall 0! = 1.

2. C(n, r) = C(n, n − r)

3. C(n, r) = C(n − 1, r − 1) + C(n − 1, r)

Proof 1 by algebra

C(n − 1, r − 1) + C(n − 1, r) = (n−1)! / ((n−r)!(r−1)!) + (n−1)! / ((n−r−1)! r!)
                              = (n−1)!(r + n − r) / ((n−r)! r!) = n! / ((n−r)! r!) = C(n, r)
Proof 2 by a combinatorial argument
A selection of r objects either contains object 1 or does not contain object 1.
4. Some computational tricks
• Avoid the factorial n!, which easily becomes very large: 170! ≈ 7.2574 × 10^306.

• Take advantage of C(n, r) = C(n, n − r).

• Convert products to additions:

C(n, r) = n(n − 1)(n − 2)...(n − r + 1) / r!
        = exp( Σ_{j=0}^{r−1} log(n − j) − Σ_{j=1}^{r} log j )
• Use accurate approximations of the log factorial function (log n!). Check
http://en.wikipedia.org/wiki/Factorial.
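A minimal MATLAB sketch of this idea (not from the slides): compute C(n, r) through log factorials using gammaln, since log(m!) = gammaln(m + 1).

% Binomial coefficient via log factorials, to avoid overflow.
n = 1000; r = 400;
logC = gammaln(n+1) - gammaln(r+1) - gammaln(n-r+1);
fprintf('log10 C(%d, %d) = %.4f\n', n, r, logC / log(10));
% For small n, exponentiating agrees with nchoosek:
exp(gammaln(53) - gammaln(6) - gammaln(48))   % C(52, 5)
nchoosek(52, 5)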
The Binomial Theorem
The Binomial Theorem:

(x + y)^n = Σ_{k=0}^{n} C(n, k) x^k y^{n−k}

Example for n = 2:

(x + y)^2 = x^2 + 2xy + y^2,   with C(2, 0) = C(2, 2) = 1 and C(2, 1) = 2.
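A quick numeric check in MATLAB (not from the slides; x, y, and n are arbitrary test values):

% Evaluate the right-hand side of the binomial theorem and compare
% with (x + y)^n directly.
x = 1.7; y = -0.4; n = 7;
rhs = 0;
for k = 0:n
    rhs = rhs + nchoosek(n, k) * x^k * y^(n-k);
end
fprintf('(x+y)^n = %.10f, sum = %.10f\n', (x + y)^n, rhs);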
Proof of the Binomial Theorem by Induction
Two key components in proof by induction:
1. Verify a base case, e.g., n = 1.

2. Assume the case n − 1 is true, and prove the case n.

Proof:

The base case is trivially true (but always check!).

Assume

(x + y)^{n−1} = Σ_{k=0}^{n−1} C(n − 1, k) x^k y^{n−1−k}
Then

Σ_{k=0}^{n} C(n, k) x^k y^{n−k}
  = Σ_{k=1}^{n−1} C(n, k) x^k y^{n−k} + x^n + y^n
  = Σ_{k=1}^{n−1} C(n − 1, k) x^k y^{n−k} + Σ_{k=1}^{n−1} C(n − 1, k − 1) x^k y^{n−k} + x^n + y^n
  = y Σ_{k=0}^{n−1} C(n − 1, k) x^k y^{n−1−k} − y^n + Σ_{k=1}^{n−1} C(n − 1, k − 1) x^k y^{n−k} + x^n + y^n
  = y(x + y)^{n−1} + Σ_{j=0}^{n−2} C(n − 1, j) x^{j+1} y^{n−1−j} + x^n        (let j = k − 1)
  = y(x + y)^{n−1} + x Σ_{j=0}^{n−1} C(n − 1, j) x^j y^{n−1−j} − x^n + x^n
  = y(x + y)^{n−1} + x(x + y)^{n−1} = (x + y)^n
Multinomial Coefficients
A set of n distinct items is to be divided into r distinct groups of respective size
n1, n2, ..., nr, where Σ_{i=1}^{r} ni = n.

Q: How many different divisions are possible?

Answer:

C(n; n1, n2, ..., nr) = n! / (n1! n2! ... nr!)
Proof
Step 1: There are C(n, n1) ways to form the first group of size n1.

Step 2: There are C(n − n1, n2) ways to form the second group of size n2.

...

Step r: There are C(n − n1 − n2 − ... − n_{r−1}, nr) ways to form the last group of size nr.

Therefore, the total number of divisions should be

C(n, n1) × C(n − n1, n2) × ... × C(n − n1 − n2 − ... − n_{r−1}, nr)
= [n! / ((n − n1)! n1!)] × [(n − n1)! / ((n − n1 − n2)! n2!)] × ... × [(n − n1 − ... − n_{r−1})! / ((n − n1 − ... − n_{r−1} − nr)! nr!)]
= n! / (n1! n2! ... nr!)
Example:
A police department in a small city consists of 10 officers. The policy is to have 5
patrolling the streets, 2 working full time at the station, and 3 on reserve at the
station.
Q: How many different divisions of 10 officers into 3 groups are possible?
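A small numeric illustration in MATLAB (not from the slides): the count is the multinomial coefficient 10! / (5! 2! 3!).

% Number of ways to divide 10 officers into groups of sizes 5, 2, and 3.
n = 10; sizes = [5 2 3];
numDivisions = factorial(n) / prod(factorial(sizes));
fprintf('10!/(5! 2! 3!) = %d\n', numDivisions);    % 2520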
The Number of Integer Solutions of Equations
Suppose x1, x2, ..., xr are positive integers, and x1 + x2 + ... + xr = n.

Q: How many distinct vectors (x1, x2, ..., xr) satisfy the constraints?

A:   C(n − 1, r − 1).

Line up n 1's. Draw r − 1 lines in the n − 1 places between them to form r groups (numbers).
Suppose x1, x2, ..., xr are non-negative integers, and x1 + x2 + ... + xr = n.

Q: How many distinct vectors (x1, x2, ..., xr) satisfy the constraints?

A:   C(n + r − 1, r − 1),   by padding r zeros after the n one's.

One can also let yi = xi + 1, i = 1 to r. Then the problem becomes finding positive integers (y1, ..., yr) so that y1 + y2 + ... + yr = n + r.
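A small brute-force check in MATLAB (not from the slides; n and r are arbitrary small test values):

% Count the non-negative integer solutions of x1 + x2 + x3 = n and
% compare with C(n + r - 1, r - 1) for r = 3.
n = 8; r = 3;
count = 0;
for x1 = 0:n
    for x2 = 0:(n - x1)
        count = count + 1;         % x3 = n - x1 - x2 is then determined
    end
end
fprintf('brute force: %d, formula: %d\n', count, nchoosek(n + r - 1, r - 1));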
Example:
8 identical blackboards are to be divided among 4 schools.
Q (a): How many divisions are possible?
Q (b): How many, if each school must receive at least 1 blackboard?
More Examples for Chapter 1
From a group of 8 women and 6 men a committee consisting of 3 men and 3
women is to be formed.
(a): How many different committees are possible?
(b): How many, if 2 of the men refuse to serve together?
(c): How many, if 2 of the women refuse to serve together?
(d): How many, if 1 man and 1 woman refuse to serve together?
Solution:
(a): How many different committees are possible?

C(8, 3) × C(6, 3) = 56 × 20 = 1120

(b): How many, if 2 of the men refuse to serve together?

C(8, 3) × [C(4, 3) + C(2, 1) C(4, 2)] = 56 × 16 = 896

(c): How many, if 2 of the women refuse to serve together?

[C(6, 3) + C(2, 1) C(6, 2)] × C(6, 3) = 50 × 20 = 1000

(d): How many, if 1 man and 1 woman refuse to serve together?

C(7, 3) C(5, 3) + C(7, 3) C(5, 2) + C(7, 2) C(5, 3) = 350 + 350 + 210 = 910

Or C(8, 3) C(6, 3) − C(7, 2) C(5, 2) = 910
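A quick numeric check of parts (a)-(d) in MATLAB (not from the slides):

a = nchoosek(8,3) * nchoosek(6,3);                                    % 1120
b = nchoosek(8,3) * (nchoosek(4,3) + nchoosek(2,1)*nchoosek(4,2));    % 896
c = (nchoosek(6,3) + nchoosek(2,1)*nchoosek(6,2)) * nchoosek(6,3);    % 1000
d = nchoosek(8,3)*nchoosek(6,3) - nchoosek(7,2)*nchoosek(5,2);        % 910
[a b c d]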
Fermat’s Combinatorial Identity
Theoretical Exercise 11.
C(n, k) = Σ_{i=k}^{n} C(i − 1, k − 1),   n ≥ k
There is an interesting proof (different from the book) using the result for the
number of integer solutions of equations, by a recursive argument.
Proof:
Let f(m, r) be the number of positive integer solutions of x1 + x2 + ... + xr = m.

We know f(m, r) = C(m − 1, r − 1).

Suppose m = n + 1 and r = k + 1. Then

f(n + 1, k + 1) = C(m − 1, r − 1) = C(n, k)

x1 can take values from 1 to (n + 1) − k, because xi ≥ 1, i = 1 to k + 1.

Decompose the problem into (n + 1) − k sub-problems by fixing a value for x1.

If x1 = 1, then x2 + x3 + ... + x_{k+1} = m − 1 = n.
The total number of positive integer solutions is f(n, k).
If x1 = 1, there are still f(n, k) possibilities, i.e., C(n − 1, k − 1).

If x1 = 2, there are still f(n − 1, k) possibilities, i.e., C(n − 2, k − 1).

...

If x1 = (n + 1) − k, there are still f(k, k) possibilities, i.e., C(k − 1, k − 1).

Thus, by the additive counting principle,

f(n + 1, k + 1) = Σ_{j=1}^{n+1−k} f(n + 1 − j, k) = Σ_{j=1}^{n+1−k} C(n − j, k − 1)

But, is this what we want? Yes!
By the additive counting principle,

f(n + 1, k + 1) = Σ_{j=1}^{n+1−k} f(n + 1 − j, k) = Σ_{j=1}^{n+1−k} C(n − j, k − 1)
                = C(n − 1, k − 1) + C(n − 2, k − 1) + ... + C(k − 1, k − 1)
                = C(k − 1, k − 1) + C(k, k − 1) + ... + C(n − 1, k − 1)
                = Σ_{i=k}^{n} C(i − 1, k − 1)
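A quick numeric check of the identity in MATLAB (not from the slides; n and k are arbitrary test values):

% Fermat's combinatorial identity: C(n, k) = sum_{i=k}^{n} C(i-1, k-1).
n = 12; k = 5;
lhs = nchoosek(n, k);
rhs = 0;
for i = k:n
    rhs = rhs + nchoosek(i-1, k-1);
end
fprintf('C(%d, %d) = %d, sum = %d\n', n, k, lhs, rhs);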
Example:
A company has 50 employees, 10 of whom are in management
positions.
Q:
How many ways are there to choose a committee consisting of 4 people, if
there must be at least one person of management rank?
A:

C(50, 4) − C(40, 4) = 138910 = Σ_{k=1}^{4} C(10, k) C(40, 4 − k)

———

Can we also compute the answer using C(10, 1) C(49, 3) = 184240? No.

It is not correct because the two sub-tasks, C(10, 1) and C(49, 3), are overlapping; and hence we cannot simply apply the multiplicative principle.
Axioms of Probability
Chapter 2, Sections 2.1 - 2.5, 2.7
Section 2.1
In this chapter we introduce the concept of the probability of an event and then
show how these probabilities can be computed in certain situations. As a
preliminary, however, we need the concept of the sample space and the events of
an experiment.
Sample Space and Events, Section 2.2
Definition of Sample Space:
The sample space of an experiment, denoted by S , is the set of all possible
outcomes of the experiment.
Example:
S = {girl, boy}, if the outcome of an experiment consists of the determination of
the gender of a newborn.
Example:
If the experiment consists of flipping two coins (H/T), then the sample space
S = {(H, H), (H, T ), (T, H), (T, T )}
Example:
If the outcome of an experiment is the order of finish in a race among 7 horses
having post positions 1, 2, 3, 4, 5, 6, 7, then
S = {all 7! permutations of (1, 2, 3, 4, 5, 6, 7)}
Example:
If the experiment consists of tossing two dice, then
S = {(i, j), i, j = 1, 2, 3, 4, 5, 6},
What is the size |S|?
Example:
If the experiment consists of measuring (in hours) the lifetime of a transistor, then
S = {t : 0 ≤ t ≤ ∞}.
Definition of Events:
Any subset E of the sample space S is an event.
Example:
S = {(H, H), (H, T ), (T, H), (T, T )},
E = {(H, H)} is an event. E = {(H, T ), (T, T )} is also an event
Example:
S = {t : 0 ≤ t ≤ ∞},
E = {t : 1 ≤ t ≤ 5} is an event,
But {t : −10 ≤ t ≤ 5} is NOT an event.
Set Operations on Events
Let A, B , C be events of the same sample space S
A ⊂ S, B ⊂ S, C ⊂ S.
Notation for subset: ⊂
Set Union:
A ∪ B = the event which occurs if either A OR B occurs.
Example:
S = {1, 2, 3, 4, 5, ...} = the set of all positive integers.
A = {1},
B = {1, 3, 5},
C = {2, 4, 6},
A ∪ B = {1, 3, 5}, B ∪ C = {1, 2, 3, 4, 5, 6},
A ∪ B ∪ C = {1, 2, 3, 4, 5, 6}
Set Intersection:
A ∩ B = AB = the event which occurs if both A AND B occur.
Example:
S = {1, 2, 3, 4, 5, ...} = the set of all positive integers.
A = {1},
B = {1, 3, 5},
C = {2, 4, 6},
A ∩ B = {1},
B ∩ C = {} = ∅,
A∩B∩C =∅
Mutually Exclusive:
If A ∩ B = ∅, then A and B are mutually exclusive.
Set Complement:
Ac = S − A = the event which occurs only if A does not occur.
Example:
S = {1, 2, 3, 4, 5, ...} = the set of all positive integers.
A = {1, 3, 5, ...} = the set of all positive odd integers.
B = {2, 4, 6, ...} = the set of all positive even integers.
Ac = S − A = B ,
B c = S − B = A.
Laws of Set Operations
Commutative Laws:
A∪B =B∪A
A∩B =B∩A
Associative Laws:
(A ∪ B) ∪ C = A ∪ (B ∪ C) = A ∪ B ∪ C
(A ∩ B) ∩ C = A ∩ (B ∩ C) = A ∩ B ∩ C = ABC
Distributive Laws:
(A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C) = AC ∪ BC
(A ∩ B) ∪ C = (A ∪ C) ∩ (B ∪ C)
These laws can be verified by Venn diagrams.
DeMorgan’s Laws
(∪_{i=1}^{n} Ei)^c = ∩_{i=1}^{n} Ei^c

(∩_{i=1}^{n} Ei)^c = ∪_{i=1}^{n} Ei^c
Proof of (∪_{i=1}^{n} Ei)^c = ∩_{i=1}^{n} Ei^c

Let A = (∪_{i=1}^{n} Ei)^c and B = ∩_{i=1}^{n} Ei^c.

1. Prove A ⊆ B

Suppose x ∈ A. =⇒ x does not belong to any Ei.
=⇒ x belongs to every Ei^c. =⇒ x ∈ ∩_{i=1}^{n} Ei^c = B.

2. Prove B ⊆ A

Suppose x ∈ B. =⇒ x belongs to every Ei^c.
=⇒ x does not belong to any Ei. =⇒ x ∈ (∪_{i=1}^{n} Ei)^c = A.

3. Consequently A = B.
Proof of (∩_{i=1}^{n} Ei)^c = ∪_{i=1}^{n} Ei^c

Let Fi = Ei^c. We have proved (∪_{i=1}^{n} Fi)^c = ∩_{i=1}^{n} Fi^c. Then

(∩_{i=1}^{n} Ei)^c = (∩_{i=1}^{n} Fi^c)^c = ((∪_{i=1}^{n} Fi)^c)^c = ∪_{i=1}^{n} Fi = ∪_{i=1}^{n} Ei^c
Axioms of Probability, Section 2.3
For each event E in the sample space S , there exists a value P (E), referred to
as the probability of E .
P (E) satisfies three axioms.
Axiom 1:
0 ≤ P (E) ≤ 1
Axiom 2:
P (S) = 1
Axiom 3:
For any sequence of mutually exclusive events E1, E2, ... (that is, Ei Ej = ∅ whenever i ≠ j),

P(∪_{i=1}^{∞} Ei) = Σ_{i=1}^{∞} P(Ei)
Consequences of Three Axioms
Consequence 1:
P (∅) = 0.
P (S) = P (S ∪ ∅) = P (S) + P (∅) =⇒ P (∅) = 0
Consequence 2:
If S = ∪_{i=1}^{n} Ei and all mutually exclusive events Ei are equally likely, meaning
P(E1) = P(E2) = ... = P(En), then

P(Ei) = 1/n,   i = 1 to n
Some Simple Propositions, Section 2.4
Proposition 1:   P(E^c) = 1 − P(E)

Proof:   S = E ∪ E^c =⇒ 1 = P(S) = P(E) + P(E^c)

Proposition 2:   If E ⊂ F, then P(E) ≤ P(F)
Proof:
E ⊂ F =⇒ F = E ∪ (E c ∩ F )
=⇒ P (F ) = P (E) + P (E c ∩ F ) ≥ P (E)
Proposition 3:
P (E ∪ F ) = P (E) + P (F ) − P (EF )
Proof:
E ∪ F = (E − EF )∪(F − EF )∪EF , union of three mutually exclusive sets.
=⇒ P (E ∪ F ) = P (E − EF ) + P (F − EF ) + P (EF )
= P (E) − P (EF ) + P (F ) − P (EF ) + P (EF )
= P (E) + P (F ) − P (EF )
Note that E = EF ∪ (E − EF), a union of two mutually exclusive sets.
A Generalization of Proposition 3:
P (E ∪ F ∪ G)
=P (E) + P (F ) + P (G) − P (EF ) − P (EG) − P (F G) + P (EF G)
Proof:
Let B = F ∪ G. Then P(E ∪ B) = P(E) + P(B) − P(EB)
P (B) = P (F ∪ G) = P (F ) + P (G) − P (F G)
P (EB) = P (E(F ∪ G)) = P (EF ∪ EG) =
P (EF ) + P (EG) − P (EF EG) = P (EF ) + P (EG) − P (EF G)
Proposition 4:
P (E1 ∪ E2 ∪ ... ∪ En ) = ?
Read this proposition and its proof yourself.
We will temporarily skip this proposition and relevant examples 5m, 5n, and 5o.
However, we will come back to that material at a later time.
Example:
A total of 36 members of a club play tennis, 28 play squash, and 18 play
badminton. Furthermore, 22 of the members play both tennis and squash, 12
play both tennis and badminton, 9 play both squash and badminton, and 4 play all
three sports.
Q: How many members of this club play at least one of these sports?
Solution:
A = { members playing tennis }
B = { members playing squash }
C = { members playing badminton }
|A ∪ B ∪ C| =??
Sample Space Having Equally Likely Outcomes, Sec. 2.5
Assumption: In an experiment, all outcomes in the sample space S are equally
likely to occur.
For example, S = {1, 2, 3, ..., N}. Assuming equally likely outcomes, then

P({i}) = 1/N,   i = 1 to N

For any event E:

P(E) = |E| / |S| = (Number of outcomes in E) / (Number of outcomes in S)
Example: If two dice are rolled, what is the probability that the sum of the
upturned faces will equal 7?
S = {(i, j), i, j = 1, 2, 3, 4, 5, 6}
E = {??}
Example:
If 3 balls are randomly drawn from a bag containing 6 white and 5
black balls, what is the probability that one of the drawn balls is white and the
other two black?
S = {??}
E = {??}
Example:
An urn contains n balls, of which one is special. If k of these balls
are withdrawn randomly, what is the probability that the special ball is chosen?
Revisit Probabilistic Counting of Set Intersections
Consider two sets S1, S2 ⊂ Ω = {0, 1, 2, ..., D − 1}. For example,
S1 = {0, 15, 31, 193, 261, 495, 563, 919}
S2 = {5, 15, 19, 193, 209, 325, 563, 1029, 1921}
One can first generate a random permutation π :
π : Ω = {0, 1, 2, ..., D − 1} −→ {0, 1, 2, ..., D − 1}
Apply π to sets S1 and S2 . For example
π (S1 ) = {13, 139, 476, 597, 698, 1355, 1932, 2532}
π (S2 ) = {13, 236, 476, 683, 798, 1456, 1968, 2532, 3983}
Store only the minimums
min{π (S1 )} = 13,
min{π (S2 )} = 13
Suppose we can theoretically calculate the probability
P = Pr (min{π (S1 )} = min{π (S2 )})
S1 and S2 are more similar =⇒ larger P (i.e., closer to 1).
Generate (a small number) k random permutations and each time store the minimums.
Count the number of matches to estimate the similarity between S1 and S2 .
Now, we can easily (How?) show that
P = Pr(min{π(S1)} = min{π(S2)}) = |S1 ∩ S2| / |S1 ∪ S2| = a / (f1 + f2 − a)

where a = |S1 ∩ S2|, f1 = |S1|, f2 = |S2|.

Therefore, we can estimate P as a binomial probability from k permutations.

How many permutations (what value of k) do we need?
It depends on the variance of this binomial distribution.
Poker Example (2.5.5f):
A poker hand consists of 5 cards. If the cards have
distinct consecutive values and are not all of the same suit, we say that the hand
is a straight. What is the probability that one is dealt a straight?
A 2 3 4 5 6 7 8 9 10 J Q K A
Example:
A 5-card poker hand is a full house if it consists of 3 cards of one denomination and 2 cards of another denomination. (That is, a full house is three of a kind plus a pair.) What is the probability that one is dealt a full house?
Example (The Birthday Attack):
If n people are present in a room, what is the
probability that no two of them celebrate their birthday on the same day of the
year (365 days)? How large need n be so that this probability is less than 1/2?
Example (The Birthday Attack):
If n people are present in a room, what is the
probability that no two of them celebrate their birthday on the same day of the
year? How large need n be so that this probability is less than 1/2?

p(n) = [365 × 364 × ... × (365 − n + 1)] / 365^n
     = (365/365) × (364/365) × ... × ((365 − n + 1)/365)

p(23) = 0.4927 < 1/2,   p(50) = 0.0296
(Figure: p(n) plotted against n for n = 5 to 60; the curve decreases from near 1 toward 0 and crosses 0.5 near n = 23.)
Matlab Code
n = 5:60;
for i = 1:length(n)
    p(i) = 1;
    for j = 2:n(i)
        p(i) = p(i) * (365-j+1)/365;
    end
end
figure; plot(n,p,'linewidth',2); grid on; box on;
xlabel('n'); ylabel('p(n)');
How to improve the efficiency for computing many n’s?
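One possible answer (a sketch, not from the slides): the products for successive n share all but one factor, so all values follow from a single cumprod and the double loop disappears.

% Vectorized version: p(n) = prod_{j=2}^{n} (365-j+1)/365 for n = 2,...,60.
nAll  = 2:60;
terms = (365 - nAll + 1) / 365;    % factor contributed by the n-th person
pAll  = cumprod(terms);            % pAll(k) = p(nAll(k))
figure; plot(nAll, pAll, 'linewidth', 2); grid on; box on;
xlabel('n'); ylabel('p(n)');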
Example:
A football team consists of 20 offensive and 20 defensive players.
The players are to be paired into groups of 2 for the purpose of determining
roommates. The pairing is done at random.
(a) What is the probability that there are no offensive-defensive roommate pairs?
(b) What is the probability that there are 2i offensive-defensive roommate pairs?
Denote the answer to (b) by P2i . The answer to (a) is then P0 .
Solution Steps:
1. Total ways of dividing the 40 players into 20 unordered pairs.
2. Ways of pairing 2i offensive-defensive pairs.
3. Ways of pairing 20 − 2i remaining players within each group.
4. Applying multiplicative principle and equally likely outcome assumption.
5. Obtaining numerical probability values.
Solution Steps:
1. Total ways of dividing the 40 players into 20 unordered pairs.
ordered:   C(40; 2, 2, ..., 2) = 40! / (2!)^20

unordered:   40! / [(2!)^20 (20)!]

2. Ways of pairing 2i offensive-defensive pairs.

C(20, 2i) × C(20, 2i) × (2i)!

3. Ways of pairing the 20 − 2i remaining players within each group.

[(20 − 2i)! / (2^{10−i} (10 − i)!)]^2
4. Applying multiplicative principle and equally likely outcome assumption.
P2i = C(20, 2i)^2 (2i)! [ (20 − 2i)! / (2^{10−i} (10 − i)!) ]^2 / [ 40! / (2^{20} 20!) ]

5. Obtaining numerical probability values.

Stirling's approximation:   n! ≈ n^{n+1/2} e^{−n} √(2π)

Be careful!

• Better approximations are available.
• n! could easily overflow, and so could C(n, k).
• The probability never overflows: P ∈ [0, 1].
• Look for possible cancellations.
• Rearrange terms in the numerator and denominator to avoid overflow.
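A minimal MATLAB sketch of such a computation (not from the slides): evaluate P_{2i} through log factorials (gammaln), so no intermediate factorial overflows.

% P_{2i} for i = 0,...,10 via log factorials; log(m!) = gammaln(m+1).
logfac = @(m) gammaln(m + 1);
i = 0:10;
logNum = 2*(logfac(20) - logfac(2*i) - logfac(20-2*i)) ...   % log C(20,2i)^2
       + logfac(2*i) ...                                     % log (2i)!
       + 2*(logfac(20-2*i) - (10-i)*log(2) - logfac(10-i));  % log of the squared bracket
logDen = logfac(40) - 20*log(2) - logfac(20);
P = exp(logNum - logDen);
sum(P)            % sanity check: the probabilities sum to 1
[(2*i)' P']       % number of mixed pairs 2i and P_{2i}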
Conditional Probability and Independence
Chapter 3, Sections 3.1 - 3.5
Introduction, Section 3.1
Conditional Probability: one of the most important concepts in probability:
• Calculating probabilities when partial (side) information is available.
• Computing desired probabilities may be easier.
A Famous Brain-Teaser
• There are 3 doors.
• Behind one door is a prize.
• Behind the other 2 is nothing.
• First, you pick one of the 3 doors at random.
• One of the doors that you did not pick will be opened to reveal nothing.
• Q: Do you stick with the door you chose in the first place, or do you switch?
A Joke: Tom & Jerry Conversation
• Jerry: I've heard that you will take a plane next week. Are you afraid that there might be terrorist activities?

• Tom: Yes. That's exactly why I am prepared.

• Jerry: How?

• Tom: It is reported that the chance of having a bomb on a plane is 1/10^6. According to what I learned in my probability class (4080), the chance of having two bombs on the plane will be 1/10^12.

• Jerry: So?

• Tom: Therefore, to dramatically reduce the chance from 1/10^6 to 1/10^12, I will bring one bomb onto the plane myself!

• Jerry: AH...
Section 3.2, Conditional Probabilities
Example: Suppose we toss a fair die twice.

• What is the probability that the sum of the 2 dice is 8?

Sample space S = {??}
Event E = {??}

• What if we know the first die is 3?

(Reduced) S = {??}
(Reduced) E = {??}
The Basic Formula of Conditional Probability
Two events E, F ⊆ S.

P(E|F): the conditional probability that E occurs given that F occurs.

If P(F) > 0, then

P(E|F) = P(EF) / P(F)
The Basic Formula of Conditional Probability
P(E|F) = P(EF) / P(F)

How to understand this formula? — Venn diagram

Given that F occurs,

• What is the new (i.e., reduced) sample space S′?
• What is the new (i.e., reduced) event E′?
• P(E|F) = |E′| / |S′| = (|E′|/|S|) / (|S′|/|S|) = ??
Example (3.2.2c):
In the card game bridge the 52 cards are dealt out equally to 4 players — called
East, West, North, and South. If North and South have a total of 8 spades among
them, what is the probability that East has 3 of the remaining 5 spades?
The reduced sample space S′ = {??}
The reduced event E ′ = {??}
A transformation of the basic formula
P(E|F) = P(EF) / P(F)

=⇒ P(EF) = P(E|F) × P(F) = P(F|E) × P(E)
P (E|F ) may be easier to compute than P (F |E), or vice versa.
Example:
A total of n balls are sequentially and randomly chosen, without replacement,
from an urn containing r red and b blue balls (n ≤ r + b).
Given that k of the n balls are blue, what is the probability that the first ball
chosen is blue?
Two solutions:
1. Working with the reduced sample space — Easy!
Basically a problem of choosing one ball from n balls randomly.
2. Working with the original sample space and basic formula — More difficult!
Working with original sample space:
Event B : the first ball chosen is blue.
Event Bk : a total of k blue balls are chosen.
Desired probability:

P(B|Bk) = P(B Bk) / P(Bk) = P(Bk|B) P(B) / P(Bk)
P (B) =??
P (Bk ) =??
P (Bk |B) =??
Desired probability:

P(B|Bk) = P(B Bk) / P(Bk) = P(Bk|B) P(B) / P(Bk)

P(B) = b / (r + b),
P(Bk) = C(b, k) C(r, n − k) / C(r + b, n),
P(Bk|B) = C(b − 1, k − 1) C(r, n − k) / C(r + b − 1, n − 1)

Therefore,

P(B|Bk) = P(Bk|B) P(B) / P(Bk)
        = [b / (r + b)] × [C(b − 1, k − 1) C(r, n − k) / C(r + b − 1, n − 1)] × [C(r + b, n) / (C(b, k) C(r, n − k))]
        = k/n
Example (3.2.2f):
Suppose an urn contains 8 red balls and 4 white balls. We draw 2 balls from the
urn without replacement. Assume that at each draw each ball in the urn is equally
likely to be chosen.
Q: What is the probability that both balls drawn are red?
Two solutions:
1. The direct approach
2. The conditional probability approach
Two solutions:
1. The direct approach
C(8, 2) / C(12, 2) = 14/33

2. The conditional probability approach

(8/12) × (7/11) = 14/33
The Multiplication Rule
An extremely important formula in science
P (E1 E2 E3 ...En ) = P (E1 )P (E2 |E1 )P (E3 |E1 E2 )...P (En |E1 E2 ...En−1 )
Proof?
Example (3.2.2g):
An ordinary deck of 52 playing cards is randomly divided into 4 piles of 13 cards
each.
Q: Compute the probability that each pile has exactly 1 ace.
A: Two solutions:
1. Direct approach.
2. Conditional probability (as in the textbook).
107
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Solution by the direct approach
• The size of sample space)
52!
52 39 26
52!
=
=
13! × 13! × 13! × 13!
(13!)4
13 13 13
• Fix 4 aces in four piles
• Only need to select 12 cards for each pile (from 48 cards total)
48 36 24
48!
=
12 12 12
(12!)4
• Consider 4! permutations
• The desired probability
48 36
12 12
52 39
13 13
24
12
26
13
48! (13!)4
× 4! =
4! = 0.1055
52! (12!)4
108
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Solution using the conditional probability
Define
• E1 = {The ace of spades is in any one of the piles }
• E2 = {The ace of spades and the ace of hearts are in different piles }
• E3 = {The ace of spades, hearts, and diamonds are all in different piles }
• E4 = {All four aces are in different piles}
The desired probability
P (E1 E2 E3 E4 )
=P (E1 ) P (E2 |E1 ) P (E3 |E1 E2 ) P (E2 |E1 ) P (E4 |E1 E2 E3 )
109
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
• P (E1 ) = 1
• P (E2 |E1 ) =
39
51
• P (E3 |E1 E2 ) =
26
50
• P (E4 |E1 E2 E3 ) =
13
49
• P (E1 E2 E3 E4 ) =
39 26 13
51 50 49
= 0.1055.
110
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Section 3.3, Bayes' Formula
The basic Bayes' formula
P (E) = P (E|F )P (F ) + P (E|F c ) (1 − P (F ))
Proof:
• By Venn diagram. Or
• By algebra using the conditional probability formula
Example (3.3.3a):
An accident-prone person will have an accident within a fixed one-year period
with probability 0.4. This probability decreases to 0.2 for a non-accident-prone
person. Suppose 30% of the population is accident prone.
Q 1: What is the probability that a random person will have an accident within a
fixed one-year period?
Q 2: Suppose a person has an accident within a year. What is the probability that
he/she is accident prone?
Example:
Let p be the probability that the student knows the answer and 1 − p the
probability that the student guesses. Assume that a student who guesses will be correct with probability 1/m, where m is the total number of multiple-choice alternatives.
Q: What is the conditional probability that a student knew the answer, given that
he/she answered it correctly?
Solution:
Let C = event that the student answers the question correctly.
Let K = event that he/she actually knows the answer.
P (K|C) =??
Solution:
Let C = event that the student answers the question correctly.
Let K = event that he/she actually knows the answer.
P(K|C) = P(KC) / P(C) = P(C|K) P(K) / [P(C|K) P(K) + P(C|K^c) P(K^c)]
       = (1 × p) / (1 × p + (1/m)(1 − p))
       = mp / (1 + (m − 1)p)
————
What happens when m is large?
Example (3.3.3d):
A lab blood test is 95% effective in detecting a certain disease when it is present. However, the test also yields a false positive result for 1% of the healthy people tested. Assume 0.5% of the population actually has the disease.
Q: What is the probability that a person has the disease given that the test result
is positive?
Solution:
Let D = event that the tested person has the disease.
Let E = event that the test result is positive.
P (D|E) = ??
The Generalized Bayes' Formula

Suppose F1, F2, ..., Fn are mutually exclusive events such that ∪_{i=1}^{n} Fi = S. Then

P(E) = Σ_{i=1}^{n} P(E|Fi) P(Fi),

which is easy to understand from the Venn diagram.

P(Fj|E) = P(E Fj) / P(E) = P(E|Fj) P(Fj) / Σ_{i=1}^{n} P(E|Fi) P(Fi)
Example (3.3.3m):
A bin contains 3 different types of disposable flashlights. The probability that a
type 1 flashlight will give over 100 hours of use is 0.7, with the corresponding
probabilities for type 2 and type 3 flashlights being 0.4 and 0.3 respectively.
Suppose that 20% of the flashlights in the bin are type 1, and 30% are type 2,
and 50% are type 3.
Q 1: What is the probability that a randomly chosen flashlight will give more than
100 hours of use?
Q 2: Given the flashlight lasted over 100 hours, what is the conditional probability
that it was type j ?
Q 1: What is the probability that a randomly chosen flashlight will give more than
100 hours of use?
0.7 × 0.2 + 0.4 × 0.3 + 0.3 × 0.5 = 0.14 + 0.12 + 0.15 = 0.41
Q 2: Given the flashlight lasted over 100 hours, what is the conditional probability
that it was type j ?
Type 1: 0.14/0.41 = 0.3415
Type 2: 0.12/0.41 = 0.2927
Type 3: 0.15/0.41 = 0.3659
Independent Events
Definition:
Two events E and F are said to be independent if
P (EF ) = P (E)P (F )
Consequently, if E and F are independent, then
P (E|F ) = P (E),
P (F |E) = P (F ).
Being independent is different from being mutually exclusive!
Check Independence by Verifying P(EF) = P(E)P(F)

Example:   Two coins are flipped and all 4 outcomes are equally likely. Let E be
the event that the first lands heads and F the event that the second lands tails.
Verify that E and F are independent.
Example:
Suppose we toss 2 fair dice. Let E denote the event that the sum of
two dice is 6 and F denote the event that the first die equals 4.
Verify that E and F are not independent.
Proposition 4.1:
If E and F are independent, then so are E and F c .
Proof:
Given P(EF) = P(E)P(F)
Need to show P (EF c ) = P (E)P (F c ).
Which formula contains both P (EF ) and P (EF c )?
Independence of Three Events
Definition:
The three events E, F, and G are said to be independent if
P (EF G) = P (E)P (F )P (G)
P (EF ) = P (E)P (F )
P (EG) = P (E)P (G)
P (F G) = P (F )P (G)
Independence of More Than Three Events
Definition:
The events Ei, i = 1 to n, are said to be independent if, for every subset E1, E2, ..., Er, r ≤ n, of these events,
P (E1 E2 ...Er ) = P (E1 )P (E2 )...P (Er )
Example (3.4.4f):
An infinite sequence of independent trials is to be performed. Each trial results in
a success with probability p and a failure with probability 1 − p.
Q: What is the probability that
1. at least one success occurs in the first n trials.
2. exactly k successes occur in the first n trials.
3. all trials result in successes?
Solution:
Q: What is the probability that
1. at least one success occurs in the first n trials.
1 − (1 − p)^n

2. exactly k successes occur in the first n trials.

C(n, k) p^k (1 − p)^{n−k}

How is this formula related to the binomial theorem?

3. all trials result in successes?

p^n → 0 if p < 1.
Example (3.4.4h):
Independent trials, consisting of rolling a pair of fair dice, are performed. What is
the probability that an outcome of 5 appears before an outcome of 7 when the
outcome is the sum of the dice?
Solution 1:
Let En denote the event that no 5 or 7 appears on the first n − 1 trials and a 5
appears on the nth trial, then the desired probability is
P(∪_{n=1}^{∞} En) = Σ_{n=1}^{∞} P(En)

Why are Ei and Ej mutually exclusive?
P (En ) = P ( no 5 or 7 on the first n − 1 trials AND a 5 on the nth trial)
By independence
P (En ) = P ( no 5 or 7 on the first n − 1 trials)×P ( a 5 on the nth trial)
P ( a 5 on the nth trial) = P (a 5 on any trial)
P(no 5 or 7 on the first n − 1 trials) = [P(no 5 or 7 on any trial)]^{n−1}
P (no 5 or 7 on any trial) = 1 − P (either a 5 OR a 7 on any trial)
P (either a 5 OR a 7 on any trial) = P (a 5 on any trial) + P (a 7 on any trial)
P (En ) = P ( no 5 or 7 on the first n − 1 trials)×P ( a 5 on the nth trial)
P(En) = (1 − 4/36 − 6/36)^{n−1} × (4/36) = (1/9) (13/18)^{n−1}

The desired probability

P(∪_{n=1}^{∞} En) = Σ_{n=1}^{∞} P(En) = (1/9) × 1 / (1 − 13/18) = 2/5.
Solution 2 (using a conditional argument):
Let E = the desired event.
Let F = the event that the first trial is a 5.
Let G = the event that the first trial is a 7.
Let H = the event that the first trial is neither a 5 nor 7.
P (E) = P (E|F )P (F ) + P (E|G)P (G) + P (E|H)P (H)
A General Result:
If E and F are mutually exclusive events of an experiment, then when
independent trials of this experiment are performed, E will occur before F with
probability
P(E) / [P(E) + P(F)]
The Gambler’s Ruin Problem (Example 3.4.4k)
Two gamblers, A and B , bet on the outcomes of successive flips of a coin. On
each flip, if the coin comes up heads, A collects 1 unit from B, whereas if it comes up tails, A pays 1 unit to B. They continue until one of them runs out of money.
If it is assumed that the successive flips of the coin are independent and each flip
results in a head with probability p, what is the probability that A ends up with all
the money if he starts with i units and B starts with N − i units?

Intuition: What is this probability if the coin is fair (i.e., p = 1/2)?
Solution:
E = event that A ends with all the money when he starts with i.
H = event that first flip lands heads.
Pi = P (E) = P (E|H)P (H) + P (E|H c )P (H c )
P0 =??
PN =??
P (E|H) =??
P (E|H c ) =??
Pi = P(E) = P(E|H) P(H) + P(E|H^c) P(H^c)
   = p P_{i+1} + (1 − p) P_{i−1} = p P_{i+1} + q P_{i−1}

=⇒ P_{i+1} = (Pi − q P_{i−1}) / p   =⇒   P_{i+1} − Pi = (q/p)(Pi − P_{i−1})

P2 − P1 = (q/p) P1
P3 − P2 = (q/p)(P2 − P1) = (q/p)^2 P1
...
Pi − P_{i−1} = (q/p)^{i−1} P1
"
Because PN
2
i−1 #
q
q
q
Pi =P1 1 +
+
+ ... +
p
p
p

i 
q
1
−
p


=P1 
if q 6= p

q
1− p
= 1, we know
1 − pq
P1 =
N
1 − pq
i
1 − pq
Pi =
N
1 − pq
(Do we really have to worry about q
if
q 6= p
if
q 6= p
= p = 12 ?)
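A minimal MATLAB sketch (not from the slides; p, N, and i are illustrative values), handling the fair-coin case separately since there P_i = i/N:

% Gambler's ruin: probability that A, starting with i units, ends with all N.
p = 0.48; q = 1 - p; N = 100; i = 50;
if abs(p - q) > eps
    Pi = (1 - (q/p)^i) / (1 - (q/p)^N);
else
    Pi = i / N;
end
fprintf('P_%d = %.6f\n', i, Pi);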
The Matching Problem (Example 3.5.5d)
At a party n men take off their hats. The hats are then mixed up, and each man
randomly selects one. We say that a match occurs if a man selects his own hat.
Q: What is the probability of
(a) no matches;
(b) exactly k matches?
Solution (a): using recursion and conditional probabilities.
1. Let E = event that no matches occur. Denote Pn = P(E).
Let M = event that the first man selects his own hat.

2. Pn = P(E) = P(E|M) P(M) + P(E|M^c) P(M^c)

3. Express Pn as a function of Pn−1, Pn−2, etc.

4. Identify the boundary conditions, e.g., P1 = ?, P2 = ?.

5. Prove Pn, e.g., by induction.
Solution (a): using recursion and conditional probabilities.
1. Let E = event that no matches occur. Denote Pn = P(E).
Let M = event that the first man selects his own hat.

2. Pn = P(E) = P(E|M) P(M) + P(E|M^c) P(M^c)

P(E|M) = 0,   P(M^c) = (n − 1)/n,   so   Pn = [(n − 1)/n] P(E|M^c)

3. Express Pn as a function of Pn−1, Pn−2, etc.

P(E|M^c) = [1/(n − 1)] Pn−2 + Pn−1,   so   Pn = [(n − 1)/n] Pn−1 + (1/n) Pn−2.

4. Identify the boundary conditions.

P1 = 0, P2 = 1/2.

5. Prove Pn by induction.

Pn = 1/2! − 1/3! + 1/4! − ... + (−1)^n / n!

(What is a good approximation of Pn?)
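A quick numeric check in MATLAB (not from the slides): iterate the recursion and compare with e^{-1}, which the alternating series approaches.

% P_n = ((n-1)/n) P_{n-1} + (1/n) P_{n-2}, with P_1 = 0 and P_2 = 1/2.
nMax = 12;
P = zeros(1, nMax);
P(1) = 0; P(2) = 1/2;
for n = 3:nMax
    P(n) = ((n-1)/n) * P(n-1) + (1/n) * P(n-2);
end
fprintf('P_%d = %.6f, exp(-1) = %.6f\n', nMax, P(nMax), exp(-1));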
Solution (b):
1. Fix a group of k men.
2. The probability that these k men select their own hats.
3. The probability that none of the remaining n − k men select their own hats.
4. Multiplicative counting principle.
5. Adjustment for selecting k men.
Solution (b):
1. Fix a group of k men.
2. The probability that these k men select their own hats.
(1/n) (1/(n−1)) (1/(n−2)) ... (1/(n−k+1)) = (n−k)! / n!

3. The probability that none of the remaining n − k men select their own hats.

Pn−k

4. Multiplicative counting principle.

[(n−k)! / n!] × Pn−k

5. Adjustment for selecting k men.

[(n−k)! / n!] Pn−k C(n, k)
P (.|F ) Is A Probability
Conditional probabilities satisfy all the properties of ordinary probabilities.
Proposition 5.1
• 0 ≤ P (E|F ) ≤ 1
• P (S|F ) = 1
• If Ei , i = 1, 2, ..., are mutually exclusive events, then
P(∪_{i=1}^{∞} Ei | F) = Σ_{i=1}^{∞} P(Ei|F)
Conditionally Independent Events
Definition:
Two events, E1 and E2 , are said to be conditionally independent given event F , if
P (E1 |E2 F ) = P (E1 |F )
or, equivalently
P (E1 E2 |F ) = P (E1 |F )P (E2 |F )
Laplace’s Rule of Succession (Example 3.5.5e)
There are k + 1 coins in a box. The ith coin will, when flipped, turn up heads with probability i/k, i = 0, 1, ..., k. A coin is randomly selected from the box and is then repeatedly flipped.

Q: If the first n flips all result in heads, what is the conditional probability that the (n + 1)st flip will do likewise?

Key: Conditioning on the event that the ith coin is selected, the outcomes will be conditionally independent.
Solution:
Let Ci = event that the ith coin is selected.
Let Fn = event the first n flips all result in heads.
Let H = event that the (n + 1)st flip is a head.
The desired probability:

P(H|Fn) = Σ_{i=0}^{k} P(H|Fn Ci) P(Ci|Fn)

By conditional independence, P(H|Fn Ci) = ??

Solve P(Ci|Fn) using Bayes' formula.
The desired probability:

P(H|Fn) = Σ_{i=0}^{k} P(H|Fn Ci) P(Ci|Fn)

By conditional independence

P(H|Fn Ci) = P(H|Ci) = i/k

Using Bayes' formula

P(Ci|Fn) = P(Ci Fn) / P(Fn) = P(Fn|Ci) P(Ci) / Σ_{j=0}^{k} P(Fn|Cj) P(Cj) = (i/k)^n (1/(k+1)) / [Σ_{j=0}^{k} (j/k)^n (1/(k+1))]

The final answer:

P(H|Fn) = Σ_{j=0}^{k} (j/k)^{n+1} / Σ_{j=0}^{k} (j/k)^n
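A minimal numeric check in MATLAB (not from the slides; k and n are illustrative): for large k the answer is close to (n + 1)/(n + 2), the classical Laplace rule of succession value.

% Evaluate P(H|F_n) for the coin-box example.
k = 1000; n = 4;
j = 0:k;
PH = sum((j/k).^(n+1)) / sum((j/k).^n);
fprintf('P(H|F_n) = %.4f, (n+1)/(n+2) = %.4f\n', PH, (n+1)/(n+2));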
Chapter 4: (Discrete) Random Variables
Definition:
A random variable is a function from the sample space S to the real numbers.
A discrete random variable is a random variable that takes on only a finite (or at most a countably infinite) number of values.
Example (1a):
Suppose the experiment consists of tossing 3 fair coins. Let Y denote the
number of heads appearing, then Y is a discrete random variable taking on one
of the values 0, 1, 2, 3 with probabilities
P(Y = 0) = P{(T, T, T)} = 1/8

P(Y = 1) = P{(H, T, T), (T, H, T), (T, T, H)} = 3/8

P(Y = 2) = P{(T, H, H), (H, T, H), (H, H, T)} = 3/8

P(Y = 3) = P{(H, H, H)} = 1/8

Check

P(∪_{i=0}^{3} {Y = i}) = Σ_{i=0}^{3} P{Y = i} = 1.
Example (1c)
Independent trials, consisting of the flipping of a coin having probability p of
coming up heads, are continuously performed until either a head occurs or a total
of n flips is made. If we let X denote the number of times the coin is flipped, then
X is a random variable taking on one of the values 1, 2, ..., n.
P (X = 1) = P {H} = p
P (X = 2) = P {(T, H)} = (1 − p)p
P(X = 3) = P{(T, T, H)} = (1 − p)^2 p
...
P(X = n − 1) = (1 − p)^{n−2} p
P(X = n) = (1 − p)^{n−1}

Check:   Σ_{i=1}^{n} P(X = i) = 1??
Example (1d)
3 balls are randomly chosen from an urn containing 3 white, 3 red, and 5 black
balls. Suppose that we win $1 for each white ball selected and lose $1 for each
red ball selected. Let X denote our total winnings from the experiments.
Q 1: What possible values can X take on?
Q 2: What are the respective probabilities?
Solution 1: What possible values can X take on?
X ∈ {3, 2, 1, 0, −1, −2, −3}.
Solution 2: What are the respective probabilities?
P(X = 0) = [C(5, 3) + C(3, 1) C(3, 1) C(5, 1)] / C(11, 3) = 55/165 = 1/3

P(X = 3) = P(X = −3) = C(3, 3) / C(11, 3) = 1/165
The Coupon Collection Problem (Example 1e)
There are N distinct types of coupons and each time one obtains a coupon
equally likely from one of the N types, independent of prior selections.
A random variable T = number of coupons that need to be collected until one obtains a complete set with at least one of each type.

Q: What is P(T = n)?
Solution:
1. It suffices to study P(T > n), because P(T = n) = P(T > n − 1) − P(T > n).

2. Define event Aj = no type j coupon is collected among the first n.

3. P(T > n) = P(∪_{j=1}^{N} Aj)

Then what?

P(∪_{j=1}^{N} Aj) = Σ_{j} P(Aj) − Σ_{j1<j2} P(Aj1 Aj2)
                  + Σ_{j1<j2<j3} P(Aj1 Aj2 Aj3) + ... + (−1)^{N+1} P(A1 A2 ... AN)
P(Aj) = ((N − 1)/N)^n

P(Aj1 Aj2) = ((N − 2)/N)^n

...

P(Aj1 Aj2 ... Ajk) = ((N − k)/N)^n

...

P(Aj1 Aj2 ... Aj_{N−1}) = (1/N)^n

P(A1 A2 ... AN) = 0

P(T > n) = P(∪_{j=1}^{N} Aj)
         = Σ_{j} P(Aj) − Σ_{j1<j2} P(Aj1 Aj2) + Σ_{j1<j2<j3} P(Aj1 Aj2 Aj3) + ... + (−1)^{N+1} P(A1 A2 ... AN)
         = N ((N − 1)/N)^n − C(N, 2) ((N − 2)/N)^n + C(N, 3) ((N − 3)/N)^n + ... + (−1)^N C(N, N − 1) (1/N)^n
         = Σ_{j=1}^{N−1} (−1)^{j+1} C(N, j) ((N − j)/N)^n
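A minimal numeric check in MATLAB (not from the slides; N and n are illustrative values):

% P(T > n) by inclusion-exclusion, and P(T = n) = P(T > n-1) - P(T > n).
N = 10; n = 30;
PTgt = zeros(1, 2);                  % will hold P(T > n-1) and P(T > n)
for idx = 1:2
    m = n - 2 + idx;                 % m = n-1, then m = n
    s = 0;
    for j = 1:N-1
        s = s + (-1)^(j+1) * nchoosek(N, j) * ((N - j)/N)^m;
    end
    PTgt(idx) = s;
end
fprintf('P(T > %d) = %.4f, P(T = %d) = %.4f\n', n, PTgt(2), n, PTgt(1) - PTgt(2));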
Cumulative Distribution Function
For a random variable X , its cumulative distribution function F is defined as
F (x) = P {X ≤ x}
F (x) is a non-decreasing function:
If x ≤ y, then F(x) ≤ F(y).
Section 4.2: Discrete Random Variables
Discrete random variable
A random variable X that takes on at most a
countable number of possible values.

The probability mass function:
p(a) = P{X = a}
If X takes one of the values x1, x2, ..., then
p(xi) ≥ 0,     Σ_{i=1}^{∞} p(xi) = 1
156
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Example (4.2.2a): The probability mass function of a random variable X is given by
p(i) = c λ^i / i!,     i = 0, 1, 2, ...
Q: Find
• the relation between c and λ.
• P {X = 0};
• P {X > 2};
157
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Solution:
• The relation between c and λ:
  Σ_{i=0}^{∞} p(i) = 1  ⇒  c Σ_{i=0}^{∞} λ^i / i! = 1  ⇒  c e^λ = 1  ⇒  c = e^{−λ}.
• P{X = 0}:
  P{X = 0} = c = e^{−λ}.
• P{X > 2}:
  P{X > 2} = 1 − P{X = 0} − P{X = 1} − P{X = 2}
           = 1 − c − cλ − c λ^2/2
           = 1 − e^{−λ} − λ e^{−λ} − λ^2 e^{−λ}/2.
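A quick numeric check of this calculation; it assumes poisspdf from the Statistics Toolbox, and λ = 1.5 is only an illustrative value.

% Sketch: compare the closed form above with a direct pmf sum.
lambda = 1.5;
[1 - sum(poisspdf(0:2, lambda)),  1 - exp(-lambda)*(1 + lambda + lambda^2/2)]   % both are P{X > 2}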
158
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Section 4.3: Expected Value
Definition:
If X is a discrete random variable having a probability mass function p(x), the
expected value of X is defined by
E(X) = Σ_{x: p(x)>0} x p(x)
The expected value is a weighted average of possible values that X can take on.
159
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li

Example    Find E(X), where X is the outcome when we roll a fair die.

E(X) = Σ_{i=1}^{6} i (1/6) = (1/6) Σ_{i=1}^{6} i = 7/2

Example    Let I be an indicator variable for event A:
I = 1 if A occurs,  I = 0 if A^c occurs.
E(I) = ??
160
Cornell University, BTRY 4080 / STSCI 4080
Example (3d)
Fall 2009
Instructor: Ping Li
A school class of 120 students are driven in 3 buses to a
symphony. There are 36 students in one of the buses, 40 in another, and 44 in the
third bus. When the buses arrive, one of the 120 students is randomly chosen.
Let X denote the number of students on the bus of that randomly chosen
student, find E(X).
Solution:
X = {??}
P (X) =??
161
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Example (4a)
P{X = −1} = 0.2,     P{X = 0} = 0.5,     P{X = 1} = 0.3.
Find E(X^2).

Solution: Let Y = X^2.
P{Y = 0} = P{X = 0} = 0.5
P{Y = 1} = P{X = 1 OR X = −1} = P{X = 1} + P{X = −1} = 0.3 + 0.2 = 0.5
Therefore,
E(X^2) = E(Y) = 0 × 0.5 + 1 × 0.5 = 0.5
162
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Verify:
Σ_i P{X = xi} xi^2 = P{X = −1}(−1)^2 + P{X = 0}(0)^2 + P{X = 1}(1)^2 = 0.5 = E(X^2)
163
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Section 4.4: Expectation of a Function of Random Variable
Proposition 4.1    If X is a discrete random variable that takes on one of the
values xi, i ≥ 1, with respective probabilities p(xi), then for any real-valued
function g,
E(g(X)) = Σ_i g(xi) p(xi)

Proof:
Group the i's with the same g(xi).
164
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Proof:
Group the i’s with the same g(xi ).
Σ_i g(xi) p(xi) = Σ_j Σ_{i: g(xi)=yj} g(xi) p(xi)
               = Σ_j yj Σ_{i: g(xi)=yj} p(xi)
               = Σ_j yj P{g(X) = yj}
               = E(g(X)).
165
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li

Corollary 4.1    If a and b are constants, then
E(aX + b) = aE(X) + b.
Proof:   Use the definition of E(X).

The nth moment:
E(X^n) = Σ_{x: p(x)>0} x^n p(x)
166
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Section 4.5: Variance
The expectation E(X) measures the average of possible values of X .
The variance V ar(X) measures the variation, or spread of these values.
Definition
If X is a random variable with mean µ, then the variance of X is
defined by
Var(X) = E[(X − µ)^2]
167
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
An alternative formula for Var(X):
Var(X) = E(X^2) − (E(X))^2 = E(X^2) − µ^2

Proof:
Var(X) = E[(X − µ)^2]
       = Σ_x (x − µ)^2 p(x)
       = Σ_x (x^2 − 2µx + µ^2) p(x)
       = Σ_x x^2 p(x) − 2µ Σ_x x p(x) + µ^2 Σ_x p(x)
       = E(X^2) − 2µ^2 + µ^2
       = E(X^2) − µ^2.
168
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Two important identities
If a and b are constants, then
V ar(a) = V ar(b) = 0
V ar(aX + b) = a2 V ar(X)
Proof: Use the definitions.
Instructor: Ping Li
169
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
E(aX + b) = aE(X) + b
Var(aX + b) = E[((aX + b) − E(aX + b))^2]
            = E[((aX + b) − aE(X) − b)^2]
            = E[(aX − aE(X))^2]
            = a^2 E[(X − E(X))^2]
            = a^2 Var(X)
170
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Section 4.6: Bernoulli and Binomial Random Variables
Bernoulli random variable
X ∼ Bernoulli(p)
P {X = 1} = p(1) = p,
P {X = 0} = p(0) = 1 − p.
Properties:
E(X) = p
V ar(X) = p(1 − p)
171
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Binomial random variable
Definition    A random variable X represents the total number of successes that
occur in n independent trials. Each trial results in a success with probability p and
a failure with probability 1 − p.
X ∼ Binomial(n, p).

The probability mass function:
p(i) = P{X = i} = C(n, i) p^i (1 − p)^{n−i},     i = 0, 1, ..., n

• Proof?
• Σ_{i=0}^{n} p(i) = 1?
172
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
A very useful representation:
X ∼ binomial(n, p) can be viewed as the sum of n independent Bernoulli
random variables with parameter p.
X = Σ_{i=1}^{n} Yi,     Yi ∼ Bernoulli(p)
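A brief sketch of this representation in base Matlab: sum n independent Bernoulli(p) indicators and compare the sample mean and variance with np and np(1 − p); the parameter values are illustrative.

% Sketch: Binomial(n,p) as a sum of n independent Bernoulli(p) variables.
n = 20; p = 0.2; trials = 100000;
Y = rand(trials, n) < p;        % each row: n independent Bernoulli(p) outcomes
X = sum(Y, 2);                  % row sums are Binomial(n,p) draws
[mean(X), n*p]                  % sample mean vs. E(X) = np
[var(X),  n*p*(1-p)]            % sample variance vs. Var(X) = np(1-p)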
173
Cornell University, BTRY 4080 / STSCI 4080
Example:
Fall 2009
Instructor: Ping Li
Five fair coins are flipped. If the outcomes are assumed
independent, find the probability mass function of the numbers of heads obtained.
Solution:
• X = number of heads obtained.
• X ∼ binomial(n, p).
• P {X = i} =?
n=??,
p=??
174
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Properties of X
Expectation:
Variance:
Proof:??
Instructor: Ping Li
∼ binomial(n, p)
E(X) = np.
V ar(X) = np(1 − p).
175
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
E(X) = Σ_{i=0}^{n} i C(n, i) p^i (1 − p)^{n−i}
     = Σ_{i=0}^{n} i [ n! / (i!(n − i)!) ] p^i (1 − p)^{n−i}
     = Σ_{i=1}^{n} i [ n! / (i!(n − i)!) ] p^i (1 − p)^{n−i}
     = np Σ_{i=1}^{n} [ (n − 1)! / ((i − 1)!((n − 1) − (i − 1))!) ] p^{i−1} (1 − p)^{(n−1)−(i−1)}
     = np Σ_{j=0}^{n−1} [ (n − 1)! / (j!((n − 1) − j)!) ] p^j (1 − p)^{(n−1)−j}
     = np
176
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
E(X^2) = Σ_{i=0}^{n} i^2 C(n, i) p^i (1 − p)^{n−i} = Σ_{i=1}^{n} i^2 [ n! / (i!(n − i)!) ] p^i (1 − p)^{n−i}
       = np Σ_{i=1}^{n} i [ (n − 1)! / ((i − 1)!((n − 1) − (i − 1))!) ] p^{i−1} (1 − p)^{(n−1)−(i−1)}
       = np Σ_{j=0}^{n−1} (j + 1) [ (n − 1)! / (j!((n − 1) − j)!) ] p^j (1 − p)^{(n−1)−j}
       = np E(Y + 1),       Y ∼ binomial(n − 1, p)
       = np((n − 1)p + 1)

Var(X) = E(X^2) − (E(X))^2 = np((n − 1)p + 1) − (np)^2 = np(1 − p)
177
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
What Happens When n → ∞ and np → λ?
Instructor: Ping Li

X ∼ binomial(n, p),     n → ∞ and np → λ

P{X = i} = [ n! / ((n − i)! i!) ] p^i (1 − p)^{n−i}
         = [ n(n − 1)(n − 2)...(n − i + 1) / i! ] p^i (1 − p)^n / (1 − p)^i
         = [ n(n − 1)(n − 2)...(n − i + 1) / (i! (1 − p)^i n^i) ] (1 − p)^n (np)^i
         ≈ e^{−λ} λ^i / i!
178
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li

lim_{n→∞} (1 − λ/n)^n = e^{−λ}

For fixed (small) i,
lim_{n→∞} n(n − 1)(n − 2)...(n − i + 1) / (n − λ)^i = 1
179
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Section 4.7: The Poisson Random Variable
Definition:
A random variable X, taking on one of the values 0, 1, 2, ..., is a
Poisson random variable with parameter λ if, for some λ > 0,
p(i) = P{X = i} = e^{−λ} λ^i / i!,     i = 0, 1, 2, 3, ...

Check:   Σ_{i=0}^{∞} p(i) = Σ_{i=0}^{∞} e^{−λ} λ^i / i! = 1??
180
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Facts about Poisson Distribution
• Poisson(λ) approximates binomial (n, p), when n large and np ≈ λ.
C(n, i) p^i (1 − p)^{n−i} ≈ e^{−λ} λ^i / i!
• Poisson is widely used for modeling rare (p ≈ 0) events (but not too rare).
– The number of misprints on a page of a book
– The number of people in a community living to 100 years of age
– ...
181
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Compare Binomial with Poisson
[Two bar plots of p(i) versus i, 0 ≤ i ≤ 20, overlaying Binomial(n = 20, p = 0.2) and Poisson(λ = np = 4).]

Poisson approximation is good:
• n does not have to be too large.
• p does not have to be too small.
182
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Matlab Code
function comp_binom_poisson(n,p)
i=0:n; P1=binopdf(i,n,p); P2=poisspdf(i,n*p);
figure; bar(i,P1,'g'); hold on; grid on;
bar(i,P2,'r'); legend('Binomial','Poisson');
xlabel('i'); ylabel('p(i)');
axis([-0.5 n 0 max(P1)+0.05]);
title(['n = ' num2str(n) '  p = ' num2str(p)]);
figure; bar(i,P2,'r'); hold on; grid on;
bar(i,P1,'g'); legend('Poisson','Binomial');
xlabel('i'); ylabel('p(i)');
axis([-0.5 n 0 max(P1)+0.05]);
title(['n = ' num2str(n) '  p = ' num2str(p)]);
183
Cornell University, BTRY 4080 / STSCI 4080
Example
Fall 2009
Instructor: Ping Li
Suppose that the number of typographical errors on a single page
of a book has a Poisson distribution with parameter λ = 1/2.
Q: Calculate the probability that there is at least one error on this page.
P(X ≥ 1) = 1 − P(X = 0) = 1 − e^{−1/2} ≈ 0.393.
184
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Expectation and Variance of Poisson
X ∼ Poisson(λ),     Y ∼ binomial(n, p)
X approximates Y when (1) n large, (2) p small, and (3) np ≈ λ.
E(Y) = np,     Var(Y) = np(1 − p)
Guess:   E(X) = ??,   Var(X) = ??
185
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
X ∼ Poisson(λ)

E(X) = Σ_{i=1}^{∞} i e^{−λ} λ^i / i! = λ Σ_{i=1}^{∞} e^{−λ} λ^{i−1} / (i − 1)! = λ Σ_{j=0}^{∞} e^{−λ} λ^j / j! = λ

E(X^2) = Σ_{i=1}^{∞} i^2 e^{−λ} λ^i / i! = λ Σ_{i=1}^{∞} i e^{−λ} λ^{i−1} / (i − 1)!
       = λ Σ_{j=0}^{∞} (j + 1) e^{−λ} λ^j / j! = λ (E(X) + 1) = λ^2 + λ

Var(X) = λ^2 + λ − λ^2 = λ
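A Monte Carlo sketch of these two facts; it assumes poissrnd from the Statistics Toolbox, and λ and the number of replications are illustrative.

% Sketch: check E(X) = lambda and Var(X) = lambda by simulation.
lambda = 3; trials = 100000;
X = poissrnd(lambda, trials, 1);
[mean(X), lambda]        % sample mean vs. lambda
[var(X),  lambda]        % sample variance vs. lambda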
186
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Section 4.8.1: Geometric Random Variable
Definition:   Suppose that independent trials, each having probability p of being a
success, are performed until a success occurs. X is the number of trials required. Then
P{X = n} = (1 − p)^{n−1} p,     n = 1, 2, 3, ...

Check:   Σ_{n=1}^{∞} P{X = n} = Σ_{n=1}^{∞} (1 − p)^{n−1} p = 1??
187
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li

Expectation     X ∼ Geometric(p).

E(X) = Σ_{n=1}^{∞} n (1 − p)^{n−1} p          (q = 1 − p)
     = p + 2qp + 3q^2 p + 4q^3 p + 5q^4 p + ...
     = p (1 + 2q + 3q^2 + 4q^3 + 5q^4 + ...) = ??

We know
1 + q + q^2 + q^3 + q^4 + ... = 1/(1 − q)

How about
I = 1 + 2q + 3q^2 + 4q^3 + 5q^4 + ... = ??
188
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
I = 1 + 2q + 3q^2 + 4q^3 + 5q^4 + ... = ??

First trick:
qI = q + 2q^2 + 3q^3 + 4q^4 + 5q^5 + ...
I − qI = 1 + q + q^2 + q^3 + q^4 + ... = 1/(1 − q)
⇒ I = 1/(1 − q)^2
⇒ E(X) = pI = p/(1 − q)^2 = 1/p
189
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
I = 1 + 2q + 3q^2 + 4q^3 + 5q^4 + ... = ??

Second trick:
I = d/dq (q + q^2 + q^3 + q^4 + q^5 + ...)
  = d/dq [1/(1 − q)]          (since q + q^2 + q^3 + ... = 1/(1 − q) − 1)
  = 1/(1 − q)^2
Instructor: Ping Li
190
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li

Variance     X ∼ Geometric(p).

E(X^2) = Σ_{n=1}^{∞} n^2 (1 − p)^{n−1} p          (q = 1 − p)
       = p (1 + 2^2 q + 3^2 q^2 + 4^2 q^3 + 5^2 q^4 + ...)

J = 1 + 2^2 q + 3^2 q^2 + 4^2 q^3 + 5^2 q^4 + ... = ??
191
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
J = 1 + 2^2 q + 3^2 q^2 + 4^2 q^3 + 5^2 q^4 + ... = ??

Trick 1:
qJ = q + 2^2 q^2 + 3^2 q^3 + 4^2 q^4 + 5^2 q^5 + ...
J(1 − q) = 1 + (2^2 − 1^2)q + (3^2 − 2^2)q^2 + (4^2 − 3^2)q^3 + ... = 1 + 3q + 5q^2 + 7q^3 + ...
J(1 − q)q = q + 3q^2 + 5q^3 + 7q^4 + ...
J(1 − q)^2 = 1 + 2q + 2q^2 + 2q^3 + ... = −1 + 2/(1 − q) = (1 + q)/(1 − q)
J = (1 + q)/(1 − q)^3

E(X^2) = pJ = p(2 − p)/p^3 = (2 − p)/p^2
Var(X) = (1 − p)/p^2
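A small simulation sketch (base Matlab) that draws Geometric(p) values by the inverse-transform method and compares the sample mean and variance with 1/p and (1 − p)/p^2; p is an arbitrary illustrative value.

% Sketch: Geometric(p) via inverse transform; P{X = n} = (1-p)^(n-1) p, n = 1,2,...
p = 0.3; trials = 100000;
X = ceil( log(rand(trials,1)) ./ log(1-p) );
[mean(X), 1/p]
[var(X),  (1-p)/p^2]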
192
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
J = 1 + 2^2 q + 3^2 q^2 + 4^2 q^3 + 5^2 q^4 + ... = ??

Trick 2:
J = Σ_{n=1}^{∞} n^2 q^{n−1} = Σ_{n=1}^{∞} d/dq (n q^n)
  = d/dq [ Σ_{n=1}^{∞} n q^n ] = d/dq [ (q/p) E(X) ]          (since Σ_{n=1}^{∞} n q^{n−1} p = E(X) and p = 1 − q)
  = d/dq [ q/(1 − q)^2 ]
  = (1 + q)/(1 − q)^3
Instructor: Ping Li
193
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Section 4.8.2: Negative Binomial Distribution
Definition:   Independent trials, each having probability p of being a success, are
performed until a total of r successes is accumulated. Let X denote the number of
trials required. Then X is a negative binomial random variable:
X ∼ NB(r, p)

X ∼ Geometric(p) is the same as X ∼ NB(1, p).

What is the probability mass function of X ∼ NB(r, p)?
194
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li

P{X = n} = C(n − 1, r − 1) p^r (1 − p)^{n−r},     n = r, r + 1, r + 2, ...

E(X^k) = Σ_{n=r}^{∞} n^k C(n − 1, r − 1) p^r (1 − p)^{n−r}
       = Σ_{n=r}^{∞} n^k [ (n − 1)! / ((r − 1)!(n − r)!) ] p^r (1 − p)^{n−r}

Now what?
195
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li

E(X^k) = Σ_{n=r}^{∞} n^{k−1} [ n! / ((r − 1)!(n − r)!) ] p^r (1 − p)^{n−r}
       = (r/p) Σ_{n=r}^{∞} n^{k−1} [ n! / (r!(n − r)!) ] p^{r+1} (1 − p)^{n−r}          (m = n + 1)
       = (r/p) Σ_{m=r+1}^{∞} (m − 1)^{k−1} [ (m − 1)! / (r!(m − 1 − r)!) ] p^{r+1} (1 − p)^{m−1−r}
       = (r/p) E[(Y − 1)^{k−1}],          Y ∼ NB(r + 1, p)

E(X) = ??,   E(X^2) = ??,   Var(X) = ??
196
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
E(X^k) = (r/p) E[(Y − 1)^{k−1}],          Y ∼ NB(r + 1, p)

Let k = 1. Then
E(X) = (r/p) E[(Y − 1)^0] = r/p

Let k = 2. Then
E(X^2) = (r/p) E[Y − 1] = (r/p) [ (r + 1)/p − 1 ]

Var(X) = E(X^2) − (E(X))^2 = r(1 − p)/p^2
197
Cornell University, BTRY 4080 / STSCI 4080
Example (4.8.8a)
Fall 2009
Instructor: Ping Li
An urn contains N white and M black balls. Balls are
randomly selected, one at a time, until a black one is obtained. Each selected ball
is replaced before the next one is drawn.
Q(a): What is the probability that exactly n draws are needed?
Q(b): What is the expected number of draws needed?
Q(c): What is the probability that at least k draws are needed?
198
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Solution: This is a geometric random variable with p = M/(M + N).

Q(a): What is the probability that exactly n draws are needed?
P(X = n) = (1 − p)^{n−1} p = (N/(M + N))^{n−1} · M/(M + N) = M N^{n−1} / (M + N)^n

Q(b): What is the expected number of draws needed?
E(X) = 1/p = (M + N)/M = 1 + N/M

Q(c): What is the probability that at least k draws are needed?
P(X ≥ k) = Σ_{n=k}^{∞} (1 − p)^{n−1} p = p(1 − p)^{k−1} / (1 − (1 − p)) = (N/(M + N))^{k−1}
199
Cornell University, BTRY 4080 / STSCI 4080
Example (4.8.8e):
Fall 2009
Instructor: Ping Li
A pipe-smoking mathematician carries 2 matchboxes, 1 in his left-hand pocket and 1 in
his right-hand pocket. Each time he needs a match he is equally likely to take it from
either pocket. Consider the moment when the mathematician first discovers that one of
his matchboxes is empty. If it is assumed that both matchboxes initially contained N
matches, what is the probability that there are exactly k matches in the other box,
k = 0, 1, ..., N?
200
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Solution:
1. Assume the left-hand pocket is empty.
2.
X = total matches taken at the moment when left-hand pocket is empty.
3. View X as a random variable X
∼ N B(r, p). What is r , p?
4. Remember there are k matches remaining in the right-hand pocket.
5. The desired probability = P {X
= n}. What is n?
6. Adjustment by considering either left-hand or right-hand can be empty.
201
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Solution:
1. Assume the left-hand pocket is empty.
2. X = total matches taken at the moment the left-hand pocket is found empty.
3. View X as a random variable X ∼ NB(r, p). What are r, p?   p = 1/2, r = N + 1.
4. Remember there are k matches remaining in the right-hand pocket.
5. The desired probability = P{X = n}. What is n?   n = 2N − k + 1.
6. Adjust for the fact that either the left-hand or the right-hand box can be the empty one:
   2 · C(2N − k, N) (1/2)^{2N−k+1}.
202
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Section 4.8.3: Hypergeometric Distribution
Definition    A sample of size n is chosen randomly without replacement from an urn
containing N balls, of which m are white and N − m are black. Let X denote the
number of white balls selected; then X ∼ HG(N, m, n) and
P{X = i} = C(m, i) C(N − m, n − i) / C(N, n),     i = 0, 1, ..., n,
which is nonzero only for n − (N − m) ≤ i ≤ min(n, m).
203
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
If the n balls are chosen with replacement, then X is binomial(n, p = m/N).
When m and N are large relative to n,
X ∼ HG(N, m, n) ≈ binomial(n, m/N)
E(X) ≈ np = nm/N   ??
Var(X) ≈ np(1 − p) = (nm/N)(1 − m/N)   ??
204
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li

E(X^k) = Σ_{i=1}^{n} i^k C(m, i) C(N − m, n − i) / C(N, n)
       = Σ_{i=1}^{n} i^{k−1} [ m! / ((i − 1)!(m − i)!) ] [ (N − m)! / ((n − i)!(N − m − n + i)!) ] [ n!(N − n)! / N! ]
       = Σ_{j=0}^{n−1} (j + 1)^{k−1} [ m! / (j!(m − 1 − j)!) ] [ (N − m)! / ((n − 1 − j)!(N − m − n + 1 + j)!) ] [ n!(N − n)! / N! ]
       = (nm/N) E[(Y + 1)^{k−1}],          Y ∼ HG(N − 1, m − 1, n − 1)
205
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Section 4.9: Properties of Cumulative Distribution Function
F(x) = P{X ≤ x}
1. F(x) is a non-decreasing function: if a < b, then F(a) ≤ F(b).
2. lim_{x→∞} F(x) = 1.
3. lim_{x→−∞} F(x) = 0.
4. F(x) is right continuous.
206
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Chapter 5: Continuous Random Variables
Definition:   X is a continuous random variable if there exists a non-negative
function f, defined for all real x ∈ (−∞, ∞), having the property that for any
set B of real numbers,
P{X ∈ B} = ∫_B f(x) dx
The function f is called the probability density function of the random variable X.
207
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Some Facts
• P{X ∈ (−∞, ∞)} = ∫_{−∞}^{∞} f(x) dx = 1
• P{X ∈ (a, b)} = P{a ≤ X ≤ b} = ∫_a^b f(x) dx
• P{X = a} = ∫_a^a f(x) dx = 0
• F(a) = P{X ≤ a} = P{X < a} = ∫_{−∞}^{a} f(x) dx
• F(∞) = 1,   F(−∞) = 0
• f(x) = d/dx F(x)
208
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li

Example 5.1.1a:    Suppose that X is a continuous random variable whose
probability density function is given by
f(x) = C(4x − 2x^2) for 0 < x < 2,  and f(x) = 0 otherwise.
Q(a): What is the value of C?
Q(b): Find P{X > 1}.
209
Cornell University, BTRY 4080 / STSCI 4080
Example 5.1.1d:
Fall 2009
Instructor: Ping Li
Denote FX and fX the cumulative function and the
density function of a continuous random variable X , respectively. Find the density
function of Y
= 2X .
210
Cornell University, BTRY 4080 / STSCI 4080
Example 5.1.1d:
Fall 2009
Instructor: Ping Li
Denote FX and fX the cumulative function and the
density function of a continuous random variable X , respectively. Find the density
function of Y
= 2X .
Solution:
F_Y(y) = P{Y ≤ y} = P{2X ≤ y} = P{X ≤ y/2} = F_X(y/2)
f_Y(y) = d/dy F_Y(y) = d/dy F_X(y/2) = (1/2) f_X(y/2)
211
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Derivatives of Common Functions
http://en.wikipedia.org/wiki/Derivative

• Exponential and logarithmic functions      (Notation: log x = log_e x)
  d/dx e^x = e^x,     d/dx a^x = a^x log a,
  d/dx log x = 1/x,   d/dx log_a x = 1/(x log a)
• Trigonometric functions
  d/dx sin x = cos x,   d/dx cos x = −sin x,   d/dx tan x = sec^2 x
• Inverse trigonometric functions
  d/dx arcsin x = 1/√(1 − x^2),   d/dx arccos x = −1/√(1 − x^2),   d/dx arctan x = 1/(1 + x^2)
212
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Integration by Parts

∫_a^b f(x) dx = f(x) x |_a^b − ∫_a^b x f′(x) dx

∫_a^b f(x) d[g(x)] = f(x)g(x) |_a^b − ∫_a^b g(x) f′(x) dx

∫_a^b f(x)h(x) dx = ∫_a^b f(x) d[H(x)] = f(x)H(x) |_a^b − ∫_a^b H(x) f′(x) dx,
where h(x) = H′(x).
213
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Examples:

∫_a^b log x dx = x log x |_a^b − ∫_a^b x [log x]′ dx
              = b log b − a log a − ∫_a^b x (1/x) dx
              = b log b − a log a − b + a.

∫_a^b x log x dx = (1/2) ∫_a^b log x d[x^2] = (1/2) x^2 log x |_a^b − (1/2) ∫_a^b x^2 [log x]′ dx
                = (1/2) b^2 log b − (1/2) a^2 log a − (1/2) ∫_a^b x dx
                = (1/2) b^2 log b − (1/2) a^2 log a − (1/4)(b^2 − a^2)
214
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li

∫_a^b x^n e^x dx = ∫_a^b x^n d[e^x] = x^n e^x |_a^b − n ∫_a^b e^x x^{n−1} dx

∫_a^b x^n cos x dx = ∫_a^b x^n d[sin x] = x^n sin x |_a^b − n ∫_a^b sin x · x^{n−1} dx

∫_a^b sin^n x sin 2x dx = 2 ∫_a^b sin^{n+1} x cos x dx = 2 ∫_a^b sin^{n+1} x d[sin x] = (2/(n + 2)) sin^{n+2} x |_a^b
215
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Expectation
Definition:   Given a continuous random variable X with density f(x), its
expectation is defined as
E(X) = ∫_{−∞}^{∞} x f(x) dx
216
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li

Example:    The density of X is given by
f(x) = 2x if 0 ≤ x ≤ 1,  and f(x) = 0 otherwise.
Find E(e^X).

Two solutions:
1. • Find the density function of Y = e^X, denoted by f_Y(y).
   • E(Y) = ∫_{−∞}^{∞} y f_Y(y) dy.
2. Apply the formula
   E[g(X)] = ∫_{−∞}^{∞} g(x) f(x) dx
217
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li

Solution 1:    Y = e^X.
F_Y(y) = P{Y ≤ y} = P{e^X ≤ y} = P{X ≤ log y} = ∫_0^{log y} 2x dx = (log y)^2
Therefore,
f_Y(y) = (2/y) log y,     1 ≤ y ≤ e
E(Y) = ∫_{−∞}^{∞} y f_Y(y) dy = ∫_1^e 2 log y dy
     = 2y log y |_1^e − ∫_1^e 2y (1/y) dy = 2
218
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Solution 2:
E(e^X) = ∫_0^1 2x e^x dx = ∫_0^1 2x d[e^x]
       = 2x e^x |_0^1 − 2 ∫_0^1 e^x dx
       = 2e − 2(e − 1) = 2
219
Cornell University, BTRY 4080 / STSCI 4080
Lemma 5.2.2.1
Fall 2009
Instructor: Ping Li
For a non-negative random variable X,
E(X) = ∫_0^∞ P{X > x} dx
220
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Lemma 5.2.2.1    For a non-negative random variable X,
E(X) = ∫_0^∞ P{X > x} dx

Proof:
∫_0^∞ P{X > x} dx = ∫_0^∞ ∫_x^∞ f_X(y) dy dx
                  = ∫_0^∞ ∫_0^y f_X(y) dx dy
                  = ∫_0^∞ f_X(y) ( ∫_0^y dx ) dy
                  = ∫_0^∞ f_X(y) y dy = E(X)
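A numeric illustration of the lemma for X ∼ EXP(λ), using a simple trapezoidal approximation of the integral in base Matlab; λ and the grid are illustrative choices.

% Sketch: integral of the survival function P{X > x} = e^(-lambda x) should be E(X) = 1/lambda.
lambda = 2;
x = linspace(0, 20, 200001);      % grid wide enough for the exponential tail
survival = exp(-lambda*x);
trapz(x, survival)                % ~ 1/lambda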
221
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
X is a continuous random variable.
• E(aX + b) = aE(X) + b, where a and b are constants.
• Var(X) = E[(X − E(X))^2] = E(X^2) − E^2(X).
• Var(aX + b) = a^2 Var(X)
Instructor: Ping Li
222
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Section 5.3: The Uniform Random Variable
Definition:   X is a uniform random variable on the interval (a, b) if its
probability density function is given by
f(x) = 1/(b − a) if a < x < b,  and f(x) = 0 otherwise.
Denote X ∼ Uniform(a, b), or simply X ∼ U(a, b).
223
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Cumulative function
F(x) = 0 if x ≤ a;   (x − a)/(b − a) if a < x < b;   1 if x ≥ b.
If X ∼ U(0, 1), then F(x) = x for 0 ≤ x ≤ 1.

Expectation
E(X) = (a + b)/2

Variance
Var(X) = (b − a)^2 / 12
224
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Section 5.4: Normal Random Variable
Definition:   X is a normal random variable, X ∼ N(µ, σ^2), if its density is given by
f(x) = (1/(√(2π) σ)) e^{−(x−µ)^2/(2σ^2)},     −∞ < x < ∞
225
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
• If X ∼ N (µ, σ 2 ), then E(X) = µ and V ar(X) = σ 2 .
• The density function is a “bell-shaped” curve.
• It was first introduced by French Mathematician Abraham DeMoivre in 1733,
to approximate the binomial when n was large.
• The distribution became truly famous because of Karl F. Gauss,
and hence another common name for it is the “Gaussian distribution.”
• It is called “normal distribution” (probably started by Karl Pearson) because
“most people” believed that it was “normal” for any well-behaved data set to
follow this distribution.
226
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Check that
∫_{−∞}^{∞} f(x) dx = ∫_{−∞}^{∞} (1/(√(2π) σ)) e^{−(x−µ)^2/(2σ^2)} dx = 1.

Note that, by letting z = (x − µ)/σ,
∫_{−∞}^{∞} (1/(√(2π) σ)) e^{−(x−µ)^2/(2σ^2)} dx = ∫_{−∞}^{∞} (1/√(2π)) e^{−z^2/2} dz = I.

It suffices to show that I^2 = 1.

I^2 = ∫_{−∞}^{∞} (1/√(2π)) e^{−z^2/2} dz × ∫_{−∞}^{∞} (1/√(2π)) e^{−w^2/2} dw
    = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (1/(2π)) e^{−(z^2+w^2)/2} dz dw
    = ∫_0^∞ ∫_0^{2π} (1/(2π)) e^{−r^2/2} r dθ dr = (1/(2π)) ∫_0^{2π} dθ ∫_0^∞ e^{−r^2/2} d(r^2/2) = 1.
227
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Expectation and Variance
If X ∼ N(µ, σ^2), then E(X) = µ and Var(X) = σ^2.

E(X) = ∫_{−∞}^{∞} x (1/(√(2π) σ)) e^{−(x−µ)^2/(2σ^2)} dx
     = ∫_{−∞}^{∞} (x − µ)(1/(√(2π) σ)) e^{−(x−µ)^2/(2σ^2)} dx + ∫_{−∞}^{∞} µ (1/(√(2π) σ)) e^{−(x−µ)^2/(2σ^2)} dx
     = ∫_{−∞}^{∞} (x − µ)(1/(√(2π) σ)) e^{−(x−µ)^2/(2σ^2)} d[x − µ] + µ
     = ∫_{−∞}^{∞} z (1/(√(2π) σ)) e^{−z^2/(2σ^2)} dz + µ
     = µ.
228
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li

Var(X) = E[(X − µ)^2]
       = ∫_{−∞}^{∞} (x − µ)^2 (1/(√(2π) σ)) e^{−(x−µ)^2/(2σ^2)} d[x − µ]
       = ∫_{−∞}^{∞} z^2 (1/(√(2π) σ)) e^{−z^2/(2σ^2)} dz
       = σ^2 ∫_{−∞}^{∞} (z/σ)^2 (1/√(2π)) e^{−z^2/(2σ^2)} d[z/σ]
       = σ^2 ∫_{−∞}^{∞} t^2 (1/√(2π)) e^{−t^2/2} dt
       = σ^2 ∫_{−∞}^{∞} t (1/√(2π)) e^{−t^2/2} d[t^2/2]
       = σ^2 ∫_{−∞}^{∞} −t (1/√(2π)) d[e^{−t^2/2}]
       = σ^2 [ ∫_{−∞}^{∞} (1/√(2π)) e^{−t^2/2} dt − (1/√(2π)) t e^{−t^2/2} |_{−∞}^{∞} ] = σ^2
229
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li

Why lim_{t→∞} t e^{−t^2/2} = 0??

lim_{t→∞} t e^{−t^2/2} = lim_{t→∞} t / e^{t^2/2} = lim_{t→∞} [t]′ / [e^{t^2/2}]′ = lim_{t→∞} 1/(t e^{t^2/2}) = 0
230
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Normal Distribution Approximates Binomial
Suppose X
∼ Binomial(n, p). For fixed p, as n → ∞
Binomial(n, p) ≈ N (µ, σ 2 ),
µ = np, σ 2 = np(1 − p).
Instructor: Ping Li
231
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
n = 10, p = 0.2
[Figure: bar plot of the Binomial(10, 0.2) mass function overlaid with the approximating normal density.]
232
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
n = 20, p = 0.2
[Figure: bar plot of the Binomial(20, 0.2) mass function overlaid with the approximating normal density.]
233
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
n = 50, p = 0.2
[Figure: bar plot of the Binomial(50, 0.2) mass function overlaid with the approximating normal density.]
234
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
n = 100, p = 0.2
[Figure: bar plot of the Binomial(100, 0.2) mass function overlaid with the approximating normal density.]
235
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
n = 1000, p = 0.2
[Figure: bar plot of the Binomial(1000, 0.2) mass function overlaid with the approximating normal density.]
236
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
237
Matlab code
function NormalApprBinomial(n,p)
mu = n*p; sigma2 = n*p*(1-p);
figure;
bar((0:n),binopdf(0:n,n,p),'g'); hold on; grid on;
x = mu - 3*sqrt(sigma2):0.001:mu + 3*sqrt(sigma2);   % plot over +/- 3 standard deviations
plot(x,normpdf(x,mu,sqrt(sigma2)),'r-','linewidth',2);
xlabel('x'); ylabel('Density (mass) function');
title(['n = ' num2str(n) '  p = ' num2str(p)]);
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
The Standard Normal Distribution
Z ∼ N(0, 1) is called the standard normal.
• E(Z) = 0, Var(Z) = 1.
• If X ∼ N(µ, σ^2), then Y = (X − µ)/σ ∼ N(0, 1).
• Z ∼ N(0, 1). Denote the density function f_Z(z) by φ(z)
  and the cumulative function F_Z(z) by Φ(z):
  φ(z) = (1/√(2π)) e^{−z^2/2},     Φ(z) = ∫_{−∞}^{z} (1/√(2π)) e^{−z^2/2} dz
• φ(−x) = φ(x), Φ(−x) = 1 − Φ(x).
Numerical table on Page 222.
238
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Example 5.4.4b:    X ∼ N(µ = 3, σ^2 = 9). Find
• P{2 < X < 5}
• P{X > 0}
• P{|X − 3| > 6}.

Solution:
Let Y = (X − µ)/σ = (X − 3)/3. Then Y ∼ N(0, 1).

P{2 < X < 5} = P{ (2 − 3)/3 < (X − 3)/3 < (5 − 3)/3 }
             = P{ −1/3 < Y < 2/3 }
             = Φ(2/3) − Φ(−1/3)
             = Φ(0.6667) − 1 + Φ(0.3333)
             ≈ 0.7475 − 1 + 0.6306 ≈ 0.3781
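The same probability can be checked numerically, assuming normcdf from the Statistics Toolbox is available (normcdf(x, mu, sigma) takes the standard deviation, not the variance).

% Sketch: P{2 < X < 5} for X ~ N(3, 9).
mu = 3; sigma = 3;
normcdf(5, mu, sigma) - normcdf(2, mu, sigma)      % ~ 0.3781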
239
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Section 5.5: Exponential Random Variable
Definition:   An exponential random variable X has the probability density function
f(x) = λ e^{−λx} if x ≥ 0,  and f(x) = 0 if x < 0     (λ > 0).
X ∼ EXP(λ).

Cumulative function:
F(x) = ∫_0^x λ e^{−λu} du = 1 − e^{−λx},     (x ≥ 0).
F(∞) = ??
240
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Moments:
E(X^n) = ∫_0^∞ x^n λ e^{−λx} dx
       = −∫_0^∞ x^n d[e^{−λx}]
       = ∫_0^∞ e^{−λx} n x^{n−1} dx
       = (n/λ) E(X^{n−1})

E(X) = 1/λ,   E(X^2) = 2/λ^2,   Var(X) = 1/λ^2,   E(X^n) = n!/λ^n
241
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Applications of Exponential Distribution
• The exponential distribution often arises as being the distribution of the
amount of time until some specific event occurs.
• For instance, the amount of time (starting from now) until an earthquake
occurs, or until a new war breaks out, etc.
• The exponential distribution is memoryless.
P {X > s + t|X > t} = P {X > s}, for all s, t ≥ 0
242
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li

Example 5b:    Suppose that the length of a phone call in minutes is an
exponential random variable with parameter λ = 1/10. If someone arrives
immediately ahead of you at a public telephone booth, find the probability that you
will have to wait
• (a) more than 10 minutes;
• (b) between 10 and 20 minutes.

Solution:
• P{X > 10} = e^{−10λ} = e^{−1} ≈ 0.368.
• P{10 < X < 20} = e^{−1} − e^{−2} ≈ 0.233
243
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Section 5.6.1: The Gamma Distribution
Definition:   A random variable has a gamma distribution with parameters
(α, λ), where α, λ > 0, if its density function is given by
f(x) = λ e^{−λx} (λx)^{α−1} / Γ(α) for x ≥ 0,  and f(x) = 0 for x < 0.
Denote X ∼ Gamma(α, λ).
∫_0^∞ f(x) dx = 1??
244
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
The Gamma function is defined as
Γ(α) = ∫_0^∞ e^{−y} y^{α−1} dy = (α − 1)Γ(α − 1).
Γ(1) = 1.     Γ(0.5) = √π.     Γ(n) = (n − 1)!.
245
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Connection to exponential distribution
• When α = 1, the gamma distribution reduces to an exponential distribution.
Gamma(α = 1, λ) = EXP (λ).
• A gamma distribution Gamma(n, λ) often arises as the distribution of the
amount of time one has to wait until a total of n events has occurred.
246
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Expectation and Variance:    X ∼ Gamma(α, λ).
E(X) = α/λ,     Var(X) = α/λ^2.

E(X) = ∫_0^∞ x · λ e^{−λx} (λx)^{α−1} / Γ(α) dx
     = ∫_0^∞ e^{−λx} (λx)^α / Γ(α) dx
     = [ Γ(α + 1)/(Γ(α) λ) ] ∫_0^∞ λ e^{−λx} (λx)^α / Γ(α + 1) dx
     = Γ(α + 1)/(Γ(α) λ)
     = α/λ.
Instructor: Ping Li
247
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Section 5.6.4: The Beta Distribution
Definition:   A random variable X has a beta distribution if its density is given by
f(x) = (1/B(a, b)) x^{a−1} (1 − x)^{b−1} for 0 < x < 1,  and f(x) = 0 otherwise.
Denote X ∼ Beta(a, b).

The Beta function:
B(a, b) = Γ(a)Γ(b)/Γ(a + b) = ∫_0^1 x^{a−1} (1 − x)^{b−1} dx
248
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Expectation and Variance:    X ∼ Beta(a, b).
E(X) = a/(a + b),     Var(X) = ab / ((a + b)^2 (a + b + 1))
Instructor: Ping Li
249
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Sec. 5.7: Distribution of a Function of a Random Variable
Given the density function f_X(x) and cumulative function F_X(x) for a random
variable X, what is the general formula for the density function f_Y(y) of the
random variable Y = g(X)?

Example 5.7.7a:    X ∼ Uniform(0, 1).  Y = g(X) = X^n.
F_Y(y) = P{Y ≤ y} = P{X^n ≤ y} = P{X ≤ y^{1/n}} = F_X(y^{1/n}) = y^{1/n}
f_Y(y) = (1/n) y^{1/n − 1},     0 ≤ y ≤ 1
       = f_X(g^{−1}(y)) | d/dy g^{−1}(y) |
since y = g(x) = x^n  ⇒  g^{−1}(y) = y^{1/n}.
250
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li

Theorem 7.1:    Let X be a continuous random variable having probability
density function f_X. Suppose g(x) is a strictly monotone (increasing or
decreasing), differentiable function of x. Then the random variable Y = g(X)
has a probability density function given by
f_Y(y) = f_X(g^{−1}(y)) | d/dy g^{−1}(y) |   if y = g(x) for some x,
f_Y(y) = 0                                   if y ≠ g(x) for all x.
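A small simulation sketch (base Matlab) of Example 5.7.7a and Theorem 7.1: draw X ∼ U(0,1), form Y = X^n, and compare the empirical P{Y ≤ y} with F_Y(y) = y^{1/n}; the values of n and y are illustrative.

% Sketch: empirical vs. theoretical CDF of Y = X^n.
n = 3; trials = 200000;
Y = rand(trials,1).^n;
y = 0.4;
[mean(Y <= y), y^(1/n)]       % empirical P{Y <= y} vs. F_Y(y) = y^(1/n)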
251
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009

Example 7b:    Y = X^2
F_Y(y) = P{Y ≤ y} = P{X^2 ≤ y}
       = P{−√y ≤ X ≤ √y}
       = F_X(√y) − F_X(−√y)
f_Y(y) = (1/(2√y)) ( f_X(√y) + f_X(−√y) )

Does this example violate Theorem 7.1?
Instructor: Ping Li
252
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Example (Cauchy distribution): Suppose a random variable X has density function
f(x) = 1/(π(1 + x^2)),     −∞ < x < ∞
Q: Find the distribution of 1/X.
One can check ∫_{−∞}^{∞} f(x) dx = 1, but E(X) does not exist (E|X| = ∞).
253
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Solution: Denote the cumulative function of X by F_X(x).
Let Z = 1/X. Assume z > 0:
F_Z(z) = P{1/X ≤ z} = P{X ≤ 0} + P{X ≥ 1/z, X ≥ 0}
       = 1/2 + ∫_{1/z}^{∞} 1/(π(1 + x^2)) dx
       = 1/2 + (1/π) tan^{−1}(x) |_{1/z}^{∞}
       = 1/2 + (1/π) [ π/2 − tan^{−1}(1/z) ]
Therefore, for z > 0,
f_Z(z) = (1/π) [ π/2 − tan^{−1}(1/z) ]′ = 1/(π(1 + z^2)).
254
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009

Assume z < 0:
F_Z(z) = P{1/X ≤ z} = P{X ≥ 1/z, X ≤ 0}
       = ∫_{1/z}^{0} 1/(π(1 + x^2)) dx
       = (1/π) tan^{−1}(x) |_{1/z}^{0}
       = (1/π) [ 0 − tan^{−1}(1/z) ]
Therefore, for z < 0,
f_Z(z) = (1/π) [ 0 − tan^{−1}(1/z) ]′ = 1/(π(1 + z^2)).

Thus, Z and X have the same density function.
Instructor: Ping Li
255
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Chapter 6: Jointly Distributed Random Variables
Definition:
For any two random variables X and Y , the joint cumulative
probability distribution is defined as
F (x, y) = P {X ≤ x, Y ≤ y} ,
−∞ < x, y < ∞
256
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
F (x, y) = P {X ≤ x, Y ≤ y} ,
Instructor: Ping Li
−∞ < x, y < ∞
The distribution of X is
FX (x) =P {X ≤ x}
=P {X ≤ x, Y ≤ ∞}
=F (x, ∞)
The distribution of Y is
FY (y) =P {Y ≤ y} = F (∞, y)
Example:
P {X > x, Y > y} = 1 − FX (x) − FY (y) + F (x, y)
257
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Jointly Discrete Random Variables
X and Y are both discrete random variables.
Joint probability mass function:
p(x, y) = P{X = x, Y = y},     Σ_x Σ_y p(x, y) = 1

Marginal probability mass functions:
p_X(x) = P{X = x} = Σ_{y: p(x,y)>0} p(x, y)
p_Y(y) = P{Y = y} = Σ_{x: p(x,y)>0} p(x, y)
258
Cornell University, BTRY 4080 / STSCI 4080
Example 1a:
Fall 2009
Instructor: Ping Li
Suppose that 3 balls are randomly selected from an urn
containing 3 red, 4 white, and 5 blue balls. Let X and Y denote, respectively, the
number of red and white balls chosen.
Q: The joint probability mass function p(x, y)
= P {X = x, Y = y}=??
259
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li

p(0, 0) = C(5,3) / C(12,3) = 10/220
p(0, 1) = C(4,1) C(5,2) / C(12,3) = 40/220
p(0, 2) = C(4,2) C(5,1) / C(12,3) = 30/220
p(0, 3) = C(4,3) / C(12,3) = 4/220

p_X(0) = p(0,0) + p(0,1) + p(0,2) + p(0,3) = 84/220
Equivalently, p_X(0) = C(9,3) / C(12,3) = 84/220
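A short Matlab sketch (base Matlab, nchoosek) that builds the whole joint table p(x, y) for this example and checks that it sums to 1 and that p_X(0) = 84/220; the array layout below is only an illustrative choice.

% Sketch: joint pmf of Example 1a (3 red, 4 white, 5 blue; draw 3 balls).
p = zeros(4,4);                               % p(x+1, y+1) = P{X = x, Y = y}
for x = 0:3
  for y = 0:(3-x)
    p(x+1, y+1) = nchoosek(3,x)*nchoosek(4,y)*nchoosek(5,3-x-y)/nchoosek(12,3);
  end
end
sum(p(:))                                     % should be 1
sum(p(1,:))                                   % pX(0) = 84/220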
260
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Jointly Continuous Random Variables
X and Y are jointly continuous if there exists a function f(x, y) defined for all
real x and y, having the property that for every set C of pairs of real numbers,
P{(X, Y) ∈ C} = ∫∫_{(x,y)∈C} f(x, y) dx dy
The function f(x, y) is called the joint probability density function.
261
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
The joint cumulative function:
F(x, y) = P{X ∈ (−∞, x], Y ∈ (−∞, y]} = ∫_{−∞}^{y} ∫_{−∞}^{x} f(u, v) du dv

The joint probability density function:
f(x, y) = ∂^2 F(x, y) / ∂x∂y

The marginal probability density functions:
f_X(x) = ∫_{−∞}^{∞} f(x, y) dy,     f_Y(y) = ∫_{−∞}^{∞} f(x, y) dx
262
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li

Example 1c:    The joint density function is given by
f(x, y) = C e^{−x} e^{−2y} for 0 < x < ∞, 0 < y < ∞,  and f(x, y) = 0 otherwise.
• (a) Determine C.
• (b) P{X > 1, Y < 1},
• (c) P{X < Y},
• (d) P{X < a}.
263
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Solution (a):
1 = ∫_0^∞ ∫_0^∞ C e^{−x} e^{−2y} dx dy = C ∫_0^∞ e^{−x} dx ∫_0^∞ e^{−2y} dy = C · 1 · (1/2) = C/2.
Therefore, C = 2.

Solution (b):
P{X > 1, Y < 1} = ∫_0^1 ∫_1^∞ 2 e^{−x} e^{−2y} dx dy = e^{−1} (1 − e^{−2})
264
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Solution (c):
P{X < Y} = ∫∫_{x<y} 2 e^{−x} e^{−2y} dx dy = ∫_0^∞ ∫_0^y 2 e^{−x} e^{−2y} dx dy
         = ∫_0^∞ ( ∫_0^y e^{−x} dx ) 2 e^{−2y} dy
         = ∫_0^∞ (1 − e^{−y}) 2 e^{−2y} dy
         = ∫_0^∞ (2 e^{−2y} − 2 e^{−3y}) dy
         = 1 − 2/3 = 1/3
265
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Solution (d):
P{X < a} = P{X < a, Y < ∞}
         = ∫_0^a ∫_0^∞ 2 e^{−2y} e^{−x} dy dx
         = ∫_0^a e^{−x} dx
         = 1 − e^{−a}
266
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li

Example 1e:    The joint density is given by
f(x, y) = e^{−(x+y)} for 0 < x < ∞, 0 < y < ∞,  and f(x, y) = 0 otherwise.
Q: Find the density function of the random variable X/Y.
267
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li

Solution: Let Z = X/Y, 0 < Z < ∞.
F_Z(z) = P{Z ≤ z} = P{X/Y ≤ z} = ∫∫_{x/y ≤ z} e^{−(x+y)} dx dy
       = ∫_0^∞ e^{−y} ( ∫_0^{yz} e^{−x} dx ) dy
       = ∫_0^∞ ( e^{−y} − e^{−y−yz} ) dy
       = 1 − 1/(z + 1)
Therefore, f_Z(z) = [ 1 − 1/(z + 1) ]′ = 1/(z + 1)^2.
268
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Alternatively,
F_Z(z) = P{Z ≤ z} = P{X/Y ≤ z} = ∫∫_{x/y ≤ z} e^{−(x+y)} dx dy
       = ∫_0^∞ ( ∫_{x/z}^{∞} e^{−(x+y)} dy ) dx
       = ∫_0^∞ e^{−x} e^{−x/z} dx
       = ∫_0^∞ e^{−x − x/z} dx
       = 1/(1 + 1/z)
       = 1 − 1/(z + 1)
269
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Multiple Jointly Distributed Random Variables
Joint cumulative distribution function
F (x1 , x2 , x3 , ..., xn ) = P {X1 ≤ x1 , X2 ≤ x2 , ..., Xn ≤ xn }
Suppose X1 , X2 , ..., Xn are discrete random variables.
Joint probability mass function
p(x1 , x2 , ..., xn ) = P {X1 = x1 , X2 = x2 , ..., Xn = xn }
270
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
X1, X2, ..., Xn are jointly continuous if there exists a function f(x1, x2, ..., xn)
such that for any set C in n-dimensional space,
P{(X1, X2, ..., Xn) ∈ C} = ∫...∫_{(x1,...,xn)∈C} f(x1, x2, ..., xn) dx1 dx2 ... dxn
f(x1, x2, ..., xn) is the joint probability density function.
271
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Section 6.2: Independent Random Variables
Definition:   The random variables X and Y are said to be independent if for
any two sets of real numbers A and B,
P{X ∈ A, Y ∈ B} = P{X ∈ A} × P{Y ∈ B},
equivalently,
P{X ≤ x, Y ≤ y} = P{X ≤ x} × P{Y ≤ y},
or, equivalently,
p(x, y) = p_X(x) p_Y(y) for all x, y, when X and Y are discrete;
f(x, y) = f_X(x) f_Y(y) for all x, y, when X and Y are continuous.
272
Cornell University, BTRY 4080 / STSCI 4080
Example 2b:
Fall 2009
Instructor: Ping Li
Suppose the number of people that enter a post office on a
given day is a Poisson random variable with parameter λ.
Each person that enters the post office is male with probability p and a female
with probability 1 − p.
Let X and Y , respectively, be the number of males and females that enter the
post office.
• (a): Find the joint probability mass function P {X = i, Y = j}.
• (b): Find the marginal probability mass functions P {X = i}, P {Y = j}.
• (c): Show that X and Y are independent.
273
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Key: Suppose we know there are N = n people that enter the post office. Then
the number of males is a binomial with parameters (n, p):
[X | N = n] ∼ Binomial(n, p),     N = X + Y ∼ Poisson(λ).
Therefore,
P{X = i, Y = j} = Σ_{n=0}^{∞} P{X = i, Y = j | N = n} P{N = n}
               = P{X = i, Y = j | N = i + j} P{N = i + j}
because P{X = i, Y = j | N ≠ i + j} = 0.
274
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li

Solution (a): Conditioning on X + Y = i + j:
P{X = i, Y = j} = P{X = i, Y = j | X + Y = i + j} P{X + Y = i + j}
               = C(i + j, i) p^i (1 − p)^j · e^{−λ} λ^{i+j} / (i + j)!
               = p^i (1 − p)^j e^{−λ} λ^i λ^j / (i! j!)
               = p^i (1 − p)^j e^{−λ(p + (1−p))} λ^i λ^j / (i! j!)
               = [ e^{−λp} (λp)^i / i! ] [ e^{−λ(1−p)} (λ(1 − p))^j / j! ]
275
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Solution (b)
P{X = i} = Σ_{j=0}^{∞} P{X = i, Y = j}
         = Σ_{j=0}^{∞} [ e^{−λp} (λp)^i / i! ] [ e^{−λ(1−p)} (λ(1 − p))^j / j! ]
         = [ e^{−λp} (λp)^i / i! ] Σ_{j=0}^{∞} e^{−λ(1−p)} (λ(1 − p))^j / j!
         = e^{−λp} (λp)^i / i!
276
Cornell University, BTRY 4080 / STSCI 4080
Example 2c:
Fall 2009
Instructor: Ping Li
A man and a woman decide to meet at a certain location. Each
person independently arrives at a time uniformly distributed between 12 noon and
1 pm.
Q: Find the probability that the first to arrive has to wait longer than 10 minutes.
Solution:
Let X = number of minutes by which one person arrives later than 12 noon.
Let Y = number of minutes by which the other person arrives later than 12 noon.
X and Y are independent.
The desired probability = P {|X
− Y | ≥ 10}.
277
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
P{|X − Y| ≥ 10} = ∫∫_{|x−y|>10} f(x, y) dx dy
               = ∫∫_{|x−y|>10} f_X(x) f_Y(y) dx dy
               = 2 ∫∫_{y<x−10} (1/60)(1/60) dx dy
               = 2 ∫_{10}^{60} ∫_0^{x−10} (1/3600) dy dx
               = 25/36.
278
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
A Necessary and Sufficient Condition for Independence
Proposition 2.1    The continuous (discrete) random variables X and Y are
independent if and only if their joint probability density (mass) function can be
expressed as
f_{X,Y}(x, y) = h(x) g(y),     −∞ < x, y < ∞
Proof: By definition, X and Y are independent if f_{X,Y}(x, y) = f_X(x) f_Y(y).
279
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Example 2f:
(a): Two random variables X and Y with joint density given by
f (x, y) = 6e−2x e−3y ,
0 < x, y < ∞
are independent.
(b): Two random variables X and Y with joint density given by
f(x, y) = 24xy,     0 < x, y < 1, 0 < x + y < 1
are NOT independent. (Why?)
280
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Independence of More Than Two Random Variables
Definition:   The n random variables X1, X2, ..., Xn are said to be
independent if, for all sets of real numbers A1, A2, ..., An,
P{X1 ∈ A1, X2 ∈ A2, ..., Xn ∈ An} = Π_{i=1}^{n} P{Xi ∈ Ai},
equivalently,
P{X1 ≤ x1, X2 ≤ x2, ..., Xn ≤ xn} = Π_{i=1}^{n} P{Xi ≤ xi},
for all x1, x2, ..., xn.
281
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li

Example 2h:    Let X, Y, Z be independent and uniformly distributed over (0, 1).
Compute P{X ≥ YZ}.

Solution:
P{X ≥ YZ} = ∫∫∫_{x ≥ yz} f(x, y, z) dx dy dz
          = ∫_0^1 ∫_0^1 ∫_{yz}^{1} dx dy dz
          = ∫_0^1 ∫_0^1 (1 − yz) dy dz
          = ∫_0^1 (1 − z/2) dz
          = 3/4
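A one-line Monte Carlo check of this value (base Matlab); the number of replications is arbitrary.

% Sketch: P{X >= Y*Z} for independent U(0,1) variables should be close to 0.75.
trials = 1000000;
X = rand(trials,1); Y = rand(trials,1); Z = rand(trials,1);
mean(X >= Y.*Z)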
282
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Chapter 6.3: Sum of Independent Random Variables
Suppose that X and Y are independent. Let Z = X + Y.
F_Z(z) = P{X + Y ≤ z} = ∫∫_{x+y ≤ z} f_X(x) f_Y(y) dx dy
       = ∫_{−∞}^{∞} ∫_{−∞}^{z−y} f_X(x) f_Y(y) dx dy
       = ∫_{−∞}^{∞} f_Y(y) ∫_{−∞}^{z−y} f_X(x) dx dy
       = ∫_{−∞}^{∞} f_Y(y) F_X(z − y) dy

f_Z(z) = ∂/∂z F_Z(z) = ∫_{−∞}^{∞} f_Y(y) f_X(z − y) dy
283
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li

The discrete case:    X and Y are two independent discrete random variables;
then Z = X + Y has probability mass function:
p_Z(z) = Σ_j p_Y(j) p_X(z − j)
——————–
Example: Suppose X, Y ∼ Bernoulli(p). Then
Z = X + Y ∼ Binomial(2, p)
284
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Suppose X, Y ∼ Bernoulli(p) are independent. Then Z = X + Y ∼ Binomial(2, p).

p_Z(0) = Σ_j p_Y(j) p_X(0 − j) = p_Y(0) p_X(0) = (1 − p)^2 = C(2,0) (1 − p)^2
p_Z(1) = Σ_j p_Y(j) p_X(1 − j) = p_Y(0) p_X(1) + p_Y(1) p_X(0) = (1 − p)p + p(1 − p) = C(2,1) p(1 − p)
p_Z(2) = Σ_j p_Y(j) p_X(2 − j) = p_Y(1) p_X(1) = p^2 = C(2,2) p^2
285
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Suppose X ∼ Binomial(n, p) and Y ∼ Binomial(m, p) are independent.
Then Z = X + Y ∼ Binomial(m + n, p).

p_Z(z) = Σ_j p_Y(j) p_X(z − j).
Note 0 ≤ j ≤ m and z − n ≤ j ≤ z. We can assume m ≤ n.
There are three cases to consider:
1. Suppose 0 ≤ z ≤ m; then 0 ≤ j ≤ z.
2. Suppose m < z ≤ n; then 0 ≤ j ≤ m.
3. Suppose n < z ≤ m + n; then z − n ≤ j ≤ m.
286
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li

Suppose 0 ≤ z ≤ m; then 0 ≤ j ≤ z:

p_Z(z) = Σ_j p_Y(j) p_X(z − j) = Σ_{j=0}^{z} p_Y(j) p_X(z − j)
       = Σ_{j=0}^{z} C(m, j) p^j (1 − p)^{m−j} C(n, z − j) p^{z−j} (1 − p)^{n−z+j}
       = p^z (1 − p)^{m+n−z} Σ_{j=0}^{z} C(m, j) C(n, z − j)
       = C(m + n, z) p^z (1 − p)^{m+n−z}

because we have seen (e.g., using a combinatorial argument) that
Σ_{j=0}^{z} C(m, j) C(n, z − j) = C(m + n, z).
287
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Sum of Two Independent Uniform Random Variables
X ∼ U(0, 1) and Y ∼ U(0, 1) are independent. Let Z = X + Y.
f_Z(z) = ∫_{−∞}^{∞} f_Y(y) f_X(z − y) dy = ∫_0^1 f_X(z − y) dy
We must have 0 ≤ y ≤ 1 and 0 ≤ z − y ≤ 1, i.e., z − 1 ≤ y ≤ z.
Two cases to consider:
1. If 0 ≤ z ≤ 1, then 0 ≤ y ≤ z.
2. If 1 < z ≤ 2, then z − 1 < y ≤ 1.
288
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li

If 0 ≤ z ≤ 1, then 0 ≤ y ≤ z:
f_Z(z) = ∫_0^1 f_X(z − y) dy = ∫_0^z dy = z.

If 1 < z ≤ 2, then z − 1 ≤ y ≤ 1:
f_Z(z) = ∫_0^1 f_X(z − y) dy = ∫_{z−1}^{1} dy = 2 − z.

f_Z(z) = z for 0 < z ≤ 1;   2 − z for 1 < z < 2;   0 otherwise.
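The triangular shape can also be obtained numerically by discretizing the convolution formula f_Z(z) = ∫ f_Y(y) f_X(z − y) dy with conv in base Matlab; the grid spacing and the checked points below are illustrative.

% Sketch: numeric convolution of two U(0,1) densities on a grid.
dx = 0.001; x = 0:dx:1;
fX = ones(size(x)); fY = ones(size(x));      % U(0,1) densities sampled on the grid
fZ = conv(fX, fY) * dx;                      % approximate density of Z = X + Y on 0:dx:2
[fZ(501), fZ(1501)]                          % ~ f_Z(0.5) = 0.5 and f_Z(1.5) = 0.5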
289
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Sum of Two Gamma Random Variables
X ∼ Gamma(t, λ) and Y ∼ Gamma(s, λ) are independent.
Let Z
=X +Y.
Show that Z
∼ Gamma(s + t, λ).
For an intuition, see notes page 246.
Instructor: Ping Li
290
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li

f_Z(z) = ∫_{−∞}^{∞} f_Y(y) f_X(z − y) dy = ∫_0^z f_Y(y) f_X(z − y) dy
       = ∫_0^z [ λ e^{−λy} (λy)^{s−1} / Γ(s) ] [ λ e^{−λ(z−y)} (λ(z − y))^{t−1} / Γ(t) ] dy
       = [ e^{−λz} λ^2 λ^{s+t−2} / (Γ(s)Γ(t)) ] ∫_0^z y^{s−1} (z − y)^{t−1} dy
       = [ λ e^{−λz} λ^{s+t−1} / (Γ(s)Γ(t)) ] ∫_0^z y^{s−1} (z − y)^{t−1} dy
291
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
f_Z(z) = [ λ e^{−λz} λ^{s+t−1} / (Γ(s)Γ(t)) ] ∫_0^z y^{s−1} (z − y)^{t−1} dy
       = [ λ e^{−λz} (λz)^{s+t−1} / Γ(s + t) ] · [ Γ(s + t) / (z^{s+t−1} Γ(s)Γ(t)) ] ∫_0^z y^{s−1} (z − y)^{t−1} dy
       = [ λ e^{−λz} (λz)^{s+t−1} / Γ(s + t) ] · [ Γ(s + t) / (Γ(s)Γ(t)) ] ∫_0^z (y/z)^{s−1} (1 − y/z)^{t−1} d(y/z)
       = [ λ e^{−λz} (λz)^{s+t−1} / Γ(s + t) ] · [ Γ(s + t) / (Γ(s)Γ(t)) ] ∫_0^1 θ^{s−1} (1 − θ)^{t−1} dθ
       = f_W(z) · [ Γ(s + t) / (Γ(s)Γ(t)) ] ∫_0^1 θ^{s−1} (1 − θ)^{t−1} dθ,       W ∼ Gamma(s + t, λ).

Because f_Z(z) is also a density function, we must have
[ Γ(s + t) / (Γ(s)Γ(t)) ] ∫_0^1 θ^{s−1} (1 − θ)^{t−1} dθ = 1
292
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Recall the definition of the Beta function in Section 5.6.4:
B(s, t) = ∫_0^1 θ^{s−1} (1 − θ)^{t−1} dθ
and its property (Eq. (6.3) in Section 5.6.4):
B(s, t) = Γ(s)Γ(t) / Γ(s + t).
Thus,
[ Γ(s + t) / (Γ(s)Γ(t)) ] ∫_0^1 θ^{s−1} (1 − θ)^{t−1} dθ = [ Γ(s + t) / (Γ(s)Γ(t)) ] · [ Γ(s)Γ(t) / Γ(s + t) ] = 1
293
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Generalization to more than two gamma random variables
If X1, X2, ..., Xn are independent gamma random variables with respective
parameters (s1, λ), (s2, λ), ..., (sn, λ), then
X1 + X2 + ... + Xn = Σ_{i=1}^{n} Xi ∼ Gamma( Σ_{i=1}^{n} si, λ )
294
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Sum of Independent Normal Random Variables
Proposition 3.2    If Xi, i = 1 to n, are independent random variables that
are normally distributed with respective parameters µi and σi^2, i = 1 to n, then
X1 + X2 + ... + Xn = Σ_{i=1}^{n} Xi ∼ N( Σ_{i=1}^{n} µi, Σ_{i=1}^{n} σi^2 )
————————
It suffices to show
X1 + X2 ∼ N( µ1 + µ2, σ1^2 + σ2^2 )
295
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
X ∼ N(µx, σx^2),  Y ∼ N(µy, σy^2).  Let Z = X + Y.

f_Z(z) = ∫_{−∞}^{∞} f_X(z − y) f_Y(y) dy
       = ∫_{−∞}^{∞} [ 1/(√(2π) σx) ] e^{−(z−y−µx)^2/(2σx^2)} [ 1/(√(2π) σy) ] e^{−(y−µy)^2/(2σy^2)} dy
       = (1/√(2π)) ∫_{−∞}^{∞} [ 1/(√(2π) σx σy) ] e^{ −(z−y−µx)^2/(2σx^2) − (y−µy)^2/(2σy^2) } dy
       = (1/√(2π)) ∫_{−∞}^{∞} [ 1/(√(2π) σx σy) ] e^{ −[ (z−y−µx)^2 σy^2 + (y−µy)^2 σx^2 ] / (2σx^2 σy^2) } dy
       = (1/√(2π)) ∫_{−∞}^{∞} [ 1/(√(2π) σx σy) ] e^{ −[ y^2(σx^2+σy^2) − 2y(µy σx^2 − (µx−z)σy^2) + (µx−z)^2 σy^2 + µy^2 σx^2 ] / (2σx^2 σy^2) } dy
296
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Let A = σx σy,  B = σx^2 σy^2,  C = σx^2 + σy^2,
    D = −2(µy σx^2 − (µx − z)σy^2),  E = (µx − z)^2 σy^2 + µy^2 σx^2.

f_Z(z) = ∫_{−∞}^{∞} f_X(z − y) f_Y(y) dy
       = (1/√(2π)) ∫_{−∞}^{∞} [ 1/(√(2π) A) ] e^{ −[ y^2(σx^2+σy^2) − 2y(µy σx^2 − (µx−z)σy^2) + (µx−z)^2 σy^2 + µy^2 σx^2 ] / (2σx^2 σy^2) } dy
       = (1/√(2π)) ∫_{−∞}^{∞} [ 1/(√(2π) A) ] e^{ −(Cy^2 + Dy + E)/(2B) } dy
       = (1/√(2π)) √( B/(C A^2) ) e^{ (D^2 − 4EC)/(8BC) }

because ∫_{−∞}^{∞} [ 1/(√(2π) σ) ] e^{ −(y−µ)^2/(2σ^2) } dy = 1, for any µ, σ^2.
297
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li

∫_{−∞}^{∞} [ 1/(√(2π) A) ] e^{ −(Cy^2 + Dy + E)/(2B) } dy
  = ∫_{−∞}^{∞} [ 1/(√(2π) A) ] e^{ −[ y^2 + (D/C)y + E/C ] / (2B/C) } dy
  = e^{ −[ E/C − D^2/(4C^2) ] / (2B/C) } ∫_{−∞}^{∞} [ 1/(√(2π) A) ] e^{ −( y + D/(2C) )^2 / (2B/C) } dy
  = e^{ −[ E/C − D^2/(4C^2) ] / (2B/C) } · (1/A) √(B/C)
  = √( B/(C A^2) ) e^{ (D^2 − 4EC)/(8BC) }
298
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li

f_Z(z) = (1/√(2π)) √( B/(C A^2) ) e^{ (D^2 − 4EC)/(8BC) }
       = [ 1/(√(2π) √(σx^2 + σy^2)) ] e^{ [ (µy σx^2 − (µx−z)σy^2)^2 − ((µx−z)^2 σy^2 + µy^2 σx^2)(σx^2 + σy^2) ] / (2σx^2 σy^2 (σx^2 + σy^2)) }
       = [ 1/(√(2π) √(σx^2 + σy^2)) ] e^{ −(z − µx − µy)^2 / (2(σx^2 + σy^2)) }

Therefore, Z = X + Y ∼ N( µx + µy, σx^2 + σy^2 ).
299
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Sum of Independent Poisson Random Variables
If X1, X2, ..., Xn are independent Poisson random variables with respective
parameters λi, i = 1 to n, then
X1 + X2 + ... + Xn = Σ_{i=1}^{n} Xi ∼ Poisson( Σ_{i=1}^{n} λi )
300
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Sum of Independent Binomial Random Variables
If X1, X2, ..., Xn are independent binomial random variables with respective
parameters (mi, p), i = 1 to n, then
X1 + X2 + ... + Xn = Σ_{i=1}^{n} Xi ∼ Binomial( Σ_{i=1}^{n} mi, p )
301
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Sections 6.4, 6.5: Conditional Distributions
Recall P(E|F) = P(EF)/P(F).

If X and Y are discrete random variables, then
p_{X|Y}(x|y) = p(x, y)/p_Y(y)

If X and Y are continuous random variables, then
f_{X|Y}(x|y) = f(x, y)/f_Y(y)
Instructor: Ping Li
302
Cornell University, BTRY 4080 / STSCI 4080
Example 4b:
Fall 2009
Instructor: Ping Li
Suppose X and Y are independent Poisson random variables
with respective parameters λx and λy .
Q: Calculate the conditional distribution of X , given that X
—————–
What is the distribution of X
+ Y ??
+ Y = n.
303
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li

Example 4b:    Suppose X and Y are independent Poisson random variables
with respective parameters λx and λy.
Q: Calculate the conditional distribution of X, given that X + Y = n.

Solution:
P{X = k | X + Y = n} = P{X = k, X + Y = n} / P{X + Y = n}
                     = P{X = k, Y = n − k} / P{X + Y = n}
                     = P{X = k} P{Y = n − k} / P{X + Y = n}
                     = [ e^{−λx} λx^k / k! ] [ e^{−λy} λy^{n−k} / (n − k)! ] · n! / ( e^{−λx−λy} (λx + λy)^n )
                     = C(n, k) ( λx/(λx + λy) )^k ( 1 − λx/(λx + λy) )^{n−k}

Therefore, X | X + Y = n ∼ Binomial( n, p = λx/(λx + λy) ).
304
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li

Example 5b:    The joint density of X and Y is given by
f(x, y) = e^{−x/y} e^{−y} / y for 0 < x, y < ∞,  and f(x, y) = 0 otherwise.
Q: Find P{X > 1 | Y = y}.
————————–
P{X > 1 | Y = y} = ∫_1^∞ f_{X|Y}(x|y) dx
305
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Solution:
f_{X|Y}(x|y) = f(x, y) / f_Y(y) = [ e^{−x/y} e^{−y} / y ] / ∫_0^∞ [ e^{−x/y} e^{−y} / y ] dx
            = [ e^{−x/y} / y ] / ∫_0^∞ [ e^{−x/y} / y ] dx = (1/y) e^{−x/y}

P{X > 1 | Y = y} = ∫_1^∞ f_{X|Y}(x|y) dx = ∫_1^∞ (1/y) e^{−x/y} dx = e^{−1/y}
306
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
The Bivariate Normal Distribution
The random variables X and Y have a bivariate normal distribution if, for
constants µx, µy, σx, σy, −1 < ρ < 1, their joint density function is given, for
all −∞ < x, y < ∞, by
f(x, y) = [ 1/(2π σx σy √(1 − ρ^2)) ] exp{ −[ (x−µx)^2/σx^2 + (y−µy)^2/σy^2 − 2ρ(x−µx)(y−µy)/(σx σy) ] / (2(1 − ρ^2)) }

If X and Y are independent, then ρ = 0, and
f(x, y) = [ 1/(2π σx σy) ] exp{ −(1/2)[ (x−µx)^2/σx^2 + (y−µy)^2/σy^2 ] }
        = [ 1/(√(2π) σx) ] e^{−(x−µx)^2/(2σx^2)} × [ 1/(√(2π) σy) ] e^{−(y−µy)^2/(2σy^2)}
307
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li

f_X(x) = ∫_{−∞}^{∞} f(x, y) dy
       = ∫_{−∞}^{∞} [ 1/(2π σx σy √(1 − ρ^2)) ] exp{ −[ (x−µx)^2/σx^2 + (y−µy)^2/σy^2 − 2ρ(x−µx)(y−µy)/(σx σy) ] / (2(1 − ρ^2)) } dy
       = ∫_{−∞}^{∞} [ 1/(2π σx √(1 − ρ^2)) ] exp{ −[ (x−µx)^2/σx^2 + t^2 − 2ρ(x−µx)t/σx ] / (2(1 − ρ^2)) } dt          (t = (y − µy)/σy)
       = ∫_{−∞}^{∞} [ 1/(2π σx √(1 − ρ^2)) ] exp{ −[ (x−µx)^2 + t^2 σx^2 − 2ρ(x−µx) t σx ] / (2(1 − ρ^2)σx^2) } dt
       = ∫_{−∞}^{∞} [ 1/(√(2π) A) ] e^{ −(Ct^2 + Dt + E)/(2B) } dt = √( B/(C A^2) ) e^{ (D^2 − 4EC)/(8BC) } = [ 1/(√(2π) σx) ] e^{ −(x−µx)^2/(2σx^2) }

where A = √(2π) √(1 − ρ^2) σx,  B = (1 − ρ^2)σx^2,  C = σx^2,  D = −2ρ(x − µx)σx,  E = (x − µx)^2.
308
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Therefore,
X ∼ N(µx, σx^2),     Y ∼ N(µy, σy^2)

The conditional density:
f_{X|Y}(x|y) = f(x, y) / f_Y(y)

A useful trick:   Only worry about the terms that are functions of x; the
remaining terms are just normalizing constants.
309
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Any (reasonable) non-negative function g(x) can essentially be viewed as a
probability density function, because one can always normalize it:
f(x) = g(x) / ∫_{−∞}^{∞} g(x) dx = C g(x) ∝ g(x),     where C = 1 / ∫_{−∞}^{∞} g(x) dx
is just a (normalizing) constant.  (One can verify ∫_{−∞}^{∞} f(x) dx = 1.)
When y is treated as a constant, C(y) is still a constant.
X|Y = y means y is a constant.
310
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li

For example,
g(x) = e^{−(x^2 − 2x)/2} ∝ e^{−(x − 1)^2/2}
is essentially the density function of N(1, 1). Similarly,
g(x) = e^{−(x^2 − 2xy)/2}
is also (essentially) the density function of a normal.
————–
1/C = ∫_{−∞}^{∞} e^{−(x^2 − 2xy)/2} dx = √(2π) e^{y^2/2}
Therefore,
f(x) = C g(x) = (1/√(2π)) e^{−y^2/2} g(x) = (1/√(2π)) e^{−(x − y)^2/2}
is the density function of N(y, 1), where y is merely a constant.
311
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
(X, Y) is a bivariate normal.
f_{X|Y}(x|y) = f(x, y) / f_Y(y) ∝ f(x, y)
            ∝ exp{ −[ (x−µx)^2/σx^2 − 2ρ(x−µx)(y−µy)/(σx σy) ] / (2(1 − ρ^2)) }
            ∝ exp{ −[ (x−µx)^2 − 2ρ(x−µx)(y−µy) σx/σy ] / (2(1 − ρ^2)σx^2) }
            ∝ exp{ −[ x − µx − ρ(y−µy) σx/σy ]^2 / (2(1 − ρ^2)σx^2) }

Therefore,
X|Y ∼ N( µx + ρ(y − µy) σx/σy, (1 − ρ^2)σx^2 )
Y|X ∼ N( µy + ρ(x − µx) σy/σx, (1 − ρ^2)σy^2 )
312
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li

Example 5d:    Consider n + m trials having a common probability of
success. Suppose, however, that this success probability is not fixed in advance
but is chosen from U(0, 1).
Q: What is the conditional distribution of this success probability given that the
n + m trials result in n successes?

Solution:
Let X = trial success probability.    X ∼ U(0, 1).
Let N = total number of successes.    N | X = x ∼ Binomial(n + m, x).
The desired distribution:  X | N = n.
313
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Solution:
Let X = trial success probability.    X ∼ U(0, 1).
Let N = total number of successes.    N | X = x ∼ Binomial(n + m, x).

f_{X|N}(x|n) = P{N = n | X = x} f_X(x) / P{N = n}
            = C(n + m, n) x^n (1 − x)^m / P{N = n}
            ∝ x^n (1 − x)^m

Therefore, X | N ∼ Beta(n + 1, m + 1).
Here X ∼ U(0, 1) is the prior distribution.
314
Cornell University, BTRY 4080 / STSCI 4080
If X | N ∼ Beta(n + 1, m + 1), then
E(X | N) = (n + 1) / ((n + 1) + (m + 1))

Suppose we do not have prior knowledge of the success probability X.
We observe n successes out of n + m trials.
The most intuitive guess (estimate) of X would be
X̂ = n / (n + m)

Assuming a uniform prior on X leads to add-one smoothing.
315
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Section 6.6: Order Statistics
Let X1 , X2 , ..., Xn be n independent and identically distributed, continuous
random variables having a common density f and distribution function F .
Define
X(1) = smallest of X1 , X2 , ..., Xn
X(2) = second smallest of X1 , X2 , ..., Xn
...
X(j) = j th smallest of X1 , X2 , ..., Xn
...
X(n) = largest of X1 , X2 , ..., Xn
≤ X(2) ≤ ... ≤ X(n) are known as the order
statistics corresponding to the random variables X1 , X2 , ..., Xn .
The ordered values X(1)
316
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
For example, the CDF of the largest (i.e., X_(n)) should be
F_{X_(n)}(x) = P(X_(n) ≤ x) = P(X1 ≤ x, X2 ≤ x, ..., Xn ≤ x)
because, if the largest is smaller than x, then every one of them is smaller than x.
Then, by independence,
F_{X_(n)}(x) = P(X1 ≤ x) P(X2 ≤ x) ... P(Xn ≤ x) = F(x)^n,
and the density function is simply
f_{X_(n)}(x) = ∂F^n(x)/∂x = n F^{n−1}(x) f(x)
317
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
The CDF of the smallest (i.e., X_(1)) should be
F_{X_(1)}(x) = P(X_(1) ≤ x) = 1 − P(X_(1) > x) = 1 − P(X1 > x, X2 > x, ..., Xn > x)
because, if the smallest is larger than x, then every one of them is larger than x.
Then, by independence,
F_{X_(1)}(x) = 1 − P(X1 > x) P(X2 > x) ... P(Xn > x) = 1 − [1 − F(x)]^n
and the density function is simply
f_{X_(1)}(x) = ∂{ 1 − [1 − F(x)]^n }/∂x = n [1 − F(x)]^{n−1} f(x)
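A simulation sketch (base Matlab) of the two CDFs just derived, for n i.i.d. U(0,1) variables where F(x) = x; the values of n, x, and the number of replications are illustrative.

% Sketch: empirical vs. theoretical CDFs of the max and min of n U(0,1) variables.
n = 5; trials = 200000; x = 0.7;
U = rand(trials, n);
[mean(max(U,[],2) <= x), x^n]                % P{X_(n) <= x} = F(x)^n
[mean(min(U,[],2) <= x), 1-(1-x)^n]          % P{X_(1) <= x} = 1 - (1-F(x))^n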
318
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
Joint density function of order statistics:
f_{X_(1),X_(2),...,X_(n)}(x1, x2, ..., xn) = n! f(x1) f(x2) ... f(xn),     for x1 ≤ x2 ≤ ... ≤ xn.

Intuitively (and heuristically),
P{X_(1) = x1, X_(2) = x2} = P{ {X1 = x1, X2 = x2} OR {X2 = x1, X1 = x2} }
                          = P{X1 = x1, X2 = x2} + P{X2 = x1, X1 = x2}
                          = f(x1)f(x2) + f(x1)f(x2)
                          = 2! f(x1)f(x2)

Warning: this is a heuristic argument, not a rigorous proof. Please read the book.
319
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li

Density function of the kth order statistic, X_(k):   f_{X_(k)}(x).
• Select 1 from the n values X1, X2, ..., Xn. Let it be equal to x.
• Select k − 1 from the remaining n − 1 values. Let them be smaller than x.
• Let the rest be larger than x.
• Total number of choices is ??
• For a given choice, the probability should be ??
320
Cornell University, BTRY 4080 / STSCI 4080
Fall 2009
Instructor: Ping Li
• Select 1 from the n values X1 , X2 , ..., Xn . Let it be equal to x.
• Select k − 1 from the remaining n − 1 values. Let them be smaller than x.
• Let the rest be larger than x.
• Total number of choices is the multinomial coefficient
  (n; 1, k − 1, n − k) = n! / [1! (k − 1)! (n − k)!].

• For a given choice, the probability should be
  f(x) [F(x)]^(k−1) [1 − F(x)]^(n−k).

Therefore,

fX(k)(x) = n! / [(k − 1)! (n − k)!] · f(x) [F(x)]^(k−1) [1 − F(x)]^(n−k)
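For U(0, 1) samples this formula says X(k) ∼ Beta(k, n − k + 1), which has mean k/(n + 1). A small Monte Carlo check (an illustrative sketch added here, assuming numpy is available):

import numpy as np

# k-th order statistic of n Uniform(0,1) samples; compare to the Beta(k, n-k+1) mean.
rng = np.random.default_rng(0)
n, k, trials = 10, 3, 100_000

samples = rng.uniform(size=(trials, n))
kth = np.sort(samples, axis=1)[:, k - 1]   # k-th smallest in each row

print("simulated mean:  ", kth.mean())
print("theoretical mean:", k / (n + 1))    # 3/11 = 0.2727...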
Cumulative distribution function of the k-th order statistic
• Via integrating the density function:
FX(k)(y) = ∫_{−∞}^{y} fX(k)(x) dx

         = ∫_{−∞}^{y} n! / [(k − 1)! (n − k)!] · f(x) [F(x)]^(k−1) [1 − F(x)]^(n−k) dx

         = n! / [(k − 1)! (n − k)!] ∫_{−∞}^{y} f(x) [F(x)]^(k−1) [1 − F(x)]^(n−k) dx

         = n! / [(k − 1)! (n − k)!] ∫_0^{F(y)} z^(k−1) (1 − z)^(n−k) dz

(the last step substitutes z = F(x), dz = f(x) dx).
• Via a direct argument.
FX(k)(y) = P{X(k) ≤ y}
         = P{at least k of the Xi's are ≤ y}
         = Σ_{j=k}^{n} P{exactly j of the Xi's are ≤ y}
         = Σ_{j=k}^{n} C(n, j) [F(y)]^j [1 − F(y)]^(n−j)
• A more intuitive solution: decompose the event {X(k) ≤ y} into non-overlapping pieces
according to where y falls among the order statistics.

{X(k) ≤ y} = {y ∈ [X(n), ∞)} ∪ {y ∈ [X(n−1), X(n))} ∪ ... ∪ {y ∈ [X(k), X(k+1))}

P{y ∈ [X(n), ∞)}      = C(n, n) [F(y)]^n
P{y ∈ [X(n−1), X(n))} = C(n, n−1) [F(y)]^(n−1) [1 − F(y)]
...
P{y ∈ [X(k), X(k+1))} = C(n, k) [F(y)]^k [1 − F(y)]^(n−k)

Summing these pieces gives

P{X(k) ≤ y} = Σ_{j=k}^{n} C(n, j) [F(y)]^j [1 − F(y)]^(n−j)
An interesting identity, obtained by letting X ∼ U(0, 1) (so that F(y) = y):

Σ_{j=k}^{n} C(n, j) y^j (1 − y)^(n−j) = n! / [(k − 1)! (n − k)!] ∫_0^{y} x^(k−1) (1 − x)^(n−k) dx,    0 < y < 1.
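A quick numerical sanity check of this identity (an illustrative sketch added here; it uses only the Python standard library and a crude Riemann sum for the integral):

from math import comb, factorial

def lhs(n, k, y):
    # sum of binomial probabilities for j = k, ..., n
    return sum(comb(n, j) * y**j * (1 - y)**(n - j) for j in range(k, n + 1))

def rhs(n, k, y, steps=200_000):
    # n!/((k-1)!(n-k)!) times the integral of x^(k-1) (1-x)^(n-k) over (0, y)
    const = factorial(n) / (factorial(k - 1) * factorial(n - k))
    h = y / steps
    total = sum(((i + 0.5) * h) ** (k - 1) * (1 - (i + 0.5) * h) ** (n - k)
                for i in range(steps))
    return const * total * h

print(lhs(10, 3, 0.4))   # the two printed values should agree to several decimals
print(rhs(10, 3, 0.4))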
Joint Probability Distribution of Functions of Random Variables
X1 and X2 are jointly continuous random variables
with probability density function fX1 ,X2 .
Y1 = g1 (X1 , X2 ),
Y2 = g2 (X1 , X2 ).
The goal: Determine the density function fY1 ,Y2 .
The formula to be remembered:

fY1,Y2(y1, y2) = fX1,X2(x1, x2) |J(x1, x2)|^(−1)

where the inverse functions are

x1 = h1(y1, y2),    x2 = h2(y1, y2)

and the Jacobian is

J(x1, x2) = | ∂g1/∂x1   ∂g1/∂x2 |
            | ∂g2/∂x1   ∂g2/∂x2 |  = (∂g1/∂x1)(∂g2/∂x2) − (∂g1/∂x2)(∂g2/∂x1)

Two conditions:
• y1 = g1 (x1 , x2 ) and y2 = g2 (x1 , x2 ) can be uniquely solved:
x1 = h1 (y1 , y2 ), and x2 = h2 (y1 , y2 )
• The Jacobian J(x1, x2) ≠ 0 for all points (x1, x2).
Example 7a:  Let X1 and X2 be jointly continuous random variables with
probability density fX1,X2. Let Y1 = X1 + X2 and Y2 = X1 − X2.
Q: Find fY1 ,Y2 in terms of fX1 ,X2 .
Solution:
y1 = g1(x1, x2) = x1 + x2,    y2 = g2(x1, x2) = x1 − x2,

=⇒  x1 = h1(y1, y2) = (y1 + y2)/2,    x2 = h2(y1, y2) = (y1 − y2)/2.

Therefore,

J(x1, x2) = | 1    1 |
            | 1   −1 |  = −2;

fY1,Y2(y1, y2) = fX1,X2(x1, x2) |J(x1, x2)|^(−1)
              = (1/2) fX1,X2((y1 + y2)/2, (y1 − y2)/2)
Example 7b:  The joint density in polar coordinates.
x = h1(r, θ) = r cos θ,    y = h2(r, θ) = r sin θ,    i.e.,

r = g1(x, y) = sqrt(x^2 + y^2),    θ = g2(x, y) = tan^(−1)(y/x).
Q (a): Express fR,Θ (r, θ) in terms of fX,Y (x, y).
Q (b): What if X and Y are independent standard normal random variables?
Solution:
First, compute the Jacobian J(x, y):
∂g1/∂x = x / sqrt(x^2 + y^2) = cos θ

∂g1/∂y = y / sqrt(x^2 + y^2) = sin θ

∂g2/∂x = [1 / (1 + (y/x)^2)] · (−y/x^2) = −y / (x^2 + y^2) = −(sin θ)/r

∂g2/∂y = [1 / (1 + (y/x)^2)] · (1/x) = x / (x^2 + y^2) = (cos θ)/r

J = (∂g1/∂x)(∂g2/∂y) − (∂g1/∂y)(∂g2/∂x) = (cos^2 θ)/r + (sin^2 θ)/r = 1/r
fR,Θ(r, θ) = fX,Y(x, y) |J|^(−1) = r fX,Y(r cos θ, r sin θ)

where 0 < r < ∞,    0 < θ < 2π.

If X and Y are independent standard normals, then

fR,Θ(r, θ) = r fX,Y(r cos θ, r sin θ) = (r/2π) e^(−(x^2 + y^2)/2) = r e^(−r^2/2) / (2π)

R and Θ are independent, with Θ ∼ U(0, 2π).
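A small simulation sketch of this fact (added for illustration; standard-library Python): draw independent standard normals, convert to (R, Θ), and check that Θ averages to about π on (0, 2π) and that R^2 averages to E(X^2) + E(Y^2) = 2.

import math, random

random.seed(0)
trials = 100_000
theta_sum, r2_sum = 0.0, 0.0
for _ in range(trials):
    x, y = random.gauss(0, 1), random.gauss(0, 1)
    theta_sum += math.atan2(y, x) % (2 * math.pi)   # angle mapped into (0, 2*pi)
    r2_sum += x * x + y * y

print("mean of Theta:", theta_sum / trials, "(expect about pi)")
print("mean of R^2:  ", r2_sum / trials, "(expect about 2)")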
Example 7d:  X1, X2, and X3 are independent standard normal random
variables. Let
Y1 = X1 + X2 + X3
Y2 = X1 − X2
Y3 = X1 − X3
Q: Compute the joint density function fY1 ,Y2 ,Y3 (y1 , y2 , y3 ).
Solution:

Compute the Jacobian

J = | 1    1    1 |
    | 1   −1    0 |
    | 1    0   −1 |  = 3

Solve for X1, X2, and X3:

X1 = (Y1 + Y2 + Y3)/3,    X2 = (Y1 − 2Y2 + Y3)/3,    X3 = (Y1 + Y2 − 2Y3)/3

fY1,Y2,Y3(y1, y2, y3) = fX1,X2,X3(x1, x2, x3) |J|^(−1)

                      = [1 / (3 (2π)^(3/2))] e^(−(x1^2 + x2^2 + x3^2)/2)

                      = [1 / (3 (2π)^(3/2))] e^(−(y1^2/3 + 2y2^2/3 + 2y3^2/3 − 2y2y3/3)/2)
More about Determinants

| a  b  c |
| d  e  f |  = aei + bfg + cdh − ceg − bdi − afh
| g  h  i |

For a triangular matrix, the determinant is the product of the diagonal entries:

| x1   0    0   ...  0  |
| ×    x2   0   ...  0  |
| ×    ×    x3  ...  0  |
| ...                   |
| ×    ×    ×   ...  xn |  = x1 x2 x3 ... xn
Chapter 7: Properties of Expectation
If X is a discrete random variable with mass function p(x), then

E(X) = Σ_x x p(x)

If X is a continuous random variable with density f(x), then

E(X) = ∫_{−∞}^{∞} x f(x) dx

If a ≤ X ≤ b, then a ≤ E(X) ≤ b.
Proposition 2.1
If X and Y have a joint probability mass function p(x, y), then

E[g(X, Y)] = Σ_y Σ_x g(x, y) p(x, y)

If X and Y have a joint probability density function f(x, y), then

E[g(X, Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) f(x, y) dx dy

What if g(X, Y) = X + Y ?
Example 2a:  An accident occurs at a point X that is uniformly distributed on
a road of length L. At the time of the accident an ambulance is at a location Y
that is also uniformly distributed on the road. Assuming that X and Y are
independent, find the expected distance between the ambulance and the point of
the accident.
Solution:

X ∼ U(0, L),    Y ∼ U(0, L).    To find E(|X − Y|).

E(|X − Y|) = ∫∫ |x − y| f(x, y) dx dy
Solution:

X ∼ U(0, L),    Y ∼ U(0, L).    To find E(|X − Y|).

E(|X − Y|) = ∫∫ |x − y| f(x, y) dx dy

           = (1/L^2) ∫_0^L ∫_0^L |x − y| dy dx

           = (1/L^2) ∫_0^L ∫_0^x (x − y) dy dx + (1/L^2) ∫_0^L ∫_x^L (y − x) dy dx

           = (1/L^2) ∫_0^L [xy − y^2/2]_{y=0}^{y=x} dx + (1/L^2) ∫_0^L [y^2/2 − xy]_{y=x}^{y=L} dx

           = (1/L^2) ∫_0^L (x^2/2) dx + (1/L^2) ∫_0^L (L^2/2 − xL + x^2/2) dx

           = L/3
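A one-line Monte Carlo check of this answer (an illustrative sketch, taking L = 1):

import random

# estimate E|X - Y| for independent X, Y ~ Uniform(0, 1)
random.seed(0)
trials = 1_000_000
total = sum(abs(random.random() - random.random()) for _ in range(trials))
print(total / trials)   # should be close to L/3 = 0.3333...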
Let X and Y be continuous random variables with density f(x, y). Let g(X, Y) = X + Y.

E[g(X, Y)] = E(X + Y)

           = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (x + y) f(x, y) dx dy

           = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x f(x, y) dx dy + ∫_{−∞}^{∞} ∫_{−∞}^{∞} y f(x, y) dx dy

           = ∫_{−∞}^{∞} x [ ∫_{−∞}^{∞} f(x, y) dy ] dx + ∫_{−∞}^{∞} y [ ∫_{−∞}^{∞} f(x, y) dx ] dy

           = ∫_{−∞}^{∞} x fX(x) dx + ∫_{−∞}^{∞} y fY(y) dy

           = E(X) + E(Y)
Linearity of Expectations
E(X + Y ) = E(X) + E(Y )
E(X + Y + Z) = E((X + Y ) + Z)
=E(X + Y ) + E(Z) = E(X) + E(Y ) + E(Z)
E(X1 + X2 + ... + Xn ) = E(X1 ) + E(X2 ) + ... + E(Xn )
Linearity is not affected by correlations among Xi ’s.
No assumption of independence is needed.
The Sample Mean
Let X1 , X2 , ..., Xn be identically distributed random variables having expected
value µ. The quantity X̄ , defined by
X̄ = (1/n) Σ_{i=1}^{n} Xi
is called the sample mean.
E(X̄) = E[ (1/n) Σ_{i=1}^{n} Xi ]

      = (1/n) E[ Σ_{i=1}^{n} Xi ]

      = (1/n) Σ_{i=1}^{n} E[Xi]

      = (1/n) Σ_{i=1}^{n} µ

      = (1/n) · n µ

      = µ

When µ is unknown, an unbiased estimator of µ is X̄.
Expectation of a Binomial Random Variable
X ∼ Binomial(n, p)
X = Z1 + Z2 + ... + Zn
where Zi, i = 1 to n, are independent Bernoulli with probability of success p.

Because E(Zi) = p,

E(X) = E( Σ_{i=1}^{n} Zi ) = Σ_{i=1}^{n} E(Zi) = np
Expectation of a Hypergeometric Random Variable
Suppose n balls are randomly selected without replacement from an urn
containing N balls, of which m are white. Denote X ∼ HG(N, m, n).

Q: Find E(X).

Solution 1:  Let X = X1 + X2 + ... + Xm, where

Xi = 1 if the i-th white ball is selected, and 0 otherwise.
Note that the Xi ’s are not independent; but we only need the expectation.
Solution 1:  Let X = X1 + X2 + ... + Xm, where

Xi = 1 if the i-th white ball is selected, and 0 otherwise.

E(Xi) = P{Xi = 1} = P{i-th white ball is selected}
      = C(1, 1) C(N − 1, n − 1) / C(N, n) = n/N

Therefore,

E(X) = mn/N.
Solution 2:  Let X = Y1 + Y2 + ... + Yn, where

Yi = 1 if the i-th ball selected is white, and 0 otherwise.

E(Yi) = P{Yi = 1} = P{i-th ball selected is white} = m/N

Therefore,

E(X) = mn/N.
Expected Number of Matches
A group of N people throw their hats into the center of a room. The hats are
mixed up and each person randomly selects one.
Q: Find the expected number of people that select their own hat.
Solution:
Let X denote the number of matches and let
X = X1 + X2 + ... + XN ,
where

Xi = 1 if the i-th person selects his own hat, and 0 otherwise.

E(Xi) = P{Xi = 1} = 1/N

Thus

E(X) = Σ_{i=1}^{N} E(Xi) = N · (1/N) = 1
Expectations May Be Undefined
A More Rigorous Definition:  If X is a discrete random variable with mass function p(x), then

E(X) = Σ_x x p(x),    provided    Σ_x |x| p(x) < ∞

If X is a continuous random variable with density f(x), then

E(X) = ∫_{−∞}^{∞} x f(x) dx,    provided    ∫_{−∞}^{∞} |x| f(x) dx < ∞
Example (Cauchy Distribution):

f(x) = (1/π) · 1/(x^2 + 1),    −∞ < x < ∞

E(X) = ∫_{−∞}^{∞} (x/π) · 1/(x^2 + 1) dx = 0 ???    (Answer is NO)

because

∫_{−∞}^{∞} (|x|/π) · 1/(x^2 + 1) dx = ∫_0^{∞} (2x/π) · 1/(x^2 + 1) dx

                                    = (1/π) ∫_0^{∞} 1/(x^2 + 1) d[x^2 + 1]

                                    = (1/π) [ log(x^2 + 1) ]_0^{∞}

                                    = (1/π) log ∞ = ∞
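The practical symptom is that sample means of Cauchy draws never settle down as the sample grows. A simulation sketch (added for illustration; tan(π(U − 1/2)) with U ∼ U(0, 1) is a standard way to generate standard Cauchy draws):

import math, random

random.seed(1)
total = 0.0
for i in range(1, 1_000_001):
    total += math.tan(math.pi * (random.random() - 0.5))   # one standard Cauchy draw
    if i in (10, 1_000, 100_000, 1_000_000):
        print(i, total / i)

# The running means keep jumping around instead of approaching a limit,
# reflecting the fact that E(X) is undefined for the Cauchy distribution.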
Section 7.3: Moments of The Number of Events
X = Σ_{i=1}^{n} Ii, where

Ii = 1 if Ai occurs, and 0 otherwise,

is the number of these events Ai that occur. We know

E(X) = Σ_{i=1}^{n} E(Ii) = Σ_{i=1}^{n} P(Ai)

Suppose we are interested in the number of pairs of events that occur, e.g., Ai Aj.
X = Σ_{i=1}^{n} Ii :  number of these events Ai that occur.

C(X, 2) = X(X − 1)/2 :  number of pairs of events that occur.

X(X − 1)/2 = Σ_{i<j} Ii Ij  also counts the number of pairs of events that occur, because

Ii Ij = 1 if both Ai and Aj occur, and 0 otherwise.

Therefore,

E[X(X − 1)/2] = Σ_{i<j} P(Ai Aj),    i.e.,    E(X^2) − E(X) = 2 Σ_{i<j} P(Ai Aj)
Second Moment of a Binomial Random Variable

n independent trials, with each trial being a success with probability p.

Ai = the event that trial i is a success.  If i ≠ j, then P(Ai Aj) = p^2.

E(X^2) − E(X) = 2 Σ_{i<j} P(Ai Aj) = 2 C(n, 2) p^2 = n(n − 1) p^2

E(X^2) = n^2 p^2 − n p^2 + np

Var(X) = n^2 p^2 − n p^2 + np − [np]^2 = np(1 − p)
Second Moment of a Hypergeometric Random Variable

n balls are randomly selected without replacement from an urn containing N
balls, of which m are white.

Ai = event that the i-th ball selected is white.

X = number of white balls selected. We have shown

E(X) = Σ_{i=1}^{n} P(Ai) = n m/N.
P(Ai Aj) = P(Ai) P(Aj | Ai) = (m/N) · (m − 1)/(N − 1),    i ≠ j

E(X^2) − E(X) = 2 Σ_{i<j} P(Ai Aj) = 2 C(n, 2) (m/N) (m − 1)/(N − 1)

E(X^2) = n(n − 1) (m/N) (m − 1)/(N − 1) + nm/N

Var(X) = nm(N − n)(N − m) / [N^2 (N − 1)] = n (m/N) (1 − m/N) (N − n)/(N − 1)

One can view m/N = p. Note that (N − n)/(N − 1) ≈ 1 if N is much larger than n.
Second Moment of the Matching Problem

Let Ai = event that person i selects his/her own hat in the matching problem, i = 1 to N.

P(Ai Aj) = P(Ai) P(Aj | Ai) = (1/N) · 1/(N − 1),    i ≠ j

X = number of people who select their own hat. We have shown E(X) = 1.

E(X^2) − E(X) = 2 Σ_{i<j} P(Ai Aj) = N(N − 1) · 1/[N(N − 1)] = 1

Var(X) = E(X^2) − E^2(X) = 1
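Both results (mean 1 and variance 1, whatever the group size) are easy to check by simulation; the sketch below shuffles the hats at random (illustrative code, not from the notes):

import random, statistics

random.seed(0)
N, trials = 20, 200_000
counts = []
for _ in range(trials):
    hats = list(range(N))
    random.shuffle(hats)                     # person i receives hat hats[i]
    counts.append(sum(1 for i in range(N) if hats[i] == i))

print("mean:    ", statistics.mean(counts))      # close to 1
print("variance:", statistics.variance(counts))  # close to 1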
Section 7.4: Covariance and Correlations
Proposition 4.1:
If X and Y are independent, then
E(XY ) = E(X)E(Y ).
For any functions h and g,
E[h(X) g(Y )] = E[h(X)] E[g(Y )].
Proof:

E[h(X) g(Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} h(x) g(y) f(x, y) dx dy

             = ∫_{−∞}^{∞} ∫_{−∞}^{∞} h(x) g(y) fX(x) fY(y) dx dy

             = [ ∫_{−∞}^{∞} h(x) fX(x) dx ] [ ∫_{−∞}^{∞} g(y) fY(y) dy ]

             = E[h(X)] E[g(Y)]
Covariance
Definition:
The covariance between X and Y is defined by
Cov(X, Y ) = E[(X − E[X])(Y − E[Y ])]
Equivalently
Cov(X, Y ) = E(XY ) − E(X)E(Y )
Proof:
Cov(X, Y ) = E[(X − E[X])(Y − E[Y ])]
= E[XY − E[X]Y − E[Y ]X + E[X]E[Y ]]
= E[XY ] − E[E[X]Y ] − E[E[Y ]X] + E[E[X]E[Y ]]
= E[XY ] − E[X]E[Y ] − E[Y ]E[X] + E[X]E[Y ]
= E[XY ] − E[X]E[Y ]
Note that E(X) is a constant, and so is E(X)E(Y).
X and Y are independent =⇒ Cov(X, Y ) = E(XY ) − E(X)E(Y ) = 0.
However, that X and Y are uncorrelated, Cov(X, Y) = 0,
does not imply that X and Y are independent.
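A standard counterexample (added here for illustration): take X ∼ N(0, 1) and Y = X^2. Then Cov(X, Y) = E(X^3) − E(X)E(X^2) = 0, yet Y is completely determined by X. A quick numerical sketch:

import random

random.seed(0)
trials = 500_000
xs = [random.gauss(0, 1) for _ in range(trials)]
ys = [x * x for x in xs]

mean_x = sum(xs) / trials
mean_y = sum(ys) / trials
cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / trials
print("sample covariance:", cov)   # close to 0, even though Y is a function of X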
Properties of Covariance: Proposition 4.2
• Cov(X, Y ) = Cov(Y, X)
• Cov(X, X) = V ar(X)
• Cov(aX, Y ) = aCov(X, Y )
These directly follow from the definition Cov(X, Y) = E(XY) − E(X)E(Y).

• Cov( Σ_{i=1}^{n} Xi , Σ_{j=1}^{m} Yj ) = Σ_{i=1}^{n} Σ_{j=1}^{m} Cov(Xi, Yj)
Proof:

Cov( Σ_{i=1}^{n} Xi , Σ_{j=1}^{m} Yj )

    = E[ ( Σ_{i=1}^{n} Xi )( Σ_{j=1}^{m} Yj ) ] − E( Σ_{i=1}^{n} Xi ) E( Σ_{j=1}^{m} Yj )

    = E[ Σ_{i=1}^{n} Σ_{j=1}^{m} Xi Yj ] − ( Σ_{i=1}^{n} E(Xi) ) ( Σ_{j=1}^{m} E(Yj) )

    = Σ_{i=1}^{n} Σ_{j=1}^{m} E(Xi Yj) − Σ_{i=1}^{n} Σ_{j=1}^{m} E(Xi) E(Yj)

    = Σ_{i=1}^{n} Σ_{j=1}^{m} [ E(Xi Yj) − E(Xi) E(Yj) ]

    = Σ_{i=1}^{n} Σ_{j=1}^{m} Cov(Xi, Yj)
Cov( Σ_{i=1}^{n} Xi , Σ_{j=1}^{m} Yj ) = Σ_{i=1}^{n} Σ_{j=1}^{m} Cov(Xi, Yj)

If n = m and Xi = Yi, then

Var( Σ_{i=1}^{n} Xi ) = Cov( Σ_{i=1}^{n} Xi , Σ_{j=1}^{n} Xj ) = Σ_{i=1}^{n} Σ_{j=1}^{n} Cov(Xi, Xj)

                      = Σ_{i=1}^{n} Cov(Xi, Xi) + Σ_{i≠j} Cov(Xi, Xj)

                      = Σ_{i=1}^{n} Var(Xi) + Σ_{i≠j} Cov(Xi, Xj)

                      = Σ_{i=1}^{n} Var(Xi) + 2 Σ_{i<j} Cov(Xi, Xj)
If the Xi's are pairwise independent, then

Var( Σ_{i=1}^{n} Xi ) = Σ_{i=1}^{n} Var(Xi)

Example 4b:  X ∼ Binomial(n, p) can be written as

X = X1 + X2 + ... + Xn,

where Xi ∼ Bernoulli(p). The Xi's are independent. Therefore

Var(X) = Var( Σ_{i=1}^{n} Xi ) = Σ_{i=1}^{n} Var(Xi) = Σ_{i=1}^{n} p(1 − p) = np(1 − p).
Example 4a:  Let X1, X2, ..., Xn be independent and identically distributed
random variables having expected value µ and variance σ^2. Let X̄ be the
sample mean and S^2 be the sample variance, where

X̄ = (1/n) Σ_{i=1}^{n} Xi,        S^2 = [1/(n − 1)] Σ_{i=1}^{n} (Xi − X̄)^2

Q: Find Var(X̄) and E(S^2).
Solution:

Var(X̄) = Var( (1/n) Σ_{i=1}^{n} Xi )

        = (1/n^2) Var( Σ_{i=1}^{n} Xi )

        = (1/n^2) Σ_{i=1}^{n} Var(Xi)

        = (1/n^2) Σ_{i=1}^{n} σ^2 = σ^2 / n

We've already shown E(X̄) = µ. As the sample size n increases, the
variance of X̄, the unbiased estimator of µ, decreases.
The Less Clever Approach (different from the book)

E(S^2) = E[ 1/(n − 1) Σ_{i=1}^{n} (Xi − X̄)^2 ]

       = 1/(n − 1) E[ Σ_{i=1}^{n} (Xi^2 + X̄^2 − 2 X̄ Xi) ]

       = 1/(n − 1) E[ Σ_{i=1}^{n} Xi^2 + n X̄^2 − 2 X̄ Σ_{i=1}^{n} Xi ]

       = 1/(n − 1) E[ Σ_{i=1}^{n} Xi^2 − n X̄^2 ]

       = 1/(n − 1) [ Σ_{i=1}^{n} E(Xi^2) − n E(X̄^2) ]

       = n/(n − 1) [ µ^2 + σ^2 − E(X̄^2) ]
E(X̄^2) = (1/n^2) E[ ( Σ_{i=1}^{n} Xi )^2 ] = (1/n^2) E[ Σ_i Xi^2 + 2 Σ_{i<j} Xi Xj ]

        = (1/n^2) [ Σ_i E(Xi^2) + 2 Σ_{i<j} E(Xi Xj) ]

        = (1/n^2) [ Σ_i (µ^2 + σ^2) + 2 Σ_{i<j} µ^2 ]

        = (1/n^2) [ n(µ^2 + σ^2) + n(n − 1) µ^2 ] = σ^2/n + µ^2

Or, directly,

E(X̄^2) = Var(X̄) + E^2(X̄) = σ^2/n + µ^2
Therefore,

E(S^2) = n/(n − 1) [ µ^2 + σ^2 − E(X̄^2) ]

       = n/(n − 1) [ µ^2 + σ^2 − σ^2/n − µ^2 ]

       = σ^2

Sample variance S^2 = [1/(n − 1)] Σ_{i=1}^{n} (Xi − X̄)^2 is an unbiased estimator of σ^2.

Why 1/(n − 1) instead of 1/n?

Σ_{i=1}^{n} (Xi − X̄)^2 is a sum of n terms with only n − 1 degrees of freedom,
because X̄ = (1/n) Σ_{i=1}^{n} Xi.
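A quick simulation sketch (illustrative only) showing that dividing by n − 1 is unbiased for σ^2 while dividing by n underestimates it:

import random

random.seed(0)
mu, sigma, n, trials = 5.0, 2.0, 5, 200_000
s2_unbiased, s2_biased = 0.0, 0.0
for _ in range(trials):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    s2_unbiased += ss / (n - 1)
    s2_biased += ss / n

print("average S^2 with 1/(n-1):", s2_unbiased / trials)  # close to sigma^2 = 4
print("average S^2 with 1/n:    ", s2_biased / trials)    # close to (n-1)/n * sigma^2 = 3.2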
The Standard Clever Technique for evaluating E(S^2) (as in the textbook)

(n − 1) S^2 = Σ_{i=1}^{n} (Xi − X̄)^2 = Σ_{i=1}^{n} (Xi − µ + µ − X̄)^2

            = Σ_{i=1}^{n} (Xi − µ)^2 + Σ_{i=1}^{n} (X̄ − µ)^2 − 2 Σ_{i=1}^{n} (Xi − µ)(X̄ − µ)

            = Σ_{i=1}^{n} (Xi − µ)^2 − n (X̄ − µ)^2

E[(n − 1) S^2] = (n − 1) E(S^2) = E[ Σ_{i=1}^{n} (Xi − µ)^2 − n (X̄ − µ)^2 ]

               = n Var(Xi) − n Var(X̄) = n σ^2 − n (σ^2/n) = (n − 1) σ^2
The Correlation of Two Random Variables
Definition:
The correlation of two random variables X and Y , denoted by
ρ(X, Y ), is defined, as long as V ar(X) > 0 and V ar(Y ) > 0, by
ρ(X, Y) = Cov(X, Y) / sqrt( Var(X) Var(Y) )
It can be shown that
−1 ≤ ρ(X, Y ) ≤ 1
Proof:    −1 ≤ ρ(X, Y) = Cov(X, Y) / sqrt( Var(X) Var(Y) ) ≤ 1

The basic trick: a variance is always ≥ 0.

Var( X/sqrt(Var(X)) + Y/sqrt(Var(Y)) )

    = Var(X)/Var(X) + Var(Y)/Var(Y) + 2 Cov(X, Y)/sqrt( Var(X) Var(Y) )

    = 2 (1 + ρ(X, Y)) ≥ 0

Therefore, ρ(X, Y) ≥ −1.
Var( X/sqrt(Var(X)) − Y/sqrt(Var(Y)) )

    = Var(X)/Var(X) + Var(Y)/Var(Y) − 2 Cov(X, Y)/sqrt( Var(X) Var(Y) )

    = 2 (1 − ρ(X, Y)) ≥ 0

Therefore, ρ(X, Y) ≤ 1.
• ρ(X, Y ) = +1 =⇒ Y = a + bX , and b > 0.
• ρ(X, Y ) = −1 =⇒ Y = a − bX , and b > 0.
• ρ(X, Y ) = 0 =⇒ X and Y are uncorrelated
• ρ(X, Y ) is close to +1 =⇒ X and Y are highly positively correlated.
• ρ(X, Y ) is close to −1 =⇒ X and Y are highly negatively correlated.
Section 7.5: Conditional Expectation
If X and Y are jointly discrete random variables,

E[X | Y = y] = Σ_x x P{X = x | Y = y} = Σ_x x pX|Y(x|y)

If X and Y are jointly continuous random variables,

E[X | Y = y] = ∫_{−∞}^{∞} x fX|Y(x|y) dx
Example 5b:  Suppose the joint density of X and Y is given by
f(x, y) = (1/y) e^(−x/y) e^(−y),    0 < x, y < ∞

Q: Compute

E[X | Y = y] = ∫_{−∞}^{∞} x fX|Y(x|y) dx

What should the answer look like?    (a function of y)
Solution:

fX|Y(x|y) = f(x, y) / fY(y) = f(x, y) / ∫_{−∞}^{∞} f(x, y) dx

          = (1/y) e^(−x/y) e^(−y) / ∫_0^{∞} (1/y) e^(−x/y) e^(−y) dx

          = (1/y) e^(−x/y) e^(−y) / [ −e^(−x/y) e^(−y) ]_{x=0}^{x=∞}

          = (1/y) e^(−x/y) e^(−y) / e^(−y)

          = (1/y) e^(−x/y),    y > 0

E[X | Y = y] = ∫_{−∞}^{∞} x fX|Y(x|y) dx = ∫_0^{∞} (x/y) e^(−x/y) dx = y
Computing Expectations by Conditioning
Proposition 5.1:
The tower property
E(X) = E[E[X|Y ]]
In some cases, computing E(X) is difficult;
but computing E(X|Y ) may be much more convenient.
Proof:
Assume X and Y are continuous. (The book assumes discrete variables.)

E[E[X|Y]] = ∫_{−∞}^{∞} E(X | Y = y) fY(y) dy

          = ∫_{−∞}^{∞} [ ∫_{−∞}^{∞} x fX|Y(x|y) dx ] fY(y) dy

          = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x fX|Y(x|y) fY(y) dx dy

          = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x f(x, y) dx dy

          = ∫_{−∞}^{∞} x [ ∫_{−∞}^{∞} f(x, y) dy ] dx

          = ∫_{−∞}^{∞} x fX(x) dx = E(X)
Example 5d:  Suppose N, the number of people entering a store on a given
day, is a random variable with mean 50. Suppose further that the amounts of
money spent by these customers are independent random variables having a
common mean $8. Assume the amount of money spent by a customer is also
independent of the total number of customers to enter the store.
Q: What is the expected amount of money spent in the store on a given day?
Solution:
N : the number of people entering the store. E(N ) = 50.
Xi : the amount spent by the ith customer. E(Xi ) = 8.
X : the total amount of money spent in the store on a given day. E(X) = ??
We cannot use linearity directly:

E(X) = E( Σ_{i=1}^{N} Xi ) = Σ_{i=1}^{N} E(Xi) = Σ_{i=1}^{N} 8 ???

because N, the number of terms in the summation, is also random.

E(X) = E( Σ_{i=1}^{N} Xi )

     = E[ E( Σ_{i=1}^{N} Xi | N ) ]

     = E[ Σ_{i=1}^{N} E(Xi) ]

     = E(8N) = 8 E(N)

     = 8 × 50 = $400
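A simulation sketch of this example (illustrative only; the notes specify only E(N) = 50, so the choices N ∼ Poisson(50) and Exponential spending with mean 8 below are added assumptions):

import numpy as np

rng = np.random.default_rng(0)
trials = 100_000
totals = np.empty(trials)
for t in range(trials):
    n = rng.poisson(50)                              # random number of customers
    totals[t] = rng.exponential(scale=8.0, size=n).sum()

print(totals.mean())   # close to E(N) * E(Xi) = 50 * 8 = 400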
Example 2e:  A game is begun by rolling an ordinary pair of dice.
• If the sum of the dice is 2, 3, or 12, the player loses.
• If the sum is 7 or 11, the player wins.
• If the sum is any other number i, the player continues to roll the dice until the
sum is either 7 or i.
– If it is 7, the player loses;
– If it is i, the player wins;
—————————–
Q: Compute E(R), where R = number of rolls in a game.
Conditional on S, the initial sum,

E(R) = Σ_{i=2}^{12} E(R | S = i) P(S = i)

Let P(S = i) = Pi.

P2 = 1/36,     P3 = 2/36,     P4 = 3/36,
P5 = 4/36,     P6 = 5/36,     P7 = 6/36,
P8 = 5/36,     P9 = 4/36,     P10 = 3/36,
P11 = 2/36,    P12 = 1/36

Equivalently,

Pi = P14−i = (i − 1)/36,    i = 2, 3, ..., 7
Obviously,

E(R | S = i) = 1  if i = 2, 3, 7, 11, 12,    and    E(R | S = i) = 1 + E(Z)  otherwise.

Z = the number of rolls until the sum is either i or 7.
Note that i ∉ {2, 3, 7, 11, 12} at this point.

Z is a geometric random variable with success probability Pi + P7.
Thus E(Z) = 1/(Pi + P7).

Therefore,

E(R) = 1 + Σ_{i=4}^{6} Pi/(Pi + P7) + Σ_{i=8}^{10} Pi/(Pi + P7) = 3.376
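A simulation sketch of the game (illustrative, standard-library Python) that estimates E(R) directly:

import random

random.seed(0)

def one_game():
    roll = lambda: random.randint(1, 6) + random.randint(1, 6)
    s = roll()
    rolls = 1
    if s in (2, 3, 7, 11, 12):
        return rolls
    while True:                 # keep rolling until the sum is 7 or the point s
        rolls += 1
        if roll() in (7, s):
            return rolls

trials = 500_000
print(sum(one_game() for _ in range(trials)) / trials)   # close to 3.376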
Computing Probabilities by Conditioning
If Y is discrete, then

P(E) = Σ_y P(E | Y = y) P(Y = y)

If Y is continuous, then

P(E) = ∫_{−∞}^{∞} P(E | Y = y) fY(y) dy
The Best Prize Problem: Deal? No Deal!
We are presented with n distinct prizes in sequence. After being presented with a
prize we must decide whether to accept it or to reject it and consider the next
prize. The objective is to maximize the probability of obtaining the best prize.
Assume that all n! orderings are equally likely.
Q: How well can we do? What would be a good strategy?
—————————–
Many related applications:
online algorithms, hiring decisions, dating, etc.
A plausible strategy:  Fix a number k, 0 ≤ k ≤ n. Reject the first k prizes
and then accept the first one that is better than all of those first k.

Let X = the location of the best prize, i.e., 1 ≤ X ≤ n.

Let Pk(best) = the probability that the best prize is selected using this strategy.

Pk(best) = Σ_{i=1}^{n} Pk(best | X = i) P(X = i) = (1/n) Σ_{i=1}^{n} Pk(best | X = i)

Obviously

Pk(best | X = i) = 0,    if i ≤ k
Assume i > k. Then

Pk(best | X = i) = Pk(best of the first i − 1 prizes is among the first k | X = i) = k/(i − 1)

(given X = i, the best of the first i − 1 prizes is equally likely to occupy any of those i − 1 positions).
Therefore,

Pk(best) = (1/n) Σ_{i=1}^{n} Pk(best | X = i) = (k/n) Σ_{i=k+1}^{n} 1/(i − 1)

Next question: which k to use? We choose the k that maximizes Pk(best).
Exhaustive search works fine here, since it only involves a linear scan.
However, it is often useful to have an “analytical” (albeit approximate) expression.
A useful approximate formula

Σ_{i=1}^{n} 1/i = 1 + 1/2 + 1/3 + ... + 1/n ≈ log n + γ

where Euler's constant

γ = 0.57721566490153286060651209008240243104215933593992...

The approximation is asymptotically (as n → ∞) exact.
Pk(best) = (k/n) Σ_{i=k+1}^{n} 1/(i − 1)

         = (k/n) [ 1/k + 1/(k+1) + ... + 1/(n−1) ]

         = (k/n) [ Σ_{i=1}^{n−1} 1/i − Σ_{i=1}^{k−1} 1/i ]

         ≈ (k/n) ( log(n − 1) − log(k − 1) )

         = (k/n) log[ (n − 1)/(k − 1) ]
Pk(best) ≈ (k/n) log[ (n − 1)/(k − 1) ] ≈ (k/n) log(n/k)

[Pk(best)]′ ≈ (1/n) log(n/k) − (k/n)(1/k) = 0    =⇒    log(n/k) = 1    =⇒    k = n/e

Also, when k = n/e,

Pk(best) ≈ (n/e)(1/n) = 1/e ≈ 0.36788,    as n → ∞.
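A simulation sketch (illustrative; prizes are represented by distinct ranks) confirming that rejecting the first n/e prizes and then taking the first record wins roughly 37% of the time:

import math, random

random.seed(0)
n = 100
k = round(n / math.e)            # approximately optimal cutoff
trials = 200_000
wins = 0
for _ in range(trials):
    prizes = list(range(n))      # larger value = better prize
    random.shuffle(prizes)
    threshold = max(prizes[:k])
    chosen = next((p for p in prizes[k:] if p > threshold), prizes[-1])
    wins += (chosen == n - 1)    # did we end up with the overall best?

print(wins / trials)             # close to 1/e = 0.3679 for large n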
Conditional Variance

Proposition 5.2:

Var(X) = E[Var(X|Y)] + Var(E[X|Y])

Proof: (from the opposite direction, compared to the book)

E[Var(X|Y)] + Var(E[X|Y])

    = E[ E(X^2|Y) − E^2(X|Y) ] + E( E^2(X|Y) ) − E^2( E[X|Y] )

    = E( E(X^2|Y) ) − E( E^2(X|Y) ) + E( E^2(X|Y) ) − E^2( E[X|Y] )

    = E(X^2) − E^2(X)

    = Var(X)
Example 5o:  Let X1, X2, ..., XN be a sequence of independent and
identically distributed random variables, and let N be a non-negative
integer-valued random variable independent of the Xi.

Q: Compute Var( Σ_{i=1}^{N} Xi ).

Solution:

Var( Σ_{i=1}^{N} Xi ) = E[ Var( Σ_{i=1}^{N} Xi | N ) ] + Var( E[ Σ_{i=1}^{N} Xi | N ] )

                      = E( N Var(X) ) + Var( N E(X) )

                      = Var(X) E(N) + E^2(X) Var(N)
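A closing simulation sketch (illustrative; as in Example 5d, the specific choices N ∼ Poisson(20) and Xi ∼ Uniform(0, 6) are assumptions added only for the check) verifying Var(Σ Xi) = Var(X) E(N) + E^2(X) Var(N):

import numpy as np

rng = np.random.default_rng(0)
lam, trials = 20, 200_000
totals = np.array([rng.uniform(0, 6, size=rng.poisson(lam)).sum() for _ in range(trials)])

mean_x, var_x = 3.0, 36 / 12                       # mean and variance of Uniform(0, 6)
predicted = var_x * lam + mean_x**2 * lam          # E(N) = Var(N) = lam for Poisson
print("simulated variance:", totals.var())
print("predicted variance:", predicted)            # 3*20 + 9*20 = 240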