Final Exam: Probability Theory (ANSWERS)
IST Austria
February 2, 2015 (10:00-12:30)
Instructions:
(i) This is a closed book exam.
(ii) You have to justify your answers. Unjustified results, even if correct, will not be accepted.
(iii) The exam is an individual activity; collaboration will not be tolerated.
(iv) You can solve the problems in any order. You do not have to solve all problems to receive the highest grade.
Partial credits can be given.
(v) The total working time is 150 minutes.
(vi) Write legibly. Clearly cross out the parts that you deem wrong.
Question 1
A football team consists of 20 offensive and 20 defensive players. The players are to be paired in groups of 2 for
the purpose of determining roommates. If the pairing is done at random:
(a) What is the probability that there are no offensive-defensive roommate pairs?
(b) What is the probability that there are 10 offensive-defensive roommate pairs?
The answer should be expressed in terms of factorials or binomial coefficients.
Solution 1
(a) There are 40 · 39/2 ways to select the first pair, 38 · 37/2 ways to select the second pair, etc. This way we see that there are $40!/2^{20}$ ordered sequences of pairs. The number of ways to split the players into unordered pairs is thus
\[
\frac{40!}{2^{20}\,20!}\,.
\]
There will be no offensive-defensive pairs if and only if the offensive players and the defensive players are each paired among themselves. It follows that there are
\[
\left(\frac{20!}{2^{10}\,10!}\right)^{2}
\]
such divisions. Hence, the probability of no offensive-defensive roommate pairs, call it $P_0$, is given by
\[
P_0 \;=\; \frac{\left(\dfrac{20!}{2^{10}\,10!}\right)^{2}}{\dfrac{40!}{2^{20}\,20!}} \;=\; \frac{(20!)^{3}}{(10!)^{2}\,40!}\,.
\]
(b) To determine $P_{2k}$, the probability that there are $2k$ offensive-defensive pairs, we first note that there are
\[
\binom{20}{2k}^{2}
\]
ways of selecting the $2k$ offensive players and $2k$ defensive players who are to be in the o-d pairs. These $4k$ players can then be paired up into o-d pairs in $(2k)!$ ways (this is so because the first offensive player can be paired with any of the $2k$ defensive players, the second offensive player with any of the remaining $2k-1$ defensive players, and so on). As the remaining $20-2k$ offensive (and defensive) players must be paired among themselves, it follows that there are
\[
\binom{20}{2k}^{2}\,(2k)!\left(\frac{(20-2k)!}{2^{10-k}\,(10-k)!}\right)^{2}
\]
divisions which lead to $2k$ o-d pairs. Hence
\[
P_{2k} \;=\; \frac{\binom{20}{2k}^{2}\,(2k)!\left(\dfrac{(20-2k)!}{2^{10-k}\,(10-k)!}\right)^{2}}{\dfrac{40!}{2^{20}\,20!}}\,.
\]
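As a sanity check (my own addition, not part of the exam), the $P_{2k}$ can be evaluated with exact rational arithmetic in Python; they sum to $1$, and $P_0 \approx 1.34\times 10^{-6}$. A minimal sketch, assuming only the standard library:
\begin{verbatim}
# Exact evaluation of P_{2k} (a sketch, standard library only).
from math import comb, factorial
from fractions import Fraction

def pairings(m):
    # ways to split m players into unordered pairs: m! / (2^(m/2) (m/2)!)
    return Fraction(factorial(m), 2**(m // 2) * factorial(m // 2))

total = pairings(40)  # 40! / (2^20 20!)
P = {2 * k: Fraction(comb(20, 2 * k))**2 * factorial(2 * k)
            * pairings(20 - 2 * k)**2 / total
     for k in range(11)}

assert sum(P.values()) == 1   # the P_{2k} form a probability distribution
print(float(P[0]))            # ~1.34e-06, i.e. (20!)^3 / ((10!)^2 40!)
\end{verbatim}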
Question 2
Suppose there are two manufacturers A and B of a smartphone battery. The factory A manufactures the fraction
p ∈ (0, 1) of all the batteries used in the smartphones while the factory B manufactures the rest. Assume that the
lifetime of each battery is a continuous random variable Tλ parametrised by λ > 0, with a density function:
\[
f_\lambda(t) \;:=\;
\begin{cases}
\dfrac{2}{\lambda}\left(1-\dfrac{t}{\lambda}\right) & \text{when } 0 \le t \le \lambda\,;\\[4pt]
0 & \text{otherwise.}
\end{cases}
\]
The parameter λ depends on the manufacturer of the battery. If the battery is manufactured by A then λ = 1
while if it is manufactured by B then λ = 2.
Suppose you buy a new smartphone that is randomly picked among all the produced phones.
(a) Compute the expectation and variance of the lifetime of the battery of your new phone.
(b) Suppose the battery functions properly after time t > 0. What is the probability that your phone has a battery
manufactured by A?
Solution 2
(a) The lifetime of the battery in a new phone is a random variable
\[
T := T_K\,,
\]
where $K$ is a random variable independent of the $T_\lambda$'s with $P\{K=1\} = p$ and $P\{K=2\} = 1-p$. Since $T_1$, $T_2$ and $K$ are independent,
\[
E(T^m) \;=\; p\,E(T_1^m) + (1-p)\,E(T_2^m)\,, \qquad m = 1,2\,.
\]
Now
\[
E(T_\lambda^m) \;=\; \frac{2}{\lambda}\int_0^\lambda t^m \left(1-\frac{t}{\lambda}\right) dt \;=\; \frac{2\lambda^m}{(m+2)(m+1)}\,.
\]
Using this in the previous formula yields:
\[
E(T) \;=\; p\cdot\tfrac{1}{3} + (1-p)\cdot\tfrac{2}{3} \;=\; \frac{2-p}{3}\,,
\qquad
E(T^2) \;=\; p\cdot\tfrac{1}{6} + (1-p)\cdot\tfrac{2}{3} \;=\; \frac{4-3p}{6}\,.
\]
The variance is hence
\[
\sigma^2 := E(T^2) - E(T)^2 \;=\; \frac{3(4-3p) - 2(2-p)^2}{18} \;=\; \frac{4-p-2p^2}{18}\,.
\]
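These moments are easy to corroborate numerically (my own addition): the CDF $F_\lambda(t) = 2t/\lambda - t^2/\lambda^2$ inverts to $t = \lambda(1-\sqrt{1-u})$, which gives an exact sampler for $T_\lambda$. A minimal Python sketch:
\begin{verbatim}
# Monte Carlo check of E(T) and Var(T) (a sketch, standard library only).
import random
from math import sqrt

def sample_T(p, rng):
    lam = 1.0 if rng.random() < p else 2.0         # K = 1 w.p. p, else K = 2
    return lam * (1.0 - sqrt(1.0 - rng.random()))  # inverse-CDF sample

rng = random.Random(0)
p, n = 0.3, 200_000
xs = [sample_T(p, rng) for _ in range(n)]
mean = sum(xs) / n
var = sum((x - mean)**2 for x in xs) / n
print(mean, (2 - p) / 3)               # both ~0.567
print(var, (4 - p - 2 * p**2) / 18)    # both ~0.196
\end{verbatim}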
(b) As $T_1 \le 1$ it remains to consider $t \in [0,1]$. Since $K$ is independent of the $T_\lambda$'s, $P(T_K \ge t \mid K = k) = P(T_k \ge t)$, and Bayes' formula yields:
\[
P\big(\text{"battery manufactured by A"} \,\big|\, \text{"phone has lasted time } t\text{"}\big)
= P(K=1 \mid T \ge t)
= \frac{P(T_K \ge t,\, K=1)}{P(T \ge t)}
= \frac{p\,P(T_1 \ge t)}{p\,P(T_1 \ge t) + (1-p)\,P(T_2 \ge t)}\,.
\tag{1}
\]
Now it remains to compute the probabilities. Clearly $P(T_\lambda \ge t) = 0$ if $t \ge \lambda$. Otherwise,
\[
P(T_\lambda \ge t) \;=\; \frac{2}{\lambda}\int_t^\lambda \left(1-\frac{s}{\lambda}\right) ds
\;=\; \frac{2}{\lambda}\left((\lambda-t) - \frac{\lambda^2-t^2}{2\lambda}\right)
\;=\; \frac{2(\lambda-t)}{\lambda}\left(1-\frac{\lambda+t}{2\lambda}\right)
\;=\; \left(1-\frac{t}{\lambda}\right)^{2}.
\]
In particular, $P(T_1 \ge t) = (1-t)^2$ and $P(T_2 \ge t) = (1-t/2)^2$. Plugging these numbers into (1) yields
\[
P\big(\text{"battery manufactured by A"} \,\big|\, \text{"phone has lasted time } t\text{"}\big)
\;=\;
\begin{cases}
\dfrac{p\,(1-t)^2}{p\,(1-t)^2 + (1-p)\,(1-t/2)^2} & \text{when } t \in [0,1]\,;\\[6pt]
0 & \text{when } t \in [1,2]\,.
\end{cases}
\]
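The posterior in (1) can be checked the same way (my own sketch; the sampler is the inverse-CDF sampler from the previous snippet):
\begin{verbatim}
# Monte Carlo check of the posterior P(K = 1 | T >= t) (a sketch).
import random
from math import sqrt

rng = random.Random(1)
p, t, n = 0.3, 0.5, 500_000
alive = from_A = 0
for _ in range(n):
    lam = 1.0 if rng.random() < p else 2.0        # manufacturer A or B
    T = lam * (1.0 - sqrt(1.0 - rng.random()))    # sample of T_lambda
    if T >= t:
        alive += 1
        from_A += (lam == 1.0)

formula = p * (1 - t)**2 / (p * (1 - t)**2 + (1 - p) * (1 - t / 2)**2)
print(from_A / alive, formula)                    # both ~0.16
\end{verbatim}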
Question 3
Individuals of age 50 have a probability $10^{-5}$ of having a certain type of cancer. A test is developed with a false positive rate of $10^{-3}$ (i.e., it diagnoses cancer in a healthy individual once in 1000 times). Conversely, it has a false negative rate of 10% (i.e., it diagnoses 10% of individuals with cancer as healthy). If the cancer is detected early, there is a 90% chance of survival after treatment, whereas if untreated it is always fatal. However, the treatment itself is risky, and kills 1% of healthy patients.
(a) What fraction of positive diagnoses are correct?
(b) Would screening a large population using this test save lives?
(c) The incidence of this cancer increases with age; the test might be more effective when applied to an older age
group. Assuming that the other parameters stay the same, at what incidence would the test be worthwhile?
Solution 3
(a) Using Bayes' formula we obtain:
\[
P\big(\text{"cancer"} \,\big|\, \text{"diagnosis"}\big)
\;=\; \frac{P\big(\text{"cancer"} \cap \text{"diagnosis"}\big)}{P\big(\text{"diagnosis"}\big)}
\;=\; \frac{10^{-5} \times 0.9}{10^{-5} \times 0.9 + (1-10^{-5}) \times 10^{-3}}
\;\approx\; 0.0089\,.
\]
(b) The screening would not save lives: the expected number of people with cancer saved, per individual screened, would be $10^{-5} \times 0.9 \times 0.9 \approx 8.1 \times 10^{-6}$. The number of people who would die because they were treated after mis-diagnosis is $(1-10^{-5}) \times 10^{-3} \times 10^{-2} \approx 10^{-5}$, which is larger.
(c) Substituting an incidence rate $I$ we have $0.9 \times 0.9 \times I$ lives saved, and $(1-I) \times 10^{-3} \times 10^{-2}$ dying through mis-diagnosis. Thus, screening will save lives overall if $I > 1.23 \times 10^{-5}$. However, this does not account for the costs (financial and human) of the test and treatment.
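The arithmetic of (a)-(c) fits in a few lines of Python (my own sketch; the variable names are mine):
\begin{verbatim}
# Bayes arithmetic for Question 3 (a sketch, standard library only).
I0 = 1e-5                 # incidence at age 50
fp, fn = 1e-3, 0.1        # false positive / false negative rates
cure, kill = 0.9, 1e-2    # survival if treated early / treatment mortality

ppv = I0 * (1 - fn) / (I0 * (1 - fn) + (1 - I0) * fp)
print(ppv)                            # (a) ~0.0089

saved = I0 * (1 - fn) * cure          # ~8.1e-6 per person screened
lost = (1 - I0) * fp * kill           # ~1.0e-5 per person screened
print(saved < lost)                   # (b) True: screening costs lives

# (c) break-even incidence I: (1 - fn) * cure * I = (1 - I) * fp * kill
I = fp * kill / ((1 - fn) * cure + fp * kill)
print(I)                              # ~1.23e-5
\end{verbatim}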
Question 4
A bacterium divides in two at a rate µ, and dies at rate λ, per unit time. All bacteria are equivalent: that is, they
do not age. Initially, there is a single bacterium.
(a) What is the expected number at time t?
(b) What is the probability that a single bacterium will found a growing population?
(c) Suppose µ > λ. After a long time, what is the expected number of bacteria, conditional on survival of the
population?
HINT: Consider what happens in a small time interval $\delta t$, so small that at most one event can happen, namely division with probability $\mu\,\delta t$, or death with probability $\lambda\,\delta t$.
HINT: You might find it useful that $(1 + a\,\delta t)^{t/\delta t} \to e^{at}$ as $\delta t \to 0$.
Solution 4
(a) Dividing time into small time units $\delta t$, we end up with a branching process where each node has two children with probability $\mu\,\delta t$, one child with probability $1-(\mu+\lambda)\,\delta t$, and no child with probability $\lambda\,\delta t$. The difference to the branching process in the lecture is that the population size at time $k\,\delta t$ is the number of nodes in the $k$-th layer of the tree. The expected number of children of any given node is $2\mu\,\delta t + 1\cdot(1-\mu\,\delta t-\lambda\,\delta t) + 0\cdot\lambda\,\delta t = 1+(\mu-\lambda)\,\delta t$. To obtain the expected number at time $t$, we let $t/\delta t$ time units pass and obtain the expectation
\[
\big(1+(\mu-\lambda)\,\delta t\big)^{t/\delta t} \;\to\; e^{(\mu-\lambda)t}\,, \qquad \text{as } \delta t \to 0\,.
\]
(b) Let the probability of ultimate extinction, starting from a single cell, be $Q$. In time $\delta t$, there is a probability $\lambda\,\delta t$ of death, in which case extinction is certain, and a probability $\mu\,\delta t$ of division, in which case the chance that both offspring leave no descendants is $Q^2$. Therefore, conditioning on the first time step we obtain
\[
Q \;=\; \lambda\,\delta t + \big(1-(\lambda+\mu)\,\delta t\big)\,Q + \mu\,\delta t\,Q^2\,,
\]
and so
\[
0 \;=\; \lambda - (\lambda+\mu)\,Q + \mu\,Q^2\,.
\]
This is solved by $Q = 1$ and $Q = \lambda/\mu$. Therefore, the probability of ultimate survival is $0$ if $\mu \le \lambda$, and $(\mu-\lambda)/\mu$ otherwise.
(c) As the time $t$ approaches infinity, the fate of the population becomes certain. For very large times $t$ the expected size of the population is
\[
\frac{\mu-\lambda}{\mu}\, N_*(t)\,,
\]
where $N_*(t)$ is the expected size of the population at time $t$, conditional on survival. Combining this with part (a) we get
\[
N_*(t) \;=\; \frac{\mu}{\mu-\lambda}\, e^{(\mu-\lambda)t}\,.
\]
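Parts (b) and (c) can be checked with a Gillespie-style simulation of the birth-death process (my own sketch; \texttt{simulate} is a hypothetical helper):
\begin{verbatim}
# Birth-death simulation: survival probability and conditional size (a sketch).
import random
from math import exp

def simulate(mu, lam, t_max, rng):
    n, t = 1, 0.0
    while n > 0:
        t += rng.expovariate(n * (mu + lam))  # waiting time to the next event
        if t > t_max:
            break
        n += 1 if rng.random() < mu / (mu + lam) else -1
    return n                                  # population size at time t_max

rng = random.Random(2)
mu, lam, t_max, runs = 1.0, 0.5, 10.0, 10_000
sizes = [simulate(mu, lam, t_max, rng) for _ in range(runs)]
alive = [n for n in sizes if n > 0]

print(len(alive) / runs, (mu - lam) / mu)          # ~0.5 for large t_max
print(sum(alive) / len(alive),
      mu / (mu - lam) * exp((mu - lam) * t_max))   # both ~297
\end{verbatim}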
Question 5
Consider an Erdős–Rényi random graph with $n \ge 2$ vertices that we label by the integers $1,\ldots,n$. Suppose any given edge is drawn in the graph with probability $p \in (0,1)$. We say that there exists a path of length $\ell$ connecting two given vertices $v \ne w$ if there exist vertices $v_1,\ldots,v_{\ell-1}$ such that $v \sim v_1 \sim v_2 \sim \cdots \sim v_{\ell-1} \sim w$, where $v \sim w$ means that there is an edge between $v$ and $w$. For each $1 \le \ell \le n$ define the random variable
\[
X_\ell := \text{"number of paths of length } \ell \text{ connecting the vertices 1 and 2"}\,.
\]
(a) Compute the expectation of $X_\ell$ for $1 \le \ell \le n$.
(b) What is the probability that $X_2 = 0$?
(c) What is the probability that $X_3 = 0$?
(d) Suppose $n = 198$ and $p = 1/\sqrt{2}$. Estimate $P(X_2 < 84)$ using the central limit theorem.
HINT: You may find it useful that if $F(x) := \frac{1}{\sqrt{2\pi}} \int_{-\infty}^x e^{-t^2/2}\, dt$ then $F(2) \approx 0.977$.
Solution 5
(a) By definition we have
\[
X_\ell \;=\; \sum_{v_1,\ldots,v_{\ell-1}} \mathbf{1}\{1 \sim v_1,\, v_1 \sim v_2,\, \ldots,\, v_{\ell-1} \sim 2\}\,,
\]
where the sum is over ordered but distinct vertices $v_i$ taken from the set $\{3,\ldots,n\}$. Taking expectations yields
\[
E(X_\ell) \;=\; \sum_{v_1,\ldots,v_{\ell-1}} P\{1 \sim v_1,\, v_1 \sim v_2,\, \ldots,\, v_{\ell-1} \sim 2\}
\;=\; \sum_{v_1,\ldots,v_{\ell-1}} P\{1 \sim v_1\}\, P\{v_1 \sim v_2\} \cdots P\{v_{\ell-1} \sim 2\}
\;=\; (n-2)(n-3)\cdots(n-\ell)\, p^{\ell}\,.
\]
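For small $n$ this expectation can be verified by brute force (my own sketch; \texttt{count\_paths} enumerates exactly the ordered distinct tuples in the sum above):
\begin{verbatim}
# Brute-force check of E(X_l) on small random graphs (a sketch).
import itertools, random

def count_paths(adj, n, l):
    # ordered tuples of distinct intermediate vertices from {3, ..., n}
    return sum(
        all(adj[a][b] for a, b in zip((1,) + mid, mid + (2,)))
        for mid in itertools.permutations(range(3, n + 1), l - 1)
    )

rng = random.Random(3)
n, p, l, runs = 7, 0.4, 3, 20_000
total = 0
for _ in range(runs):
    adj = [[False] * (n + 1) for _ in range(n + 1)]
    for v, w in itertools.combinations(range(1, n + 1), 2):
        adj[v][w] = adj[w][v] = rng.random() < p
    total += count_paths(adj, n, l)

print(total / runs, (n - 2) * (n - 3) * p**l)   # both ~1.28
\end{verbatim}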
(b) Let $v \nsim v'$ mean that there is no edge connecting the vertices $v$ and $v'$. Using independence of the edges we compute:
\[
P\big(X_2 = 0\big) \;=\; P\Big(\bigcap_{v=3}^{n} \{1 \nsim v \text{ or } v \nsim 2\}\Big)
\;=\; \prod_{v=3}^{n} \big(1 - P\{1 \sim v\}\, P\{v \sim 2\}\big)
\;=\; (1-p^2)^{n-2}\,.
\]
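This formula is easy to check by simulation, since the events $\{1 \sim v \sim 2\}$ for different $v$ use disjoint edge pairs (my own sketch):
\begin{verbatim}
# Monte Carlo check of P(X_2 = 0) = (1 - p^2)^(n-2) (a sketch).
import random

rng = random.Random(4)
n, p, runs = 7, 0.4, 50_000
none = 0
for _ in range(runs):
    # X_2 depends only on the edge pairs (1, v), (v, 2) for v = 3, ..., n
    has_path = any(rng.random() < p and rng.random() < p
                   for _ in range(3, n + 1))
    none += not has_path
print(none / runs, (1 - p**2) ** (n - 2))   # both ~0.42
\end{verbatim}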
(c) Analogously,
\[
P\big(X_3 = 0\big) \;=\; P\Big(\bigcap_{\substack{v,w=3\\ w \ne v}}^{n} \{1 \nsim v \text{ or } v \nsim w \text{ or } w \nsim 2\}\Big)
\;=\; \prod_{\substack{v,w=3\\ w \ne v}}^{n} \big(1 - P\{1 \sim v\}\, P\{v \sim w\}\, P\{w \sim 2\}\big)
\;=\; (1-p^3)^{(n-2)(n-3)}\,.
\]
(d) The random variables $Z_v := \mathbf{1}\{1 \sim v \sim 2\}$, appearing in the sum
\[
X_2 \;=\; \sum_{v=3}^{n} Z_v\,,
\]
are i.i.d. Hence the central limit theorem says that their sum $X_2$ is close to a Gaussian random variable $G$ with the same expectation and variance as $X_2$, provided $n$ is large enough (that $n$ is large enough is assumed here). To this end we compute:
\[
E(X_2^2) \;=\; \sum_{v,w=3}^{n} E\big(\mathbf{1}\{1 \sim v \sim 2\}\,\mathbf{1}\{1 \sim w \sim 2\}\big)
\;=\; \sum_{v=3}^{n} P\{1 \sim v\}\, P\{v \sim 2\} \;+ \sum_{\substack{v,w=3\\ v \ne w}}^{n} P\{1 \sim v\}\, P\{v \sim 2\}\, P\{1 \sim w\}\, P\{w \sim 2\}
\;=\; (n-2)\, p^2 + (n-2)(n-3)\, p^4\,.
\]
The variance of $X_2$ is hence
\[
\sigma^2 := E(X_2^2) - E(X_2)^2
\;=\; (n-2)\, p^2 + (n-2)(n-3)\, p^4 - \big((n-2)\, p^2\big)^2
\;=\; (n-2)(1-p^2)\, p^2\,.
\]
Plugging in $n = 198$ and $p = 1/\sqrt{2}$ yields nice round numbers:
\[
E(X_2) = (198-2)\cdot\tfrac{1}{2} = 98\,, \qquad
\sigma = \sqrt{(198-2)(1-1/2)(1/2)} = 7\,.
\]
Now we use the central limit theorem to approximate the standardised $X_2$ by a standard Gaussian random variable $G$:
\[
P\big(X_2 < 84\big) \;=\; P\Big(\frac{X_2-98}{7} < \frac{84-98}{7}\Big) \;\approx\; P\{G < -2\} \;=\; 1 - P\{G \le 2\} \;\approx\; 0.023\,.
\]
Note that here the symmetry of the normal distribution has been used.
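Since the $Z_v$ are i.i.d. Bernoulli($p^2$) variables, $X_2$ is in fact Binomial($n-2$, $p^2$), so the CLT estimate can be compared with the exact tail (my own sketch):
\begin{verbatim}
# Exact binomial tail P(X_2 < 84) for n - 2 = 196, p^2 = 1/2 (a sketch).
from math import comb

trials, q = 196, 0.5
exact = sum(comb(trials, k) for k in range(84)) * q**trials
print(exact)   # ~0.019; the CLT estimate 0.023 is close
\end{verbatim}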
Question 6
A sample of k genes is taken from a large population, of constant effective size Ne genes. There is a constant rate
of mutation, µ, and every mutation is distinct from all the others (the "infinite sites" model).
(a) What is the expected number of mutations that will be seen in the sample, for k = 2, 3, 10?
(b) What is the variance of the number of mutations, for k = 2, 3, 10?
(c) How does the standard deviation, divided by the mean number of mutations, change as k becomes large?
HINT: You may find the following formulas useful:
\[
\sum_{j=1}^{k} \frac{1}{j} \;\approx\; \gamma + \ln k\,,
\qquad \text{and} \qquad
\sum_{j=1}^{\infty} \frac{1}{j^2} \;=\; \frac{\pi^2}{6}\,.
\]
Here $\gamma \approx 0.57$ is a numerical constant.
Solution 6
(a) With $j$ lineages present, coalescence in the next generation happens with probability
\[
\frac{j(j-1)}{2N_e}\,.
\]
The coalescence time of $j$ lineages down to $j-1$ is thus given by an exponential distribution with mean
\[
\frac{2N_e}{j(j-1)}\,.
\]
This time, during which there are $j$ lineages, contributes
\[
j\cdot\frac{2N_e}{j(j-1)} \;=\; \frac{2N_e}{j-1}
\]
to the expected length of the genealogy. Summing from $k$ down to $2$ gives the expected length as
\[
\sum_{j=2}^{k} \frac{2N_e}{j-1}\,.
\]
The expected number of mutations is obtained by multiplying by $\mu$. For $k = 2, 3, 10$ this is $2N_e\mu$, $2N_e\mu \times 1.5$, $2N_e\mu \times 2.83$; for large $k$, we have
\[
2N_e\mu\,(\gamma + \ln k)\,.
\]
(b) Similarly, the variance of the length contributed by the time during which there are $j$ lineages is $j^2$ times the variance of the above exponential distribution,
\[
j^2\left(\frac{2N_e}{j(j-1)}\right)^{2} \;=\; \left(\frac{2N_e}{j-1}\right)^{2},
\]
and we obtain for the number of mutations the variance
\[
\sum_{j=2}^{k} \left(\frac{2N_e\mu}{j-1}\right)^{2}.
\]
For $k = 2, 3, 10$, this is $(2N_e\mu)^2$, $(2N_e\mu)^2 \times 1.25$, $(2N_e\mu)^2 \times 1.54$; for large $k$, this converges to
\[
(2N_e\mu)^2\, \frac{\pi^2}{6}\,.
\]
Note: The intervals between successive coalescence events are independent and hence we can just sum the variances.
(c) The standard deviation divided by the mean is $1$, $0.745$, $0.439$, for $k = 2, 3, 10$, and tends to
\[
\frac{\pi}{\sqrt{6}\,(\gamma + \ln k)}
\]
for large $k$.
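A simulation of the genealogy (my own sketch) reproduces the three quantities for $k = 10$; as in the solution, it takes the mean and variance of $\mu L$, where $L$ is the total genealogy length:
\begin{verbatim}
# Coalescent simulation of the number-of-mutations moments (a sketch).
import random

def total_length(k, Ne, rng):
    # sum of j * T_j with T_j ~ Exp(mean 2 Ne / (j (j-1))), j = k, ..., 2
    return sum(j * rng.expovariate(j * (j - 1) / (2 * Ne))
               for j in range(2, k + 1))

rng = random.Random(5)
k, Ne, mu, runs = 10, 1000.0, 1e-3, 100_000
vals = [mu * total_length(k, Ne, rng) for _ in range(runs)]
m = sum(vals) / runs
v = sum((x - m)**2 for x in vals) / runs

theta = 2 * Ne * mu
print(m / theta)        # ~2.83 = sum_{j=1}^{9} 1/j
print(v / theta**2)     # ~1.54 = sum_{j=1}^{9} 1/j^2
print(v**0.5 / m)       # ~0.439
\end{verbatim}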
Question 7 (Challenge)
Consider $n$ arbitrary vectors $v_1,\ldots,v_n \in \mathbb{R}^n$ of unit length,
\[
\|v_i\| \;=\; \sqrt{\sum_{j=1}^{n} v_{i,j}^2} \;=\; 1\,, \qquad i = 1,\ldots,n\,.
\]
Show that there exist $n$ numbers $\epsilon_i \in \{-1,+1\}$ such that
\[
\|\epsilon_1 v_1 + \epsilon_2 v_2 + \cdots + \epsilon_n v_n\| \;\le\; \sqrt{n}\,.
\tag{2}
\]
HINT: Randomise the problem, i.e., consider random coefficients $\epsilon_i$, and compute the expectation of an appropriate quantity.
Solution 7
Let the $\epsilon_i$'s be i.i.d. random variables with $P(\epsilon_i = \pm 1) = 1/2$, and set
\[
X := \|\epsilon_1 v_1 + \epsilon_2 v_2 + \cdots + \epsilon_n v_n\|^2\,.
\]
Recall that $\|v\|^2 = v \cdot v$. Hence, using the linearity of $v \cdot w$ w.r.t.\ the arguments $v$ and $w$, we get
\[
X \;=\; \sum_{i,j=1}^{n} \epsilon_i \epsilon_j\, v_i \cdot v_j\,.
\]
Since the $\epsilon_i$'s are independent, $E(\epsilon_i \epsilon_j) = \delta_{ij}$, and taking expectations yields:
\[
E(X) \;=\; \sum_{i,j=1}^{n} v_i \cdot v_j\, E(\epsilon_i \epsilon_j) \;=\; \sum_{i=1}^{n} \|v_i\|^2 \;=\; n\,.
\]
Since $X \ge 0$ and $E(X)$ is just a convex combination of the squared left-hand side of (2) evaluated over all possible choices of the tuples $(\epsilon_1,\ldots,\epsilon_n) \in \{-1,+1\}^n$, there must exist at least one choice $(\epsilon_1^*,\ldots,\epsilon_n^*)$ such that $X \le n$ (actually there must also be another one such that $X \ge n$ holds). Taking square roots completes the argument.
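A small numerical illustration (my own sketch): averaging $X$ over all $2^n$ sign patterns gives exactly $n$, so the minimum over the patterns is at most $n$:
\begin{verbatim}
# Averaging ||sum eps_i v_i||^2 over all sign choices (a sketch).
import itertools, random
from math import sqrt

rng = random.Random(6)
n = 8
vs = []
for _ in range(n):                       # n random unit vectors in R^n
    v = [rng.gauss(0, 1) for _ in range(n)]
    norm = sqrt(sum(x * x for x in v))
    vs.append([x / norm for x in v])

def sq_norm(eps):
    s = [sum(e * v[i] for e, v in zip(eps, vs)) for i in range(n)]
    return sum(x * x for x in s)

X = [sq_norm(eps) for eps in itertools.product((-1, 1), repeat=n)]
print(sum(X) / len(X))   # = n (up to rounding), as E(X) = n predicts
print(min(X) <= n)       # True: some choice achieves ||.|| <= sqrt(n)
\end{verbatim}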