Random Variable
Definition
A random variable X on a sample space Ω is a real-valued
function on Ω; that is, X : Ω → R. A discrete random variable is
a random variable that takes on only a finite or countably infinite
number of values.
For a discrete random variable X and a real value a, the event "X = a" represents the set {s ∈ Ω : X(s) = a}, and

\Pr(X = a) = \sum_{s \in \Omega : X(s) = a} \Pr(s).
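As a concrete illustration (not from the slides), the following Python sketch models the sum of two fair dice as a random variable on Ω = {1, . . . , 6}² and computes Pr(X = a) by summing Pr(s) over the outcomes s with X(s) = a; all names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs of two fair dice, each outcome with probability 1/36.
omega = list(product(range(1, 7), repeat=2))
pr = {s: Fraction(1, 36) for s in omega}

# Random variable X: the sum of the two dice (a real-valued function on omega).
def X(s):
    return s[0] + s[1]

def pr_X_equals(a):
    """Pr(X = a) = sum of Pr(s) over all s with X(s) = a."""
    return sum(pr[s] for s in omega if X(s) == a)

print(pr_X_equals(7))   # 1/6
```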
Independence
Definition
Two random variables X and Y are independent if and only if
Pr((X = x) ∩ (Y = y )) = Pr(X = x) · Pr(Y = y )
for all values x and y . Similarly, random variables X1 , X2 , . . . Xk
are mutually independent if and only if for any subset I ⊆ [1, k]
and any values xi ,i ∈ I ,
!
\
Y
Pr
Xi = xi
=
Pr(Xi = xi ).
i∈I
i∈I
Expectation
Definition
The expectation of a discrete random variable X , denoted by
E[X ], is given by
E[X] = \sum_{i} i \, \Pr(X = i),

where the summation is over all values in the range of X. The expectation is finite if \sum_i |i| \Pr(X = i) converges; otherwise, the expectation is unbounded.
The expectation (or mean or average) is a weighted sum over all
possible values of the random variable.
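A small self-contained sketch (same hypothetical dice example as above) that computes E[X] as the weighted sum over the range of X; the names are illustrative.

```python
from fractions import Fraction
from itertools import product

# E[X] for the sum X of two fair dice, computed as sum_i i * Pr(X = i).
omega = list(product(range(1, 7), repeat=2))
pmf = {}
for s in omega:
    pmf[s[0] + s[1]] = pmf.get(s[0] + s[1], 0) + Fraction(1, 36)

E_X = sum(i * p for i, p in pmf.items())
print(E_X)  # 7
```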
Median
Definition
The median of a random variable X is a value m such that
Pr (X < m) ≤ 1/2
and
Pr (X > m) < 1/2.
Linearity of Expectation
Theorem
For any two random variables X and Y
E [X + Y ] = E [X ] + E [Y ].
Lemma
For any constant c and discrete random variable X ,
E[cX ] = cE[X ].
Bernoulli Random Variable
A Bernoulli or an indicator random variable:
Y = \begin{cases} 1 & \text{if the experiment succeeds,} \\ 0 & \text{otherwise.} \end{cases}
Let Pr(Y = 1) = p.
E[Y ] = p · 1 + (1 − p) · 0 = p = Pr(Y = 1).
Binomial Random Variable
Definition
A binomial random variable X with parameters n and p, denoted
by B(n, p), is defined by the following probability distribution on
j = 0, 1, 2, . . . , n:
\Pr(X = j) = \binom{n}{j} p^j (1-p)^{n-j}.
Expectation of a Binomial Random Variable
E[X] = \sum_{j=0}^{n} j \binom{n}{j} p^j (1-p)^{n-j}

     = \sum_{j=0}^{n} j \frac{n!}{j!(n-j)!} p^j (1-p)^{n-j}

     = \sum_{j=1}^{n} \frac{n!}{(j-1)!(n-j)!} p^j (1-p)^{n-j}

     = np \sum_{j=1}^{n} \frac{(n-1)!}{(j-1)!((n-1)-(j-1))!} p^{j-1} (1-p)^{(n-1)-(j-1)}

     = np \sum_{k=0}^{n-1} \frac{(n-1)!}{k!((n-1)-k)!} p^{k} (1-p)^{(n-1)-k}

     = np \sum_{k=0}^{n-1} \binom{n-1}{k} p^{k} (1-p)^{(n-1)-k} = np.
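A quick numeric sanity check of this derivation (an illustrative sketch; the parameters n = 20, p = 0.3 are arbitrary):

```python
from math import comb

# Check E[X] = n*p for X ~ B(n, p) by summing j * Pr(X = j) over the range of X.
n, p = 20, 0.3
expectation = sum(j * comb(n, j) * p**j * (1 - p)**(n - j) for j in range(n + 1))
print(expectation, n * p)  # both are approximately 6.0
```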
Binomial Random Variable
X represents the number of successes in n independent
experiments, each with success probability p.
Let

X_i = \begin{cases} 1 & \text{if the } i\text{-th experiment succeeds,} \\ 0 & \text{otherwise,} \end{cases}

so X_i is the indicator random variable (for the i-th experiment). Using linearity of expectations,

E[X] = E\left[ \sum_{i=1}^{n} X_i \right] = \sum_{i=1}^{n} E[X_i] = np.
Example: Finding the k-Smallest Element
Procedure Order(S, k);
Input: A set S, an integer k ≤ |S| = n.
Output: The k-th smallest element in the set S.
1. If |S| = k = 1 return S.
2. Choose a random element y uniformly from S.
3. Compare all elements of S to y. Let S1 = {x ∈ S | x ≤ y} and S2 = {x ∈ S | x > y}.
4. If k ≤ |S1| return Order(S1, k), else return Order(S2, k − |S1|).
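A minimal Python sketch of the Order procedure above (the function name `order` and the example list are illustrative; it assumes the elements of S are distinct, as in the Input specification):

```python
import random

def order(S, k):
    """Return the k-th smallest element of S (1-indexed), following the
    randomized Order(S, k) procedure: pick a uniform pivot, split, recurse."""
    if len(S) == 1:
        return S[0]                      # |S| = k = 1: the element itself
    y = random.choice(S)                 # random pivot, chosen uniformly from S
    S1 = [x for x in S if x <= y]        # elements <= pivot
    S2 = [x for x in S if x > y]         # elements >  pivot
    if k <= len(S1):
        return order(S1, k)
    return order(S2, k - len(S1))

print(order([7, 1, 9, 4, 3, 8], 3))      # 4
```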
Theorem
The algorithm returns the k-th smallest element of S, performing O(n) comparisons in expectation.
Proof
• We say that a call to Order(S, k) was successful if the random element was in the middle 1/3 of the set S. A call is successful with probability 1/3.
• After the i-th successful call the size of the set S is bounded by n(2/3)^i. Then there are at most log_{3/2} n successful calls.
• Let X be the total number of comparisons. Let X_i be the number of comparisons between the i-th successful call (included) and the (i+1)-th (excluded): E[X] \le \sum_{i=0}^{\log_{3/2} n} E[X_i].
• Let Y_i (i > 0) be the number of calls between the i-th successful call and the (i+1)-th successful call. The expected number of calls from one successful call to the next one (included) is 3: E[Y_i] = 3 for all i.
• Therefore E[X_i] \le n(2/3)^i E[Y_i] = 3n(2/3)^i.
• Expected number of comparisons:

E[X] \le \sum_{j=0}^{\log_{3/2} n} 3n \left( \frac{2}{3} \right)^j \le 9n.
Conditional expectation
Definition
Let Y and Z be random variables. The conditional expectation of
Y given that Z = z is defined as
E[Y \mid Z = z] = \sum_{y} y \Pr(Y = y \mid Z = z),

where the summation is over all y in the range of Y.

We can also consider the expectation of Y conditioned on the event E:

E[Y \mid E] = \sum_{y} y \Pr(Y = y \mid E).
Conditional expectation
Lemma
For any random variables X and Y
E[X] = \sum_{y} \Pr(Y = y) \, E[X \mid Y = y],
where the sum is over the range of Y and all of the expectations
exist.
Conditional expectation
Linearity of expectations also holds for conditional expectations:
Lemma
For any finite collection of discrete random variables X1 , X2 , . . . , Xn
with finite expectation and for any random variable Y ,
"
E
n
X
i=1
#
Xi |Y = y =
n
X
i=1
E[Xi |Y = y ].
Conditional expectation
Note that for given random variables Y , Z the expression E[Y |Z ]
is itself a random variable f (Z ) that takes the value E[Y |Z = z]
when Z = z.
Theorem
For given random variables Y , Z we have E[Y ] = E[E[Y |Z ]]
Proof.
By definition,

E[E[Y \mid Z]] = \sum_{z} E[Y \mid Z = z] \Pr(Z = z) = E[Y],

where the last equality follows from the previous lemma.
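A small numeric check of the theorem (an illustrative dice example, not from the slides): Z is the first of two fair dice, Y is their sum, and both E[Y] and \sum_z E[Y \mid Z = z] \Pr(Z = z) evaluate to 7.

```python
from fractions import Fraction
from itertools import product

# Two fair dice; Z is the first die, Y is the sum.  Check E[Y] = E[E[Y | Z]].
omega = list(product(range(1, 7), repeat=2))
pr = Fraction(1, 36)                                # probability of each outcome

E_Y = sum((a + b) * pr for a, b in omega)           # direct expectation of Y

E_of_conditional = Fraction(0)
for z in range(1, 7):
    outcomes = [(a, b) for a, b in omega if a == z]
    pr_z = len(outcomes) * pr                       # Pr(Z = z) = 1/6
    E_Y_given_z = sum((a + b) * pr for a, b in outcomes) / pr_z
    E_of_conditional += E_Y_given_z * pr_z

print(E_Y, E_of_conditional)                        # both 7
```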
Application of conditional expectation
Consider a process S that, each time it is started, recursively spawns new copies of the process S, where the number of new copies is a binomial random variable with parameters n and p. We also assume that these random variables are independent for each call to S.
Now assume that we call the process S from a program once.
What is the expected number of copies of S generated by this call?
Application of conditional expectation
To analyse we introduce generations:
• The initial process S is (the only element) in generation 0
• A (copy of) process S is in generation i if it was spawned by
another process S in generation i − 1.
• Denote by Yi the number of processes in generation i.
• Then Y0 = 1
Application of conditional expectation
As the number of processes spawned by the first process S is binomially distributed with parameters n, p, we have E[Y_1] = np.
Let us assume that there are y_{i−1} copies of process S in generation i − 1, and let Z_k denote the number of processes spawned by the k-th process in generation i − 1, for 1 ≤ k ≤ y_{i−1}.
Each Z_k is binomially distributed with parameters n, p and hence has expectation E[Z_k] = np.
Application of conditional expectation
E[Y_i \mid Y_{i-1} = y_{i-1}] = E\left[ \sum_{k=1}^{y_{i-1}} Z_k \,\Big|\, Y_{i-1} = y_{i-1} \right]

  = \sum_{k=1}^{y_{i-1}} E[Z_k \mid Y_{i-1} = y_{i-1}]

  = \sum_{k=1}^{y_{i-1}} \sum_{j=0}^{n} j \Pr(Z_k = j \mid Y_{i-1} = y_{i-1})

  = \sum_{k=1}^{y_{i-1}} \sum_{j=0}^{n} j \Pr(Z_k = j)

  = \sum_{k=1}^{y_{i-1}} E[Z_k]

  = y_{i-1} \, np
Application of conditional expectation
Now we can calculate the expected size of the i-th generation inductively. We have

E[Y_i] = E[E[Y_i \mid Y_{i-1}]] = E[Y_{i-1} \, np] = np \, E[Y_{i-1}],

so by induction and Y_0 = 1 we get E[Y_i] = (np)^i.
Application of conditional expectation
The expected number of copies of S generated by the first call is given by

E\left[ \sum_{i \ge 0} Y_i \right] = \sum_{i \ge 0} E[Y_i] = \sum_{i \ge 0} (np)^i.

This is unbounded if np ≥ 1 and equals 1/(1 − np) when np < 1.
Hence the expected number of copies spawned by the first call is finite if and only if the expected number of copies spawned by each copy is less than 1.
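A simulation sketch of this branching process (illustrative; the names `binom` and `total_processes`, the cap `limit`, and the parameters n = 4, p = 0.2 are all assumptions made for the example). It estimates E[\sum_{i \ge 0} Y_i], counting the initial process, which should be close to 1/(1 − np) when np < 1.

```python
import random

def binom(n, p):
    """Draw a Binomial(n, p) sample by summing n independent Bernoulli(p) trials."""
    return sum(random.random() < p for _ in range(n))

def total_processes(n, p, limit=10**6):
    """Simulate the spawning process: every process spawns a Binomial(n, p)
    number of children, independently.  Returns Y_0 + Y_1 + ... , the total
    number of processes over all generations, capped at `limit` (for np >= 1
    the expectation is unbounded and the cap keeps the loop finite)."""
    total, frontier = 1, 1              # Y_0 = 1: the initial process
    while frontier > 0 and total < limit:
        frontier = sum(binom(n, p) for _ in range(frontier))
        total += frontier
    return total

n, p = 4, 0.2                           # np = 0.8 < 1, so E[total] = 1/(1 - np) = 5
samples = [total_processes(n, p) for _ in range(10000)]
print(sum(samples) / len(samples))      # should be close to 5
```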
The Geometric Distribution
Definition
A geometric random variable X with parameter p is given by the
following probability distribution on n = 1, 2, . . ..
Pr(X = n) = (1 − p)^{n−1} p.
Example: repeatedly draw independent Bernoulli random variables with parameter p > 0 until we get a 1. Let X be the number of trials up to and including the first 1. Then X is a geometric random variable with parameter p.
Memoryless Property
Lemma
For a geometric random variable with parameter p and n > 0,
Pr(X = n + k | X > k) = Pr(X = n).
Proof.

\Pr(X = n + k \mid X > k) = \frac{\Pr((X = n + k) \cap (X > k))}{\Pr(X > k)}

  = \frac{\Pr(X = n + k)}{\Pr(X > k)}

  = \frac{(1-p)^{n+k-1} p}{\sum_{i=k}^{\infty} (1-p)^i p}

  = \frac{(1-p)^{n+k-1} p}{(1-p)^k}

  = (1-p)^{n-1} p

  = \Pr(X = n).
Memoryless Property – Intuition
I have a coin and flip it a number of times, telling you neither the outcomes nor the number of times I flipped it.
I then give the coin to you and ask: what is the probability that the first head appears on your n-th flip?
Intuitively, this probability should not depend on the number of times I flipped the coin or on the corresponding outcomes.
The memoryless property tells us it is indeed so.
Lemma
Let X be a discrete random variable that takes on only
non-negative integer values. Then
E[X] = \sum_{i=1}^{\infty} \Pr(X \ge i).

Proof.

\sum_{i=1}^{\infty} \Pr(X \ge i) = \sum_{i=1}^{\infty} \sum_{j=i}^{\infty} \Pr(X = j)

  = \sum_{j=1}^{\infty} \sum_{i=1}^{j} \Pr(X = j)

  = \sum_{j=1}^{\infty} j \Pr(X = j) = E[X].
For a geometric random variable X with parameter p,
\Pr(X \ge i) = \sum_{n=i}^{\infty} (1-p)^{n-1} p = p \left( \frac{1}{p} - \frac{1 - (1-p)^{i-1}}{p} \right) = (1-p)^{i-1}

E[X] = \sum_{i=1}^{\infty} \Pr(X \ge i) = \sum_{i=1}^{\infty} (1-p)^{i-1} = \frac{1}{1-(1-p)} = \frac{1}{p}
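A short simulation sketch (illustrative; the choice p = 0.25 and the sample count are arbitrary) that draws geometric samples by repeating Bernoulli trials and checks that the empirical mean is close to 1/p.

```python
import random

def geometric(p):
    """Number of independent Bernoulli(p) trials up to and including the first 1."""
    n = 1
    while random.random() >= p:
        n += 1
    return n

p = 0.25
samples = [geometric(p) for _ in range(100000)]
print(sum(samples) / len(samples))   # close to 1/p = 4
```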
Example: Coupon Collector’s Problem
Suppose that each box of cereal contains a random coupon from a
set of n different coupons.
How many boxes of cereal do you need to buy before you obtain at
least one of every type of coupon?
Let X be the number of boxes bought until at least one of every
type of coupon is obtained.
Let Xi be the number of boxes bought while you had exactly i − 1
different coupons.
X = \sum_{i=1}^{n} X_i

X_i is a geometric random variable with parameter

p_i = 1 - \frac{i-1}{n}.

E[X_i] = \frac{1}{p_i} = \frac{n}{n-i+1}.

E[X] = E\left[ \sum_{i=1}^{n} X_i \right] = \sum_{i=1}^{n} E[X_i] = \sum_{i=1}^{n} \frac{n}{n-i+1} = n \sum_{i=1}^{n} \frac{1}{i} = n \ln n + \Theta(n).
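A simulation sketch of the coupon collector's problem (illustrative; n = 50 and the number of trials are arbitrary). The empirical mean should be close to n H_n ≈ 225 for n = 50, i.e. n ln n + Θ(n), while n ln n alone is ≈ 196.

```python
import math
import random

def boxes_needed(n):
    """Buy boxes, each containing a uniformly random coupon type in {0, ..., n-1},
    until all n types have been collected; return the number of boxes bought."""
    seen, boxes = set(), 0
    while len(seen) < n:
        seen.add(random.randrange(n))
        boxes += 1
    return boxes

n = 50
trials = [boxes_needed(n) for _ in range(2000)]
print(sum(trials) / len(trials), n * math.log(n))   # empirical mean (~225) vs. n ln n (~196)
```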
Randomized Algorithms:
• Analysis is true for any input.
• The sample space is the space of random choices made by the
algorithm.
• Repeated runs are independent.
Probabilistic Analysis:
• The sample space is the space of all possible inputs.
• If the algorithm is deterministic repeated runs give the same
output.
Algorithm classification
A Monte Carlo Algorithm is a randomized algorithm that may
produce an incorrect solution.
For decision problems: a one-sided error Monte Carlo algorithm errs on only one of the two possible outputs; otherwise it is a two-sided error algorithm.
A Las Vegas algorithm is a randomized algorithm that always
produces the correct output.
In both types of algorithms the run-time is a random variable.
Example of probabilistic analysis: Quicksort
Procedure Quicksort(S);
Input: List S = {x1 , . . . , xn } of n distinct elements (from totally
ordered universe).
Output: The elements of S in sorted order.
1. If |S| ≤ 1 return S.
2. Choose the first element of S as pivot, called x.
3. Compare all elements of S to x. Let S1 = {y ∈ S | y < x} and S2 = {y ∈ S | y > x}.
4. Return Quicksort(S1), x, Quicksort(S2).
(S1 and S2 are lists. If comparisons are performed "left to right", the elements in S1 and S2 have the same order as in S.)
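A runnable Python sketch of the procedure above (illustrative; it counts one comparison per element compared with the pivot, which is the quantity analysed below):

```python
import random

def quicksort(S):
    """Quicksort with the first element as pivot, as in the procedure above.
    Returns the sorted list and the number of element-vs-pivot comparisons."""
    if len(S) <= 1:
        return list(S), 0
    x = S[0]                                   # first element as pivot
    S1, S2 = [], []
    for y in S[1:]:                            # one comparison per remaining element
        (S1 if y < x else S2).append(y)        # preserves the input order in S1, S2
    left, c1 = quicksort(S1)
    right, c2 = quicksort(S2)
    return left + [x] + right, c1 + c2 + len(S) - 1

perm = random.sample(range(1000), 1000)        # a uniformly random permutation
_, comparisons = quicksort(perm)
print(comparisons)                             # expected 2n ln n + Theta(n), roughly 11,000 for n = 1000
```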
Analysis
Worst-Case: if the input S satisfies x_1 > x_2 > · · · > x_n, Quicksort(S) performs Θ(n^2) comparisons.
Probabilistic Analysis: what happens if S is chosen uniformly at random from all possible permutations of the values?
Analysis
Theorem
If S is chosen uniformly at random from all possible permutations
of the values, the expected number of comparisons made by
Quicksort(S) is 2n ln n + O(n).
Proof
• Let y_1, y_2, . . . , y_n be the same values as x_1, x_2, . . . , x_n but sorted in increasing order.
• For i < j, let

X_{ij} = \begin{cases} 1 & \text{if } y_i \text{ and } y_j \text{ are compared,} \\ 0 & \text{otherwise.} \end{cases}

• Let X be the total number of comparisons:

X = \sum_{i=1}^{n} \sum_{j=i+1}^{n} X_{ij}.

• By linearity of expectation:

E[X] = \sum_{i=1}^{n} \sum_{j=i+1}^{n} E[X_{ij}].

• X_{ij} is an indicator random variable: E[X_{ij}] = \Pr(X_{ij} = 1).
Proof
• Pr(X_{ij} = 1) = ?
• Let Y^{ij} = {y_i, y_{i+1}, . . . , y_{j−1}, y_j}. y_i and y_j are compared if and only if either y_i or y_j is the first pivot selected from Y^{ij}.
• Since elements in S1 and S2 are in the same order as in S, the first pivot selected from Y^{ij} is the first element of Y^{ij} in the input list S.
• Since S is chosen at random from all possible permutations of the values, every element in Y^{ij} is equally likely to be first in S: the probability that y_i is first in Y^{ij} is 1/(j − i + 1); the same holds for y_j.
• Hence

\Pr(X_{ij} = 1) = \frac{2}{j - i + 1}.
Proof

E[X] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{2}{j - i + 1}

  = \sum_{i=1}^{n-1} \sum_{k=2}^{n-i+1} \frac{2}{k}

  = \sum_{k=2}^{n} \sum_{i=1}^{n+1-k} \frac{2}{k}

  = \sum_{k=2}^{n} (n + 1 - k) \frac{2}{k}

  = (n + 1) \sum_{k=2}^{n} \frac{2}{k} - 2(n - 1)

  = (2n + 2) \sum_{k=1}^{n} \frac{1}{k} - 4n = 2n \ln n + \Theta(n).