
EE5319R: Problem Set 1
Assigned: 10/08/16, Due: 17/08/16
1. Probability Review: Let V and W be discrete random variables defined on some probability space
with a joint pmf pV,W (v, w)
(a) Prove that E[V + W ] = E[V ] + E[W ]. Do not assume independence.
Solution:
E[V + W] = Σ_{v,w} (v + w) pV,W(v, w) = Σ_{v,w} v pV,W(v, w) + Σ_{v,w} w pV,W(v, w)
         = Σ_v v pV(v) + Σ_w w pW(w) = E[V] + E[W]
(b) Prove that if V and W are independent rv’s, then E[V W ] = E[V ]E[W ].
Solution:
E[VW] = Σ_{v,w} vw pV,W(v, w) = Σ_{v,w} vw pV(v) pW(w)
      = (Σ_v v pV(v)) (Σ_w w pW(w)) = E[V]E[W]
(c) Let V and W be independent rv's, and let σV² and σW² be their respective variances. Find the variance of Z = V + W.
Solution:
σZ² = E[Z²] − E[Z]² = E[(V + W)²] − E[V + W]²
    = E[V² + W² + 2VW] − (E[V]² + 2E[V]E[W] + E[W]²)
    = E[V²] − E[V]² + E[W²] − E[W]² = σV² + σW²,
where the second to last equality follows from the fact that E[V W ] = E[V ]E[W ].
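These three identities are easy to sanity-check numerically. The following Python sketch (my addition, with arbitrarily chosen pmfs) verifies linearity of expectation, the product rule under independence, and the additivity of variances:

import numpy as np

# Two arbitrary (assumed) independent discrete rvs on small supports
v_vals, pV = np.array([0, 1, 3]), np.array([0.2, 0.5, 0.3])
w_vals, pW = np.array([-1, 2]), np.array([0.6, 0.4])

# Joint pmf under independence: p_{V,W}(v, w) = pV(v) pW(w)
pVW = np.outer(pV, pW)
V, W = np.meshgrid(v_vals, w_vals, indexing="ij")

EV, EW = v_vals @ pV, w_vals @ pW
E_sum  = np.sum((V + W) * pVW)           # E[V + W]
E_prod = np.sum((V * W) * pVW)           # E[V W]
var_V  = np.sum((v_vals - EV)**2 * pV)
var_W  = np.sum((w_vals - EW)**2 * pW)
var_Z  = np.sum(((V + W) - E_sum)**2 * pVW)

assert np.isclose(E_sum, EV + EW)        # linearity of expectation
assert np.isclose(E_prod, EV * EW)       # product rule under independence
assert np.isclose(var_Z, var_V + var_W)  # variances add for independent rvs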
2. Probability Review:
(a) For a nonnegative integer-valued rv N, show that E[N] = Σ_{n>0} Pr(N ≥ n).
Solution:
E[N] = Σ_{n=0}^{∞} n pN(n) = 0 · pN(0) + 1 · pN(1) + 2 · pN(2) + 3 · pN(3) + . . .
     = [pN(1) + pN(2) + pN(3) + . . .]
     + [pN(2) + pN(3) + pN(4) + . . .]
     + [pN(3) + pN(4) + pN(5) + . . .] + . . .
     = Pr(N ≥ 1) + Pr(N ≥ 2) + Pr(N ≥ 3) + . . . = Σ_{n>0} Pr(N ≥ n)
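As a quick numerical check (not part of the original solution), the tail-sum identity can be verified for any nonnegative integer-valued rv; a Binomial(10, 0.3) pmf is used here purely as an illustrative choice:

import numpy as np
from math import comb

n_max, p = 10, 0.3                               # N ~ Binomial(10, 0.3), an illustrative choice
pmf = np.array([comb(n_max, k) * p**k * (1 - p)**(n_max - k) for k in range(n_max + 1)])

mean_direct = sum(k * pmf[k] for k in range(n_max + 1))     # E[N] = sum_k k p_N(k)
tail_sum = sum(pmf[k:].sum() for k in range(1, n_max + 1))  # sum_{n>0} Pr(N >= n)
assert np.isclose(mean_direct, tail_sum)                    # both equal n_max * p = 3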
(b) Derive the Markov inequality, which says that for any a > 0 and nonnegative X, Pr(X ≥ a) ≤
E[X]/a. Hint: Consider the inequality 1{x ≥ a} ≤ x/a for any x ≥ 0.
Solution: Using the hint, we know that 1{x ≥ a} ≤ x/a for any x ≥ 0, so for any nonnegative
random variable X we have
1{X ≥ a} ≤ X/a
pointwise. Now take expectations on both sides and note that E[1{X ≥ a}] = Pr(X ≥ a). This
completes the proof.
(c) Derive the Chebyshev inequality, which says that Pr(|Y − E[Y]| ≥ b) ≤ σY²/b².
Solution: Take X = |Y − E[Y]| and apply Markov's inequality to X² with threshold b²: Pr(|Y − E[Y]| ≥ b) = Pr(X² ≥ b²) ≤ E[X²]/b² = σY²/b².
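A short Monte Carlo illustration (a sketch of my own; the exponential distribution is an arbitrary choice of a nonnegative rv) shows both bounds holding, typically with plenty of slack:

import numpy as np

rng = np.random.default_rng(0)
X = rng.exponential(scale=1.0, size=200_000)     # nonnegative rv with E[X] = 1, Var(X) = 1

a, b = 3.0, 3.0
markov_lhs = np.mean(X >= a)                     # Pr(X >= a)
markov_rhs = X.mean() / a                        # E[X]/a
cheb_lhs = np.mean(np.abs(X - X.mean()) >= b)    # Pr(|X - E[X]| >= b)
cheb_rhs = X.var() / b**2                        # sigma_X^2 / b^2

print(f"Markov:    {markov_lhs:.4f} <= {markov_rhs:.4f}")
print(f"Chebyshev: {cheb_lhs:.4f} <= {cheb_rhs:.4f}")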
(d) Derive the one-sided Chebyshev inequality, which says that Pr(Y ≥ a) ≤ σY²/(σY² + a²) if E[Y] = 0
and a > 0. Here you may need to use the Cauchy-Schwarz inequality, which says that E[AB] ≤ √(E[A²]E[B²]).
Solution: Since Y has zero mean, we have
a = E[a − Y ]
Consider the expectation above: We have
E[a − Y] = Σ_y PY(y)(a − y) = Σ_{y: y<a} PY(y)(a − y) + Σ_{y: y≥a} PY(y)(a − y) ≤ Σ_{y: y<a} PY(y)(a − y)
because the second sum is non-positive. This can be written as
a ≤ Σ_y PY(y)(a − y) 1{y < a}
where 1{statement} returns 1 if the statement is true and 0 otherwise. Thus, we have
a ≤ E[(a − Y )1{Y < a}]
Now square both sides,
a² ≤ (E[(a − Y)1{Y < a}])²
Applying the Cauchy-Schwarz inequality to the expectation,
a² ≤ E[(a − Y)²] E[1{Y < a}²] = E[(a − Y)²] E[1{Y < a}] = (a² + E[Y²]) Pr(Y < a)
Since E[Y²] = σY² (as Y has zero mean), rearranging the above inequality yields Pr(Y ≥ a) = 1 − Pr(Y < a) ≤ 1 − a²/(a² + σY²) = σY²/(σY² + a²), which is the one-sided Chebyshev inequality.
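For a concrete check (my own example, with an arbitrarily chosen zero-mean pmf), one can compare Pr(Y ≥ a) with σY²/(σY² + a²) directly:

import numpy as np

y_vals = np.array([-2.0, 0.0, 1.0])       # support of an illustrative zero-mean rv
p_y = np.array([0.2, 0.4, 0.4])           # its pmf; note E[Y] = -0.4 + 0 + 0.4 = 0
var_y = (y_vals**2) @ p_y                 # sigma_Y^2 (= 1.2 here)

for a in [0.5, 1.0, 2.0]:
    lhs = p_y[y_vals >= a].sum()          # Pr(Y >= a)
    rhs = var_y / (var_y + a**2)          # one-sided Chebyshev bound
    assert lhs <= rhs + 1e-12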
(e) (Optional) Derive the reverse Markov inequality: Let X be a random variable such that Pr(X ≤ a) = 1 for some constant a. Then for d < E[X], we have
Pr(X > d) ≥ (E[X] − d)/(a − d).
Solution: Apply Markov's inequality to the nonnegative random variable X̃ := a − X. Then one has
Pr(X ≤ d) = Pr(a − X̃ ≤ d) = Pr(X̃ ≥ a − d) ≤ E[X̃]/(a − d) = E[a − X]/(a − d).
Hence,
Pr(X > d) ≥ 1 − E[a − X]/(a − d) = (E[X] − d)/(a − d).
(f) Weak Law of Large Numbers: Let X1, . . . , Xn be a sequence of i.i.d. rvs with zero mean and finite variance σ². Show that for any ε > 0, Pr((1/n)|X1 + . . . + Xn| > ε) ≤ σ²/(nε²). If σ is finite, what can we say about the random variable (1/n)(X1 + . . . + Xn) as n becomes large?
Solution: Let Y = (1/n)(X1 + . . . + Xn). The mean of Y is zero and its variance is σ²/n. Applying Chebyshev's inequality with b = ε gives Pr(|Y| ≥ ε) ≤ σ²/(nε²). In particular, when σ is finite, (1/n)(X1 + . . . + Xn) converges to 0 in probability as n grows large.
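The statement can be visualized with a short simulation (a sketch; zero-mean Uniform(−1, 1) samples are an arbitrary choice, with σ² = 1/3), comparing the empirical probability with the Chebyshev bound σ²/(nε²):

import numpy as np

rng = np.random.default_rng(1)
eps, sigma2 = 0.1, 1.0 / 3.0                      # Var(Uniform(-1, 1)) = 1/3
for n in [10, 100, 1000]:
    X = rng.uniform(-1.0, 1.0, size=(5000, n))    # 5000 independent trials of n samples each
    emp = np.mean(np.abs(X.mean(axis=1)) > eps)   # empirical Pr(|(X1+...+Xn)/n| > eps)
    bound = sigma2 / (n * eps**2)                 # Chebyshev bound sigma^2 / (n eps^2)
    print(f"n={n:5d}  empirical={emp:.4f}  bound={min(bound, 1.0):.4f}")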
(g) Chernoff Bound: Let X1, . . . , Xn be a sequence of i.i.d. rvs with zero mean and moment generating function MX(s) := E[e^{sX}]. Show that for any ε > 0, Pr((1/n)(X1 + . . . + Xn) > ε) ≤
exp[−n max_{s≥0}(sε − log MX(s))]. (Hint: Note that the event {(1/n)(X1 + . . . + Xn) > ε} occurs if
and only if {exp(s(X1 + . . . + Xn)) > exp(nsε)} occurs for any fixed s > 0. Now apply Markov's
inequality.)
Solution: Using the hint, we know that
Pr((1/n)(X1 + . . . + Xn) > ε) = Pr(exp(s(X1 + . . . + Xn)) > exp(nsε)) ≤ E[exp(s(X1 + . . . + Xn))]/exp(nsε),
where the inequality is from Markov's inequality. Since the Xi's are independent,
E[exp(s(X1 + . . . + Xn))]/exp(nsε) = (∏_{i=1}^{n} E[exp(sXi)])/exp(nsε) = exp[−n(sε − log MX(s))],
where the final equality is due to the fact that the Xi are identically distributed. Since s > 0 was
arbitrary, we can maximize the exponent over this parameter; including s = 0 only adds the trivial
bound of 1, so the maximum may equivalently be taken over s ≥ 0.
The moral of the story here is that Pr((1/n)(X1 + . . . + Xn) > ε) decays to zero exponentially fast
with exponent IX(ε) := max_{s≥0}(sε − log MX(s)). It turns out that this rate of decay is optimal
(i.e., the best we can hope for). This forms the basis of the theory of large deviations.
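As a concrete illustration (my own example, not part of the problem), take Xi = ±1 with equal probability, so MX(s) = cosh(s); the maximizer is s* = arctanh(ε) and IX(ε) = ε arctanh(ε) − log cosh(arctanh(ε)). The sketch below compares −(1/n) log Pr((1/n)ΣXi > ε), computed exactly from the binomial distribution, against IX(ε):

import numpy as np
from math import comb

eps = 0.2
s_star = np.arctanh(eps)                           # maximizer of s*eps - log cosh(s)
I_eps = s_star * eps - np.log(np.cosh(s_star))     # Chernoff exponent I_X(eps)

for n in [100, 1000, 4000]:
    # (1/n) sum X_i > eps  <=>  number of +1's exceeds n(1+eps)/2
    k_min = int(np.floor(n * (1 + eps) / 2)) + 1
    tail = sum(comb(n, k) for k in range(k_min, n + 1)) / 2**n   # exact tail probability
    print(f"n={n:5d}  -(1/n) log Pr = {-np.log(tail)/n:.4f}   I_X(eps) = {I_eps:.4f}")

The empirical exponent approaches IX(ε) from above as n grows, consistent with the tightness claim below.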
(Optional) Show in fact that the above bound is exponentially tight, i.e., that
lim inf_{n→∞} (1/n) log Pr((1/n)(X1 + . . . + Xn) > ε) ≥ −max_{s≥0}(sε − log MX(s)).
Solution: Suppose that the pdf of Xi is p(x). Let MX(θ) = E[e^{θX}] be the moment generating
function of X. For brevity, define the rate function
I(ε) := max_{θ≥0} {θε − log MX(θ)},
and let θ* denote the maximizing θ.
Then for any δ > 0, one has
Pr((1/n) Σ_{i=1}^{n} Xi > ε) = ∫_{R^n} 1{(1/n) Σ_{i=1}^{n} xi > ε} ∏_{i=1}^{n} p(xi) dxi
  ≥ ∫_{R^n} 1{ε + δ ≥ (1/n) Σ_{i=1}^{n} xi > ε} ∏_{i=1}^{n} p(xi) dxi
  ≥ (MX(θ*)^n / e^{nθ*(ε+δ)}) ∫_{R^n} 1{ε + δ ≥ (1/n) Σ_{i=1}^{n} xi > ε} ∏_{i=1}^{n} [e^{θ* xi} p(xi)/MX(θ*)] dxi
  = (MX(θ*)^n / e^{nθ*(ε+δ)}) ∫_{R^n} 1{ε + δ ≥ (1/n) Σ_{i=1}^{n} xi > ε} ∏_{i=1}^{n} q(xi) dxi
where
q(y) = e^{θ* y} p(y)/MX(θ*).
Note that q(y) integrates to one so it’s a pdf. Now let Y be a random variable with pdf q(y). The
moment generating function of Y is
MY(θ) = ∫ e^{θy} q(y) dy = MX(θ + θ*)/MX(θ*).
Thus
E[Y] = (d/dθ) MY(θ) |_{θ=0} = MX'(θ*)/MX(θ*).
Since θ* achieves the supremum of θε − log MX(θ), we have
(d/dθ)[θε − log MX(θ)] = 0 at θ = θ*,
from which we obtain ε = MX'(θ*)/MX(θ*), and therefore E[Y] = ε. In other words, the pdf q(y) defines a sequence of i.i.d. random variables Y1, . . . , Yn with common mean ε. Thus,
Pr((1/n) Σ_{i=1}^{n} Xi > ε) ≥ e^{−nI(ε)} e^{−nθ*δ} Pr((1/n) Σ_{i=1}^{n} Yi ∈ (ε, ε + δ])
Now by the central limit theorem, the above probability is non-vanishing (in fact converges to
1/2) so
lim inf_{n→∞} (1/n) log Pr((1/n) Σ_{i=1}^{n} Xi > ε) ≥ −I(ε) − θ*δ.
Since δ > 0 is arbitrary, we may take δ ↓ 0.
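The key device above is the exponentially tilted pdf q(y) = e^{θ*y} p(y)/MX(θ*). A small numerical illustration (my own, taking p to be the standard normal density, for which MX(θ) = e^{θ²/2} and θ* = ε) confirms that q integrates to one and has mean ε:

import numpy as np

eps = 0.3
theta_star = eps                               # maximizer of theta*eps - theta^2/2 when p is N(0, 1)
y = np.linspace(-10.0, 10.0, 200_001)
dy = y[1] - y[0]
p = np.exp(-y**2 / 2) / np.sqrt(2 * np.pi)     # standard normal pdf p(y)
M = np.exp(theta_star**2 / 2)                  # M_X(theta*) = exp(theta*^2 / 2)
q = np.exp(theta_star * y) * p / M             # tilted pdf q(y) = e^{theta* y} p(y) / M_X(theta*)

print("integral of q:", (q * dy).sum())        # ~ 1.0, so q is a pdf
print("mean under q :", (y * q * dy).sum())    # ~ eps, the tilted mean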
3. Probability Review: Let X1 , X2 , . . . , Xn be a sequence of n binary i.i.d. rvs. Assume that Pr(Xm =
1) = Pr(Xm = 0) = 1/2 for all 1 ≤ m ≤ n. Let Z be a parity check on X1 , . . . , Xn , i.e. Z = X1 ⊕. . .⊕Xn
(where 0 ⊕ 0 = 0, 0 ⊕ 1 = 1 and 1 ⊕ 1 = 0).
(a) Is Z independent of X1 ? (Assume n > 1)
(b) Are Z, X1 , . . . , Xn−1 independent?
Solution: Yes; this also covers case (a). Note that pZ(z) = 1/2 for z = 0, 1. Conditioned on X1 = x1, . . . , Xn−1 = xn−1, we have Z = 0 if and only if Xn = x1 ⊕ . . . ⊕ xn−1, which occurs with probability 1/2. Therefore the conditional PMF is
pZ|X1,...,Xn−1(0|x1, . . . , xn−1) = pZ|X1,...,Xn−1(1|x1, . . . , xn−1) = 1/2,
which is equal to the unconditional pmf pZ (z). Thus, Z and X1 , . . . , Xn−1 are independent.
(c) Are Z, X1 , . . . , Xn independent?
Solution: Clearly not since X1 , . . . , Xn determine Z.
(d) Is Z independent of X1 if Pr(Xm = 1) ≠ 1/2? (You may take n = 2 here.)
Solution: Z is NOT statistically independent of X1 if Pr(Xi = 1) ≠ 1/2. Let Pr(Xi = 1) = p.
For n = 2, the unconditional pmf pZ(z) is
pZ(0) = Pr(X1 = 1, X2 = 1) + Pr(X1 = 0, X2 = 0) = p² + (1 − p)²
pZ(1) = (1 − p)p + p(1 − p) = 2p(1 − p).
However, the conditional PMF pZ|X1 (z|x1 ) given x1 = 0 is
pZ|X1 (0|0) = Pr(X2 = 0) = 1 − p
pZ|X1 (1|0) = Pr(X2 = 1) = p
Since pZ|X1(z|0) ≠ pZ(z) when p ≠ 1/2, we conclude that Z is not independent of X1.
The purpose of this question is to show that, in a group of n + 1 random variables, pairwise
independence (part (a)), and even independence within every group of n of the random variables
(part (b)), does not imply statistical independence of all n + 1 random variables (part (c)).
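The claims in parts (a)-(d) can be confirmed by brute-force enumeration. The sketch below (my addition; n = 3 is an arbitrary choice) compares the joint distribution of (Z, X1) with the product of its marginals:

from itertools import product

def z_x1_distributions(n, p):
    """Joint and marginal probabilities of (Z, X1) for n i.i.d. Bernoulli(p) bits, Z = X1 xor ... xor Xn."""
    pZ = {0: 0.0, 1: 0.0}
    pX1 = {0: 0.0, 1: 0.0}
    joint = {(z, x): 0.0 for z in (0, 1) for x in (0, 1)}
    for bits in product((0, 1), repeat=n):
        prob = 1.0
        for b in bits:
            prob *= p if b == 1 else 1 - p
        z = sum(bits) % 2                  # parity check
        pZ[z] += prob
        pX1[bits[0]] += prob
        joint[(z, bits[0])] += prob
    return pZ, pX1, joint

for p in (0.5, 0.3):
    pZ, pX1, joint = z_x1_distributions(3, p)
    indep = all(abs(joint[(z, x)] - pZ[z] * pX1[x]) < 1e-12 for z in (0, 1) for x in (0, 1))
    print(f"p = {p}: Z independent of X1? {indep}")   # True for p = 1/2, False otherwise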
4. Probability Review: Flip a fair coin four times. Let X be the number of Heads obtained, and let Y
be the position of the first Head, i.e., if the sequence of coin flips is TTHT, then Y = 3, and if it is THHH,
then Y = 2. If there are no Heads in the four tosses, then we define Y = 0.
(a) Find the joint PMF of X and Y ;
Answer: The underlying sample space is the set
Ω = {T T T T, T T T H, . . . , HHHH}.
Each outcome ω ∈ Ω can be mapped to X(ω) and Y (ω), e.g.,
X(TTTT) = 0,   Y(TTTT) = 0
X(THHT) = 2,   Y(THHT) = 2
X(TTTH) = 1,   Y(TTTH) = 4
etc.
By listing all 16 elements of Ω and computing X and Y for each, we can see that, e.g.,
{X = 2, Y = 2} = {THHT, THTH}
and thus PXY(2, 2) = 1/16 + 1/16 = 1/8. This can be repeated for all feasible values of X and Y as in
the table below:
PXY(x, y)     x = 0    x = 1    x = 2    x = 3    x = 4
   y = 0      1/16     0        0        0        0
   y = 1      0        1/16     3/16     3/16     1/16
   y = 2      0        1/16     1/8      1/16     0
   y = 3      0        1/16     1/16     0        0
   y = 4      0        1/16     0        0        0
(b) Using the joint PMF, find the marginal PMF of X
Answer: Summing over the columns of the table above, we get the PMF of X as
PX(k) = 1/16   for k = 0
        1/4    for k = 1
        3/8    for k = 2
        1/4    for k = 3
        1/16   for k = 4
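Both the joint PMF table and the marginal of X can be reproduced by enumerating the 16 equally likely outcomes (a verification sketch, not part of the original answer):

from itertools import product
from collections import Counter
from fractions import Fraction

joint = Counter()
for flips in product("HT", repeat=4):
    x = flips.count("H")                               # X: number of Heads
    y = flips.index("H") + 1 if "H" in flips else 0    # Y: position of first Head (0 if none)
    joint[(x, y)] += Fraction(1, 16)

marginal_X = Counter()
for (x, y), pr in joint.items():
    marginal_X[x] += pr

print(sorted(joint.items()))       # e.g. ((2, 2), Fraction(1, 8)) matches the table entry
print(sorted(marginal_X.items()))  # [(0, 1/16), (1, 1/4), (2, 3/8), (3, 1/4), (4, 1/16)]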
5. (Optional) Probability Review: A ternary communication channel is shown in the figure. Suppose
that the input symbols 0, 1, and 2 occur with probabilities 1/2, 1/4, and 1/4.
(a) Find the probabilities of the output symbols.
Solution: We have
PY(0) = (1 − ε)PX(0) + εPX(2) = (1/2)(1 − ε) + (1/4)ε = 1/2 − ε/4
PY(1) = εPX(0) + (1 − ε)PX(1) = (1/2)ε + (1/4)(1 − ε) = 1/4 + ε/4
PY(2) = εPX(1) + (1 − ε)PX(2) = (1/4)ε + (1/4)(1 − ε) = 1/4
Figure 1: Ternary communication channel
We can check that Σ_{y=0}^{2} PY(y) = (1/2 − ε/4) + (1/4 + ε/4) + 1/4 = 1.
(b) Suppose that 1 was observed at the output. What’s the probability that the input was 0? 1? 2?
Solution: We use the formula
PX|Y(x|y) = PY|X(y|x)PX(x)/PY(y)
with y = 1. Now PY(1) = 1/4 + ε/4, and so
PX|Y(0|1) = PY|X(1|0)PX(0)/PY(1) = (ε/2)/(1/4 + ε/4) = 2ε/(1 + ε)
PX|Y(1|1) = PY|X(1|1)PX(1)/PY(1) = ((1 − ε)/4)/(1/4 + ε/4) = (1 − ε)/(1 + ε)
PX|Y(2|1) = PY|X(1|2)PX(2)/PY(1) = 0
We can check that Σ_{x=0}^{2} PX|Y(x|1) = 2ε/(1 + ε) + (1 − ε)/(1 + ε) + 0 = 1.
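These expressions can be double-checked numerically for a specific crossover probability, say ε = 0.1 (an arbitrary value; the transition matrix below encodes the channel structure implied by the formulas above):

import numpy as np

eps = 0.1                                      # arbitrary crossover probability for illustration
pX = np.array([0.5, 0.25, 0.25])
# P[x, y] = P(Y = y | X = x): input 0 -> 1, 1 -> 2, 2 -> 0 with probability eps (as implied by the solution)
P = np.array([[1 - eps, eps,     0.0    ],
              [0.0,     1 - eps, eps    ],
              [eps,     0.0,     1 - eps]])

pY = pX @ P                                    # output distribution
posterior_given_1 = pX * P[:, 1] / pY[1]       # P(X = x | Y = 1) by Bayes' rule
print("P_Y     =", pY)                         # [1/2 - eps/4, 1/4 + eps/4, 1/4]
print("P_X|Y=1 =", posterior_given_1)          # [2eps/(1+eps), (1-eps)/(1+eps), 0]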
6. (Optional) Probability Review: Let X and Y be discrete random variables drawn according to
probability mass functions PX and PY over the respective alphabets X = {1, . . . , m} and Y = {m + 1, . . . , n}. Let Z be defined as
Z = X with probability α
    Y with probability 1 − α
(a) What’s the alphabet of Z, namely Z?
Solution: The alphabet is Z = {1, . . . , n}.
(b) Find the probability mass function of Z.
Solution: The probability mass function of Z is
PZ(z) = αPX(z)         for z = 1, . . . , m
        (1 − α)PY(z)   for z = m + 1, . . . , n
(c) Let g : Z → R be an arbitrary function of Z. Find the expectation of g(Z) in terms of α and the
expectations of g(X) and g(Y ).
Solution: By definition,
Eg(Z) = Σ_{z=1}^{n} g(z)PZ(z) = Σ_{z=1}^{m} g(z)PZ(z) + Σ_{z=m+1}^{n} g(z)PZ(z)
      = α Σ_{z=1}^{m} g(z)PX(z) + (1 − α) Σ_{z=m+1}^{n} g(z)PY(z) = αEg(X) + (1 − α)Eg(Y).
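A short check of the mixture formula (my addition, with arbitrary illustrative pmfs and g(z) = z²):

import numpy as np

alpha = 0.3
pX = np.array([0.2, 0.8])            # P_X on X = {1, 2}       (m = 2)
pY = np.array([0.5, 0.1, 0.4])       # P_Y on Y = {3, 4, 5}    (n = 5)
g = lambda z: z**2                   # arbitrary function g

pZ = np.concatenate([alpha * pX, (1 - alpha) * pY])          # pmf of Z on {1, ..., 5}
Eg_Z = sum(g(z) * pZ[z - 1] for z in range(1, 6))
Eg_X = sum(g(x) * pX[x - 1] for x in (1, 2))
Eg_Y = sum(g(y) * pY[y - 3] for y in (3, 4, 5))
assert np.isclose(Eg_Z, alpha * Eg_X + (1 - alpha) * Eg_Y)   # E g(Z) = alpha E g(X) + (1-alpha) E g(Y)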
7. (Optional) Linear MMSE estimation: Let X be a signal and Y be a vector of observations correlated with X. The linear MMSE estimate of X given Y = (Y1, . . . , Yn) is
X̂ = a^T Y = Σ_{i=1}^{n} ai Yi
where the vector a is chosen such that E[(X − X̂)²] is minimized. Assume that X and Y have zero
mean.
(a) Find a in terms of the cross-covariance KXY and the covariance KY and the vector Y.
(b) Find the corresponding MSE, i.e., the minimum value of E[(X − X̂)²], in terms of the cross-covariance KXY and the covariances KY and KX.
(c) Now, assume that X has mean 0 and variance P and the observations
Yi = X + Zi ,
i ∈ {1, . . . , n},
where Zi has zero mean and variance N. Assume that Zi and Zj are uncorrelated for i ≠ j and that
X and Zi are uncorrelated for each i ∈ {1, . . . , n}. Show that the MSE is
E[(X − X̂)²] = PN/(nP + N)
by using the matrix inversion lemma: for a block matrix
K = [ K11  K12
      K21  K22 ],
we have
(K11 − K12 K22^{-1} K21)^{-1} = K11^{-1} + K11^{-1} K12 (K22 − K21 K11^{-1} K12)^{-1} K21 K11^{-1}.
Solutions
(a) We can verify by direct differentiation that
E[(X − a^T Y)Yi] = 0   for all i ∈ [n].
Equivalently,
a^T KY = KXY.
Thus, if KY is non-singular, we have
a^T = KXY KY^{-1},
and the MMSE estimator is
X̂ = KXY KY^{-1} Y.
(b) The corresponding MSE is
E[(X − X̂)²] = E[(X − X̂)X]
            = E[X²] − E[KXY KY^{-1} Y X]
            = KX − KXY KY^{-1} KYX,
where the first step follows because E[(X − X̂)X̂] = 0 by orthogonality.
(c) Now we can compute
KY = P 11^T + N I.
Thus by the matrix inversion lemma (with K11 = N I, K12 = 1, K21 = 1^T, K22 = −1/P), we have
KY^{-1} = (1/N) I − (P/(N(nP + N))) 11^T.
Hence the MMSE estimate is
X̂ = KXY KY^{-1} Y = (P/(nP + N)) Σ_{i=1}^{n} Yi
and the MMSE is
E[(X − X̂)²] = P − KXY KY^{-1} KYX = PN/(nP + N).
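The closed-form answer can be verified numerically (a sketch of my own; Gaussian samples are used only for simulation convenience, since the LMMSE solution depends only on second-order statistics):

import numpy as np

rng = np.random.default_rng(3)
n, P, N = 5, 2.0, 0.5

# Closed-form quantities from the solution
K_Y = P * np.ones((n, n)) + N * np.eye(n)
K_XY = P * np.ones((1, n))                       # E[X Y^T]
a = K_XY @ np.linalg.inv(K_Y)                    # LMMSE coefficients a^T = K_XY K_Y^{-1}
mse_formula = P - (a @ K_XY.T).item()            # K_X - K_XY K_Y^{-1} K_YX

# Monte Carlo estimate of E[(X - a^T Y)^2]
X = rng.normal(0, np.sqrt(P), size=500_000)
Z = rng.normal(0, np.sqrt(N), size=(500_000, n))
Y = X[:, None] + Z
mse_mc = np.mean((X - Y @ a.ravel())**2)

print("P*N/(n*P + N) =", P * N / (n * P + N))    # predicted MSE
print("matrix formula =", mse_formula)
print("Monte Carlo    =", mse_mc)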
8. (Optional) Convexity: Prove that the following functions are convex:
(a) Quadratic function: f(x) = (1/2) x^T P x + q^T x + r, where P is positive semi-definite and x ∈ R^n;
The Hessian (matrix of second derivatives) is ∇²f(x) = P, which is positive semi-definite, so the function f
is convex.
(b) Least squares objective: f(x) = ‖Ax − b‖₂² where x ∈ R^n;
The function can be written as f(x) = x^T A^T Ax − 2x^T A^T b + b^T b. Hence, the Hessian is ∇²f(x) =
2A^T A, which is positive semi-definite, so f is convex.
(c) Quadratic over linear: f(x, y) = x²/y where (x, y) ∈ R² with y > 0;
Now, the Hessian is
∇²f(x, y) = (2/y³) v v^T,   where v = (y, −x)^T,
which is positive semi-definite for y > 0. So the function f is convex on the domain y > 0.
(d) Log-sum-exp: f(x) = log Σ_{k=1}^{n} exp(xk) where x ∈ R^n;
The Hessian is
∇²f(x) = (1/(1^T z)) diag(z) − (1/(1^T z)²) z z^T
where zk = exp(xk). To show ∇²f(x) ⪰ 0, we must show that v^T ∇²f(x) v ≥ 0 for all v:
v^T ∇²f(x) v = [(Σ_k zk vk²)(Σ_k zk) − (Σ_k vk zk)²] / (Σ_k zk)² ≥ 0
since (Σ_k vk zk)² ≤ (Σ_k zk vk²)(Σ_k zk) by the Cauchy-Schwarz inequality (applied to the vectors with
entries vk√zk and √zk).
(e) Negative geometric mean: f(x) = −(∏_{k=1}^{n} xk)^{1/n} where x ∈ R^n with xk > 0.
The proof is similar to the log-sum-exp case.
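The positive semi-definiteness claims in (d) and (e) can be spot-checked numerically at random points (a sketch of my own; the explicit Hessian of the negative geometric mean used below is an added detail not derived in the solution above):

import numpy as np

rng = np.random.default_rng(4)

def hess_log_sum_exp(x):
    z = np.exp(x)
    s = z.sum()
    return np.diag(z) / s - np.outer(z, z) / s**2

def hess_neg_geo_mean(x):            # f(x) = -(prod x_k)^(1/n), x_k > 0
    n = len(x)
    f = -np.prod(x) ** (1.0 / n)
    return (f / n**2) * (np.outer(1 / x, 1 / x) - n * np.diag(1 / x**2))

for _ in range(5):
    x1 = rng.normal(size=6)
    x2 = rng.uniform(0.1, 3.0, size=6)
    for H in (hess_log_sum_exp(x1), hess_neg_geo_mean(x2)):
        assert np.linalg.eigvalsh(H).min() >= -1e-10   # PSD up to numerical tolerance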