COGS 202 Homework 1
Yashodhan Karandikar
Mahta Mousavi
Christopher Keown
Lucas Chang
Maria N Florendo
Ning Ma
April 21, 2014
1.1
The sum-of-squares error function is given by:
$$E(\bar{w}) = \frac{1}{2}\sum_{n=1}^{N}\bigl(y(x_n,\bar{w}) - t_n\bigr)^2,$$
where
$$y(x,\bar{w}) = \sum_{j=0}^{M} w_j x^j.$$
We take the partial derivative of E with respect to the coefficient $w_i$ for all $0 \le i \le M$:
$$\begin{aligned}
\frac{\partial E}{\partial w_i} &= \frac{1}{2}\sum_{n=1}^{N} 2\bigl(y(x_n,\bar{w}) - t_n\bigr)\,\frac{\partial}{\partial w_i}\, y(x_n,\bar{w}) \\
&= \sum_{n=1}^{N}\bigl(y(x_n,\bar{w}) - t_n\bigr)\,\frac{\partial}{\partial w_i}\sum_{j=0}^{M} w_j x_n^j \\
&= \sum_{n=1}^{N}\bigl(y(x_n,\bar{w}) - t_n\bigr)\, x_n^i.
\end{aligned}$$
Setting the partial derivative to 0 gives:
$$\sum_{n=1}^{N} y(x_n,\bar{w})\, x_n^i = \sum_{n=1}^{N} t_n x_n^i$$
$$\sum_{n=1}^{N}\sum_{j=0}^{M} w_j x_n^j x_n^i = \sum_{n=1}^{N} t_n x_n^i$$
$$\sum_{n=1}^{N}\sum_{j=0}^{M} w_j x_n^{i+j} = \sum_{n=1}^{N} t_n x_n^i$$
Switching the order of summation on the left-hand side gives:
$$\sum_{j=0}^{M}\sum_{n=1}^{N} x_n^{i+j}\, w_j = \sum_{n=1}^{N} t_n x_n^i. \qquad (1)$$
This can be written as:
$$\sum_{j=0}^{M} A_{ij}\, w_j = T_i$$
for all $0 \le i \le M$, where $A_{ij} = \sum_{n=1}^{N} x_n^{i+j}$ and $T_i = \sum_{n=1}^{N} t_n x_n^i$.
As mentioned in the textbook, the error function is a quadratic function of the coefficients w̄, so its derivatives with respect to the coefficients are linear in the elements of w̄, and the minimization of the error function therefore has a unique solution. Solving this system of M + 1 linear equations thus gives the w̄ that minimizes the error function.
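As a quick numerical sketch (not part of the assignment), the system above can be built and solved directly with NumPy; the sample data and the choice $M = 3$ below are arbitrary illustrative assumptions.

```python
import numpy as np

def fit_polynomial(x, t, M):
    """Solve sum_j A_ij w_j = T_i with A_ij = sum_n x_n^(i+j) and T_i = sum_n t_n x_n^i."""
    powers = np.arange(M + 1)
    A = np.array([[np.sum(x ** (i + j)) for j in powers] for i in powers])
    T = np.array([np.sum(t * x ** i) for i in powers])
    return np.linalg.solve(A, T)

# Illustrative data: noisy samples of sin(2*pi*x), fitted with a cubic (M = 3)
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)
w = fit_polynomial(x, t, M=3)
print(w)  # coefficients w_0, ..., w_3 of the minimizing polynomial
```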
1.3
Let F be the random variable denoting the fruit and B be the random variable denoting the box.
r : corresponds to red box
Red box has 3 apples, 4 oranges and 3 limes.
p(r) = 2/10
p(F = a|B = r) = 3/10
p(F = o|B = r) = 4/10
p(F = l|B = r) = 3/10
Blue box has 1 apple, 1 orange, and 0 limes.
b : corresponds to blue box
p(b) = 2/10
p(F = a|B = b) = 1/2
p(F = o|B = b) = 1/2
p(F = l|B = b) = 0
Green box has 3 apples, 3 oranges, and 4 limes
g : corresponds to green box
p(g) = 6/10
p(F = a|B = g) = 3/10
p(F = o|B = g) = 3/10
p(F = l|B = g) = 4/10
p(F = a) = p(F = a|B = r)p(r) + p(F = a|B = b)p(b) + p(F = a|B = g)p(g)
p(F = a) = (3/10)(2/10) + (1/2)(2/10) + (3/10)(6/10)
p(F = a) = 0.34
$$p(B = g \mid F = o) = \frac{p(F = o \mid B = g)\, p(B = g)}{p(F = o)}$$
$$p(F = o) = (0.2)(0.4) + (0.2)(0.5) + (0.6)(0.3) = 0.36$$
$$p(B = g \mid F = o) = \frac{(0.3)(0.6)}{0.36} = 0.5$$
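As an illustrative cross-check (not required by the problem), the same quantities can be computed programmatically from the priors and likelihoods listed above:

```python
# Priors over boxes and fruit probabilities given each box, from the problem statement
priors = {"r": 0.2, "b": 0.2, "g": 0.6}
likelihood = {
    "r": {"a": 0.3, "o": 0.4, "l": 0.3},
    "b": {"a": 0.5, "o": 0.5, "l": 0.0},
    "g": {"a": 0.3, "o": 0.3, "l": 0.4},
}

def posterior(box, fruit):
    """p(B = box | F = fruit) by Bayes' rule."""
    evidence = sum(priors[b] * likelihood[b][fruit] for b in priors)
    return priors[box] * likelihood[box][fruit] / evidence

print(sum(priors[b] * likelihood[b]["a"] for b in priors))  # p(F = a) = 0.34
print(posterior("g", "o"))                                  # p(B = g | F = o) = 0.5
```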
1.5
By definition:
$$\begin{aligned}
\mathrm{Var}[f] &= E\bigl[(f(x) - E[f(x)])^2\bigr] \\
&= E\bigl[f^2(x) - 2f(x)E[f(x)] + E^2[f(x)]\bigr] \\
&= \int_{-\infty}^{+\infty}\bigl[f^2(x) - 2f(x)E[f(x)] + E^2[f(x)]\bigr]\, p(x)\, dx,
\end{aligned} \qquad (2)$$
where x is a random variable whose distribution is given by p(x). By expanding the integral over
the sum in brackets we get:
$$= \int_{-\infty}^{+\infty} f^2(x)\, p(x)\, dx - \int_{-\infty}^{+\infty} 2f(x)\, p(x)\, E[f(x)]\, dx + \int_{-\infty}^{+\infty} E^2[f(x)]\, p(x)\, dx. \qquad (3)$$
In the middle term, $2E[f(x)]$ does not depend on $x$, so it can be taken out of the integral. Hence,
$$\int_{-\infty}^{+\infty} 2f(x)\, p(x)\, E[f(x)]\, dx = 2E[f(x)]\int_{-\infty}^{+\infty} f(x)\, p(x)\, dx.$$
Therefore, by definition of expectation of a function, the above reduces to:
$$= E[f^2(x)] - 2E[f(x)]\, E[f(x)] + E^2[f(x)] = E[f^2(x)] - E^2[f(x)]. \qquad (4)$$
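A small Monte Carlo sketch confirms that the two forms of the variance agree; the choice $f(x) = x^2$ with $x \sim \mathcal{N}(0,1)$ is an arbitrary illustrative assumption.

```python
import numpy as np

# Check Var[f] = E[f^2] - E[f]^2 by sampling; f(x) = x^2, x ~ N(0, 1) is an arbitrary choice
rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)
f = x ** 2
lhs = np.mean((f - f.mean()) ** 2)       # E[(f(x) - E[f(x)])^2]
rhs = np.mean(f ** 2) - np.mean(f) ** 2  # E[f^2(x)] - E^2[f(x)]
print(lhs, rhs)  # both ≈ 2, equal up to sampling noise
```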
1.6
Show that if two variables $x$ and $y$ are independent, then their covariance is zero. We are given that $p(x, y) = p(x)p(y)$.
$$\mathrm{cov}[x,y] = E_{x,y}[xy] - E[x]E[y] = \int\!\!\int p(x,y)\, xy\, dx\, dy - \int p(x)\, x\, dx \int p(y)\, y\, dy \qquad (5)$$
Substitute our assumption of independence.
$$= \int\!\!\int p(x)\, p(y)\, xy\, dx\, dy - \int p(x)\, x\, dx \int p(y)\, y\, dy \qquad (6)$$
Since $x$ is constant with respect to the integration over $y$, we can pull $p(x)\,x$ out of the inner integral in the first term.
$$= \int p(x)\, x\, dx \int p(y)\, y\, dy - \int p(x)\, x\, dx \int p(y)\, y\, dy = 0 \qquad (7)$$
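A quick sampling check (illustrative only; the particular distributions below are arbitrary assumptions) shows that the covariance of independently drawn variables is zero up to sampling noise.

```python
import numpy as np

# Covariance of independently drawn x and y; x ~ N(0, 1), y ~ Uniform(0, 1) are arbitrary choices
rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)
y = rng.uniform(size=1_000_000)
cov = np.mean(x * y) - np.mean(x) * np.mean(y)
print(cov)  # ≈ 0
```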
1.8
(a) By definition, the mean of a random variable with univariate Gaussian distribution is
$$\int_{-\infty}^{+\infty} \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\!\left(-\frac{1}{2\sigma^2}(x-\mu)^2\right) x\, dx \qquad (8)$$
Using the change of variables $y = \frac{x-\mu}{\sigma}$, we can obtain
$$\int_{-\infty}^{+\infty} \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\!\left(-\frac{y^2}{2}\right) (\sigma y + \mu)\, \sigma\, dy \qquad (9)$$
Expanding Eq.9 gives us
$$\frac{\sigma}{(2\pi)^{1/2}} \int_{-\infty}^{+\infty} y \exp\!\left(-\frac{y^2}{2}\right) dy + \mu \int_{-\infty}^{+\infty} \frac{1}{(2\pi)^{1/2}} \exp\!\left(-\frac{y^2}{2}\right) dy \qquad (10)$$
The integrand of the left integral is an odd function, so that integral is zero. The right term is just $\mu \int_{-\infty}^{+\infty} \mathcal{N}(y \mid 0, 1)\, dy = \mu$. Thus, the mean of the normally distributed random variable is $0 + \mu = \mu$.
(b) We know that the integral of any probability density function is 1. Thus, we have
$$\int_{-\infty}^{+\infty} \mathcal{N}(x \mid \mu, \sigma^2)\, dx = \int_{-\infty}^{+\infty} \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\!\left(-\frac{1}{2\sigma^2}(x-\mu)^2\right) dx = 1 \qquad (11)$$
Now, we take the derivative of Eq. 11 with respect to $\sigma^2$ to obtain
$$\begin{aligned}
-\frac{1}{2\sigma^2} \int_{-\infty}^{+\infty} \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\!\left(-\frac{1}{2\sigma^2}(x-\mu)^2\right) dx \qquad & \\
+\ \frac{1}{2\sigma^4} \int_{-\infty}^{+\infty} \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\!\left(-\frac{1}{2\sigma^2}(x-\mu)^2\right) (x-\mu)^2\, dx &= 0
\end{aligned} \qquad (12)$$
Multiplying both sides by $2\sigma^4$, Eq. 12 can be simplified as
$$-\sigma^2 \int_{-\infty}^{+\infty} \mathcal{N}(x \mid \mu, \sigma^2)\, dx + \int_{-\infty}^{+\infty} \mathcal{N}(x \mid \mu, \sigma^2)\, (x-\mu)^2\, dx = 0 \qquad (13)$$
Expanding Eq.13 leads to
$$-\sigma^2 + \int_{-\infty}^{+\infty} x^2\, \mathcal{N}(x \mid \mu, \sigma^2)\, dx - 2\mu \int_{-\infty}^{+\infty} x\, \mathcal{N}(x \mid \mu, \sigma^2)\, dx + \mu^2 \int_{-\infty}^{+\infty} \mathcal{N}(x \mid \mu, \sigma^2)\, dx = 0 \qquad (14)$$
Note that $\int_{-\infty}^{+\infty} x^2\, \mathcal{N}(x \mid \mu, \sigma^2)\, dx = \mathbb{E}[X^2]$, $\int_{-\infty}^{+\infty} x\, \mathcal{N}(x \mid \mu, \sigma^2)\, dx = \mathbb{E}[X] = \mu$, and $\int_{-\infty}^{+\infty} \mathcal{N}(x \mid \mu, \sigma^2)\, dx = 1$. Rearranging Eq. 14 then gives
$$\mathbb{E}[X^2] = \mu^2 + \sigma^2.$$
Thus, $\mathrm{Var}[X] = \mathbb{E}[X^2] - \mathbb{E}^2[X] = \sigma^2$.
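As a numerical sanity check (illustrative, assuming SciPy is available; the parameters $\mu = 1.5$ and $\sigma = 2$ are arbitrary), both moments can be evaluated by direct integration of the Gaussian density:

```python
import numpy as np
from scipy import integrate, stats

# Verify E[X] = mu and Var[X] = sigma^2 by numerical integration of the Gaussian density
mu, sigma = 1.5, 2.0
pdf = stats.norm(mu, sigma).pdf

mean, _ = integrate.quad(lambda x: x * pdf(x), -np.inf, np.inf)
second_moment, _ = integrate.quad(lambda x: x ** 2 * pdf(x), -np.inf, np.inf)
print(mean, second_moment - mean ** 2)  # ≈ 1.5 and ≈ 4.0
```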
1.10
Suppose that the two variables x and z are statistically independent. Show that the mean and
variance of their sum satisfies
$$E[x + z] = E[x] + E[z] \qquad (15)$$
$$\mathrm{var}[x + z] = \mathrm{var}[x] + \mathrm{var}[z] \qquad (16)$$
Proof of the first statement:
$$E[x+z] = \int_x\!\int_z (x+z)\, p(x,z)\, dz\, dx \qquad (17)$$
$$= \int_x\!\int_z (x+z)\, p(x)\, p(z)\, dz\, dx \qquad (18)$$
by independence.
$$= \int_x\!\int_z \bigl[\, x\, p(x)\, p(z) + z\, p(x)\, p(z)\, \bigr]\, dz\, dx \qquad (19)$$
$$= \int_x x\, p(x) \int_z p(z)\, dz\, dx + \int_z z\, p(z) \int_x p(x)\, dx\, dz \qquad (20)$$
$$= E[x] \times 1 + 1 \times E[z] \qquad (21)$$
Proof of the second statement:
$$\mathrm{var}[x+z] = E[(x+z)^2] - (E[x+z])^2 \qquad (22)$$
$$= E[x^2 + 2xz + z^2] - (E[x] + E[z])^2, \qquad (23)$$
using part 1.
$$= E[x^2] + E[2xz] + E[z^2] - E[x]^2 - 2E[x]E[z] - E[z]^2 \qquad (24)$$
Regrouping,
$$= E[x^2] - E[x]^2 + E[z^2] - E[z]^2 + E[2xz] - 2E[x]E[z] \qquad (25)$$
$$= \mathrm{var}[x] + \mathrm{var}[z] + 2\,\mathrm{cov}[x,z] \qquad (26)$$
The covariance $\mathrm{cov}[x,z]$ vanishes because of independence, so
$$\mathrm{var}[x+z] = \mathrm{var}[x] + \mathrm{var}[z] \qquad (27)$$
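Both identities can also be checked by sampling; the sketch below is illustrative, and the distributions of x and z are arbitrary independent choices.

```python
import numpy as np

# Check E[x+z] = E[x] + E[z] and var[x+z] = var[x] + var[z] for independent x and z
rng = np.random.default_rng(0)
x = rng.normal(2.0, 3.0, size=1_000_000)   # mean 2, variance 9
z = rng.exponential(1.0, size=1_000_000)   # mean 1, variance 1
print(np.mean(x + z), np.mean(x) + np.mean(z))  # both ≈ 3
print(np.var(x + z), np.var(x) + np.var(z))     # both ≈ 10
```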
Extra Problem
Assumptions: Let $L$ denote a possible location of an event, and let $v$ and $a$ denote the values of the visual and auditory signals. In addition, let $L_v^*$ denote the best location estimate based on the visual signal, let $L_a^*$ denote the best location estimate based on the auditory signal, and let $L^*$ denote the optimal location estimate based on both visual and auditory signals. Show that:
$$L^* = w_v L_v^* + w_a L_a^*, \qquad (28)$$
where $w_v$ and $w_a$ satisfy the following:
$$w_v = \frac{1/\sigma_v^2}{1/\sigma_v^2 + 1/\sigma_a^2} \qquad (29)$$
$$w_a = \frac{1/\sigma_a^2}{1/\sigma_v^2 + 1/\sigma_a^2} \qquad (30)$$
and $\sigma_v^2$ and $\sigma_a^2$ are the variances of the distributions $P(L \mid v)$ and $P(L \mid a)$, respectively.
Solution:
By assumption, $P(L \mid a) \sim \mathcal{N}(L_a^*, \sigma_a^2)$ and $P(L \mid v) \sim \mathcal{N}(L_v^*, \sigma_v^2)$. Then by Bayes' rule:
$$P(L \mid v, a) = \frac{P(v, a \mid L)\, P(L)}{P(v, a)}. \qquad (31)$$
Since the sensory cues are statistically independent given the location $L$, applying Bayes' rule again gives:
$$P(L \mid v, a) = \frac{P(v \mid L)\, P(a \mid L)\, P(L)}{P(v, a)} = \frac{P(L \mid a)\, P(a)\, P(L \mid v)\, P(v)}{P(a, v)\, P(L)} \qquad (32)$$
Also, let us assume that the location is distributed uniformly, so:
$$P(L \mid v, a) \propto P(L \mid a)\, P(L \mid v) \propto \exp\!\left(-\frac{(L - L_a^*)^2}{2\sigma_a^2}\right) \exp\!\left(-\frac{(L - L_v^*)^2}{2\sigma_v^2}\right) \qquad (33)$$
To find the maximum with respect to $L$, we take the derivative of the above expression and set it equal to zero:
$$\frac{L^* - L_v^*}{\sigma_v^2} + \frac{L^* - L_a^*}{\sigma_a^2} = 0$$
$$\Rightarrow (\sigma_a^2 + \sigma_v^2)\, L^* = \sigma_a^2 L_v^* + \sigma_v^2 L_a^*$$
$$\Rightarrow L^* = \frac{1/\sigma_a^2}{1/\sigma_v^2 + 1/\sigma_a^2}\, L_a^* + \frac{1/\sigma_v^2}{1/\sigma_v^2 + 1/\sigma_a^2}\, L_v^*, \qquad (34)$$
which is Eq. 28 with the weights $w_a$ and $w_v$ given in Eqs. 29 and 30.
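The weighted combination can be illustrated numerically; the estimates and variances below are arbitrary example values. The inverse-variance weights are computed directly, and a brute-force maximization of the product of the two Gaussians recovers the same $L^*$.

```python
import numpy as np

# Illustrative cue-combination example; L_v*, L_a*, and the variances are arbitrary values
L_v, sigma_v2 = 10.0, 1.0   # visual estimate and its variance
L_a, sigma_a2 = 14.0, 4.0   # auditory estimate and its variance

w_v = (1 / sigma_v2) / (1 / sigma_v2 + 1 / sigma_a2)
w_a = (1 / sigma_a2) / (1 / sigma_v2 + 1 / sigma_a2)
L_star = w_v * L_v + w_a * L_a
print(w_v, w_a, L_star)  # 0.8, 0.2, 10.8 -- the estimate is pulled toward the more reliable cue

# Brute-force check: maximize the log of the product of the two Gaussian factors
L_grid = np.linspace(0.0, 20.0, 200001)
log_post = -(L_grid - L_v) ** 2 / (2 * sigma_v2) - (L_grid - L_a) ** 2 / (2 * sigma_a2)
print(L_grid[np.argmax(log_post)])  # ≈ 10.8
```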