CS188: Artificial Intelligence, Fall 2011
Written 4: VPI, HMMs and Machine Learning
Due: 11/29 submitted electronically by 11:59pm (no slip days)
Policy: Can be solved in groups (acknowledge collaborators) but must be written up individually.
Instructions and answer templates for submitting your assignment can be found on the website via the assignments page: http://inst.eecs.berkeley.edu/~cs188/fa11/assignments.html
1 VPI: Crack the Code
You are defusing a bomb constructed by the evil Dr. Xor. You know that the shutdown sequence is three bits
B1 , B2 , B3 , where Bi ∈ {0, 1}, and you know that an odd number of the Bi are 1. Otherwise, all sequences
are equally likely. If you guess the correct shutdown sequence, the bomb defuses and you get a utility of 100;
otherwise, you get a utility of 0 (your bomb-proof gear is pretty good). (Hint: you should not need to do much
calculation for most question parts.)
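Before working through the parts below, it can help to enumerate the feasible codes. A minimal sketch (not part of the assignment) that lists the odd-parity sequences and their uniform prior:

```python
from itertools import product

# Enumerate all 3-bit sequences with an odd number of 1s; the problem
# states these are the only possible codes and that they are equally likely.
codes = [bits for bits in product([0, 1], repeat=3) if sum(bits) % 2 == 1]
prior = {code: 1 / len(codes) for code in codes}

print(sorted(codes))    # the four odd-parity sequences
print(prior[(1, 1, 1)]) # each has prior probability 1/4
```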
(a) (1 point) Draw a minimal (fewest arcs) Bayes’ Net that can represent the joint distribution over these
variables. You only need to include B1 , B2 , and B3 (not the utility). If you are submitting in .txt, draw the
Bayes net in text (e.g. B1 -> B2) or describe it concisely if it cannot easily be drawn.
(b) (1 point) What is the MEU given no evidence?
(c) (1 point) What is the VPI of B1 given no information?
(d) (1 point) What is the VPI of B2 given B1 ?
(e) (1 point) What is the VPI of B3 given B1 and B2 ?
At the last second, you discover that the bomb was actually set by Xor’s uncreative henchman, Repeato.
Repeato always uses all 1’s or all 0’s in his code (and is not restricted to an odd number of 1’s).
(f) (1 point) Draw a minimal (fewest arcs) Bayes’ Net that can represent the joint distribution over these
variables. You only need to include B1 , B2 , and B3 (not the utility).
(g) (1 point) What is the VPI of B1 given no information?
(h) (1 point) What is the VPI of B2 given B1 ?
(i) (1 point) What is the VPI of B3 given B1 and B2 ?
Repeato shows up to mock your efforts to defuse the bomb, and, as villains are wont to do, gets excited and
lets slip the crucial hint that he’s been taking number theory classes. You now suspect that his code is actually
a prime number when the sequence of Bi is viewed as a binary number (i.e. you think that 4B1 + 2B2 + B3
is prime) and that all such sequences are equally likely. None of the constraints from previous question parts
apply (i.e. beyond the binary integer being prime, we know nothing about the values of the Bi ). Recall that 2
is the smallest prime number.
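As before, a small sketch (not part of the assignment) can enumerate the feasible codes under this new constraint:

```python
# List the 3-bit sequences (B1, B2, B3) whose value 4*B1 + 2*B2 + B3 is prime.
# Per the hint, 2 is the smallest prime, so candidates lie in {2, 3, 5, 7}.
def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, n))

codes = [(b1, b2, b3)
         for b1 in (0, 1) for b2 in (0, 1) for b3 in (0, 1)
         if is_prime(4 * b1 + 2 * b2 + b3)]
print(codes)  # the sequences encoding 2, 3, 5, and 7
```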
(j) (1 point) What is the VPI of B1 given no information?
(k) (1 point) What is the VPI of B2 given B1 = 0?
(l) (1 point) What is the VPI of B3 given B1 = 0 and B2 = 1?
2 HMMs: Search and Rescue
You are an interplanetary search and rescue expert who has just received an urgent message: a rover on Mercury
has fallen and become trapped in Death Ravine, a deep, narrow gorge on the borders of enemy territory. You
zoom over to Mercury to investigate the situation.
Death Ravine is a narrow gorge six miles long, as shown below. There are volcanic vents at locations A and D,
indicated by the triangular symbols at those locations.
[Figure: a map of Death Ravine showing the six locations A, B, C, D, E, F in a row, with volcanic-vent symbols at A and D.]
The rover was heavily damaged in the fall, and as a result, most of its sensors are broken. The only ones still
functioning are its thermometers, which only register two levels, hot and cold. The rover sends back evidence
E = hot when it is at a volcanic vent (A and D), and E = cold otherwise. There is no chance of a mistaken
reading.
The rover fell into the gorge at position A on day 1, so X1 = A. Let the rover’s position on day t be
Xt ∈ {A, B, C, D, E, F }. The rover is still executing its original programming, trying to move 1 mile east (i.e.
right, towards F) every day. However, because of the damage, it only moves east with probability 0.5, and it
stays in place with probability 0.5. Your job is to figure out where the rover is, so that you can dispatch your
rescue-bot.
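The rover's dynamics can be written as a time-elapse update over the six locations. A sketch (assuming, as seems intended, that a rover already at F simply stays put):

```python
STATES = ["A", "B", "C", "D", "E", "F"]

def elapse(belief):
    """One day of rover dynamics: stay with probability 0.5, move one mile
    east with probability 0.5 (at F, assume the rover simply stays)."""
    new = {s: 0.0 for s in STATES}
    for i, s in enumerate(STATES):
        if s == "F":
            new["F"] += belief[s]
        else:
            new[s] += 0.5 * belief[s]
            new[STATES[i + 1]] += 0.5 * belief[s]
    return new

# Day 1: the rover fell in at A, so X1 = A with certainty.
belief = {s: (1.0 if s == "A" else 0.0) for s in STATES}
belief = elapse(belief)
print(belief)  # A and B each get probability 0.5
```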
(a) (1 point) Three days have passed since the rover fell into the ravine. The observations were (E1 = hot,
E2 = cold, E3 = cold ). What is P (X3 |E1 = hot, E2 = cold, E3 = cold), the probability distribution over the
rover’s position on day 3, given the observations?
You decide to attempt to rescue the rover on day 4. However, the transmission of E4 seems to have been
corrupted, and so it is not observed.
(b) (1 point) What is the rover’s position distribution for day 4 given the same evidence, P (X4 |E1 =
hot, E2 = cold, E3 = cold)?
(c) (2 points) If you deploy the rescue-bot in the correct location, you will save the rover and be rewarded
$10,000. If not, you will get $0. On day 4, what is the MEU (maximum expected utility) location to send the
rescue-bot to, and what is your corresponding expected utility measured in dollars?
(i) MEU location:
(ii) Expected utility:
Rescuing robots is hard, so the next time this happens you decide to try approximate inference using particle
filtering to track the rover.
(d) (1 point) If your particles are initially in the top configuration shown below, what is the probability that
they will be in the bottom configuration shown below after one day (after time elapses, but before evidence is
observed)?
[Figure: two particle-configuration diagrams over the locations A–F, with vent symbols at A and D. The top diagram shows the particles' initial positions; the bottom shows a candidate configuration after one day.]
(e) (2 points) Suppose your particles are in the following configuration after running a dynamics update:
[Figure: a particle configuration over the locations A–F after the dynamics update, with vent symbols at A and D.]
If the observation for the current timestep is E = hot, what is the probability distribution that we use to resample our particles? Express this by writing in the table below the probability of sampling a particle from each state.
         A        B        C        D        E        F
P(X)
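As a reminder of the mechanics (illustrated on hypothetical particle counts, not the configuration in the figure), the resampling distribution weights each state's particle count by the observation likelihood and then normalizes. With the deterministic sensor, P(hot | X) is 1 at the vents A and D and 0 elsewhere:

```python
STATES = ["A", "B", "C", "D", "E", "F"]
VENTS = {"A", "D"}

def resample_dist(counts, evidence):
    """Weight each state's particle count by P(evidence | state), normalize."""
    def likelihood(s):
        at_vent = s in VENTS
        return 1.0 if (evidence == "hot") == at_vent else 0.0
    weights = {s: counts.get(s, 0) * likelihood(s) for s in STATES}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}

# Hypothetical example counts (NOT the figure's configuration):
counts = {"A": 2, "B": 3, "D": 4, "E": 1}
print(resample_dist(counts, "hot"))  # mass only on A and D: 2/6 and 4/6
```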
(f) (1 point) Your co-pilot thinks you should model P (E|X) differently. Even though the sensors are not
noisy, she thinks you should use P (hot | no volcanic vent) = __ and P (cold | volcanic vent) = __, meaning
that a hot reading at a non-volcanic location has a small probability, and a cold reading at a volcanic location has a
small probability. She performs some simulations with particle filtering, and her version does seem to produce
more accurate results, despite the false assumption of noise. Explain briefly why this could be.
3 Machine Learning
Consider the training data below. X1 and X2 are binary-valued features and Y is the label you’d like to classify.
 Y    X1   X2
+1     0    0
−1     1    0
+1     1    1
+1     0    0
−1     0    1
−1     1    0
(a) (1 point) Assuming a Naive Bayes model, fill in the quantities learned from the training data in the
tables below (no smoothing).
 Y    P(Y)
−1
+1

X1    P(X1 | Y = −1)    P(X1 | Y = +1)
 0
 1

X2    P(X2 | Y = −1)    P(X2 | Y = +1)
 0
 1
(b) (1 point) Fill in the learned quantities below as in (a), but with add-k (Laplace) smoothing, with k = 1.
 Y    P(Y)
−1
+1

X1    P(X1 | Y = −1)    P(X1 | Y = +1)
 0
 1

X2    P(X2 | Y = −1)    P(X2 | Y = +1)
 0
 1
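For reference, add-k smoothing estimates a conditional probability as (count + k) / (total + k × number-of-values), i.e. it pretends every value of the feature was seen k extra times. A small sketch of the formula (the example numbers are illustrative, not taken from the tables):

```python
def add_k_estimate(count, total, num_values, k=1):
    """Add-k (Laplace) smoothed estimate of P(value | class): pretend every
    one of the feature's num_values values was seen k extra times."""
    return (count + k) / (total + k * num_values)

# E.g. a binary feature seen once out of three class examples, with k = 1:
print(add_k_estimate(1, 3, 2, k=1))  # (1 + 1) / (3 + 2) = 0.4
```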
(c) (1 point) Use your model in (b) to calculate P (Y |X1 = 0, X2 = 0).
(d) (1 point) What does P (Y |X1 = 0, X2 = 0) approach as k → ∞?
The training data, repeated here for convenience:

 Y    X1   X2
+1     0    0
−1     1    0
+1     1    1
+1     0    0
−1     0    1
−1     1    0
(e) (2 points) Circle the feature sets that would enable a linear binary classifier to classify the training data
perfectly. I denotes an indicator function, i.e. equal to 1 if and only if the given condition holds.
(i) {X1 }
(ii) {X2 }
(iii) {X1 , X2 }
(iv) {1, X1 , X2 }
(v) {1, abs(X1 − X2 )}
(vi) {1, X1 , X2 , X1 + X2 }
(vii) {1, X1 , X2 , max(X1 , X2 )}
(viii) {X1 , X2 , I(X1 = X2 )}
(ix) {1, X1 , (X1 X2 )}
The following question parts are self-contained and do not rely on the previous data set.
(f ) (1 point) Suppose that we have a model of the positive and negative examples. Namely, the negative
examples are distributed uniformly over the unit square bounded by (0,0), (0,1), (1,1), and (1,0), and the
positive examples are distributed uniformly over the unit square bounded by (1,1), (1,2), (2,2), and (2,1).
[Figure: the two unit squares, labeled − and +.]
If the prior probabilities of the two classes are equal, what is the expected error rate of a binary perceptron
with features (1, X1 , X2 ) and weight vector (−1, 1, 1)? (Hint: you should be able to do this part with geometry,
and should not have to compute integrals.)
(g) (2 points) Suppose you are given a weight vector w and a training point x with features f(x) ≠ 0 and
class label y ∈ {−1, +1}. Show that if the binary perceptron update causes the weight vector to change, then
y(w · f (x)) is guaranteed to increase as a result of the update.
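A quick numeric sanity check of the claim (not a substitute for the proof) uses the binary perceptron's mistake update, w ← w + y·f(x), on an arbitrary hypothetical weight vector and point:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical weight vector and misclassified point (not assignment data):
w = (2.0, -1.0)
f = (1.0, 3.0)
y = +1

before = y * dot(w, f)  # -1.0 < 0: a mistake, so the perceptron updates
w_new = tuple(wi + y * fi for wi, fi in zip(w, f))
after = y * dot(w_new, f)  # rises to before + f.f = -1 + 10 = 9
print(before, after)
```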
Now suppose we have a multiway classifier over two classes using features (X1 , X2 ) (no bias feature), with weight
vector (1, 0) for the positive class and weight vector (−1, 0) for the negative class. For each of the following
points, compute the weight vectors for the two classes after observing the given point when training with (i)
perceptron and (ii) MIRA with capacity C = 10. Note that we are proposing to observe each of these points
independently rather than observing the three points in succession, so you should start from the initial weights
each time when computing your update.
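As a reminder of the update rules (illustrated on a hypothetical point, not one of the points below): the multiclass perceptron adds f(x) to the correct class's weights and subtracts it from the wrongly predicted class's, while MIRA scales that same step by τ = min(C, ((w_pred − w_correct)·f + 1) / (2 f·f)).

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mira_update(w_correct, w_pred, f, C):
    """MIRA on a mistake: take the smallest step that makes the correct
    class win by a margin of 1, capped at C."""
    diff = tuple(p - c for p, c in zip(w_pred, w_correct))
    tau = min(C, (dot(diff, f) + 1) / (2 * dot(f, f)))
    new_correct = tuple(c + tau * fi for c, fi in zip(w_correct, f))
    new_pred = tuple(p - tau * fi for p, fi in zip(w_pred, f))
    return new_correct, new_pred, tau

# Hypothetical misclassified point f = (-2, 0) with true class + (not from
# the parts below): scores are w+ . f = -2 and w- . f = 2, so - wins.
w_plus, w_minus = (1.0, 0.0), (-1.0, 0.0)
new_plus, new_minus, tau = mira_update(w_plus, w_minus, (-2.0, 0.0), C=10)
print(tau, new_plus, new_minus)  # tau = 0.625
```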
(h) (1 point) f (x) = (1, 1), y = +1
             Initial weights           Updated weights
             + class    − class        + class    − class
Perceptron   (1, 0)     (−1, 0)
MIRA         (1, 0)     (−1, 0)
(i) (1 point) f (x) = (−1, 1), y = +1
             Initial weights           Updated weights
             + class    − class        + class    − class
Perceptron   (1, 0)     (−1, 0)
MIRA         (1, 0)     (−1, 0)
(j) (1 point) f (x) = (−3, 3), y = +1
             Initial weights           Updated weights
             + class    − class        + class    − class
Perceptron   (1, 0)     (−1, 0)
MIRA         (1, 0)     (−1, 0)