CS 520: INTRODUCTION TO ARTIFICIAL INTELLIGENCE
FALL 2015
Assignment 3
Temporal Models - Kalman Filters - Markov Decision Processes
Deadline: December 13, 11:55pm.
Perfect score: 100.
Assignment Instructions:
Teams: Assignments should be completed by teams of two students. No additional credit will be given to students who complete an assignment individually. Please inform the TAs as soon as possible about the members of your team so they can update the scoring spreadsheet (find the TAs’ contact info on the course website:
http://www.abdeslam.net/cs520).
Submission Rules: Submit your reports electronically as a PDF document through Sakai (sakai.rutgers.edu). Do not submit Word documents, raw text, hardcopies, etc. Make sure to generate and submit a PDF instead. Each team of students should submit only a single copy of their solutions and indicate all team members on their submission. Failure to follow these rules will result in a lower grade on the assignment.
Late Submissions: No late submission is allowed. 0 points for late assignments.
Extra Credit for LaTeX: You will receive 10% extra credit if you submit your answers as a typeset PDF (using LaTeX, in which case you should also submit your LaTeX source electronically). There will be a 5% bonus for electronically prepared answers (e.g., in MS Word) that are not typeset. If you want to submit a handwritten report, scan it and submit a PDF via Sakai. We will not accept hardcopies. If you choose to submit handwritten answers and we are not able to read them, you will not be awarded any points for the parts of the solution that are unreadable.
Precision: Try to be precise. Keep in mind that you are trying to convince a very skeptical reader (and computer scientists are the worst kind...) that your answers are correct.
Collusion, Plagiarism, etc.: Each team must prepare its solutions independently of other teams, i.e., without sharing notes, code, or worksheets with other students and without trying to solve problems in collaboration with other teams. You must indicate any external sources you have used in the preparation of your solution. Do not plagiarize online sources, and in general make sure you do not violate any of the academic standards of the department or the university. Failure to follow these rules may result in failure in the course.
Problem 1: Hidden Markov Models (HMM) (50 points total)
HMM: Search and Rescue
You are an interplanetary search and rescue expert who has just received an urgent message: a rover on Mercury has fallen and become trapped in Death Ravine, a deep, narrow gorge on the borders of enemy territory. You zoom over to Mercury to investigate the situation. Death Ravine is a narrow gorge 6 miles long, as shown below. There are volcanic vents at locations A and D, indicated by the triangular symbols at those locations.
[Figure: Death Ravine, with the six locations A, B, C, D, E, F in a row and triangular volcanic-vent symbols at A and D.]
The rover was heavily damaged in the fall, and as a result, most of its sensors are broken. The only ones still functioning are its thermometers, which register only two levels: hot and cold. The rover sends back evidence E = hot when it is at a volcanic vent (A and D), and E = cold otherwise. There is no chance of a mistaken reading. The rover fell into the gorge at position A on day 1, so X1 = A. Let the rover’s position on day t be Xt ∈ {A, B, C, D, E, F}. The rover is still executing its original programming, trying to move 1 mile east (i.e. right, towards F) every day. However, because of the damage, it only moves east with probability 0.5, and it stays in place with probability 0.5. Your job is to figure out where the rover is, so that you can dispatch your rescue-bot.
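For concreteness, the transition and sensor models described above can be written out explicitly. The following is a minimal Python sketch of them (not a required format for your answers); it assumes, since the problem does not say otherwise, that a rover at F has nowhere further east to go and simply stays at F, and the names transition_model and sensor_model are our own.

POSITIONS = ["A", "B", "C", "D", "E", "F"]
EAST = {"A": "B", "B": "C", "C": "D", "D": "E", "E": "F"}

def transition_model(x):
    # P(X_{t+1} | X_t = x): move one mile east with probability 0.5, stay with 0.5.
    # Assumption: at F the rover cannot move further east, so it stays put.
    if x == "F":
        return {"F": 1.0}
    return {x: 0.5, EAST[x]: 0.5}

def sensor_model(x):
    # P(E_t | X_t = x): readings are deterministic, hot exactly at the vents A and D.
    if x in ("A", "D"):
        return {"hot": 1.0, "cold": 0.0}
    return {"hot": 0.0, "cold": 1.0}

# The rover fell in at A on day 1, so the day-1 prior puts all mass on A.
prior = {x: (1.0 if x == "A" else 0.0) for x in POSITIONS}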
1. Filtering: Three days have passed since the rover fell into the ravine. The observations were (E1 = hot, E2 = cold, E3 = cold). What is P(X3 | hot1, cold2, cold3), the probability distribution over the rover’s position on day 3, given the observations? (This is a probability distribution over the six possible positions).
2. Smoothing: What is P(X2 | hot1, cold2, cold3), the probability distribution over the rover’s position on day 2, given the observations? (This is a probability distribution over the six possible positions).
3. Most Likely Explanation: What is the most likely sequence of the rover’s positions in the three days, given the observations (E1 = hot, E2 = cold, E3 = cold)?
4. Prediction: What is P(hot4, hot5, cold6 | hot1, cold2, cold3), the probability of observing hot4 and hot5 and cold6 in days 4, 5, and 6 respectively, given the previous observations in days 1, 2, and 3? (This is a single value, not a distribution).
5. Prediction: You decide to attempt to rescue the rover on day 4. However, the transmission of E4 seems to have been corrupted, and so it is not observed. What is the rover’s position distribution for day 4 given the same evidence, P(X4 | hot1, cold2, cold3)?
6. Particle Filtering: All this computation is taxing your computers, so the next time this happens you decide to try approximate inference using particle filtering to track the rover. You have a population of N = 5 particles, which are all initially in position A on day 1. Propagate the particles (sample next positions by flipping a coin and reporting the result) for three days, while weighting each particle with the probability of the evidence (given the particle) observed each day (E1 = hot, E2 = cold, E3 = cold), and re-sampling N particles from the population using the normalized weights. What is P̂(X3 | hot1, cold2, cold3), the estimated probability distribution over the rover’s position on day 3, given the observations? (Note 1: there will be different answers depending on the outcomes of the sampling; all answers will be accepted as long as they comply with the particle filtering algorithm. Note 2: you don’t need to implement anything, but you do need a random number generator for sampling).
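Question 6 asks for a hand simulation, but a short sketch of one generic day of particle filtering (elapse time, weight by the evidence likelihood, resample) may help make the three steps concrete. It reuses the hypothetical transition_model and sensor_model sketched after the problem statement above; the function name and the use of Python’s random module are our own choices, and this is an illustration only, not a required implementation.

import random

def particle_filter_day(particles, evidence):
    # One day of particle filtering over the Death Ravine model.
    # particles: list of positions (e.g., ["A"] * 5); evidence: "hot" or "cold".
    # 1. Elapse time: sample each particle's next position from the transition model.
    moved = []
    for x in particles:
        dist = transition_model(x)
        r, cumulative = random.random(), 0.0
        for nxt, p in dist.items():
            cumulative += p
            if r <= cumulative:
                moved.append(nxt)
                break
    # 2. Weight: each particle gets the likelihood of the observed evidence.
    weights = [sensor_model(x)[evidence] for x in moved]
    if sum(weights) == 0.0:
        return moved  # degenerate case: no particle is consistent with the evidence
    # 3. Resample: draw N new particles in proportion to the normalized weights.
    return random.choices(moved, weights=weights, k=len(moved))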
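For questions 1–5, by contrast, the standard exact-inference recursions apply (the forward pass for filtering and prediction, forward–backward for smoothing, and Viterbi for the most likely sequence). As one reference point, a single forward step (elapse time, then condition on the evidence) over the model sketched earlier could look like the following; the function name is ours, the sketch assumes the observed evidence has nonzero probability, and it is meant only as a way to check hand computations, not to replace them.

def forward_step(belief, evidence):
    # One step of exact filtering: P(X_{t+1} | e_{1:t+1}) from P(X_t | e_{1:t}).
    # belief: dict position -> probability; evidence: "hot" or "cold".
    # Elapse time: push the belief through the transition model.
    predicted = {x: 0.0 for x in POSITIONS}
    for x, p in belief.items():
        for nxt, q in transition_model(x).items():
            predicted[nxt] += p * q
    # Observe: multiply by the evidence likelihood and renormalize.
    unnorm = {x: predicted[x] * sensor_model(x)[evidence] for x in POSITIONS}
    total = sum(unnorm.values())
    return {x: unnorm[x] / total for x in POSITIONS}

# For example, the day-2 and day-3 filtering distributions follow from the day-1
# belief: b1 = prior (consistent with E1 = hot), b2 = forward_step(b1, "cold"), ...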
Problem 2: Kalman Filters (10 points total)
Consider the problem of implementing an autopilot system for a boat. The boat’s position is represented by two
coordinates (Xt , Yt ). The boat is navigating with a constant velocity vx = 10 meters/second in the X-direction and
vy = 5 meters/second in the Y -direction. Due to random ocean currents, the boat’s position at time t + 1 is distributed as
Xt+1 ∼ N(Xt + vx, 1), Yt+1 ∼ N(Yt + vy, 1). You also have access to the Global Positioning System (GPS). When the boat is in a position (Xt, Yt), the coordinates returned by the GPS are X̂t ∼ N(Xt, 3), Ŷt ∼ N(Yt, 3). The initial position of the boat is not known precisely; it is given by X0 ∼ N(0, 0.1), Y0 ∼ N(0, 0.1).
The autopilot repeatedly estimates the position of the boat.
1. Calculate the distribution of the boat’s position at time t = 1 (after sailing for one second), if the estimated position
according to the GPS at time t = 1 is (8, 6).
2. Calculate the distribution of the boat’s position at time t = 2 (after sailing for two seconds), if the estimated position
according to the GPS at time t = 2 is (19, 9).
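Since the X and Y coordinates evolve and are measured independently, each coordinate can be tracked with a one-dimensional Kalman filter. The sketch below shows the standard scalar predict/correct recursion with the noise values from the problem, reading the second argument of N(·, ·) as a variance (transition noise 1, GPS noise 3, initial variance 0.1). The function name and the (mean, variance) bookkeeping are our own choices, and the sketch is only a way to check the derivation you are asked to show.

def kalman_1d_step(mu, var, z, velocity, q=1.0, r=3.0):
    # One predict/correct step of a scalar Kalman filter.
    # mu, var  : mean and variance of the position estimate at time t
    # z        : GPS measurement received at time t + 1
    # velocity : deterministic drift per second (vx = 10 or vy = 5 here)
    # q, r     : transition and measurement noise variances
    # Predict: shift the mean by the velocity and add the transition noise.
    mu_pred, var_pred = mu + velocity, var + q
    # Correct: blend the prediction and the measurement using the Kalman gain.
    gain = var_pred / (var_pred + r)
    mu_new = mu_pred + gain * (z - mu_pred)
    var_new = (1.0 - gain) * var_pred
    return mu_new, var_new

# Example use for the X coordinate, starting from X0 ~ N(0, 0.1) with vx = 10:
mu_x, var_x = kalman_1d_step(0.0, 0.1, z=8.0, velocity=10.0)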
Problem 3: Markov Decision Processes (40 points total)
The autopilot system does not only estimate the position of the boat, it also controls it. Consider here a simplified version
of this control problem.
• Assume the position of the boat is discrete and takes values from the grid {(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)}.
• The boat’s position is always known.
• The boat’s actions are move-east, move-west, move-north, move-south.
• (0, 0) is the northwest-most position, and (2, 2) is the southeast-most position.
• The boat moves into the intended adjacent position with probability 0.9, and to a random position among the other
adjacent positions with probability 0.1.
Example 1: P ((0, 1) | (1, 1), north) = 0.9, P ((1, 0) | (1, 1), north) = 0.1/3, P ((1, 2) | (1, 1), north) = 0.1/3, P ((2, 1) | (1, 1), north) = 0.1/3.
Example 2: P ((0, 1) | (0, 0), east) = 0.9, P ((1, 0) | (0, 0), east) = 0.1.
• When the boat tries to move outside the grid, it remains in the same position with probability 1.
• The initial position is (0, 0).
• The reward of all the states is 0, except for state (0, 2) (the goal) where the reward is 10, and for state (0, 1) (iceberg)
where the reward is −5.
Using a discount factor γ = 1:
1. Show the policy π and value function V^π at each iteration of the Value Iteration algorithm for 2 iterations. The initial values are set to 0.
2. Show the policy π and value function V^π at each iteration of the Policy Iteration algorithm for 2 iterations. Run the policy evaluation part of the Policy Iteration algorithm for 2 iterations only. The initial values are set to 0, and the initial policy for all the states is move-east.
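For both questions, a compact way to check hand computations is to encode the grid, the transition model from the bullet list above, and the rewards, and apply the Bellman backup directly. The sketch below runs two sweeps of value iteration under one reading of the problem, namely that the reward is collected in the current state, so V(s) = R(s) + γ max_a Σ_{s'} P(s' | s, a) V(s'); the data structures and function names are our own, and policy iteration can be checked with the same transition function by alternating evaluation and greedy improvement.

# Grid convention: state (r, c) with r increasing southward and c increasing
# eastward, so (0, 0) is the northwest corner and (2, 2) the southeast corner.
GRID = [(r, c) for r in range(3) for c in range(3)]
ACTIONS = {"move-north": (-1, 0), "move-south": (1, 0),
           "move-west": (0, -1), "move-east": (0, 1)}
REWARD = {s: 0.0 for s in GRID}
REWARD[(0, 2)] = 10.0   # goal
REWARD[(0, 1)] = -5.0   # iceberg
GAMMA = 1.0

def neighbors(s):
    # All adjacent positions of s that lie inside the grid.
    return [(s[0] + dr, s[1] + dc) for dr, dc in ACTIONS.values()
            if (s[0] + dr, s[1] + dc) in GRID]

def transition(s, a):
    # P(s' | s, a) following the bullet list: the intended cell with 0.9, the
    # remaining 0.1 split among the other in-grid neighbors; a move that would
    # leave the grid keeps the boat in place with probability 1.
    intended = (s[0] + ACTIONS[a][0], s[1] + ACTIONS[a][1])
    if intended not in GRID:
        return {s: 1.0}
    others = [n for n in neighbors(s) if n != intended]
    dist = {intended: 0.9}
    for n in others:
        dist[n] = 0.1 / len(others)
    return dist

def value_iteration_sweep(V):
    # One synchronous Bellman backup, returning the new values and greedy policy.
    new_V, policy = {}, {}
    for s in GRID:
        best_action, best_q = None, float("-inf")
        for a in ACTIONS:
            q = sum(p * V[sp] for sp, p in transition(s, a).items())
            if q > best_q:
                best_action, best_q = a, q
        new_V[s] = REWARD[s] + GAMMA * best_q
        policy[s] = best_action
    return new_V, policy

V = {s: 0.0 for s in GRID}
for _ in range(2):            # the two iterations asked for in question 1
    V, pi = value_iteration_sweep(V)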