Reasoning and Decision under Uncertainty Spring 2017 Assignment

The Hong Kong University of Science & Technology
CSIT 5220: Reasoning and Decision under Uncertainty
Spring 2017
Assignment 5 : Solutions
Assigned: 07/04/2017
Due Date: 21/04/2017
Question 1 In this question, you are asked to perform cluster analysis on a
data set known as HIV using latent class models. The data set is appended
below. You can use the Netica implementation of latent class models. The
documentation is located at http://www.norsys.com/tutorials/netica/secD/tut_D1.htm.
The data set is in a format not recognized by Netica. So, you will have to
change the format yourself.
This HIV data set consists of the results, for 428 patients, of four diagnostic
tests for the human immunodeficiency virus (HIV): radioimmunoassay of antigen
ag121 (A); radioimmunoassay of HIV p24 (B); radioimmunoassay of HIV gp120 (C);
and enzyme-linked immunosorbent assay (D). A negative result is represented
by 0 and a positive result by 1.
You are asked to cluster the data into two classes. Report the model
that you obtain by giving all the conditional probability distributions.
Calculate the posterior probability distribution of the class variable for
each row of the data.
Name:
hiv.data
//Variables: name of variable followed by names of states
RIA-ag020: negative positive
RIA-p24: negative positive
RIA-gp020: negative positive
ELISA: negative positive
//The first four columns contain the values of the four variables, respectively.
// 0 - negative, 1 - positive
//The last column contains the counts.
0 0 0 0 170
0 0 0 1 15
0 1 0 0 6
1 0 0 0 4
1 0 0 1 17
1 0 1 1 83
1 1 0 0 1
1 1 0 1 4
1 1 1 1 128
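The question leaves the format conversion to the reader. The following is a minimal sketch of one way to do it, assuming the target tool wants one case per line with a header row of variable names. The tab delimiter, the underscore-based variable names, and the helper names `expand_cases` and `to_case_text` are illustrative assumptions, not Netica's documented requirements; check the Netica documentation for the exact case-file layout.

```python
# Aggregated HIV data: (RIA-ag020, RIA-p24, RIA-gp020, ELISA, count).
HIV_COUNTS = [
    (0, 0, 0, 0, 170),
    (0, 0, 0, 1, 15),
    (0, 1, 0, 0, 6),
    (1, 0, 0, 0, 4),
    (1, 0, 0, 1, 17),
    (1, 0, 1, 1, 83),
    (1, 1, 0, 0, 1),
    (1, 1, 0, 1, 4),
    (1, 1, 1, 1, 128),
]

# Assumed identifier-style names (hyphens replaced by underscores).
VARIABLES = ["RIA_ag020", "RIA_p24", "RIA_gp020", "ELISA"]
STATES = {0: "negative", 1: "positive"}


def expand_cases(counts):
    """Repeat each response pattern as many times as its count."""
    rows = []
    for *values, count in counts:
        rows.extend([tuple(STATES[v] for v in values)] * count)
    return rows


def to_case_text(rows, variables=VARIABLES):
    """Render the cases as tab-delimited text with a header line."""
    lines = ["\t".join(variables)]
    lines += ["\t".join(row) for row in rows]
    return "\n".join(lines) + "\n"


cases = expand_cases(HIV_COUNTS)
case_text = to_case_text(cases)
print(len(cases), "cases written")
```

Writing `case_text` to a file gives one line per patient, which is the form most learning tools accept when they cannot read a count column.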
Solution The conditional probability distributions:
P(z = 0) = 0.46, P(z = 1) = 0.54

                        negative   positive
P(RIA-ag020 | z = 0)    0.97       0.03
P(RIA-ag020 | z = 1)    0          1
P(RIA-p24 | z = 0)      0.96       0.04
P(RIA-p24 | z = 1)      0.43       0.57
P(RIA-gp020 | z = 0)    1          0
P(RIA-gp020 | z = 1)    0.08       0.92
P(ELISA | z = 0)        0.92       0.08
P(ELISA | z = 1)        0          1
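The solution was obtained with Netica. As an independent illustration, here is a minimal sketch of EM for a two-class latent class model with four binary indicators. The starting values, the iteration count, and the function name `fit_latent_class` are assumptions chosen for this example. EM can stop at different local optima from different starts, so the fitted numbers need not match the reported ones digit for digit.

```python
# Aggregated HIV data: four binary test results followed by the pattern count.
DATA = [
    (0, 0, 0, 0, 170),
    (0, 0, 0, 1, 15),
    (0, 1, 0, 0, 6),
    (1, 0, 0, 0, 4),
    (1, 0, 0, 1, 17),
    (1, 0, 1, 1, 83),
    (1, 1, 0, 0, 1),
    (1, 1, 0, 1, 4),
    (1, 1, 1, 1, 128),
]


def fit_latent_class(data, n_iter=200):
    """EM for a two-class latent class model on count-weighted binary data."""
    n_vars = 4
    prior = [0.5, 0.5]
    # pos[k][j] = P(test j is positive | z = k); asymmetric start (assumption)
    # so that class 0 begins as the "mostly negative" class.
    pos = [[0.2] * n_vars, [0.8] * n_vars]
    total = sum(row[-1] for row in data)
    resp = []
    for _ in range(n_iter):
        # E-step: posterior class probabilities for each response pattern.
        resp = []
        for row in data:
            joint = []
            for k in range(2):
                p = prior[k]
                for j in range(n_vars):
                    p *= pos[k][j] if row[j] == 1 else 1.0 - pos[k][j]
                joint.append(p)
            norm = sum(joint)
            resp.append([p / norm for p in joint])
        # M-step: count-weighted re-estimation of the prior and the CPDs.
        for k in range(2):
            weight = sum(r[k] * row[-1] for r, row in zip(resp, data))
            prior[k] = weight / total
            for j in range(n_vars):
                pos_weight = sum(
                    r[k] * row[-1] * row[j] for r, row in zip(resp, data)
                )
                pos[k][j] = pos_weight / weight
    return prior, pos, resp


prior, pos, resp = fit_latent_class(DATA)
print("P(z):", [round(p, 2) for p in prior])
for k in range(2):
    print(f"P(positive | z = {k}):", [round(p, 2) for p in pos[k]])
```

Here `pos[k][j]` stores only the probability of a positive result; the probability of a negative result is its complement.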
Posterior distributions of the latent variable z for each row:

Row   P(z = 0)   P(z = 1)
1     1.0        0.0
2     1.0        0.0
3     1.0        0.0
4     1.0        0.0
5     0.05       0.95
6     0.0        1.0
7     1.0        0.0
8     0.001      0.999
9     0.0        1.0
We see that Rows 1, 2, 3, 4 and 7 are grouped into one cluster, while the
others (Rows 5, 6, 8 and 9) are grouped into the other cluster.
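The row posteriors follow from Bayes' rule applied to the fitted model, since the four tests are independent given z. The sketch below recomputes them from the rounded parameters, so small differences from the reported figures are expected. Two assumptions are made for this example: P(z = 1) is taken as 1 - P(z = 0) = 0.54 so that the prior sums to one, and each positive-state probability is derived as one minus the negative-state entry.

```python
# Class prior: P(z = 0) from the solution, P(z = 1) as its complement (assumption).
PRIOR = [0.46, 0.54]

# NEG[k][j] = P(test j = negative | z = k),
# in the order RIA-ag020, RIA-p24, RIA-gp020, ELISA.
NEG = [
    [0.97, 0.96, 1.0, 0.92],
    [0.0, 0.43, 0.08, 0.0],
]

# The nine observed response patterns, in the row order of the data file.
PATTERNS = [
    (0, 0, 0, 0),
    (0, 0, 0, 1),
    (0, 1, 0, 0),
    (1, 0, 0, 0),
    (1, 0, 0, 1),
    (1, 0, 1, 1),
    (1, 1, 0, 0),
    (1, 1, 0, 1),
    (1, 1, 1, 1),
]


def posterior(pattern):
    """Return [P(z = 0 | pattern), P(z = 1 | pattern)] via Bayes' rule."""
    joint = []
    for k in range(2):
        p = PRIOR[k]
        for j, x in enumerate(pattern):
            p *= NEG[k][j] if x == 0 else 1.0 - NEG[k][j]
        joint.append(p)
    norm = sum(joint)
    return [p / norm for p in joint]


posteriors = [posterior(p) for p in PATTERNS]
clusters = [0 if post[0] >= 0.5 else 1 for post in posteriors]
for i, post in enumerate(posteriors, start=1):
    print(f"Row {i}: P(z=0) = {post[0]:.3f}, P(z=1) = {post[1]:.3f}")
```

Rows whose pattern has zero probability under one class (for example, any row with a negative RIA-ag020 result under z = 1) are assigned to the other class with certainty, which is why most entries are exactly 0 or 1.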
Question 2 After running collapsed sampling on a toy data set of 16 documents
for a number of iterations, we get the following scenario, where the tokens
are assigned to either the black topic or the white topic.
Assume that the hyperparameters are α = 0.1 and η = 0.1. What happens to
the tokens w_{1,1}, w_{10,6} and w_{16,16} if we run collapsed sampling for one more
iteration? Recall that w_{d,n} denotes the n-th token in the d-th document.
Pick one of the following choices as your answer, and explain the reason.
1. The token will be assigned to the black topic with probability 1.
2. The token will be assigned to the black topic with probability close
to 1.
3. The token will be assigned to the white topic with probability 1.
4. The token will be assigned to the white topic with probability close
to 1.
5. The token will be assigned to the black topic with probability larger
than 0.5.
Solution The token w_{1,1} will be assigned to the black topic with probability
close to 1. The reason is that all the tokens in document 1 are currently
assigned to the black topic. There is a non-zero probability that w_{1,1} will be
assigned to the white topic because α = 0.1, but that probability is small.
The token w_{16,16} will be assigned to the white topic with probability close
to 1. The reason is that all the tokens in document 16 are currently
assigned to the white topic. There is a non-zero probability that w_{16,16} will
be assigned to the black topic because α = 0.1, but that probability is
small.
The token w_{10,6} will be assigned to the black topic with probability larger
than 0.5 because more tokens are assigned to black than to white in document 10
and the word "bank" appears more often in the black topic.
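The argument above can be checked numerically with the usual collapsed Gibbs conditional, assuming the model is LDA with symmetric priors: p(z = k | rest) is proportional to (n_dk + α)(n_kw + η) / (n_k + Vη), where all counts exclude the token being resampled. The sketch below is illustrative only. The function name `topic_probabilities`, the vocabulary size, and every count in the two calls are hypothetical, since the figure with the actual assignments is not reproduced here.

```python
def topic_probabilities(doc_counts, word_counts, topic_totals, vocab_size,
                        alpha=0.1, eta=0.1):
    """Collapsed Gibbs conditional over topics for a single token.

    doc_counts[k]   : tokens in the same document assigned to topic k
    word_counts[k]  : occurrences of the same word assigned to topic k
    topic_totals[k] : all tokens assigned to topic k
    All counts must exclude the token being resampled.
    """
    weights = [
        (doc_counts[k] + alpha)
        * (word_counts[k] + eta)
        / (topic_totals[k] + vocab_size * eta)
        for k in range(len(doc_counts))
    ]
    norm = sum(weights)
    return [w / norm for w in weights]


# Hypothetical counts, topic order [black, white].
# A document whose other tokens are all black: black wins with probability
# close to, but not exactly, 1 because alpha > 0.
p_all_black = topic_probabilities(
    doc_counts=[15, 0], word_counts=[4, 2],
    topic_totals=[100, 100], vocab_size=10,
)

# A mixed document where black leads in both the document and the word counts:
# black is favoured, but far less decisively.
p_mixed = topic_probabilities(
    doc_counts=[9, 6], word_counts=[5, 2],
    topic_totals=[100, 100], vocab_size=10,
)

print("all-black document:", p_all_black)
print("mixed document:", p_mixed)
```

The α term keeps the weight of the unused topic strictly positive, which is why the answer for w_{1,1} and w_{16,16} is "close to 1" rather than "1".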