The Hong Kong University of Science & Technology
CSIT 5220: Reasoning and Decision under Uncertainty
Spring 2017

Assignment 5: Solutions
Assigned: 07/04/2017
Due Date: 21/04/2017

Question 1

In this question, you are asked to perform cluster analysis on a data set known as HIV using latent class models. The data set is appended below. You can use the Netica implementation of latent class models. The documentation is located at http://www.norsys.com/tutorials/netica/secD/tut_D1.htm. The data set is in a format not recognized by Netica, so you will have to convert it yourself.

This HIV data set consists of the results of four diagnostic tests for HIV on 428 patients: radioimmunoassay of antigen ag121 (A); radioimmunoassay of HIV p24 (B); radioimmunoassay of HIV gp120 (C); and enzyme-linked immunosorbent assay (D). A negative result is represented by 0 and a positive result by 1.

You are asked to cluster the data into two classes. Report the model that you obtain by giving all the conditional probability distributions. Calculate the posterior probability distribution of the class variable for each row of the data.

Name: hiv.data
//Variables: name of variable followed by names of states
RIA-ag020: negative positive
RIA-p24: negative positive
RIA-gp020: negative positive
ELISA: negative positive
//The first four columns contain the values of the four variables respectively.
// 0 - negative, 1 - positive
//The last contains the counts.
0 0 0 0 170
0 0 0 1 15
0 1 0 0 6
1 0 0 0 4
1 0 0 1 17
1 0 1 1 83
1 1 0 0 1
1 1 0 1 4
1 1 1 1 128

Solution

The conditional probability distributions:

P(z = 0) = 0.46, P(z = 1) = 0.54

                        P(RIA-ag020 | z = 0)   P(RIA-ag020 | z = 1)
RIA-ag020 = negative    0.97                   0
RIA-ag020 = positive    0.03                   1

                        P(RIA-p24 | z = 0)     P(RIA-p24 | z = 1)
RIA-p24 = negative      0.96                   0.43
RIA-p24 = positive      0.04                   0.57

                        P(RIA-gp020 | z = 0)   P(RIA-gp020 | z = 1)
RIA-gp020 = negative    1                      0.08
RIA-gp020 = positive    0                      0.92

                        P(ELISA | z = 0)       P(ELISA | z = 1)
ELISA = negative        0.92                   0
ELISA = positive        0.08                   1

Posterior distributions of the latent variable z for each row:

Row         1     2     3     4     5     6     7     8      9
P(z = 0)    1.0   1.0   1.0   1.0   0.05  0.0   1.0   0.001  0.0
P(z = 1)    0.0   0.0   0.0   0.0   0.95  1.0   0.0   0.999  1.0

We see that Rows 1, 2, 3, 4 and 7 are grouped into one cluster (z = 0), while the others (Rows 5, 6, 8 and 9) are grouped into the other cluster (z = 1).

Question 2

After running collapsed Gibbs sampling on a toy data set of 16 documents for a number of iterations, we get the following scenario, where each token is assigned to either the black topic or the white topic.

[Figure: current topic assignments of the tokens in the 16 documents]

Assume that the hyperparameters are α = 0.1 and η = 0.1. What happens to the tokens w_{1,1}, w_{10,6} and w_{16,16} if we run collapsed Gibbs sampling for one more iteration? Recall that w_{d,n} denotes the n-th token in the d-th document. Pick one of the following choices as your answer, and explain the reason.

1. The token will be assigned to the black topic with probability 1.
2. The token will be assigned to the black topic with probability close to 1.
3. The token will be assigned to the white topic with probability 1.
4. The token will be assigned to the white topic with probability close to 1.
5. The token will be assigned to the black topic with probability larger than 0.5.

Solution

The token w_{1,1} will be assigned to the black topic with probability close to 1 (choice 2). The reason is that all the tokens in document 1 are currently assigned to the black topic.
There is a non-zero probability that w_{1,1} will be assigned to the white topic because α = 0.1 > 0, but this probability is small.

The token w_{16,16} will be assigned to the white topic with probability close to 1 (choice 4). The reason is that all the tokens in document 16 are currently assigned to the white topic. There is a non-zero probability that w_{16,16} will be assigned to the black topic because α = 0.1 > 0, but this probability is small.

The token w_{10,6} will be assigned to the black topic with probability larger than 0.5 (choice 5), because more tokens in document 10 are assigned to black than to white and the word "bank" appears more often in the black topic.
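
The reasoning above follows from the standard collapsed Gibbs update for LDA, in which a token of word w in document d is assigned to topic k with probability proportional to (n_{d,k} + α)(n_{k,w} + η) / (n_k + Vη), where all counts exclude the token being resampled and V is the vocabulary size. The Python sketch below illustrates this update under stated assumptions: the function name topic_posterior is hypothetical, and every count in the usage lines (as well as V = 5) is made up for illustration, because the actual counts come from the figure.

```python
def topic_posterior(doc_counts, word_counts, topic_totals, vocab_size,
                    alpha=0.1, eta=0.1):
    """Collapsed Gibbs conditional over topics for a single token.

    doc_counts[k]   : tokens in the same document currently assigned to topic k
    word_counts[k]  : occurrences of the same word assigned to topic k (corpus-wide)
    topic_totals[k] : all tokens assigned to topic k (corpus-wide)
    All counts must exclude the token that is being resampled.
    """
    weights = [
        (doc_counts[k] + alpha) * (word_counts[k] + eta)
        / (topic_totals[k] + vocab_size * eta)
        for k in range(len(doc_counts))
    ]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical counts, topics ordered as [black, white].
# The document has 15 other tokens, all black, as in the w_{1,1} case.
p = topic_posterior(
    doc_counts=[15, 0],       # assumed: no other token in the document is white
    word_counts=[10, 2],      # assumed word counts per topic
    topic_totals=[120, 130],  # assumed topic sizes
    vocab_size=5,             # assumed vocabulary size
)
print(p)
```

With these made-up counts the white topic keeps a small positive weight only because of the α term, which matches the argument for w_{1,1} and, with the roles of the topics swapped, for w_{16,16}. Setting alpha to 0 in the call would make that weight exactly zero.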