
HOMEWORK 2
Data Mining
Deadline: 2017/4/4 19:00
Name:
Student ID:
1. Discuss the differences between dimensionality reduction based on aggregation
and dimensionality reduction based on techniques such as PCA and SVD.
2. Consider the following data set for a binary class problem.
(1) Calculate the information gain when splitting on A and B. Which attribute
would the decision tree induction algorithm choose?
(2) Calculate the gain in the Gini index when splitting on A and B. Which attribute
would the decision tree induction algorithm choose?
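As a sketch of the computation parts (1) and (2) ask for, the helper functions below compute entropy, the Gini index, and the impurity reduction of a candidate split. The class counts used at the bottom are hypothetical placeholders, not the counts from the problem's table; substitute the real ones when solving.

```python
import math

def entropy(counts):
    """Entropy (in bits) of a class distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def gini(counts):
    """Gini index of a class distribution given as raw counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def split_gain(parent_counts, child_counts_list, impurity):
    """Impurity reduction for a split: information gain when impurity=entropy,
    gain in the Gini index when impurity=gini."""
    n = sum(parent_counts)
    weighted = sum(sum(child) / n * impurity(child) for child in child_counts_list)
    return impurity(parent_counts) - weighted

# Hypothetical example: the parent node holds 4 '+' and 4 '-' instances,
# and splitting on an attribute yields children (3+, 1-) and (1+, 3-).
print(split_gain([4, 4], [[3, 1], [1, 3]], entropy))
print(split_gain([4, 4], [[3, 1], [1, 3]], gini))
```

Computing both gains for A and for B and comparing them is exactly how the induction algorithm picks the splitting attribute.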
3. Consider the decision trees shown in the following figure. Assume they are
generated from a data set that contains 16 binary attributes and 3 classes, C1, C2,
and C3. Compute the total description length of each decision tree according to the
minimum description length principle.
Please submit your homework via FTP before the deadline; name the file StudentID+Homework2.pdf (e.g., 123456+Homework2.pdf).





The total description length of a tree is given by:
Cost(tree, data) = Cost(tree) + Cost(data | tree)
- Each internal node of the tree is encoded by the ID of the splitting attribute. If there are m attributes, the cost of encoding each attribute is log₂ m bits.
- Each leaf is encoded using the ID of the class it is associated with. If there are k classes, the cost of encoding a class is log₂ k bits.
- Cost(tree) is the cost of encoding all the nodes in the tree. To simplify the computation, you can assume that the total cost of the tree is obtained by adding up the costs of encoding each internal node and each leaf node.
- Cost(data | tree) is encoded using the classification errors the tree commits on the training set. Each error is encoded by log₂ n bits, where n is the total number of training instances.
Which decision tree is better, according to the MDL principle? Please provide
detailed explanations.
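The encoding scheme above reduces to a short arithmetic formula. The sketch below implements it; the node and error counts at the bottom are hypothetical placeholders, not read off the figure's actual trees.

```python
import math

def mdl_cost(num_internal, num_leaves, num_errors, m, k, n):
    """Total description length of a decision tree under the scheme above:
    each internal node costs log2(m) bits (splitting-attribute ID),
    each leaf costs log2(k) bits (class ID),
    each training error costs log2(n) bits."""
    cost_tree = num_internal * math.log2(m) + num_leaves * math.log2(k)
    cost_data = num_errors * math.log2(n)
    return cost_tree + cost_data

# Hypothetical trees over m = 16 attributes, k = 3 classes, n = 100 instances:
# tree (a): 2 internal nodes, 3 leaves, 7 training errors
# tree (b): 4 internal nodes, 5 leaves, 4 training errors
cost_a = mdl_cost(2, 3, 7, m=16, k=3, n=100)
cost_b = mdl_cost(4, 5, 4, m=16, k=3, n=100)
print(cost_a, cost_b)  # MDL prefers whichever tree has the smaller total
```

Note the trade-off the MDL principle captures: a larger tree pays more in Cost(tree) but can pay less in Cost(data | tree) by making fewer errors.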
4. Consider the data set shown in the following table.
(1) Estimate the conditional probabilities for P(A|+), P(B|+), P(C|+), P(A|-), P(B|-), and
P(C|-).
(2) Use the conditional probability estimates from part (1) to predict the class label
for a test sample (A = 0, B = 1, C = 0) using the naïve Bayes approach.
(3) Estimate the conditional probabilities using the m-estimate approach with
p = 1/2 and m = 4.
(4) Repeat part (2) using the conditional probabilities given in part (3).
(5) Compare the two methods for estimating probabilities. Which method is better
and why?
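For parts (3)–(5), the m-estimate and the naïve Bayes score can be sketched as below. The counts in the example are hypothetical placeholders (they are not the counts from the problem's table); the point they illustrate, a zero relative frequency wiping out the whole product, is the usual motivation for the m-estimate.

```python
def m_estimate(n_c, n, p, m):
    """m-estimate of a conditional probability: (n_c + m*p) / (n + m), where
    n_c = count of the attribute value within the class, n = class size,
    p = prior estimate, m = equivalent sample size."""
    return (n_c + m * p) / (n + m)

def naive_bayes_score(prior, cond_probs):
    """Unnormalized naive Bayes score: P(class) * product of P(attr | class).
    The class with the larger score is the predicted label."""
    score = prior
    for pr in cond_probs:
        score *= pr
    return score

# Hypothetical counts: suppose class '+' has 5 instances, 0 with A = 0.
# The relative frequency 0/5 = 0 zeroes out the entire product for '+',
# while the m-estimate with p = 1/2, m = 4 smooths it to a nonzero value.
print(m_estimate(0, 5, p=0.5, m=4))  # (0 + 4*0.5) / (5 + 4) = 2/9
```

Comparing the two predictions in parts (2) and (4) shows how this smoothing can change the predicted label when some attribute-value count is zero.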
5. The following figure illustrates the Bayesian belief network for the data set shown
below. (Assume that all the attributes are binary.)
[Figure: Bayesian belief network]
(1) Draw the probability table for each node in the above network.
(2) Use the Bayesian network to compute P(Engine = Bad, Air Conditioner =
Broken).
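A query like part (2) is answered by factoring the joint distribution along the network structure and summing out the unobserved variables. The sketch below uses a hypothetical structure and hypothetical probability tables (Mileage M as a root, with Engine E and Air Conditioner A each conditioned on M), not the figure's actual network; the marginalization pattern is what carries over.

```python
# Hypothetical CPTs, not the figure's actual tables.
P_M = {"Hi": 0.5, "Lo": 0.5}
P_E_given_M = {("Good", "Hi"): 0.5, ("Bad", "Hi"): 0.5,
               ("Good", "Lo"): 0.7, ("Bad", "Lo"): 0.3}
P_A_given_M = {("Working", "Hi"): 0.6, ("Broken", "Hi"): 0.4,
               ("Working", "Lo"): 0.8, ("Broken", "Lo"): 0.2}

def p_joint(m, e, a):
    """Chain rule for this hypothetical structure: P(M) * P(E|M) * P(A|M)."""
    return P_M[m] * P_E_given_M[(e, m)] * P_A_given_M[(a, m)]

# P(Engine = Bad, Air Conditioner = Broken): marginalize out Mileage.
answer = sum(p_joint(m, "Bad", "Broken") for m in P_M)
print(answer)
```

With the real network, the same two steps apply: factor the joint using each node's parents, then sum over every variable not mentioned in the query.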