
Intelligent Information System
Take-Home Examination
Due on 12th December
1. Answer the following questions about the Bayesian network shown below.
(1) Express the joint probability P(B, E, A, J, M) as a product of conditional probabilities and prior probabilities (a notational sketch follows this question).
(2) Are J and M independent if A is observed? What about B and E if A is observed? Justify your answers.
(3) In the graph, suppose A = T. If E = T is then also observed, it becomes very difficult to infer whether B is T or not. What is the name of this effect?
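A notational sketch for reference only; it assumes the classic burglary-earthquake-alarm structure, with B and E as parents of A and A as the parent of J and M, so consult the actual figure for the structure given in the problem:

    P(B, E, A, J, M) = P(B)\, P(E)\, P(A \mid B, E)\, P(J \mid A)\, P(M \mid A)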
2. Express the Naïve Bayes classifier as a Bayesian belief network. How can we represent the conditional probability of an n-dimensional feature vector given a class label?
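As a hedged reminder of the factorization at issue (this is the standard naïve Bayes assumption, stated only to fix notation): under the conditional independence encoded by the network, the class-conditional probability of a feature vector x = (x_1, ..., x_n) factorizes as

    P(x_1, \dots, x_n \mid C) = \prod_{i=1}^{n} P(x_i \mid C)

so only n one-dimensional conditional distributions need to be specified rather than one n-dimensional table.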
3. Answer the following questions about the MRF shown below.
(1) Express the joint probability distribution of the random variables as a product of clique potential functions (a notation sketch follows this question).
(2) If each potential function in (1) is expressed as a Gibbs distribution, what form does the joint probability distribution in (1) take?
(3) Inference in an MRF is usually a process of finding the hidden state that maximizes the a posteriori probability for a given observation. Explain why this process is equivalent to finding the hidden state that minimizes the energy function when the Gibbs distribution is used as in (2).
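A notation sketch for reference (the actual cliques depend on the graph in the figure):

    p(x) = \frac{1}{Z} \prod_{C} \psi_C(x_C), \qquad \psi_C(x_C) = \exp\{-E_C(x_C)\}
    \;\Rightarrow\; p(x) = \frac{1}{Z} \exp\Big\{-\sum_{C} E_C(x_C)\Big\}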
4. What is the difference between the Markov blanket in a Bayesian network and that in a Markov random field? What role does the blanket play in the inference process of a graphical model, especially inference based on Gibbs sampling?
5. For the following Bayesian belief network, find the joint probability table over the 16 states of the 4 binary random variables by the ancestral sampling method. (You will need a program to count the corresponding samples; a minimal sketch is given below.) Read the article that was handed out in class and summarize the Gibbs sampling technique for this rain network.
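A minimal sketch of the counting program, assuming (hypothetically) a Cloudy -> {Sprinkler, Rain} -> WetGrass rain network with illustrative CPT values; substitute the actual probabilities from the figure:

    import itertools
    import random
    from collections import Counter

    random.seed(0)

    # Hypothetical CPTs for a Cloudy -> {Sprinkler, Rain} -> WetGrass network.
    # Replace these numbers with the ones given in the figure.
    P_C = 0.5                                        # P(Cloudy = T)
    P_S = {True: 0.1, False: 0.5}                    # P(Sprinkler = T | Cloudy)
    P_R = {True: 0.8, False: 0.2}                    # P(Rain = T | Cloudy)
    P_W = {(True, True): 0.99, (True, False): 0.9,
           (False, True): 0.9, (False, False): 0.0}  # P(WetGrass = T | Sprinkler, Rain)

    def ancestral_sample():
        """Sample each node after its parents, following a topological order."""
        c = random.random() < P_C
        s = random.random() < P_S[c]
        r = random.random() < P_R[c]
        w = random.random() < P_W[(s, r)]
        return (c, s, r, w)

    N = 100_000
    counts = Counter(ancestral_sample() for _ in range(N))

    # Empirical joint probability table over the 16 states (C, S, R, W).
    for state in itertools.product([False, True], repeat=4):
        print(state, counts[state] / N)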
6. Show that the log likelihood of the observed variables given the parameter θ can be written as the sum of a lower bound, which is a function of a distribution q over the latent random variables and of θ, and the Kullback-Leibler divergence between the distribution q and the conditional probability of the latent variables given the observed variables and θ.
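For clarity, the identity to be derived, written in the usual notation with observed variables X, latent variables Z, and any distribution q over Z, is

    \ln p(X \mid \theta) = \mathcal{L}(q, \theta) + \mathrm{KL}\big(q \,\|\, p(Z \mid X, \theta)\big),
    \qquad \mathcal{L}(q, \theta) = \sum_{Z} q(Z) \ln \frac{p(X, Z \mid \theta)}{q(Z)}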
7. What are the names of the four types of inference tasks in an HMM for sequential data? Briefly describe each task. Now consider the smoothing task. What messages are needed to solve the smoothing task efficiently? Specify these messages as used in the Baum-Welch (forward-backward) algorithm. For which of the four tasks is the Viterbi algorithm useful?
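As a hedged reminder of one standard notation for these messages (the course notes may use a different one):

    \alpha_t(i) = p(x_1, \dots, x_t, z_t = i), \qquad
    \beta_t(i) = p(x_{t+1}, \dots, x_T \mid z_t = i)

and the smoothed posterior is \gamma_t(i) = p(z_t = i \mid x_{1:T}) \propto \alpha_t(i)\,\beta_t(i).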
8. What are the requirements for a valid kernel? Why can we increase the dimension of the feature space to obtain better discrimination, without paying a large computational cost, by using the kernel trick?
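A small worked example of the cost argument (illustrative only): for x, y \in \mathbb{R}^2, the polynomial kernel

    k(x, y) = (x^\top y + 1)^2 = \phi(x)^\top \phi(y), \qquad
    \phi(x) = \big(x_1^2,\; x_2^2,\; \sqrt{2}\, x_1 x_2,\; \sqrt{2}\, x_1,\; \sqrt{2}\, x_2,\; 1\big)

evaluates a 6-dimensional inner product at the cost of a 2-dimensional one.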
9. What is the overfitting problem in learning? Does it occur frequently when we have more than ten times as much training data as the number of parameters to be adjusted? Are we more likely to run into the problem as the number of parameters to be learned decreases? Why do we add a ridge or lasso constraint to prevent the overfitting problem? Which of the optimization problems obtained by adding the ridge or the lasso constraint is easier to solve but not necessarily better? Which type of constraint is desirable for sparse representation? Give brief answers with justifications.
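For reference, the two penalized objectives the question refers to, written for a generic least-squares loss (the course may use a different loss function), are

    \min_{w} \|y - Xw\|_2^2 + \lambda \|w\|_2^2 \quad \text{(ridge)}, \qquad
    \min_{w} \|y - Xw\|_2^2 + \lambda \|w\|_1 \quad \text{(lasso)}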
10. State the generative process of a document in the following supervised LDA model. Also describe precisely the rating process in the model in terms of the corresponding model parameters.
11. For the following relational topic model, briefly describe the types of link probability functions and their meanings.
12. Explain the differences among the following terms:
Supervised learning, Semi-supervised learning, Unsupervised learning
Self-taught learning, Reinforcement learning
13. Describe the procedure for constructing a deep belief network and for fine-tuning its parameters.
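A minimal numpy sketch of the greedy layer-wise pre-training stage (CD-1 on a stack of RBMs); the layer sizes, learning rate, and toy data are assumptions, and the supervised fine-tuning stage (e.g. backpropagation from labels) is omitted:

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    class RBM:
        def __init__(self, n_visible, n_hidden):
            self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
            self.b = np.zeros(n_visible)   # visible bias
            self.c = np.zeros(n_hidden)    # hidden bias

        def sample_h(self, v):
            p = sigmoid(v @ self.W + self.c)
            return p, (rng.random(p.shape) < p).astype(float)

        def sample_v(self, h):
            p = sigmoid(h @ self.W.T + self.b)
            return p, (rng.random(p.shape) < p).astype(float)

        def cd1(self, v0, lr=0.1):
            # one step of contrastive divergence (CD-1)
            ph0, h0 = self.sample_h(v0)
            pv1, _ = self.sample_v(h0)
            ph1, _ = self.sample_h(pv1)
            self.W += lr * (v0.T @ ph0 - pv1.T @ ph1) / len(v0)
            self.b += lr * (v0 - pv1).mean(axis=0)
            self.c += lr * (ph0 - ph1).mean(axis=0)

    # Greedy layer-wise construction: each RBM is trained on the hidden
    # activations of the layer below; fine-tuning would follow.
    data = (rng.random((500, 784)) < 0.1).astype(float)   # toy binary data
    layers = [RBM(784, 256), RBM(256, 64)]
    x = data
    for rbm in layers:
        for _ in range(5):
            rbm.cd1(x)
        x, _ = rbm.sample_h(x)   # propagate probabilities upward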
14. Calculate the total number of parameters to be adjusted in LeNet-5.
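As an illustration of how such a layer-by-layer count proceeds (partial, not the full answer): the C1 convolution layer of LeNet-5 has 6 kernels of size 5×5, each with a bias, giving 6 × (5×5 + 1) = 156 trainable parameters, and the fully connected F6 layer has 84 units each connected to 120 inputs plus a bias, giving 84 × (120 + 1) = 10,164. The remaining layers, including the partially connected C3 layer, are counted in the same way and summed.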
15. What kinds of techniques have been used in ImageNet to avoid the overfitting problem caused by the limited amount of training data?
16. What is the “dropout” technique in the training of deep networks? Please read the article about this technique.
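A minimal sketch of one common implementation (inverted dropout) on a single activation vector; the keep probability and the rescaling convention are assumptions about a typical implementation, not a summary of the article:

    import numpy as np

    rng = np.random.default_rng(0)

    def dropout(activations, p_drop=0.5, training=True):
        """Inverted dropout: zero each unit with probability p_drop and rescale
        the survivors so the expected activation is unchanged; identity at test time."""
        if not training:
            return activations
        mask = (rng.random(activations.shape) >= p_drop) / (1.0 - p_drop)
        return activations * mask

    h = rng.standard_normal(8)
    print(dropout(h))                   # training pass: some units zeroed, rest scaled up
    print(dropout(h, training=False))   # test pass: unchanged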
17. How can we implement a sparsity constraint in the training of neural networks? You may rely on the article given in class.
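One common form of the sparsity constraint, as used in sparse autoencoder formulations (the article handed out in class may use a different form), is a penalty added to the reconstruction cost:

    \sum_{j} \mathrm{KL}(\rho \,\|\, \hat{\rho}_j)
    = \sum_{j} \Big[ \rho \ln \frac{\rho}{\hat{\rho}_j} + (1 - \rho) \ln \frac{1 - \rho}{1 - \hat{\rho}_j} \Big]

where \hat{\rho}_j is the average activation of hidden unit j over the training set and \rho is a small target activation level.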
18. What are the advantages of the denoising autoencoder over the conventional autoencoder?