Intelligent Information System
Take-Home Examination (due 12 December)

1. Answer the questions on the following Bayesian network.
(1) Express the joint probability P(B, E, A, J, M) as a product of conditional probabilities and prior probabilities. (A hedged factorization sketch appears after Question 10.)
(2) Are J and M independent when A is observed? What about B and E when A is observed? Justify your answers.
(3) In the graph, suppose A = T. Then, if E = T, it becomes very difficult to infer whether B is T or not. What is the term for this effect?

2. Express the naïve Bayes classifier as a Bayesian belief network. How can we represent the conditional probability of an n-dimensional feature vector given a class label?

3. Answer the questions on the following MRF.
(1) Express the joint probability distribution of the random variables as a factorization over clique potential functions.
(2) If each potential function in (1) is expressed as a Gibbs distribution, what form does the joint probability distribution in (1) take?
(3) Inference in an MRF is usually the process of finding the hidden state that maximizes the a posteriori probability for a given observation. Explain why this process is equivalent to finding the hidden state that minimizes the energy function when the Gibbs distribution is used as in (2). (A one-line sketch follows Question 10.)

4. What is the difference between the Markov blanket in a Bayesian network and that in a Markov random field? What is the role of the blanket in the inference process of a graphical model, especially inference based on Gibbs sampling?

5. For the following Bayesian belief net, find the joint probability table for the 16 states of the 4 binary random variables by the ancestral sampling method. (You need a program to count the corresponding samples; a hedged sketch follows Question 10.) Read the article given in class and summarize the Gibbs sampling technique for this rain network.

6. Derive that the log-likelihood of the observed variables given the parameter θ can be written as the sum of (a) a lower bound, which is a function of a distribution q over the latent random variables and of θ, and (b) the Kullback-Leibler divergence between the distribution q and the conditional probability of the latent variables given the observed variables and θ. (An outline of the decomposition follows Question 10.)

7. What are the names of the four types of inference tasks in an HMM for sequential data? Briefly describe each task. Now consider one of them, the smoothing task. What are the messages needed to solve the smoothing task efficiently? Specify the messages in terms of the Baum-Welch algorithm. For which of the four tasks is the Viterbi algorithm useful?

8. What are the requirements of a valid kernel? Why can we increase the dimension of the feature space for better discrimination, without spending too much computational cost, by using the kernel trick? (A worked example follows Question 10.)

9. What is the overfitting problem in learning? Does it occur frequently when we have more than ten times as much training data as the number of parameters to be adjusted? Are we more likely to run into the problem as the number of parameters to be learned decreases? Why do we add a ridge or lasso constraint to prevent overfitting? Which of the two constrained optimization problems is easier to solve but not better? In sparse representation, which type of constraint is desirable? Give brief answers with justifications.

10. State the generative process of a document in the following supervised LDA. Also describe precisely the rating process in the model in terms of the corresponding model parameters.
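For Question 1, a hedged sketch of the factorization, assuming the classic burglary-earthquake-alarm topology (B and E are parents of A; J and M are children of A) since the figure is not reproduced here:

P(B, E, A, J, M) = P(B) P(E) P(A | B, E) P(J | A) P(M | A)

Each variable is conditioned only on its parents in the graph; verify the parent sets against the exam's figure before using this form.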
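For Question 3(3), a one-line sketch of the MAP/energy equivalence under a Gibbs distribution, writing x for the hidden state and y for the observation:

P(x | y) = (1/Z) exp(-E(x, y))  =>  argmax_x P(x | y) = argmin_x E(x, y)

because the logarithm is monotone and the partition function Z does not depend on x.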
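For Question 5, a minimal ancestral-sampling sketch in Python, assuming the usual sprinkler topology for the rain network (Cloudy -> Sprinkler, Cloudy -> Rain, {Sprinkler, Rain} -> WetGrass). The CPT numbers below are the commonly cited textbook values, not the exam's; substitute the figure's actual tables.

import random
from collections import Counter

# Assumed CPTs (illustrative only; replace with the exam figure's values).
P_C = 0.5
P_S_given_C = {True: 0.1, False: 0.5}
P_R_given_C = {True: 0.8, False: 0.2}
P_W_given_SR = {(True, True): 0.99, (True, False): 0.90,
                (False, True): 0.90, (False, False): 0.01}

def sample_once():
    # Sample parents before children (topological order): C, then S and R, then W.
    c = random.random() < P_C
    s = random.random() < P_S_given_C[c]
    r = random.random() < P_R_given_C[c]
    w = random.random() < P_W_given_SR[(s, r)]
    return (c, s, r, w)

def joint_table(n=1_000_000):
    # Empirical joint probability for each of the 16 binary states.
    counts = Counter(sample_once() for _ in range(n))
    return {state: counts[state] / n for state in counts}

if __name__ == "__main__":
    for state, p in sorted(joint_table().items()):
        print(state, round(p, 4))

The estimated table converges to the exact joint as n grows, so the same counts answer the 16-state table the question asks for.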
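For Question 6, an outline of the standard decomposition, with X observed, Z latent, and q any distribution over Z:

log p(X | θ) = Σ_Z q(Z) log p(X | θ)
             = Σ_Z q(Z) log [ p(X, Z | θ) / p(Z | X, θ) ]
             = Σ_Z q(Z) log [ p(X, Z | θ) / q(Z) ] + Σ_Z q(Z) log [ q(Z) / p(Z | X, θ) ]
             = L(q, θ) + KL( q || p(Z | X, θ) )

Since KL is non-negative, L(q, θ) is a lower bound on the log-likelihood, with equality when q = p(Z | X, θ).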
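For Question 8, a small worked example of the kernel trick with n = 2 inputs and K(x, z) = (x'z)^2:

(x1 z1 + x2 z2)^2 = x1^2 z1^2 + 2 x1 x2 z1 z2 + x2^2 z2^2 = φ(x)'φ(z),
where φ(x) = ( x1^2, √2 x1 x2, x2^2 )

so one kernel evaluation computes a 3-dimensional inner product at the cost of a 2-dimensional one; in general (x'z)^d reaches an O(n^d)-dimensional feature space at O(n) cost per evaluation.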
11. In the following relational topic model, briefly describe the types of link probability functions and their meanings.

12. Explain the differences among the following terms: supervised learning, semi-supervised learning, unsupervised learning, self-taught learning, reinforcement learning.

13. Describe the procedure for constructing a deep belief network and for fine-tuning its parameters.

14. Calculate the total number of parameters to be adjusted in LeNet-5.

15. What kinds of techniques have been used in ImageNet to avoid the overfitting problem caused by the limited amount of training data?

16. What is the "dropout" technique in the training of deep networks? Please read the article about the technique. (A minimal sketch follows Question 18.)

17. How can we implement a sparsity constraint in the training of neural networks? You may rely on the article given in class. (A sketch of one common penalty follows Question 18.)

18. What are the advantages of a denoising autoencoder over a conventional autoencoder?
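For Question 16, a minimal "inverted dropout" sketch in Python, assuming a NumPy setting; keep_prob is an illustrative hyperparameter, and the article's (Srivastava et al.) formulation should take precedence.

import numpy as np

def dropout_forward(h, keep_prob=0.5, train=True):
    # During training, zero each hidden unit independently with
    # probability 1 - keep_prob, and rescale by 1/keep_prob so the
    # expected activation matches test time (inverted dropout).
    if not train:
        return h  # at test time use all units; no rescaling needed
    mask = (np.random.rand(*h.shape) < keep_prob) / keep_prob
    return h * mask

# Example: a batch of 4 samples with 8 hidden units.
h = np.random.randn(4, 8)
print(dropout_forward(h, keep_prob=0.5))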
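For Question 17, one common way to impose the sparsity constraint is the KL-divergence penalty used in sparse autoencoders (as in Ng's lecture notes); the target activation rho and weight beta below are assumed hyperparameters, not values from the class article.

import numpy as np

def sparsity_penalty(activations, rho=0.05, beta=3.0):
    # activations: (n_samples, n_hidden) sigmoid outputs, assumed in (0, 1).
    # Penalize each unit's mean activation rho_hat for deviating from rho.
    rho_hat = activations.mean(axis=0)
    kl = (rho * np.log(rho / rho_hat)
          + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    return beta * kl.sum()  # added to the reconstruction loss before backprop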