Name: ________________________

Final Exam
Statistical Pattern Recognition
Department of Computer Engineering
Sharif University of Technology
Fall 2006

1. You must work out all the problems.
2. Show all your work in the space provided to justify your answers.
3. You may use your textbook and class notes in this exam.

    No          Max   Score
    Problem 1    40
    Problem 2    20
    Problem 3    30
    Problem 4    15
    Total       105

Good Luck!

1. Short Answer and Conceptual Questions

a. True/False

1. ………. Every continuous function can be computed by some multi-layer perceptron with an arbitrary unit function.
2. ………. If we use mean-squared error to train a perceptron, we will always successfully separate linearly separable classes.
3. ………. We would expect the support vectors to remain the same, in general, as we move from a linear kernel to higher-order polynomial kernels.
4. ………. Any decision boundary that we get from a generative model with class-conditional Gaussian distributions could in principle be reproduced with an SVM and a polynomial kernel of degree less than or equal to three.

b. The following diagrams represent training examples for a Boolean-valued concept function plotted in feature space. In each case, how many nodes are needed for a perceptron network to classify the examples correctly?

[Four diagrams omitted.]

#Nodes: ……   #Nodes: ……   #Nodes: ……   #Nodes: ……

c. The following diagrams represent training examples for a Boolean-valued concept function plotted in feature space. Show how each of the following algorithms might partition the space based on these examples. (No need for calculations; just give a qualitative answer.)

[Diagrams omitted.]

- Neural network with 1 hidden layer of 2 units
- Neural network with unlimited hidden layers/units
- 1-Nearest Neighbor
- K-means (k = 2)
- GMM with 2 Gaussian mixture components
- GMM with 3 Gaussian mixture components
- SVM with a linear kernel
- SVM with a polynomial kernel

d. The following diagrams represent training examples for a Boolean-valued concept function plotted in feature space. Suppose the only available kernels are $(v_1 \cdot v_2)$, $(v_1 \cdot v_2)^2$, RBF with $\sigma = 2$, and RBF with $\sigma = 0.1$. For each diagram, write the simplest kernel that correctly classifies all the data, give the number of support vectors, and sketch the middle of the separating street.

[Diagrams omitted.]

Kernel: ……….  #SVs: ……
Kernel: ……….  #SVs: ……
Kernel: ……….  #SVs: ……
Kernel: ……….  #SVs: ……

2. Hidden Markov Models (HMM)

Consider the following HMM:

[HMM diagram omitted.]

where

    $a_{ij} = P(q_{t+1} = S_j \mid q_t = S_i)$
    $b_i(k) = P(O_t = k \mid q_t = S_i)$

Suppose we have observed this sequence: XZXYYZYZZ (in long-hand: $O_1 = X$, $O_2 = Z$, $O_3 = X$, $O_4 = Y$, $O_5 = Y$, $O_6 = Z$, $O_7 = Y$, $O_8 = Z$, $O_9 = Z$).

Fill in the following table with $\alpha_i(t)$ values, remembering the definition

    $\alpha_i(t) = P(O_1 O_2 \ldots O_t,\ q_t = S_i)$

so that, for example, $\alpha_2(3) = P(O_1 = X,\ O_2 = Z,\ O_3 = X,\ q_3 = S_2)$.

    t | alpha_1(t) | alpha_2(t) | alpha_3(t)
    1 |            |            |
    2 |            |            |
    3 |            |            |
    4 |            |            |
    5 |            |            |
    6 |            |            |
    7 |            |            |
    8 |            |            |
    9 |            |            |

Warning: this is a question that will take a few minutes if you really understand HMMs, but could take hours if you don't! (The forward recursion is sketched in code after Problem 3.)

3. Neural Networks

a. Consider a neural network with inputs x_1, x_2, ..., x_n, each of which is either 0 or 1. This network is specified by giving the weights on the links and the activation function g at the node. Design a network that computes the "majority function" for n input nodes: it should output 1 if at least half the inputs are high, and 0 otherwise. (One such construction is sketched in code after part b.)

b. Fill in the missing weights for each of the nodes in the following perceptron network. Make the following assumptions:

- Each perceptron outputs 0 or 1.
- A, B, C are classes.
- The lines shown represent decision boundaries.
- The direction of the arrow on each boundary indicates the side on which the corresponding perceptron outputs 1.

[Network and boundary diagrams omitted.]
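For Problem 3a, one standard construction uses a single threshold unit: give every input a weight of 1 and let the activation g be a hard threshold at n/2. A minimal sketch in Python; this is one standard answer, not the only admissible one:

```python
def majority(x):
    """Single threshold unit computing the majority function.

    All link weights are 1, and the activation g is a hard threshold
    at n/2.  With 0/1 inputs, the weighted sum is just the number of
    high inputs, so the unit fires iff at least half of them are 1.
    """
    n = len(x)
    weighted_sum = sum(x)          # all weights equal to 1
    return 1 if weighted_sum >= n / 2 else 0

# 3 of 4 inputs high -> output 1; 1 of 4 high -> output 0
assert majority([1, 1, 0, 1]) == 1
assert majority([0, 0, 1, 0]) == 0
```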
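The alpha table in Problem 2 can likewise be checked mechanically with the forward recursion implied by the definition above: $\alpha_j(1) = \pi_j\, b_j(O_1)$ and $\alpha_j(t+1) = b_j(O_{t+1}) \sum_i \alpha_i(t)\, a_{ij}$. A minimal sketch, assuming hypothetical transition matrix A, emission matrix B, and initial distribution pi; the exam's actual parameters come from its HMM diagram, which is not reproduced here:

```python
import numpy as np

def forward(A, B, pi, obs):
    """Forward algorithm: alpha[t, i] = P(O_1 ... O_{t+1}, q_{t+1} = S_i).

    A[i, j] : transition probability P(q_{t+1} = S_j | q_t = S_i)
    B[i, k] : emission probability  P(O_t = k | q_t = S_i)
    pi[i]   : initial state probability P(q_1 = S_i)
    obs     : observation indices, e.g. X=0, Y=1, Z=2
    """
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                      # base case
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]  # recursion
    return alpha

# Hypothetical 3-state parameters (placeholders, not the exam's HMM):
A = np.array([[0.5, 0.3, 0.2], [0.2, 0.5, 0.3], [0.3, 0.2, 0.5]])
B = np.array([[0.6, 0.3, 0.1], [0.1, 0.6, 0.3], [0.3, 0.1, 0.6]])
pi = np.array([1 / 3, 1 / 3, 1 / 3])
obs = [0, 2, 0, 1, 1, 2, 1, 2, 2]                     # X Z X Y Y Z Y Z Z
print(forward(A, B, pi, obs))                         # 9 x 3 table of alphas
```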
4. Unsupervised Learning and EM

The following diagram depicts a mixture of two Gaussians, G1 and G2. The mixture-model assumption is that each data point is generated by first picking one of the two Gaussians at random (a fifty-fifty chance of either) and then generating the (scalar) value of the data point from the chosen Gaussian.

[Diagram omitted.]

a. For each data point above, label the Gaussian that most likely generated it.

b. On the following figure, sketch roughly where you believe the means of the two Gaussians would be after one iteration of EM. (One EM iteration is sketched in code after part d.)

A hat distribution is a one-dimensional probability distribution with slightly different properties from a conventional Gaussian. If x is generated from a hat distribution with mean $\mu$, then its probability density function is given by

    $p(x \mid \mu) =
      \begin{cases}
        0             & x \le \mu - 1 \\
        1 + (x - \mu) & \mu - 1 < x \le \mu \\
        1 - (x - \mu) & \mu < x \le \mu + 1 \\
        0             & x > \mu + 1
      \end{cases}$

Assume that the following data was generated by a mixture of three hats:

[Data diagram omitted.]

Class w1 is twice as likely as either of the others: it has probability P(w1) = 1/2 and mean $\mu_1$. Class w2 has probability P(w2) = 1/4 and mean $\mu_2$. Class w3 has probability P(w3) = 1/4 and mean $\mu_3$. Once the class of a data point has been chosen, its value is generated according to that class's hat distribution.

c. Give values for $\mu_1$, $\mu_2$, and $\mu_3$ that maximize the likelihood of the above data. The values do not have to be integers.

d. What is the numerical value of the likelihood of this data, given those values of $\mu_1$, $\mu_2$, and $\mu_3$? (A sketch for evaluating such a likelihood numerically follows.)
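For parts a and b, one EM iteration for the two-Gaussian mixture can be written out concretely. A minimal sketch, assuming hypothetical scalar data xs, initial means mu, and fixed unit standard deviations; none of these values come from the exam's figure:

```python
import numpy as np

def em_step(xs, mu, sigma):
    """One EM iteration for a 50/50 mixture of two 1-D Gaussians.

    E-step: responsibility r[n, k] is the posterior that point n came
    from Gaussian k (equal priors cancel in the normalization).
    M-step: each mean becomes the responsibility-weighted data average.
    """
    xs = np.asarray(xs, dtype=float)
    # E-step: unnormalized Gaussian densities, then normalize per point
    dens = np.exp(-0.5 * ((xs[:, None] - mu) / sigma) ** 2) / sigma
    r = dens / dens.sum(axis=1, keepdims=True)
    # M-step: update the means (variances could be updated analogously)
    new_mu = (r * xs[:, None]).sum(axis=0) / r.sum(axis=0)
    return new_mu, r

# Hypothetical data and initial means (placeholders):
xs = [0.2, 0.5, 0.9, 3.8, 4.1, 4.4]
new_mu, r = em_step(xs, mu=np.array([1.0, 3.0]), sigma=np.array([1.0, 1.0]))
print(new_mu)   # means shift toward the clusters each Gaussian "owns"
```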
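For parts c and d, once candidate means are chosen, the likelihood under the three-hat mixture is a product over the data of $\sum_j P(w_j)\, p(x \mid \mu_j)$. A minimal sketch, assuming hypothetical data points and candidate means, since the exam's data figure is not reproduced here:

```python
def hat_pdf(x, mu):
    """Density of the hat (triangular) distribution with mean mu:
    1 - |x - mu| on [mu - 1, mu + 1], and 0 elsewhere."""
    d = abs(x - mu)
    return 1.0 - d if d <= 1.0 else 0.0

def mixture_likelihood(xs, mus, priors=(0.5, 0.25, 0.25)):
    """Likelihood of i.i.d. data under the three-hat mixture."""
    like = 1.0
    for x in xs:
        like *= sum(p * hat_pdf(x, mu) for p, mu in zip(priors, mus))
    return like

# Hypothetical data and candidate means (placeholders):
xs = [0.0, 0.5, 2.0, 2.5, 5.0]
print(mixture_likelihood(xs, mus=(0.25, 2.25, 5.0)))
```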