
Name: ________________________
Final Exam
Statistical Pattern Recognition
Department of Computer Engineering
Sharif University of Technology
Fall 2006
1. You must work out all the problems.
2. Show all your work in the space provided to justify your answers.
3. You may use your textbook and class notes in this exam.
No          Max    Score
Problem 1    40
Problem 2    20
Problem 3    30
Problem 4    15
Total       105

Good Luck!
1. Short Answer and Conceptual questions
a. True/False
1. ………. Every continuous function can be computed by some multi-layer perceptron with an arbitrary unit function.
2. ………. If we use mean-squared error to train a perceptron, we will always successfully separate linearly separable classes.
3. ………. We would expect the support vectors to remain the same, in general, as we move from a linear kernel to higher-order polynomial kernels.
4. ………. Any decision boundary that we get from a generative model with class-conditional Gaussian distributions could in principle be reproduced with an SVM and a polynomial kernel of degree less than or equal to three.
b. The following diagrams represent training examples for a Boolean-valued concept function plotted in feature space. In each case, how many nodes are needed for a perceptron network to classify the examples correctly?
[Four diagrams]
#Nodes: ……    #Nodes: ……    #Nodes: ……    #Nodes: ……
c. The following diagrams represent training examples for a Boolean-valued concept function plotted in feature space. Show how each of the following algorithms might partition the space based on these examples. (No need for calculations; just give a qualitative answer.)
Neural Network with 1 hidden layer of 2 units
Neural Network with unlimited hidden layers/units
1-Nearest Neighbor
K-means (k=2)
GMM with 2 Gaussian components
GMM with 3 Gaussian components
SVM with linear kernel
SVM with polynomial kernel
d. The following diagrams represent training examples for a Boolean-valued concept function plotted in feature space. Suppose the only available kernels are (v1·v2), (v1·v2)², RBF with σ = 2, and RBF with σ = 0.1. For each diagram, write the simplest kernel that correctly classifies all the data, give the number of support vectors, and sketch the middle of the separating street.
Kernel: ……….   #SVs: ……
Kernel: ……….   #SVs: ……
Kernel: ……….   #SVs: ……
Kernel: ……….   #SVs: ……
2. Hidden Markov Models (HMM)
Consider the following HMM:
where
a_{ij} = P(q_{t+1} = S_j | q_t = S_i)
b_i(k) = P(O_t = k | q_t = S_i)
Suppose we have observed this sequence: XZXYYZYZZ
(In long-hand: O_1 = X, O_2 = Z, O_3 = X, O_4 = Y, O_5 = Y, O_6 = Z, O_7 = Y, O_8 = Z, O_9 = Z).
Fill in the following table with α_t(i) values, remembering the definition:
α_t(i) = P(O_1 ∧ O_2 ∧ … ∧ O_t ∧ q_t = S_i)
So for example, α_3(2) = P(O_1 = X ∧ O_2 = Z ∧ O_3 = X ∧ q_3 = S_2).
t    α_t(1)    α_t(2)    α_t(3)
1
2
3
4
5
6
7
8
9
Warning: this is a question that will take a few minutes if you really understand
HMMs, but could take hours if you don't!
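For reference, a minimal sketch of the forward recursion that fills this table. The transition matrix A, emission matrix B, and initial distribution pi used below are placeholders only; the actual parameters are the ones given in the HMM figure above.

```python
import numpy as np

# Minimal sketch of the forward recursion that fills the alpha table.
# A, B, and pi below are placeholders; the real values come from the HMM figure.
A = np.array([[0.5, 0.3, 0.2],    # A[i, j] = a_{ij} = P(q_{t+1} = S_j | q_t = S_i)
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])
B = np.array([[0.7, 0.2, 0.1],    # B[i, k] = b_i(k), with k indexing {X, Y, Z}
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
pi = np.array([1.0, 0.0, 0.0])    # initial state distribution (placeholder)

obs = "XZXYYZYZZ"
k = {"X": 0, "Y": 1, "Z": 2}

T, N = len(obs), A.shape[0]
alpha = np.zeros((T, N))          # alpha[t-1, i-1] holds alpha_t(i)

# Initialization: alpha_1(i) = pi_i * b_i(O_1)
alpha[0] = pi * B[:, k[obs[0]]]

# Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_{ij}) * b_j(O_{t+1})
for t in range(1, T):
    alpha[t] = (alpha[t - 1] @ A) * B[:, k[obs[t]]]

print(alpha)                      # row t-1 gives alpha_t(1..3) for t = 1..9
```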
3. Neural Network
a. Consider a neural network with inputs x1, x2,… xn, which are either 0 or 1. This
network is specified by giving the weights on the links and the activation function g at
the node. Design a network that computes the "majority function" for n input nodes. A
majority function should output 1 if at least half the inputs are high, and 0 otherwise.
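One way such a network can be realized is a single threshold unit whose link weights are all 1 and whose threshold is n/2. The sketch below assumes a unit-step activation g, consistent with a 0/1 perceptron output.

```python
def majority(inputs):
    """Single threshold unit computing the majority function on 0/1 inputs.

    All link weights are 1 and the threshold is n/2, so the unit outputs 1
    exactly when at least half of the inputs are 1 (unit-step activation g).
    """
    n = len(inputs)
    weighted_sum = sum(1.0 * x for x in inputs)   # every link weight equals 1
    return 1 if weighted_sum >= n / 2 else 0      # g: step at threshold n/2

# A few quick checks
print(majority([1, 0, 1, 0]))     # 2 of 4 high -> 1
print(majority([1, 0, 0, 0]))     # 1 of 4 high -> 0
print(majority([1, 1, 0, 0, 0]))  # 2 of 5 high -> 0 (needs at least 3)
```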
b. Fill in the missing weights for each of the nodes in the following perceptron network. Make the following assumptions:
• Perceptron outputs are 0 or 1.
• A, B, C are classes.
• The three lines shown in the figure represent decision boundaries.
• The directions of the arrows shown on the graphs indicate the side of each boundary that causes a perceptron to output 1.
4. Unsupervised learning and EM
The following diagram depicts a mixture of two Gaussians G1 and G2. The mixture
model assumption is that each data point is generated by first picking one of the two
Gaussians at random (a fifty-fifty chance of either) and then generating the (scalar)
value of the data point from the chosen Gaussian.
a. For each data point above, label which Gaussian that data point is most likely to have been generated by.
b. On the following figure sketch roughly where you believe the means of the two
Gaussians would be after one iteration of EM.
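For orientation, a minimal sketch of a single EM iteration for this model (equal mixing weights, scalar data). The data points and initial parameters come from the figure, so the values below are placeholders only.

```python
import numpy as np

# One EM iteration for a 50/50 mixture of two 1-D Gaussians.
# The data points and initial parameters come from the figure;
# the values below are placeholders only.
x = np.array([0.5, 1.0, 1.5, 5.0, 5.5, 6.0])   # placeholder data
mu = np.array([1.0, 4.0])                      # initial means (placeholder)
sigma = np.array([1.0, 1.0])                   # std deviations (placeholder)

# E-step: responsibility r[n, k] = P(G_k | x_n), with equal mixing weights
dens = np.exp(-(x[:, None] - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)
r = dens / dens.sum(axis=1, keepdims=True)

# M-step: each new mean is the responsibility-weighted average of the data
mu_new = (r * x[:, None]).sum(axis=0) / r.sum(axis=0)
print(mu_new)   # roughly where the two means move after one iteration
```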
A hat distribution is a one-dimensional probability distribution with slightly different properties compared with a conventional Gaussian. If x is generated from a hat distribution with mean μ, then its probability density function is given by:
p(x | μ) =
  0              for x ≤ μ − 1
  1 − (μ − x)    for μ − 1 ≤ x ≤ μ
  1 − (x − μ)    for μ ≤ x ≤ μ + 1
  0              for μ + 1 ≤ x
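The density above is a unit triangle centered at μ (height 1, support [μ − 1, μ + 1]); a minimal sketch of it as a function:

```python
def hat_pdf(x, mu):
    """Hat density: unit triangle peaking at mu (height 1), zero outside [mu-1, mu+1]."""
    if x <= mu - 1 or x >= mu + 1:
        return 0.0
    return 1.0 - abs(x - mu)

print(hat_pdf(2.0, 2.0))   # 1.0 at the peak
print(hat_pdf(2.5, 2.0))   # 0.5 halfway down the slope
print(hat_pdf(3.5, 2.0))   # 0.0 outside the support
```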
Assume that the following data was generated by a mixture of three hats:
• Class w1 is twice as likely as either of the others. It has probability P(w1) = 1/2 and mean μ1.
• Class w2 has probability P(w2) = 1/4 and mean μ2.
• Class w3 has probability P(w3) = 1/4 and mean μ3.
Once the class of a data point has been chosen, its value is generated according to that
class's hat distribution.
c. Give values for μ1, μ2 and μ3 that maximize the likelihood of the above data. The
values do not have to be integers.
d. What is the numerical value of the likelihood of this data given the values of μ1, μ2
and μ3?
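For parts c and d, a minimal sketch of how the likelihood of the data under the three-hat mixture could be evaluated for candidate means. The actual data points must be read off the figure, so the data values and candidate means below are placeholders only.

```python
def hat_pdf(x, mu):
    """Hat density from above: unit triangle peaking at mu, zero outside [mu-1, mu+1]."""
    return max(0.0, 1.0 - abs(x - mu))

# Likelihood of the data under the three-hat mixture for candidate means.
data = [1.0, 1.5, 2.0, 4.0, 6.0]     # placeholder data points (read off the figure)
priors = [0.5, 0.25, 0.25]           # P(w1), P(w2), P(w3) as given in the problem
mus = [1.5, 4.0, 6.0]                # candidate mu1, mu2, mu3 (placeholders)

likelihood = 1.0
for x in data:
    # p(x) = sum_k P(w_k) * p(x | mu_k)
    likelihood *= sum(p * hat_pdf(x, mu) for p, mu in zip(priors, mus))
print(likelihood)
```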