40-957 Special Topics in Artificial Intelligence: Probabilistic Graphical Models

Homework 4 (70 + 20 Pts)

Due date: 1393/3/2

Directions: Submit your code and report to "[email protected]" with the subject "HW4 Submission-student number" (for example: "HW4 Submission-90207018").

Late submission policy: a late submission carries a penalty of 10% off your assignment per day after the deadline.

Problem 1 (30 Pts)

In this question, you will implement algorithms for learning an HMM that models a customer and predicts his behavior. At each time step the customer is in one of the states "rich" or "poor", and his probability of buying "expensive" or "cheap" items depends on his state. Since we only know the history of the customer's purchases, we can model the problem by an HMM with two hidden states ("rich" and "poor") and two kinds of observations (whether he is buying "expensive" or "cheap" products). The parameters of this model (initial probabilities, transition probabilities, and observation probabilities) should be learned from the history of the T = 5000 previous observations (given in the "Observations.mat" file). We can then use the model to predict the next observations.

Hint: You may need to work in log-space to avoid numerical problems when dealing with joint probabilities of O(T) variables.

1.1 (10 Pts)

Implement the Baum-Welch algorithm for learning the HMM parameters (you are not allowed to use Matlab's HMM functions). Run the algorithm from 10 different random initializations. Does the algorithm always converge to the same value? If not, choose the parameters with the largest likelihood l(θ) = P(observations | θ).

Note: You can check the convergence of the algorithm by measuring the change in the parameter values (for example, you can stop the iterations when no parameter changes by more than 0.00001).
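The log-space hint above matters in practice: a product of T = 5000 probabilities underflows double precision almost immediately. Below is a minimal sketch of a forward pass in log-space for computing the likelihood l(θ); it is only an illustration, not the required solution (the function and variable names are our own, not part of any provided file), and the same log-sum-exp trick carries over to the backward pass and the Baum-Welch expected counts.

    % Log-space forward pass for a discrete HMM: a minimal sketch, not the
    % required solution. Assumed inputs (our own naming): logPi is Nx1 with
    % log pi_i; logA is NxN with logA(i,j) = log p(z_t = j | z_t-1 = i);
    % logB is NxM with logB(i,o) = log p(O_t = o | z_t = i); O is a 1xT
    % vector of observation indices. Returns log P(O_1..T | theta).
    function logL = hmm_loglik(logPi, logA, logB, O)
        T = numel(O);
        logAlpha = logPi + logB(:, O(1));                 % log alpha_1
        for t = 2:T
            % log alpha_t(j) = logsumexp_i(log alpha_t-1(i) + logA(i,j)) + logB(j, O(t))
            logAlpha = logsumexp(bsxfun(@plus, logAlpha, logA), 1)' + logB(:, O(t));
        end
        logL = logsumexp(logAlpha, 1);                    % log P(O | theta)
    end

    function s = logsumexp(X, dim)
        % Numerically stable log(sum(exp(X), dim)).
        m = max(X, [], dim);
        s = m + log(sum(exp(bsxfun(@minus, X, m)), dim));
    end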
1.2 (6 Pts)

In this part, we use an exhaustive search to find the model parameters θ = (π, A, B) that maximize the likelihood function l(θ). Note that there are a total of 5 free parameters: π1, A(1,2), A(2,1), B(1,1), B(2,1). Grid the whole parameter space by changing each parameter from 0 to 1 in steps of 0.1 (11^5 combinations in total), each time computing the log-likelihood of the first 500 observations, and store the results in an array. (It may take about an hour to compute them all.) Using this sampled log-likelihood function, answer the following questions:

1.2.1 How many local maxima does the likelihood function have? What is the worst one? (We call a point in this 5-dimensional grid a local maximum if moving one step in each of the 10 axis-aligned directions causes the likelihood to decrease.)

1.2.2 Find the parameter set that globally maximizes the likelihood. Is it the same as the one you found in part 1.1?

1.2.3 Is the global maximum unique? Why?

1.3 (4 Pts)

Consider an HMM with n states, and suppose that θ* = (π*, A*, B*) is a global maximum of the likelihood function. Prove that if all elements of A* are distinct (i.e. ∀i, j, k, l: (i, j) ≠ (k, l) → A*(i,j) ≠ A*(k,l)), then there exist at least n! distinct θs maximizing the likelihood (of which θ* is just one).

1.4 (10 Pts)

Using the best parameters obtained in part 1.2, and given that O_1, ..., O_5000 are all observed, answer the following questions:

1.4.1 Find the most probable state sequence S*_1, ..., S*_5000 using the Viterbi algorithm. Mention the last 50 states in your report.

    (S*_1, ..., S*_T) = argmax_{S_1, ..., S_T} P(S_1, ..., S_T | O_1, ..., O_T, θ*)    (1)

1.4.2 Find the most probable state of the customer at each time. Mention the last 50 states in your report.

    ∀i: Ŝ*_i = argmax_{S_i} P(S_i | O_1, ..., O_T, θ*)    (2)

1.4.3 What are the most probable next three observations?

    (O*_5001, O*_5002, O*_5003) = argmax_{O_5001, O_5002, O_5003} P(O_5001, O_5002, O_5003 | O_1, ..., O_T, θ*)

Problem 2 (15 Pts)

Assume we have an HMM with the following observation model (a Gaussian mixture model):

    p(x_t | z_t = i, θ) = Σ_{k=1}^{K} w_{ik} N(x_t | μ_{ik}, Σ_{ik}),    i = 1, ..., N,  t = 1, ..., T    (3)

where x_t ∈ R^M. In the above equation, θ = (Π, W, A, B), where Π = [π_1, ..., π_N] (π_i = p(z_1 = i)) is the initial state distribution, W = {w_{ik}} (i = 1, ..., N; k = 1, ..., K) are the mixing proportions, A(i,j) = p(z_t = j | z_{t-1} = i) is the transition matrix, and B = {μ_{ik}, Σ_{ik}} (i = 1, ..., N; k = 1, ..., K) are the parameters of the class-conditional densities. Assume we want to learn the parameters θ from a set of training sequences {X^j = [x^j_1, ..., x^j_T]}_{j=1}^{P} using the EM algorithm. However, in many applications the observations are high-dimensional vectors (M is very large), so estimating the parameters of NK Gaussians (NKM + NKM^2 values) requires a large amount of data. A simple solution is to use just K Gaussians instead of NK Gaussians, and to let the state influence the mixing weights but not the means and covariances. This relaxed HMM is called a tied-mixture HMM.

2.1 (5 Pts) Plot the graphical representation of this model.

2.2 (10 Pts) Derive the E step and M step for learning θ from the training data.

Problem 3 (15 Pts)

CRFs have many applications in fields such as computer vision, speech recognition, and NLP. One application of CRFs in text analysis is finding the Noun Phrases (NPs) in a sentence. For instance, consider the following sentence:

"I am the heisenberg, and I want to kill you now at this place."

We denote by x_i the tokens of the sentence and by y_i ∈ Γ = {B, I, O} the labels, where B, I, and O stand for Beginning of an NP, Intermediate token of an NP, and Other, respectively. For example, the labels for the above sentence would be:

I [B] am [O] the [B] heisenberg [I], and [O] I [B] want [O] to [O] kill [O] you [B] now [O] at [O] this [B] place [I].

Now consider the following CRF model for the above problem:

    P(y | x; w) = (1 / Z(x; w)) exp{ w^T Σ_{i=1}^{N} f(x_i, y_i, y_{i-1}) }    (4)

where

    Z(x; w) = Σ_{y' ∈ Γ^N} exp{ w^T Σ_{i=1}^{N} f(x_i, y'_i, y'_{i-1}) }    (5)

and w ∈ R^d is the free parameter of the model, f: Σ × Γ × Γ → R^d is the feature vector (Σ is the set of English vocabulary), and N is the number of words in the sentence.
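Note that although Eq. (5) sums over |Γ|^N label sequences, the score decomposes over consecutive label pairs, so Z(x; w) can be accumulated left to right in O(N |Γ|^2) time by a forward recursion. The sketch below is only an illustration of that recursion (the names logPsi and crf_log_partition are ours, and we assume y_0 is a fixed dummy "start" label); the same α/β tables are the building blocks for the marginal asked for in part 3.3 below.

    % log Z(x; w) for the chain CRF of Eqs. (4)-(5): a minimal sketch.
    % Assumed input (our own naming): logPsi is an N x L x L array with
    % logPsi(i, yPrev, y) = w' * f(x_i, y, yPrev), where L = |Gamma| = 3.
    % We take y_0 to be a fixed dummy "start" label stored at index 1.
    % Uses the logsumexp helper defined in the Problem 1 sketch above.
    function logZ = crf_log_partition(logPsi)
        N = size(logPsi, 1);
        logAlpha = squeeze(logPsi(1, 1, :));      % L x 1 scores for y_1
        for i = 2:N
            M = squeeze(logPsi(i, :, :));         % L x L over (yPrev, y)
            % log alpha_i(y) = logsumexp_yPrev(log alpha_i-1(yPrev) + M(yPrev, y))
            logAlpha = logsumexp(bsxfun(@plus, logAlpha, M), 1)';
        end
        logZ = logsumexp(logAlpha, 1);            % log Z(x; w)
    end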
3.1 (3 Pts)

Draw the graphical representation of the above model (for simplicity, assume that f can be decomposed as f(x_i, y_i, y_{i-1}) = g(x_i, y_i) + h(y_i, y_{i-1})).

3.2 (2 Pts)

Define two sample feature functions for this problem.

3.3 (6 Pts)

Given the sequence {x_1, ..., x_N}, w, f, and the CRF, suggest a polynomial-time algorithm that calculates the marginal probability that the subsequence {x_j, ..., x_{j+k}} is an NP.

Hint: a sequence {x_j, ..., x_{j+k}} is an NP if and only if y_j = B and y_{j+1} = ... = y_{j+k} = I.

3.4 (4 Pts)

In order to learn the free parameter w of the CRF from the data D = {(x^(t), y^(t))}_{t=1}^{N} using the ML technique, we should maximize the following conditional log-likelihood function:

    L(w) = Σ_{(x,y)∈D} log p(y | x; w) = Σ_{(x,y)∈D} ( Σ_{i=1}^{N} w^T f(x_i, y_i, y_{i-1}) − log Z(x; w) )

Show that:

    ∂L/∂w = Σ_{(x,y)∈D} Σ_{i=1}^{N} ( f(x_i, y_i, y_{i-1}) − E_{p(y'|x;w)}[ f(x_i, y'_i, y'_{i-1}) ] )

Problem 4 (30 Pts)

In this problem, you are going to implement a Kalman filter for object tracking in the provided video sequences (you should track the red car). A sample frame of the sequence is shown in Fig. 1. The implemented Kalman filter should estimate the state x_t = (x_t, y_t, h_t, w_t) of the red car, where (x_t, y_t) denotes the location of the upper left corner of the bounding box in frame t, and h_t and w_t denote the height and the width of the bounding box, respectively. The output of your code must be a video sequence in which each frame displays the state of the object as a bounding box (Fig. 2).

To implement the Kalman filter, consider the state and measurement equations

    x_k = F_k x_{k-1} + v_{k-1}
    z_k = H_k x_k + n_k

where v_{k-1} ~ N(0, Q_{k-1}) and n_k ~ N(0, R_k) are the state and measurement noise, F_k = H_k = I, and Q_k = R_k = diag(σ_x, σ_y, σ_h, σ_w). Since the noises are Gaussian and the state and measurement equations are linear, the state probability given all observations so far, p(x_k | z_{1:k}), is always Gaussian, so we only need to update the mean and covariance of the state distribution.

Matlab skeleton code and supplementary functions have been provided. The supplied functions accomplish tasks such as reading files, displaying images, and initializing the tracker. Hints are also provided in the comments of the skeleton code.

Figure 1: A sample frame of this problem's video sequences.

4.1 (20 Pts)

Implement the tracker. You must do the following steps (a sketch of the predict/update cycle appears after part 4.2):

1. Initialize the state x_0 with the object position in the first frame. Set Q_k = R_k = diag(4, 4, 2, 2).

2. Predict the object position using p(x_k | z_{1:k-1}) = N(m_{k|k-1}, P_{k|k-1}).

3. To find the measurement z_k, generate 100 object candidates by sampling from p(x_k | z_{1:k-1}) and take the best one as z_k. To find the best candidate you must define an observation model (the observation model computes the likelihood of the observed image data given the corresponding state). This should be done by comparing a color histogram extracted from the candidate bounding box to a known color model extracted beforehand. You should model the likelihood as P(y_k | x_k) ∝ exp(−λ D(h, h*)), where y_k is the image of the current frame, D(h, h*) is the KL divergence of the two histograms (you may also use the Bhattacharyya distance), h is the color histogram of the candidate x_k, and h* is a known color model (the histogram of the manually initialized bounding box of the object in the first frame, or of the best candidate in the previous frame). For your convenience, set λ = 15.

4. Update the object state using the observation z_k via p(x_k | z_{1:k}) = N(m_{k|k}, P_{k|k}). In the case of zero-mean Gaussian noises, the parameters can easily be found as [1]:

    m_{k|k-1} = F_k m_{k-1|k-1}
    P_{k|k-1} = Q_{k-1} + F_k P_{k-1|k-1} F_k^T
    m_{k|k} = m_{k|k-1} + K_k (z_k − H_k m_{k|k-1})
    P_{k|k} = P_{k|k-1} − K_k H_k P_{k|k-1}

where S_k = H_k P_{k|k-1} H_k^T + R_k and K_k = P_{k|k-1} H_k^T S_k^{-1}.

Figure 2: Sample frames of the output video sequence.

4.2 (10 Pts)

Derive the Kalman filter equations for the mentioned linear-Gaussian filtering model with non-zero-mean noises, v_{k-1} ~ N(m_q, Q) and n_k ~ N(m_r, R).
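For reference, the zero-mean update equations in part 4.1 translate almost line for line into Matlab. The following is a minimal sketch of one predict/update cycle, not the provided skeleton code (the function and variable names are our own); in this problem F = H = eye(4) and Q = R = diag([4 4 2 2]), but the general form is kept so it matches the equations above.

    % One Kalman filter predict/update cycle, implementing the zero-mean
    % equations of part 4.1. A minimal sketch with our own naming, not the
    % provided skeleton. m is the 4x1 state mean (x, y, h, w), P its 4x4
    % covariance, and z the 4x1 measurement (best candidate from step 3).
    function [m, P] = kalman_step(m, P, z, F, H, Q, R)
        % Predict: p(x_k | z_1:k-1) = N(m_pred, P_pred)
        m_pred = F * m;
        P_pred = Q + F * P * F';
        % Update: p(x_k | z_1:k) = N(m, P)
        S = H * P_pred * H' + R;      % innovation covariance S_k
        K = (P_pred * H') / S;        % Kalman gain K_k = P_k|k-1 H^T S^-1
        m = m_pred + K * (z - H * m_pred);
        P = P_pred - K * H * P_pred;
    end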