Completely Observed Graphical Models
Samson Cheung

Outline
- Motivation
- Directed Acyclic Graph
- Undirected Graph
- Iterative Proportional Fitting (IPF)
- Relation with Junction Tree

Density Estimation
Problem: given a graphical model G and N data points, compute the maximum likelihood estimate of the density.
[Figure: a four-node DAG with observed samples (x_{1,i}, x_{2,i}, x_{3,i}, x_{4,i}), i = 1, ..., N, e.g. x_{1,i}=0, x_{2,i}=1, x_{3,i}=0, x_{4,i}=0, together with the conditional probability tables P(x_1) and P(x_2 | x_1).]
This lecture: the joint density factorizes over the graph, with one parameter block per node:
  \theta_1 : P(x_1 | \theta_1)
  \theta_2 : P(x_2 | x_1, \theta_2)
  \theta_3 : P(x_3 | x_1, \theta_3)
  \theta_4 : P(x_4 | x_2, x_3, \theta_4)

Maximum Likelihood Parameter Estimation
1. Form the likelihood of N IID data points: L(\theta) = \prod_{i=1}^{N} p(x_i | \theta)
2. Compute the logarithm: \ell(\theta) = \sum_{i=1}^{N} \log p(x_i | \theta)
3. Take the derivative with respect to \theta, set it to zero, and solve for \theta.

Directed Graph
For a DAG the log-likelihood decouples across nodes, so each conditional can be estimated separately by counting:
  p_{ML}(x_v | x_{\pi(v)}) = m(x_v, x_{\pi(v)}) / m(x_{\pi(v)})
[Table: an example dataset of binary samples (x_1, x_2, x_3, x_4) used to illustrate the counts.]
Why (1)? m(x_v, x_{\pi(v)}) is the number of times each of the possible configurations of (x_v, x_{\pi(v)}) occurs in the data. Why (2)? m(x_{\pi(v)}) = \sum_{x_v} m(x_v, x_{\pi(v)}) sums these counts over all possible configurations of x_v.

Undirected Graph
For an undirected model p(x) = (1/Z) \prod_C \psi_C(x_C), the potentials are coupled through Z, so there is no closed-form counting solution in general.
[Table: an example dataset of clique configurations (C_1, C_2), i = 1, ..., N.]

Iterative Proportional Fitting
Idea: modify \psi_C(x_C) so that the model marginal p(X_C = x_C) matches p_{ML}(X_C = x_C).
How? Suppose that after the t-th iteration every potential function has converged except the one at clique C; call it \psi_C^{(t)}(x_C).
Goal: change \psi_C^{(t)}(x_C) into \psi_C^{(t+1)}(x_C) so that p^{(t+1)}(X_C = x_C) = p_{ML}(X_C = x_C).

IPF (cont.)
The marginal probability of the undirected graph is computed as
  p^{(t)}(x_C) = (1/Z^{(t)}) \, \psi_C^{(t)}(x_C) \sum_{x_{V \setminus C}} \prod_{D \ne C} \psi_D(x_D),
where the sum runs over all possible configurations of X_{V \setminus C} with X_C fixed at x_C. Try this iteration step:
  \psi_C^{(t+1)}(x_C) = \psi_C^{(t)}(x_C) \, \frac{p_{ML}(x_C)}{p^{(t)}(x_C)}.
Is p^{(t+1)}(X_C = x_C) = p_{ML}(X_C = x_C)? Yes, provided that Z^{(t+1)} = Z^{(t)}.
Is Z^{(t+1)} = Z^{(t)}? Amazingly, yes! Plug in p^{(t)}(x_C) from above:
  Z^{(t+1)} = \sum_{x_C} \psi_C^{(t+1)}(x_C) \sum_{x_{V \setminus C}} \prod_{D \ne C} \psi_D(x_D)
            = \sum_{x_C} \frac{p_{ML}(x_C)}{p^{(t)}(x_C)} \, \psi_C^{(t)}(x_C) \sum_{x_{V \setminus C}} \prod_{D \ne C} \psi_D(x_D)
            = \sum_{x_C} \frac{p_{ML}(x_C)}{p^{(t)}(x_C)} \, Z^{(t)} p^{(t)}(x_C)
            = Z^{(t)} \sum_{x_C} p_{ML}(x_C) = Z^{(t)}.

Overall IPF Algorithm
Changing \psi_C(x_C) may affect the marginals p(X_D) of the other cliques, so we iterate. Here is the overall algorithm:
1. Initialize all the potential functions by assigning 1 to each configuration. Set t = 0 and compute Z^{(0)}.
2. For each clique C:
   - Inference step: compute the marginal p^{(t)}(x_C).
   - Update step: \psi_C^{(t+1)}(x_C) = \psi_C^{(t)}(x_C) \, p_{ML}(x_C) / p^{(t)}(x_C).
3. Check for convergence, say using maximum change < \epsilon.
4. Set t = t + 1 and go to step 2.
How can the inference step be done efficiently? This is where the relation with the junction tree comes in.
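The overall algorithm above can be sketched in code. The following is a minimal illustration, not from the slides: it assumes a hypothetical three-node binary chain X1 - X2 - X3 with cliques {X1, X2} and {X2, X3}, synthetic random data, and brute-force marginalization for the inference step (a junction tree would do this step efficiently on larger graphs).

```python
import itertools
import numpy as np

# Toy undirected model (illustrative, not from the slides):
# binary chain X1 - X2 - X3 with cliques C1 = {X1, X2}, C2 = {X2, X3}.
rng = np.random.default_rng(0)
data = rng.integers(0, 2, size=(200, 3))  # N = 200 synthetic samples

def empirical_marginal(data, idx):
    """Empirical (maximum-likelihood) clique marginal p_ML(x_C) by counting."""
    counts = np.zeros((2, 2))
    for row in data:
        counts[row[idx[0]], row[idx[1]]] += 1
    return counts / len(data)

cliques = [(0, 1), (1, 2)]
p_ml = [empirical_marginal(data, c) for c in cliques]

# Step 1: initialize all potentials by assigning 1 to each configuration.
psi = [np.ones((2, 2)) for _ in cliques]

def joint(psi):
    """Normalized joint over all 8 configurations (brute-force inference)."""
    p = np.zeros((2, 2, 2))
    for x in itertools.product([0, 1], repeat=3):
        p[x] = psi[0][x[0], x[1]] * psi[1][x[1], x[2]]
    return p / p.sum()

def model_marginal(p, idx):
    axes = tuple(a for a in range(3) if a not in idx)
    return p.sum(axis=axes)

# Step 2: sweep over cliques, updating one potential at a time.
for t in range(50):
    for k, c in enumerate(cliques):
        p_c = model_marginal(joint(psi), c)   # inference step: p^(t)(x_C)
        psi[k] = psi[k] * p_ml[k] / p_c       # update step: psi * p_ML / p^(t)

# At convergence, every model clique marginal matches the empirical one.
for k, c in enumerate(cliques):
    assert np.allclose(model_marginal(joint(psi), c), p_ml[k], atol=1e-6)
print("IPF converged: clique marginals match p_ML")
```

Because this chain is decomposable, a single sweep in clique order already reaches the fixed point; the extra sweeps are only there to mirror the iterate-until-convergence loop of the general algorithm.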
Homework: work out the example on slide 8.

Does IPF Converge?
Here \theta denotes the values of the potential functions \psi. Taking the derivative of the log-likelihood and setting it to zero (read the derivation in the chapter) gives the stationarity condition
  p(x_C) = p_{ML}(x_C)  for every clique C.
We cannot solve for \psi_C directly, because it cancels out of this condition. However, each iteration of IPF ensures that the condition is SATISFIED for clique C, provided every other potential function is held fixed.

IPF as Coordinate Ascent
Each iteration MAXIMIZES the log-likelihood in the direction of a single potential function, keeping the others fixed; IPF is a coordinate ascent technique, in which every move is parallel to one of the coordinate axes. Since the likelihood is bounded above (L(\theta) \le 1), IPF must converge to a LOCAL maximum.

Alternative View from KL Divergence
Let \tilde p denote the empirical distribution. From the last chapter, we know that maximizing the likelihood is equivalent to minimizing the KL divergence
  D(\tilde p \,\|\, p(\cdot \mid \theta)).
Focus on the marginal on clique C, whose parameter is \psi_C: minimizing D(\tilde p \| p) in the direction of \psi_C is the same as minimizing
  D(\tilde p(x_C) \,\|\, p(x_C)).
But D(\tilde p(x_C) \| p(x_C)) \ge 0, with equality if and only if p(x_C) = \tilde p(x_C). Since each iteration step of IPF sets p^{(t+1)}(x_C) = \tilde p(x_C) = p_{ML}(x_C), it minimizes the KL divergence in the \psi_C direction.
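The coordinate-ascent claim can be checked numerically: a single-clique IPF update should never decrease the log-likelihood. Below is a small sketch on the same kind of hypothetical toy chain used above (all names and data are illustrative), tracking the average log-likelihood after every update.

```python
import itertools
import numpy as np

# Illustrative toy chain X1 - X2 - X3 with cliques {X1, X2}, {X2, X3}
# and synthetic data; not taken from the slides.
rng = np.random.default_rng(1)
data = rng.integers(0, 2, size=(100, 3))
cliques = [(0, 1), (1, 2)]

def emp_marginal(idx):
    m = np.zeros((2, 2))
    for row in data:
        m[row[idx[0]], row[idx[1]]] += 1
    return m / len(data)

p_ml = [emp_marginal(c) for c in cliques]
psi = [np.ones((2, 2)) for _ in cliques]

def joint():
    p = np.zeros((2, 2, 2))
    for x in itertools.product([0, 1], repeat=3):
        p[x] = psi[0][x[0], x[1]] * psi[1][x[1], x[2]]
    return p / p.sum()

def avg_loglik():
    """Average log-likelihood (1/N) sum_i log p(x_i) under the current psi."""
    p = joint()
    return float(np.mean([np.log(p[tuple(row)]) for row in data]))

# Each IPF update maximizes the log-likelihood in the direction of one
# potential, so the trace of l must be non-decreasing (coordinate ascent).
ll_trace = [avg_loglik()]
for t in range(5):
    for k, c in enumerate(cliques):
        axes = tuple(a for a in range(3) if a not in c)
        p_c = joint().sum(axis=axes)
        psi[k] = psi[k] * p_ml[k] / p_c
        ll_trace.append(avg_loglik())

assert all(b >= a - 1e-9 for a, b in zip(ll_trace, ll_trace[1:]))
print("log-likelihood is non-decreasing across IPF updates")
```

The trace plateaus once the clique marginals match the empirical ones, which is exactly the stationarity condition p(x_C) = p_ML(x_C) derived above.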