Probabilistic Graphical Models
Zhongqing Wang
These slides are adapted from (Coughlan, 2009), (Wang, 2012), (Gormley and Eisner, 2015), (Tang, 2015), and Daphne Koller's PGM lectures.

Some (Old) News
• The 2011 Turing Award went to Prof. Judea Pearl for his pioneering work on Probabilistic Graphical Models.

Outline
• Representation
• Inference
• Applications
• Usage

Representation
Bayesian Networks, MRFs, and Factor Graphs

Probabilistic Graphical Models: What and Why
• PGMs:
  • A model of the joint probability distribution over random variables.
  • Represent dependencies and independencies among the random variables.
• Three types of PGMs:
  • Directed graph: Bayesian Network (BN)
  • Undirected graph: Markov Random Field (MRF)
  • Factor Graph

Chain Rule for Bayesian Networks
[Figure: a directed graph over the variables A-G]
$P(A,B,C,D,E,F,G) = P(A)\,P(B)\,P(C \mid A,B)\,P(D \mid A)\,P(E \mid C)\,P(F \mid D)\,P(G \mid E)$

Example: The Student Network

Bayesian Networks: Conditional Independence
• a is independent of b given c: $p(a, b \mid c) = p(a \mid c)\,p(b \mid c)$
• Equivalently: $p(a \mid b, c) = p(a \mid c)$
• Notation: $a \perp\!\!\!\perp b \mid c$

Conditional Independence
[Figures: three examples of conditional independence in directed graphs]
• Note: this is the opposite of Example 1, with c observed.

The graph must be acyclic!
• A BN must be a DAG.

Representation of MRFs
[Figure: an undirected graph over A, B, C, its factors, and the partition function]

Cliques and Maximal Cliques
• Clique
• Maximal clique

Examples of MRFs

Joint Distribution of MRFs
• $p(\mathbf{x}) = \frac{1}{Z} \prod_C \psi_C(\mathbf{x}_C)$, where $\psi_C(\mathbf{x}_C)$ is the potential over clique C.
• $Z = \sum_{\mathbf{x}} \prod_C \psi_C(\mathbf{x}_C)$ is the normalization coefficient; note: with M K-state variables there are $K^M$ terms in Z.
• Energies and the Boltzmann distribution: $\psi_C(\mathbf{x}_C) = \exp\{-E(\mathbf{x}_C)\}$

Directed vs. Undirected Graphs

Factor Graphs
• A factor graph is a more general graph.
• It allows us to be more explicit about the details of the factorization.
• An example, with variable nodes $x_1, x_2, x_3$ and factor nodes $f_a, f_b, f_c, f_d$:
  $p(x_1, x_2, x_3) = f_a(x_1, x_2)\,f_b(x_1, x_2)\,f_c(x_2, x_3)\,f_d(x_3)$

Factor Graphs from Directed Graphs

Factor Graphs from Undirected Graphs

Inference
Belief Propagation

Inference
Given a factor graph, two common tasks:
• Compute the most likely joint assignment, $x^* = \arg\max_x p(X = x)$.
• Compute the marginal distribution of variable $X_i$: $p(X_i = x_i)$ for each value $x_i$, where $p(X_i = x_i)$ is the sum of $p(X = x)$ over joint assignments with $X_i = x_i$.
Both consider all joint assignments. Both are NP-hard in general. So, we turn to approximations.

Marginals by Sampling on a Factor Graph
[Figure: a linear-chain factor graph with tag variables $X_0 = \text{<START>}, X_1, \dots, X_5$, factors $\psi_0, \dots, \psi_9$, and the words "time flies like an arrow"]
Suppose we took many samples from the distribution over taggings:
• Sample 1: n v p d n
• Sample 2: n n v d n
• Sample 3: n v p d n
• Sample 4: v n p d n
• Sample 5: v n v d n
• Sample 6: n v p d n
The marginal $p(X_i = x_i)$ gives the probability that variable $X_i$ takes value $x_i$ in a random sample. Estimate the marginals as:
• $X_1$: n 4/6, v 2/6
• $X_2$: n 3/6, v 3/6
• $X_3$: p 4/6, v 2/6
• $X_4$: d 6/6
• $X_5$: n 6/6
How do we get marginals without sampling? That's what Belief Propagation is all about!

Why not just sample?
• Sampling one joint assignment is also NP-hard in general.
  • In practice: use MCMC (e.g., Gibbs sampling) as an anytime algorithm.
  • So draw an approximate sample fast, or run longer for a "good" sample.
• Sampling finds the high-probability values $x_i$ efficiently. But it takes too many samples to see the low-probability ones.
  • How do you find p("The quick brown fox …") under a language model?
  • Draw random sentences to see how often you get it? Takes a long time.
  • Or multiply factors (trigram probabilities)? That's what BP would do.
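Before moving on to belief propagation, here is a minimal, self-contained Python sketch of the two marginal computations discussed above: exact marginals obtained by summing $p(x)$ over all joint assignments, and approximate marginals obtained by Gibbs sampling and counting. The 3-variable chain MRF, its potentials, and all names are invented for illustration; none of them come from the slides or from any toolkit mentioned here.

```python
import itertools
import random
from collections import Counter

# A tiny 3-variable chain MRF over binary variables, used only for illustration.
VALUES = [0, 1]
psi_12 = {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}  # favors agreement
psi_23 = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 2.0, (1, 1): 1.0}  # favors disagreement

def unnormalized_p(x1, x2, x3):
    return psi_12[(x1, x2)] * psi_23[(x2, x3)]

# Exact marginals by brute-force enumeration: p(Xi = v) is the sum of p(x)
# over all joint assignments with Xi = v, divided by the partition function Z.
Z = sum(unnormalized_p(*x) for x in itertools.product(VALUES, repeat=3))
exact = [Counter() for _ in range(3)]
for x in itertools.product(VALUES, repeat=3):
    p = unnormalized_p(*x) / Z
    for i, v in enumerate(x):
        exact[i][v] += p

# Approximate marginals by Gibbs sampling: resample one variable at a time
# from its conditional given the current values of the others, then count.
def gibbs_samples(n_samples, burn_in=200):
    x = [random.choice(VALUES) for _ in range(3)]
    samples = []
    for t in range(burn_in + n_samples):
        for i in range(3):
            weights = []
            for v in VALUES:
                proposal = x[:i] + [v] + x[i + 1:]
                weights.append(unnormalized_p(*proposal))
            x[i] = random.choices(VALUES, weights=weights)[0]
        if t >= burn_in:
            samples.append(tuple(x))
    return samples

samples = gibbs_samples(5000)
approx = [Counter(s[i] for s in samples) for i in range(3)]

for i in range(3):
    for v in VALUES:
        print(f"p(X{i+1}={v}): exact {exact[i][v]:.3f}, "
              f"sampled {approx[i][v] / len(samples):.3f}")
```

With M variables of K states, the enumeration loop touches all $K^M$ joint assignments, while one Gibbs sweep stays linear in the number of variables; that is exactly the trade-off the slides describe.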
Overview of Belief Propagation
• An iterative process in which neighboring variables "talk" to each other, passing messages.
[Figure: the message update rule]

Demo of BP
• Demo by Gormley and Eisner (2015), from their tutorial.

Great Ideas in ML: Message Passing
"Count the soldiers" (adapted from the MacKay (2003) textbook):
• Soldiers in a line count themselves by passing messages: each soldier tells the next one how many are before him and how many are behind him ("1 before you", "2 before you", …, "5 before you"; "1 behind you", …, "5 behind you").
• A soldier only sees his incoming messages. Hearing "2 before you" and "3 behind you", plus "there's 1 of me", his belief is: there must be 2 + 1 + 3 = 6 of us. Another soldier, hearing "1 before you" and "4 behind you", concludes 1 + 1 + 4 = 6 as well.
• On a tree, each soldier receives reports from all branches: reports of "7 here" and "3 here" plus "1 of me" give "11 here" (= 7 + 3 + 1); another node reports "7 here" (= 3 + 3 + 1).
• Combining the reports from every branch, the belief is: there must be 14 of us.

Message Passing in Belief Propagation
• "My other factors think I'm a noun …" (message to X: v 1, n 6, a 3).
• "But my other variables and I think you're a verb …" (message from factor Ψ to X: v 6, n 1, a 3).
• Both of these messages judge the possible values of variable X. Their product = belief at X (v 6, n 6, a 9) = product of all 3 messages to X.

The Sum-Product Algorithm (1)
• Objective:
  i. to obtain an efficient, exact inference algorithm for finding marginals;
  ii. in situations where several marginals are required, to allow computations to be shared efficiently.

The Sum-Product Algorithm (2)

The Sum-Product Algorithm (3)

Sum-Product vs. Max-Product

Applications

Text Summarization Based on Social Information (Wang et al., 2013)
• Online resume information can help people connect with others who have similar backgrounds, and it provides very valuable business information.
• Extract personal information from online resume text:
  • skill information
  • summary information

Extracting Personal Information
[Figure: an example of a personal profile on LinkedIn]

Extracting Personal Information (cont.)
[Figure: summary information and work-experience text in a profile]

Distribution of Different Social Relations on LinkedIn
[Figure: the person-to-person relationship network on LinkedIn]

Building a Resume Association Model with a Probabilistic Graphical Model
• Text attribute function: $\frac{1}{Z_1} \exp\Big\{\sum_i \sum_k \lambda_k f_k(x_{ik}, y_i)\Big\}$
• Personal-connection factor function: $g(y_i, y_j) = \exp\{\lambda_{ij} (y_i - y_j)^2\}$

Model Definition
• For a network G:
  $P(Y \mid X, G) = \frac{P(X, G \mid Y)\,P(Y)}{P(X, G)} \propto P(X \mid Y)\,P(Y \mid G)$
  $P(Y \mid X, G) \propto P(Y \mid G) \prod_i P(x_i \mid y_i)$

Attribute Function and Factor Function
• $P(x_i \mid y_i) = \frac{1}{Z_1} \exp\Big\{\sum_{j=1}^{d} \lambda_j f_j(x_{ij}, y_i)\Big\}$
• $P(Y \mid G) = \frac{1}{Z_2} \exp\Big\{\sum_i \sum_{j \in NB(i)} g(i, j)\Big\}$, with $g(y_i, y_j) = \exp\{\lambda_{ij} (y_i - y_j)^2\}$

Log-Likelihood Objective Function
• $\theta^* = \arg\max_\theta L(\theta)$
• The EM algorithm learns the weights.
• The BP algorithm performs prediction (a small sum-product sketch follows at the end of this application).

Skill Prediction: Experiments
• Distribution of skills
[Figure: the skill distribution]

Text Summarization: Experiments
• Experimental setup:
  • We select 40 words from each resume text to build the summary.
  • The dataset contains 497 resume samples.
  • We use 200 resume texts as test samples and the remaining samples for training.
  • We use the ROUGE-1.5.5 toolkit for evaluation.

Text Summarization: Experiments
• Experimental results
[Table: results of unsupervised learning vs. supervised learning]
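To make the sum-product computation and the node/edge factorization above concrete, here is a minimal Python sketch. It is only an illustration: the chain of four binary label variables, the unary potentials (standing in for attribute factors such as $P(x_i \mid y_i)$), and the pairwise potentials (standing in for connection factors such as $g(y_i, y_j)$) are all invented numbers, not the model or the code of Wang et al. (2013) or of any toolkit.

```python
import numpy as np

# Sum-product (belief propagation) on a small chain of binary labels y_1..y_n,
# with a unary potential phi_i(y_i) per node and a pairwise potential
# psi(y_i, y_{i+1}) per edge. All numbers are made up for illustration.
n = 4
phi = np.array([[4.0, 1.0],   # node 1 prefers label 0
                [1.0, 2.0],   # node 2 mildly prefers label 1
                [1.0, 1.0],   # node 3 has no preference
                [3.0, 1.0]])  # node 4 prefers label 0
psi = np.array([[3.0, 1.0],   # edges prefer neighboring labels to agree
                [1.0, 3.0]])

# Forward messages: alpha[i] is the message passed into node i from the left.
alpha = np.ones((n, 2))
for i in range(1, n):
    alpha[i] = psi.T @ (alpha[i - 1] * phi[i - 1])

# Backward messages: beta[i] is the message passed into node i from the right.
beta = np.ones((n, 2))
for i in range(n - 2, -1, -1):
    beta[i] = psi @ (beta[i + 1] * phi[i + 1])

# Belief at each node = product of its incoming messages and its own potential,
# normalized. On a chain (a tree), these are the exact marginals p(y_i).
for i in range(n):
    belief = alpha[i] * phi[i] * beta[i]
    print(f"p(y_{i+1}) =", belief / belief.sum())
```

Replacing the sums (the matrix-vector products) with maximizations would give the max-product variant mentioned on the "Sum-Product vs. Max-Product" slide.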
Who Will Follow You Back? (Hopcroft et al., 2011)
• On Twitter…
[Figure: follow-back probabilities for several accounts (e.g., Ladygaga, Obama, Shiteng, Huwei, JimmyQiao): 1%? 30%? 60%? 100%?]

Interaction
• Retweet vs. reply
• Retweeting seems to be more helpful.

Structural Balance
• (A) and (B) are balanced, but (C) and (D) are not.
• Reciprocal relationships are balanced (88%); parasocial relationships are not (only 29%).

Triad Factor Graph (TriFG)
• Input: a mobile network with observed attributes $(v_i^u, v_i^s)$ for each candidate pair and partially known relationship labels ($y_1$ = friend, $y_2$ = friend, $y_5$ = non-friend, $y_6$ = non-friend; $y_3$ and $y_4$ unknown).
• The TriFG model attaches an attribute factor $f(v_i^u, v_i^s, y_i)$ to each relationship variable $y_i$ and a triad factor $h(y_i, y_j, y_k)$ to each triangle, e.g., $h(y_1, y_2, y_3)$ and $h(y_3, y_4, y_5)$.
[Figure: the input network, its observations, and the corresponding TriFG factor graph]

Usage

Usage
• The factor graph model toolkit.
• A very easy way to solve a PGM problem, just like using an SVM.
• Example input (a small parsing sketch appears at the end of this deck):
  Attributes:
    +1 1:2 3:2 4:5
    -1 3:2 5:2 6:3
    +1 3:2 4:6 6:1
  Factors / connections:
    #Edge_1 1 2
    #Edge_2 2 3
• Thanks to Dr. Jie Tang for his helpful tools.

Q&A

Thanks
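As a closing illustration of the toolkit input format shown on the Usage slide, here is a hypothetical parser sketch in Python. The field meanings assumed below (a node line is a label followed by sparse attribute:value pairs, and a "#Edge" line connects two node indices) follow the slide's "Attributes" and "Factors / connections" labels, but the actual format of Dr. Jie Tang's toolkit may differ.

```python
# Hypothetical parser for the input format sketched on the Usage slide.
# The field semantics here are assumptions for illustration only.
def parse_factor_graph(lines):
    nodes, edges = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # e.g. "#Edge_1 1 2": a factor (connection) between node 1 and node 2
            u, v = line.split()[-2:]
            edges.append((int(u), int(v)))
        else:
            # e.g. "+1 1:2 3:2 4:5": a label followed by sparse attribute:value pairs
            parts = line.split()
            label = int(parts[0])
            attrs = {int(k): float(v) for k, v in (p.split(":") for p in parts[1:])}
            nodes.append((label, attrs))
    return nodes, edges

example = [
    "+1 1:2 3:2 4:5",
    "-1 3:2 5:2 6:3",
    "+1 3:2 4:6 6:1",
    "#Edge_1 1 2",
    "#Edge_2 2 3",
]
nodes, edges = parse_factor_graph(example)
print(nodes)   # [(1, {1: 2.0, 3: 2.0, 4: 5.0}), (-1, {...}), (1, {...})]
print(edges)   # [(1, 2), (2, 3)]
```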