Based on: J. A. Bilmes, "A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models", TR-97-021, U.C. Berkeley, April 1998; G. J. McLachlan and T. Krishnan, "The EM Algorithm and Extensions", John Wiley & Sons, Inc., 1997; D. Koller, course CS-228 handouts, Stanford University, 2001; N. Friedman and D. Koller, NIPS'99 tutorial.

Advanced I WS 06/07

[Figure: the 37-node ALARM patient-monitoring Bayesian network, with variables such as MINVOLSET, KINKEDTUBE, PULMEMBOLUS, INTUBATION, PAP, SHUNT, VENTMACH, VENTLUNG, DISCONNECT, VENTTUBE, PRESS, MINVOL, ANAPHYLAXIS, SAO2, TPR, HYPOVOLEMIA, LVEDVOLUME, CVP, LVFAILURE, FIO2, VENTALV, PVSAT, ARTCO2, INSUFFANESTH, CATECHOL, STROKEVOLUME, HISTORY, ERRBLOWOUTPUT, PCWP, EXPCO2, CO, HR, HREKG, ERRCAUTER, HRSAT, HRBP, BP.]

Graphical Models - Learning: Structure Learning
Wolfram Burgard, Luc De Raedt, Kristian Kersting, Bernhard Nebel
Albert-Ludwigs University Freiburg, Germany

Learning With Bayesian Networks

• Fixed structure (A → B):
  – fully observed: easiest problem, solved by counting (parameter estimation)
  – partially observed: numerical, nonlinear optimization; multiple calls to BN inference; difficult for large networks (parameter estimation, e.g. EM)
• Fixed variables, unknown structure (A →? B):
  – fully observed: selection of arcs; new domain with no domain expert; data mining (structure learning)
  – partially observed: encompasses the difficult subproblems above; "only" Structural EM is known (structure learning)
• Hidden variables (A →? B, hidden H): scientific discovery

Bayesian Networks - Learning

Why Struggle for Accurate Structure?

True structure: Earthquake → Alarm Set ← Burglary, Alarm Set → Sound

Missing an arc:
• Cannot be compensated for by fitting parameters
• Wrong assumptions about domain structure

Adding an arc:
• Increases the number of parameters to be estimated
• Wrong assumptions about domain structure

Example data over (E, B, A):
• complete: <Y,N,N>, <Y,N,Y>, <N,N,Y>, <N,Y,Y>, ..., <N,Y,Y>
• incomplete: <Y,?,N>, <Y,N,?>, <N,N,Y>, <N,Y,Y>, ...,
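The "counting" in the fully observed, fixed-structure case means each conditional probability is estimated as a ratio of counts. A minimal sketch in Python; the dataset and the (E, B, A) variable names are illustrative stand-ins, not the lecture's actual data:

```python
from collections import Counter

# MLE parameter estimation by counting, for a fixed structure B -> A <- E
# with fully observed data. Each sample is a tuple (E, B, A).
# The data below is illustrative only.
data = [
    ("Y", "N", "N"), ("Y", "N", "Y"), ("N", "N", "Y"),
    ("N", "Y", "Y"), ("N", "Y", "Y"),
]

joint = Counter((e, b, a) for e, b, a in data)   # N(E, B, A)
parents = Counter((e, b) for e, b, _ in data)    # N(E, B)

def p_a_given_eb(a, e, b):
    """MLE estimate P(A=a | E=e, B=b) = N(e, b, a) / N(e, b)."""
    if parents[(e, b)] == 0:
        return 0.0  # parent configuration never observed
    return joint[(e, b, a)] / parents[(e, b)]
```

With partially observed data these counts are no longer directly available, which is exactly where the expected counts of EM (later in these slides) come in.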
<?,Y,Y>

Unknown Structure, (In)complete Data

Unknown structure, complete data:
• Network structure is not specified
  – Learner needs to select arcs & estimate parameters
• Data does not contain missing values
• The learning algorithm outputs a structure (e.g. B → A ← E) together with its CPTs, e.g. P(A | E, B):

  E   B    P(a | E, B)   P(¬a | E, B)
  e   b       .90            .10
  e   ¬b      .70            .30
  ¬e  b       .80            .20
  ¬e  ¬b      .99            .01

Unknown structure, incomplete data:
• Network structure is not specified
• Data contains missing values
  – Need to consider assignments to the missing values

Score-based Learning

• Define a scoring function that evaluates how well a structure matches the data, e.g. the score of E → A ← B vs. alternative arc sets on data over (E, B, A): <Y,N,N>, <Y,Y,Y>, <N,N,Y>, <N,Y,Y>, ..., <N,Y,Y>
• Search for a structure that maximizes the score

Structure Search as Optimization

• Input:
  – Training data
  – Scoring function
  – Set of possible structures
• Output:
  – A network that maximizes the score

Heuristic Search

• Define a search space:
  – search states are possible structures
  – operators make small changes to a structure
• Traverse the space looking for high-scoring structures
• Search techniques:
  – Greedy hill-climbing
  – Best-first search
  – Simulated annealing
  – ...

Theorem: Finding the maximal scoring structure with at most k parents per node is NP-hard for k > 1.

Typically: Local Search

• Start with a given network
  – empty network, best tree, or a random network
• At each iteration
  – Evaluate all possible changes
  – Apply the change that improves the score most
• Stop when no modification improves the score

[The slides step through this procedure on an example network over S, C, E, D, changing the candidate structure one arc at a time.]

If data is complete: to update the score after a local change, only the families that changed need to be re-scored (by counting).
If data is incomplete: to update the score after a local change, the parameter estimation algorithm must be re-run.

Local Search in Practice

• Local search can get stuck in:
  – Local maxima: all one-edge changes reduce the score
  – Plateaux: some one-edge changes leave the score unchanged
• Standard heuristics can escape both:
  – Random restarts
  – TABU search
  – Simulated annealing
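The local-search loop above (start somewhere, evaluate all single-arc changes, apply the best one, stop at a fixed point) can be sketched as greedy hill-climbing. The variables, the toy scoring function, and the "true" arc set below are illustrative assumptions; a real learner would plug in a decomposable score such as MDL/BIC:

```python
import itertools

# Greedy hill-climbing over DAG structures (sketch).
VARS = ["S", "C", "E", "D"]
TRUE_ARCS = {("S", "C"), ("C", "E"), ("E", "D")}  # toy target, for the demo score

def score(arcs):
    # Toy score: reward arcs of a fixed target structure, penalize size.
    return len(arcs & TRUE_ARCS) - 0.1 * len(arcs)

def is_acyclic(arcs):
    # Repeatedly peel off nodes with no remaining parents (Kahn's algorithm).
    parents = {v: {a for a, b in arcs if b == v} for v in VARS}
    remaining = set(VARS)
    while remaining:
        free = [v for v in remaining if not (parents[v] & remaining)]
        if not free:
            return False  # a cycle blocks further peeling
        remaining -= set(free)
    return True

def neighbors(arcs):
    """All single-arc changes: add, delete, or reverse one arc."""
    for a, b in itertools.permutations(VARS, 2):
        if (a, b) in arcs:
            yield arcs - {(a, b)}                   # delete
            yield (arcs - {(a, b)}) | {(b, a)}      # reverse
        elif (b, a) not in arcs:
            yield arcs | {(a, b)}                   # add

def hill_climb(start=frozenset()):
    current = set(start)
    while True:
        best = max((n for n in neighbors(current) if is_acyclic(n)),
                   key=score, default=None)
        if best is None or score(best) <= score(current):
            return current  # no single-arc change improves the score
        current = best
```

With a decomposable score, the `score` call after each change would only re-evaluate the families whose parent sets changed, as noted above for the complete-data case.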
Local Search in Practice (cont.)

• Using the log-likelihood (LL) as the score, adding arcs always helps
  – The maximum score is attained by the fully connected network
  – Overfitting: a bad idea...
• Minimum Description Length (MDL): learning as data compression. For N data cases (possibly with missing values, e.g. <9.7, 0.6, 8, 14, 18>, <0.2, 1.3, 5, ??, ??>, <1.3, 2.8, ??, 0, 1>, <??, 5.6, 0, 10, ??>, ...):

  MDL(BN | D) = -log P(D | Θ, G) + (log N / 2) · |Θ|
              = DL(Data | Model) + DL(Model)

• Other scores: BIC (Bayesian Information Criterion), Bayesian score (BDe)

Local Search in Practice

• Perform EM for each candidate graph G1, G2, G3, G4, ..., Gn: parametric optimization (EM) in parameter space, each run converging only to a local maximum
• Computationally expensive:
  – Parameter optimization via EM is non-trivial
  – Need to perform EM for all candidate structures
  – Time is spent even on poor candidates
  – In practice, only a few candidates are considered

Structural EM [Friedman et al. 98]

Recall: with complete data, the score decomposes over families.
Idea:
• Instead of optimizing the real score...
• ...find a decomposable alternative score
• such that maximizing the new score yields an improvement in the real score
  – Decomposition enables efficient search

Structural EM [Friedman et al. 98]

Outline:
• Perform search in (Structure, Parameters) space
• At each iteration, use the current model to find either:
  – better scoring parameters: a "parametric" EM step, or
  – a better scoring structure: a "structural" EM step
Idea:
• Use the current model to help evaluate new structures
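The MDL score above can be computed for each candidate structure from its maximized log-likelihood and its number of free parameters. A sketch, using the natural logarithm (as in the BIC convention) and entirely made-up numbers:

```python
import math

def mdl_score(log_likelihood, n_params, n_samples):
    """MDL(BN | D) = -log P(D | Theta, G) + (log N / 2) * |Theta|.

    Lower is better: the first term is DL(Data | Model), the second
    is DL(Model), the penalty on the number of free CPT parameters.
    Natural log is used here, matching the BIC convention.
    """
    return -log_likelihood + (math.log(n_samples) / 2.0) * n_params

# Made-up numbers: a denser structure fits better (higher log-likelihood)
# but pays a larger complexity penalty.
sparse = mdl_score(log_likelihood=-1200.0, n_params=10, n_samples=1000)
dense  = mdl_score(log_likelihood=-1190.0, n_params=40, n_samples=1000)
```

With these numbers the sparse network wins (lower MDL) despite its worse fit, which is exactly how the penalty counteracts the "adding arcs always helps" problem of the plain likelihood score.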
Structural EM [Friedman et al. 98]

Reiterate:
• Training data over X1, X2, X3, H, Y1, Y2, Y3 (H hidden)
• Computation: use the current model to compute expected counts, e.g.
  N(X1), N(X2), N(X3), N(H, X1, X2, X3), N(Y1, H), N(Y2, H), N(Y3, H),
  N(X2, X1), N(H, X1, X3), N(Y1, X2), N(Y2, Y1, H)
• Score & parameterize candidate structures from these expected counts, pick the best, and reiterate

Structure Learning: incomplete data

• Data (with missing values) over S, X, D, C, B, e.g.
  <?, 0, 1, 0, 1>, <1, 1, ?, 0, 1>, <0, 0, 0, ?, ?>, <?, ?, 0, ?, 1>, ...
• Expectation: inference in the current model, e.g. P(S | X=0, D=1, C=0, B=1), yields expected counts, e.g. the completed cases
  (1, 0, 1, 0, 1), (1, 1, 1, 0, 1), (0, 0, 0, 0, 0), (1, 0, 0, 0, 1), ...
• Maximization over parameters: EM algorithm, iterate until convergence
• Maximization over parameters and structure: the SEM algorithm additionally moves between candidate structures (e.g. different arc sets over E, B, A), iterate until convergence

Structure Learning: Summary

• Expert knowledge + learning from data
• Structure learning involves parameter estimation (e.g. EM)
• Optimization with score functions
  – likelihood + complexity penalty = MDL
• Local traversal of the space of possible structures:
  – add, reverse, delete (single) arcs
• Speed-up: Structural EM
  – Score candidates w.r.t. the current best model
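The Expectation step above fills in missing values by their posterior probability under the current model, so ordinary counts become expected counts. A minimal sketch for a single binary variable, with a fixed placeholder posterior standing in for real BN inference; all numbers are illustrative:

```python
# E-step sketch: turning counts into *expected* counts when values are
# missing. For a binary variable S, a missing entry (None) contributes
# its posterior P(S=1 | evidence) under the current model. The fixed
# posterior below is a placeholder for actual BN inference
# (e.g. P(S | X=0, D=1, C=0, B=1) on the slide).

def expected_count_s1(samples, posterior_s1=0.7):
    """Expected count N(S=1): observed cases contribute 0 or 1,
    missing cases contribute their posterior probability."""
    total = 0.0
    for s in samples:
        total += posterior_s1 if s is None else s
    return total

samples = [1, None, 0, 1, None]  # two missing values
```

The M-step then renormalizes such expected counts into new CPT entries, exactly as in the fully observed counting case, and the loop repeats until convergence.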