Graphical Models – Learning: Structure Learning

Wolfram Burgard, Luc De Raedt, Kristian Kersting, Bernhard Nebel
Albert-Ludwigs University Freiburg, Germany
Advanced Course "Learning With Bayesian Networks", WS 06/07

Based on: J. A. Bilmes, "A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models", TR-97-021, U.C. Berkeley, April 1998; G. J. McLachlan and T. Krishnan, "The EM Algorithm and Extensions", John Wiley & Sons, Inc., 1997; D. Koller, course CS-228 handouts, Stanford University, 2001; and N. Friedman and D. Koller's NIPS'99 tutorial.

[Figure: the ALARM patient-monitoring network (nodes MINVOLSET, VENTMACH, VENTTUBE, DISCONNECT, KINKEDTUBE, INTUBATION, PULMEMBOLUS, PAP, SHUNT, PRESS, FIO2, VENTLUNG, VENTALV, MINVOL, ARTCO2, EXPCO2, SAO2, PVSAT, ANAPHYLAXIS, TPR, INSUFFANESTH, CATECHOL, LVFAILURE, HISTORY, LVEDVOLUME, CVP, PCWP, HYPOVOLEMIA, STROKEVOLUME, ERRLOWOUTPUT, HRBP, HR, ERRCAUTER, HREKG, HRSAT, CO, BP), the running example of a network whose variables may be fully or only partially observed.]

Why Struggle for Accurate Structure?

Suppose the true structure relates Earthquake, Burglary, Alarm Set, and Sound, and we observe data over E, B, A such as <Y,N,N>, <Y,N,Y>, <N,N,Y>, <N,Y,Y>, ..., <N,Y,Y>.

• Missing an arc:
  – Cannot be compensated for by fitting parameters
  – Wrong assumptions about the domain structure
• Adding an arc:
  – Increases the number of parameters to be estimated
  – Wrong assumptions about the domain structure

The Learning Problems

The difficulty depends on what is known (structure, variables) and on the data (complete vs. incomplete, e.g. E, B, A = <Y,N,N>, <Y,N,Y>, <N,N,Y>, ... vs. <Y,?,N>, <Y,N,?>, ..., <?,Y,Y>):

• Fixed structure, complete data: parameter estimation – the easiest problem, solved by counting.
• Fixed variables, unknown arcs (A ?→ B), complete data: structure learning – selection of arcs; a new domain with no domain expert; data mining.
• Fixed structure, incomplete data: parameter estimation becomes numerical, nonlinear optimization with multiple calls to BN inference – difficult for large networks.
• Unknown arcs and hidden variables (A ?→ H ?→ B), incomplete data: structure learning that encompasses the difficult subproblems above – "only" Structural EM is known; scientific discovery.

Unknown Structure, Complete Data

• Network structure is not specified
  – Learner needs to select arcs & estimate parameters
• Data does not contain missing values

Before learning, the CPT of A is unknown; after the learning algorithm it is filled in:

   E   B    P(A | E, B)  P(¬A | E, B)         E   B    P(A | E, B)  P(¬A | E, B)
   e   b        ?             ?                e   b       .9           .1
   e  ¬b        ?             ?        →       e  ¬b       .7           .3
  ¬e   b        ?             ?               ¬e   b       .8           .2
  ¬e  ¬b        ?             ?               ¬e  ¬b       .99          .01

Unknown Structure, Incomplete Data

• Network structure is not specified
• Data contains missing values
  – Need to consider assignments to the missing values

Score-based Learning

Define a scoring function that evaluates how well a structure matches the data, then search for a structure that maximizes the score.

• Input:
  – Training data, e.g. E, B, A = <Y,N,N>, <Y,Y,Y>, <N,N,Y>, <N,Y,Y>, ...
  – Scoring function
  – Set of possible structures over E, B, A (each candidate receives a score)
• Output:
  – A network that maximizes the score

Heuristic Search

Theorem: Finding the maximal scoring structure with at most k parents per node is NP-hard for k > 1.

• Traverse the space looking for high-scoring structures
• Search techniques:
  – Greedy hill-climbing
  – Best-first search
  – Simulated annealing
  – ...

Typically: Local Search

• Start with a given network
  – empty network, best tree, or a random network
• Define a search space:
  – search states are possible structures
  – operators make small changes to a structure
• At each iteration
  – Evaluate all possible changes
  – Apply a change based on the score
• Stop when no modification improves the score
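As a concrete rendering of this loop, here is a minimal greedy hill-climbing sketch in Python. It is an illustration, not the course's code: `score` and `neighbors` are hypothetical callables, where `neighbors` would enumerate all structures reachable by one arc addition, deletion, or reversal that keeps the graph acyclic.

    def hill_climb(start, data, score, neighbors):
        # Start with a given network (empty, best tree, or random); at each
        # iteration evaluate all possible one-arc changes and apply the best
        # one; stop when no modification improves the score.
        current = start
        current_score = score(current, data)
        while True:
            candidates = [(score(n, data), n) for n in neighbors(current)]
            if not candidates:
                return current
            best_score, best = max(candidates, key=lambda c: c[0])
            if best_score <= current_score:
                return current            # local optimum: no move helps
            current, current_score = best, best_score

With a decomposable score (see the MDL discussion below), the inner `score` calls are in practice incremental: only the families touched by a move need re-scoring.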
Structure Search as Optimization

The same task, viewed as an optimization problem:
• Input: training data, scoring function, set of possible structures
• Output: a network that maximizes the score

The local-search operators make small changes to the current structure. For example, from a network over S, C, E, D containing the arc C → E, one step can:
  – Add an arc, e.g. Add C → D
  – Reverse an arc, e.g. Reverse C → E (giving E → C)
  – Delete an arc, e.g. Delete C → E

• If the data is complete: to update the score after a local change, only re-score (by counting) the families that changed.
• If the data is incomplete: to update the score after a local change, the parameter estimation algorithm has to be re-run.

Local Search in Practice

• Using the log-likelihood (LL) as the score, adding arcs always helps:
  – The maximal score is attained by the fully connected network
  – Overfitting: a bad idea…
• Use the Minimum Description Length score instead, i.e. likelihood plus a complexity penalty:

  MDL(BN | D) = -log P(D | Θ, G) + (log N / 2) · |Θ|
              = DL(Data | Model) + DL(Model)

  Learning then amounts to data compression.
• Other scores: BIC (Bayesian Information Criterion), the Bayesian score (BDe)

Local Search in Practice

• Local search can get stuck in:
  – Local maxima: all one-edge changes reduce the score
  – Plateaux: some one-edge changes leave the score unchanged
• Standard heuristics can escape both:
  – Random restarts
  – TABU search
  – Simulated annealing

Local Search in Practice: Incomplete Data

With missing values in the data, e.g.

  <9.7  0.6   8  14  18>
  <0.2  1.3   5  ??  ??>
  <1.3  2.8  ??   0   1>
  <??   5.6   0  10  ??>
  ...

we have to perform EM for each candidate graph G1, G2, G3, G4, ..., Gn: every candidate requires its own parametric optimization (EM) to a local maximum in parameter space. This is computationally expensive:
• Parameter optimization via EM is non-trivial
• EM has to be performed for all candidate structures
• Time is spent even on poor candidates
• In practice, only a few candidates can be considered

Structural EM [Friedman et al. 98]

Recall that with complete data the score decomposes over families, and this decomposition makes search efficient.

Idea:
• Instead of optimizing the real score directly…
• …find a decomposable alternative score
• such that maximizing the new score yields an improvement in the real score
• Use the current model to help evaluate new structures

Outline:
• Perform the search in the joint (Structure, Parameters) space
• At each iteration, use the current model to find either:
  – better scoring parameters: a "parametric" EM step, or
  – a better scoring structure: a "structural" EM step
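The decomposition that Structural EM tries to recover for incomplete data can be made concrete for the complete-data case. The following Python sketch is an illustration, not from the original slides; the structure representation (a dict mapping each node to its parent list) and the helper names are assumptions:

    import math
    from collections import Counter

    def family_score(child, parents, data, arity):
        # MDL-style score of one family (a node plus its parent set),
        # negated so that HIGHER is better:
        #   log P(D_family | ML params) - (log N / 2) * #free parameters.
        # `data` is a list of dicts {variable: value}; `arity[v]` is the
        # number of values variable v can take.
        N = len(data)
        joint, marg = Counter(), Counter()
        for row in data:
            pv = tuple(row[p] for p in parents)
            joint[(pv, row[child])] += 1   # N(parents = pv, child = v)
            marg[pv] += 1                  # N(parents = pv)
        loglik = sum(n * math.log(n / marg[pv])
                     for (pv, _), n in joint.items())
        n_params = (arity[child] - 1) * math.prod(arity[p] for p in parents)
        return loglik - (math.log(N) / 2) * n_params

    def mdl_score(structure, data, arity):
        # `structure` maps each node to its list of parents. Because the
        # total is a sum of per-family terms, a local move (add / reverse /
        # delete one arc) only requires re-scoring the one or two families
        # whose parent sets changed.
        return sum(family_score(x, ps, data, arity)
                   for x, ps in structure.items())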
Structure Learning: Incomplete Data

EM-algorithm (fixed structure): iterate until convergence. Given data over S, X, D, C, B with missing values:

  S X D C B
  ? 0 1 0 1
  1 1 ? 0 1
  0 0 0 ? ?
  ? ? 0 ? 1
  ...

• Expectation: run inference in the current model to fill in the missing values, e.g. P(S | X=0, D=1, C=0, B=1), yielding expected counts, i.e. a completed version of the data:

  S X D C B
  1 0 1 0 1
  1 1 1 0 1
  0 0 0 0 0
  1 0 0 0 1
  ...

• Maximization: re-estimate the parameters from the expected counts.

SEM-algorithm [Friedman et al. 98]: iterate until convergence. The Expectation step is as in EM; the Maximization step now improves either the Parameters or the Structure.

[Figure, after Friedman et al. 98: training data over X1, X2, X3, H, Y1, Y2, Y3 and a current model with arcs X1, X2, X3 → H → Y1, Y2, Y3. Inference in the current model yields the expected counts N(X1), N(X2), N(X3), N(H, X1, X2, X3), N(Y1, H), N(Y2, H), N(Y3, H) for the current structure and, e.g., N(X2, X1), N(H, X1, X3), N(Y1, X2), N(Y2, Y1, H) for local modifications; candidates are scored & parameterized from these counts; then reiterate.]

Structure Learning: Summary

• Expert knowledge + learning from data
• Structure learning involves parameter estimation (e.g. EM)
• Optimization with score functions
  – likelihood + complexity penalty = MDL
• Local traversal of the space of possible structures:
  – add, reverse, delete (single) arcs
• Speed-up: Structural EM
  – score candidates w.r.t. the current best model
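To recap the SEM speed-up, here is a sketch of the control flow only, under the same hedges as before: `expected_counts`, `max_params`, and `best_structure` are hypothetical callables standing in for E-step inference, the parametric M-step, and the structural M-step of [Friedman et al. 98]. This illustrates the outline above, not their implementation.

    def structural_em(data, structure, params,
                      expected_counts, max_params, best_structure,
                      max_iters=100):
        # Iterate until convergence, searching the joint (structure,
        # parameters) space. Each round completes the data ONCE using the
        # current model; all candidate structures are then scored against
        # these expected counts, instead of running a fresh EM per candidate.
        for _ in range(max_iters):
            counts = expected_counts(structure, params, data)  # E-step
            params = max_params(structure, counts)             # parametric step
            candidate = best_structure(counts, structure)      # structural step
            if candidate == structure:
                break              # no structure change improves the score
            structure = candidate
            params = max_params(structure, counts)  # parameterize the new
                                                    # structure from the same counts
        return structure, params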