Probabilistic Graphical Models
Learning: Parameter Estimation
Max Likelihood for BNs
Daphne Koller

MLE for Bayesian Networks

• Network: X → Y
• Parameters: \theta_X (the table P(X)) and \theta_{Y|X} (the table P(Y | X))

  P(X):
    x^0    x^1
    0.7    0.3

  P(Y | X):
    X      y^0     y^1
    x^0    0.95    0.05
    x^1    0.2     0.8

• Data instances: <x[m], y[m]>

MLE for Bayesian Networks

• Parameters: \theta_X, \theta_{Y|X}

[Figure: the network X → Y over the data instances]

\[
\begin{aligned}
L(\theta : D) &= \prod_{m=1}^{M} P(x[m], y[m] : \theta) \\
&= \prod_{m=1}^{M} P(x[m] : \theta)\, P(y[m] \mid x[m] : \theta) \\
&= \left( \prod_{m=1}^{M} P(x[m] : \theta) \right) \left( \prod_{m=1}^{M} P(y[m] \mid x[m] : \theta) \right) \\
&= \left( \prod_{m=1}^{M} P(x[m] : \theta_X) \right) \left( \prod_{m=1}^{M} P(y[m] \mid x[m] : \theta_{Y|X}) \right)
\end{aligned}
\]

MLE for Bayesian Networks

• Likelihood for Bayesian network

\[
\begin{aligned}
L(\theta : D) &= \prod_{m} P(x[m] : \theta) \\
&= \prod_{m} \prod_{i} P(x_i[m] \mid U_i[m] : \theta_i) \\
&= \prod_{i} \prod_{m} P(x_i[m] \mid U_i[m] : \theta_i) = \prod_{i} L_i(D : \theta_i)
\end{aligned}
\]

• If the parameter sets \theta_{X_i|U_i} are disjoint, then the MLE can be computed by maximizing each local likelihood separately

MLE for Table CPDs

\[
\begin{aligned}
\prod_{m=1}^{M} P(x[m] \mid u[m] : \theta) &= \prod_{m=1}^{M} P(x[m] \mid u[m] : \theta_{X|U}) \\
&= \prod_{x,u} \; \prod_{m :\, x[m]=x,\, u[m]=u} P(x[m] \mid u[m] : \theta_{X|U}) \\
&= \prod_{x,u} \; \prod_{m :\, x[m]=x,\, u[m]=u} \theta_{x|u} \\
&= \prod_{x,u} \theta_{x|u}^{M[x,u]}
\end{aligned}
\]

\[
\hat{\theta}_{x|u} = \frac{M[x,u]}{\sum_{x'} M[x',u]} = \frac{M[x,u]}{M[u]}
\]

MLE for Linear Gaussians

Shared Parameters

[Figure: chain S^{(0)} → S^{(1)} → S^{(2)} → S^{(3)}, with the CPD P(S' | S)]

Shared Parameters

[Figure: chain S^{(0)} → S^{(1)} → S^{(2)} → S^{(3)} with observations O^{(1)}, O^{(2)}, O^{(3)}, with the CPDs P(S' | S) and P(O' | S')]

Summary

• For a BN with disjoint sets of parameters in its CPDs, the likelihood decomposes as a product of local likelihood functions, one per variable
• For table CPDs, the local likelihood further decomposes as a product of likelihoods for multinomials, one for each parent combination
• For networks with shared CPDs, sufficient statistics accumulate over all uses of the CPD
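
The count-based estimate for table CPDs, M[x, u] / M[u], can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the function name `mle_table_cpd`, the toy data set, and the state trajectory are all assumptions made up for the example. It estimates each CPD of the X → Y network separately (disjoint parameters), then reuses the same routine for a shared transition CPD P(S' | S), where the counts accumulate over every time step.

```python
from collections import Counter


def mle_table_cpd(samples):
    """Estimate theta[x | u] = M[x, u] / M[u] from (u, x) pairs.

    `samples` is an iterable of (u, x) tuples: u is the parent assignment
    (use () for a variable with no parents), x is the child value.
    Returns a dict keyed by (x, u). Combinations that never occur in the
    data are simply absent from the result.
    """
    samples = list(samples)
    joint_counts = Counter(samples)                 # M[x, u]
    parent_counts = Counter(u for u, _ in samples)  # M[u]
    return {(x, u): n / parent_counts[u] for (u, x), n in joint_counts.items()}


# Hypothetical data set for the network X -> Y: each instance is <x[m], y[m]>.
data = [("x0", "y0"), ("x0", "y0"), ("x0", "y1"), ("x1", "y1"), ("x1", "y1")]

# Disjoint parameters: each local likelihood is maximized on its own.
theta_x = mle_table_cpd(((), x) for x, _ in data)
theta_y_given_x = mle_table_cpd((x, y) for x, y in data)

# Shared parameters: one transition CPD P(S' | S) is used at every time step,
# so the counts accumulate over all consecutive pairs of the sequence.
states = ["a", "a", "b", "a", "b", "b"]  # hypothetical trajectory
theta_transition = mle_table_cpd(zip(states, states[1:]))

print(theta_x)
print(theta_y_given_x)
print(theta_transition)
```

Because unseen (x, u) combinations get no entry, a practical version would need to decide how to treat zero counts; that choice is outside what the slides cover.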