5.2.2-Learn-BNparam-MLE

Probabilistic Graphical Models
Learning: Parameter Estimation
Max Likelihood for BNs
Daphne Koller
MLE for Bayesian Networks
• Network: X → Y
• Parameters: $\theta_X$ and $\theta_{Y|X}$

  $\theta_X$:
    x0     x1
    0.7    0.3

  $\theta_{Y|X}$:
    X      y0     y1
    x0     0.95   0.05
    x1     0.2    0.8

• Data instances: <x[m], y[m]>
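As a minimal sketch, the two CPDs on this slide can be written as dictionaries and used to score a data set. The CPD values are the ones from the slide; the three data instances and the helper names `joint_prob` and `likelihood` are hypothetical, chosen only for illustration.

```python
import math

# CPD values from the slide: P(X) and P(Y | X).
theta_X = {"x0": 0.7, "x1": 0.3}
theta_Y_given_X = {
    "x0": {"y0": 0.95, "y1": 0.05},
    "x1": {"y0": 0.2, "y1": 0.8},
}

def joint_prob(x, y):
    """P(x, y) = P(x) * P(y | x) for the network X -> Y."""
    return theta_X[x] * theta_Y_given_X[x][y]

def likelihood(data):
    """Product over all instances <x[m], y[m]> of their joint probability."""
    return math.prod(joint_prob(x, y) for x, y in data)

# Hypothetical data set with M = 3 instances (not taken from the source).
data = [("x0", "y0"), ("x0", "y0"), ("x1", "y1")]
print(likelihood(data))
```

For realistic data sizes the product underflows quickly, so in practice one would usually sum log-probabilities instead.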
MLE for Bayesian Networks
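• Parameters: $\theta_X$, $\theta_{Y|X}$

[Figure: plate model over the data instances, with nodes X and Y]

$$
\begin{aligned}
L(\Theta : D) &= \prod_{m=1}^{M} P(x[m], y[m] : \Theta) \\
&= \prod_{m=1}^{M} P(x[m] : \Theta)\, P(y[m] \mid x[m] : \Theta) \\
&= \left( \prod_{m=1}^{M} P(x[m] : \Theta) \right) \left( \prod_{m=1}^{M} P(y[m] \mid x[m] : \Theta) \right) \\
&= \left( \prod_{m=1}^{M} P(x[m] : \theta_X) \right) \left( \prod_{m=1}^{M} P(y[m] \mid x[m] : \theta_{Y|X}) \right)
\end{aligned}
$$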
MLE for Bayesian Networks
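• Likelihood for Bayesian network

$$
\begin{aligned}
L(\Theta : D) &= \prod_{m} P(x[m] : \Theta) \\
&= \prod_{m} \prod_{i} P(x_i[m] \mid U_i[m] : \theta_i) \\
&= \prod_{i} \prod_{m} P(x_i[m] \mid U_i[m] : \theta_i) \\
&= \prod_{i} L_i(D : \theta_i)
\end{aligned}
$$

⇒ If $\theta_{X_i \mid U_i}$ are disjoint, then MLE can be computed
by maximizing each local likelihood separately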
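The decomposition into local likelihoods can be checked numerically. The sketch below assumes a hypothetical three-variable network (A → C ← B) with made-up table CPDs and data. None of these come from the source; they only illustrate that summing log-probabilities per instance and summing local log-likelihoods per variable give the same total.

```python
import math

# Hypothetical structure: each variable maps to a tuple of its parents.
parents = {"A": (), "B": (), "C": ("A", "B")}

# Hypothetical table CPDs: cpds[var][parent_values][value] = probability.
cpds = {
    "A": {(): {0: 0.6, 1: 0.4}},
    "B": {(): {0: 0.5, 1: 0.5}},
    "C": {
        (0, 0): {0: 0.9, 1: 0.1},
        (0, 1): {0: 0.4, 1: 0.6},
        (1, 0): {0: 0.3, 1: 0.7},
        (1, 1): {0: 0.2, 1: 0.8},
    },
}

# Hypothetical fully observed data instances.
data = [
    {"A": 0, "B": 1, "C": 1},
    {"A": 1, "B": 1, "C": 1},
    {"A": 0, "B": 0, "C": 0},
]

def log_cpd(var, inst):
    """log P(x_i[m] | u_i[m] : theta_i) for one variable in one instance."""
    u = tuple(inst[p] for p in parents[var])
    return math.log(cpds[var][u][inst[var]])

def local_log_likelihood(var, data):
    """log L_i(D : theta_i): sum over instances for a single variable."""
    return sum(log_cpd(var, inst) for inst in data)

# Product over instances first, then variables (outer loop over m).
by_instance = sum(log_cpd(v, inst) for inst in data for v in parents)
# Product over variables first: one local likelihood per variable.
by_variable = sum(local_log_likelihood(v, data) for v in parents)

print(by_instance, by_variable)
```

Because each local term depends only on its own CPD's parameters, each one can be maximized on its own when the parameter sets are disjoint.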
MLE for Table CPDs
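$$
\begin{aligned}
\prod_{m=1}^{M} P(x[m] \mid u[m] : \Theta) &= \prod_{m=1}^{M} P(x[m] \mid u[m] : \theta_{X|U}) \\
&= \prod_{x,u} \; \prod_{m :\, x[m]=x,\, u[m]=u} P(x[m] \mid u[m] : \theta_{X|U}) \\
&= \prod_{x,u} \; \prod_{m :\, x[m]=x,\, u[m]=u} \theta_{x|u} \\
&= \prod_{x,u} \theta_{x|u}^{M[x,u]}
\end{aligned}
$$

$$
\hat{\theta}_{x|u} = \frac{M[x,u]}{\sum_{x'} M[x',u]} = \frac{M[x,u]}{M[u]}
$$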
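The estimator M[x,u] / M[u] amounts to counting. Here is a minimal sketch, assuming fully observed data given as (x, u) pairs. The function name `mle_table_cpd` and the sample observations are hypothetical.

```python
from collections import Counter

def mle_table_cpd(pairs):
    """Estimate theta_{x|u} = M[x, u] / M[u] from observed (x, u) pairs."""
    joint_counts = Counter(pairs)                 # M[x, u]
    parent_counts = Counter(u for _, u in pairs)  # M[u] = sum over x' of M[x', u]
    return {
        (x, u): count / parent_counts[u]
        for (x, u), count in joint_counts.items()
    }

# Hypothetical observations (not from the source).
pairs = [("x0", "u0"), ("x0", "u0"), ("x1", "u0"), ("x1", "u1")]
theta_hat = mle_table_cpd(pairs)
print(theta_hat)
```

Combinations (x, u) that never occur are simply absent from the result, and a parent assignment u with M[u] = 0 has no defined estimate under this formula.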
MLE for Linear Gaussians
Shared Parameters
[Figure: chain S(0) → S(1) → S(2) → S(3); every transition uses the same parameters $\theta_{S'|S}$]
Shared Parameters
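[Figure: chain S(0) → S(1) → S(2) → S(3) with observations O(1), O(2), O(3); transitions share $\theta_{S'|S}$ and observations share $\theta_{O'|S'}$]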
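With shared parameters, the counts for one CPD are pooled over every place that CPD is used. The sketch below assumes fully observed state and observation sequences laid out as in the figure (states S(0..T), observations O(1..T)). The function names and the example sequences are hypothetical.

```python
from collections import Counter

def normalize(pair_counts):
    """Turn counts M[parent, child] into estimates M[parent, child] / M[parent]."""
    parent_totals = Counter()
    for (parent, _child), count in pair_counts.items():
        parent_totals[parent] += count
    return {
        (parent, child): count / parent_totals[parent]
        for (parent, child), count in pair_counts.items()
    }

def shared_cpd_mle(state_seqs, obs_seqs):
    """Pool sufficient statistics over all time steps and all sequences."""
    transition_counts = Counter()   # M[s, s'] for the shared CPD S' | S
    observation_counts = Counter()  # M[s', o'] for the shared CPD O' | S'
    for states, obs in zip(state_seqs, obs_seqs):
        # Every consecutive pair (S(t), S(t+1)) is one use of the transition CPD.
        transition_counts.update(zip(states, states[1:]))
        # O(t) is paired with S(t) for t >= 1, so skip S(0).
        observation_counts.update(zip(states[1:], obs))
    return normalize(transition_counts), normalize(observation_counts)

# Hypothetical sequence: S(0..3) and O(1..3).
state_seqs = [["sun", "sun", "rain", "sun"]]
obs_seqs = [["dry", "wet", "dry"]]
theta_S, theta_O = shared_cpd_mle(state_seqs, obs_seqs)
print(theta_S, theta_O)
```

The result keys are (parent value, child value) pairs, so `theta_S[("sun", "rain")]` is the estimated probability of moving from "sun" to "rain".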
Summary
• For BN with disjoint sets of parameters in
CPDs, likelihood decomposes as product of
local likelihood functions, one per variable
• For table CPDs, local likelihood further
decomposes as product of likelihood for
multinomials, one for each parent combination
• For networks with shared CPDs, sufficient
statistics accumulate over all uses of CPD