Graphical Models - Learning -

Based on: J. A. Bilmes, "A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models", TR-97-021, U.C. Berkeley, April 1998; G. J. McLachlan, T. Krishnan, "The EM Algorithm and Extensions", John Wiley & Sons, Inc., 1997; D. Koller, course CS-228 handouts, Stanford University, 2001; and N. Friedman & D. Koller's NIPS'99 tutorial.
Wolfram Burgard, Luc De Raedt, Kristian Kersting, Bernhard Nebel
Advanced AI, WS 06/07

Learning With Bayesian Networks

[Figure: the ALARM patient-monitoring network (nodes MINVOLSET, VENTTUBE, PRESS, MINVOL, SAO2, TPR, LVEDVOLUME, ARTCO2, INSUFFANESTH, LVFAILURE, CATECHOL, STROKEVOLUME, HISTORY, ERRLOWOUTPUT, PCWP, EXPCO2, CO, HR, HREKG, ERRCAUTER, HRSAT, HRBP, BP, CVP, PVSAT, HYPOVOLEMIA, VENTALV, ANAPHYLAXIS, FIO2, among others), used to illustrate the two dimensions of the learning problem: fixed vs. learned structure, and fully vs. partially observed data]
Why Struggle for Accurate Structure?

True model: Earthquake -> Alarm Set <- Burglary, Alarm Set -> Sound

Data over (E, B, A):
  <Y,N,N>
  <Y,N,Y>
  <N,N,Y>
  <N,Y,Y>
  ...
  <N,Y,Y>

Missing an arc:
• Wrong assumptions about the domain structure
• Cannot be compensated for by fitting parameters

Adding an arc:
• Wrong assumptions about the domain structure
• Increases the number of parameters to be estimated

The same setting with partially observed data (E, B, A):
  <Y,?,N>
  <Y,N,?>
  <N,N,Y>
  <N,Y,Y>
  ...
  <?,Y,Y>
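The cost of an added arc can be made concrete by counting free parameters: for binary variables, a node with k parents contributes 2^k free parameters, one per parent configuration. A minimal sketch; the helper name and the `extra_net` variant are illustrative, not from the slides.

```python
# Hypothetical sketch: free parameters of a Bayesian network over binary
# variables. A node with k parents contributes 2**k free parameters
# (one Bernoulli parameter per parent configuration).

def num_free_parameters(parents):
    """parents: dict mapping each node to its list of parent nodes."""
    return sum(2 ** len(ps) for ps in parents.values())

# True structure: E -> A <- B, A -> S
true_net = {"E": [], "B": [], "A": ["E", "B"], "S": ["A"]}

# Same network with a spurious extra arc E -> S
extra_net = {"E": [], "B": [], "A": ["E", "B"], "S": ["A", "E"]}

print(num_free_parameters(true_net))   # 1 + 1 + 4 + 2 = 8
print(num_free_parameters(extra_net))  # 1 + 1 + 4 + 4 = 10
```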
Learning With Bayesian Networks: the four settings

                      Fixed structure               Structure learning
------------------------------------------------------------------------------
Fixed variables       Parameter estimation:         Selection of arcs:
(fully observed)      easiest problem, counting     new domain with no domain
                                                    expert, data mining

Hidden variables      Numerical, nonlinear          Scientific discovery:
(partially observed)  optimization (EM);            encompasses difficult
                      multiple calls to BNs;        subproblems; "only"
                      difficult for large           Structural EM is known
                      networks

Albert-Ludwigs University Freiburg, Germany
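For the easiest cell of the table (fixed structure, fully observed data), parameter estimation really is just counting. A minimal sketch with made-up records in the spirit of the (E, B, A) examples above:

```python
from collections import Counter

# Made-up fully observed records (E, B, A), Y/N as on the slides.
data = [
    ("Y", "N", "N"), ("Y", "N", "Y"), ("N", "N", "Y"),
    ("N", "Y", "Y"), ("N", "Y", "Y"),
]

joint = Counter(data)                          # N(E, B, A)
parent = Counter((e, b) for e, b, _ in data)   # N(E, B)

def p_a_given_eb(a, e, b):
    """ML estimate P(A=a | E=e, B=b) = N(e, b, a) / N(e, b)."""
    return joint[(e, b, a)] / parent[(e, b)]

print(p_a_given_eb("Y", "N", "Y"))  # both (E=N, B=Y) records have A=Y -> 1.0
```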
Unknown Structure, (In)complete Data

Complete data:
• Network structure is not specified
  – Learner needs to select arcs & estimate parameters
• Data does not contain missing values

  (E, B, A) data  ->  Learning algorithm  ->  structure E -> A <- B plus CPT

   E   B  |  P(A|E,B)  P(¬A|E,B)
   e   b  |   .9        .1
   e   ¬b |   .7        .3
   ¬e  b  |   .8        .2
   ¬e  ¬b |   .99       .01

  (Before learning, all CPT entries are unknown: "?".)

Incomplete data:
• Network structure is not specified
• Data contains missing values
  – Need to consider assignments to missing values
Score-based Learning

Define a scoring function that evaluates how well a structure matches the data.

Input:
– Training data, e.g. over (E, B, A):
  <Y,N,N>, <Y,Y,Y>, <N,N,Y>, <N,Y,Y>, ..., <N,Y,Y>
– Scoring function
– Set of possible structures

Output:
– A network that maximizes the score

Search for a structure that maximizes the score.

Structure Search as Optimization: Heuristic Search

Theorem: Finding the maximal scoring structure with at most k parents per node is NP-hard for k > 1.

• Therefore: traverse the space of structures heuristically, looking for high-scoring ones
• Search techniques:
  – Greedy hill-climbing
  – Best-first search
  – Simulated annealing
  – ...

Typically: Local Search
• Start with a given network
  – empty network, best tree, or a random network
• Define a search space:
  – search states are possible structures
  – operators make small changes to the structure (add, delete, or reverse a single arc)
• At each iteration:
  – Evaluate all possible changes
  – Apply the change based on the score
• Stop when no modification improves the score

[Figure: example search over networks on S, C, E, D, applying operators such as Add C->E, Delete C->E, Reverse C->E]

• If data is complete: to update the score after a local change, only the families (a node and its parents) that changed need to be re-scored, by counting.
• If data is incomplete: to update the score after a local change, the parameter estimation algorithm must be rerun.
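The local-search loop above can be sketched as follows. The operators (add / delete / reverse a single arc, keeping the graph acyclic) and the stopping rule follow the slides; the scoring function here is a deliberately toy placeholder, where a real implementation would plug in a likelihood-based score such as MDL/BIC.

```python
# Hedged sketch of greedy hill-climbing over network structures.
# Structures are dicts: node -> set of parent nodes.
from itertools import permutations

def is_acyclic(parents):
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {n: WHITE for n in parents}
    def visit(n):
        if color[n] == GRAY:      # back edge: cycle
            return False
        if color[n] == BLACK:
            return True
        color[n] = GRAY
        for p in parents[n]:
            if not visit(p):
                return False
        color[n] = BLACK
        return True
    return all(visit(n) for n in parents)

def neighbors(parents):
    """All structures one add / delete / reverse move away (acyclic only)."""
    for x, y in permutations(list(parents), 2):
        cand = {n: set(ps) for n, ps in parents.items()}
        if x in cand[y]:
            cand[y].discard(x)                 # delete x -> y
        else:
            cand[y].add(x)                     # add x -> y
        if is_acyclic(cand):
            yield cand
        if x in parents[y]:                    # reverse x -> y
            cand = {n: set(ps) for n, ps in parents.items()}
            cand[y].discard(x)
            cand[x].add(y)
            if is_acyclic(cand):
                yield cand

def hill_climb(start, score):
    current, cur_score = start, score(start)
    while True:
        best, best_score = None, cur_score
        for cand in neighbors(current):        # evaluate all possible changes
            s = score(cand)
            if s > best_score:
                best, best_score = cand, s     # apply change based on score
        if best is None:                       # no modification improves score
            return current
        current, cur_score = best, best_score

# Toy score for illustration only: +1 per "true" arc, -1 per spurious arc.
TRUE = {("E", "A"), ("B", "A"), ("A", "S")}
def toy_score(parents):
    arcs = {(p, c) for c, ps in parents.items() for p in ps}
    return len(arcs & TRUE) - len(arcs - TRUE)

empty = {n: set() for n in "EBAS"}
learned = hill_climb(empty, toy_score)
print({c: sorted(ps) for c, ps in learned.items()})
```

Starting from the empty network, each step adds one arc of the toy target until no single-arc change improves the score.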
Local Search in Practice

• Local search can get stuck in:
  – Local maxima: all one-edge changes reduce the score
  – Plateaux: some one-edge changes leave the score unchanged
• Standard heuristics can escape both:
  – Random restarts
  – TABU search
  – Simulated annealing

• Using the log-likelihood (LL) as the score, adding arcs always helps:
  – Max score is attained by the fully connected network
  – Overfitting: a bad idea...
• Minimum Description Length: learning as data compression
    MDL(BN | D) = log P(D | Θ, G) - (log N / 2) |Θ|
  where the likelihood term corresponds to DL(Data | Model) and the penalty term to DL(Model).
• Other scores: BIC (Bayesian Information Criterion), Bayesian score (BDe)

With incomplete data, e.g.:
  <9.7 0.6 8 14 18>
  <0.2 1.3 5 ?? ??>
  <1.3 2.8 ?? 0 1>
  <?? 5.6 0 10 ??>
  ...
• Perform EM for each candidate graph G1, G2, G3, G4, ..., Gn
  [Figure: for each candidate, parametric optimization (EM) climbs in parameter space to a local maximum]
• Computationally expensive:
  – Parameter optimization via EM is non-trivial
  – Need to perform EM for all candidate structures
  – Spend time even on poor candidates
  – In practice, one can consider only a few candidates
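The MDL/BIC-style score decomposes over families, which is what makes the "only re-score the families that changed" trick work for complete data. A hedged sketch of one family's score contribution, assuming binary variables and ML parameters from counts; the data and variable names are invented for illustration.

```python
import math
from collections import Counter

def family_score(data, child, parents):
    """MDL-style score of one family: log-likelihood - (log N / 2) * #params."""
    N = len(data)
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in data)
    marg = Counter(tuple(r[p] for p in parents) for r in data)
    # ML estimate per configuration is n / marg[cfg], so the log-likelihood
    # is sum over joint counts of n * log(n / marg[cfg]).
    loglik = sum(n * math.log(n / marg[cfg]) for (cfg, _), n in joint.items())
    n_params = 2 ** len(parents)   # free parameters of a binary child
    return loglik - (math.log(N) / 2) * n_params

# Invented records over binary E, B, A:
data = [{"E": e, "B": b, "A": a}
        for e, b, a in [(0, 0, 0), (0, 1, 1), (1, 0, 1),
                        (1, 1, 1), (0, 0, 0), (0, 1, 1)]]

# Informative parents raise the likelihood term but pay a complexity penalty:
print(family_score(data, "A", []))
print(family_score(data, "A", ["E", "B"]))
```

Summing `family_score` over all nodes gives the score of a full structure; after a local change, only the changed families need re-scoring.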
Structure Learning: incomplete data

EM-algorithm: iterate until convergence

Data (S, X, D, C, B):              Current model
  <?  0  1  0  1>
  <1  1  ?  0  1>
  <0  0  0  ?  ?>
  <?  ?  0  ?  1>
  ...

Expectation: inference in the current model, e.g. P(S | X=0, D=1, C=0, B=1)

Expected counts (completed data):
  <1  0  1  0  1>
  <1  1  1  0  1>
  <0  0  0  0  0>
  <1  0  0  0  1>
  ...

Maximization: re-estimate the Parameters.

Structural EM                                        [Friedman et al. 98]

Idea:
• Use the current model to help evaluate new structures
• Instead of optimizing the real score...
• ...find a decomposable alternative score,
• such that maximizing the new score yields an improvement in the real score
  – Decomposition enables efficient search
• Recall: with complete data, the score decomposes into per-family counts

Outline:
• Perform search in (Structure, Parameters) space
• At each iteration, use the current model for finding either:
  – Better scoring parameters: "parametric" EM step, or
  – Better scoring structure: "structural" EM step

Computation of Expected Counts:

Training data + current model (X1, X2, X3 observed; H hidden; Y1, Y2, Y3)
  -> expected counts for the current structure:
     N(X1), N(X2), N(X3), N(H, X1, X2, X3), N(Y1, H), N(Y2, H), N(Y3, H)
  -> expected counts needed to score candidate structures, e.g.:
     N(X2, X1), N(H, X1, X3), N(Y1, X2), N(Y2, Y1, H)
Score & parameterize the candidates; reiterate.
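The Expectation/Maximization loop can be illustrated on a deliberately tiny model: a single binary variable S with parameter theta = P(S=1) and data containing missing values, so that the E-step's "inference" degenerates to filling each '?' with the current parameter. The loop structure matches the slide (where inference computes e.g. P(S | X=0, D=1, C=0, B=1) in a full network); everything else is simplified.

```python
# Minimal, self-contained E-step / M-step loop on a toy one-variable model.

data = [1, 0, 1, "?", 1, "?", 0, 1]   # observed values of S, '?' = missing

theta = 0.5                            # initial parameter P(S=1)
for _ in range(50):                    # iterate until convergence
    # E-step: expected count of S=1 given the current model
    # (each '?' contributes its posterior probability, here just theta)
    expected_ones = sum(theta if s == "?" else s for s in data)
    # M-step: maximum-likelihood update from the expected counts
    theta = expected_ones / len(data)

print(round(theta, 4))  # converges to 4/6 = 0.6667 (ratio among observed values)
```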
Structure Learning: incomplete data                  [Friedman et al. 98]

SEM-algorithm: iterate until convergence

Data (S, X, D, C, B):              Current model
  <?  0  1  0  1>
  <1  1  ?  0  1>
  <0  0  0  ?  ?>
  <?  ?  0  ?  1>
  ...

Expectation: inference in the current model, e.g. P(S | X=0, D=1, C=0, B=1)

Expected counts:
  <1  0  1  0  1>
  <1  1  1  0  1>
  <0  0  0  0  0>
  <1  0  0  0  1>
  ...

Maximization: Parameters, and
Maximization: Structure (search over candidate structures scored on the expected counts)

Structure Learning: Summary
• Expert knowledge + learning from data
• Structure learning involves parameter estimation (e.g. EM)
• Optimization with score functions
  – likelihood + complexity penalty = MDL
• Local traversal of the space of possible structures:
  – add, reverse, delete (single) arcs
• Speed-up: Structural EM
  – score candidates w.r.t. the current best model
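Putting the pieces together, here is a toy Structural EM loop over just two candidate structures (Y independent of X vs. X -> Y), with some Y values missing. All names, the data, and the score are illustrative stand-ins; a real implementation would run inference in the current network to obtain expected counts for all candidate families.

```python
import math
from collections import defaultdict

# Toy data over (X, Y); Y is sometimes missing ('?').
data = [(0, 0), (0, 0), (0, '?'), (1, 1), (1, 1), (1, '?'), (1, 1), (0, 0)]

def p_y1(x, structure, theta):
    """P(Y=1 | X=x) under the current model."""
    return theta[x] if structure == "X->Y" else theta[None]

def expected_counts(structure, theta):
    """E-step: expected counts N(X, Y) under the current model."""
    M = defaultdict(float)
    for x, y in data:
        if y == '?':
            w = p_y1(x, structure, theta)
            M[(x, 1)] += w
            M[(x, 0)] += 1 - w
        else:
            M[(x, y)] += 1
    return M

def mle(structure, M):
    """M-step: maximum-likelihood parameters from (expected) counts."""
    if structure == "X->Y":
        return {x: M[(x, 1)] / (M[(x, 0)] + M[(x, 1)]) for x in (0, 1)}
    tot = sum(M.values())
    return {None: (M[(0, 1)] + M[(1, 1)]) / tot}

def score(structure, M):
    """Expected log-likelihood of the Y-family minus an MDL-style penalty."""
    th = mle(structure, M)
    ll = 0.0
    for (x, y), n in M.items():
        p = p_y1(x, structure, th)
        ll += n * math.log(max(p if y == 1 else 1 - p, 1e-12))
    n_params = 2 if structure == "X->Y" else 1
    return ll - (math.log(len(data)) / 2) * n_params

structure, theta = "indep", {None: 0.5}
for _ in range(10):                                    # iterate until convergence
    M = expected_counts(structure, theta)              # E-step (inference)
    structure = max(["indep", "X->Y"],
                    key=lambda g: score(g, M))         # structural EM step
    theta = mle(structure, M)                          # parametric EM step

print(structure)
```

The key Structural EM move is visible in the loop: both candidate structures are scored against the expected counts computed under the current model, rather than rerunning full EM for every candidate.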