Lecture 9: Undirected Graphical Models
Machine Learning
Andrew Rosenberg
March 5, 2010
Today
Graphical Models
Probabilities in Undirected Graphs
Undirected Graphs
What if we allow undirected graphs?
What do they correspond to?
It’s not cause/effect or trigger/response, but rather general dependence.
Example: image pixels, where each pixel is a Bernoulli variable.
We can define a probability over all pixels, p(x11 , . . . , xMM ).
Bright pixels tend to have bright neighbors.
No parents, just probabilities.
Grid-structured models are called Markov Random Fields.
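A minimal sketch of the idea, assuming a tiny 2×2 binary pixel grid and a made-up pairwise "agreement" potential (both invented for illustration): neighboring pixels that agree are scored higher, with no parent/child structure anywhere.

```python
import itertools

# Hypothetical 2x2 binary pixel grid; edges connect horizontal/vertical neighbors.
EDGES = [((0, 0), (0, 1)), ((1, 0), (1, 1)),
         ((0, 0), (1, 0)), ((0, 1), (1, 1))]
CELLS = [(0, 0), (0, 1), (1, 0), (1, 1)]

def potential(a, b, coupling=2.0):
    """Pairwise potential: neighboring pixels that agree score higher."""
    return coupling if a == b else 1.0

def unnormalized(assign):
    """Product of pairwise potentials over all neighbor edges."""
    score = 1.0
    for u, v in EDGES:
        score *= potential(assign[u], assign[v])
    return score

# Enumerate all 2^4 configurations to normalize (only feasible for tiny grids).
configs = [dict(zip(CELLS, bits)) for bits in itertools.product([0, 1], repeat=4)]
Z = sum(unnormalized(c) for c in configs)

all_bright = dict.fromkeys(CELLS, 1)
print(unnormalized(all_bright) / Z)  # the all-bright grid is (jointly with all-dark) most probable
```

With this potential the two uniform grids get the highest probability, which is exactly the "bright pixels have bright neighbors" behavior.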
Undirected Graphs
[Figure: undirected graphs over nodes w, x, y, z]
x ⊥⊥ y | {w}
x ⊥⊥ y | {w, z}
cannot represent w ⊥⊥ z | {x, y}
x ⊥⊥ y | {w, z}
Undirected separation is easy.
To check xa ⊥⊥ xb |xc , check Graph reachability of xa and xb
without going through nodes in xc .
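The reachability test above can be sketched directly as a blocked breadth-first search (the example graph and queries are invented for illustration):

```python
from collections import deque

def separated(adj, a, b, conditioning):
    """Check x_a independent of x_b given x_C: BFS from a, blocked at conditioning nodes."""
    blocked = set(conditioning)
    if a in blocked or b in blocked:
        raise ValueError("query nodes must not be in the conditioning set")
    seen, frontier = {a}, deque([a])
    while frontier:
        node = frontier.popleft()
        for nbr in adj[node]:
            if nbr in blocked or nbr in seen:
                continue
            if nbr == b:
                return False  # a path avoiding the conditioning set exists
            seen.add(nbr)
            frontier.append(nbr)
    return True

# Small example graph: two paths between x and y, one through w, one through z.
adj = {"x": {"w", "z"}, "y": {"w", "z"}, "w": {"x", "y"}, "z": {"x", "y"}}
print(separated(adj, "x", "y", {"w"}))       # False: the path x-z-y remains open
print(separated(adj, "x", "y", {"w", "z"}))  # True: every path is blocked
```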
Undirected Graphs
[Figure: three undirected graphs over nodes x, y, z]
Graph 1: x ⊥⊥ y
Graph 2: x ⊥⊥ y | z OR x ⊥⊥ z
Graph 3: x ⊥⊥ y | z does not hold
Undirected separation is easy.
To check xa ⊥⊥ xb |xc , check Graph reachability of xa and xb
without going through nodes in xc .
Probabilities in Undirected Graphs
Graph cliques define clusters of dependent variables
[Figure: undirected graph over nodes a, b, c, d, e, f]
Clique: a set of nodes such that there is an edge between every
pair of members of the set.
We define probability as a product of functions defined over
cliques
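The definition can be checked by brute force: a node set is a clique exactly when every pair is joined by an edge. A sketch, with an example graph on a–f invented for illustration:

```python
from itertools import combinations

def is_clique(edges, nodes):
    """True iff every pair of the given nodes is connected by an edge."""
    edge_set = {frozenset(e) for e in edges}
    return all(frozenset(pair) in edge_set for pair in combinations(nodes, 2))

# Hypothetical undirected graph on nodes a..f.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("b", "d"),
         ("c", "d"), ("d", "e"), ("d", "f"), ("e", "f")]

print(is_clique(edges, {"a", "b", "c"}))  # True: a triangle
print(is_clique(edges, {"a", "b", "d"}))  # False: no a-d edge
```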
Representing Probabilities
Potential Functions over cliques
p(x) = p(x0 , . . . , xn−1 ) = (1/Z) ∏_{c∈C} ψc (xc )

The normalizing term Z guarantees that p(x) sums to 1:

Z = Σ_x ∏_{c∈C} ψc (xc )
Potential Functions are positive functions over groups of
connected variables.
Use only maximal cliques.
e.g. ψ(x1 , x2 , x3 )ψ(x2 , x3 ) → ψ(x1 , x2 , x3 )
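A brute-force sketch of this normalization, assuming a three-node chain x0 – x1 – x2 with made-up potentials on its two maximal cliques:

```python
import itertools

# Made-up potentials on the maximal cliques {x0, x1} and {x1, x2} of a chain.
def psi_01(x0, x1):
    return 3.0 if x0 == x1 else 1.0

def psi_12(x1, x2):
    return 2.0 if x1 == x2 else 1.0

# Z = sum over all joint configurations of the product of clique potentials.
Z = sum(psi_01(a, b) * psi_12(b, c)
        for a, b, c in itertools.product([0, 1], repeat=3))

def p(x0, x1, x2):
    """Normalized joint probability: product of potentials divided by Z."""
    return psi_01(x0, x1) * psi_12(x1, x2) / Z

# By construction the probabilities now sum to 1.
total = sum(p(*cfg) for cfg in itertools.product([0, 1], repeat=3))
print(Z, total)
```

Note the potentials are positive but need not be probabilities themselves; Z absorbs the scale.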
Logical Inference
[Figure: logic network over nodes a, b, c, d, e with NOT, AND, and XOR gates]
In Logic Networks, nodes are binary, and edges represent gates
Gates: AND, OR, XOR, NAND, NOR, NOT, etc.
Inference: given observed variables, predict others.
Problems: Uncertainty, conflicts and inconsistency
Rather than saying a variable is True or False, let’s say it is .8
True and .2 False.
Probabilistic Inference
Inference
Probabilistic Inference
[Figure: logic network over nodes a, b, c, d, e with NOT, AND, and XOR gates]
Replace the logic network with a Bayesian Network
Probabilistic Inference: given observed variables, predict
marginals over others.
Not    b=t   b=f
a=t     0     1
a=f     1     0
Inference
Probabilistic Inference
[Figure: logic network over nodes a, b, c, d, e with NOT, AND, and XOR gates]
Replace the logic network with a Bayesian Network
Probabilistic Inference: given observed variables, predict
marginals over others.
Soft Not   b=t   b=f
a=t         .1    .9
a=f         .9    .1
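With a soft CPT, a marginal follows by summing out the parent. A sketch using the soft-NOT table, with an assumed uniform prior p(a) = 0.5 (the prior is not given on the slides):

```python
# CPT for the soft NOT gate, keyed by (a, b): p(b | a).
p_b_given_a = {("t", "t"): 0.1, ("t", "f"): 0.9,
               ("f", "t"): 0.9, ("f", "f"): 0.1}

p_a = {"t": 0.5, "f": 0.5}  # assumed prior over a, not from the slides

# Marginal: p(b) = sum over a of p(a) * p(b | a)
p_b = {b: sum(p_a[a] * p_b_given_a[(a, b)] for a in "tf") for b in "tf"}
print(p_b)
```

With the uniform prior the softness washes out and p(b) is uniform too; a skewed prior on a would skew p(b) the opposite way.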
Inference
General Problem
Given a graph and probabilities, for any subset of variables, find
p(xe | xo ) = p(xe , xo ) / p(xo )

Compute both marginals and divide.
But this can be exponential (based on the number of parents each node
has, or the size of the cliques):

p(xj , xk ) = Σ_{x0} Σ_{x1} · · · Σ_{xM−1} ∏_{i=0}^{M−1} p(xi | πi )

p(xj , xk ) = Σ_{x0} Σ_{x1} · · · Σ_{xM−1} ∏_{c∈C} ψ(xc )

(with the sums taken over all variables except xj and xk )
We have efficient learning and storage in graphical models; now we
need efficient inference.
Inefficient Marginals
Brute Force.
Given CPTs and a graph structure we can compute arbitrary
marginals by brute force, but it’s inefficient.
For Example
p(x) = p(x0 )p(x1 |x0 )p(x2 |x0 )p(x3 |x1 )p(x4 |x2 )p(x5 |x2 , x4 )

p(x0 , x2 ) = p(x0 )p(x2 |x0 )

p(x0 , x5 ) = Σ_{x1 ,x2 ,x3 ,x4} p(x0 )p(x1 |x0 )p(x2 |x0 )p(x3 |x1 )p(x4 |x2 )p(x5 |x2 , x4 )

p(x0 | x5 ) = [ Σ_{x1 ,x2 ,x3 ,x4} p(x) ] / [ Σ_{x0 ,x1 ,x2 ,x3 ,x4} p(x) ]

p(x0 | x5 = TRUE) = [ Σ_{x1 ,x2 ,x3 ,x4} p(xU\5 , x5 = TRUE) ] / [ Σ_{x0 ,x1 ,x2 ,x3 ,x4} p(xU\5 , x5 = TRUE) ]
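A brute-force sketch of this computation over binary variables. The CPT values below are made up for illustration, and the last factor is taken as p(x5 | x2 , x4 ):

```python
import itertools

# Made-up CPTs, each giving p(child = 1 | parent values).
p0 = 0.6                                   # p(x0 = 1)
p1 = {0: 0.3, 1: 0.8}                      # p(x1 = 1 | x0)
p2 = {0: 0.5, 1: 0.7}                      # p(x2 = 1 | x0)
p3 = {0: 0.2, 1: 0.9}                      # p(x3 = 1 | x1)
p4 = {0: 0.4, 1: 0.6}                      # p(x4 = 1 | x2)
p5 = {(0, 0): 0.1, (0, 1): 0.5,
      (1, 0): 0.5, (1, 1): 0.9}            # p(x5 = 1 | x2, x4)

def bern(p, v):
    return p if v == 1 else 1.0 - p

def joint(x):
    """Joint probability from the directed factorization on the slide."""
    x0, x1, x2, x3, x4, x5 = x
    return (bern(p0, x0) * bern(p1[x0], x1) * bern(p2[x0], x2) *
            bern(p3[x1], x3) * bern(p4[x2], x4) * bern(p5[(x2, x4)], x5))

# p(x0 = 1 | x5 = TRUE): sum joints consistent with the evidence, then divide.
num = sum(joint(x) for x in itertools.product([0, 1], repeat=6)
          if x[0] == 1 and x[5] == 1)
den = sum(joint(x) for x in itertools.product([0, 1], repeat=6)
          if x[5] == 1)
print(num / den)
```

The two sums each touch all 2^6 configurations, which is exactly the exponential blow-up the slide warns about; the next section shows how to avoid it.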
Efficient Computation of Marginals
[Figure: undirected graph over nodes a, b, c, d, e]
Pass messages (small tables) around the graph.
The messages will be small functions that propagate
potentials around an undirected graphical model.
The inference technique is the Junction Tree Algorithm
Junction Tree Algorithm
Efficient Message Passing on Undirected Graphs.
For Directed Graphs, first convert to an Undirected Graph
(Moralization).
Junction Tree Algorithm
Moralization
Introduce Evidence
Triangulate
Construct Junction Tree
Propagate Probabilities
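The first step, moralization, connects ("marries") every pair of co-parents and then drops edge directions. A sketch, with an example DAG invented for illustration:

```python
from itertools import combinations

def moralize(parents):
    """Moralize a DAG given as {node: set of parents}: keep child-parent
    edges, marry every pair of co-parents, and drop all directions."""
    edges = set()
    for child, pars in parents.items():
        for p in pars:
            edges.add(frozenset((child, p)))   # undirected child-parent edge
        for p, q in combinations(sorted(pars), 2):
            edges.add(frozenset((p, q)))       # marry co-parents
    return edges

# Hypothetical DAG: a -> c, b -> c, c -> d
dag = {"a": set(), "b": set(), "c": {"a", "b"}, "d": {"c"}}
moral = moralize(dag)
print(frozenset(("a", "b")) in moral)  # True: co-parents a and b get married
```

Marrying co-parents matters because a child's CPT p(c | a, b) couples a and b, so they must share a clique in the undirected model.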
Bye
Next
Junction Tree Algorithm