handout

Conditional Independence
d-Separation
Belief Propogation
Conditional Independence
d-Separation
Belief Propogation
Graphical Models
Graphical Models
1
Conditional Independence
2
d-Separation
3
Belief Propogation
Steven J Zeil
Old Dominion Univ.
Fall 2010
2
1
Conditional Independence
d-Separation
Belief Propogation
Conditional Independence
Graphical Models
d-Separation
Belief Propogation
Example
a.k.a. Bayesian networks, probabilistic networks
Nodes are hypotheses (random vars)
Knowing that the grass is wet, what is the
probability that rain is the cause?
Values are the probabilities of the observed value of that
variable
P(W |R)P(R)
P(W )
P(W |R)P(R)
=
P(W |R)P(R) + P(W |¬R)(P(¬R)
0.9 × 0.4
=
0.9 × 0.4 + 0.2 × 0.6)
= 0.75
P(R|W ) =
Arcs are direct influences between hypotheses
Forms a directed acyclic graph (DAG)
The parameters are the conditional probabilities in the arcs
3
4
Conditional Independence
d-Separation
Belief Propogation
Conditional Independence
Causes & Diagnoses
d-Separation
Belief Propogation
Conditional Independence
X and Y are independent if
Graph shows a causal relationship.
P(X , Y ) = P(X )P(Y )
Bayes rules “reverses” the arc to give a
diagnosis.
X and Y are conditionally independent given Z if
P(X , Y |Z ) = P(X |Z )P(Y |Z )
or
P(X |Y , Z ) = P(X |Z )
Three canonical cases: Head-to-tail, Tail-to-tail, head-to-head
6
5
Conditional Independence
d-Separation
Belief Propogation
Conditional Independence
Head-to-Tail
d-Separation
Belief Propogation
Blocking
P(X , Y , Z ) = P(X )P(Y |X )P(Z |Y )
If we know the state of Y, we know everything we can about
Z without knowingthe state of X.
We say that Y blocks the path from X to Z
P(W |C ) = P(W |R)P(R|C ) + P(W |¬R)P(¬R|C )
or, Y separates X and Z.
7
8
Conditional Independence
d-Separation
Belief Propogation
Conditional Independence
Tail-to-Tail
d-Separation
Belief Propogation
Head-to-Head
P(X , Y , Z ) = P(X )P(Y |X )P(Z |X )
P(X , Y , Z ) = P(X )P(Y )P(Z |X , Y )
An observed X blocks the path between Y and Z:
Z blocks the path between X and Y when it is unobserved.
P(X , Y , Z )
P(X , Y |X ) =
P(X )
P(X )P(Y |X )P(Z |X )
=
P(X )
= P(Y |X )P(Z |X )
9
Conditional Independence
d-Separation
Belief Propogation
10
Conditional Independence
Causal vs Diagnostic
d-Separation
Belief Propogation
Explaining Away
Causal inference: If the sprinkler is on,
what is the probability that the grass is
wet? (P(W |S))
Diagnostic inference: If the grass is wet,
what is the probability that the sprinkler
is on?
P(S, W ) =
Suppose that we know that it rained:
P(W |R, S)P(S|R)
P(W |R
P(W |R, S)P(S)
=
P(W |R
= 0.21
P(S|R, W ) =
P(W |S)P(S)
= 0.35
P(W )
Note that P(S|R, W ) < P(S|W ).
Explaining Away: Knowing that it rained, the prob that the
sprinkler caused the wet grass is decreased.
11
12
Conditional Independence
d-Separation
Belief Propogation
Conditional Independence
Larger Systems
d-Separation
Belief Propogation
Example: Classification
Larger systems formed by
combining the three basic
subgraphs
Causal relation P(x|C )
Provides a structure &
explanation of complicated
relationships
P(C |x) =
Bayes’ rule inverts
p(x|C )P(C )
p(x)
This graph describes
P(C , S, R, W , F )
How would you compute
P(F |C )?
14
13
Conditional Independence
d-Separation
Belief Propogation
Conditional Independence
Example: Naive Bayes Classification
d-Separation
Belief Propogation
Example: Hidden Markov
Given C, the xj are
independent
P(~x |C ) = p(x1 |C )p(x2 |C ) . . . p(xd |C )
State at time t depends only on state at time t − 1
Output depends only on the current state
15
16
Conditional Independence
d-Separation
Belief Propogation
Conditional Independence
Path Blocking
d-Separation
Belief Propogation
d-Separation
A path from node A to node B is
blocked given {C} if
If all paths from A to B are blocked
given C, A and B are d-separated
(conditionally independent) given C.
The directions of edges on the path
meet head-to-tail (case 1) or
tail-to-tail (case 2) at a node in C, or
The directions of edges meet
head-to-head (case 3) and neither
that node nor any of its descendants
is in C.
Examples
BCDF is blocked given C
BEFG is blocked given E or F
BEFD is blocked unless F (or G) is
given
17
Conditional Independence
d-Separation
Belief Propogation
18
Conditional Independence
Belief Propogation
d-Separation
Belief Propogation
Chains
Use graph-based algorithms to answer queries of the form P(X |E )
where
The query node X is any node in the graph
E is a set of evidence nodes whose values are known
Evidence E + in ancestors of X will flow along as diagnostic
inference
Evidence E − in decendents of X will flow back as causal
inference
E + and E − separate X from any more nodes in the chain, so
we have at most two evidence nodes to consider
19
20
Conditional Independence
d-Separation
Belief Propogation
Conditional Independence
Chains: Propogated Info
d-Separation
Belief Propogation
Chains: Meeting at the Middle
For each node N,
λ(N) = P(E − |N)
P(X |E ) =
π(N) = P(N|E + )
=
=
=
=
P(E |X )P(X )
P(E )
P(E + , E − |X )P(X )
P(E )
+
P(E |X )P(E − |X )P(X )
P(E )
+
P(X |E )P(E + )P(E − |X )P(X )
P(X )P(E )
+
αP(X |E )P(E − |X )
= απ(X )λ(X )
21
Conditional Independence
d-Separation
Belief Propogation
22
Conditional Independence
Chains: Updating
d-Separation
Belief Propogation
Trees
λ(X )
P(X |E ) = απ(X )λ(X )
X
π(X ) =
P(X |U)π(U)
λ(X ) =
P(EX− |X )
=
λY (X )λZ (X )
X
λ(X )P(X |U)
λX (U)
=
π(X )
=
X
=
U
X
=
P(X |EX+ )
X
P(X |U)πX (U)
U
P(Y |X )λ(Y )
πY (X ) = αλZ (X )π(X )
Y
23
24
Conditional Independence
d-Separation
Belief Propogation
Conditional Independence
Polytrees
d-Separation
Belief Propogation
Polytrees
π(X ) = P(X |EX+ )
=
XX
U1
πYj (X ) = α
U2
Y
...
X
P(X |U1 , U2 , . . . , Uk )
k
Y
λX (Ui ) = β
πX (Ui )
λ(X )
X
j=1
Uk
X
λ(X ) =
λYs (X )π(X )
m
Y
X
r 6=i
P(X |U1 , U2 , . . . , Uk )
Y
πX (Ur )
r 6=i
λYj (X )
j=1
x6=j
25
Conditional Independence
d-Separation
Belief Propogation
Junction Trees
If X does not separate E +
and E − (e.g., loops in
dependencies)
Moralize the graph by
joining all nodes that
have common children
Identify cliques
Embed cliques into single
nodes to form a junction
tree
Each compressed node is a
separately solvable
subproblem
27
26