Bayes’ Nets II
Independence
Bayes’ Nets
• A Bayes’ net is an efficient encoding
of a probabilistic model
(joint distribution) of a domain
• Questions we can ask:
– Inference: given a fixed BN, what is P(X | e)?
– Representation: given a BN graph, what kinds of
distributions can it encode?
– Modeling: what BN is most appropriate for a given
domain?
Bayes’ Net Semantics
Let’s formalize the semantics of a Bayes’ net
• A set of nodes, one per variable X
• A directed, acyclic graph
• A conditional distribution for each node
– A collection of distributions over X, one for
each combination of parents’ values
– CPT: conditional probability table
– Description of a noisy “causal” process
A Bayes' net = Topology (graph) + Local Conditional Probabilities
Size of a Bayes’ Net
• How big is a joint distribution over N boolean variables?
2^N
• How big is an N-node net if nodes have up to k parents?
O(N · 2^(k+1))
• Both give you the power to calculate P(X1, X2, ..., Xn)
• BNs: Huge space savings!
Also easier to elicit local CPTs from data
Also turns out to be faster to answer queries (coming)
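A quick numeric comparison with illustrative values (N = 30 boolean variables, at most k = 3 parents each; these numbers are not from the slides):

    N, k = 30, 3
    print(2 ** N)            # full joint table: 2^30 = 1,073,741,824 entries
    print(N * 2 ** (k + 1))  # Bayes' net CPTs: at most 30 * 2^4 = 480 entries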
Building the (Entire) Joint
• We can take a Bayes’ net and build any entry
from the full joint distribution it encodes
– Typically, there’s no reason to build ALL of it
– We build what we need on the fly
• To emphasize:
every BN over a domain implicitly defines a joint distribution
over that domain, specified by local probabilities and graph
structure
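A minimal sketch of how a single joint entry is assembled from the local CPTs: multiply each node's CPT entry for its value given its parents' values. The dictionary-based CPT format and the variable names below are assumptions for illustration, not the slides' notation.

    def joint_probability(assignment, parents, cpt):
        """P(x1, ..., xn) = product over i of P(xi | parents(Xi))."""
        p = 1.0
        for var, value in assignment.items():
            parent_values = tuple(assignment[u] for u in parents[var])
            p *= cpt[var][(value, parent_values)]
        return p

    # Tiny two-node example A -> B with illustrative numbers.
    parents = {"A": (), "B": ("A",)}
    cpt = {"A": {(True, ()): 0.3, (False, ()): 0.7},
           "B": {(True, (True,)): 0.9, (False, (True,)): 0.1,
                 (True, (False,)): 0.2, (False, (False,)): 0.8}}
    print(joint_probability({"A": True, "B": False}, parents, cpt))  # 0.3 * 0.1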
Bayes’ Nets so far
• We now know:
– What is a Bayes’ net?
– What joint distribution does a Bayes’ net encode?
• Now: properties of that joint distribution (independence)
– Key idea: conditional independence
– Last class: assembled BNs using an intuitive notion of
conditional independence as causality
– Today: formalize these ideas
– Main goal: answer queries about conditional
independence and influence
• Next: how to compute posterior probabilities quickly (inference)
Bayes’ Nets: Assumptions
• Assumptions we are required to make to define the Bayes' net when given the graph (the explicit assumptions):
P(xi | x1, ..., xi-1) = P(xi | parents(Xi))
i.e. Xi ⫫ {X1, ..., Xi-1} \ Parents(Xi) | Parents(Xi)
• Probability distributions that satisfy these "chain rule → Bayes' net" conditional independence assumptions:
– Are often guaranteed to have many more conditional independences (these are the implicit assumptions)
– The additional conditional independences can be read off the graph
• Important for modeling:
understand assumptions made when choosing a Bayes net graph
Conditional Independence
• Reminder: independence
– X and Y are independent if P(x, y) = P(x) P(y) for all values x, y
– X and Y are conditionally independent given Z if P(x, y | z) = P(x | z) P(y | z) for all values x, y, z
– (Conditional) independence is a property of a distribution
Example
• Can extract conditional independence assumptions directly from
simplifications in chain rule:
– For the order X,Y,Z,W the chain rule is:
P(X, Y, Z,W) = P(X) P(Y|X) P(Z|X,Y) P(W|X,Y,Z)
– Bayes net:
P(X, Y, Z,W) = P(X) P(Y|X) P(Z|Y) P(W|Z)
– The explicit assumptions:
Z ⫫ X | Y
W ⫫ X | Z
W ⫫ Y | Z
• Additional implied conditional independence assumptions?
– X ⫫ W | Y
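As a check on the implied assumption, a short derivation of X ⫫ W | Y directly from the Bayes' net factorization P(X, Y, Z, W) = P(X) P(Y|X) P(Z|Y) P(W|Z):

P(X, Y, W) = sum_z P(X) P(Y|X) P(Z=z|Y) P(W|Z=z) = P(X) P(Y|X) [ sum_z P(Z=z|Y) P(W|Z=z) ]
P(W | X, Y) = P(X, Y, W) / P(X, Y) = sum_z P(Z=z|Y) P(W|Z=z)
(using P(X, Y) = P(X) P(Y|X), obtained by marginalizing out Z and W)

The result does not depend on X, so X ⫫ W | Y.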
Independence in a BN
• Important question about a BN:
– Are two nodes independent given certain evidence?
– If yes, can prove using algebra (tedious in general)
– If no, can prove with a counterexample
• "Necessarily independent" means: no matter which CPTs you put in the graph, the two variables will be independent
• Example: X → Y → Z (low pressure causes rain, which causes traffic)
– Question: are X and Z necessarily independent?
– Answer: no. X can influence Z, and Z can influence X (via Y)
– Note: they actually could be independent for particular CPTs: how?
D-Separation: outline
• Study independence properties for triples (X, Y, Z)
• Any complex example can be analyzed using these
three canonical cases.
1. Causal Chains
• This configuration is a “causal chain”
– Bayes' net: X → Y → Z
(X: Low pressure, Y: Rain, Z: Traffic)
– Is X independent of Z? Not necessarily
– Is X independent of Z given Y? Yes
– Evidence along the chain "blocks" the influence that goes from X to Z
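A quick algebraic check of the "given Y" answer, using the chain's factorization P(x, y, z) = P(x) P(y|x) P(z|y):

P(z | x, y) = P(x, y, z) / P(x, y) = P(x) P(y|x) P(z|y) / (P(x) P(y|x)) = P(z|y)

Once Y is observed, Z no longer depends on X.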
2. Common Cause
• Another basic configuration: two effects of the same cause
X ← Y → Z (Y: Alarm, X: John calls, Z: Mary calls)
– Are X and Z independent? Not necessarily
– Are X and Z independent given Y? Yes
– Observing the cause blocks influence between effects.
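The same check for the common-cause case, whose factorization is P(x, y, z) = P(y) P(x|y) P(z|y):

P(z | x, y) = P(x, y, z) / P(x, y) = P(y) P(x|y) P(z|y) / (P(y) P(x|y)) = P(z|y)

Once the common cause Y is observed, the two effects carry no information about each other.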
3. Common Effect
• Last configuration: two causes of one effect (v-structure)
X → Y ← Z (X: Burglary, Z: Earthquake, Y: Alarm)
– Are X and Z independent? Yes
Bayes' net: P(X, Z, Y) = P(X) P(Z) P(Y|X,Z)
Chain rule: P(X, Z, Y) = P(X) P(Z|X) P(Y|X,Z)
→ P(Z|X) = P(Z)
– Are X and Z independent given Y? Not necessarily: seeing the alarm puts the burglary and the earthquake in competition as explanations
– This is backwards from the other cases: observing an effect activates influence between its possible causes
If there is an alarm, then learning that there was a burglary makes an earthquake less probable. This is called "explaining away."
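A small numeric illustration of explaining away for the Burglary → Alarm ← Earthquake v-structure. The CPT numbers below are made up for illustration; they are not taken from the slides.

    P_B = {True: 0.01, False: 0.99}
    P_E = {True: 0.02, False: 0.98}
    # P(alarm = true | B, E), keyed by (B, E); illustrative values
    P_A = {(True, True): 0.95, (True, False): 0.94,
           (False, True): 0.29, (False, False): 0.001}

    def posterior_burglary(evidence_E=None):
        """P(B = true | A = true [, E = evidence_E]) by brute-force enumeration."""
        num = den = 0.0
        for b in (True, False):
            for e in (True, False):
                if evidence_E is not None and e != evidence_E:
                    continue
                p = P_B[b] * P_E[e] * P_A[(b, e)]  # joint probability with A = true
                den += p
                if b:
                    num += p
        return num / den

    print(posterior_burglary())      # P(B | alarm) is fairly high (~0.58)
    print(posterior_burglary(True))  # P(B | alarm, earthquake) drops (~0.03): explained away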
Triplets: Recap
• Definition: (X, Y, Z) is an ACTIVE TRIPLET given the evidence if and only if X and Z are not necessarily independent given the evidence.
• We established in the previous slides which triple configurations are active (chain or common cause with Y unobserved; common effect with Y observed) and which are not.
The General Case
• Definition: An (undirected) path (X0, X1, ..., Xn) in the graph is an
ACTIVE PATH given the evidence if and only if every triple (Xi-1,
Xi, Xi+1) along it is active.
• Fact: X and Z are not necessarily independent given the evidence
if and only if at least one path (X, ..., Z) is an active path.
• Equivalent fact: X and Z are guaranteed to be independent given
the evidence if and only if all paths (X, ..., Z) are inactive paths.
• General solution to analyze independences:
we only need to analyze the graph, not the CPT entries!
Reachability / D-Separation
• Active triplets:
– Causal chain A → B → C where B is unobserved (either direction)
– Common cause A ← B → C where B is unobserved
– Common effect (aka v-structure) A → B ← C where B or one of its descendants is observed
• All other triplets are inactive
• A path is active if each triple is active
• Question: Are X and Y
conditionally independent
given evidence vars {Z}?
– Yes, if X and Y “D-separated” by Z
– Look for active paths from X to Y
– No active paths → D-separated → independent!
• All it takes to block a path is a single inactive triplet
D-Separation
• Given a query Xi ⫫ Xj | {Z1, ..., Zk}
(only having the BN structure, no CPTs)
Method:
• Shade all evidence nodes
• For all (undirected!) paths between Xi and Xj
– Check whether the path is active
• If any path is active, return "independence not guaranteed"
• (If reaching this point, all paths have been checked and shown inactive)
return "guaranteed independent: Xi ⫫ Xj | {Z1, ..., Zk}"
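A sketch of this procedure in Python, assuming the graph is given as a dictionary mapping each node to its parents; this is an illustrative implementation of the path-checking method, not code from the course.

    def descendants(node, children):
        """All strict descendants of node in the DAG."""
        out, stack = set(), [node]
        while stack:
            for c in children[stack.pop()]:
                if c not in out:
                    out.add(c)
                    stack.append(c)
        return out

    def d_separated(x, y, evidence, parents):
        """True iff x and y are guaranteed independent given the evidence,
        judging only from the graph (parents: node -> iterable of parents)."""
        evidence = set(evidence)
        nodes = set(parents)
        for ps in parents.values():
            nodes |= set(ps)
        children = {n: {m for m in nodes if n in parents.get(m, ())} for n in nodes}
        neighbors = {n: children[n] | set(parents.get(n, ())) for n in nodes}

        def triple_active(a, b, c):
            if a in parents.get(b, ()) and c in parents.get(b, ()):
                # common effect a -> b <- c: active iff b or a descendant of b is observed
                return b in evidence or bool(descendants(b, children) & evidence)
            # causal chain or common cause: active iff the middle node is unobserved
            return b not in evidence

        def path_active(path):
            return all(triple_active(path[i - 1], path[i], path[i + 1])
                       for i in range(1, len(path) - 1))

        def any_active_path(path):
            node = path[-1]
            if node == y:
                return path_active(path)
            return any(any_active_path(path + [n])
                       for n in neighbors[node] if n not in path)

        return not any_active_path([x])

    # Example: the causal chain X -> Y -> Z (low pressure -> rain -> traffic)
    g = {"X": (), "Y": ("X",), "Z": ("Y",)}
    print(d_separated("X", "Z", [], g))     # False: not necessarily independent
    print(d_separated("X", "Z", ["Y"], g))  # True: guaranteed independent given Y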
Example
• Are the following statements true? (The specific queries appear only in the slide's figure.)
– Yes
– Not necessarily
– Not necessarily
Example
• Answers to the queries in this slide's figure: Yes; Yes; Not necessarily; Not necessarily; Yes
Example
• Variables:
– R: Raining
– T: Traffic
– D: Roof drips
– S: I'm sad
• Questions (shown in the slide's figure); answers:
– Not necessarily
– Yes
– Not necessarily
Topology of Bayes’ Net and
Conditional Independence
• If X and Y are d-separated given {Z1, ..., Zk}, then no matter
what CPTs you choose for each variable in the BN, you will
always have that X ⫫ Y | {Z1, ..., Zk}
• If X and Y are not d-separated given {Z1, ..., Zk}, then for most
choices of CPTs there will be a dependence between X and Y
given {Z1, ..., Zk}, but for some special choices of CPTs, X and
Y will be conditionally independent given {Z1, ..., Zk}.
• One such special choice: make every CPT a uniform distribution
(then the joint is uniform, so any two variables are independent given any evidence)
All Conditional Independences
• Given a Bayes net structure,
– we can run d-separation for all pairs of variables given any
subset of other variables (lots of work)
– In this way, we build a complete list of the conditional
independences that are necessarily true, each of the form Xi ⫫ Xj | {Z1, ..., Zk}
– This list determines
the set of probability distributions that can be represented
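Using the d_separated sketch from the D-Separation slide above (an illustrative helper, not course code), enumerating this complete list takes only a few lines, though the loop over evidence subsets is exponential in the number of variables ("lots of work"):

    from itertools import combinations

    def all_guaranteed_independences(parents):
        """Yield every (X, Y, Z) such that X ⫫ Y | Z is guaranteed by the graph."""
        nodes = set(parents)
        for ps in parents.values():
            nodes |= set(ps)
        nodes = sorted(nodes)
        for x, y in combinations(nodes, 2):
            rest = [n for n in nodes if n not in (x, y)]
            for r in range(len(rest) + 1):
                for z in combinations(rest, r):
                    if d_separated(x, y, z, parents):
                        yield (x, y, set(z))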
Topology Limits Distributions
• Given some graph topology G,
only certain joint distributions can
be encoded by G.
• The graph structure guarantees
certain (conditional)
independences
• (There might be more independence)
• Adding arcs increases the set of
distributions, but has several costs
• Full conditioning can
encode any distribution
Adding Extra Arcs: Coins
• Extra arcs don’t prevent representing independence, just allow
dependence as well
• Adding unneeded arcs isn’t
wrong, it’s just inefficient
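Concretely (the coin numbers here are illustrative, not read from the slide): for two independent fair coins X1 and X2, adding the arc X1 → X2 changes nothing about what can be represented. Setting P(X2 = h | X1 = h) = P(X2 = h | X1 = t) = 0.5 encodes exactly the same independent joint; the only cost is that the CPT for X2 now stores 4 numbers instead of 2.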
Changing Bayes’ Net Structure
• The same joint distribution can be encoded in many
different Bayes’ nets
– Causal structure tends to be the simplest
• Analysis question: given some edges, what other edges
do you need to add?
– One answer: fully connect the graph
– Better answer: don’t make any false conditional independence
assumptions
Example: Alternate Alarm
If we reverse the edges, we
make different conditional
independence assumptions
To capture the same joint
distribution, we have to add
more edges to the graph
Summary
• Bayes nets compactly encode joint distributions
• Guaranteed independencies of distributions can
be deduced from BN graph structure
• D-separation gives precise conditional
independence guarantees from graph alone
• A Bayes' net's joint distribution may have further
(conditional) independences that are not detectable from the
graph alone; they only show up when you inspect the specific CPTs