COMP307
Uncertainty and Probability 4:
Probabilities in BN and How to Build a BN
(COMP 307, Lecture 16)
Dr Bing Xue
[email protected]

Thomas Bayes (/ˈbeɪz/; c. 1701 – 7 April 1761)

Outline
• A Bayesian Network Example
• Semantics of Bayesian Networks
• Probabilities in Bayesian Networks
• Conditional Independence
• Build a Bayesian Network
• Node Ordering and Compactness
• Summary
Bayesian Network Example: A Lazy Detective
Joint Probability
• P(T, W, M, C) = P(T)*P(W|T)*P(M|T,W)*P(C|T,W,M)
• Number of free parameters: 1+2+4+8 = 15
• A conditional probability table with n binary variables as
  conditions has size 2^n
• Full joint distribution tables:
  - n binary variables: 2^n - 1 free parameters
  - if each variable has A possible values/states: A^n - 1 free
    parameters
  - too big to represent explicitly unless there are only a few
    variables
  - hard to learn (estimate) empirically for a large number of
    variables at a time; needs a lot of data
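These counts can be checked with a few lines of Python (a sketch; the function names are ours, not from the lecture):

```python
# Free parameters needed to specify distributions over binary variables.
# A CPT for one binary node with k binary parents needs 2**k free numbers
# (one probability per parent configuration; the complement is implied).

def cpt_params(num_parents: int) -> int:
    """Free parameters in the CPT of a binary node with that many parents."""
    return 2 ** num_parents

def full_joint_params(n: int, states: int = 2) -> int:
    """Free parameters in the full joint table over n variables."""
    return states ** n - 1

# Chain rule with no independence: P(T)*P(W|T)*P(M|T,W)*P(C|T,W,M)
chain = cpt_params(0) + cpt_params(1) + cpt_params(2) + cpt_params(3)
print(chain)                  # 1 + 2 + 4 + 8 = 15
print(full_joint_params(4))   # 2**4 - 1 = 15, the same count
```

With no conditional independence, the chain-rule factorisation needs exactly as many free parameters as the full joint table.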
A Bayes Net = Topology (graph) + Local Conditional Probabilities
• How can a Bayesian Network help?

  P(x1, x2, …, xn) = ∏i P(xi | Parents(Xi))

  provided Parents(Xi) ⊆ {X1, …, Xi-1}. For example, by examining
  Figure 2.1, we can simplify its joint probability expressions.
Probabilities in Bayesian Networks
• Chain rule (valid for all distributions):
  P(X=pos ∧ D=T ∧ C=T ∧ P=low ∧ S=F)
    = P(X=pos | D=T, C=T, P=low, S=F)
      × P(D=T | C=T, P=low, S=F)
      × P(C=T | P=low, S=F) × P(P=low | S=F) × P(S=F)
• If each node is conditionally independent of its preceding nodes
  given its parents:
  P(xi | x1, …, xi-1) = P(xi | Parents(xi))
  Easier: fewer free parameters
• Joint probability:
  P(X=pos ∧ D=T ∧ C=T ∧ P=low ∧ S=F)
    = P(X=pos | C=T) × P(D=T | C=T) × P(C=T | P=low, S=F)
      × P(P=low) × P(S=F)
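A minimal numerical sketch of this factorisation, with illustrative CPT values (not necessarily the lecture's figures):

```python
# Evaluate the factorised joint for the cancer network example.
# All probability values below are illustrative assumptions.

P_pollution_low = 0.9      # P(P = low)
P_smoker_false  = 0.7      # P(S = F)
P_cancer_given  = {("low", False): 0.001}   # P(C=T | P=low, S=F)
P_xray_pos_given_cancer = 0.9               # P(X=pos | C=T)
P_dysp_given_cancer     = 0.65              # P(D=T | C=T)

# P(X=pos, D=T, C=T, P=low, S=F)
#   = P(X=pos|C=T) * P(D=T|C=T) * P(C=T|P=low,S=F) * P(P=low) * P(S=F)
joint = (P_xray_pos_given_cancer
         * P_dysp_given_cancer
         * P_cancer_given[("low", False)]
         * P_pollution_low
         * P_smoker_false)
print(joint)
```

Only five small local tables are consulted, instead of one table over all 2^5 joint states.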
Probabilities in Bayesian Networks
• Bayesian Networks implicitly encode joint distributions/probabilities:
  - describing complex joint distributions (models) using simple,
    local distributions (conditional probabilities)
  - describing how variables interact
  - local interactions chain together to give global, indirect
    interactions
• As a product of local conditional distributions:
  - P(T, W, M, C) = P(T)*P(W)*P(M|T,W)*P(C|M)
  instead of the full chain rule:
  - P(T, W, M, C) = P(T)*P(W|T)*P(M|T,W)*P(C|T,W,M)
• Be careful with the order (numbers): start with nodes that have no
  parent
Probabilities in Bayesian Networks
• Product rule (always true): P(A,B,C) = P(A)*P(B|A)*P(C|A,B): 7 free parameters
• Common cause: P(A,B,C) = P(A)*P(B|A)*P(C|A): ? free parameters
• Common effect: P(A,B,C) = P(A)*P(B)*P(C|A,B): ? free parameters
Conditional Independence in BN
• Direct cause: P(A,B) = P(A)*P(B|A): 3 free parameters
• Indirect cause (chain): P(A,B,C) = P(A)*P(B|A)*P(C|B): 5 free parameters
  - P(A,B)*P(C|A,B) = P(A,B)*P(C|B)  <->  P(C|A,B) = P(C|B)
• Common cause (multiple effects): P(A,B,C) = P(A)*P(B|A)*P(C|A)
  - P(A,B)*P(C|A,B) = P(A,B)*P(C|A)  <->  P(C|A,B) = P(C|A)
  - effects become independent once the common cause is known
• Common effect (multiple causes): P(A,B,C) = P(A)*P(B)*P(C|A,B)
  - explaining away: causes become dependent once their effect is
    known (the alternative cause has been "explained away")
  - e.g. C = happy, A = finished assignment, B = sunny
• In each case: fewer free parameters
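Explaining away can be demonstrated numerically. This sketch uses made-up probabilities for the happy/assignment/sunny example and computes posteriors by enumerating the full joint:

```python
# Explaining away: A (finished assignment) and B (sunny) are independent
# a priori; C (happy) depends on both. All numbers are assumptions.
from itertools import product

P_A = {True: 0.3, False: 0.7}
P_B = {True: 0.4, False: 0.6}
P_C = {(True, True): 0.99, (True, False): 0.9,    # P(C=T | A, B)
       (False, True): 0.8, (False, False): 0.1}

def joint(a, b, c):
    pc = P_C[(a, b)]
    return P_A[a] * P_B[b] * (pc if c else 1 - pc)

def cond_A_true(**evidence):
    """P(A=T | evidence), by enumerating the full joint."""
    num = den = 0.0
    for a, b, c in product([True, False], repeat=3):
        if all({"A": a, "B": b, "C": c}[k] == v for k, v in evidence.items()):
            den += joint(a, b, c)
            if a:
                num += joint(a, b, c)
    return num / den

print(cond_A_true(C=True))          # P(A | happy): raised above the prior
print(cond_A_true(C=True, B=True))  # P(A | happy, sunny): lower again
```

Observing "happy" raises belief in "finished assignment", but additionally observing "sunny" lowers it back down: sunshine explains the happiness away.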
Common Cause: Naive Bayes
• Assume features are conditionally independent given the class label:
  - P(C, X1, X2, …, Xn) = P(C)*P(X1|C)*P(X2|C)*…*P(Xn|C)

Example: Car
• Diagnostic reasoning: P(Cause | Effect)
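A tiny Naive Bayes classifier that applies this factorisation directly; the spam/ham classes, feature names and probabilities are made up for illustration:

```python
# Naive Bayes over binary features: P(C, X1..Xn) = P(C) * prod_i P(Xi|C).
# Classes, features and probability values are illustrative assumptions.

P_class = {"spam": 0.4, "ham": 0.6}
P_feat = {                         # P(feature present | class)
    "spam": {"offer": 0.7, "meeting": 0.1},
    "ham":  {"offer": 0.1, "meeting": 0.5},
}

def posterior(features):
    """P(class | features) via the Naive Bayes factorisation."""
    scores = {}
    for c, prior in P_class.items():
        score = prior
        for f, present in features.items():
            p = P_feat[c][f]
            score *= p if present else 1 - p
        scores[c] = score
    z = sum(scores.values())       # normalise over the classes
    return {c: s / z for c, s in scores.items()}

print(posterior({"offer": True, "meeting": False}))
```

Diagnostic reasoning is exactly this direction: from observed effects (features) back to P(Cause | Effect), the class posterior.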
Build a Bayesian Network
• Simply process each node in order, adding it to the existing
  network and adding arcs from a minimal set of parents such that the
  parent set renders the current node conditionally independent of
  every other node preceding it.
• Pearl's Network Construction Algorithm (one way):
  1. Choose a set of relevant variables that describe the domain
  2. Choose an order for the variables
  3. While there are variables left:
     - add the next variable Xi to the network
     - add arcs to the Xi node from a minimal set of nodes (parents)
       already in the network, such that the conditional independence
       property is satisfied:
         P(Xi | X'1, …, X'm) = P(Xi | Parents(Xi))
       where X'1, …, X'm are all the variables preceding Xi
     - define the conditional probability table for Xi
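The construction loop above can be sketched in Python, assuming we can query the full joint distribution to test conditional independence (a brute-force oracle that only works for tiny examples; all names are ours):

```python
# Sketch of Pearl's construction step: for each variable in the chosen
# order, find a smallest parent set among the preceding variables with
# P(Xi | X1..Xi-1) = P(Xi | Parents(Xi)).
# `joint` maps full assignments (tuples of 0/1) to probabilities.
from itertools import combinations, product

def _prob(joint, fixed):
    """Marginal probability of a partial assignment {index: value}."""
    return sum(p for a, p in joint.items()
               if all(a[j] == v for j, v in fixed.items()))

def _is_parent_set(joint, i, parents, tol=1e-9):
    """True if P(x_i | x_0..x_{i-1}) == P(x_i | parents) everywhere."""
    for assign in product([0, 1], repeat=i + 1):
        prev = {j: assign[j] for j in range(i)}
        par = {j: assign[j] for j in parents}
        d_prev, d_par = _prob(joint, prev), _prob(joint, par)
        if d_prev == 0 or d_par == 0:
            continue
        lhs = _prob(joint, {**prev, i: assign[i]}) / d_prev
        rhs = _prob(joint, {**par, i: assign[i]}) / d_par
        if abs(lhs - rhs) > tol:
            return False
    return True

def build_network(joint, names):
    """Map each variable to its parent set, following the `names` order."""
    net = {}
    for i in range(len(names)):
        for size in range(i + 1):                # try smallest sets first
            hit = next((ps for ps in combinations(range(i), size)
                        if _is_parent_set(joint, i, ps)), None)
            if hit is not None:
                net[names[i]] = {names[j] for j in hit}
                break
    return net

# Usage: a causal chain A -> B -> C with made-up CPT values.
joint = {}
for a, b, c in product([0, 1], repeat=3):
    p_b = 0.8 if a else 0.3     # P(B=1 | a)
    p_c = 0.9 if b else 0.2     # P(C=1 | b)
    joint[(a, b, c)] = 0.5 * (p_b if b else 1 - p_b) * (p_c if c else 1 - p_c)

print(build_network(joint, ["A", "B", "C"]))   # {'A': set(), 'B': {'A'}, 'C': {'B'}}
```

Given the causal ordering, the algorithm recovers the chain structure: C gets only B as a parent, because P(C|A,B) = P(C|B).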
Example: Alarm Network
• Variables:
  - Burglary
  - Earthquake
  - Alarm goes off
  - Mary calls
  - John calls
  - Traffic
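The alarm network's factorisation can be written out directly. The CPT values below are the well-known illustrative numbers from the standard textbook version of this example; the lecture's figures may differ:

```python
# Alarm network: P(B,E,A,J,M) = P(B)*P(E)*P(A|B,E)*P(J|A)*P(M|A).
# CPT values are illustrative (standard textbook numbers), not the lecture's.

P_B = 0.001                        # P(Burglary)
P_E = 0.002                        # P(Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,   # P(Alarm | B, E)
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}    # P(JohnCalls | Alarm)
P_M = {True: 0.70, False: 0.01}    # P(MaryCalls | Alarm)

def joint(b, e, a, j, m):
    """Joint probability from the five local conditional tables."""
    p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
    p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p *= P_J[a] if j else 1 - P_J[a]
    p *= P_M[a] if m else 1 - P_M[a]
    return p

# P(no burglary, no earthquake, alarm rings, both John and Mary call)
print(joint(False, False, True, True, True))
```

Ten free parameters (1+1+4+2+2) specify the whole distribution, versus 2^5 - 1 = 31 for the full joint table.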
Compactness and Node Ordering
• Compactness:
  - the more compact the model, the more tractable it is: fewer
    probability values requiring specification; less computer memory;
    more computationally efficient probability updates
  - overly dense networks fail to represent independencies explicitly
  - overly dense networks fail to represent the causal dependencies
    in the domain
• Compactness depends on getting the node ordering "right." The
  optimal order is to add the root causes first, then the variable(s)
  they influence directly, and continue until leaves are reached.
  - Fewer parents: smaller table, fewer free parameters (fewer
    probability values requiring specification)
  - Of course, one may not know the causal order of the variables; in
    that case, automated discovery methods should be used.
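The effect of node ordering on compactness can be made concrete by counting parameters. The "dense" parent sets below are a hypothetical worst case for a bad ordering, given purely for illustration:

```python
# Free-parameter counts for the cancer network under two orderings.
# Binary nodes: a node with k parents needs 2**k free numbers.

def total_params(parents):
    """Total free parameters of a network given its parent sets."""
    return sum(2 ** len(ps) for ps in parents.values())

# Causal ordering P, S, C, X, D: the compact network.
causal = {"P": [], "S": [], "C": ["P", "S"], "X": ["C"], "D": ["C"]}

# A hypothetical fully dense structure, as a bad ordering can produce
# when every earlier node must be kept as a parent (illustrative only).
dense = {"D": [], "X": ["D"], "P": ["D", "X"], "S": ["D", "X", "P"],
         "C": ["D", "X", "P", "S"]}

print(total_params(causal))   # 1 + 1 + 4 + 2 + 2 = 10
print(total_params(dense))    # 1 + 2 + 4 + 8 + 16 = 31
```

The fully dense case needs as many parameters as the full joint table; the causal ordering needs only ten.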
Ordering and Compactness
[Figure 2.3: alternative structures obtained using Pearl's network
construction algorithm over the nodes Pollution (P), Smoker (S),
Cancer (C), XRay (X) and Dyspnoea (D), with orderings:
(a) <D, X, C, P, S>; (b) <D, X, P, S, C>.]
• <P, S, C, X, D>: the causal ordering gives the compact network
• Network structure depends on the order of introduction, top-to-bottom
• Causes do not have to be before (upstream of) their effects, but
  putting them first leads to simpler networks, with fewer free
  parameters

Summary
• Semantics of Bayesian Networks
  - A Bayes Net = Topology (graph) + Local Conditional Probabilities
  - local interactions chain together to give global, indirect
    interactions
  - describe joint distributions using simple, local distributions
    (conditional probabilities)
• Conditional Independence
  - a node is independent of its preceding nodes given its parents
  - different types of reasoning
• Build a Bayesian Network
  - order the nodes, add them to the graph, satisfy conditional
    independence
  - compactness is important and depends on the order of the nodes
• Next lecture: inference in Bayesian Networks