Probability Calculus and Bayesian Networks

Probability Calculus
Bayesian Networks
Logic and Bayesian Networks
Part 1: Probability Calculus and Bayesian Networks
Jinbo Huang
Jinbo Huang
Probability Calculus and Bayesian Networks
1/ 31
Probability Calculus
Bayesian Networks
What This Course Is About
I
Probabilistic reasoning with Bayesian networks
I
Reasoning by logical encoding and compilation
Jinbo Huang
Probability Calculus and Bayesian Networks
2/ 31
Probability Calculus
Bayesian Networks
Probabilities
Updating Beliefs
Independence
Probabilities of Worlds
Worlds modeled with propositional variables
A world is complete instantiation of variables
P
ωi = 1
An event is a set of worlds
Jinbo Huang
Probability Calculus and Bayesian Networks
3/ 31
Probability Calculus
Bayesian Networks
Probabilities
Updating Beliefs
Independence
Probabilities of Events
def
Pr(α) =
X
Pr(ω)
ω∈α
I
Pr(Earthquake) = Pr(ω1 ) + Pr(ω2 ) + Pr(ω3 ) + Pr(ω4 ) = .1
Jinbo Huang
Probability Calculus and Bayesian Networks
4/ 31
Probability Calculus
Bayesian Networks
Probabilities
Updating Beliefs
Independence
Probabilities of Events
def
Pr(α) =
X
Pr(ω)
ω∈α
I
Pr(Earthquake) = Pr(ω1 ) + Pr(ω2 ) + Pr(ω3 ) + Pr(ω4 ) = .1
I
Pr(¬Burglary ) = .8
I
Pr(Alarm) = .2442
Jinbo Huang
Probability Calculus and Bayesian Networks
4/ 31
Probability Calculus
Bayesian Networks
Probabilities
Updating Beliefs
Independence
Propositional Sentences as Events
Pr((Earthquake ∨ Burglary ) ∧ ¬Alarm) =?
Jinbo Huang
Probability Calculus and Bayesian Networks
5/ 31
Probability Calculus
Bayesian Networks
Probabilities
Updating Beliefs
Independence
Propositional Sentences as Events
Pr((Earthquake ∨ Burglary ) ∧ ¬Alarm) =?
Interpret world ω as model (satisfying assignment) for
propositional sentence α
def
Pr(α) =
X
Pr(ω)
ω|=α
Jinbo Huang
Probability Calculus and Bayesian Networks
5/ 31
Probability Calculus
Bayesian Networks
Probabilities
Updating Beliefs
Independence
Updating Beliefs
Suppose we’re given evidence β that Alarm = true
Need to update Pr(.) to Pr(.|β)
Jinbo Huang
Probability Calculus and Bayesian Networks
6/ 31
Probability Calculus
Bayesian Networks
Probabilities
Updating Beliefs
Independence
Updating Beliefs
Pr(β|β) = 1, Pr(¬β|β) = 0
Partition worlds into those |= β and those |= ¬β
I
Pr(ω|β) = 0, for all ω |= ¬β
Jinbo Huang
Probability Calculus and Bayesian Networks
7/ 31
Probability Calculus
Bayesian Networks
Probabilities
Updating Beliefs
Independence
Updating Beliefs
Pr(β|β) = 1, Pr(¬β|β) = 0
Partition worlds into those |= β and those |= ¬β
I
Pr(ω|β) = 0, for all ω |= ¬β
I
Relative beliefs stay put, for ω |= β
Jinbo Huang
Probability Calculus and Bayesian Networks
7/ 31
Probability Calculus
Bayesian Networks
Probabilities
Updating Beliefs
Independence
Updating Beliefs
Pr(β|β) = 1, Pr(¬β|β) = 0
Partition worlds into those |= β and those |= ¬β
I
Pr(ω|β) = 0, for all ω |= ¬β
I
Relative beliefs stay put, for ω |= β
P
ω|=β Pr(ω) = Pr(β)
I
Jinbo Huang
Probability Calculus and Bayesian Networks
7/ 31
Probability Calculus
Bayesian Networks
Probabilities
Updating Beliefs
Independence
Updating Beliefs
Pr(β|β) = 1, Pr(¬β|β) = 0
Partition worlds into those |= β and those |= ¬β
I
Pr(ω|β) = 0, for all ω |= ¬β
I
Relative beliefs stay put, for ω |= β
P
Pr(ω) = Pr(β)
Pω|=β
ω|=β Pr(ω|β) = 1
I
I
Jinbo Huang
Probability Calculus and Bayesian Networks
7/ 31
Probability Calculus
Bayesian Networks
Probabilities
Updating Beliefs
Independence
Updating Beliefs
Pr(β|β) = 1, Pr(¬β|β) = 0
Partition worlds into those |= β and those |= ¬β
I
Pr(ω|β) = 0, for all ω |= ¬β
I
I
Relative beliefs stay put, for ω |= β
P
Pr(ω) = Pr(β)
Pω|=β
ω|=β Pr(ω|β) = 1
I
Above imply we normalize by constant Pr(β)
I
Jinbo Huang
Probability Calculus and Bayesian Networks
7/ 31
Probability Calculus
Bayesian Networks
Probabilities
Updating Beliefs
Independence
Updating Beliefs
Pr(β|β) = 1, Pr(¬β|β) = 0
Partition worlds into those |= β and those |= ¬β
I
Pr(ω|β) = 0, for all ω |= ¬β
I
Relative beliefs stay put, for ω |= β
P
Pr(ω) = Pr(β)
Pω|=β
ω|=β Pr(ω|β) = 1
I
I
Above imply we normalize by constant Pr(β)
0,
if ω |= ¬β
def
Pr(ω|β) =
Pr(ω)/ Pr(β), if ω |= β
I
Jinbo Huang
Probability Calculus and Bayesian Networks
7/ 31
Probability Calculus
Bayesian Networks
Probabilities
Updating Beliefs
Independence
Updating Beliefs
Jinbo Huang
Probability Calculus and Bayesian Networks
8/ 31
Probability Calculus
Bayesian Networks
Probabilities
Updating Beliefs
Independence
Bayes Conditioning
Pr(α|β) =
X
Pr(ω|β)
ω|=α
=
X
ω|=α, ω|=β
=
X
=
Pr(ω|β) =
X
Pr(ω|β)
ω|=α∧β
Pr(ω)/ Pr(β) =
ω|=α∧β
=
Pr(ω|β)
ω|=α, ω|=¬β
ω|=α, ω|=β
X
X
Pr(ω|β) +
X
1
Pr(ω)
Pr(β)
ω|=α∧β
Pr(α ∧ β)
Pr(β)
Jinbo Huang
Probability Calculus and Bayesian Networks
9/ 31
Probability Calculus
Bayesian Networks
Probabilities
Updating Beliefs
Independence
Bayes Conditioning
Pr(Burglary )
=
Jinbo Huang
.2
Probability Calculus and Bayesian Networks
10/ 31
Probability Calculus
Bayesian Networks
Probabilities
Updating Beliefs
Independence
Bayes Conditioning
Pr(Burglary )
Pr(Burglary |Alarm)
=
≈
Jinbo Huang
.2
.741 ↑
Probability Calculus and Bayesian Networks
10/ 31
Probability Calculus
Bayesian Networks
Probabilities
Updating Beliefs
Independence
Bayes Conditioning
Pr(Burglary )
Pr(Burglary |Alarm)
Pr(Burglary |Alarm ∧ Earthquake)
Jinbo Huang
=
≈
≈
.2
.741 ↑
.253 ↓
Probability Calculus and Bayesian Networks
10/ 31
Probability Calculus
Bayesian Networks
Probabilities
Updating Beliefs
Independence
Bayes Conditioning
Pr(Burglary )
Pr(Burglary |Alarm)
Pr(Burglary |Alarm ∧ Earthquake)
Pr(Burglary |Alarm ∧ ¬Earthquake)
Jinbo Huang
=
≈
≈
≈
.2
.741 ↑
.253 ↓
.957 ↑
Probability Calculus and Bayesian Networks
10/ 31
Probability Calculus
Bayesian Networks
Probabilities
Updating Beliefs
Independence
Independence
Pr(Earthquake)
Pr(Earthquake|Burglary )
=
=
.1
.1
Jinbo Huang
Probability Calculus and Bayesian Networks
11/ 31
Probability Calculus
Bayesian Networks
Probabilities
Updating Beliefs
Independence
Independence
Pr(Earthquake)
Pr(Earthquake|Burglary )
=
=
.1
.1
Event α independent of event β if
Pr(α|β) = Pr(α) or Pr(β) = 0
Jinbo Huang
Probability Calculus and Bayesian Networks
11/ 31
Probability Calculus
Bayesian Networks
Probabilities
Updating Beliefs
Independence
Independence
Pr(Earthquake)
Pr(Earthquake|Burglary )
=
=
.1
.1
Event α independent of event β if
Pr(α|β) = Pr(α) or Pr(β) = 0
Or equivalently
Pr(α ∧ β) = Pr(α) Pr(β)
Jinbo Huang
Probability Calculus and Bayesian Networks
11/ 31
Probability Calculus
Bayesian Networks
Probabilities
Updating Beliefs
Independence
Independence
Pr(Earthquake)
Pr(Earthquake|Burglary )
=
=
.1
.1
Event α independent of event β if
Pr(α|β) = Pr(α) or Pr(β) = 0
Or equivalently
Pr(α ∧ β) = Pr(α) Pr(β)
Independence is symmetric
Jinbo Huang
Probability Calculus and Bayesian Networks
11/ 31
Probability Calculus
Bayesian Networks
Probabilities
Updating Beliefs
Independence
Conditional Independence
Independence is a dynamic notion
Burglary independent of Earthquake, but not after accepting
evidence Alarm
Pr(Burglary |Alarm)
Pr(Burglary |Alarm ∧ Earthquake)
Jinbo Huang
≈
≈
.741
.253
Probability Calculus and Bayesian Networks
12/ 31
Probability Calculus
Bayesian Networks
Probabilities
Updating Beliefs
Independence
Conditional Independence
Independence is a dynamic notion
Burglary independent of Earthquake, but not after accepting
evidence Alarm
Pr(Burglary |Alarm)
Pr(Burglary |Alarm ∧ Earthquake)
≈
≈
.741
.253
α independent of β given γ if
Pr(α ∧ β|γ) = Pr(α|γ) Pr(β|γ) or Pr(γ) = 0
Jinbo Huang
Probability Calculus and Bayesian Networks
12/ 31
Probability Calculus
Bayesian Networks
Probabilities
Updating Beliefs
Independence
Variable Independence
For disjoint sets of variables X, Y, and Z
IPr (X, Z, Y) : X independent of Y given Z in Pr
Means x independent of y given z for all instantiations of x, y, z
Jinbo Huang
Probability Calculus and Bayesian Networks
13/ 31
Probability Calculus
Bayesian Networks
Probabilities
Updating Beliefs
Independence
Chain Rule
Pr(α1 ∧ α2 ∧ . . . ∧ αn )
= Pr(α1 |α2 ∧ . . . ∧ αn ) Pr(α2 |α3 ∧ . . . ∧ αn ) . . . Pr(αn )
Jinbo Huang
Probability Calculus and Bayesian Networks
14/ 31
Probability Calculus
Bayesian Networks
Probabilities
Updating Beliefs
Independence
Chain Rule
Pr(α1 ∧ α2 ∧ . . . ∧ αn )
= Pr(α1 |α2 ∧ . . . ∧ αn ) Pr(α2 |α3 ∧ . . . ∧ αn ) . . . Pr(αn )
Pr(α1 ∧ α2 ∧ . . . ∧ αn ) Pr(α2 ∧ . . . ∧ αn )
. . . Pr(αn )
=
Pr(α2 ∧ . . . ∧ αn ) Pr(α3 ∧ . . . ∧ αn )
Jinbo Huang
Probability Calculus and Bayesian Networks
14/ 31
Probability Calculus
Bayesian Networks
Capturing Independence Graphically
Conditional Probability Tables
Bayesian Networks
Probabilistic Reasoning
Basic task: belief updating in face of evidence
Can be done in principle using equations we’ve seen
I
I
Pr(α|β) = Pr(α ∧ β)/ Pr(β)
def P
Pr(γ) =
ω|=γ Pr(ω)
Inefficient, requires joint probability tables (exponential)
Also a modeling difficulty
I
Specifying joint probabilities tedious
I
Beliefs (e.g., independencies) held by human experts not
easily enforced
Jinbo Huang
Probability Calculus and Bayesian Networks
15/ 31
Probability Calculus
Bayesian Networks
Capturing Independence Graphically
Conditional Probability Tables
Bayesian Networks
Capturing Independence Graphically
Jinbo Huang
Probability Calculus and Bayesian Networks
16/ 31
Probability Calculus
Bayesian Networks
Capturing Independence Graphically
Conditional Probability Tables
Bayesian Networks
Capturing Independence Graphically
Evidence R could
increase Pr(A),
Pr(C )
Jinbo Huang
Probability Calculus and Bayesian Networks
16/ 31
Probability Calculus
Bayesian Networks
Capturing Independence Graphically
Conditional Probability Tables
Bayesian Networks
Capturing Independence Graphically
Evidence R could
increase Pr(A),
Pr(C )
But not if we know
¬A
Jinbo Huang
Probability Calculus and Bayesian Networks
16/ 31
Probability Calculus
Bayesian Networks
Capturing Independence Graphically
Conditional Probability Tables
Bayesian Networks
Capturing Independence Graphically
Evidence R could
increase Pr(A),
Pr(C )
But not if we know
¬A
C independent of R
given ¬A
Jinbo Huang
Probability Calculus and Bayesian Networks
16/ 31
Probability Calculus
Bayesian Networks
Capturing Independence Graphically
Conditional Probability Tables
Bayesian Networks
Capturing Independence Graphically
Parents(V): variables with
edge to V
Descendants(V): variables
to which ∃ directed path
from V
Nondescendants(V): all
except V and
Descendants(V)
Jinbo Huang
Probability Calculus and Bayesian Networks
17/ 31
Probability Calculus
Bayesian Networks
Capturing Independence Graphically
Conditional Probability Tables
Bayesian Networks
Markovian Assumptions
Every variable is conditionally independent of its nondescendants
given its parents
I (V , Parents(V ), Nondescendants(V ))
Jinbo Huang
Probability Calculus and Bayesian Networks
18/ 31
Probability Calculus
Bayesian Networks
Capturing Independence Graphically
Conditional Probability Tables
Bayesian Networks
Markovian Assumptions
Every variable is conditionally independent of its nondescendants
given its parents
I (V , Parents(V ), Nondescendants(V ))
Given direct causes of variable, our beliefs in that variable are no
longer influenced by any other variable except possibly by its effects
Jinbo Huang
Probability Calculus and Bayesian Networks
18/ 31
Probability Calculus
Bayesian Networks
Capturing Independence Graphically
Conditional Probability Tables
Bayesian Networks
Capturing Independence Graphically
I (C , A, {B, E , R})
I (R, E , {A, B, C })
I (A, {B, E }, R)
I (B, ∅, {E , R})
I (E , ∅, B)
Jinbo Huang
Probability Calculus and Bayesian Networks
19/ 31
Probability Calculus
Bayesian Networks
Capturing Independence Graphically
Conditional Probability Tables
Bayesian Networks
Conditional Probability Tables
DAG G together with Markov (G ) declares a set of independencies,
but does not define Pr
Pr is uniquely defined if a conditional probability table (CPT) is
provided for each variable X
CPT: Pr(x|u) for every value x of variable X and every
instantiation u of parents U
Jinbo Huang
Probability Calculus and Bayesian Networks
20/ 31
Probability Calculus
Bayesian Networks
Capturing Independence Graphically
Conditional Probability Tables
Bayesian Networks
Conditional Probability Tables
Pr(c|a)
Pr(r |e)
Pr(a|b, e)
Pr(e)
Pr(b)
Jinbo Huang
Probability Calculus and Bayesian Networks
21/ 31
Probability Calculus
Bayesian Networks
Capturing Independence Graphically
Conditional Probability Tables
Bayesian Networks
Conditional Probability Tables
Jinbo Huang
Probability Calculus and Bayesian Networks
22/ 31
Probability Calculus
Bayesian Networks
Capturing Independence Graphically
Conditional Probability Tables
Bayesian Networks
Conditional Probability Tables
Half of probabilities
redundant
Only need 10
probabilities for
whole DAG
Jinbo Huang
Probability Calculus and Bayesian Networks
22/ 31
Probability Calculus
Bayesian Networks
Capturing Independence Graphically
Conditional Probability Tables
Bayesian Networks
Bayesian Network
A pair (G , Θ) where
I
G is DAG over variables Z, called network structure
I
Θ is set of CPTs, one for each variable in Z, called network
parametrization
Jinbo Huang
Probability Calculus and Bayesian Networks
23/ 31
Probability Calculus
Bayesian Networks
Capturing Independence Graphically
Conditional Probability Tables
Bayesian Networks
Bayesian Network
ΘX |U : CPT for X and
parents U
X U: network family
θx|u = Pr(x|u): network
parameter
I
θb|a = .2, θb|a = .8
I
θb|a = .75, θb|a = .25
P
x θx|u = 1 for every
parent instantiation u
Jinbo Huang
Probability Calculus and Bayesian Networks
24/ 31
Probability Calculus
Bayesian Networks
Capturing Independence Graphically
Conditional Probability Tables
Bayesian Networks
Bayesian Network
Network instantiation: e.g.,
a, b, c, d, e
θx|u ∼ z: instantiations xu &
z are compatible
θa , θb|a , θc|a , θd|b,c , θe|c all ∼ z
above
Exactly 1 from each CPT
Jinbo Huang
Probability Calculus and Bayesian Networks
25/ 31
Probability Calculus
Bayesian Networks
Capturing Independence Graphically
Conditional Probability Tables
Bayesian Networks
Chain Rule
Pr(α1 ∧ α2 ∧ . . . ∧ αn )
= Pr(α1 |α2 ∧ . . . ∧ αn ) Pr(α2 |α3 ∧ . . . ∧ αn ) . . . Pr(αn )
Jinbo Huang
Probability Calculus and Bayesian Networks
26/ 31
Probability Calculus
Bayesian Networks
Capturing Independence Graphically
Conditional Probability Tables
Bayesian Networks
Chain Rule
Pr(α1 ∧ α2 ∧ . . . ∧ αn )
= Pr(α1 |α2 ∧ . . . ∧ αn ) Pr(α2 |α3 ∧ . . . ∧ αn ) . . . Pr(αn )
Pr(e, d, c, b, a)
= Pr(e|d, c, b, a) Pr(d|c, b, a) Pr(c|b, a) Pr(b|a) Pr(a)
Jinbo Huang
Probability Calculus and Bayesian Networks
26/ 31
Probability Calculus
Bayesian Networks
Capturing Independence Graphically
Conditional Probability Tables
Bayesian Networks
Joint Probabilities
Pr(e, d, c, b, a)
=
Jinbo Huang
Pr(e|d, c, b, a)
· Pr(d|c, b, a)
· Pr(c|b, a)
· Pr(b|a)
· Pr(a)
Probability Calculus and Bayesian Networks
27/ 31
Probability Calculus
Bayesian Networks
Capturing Independence Graphically
Conditional Probability Tables
Bayesian Networks
Joint Probabilities
Pr(e, d, c, b, a)
=
Pr(e|d, c, b, a)
· Pr(d|c, b, a)
· Pr(c|b, a)
· Pr(b|a)
· Pr(a)
= θe|c
Jinbo Huang
Probability Calculus and Bayesian Networks
27/ 31
Probability Calculus
Bayesian Networks
Capturing Independence Graphically
Conditional Probability Tables
Bayesian Networks
Joint Probabilities
Pr(e, d, c, b, a)
=
Pr(e|d, c, b, a)
· Pr(d|c, b, a)
· Pr(c|b, a)
· Pr(b|a)
· Pr(a)
= θe|c θd|b,c
Jinbo Huang
Probability Calculus and Bayesian Networks
27/ 31
Probability Calculus
Bayesian Networks
Capturing Independence Graphically
Conditional Probability Tables
Bayesian Networks
Joint Probabilities
Pr(e, d, c, b, a)
=
Pr(e|d, c, b, a)
· Pr(d|c, b, a)
· Pr(c|b, a)
· Pr(b|a)
· Pr(a)
= θe|c θd|b,c θc|a θb|a θa
Jinbo Huang
Probability Calculus and Bayesian Networks
27/ 31
Probability Calculus
Bayesian Networks
Capturing Independence Graphically
Conditional Probability Tables
Bayesian Networks
Joint Probabilities
Pr(e, d, c, b, a)
=
Pr(e|d, c, b, a)
· Pr(d|c, b, a)
· Pr(c|b, a)
· Pr(b|a)
· Pr(a)
= θe|c θd|b,c θc|a θb|a θa
= (.1)(.9)(.2)(.2)(.6)
= .0216
Jinbo Huang
Probability Calculus and Bayesian Networks
27/ 31
Probability Calculus
Bayesian Networks
Capturing Independence Graphically
Conditional Probability Tables
Bayesian Networks
Joint Probabilities
Pr(e, d, c, b, a)
=
Pr(e|d, c, b, a)
· Pr(d|c, b, a)
· Pr(c|b, a)
· Pr(b|a)
· Pr(a)
= θe|c θd|b,c θc|a θb|a θa
= (.1)(.9)(.2)(.2)(.6)
= .0216
Every variable appears before
its parents
Jinbo Huang
Probability Calculus and Bayesian Networks
27/ 31
Probability Calculus
Bayesian Networks
Capturing Independence Graphically
Conditional Probability Tables
Bayesian Networks
Bayesian Network
Independence constraints imposed by network structure (DAG) and
numeric constraints imposed by network parametrization (CPTs)
are satisfied by unique Pr
def
Pr(z) =
Y
θx|u
θx|u ∼z
Bayesian network is implicit representation of this probability
distribution
Jinbo Huang
Probability Calculus and Bayesian Networks
28/ 31
Probability Calculus
Bayesian Networks
Capturing Independence Graphically
Conditional Probability Tables
Bayesian Networks
Compactness of Bayesian Networks
Let d be max variable domain size, k max number of parents
Size of any CPT = O(d k+1 )
Total number of network parameters = O(n · d k+1 ) for n-variable
network
Compact as long as k is relatively small, compared with joint
probability table (up to d n entries)
Jinbo Huang
Probability Calculus and Bayesian Networks
29/ 31
Probability Calculus
Bayesian Networks
Capturing Independence Graphically
Conditional Probability Tables
Bayesian Networks
Reasoning with Bayesian Networks
Pr(Call)?
Pr(Burglary |Call)?
argmax
Pr(A, B, E |r , c)?
...
Jinbo Huang
Probability Calculus and Bayesian Networks
30/ 31
Probability Calculus
Bayesian Networks
Capturing Independence Graphically
Conditional Probability Tables
Bayesian Networks
Summary
Probabilities of worlds & events, Bayes conditioning, independence,
chain rule
Capturing independence graphically, Markovian assumptions,
conditional probabilities tables, Bayesian networks, chain rule
Jinbo Huang
Probability Calculus and Bayesian Networks
31/ 31