INFERENCE IN BAYESIAN
NETWORKS
AGENDA
Reading off independence assumptions
Efficient inference in Bayesian Networks
Top-down inference
Variable elimination
Monte-Carlo methods
SOME APPLICATIONS OF BN
Medical diagnosis
Troubleshooting of hardware/software systems
Fraud/uncollectible debt detection
Data mining
Analysis of genetic sequences
Data interpretation, computer vision, image understanding
MORE COMPLICATED
SINGLY-CONNECTED BELIEF NET
[Figure: car diagnosis network — Battery, Radio, Gas, SparkPlugs, Starts, Moves]
[Figure: image-region labeling network — Region = {Sky, Tree, Grass, Rock}, nodes R1, R2, R3, R4 linked by "Above" relations]
[Figure: a BN used to evaluate insurance risks]
BN FROM LAST LECTURE
[Figure: directed acyclic graph — causes (Burglary, Earthquake) → Alarm → effects (JohnCalls, MaryCalls)]
Intuitive meaning of an arc from x to y: "x has direct influence on y"
ARCS DO NOT NECESSARILY ENCODE
CAUSALITY!
[Figure: two three-node networks over A, B, and C, with arc directions reversed]
Two BNs that can encode the same joint probability distribution
READING OFF INDEPENDENCE
RELATIONSHIPS
[Figure: chain A → B → C]
Given B, does the value of A affect the probability of C?
P(C|B,A) = P(C|B)?
No, it does not: C's parent (B) is given, so C is independent of its non-descendants (A).
Independence is symmetric: C ⊥ A | B  ⇒  A ⊥ C | B
WHAT DOES THE BN ENCODE?
[Figure: Burglary, Earthquake → Alarm → JohnCalls, MaryCalls]
Burglary ⊥ Earthquake
JohnCalls ⊥ MaryCalls | Alarm
JohnCalls ⊥ Burglary | Alarm
JohnCalls ⊥ Earthquake | Alarm
MaryCalls ⊥ Burglary | Alarm
MaryCalls ⊥ Earthquake | Alarm
A node is independent of its non-descendants, given its parents.
READING OFF INDEPENDENCE RELATIONSHIPS
[Figure: the alarm network]
How about Burglary ⊥ Earthquake | Alarm?
No! Why?
P(B∧E|A) = P(A|B,E)P(B∧E)/P(A) = 0.00075
P(B|A)P(E|A) = 0.086
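To sanity-check these two numbers, here is a minimal plain-Python sketch (the names `joint`, `P_A`, etc. are mine, not from the slides) that enumerates the joint P(B,E,A) of the alarm network, whose CPTs appear on the slides below:

```python
from itertools import product

# Alarm-network CPTs from the slides
P_B = 0.001                                   # P(Burglary=true)
P_E = 0.002                                   # P(Earthquake=true)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(Alarm=true | B, E)

def joint(b, e, a):
    """P(B=b, E=e, A=a) via the chain rule of the network."""
    pb = P_B if b else 1 - P_B
    pe = P_E if e else 1 - P_E
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    return pb * pe * pa

p_a = sum(joint(b, e, True) for b, e in product((True, False), repeat=2))
p_be_a = joint(True, True, True) / p_a                             # P(B∧E|A)
p_b_a = sum(joint(True, e, True) for e in (True, False)) / p_a     # P(B|A)
p_e_a = sum(joint(b, True, True) for b in (True, False)) / p_a     # P(E|A)
print(p_be_a)          # ≈ 0.00075
print(p_b_a * p_e_a)   # ≈ 0.086 — not equal, so B and E are dependent given A
```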
READING OFF INDEPENDENCE
RELATIONSHIPS
[Figure: the alarm network]
How about Burglary ⊥ Earthquake | JohnCalls?
No! Why?
Knowing JohnCalls affects the probability of Alarm, which makes Burglary and Earthquake dependent.
INDEPENDENCE RELATIONSHIPS
Rough intuition (this holds for tree-like graphs,
polytrees):
Evidence on the (directed) road between two
variables makes them independent
Evidence on an “A” node makes descendants
independent
Evidence on a “V” node, or below the V, makes the
ancestors of the variables dependent (otherwise
they are independent)
Formal property in the general case: d-separation (see R&N)
BENEFITS OF SPARSE MODELS
Modeling
Fewer relationships need to be encoded (either
through understanding or statistics)
Large networks can be built up from smaller ones
Intuition
Dependencies/independencies between variables can
be inferred through network structures
Tractable inference
TOP-DOWN INFERENCE
Suppose we want to compute P(Alarm)

  Burglary: P(B) = 0.001        Earthquake: P(E) = 0.002

  B E | P(A|B,E)                A | P(J|A)        A | P(M|A)
  T T | 0.95                    T | 0.90          T | 0.70
  T F | 0.94                    F | 0.05          F | 0.01
  F T | 0.29
  F F | 0.001

1. P(A) = Σb,e P(A,b,e)
2. P(A) = Σb,e P(A|b,e) P(b) P(e)
3. P(A) = P(A|B,E)P(B)P(E) + P(A|B,¬E)P(B)P(¬E) +
          P(A|¬B,E)P(¬B)P(E) + P(A|¬B,¬E)P(¬B)P(¬E)
4. P(A) = 0.95*0.001*0.002 + 0.94*0.001*0.998 +
          0.29*0.999*0.002 + 0.001*0.999*0.998
        = 0.00252
TOP-DOWN INFERENCE
Now, suppose we want to compute P(MaryCalls)
1. P(M) = P(M|A)P(A) + P(M|¬A)P(¬A)
2. P(M) = 0.70*0.00252 + 0.01*(1 - 0.00252)
        = 0.0117
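A minimal sketch of both computations in Python (the dictionary CPTs and names like `p_alarm` are mine; the numbers are the slides' CPTs):

```python
# Alarm-network CPTs from the slides
P_B, P_E = 0.001, 0.002
P_A = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}
P_M = {1: 0.70, 0: 0.01}   # P(MaryCalls=1 | A)

# P(A) = sum over b,e of P(A|b,e) P(b) P(e)
p_alarm = sum(P_A[(b, e)]
              * (P_B if b else 1 - P_B)
              * (P_E if e else 1 - P_E)
              for b in (0, 1) for e in (0, 1))
print(round(p_alarm, 5))   # 0.00252

# P(M) = P(M|A)P(A) + P(M|¬A)P(¬A)
p_mary = P_M[1] * p_alarm + P_M[0] * (1 - p_alarm)
print(round(p_mary, 4))    # 0.0117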
TOP-DOWN INFERENCE WITH EVIDENCE
Suppose we want to compute P(Alarm|Earthquake)
1. P(A|e) = Σb P(A,b|e)
2. P(A|e) = Σb P(A|b,e) P(b)
3. P(A|e) = 0.95*0.001 + 0.29*0.999
          = 0.29066
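The same three steps fit in two lines of Python (a sketch, with the slides' CPT numbers):

```python
# P(A|E=true) = P(A|B,E)P(B) + P(A|¬B,E)P(¬B)
P_B = 0.001
print(0.95 * P_B + 0.29 * (1 - P_B))   # 0.29066
```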
TOP-DOWN INFERENCE
Only works if the graph of ancestors of a variable
is a polytree
Evidence given on ancestor(s) of the query
variable
Efficient:
O(d·2^k) time, where d is the number of ancestors of the variable and k is a bound on the # of parents
Evidence on an ancestor cuts off influence of portion
of graph above evidence node
QUERYING THE BN
The BN gives P(T|C)
What about P(C|T)?
Cavity: P(C) = 0.1

C | P(T|C)
T | 0.4
F | 0.01111
BAYES’ RULE
P(A∧B) = P(A|B) P(B)
       = P(B|A) P(A)
So… P(A|B) = P(B|A) P(A) / P(B)
APPLYING BAYES' RULE
Let A be a cause, B be an effect, and let's say we know P(B|A) and P(A) (conditional probability tables)
What's P(B)?
P(B) = Σa P(B,A=a)              [marginalization]
P(B,A=a) = P(B|A=a)P(A=a)       [conditional probability]
So, P(B) = Σa P(B|A=a) P(A=a)
APPLYING BAYES' RULE
Let A be a cause, B be an effect, and let's say we know P(B|A) and P(A) (conditional probability tables)
What's P(A|B)?
P(A|B) = P(B|A)P(A)/P(B)        [Bayes' rule]
P(B) = Σa P(B|A=a) P(A=a)       [last slide]
So, P(A|B) = P(B|A)P(A) / [Σa P(B|A=a) P(A=a)]
HOW DO WE READ THIS?
P(A|B) = P(B|A)P(A) / [Σa P(B|A=a) P(A=a)]
[An equation that holds for all values A can take on, and all values B can take on]
P(A=a|B=b) = P(B=b|A=a)P(A=a) / [Σa P(B=b|A=a) P(A=a)]
Are these the same a? NO!
P(A=a|B=b) = P(B=b|A=a)P(A=a) / [Σa' P(B=b|A=a') P(A=a')]
Be careful about indices!
QUERYING THE BN
[Figure: Cavity → Toothache, with the CPTs above]
The BN gives P(T|C)
What about P(C|T)?
P(Cavity|Toothache) = P(Toothache|Cavity) P(Cavity) / P(Toothache)    [Bayes' rule]
Denominator computed by summing out the numerator over Cavity and ¬Cavity
Querying a BN is just applying Bayes' rule on a larger scale…
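A sketch of this query with the slide's numbers (variable names are mine):

```python
# Cavity/Toothache CPTs from the slides
p_c = 0.1                       # P(Cavity)
p_t_c, p_t_nc = 0.4, 0.01111    # P(Toothache|Cavity), P(Toothache|¬Cavity)

# Denominator: sum the numerator out over Cavity and ¬Cavity
p_t = p_t_c * p_c + p_t_nc * (1 - p_c)

print(p_t_c * p_c / p_t)        # P(Cavity | Toothache) ≈ 0.8
```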
PERFORMING INFERENCE
Variables X
Have evidence set E=e, query variable Q
Want to compute the posterior probability
distribution over Q, given E=e
Let the non-evidence variables be Y (= X \ E)
Straightforward method:
1. Compute the joint P(Y,E=e)
2. Marginalize to get P(Q,E=e)
3. Divide by P(E=e) to get P(Q|E=e)
INFERENCE IN THE ALARM EXAMPLE
P(J|MaryCalls) = ??
[Figure: the alarm network; evidence E=e is MaryCalls=true, query Q is JohnCalls]
1. P(J,A,B,E,MaryCalls) = P(J|A) P(MaryCalls|A) P(A|B,E) P(B) P(E)
   (the chain rule P(x1,x2,…,xn) = Πi=1,…,n P(xi|parents(Xi)) applied to this
   slice of the full joint distribution table: 2^4 entries)
2. P(J,MaryCalls) = Σa,b,e P(J, A=a, B=b, E=e, MaryCalls)
   (2 entries: one for JohnCalls, the other for ¬JohnCalls)
3. P(J|MaryCalls) = P(J,MaryCalls) / P(MaryCalls)
                  = P(J,MaryCalls) / (Σj P(j,MaryCalls))
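A sketch of steps 1–3 by brute-force enumeration (the helper `pr` and the dictionary CPTs are mine, not from the slides):

```python
from itertools import product

# Alarm-network CPTs from the slides
P_B, P_E = 0.001, 0.002
P_A = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}
P_J = {1: 0.90, 0: 0.05}   # P(JohnCalls=1 | A)
P_M = {1: 0.70, 0: 0.01}   # P(MaryCalls=1 | A)

def pr(p, x):
    """P(X=x) given P(X=1) = p."""
    return p if x else 1 - p

# Steps 1-2: P(J, M=1) = sum over a,b,e of P(J|a) P(M=1|a) P(a|b,e) P(b) P(e)
p_jm = {j: sum(pr(P_J[a], j) * P_M[a] * pr(P_A[(b, e)], a)
               * pr(P_B, b) * pr(P_E, e)
               for a, b, e in product((0, 1), repeat=3))
        for j in (0, 1)}

# Step 3: divide by P(M=1), obtained by summing over both values of J
print(p_jm[1] / (p_jm[0] + p_jm[1]))   # P(J=1 | MaryCalls=1) ≈ 0.178
```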
HOW EXPENSIVE?
P(X) = P(x1,x2,…,xn) = Πi=1,…,n P(xi|parents(Xi))
Straightforward method:
1. Use the above to compute P(Y,E=e)
2. P(Q,E=e) = Σy1 … Σyk P(Y,E=e)
3. P(E=e) = Σq P(Q,E=e)   (normalization factor — no big deal once we have P(Q,E=e))
Step 1: O(2^(n-|E|)) entries!
Can we do better?
VARIABLE ELIMINATION
Consider the linear network X1 → X2 → X3
P(X) = P(X1) P(X2|X1) P(X3|X2)
P(X3) = Σx1 Σx2 P(x1) P(x2|x1) P(X3|x2)
      = Σx2 P(X3|x2) Σx1 P(x1) P(x2|x1)      [rearrange the equation]
      = Σx2 P(X3|x2) P(x2)
The inner sum is computed for each value of X2; cache P(x2) and reuse it for both values of X3!
How many * and + saved?
*: 2*4*2 = 16 vs. 4+4 = 8
+: 2*3 = 6 vs. 2+1 = 3
Can lead to huge gains in larger networks
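The slide specifies no CPTs for X1, X2, X3, so the numbers below are made-up placeholders; the code is a sketch of the elimination order above (inner sum cached, then the outer sum):

```python
# Made-up binary CPTs (the slide specifies none)
p_x1 = [0.6, 0.4]                          # P(X1)
p_x2_x1 = [[0.7, 0.3], [0.2, 0.8]]         # P(X2|X1): row x1, column x2
p_x3_x2 = [[0.9, 0.1], [0.5, 0.5]]         # P(X3|X2): row x2, column x3

# Inner sum first: P(x2) = sum_x1 P(x1) P(x2|x1) — computed once, cached
p_x2 = [sum(p_x1[x1] * p_x2_x1[x1][x2] for x1 in (0, 1)) for x2 in (0, 1)]

# Outer sum: P(X3) = sum_x2 P(X3|x2) P(x2) — reuses the cached P(x2)
p_x3 = [sum(p_x3_x2[x2][x3] * p_x2[x2] for x2 in (0, 1)) for x3 in (0, 1)]
print(p_x3)   # a distribution over X3; entries sum to 1
```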
VE IN ALARM EXAMPLE
P(E|j,m) = P(E,j,m)/P(j,m)
P(E,j,m) = Σa Σb P(E) P(b) P(a|E,b) P(j|a) P(m|a)
         = P(E) Σb P(b) Σa P(a|E,b) P(j|a) P(m|a)
         = P(E) Σb P(b) P(j,m|E,b)      [computed for all values of E,b]
         = P(E) P(j,m|E)                [computed for all values of E]
WHAT ORDER TO PERFORM VE?
For tree-like BNs (polytrees), order so parents
come before children
Size of each intermediate probability table is 2^(# of parents of a node)
If the number of parents of a node is bounded,
then VE is linear time!
Other networks: intermediate factors may
become large
NON-POLYTREE NETWORKS
P(D) = Σa Σb Σc P(a) P(b|a) P(c|a) P(D|b,c)
     = Σb Σc P(D|b,c) Σa P(a) P(b|a) P(c|a)
No more simplifications…
[Figure: diamond network A → B, A → C, B → D, C → D]
APPROXIMATE INFERENCE TECHNIQUES
Based on the idea of Monte Carlo simulation
Basic idea:
To estimate the probability of a coin flipping heads, I
can flip it a huge number of times and count the
fraction of heads observed
Conditional simulation:
To estimate the probability P(H) that a coin picked out of bucket B flips heads, I can (see the sketch after this list):
1. Pick a coin C out of B (occurs with probability P(C))
2. Flip C and observe whether it flips heads (occurs with probability P(H|C))
3. Put C back and repeat from step 1 many times
4. Return the fraction of heads observed (an estimate of P(H))
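A toy sketch of this bucket simulation (the bucket contents and coin biases are invented for illustration):

```python
import random

# Bucket B: made-up P(heads) values for four coins
bucket = [0.9, 0.5, 0.5, 0.1]

def trial():
    coin = random.choice(bucket)    # step 1: pick a coin C (P(C) = 1/4 each)
    return random.random() < coin   # step 2: flip C; heads with prob P(H|C)

n = 100_000                         # steps 3-4: repeat, return fraction of heads
print(sum(trial() for _ in range(n)) / n)   # ≈ 0.5, the average of the biases
```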
APPROXIMATE INFERENCE: MONTE-CARLO
SIMULATION
Sample from the joint distribution
[Figure: the alarm network; one sample drawn: (B=0, E=0, A=0, J=1, M=0)]
APPROXIMATE INFERENCE: MONTE-CARLO
SIMULATION
As more samples are generated, the distribution of the samples approaches the joint distribution!
(B=0, E=0, A=0, J=1, M=0)
(B=0, E=0, A=0, J=0, M=0)
(B=0, E=0, A=0, J=0, M=0)
(B=1, E=0, A=1, J=1, M=0)
APPROXIMATE INFERENCE: MONTE-CARLO
SIMULATION
Inference: given evidence E=e (e.g., J=1)
Remove the samples that conflict:
(B=0, E=0, A=0, J=1, M=0)   [kept]
(B=0, E=0, A=0, J=0, M=0)   [rejected]
(B=0, E=0, A=0, J=0, M=0)   [rejected]
(B=1, E=0, A=1, J=1, M=0)   [kept]
Distribution of the remaining samples approximates the conditional distribution!
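A sketch of this rejection-style Monte-Carlo inference on the alarm network (helper names like `flip` and `sample` are mine); it estimates P(B=1|J=1) for the evidence J=1 used above:

```python
import random

# Alarm-network CPTs from the slides
P_B, P_E = 0.001, 0.002
P_A = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}
P_J = {1: 0.90, 0: 0.05}   # P(JohnCalls=1 | A)
P_M = {1: 0.70, 0: 0.01}   # P(MaryCalls=1 | A)

def flip(p):
    return 1 if random.random() < p else 0

def sample():
    """Draw one sample of (B, E, A, J, M) top-down from the joint."""
    b, e = flip(P_B), flip(P_E)
    a = flip(P_A[(b, e)])
    return b, e, a, flip(P_J[a]), flip(P_M[a])

# Keep only samples that agree with the evidence J=1, then read off B
kept = [s for s in (sample() for _ in range(1_000_000)) if s[3] == 1]
print(sum(s[0] for s in kept) / len(kept))   # P(B=1 | J=1) ≈ 0.016
```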
HOW MANY SAMPLES?
Error of the estimate, for n samples, is O(1/√n) on average
Variance-reduction techniques
RARE EVENT PROBLEM
What if some events are really rare (e.g., burglary ∧ earthquake)?
# of samples must be huge to get a reasonable estimate
Solution: likelihood weighting
Enforce that each sample agrees with the evidence
While generating a sample, keep track of the ratio w of
(how likely the sampled value is to occur in the real world) to
(how likely you were to generate the sampled value)
LIKELIHOOD WEIGHTING
Suppose evidence Alarm & MaryCalls
Sample B,E with P=0.5
w=1
LIKELIHOOD WEIGHTING
Suppose evidence Alarm & MaryCalls
Sample B,E with P=0.5
w=0.008
Sampled: B=0, E=1
LIKELIHOOD WEIGHTING
Suppose evidence Alarm & MaryCalls
Sample B,E with P=0.5
w=0.0023
Sampled: B=0, E=1; A=1 is enforced, and the weight is updated to reflect the likelihood that this occurs
LIKELIHOOD WEIGHTING
Suppose evidence Alarm & MaryCalls
Sample B,E with P=0.5
w=0.0016
Sampled: B=0, E=1, A=1, M=1, J=1
LIKELIHOOD WEIGHTING
Suppose evidence Alarm & MaryCalls
Sample B,E with P=0.5
w=3.988
New sample: B=0, E=0
LIKELIHOOD WEIGHTING
Suppose evidence Alarm & MaryCalls
Sample B,E with P=0.5
w=0.004
Sampled: B=0, E=0; A=1 enforced
LIKELIHOOD WEIGHTING
Suppose evidence Alarm & MaryCalls
Sample B,E with P=0.5
w=0.0028
Sampled: B=0, E=0, A=1, M=1, J=1
LIKELIHOOD WEIGHTING
Suppose evidence Alarm & MaryCalls
Sample B,E with P=0.5
w=0.00375
New sample: B=1, E=0; A=1 enforced
LIKELIHOOD WEIGHTING
Suppose evidence Alarm & MaryCalls
Sample B,E with P=0.5
w=0.0026
Sampled: B=1, E=0, A=1, M=1, J=1
LIKELIHOOD WEIGHTING
Suppose evidence Alarm & MaryCalls
Sample B,E with P=0.5
w ≈ 5e-6 (≈ 0)
New sample: B=1, E=1, A=1, M=1, J=1
LIKELIHOOD WEIGHTING
Suppose evidence Alarm & MaryCalls
Sample B,E with P=0.5
w=0.0016: (B=0, E=1, A=1, M=1, J=1)
w=0.0028: (B=0, E=0, A=1, M=1, J=1)
w=0.0026: (B=1, E=0, A=1, M=1, J=1)
w≈0:      (B=1, E=1, A=1, M=1, J=1)
N=4 gives P(B|A,M) ≈ 0.371
Exact inference gives P(B|A,M) = 0.375
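A sketch of the full likelihood-weighting loop (helper names are mine; it follows the slides' convention of sampling B and E with probability 0.5 and folding the correction ratio into w):

```python
import random

# Alarm-network CPTs from the slides
P_B, P_E = 0.001, 0.002
P_A = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}
P_M = {1: 0.70, 0: 0.01}   # P(MaryCalls=1 | A)

def flip(p):
    return 1 if random.random() < p else 0

def weighted_sample():
    """One sample agreeing with the evidence A=1, M=1, plus its weight w."""
    b, e = flip(0.5), flip(0.5)   # sample B,E with P=0.5, as on the slides
    w = ((P_B if b else 1 - P_B) / 0.5) * ((P_E if e else 1 - P_E) / 0.5)
    w *= P_A[(b, e)]              # A=1 is enforced: weight by P(A=1|b,e)
    w *= P_M[1]                   # M=1 is enforced: weight by P(M=1|A=1)
    return b, w

samples = [weighted_sample() for _ in range(100_000)]
num = sum(w for b, w in samples if b == 1)
den = sum(w for _, w in samples)
print(num / den)   # P(B=1 | A=1, M=1) ≈ 0.37 (exact answer per slide: 0.375)
```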
RECAP
Efficient inference in BNs
Variable elimination
Approximate methods: Monte-Carlo sampling
NEXT LECTURE
Statistical learning: from data to distributions
R&N 20.1-2