Tutorial 7

Forward sampling, AgenaRisk and Quickscore
Tal Shor
Forward sampling
Using AgenaRisk, Fitting data to a network, and proof
Forward sampling reminder
Un-Weighted version
counter = 0
for i=[1,M]:
    s = sample from P(S)
    r = sample from P(R)
    h = sample from P(H | S=s, R=r)
    e = sample from P(E | S=s)
    g = sample from P(G | H=h, E=e)
    if (condition holds: g = 1):
        counter++
Output = counter / M
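The un-weighted loop above can be sketched in Python as follows. The tutorial gives no CPT values, so the numbers below (`P_S`, `P_R`, `P_H`, `P_E`, `P_G`) are hypothetical placeholders; the structure (S and R into H, S into E, H and E into G) is the one implied by the conditionals in the pseudocode. With the condition g = 1, the output estimates P(G=1), which the sketch compares against exact enumeration.

```python
import random
from itertools import product

# Hypothetical CPTs (the tutorial gives no numbers): P(X=1 | parents).
P_S = 0.3                                                     # P(S=1)
P_R = 0.2                                                     # P(R=1)
P_H = {(0, 0): 0.1, (0, 1): 0.6, (1, 0): 0.7, (1, 1): 0.9}    # P(H=1 | S=s, R=r)
P_E = {0: 0.2, 1: 0.8}                                        # P(E=1 | S=s)
P_G = {(0, 0): 0.05, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.95}  # P(G=1 | H=h, E=e)

def bernoulli(p, rng):
    """Draw 1 with probability p, else 0."""
    return 1 if rng.random() < p else 0

def forward_sample_unweighted(M, seed=0):
    """Sample every node in topological order and count samples with g = 1."""
    rng = random.Random(seed)
    counter = 0
    for _ in range(M):
        s = bernoulli(P_S, rng)
        r = bernoulli(P_R, rng)
        h = bernoulli(P_H[(s, r)], rng)
        e = bernoulli(P_E[s], rng)
        g = bernoulli(P_G[(h, e)], rng)
        if g == 1:                      # the condition from the slide
            counter += 1
    return counter / M

def exact_p_g1():
    """P(G=1) by summing the factorized joint over s, r, h, e."""
    total = 0.0
    for s, r, h, e in product((0, 1), repeat=4):
        p = (P_S if s else 1 - P_S) * (P_R if r else 1 - P_R)
        p *= P_H[(s, r)] if h else 1 - P_H[(s, r)]
        p *= P_E[s] if e else 1 - P_E[s]
        total += p * P_G[(h, e)]
    return total

estimate = forward_sample_unweighted(100_000)
print(round(estimate, 3), round(exact_p_g1(), 3))
```

With a fixed seed the run is reproducible; the sampling error shrinks roughly like 1/sqrt(M).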
Weighted version
counter = 0
for i=[1,M]:
    w = 1
    s = sample from P(S)
    r = sample from P(R)
    h = sample from P(H | S=s, R=r)
    e = 1 (set to true); w *= P(E=1 | S=s)
    g = sample from P(G | H=h, E=e)
    if (condition holds: g = 1):
        counter += w
Output = counter / M
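[Figure: the Bayesian network used in both versions, with nodes S, R, H, E, G and edges S → H, R → H, S → E, H → G, E → G]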
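A minimal Python sketch of the weighted version, reusing hypothetical CPT values since the tutorial gives none. E is clamped to 1 instead of sampled and each sample carries the weight w = P(E=1 | S=s), so the output estimates P(G=1, E=1); the sketch compares it with the exact value from the factorized joint.

```python
import random
from itertools import product

# Hypothetical CPTs (not from the tutorial): P(X=1 | parents).
P_S = 0.3
P_R = 0.2
P_H = {(0, 0): 0.1, (0, 1): 0.6, (1, 0): 0.7, (1, 1): 0.9}    # P(H=1 | S=s, R=r)
P_E = {0: 0.2, 1: 0.8}                                        # P(E=1 | S=s)
P_G = {(0, 0): 0.05, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.95}  # P(G=1 | H=h, E=e)

def bernoulli(p, rng):
    """Draw 1 with probability p, else 0."""
    return 1 if rng.random() < p else 0

def forward_sample_weighted(M, seed=0):
    """Clamp E to 1, weight each sample by P(E=1 | S=s), sum weights where g = 1."""
    rng = random.Random(seed)
    counter = 0.0
    for _ in range(M):
        w = 1.0
        s = bernoulli(P_S, rng)
        r = bernoulli(P_R, rng)
        h = bernoulli(P_H[(s, r)], rng)
        e = 1                           # evidence: set to true, not sampled
        w *= P_E[s]                     # w *= P(E=1 | S=s)
        g = bernoulli(P_G[(h, e)], rng)
        if g == 1:
            counter += w
    return counter / M

def exact_p_g1_e1():
    """P(G=1, E=1) = sum over s, r, h of P(s)P(r)P(h|s,r)P(E=1|s)P(G=1|h,E=1)."""
    total = 0.0
    for s, r, h in product((0, 1), repeat=3):
        p = (P_S if s else 1 - P_S) * (P_R if r else 1 - P_R)
        p *= P_H[(s, r)] if h else 1 - P_H[(s, r)]
        total += p * P_E[s] * P_G[(h, 1)]
    return total

estimate = forward_sample_weighted(100_000)
print(round(estimate, 3), round(exact_p_g1_e1(), 3))
```

To get the conditional P(G=1 | E=1) instead, one would divide by the total weight of all samples (an estimate of P(E=1)) rather than by M.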
Forward sampling proof (for the specific BN)
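• Let $w$ be the weight function, so $w(s) = P(E=1 \mid S=s)$.
• So the mean of $w$ over the prior on $S$ is
$$\mathbb{E}[w(S)] = P(E=1 \mid S=0)P(S=0) + P(E=1 \mid S=1)P(S=1) = P(E=1)$$
• Our counter is $\sum_{i=1}^{N} w_i$, where $N$ is the number of times the condition passed; equivalently, $\text{counter} = \sum_{i=1}^{M} w_i \cdot \mathbf{1}[g_i = 1]$.
• The weight of a sample and whether it passes both depend on $s$, so we take the expectation per sample. Each sample draws $s, r, h$ from the prior, fixes $e = 1$ and draws $g$ from $P(G \mid H=h, E=1)$, hence
$$\mathbb{E}[\text{counter}] = M \sum_{s,r,h} P(s)\,P(r)\,P(h \mid s,r)\,P(E=1 \mid s)\,P(G=1 \mid h, E=1) = M \cdot P(G=1, E=1)$$
and Output = counter / M is an unbiased estimate of $P(G=1, E=1)$.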
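The identity E[counter] = M · P(G=1, E=1) can be checked exactly, without sampling, by enumerating the weighted algorithm's sampling distribution and comparing with the BN's full joint. The CPT values are hypothetical. The sketch also computes the mean weight among samples that pass (g = 1), which in general differs from P(E=1), since both the weight and the chance of passing depend on s.

```python
from itertools import product

# Hypothetical CPTs (not from the tutorial): P(X=1 | parents).
P_S, P_R = 0.3, 0.2
P_H = {(0, 0): 0.1, (0, 1): 0.6, (1, 0): 0.7, (1, 1): 0.9}    # P(H=1 | S=s, R=r)
P_E = {0: 0.2, 1: 0.8}                                        # P(E=1 | S=s)
P_G = {(0, 0): 0.05, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.95}  # P(G=1 | H=h, E=e)

def bern(p, x):
    """P(X = x) for a binary X with P(X = 1) = p."""
    return p if x == 1 else 1 - p

def joint(s, r, h, e, g):
    """Factorized joint P(s, r, h, e, g) of the BN."""
    return (bern(P_S, s) * bern(P_R, r) * bern(P_H[(s, r)], h)
            * bern(P_E[s], e) * bern(P_G[(h, e)], g))

def sampling_prob(s, r, h, g):
    """Probability the weighted algorithm produces (s, r, h, g), with e clamped to 1."""
    return bern(P_S, s) * bern(P_R, r) * bern(P_H[(s, r)], h) * bern(P_G[(h, 1)], g)

SAMPLES = list(product((0, 1), repeat=4))   # (s, r, h, g)
ALL = list(product((0, 1), repeat=5))       # (s, r, h, e, g)

# Expected per-sample contribution w * 1[g = 1], with w = P(E=1 | S=s).
expected_contribution = sum(sampling_prob(s, r, h, g) * P_E[s]
                            for s, r, h, g in SAMPLES if g == 1)
joint_g1_e1 = sum(joint(*x) for x in ALL if x[3] == 1 and x[4] == 1)

# Mean weight over the prior on S versus the marginal P(E=1).
mean_w = sum(bern(P_S, s) * P_E[s] for s in (0, 1))
p_e1 = sum(joint(*x) for x in ALL if x[3] == 1)

# Mean weight among the passing samples only.
p_pass = sum(sampling_prob(s, r, h, g) for s, r, h, g in SAMPLES if g == 1)
cond_mean_w = expected_contribution / p_pass

print(round(expected_contribution, 5), round(joint_g1_e1, 5))
print(round(mean_w, 5), round(p_e1, 5), round(cond_mean_w, 5))
```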
Quickscore
A rundown of the paper:
A Tractable Inference Algorithm for Diagnosing Multiple Diseases
Definitions
• We will look at bipartite graphs that represent diseases and symptoms.
• $d$ symbolizes a disease node, where $d^+$ denotes the presence of that particular disease and $d^-$ denotes its absence.
• $f$ refers to symptoms; $f^+$ and $f^-$ denote the presence and absence of that symptom.
• $D_k$ is some instance of all diseases, such that $D_k^+$ are all the present diseases and $D_k^-$ are all those that are absent.
Assumptions
1. Diseases are marginally independent, and symptoms are conditionally independent given the diseases.
2. If a disease is absent, it cannot cause any symptom.
3. Knowing whether or not $d_i$ causes $f$ to be present does not affect the probability that $d_j$ causes $f$.
Assumptions (2)
• These assumptions hold under a new model where, for each $d_i$, $d_j$, $f$, we have a subgraph that includes a causality node for each disease, followed by an OR gate.
• The lack of an edge between the oval nodes embodies our third assumption; this conditional independence is called causal independence.
Probability of a symptom
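• Under this model, we'll denote
$$p(f^+ \mid \text{only } d_i \text{ positive}) = p_i \;\Rightarrow\; p(f^- \mid \text{only } d_i \text{ positive}) = 1 - p_i$$
• Due to the OR gate, $f$ is absent only if none of the present diseases causes it, so
$$p(f^- \mid D_k) = \prod_{d \in D_k^+} p(f^- \mid \text{only } d)$$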
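Therefore, using the marginal independence of the diseases (assumption 1) in the last step,
$$p(f^-) = \sum_{D_k \in D} p(f^- \mid D_k)\,p(D_k) = \sum_{D_k \in D} \prod_{d \in D_k^+} p(f^- \mid \text{only } d)\; p(D_k) = \sum_{D_k \in D} \prod_{d \in D_k^+} p(f^- \mid \text{only } d)\,p(d^+) \prod_{d \in D_k^-} p(d^-) \tag{3}$$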
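The OR-gate step can be checked by brute force: enumerate the causality nodes of the present diseases, let each fire independently with probability p_i (causal independence), let absent diseases never fire (assumption 2), and take f as the OR of the fired nodes. The causal strengths `p` below are hypothetical. The probability that nothing fires should equal the product of (1 − p_i) over D_k^+.

```python
from itertools import product
from math import prod

# Hypothetical causal strengths p_i = p(f+ | only d_i positive) for three diseases.
p = [0.3, 0.5, 0.8]

def p_f_neg_product(active):
    """p(f- | D_k): product of (1 - p_i) over the present diseases in D_k."""
    return prod(1 - p[i] for i in active)

def p_f_neg_enumerate(active):
    """p(f- | D_k) by enumerating the causality nodes of the present diseases.
    Each fires independently with probability p_i; absent diseases never fire;
    f is the OR of the fired nodes."""
    total = 0.0
    for fired in product((0, 1), repeat=len(active)):
        weight = prod(p[i] if c else 1 - p[i] for i, c in zip(active, fired))
        if not any(fired):              # OR gate gives f- only if nothing fired
            total += weight
    return total

# Every disease instance D_k, represented by the list of present diseases D_k^+.
instances = [[i for i, x in enumerate(bits) if x]
             for bits in product((0, 1), repeat=len(p))]
for active in instances:
    print(active, round(p_f_neg_product(active), 4), round(p_f_neg_enumerate(active), 4))
```

The empty instance (no disease present) gives p(f−) = 1, matching assumption 2.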
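(3) ⇔ (5) proof
Equation (5) states that $p(f^-) = \prod_{i=1}^{n} \left[ p(f^- \mid \text{only } d_i)\,p(d_i^+) + p(d_i^-) \right]$. Expanding the product picks one of the two terms from every factor; the set $S$ of indices for which the first term is picked is exactly $D_k^+$ of one disease instance:
$$\prod_{i=1}^{n} \left[ p(f^- \mid \text{only } d_i)\,p(d_i^+) + p(d_i^-) \right] = \sum_{S \subseteq [n]} \prod_{i \in S} p(f^- \mid \text{only } d_i)\,p(d_i^+) \prod_{i \notin S} p(d_i^-) = \sum_{D_k \in D} \prod_{d \in D_k^+} p(f^- \mid \text{only } d)\,p(d^+) \prod_{d \in D_k^-} p(d^-) = p(f^-)$$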
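A quick numerical check of (3) ⇔ (5) for a single finding, using hypothetical priors `prior` and values `q[i]` = p(f− | only d_i): the sum over all 2^n disease instances and the O(n) product agree.

```python
from itertools import product
from math import prod

# Hypothetical model: priors p(d_i+) and q_i = p(f- | only d_i) for one finding f.
prior = [0.05, 0.1, 0.3]
q = [0.6, 0.9, 0.4]

def p_f_neg_sum():
    """Equation (3): sum over all 2^n disease instances D_k."""
    total = 0.0
    for present in product((0, 1), repeat=len(prior)):
        p_dk = prod(pr if x else 1 - pr for pr, x in zip(prior, present))  # assumption 1
        total += prod(qi for qi, x in zip(q, present) if x) * p_dk           # OR gate
    return total

def p_f_neg_product():
    """Equation (5): one factor per disease, so O(n) work."""
    return prod(qi * pr + (1 - pr) for qi, pr in zip(q, prior))

print(round(p_f_neg_sum(), 6), round(p_f_neg_product(), 6))  # prints: 0.795564 0.795564
```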
Implications of (3) ⇔ (5)
• Now we don't have to search every possible disease instance ($2^n$ of them); we can compute $p(f^-)$ in $O(n)$.
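• Since our symptoms are conditionally independent given the diseases, $p(F^- \mid D_k) = \prod_{f \in F^-} p(f^- \mid D_k)$, and the same expansion gives
$$p(F^-) = \prod_{i=1}^{n} \left[ \Big( \prod_{f \in F^-} p(f^- \mid \text{only } d_i) \Big) p(d_i^+) + p(d_i^-) \right] \tag{6}$$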
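• After some further calculations (not shown here), based on inclusion-exclusion over the positive findings, we can derive that
$$p(F^+) = \sum_{F' \in 2^{F^+}} (-1)^{|F'|} \prod_{i=1}^{n} \left[ \Big( \prod_{f \in F'} p(f^- \mid \text{only } d_i) \Big) p(d_i^+) + p(d_i^-) \right] \tag{10}$$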
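Equations (6) and (10) can be checked against brute-force sums over all disease instances. The two-finding model below (`prior`, `q`) is hypothetical. The sketch also shows that multiplying single-finding marginals does not give p(F−): the findings are only conditionally independent, which is why the product over f sits inside each disease factor.

```python
from itertools import combinations, product
from math import prod

# Hypothetical model (not from the tutorial): three diseases, two findings.
prior = [0.05, 0.1, 0.3]                       # p(d_i+)
q = {"f1": [0.6, 0.9, 1.0],                    # q[f][i] = p(f- | only d_i); 1.0 = no edge
     "f2": [1.0, 0.5, 0.4]}

def instances():
    """Yield (present, p(D_k)) for all 2^n disease instances."""
    for present in product((0, 1), repeat=len(prior)):
        yield present, prod(pr if x else 1 - pr for pr, x in zip(prior, present))

def p_neg_given(present, f):
    """OR gate: p(f- | D_k) = product of p(f- | only d) over present diseases."""
    return prod(q[f][i] for i, x in enumerate(present) if x)

def p_F_neg(findings):
    """Equation (6): the product over f sits inside each disease factor."""
    return prod(prod(q[f][i] for f in findings) * pr + (1 - pr)
                for i, pr in enumerate(prior))

def p_F_pos(positives):
    """Equation (10): inclusion-exclusion over subsets F' of the positive findings."""
    return sum((-1) ** k * p_F_neg(subset)
               for k in range(len(positives) + 1)
               for subset in combinations(positives, k))

F = ["f1", "f2"]
brute_neg = sum(p_dk * prod(p_neg_given(x, f) for f in F) for x, p_dk in instances())
brute_pos = sum(p_dk * prod(1 - p_neg_given(x, f) for f in F) for x, p_dk in instances())

# Multiplying the single-finding marginals is NOT the same as (6).
naive = p_F_neg(["f1"]) * p_F_neg(["f2"])

print(round(p_F_neg(F), 6), round(brute_neg, 6), round(naive, 6))
print(round(p_F_pos(F), 6), round(brute_pos, 6))
```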
Quickscore final formula
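• The stark difference in complexity between positive and negative symptoms comes from the sum over $F' \in 2^{F^+}$: the number of terms grows exponentially with the number of positive findings, while each negative finding only adds one more factor to each product.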
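$$p(F^-, F^+) = \sum_{F' \in 2^{F^+}} (-1)^{|F'|} \prod_{i=1}^{n} \left[ \Big( \prod_{f \in F' \cup F^-} p(f^- \mid \text{only } d_i) \Big) p(d_i^+) + p(d_i^-) \right] \tag{11}$$
$$p(d_i^+ \mid F^-, F^+) = \frac{p(F^-, F^+ \mid d_i^+)\; p(d_i^+)}{p(F^-, F^+)} \tag{12}$$
where $p(F^-, F^+ \mid d_i^+)$ can be calculated by setting $p(d_i^+) = 1$ (and hence $p(d_i^-) = 0$) in (11).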
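A minimal sketch of (11) and (12) together, assuming a small hypothetical model (`PRIOR`, `Q`). The posterior uses the clamping trick from the text (p(d_i+) = 1 in (11)) and is checked against a brute-force sum over all disease instances.

```python
from itertools import combinations, product
from math import prod

# Hypothetical model (not from the tutorial).
PRIOR = [0.05, 0.1, 0.3]                       # p(d_i+)
Q = {"f1": [0.6, 0.9, 1.0],                    # p(f- | only d_i); 1.0 = no edge
     "f2": [1.0, 0.5, 0.4],
     "f3": [0.7, 1.0, 0.9]}

def quickscore(neg, pos, priors):
    """Equation (11): p(F-, F+) by inclusion-exclusion over subsets F' of F+."""
    total = 0.0
    for k in range(len(pos) + 1):
        for subset in combinations(pos, k):
            findings = list(subset) + list(neg)          # F' union F-
            total += (-1) ** k * prod(
                prod(Q[f][i] for f in findings) * p + (1 - p)
                for i, p in enumerate(priors))
    return total

def posterior(i, neg, pos):
    """Equation (12): p(d_i+ | F-, F+), where p(F-, F+ | d_i+) is (11)
    evaluated with p(d_i+) = 1."""
    clamped = list(PRIOR)
    clamped[i] = 1.0
    return quickscore(neg, pos, clamped) * PRIOR[i] / quickscore(neg, pos, PRIOR)

def posterior_bruteforce(i, neg, pos):
    """Same posterior by summing over all 2^n disease instances."""
    num = den = 0.0
    for present in product((0, 1), repeat=len(PRIOR)):
        p_dk = prod(p if x else 1 - p for p, x in zip(PRIOR, present))
        neg_prob = {f: prod(Q[f][j] for j, x in enumerate(present) if x) for f in Q}
        lik = prod(neg_prob[f] for f in neg) * prod(1 - neg_prob[f] for f in pos)
        den += lik * p_dk
        if present[i]:
            num += lik * p_dk
    return num / den

neg, pos = ["f3"], ["f1", "f2"]
for i in range(len(PRIOR)):
    print(f"p(d{i + 1}+ | F-, F+) = {posterior(i, neg, pos):.4f}")
```

Each posterior costs two evaluations of (11), so the exponential dependence on the number of positive findings carries over to the posteriors.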
Example (copy to whiteboard)
Figure 4.37 in the textbook, Section 4.5.3
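[Figure: bipartite network with diseases D1, D2, D3, D4 (priors p(d+) = 0.01, 0.1, 0.2, 0.2) and symptoms M1, M2, M3, M4, with edge probabilities on the arcs; observed evidence M1 = +, M2 = −, M3 = +, M4 = −]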
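Quickscore calculation
Using (11) with $F^+ = \{M_1, M_3\}$ and $F^- = \{M_2, M_4\}$:
$$p(F^-, F^+) = \sum_{F' \in 2^{F^+}} (-1)^{|F'|} \prod_{i=1}^{n} \left[ \Big( \prod_{f \in F' \cup F^-} p(f^- \mid \text{only } d_i) \Big) p(d_i^+) + p(d_i^-) \right]$$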
• $F' = \emptyset$:
$$(0.9 \cdot 1 \cdot 0.01 + 0.99)\,(0.7 \cdot 1 \cdot 0.1 + 0.9)\,(1 \cdot 0.8 \cdot 0.2 + 0.8)\,(0.5 \cdot 0.2 \cdot 0.2 + 0.8) = 0.763$$
• $F' = \{M_1\}$:
$$(0.8 \cdot 0.9 \cdot 1 \cdot 0.01 + 0.99)\,(0.1 \cdot 1 \cdot 0.7 \cdot 0.1 + 0.9)\,(1 \cdot 1 \cdot 0.8 \cdot 0.2 + 0.8)\,(1 \cdot 0.5 \cdot 0.2 \cdot 0.2 + 0.8) = 0.712$$
• $F' = \{M_3\}$: 0.644; $F' = \{M_1, M_3\}$: 0.602, so
$$p(F^+, F^-) = 0.763 - 0.712 - 0.644 + 0.602 = 0.009$$
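The arithmetic above can be reproduced directly from the per-disease factors as written on the slide. The F' = {M3} and F' = {M1, M3} terms are taken as the values stated on the slide (0.644 and 0.602), since their factors are not listed here; because those two values are already rounded, the final result is only accurate to about three decimals.

```python
from math import prod

# Per-disease factors for D1..D4, copied from the slide: each is
# (product of p(f- | only d_i) over F' union F-) * p(d_i+) + p(d_i-).
factors_empty = [0.9 * 1 * 0.01 + 0.99,        # D1
                 0.7 * 1 * 0.1 + 0.9,          # D2
                 1 * 0.8 * 0.2 + 0.8,          # D3
                 0.5 * 0.2 * 0.2 + 0.8]        # D4
factors_m1 = [0.8 * 0.9 * 1 * 0.01 + 0.99,     # D1
              0.1 * 1 * 0.7 * 0.1 + 0.9,       # D2
              1 * 1 * 0.8 * 0.2 + 0.8,         # D3
              1 * 0.5 * 0.2 * 0.2 + 0.8]       # D4

term_empty = prod(factors_empty)   # F' = {}
term_m1 = prod(factors_m1)         # F' = {M1}
term_m3 = 0.644                    # F' = {M3}, value stated on the slide
term_m1_m3 = 0.602                 # F' = {M1, M3}, value stated on the slide

# Inclusion-exclusion signs (-1)^{|F'|}.
p_evidence = term_empty - term_m1 - term_m3 + term_m1_m3
print(round(term_empty, 3), round(term_m1, 3), round(p_evidence, 3))  # prints: 0.763 0.712 0.009
```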
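Quickscore calculation (2)
• Let's say we are interested in $d_1$. We'll set the probabilities as (12) requires, i.e. $p(d_1^+) = 1$ and $p(d_1^-) = 0$, and compute $p(F^-, F^+ \mid d_1^+)$ with (11):
• $F' = \emptyset$:
$$(0.9 \cdot 1 \cdot 1 + 0)\,(0.7 \cdot 1 \cdot 0.1 + 0.9)\,(1 \cdot 0.8 \cdot 0.2 + 0.8)\,(0.5 \cdot 0.2 \cdot 0.2 + 0.8)$$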
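A small sketch of the clamped terms for d1: only the D1 factor changes (p(d1+) = 1, p(d1−) = 0), while the D2, D3, D4 factors stay as in the unclamped calculation. It evaluates the F' = ∅ term above and, from the same factors, the F' = {M1} term. The F' = {M3} and F' = {M1, M3} terms would need the M3 edge probabilities, which this section does not list, so the sketch stops at these two terms.

```python
from math import prod

# D1 clamped: p(d1+) = 1, p(d1-) = 0; the other factors are unchanged.
clamped_empty = [0.9 * 1 * 1 + 0,              # D1, F' = {}
                 0.7 * 1 * 0.1 + 0.9,          # D2
                 1 * 0.8 * 0.2 + 0.8,          # D3
                 0.5 * 0.2 * 0.2 + 0.8]        # D4
clamped_m1 = [0.8 * 0.9 * 1 * 1 + 0,           # D1, F' = {M1}
              0.1 * 1 * 0.7 * 0.1 + 0.9,       # D2
              1 * 1 * 0.8 * 0.2 + 0.8,         # D3
              1 * 0.5 * 0.2 * 0.2 + 0.8]       # D4

term_empty_d1 = prod(clamped_empty)
term_m1_d1 = prod(clamped_m1)
print(round(term_empty_d1, 3), round(term_m1_d1, 3))  # prints: 0.687 0.514
```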