Probabilities: Bayes' Theorem

Recap:

SUM RULE
P(A or B) = P(A ∪ B) = P(A) + P(B)
with A, B disjoint;
P(A or B) = P(A) + P(B) - P(A ∩ B)
otherwise.

CONDITIONAL PROBABILITY OF X GIVEN Y
P(X=A | Y=A)

PRODUCT RULE
It follows that
P(X=A, Y=A) = P(X=A | Y=A) * P(Y=A)

INDEPENDENT EVENTS
P(X, Y) = P(X) P(Y), i.e. P(X | Y) = P(X)

Clearly
P(A or B) = P(B or A) and P(X, Y) = P(Y, X)
by symmetry of ∪ and ∩ -- BUT, in general,
P(X | Y) ≠ P(Y | X)

Note that, by definition and the product rule,
P(X | Y) P(Y) = P(X, Y) = P(Y, X) = P(Y | X) P(X)

BAYES' THEOREM
P(X | Y) = P(Y | X) P(X) / P(Y)

Important result. Informally: by knowing the probability of X given Y, and the probabilities of X and Y, I can derive the probability of Y given X. We will see an example soon.

Note again that if the knowledge about Y does not change the probability of X, i.e. P(X | Y) = P(X), then the two events are said to be independent, and P(X, Y) = P(X) P(Y), as in the case of picking two cards from different decks (or reinserting the card after each test).
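These rules are easy to check numerically. Below is a minimal Python sketch (not part of the original slides; the joint distribution is invented for illustration) that tabulates a small joint P(X, Y) and verifies the product rule and Bayes' theorem on it:

    # Toy joint distribution P(X, Y); the four probabilities sum to 1.
    joint = {('x1', 'y1'): 0.1, ('x1', 'y2'): 0.3,
             ('x2', 'y1'): 0.2, ('x2', 'y2'): 0.4}

    def p_x(x):  # marginal P(X = x), by the sum rule
        return sum(p for (xi, yi), p in joint.items() if xi == x)

    def p_y(y):  # marginal P(Y = y), by the sum rule
        return sum(p for (xi, yi), p in joint.items() if yi == y)

    def p_x_given_y(x, y):  # conditional P(X = x | Y = y), by definition
        return joint[(x, y)] / p_y(y)

    def p_y_given_x(y, x):  # conditional P(Y = y | X = x)
        return joint[(x, y)] / p_x(x)

    x, y = 'x1', 'y1'
    # Product rule: P(X, Y) = P(X | Y) P(Y)
    assert abs(joint[(x, y)] - p_x_given_y(x, y) * p_y(y)) < 1e-12
    # Bayes' theorem: P(X | Y) = P(Y | X) P(X) / P(Y)
    assert abs(p_x_given_y(x, y) - p_y_given_x(y, x) * p_x(x) / p_y(y)) < 1e-12
    print(p_x_given_y(x, y), p_y_given_x(y, x))  # note: the two generally differ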
Summing up:

SUM RULE: P(A or B) = P(A ∪ B) = P(A) + P(B) with A, B disjoint; P(A or B) = P(A) + P(B) - P(A ∩ B) otherwise.
CONDITIONAL PROBABILITY OF X GIVEN Y: P(X=A | Y=A).
PRODUCT RULE: P(X=A, Y=A) = P(X=A | Y=A) * P(Y=A).
INDEPENDENT EVENTS: P(X, Y) = P(X) P(Y), i.e. P(X | Y) = P(X).
BAYES' THEOREM: P(X | Y) = P(Y | X) P(X) / P(Y).

Apples and oranges in boxes [Bishop]
We have two random variables, Box and Fruit. In the red box (B=r) there are 2 apples (a) and 6 oranges (o). In the blue box (B=b) there are 3 apples and 1 orange.

- If I pick a fruit from the red box, what would you expect?
- How can you express this?
- Map all the conditional probabilities of fruit given box:
  P(F=a | B=r) = 2/8 = 0.25    P(F=o | B=r) = 6/8 = 0.75
  P(F=a | B=b) = 3/4 = 0.75    P(F=o | B=b) = 1/4 = 0.25

WORKING hypothesis:
P(B=r) = 4/10 = 0.4
P(B=b) = 6/10 = 0.6

P(F=a) = ? It is the sum of the events "pick an apple from red" or "pick an apple from blue". Hence

P(F=a) = P( (F=a, B=r) or (F=a, B=b) )
       = P(F=a, B=r) + P(F=a, B=b)                    (sum rule)
       = P(F=a | B=r) P(B=r) + P(F=a | B=b) P(B=b)    (product rule)
       = 0.25 * 0.4 + 0.75 * 0.6 = 0.55

P(F=o) = 1 - 0.55 = 0.45                              (sum rule)
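The marginalisation above can be scripted directly. A small Python sketch (the numbers are those from the slide; the variable names are our own):

    # Prior over boxes, and P(fruit | box) from the box contents.
    p_box = {'r': 0.4, 'b': 0.6}
    p_fruit_given_box = {('a', 'r'): 2/8, ('o', 'r'): 6/8,   # red: 2 apples, 6 oranges
                         ('a', 'b'): 3/4, ('o', 'b'): 1/4}   # blue: 3 apples, 1 orange

    # Sum rule over the product rule: P(F=a) = sum_b P(F=a | B=b) P(B=b)
    p_apple = sum(p_fruit_given_box[('a', b)] * p_box[b] for b in p_box)
    print(p_apple)       # 0.55
    print(1 - p_apple)   # P(F=o) = 0.45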
Let us try to "invert" our reasoning.

Suppose the two boxes have the same probability. If I observe an orange, on which box would you bet?

Probably, on the red one (0.75). Remember that, without any knowledge of the fruit, one knows that the blue one is the more probable (0.6).

Can we make this more precise, even in the original case, where the boxes have associated probabilities? How? Which probability are we looking for?

P(B=r | F=o) = P(F=o | B=r) P(B=r) / P(F=o)    (Bayes' theorem)
             = (0.75 * 0.4) / 0.45
             = 0.66666...

P(B=b | F=o)?

Are B and F independent? No:
P(F=o | B=r) = 0.75  ≠  0.45 = P(F=o)
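The same inversion as a self-contained Python sketch (numbers from the slide):

    # Prior over boxes and P(orange | box).
    p_box = {'r': 0.4, 'b': 0.6}
    p_orange_given_box = {'r': 6/8, 'b': 1/4}

    # Marginal P(F=o), then Bayes' theorem for P(B=r | F=o).
    p_orange = sum(p_orange_given_box[b] * p_box[b] for b in p_box)    # 0.45
    posterior_red = p_orange_given_box['r'] * p_box['r'] / p_orange
    print(posterior_red)        # 0.666... = P(B=r | F=o)
    print(1 - posterior_red)    # 0.333... = P(B=b | F=o)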
Grass and rain

Let R = "it rains" and W = "the grass is wet". The grass can be wet for several reasons (rain, watering, ...). Suppose
P(R) = P(rain) = 0.4
P(W) = P(grass is wet) = 0.6

P(R, W)? Are R and W independent? No. Easily,
P(W | R) = 1
so P(R, W) = P(W | R) * P(R) = 1 * 0.4 = 0.4

P(R | W)?
P(R | W) = P(W, R) / P(W) = 0.4 / 0.6 = 0.666...

NOTE:
P(W | R) = 1 ≠ 0.6 = P(W),
and P(A | B) cannot be calculated from P(A) and P(B) alone.

A note on the prior-to-posterior "inversion" we are going through. Suppose the two boxes have the same probability. If I observe an orange, on which box would you bet? Probably, on the red one (0.75), even though, without any knowledge of the fruit, one knows that the blue one is the more probable (0.6). Via Bayes' theorem, a subsequent observation (F=o) modifies our prior probability into a posterior probability.
Probabilities

Outline.
1. Introduction
2. Probability
3. Bayesian classification

(Informally) An interpretation of probability as a measure of uncertainty, based on the facts observed so far, which can be revised. Remember the case of the oranges and boxes.

Thomas Bayes (1701-1761). Logic and theology at Edinburgh. FRS 1742. Probability was relevant for gambling and the new concept of insurance. "Essay towards solving a problem in the doctrine of chances" (1764, three years after his death). [Then Laplace independently rediscovered the theory in a general form.]

Pierre-Simon Laplace (1749-1827). "Probability theory is nothing else but common sense reduced to calculation". Discussed the same ideas as Bayes (inverse probability calculation), with applications to life expectancy, jurisprudence, planetary masses, error estimation (1812).
What is a Bayesian Classifier?

• Bayesian classifiers are statistical classifiers based on Bayes' theorem (see the following slides).
• They can predict the probability that a particular sample is a member of a particular class.
• Perhaps the simplest Bayesian classifier is known as the Naïve Bayesian Classifier, based on an independence assumption (see later on ...).
• In very simple terms, this means that we assume the values given to one variable are not influenced by the values given to another variable: no relationship exists between them.
• Although the independence assumption is often a bold assumption to make, performance is still often comparable to decision tree and neural network classifiers (explore on Wikipedia, and the references therein, "the surprisingly good performance of Naive Bayes in classification").

Bayes Classifier: An Example
Mag. Promotion   TV Promotion   Life Ins. Promotion   Credit Card Ins.   Sex
      Y               N                 N                    N            M
      Y               Y                 Y                    N            F
      N               N                 N                    N            M
      Y               Y                 Y                    Y            M
      Y               N                 Y                    N            F
      N               N                 N                    N            F
      Y               N                 Y                    Y            M
      N               Y                 N                    N            M
      Y               N                 N                    N            M
      Y               Y                 Y                    Y            F

For our example, let's use sex as the output attribute whose value is to be predicted.

Examples of things we can derive from our dataset:
• 4 males took advantage of the Mag.(azine) Promo, and these 4 males represent 2/3 of the total male population;
• 3/4 of females purchased the Mag. Promo.
• Firstly, we list the distribution of the output attribute values for each input attribute. This is done using a distribution table:

             Sex |  Y  |  N  | Ratio: Yes/Total | Ratio: No/Total
  Mag Promo   M  |  4  |  2  |       4/6        |      2/6
              F  |  3  |  1  |       3/4        |      1/4
  TV Promo    M  |  2  |  4  |       2/6        |      4/6
              F  |  2  |  2  |       2/4        |      2/4
  LI Promo    M  |  2  |  4  |       2/6        |      4/6
              F  |  3  |  1  |       3/4        |      1/4
  C.C. Ins    M  |  2  |  4  |       2/6        |      4/6
              F  |  1  |  3  |       1/4        |      3/4

So, for example, 4 males answered Y to the Mag Promo, and 2 out of the total of 6 males answered Y to the LI Promo. The Y/Total and N/Total ratios sum to 1 for each attribute/sex combination.

• Suppose we want to classify a new instance (or customer), called Lee. We are told the following holds true for our new customer, i.e. this is our evidence E:

Mag. Promo = Y and TV Promo = Y and LI Promo = N and C.C. Ins. = N

We want to know if Lee is male (H1) or female (H2). Note that there was no example of YYNN in the data. We apply the Bayes classifier and compute a probability for each hypothesis.
Bayes Classifier: Back to hypotheses H1 and H2

H1: Lee is male. By Bayes' theorem,

P(sex = M | E) = P(E | sex = M) P(sex = M) / P(E)

Starting with P(E | sex = M). This is

P(Mag. Promo = Y, TV Promo = Y, LI Promo = N, C.C. Ins = N | sex = M)

We have (mathematical justification, writing E1, ..., E4 for the four pieces of evidence):

P( E1 ∩ E2 ∩ E3 ∩ E4 | M ) P(M)
= P( E1 ∩ E2 ∩ E3 ∩ E4 ∩ M )                          [P(A | B) P(B) = P(A ∩ B)]
= P( E1 | E2 ∩ E3 ∩ E4 ∩ M ) P( E2 ∩ E3 ∩ E4 ∩ M )    [P(A ∩ B) = P(A | B) P(B)]
= P( E1 | M ) P( E2 ∩ E3 ∩ E4 ∩ M ) = ...              (*)
= P( E1 | M ) P( E2 | M ) P( E3 | M ) P( E4 | M ) P( M )

(*) Assumption: E1, ..., E4 are conditionally independent given the class C, i.e. the information added by knowing that E2, ..., E4 have happened does not add much to P(E1 | C) and is forgotten. This is not always correct, it is an approximation, but it often works well (and fast!).

Then we can calculate the conditional probability values for each piece of evidence, easily obtained from our distribution table:

P(Mag. Promo = Y | sex = M) = 4/6
P(TV Promo = Y | sex = M)   = 2/6
P(LI Promo = N | sex = M)   = 4/6
P(C.C. Ins = N | sex = M)   = 4/6

and P(sex = M) = 6/10 = 3/5. It follows:

P(E | sex = M) P(sex = M) = (4/6) * (2/6) * (4/6) * (4/6) * (3/5) = (8/81) * (3/5)
Analogously, for H2: Lee is female. By Bayes' theorem,

P(sex = F | E) = P(E | sex = F) P(sex = F) / P(E)

And we have, again from the distribution table:

P(Mag. Promo = Y | sex = F) = 3/4
P(TV Promo = Y | sex = F)   = 2/4
P(LI Promo = N | sex = F)   = 1/4
P(C.C. Ins = N | sex = F)   = 3/4

and P(sex = F) = 4/10 = 2/5. It follows:

P(E | sex = F) P(sex = F) = (3/4) * (2/4) * (1/4) * (3/4) * (2/5) = (9/128) * (2/5)

Finally,

P(sex = F | E) = (9/128) * (2/5) / P(E)  <  (8/81) * (3/5) / P(E) = P(sex = M | E)

Hence, the Bayes classifier tells us that Lee is most likely a male.

Calculating also the value of P(E), i.e. the product of the (conditionally independent!) probabilities of Mag. Promo, TV Promo, not LI Promo and not C.C. Ins:

P(E) = (7/10) * (4/10) * (5/10) * (7/10) = 0.098

we have:

P(sex = F | E) = (9/128) * (2/5) / 0.098 ≈ 0.287  <  0.605 ≈ (8/81) * (3/5) / 0.098 = P(sex = M | E)
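The whole Naïve Bayes computation can be reproduced in a few lines. A Python sketch (the dataset is the table transcribed above; function and variable names are illustrative):

    # Rows: (Mag, TV, LI, CC, Sex), transcribed from the promo table.
    data = [('Y','N','N','N','M'), ('Y','Y','Y','N','F'), ('N','N','N','N','M'),
            ('Y','Y','Y','Y','M'), ('Y','N','Y','N','F'), ('N','N','N','N','F'),
            ('Y','N','Y','Y','M'), ('N','Y','N','N','M'), ('Y','N','N','N','M'),
            ('Y','Y','Y','Y','F')]

    def score(evidence, sex):
        # Unnormalised P(E | sex) P(sex) under conditional independence.
        rows = [r for r in data if r[4] == sex]
        s = len(rows) / len(data)                  # prior P(sex)
        for i, value in enumerate(evidence):       # one factor per attribute
            s *= sum(1 for r in rows if r[i] == value) / len(rows)
        return s

    lee = ('Y', 'Y', 'N', 'N')                     # Mag=Y, TV=Y, LI=N, CC=N
    for sex in ('M', 'F'):
        print(sex, score(lee, sex))  # M: (8/81)*(3/5) ~ 0.0593, F: (9/128)*(2/5) ~ 0.0281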
Items Bought from Amazon

CDs   Books   DVDs   Videos   Region
 Y      Y      N       N      Stirling
 Y      N      Y       N      Glasgow
 Y      N      Y       Y      Glasgow
 Y      Y      Y       N      Glasgow
 N      N      Y       N      Stirling
 N      Y      Y       Y      Stirling
 Y      N      Y       N      Stirling
 Y      Y      Y       Y      Glasgow
After a bit of feature extraction ...

1. What proportion of Glasgow customers buy books?
2. What proportion of all customers buy DVDs?
3. Given a new customer that we know buys Videos, is it more likely that they live in Glasgow or Stirling? (Classifying according to further evidence!)
           |  Y  |  N  | Ratio Y/Tot | Ratio N/Tot
  CDs    G |  4  |  0  |      1      |      0
         S |  2  |  2  |     1/2     |     1/2
  Books  G |  2  |  2  |     1/2     |     1/2
         S |  2  |  2  |     1/2     |     1/2
  DVDs   G |  4  |  0  |      1      |      0
         S |  3  |  1  |     3/4     |     1/4
  Videos G |  2  |  2  |     1/2     |     1/2
         S |  1  |  3  |     1/4     |     3/4
Items Bought from Amazon

1. P(Books = Y | Region = G) = 2/4 = 1/2
2. P(DVDs = Y) = 7/8
3. P(Glasgow | Videos) = P(Videos | Glasgow) P(Glasgow) / P(Videos) = (1/2 * 1/2) / (3/8) = 2/3
   P(Stirling | Videos) = P(Videos | Stirling) P(Stirling) / P(Videos) = (1/4 * 1/2) / (3/8) = 1/3

... most likely from Glasgow!
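Question 3 can be checked with a short script. A Python sketch over the transcribed table (names are illustrative):

    # Rows: (CDs, Books, DVDs, Videos, Region), G = Glasgow, S = Stirling.
    data = [('Y','Y','N','N','S'), ('Y','N','Y','N','G'), ('Y','N','Y','Y','G'),
            ('Y','Y','Y','N','G'), ('N','N','Y','N','S'), ('N','Y','Y','Y','S'),
            ('Y','N','Y','N','S'), ('Y','Y','Y','Y','G')]

    def posterior_region(region):
        # P(region | Videos = Y) via Bayes' theorem.
        in_region = [r for r in data if r[4] == region]
        p_videos_given_region = sum(1 for r in in_region if r[3] == 'Y') / len(in_region)
        p_region = len(in_region) / len(data)
        p_videos = sum(1 for r in data if r[3] == 'Y') / len(data)   # 3/8
        return p_videos_given_region * p_region / p_videos

    print(posterior_region('G'))   # 2/3
    print(posterior_region('S'))   # 1/3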
Note: P(Stirling | Videos) + P(Glasgow | Videos) = 1.

Exercise. Is it in general true that P(a | r) + P(b | r) = 1? Note that in our case a ∪ b = T (one either comes from Glasgow or from Stirling, but not from both places; indeed a ∩ b = ∅, too). Is it true when assuming a ∪ b = T?

P(a | r) + P(b | r)
  (by def) = P(a, r) / P(r) + P(b, r) / P(r)
  (by sum) = P(a ∪ b, r) / P(r)
           = P(T, r) / P(r) = P(r) / P(r) = 1

Note: Revisit the Naïve Bayesian Classifier example about the promotions. Does the above result hold there? If not, why? How is the situation different here?

Why use Bayesian Classifiers?

• There are several classification methods; no one has been found to be superior over all others in every case (i.e. on every data set drawn from a particular domain of interest).
• Methods can be compared based on:
  – accuracy
  – interpretability of the results
  – robustness of the method with different datasets
  – training time
  – scalability
An Important Remark

Mind the difference between
• calculating probabilities from a known set of outcomes, for example tossing heads when we know the two outcomes are heads or tails, and
• calculating the probability of an event from data, where the event is not directly expressed by the data.

The probabilities calculated from data are estimates, not true values. If we tossed a coin 10 times to generate data, we might easily get 6 heads and 4 tails. Without knowing how coins work, we would estimate the probability of getting heads as 6/10: not a bad estimate, but incorrect. The more data we have, the more reliable our estimates get.

Results can be dramatically sensitive to the specific evidence you have. E.g. suppose you have evidence that P(Head) = 0.6; then you can estimate the probability of getting 6 heads in a row, even if it didn't happen in your data:

0.6 * 0.6 * 0.6 * 0.6 * 0.6 * 0.6 = 0.047,

i.e. about 1 in 21 tries (with P(Head) = 0.5 it is 0.016).
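The point about estimates can be illustrated with a quick simulation. A Python sketch (illustrative, not from the slides):

    # Estimate P(Head) of a fair coin from samples of increasing size.
    import random
    random.seed(1)
    for n in (10, 100, 10000):
        tosses = [random.random() < 0.5 for _ in range(n)]
        print(n, sum(tosses) / n)    # estimate drifts toward 0.5 as n grows

    # Sensitivity of a derived quantity to the estimate:
    print(0.6 ** 6, 0.5 ** 6)        # 0.047 vs 0.016 for six heads in a row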
Bayesian Belief Networks

• What are they?
  – Bayesian Belief Networks (BBNs) are a way of modelling probabilities, based on data or knowledge, that allows the probabilities of new events to be estimated.
• What are they used for?
  – Intelligent decision aids, data fusion, intelligent diagnostic aids, automated free text understanding, data mining.
• Where did they come from?
  – Cross-fertilization of ideas between the artificial intelligence, decision analysis, and statistics communities.

Probabilistic graphical models. Modern pattern recognition is based on probabilities, mainly on the sum and product rules. All could be treated algebraically. However, graphical models offer advantages in terms of:
• visualisation, e.g. of structure and relationships;
• communication, being easy to grasp;
• expressiveness, e.g. graphical manipulations corresponding to mathematical operations.
Definition of a Bayesian Network

Factored joint probability distribution as a directed graph:
• a structure for representing knowledge about uncertain variables, and
• a computational architecture for computing (propagating) the impact of evidence on beliefs.

Knowledge structure:
• random variables are depicted as nodes
• arcs represent probabilistic dependence between variables
• conditional probabilities encode the strength of the dependencies

Computational architecture:
• computes posterior probabilities given evidence about selected nodes
• exploits probabilistic independence for efficient computation

[Figure: the "Asia" network. Patient information nodes: Visit to Asia, Smoking. Medical difficulty nodes: Tuberculosis, Lung Cancer, Bronchitis, Tuberculosis or Cancer. Diagnostic test nodes: XRay Result, Dyspnea.]
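To make the "factored joint" idea concrete, here is a minimal Python sketch of a three-node chain inspired by the figure: Smoking -> Lung Cancer -> XRay. The CPT numbers are invented for illustration; they are NOT the ones used in the medical example that follows:

    # Assumed (made-up) parameters for the sketch only.
    p_smoker = 0.5
    p_cancer_given = {True: 0.10, False: 0.01}       # P(cancer | smoking)
    p_abnormal_given = {True: 0.90, False: 0.05}     # P(abnormal xray | cancer)

    def joint(smoking, cancer, abnormal):
        # Factored joint: P(S) * P(C | S) * P(X | C).
        p = p_smoker if smoking else 1 - p_smoker
        p *= p_cancer_given[smoking] if cancer else 1 - p_cancer_given[smoking]
        p *= p_abnormal_given[cancer] if abnormal else 1 - p_abnormal_given[cancer]
        return p

    # Posterior P(cancer | abnormal xray) by enumerating the joint:
    num = sum(joint(s, True, True) for s in (True, False))
    den = sum(joint(s, c, True) for s in (True, False) for c in (True, False))
    print(num / den)   # belief in cancer after the abnormal finding

This is exactly the "propagation" a BBN tool automates: enter evidence on one node, renormalise the factored joint, and read off the updated beliefs.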
Example from Medical Diagnostics

Relationship knowledge is modeled by deterministic functions, logic, and conditional probability distributions. For example:

Tuberculosis or Cancer (a deterministic OR of its parents):
  Tuber     Lung Can   Tub or Can
  Present   Present    True
  Present   Absent     True
  Absent    Present    True
  Absent    Absent     False

Dyspnea (conditional probability table, given Tub or Can and Bronchitis):
  Tub or Can   Bronchitis   Dyspnea Present   Dyspnea Absent
  True         Present          0.90              0.10
  True         Absent           0.70              0.30
  False        Present          0.80              0.20
  False        Absent           0.10              0.90

[Network screenshot, no evidence entered. Prior beliefs: Visit to Asia: Visit 1.00, No Visit 99.0; Smoking: Smoker 50.0, NonSmoker 50.0; Tuberculosis: Present 1.04; Lung Cancer: Present 5.50; Bronchitis: Present 45.0; Tuberculosis or Cancer: True 6.48; XRay Result: Abnormal 11.0; Dyspnea: Present 43.6.]
• The propagation algorithm processes the relationship information to provide likelihood information for the occurrence of each state of each node.
• As a finding is entered, the propagation algorithm updates the beliefs attached to each relevant node in the network.

Interviewing the patient produces the information that Visit to Asia is Visit. This finding propagates through the network and the belief functions of several nodes are updated.

[Network screenshot, Visit to Asia = Visit entered: Tuberculosis: Present 5.00, Absent 95.0; Tuberculosis or Cancer: True 10.2, False 89.8; XRay Result: Abnormal 14.5, Normal 85.5; Dyspnea: Present 45.0; Smoking (50.0/50.0), Lung Cancer (5.50/94.5) and Bronchitis (45.0/55.0) are unchanged.]

Further interviewing of the patient produces the finding Smoking is Smoker. This information propagates through the network.

[Network screenshot, Smoking = Smoker also entered: Lung Cancer: Present 10.0, Absent 90.0; Bronchitis: Present 60.0, Absent 40.0; Tuberculosis or Cancer: True 14.5, False 85.5; XRay Result: Abnormal 18.5, Normal 81.5; Dyspnea: Present 56.4, Absent 43.6.]
Finished with interviewing the patient, the physician begins the examination. The physician moves to specific diagnostic tests, such as an X-Ray, which results in a Normal finding that propagates through the network. Note that the information from this finding propagates backward and forward through the arcs.

[Network screenshot, XRay Result = Normal also entered: Tuberculosis: Present 0.19; Lung Cancer: Present 0.39; Tuberculosis or Cancer: True 0.56, False 99.4; Dyspnea: Present 52.1, Absent 47.9; Bronchitis unchanged at 60.0.]

The physician also determines that the patient is having difficulty breathing; the finding Present is entered for Dyspnea and is propagated through the network. The doctor might now conclude that the patient has bronchitis and does not have tuberculosis or lung cancer.

[Network screenshot, Dyspnea = Present also entered: Bronchitis: Present 92.2, Absent 7.84; Tuberculosis: Present 0.12; Lung Cancer: Present 0.25; Tuberculosis or Cancer: True 0.36, False 99.6.]
Applications

Industrial
• Processor Fault Diagnosis - by Intel
• Auxiliary Turbine Diagnosis - GEMS by GE
• Diagnosis of space shuttle propulsion systems - VISTA by NASA/Rockwell
• Situation assessment for nuclear power plant - NRC

Military
• Automatic Target Recognition - MITRE
• Autonomous control of unmanned underwater vehicle - Lockheed Martin
• Assessment of Intent
Medical Diagnosis
• Internal Medicine
• Pathology diagnosis - Intellipath by Chapman & Hall
• Breast Cancer Manager with Intellipath

Commercial
• Financial Market Analysis
• Information Retrieval
• Software troubleshooting and advice - Windows 95 & Office 97
• Pregnancy and Child Care - Microsoft
• Software debugging - American Airlines SABRE online reservation system

Glandular fever

• Suppose we know that, on average, 1% of the population have had glandular fever:
  P(had_GF) = 0.01
• Suppose we have a test for having had glandular fever such that:
  – for a person who has had GF, the test would give a positive result with probability 0.977;
  – for a person who has not had GF, the test would give a negative result with probability 0.926.

Q: How could this information be represented as a BBN?
Q: How could the BBN be used to find out new information?
Step 1: "feature extraction"

• List the information we are given and determine the information we can deduce from it.

• "On average, 1% of the population have had glandular fever", so
  P(had GF) = 0.01  =>  P(not had GF) = 0.99

• "For a person who has had GF the test would give a positive result with probability 0.977". Note this is not "a person has had GF"! It is
  P(+ve test | person has had GF) = 0.977
  from which we have that the probability of having had GF but receiving a -ve test result is
  P(-ve test | person has had GF) = 1 - P(+ve test | person has had GF) = 0.023

• "For a person who has not had GF the test would give a negative result with probability 0.926", i.e.
  P(-ve test | person has not had GF) = 0.926
  from which
  P(+ve test | person has not had GF) = 1 - P(-ve test | person has not had GF) = 0.074

Summing up:
P(+ve test | had GF) = 0.977        P(-ve test | had GF) = 0.023
P(+ve test | not had GF) = 0.074    P(-ve test | not had GF) = 0.926
P(had GF) = 0.01                    P(not had GF) = 0.99
A BBN model: 1 parent related to 1 child

Step 2: BBN construction

Define the structure of the BBN. This is an important issue! Nodes are random variables. Two choices:
1. Top-level nodes. These do not have any probabilistic dependency. They could be understood as observations.
2. Dependency relationships. These describe our interpretation of the dependencies in our model.

Both choices contribute to the structure of the network. Different choices are generally possible. The final structure must not have cycles in the dependency relationship, i.e. the directed graph must be acyclic (a DAG), in order to guarantee efficient propagation algorithms.

In our example:
• Nodes:
  - Had_GF
  - Test_Result
• Relationships:
  - the Had_GF node influences the state of the Test_Result node

3. Establish values for the Conditional Probability Tables (CPTs), i.e. a representation of the conditional probability of the values of the random variable represented by a node, conditioned on the parent random variable. This is the application of a formula, and the values must be consistent! E.g. each row (the union of all the possible values) should sum to 1.

Finally:

Had GF node:
  True    0.01 (1%)
  False   0.99 (99%)

Test node:
                 Test +ve (True)    Test -ve (False)
  Had GF True      0.977 (97.7%)      0.023 (2.3%)
  Had GF False     0.074 (7.4%)       0.926 (92.6%)
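The two-node network can be written down directly as data. A Python sketch (structure and numbers from Steps 1 and 2; the check mirrors the "rows sum to 1" consistency rule above):

    # Prior for the parent node, and the child's CPT, one row per parent value.
    p_had_gf = {True: 0.01, False: 0.99}                    # Had_GF node
    test_cpt = {True:  {'+ve': 0.977, '-ve': 0.023},        # row for Had_GF = True
                False: {'+ve': 0.074, '-ve': 0.926}}        # row for Had_GF = False

    # Consistency check: every distribution must sum to 1.
    assert abs(sum(p_had_gf.values()) - 1) < 1e-12
    for row in test_cpt.values():
        assert abs(sum(row.values()) - 1) < 1e-12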
Step 3: Use of the BBN

Use the network to determine new information. Examples of information we may wish to determine:

Q1: Given a person has had GF, what is the probability of a negative test result?
Q2: What is the probability of a +ve test result?
Q3: Given a positive test, what is the probability that the person has had GF?

We will start by looking at the calculations that these questions require. This is actually computed by the network, once it has been properly designed.

Q1: Given a person has had GF, what is the probability of a negative test?

Well, this is easy, as we have already determined this information in Step 2. Formulated in terms of probability, we wish to find out P(-ve test | has had GF). Looking back a few slides,

P(-ve test | has had GF) = 0.023
Q2: What is the probability of a +ve test result?

We need to think of all the possible situations which would lead to a +ve test result:

Situation 1. +ve test result and had GF:       P(+ve test ∩ had GF)
Situation 2. +ve test result and not had GF:   P(+ve test ∩ not had GF)

We don't know the above information, BUT we can now use what we know of conditional probability to calculate it. From P(A ∩ B) = P(B | A) P(A):

P(+ve test ∩ had GF)     = P(had GF | +ve test) * P(+ve test)
P(+ve test ∩ not had GF) = P(not had GF | +ve test) * P(+ve test)

Still, P(+ve test) has not yet been defined. BUT since P(A ∩ B) = P(B ∩ A), we can condition the other way round:

P(had GF ∩ +ve test)     = P(+ve test | had GF) * P(had GF)         = 0.977 * 0.01 = 0.00977
P(not had GF ∩ +ve test) = P(+ve test | not had GF) * P(not had GF) = 0.074 * 0.99 = 0.07326

Finally, by the sum rule:

P(+ve test) = P(+ve test ∩ had GF) + P(+ve test ∩ not had GF)
            = 0.00977 + 0.07326
            = 0.08303

If we chose a random person and gave them a test, they would have a 0.08 probability of showing positive.

Q3: Given a positive test, what is the probability that the person has had GF?

P(has had GF | +ve test)?

Again, this isn't a piece of information that has already been defined. We can calculate the answer by exploiting the network and Bayes' theorem:

P(A | B) = P(B | A) P(A) / P(B)

Given that, from Q2, we have P(+ve test) = 0.08303:

P(has had GF | +ve test) = P(+ve test | has had GF) P(has had GF) / P(+ve test)
                         = (0.977 * 0.01) / 0.08303
                         = 0.118
So the test isn't so useful after all:
– remember that 7% of those who've never had GF get a positive result,
– and they far outnumber the people who have had it and got a positive result!
A BBN model with a hierarchy of 3 nodes

Extension to the Glandular Fever Model

(Recall the model so far. Had GF node: True 0.01 (1%), False 0.99 (99%). Test node: given Had GF True, Test +ve 0.977 (97.7%) and Test -ve 0.023 (2.3%); given Had GF False, Test +ve 0.074 (7.4%) and Test -ve 0.926 (92.6%).)

• Suppose we are informed that the school nurse sends home 80% of the students that have a positive GF test.
• She also sends home 5% of students for other medical reasons (i.e. students that have not had a positive GF test).

Q1. How do we incorporate this new information into our network?
Q2. What is the probability of being sent home?
Q3. Given that a child is sent home, what is the probability of them having had a negative test?

Facts we deduced last time:
P(-ve test | has had GF) = 0.023
P(+ve test) = 0.08303
P(had GF | +ve test) = 0.118
Q1. How do we incorporate new information?

We add a new node, Sent home, as a child of the Test node, with CPT:

                   Sent home True   Sent home False
  Test +ve True       0.8 (80%)        0.2 (20%)
  Test +ve False      0.05 (5%)        0.95 (95%)

(The Had GF and Test nodes keep the CPTs established earlier.)

Q2. What is the probability of being sent home?

We need to add up all the combinations of scenarios which would lead to us being sent home:
- due to a +ve test
- due to another reason

P(home) = P(home | +ve test) * P(+ve test) + P(home | -ve test) * P(-ve test)
        = 0.8 * P(+ve test) + 0.05 * P(-ve test)

We need to calculate P(+ve test) and P(-ve test). We already know

P(+ve test) = P(+ve test | GF) * P(GF) + P(+ve test | not had GF) * P(not had GF)
            = 0.00977 + 0.07326
            = 0.08303

from which

P(-ve test) = 1 - 0.08303 = 0.91697

And finally

P(home) = 0.8 * 0.08303 + 0.05 * 0.91697
        = 0.1122725 = 0.112 (3 d.p.)
Q3. Given that a child is sent home, what is the probability of them having had a negative test?

We've already done a similar calculation to this before (Bayes' theorem):

P(-ve test | home) = P(home | -ve test) * P(-ve test) / P(home)
                   = (0.05 * 0.91697) / 0.1122725
                   = 0.408 (3 d.p.)
Things to remember about calculations in BBNs

1. Know parent information, want to find out child node information:
   use conditional probability.
2. Know child information, want to find out parent node information:
   use Bayes' Theorem.
3. If a node has more than 1 parent, remember that the parents are independent (unless they share an arc). Thus
   P(parent A ∩ parent B) = P(parent A) * P(parent B)
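The extended three-node model follows exactly the two patterns just listed. A Python sketch (values from the slides; names illustrative):

    # Had_GF -> Test -> Sent_home.
    p_pos = 0.08303                                 # P(+ve test), from Q2 earlier
    p_home_given_pos, p_home_given_neg = 0.8, 0.05  # Sent home CPT

    # Q2: P(home), marginalising out the test result (parent -> child).
    p_home = p_home_given_pos * p_pos + p_home_given_neg * (1 - p_pos)
    print(p_home)                                   # 0.1122725

    # Q3: P(-ve test | home), by Bayes' theorem (child -> parent).
    print(p_home_given_neg * (1 - p_pos) / p_home)  # 0.408 (3 d.p.)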
A BBN model with 2 parents and 1 child

Diane's shopping model (2p 1c)

• Diane does her shopping each week. The bill for her shop is sometimes under £30, sometimes £30 or over. Two factors influence the cost of her shopping: whether she takes her 2 year old son with her, and whether she takes her 40 year old husband with her.

• If we know Diane has gone shopping by herself, the likelihood that the bill will be less than £30 is 90%. If we know that Diane was accompanied only by her husband, the likelihood that the bill will be less than £30 is 70%, and if we know that Diane took only her son, the likelihood that the bill will be less than £30 is 80%. Given we know both son and husband accompanied Diane to the shops, the likelihood that the bill is under £30 reduces to 60%.

• 50% of the time Diane's husband accompanies her to the shops. 60% of the time Diane is accompanied by her son.

How do you imagine a BBN could represent the above information?
Questions:

Q1. What is the probability of the bill being under £30?
Q2. Given that the bill is under £30, what is the probability that Diane's husband (with or without her son) accompanied her to the shops?
Diane's shopping model (2p 1c)

Firstly, let's list what we know ...

P(under_30 | no husband ∩ no son) = 0.9
P(under_30 | husband ∩ no son)    = 0.7
P(under_30 | no husband ∩ son)    = 0.8
P(under_30 | husband ∩ son)       = 0.6
P(husband) = 0.5
P(son)     = 0.6

... and what we can deduce ( 1 - P(...) ):

P(not_under_30 | no husband ∩ no son) = 0.1
P(not_under_30 | husband ∩ no son)    = 0.3
P(not_under_30 | no husband ∩ son)    = 0.2
P(not_under_30 | husband ∩ son)       = 0.4
P(no_husband) = 0.5
P(no_son)     = 0.4
This gives the CPT for the child node, Bill under £30:

                  True   False
  Hub T, Son T    60%    40%
  Hub T, Son F    70%    30%
  Hub F, Son T    80%    20%
  Hub F, Son F    90%    10%
Q1. What is the probability of the bill being under £30?

Firstly, list the situations that lead to the bill being under £30:

P(under_30) = P(under_30 | no husband ∩ no son) * P(no husband ∩ no son) +
              P(under_30 | no husband ∩ son)    * P(no husband ∩ son) +
              P(under_30 | husband ∩ no son)    * P(husband ∩ no son) +
              P(under_30 | husband ∩ son)       * P(husband ∩ son)

What do we know about the presence of Diane's husband and the presence of her son? Are these 2 events dependent on each other in any way? No: these events are independent. Independent events are ones where the occurrence of one event does not impact on the occurrence of the other:

P(A | B) = P(A)

therefore, if A and B are independent events,

P(A, B) = P(A | B) * P(B) = P(A) * P(B)

Hence:

P(no husband ∩ no son) = P(no husband) * P(no son) = 0.5 * 0.4 = 0.2
P(no husband ∩ son)    = P(no husband) * P(son)    = 0.5 * 0.6 = 0.3
P(husband ∩ no son)    = P(husband) * P(no son)    = 0.5 * 0.4 = 0.2
P(husband ∩ son)       = P(husband) * P(son)       = 0.5 * 0.6 = 0.3

It follows:

P(under_30) = (0.9 * 0.2) + (0.8 * 0.3) + (0.7 * 0.2) + (0.6 * 0.3) = 0.74
Q2. Given that the bill is under £30, what is the probability that Diane's husband (with or without her son) accompanied her to the shops?

2 scenarios:

P(husband | under £30) = P(husband ∩ son | under £30) + P(husband ∩ not son | under £30)

P(husband ∩ son | under £30)        {using Bayes'}
  = P(under £30 | husband ∩ son) * P(husband ∩ son) / P(under £30)
  = (0.6 * 0.3) / 0.74 = 0.243... (recurring)

P(husband ∩ not son | under £30)    {using Bayes'}
  = P(under £30 | husband ∩ not son) * P(husband ∩ not son) / P(under £30)
  = (0.7 * 0.2) / 0.74 = 0.189...

Overall:

P(husband | under £30) = 0.243... + 0.189... = 0.432 (3 d.p.)
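Both questions for Diane's model can be checked with a short script. A Python sketch (CPT and priors from the slides; names illustrative):

    # Two independent parents (Husband, Son) and one child (Bill under £30).
    p_husband, p_son = 0.5, 0.6
    p_under30 = {(True, True): 0.6, (True, False): 0.7,
                 (False, True): 0.8, (False, False): 0.9}   # P(bill < 30 | h, s)

    def p_parents(h, s):
        # Parents are independent: P(h ∩ s) = P(h) * P(s).
        return (p_husband if h else 1 - p_husband) * (p_son if s else 1 - p_son)

    # Q1: total probability of the bill being under £30.
    p_bill = sum(p_under30[hs] * p_parents(*hs) for hs in p_under30)
    print(p_bill)   # 0.74

    # Q2: P(husband | under £30), summing the Bayes-inverted joints over the son.
    p_h = sum(p_under30[(True, s)] * p_parents(True, s) for s in (True, False)) / p_bill
    print(p_h)      # 0.432...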
Why are BBNs interesting?

• BBNs are different from other knowledge-based systems tools because uncertainty is handled in a mathematically rigorous yet efficient and simple way.
• They are different from other probabilistic analysis tools because of
  - the network representation of problems,
  - the use of Bayesian statistics, and
  - the synergy between these.
• The development of propagation algorithms was followed by the availability of easy-to-use commercial software.
• There is a growing number of creative applications, e.g. dementia diagnosis, cancer care symptom modelling, likelihood of car purchase, ...

Issue: how to build a network?

Outline.
1. Introduction
2. Probability
3. Bayesian classification
4. Bayesian Belief Networks