
CMSC 310 Artificial Intelligence
Bayesian Belief Networks
1. Definition
The joint probability distribution can answer any question about the domain, but there are two major
difficulties:
(i) It becomes very large as the number of variables grows: the space and time complexity is O(d^n),
where n is the number of random variables and d is the maximum number of values per variable. For
example, 20 binary variables already require a table with 2^20 (over one million) entries.
(ii) Computing the probability of each atomic event may be difficult unless a large amount of data is
available to derive statistical estimates.
Bayesian belief networks are compact representations of the joint probability distribution over a set of
random variables X1, . . . , Xn that exploit conditional dependence and independence relations among
the random variables.
A Bayesian network specification has two components:
(i) a directed acyclic graph G = (V, E) with a node set V corresponding to the random variables
X1, . . . , Xn, and an edge set E on these nodes. The edges reflect the conditional independence
assumptions made: each node is conditionally independent of its non-descendants given its parents in
the network.
(ii) a set Θ of conditional probability distributions, one for each node in the graph G. These
probability distributions are local, and are of the form P(Xi | ParentsG(Xi)).
The two components (G, Θ) specify a unique distribution on the random variables X1, . . . , Xn:
P(X1, . . . , Xn) = ∏ P(Xi | ParentsG(Xi)), i = 1 to n.
Bayesian networks contain quantitative information in the form of conditional probabilities of variables
given their parents in the network.
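To make the factorization concrete, here is a minimal sketch (in Python, not part of the course
materials) of one possible representation: each node stores the tuple of its parents and a CPT giving
P(node = true | parent values), and the joint probability of a complete assignment is the product of the
local entries. The names joint_probability, parents and cpts are illustrative assumptions only.

    def joint_probability(parents, cpts, assignment):
        """P(X1, ..., Xn) = product of P(Xi | Parents(Xi)) over all nodes."""
        prob = 1.0
        for var, pa in parents.items():
            pa_values = tuple(assignment[p] for p in pa)   # values of this node's parents
            p_true = cpts[var][pa_values]                   # CPT stores P(var = true | parents)
            prob *= p_true if assignment[var] else 1.0 - p_true
        return prob

    # Tiny sanity check with two binary variables A -> B:
    parents = {"A": (), "B": ("A",)}
    cpts = {"A": {(): 0.3}, "B": {(True,): 0.9, (False,): 0.4}}
    print(joint_probability(parents, cpts, {"A": True, "B": False}))   # 0.3 * (1 - 0.9) ≈ 0.03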
2. Example (from Ben Coppin, Artificial Intelligence Illuminated)
Consider the following Bayesian network:
[Figure: a directed acyclic graph with nodes C, S, P, E, F and edges C→S, C→P, S→E, P→E, P→F.]
Where the nodes represent the following statements:
C = that you will go to college
S = that you will study
P = that you will party
E = that you will be successful in your exams
F = that you will have fun
If you go to college, this will affect the likelihood that you will study and the likelihood that you will
party. Studying and partying affect your chances of exam success, and partying affects your chances of
having fun.
Conditional probability tables:
P(C) = 0.2

C        P(S | C)
true     0.8
false    0.2

C        P(P | C)
true     0.6
false    0.5

S        P        P(E | S, P)
true     true     0.6
true     false    0.9
false    true     0.1
false    false    0.2

P        P(F | P)
true     0.9
false    0.7
Note that there is a dependence between F and C, but it is not a direct dependence, so no information
about it is stored in the tables.
These conditional probability tables give us all the information necessary to carry out any reasoning
about the domain.
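For later reference, the tables above can be written out in the dict representation sketched in
Section 1 (again an illustrative encoding; each CPT stores P(node = true | parent values)):

    parents = {"C": (), "S": ("C",), "P": ("C",), "E": ("S", "P"), "F": ("P",)}
    cpts = {
        "C": {(): 0.2},
        "S": {(True,): 0.8, (False,): 0.2},
        "P": {(True,): 0.6, (False,): 0.5},
        "E": {(True, True): 0.6, (True, False): 0.9,
              (False, True): 0.1, (False, False): 0.2},
        "F": {(True,): 0.9, (False,): 0.7},
    }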
Example 1: Compute the probability that you will go to college, you will study and be successful in
your exams, but you will not party or have fun:
P(C = true, S = true, P = false, E = true, F = false)
To simplify the notation, we will use P(C, S, ~P, E, ~F)
To compute P(C, S, ~P, E, ~F), we use the rule
P(X1, . . . , Xn) = ∏ P(Xi | ParentsG(Xi)), i = 1 to n.
P(C, S, ~P, E, ~F) = P(C) * P(S | C) * P(~P | C) * P(E | S, ~P) * P(~F | ~P)
= 0.2 * 0.8 * 0.4 * 0.9 * 0.3 = 0.01728
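As a quick numeric check, the same product can be evaluated directly; the complemented entries are
obtained as 1 minus the corresponding table values:

    # P(C) * P(S|C) * P(~P|C) * P(E|S,~P) * P(~F|~P), with ~ entries as 1 - table value
    p = 0.2 * 0.8 * (1 - 0.6) * 0.9 * (1 - 0.7)
    print(round(p, 5))   # 0.01728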
Example 2: Compute the probability that you will have success in your exams if you have fun and
study at college, but don’t party:
P( E | F , S, C, ~P)
In the network, E depends directly only on its parents S and P; given S and P it is conditionally
independent of C and F, so we can drop C and F.
From the table for P(E) we obtain P(E | S, ~P) = 0.9
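This shortcut can be verified by brute force: conditioning the full joint distribution on C, S, ~P and F
and normalizing over the two values of E gives the same 0.9, which is why C and F can be dropped.
A small numeric check, with all values taken from the tables above:

    # P(E | C, S, ~P, F) = P(C,S,~P,E,F) / (P(C,S,~P,E,F) + P(C,S,~P,~E,F))
    p_e     = 0.2 * 0.8 * 0.4 * 0.9 * 0.7   # P(C, S, ~P, E, F)
    p_not_e = 0.2 * 0.8 * 0.4 * 0.1 * 0.7   # P(C, S, ~P, ~E, F)
    print(p_e / (p_e + p_not_e))            # ≈ 0.9, same as the table entry P(E | S, ~P)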
Example 3: Let’s say we know that you had fun and studied hard while at college, and that you
succeeded in your exams, but we don’t know whether you partied or not.
Thus, we know C, S, E, and F, but we don't know P. We want to determine the most likely value for P.
We can compare the values of the following two expressions:
P(C, S, P, E, F) and P(C, S, ~P, E, F)
P(C, S, P, E, F) = P(C) * P(S | C) * P(P | C) * P(E | S, P) * P(F | P)
= 0.2 * 0.8 * 0.6 * 0.6 * 0.9 = 0.05184
P(C, S, ~P, E, F) = P(C) * P(S | C) * P(~P | C) * P(E | S, ~P) * P(F | ~P)
= 0.2 * 0.8 * 0.4 * 0.9 * 0.7 = 0.04032
So it is slightly more likely that you did party while at college than that you didn’t.
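Normalizing the two values turns the comparison into an actual posterior probability (a small extra
step beyond the example itself):

    p_party    = 0.2 * 0.8 * 0.6 * 0.6 * 0.9   # P(C, S, P, E, F)  = 0.05184
    p_no_party = 0.2 * 0.8 * 0.4 * 0.9 * 0.7   # P(C, S, ~P, E, F) = 0.04032
    print(p_party / (p_party + p_no_party))     # P(P | C, S, E, F) ≈ 0.5625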
3. Constructing Bayesian belief networks
If we know the causal relations between events, we can build partial networks and then combine them
into one network. The result must be a directed acyclic graph.
If we don't know the causal relations, but we do know some of the joint probability distribution (for
example, from observed data), we can use algorithms that learn Bayesian networks, e.g. genetic
algorithms. Here we are looking for the network that best matches (explains) the observations; a small
scoring sketch is given after the link below. This is an active research topic: there is a large number of
publications on learning Bayesian belief networks, and there are also some commercially available
systems.
https://www.cra.com/pdf/bnetbuilderbackground.pdf
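As an illustration of what "best matches the observations" can mean, the sketch below scores a
candidate parent map by the log-likelihood of complete binary data under maximum-likelihood CPT
estimates; a search procedure (such as a genetic algorithm) would then try to maximize this score over
candidate structures. The function name, data layout, and scoring choice are assumptions made for the
sketch, not a reference to any particular system.

    import math
    from collections import Counter

    def log_likelihood(parents, data):
        """Score a candidate structure: sum over data of log P_ML(value | parent values)."""
        score = 0.0
        for var, pa in parents.items():
            # counts of (parent values, node value) and of parent values alone
            joint = Counter((tuple(row[p] for p in pa), row[var]) for row in data)
            parent_counts = Counter(tuple(row[p] for p in pa) for row in data)
            for (pa_vals, val), n in joint.items():
                score += n * math.log(n / parent_counts[pa_vals])
            # (a practical scorer would also penalize complexity, e.g. BIC/MDL)
        return score

    # Example: score the structure from Section 2 against two observed cases.
    parents = {"C": (), "S": ("C",), "P": ("C",), "E": ("S", "P"), "F": ("P",)}
    data = [
        {"C": True,  "S": True,  "P": False, "E": True,  "F": False},
        {"C": False, "S": False, "P": True,  "E": False, "F": True},
    ]
    print(log_likelihood(parents, data))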
4. Applications
Bayesian belief networks represent models of probabilistic causal relations in partially observable
domains. They are very useful as a reasoning tool in expert systems and decision support systems,
especially in domains where uncertainty is inherent, e.g. weather prediction, medical expert systems,
business intelligence systems, risk management systems, etc.
Additional Readings:
https://www.cra.com/pdf/bnetbuilderbackground.pdf
http://www.eecs.qmul.ac.uk/~norman/BBNs/BBNs.htm