BMI/CS 576 Fall 2016 Homework #6

Prof. Colin Dewey
Due Thursday, December 15th, 2016 by 11:59pm (Note: cannot be turned in late)
The goal of this assignment is to become more familiar with Bayesian networks.
To turn in your assignment, copy all relevant files to the directory:
/u/medinfo/handin/bmi576/hw6/USERNAME
where USERNAME is your account name for the BMI network.
1. (30 points) Consider the Bayesian network below.

    A -> B -> C

    P(A)
    T      F
    0.5    0.5

    P(B | A)
    A    B=T    B=F
    F    0.2    0.8
    T    0.8    0.2

    P(C | B)
    B    C=T    C=F
    F    0.75   0.25
    T    0.25   0.75

(a) Give a table specifying the joint probability distribution, P(A, B, C), represented
by the Bayesian network.
(b) Given your table from (a), compute P(A = true | C = true).
(c) Given your table from (a), compute P(A = true | B = true).
(d) Given your table from (a), compute P(A = true | B = true, C = true).
(e) Given your table from (a), is C independent of A? Justify your answer.
(f) Given your table from (a), is C independent of A given B? Justify your answer.
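
For reference, a chain network A -> B -> C factorizes as P(A, B, C) = P(A) P(B | A) P(C | B), and any conditional query can be answered by summing the matching rows of the joint table. The Python sketch below illustrates that enumeration under stated assumptions: the CPD values and the helper names `joint_table` and `conditional` are hypothetical, chosen only for illustration, and are not the values from the problem above.

```python
from itertools import product

# Hypothetical CPDs for a chain A -> B -> C (NOT the values from the problem).
p_a = {True: 0.3, False: 0.7}                      # p_a[a] = P(A = a)
p_b_given_a = {True: {True: 0.9, False: 0.1},      # p_b_given_a[a][b] = P(B = b | A = a)
               False: {True: 0.4, False: 0.6}}
p_c_given_b = {True: {True: 0.5, False: 0.5},      # p_c_given_b[b][c] = P(C = c | B = b)
               False: {True: 0.2, False: 0.8}}

def joint_table():
    """Return {(a, b, c): P(A=a, B=b, C=c)} using the chain factorization."""
    return {(a, b, c): p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]
            for a, b, c in product([True, False], repeat=3)}

def conditional(joint, query, evidence):
    """P(query | evidence); both map a variable index (0=A, 1=B, 2=C) to a value."""
    def matches(row, assignment):
        return all(row[i] == v for i, v in assignment.items())
    p_evidence = sum(p for row, p in joint.items() if matches(row, evidence))
    p_both = sum(p for row, p in joint.items()
                 if matches(row, evidence) and matches(row, query))
    return p_both / p_evidence

joint = joint_table()
p_a_given_c = conditional(joint, {0: True}, {2: True})  # P(A = true | C = true)
```

Independence questions can be checked the same way, by comparing a conditional such as `conditional(joint, {2: True}, {0: True})` with the corresponding marginal.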
2. (25 points) As shown in slide 13 of the lecture “Bayesian Networks for Regulatory
Network Learning”, some conditional probability distributions (CPDs) can also be
represented with a tree.
(a) Give the CPD table for the distribution P(D | A, B, C) represented by the tree
below.

    B
    |-- F: P(D=T) = 0.75
    |-- T: C
           |-- F: P(D=T) = 0.5
           |-- T: A
                  |-- F: P(D=T) = 0.5
                  |-- T: P(D=T) = 0.25

(b) Give the most compact tree representation of the distribution P(D | A, B, C) specified by the CPD table below.

    P(D | A, B, C)
    A    B    C    D=T    D=F
    T    T    T    0.5    0.5
    T    T    F    0.4    0.6
    T    F    T    0.5    0.5
    T    F    F    0.4    0.6
    F    T    T    0.7    0.3
    F    T    F    0.7    0.3
    F    F    T    0.8    0.2
    F    F    F    0.8    0.2

(c) Suppose that you know that the CPD P(D | A, B, C) can be represented by the
tree structure of part (a), but that you don’t know the parameters at the leaves of
the tree. Now suppose you are given some training data with which to estimate
the CPD. What is the major advantage of the tree representation over the CPD
table representation in estimating the parameters of the CPD?
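
To make the two representations concrete, the sketch below expands a tree-structured CPD into a full table and counts the leaves of the tree. It is only an illustration under stated assumptions: the tree over two hypothetical parents X and Y, its leaf values, and the helper names `lookup`, `expand_to_table`, and `count_leaves` are invented here and are not the tree from part (a).

```python
from itertools import product

# A hypothetical tree CPD for P(D = T | X, Y); NOT the tree from part (a).
# An internal node is (variable, subtree_if_false, subtree_if_true); a leaf is P(D = T).
tree = ("X", 0.9, ("Y", 0.3, 0.6))

def lookup(node, assignment):
    """Walk the tree under a full parent assignment and return P(D = T)."""
    while isinstance(node, tuple):
        variable, if_false, if_true = node
        node = if_true if assignment[variable] else if_false
    return node

def expand_to_table(node, variables):
    """Expand the tree into a full table {parent values: P(D = T)}."""
    return {values: lookup(node, dict(zip(variables, values)))
            for values in product([True, False], repeat=len(variables))}

def count_leaves(node):
    """Number of leaves of the tree, one parameter per leaf."""
    if not isinstance(node, tuple):
        return 1
    _, if_false, if_true = node
    return count_leaves(if_false) + count_leaves(if_true)

# The table has one row per parent configuration; the tree has one value per leaf.
table = expand_to_table(tree, ["X", "Y"])
n_rows, n_leaves = len(table), count_leaves(tree)
```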
3. (30 points) Suppose we wish to reconstruct the gene regulatory network for three
genes, X, Y, and Z, using the Bayesian network approach and the “sparse candidate”
algorithm. We are given data from 100 independent experiments in which the
expression levels of the three genes are measured. For simplicity, we model each
gene as being either “on” (T) or “off” (F). Below is a table summarizing the number
of times (count) each configuration of gene expression status was observed in these
experiments.

    X    Y    Z    count
    T    T    T    4
    T    T    F    16
    T    F    T    4
    T    F    F    1
    F    T    T    3
    F    T    F    12
    F    F    T    48
    F    F    F    12

(a) Suppose we wish to compute a single candidate parent for Z. In the first round of
the “sparse candidate” algorithm, we compute the mutual information between
Z and the other random variables. Compute the mutual information between
Z and X, I(X, Z), based on the frequencies observed in the data.
(b) Compute the mutual information between Z and Y, I(Y, Z), based on the frequencies observed in the data.
(c) Based on your answers to (a) and (b), which gene would be selected as the
candidate parent for Z? Briefly explain your answer.
(d) In the first round of the algorithm, suppose that we choose Y to be the parent
of Z in our network, X to be the parent of Y, and that X remains parent-less.
Estimate from the data the parameters of the three CPDs of the current network,
Pnet(X), Pnet(Y | X), and Pnet(Z | Y).
(e) In the next round of the algorithm, we wish to see if X should also be considered
as a candidate parent of Z. To do this, we will use the Kullback-Leibler (KL)
divergence between the marginal distribution of X and Z as estimated from the
data, P̂(X, Z), and that implied by the current network, Pnet(X, Z). Compute
DKL(P̂(X, Z) || Pnet(X, Z)).
(f) Based on your answer to (e), should we consider X as a candidate parent for
Z? Briefly explain your answer.
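
Both quantities used in this problem are sums over the joint configurations of two variables: the mutual information is the sum of P(x, z) log [P(x, z) / (P(x) P(z))], and the KL divergence between P and Q is the sum of P(x, z) log [P(x, z) / Q(x, z)]. The Python sketch below shows one way to compute them from a table of pairwise counts. It assumes logarithms base 2 (the course may use a different base), the counts are hypothetical rather than the data above, and the helper names are chosen here for illustration.

```python
from math import log2

def normalize(counts):
    """Turn {(x, z): count} into empirical probabilities {(x, z): P(x, z)}."""
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}

def marginal(joint, index):
    """Marginal distribution of the variable at position `index` of each key."""
    result = {}
    for key, p in joint.items():
        result[key[index]] = result.get(key[index], 0.0) + p
    return result

def kl_divergence(p, q):
    """D_KL(p || q) in bits; terms with p = 0 contribute 0 by convention.
    Assumes q is nonzero wherever p is nonzero."""
    return sum(pv * log2(pv / q[key]) for key, pv in p.items() if pv > 0)

def mutual_information(joint):
    """I(X, Z) in bits, computed as D_KL(P(X, Z) || P(X) P(Z))."""
    px, pz = marginal(joint, 0), marginal(joint, 1)
    product_dist = {(x, z): px[x] * pz[z] for (x, z) in joint}
    return kl_divergence(joint, product_dist)

# Hypothetical pairwise counts (NOT the data from the table above).
counts = {(True, True): 40, (True, False): 10,
          (False, True): 10, (False, False): 40}
joint = normalize(counts)
mi = mutual_information(joint)
```

Writing the mutual information as a KL divergence against the product of the marginals lets one function serve both parts; for part (e) the second argument would instead be the joint distribution of X and Z implied by the network.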
4. (15 points)
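
[Figure: a Bayesian network over the nodes A, B, C, D, E, F, G, and H, with edges
A -> B, A -> C, A -> D, B -> E, B -> F, C -> G, and D -> H, annotated with the CPD
tables P(A), P(B | A), P(C | A), P(D | A), P(E | B), P(F | B), P(G | C), and P(H | D).]

    P(A)
    T      F
    0.4    0.6

[Each of the seven conditional tables in the figure has columns for parent = T and
parent = F and takes one of three forms:]

    child    parent=T    parent=F
    T        0.7         0.2
    F        0.3         0.8

    child    parent=T    parent=F
    T        0.6         0.2
    F        0.4         0.8

    child    parent=T    parent=F
    T        0.1         0.9
    F        0.9         0.1
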
(a) Give the most compact module network that represents the same joint probability distribution as the general Bayesian network above.
(b) Suppose that we know that the network can be properly modeled using the
structure of the module network of (a), but that we do not know the parameters
of the model. Now suppose you are given some training data with which to
estimate the parameters of the model. What is the major advantage of the
module network model over the general Bayesian network model with respect
to parameter estimation?