Oct 25. Penfold CA et al. Nonparametric Bayesian Inference for

Nonparametric Bayesian inference
for perturbed and orthologous
gene regulatory networks
Christopher A. Penfold
Vicky Buchanan-Wollaston
Katherine J. Denby
and
David L. Wild
Published in Bioinformatics
Motivation
• Reverse engineering of gene regulatory
networks (GRNs) is an active area of research
• Previous approaches assume that multiple
time series datasets share an identical topology
• In reality, different but overlapping sets of
transcription factors are expected to bind
under different conditions
• How to handle the information from this
somewhat diverse set of multiple datasets?
• Some researchers have used nonparametric
Bayesian learning strategies, but for those
techniques to be computationally feasible,
the number of transcription factors (TFs) that can
bind to the promoter region of a gene needs
to be limited.
• However, recent studies suggest that a large
number of TFs have the potential to bind to any gene
• Under any specific condition, though, the
number of TFs actually binding is observed to be
smaller.
• So it is of interest to find the subset of TFs
that binds under each specific condition, since
each subset can result in a different GRN.
Causal Structure Identification
• The CSI algorithm (Klemm, 2008; Penfold and Wild, 2011) and
related approaches (Äijö and Lähdesmäki, 2009) have
previously been used to reverse engineer GRNs and shown to
perform well
• The discrete-time version of CSI models the mRNA
expression level of a particular gene in a larger set, i∈G, as:
• where xi(t) represents the expression level of
gene i at time t, Pa(i)⊆G represents the genes
encoding TFs that bind the promoter region
of gene i (the parents of gene i), xPa(i)(t) is the
vector of expression levels of those parents at
time t, and f(·) represents some unknown
(non-linear) function capturing the dynamics
of the system.
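The equation itself is missing from the text above. Read off the definitions just given, a plausible form of the discrete-time model is sketched below; the one-step lag and the additive noise term ε are assumptions and are not taken from the slide.

```latex
x_i(t+1) = f\bigl(\mathbf{x}_{\mathrm{Pa}(i)}(t)\bigr) + \varepsilon
```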
• y and matrix X :
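The definitions of y and X are missing here. As a minimal sketch, assuming a one-step-ahead model in which the target gene at time t+1 is regressed on its candidate parents at time t, the two objects could be built as follows. The function name `build_regression_data` and the `(genes, time points)` array layout are hypothetical and are not part of the CSI software.

```python
import numpy as np

def build_regression_data(expr, target, parents):
    """Build one-step-ahead regression data from a time series.

    expr    : array of shape (n_genes, n_timepoints), an assumed layout
    target  : row index of the gene being modelled (gene i)
    parents : row indices of the candidate parent genes Pa(i)
    """
    expr = np.asarray(expr, dtype=float)
    # y: expression of the target gene at time points 2..T
    y = expr[target, 1:]
    # X: expression of the parents at time points 1..T-1, one row per time point
    X = expr[list(parents), :-1].T
    return y, X

# Toy data: 3 genes observed at 4 time points
expr = np.array([[0.0, 1.0, 2.0, 3.0],
                 [1.0, 0.5, 0.2, 0.1],
                 [2.0, 2.5, 3.0, 3.5]])
y, X = build_regression_data(expr, target=0, parents=[1, 2])
```

Each row of X pairs with the entry of y one time step later, so a series of T time points yields T-1 training pairs.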
• Usually the parent genes are not known
a priori, so the data are used to infer them as
follows:
where T⊆G represents the set of all transcription
factors and θk the set of hyperparameters for the k-th parental set.
The distribution depends on the values of these parameters, which are
estimated using Expectation Maximization.
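A minimal sketch of inferring a distribution over parental sets: enumerate candidate sets up to a maximum size, score each by a marginal likelihood, and normalise with Bayes' rule. Several choices here are assumptions made for illustration: a Gaussian process with a squared-exponential kernel stands in for the unknown function f(·), the prior over sets is uniform, and the hyperparameters are fixed rather than fitted by Expectation Maximization. All function names are hypothetical.

```python
import itertools
import numpy as np

def rbf_kernel(X, length_scale=1.0, signal_var=1.0):
    # Squared-exponential kernel between all pairs of rows of X
    sq_norms = np.sum(X**2, axis=1)
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return signal_var * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)

def gp_log_marginal_likelihood(y, X, noise_var=0.1):
    # log p(y | X) for a zero-mean GP with fixed (assumed) hyperparameters
    n = len(y)
    K = rbf_kernel(X) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.sum(np.log(np.diag(L))) - 0.5 * n * np.log(2 * np.pi)

def parental_set_posterior(expr, target, tfs, max_parents=2):
    # expr has shape (n_genes, n_timepoints); tfs are row indices of candidate TFs
    y = expr[target, 1:] - expr[target, 1:].mean()
    sets, log_evidence = [], []
    for k in range(max_parents + 1):
        for pa in itertools.combinations(tfs, k):
            if pa:
                X = expr[list(pa), :-1].T
            else:
                # Empty parental set: a constant input, i.e. no regulators
                X = np.zeros((len(y), 1))
            sets.append(pa)
            log_evidence.append(gp_log_marginal_likelihood(y, X))
    log_evidence = np.array(log_evidence)
    # Uniform prior over sets, so the posterior is the normalised evidence
    weights = np.exp(log_evidence - log_evidence.max())
    return dict(zip(sets, weights / weights.sum()))

rng = np.random.default_rng(0)
expr = rng.normal(size=(4, 10))
post = parental_set_posterior(expr, target=0, tfs=[1, 2, 3], max_parents=2)
```

Capping the size of the parental set with `max_parents` is what keeps the enumeration tractable, which is the computational restriction mentioned in the Motivation.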
• Finally, a distribution over causal network
structures, P(M), can be assembled from the
distribution over individual parental sets,
constituting the CSI algorithm:
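The formula is missing from the text above. If the parental sets of different genes are treated as independent given the data (an assumption, not stated on the slide), the network distribution would be the product of the per-gene distributions. Here D is a hypothetical shorthand for the observed data.

```latex
P(M \mid D) = \prod_{i \in G} P\bigl(\mathrm{Pa}(i) \mid D\bigr)
```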
Hierarchical modelling for CSI
• In this framework, the joint distribution for all
model parameters conditioned on the data is
factorised as:
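The factorisation itself is missing from the text above. A generic sketch for one gene i across d datasets is given below. It assumes a likelihood term per dataset, a conditional term linking each dataset's parents to a shared hyperparent, and independent priors. The symbols Pa^{(j)}(i) (parents in dataset j), Pa^{*}(i) (hyperparent), β^{(j)}, θ^{(j)} and D^{(j)} are notation chosen for illustration, and the exact form used in the paper may differ.

```latex
P\bigl(\mathrm{Pa}^{(1)}(i),\dots,\mathrm{Pa}^{(d)}(i),\,\mathrm{Pa}^{*}(i),\,\theta,\,\beta \mid D\bigr)
\;\propto\;
P\bigl(\mathrm{Pa}^{*}(i)\bigr)
\prod_{j=1}^{d}
P\bigl(D^{(j)} \mid \mathrm{Pa}^{(j)}(i),\,\theta^{(j)}\bigr)\,
P\bigl(\mathrm{Pa}^{(j)}(i) \mid \mathrm{Pa}^{*}(i),\,\beta^{(j)}\bigr)\,
P\bigl(\theta^{(j)}\bigr)\,P\bigl(\beta^{(j)}\bigr)
```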
• The conditional distribution for the parents of
gene i in dataset j given the hyperparent is
chosen to correspond to a Gibbs distribution:
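A minimal sketch of a Gibbs distribution over parental sets: each candidate set receives probability proportional to exp(-β·E), where E measures how far the set is from the hyperparent. The energy used here, the Hamming distance (size of the symmetric difference between the two sets), is an assumption for illustration, as are the function name and the toy TF labels.

```python
import itertools
import math

def gibbs_parent_distribution(hyperparent, tfs, beta, max_parents=2):
    """P(Pa | hyperparent, beta) proportional to exp(-beta * energy)."""
    hyper = frozenset(hyperparent)
    candidates = [frozenset(c)
                  for k in range(max_parents + 1)
                  for c in itertools.combinations(tfs, k)]
    # Assumed energy: Hamming distance between the set and the hyperparent
    weights = [math.exp(-beta * len(pa ^ hyper)) for pa in candidates]
    z = sum(weights)  # normalising constant
    return {pa: w / z for pa, w in zip(candidates, weights)}

dist = gibbs_parent_distribution({"A", "B"}, tfs=["A", "B", "C"], beta=1.0)
```

With β = 0 every candidate set is equally likely, so a dataset is effectively decoupled from the hyperparent; larger β concentrates the mass on sets close to the hyperparent.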
• Again, a network structure can be assembled
from the parent distributions for each node,
with a hypernetwork assembled from the
distributions over hyperparents:
Results
Combining hierarchical modelling and yeast
one-hybrid
• YIH used to identify the genes capable of
binding to the promoter region .
• In this study a gene RD29A was used and
previous study suggests 9 such genes.
• However, in this study, time series data was
collected using 6 timestamps under different
conditions.
Thank you!