slides on GuntherPaper.pdf

Neural networks for modeling
gene-gene interactions in
association studies
Review of article in BMC Genetics
by Günther, Wawro and Bammann
Goal
• Investigate the ability of neural networks to
model two-locus disease models, including
epistatic interactions
• Compare with standard methods: logistic
regression and multi-factor dimensionality
reduction (MDR)
Background
• Disease terminology
• Previous work
Disease Terminology
• Epidemiology: the study of health events,
characteristics or patterns in society.
• Penetrance: the probability of exhibiting a
given phenotype (ex: disease) given a
specific genotype
• Epistasis: interaction of 2 or more genes to
control a single phenotype
Epistasis
• Biology: one gene influences the effect of
another on phenotype
• Statistics: deviation from an additive effect
of single risk factors on the outcome of
disease
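The statistical definition can be made concrete with a small check. The sketch below assumes a 3x3 penetrance matrix indexed by the genotypes at two loci and measures, cell by cell, the deviation from additivity on the penetrance scale. The function names and the numeric effects are hypothetical, chosen only for illustration; whether a deviation counts as interaction also depends on the scale being used.

```python
def interaction_contrasts(f):
    """Deviation of each cell from an additive model on the penetrance scale.

    For an additive matrix f[i][j] = a[i] + b[j], every contrast
    f[i][j] - f[i][0] - f[0][j] + f[0][0] is zero.
    """
    return [[f[i][j] - f[i][0] - f[0][j] + f[0][0] for j in range(3)]
            for i in range(3)]


def is_additive(f, tol=1e-12):
    """True if no cell deviates from additivity by more than tol."""
    return all(abs(c) <= tol for row in interaction_contrasts(f) for c in row)


# Hypothetical single-locus effects (illustration only, not from the paper).
a = [0.01, 0.02, 0.04]
b = [0.01, 0.05, 0.10]

additive = [[a[i] + b[j] for j in range(3)] for i in range(3)]
multiplicative = [[a[i] * b[j] for j in range(3)] for i in range(3)]

print(is_additive(additive))        # no deviation from additivity
print(is_additive(multiplicative))  # deviates, so it shows statistical interaction
```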
Previous Work
• Machine learning: support vector
machines, random forests, multi-factor
dimensionality reduction (MDR),
combinatorial partitioning methods
• Regression-based methods: logistic
regression and lasso regression
Previous Work
• Parametric methods have difficulty
detecting interactions unless there are
main effects.
• Non-parametric methods are not effective
when main effects are present.
• Most used are generalized linear models,
logistic regression, MDR
Methods
• Neural Network: a form of
machine learning
• 6 Disease models
• Modeling the data
Neural Network:
learning modeled on the brain
Neural Network Topology [1]
Günther et al., Figure 1: a feed-forward ANN is a weighted directed graph of
nodes (neurons) and edges (synapses). Each layer is fully connected to
the next.
Input layer has 2 nodes, hidden layer has 3, output layer has 1.
Activation function is applied to the weighted sum of inputs into a node
Supervised learning, backpropagation of errors
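A minimal sketch of the forward pass through the 2-3-1 topology described above: each node applies an activation function to the weighted sum of its inputs. The logistic activation, the bias terms, and all weight values are assumptions made for illustration; they are not taken from the paper, and training by backpropagation is not shown.

```python
import math


def sigmoid(z):
    """Logistic activation (an assumed choice of activation function)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: 2 inputs -> 3 hidden nodes -> 1 output node."""
    # Each hidden node: activation of the weighted sum of all inputs plus a bias.
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(w_hidden, b_hidden)]
    # Output node: activation of the weighted sum of all hidden outputs plus a bias.
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


# Hypothetical weights; in practice these are learned from the data.
w_hidden = [[0.5, -0.3], [0.8, 0.2], [-0.6, 0.9]]  # one row per hidden node
b_hidden = [0.0, -0.1, 0.1]
w_out = [1.2, -0.7, 0.4]
b_out = -0.2

# Inputs are the genotypes at the two loci, coded as mutated-allele counts.
y = forward([2, 1], w_hidden, b_hidden, w_out, b_out)
print(y)  # a value strictly between 0 and 1
```

With a logistic output node the result lies between 0 and 1, so it can be read as an estimated probability of case status for the given genotype pair.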
Disease Models
(Risch [2] and Gunther et al [1]): Some model
gene interactions and some model biological
independence.
Let the penetrance matrix be f = (f_ij), where i, j ∈ {0,1,2}.
f_ij = P(Y = 1 | G_A = i, G_B = j)
where Y ∈ {0,1} denotes the case-control status:
control = 0 (no disease), case = 1 (disease)
and G_A, G_B ∈ {0,1,2} are the genotypes at the 2 loci.
Let a_i and b_j denote the penetrance values for G_A
and G_B respectively.
Risch Disease Models [2]
Additivity model: no interactions
f_ij = a_i + b_j
subject to: 0 <= a_i, b_j <= 1
Heterogeneity model: no interactions
f_ij = a_i + b_j - a_i·b_j
Multiplicative model: interactions
f_ij = a_i·b_j
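The three Risch formulas can be sketched as functions that build a 3x3 penetrance matrix from the single-locus penetrance values. The baseline values a0 and b0 below are hypothetical; the multipliers follow the low risk scenario of Table 1. This is an illustration of the formulas, not the simulation code used in the paper.

```python
def additive(a, b):
    """Additivity model: sum of the single-locus penetrances."""
    return [[a[i] + b[j] for j in range(3)] for i in range(3)]


def heterogeneity(a, b):
    """Heterogeneity model: equals 1 - (1 - a_i) * (1 - b_j)."""
    return [[a[i] + b[j] - a[i] * b[j] for j in range(3)] for i in range(3)]


def multiplicative(a, b):
    """Multiplicative model: product of the single-locus penetrances."""
    return [[a[i] * b[j] for j in range(3)] for i in range(3)]


# Hypothetical baseline penetrances (not from the paper).
a0, b0 = 0.05, 0.05
# Low risk scenario multipliers as listed in Table 1.
a = [a0, 2 * a0, 4 * a0]
b = [b0, 5 * b0, 10 * b0]

for name, model in [("ADD", additive), ("HET", heterogeneity), ("MULT", multiplicative)]:
    print(name)
    for row in model(a, b):
        print("  ", [round(p, 4) for p in row])
```

Every entry must remain a valid probability; with larger baselines the additive sum a_i + b_j can exceed 1, so the baseline values have to be chosen with that in mind.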
Gunther Disease Models [1]:
EPIRR: recessive epistatic
Both genes must have some
mutation to impact disease.
Both genes are recessive,
therefore both alleles must
be mutated in both genes in
order to affect penetrance
Gunther Disease Models [1]:
EPIDD: dominant epistatic
Both genes must have some
mutation to impact disease.
Both genes are dominant,
therefore a mutation of either
allele in both genes will
affect penetrance, with a larger
effect if both alleles of both
genes are mutated.
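The verbal descriptions of EPIRR and EPIDD can be turned into penetrance matrices as follows. This sketch assumes a baseline penetrance f0 that is multiplied by the risk factors r (EPIRR) or r1 and r2 (EPIDD) from Table 1; the baseline value, the function names, and the exact placement of r1 and r2 are assumptions read off the slide text, not a reproduction of the paper's definitions.

```python
def epi_rr(f0, r):
    """Recessive epistasis: risk is raised only when both loci carry two mutated alleles."""
    return [[f0 * r if (i == 2 and j == 2) else f0 for j in range(3)]
            for i in range(3)]


def epi_dd(f0, r1, r2):
    """Dominant epistasis: risk is raised when both loci carry at least one
    mutated allele, and raised further when both carry two."""
    def cell(i, j):
        if i == 2 and j == 2:
            return f0 * r2
        if i >= 1 and j >= 1:
            return f0 * r1
        return f0
    return [[cell(i, j) for j in range(3)] for i in range(3)]


f0 = 0.02  # hypothetical baseline penetrance
for row in epi_rr(f0, r=5):          # low risk scenario value of r
    print([round(p, 4) for p in row])
for row in epi_dd(f0, r1=2, r2=4):   # low risk scenario values of r1, r2
    print([round(p, 4) for p in row])
```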
Gunther Disease Models [1]:
EPIRD: mixed epistatic
Not required that both genes
have some mutation to
impact disease.
One gene is recessive, the
other is dominant. A double
mutation of the recessive
gene has an effect. A double
mutation of the dominant
gene enhances this effect
Table 1
Risk scenarios.
Two-locus disease model   Low risk scenario   High risk scenario
ADD, HET, MULT            a1 = 2·a0           a1 = 5·a0
                          a2 = 4·a0           a2 = 10·a0
                          b1 = 5·b0           b1 = 5·b0
                          b2 = 10·b0          b2 = 10·b0
EPI RR                    r = 5               r = 10
EPI DD, EPI RD            r1 = 2              r1 = 5
                          r2 = 4              r2 = 10
Applied risk scenarios for all two-locus disease models.
Günther et al. BMC Genetics 2009 10:87
doi:10.1186/1471-2156-10-87
Representation of Genes
• Genotype is represented by number of
mutated alleles:
Both alleles not mutated = 0
Exactly one allele mutated = 1
Both alleles mutated = 2
• Logistic regression models were also fitted to
the data with 2 separate indicator variables for each
locus: single mutation and double mutation
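The two codings above can be sketched as small helper functions: the allele-count coding gives one variable per locus with values 0, 1 or 2, while the indicator coding gives two 0/1 variables per locus. The function names and the representation of a genotype as a pair of booleans are hypothetical, introduced only for illustration.

```python
def allele_count(genotype):
    """Count mutated alleles; genotype is a pair of booleans (True = mutated)."""
    return sum(1 for allele in genotype if allele)


def indicator_coding(count):
    """Two separate 0/1 variables for one locus: (single mutation, double mutation)."""
    return (1 if count == 1 else 0, 1 if count == 2 else 0)


for genotype in [(False, False), (True, False), (False, True), (True, True)]:
    g = allele_count(genotype)
    print(genotype, "-> count:", g, "indicators:", indicator_coding(g))
```

With the indicator coding, a regression model can estimate separate effects for one and two mutated alleles instead of assuming that the effect changes linearly with the allele count.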